How to find which model is selected by TPOT - Python

Hi, I am using TPOT for machine learning. I am getting 99% accuracy, but I am not sure which model the final pipeline uses. Can someone help me with this? Also, does TPOT apply SMOTE?

If you stored the TPOTClassifier in the variable my_tpot, then you can retrieve the final trained pipeline through its fitted_pipeline_ attribute:
from tpot import TPOTClassifier

my_tpot = TPOTClassifier()
my_tpot.fit(…)
print(my_tpot.fitted_pipeline_)
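For a concrete end-to-end run, here is a minimal sketch; the iris dataset and the generations/population_size values are illustrative, not from the original question:
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

my_tpot = TPOTClassifier(generations=5, population_size=20, random_state=42)
my_tpot.fit(X_train, y_train)

# the winning scikit-learn pipeline, including the chosen model and its hyperparameters
print(my_tpot.fitted_pipeline_)
# optionally export the pipeline as a standalone Python script
my_tpot.export('best_pipeline.py')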

Related

Determine hyperparameters used during training from a saved XGBoost model

I have a model trained in SageMaker as a file, and I can load and ultimately score it locally like so:
import tarfile
import xgboost as xgb

local_model_path = "model.tar.gz"
with tarfile.open(local_model_path) as tar:
    tar.extractall()

model = xgb.XGBRegressor()
model.load_model("xgboost-model")
I wonder how I can establish the hyperparameters used to fit the saved model. I do not think that these lines of code work (i.e. they do not show the hyperparameters the model was trained with):
booster = model.get_booster()
print(booster.save_config())
print(model.get_xgb_params())
How can I establish/check the hyperparameters that were actually used? Any help would be very much appreciated. Thanks.
OK, forget the other answer, which I deleted.
This one works for me; I don't know why get_xgb_params() should not work:
import xgboost as xgb

model = xgb.XGBRegressor()
model.load_model("xgboost-model")
model.get_xgb_params()
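If get_xgb_params() still looks incomplete, the Booster's full internal configuration can also be dumped as JSON via save_config(); a small sketch, assuming the same loaded model:
import json
import xgboost as xgb

model = xgb.XGBRegressor()
model.load_model("xgboost-model")

# save_config() returns the Booster's internal configuration as a JSON string
config = json.loads(model.get_booster().save_config())
print(json.dumps(config, indent=2))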

How to get the most important feature coefficients when I used a pipeline to preprocess, train and test a LinearSVC?

I am using a LinearSVC. I pre-processed the numeric and categorical data using a ColumnTransformer, then used a Pipeline. I used GridSearchCV to get the best parameters for the model, which I later put into the pipeline, as you can see.
I fit and tested the model and got the score as well, but I want to know the most important feature coefficients.
So far I have tried clf.coef_, as the classifier step is named clf in the pipeline, but I get a message saying clf is not defined.
I also tried gridf.coef_ and pipefinal.steps[1].coef_, but nothing worked.
Any help in this regard will be highly appreciated. Thanks.
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import LinearSVC

preprocessing = ColumnTransformer([('hot', OneHotEncoder(), categ), ('scale', StandardScaler(), num)], n_jobs=-1)
pipefinal = Pipeline([('pre', preprocessing), ('clf', LinearSVC(max_iter=100000, C=0.1))])
gridf = GridSearchCV(pipefinal, param_grid={}, cv=10)
gridf.fit(X_train, y_train)
gridf.score(X_val, y_val)
GridSearchCV will make the best estimator available through its best_estimator_ attribute after you have called the fit() method. Since your estimator is a Pipeline object, you have to further subscript it to access the classifier. Then, you can access its coef_ attribute. In your case, that would be:
gridf.best_estimator_['clf'].coef_
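If the goal is the most important features rather than the raw array, the coefficients can be matched back to the transformed feature names. A sketch, assuming a scikit-learn version whose ColumnTransformer supports get_feature_names_out() and a binary target (so coef_ has a single row):
import numpy as np

best_pipe = gridf.best_estimator_
feature_names = best_pipe['pre'].get_feature_names_out()
coefs = best_pipe['clf'].coef_.ravel()

# rank features by absolute coefficient magnitude
top = np.argsort(np.abs(coefs))[::-1][:10]
for i in top:
    print(f"{feature_names[i]}: {coefs[i]:.4f}")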

Get probability from xgb.train()

I am new to Python and machine learning. I have searched the internet regarding my question and tried the solutions people have suggested, but I still don't get it. I would really appreciate it if anyone could help me out.
I am working on my first XGBoost model. I have tuned the parameters using xgb.XGBClassifier, and now I would like to enforce monotonicity on the model variables. Seemingly I have to use xgb.train() to enforce monotonicity, as shown in my code below.
xgb.train() can do predict(), but NOT predict_proba(). So how can I get probabilities from xgb.train()?
I have tried using 'objective':'multi:softprob' instead of 'objective':'binary:logistic', and then score = bst_constr.predict(dtrain), but the score does not seem right to me.
Thank you so much.
params_constr = {
    'base_score': 0.5,
    'learning_rate': 0.1,
    'max_depth': 5,
    'min_child_weight': 100,
    'n_estimators': 200,
    'nthread': -1,
    'objective': 'binary:logistic',
    'seed': 2018,
    'eval_metric': 'auc'
}
params_constr['monotone_constraints'] = "(1,1,0,1,-1,-1,0,0,1,-1,1,0,1,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,)"
dtrain = xgb.DMatrix(X_train, label=y_train)
bst_constr = xgb.train(params_constr, dtrain)
X_test['score'] = bst_constr.predict_proba(X_test)[:, 1]

AttributeError: 'Booster' object has no attribute 'predict_proba'
Based on my understanding, you are trying to obtain the probability for each class in the prediction phase. Two options.
You are using the XGBoost native API. With a binary objective such as 'binary:logistic', Booster.predict() already returns the probability of the positive class, so you can use bst_constr.predict() directly instead of predict_proba(); see the sketch below. For a multiclass problem you would set 'objective':'multi:softprob' (together with num_class) to get per-class probabilities.
XGBoost also provides a scikit-learn API. In that case you should initiate the model with bst_constr = xgb.XGBClassifier(**params_constr) and use bst_constr.fit() for training. Then you can call bst_constr.predict_proba() to obtain what you want. You can refer to the Scikit-Learn API section of the XGBoost documentation for more details.
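A minimal sketch of the first option, using synthetic stand-in data (make_classification is illustrative, not from the question); with 'binary:logistic' the native Booster.predict() already returns P(y = 1):
import xgboost as xgb
from sklearn.datasets import make_classification

# hypothetical stand-in for the asker's data
X, y = make_classification(n_samples=500, random_state=2018)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dtest = xgb.DMatrix(X[400:])

params = {'objective': 'binary:logistic', 'eval_metric': 'auc', 'seed': 2018}
bst = xgb.train(params, dtrain, num_boost_round=200)

proba_pos = bst.predict(dtest)  # probability of the positive class for each row
print(proba_pos[:5])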

How do we find the validation error of linear regression and elastic-net using scikit-learn and Python?

When do we use the test set, and when do we use the validation set, while calculating errors?
I have linear regression and elastic-net models working. I am new to machine learning with scikit-learn and Python.
I am trying to solve this problem.
Data Set: UCI Machine Learning Forest Fire data
The conventional split works the other way around: you use the training set to build the model by minimizing its error, use the validation set during development to tune parameters and compare models, and keep the test set untouched until the very end, when you use it once to estimate the final model's error on unseen data. I hope this helps.
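A minimal sketch of that workflow with scikit-learn; the synthetic data from make_regression is a stand-in for the Forest Fires features, and the split ratios are illustrative:
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=12, noise=10.0, random_state=0)

# hold out a test set, then carve a validation set out of the remainder
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

for model in (LinearRegression(), ElasticNet(alpha=0.1)):
    model.fit(X_train, y_train)
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"{type(model).__name__} validation MSE: {val_mse:.2f}")

# X_test / y_test stay untouched until a final model has been chosen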

Training a model in Python

I am trying to learn how to build a model and train it on a training set. I have done this with MultinomialNB in Python before, but now I am trying to build a custom model using a set of equations. Can someone please point me in the right direction? Thank you for your help.
So when I train the model with MultinomialNB, I use the following code.
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(xtrain, ytrain)
What I am trying to do now involves an equation for tag prediction, A_i, where i is the tag I am predicting for a given post. I am not sure how to go about training a model using this equation A_i.
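One common direction is to wrap the equation in a scikit-learn-style estimator, so it can be trained and used exactly like MultinomialNB. This is only a hypothetical skeleton; the A_i equation itself is not given in the question, so _score() is left as a placeholder:
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class EquationClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical skeleton for a custom tag-prediction model."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # estimate whatever quantities the A_i equation needs from (X, y) here
        return self

    def _score(self, X):
        # placeholder: should return an (n_samples, n_tags) array of A_i values
        raise NotImplementedError("implement the A_i equation here")

    def predict(self, X):
        # pick, for each post, the tag i with the largest A_i
        return self.classes_[np.argmax(self._score(X), axis=1)]
Once _score() is implemented, clf = EquationClassifier() followed by clf.fit(xtrain, ytrain) plugs into the same workflow as the MultinomialNB code above.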
