I used logistic regression with Python and got an accuracy score of 95%. How do I get the model's equation so that I can actually implement it?
I wrote:
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

model = LogisticRegression()
model.fit(train_X, train_y)
prediction = model.predict(test_X)
print('Accuracy:', "\n", '%', metrics.accuracy_score(test_y, prediction) * 100)
and my output was:
Accuracy:
%95.5555555556
The model object has an attribute called coef_ where the coefficients of the model are stored. In addition, the attribute intercept_ gives the intercept of the model.
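For example, here is a minimal sketch of applying that equation by hand for a binary problem (reusing model and test_X from the question, and assuming the classes are encoded 0/1; the equation is the standard logistic form p = 1 / (1 + exp(-(intercept_ + coef_ . x)))):
import numpy as np

# Decision values: z = intercept_ + X . coef_
z = model.intercept_ + test_X @ model.coef_.T
prob = 1.0 / (1.0 + np.exp(-z))              # predicted probability of class 1
manual = (prob >= 0.5).astype(int).ravel()   # threshold at 0.5
# For a binary 0/1 problem this should agree with model.predict(test_X).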
I'm assuming you're using scikit-learn. But what do you mean by "implement it"? Are you looking to rewrite it in a separate language, or to use a different library (e.g. TensorFlow)?
If you just want to keep the model and use it in a Python program later, you can save and load it with pickle.
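A minimal sketch of that round trip (the file name is just an example):
import pickle

# Save the fitted model to disk...
with open('logreg.pkl', 'wb') as f:
    pickle.dump(model, f)

# ...and load it back in a later session.
with open('logreg.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded.predict(test_X))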
Is there a way to retrieve from the fitted xgboost object the hyper-parameters used to train the model? More specifically, I would like to know the number of estimators (i.e. trees) used in the model. Since I am using early stopping, the n_estimators parameter would not give me the resulting number of estimators in the model.
If you are trying to get the parameters of your model, model.get_params(deep=True) should show n_estimators:
print(model.get_params(deep=True))
Then use model.get_xgb_params() for the xgboost-specific parameters:
print(model.get_xgb_params())
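A minimal sketch putting those together, assuming the scikit-learn wrapper xgboost.XGBClassifier and illustrative names X_train, y_train, X_val, y_val; with early stopping, the fitted model's best_iteration gives the round the boosting actually stopped at, which is closer to what the question is after:
import xgboost as xgb

# In recent xgboost releases early_stopping_rounds is a constructor
# argument; older versions accepted it as a fit() keyword instead.
model = xgb.XGBClassifier(n_estimators=500, early_stopping_rounds=10)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

print(model.get_params(deep=True))  # sklearn-style parameters, incl. n_estimators
print(model.get_xgb_params())       # the xgboost-specific subset
print(model.best_iteration)         # best round found by early stopping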
I am using a LinearSVC. I pre-processed the numeric and categorical data using a ColumnTransformer, then used a Pipeline. I used GridSearchCV to get the best parameters for the model, which I later put into the pipeline, as you can see.
I fit and tested the model and got the score as well, but I want to know the most important feature coefficients.
So far, I have tried clf.coef_, since the classifier step is named clf in the pipeline, but I get a message saying clf is not defined.
I also tried gridf.coef_ and pipefinal.steps[1].coef_, but nothing worked.
Any help in this regard will be highly appreciated. Thanks.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

preprocessing = ColumnTransformer([('hot', OneHotEncoder(), categ), ('scale', StandardScaler(), num)], n_jobs=-1)
pipefinal = Pipeline([('pre', preprocessing), ('clf', LinearSVC(max_iter=100000, C=0.1))])
gridf = GridSearchCV(pipefinal, param_grid={}, cv=10)
gridf.fit(X_train, y_train)
gridf.score(X_val, y_val)
GridSearchCV will make the best estimator available through its best_estimator_ attribute after you have called the fit() method. Since your estimator is a Pipeline object, you have to further subscript it to access the classifier. Then, you can access its coef_ attribute. In your case, that would be:
gridf.best_estimator_['clf'].coef_
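To relate those coefficients back to column names (the "most important feature coefficients" the question asks about), a possible follow-up sketch, assuming a scikit-learn version in which ColumnTransformer implements get_feature_names_out():
import numpy as np

best_pipe = gridf.best_estimator_
names = best_pipe['pre'].get_feature_names_out()  # expanded one-hot and scaled columns
coefs = best_pipe['clf'].coef_.ravel()            # binary case: a single row of weights

# Rank features by absolute weight, largest first.
for i in np.argsort(np.abs(coefs))[::-1][:10]:
    print(names[i], coefs[i])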
What actually is predict_proba()? I have read through Platt's method, but I still could not understand the concept of it. Is it another model that runs on top of the SVM classifier's output?
If I choose the result from predict_proba() instead of predict(), am I still considered to be using an SVM classifier, or am I using another model to classify my data?
After running the logistic regression, I want to see the model equation. Is there a way of finding that?
In sklearn you can get the coefficients and the intercept through the coef_ and intercept_ attributes, as documented here.
Don't know about statsmodels, but according to the first tutorial that shows up on Google here, you can use the params attribute of the result.
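For statsmodels, a minimal sketch (X and y are illustrative names; add_constant supplies the intercept column that Logit does not add on its own):
import statsmodels.api as sm

result = sm.Logit(y, sm.add_constant(X)).fit()
print(result.params)  # first entry is the intercept, the rest are the coefficients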
So I have currently trained a Multinomial Naive Bayes classifier, using scikit-learn.
Now what I can do is classify test data by using predict.
But if I want to run this every night as a script, I clearly need to always have an already-trained classifier! What I'd like to be able to do is take the classifier's coefficients (the informative words) and use these to classify new data.
Is this possible, i.e. developing my own method for classification? Or should I simply be retraining the scikit-learn classifier nightly?
EDIT: One thing it seems I can do is save and keep my trained classifier.
However, with logistic regression you can take the coefficients and use them on new data. Is there anything similar for NB?
Do you mean sklearn? Are you using Python? If that is the case, sklearn provides a function for getting the parameters of the model, get_params(deep=True), as well as a function for setting them, set_params(**params).
Therefore, a possible procedure could be (a sketch follows the two lists):
Training stage:
1) Train the model
2) Get the parameters of the model by using get_params()
3) Save the parameters into a binary file (e.g. by using pickle.dump())
Prediction stage:
1) Load the parameters of the model from the binary file (e.g. by using pickle.load())
2) Set the parameters of the model by using set_params()
3) Classify new data by using the predict() function
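Here is a minimal sketch of that procedure, assuming a MultinomialNB model and illustrative X_train, y_train, X_new names. One caveat: get_params()/set_params() cover only the constructor hyperparameters, not the learned state (feature_log_prob_, class_log_prior_), so pickling the fitted estimator itself, as in your edit, is the simpler route:
import pickle
from sklearn.naive_bayes import MultinomialNB

# Training stage
clf = MultinomialNB().fit(X_train, y_train)
with open('nb_params.pkl', 'wb') as f:
    pickle.dump(clf.get_params(deep=True), f)  # hyperparameters only

# Prediction stage, in a later session
with open('nb_params.pkl', 'rb') as f:
    params = pickle.load(f)
clf2 = MultinomialNB()
clf2.set_params(**params)
# Caveat: clf2 is configured but not fitted; the learned word statistics
# are not in params. Pickling the fitted clf object itself preserves them:
with open('nb_model.pkl', 'wb') as f:
    pickle.dump(clf, f)
with open('nb_model.pkl', 'rb') as f:
    clf_loaded = pickle.load(f)
print(clf_loaded.predict(X_new))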
Hope that helps.