I use LinearSVC for a multi-label classification problem. Since LinearSVC does not provide a predict_proba method, I decided to use CalibratedClassifierCV to scale the decision function into [0, 1] probabilities.
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
classifier = CalibratedClassifierCV(LinearSVC(class_weight='balanced', max_iter=100000))
classifier.fit(X_train, y_train)
However, I also need to access the weights coef_, but classifier.base_estimator.coef_ raises the following error:
AttributeError: 'LinearSVC' object has no attribute 'coef_'
I thought classifier.base_estimator returned the fitted classifier and gave access to all its attributes. Thanks in advance for explaining what I misunderstood.
Related
Is there any way I can use partial_fit() in a BaggingClassifier() which contains multiple MLPClassifier()?
My problem is binary classification, something like this:
clf = MLPClassifier()
model = BaggingClassifier(base_estimator=clf)
model.partial_fit(x, y, classes=[0, 1])
It keeps giving me this error:
AttributeError: 'BaggingClassifier' object has no attribute 'partial_fit'
Seems like it isn't. The sklearn documentation gives the following list of estimators that support partial_fit:
sklearn.naive_bayes.MultinomialNB
sklearn.naive_bayes.BernoulliNB
sklearn.linear_model.Perceptron
sklearn.linear_model.SGDClassifier
sklearn.linear_model.PassiveAggressiveClassifier
sklearn.linear_model.SGDRegressor
sklearn.linear_model.PassiveAggressiveRegressor
sklearn.cluster.MiniBatchKMeans
sklearn.decomposition.MiniBatchDictionaryLearning
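As a rough illustration of the point above, here is a minimal sketch that checks for partial_fit with hasattr and then trains one of the listed estimators (SGDClassifier) batch by batch. The synthetic data, the batch size of 100 and the variable names are assumptions made up for the example, not part of the original question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import SGDClassifier

# Synthetic binary data, purely for illustration.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# BaggingClassifier does not expose partial_fit; SGDClassifier does.
bagging_supports_it = hasattr(BaggingClassifier(), "partial_fit")
sgd_supports_it = hasattr(SGDClassifier(), "partial_fit")

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # all labels must be declared on the first call

# Feed the data in chunks of 100 rows (arbitrary batch size).
for start in range(0, len(X), 100):
    batch = slice(start, start + 100)
    clf.partial_fit(X[batch], y[batch], classes=classes)

accuracy = clf.score(X, y)
print(bagging_supports_it, sgd_supports_it, accuracy)
```

If you need an ensemble that learns incrementally, one option is to keep several such estimators yourself and call partial_fit on each, but that is a design choice rather than something BaggingClassifier provides.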
I'm using sklearn's linear SVM classifier, LinearSVC.
I don't use it directly; I wrap it with CalibratedClassifierCV to get probabilities at prediction time, like:
model = CalibratedClassifierCV(LinearSVC(random_state=0))
After fitting the model, I tried to get coef_ to print the top features, following the post Visualising Top Features in Linear SVM with Scikit Learn and Matplotlib, but I got this error:
coef = classifier.coef_.ravel()
AttributeError: 'CalibratedClassifierCV' object has no attribute 'coef_'
How can I get coef_ when the classifier is wrapped in a calibrator? I'm not attached to this approach, so if there is another way to get the feature importances, it would be welcome.
coef_ is not an attribute of CalibratedClassifierCV; it is an attribute of the base estimator, which is a LinearSVC in your case. You can reach the fitted base estimators through calibrated_classifiers_, a list of the fitted models (its length depends on your cv value). Here is some sample code you can adapt:
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC
iris = datasets.load_iris()
model = CalibratedClassifierCV(LinearSVC(random_state=0))
model.fit(iris.data, iris.target)
model.calibrated_classifiers_
[<sklearn.calibration._CalibratedClassifier at 0x7f15d0c57550>,
<sklearn.calibration._CalibratedClassifier at 0x7f15d0c57c18>,
<sklearn.calibration._CalibratedClassifier at 0x7f15d0aec080>]
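In this case my cv is three (the default at the time; since version 0.22 the default is five folds), so three models were built. I simply loop through them and take the average. Note that in scikit-learn 1.2 and later the fitted model is stored on each element as estimator rather than base_estimator.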
coef_avg = 0
for i in model.calibrated_classifiers_:
coef_avg = coef_avg + i.base_estimator.coef_
coef_avg = coef_avg/len(model.calibrated_classifiers_)
array([[ 0.16464871, 0.45680981, -0.77801375, -0.4170196 ],
[ 0.1238834 , -0.89117967, 0.35451826, -0.89231957],
[-0.83826029, -0.9237139 , 1.30772955, 1.67592916]])
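The averaging loop above can also be written in a way that tolerates the attribute rename between releases. This is only a sketch: fitted_estimator is a hypothetical helper introduced here, and cv=3 and max_iter=10000 are assumptions chosen to mirror the three-fold example.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# cv=3 is set explicitly so the example matches the three-model output above.
model = CalibratedClassifierCV(LinearSVC(random_state=0, max_iter=10000), cv=3)
model.fit(X, y)


def fitted_estimator(calibrated):
    """Hypothetical helper: return the fitted model inside one calibrated classifier.

    Newer releases store it as `estimator`, older ones as `base_estimator`.
    """
    est = getattr(calibrated, "estimator", None)
    if est is None:
        est = getattr(calibrated, "base_estimator")
    return est


# One coefficient matrix per fold, averaged element-wise.
coef_avg = np.mean(
    [fitted_estimator(c).coef_ for c in model.calibrated_classifiers_], axis=0
)
print(coef_avg.shape)
```

Averaging the per-fold coefficients is a convenience for inspection; the calibrated probabilities themselves still come from the individual fold models.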
Note: Starting from sklearn version 0.24, CalibratedClassifierCV constructor exposes an ensemble argument, that, if set to False (assuming cv is not set to "prefit"), makes CalibratedClassifierCV expose only one calibrated classifier trained using all training data. This means we no longer need to loop over all calibrated_classifiers_ at prediction time:
model = CalibratedClassifierCV(LinearSVC(random_state=0), ensemble=False)
model.fit(iris.data, iris.target)
model.calibrated_classifiers_
# Returns a list with one element, [<sklearn.calibration._CalibratedClassifier at 0x7f15d0c57550>]
(using an example above, given by Parthasarathy)
I use the save_model and load_model methods, but they don't work.
I get an error: AttributeError: 'GridSearchCV' object has no attribute 'get_config'
I don't know if I'm using these methods correctly. Here is my code as an example:
gridSearch = GridSearchCV(estimator = classifier,
param_grid = parameters,
scoring = "accuracy",
cv = 10)
gridSearch.fit(X_train, y_train)
save_model(gridSearch, filepath = 'monModele.h5')
The result is the AttributeError above. Can you help me find a solution to this problem, or another way to save and load a Keras model?
That is because GridSearchCV is not a Keras model but a class from sklearn that also has a fit method with a similar API.
In order to use save_model and load_model you need the actual Keras model, my guess is it is your classifier. Specifically, an instance of the Model class from Keras.
I'm using LogisticRegressionCV on my data in a pipeline. After fitting, I'd like to retrieve the optimal C value. How do I do this, given that .best_params_ is an attribute of GridSearchCV and not available here? I know that .C_ is the right attribute of LogisticRegressionCV, but my estimator is inside a pipeline, so accessing it directly doesn't work.
lr_cv2 = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', LogisticRegressionCV(solver='liblinear', cv=10, Cs=np.logspace(-5, 8, 15) ))])
lr_cv2.fit(X_train, y_train)
lr_cv2.C_
AttributeError: 'Pipeline' object has no attribute 'C_'
By using the named_steps attribute of your Pipeline instance, you can access the individual steps of the pipeline and their attributes:
print(lr_cv2.named_steps['classifier'].C_ )
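For a self-contained version of the same idea, here is a small sketch. The StandardScaler step is only a stand-in for the question's preprocessor (which is not shown), and the synthetic data, the Cs grid and cv=3 are assumptions chosen to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data stands in for X_train / y_train.
X_train, y_train = make_classification(n_samples=200, n_features=8, random_state=0)

pipe = Pipeline(steps=[
    ("preprocessor", StandardScaler()),  # assumed stand-in for the real preprocessor
    ("classifier", LogisticRegressionCV(Cs=np.logspace(-3, 3, 7), cv=3, max_iter=1000)),
])
pipe.fit(X_train, y_train)

# The fitted step is reachable by name; the Pipeline itself has no C_.
fitted_lr = pipe.named_steps["classifier"]
best_C = np.ravel(fitted_lr.C_)

# Indexing the pipeline by step name returns the same object.
same_step = pipe["classifier"] is fitted_lr
print(best_C, same_step)
```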
Problem
I am trying to use scikit-learn's LogisticRegressionCV with roc_auc_score as the scoring metric.
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
clf = LogisticRegressionCV(scoring=roc_auc_score)
But when I attempt to fit the model (clf.fit(X, y)), it throws an error.
ValueError: average has to be one of (None, 'micro', 'macro', 'weighted', 'samples')
That's cool. It's clear what's going on: roc_auc_score needs to be called with the average argument specified, per its documentation and the error above. So I tried that.
clf = LogisticRegressionCV(scoring=roc_auc_score(average='weighted'))
But it turns out that roc_auc_score can't be called with an optional argument alone, because this throws another error.
TypeError: roc_auc_score() takes at least 2 arguments (1 given)
Question
Any thoughts on how I can use roc_auc_score as the scoring metric for LogisticRegressionCV in a way that I can specify an argument for the scoring function?
I can't find an SO question on this issue or a discussion of this issue in scikit-learn's GitHub repo, but surely someone has run into this before?
You can use make_scorer, e.g.
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score, make_scorer
from sklearn.datasets import make_classification
# some example data
X, y = make_classification()
# little hack to filter out Proba(y==1)
def roc_auc_score_proba(y_true, proba):
    return roc_auc_score(y_true, proba[:, 1])
# define your scorer
auc = make_scorer(roc_auc_score_proba, needs_proba=True)
# define your classifier
clf = LogisticRegressionCV(scoring=auc)
# train
clf.fit(X, y)
# have look at the scores
print(clf.scores_)
I found a way to solve this problem!
scikit-learn offers a make_scorer function in its metrics module that allows a user to create a scoring object from one of its native scoring functions with arguments specified to non-default values (see here for more information on this function from the scikit-learn docs).
So, I created a scoring object with the average argument specified.
roc_auc_weighted = sk.metrics.make_scorer(sk.metrics.roc_auc_score, average='weighted')
Then, I passed that object in the call to LogisticRegressionCV and it ran without any issues!
clf = LogisticRegressionCV(scoring=roc_auc_weighted)
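One caveat worth checking, as far as I understand make_scorer: with no extra flags, the scorer passes the output of predict (hard class labels) to the metric, so the AUC is computed on 0/1 predictions rather than on scores or probabilities. The sketch below compares the two quantities directly with roc_auc_score; the synthetic data and the plain LogisticRegression model are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Noisy synthetic binary data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUC on hard labels: what a scorer built from predict() would see.
auc_from_labels = roc_auc_score(y_test, clf.predict(X_test))

# AUC on the positive-class probability: the usual definition.
auc_from_scores = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(auc_from_labels, auc_from_scores)
```

The two numbers generally differ, which is why a scorer that uses probabilities or decision values (such as the built-in 'roc_auc' string) is usually the safer choice for this metric.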
A bit late (4 years later). But today you can use:
clf = LogisticRegressionCV(scoring='roc_auc')
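Also, all other scoring keys can be obtained as shown below (in recent scikit-learn releases SCORERS is no longer available and sklearn.metrics.get_scorer_names() serves the same purpose):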
from sklearn.metrics import SCORERS
print(SCORERS.keys())
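A short sketch combining both points: it lists the scorer names in a way that should work on old and new releases, then fits LogisticRegressionCV with the 'roc_auc' string. The try/except fallback, the synthetic data and the Cs=5, cv=3 settings are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Newer releases expose get_scorer_names(); older ones only have SCORERS.
try:
    from sklearn.metrics import get_scorer_names
    scorer_names = sorted(get_scorer_names())
except ImportError:
    from sklearn.metrics import SCORERS
    scorer_names = sorted(SCORERS.keys())

# Synthetic binary data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Select C by cross-validated ROC AUC using the built-in scorer name.
clf = LogisticRegressionCV(Cs=5, cv=3, scoring="roc_auc", max_iter=1000)
clf.fit(X, y)

print("roc_auc" in scorer_names)
print(clf.C_)
```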