There is a .score() function for classifiers that sklearn provides to us like LogisticRegression,DecisionTreeClassifier,etc.Does this score function returns the score on the basis of accuracy of its prediction?If yes then what about the cases where accuracy might not be the best parameter to evaluate the performance of the model?Is the score function self adjusting according to the use cases?
Yes, as you can see from the documentation of LogisticRegression and DecisionTreeClassifier, the score method returns the "Mean accuracy of self.predict(X) wrt. y.". So, it does indeed return the accuracy of the predictions.
In cases where you want to use other metrics to evaluate a model's performance, you can use the metrics provided in the scikit-learn library which you can find on the scikit-learn's website.
An example would be using F1 as a metric. You can have your true values y_true and your predicted values y_pred, then calling f1_score(y_true, y_pred) to get the F1 result.
Related
I'm using BayesSearchCV from scikit-optimize to train a model on a fairly imbalanced dataset. From what I'm reading precision or ROC AUC would be the best metrics for imbalanced dataset. In my code:
knn_b = BayesSearchCV(estimator=pipe, search_spaces=search_space, n_iter=40, random_state=7, scoring='roc_auc')
knn_b.fit(X_train, y_train)
The number of iterations is just a random value I chose (although I get a warning saying I already reached the best result, and there is not a way to early stop as far as I'm aware?). For the scoring parameter, I specified roc_auc, which I'm assuming it will be the primary metric to monitor for the best parameter in the results. So when I call knn_b.best_params_, I should have the parameters where the roc_auc metrics is higher. Is that correct?
My confusion is when I look at the results using knn_b.cv_results_. Shouldn't the mean_test_score be the roc_auc score because of the scoring param in the BayesSearchCV class? What I'm doing it plotting the results and seeing how each combination of params performed.
sns.relplot(
data=knn_b.cv_results_, kind='line', x='param_classifier__n_neighbors', y='mean_test_score',
hue='param_scaler', col='param_classifier__p',
)
When I try to use to roc_auc_score() function on the true and predicted values, I get something completely different.
Is the mean_test_score here different? How would I be able to get the individual/mean roc_auc score of each CV/split of each iteration? Similarly for when I want to use RandomizedSearchCV or GridSearchCV.
EDIT: tldr; I want to know what's being computed exactly in mean_test_score. I thought it was roc_auc because of the scoring param, or accuracy, but it seems to be neither.
mean_test_score is the AUROC, because of your scoring parameter, yes.
Your main problem is that the ROC curve (and the area under it) require the probability predictions (or other continuous score), not the hard class predictions. Your manual calculation is thus incorrect.
You shouldn't expect exactly the same score anyway. Your second score is on the test set, and the first score is optimistically biased by the hyperparameter selection.
For example, if I am trying to use the line of code:
cross_val_score(model, X, y)
Would the model be:
model = KNeighborsClassifier().fit(X,y)
or
model = KNeighborsClassifier()
It seems like both will be accepted. My intuition was that using the fitted model as an estimator would always produce a one hundred percent accuracy score since the subset of data points that are being tested each fold were already used to train the model, but this doesn't look like its the case. What is cross_val_score "doing" with the model parameter?
According to documentation, the cross_val_score helper function would call fit() function on each model (by default 5) and evaluate them on hold-out portion so it doesn't matter if you have fitted the data already or not, weights would be smashed away!
I use GridSearchCV to find the best parameters in the inner loop of my nested cross-validation. The 'inner winner' is found using GridSearchCV(scorer='balanced_accuracy'), so as I understand the documentation the model with the highest balanced accuracy on average in the inner folds is the 'best_estimator'. I don't understand what the different arguments for refit in GridSearchCV do in combination with the scorer argument. If refit is True, what scoring function will be used to estimate the performance of that 'inner winner' when refitted to the dataset? The same scoring function that was passed to scorer (so in my case 'balanced_accuracy')? Why can you pass also a string to refit? Does that mean that you can use different functions for 1.) finding the 'inner winner' and 2.) to estimate the performance of that 'inner winner' on the whole dataset?
When refit=True, sklearn uses entire training set to refit the model. So, there is no test data left to estimate the performance using any scorer function.
If you use multiple scorer in GridSearchCV, maybe f1_score or precision along with your balanced_accuracy, sklearn needs to know which one of those scorer to use to find the "inner winner" as you say. For example with KNN, f1_score might have best result with K=5, but accuracy might be highest for K=10. There is no way for sklearn to know which value of hyper-parameter K is the best.
To resolve that, you can pass one string scorer to refit to specify which of those scorer should ultimately decide best hyper-parameter. This best value will then be used to retrain or refit the model using full dataset. So, when you've got just one scorer, as your case seems to be, you don't have to worry about this. Simply refit=True will suffice.
I'm using scikit-learn's LassoCV function. During cross-validation, what scoring metric is being used by default?
I would like cross-validation to be based on "Mean squared error regression loss". Can one use this metric with LassoCV? One can specify a scoring metric for LogisticRegressionCV, so it may be possible with LassoCV too?
LassoCV uses R^2 as the scoring metric. From the docs:
By default, parameter search uses the score function of the estimator
to evaluate a parameter setting. These are the
sklearn.metrics.accuracy_score for classification and
sklearn.metrics.r2_score for regression.
To use an alternative scoring metric, such as mean squared error, you need to use GridSearchCV or RandomizedSearchCV (instead of LassoCV) and specify the scoring parameter as scoring='neg_mean_squared_error'. From the docs:
An alternative scoring function can be specified via the scoring
parameter to GridSearchCV, RandomizedSearchCV and many of the
specialized cross-validation tools described below.
I think the accepted answer is wrong, as it quotes the documentation of Grid Search, but LassoCV uses regularisation paths, not grid search.
In fact in the docs page for LassoCV, it says that the loss function is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Meaning that its minimising the MSE (plus the LASSO term).
I use linear SVM from scikit learn (LinearSVC) for binary classification problem. I understand that LinearSVC can give me the predicted labels, and the decision scores but I wanted probability estimates (confidence in the label). I want to continue using LinearSVC because of speed (as compared to sklearn.svm.SVC with linear kernel) Is it reasonable to use a logistic function to convert the decision scores to probabilities?
import sklearn.svm as suppmach
# Fit model:
svmmodel=suppmach.LinearSVC(penalty='l1',C=1)
predicted_test= svmmodel.predict(x_test)
predicted_test_scores= svmmodel.decision_function(x_test)
I want to check if it makes sense to obtain Probability estimates simply as [1 / (1 + exp(-x)) ] where x is the decision score.
Alternately, are there other options wrt classifiers that I can use to do this efficiently?
Thanks.
scikit-learn provides CalibratedClassifierCV which can be used to solve this problem: it allows to add probability output to LinearSVC or any other classifier which implements decision_function method:
svm = LinearSVC()
clf = CalibratedClassifierCV(svm)
clf.fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)
User guide has a nice section on that. By default CalibratedClassifierCV+LinearSVC will get you Platt scaling, but it also provides other options (isotonic regression method), and it is not limited to SVM classifiers.
I took a look at the apis in sklearn.svm.* family. All below models, e.g.,
sklearn.svm.SVC
sklearn.svm.NuSVC
sklearn.svm.SVR
sklearn.svm.NuSVR
have a common interface that supplies a
probability: boolean, optional (default=False)
parameter to the model. If this parameter is set to True, libsvm will train a probability transformation model on top of the SVM's outputs based on idea of Platt Scaling. The form of transformation is similar to a logistic function as you pointed out, however two specific constants A and B are learned in a post-processing step. Also see this stackoverflow post for more details.
I actually don't know why this post-processing is not available for LinearSVC. Otherwise, you would just call predict_proba(X) to get the probability estimate.
Of course, if you just apply a naive logistic transform, it will not perform as well as a calibrated approach like Platt Scaling. If you can understand the underline algorithm of platt scaling, probably you can write your own or contribute to the scikit-learn svm family. :) Also feel free to use the above four SVM variations that support predict_proba.
If you want speed, then just replace the SVM with sklearn.linear_model.LogisticRegression. That uses the exact same training algorithm as LinearSVC, but with log-loss instead of hinge loss.
Using [1 / (1 + exp(-x))] will produce probabilities, in a formal sense (numbers between zero and one), but they won't adhere to any justifiable probability model.
If what your really want is a measure of confidence rather than actual probabilities, you can use the method LinearSVC.decision_function(). See the documentation.
Just as an extension for binary classification with SVMs: You could also take a look at SGDClassifier which performs a gradient Descent with a SVM by default. For estimation of the binary-probabilities it uses the modified huber loss by
(clip(decision_function(X), -1, 1) + 1) / 2)
An example would look like:
from sklearn.linear_model import SGDClassifier
svm = SGDClassifier(loss="modified_huber")
svm.fit(X_train, y_train)
proba = svm.predict_proba(X_test)