How to get the model loss in sklearn - python

Whenever an sklearn model is fit to some data, it minimizes some loss function. How can I obtain the model loss using that loss function?
e.g.
model = sklearn.linear_model.LogisticRegression().fit(X_train,y_train)
model.get_loss(X_train, y_train) #gives the loss for these values
model.get_loss(X_test, y_test) #gives the loss for other values
Note that the .score method does NOT do this thing.

LogisticRegression minimises log loss, so you would expect the loss to be the .score, only negated. However, this actually returns the mean accuracy.
To calculate log loss you need to use the log_loss metric:
I haven't tested it, but something like this:
from sklearn.metrics import log_loss
model = sklearn.linear_model.LogisticRegression().fit(X_train, y_train)
loss = log_loss(X_test, model.predict_proba(X_test), eps=1e-15)

Related

Keras model.evaluate() print vector of loss values

Using model.evaluate() prints the mean loss over the test set. Is it possible to instead get out a vector off loss values for all data points in the test set (the average of which would then be what is printed by model.evaluate())?
As a built-in function, the loss is usually more useful as the average so that you can do different visualizations. As for getting the results, just compute them yourself and you will be set. For example
def squared_error(y_true, y_pred):
return tf.math.square(y_true - y_pred)
predictions = model.predict(x_test)
losses = squared_error(y_test, predictions)
Alternatively, it may be doable using built in loss functions with a batch size of 1 like this
loss_fn = tf.keras.losses.mse
# x_test should be (batches, your_dims) where you make sure each batch only has 1 sample
predictions = model.predict(x_test)
losses = loss_fn(y_test, predictions)

Metric for K-fold Cross Validation for Regression models

I wanted to do Cross Validation on a regression (non-classification ) model and ended getting mean accuracies of about 0.90. however, i don't know what metric is used in the method to find out the accuracies. I know how splitting in k-fold cross validation works . I just don't know the formula that the scikit learn library is using to calculate the accuracy of prediction. (I know how it works for classification model though). Can someone give me the metric/formula used by sklearn.model_selection.cross_val_score?
Thanks in advance.
from sklearn.model_selection import cross_val_score
def metrics_of_accuracy(classifier , X_train , y_train) :
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
accuracies.mean()
accuracies.std()
return accuracies
By default, sklearn uses accuracy in case of classification and r2_score for regression when you use the model.score method(same for cross_val_score). So r2_score in this case whose formula is
r2 = 1 - (SSE(y_hat)/SSE(y_mean))
where
SSE(y_hat) is the squared error for predictions made
SSE(y_mean) is the squared error when all predictions are the mean of the actual predictions
Yes, Also I can use the same metric using sklearn.metrics-> r2_score.
r2_score(y_true, y_pred). This score is also called Coefficient of determination or R-squared.
The formula for the same is as follows -
Find the link to image below.
https://i.stack.imgur.com/USaWH.png
For more on this -
https://en.wikipedia.org/wiki/Coefficient_of_determination

How to implement Sklearn Metric in Keras as Metric?

Tried googling up, but could not find how to implement Sklearn metrics like cohen kappa, roc, f1score in keras as a metric for imbalanced data.
How to implement Sklearn Metric in Keras as Metric?
Metrics in Keras and in Sklearn mean different things.
In Keras metrics are almost same as loss. They get called during training at the end of each batch and each epoch for reporting and logging purposes. Example use is having the loss 'mse' but you still would like to see 'mae'. In this case you can add 'mae' as a metrics to the model.
In Sklearn metric functions are applied on predictions as per the definition "The metrics module implements functions assessing prediction error for specific purposes". While there's an overlap, the statistical functions of Sklearn doesn't fit to the definition of metrics in Keras. Sklearn metrics can return float, array, 2D array with both dimensions greater than 1. There is no such object in Keras by the predict method.
Answer to your question:
It depends where you want to trigger:
End of each batch or each epoch
You can write a custom callback that is fired at the end of batch.
After prediction
This seems to be easier. Let Keras predict on the entire dataset, capture the result and then feed the y_true and y_pred arrays to the respective Sklearn metric.
Everything you need to live is in the confuse matrix. Calculate the confuse matrix and follow my formulas:
In practice, this is done as follows:
from sklearn.metrics import confusion_matrix
NBC = NBC.fit(X_train,y_train)
cm = confusion_matrix(y_test, NBC.predict(X_test))
tn, fp, fn, tp = cm.ravel()
print('tn: ',tn)
print('fp: ',fp)
print('fn: ',fn)
print('tp: ',tp)
print('------------------')
print(cm)
and now:
p_0 = (tn+𝑡𝑝)/(tn+fp+fn+𝑡𝑝)
print('p_0:',p_0)
P_class0 = ((tn+fp)/(tn+fp+fn+𝑡𝑝))*((tn+fn)/(tn+fp+fn+𝑡𝑝))
print('P_yes: ',P_yes)
P_class1 = ((fn+𝑡𝑝)/(tn+fp+fn+𝑡𝑝))*((fp+𝑡𝑝)/(tn+fp+fn+𝑡𝑝))
print('P_no: ',P_no)
pe = P_yes + P_no
print('pe: ',pe)
Îș = (p_0-pe)/(1-pe)
print('Îș: ',Îș)

Unable to obtain accuracy score for my linear

I am working on my regression model based on the IMDB data, to predict IMDB value. On my linear-regression, i was unable to obtain the accuracy score.
my line of code:
metrics.accuracy_score(test_y, linear_predicted_rating)
Error :
ValueError: continuous is not supported
if i were to change that line to obtain the r2 score,
metrics.r2_score(test_y,linear_predicted_rating)
i was able to obtain r2 without any error.
Any clue why i am seeing this?
Thanks.
Edit:
One thing i found out is test_y is panda data frame whereas the linear_predicted_rating is in numpy array format.
metrics.accuracy_score is used to measure classification accuracy, it can't be used to measure accuracy of regression model because it doesn't make sense to see accuracy for regression - predictions rarely can equal the expected values. And if predictions differ from expected values by 1%, the accuracy will be zero, though these predictions are great
Here are some metrics for regression: http://scikit-learn.org/stable/modules/classes.html#regression-metrics
NOTE: Accuracy (e.g. classification accuracy) is a measure for classification, not regression so we can't calculate accuracy for a regression model. For regression, one of the matrices we've to get the score (ambiguously termed as accuracy) is R-squared (R2).
You can get the R2 score (i.e accuracy) of your prediction using the score(X, y, sample_weight=None) function from LinearRegression as follows by changing the logic accordingly.
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train,y_train)
r2_score = regressor.score(x_test,y_test)
print(r2_score*100,'%')
output (a/c to my model)
86.23%
The above is R squared value and not the accuracy :
# R squared value
metrics.explained_variance_score(y_test, predictions)
What does your variables look like. Code below works well.
from sklearn import metrics
test_y, linear_predicted_rating = [1,2,3,4], [1,2,3,5]
metrics.accuracy_score(test_y, linear_predicted_rating)
You can not predict the accuracy of regression model,however you can analyze your model using Mean absolute error ,Mean squared error ,Root mean squared error,Max error,median error R-square etc.
for reference
you can go this to gain more knowledge

Relationship between sklearn .fit() and .score()

While working with a linear regression model I split the data into a training set and test set. I then calculated R^2, RMSE, and MAE using the following:
lm.fit(X_train, y_train)
R2 = lm.score(X,y)
y_pred = lm.predict(X_test)
RMSE = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
MAE = metrics.mean_absolute_error(y_test, y_pred)
I thought that I was calculating R^2 for the entire data set (instead of comparing the training and original data). However, I learned that you must fit the model before you score it, therefore I'm not sure if I'm scoring the original data (as inputted in R2) or the data that I used to fit the model (X_train, and y_train). When I run:
lm.fit(X_train, y_train)
lm.score(X_train, y_train)
I get a different result than what I got when I was scoring X and y. So my question is are the inputs to the .score parameter compared to the model that was fitted (thereby making lm.fit(X,y); lm.score(X,y) the R^2 value for the original data and lm.fit(X_train, y_train); lm.score(X,y) the R^2 value for the original data based off the model created in .fit.) or is something else entirely happening?
fit() that only fit the data which is synonymous to train, that is fit the data means train the data.
score is something like testing or predict.
So one should use different dataset for training the classifier and testing the acuracy
One can do like this.
X_train,X_test,y_train,y_test=cross_validation.train_test_split(X,y,test_size=0.2)
clf=neighbors.KNeighborsClassifier()
clf.fit(X_train,y_train)
accuracy=clf.score(X_test,y_test)

Categories

Resources