Binary classification - Class 1 testing metrics are all zeros

I am working on a binary classification problem (spam / not spam emails) using Keras and TensorFlow. The training accuracy is nearly perfect, and so is the AUC.
These are the class frequencies in the data:
{"Not Spam(0)": 2500, "Spam(1)": 499}
This is the model architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
LSTM_model = Sequential()
LSTM_model.add(Embedding(2000, 8, input_length=100))
LSTM_model.add(LSTM(units=128))
LSTM_model.add(Dropout(0.2))
LSTM_model.add(Dense(1, activation='sigmoid'))
LSTM_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
Here is the output of the last training epoch:
Epoch 10/10
33/33 [==============================] - 0s 13ms/step - loss: 0.0227 - acc: 0.9971 - val_loss: 0.0457 - val_acc: 0.9900
training time was: 6.537448167800903
And this is the classification report on the test data:
              precision    recall  f1-score   support
           0       0.84      1.00      0.91       754
           1       0.00      0.00      0.00       146
    accuracy                           0.84       900
   macro avg       0.42      0.50      0.46       900
weighted avg       0.70      0.84      0.76       900
The accuracy is acceptable, but the metrics for the second class are all zeros. After some research, I found that an imbalanced dataset might be the reason. However, after I deleted some rows to balance the dataset, the metrics were as shown below:
              precision    recall  f1-score   support
           0       0.47      1.00      0.64       142
           1       0.00      0.00      0.00       158
    accuracy                           0.47       300
   macro avg       0.24      0.50      0.32       300
weighted avg       0.22      0.47      0.30       300
As you can see, I still have the same problem, even though the AUC is 0.9989511111111111.
After saving the model in both cases and predicting real-world examples, every example is predicted as negative (class 0), even when I try data from the training set.
The predictions were made with this line of code:
result=model.predict(x_test)
and here is the code for the classification report:
print(classification_report(y_test,np.argmax(result,axis=1),zero_division=0))
Please help.
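A likely cause, judging only from the code shown above: the model ends in Dense(1, activation='sigmoid'), so model.predict returns an array of shape (n_samples, 1). np.argmax(result, axis=1) over a single column can only return 0, so every sample is reported as class 0 regardless of the predicted probability. That would also be consistent with a high AUC next to zero recall for class 1. The sketch below uses made-up probabilities (not the asker's data) and an assumed 0.5 cut-off to contrast argmax with thresholding.

```python
import numpy as np

# Hypothetical sigmoid outputs, shaped (n_samples, 1) like model.predict(x_test)
result = np.array([[0.02], [0.97], [0.10], [0.85], [0.60]])

# argmax over an axis of length 1 can only ever return index 0
argmax_labels = np.argmax(result, axis=1)

# Thresholding the probability gives the intended class labels
threshold = 0.5  # assumed cut-off; it may need tuning for imbalanced data
threshold_labels = (result.ravel() > threshold).astype(int)

print(argmax_labels)     # [0 0 0 0 0]
print(threshold_labels)  # [0 1 0 1 1]
```

Passing the thresholded labels to classification_report instead of the argmax output should then reflect what the model actually predicts.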

Related

Default positive class in multilevel sklearn classification

I am working on a churn classification problem with 3 classes (0, 1, 2), but I want to optimize classes 0 and 1 for recall. Does that mean sklearn needs to treat classes 0 and 1 as the positive classes? How can I state explicitly which classes I want to optimize recall for? If that is not possible, should I consider renaming the classes in ascending order so that 1 and 2 are the default positive classes?
              precision    recall  f1-score   support
           0       0.71      0.18      0.28      2611
           1       0.57      0.54      0.56      5872
           2       0.70      0.88      0.78      8913
    accuracy                           0.66     17396
   macro avg       0.66      0.53      0.54     17396
weighted avg       0.66      0.66      0.63     17396
Here is the code I am using, for reference (although what I really need is to understand how to optimize recall for classes 0 and 1 only):
param_test1={'learning_rate':(0.05,0.1),'max_depth':(3,5)}
estimator=GridSearchCV(estimator=GradientBoostingClassifier(loss='deviance',subsample=0.8,random_state=10,
n_estimators=200),param_grid=param_test1,cv=2, refit='recall_score')
estimator.fit(df[predictors],df[target])
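One possible approach, offered as a sketch rather than the definitive answer: for multiclass targets sklearn has no single "positive class", but recall_score accepts labels and average, so a custom scorer can average recall over classes 0 and 1 only. Without a scoring argument, the grid search ranks candidates by the estimator's default score (accuracy for classifiers). The synthetic data, the reduced n_estimators and the name recall_01 below are assumptions for illustration; the loss argument is left at its default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class data standing in for the churn dataset (assumption)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=10)

# Macro-averaged recall restricted to classes 0 and 1; class 2 is ignored
recall_01 = make_scorer(recall_score, labels=[0, 1], average="macro")

param_grid = {"learning_rate": (0.05, 0.1), "max_depth": (3, 5)}
search = GridSearchCV(
    estimator=GradientBoostingClassifier(subsample=0.8, random_state=10,
                                         n_estimators=20),
    param_grid=param_grid,
    scoring=recall_01,  # candidates are ranked by recall on classes 0 and 1
    cv=2,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With a single scorer, the default refit=True refits the best candidate according to that scorer, so no class renaming is needed.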

how to adjust accuracy only for 1's?

Suppose I have data like this:
x1    x2    x3    y
0.85  0.95  0.22  1
0.35  0.26  0.42  0
0.89  0.82  0.82  1
0.36  0.14  0.32  0
0.44  0.53  0.82  1
0.75  0.78  0.52  1
I am doing binary classification, but the only thing that matters is the correct prediction of the 1s; a prediction of 0 should not affect my accuracy.
I simply used the following code:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
But this metric also counts the zeros in its accuracy.
How can I tell the network that only predictions of 1 matter?
In other words, while fitting the model, a prediction of 0 should not count toward the model's accuracy.

It looks like you care about the precision of the model. Precision is the fraction of instances predicted as 1 that really are 1.
If so, pass tf.keras.metrics.Precision() in metrics:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
loss='binary_crossentropy',
metrics=[tf.keras.metrics.Precision()])
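To make the distinction concrete, here is a small hand computation in plain Python. The y_true values are the y column from the question; the y_pred values are hypothetical, chosen only to illustrate the arithmetic. Precision ignores samples predicted as 0, recall ignores samples whose true label is 0, and accuracy counts everything.

```python
y_true = [1, 0, 1, 0, 1, 1]  # y column from the question
y_pred = [1, 0, 1, 1, 0, 0]  # hypothetical predictions, for illustration only

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # predicted 1, truly 1
fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted 1, truly 0
fn = sum(t == 1 and p == 0 for t, p in pairs)  # predicted 0, truly 1
tn = sum(t == 0 and p == 0 for t, p in pairs)  # predicted 0, truly 0

precision = tp / (tp + fp)        # only rows predicted as 1 count
recall = tp / (tp + fn)           # only rows that are truly 1 count
accuracy = (tp + tn) / len(pairs) # every row counts

print(round(precision, 3), recall, accuracy)  # 0.667 0.5 0.5
```

If missing a true 1 is what matters rather than raising false alarms, recall (tf.keras.metrics.Recall()) may be the more relevant metric.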

Metrics F1 warning zero division

I want to calculate the F1 score of my models, but I receive a warning and get an F1 score of 0.0, and I don't know what to do.
Here is the source code:
def model_evaluation(dict):
    for key, value in dict.items():
        classifier = Pipeline([('tfidf', TfidfVectorizer()),
                               ('clf', value),
                               ])
        classifier.fit(X_train, y_train)
        predictions = classifier.predict(X_test)
        print("Accuracy Score of", key, ": ", metrics.accuracy_score(y_test, predictions))
        print(metrics.classification_report(y_test, predictions))
        print(metrics.f1_score(y_test, predictions, average="weighted", labels=np.unique(predictions), zero_division=0))
        print("---------------", "\n")
dlist = {"KNeighborsClassifier": KNeighborsClassifier(3),
         "LinearSVC": LinearSVC(),
         "MultinomialNB": MultinomialNB(),
         "RandomForest": RandomForestClassifier(max_depth=5, n_estimators=100)}
model_evaluation(dlist)
And here is the result:
Accuracy Score of KNeighborsClassifier : 0.75
              precision    recall  f1-score   support
not positive       0.71      0.77      0.74        13
    positive       0.79      0.73      0.76        15
    accuracy                           0.75        28
   macro avg       0.75      0.75      0.75        28
weighted avg       0.75      0.75      0.75        28
0.7503192848020434
---------------
Accuracy Score of LinearSVC : 0.8928571428571429
              precision    recall  f1-score   support
not positive       1.00      0.77      0.87        13
    positive       0.83      1.00      0.91        15
    accuracy                           0.89        28
   macro avg       0.92      0.88      0.89        28
weighted avg       0.91      0.89      0.89        28
0.8907396950875212
---------------
Accuracy Score of MultinomialNB : 0.5357142857142857
              precision    recall  f1-score   support
not positive       0.00      0.00      0.00        13
    positive       0.54      1.00      0.70        15
    accuracy                           0.54        28
   macro avg       0.27      0.50      0.35        28
weighted avg       0.29      0.54      0.37        28
0.6976744186046512
---------------
C:\Users\Cey\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1272: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
Accuracy Score of RandomForest : 0.5714285714285714
              precision    recall  f1-score   support
not positive       1.00      0.08      0.14        13
    positive       0.56      1.00      0.71        15
    accuracy                           0.57        28
   macro avg       0.78      0.54      0.43        28
weighted avg       0.76      0.57      0.45        28
0.44897959183673475
---------------
Can someone tell me what to do? I only receive this message when using the "MultinomialNB()" classifier.
Second:
When I extend the dictionary with the Gaussian classifier (GaussianNB()), I receive this error message:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
What should I do here?

Together with "UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples" (main credits go there) and @yatu's answer, I could at least find a workaround for the warning:
UndefinedMetricWarning: Precision is ill-defined and being set to 0.0
due to no predicted samples. Use zero_division parameter to control
this behavior. _warn_prf(average, modifier, msg_start, len(result))
Quote from sklearn.metrics.f1_score in the Notes at the bottom:
When true positive + false positive == 0, precision is undefined. When
true positive + false negative == 0, recall is undefined. In such
cases, by default the metric will be set to 0, as will f-score, and
UndefinedMetricWarning will be raised. This behavior can be modified
with zero_division.
Thus, you cannot avoid this situation when the model predicts no samples for a label, because true positives + false positives is then 0 and precision is undefined.
That said, you can at least suppress the warning by adding zero_division=0 to the functions mentioned in the quote. This keeps the default value of 0 for an undefined precision or recall; zero_division=1 would report 1 for it instead.
precision = precision_score(y_test, y_pred, zero_division=0)
print('Precision score: {0:0.2f}'.format(precision))
recall = recall_score(y_test, y_pred, zero_division=0)
print('Recall score: {0:0.2f}'.format(recall))
f1 = f1_score(y_test, y_pred, zero_division=0)
print('f1 score: {0:0.2f}'.format(f1))
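The effect of zero_division can be checked on a tiny example. This is a sketch with made-up labels in which the classifier never predicts class 1, which mirrors the MultinomialNB case in the question where one label has no predicted samples. Inspecting np.unique(y_pred) is a quick way to confirm that a label is missing from the predictions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels: the classifier never predicts the positive class
y_test = np.array([0, 1, 1, 0])
y_pred = np.array([0, 0, 0, 0])

print(np.unique(y_pred))  # [0] -> class 1 is never predicted

# TP + FP == 0, so precision is undefined and zero_division decides the value
p0 = precision_score(y_test, y_pred, zero_division=0)
p1 = precision_score(y_test, y_pred, zero_division=1)

# Recall is well defined here (TP + FN == 2) and is simply 0
r = recall_score(y_test, y_pred, zero_division=0)

print(p0, p1, r)  # 0.0 1.0 0.0
```

Silencing the warning does not change the underlying situation: the model still predicts only one class, so the features or the classifier settings are what need attention.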

Can someone tell me what to do? I only receive this message when using the "MultinomialNB()" classifier
The first warning indicates that a specific label is never predicted when using MultinomialNB. The F-score for that label is therefore undefined (ill-defined), and the missing value is set to 0.
When extending the dictionary by using the Gaussian classifier (GaussianNB()) I receive this error message:
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
The error is quite explicit: TfidfVectorizer returns a sparse matrix, which cannot be used as input for GaussianNB. So the way I see it, you either avoid GaussianNB, or you add an intermediate transformer that turns the sparse matrix into a dense one, which I wouldn't advise for the result of a tf-idf vectorization because the dense matrix can become very large.
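For completeness, the "intermediate transformer" option could look like the sketch below. The tiny corpus, its labels and the step name to_dense are hypothetical; FunctionTransformer is one way to call toarray() inside a Pipeline, and the memory caveat still applies for any realistically sized vocabulary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy corpus, only to show the pipeline runs end to end
texts = ["great product", "awful service", "really great", "really awful"]
labels = ["positive", "not positive", "positive", "not positive"]

# Converts the sparse tf-idf matrix to a dense array before GaussianNB sees it
to_dense = FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)

classifier = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", to_dense),
    ("clf", GaussianNB()),
])
classifier.fit(texts, labels)
preds = classifier.predict(texts)
print(preds.shape)  # (4,)
```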

Same value for Keras 2.3.0 metrics accuracy, precision and recall

I'm trying to get keras metrics for accuracy, precision and recall, but all three of them are showing the same value, which is actually the accuracy.
I'm using the metrics list provided in an example of TensorFlow documentation:
metrics = [keras.metrics.TruePositives(name='tp'),
keras.metrics.FalsePositives(name='fp'),
keras.metrics.TrueNegatives(name='tn'),
keras.metrics.FalseNegatives(name='fn'),
keras.metrics.BinaryAccuracy(name='accuracy'),
keras.metrics.Precision(name='precision'),
keras.metrics.Recall(name='recall'),
keras.metrics.AUC(name='auc')]
Model is a pretty basic CNN for image classification:
model = Sequential()
model.add(Convolution2D(32,
(7, 7),
padding ="same",
input_shape=(255, 255, 3),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64,
(3, 3),
padding ="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256,
activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes,
activation='softmax'))
Compiling with the metric list shown above:
model.compile(loss=loss,
optimizer=optimizer,
metrics=metrics)
This is an example of the problem I see all the time while training:
Epoch 1/15
160/160 [==============================] - 6s 37ms/step - loss: 0.6402 - tp: 215.0000 - fp: 105.0000 - tn: 215.0000 - fn: 105.0000 - accuracy: 0.6719 - precision: 0.6719 - recall: 0.6719 - auc: 0.7315 - val_loss: 0.6891 - val_tp: 38.0000 - val_fp: 42.0000 - val_tn: 38.0000 - val_fn: 42.0000 - val_accuracy: 0.4750 - val_precision: 0.4750 - val_recall: 0.4750 - val_auc: 0.7102
Epoch 2/15
160/160 [==============================] - 5s 30ms/step - loss: 0.6929 - tp: 197.0000 - fp: 123.0000 - tn: 197.0000 - fn: 123.0000 - accuracy: 0.6156 - precision: 0.6156 - recall: 0.6156 - auc: 0.6941 - val_loss: 0.6906 - val_tp: 38.0000 - val_fp: 42.0000 - val_tn: 38.0000 - val_fn: 42.0000 - val_accuracy: 0.4750 - val_precision: 0.4750 - val_recall: 0.4750 - val_auc: 0.6759
Metrics per fold, with the same value for accuracy, precision and recall every time:
['loss', 'tp', 'fp', 'tn', 'fn', 'accuracy', 'precision', 'recall', 'auc']
[[ 0.351 70. 10. 70. 10. 0.875 0.875 0.875 0.945]
[ 0.091 78. 2. 78. 2. 0.975 0.975 0.975 0.995]
[ 0.253 72. 8. 72. 8. 0.9 0.9 0.9 0.974]
[ 0.04 78. 2. 78. 2. 0.975 0.975 0.975 0.999]
[ 0.021 80. 0. 80. 0. 1. 1. 1. 1. ]]
sklearn.metrics.classification_report shows the correct precision and recall:
================ Fold 1 =====================
Accuracy: 0.8875
              precision    recall  f1-score   support
      normal       0.84      0.95      0.89        38
          pm       0.95      0.83      0.89        42
    accuracy                           0.89        80
   macro avg       0.89      0.89      0.89        80
weighted avg       0.89      0.89      0.89        80
================ Fold 2 =====================
Accuracy: 0.9375
              precision    recall  f1-score   support
      normal       1.00      0.87      0.93        38
          pm       0.89      1.00      0.94        42
    accuracy                           0.94        80
   macro avg       0.95      0.93      0.94        80
weighted avg       0.94      0.94      0.94        80
================ Fold 3 =====================
Accuracy: 0.925
              precision    recall  f1-score   support
      normal       0.88      0.97      0.92        37
          pm       0.97      0.88      0.93        43
    accuracy                           0.93        80
   macro avg       0.93      0.93      0.92        80
weighted avg       0.93      0.93      0.93        80
================ Fold 4 =====================
Accuracy: 0.925
              precision    recall  f1-score   support
      normal       0.97      0.86      0.91        37
          pm       0.89      0.98      0.93        43
    accuracy                           0.93        80
   macro avg       0.93      0.92      0.92        80
weighted avg       0.93      0.93      0.92        80
================ Fold 5 =====================
Accuracy: 1.0
              precision    recall  f1-score   support
      normal       1.00      1.00      1.00        37
          pm       1.00      1.00      1.00        43
    accuracy                           1.00        80
   macro avg       1.00      1.00      1.00        80
weighted avg       1.00      1.00      1.00        80

When I posted my question I didn't realize that the true positives and false positives also had the same values as the true negatives and false negatives. My validation set has 80 observations, so these tp, fp, tn and fn values actually meant that 70 observations were predicted correctly while 10 were wrong, regardless of the class of each observation.
I wasn't able to figure out why all these metrics were messed up, maybe it's just the issue Zabir Al Nazi kindly mentioned. However, I was able to get proper metrics thanks to some small changes:
Loss function: binary_crossentropy instead of categorical_crossentropy.
Top layer: 1 neuron sigmoid instead of n_classes neurons softmax.
Labels shape: 1D numpy array instead of one-hot encoded.
I hope this can help someone else.

The problem of having equal TP and TN lies in the use of labels formatted as one-hot encoded vectors for binary classification. One-hot encoded labels look like [[0,1], [0,1], [1,0],[1,0],[0,1],[1,0],….,[0,1],[1,0]], so whenever the algorithm correctly predicts class A, expressed as [1,0] in the label, the metrics count both a TP for class A and a TN for class B. Therefore, you end up with 70 TP and 70 TN on a sample of 80 observations.
The solution described in your update, in more detail:
Change the dense output layer to have a single output unit:
model.add(Dense(1, activation='sigmoid'))
Change the format of y to a 1D array such as [1,1,0,0,1,0….,1,0] instead of the one-hot vectors [[0,1], [0,1], [1,0],[1,0],[0,1],[1,0],….,[0,1],[1,0]], and
Change the loss function to BinaryCrossentropy, like: model.compile(loss="BinaryCrossentropy", optimizer=optimizer, metrics=metrics)
Keras does not offer an "automatic transition" from a multi-class classification setup to a binary one.
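The counting argument above can be reproduced with NumPy alone. The sketch below uses hypothetical labels (80 samples, exactly 10 of them misclassified) and counts TP, FP, TN and FN over the flattened one-hot arrays, which is an assumption about how the element-wise metrics see two-column labels. Each correct sample contributes one TP and one TN, and each wrong sample one FP and one FN, so precision and recall collapse to the accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
y_true = rng.integers(0, 2, size=n)  # hypothetical class ids (0 or 1)
y_pred = y_true.copy()
y_pred[:10] = 1 - y_pred[:10]        # flip exactly 10 predictions

# One-hot encode, then flatten so every element is scored independently
one_hot = np.eye(2, dtype=int)
t = one_hot[y_true].ravel()
p = one_hot[y_pred].ravel()

tp = int(np.sum((t == 1) & (p == 1)))
fp = int(np.sum((t == 0) & (p == 1)))
tn = int(np.sum((t == 0) & (p == 0)))
fn = int(np.sum((t == 1) & (p == 0)))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = float(np.mean(y_true == y_pred))

print(tp, fp, tn, fn)               # 70 10 70 10
print(precision, recall, accuracy)  # 0.875 0.875 0.875
```

These numbers match the first fold reported in the question (tp 70, fp 10, tn 70, fn 10, all three metrics 0.875).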

There is a known issue with precision and recall in Keras.
See this issue: https://github.com/keras-team/keras/issues/5400
You can try tensorflow.keras instead; the issue should go away.
Alternatively, you can use a custom implementation and pass the functions to compile:
import keras
from keras import backend as K
def check_units(y_true, y_pred):
    # For multi-unit outputs, keep only the column of the second class
    if y_pred.shape[1] != 1:
        y_pred = y_pred[:, 1:2]
        y_true = y_true[:, 1:2]
    return y_true, y_pred
def precision(y_true, y_pred):
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision
def recall(y_true, y_pred):
    y_true, y_pred = check_units(y_true, y_pred)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall
metrics = [keras.metrics.TruePositives(name='tp'),
           keras.metrics.FalsePositives(name='fp'),
           keras.metrics.TrueNegatives(name='tn'),
           keras.metrics.FalseNegatives(name='fn'),
           keras.metrics.BinaryAccuracy(name='accuracy'),
           precision,
           recall,
           keras.metrics.AUC(name='auc')]

How can I improve massively classification report of one class using ensemble model?

I have a dataset with the class counts
{0: 6624, 1: 75}, where 0 marks non-observational sentences and 1 marks observational sentences. (Basically, I annotate my sentences using Named Entity Recognition; if a sentence contains a specific entity such as DATA, TIME or LONG (coordinate), I give it label 1.)
Now I want to build a model to classify them. The best model I have built (CV = 3 for all) is the following:
clf= SGDClassifier()
trial_05=Pipeline([("vect",vec),("clf",clf)])
which has:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00      6624
           1       0.73      0.57      0.64        75
   micro avg       0.99      0.99      0.99      6699
   macro avg       0.86      0.79      0.82      6699
weighted avg       0.99      0.99      0.99      6699
[[6611 37]
[ 13 38]]
and this model, which uses a resampled SGD for classification:
              precision    recall  f1-score   support
           0       1.00      0.92      0.96      6624
           1       0.13      1.00      0.22        75
   micro avg       0.92      0.92      0.92      6699
   macro avg       0.56      0.96      0.59      6699
weighted avg       0.99      0.92      0.95      6699
[[6104 0]
[ 520 75]]
As you can see, the problem in both cases is class 1: the first model has fairly good precision and F1 score, whereas the second has very good recall.
So I decided to combine both in an ensemble model like this:
from sklearn.ensemble import VotingClassifier
# create a dictionary of our models
estimators = [("trail_05", trial_05), ("resampled", SGD_RESAMPLED_Model)]
# create our voting classifier, inputting our models
ensemble = VotingClassifier(estimators, voting='hard')
Now I have this result:
              precision    recall  f1-score   support
           0       0.99      1.00      1.00      6624
           1       0.75      0.48      0.59        75
   micro avg       0.99      0.99      0.99      6699
   macro avg       0.87      0.74      0.79      6699
weighted avg       0.99      0.99      0.99      6699
[[6612 39]
[ 12 36]]
As you can see, the ensemble model has better precision for class 1 but worse recall and F1 score, which leads to a worse confusion matrix for class 1 (36 TP vs 38 TP for class 1).
My aim is to improve the TP count for class 1 (F1 score and recall for class 1).
What do you recommend to improve TP for class 1 (F1 score and recall for class 1)?
More generally, do you have any comments on my workflow?
I have tried parameter tuning, but it does not improve the SGD model.
