The Problem:
During training, the performance of my model looks quite all right. However, the classification_report from sklearn yields precision, recall and F1 scores of zero almost everywhere. What am I doing wrong to get such a mismatch between training performance and inference? (I am using Keras with a TensorFlow backend.)
My code:
I use the validation_split argument to generate two generators (train, validation) like so:
train_datagen = ImageDataGenerator(
    rescale=1. / 255, validation_split=0.15)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical', subset="training")

validation_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical', subset="validation", shuffle=False)
I set shuffle=False in my validation_generator to make sure it does not mix up the relationship between images and labels for my evaluation later on.
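As a sanity check on that assumption, I also print the class mapping and the shuffle flag (the attribute names below are what I believe the Keras Iterator exposes; this is just my own check, not from the docs):

print(validation_generator.class_indices)  # e.g. {'class_a': 0, 'class_b': 1}
print(validation_generator.shuffle)        # should print False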
Next, I train my model like so:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    verbose=1)
Performance looks all right:
Epoch 1/5
187/187 [==============================] - 44s 233ms/step - loss: 0.7835 - acc: 0.6744 - val_loss: 1.2918 - val_acc: 0.6079
Epoch 2/5
187/187 [==============================] - 42s 225ms/step - loss: 0.7578 - acc: 0.6901 - val_loss: 1.2962 - val_acc: 0.6149
Epoch 3/5
187/187 [==============================] - 40s 216ms/step - loss: 0.7535 - acc: 0.6907 - val_loss: 1.3426 - val_acc: 0.6061
Epoch 4/5
187/187 [==============================] - 41s 217ms/step - loss: 0.7388 - acc: 0.6977 - val_loss: 1.2866 - val_acc: 0.6149
Epoch 5/5
187/187 [==============================] - 41s 217ms/step - loss: 0.7282 - acc: 0.6960 - val_loss: 1.2988 - val_acc: 0.6297
Now, I extract the necessary info for the classification_report following the method suggested in https://github.com/keras-team/keras/issues/2607#issuecomment-302365916. This gives me the following:
validation_steps_per_epoch = np.math.ceil(validation_generator.samples / validation_generator.batch_size)
predictions = model.predict_generator(validation_generator, steps=validation_steps_per_epoch)
# Get most likely class
predicted_classes = np.argmax(predictions, axis=1)
true_classes = validation_generator.classes
class_labels = list(validation_generator.class_indices.keys())
Finally, I output the classification report using:
from sklearn.metrics import classification_report
report = classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)
Which results in zeros all over the place (see the averages below):

              precision    recall  f1-score   support

   micro avg       0.01      0.01      0.01      2100
   macro avg       0.01      0.01      0.01      2100
weighted avg       0.01      0.01      0.01      2100
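One thing I am unsure about is the generator's internal state: fit_generator has already iterated validation_generator, so maybe it needs rewinding before predicting. A minimal sketch of what I mean, reusing the objects above (I have not confirmed this is the fix):

validation_generator.reset()  # rewind so the first predicted batch is also the first batch in .classes
predictions = model.predict_generator(validation_generator, steps=validation_steps_per_epoch)
predicted_classes = np.argmax(predictions, axis=1)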
Related
I run a TensorFlow U-Net model without dropout (but with BN) and a custom metric called "average accuracy". This is literally the relevant section of code. As you can see, the datasets must be the same, since I do nothing between fit and evaluate.
model.fit(x=train_ds, epochs=epochs, validation_data=val_ds, shuffle=True,
          callbacks=callbacks)
model.evaluate(train_ds)
model.evaluate(val_ds)
train_ds and val_ds are tf.data.Datasets. And here is the output:
...
Epoch 10/10
148/148 [==============================] - 103s 698ms/step - loss: 0.1765 - accuracy: 0.5872 - average_accuracy: 0.9620 - val_loss: 0.5845 - val_accuracy: 0.5788 - val_average_accuracy: 0.5432
148/148 [==============================] - 22s 118ms/step - loss: 0.5056 - accuracy: 0.4540 - average_accuracy: 0.3654
29/29 [==============================] - 5s 122ms/step - loss: 0.5845 - accuracy: 0.5788 - average_accuracy: 0.5432
There is an unbelievable difference between average_accuracy during training (fit) and the average_accuracy from evaluate (both on the training dataset). I know that BN can have this effect, and also that performance changes during training, so they will never be exactly equal. But from 96% to 36%?
My custom accuracy is defined below, but I doubt the problem is my personal implementation, as the two numbers should be somewhat close no matter what I did (I think).
Any hint here is useful. I don't know if I should review the custom metric, the dataset, or the model. The problem seems to sit outside all of them.
I tried to continue training after stopping, and average_accuracy starts from where it left off, at more than 90%.
Context of the custom metric: I use it for semantic segmentation, so each image has an image of labels as output, of shape WxHx4 (4 being my total number of classes).
It computes the average accuracy, i.e. the accuracy of each class separately, and then, with 4 classes, sum(accuracies per class) / 4.
Here is the main code:
def custom_average_accuracy(y_true, y_pred):
    # Mask to remove the labels (y_true) that are zero: ex. [0, 0, 0]
    remove_zeros_mask = tf.math.logical_not(tf.math.reduce_all(tf.math.logical_not(tf.cast(y_true, bool)), axis=-1))
    y_true = tf.boolean_mask(y_true, remove_zeros_mask)
    y_pred = tf.boolean_mask(y_pred, remove_zeros_mask)
    num_cls = y_true.shape[-1]
    y_pred = tf.math.argmax(y_pred, axis=-1)  # ex. [0, 0, 1] -> [2]
    y_true = tf.math.argmax(y_true, axis=-1)
    accuracies = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    for i in range(0, num_cls):
        cls_mask = y_true == i
        cls_y_true = tf.boolean_mask(y_true, cls_mask)
        if not tf.equal(tf.size(cls_y_true), 0):  # Some images don't have all the classes present.
            new_acc = _accuracy(y_true=cls_y_true, y_pred=tf.boolean_mask(y_pred, cls_mask))
            accuracies = accuracies.write(accuracies.size(), new_acc)
    accuracies = accuracies.stack()
    return tf.math.reduce_sum(accuracies) / tf.cast(len(accuracies), dtype=accuracies.dtype)
I believe the problem might be in the if not tf.equal(tf.size(cls_y_true), 0) line, but I still can't see where.
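For reference, here is a hypothetical eager-mode check of what I expect the metric to return; it relies on the _accuracy helper shown further down, and the inputs are made up (class 0 is predicted 1/2 correct, class 1 is 1/1 correct):

y_true = tf.one_hot([0, 0, 1], depth=4)  # three labeled pixels, 4 classes
y_pred = tf.one_hot([0, 1, 1], depth=4)  # the second pixel is misclassified
print(custom_average_accuracy(y_true, y_pred))  # expected: (0.5 + 1.0) / 2 = 0.75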
More weird information. These are exactly my lines of code:
x_input, y_true = np.concatenate([x for x, y in ds], axis=0), np.concatenate([y for x, y in ds], axis=0)
model.evaluate(x=x_input, y=y_true) # This gets 38% accuracy
model.evaluate(ds) # This gets 55% accuracy
What the hell is going on here? How can those lines of code give different results?!
So now I find that if I don't do ds = ds.shuffle(), the example above (the 30-ish vs 50-ish accuracy values) is OK.
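If the reshuffling between passes is the culprit, iterating the dataset once and splitting the pairs afterwards should keep images and labels aligned. A hedged sketch of what I would try, assuming ds yields (image, label) batches as above:

# One pass over ds, so x and y come from the same (possibly shuffled) order;
# two separate passes over a shuffled dataset can see two different orders.
batches = list(ds)
x_input = np.concatenate([x for x, _ in batches], axis=0)
y_true = np.concatenate([y for _, y in batches], axis=0)
model.evaluate(x=x_input, y=y_true)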
I tried to reproduce this behavior but could not find the discrepancies you noted. The only thing I changed was not tf.equal to tf.math.not_equal:
import pathlib
import tensorflow as tf
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
num_classes = 5
batch_size = 32
img_height = 180
img_width = 180
val_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

def to_categorical(images, labels):
    return images, tf.one_hot(labels, num_classes)

train_ds = train_ds.map(to_categorical)
val_ds = val_ds.map(to_categorical)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes)
])

def _accuracy(y_true, y_pred):
    y_true.shape.assert_is_compatible_with(y_pred.shape)
    if y_true.dtype != y_pred.dtype:
        y_pred = tf.cast(y_pred, y_true.dtype)
    reduced_sum = tf.reduce_sum(tf.cast(tf.math.equal(y_true, y_pred), tf.keras.backend.floatx()), axis=-1)
    return tf.math.divide_no_nan(reduced_sum, tf.cast(tf.shape(y_pred)[-1], reduced_sum.dtype))

def custom_average_accuracy(y_true, y_pred):
    # Mask to remove the labels (y_true) that are zero: ex. [0, 0, 0]
    remove_zeros_mask = tf.math.logical_not(tf.math.reduce_all(tf.math.logical_not(tf.cast(y_true, bool)), axis=-1))
    y_true = tf.boolean_mask(y_true, remove_zeros_mask)
    y_pred = tf.boolean_mask(y_pred, remove_zeros_mask)
    num_cls = y_true.shape[-1]
    y_pred = tf.math.argmax(y_pred, axis=-1)  # ex. [0, 0, 1] -> [2]
    y_true = tf.math.argmax(y_true, axis=-1)
    accuracies = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    for i in range(0, num_cls):
        cls_mask = y_true == i
        cls_y_true = tf.boolean_mask(y_true, cls_mask)
        if tf.math.not_equal(tf.size(cls_y_true), 0):  # Some images don't have all the classes present.
            new_acc = _accuracy(y_true=cls_y_true, y_pred=tf.boolean_mask(y_pred, cls_mask))
            accuracies = accuracies.write(accuracies.size(), new_acc)
    accuracies = accuracies.stack()
    return tf.math.reduce_sum(accuracies) / tf.cast(len(accuracies), dtype=accuracies.dtype)

model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy', custom_average_accuracy])

epochs = 10
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs)

model.evaluate(train_ds)
model.evaluate(val_ds)
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Epoch 1/10
92/92 [==============================] - 11s 95ms/step - loss: 1.6220 - accuracy: 0.2868 - custom_average_accuracy: 0.2824 - val_loss: 1.2868 - val_accuracy: 0.4946 - val_custom_average_accuracy: 0.4597
Epoch 2/10
92/92 [==============================] - 8s 85ms/step - loss: 1.2131 - accuracy: 0.4785 - custom_average_accuracy: 0.4495 - val_loss: 1.2051 - val_accuracy: 0.4673 - val_custom_average_accuracy: 0.4350
Epoch 3/10
92/92 [==============================] - 8s 84ms/step - loss: 1.0713 - accuracy: 0.5620 - custom_average_accuracy: 0.5404 - val_loss: 1.1070 - val_accuracy: 0.5232 - val_custom_average_accuracy: 0.5003
Epoch 4/10
92/92 [==============================] - 8s 83ms/step - loss: 0.9463 - accuracy: 0.6281 - custom_average_accuracy: 0.6203 - val_loss: 0.9880 - val_accuracy: 0.5967 - val_custom_average_accuracy: 0.5755
Epoch 5/10
92/92 [==============================] - 8s 84ms/step - loss: 0.8400 - accuracy: 0.6771 - custom_average_accuracy: 0.6730 - val_loss: 0.9420 - val_accuracy: 0.6308 - val_custom_average_accuracy: 0.6245
Epoch 6/10
92/92 [==============================] - 8s 83ms/step - loss: 0.7594 - accuracy: 0.7027 - custom_average_accuracy: 0.7004 - val_loss: 0.8972 - val_accuracy: 0.6431 - val_custom_average_accuracy: 0.6328
Epoch 7/10
92/92 [==============================] - 8s 82ms/step - loss: 0.6211 - accuracy: 0.7619 - custom_average_accuracy: 0.7563 - val_loss: 0.8999 - val_accuracy: 0.6431 - val_custom_average_accuracy: 0.6174
Epoch 8/10
92/92 [==============================] - 8s 82ms/step - loss: 0.5108 - accuracy: 0.8116 - custom_average_accuracy: 0.8046 - val_loss: 0.8809 - val_accuracy: 0.6689 - val_custom_average_accuracy: 0.6457
Epoch 9/10
92/92 [==============================] - 8s 83ms/step - loss: 0.3985 - accuracy: 0.8535 - custom_average_accuracy: 0.8534 - val_loss: 0.9364 - val_accuracy: 0.6676 - val_custom_average_accuracy: 0.6539
Epoch 10/10
92/92 [==============================] - 8s 83ms/step - loss: 0.3023 - accuracy: 0.8995 - custom_average_accuracy: 0.9010 - val_loss: 1.0118 - val_accuracy: 0.6662 - val_custom_average_accuracy: 0.6405
92/92 [==============================] - 6s 62ms/step - loss: 0.2038 - accuracy: 0.9363 - custom_average_accuracy: 0.9357
23/23 [==============================] - 2s 50ms/step - loss: 1.0118 - accuracy: 0.6662 - custom_average_accuracy: 0.663
Well, I was using a TensorFlow dataset. I changed to NumPy arrays and now everything seems logical and works.
Still, I need to know the reason the tf dataset didn't work, but at least I no longer have these weird results.
Not tested yet (I would need to get the code back to what it was; I'll probably do it someday), but this might be related.
This is my code. I have around 5000 images in training and roughly 532 in test data. My val_accuracy shows 95%, but when I create the confusion matrix and classification report, it gives very poor results on the validation/test set: out of 532 images it predicts only 314 correctly (TP). I think the problem lies in the batch_size and other hyperparameters. Please help, this is for my research paper and I'm stuck badly!
import os
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.applications import xception
from keras.layers import *
from keras.models import *
from keras.preprocessing import image

model = xception.Xception(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in model.layers:
    layer.trainable = False

flat1 = Flatten()(model.layers[-1].output)
class1 = Dense(256, activation='relu')(flat1)
output = Dense(1, activation='sigmoid')(class1)
model = Model(inputs=model.inputs, outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

train_datagen = image.ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)
test_datagen = image.ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/Users/xd_anshul/Desktop/Research/Major/CovidDataset/Train',
    target_size=(224, 224),
    batch_size=10,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    '/Users/xd_anshul/Desktop/Research/Major/CovidDataset/Test',
    target_size=(224, 224),
    batch_size=10,
    class_mode='binary')

hist = model.fit(
    train_generator,
    steps_per_epoch=9,
    epochs=5,
    validation_data=validation_generator,
    validation_steps=2)

from sklearn.metrics import classification_report, confusion_matrix

Y_pred = model.predict(validation_generator)
y_pred = [1 * (x[0] >= 0.5) for x in Y_pred]
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
print('Classification Report')
target_names = ['Covid', 'Normal']
print(classification_report(validation_generator.classes, y_pred,
                            target_names=target_names))
OUTPUT:
Epoch 1/5
9/9 [==============================] - 21s 2s/step - loss: 0.2481 - accuracy: 0.9377 - val_loss: 4.1552 - val_accuracy: 0.9500
Epoch 2/5
9/9 [==============================] - 16s 2s/step - loss: 1.9680 - accuracy: 0.9767 - val_loss: 15.5336 - val_accuracy: 0.8500
Epoch 3/5
9/9 [==============================] - 16s 2s/step - loss: 0.2898 - accuracy: 0.9867 - val_loss: 0.0000e+00 - val_accuracy: 1.0000
Epoch 4/5
9/9 [==============================] - 16s 2s/step - loss: 1.4597 - accuracy: 0.9640 - val_loss: 2.3671 - val_accuracy: 0.9500
Epoch 5/5
9/9 [==============================] - 16s 2s/step - loss: 3.3822 - accuracy: 0.9365 - val_loss: 3.5101e-22 - val_accuracy: 1.0000
Confusion Matrix
[[314 96]
[ 93 29]]
Classification Report
              precision    recall  f1-score   support

       Covid       0.77      0.77      0.77       410
      Normal       0.23      0.24      0.23       122

    accuracy                           0.64       532
   macro avg       0.50      0.50      0.50       532
weighted avg       0.65      0.64      0.65       532
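One more detail I am unsure about: flow_from_directory shuffles by default, so I don't know whether validation_generator.classes lines up with the order model.predict sees. Would recreating the generator like this (a hypothetical change on my side) matter?

validation_generator = test_datagen.flow_from_directory(
    '/Users/xd_anshul/Desktop/Research/Major/CovidDataset/Test',
    target_size=(224, 224),
    batch_size=10,
    class_mode='binary',
    shuffle=False)  # keep file order fixed so .classes matches the prediction order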
Let's say your predictions array is something like:
preds_sigmoid = np.array([[0.8451], [0.454], [0.5111]])
containing these values, since a sigmoid squeezes outputs into the range [0, 1]. When you apply argmax as you did, you will get index 0 every time, because each row has a single element and argmax returns the index of the maximum along the specified axis.
pred = np.argmax(preds_sigmoid, axis=1)  # pred is full of zeros
You should instead evaluate the predictions against a threshold: if a value is bigger than, say, 0.5, the sample belongs to the second class. You can use a list comprehension for this:
pred = [1 * (x[0] >= 0.5) for x in preds_sigmoid]
This way the predictions are handled properly.
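Equivalently, as a vectorized sketch with NumPy (same made-up array as above):

import numpy as np

preds_sigmoid = np.array([[0.8451], [0.454], [0.5111]])
pred = (preds_sigmoid[:, 0] >= 0.5).astype(int)  # -> array([1, 0, 1])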
I am a beginner with Keras, and today I bumped into an issue I don't know how to handle: the values for auc and val_auc are stored in history with increasing even-numbered suffixes, like auc, auc_2, auc_4, auc_6... and so on.
This prevents me from collecting and studying those values across my K-fold cross-validation, as I cannot access history.history['auc'] because there is not always such a key 'auc'. Here is the code:
from tensorflow.keras.models import Sequential # pylint: disable= import-error
from tensorflow.keras.layers import Dense # pylint: disable= import-error
from tensorflow.keras import Input # pylint: disable= import-error
from sklearn.model_selection import StratifiedKFold
from keras.utils.vis_utils import plot_model
from keras.metrics import AUC, Accuracy # pylint: disable= import-error
BATCH_SIZE = 32
EPOCHS = 10
K = 5
N_SAMPLE = 1168
METRICS = ['AUC', 'accuracy']
SAVE_PATH = '../data/exp/final/submodels/'
def create_mlp(model_name, keyword, n_sample=N_SAMPLE, batch_size=BATCH_SIZE, epochs=EPOCHS):
    df = readCSV(n_sample)
    skf = StratifiedKFold(n_splits=K, random_state=7, shuffle=True)
    for train_index, valid_index in skf.split(np.zeros(n_sample), df[['target']]):
        x_train, y_train, x_valid, y_valid = get_train_valid_dataset(keyword, df, train_index, valid_index)
        model = get_model(keyword)
        history = model.fit(
            x=x_train,
            y=y_train,
            validation_data=(x_valid, y_valid),
            epochs=epochs
        )

def get_train_valid_dataset(keyword, df, train_index, valid_index):
    aux = df[[c for c in columns[keyword]]]
    return aux.iloc[train_index].values, df['target'].iloc[train_index].values, aux.iloc[valid_index].values, df['target'].iloc[valid_index].values

def create_callbacks(model_name, save_path, fold_var):
    checkpoint = ModelCheckpoint(
        save_path + model_name + '_' + str(fold_var),
        monitor=CALLBACK_MONITOR,
        verbose=1,
        save_best_only=True,
        save_weights_only=True,
        mode='max'
    )
    return [checkpoint]
In main.py I call create_mlp('model0', 'euler', n_sample=100), and the log shows (only the relevant lines):
Epoch 9/10
32/80 [===========>..................] - ETA: 0s - loss: 0.6931 - auc: 0.5000 - acc: 0.5625
Epoch 00009: val_auc did not improve from 0.50000
80/80 [==============================] - 0s 1ms/sample - loss: 0.6931 - auc: 0.5000 - acc: 0.5000 - val_loss: 0.6931 - val_auc: 0.5000 - val_acc: 0.5000
Epoch 10/10
32/80 [===========>..................] - ETA: 0s - loss: 0.6932 - auc: 0.5000 - acc: 0.4375
Epoch 00010: val_auc did not improve from 0.50000
80/80 [==============================] - 0s 1ms/sample - loss: 0.6931 - auc: 0.5000 - acc: 0.5000 - val_loss: 0.6931 - val_auc: 0.5000 - val_acc: 0.5000
Train on 80 samples, validate on 20 samples
Epoch 1/10
32/80 [===========>..................] - ETA: 0s - loss: 0.7644 - auc_2: 0.3075 - acc: 0.5000
WARNING:tensorflow:Can save best model only with val_auc available, skipping.
80/80 [==============================] - 1s 10ms/sample - loss: 0.7246 - auc_2: 0.4563 - acc: 0.5250 - val_loss: 0.6072 - val_auc_2: 0.8250 - val_acc: 0.6500
Epoch 2/10
32/80 [===========>..................] - ETA: 0s - loss: 0.7046 - auc_2: 0.4766 - acc: 0.5000
WARNING:tensorflow:Can save best model only with val_auc available, skipping.
80/80 [==============================] - 0s 1ms/sample - loss: 0.6511 - auc_2: 0.6322 - acc: 0.5625 - val_loss: 0.5899 - val_auc_2: 0.8000 - val_acc: 0.6000
Any help will be appreciated. I am using:
keras==2.3.1
tensorflow==1.14.0
Use tf.keras.backend.clear_session()
https://www.tensorflow.org/api_docs/python/tf/keras/backend/clear_session
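A minimal sketch of where that call could go, assuming the fold loop from the question: clearing the session between folds resets Keras's name counters, so each fold's metric should again be named plain auc:

import tensorflow as tf

for train_index, valid_index in skf.split(np.zeros(n_sample), df[['target']]):
    tf.keras.backend.clear_session()  # fresh graph and fresh name counters per fold
    x_train, y_train, x_valid, y_valid = get_train_valid_dataset(keyword, df, train_index, valid_index)
    model = get_model(keyword)
    history = model.fit(x=x_train, y=y_train, validation_data=(x_valid, y_valid), epochs=epochs)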
In this line of code:
for train_index, valid_index in skf.split(np.zeros(n_sample), df[['target']]):
What is actually happening is that you are running multiple training instances, in this case 5, one per fold.
Although you get different training and validation sets in :
x_train, y_train, x_valid, y_valid = get_train_valid_dataset(keyword, df, train_index, valid_index)
When you run model.fit():
history = model.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_valid, y_valid),
    epochs=epochs,
    callbacks=create_callbacks(keyword + '_' + model_name, SAVE_PATH, folder)
)
you can see that the arguments to create_callbacks are static and do not change from one training instance to another: keyword, model_name, SAVE_PATH and folder all remain constant across the 5 instances of your training.
Therefore, in TensorBoard, all the results are written to the same path.
You do not want that; you want each iteration to write its results to a different path.
You have to change the logdir parameter and give it a unique identifier. That way, each training iteration writes its results to a separate location, and the confusion disappears.
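A hedged sketch of that idea, using the fold index as the unique identifier (the fold counter is hypothetical here; any per-iteration value works):

for fold, (train_index, valid_index) in enumerate(skf.split(np.zeros(n_sample), df[['target']])):
    x_train, y_train, x_valid, y_valid = get_train_valid_dataset(keyword, df, train_index, valid_index)
    model = get_model(keyword)
    history = model.fit(
        x=x_train,
        y=y_train,
        validation_data=(x_valid, y_valid),
        epochs=epochs,
        callbacks=create_callbacks(keyword + '_' + model_name + '_fold' + str(fold), SAVE_PATH, fold)
    )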
I solved the issue by changing to tensorflow==2.1.0. I hope this helps somebody else.
I'm training a deep learning model on 100000 rows, with 80% training data and 20% test data. The data splits correctly, yet the training output of my model shows only 2242 steps. Below are the training code, the model, and the output. Any help will be highly appreciated.
Training Code:
import time
start_time = time.time()
from sklearn.feature_extraction.text import TfidfVectorizer
tweet_table = cleaning_table(tweet_table)
def tokenization_tweets(dataset, features):
    tokenization = TfidfVectorizer(max_features=features)
    tokenization.fit(dataset)
    dataset_transformed = tokenization.transform(dataset).toarray()
    return dataset_transformed

def splitting(table):
    X_train, X_test, y_train, y_test = train_test_split(table.tweet, table.test, test_size=0.2, shuffle=True)
    return X_train, X_test, y_train, y_test

if __name__ == "__main__":
    tweet_table['test'] = tweet_table['Overall_Sentiment'].apply(lambda x: 1 if x == 'Positive' else (0 if x == 'Negative' else 2))

if __name__ == "__main__":
    X_train, X_test, y_train, y_test = splitting(tweet_table)

#print(tweet_table["test"].value_counts())
#print(tweet_table["Overall_Sentiment"].value_counts())
#print(list(set(y_train)))
#print(list(set(y_test)))

#Create a Neural Network
#Create the model
def train(X_train_mod, y_train, features, shuffle, drop, layer1, layer2, epoch, lr, epsilon, validation):
    model_nn = Sequential()
    model_nn.add(Dense(layer1, input_shape=(features,), activation='relu'))
    model_nn.add(Dropout(drop))
    model_nn.add(Dense(layer2, activation='sigmoid'))
    model_nn.add(Dropout(drop))
    model_nn.add(Dense(3, activation='softmax'))
    optimizer = keras.optimizers.Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=epsilon, decay=0.0, amsgrad=False)
    model_nn.compile(loss='sparse_categorical_crossentropy',
                     optimizer=optimizer,
                     metrics=['accuracy'])
    model_nn.fit(np.array(X_train_mod), y_train,
                 batch_size=32,
                 epochs=epoch,
                 verbose=1,
                 validation_split=validation,
                 shuffle=shuffle)
    return model_nn

def test(X_test, model_nn):
    prediction = model_nn.predict(X_test)
    return prediction

def model1(X_train, y_train):
    features = 3500
    shuffle = True
    drop = 0.5
    layer1 = 512
    layer2 = 256
    epoch = 5
    lr = 0.001
    epsilon = None
    validation = 0.1
    X_train_mod = tokenization_tweets(X_train, features)
    model = train(X_train_mod, y_train, features, shuffle, drop, layer1, layer2, epoch, lr, epsilon, validation)
    return model

#model1(X_train, y_train)
#model11(X_train, y_train)

def save_model(model):
    # lets assume `model` is main model
    model_json = model.to_json()
    with open("model.json", "w") as json_file:
        json.dump(model_json, json_file)
    model.save_weights("model_weights.h5")

#print(len(X_train))
#print(len(y_train))
model_final = model1(X_train, y_train)
Output:
Epoch 1/5
2242/2242 [==============================] - 6s 3ms/step - loss: 0.3426 - accuracy: 0.8476 - val_loss: 0.2690 - val_accuracy: 0.8857
Epoch 2/5
2242/2242 [==============================] - 6s 3ms/step - loss: 0.2399 - accuracy: 0.9015 - val_loss: 0.2471 - val_accuracy: 0.8991
Epoch 3/5
2242/2242 [==============================] - 6s 3ms/step - loss: 0.1912 - accuracy: 0.9205 - val_loss: 0.2447 - val_accuracy: 0.9028
Epoch 4/5
2242/2242 [==============================] - 6s 3ms/step - loss: 0.1454 - accuracy: 0.9399 - val_loss: 0.2547 - val_accuracy: 0.9083
Epoch 5/5
2242/2242 [==============================] - 6s 3ms/step - loss: 0.1046 - accuracy: 0.9552 - val_loss: 0.2874 - val_accuracy: 0.9084
--- 192.1562056541443 seconds ---
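For what it's worth, my own back-of-the-envelope check (my assumption; I am not sure whether the 2242 counts samples or batches):

# 100,000 rows -> 80% train split -> 10% of that held out by validation_split
rows = 100_000
fit_rows = int(rows * 0.8 * (1 - 0.1))  # 72,000 samples actually fitted
steps = fit_rows // 32                  # 2250 batches per epoch at batch_size=32
# close to the 2242 shown, so the progress bar may be counting batches, not rows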
Many Thanks
I'm currently trying to create an image classification model using Inception V3 with 2 classes. I have 1428 images, balanced about 70/30. When I run my model I get a pretty high loss as well as a constant validation accuracy. What might be causing this constant value?
data = np.array(data, dtype="float")/255.0
labels = np.array(labels, dtype="uint8")
(trainX, testX, trainY, testY) = train_test_split(
    data, labels,
    test_size=0.2,
    random_state=42)

img_width, img_height = 320, 320  #InceptionV3 size
train_samples = 1145
validation_samples = 287
epochs = 20
batch_size = 32

base_model = keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(img_width, img_height, 3))

model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None))
model_top.add(keras.layers.Dense(350, activation='relu'))
model_top.add(keras.layers.Dropout(0.2))
model_top.add(keras.layers.Dense(1, activation='sigmoid'))
model = keras.models.Model(inputs=base_model.input, outputs=model_top(base_model.output))

for layer in model.layers[:30]:
    layer.trainable = False

model.compile(optimizer=keras.optimizers.Adam(
                  lr=0.00001,
                  beta_1=0.9,
                  beta_2=0.999,
                  epsilon=1e-08),
              loss='binary_crossentropy',
              metrics=['accuracy'])

#Image Processing and Augmentation
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    zoom_range=0.05,
    #width_shift_range=0.05,
    height_shift_range=0.05,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest')
val_datagen = keras.preprocessing.image.ImageDataGenerator()

train_generator = train_datagen.flow(
    trainX,
    trainY,
    batch_size=batch_size,
    shuffle=True)
validation_generator = val_datagen.flow(
    testX,
    testY,
    batch_size=batch_size)

history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_samples//batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_samples//batch_size,
    callbacks=[ModelCheckpoint])
This is my log when I run my model:
Epoch 1/20
35/35 [==============================] - 52s 1s/step - loss: 0.6347 - acc: 0.6830 - val_loss: 0.6237 - val_acc: 0.6875
Epoch 2/20
35/35 [==============================] - 14s 411ms/step - loss: 0.6364 - acc: 0.6756 - val_loss: 0.6265 - val_acc: 0.6875
Epoch 3/20
35/35 [==============================] - 14s 411ms/step - loss: 0.6420 - acc: 0.6743 - val_loss: 0.6254 - val_acc: 0.6875
Epoch 4/20
35/35 [==============================] - 14s 414ms/step - loss: 0.6365 - acc: 0.6851 - val_loss: 0.6289 - val_acc: 0.6875
Epoch 5/20
35/35 [==============================] - 14s 411ms/step - loss: 0.6359 - acc: 0.6727 - val_loss: 0.6244 - val_acc: 0.6875
Epoch 6/20
35/35 [==============================] - 15s 415ms/step - loss: 0.6342 - acc: 0.6862 - val_loss: 0.6243 - val_acc: 0.6875
I think your learning rate is too low and you have too few epochs. Try lr = 0.001 and epochs = 100.
Your accuracy is 68.25%. Given that your classes are split roughly 70/30, it is likely that your model is just predicting the same class every time, ignoring the input. That would give the accuracy you are seeing; your model has not yet learned from your data.
As Novak said, your learning rate seems very low, so maybe try increasing that first to see if it helps.
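A minimal sketch of that suggestion, assuming the model object from the question (learning rate raised from 1e-5 to 1e-3, everything else unchanged):

model.compile(optimizer=keras.optimizers.Adam(lr=0.001),  # higher learning rate, as suggested
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit_generator(train_generator, steps_per_epoch=train_samples//batch_size,
                              epochs=100, validation_data=validation_generator,
                              validation_steps=validation_samples//batch_size)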