Confusion Matrix giving Bad Results but Validation Accuracy ~95% - python

This is my code. I have around 5000 images in the training set and 532 in the test set. My val_accuracy shows 95%, but when I create a confusion matrix and classification report, the results on the validation/test set are very poor: out of 532 images, only 314 are predicted correctly (TP). I think the problem lies in the batch_size and other hyperparameters. Please help, this is for my research paper and I'm badly stuck!
import os
import numpy as np
import matplotlib.pyplot as plt
import keras
from keras.applications import xception
from keras.layers import *
from keras.models import *
from keras.preprocessing import image
model = xception.Xception(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layers in model.layers:
flat1 = Flatten()(model.layers[-1].output)
class1 = Dense(256, activation='relu')(flat1)
output = Dense(1, activation='sigmoid')(class1)
model = Model(inputs = model.inputs, outputs = output)
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics=['accuracy'])
train_datagen = image.ImageDataGenerator(
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
)
test_datagen = image.ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_directory(
target_size = (224,224),
batch_size = 10,
validation_generator = test_datagen.flow_from_directory(
target_size = (224,224),
batch_size = 10,
hist =
from sklearn.metrics import classification_report, confusion_matrix
Y_pred = model.predict(validation_generator)
y_pred = [1 * (x[0]>=0.5) for x in Y_pred]
print('Confusion Matrix')
print(confusion_matrix(validation_generator.classes, y_pred))
print('Classification Report')
target_names = ['Covid', 'Normal']
print(classification_report(validation_generator.classes, y_pred, target_names=target_names))
Epoch 1/5
9/9 [==============================] - 21s 2s/step - loss: 0.2481 - accuracy: 0.9377 - val_loss: 4.1552 - val_accuracy: 0.9500
Epoch 2/5
9/9 [==============================] - 16s 2s/step - loss: 1.9680 - accuracy: 0.9767 - val_loss: 15.5336 - val_accuracy: 0.8500
Epoch 3/5
9/9 [==============================] - 16s 2s/step - loss: 0.2898 - accuracy: 0.9867 - val_loss: 0.0000e+00 - val_accuracy: 1.0000
Epoch 4/5
9/9 [==============================] - 16s 2s/step - loss: 1.4597 - accuracy: 0.9640 - val_loss: 2.3671 - val_accuracy: 0.9500
Epoch 5/5
9/9 [==============================] - 16s 2s/step - loss: 3.3822 - accuracy: 0.9365 - val_loss: 3.5101e-22 - val_accuracy: 1.0000
Confusion Matrix
[[314 96]
[ 93 29]]
Classification Report
              precision    recall  f1-score   support

       Covid       0.77      0.77      0.77       410
      Normal       0.23      0.24      0.23       122

    accuracy                           0.64       532
   macro avg       0.50      0.50      0.50       532
weighted avg       0.65      0.64      0.65       532

Let's say your predictions array is something like:
preds_sigmoid = np.array([[0.8451], [0.454], [0.5111]])
The values lie in [0, 1] because sigmoid squashes them into that range. If you apply argmax to this array, you will get index 0 every time, because argmax returns the index of the maximum along the specified axis and each row has only one column.
pred = np.argmax(preds_sigmoid , axis = 1) # pred is full of zeros.
Instead, compare each prediction against a threshold, say 0.5: if the value is at or above it, the sample belongs to the second class. You can use a list comprehension for this:
pred = [1 * (x[0]>=0.5) for x in preds_sigmoid]
Therefore predictions will be handled properly.
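To see the difference concretely, here is a minimal NumPy sketch using the example array above. The 0.5 threshold is just the usual default, not something tuned for this dataset.

```python
import numpy as np

# Sigmoid outputs have shape (n_samples, 1): one probability per image.
preds_sigmoid = np.array([[0.8451], [0.454], [0.5111]])

# argmax over axis=1 only sees a single column, so the index is always 0.
argmax_labels = np.argmax(preds_sigmoid, axis=1)

# Thresholding the probability gives the intended binary labels.
threshold_labels = (preds_sigmoid[:, 0] >= 0.5).astype(int)

print(argmax_labels.tolist())     # [0, 0, 0]
print(threshold_labels.tolist())  # [1, 0, 1]
```

Separately, it may be worth checking whether validation_generator was created with shuffle=False. If it shuffles (the default for flow_from_directory), validation_generator.classes will not be in the same order as the predictions, and the confusion matrix will look close to chance even when val_accuracy is high.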


Regression model output looks inconsistent

Some context about my project: I intend to study various parameters of bullets and how they affect the ballistic coefficient (i.e. bullet performance) of the projectile. I have different parameters, such as weight, caliber, sectional density, etc. I feel that I did this all wrong though; I am just reading through tutorials and applying what I feel could be useful and relevant to my project.
The output of my regression model looks a bit off to me; the trained model reports the same MSE of 0.0201 in every epoch.
Also, the model.predict(X) seems to have an accuracy of 100%, however, this does not seem right; I borrowed some code from a tutorial describing Keras models to display the model output while displaying the expected output.
This is the program constructing the model and training it
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.utils import shuffle
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from pandas.plotting import scatter_matrix
import time
name = 'Bullet Database Analysis v2-{}'.format(int(time.time()))
tensorboard = TensorBoard(log_dir='logs/{}'.format(name))
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')  # raw string so the backslash is not treated as an escape
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
dataset = df.values
X = dataset[:,0:12]
X = np.asarray(X).astype(np.float32)
y = dataset[:,13]
y = np.asarray(y).astype(np.float32)
X_train, X_val_and_test, y_train, y_val_and_test = train_test_split(X, y, test_size=0.3, shuffle=True)
X_val, X_test, y_val, y_test = train_test_split(X_val_and_test, y_val_and_test, test_size=0.5)
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization
model = Sequential(
#2430 is the shape of X_train
#BatchNormalization(axis=-1, momentum = 0.1),
Dense(2430, activation='relu'),
Dense(32, activation='relu'),
model.compile(loss='mse', metrics=['mse'])
history = model.fit(X_train, y_train,
    validation_data=(X_val, y_val),
    #callbacks = [tensorboard]
)
# plt.plot(history.history['loss'],'r')
# plt.plot(history.history['val_loss'],'m')
model.summary()
model.save(r"Bullet Optimization\Bullet Database Analysis.h5")
Here is my code, loading my previously trained model via h5
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.models import load_model
import pandas as pd
df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')
model = load_model(r'Bullet Optimization\Bullet Database Analysis.h5')
dataset = df.values
X = dataset[:,0:12]
y = dataset[:,13]
model.fit(X, y, epochs=10)
#predictions = np.argmax(model.predict(X), axis=-1)
predictions = model.predict(X)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
This is the output
Epoch 1/10
2021-03-09 10:38:06.372303: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublas64_11.dll
2021-03-09 10:38:07.747241: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublasLt64_11.dll
109/109 [==============================] - 2s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 2/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 3/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 4/10
109/109 [==============================] - 0s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 5/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 6/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 7/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 8/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 9/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 10/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
[0.314, 7.9756, 100.0, 100.0, 31.4, 0.00314, 318.4713376, 6.480041472000001, 0.51, 12.95400001, 4.067556004, 0.145] => 0 (expected 0)
[0.358, 9.0932, 148.0, 148.0, 52.983999999999995, 0.002418919, 413.4078212, 9.590461379, 0.635, 16.12900002, 5.774182006, 0.165] => 0 (expected 0)
[0.313, 7.9502, 83.0, 83.0, 25.979, 0.003771084, 265.1757188, 5.378434422000001, 0.504, 12.80160001, 4.006900804, 0.121] => 0 (expected 0)
[0.251, 6.3754, 50.0, 50.0, 12.55, 0.00502, 199.20318730000002, 3.2400207360000004, 0.4, 10.16000001, 2.5501600030000002, 0.113] => 0 (expected 0)
[0.251, 6.3754, 50.0, 50.0, 12.55, 0.00502, 199.20318730000002, 3.2400207360000004, 0.41, 10.41400001, 2.613914003, 0.113] => 0 (expected 0)
Here is a link to my training dataset. Within my code, I used train_test_split to create both the test and train dataset.
Lastly, is there a way within Tensorboard to visualize the model fitting with the dataset? I really feel that although my model is training, it is not making any significant fitting even though the MSE error is reduced.
This happens because you have NaN values in your dataset. Before splitting, you can check for them with df.isna().sum(). NaNs can have a negative impact on your network. Here I simply dropped them (df.dropna(inplace = True, axis = 0)), but you can also use an imputation technique to replace them.
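As a minimal sketch of that check, the snippet below uses a small made-up DataFrame in place of the real CSV; the column names and values are hypothetical and only illustrate the two pandas calls mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the bullet CSV; column names are made up.
df = pd.DataFrame({
    "weight": [100.0, 148.0, np.nan, 50.0],
    "caliber": [0.314, 0.358, 0.313, 0.251],
    "bc": [0.145, np.nan, 0.121, 0.113],
})

# Count missing values per column before splitting the data.
nan_counts = df.isna().sum()
print(nan_counts)

# Simplest fix: drop every row that contains at least one NaN.
clean = df.dropna(axis=0)
print(len(df), "->", len(clean))  # 4 -> 2
```

Dropping rows is the bluntest option; if too many rows disappear, filling the gaps (for example with a column median) may be a better trade-off.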
Also, 2430 neurons can be overkill for this data; start with fewer neurons.
model = tf.keras.models.Sequential(
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dense(32, activation='relu'),
Here is the last epoch:
Epoch 20/20
27/27 [==============================] - 0s 8ms/step - loss: 8.2077e-04 - mse: 8.2077e-04 - val_loss: 8.5023e-04 - val_mse: 8.5023e-04
For regression, computing accuracy directly is not a valid option. You can use model.evaluate(X_test, y_test), or, once you have predictions from model.predict, use other regression metrics to measure how close your predictions are to the targets.
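A minimal sketch of such metrics in plain NumPy is shown below. The target and prediction values are hypothetical, chosen only to be on the same scale as the ballistic coefficients in the question. The last line also illustrates why the question's printout shows "0 (expected 0)": the %d format truncates a float to an integer.

```python
import numpy as np

# Hypothetical targets and predictions (illustrative values, not from the dataset).
y_true = np.array([0.145, 0.165, 0.121, 0.113])
y_pred = np.array([0.150, 0.160, 0.130, 0.110])

errors = y_pred - y_true
mse = np.mean(errors ** 2)                       # mean squared error
mae = np.mean(np.abs(errors))                    # mean absolute error
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

print(f"MSE={mse:.6f}  MAE={mae:.4f}  R^2={r2:.3f}")

# %d truncates floats, so small values all print as 0; %.3f keeps the decimals.
print('%d vs %.3f' % (0.145, 0.145))             # 0 vs 0.145
```

Printing predictions with a float format such as %.3f makes it possible to see whether the model output actually tracks the expected value.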

Calculating the Accuracy of A Keras Neural Network in Python

I have created a Keras neural network. The network was trained for eight epochs, and it outputs these loss and accuracy values:
Epoch 1/8
2009/2009 [==============================] - 0s 177us/step - loss: 0.0824 - acc: 4.9776e-04
Epoch 2/8
2009/2009 [==============================] - 0s 34us/step - loss: 0.0080 - acc: 4.9776e-04
Epoch 3/8
2009/2009 [==============================] - 0s 37us/step - loss: 0.0071 - acc: 4.9776e-04
Epoch 4/8
2009/2009 [==============================] - 0s 38us/step - loss: 0.0071 - acc: 4.9776e-04
Epoch 5/8
2009/2009 [==============================] - 0s 35us/step - loss: 0.0070 - acc: 4.9776e-04
Epoch 6/8
2009/2009 [==============================] - 0s 38us/step - loss: 0.0071 - acc: 4.9776e-04
Epoch 7/8
2009/2009 [==============================] - 0s 36us/step - loss: 0.0068 - acc: 4.9776e-04
Epoch 8/8
2009/2009 [==============================] - 0s 40us/step - loss: 0.0070 - acc: 4.9776e-04
How do I interpret the loss function provided within the output?
Is there any way to find the variation percentage between the actual price and prediction for every single day in the data set?
Here is the neural network:
import tensorflow as tf
import keras
import numpy as np
#import quandle
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
import math
import pandas_datareader as web
def func_stock_prediction(stockdata, start, end):
    df = web.DataReader(stockdata, "yahoo", start, end)
    df = df[['Close']]
    previous = 5

    def create_dataset(df, previous):
        # Slide a window of `previous` days over the series:
        # X is the window, Y is the value on the following day.
        dataX, dataY = [], []
        for i in range(len(df)-previous-1):
            a = df[i:(i+previous), 0]
            dataX.append(a)
            dataY.append(df[i + previous, 0])
        return np.array(dataX), np.array(dataY)

    scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (0, 1))
    df = scaler.fit_transform(df)
    train_size = math.ceil(len(df) * 0.5)
    train, val = df[0:train_size,:], df[train_size:len(df),:]
    X_train, Y_train = create_dataset(train, previous)
    X_val, Y_val = create_dataset(val, previous)
    X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
    X_val = np.reshape(X_val, (X_val.shape[0], 1, X_val.shape[1]))
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(units = 64, activation = 'relu', input_shape = (1, 5)))
    model.add(keras.layers.Dense(units = 1, activation = 'linear'))
    history = model.fit(X_train, Y_train, epochs=8)
    train = model.predict(X_train)
    val = model.predict(X_val)
    train = scaler.inverse_transform(train)
    Y_train = scaler.inverse_transform([Y_train])
    val = scaler.inverse_transform(val)
    Y_val = scaler.inverse_transform([Y_val])
    predictions = val
    trainPlot = np.empty_like(df)
    trainPlot[:, :] = np.nan
    trainPlot[previous:len(train)+previous, :] = train
    valPlot = np.empty_like(df)
    valPlot[:, :] = np.nan
    valPlot[len(train)+(previous*2)+1:len(df)-1, :] = val
    inversetransform, =plt.plot(scaler.inverse_transform(df))
    train, =plt.plot(trainPlot)
    val, =plt.plot(valPlot)
    plt.xlabel('Number of Days')
    plt.ylabel('Stock Price')
    plt.title("Predicted vs. Actual Stock Price Per Day")

# Dates must be strings; the bare expression 2010-1-1 evaluates to the integer 2008.
func_stock_prediction("PLAY", "2010-1-1", "2020-1-1")
You are using accuracy as a metric. Accuracy measures the proportion of predicted labels that match the true labels. Accuracy is used mostly (to my knowledge) for classification tasks. As far as I know, the accuracy is not really interpretable when you're predicting a continuous outcome variable.
Based on your code, it looks like you're using the neural network for a regression problem (you're predicting a continuous variable). For regression metrics, people often use "mean squared error", "root mean squared error", "mean absolute error", "R^2", etc.
If you're interested in percentage differences, then maybe you could try the keras loss, "mean_absolute_percentage_error".
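For the per-day percentage variation asked about in the question, a minimal NumPy sketch could look like the following. The price arrays are hypothetical and assumed to be already inverse-transformed back to real prices.

```python
import numpy as np

# Hypothetical actual and predicted closing prices (already in price units).
actual = np.array([40.0, 42.0, 41.0, 44.0])
predicted = np.array([39.0, 43.05, 41.0, 41.8])

# Signed percentage difference for every single day.
pct_diff = (predicted - actual) / actual * 100.0

# Mean absolute percentage error (MAPE) summarizes all days in one number.
mape = np.mean(np.abs(pct_diff))

for day, p in enumerate(pct_diff):
    print(f"day {day}: {p:+.2f}%")
print(f"MAPE: {mape:.2f}%")
```

The signed values show whether the model over- or under-predicts on a given day, while the MAPE corresponds to what the "mean_absolute_percentage_error" loss reports.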

Why is my CNN pre trained image classifier overfitting?

I have just started with computer vision, and in my current task I am classifying images into 4 categories.
Total number of image files=1043
I am using pretrained InceptionV3 and fine tuning it on my dataset.
This is what I get after training:
Epoch 1/5
320/320 [==============================] - 1925s 6s/step - loss: 0.4318 - acc: 0.8526 - val_loss: 1.1202 - val_acc: 0.5557
Epoch 2/5
320/320 [==============================] - 1650s 5s/step - loss: 0.1807 - acc: 0.9446 - val_loss: 1.2694 - val_acc: 0.5436
Epoch 3/5
320/320 [==============================] - 1603s 5s/step - loss: 0.1236 - acc: 0.9572 - val_loss: 1.2597 - val_acc: 0.5546
Epoch 4/5
320/320 [==============================] - 1582s 5s/step - loss: 0.1057 - acc: 0.9671 - val_loss: 1.3845 - val_acc: 0.5457
Epoch 5/5
320/320 [==============================] - 1580s 5s/step - loss: 0.0982 - acc: 0.9700 - val_loss: 1.2771 - val_acc: 0.5572
That is a huge difference. Please help me figure out why my model is not able to generalize, given that it fits the training data quite well.
my code for reference:-
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Dropout
from keras.applications.inception_v3 import InceptionV3, preprocess_input
# setup model
base_model = InceptionV3(weights='imagenet', include_top=False)
from sklearn.preprocessing import OneHotEncoder
x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(CLASSES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['Category']= encoder.fit_transform(df['Category'])
from keras.preprocessing.image import ImageDataGenerator
WIDTH = 299
HEIGHT = 299
train_datagen = ImageDataGenerator(rescale=1./255,preprocessing_function=preprocess_input)
validation_datagen = ImageDataGenerator(rescale=1./255)
df['Category'] =df['Category'].astype(str)
#dfval['Category'] = dfval['Category'].astype(str)
from sklearn.utils import shuffle
df = shuffle(df)
from sklearn.model_selection import train_test_split
dftrain,dftest = train_test_split(df, test_size = 0.2, random_state = 0)
train_generator = train_datagen.flow_from_dataframe(dftrain,target_size=(HEIGHT, WIDTH),batch_size=BATCH_SIZE,class_mode='categorical', x_col='Path', y_col='Category')
validation_generator = validation_datagen.flow_from_dataframe(dftest,target_size=(HEIGHT, WIDTH),batch_size=BATCH_SIZE,class_mode='categorical', x_col='Path', y_col='Category')
MODEL_FILE = 'filename.model'
history = model.fit_generator(
Any help would be appreciated :)
If you don't apply preprocess_input to all of your data, validation included, you will get terrible results.
Look at these:
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
validation_datagen = ImageDataGenerator()
Now, I notice you are using rescale. Since you imported the correct preprocess_input function from the inception code, I really think you should not be using this rescale. The preprocess_input function is supposed to do all the necessary preprocessing. (Not all models were trained with normalized inputs)
But would rescale be a problem if you're applying it to both datasets?
Well... if trainable=False was applied correctly to the BatchNormalization layers, these layers have stored values for mean and variance, which will only work well if the data is within the expected range.
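To make the range mismatch concrete, here is a small NumPy sketch. The function inception_style_preprocess is a hypothetical stand-in that re-implements the x / 127.5 - 1 scaling used by InceptionV3's preprocess_input; it is not the Keras call itself. The sketch also assumes that ImageDataGenerator applies preprocessing_function first and rescale afterwards.

```python
import numpy as np

def inception_style_preprocess(x):
    """NumPy stand-in for InceptionV3's preprocess_input: maps [0, 255] to [-1, 1]."""
    return x / 127.5 - 1.0

# Darkest, middle, and brightest raw pixel values.
pixels = np.array([0.0, 127.5, 255.0])

expected = inception_style_preprocess(pixels)  # what the pretrained weights expect
val_like = pixels / 255.0                      # rescale only, as in the validation generator
train_like = expected / 255.0                  # preprocess_input followed by rescale

print(expected.tolist())   # [-1.0, 0.0, 1.0]
print(val_like.tolist())   # [0.0, 0.5, 1.0]
print(train_like.min(), train_like.max())
```

Under these assumptions the training images end up squeezed into a tiny interval around zero while the validation images sit in [0, 1], so the two generators feed the network very different inputs. Using only preprocessing_function=preprocess_input in both generators would give both the same [-1, 1] range.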

Using classification_report to evaluate a Keras model

The Problem:
During training, the performance of my model looks quite all right. However, the classification_report from sklearn yields a precision, recall, and F1 of zero almost everywhere. What am I doing wrong to get such a mismatch between training performance and inference? (I am using Keras with a TensorFlow backend.)
My code:
I use the validation_split argument to create two generators (train, validation) like so:
train_datagen = ImageDataGenerator(
rescale=1. / 255, validation_split=0.15)
train_generator = train_datagen.flow_from_directory(
target_size=(img_height, img_width),
class_mode='categorical', subset="training")
validation_generator = train_datagen.flow_from_directory(
target_size=(img_height, img_width),
class_mode='categorical', subset="validation", shuffle=False)
I set shuffle=False in my validation_generator to make sure it does not mix the relationship of images and labels for my evaluation later on.
Next, I train my model like so:
history = model.fit_generator(
steps_per_epoch=nb_train_samples // batch_size,
validation_steps=nb_validation_samples // batch_size,
Performance is all right:
Epoch 1/5
187/187 [==============================] - 44s 233ms/step - loss: 0.7835 - acc: 0.6744 - val_loss: 1.2918 - val_acc: 0.6079
Epoch 2/5
187/187 [==============================] - 42s 225ms/step - loss: 0.7578 - acc: 0.6901 - val_loss: 1.2962 - val_acc: 0.6149
Epoch 3/5
187/187 [==============================] - 40s 216ms/step - loss: 0.7535 - acc: 0.6907 - val_loss: 1.3426 - val_acc: 0.6061
Epoch 4/5
187/187 [==============================] - 41s 217ms/step - loss: 0.7388 - acc: 0.6977 - val_loss: 1.2866 - val_acc: 0.6149
Epoch 5/5
187/187 [==============================] - 41s 217ms/step - loss: 0.7282 - acc: 0.6960 - val_loss: 1.2988 - val_acc: 0.6297
Now I extract the necessary information for the classification_report, following the method suggested here. This gives me the following:
validation_steps_per_epoch = int(np.ceil(validation_generator.samples / validation_generator.batch_size))  # np.math is not available in recent NumPy versions
predictions = model.predict_generator(validation_generator, steps=validation_steps_per_epoch)
# Get most likely class
predicted_classes = np.argmax(predictions, axis=1)
true_classes = validation_generator.classes
class_labels = list(validation_generator.class_indices.keys())
Finally, I output the classification report using:
from sklearn.metrics import classification_report
report = classification_report(true_classes, predicted_classes, target_names=class_labels)
Which results in zeros all over the place (see avgs. below):
              precision    recall  f1-score   support

   micro avg       0.01      0.01      0.01      2100
   macro avg       0.01      0.01      0.01      2100
weighted avg       0.01      0.01      0.01      2100

Constant Validation Accuracy with a high loss in machine learning

I'm currently trying to create an image classification model using Inception V3 with 2 classes. I have 1428 images, which are split about 70/30 between the classes. When I run my model I get a pretty high loss as well as a constant validation accuracy. What might be causing this constant value?
data = np.array(data, dtype="float")/255.0
labels = np.array(labels,dtype ="uint8")
(trainX, testX, trainY, testY) = train_test_split(
img_width, img_height = 320, 320 #InceptionV3 size
train_samples = 1145
validation_samples = 287
epochs = 20
batch_size = 32
base_model = keras.applications.InceptionV3(
weights ='imagenet',
input_shape = (img_width,img_height,3))
model_top = keras.models.Sequential()
model_top.add(keras.layers.GlobalAveragePooling2D(input_shape=base_model.output_shape[1:], data_format=None))
model_top.add(keras.layers.Dense(1,activation = 'sigmoid'))
model = keras.models.Model(inputs = base_model.input, outputs = model_top(base_model.output))
for layer in model.layers[:30]:
    layer.trainable = False
model.compile(optimizer = keras.optimizers.Adam(
#Image Processing and Augmentation
train_datagen = keras.preprocessing.image.ImageDataGenerator(
zoom_range = 0.05,
#width_shift_range = 0.05,
height_shift_range = 0.05,
horizontal_flip = True,
vertical_flip = True,
fill_mode ='nearest')
val_datagen = keras.preprocessing.image.ImageDataGenerator()
train_generator = train_datagen.flow(
validation_generator = val_datagen.flow(
history = model.fit_generator(
steps_per_epoch = train_samples//batch_size,
epochs = epochs,
validation_data = validation_generator,
validation_steps = validation_samples//batch_size,
callbacks = [ModelCheckpoint])
This is my log when I run my model:
Epoch 1/20
35/35 [==============================] - 52s 1s/step - loss: 0.6347 - acc: 0.6830 - val_loss: 0.6237 - val_acc: 0.6875
Epoch 2/20
35/35 [==============================] - 14s 411ms/step - loss: 0.6364 - acc: 0.6756 - val_loss: 0.6265 - val_acc: 0.6875
Epoch 3/20
35/35 [==============================] - 14s 411ms/step - loss: 0.6420 - acc: 0.6743 - val_loss: 0.6254 - val_acc: 0.6875
Epoch 4/20
35/35 [==============================] - 14s 414ms/step - loss: 0.6365 - acc: 0.6851 - val_loss: 0.6289 - val_acc: 0.6875
Epoch 5/20
35/35 [==============================] - 14s 411ms/step - loss: 0.6359 - acc: 0.6727 - val_loss: 0.6244 - val_acc: 0.6875
Epoch 6/20
35/35 [==============================] - 15s 415ms/step - loss: 0.6342 - acc: 0.6862 - val_loss: 0.6243 - val_acc: 0.6875
I think your learning rate is too low and you are training for too few epochs. Try lr = 0.001 and epochs = 100.
Your validation accuracy is stuck at 68.75%. Given that your classes are split roughly 70/30, it is likely that your model is just predicting the same class every time, ignoring the input. That would give the accuracy you are seeing. Your model has not yet learned from your data.
As Novak said, your learning rate seems very low, so maybe try increasing that first to see if that helps.
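A quick way to check the "same prediction every time" hypothesis is to compare against the majority-class baseline. The sketch below uses hypothetical labels with an exact 70/30 split; with the real data you would use testY and the thresholded model predictions instead.

```python
import numpy as np

# Hypothetical labels with the roughly 70/30 balance described in the question.
labels = np.array([1] * 70 + [0] * 30)

# A "model" that ignores its input and always predicts the majority class.
majority_class = np.bincount(labels).argmax()
constant_preds = np.full_like(labels, majority_class)

# Accuracy of that constant predictor equals the majority class share.
baseline_acc = np.mean(constant_preds == labels)
print(baseline_acc)  # 0.7

# If np.unique on the real predictions returns a single value,
# the network is behaving like this constant baseline.
print(np.unique(constant_preds).tolist())  # [1]
```

A validation accuracy that sits right at the baseline and never moves between epochs is a strong hint that the network has collapsed to one class.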

