Tensorflow Keras - High accuracy during training, low accuracy during prediction - python

I have a very basic multiclass CNN model for classifying vehicles into 4 classes [pickup, sedan, suv, van] that I have written using TensorFlow 2.0's tf.keras:
import tensorflow as tf

he_initialiser = tf.keras.initializers.VarianceScaling()
cfg_data_fmt = 'channels_first'  # images are stored planar, i.e. (channels, height, width)

model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), input_shape=(3,128,128), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format='channels_first'))
model.add(tf.keras.layers.Flatten(data_format='channels_first'))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(4, activation='softmax', kernel_initializer=he_initialiser))
I use the following configuration for training (a sketch of the corresponding compile call follows this list):
Image size: 3x128x128 (planar data)
Number of epochs: 45
Batch size: 32
Loss function: tf.keras.losses.CategoricalCrossentropy(from_logits=True)
Optimizer: tf.keras.optimizers.Adam
Training data size: 67.5% of all data
Validation data size: 12.5% of all data
Test data size: 20% of all data
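Spelled out, this configuration corresponds roughly to the following compile call (a sketch; note that from_logits=True tells the loss to expect raw, un-softmaxed outputs):

# Sketch of the compile call implied by the configuration above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])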
I have an unbalanced dataset, which has the following distribution:
pickups: 1202
sedans: 1954
suvs: 2510
vans: 196
For this reason I have used class weights to mitigate this imbalance (a sketch of how they can be passed to model.fit follows this list):
pickup_weight: 4.87
sedan_weight: 3.0
suv_weight: 2.33
van_weight: 30.0
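For reference, a minimal sketch of how such weights are typically passed to model.fit; the class indices are assumed to follow the [pickup, sedan, suv, van] one-hot order, and x_train/y_train/x_val/y_val are placeholder arrays:

# Class weights keyed by class index; order assumed to match the one-hot encoding.
class_weight = {0: 4.87,   # pickup
                1: 3.0,    # sedan
                2: 2.33,   # suv
                3: 30.0}   # van

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=45, batch_size=32,
          class_weight=class_weight)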
This seems like a small dataset, but I am using it for fine-tuning: I first train the model on a larger dataset of 16k images of these classes, though with images of the vehicles taken from different angles compared to my fine-tuning dataset.
Now the questions that I'm having stem from the following observations:
At the end of the final epoch, the results returned by model.fit gave:
training accuracy of 0.9229
training loss of 3.5055
validation accuracy of 0.7906
validation loss of 0.9382
training precision for class pickup of 0.9186
training precision for class sedan of 0.9384
training precision for class suv of 0.9196
training precision for class van of 0.8378
validation precision for class pickup of 0.7805
validation precision for class sedan of 0.8026
validation precision for class suv of 0.8029
validation precision for class van of 0.4615
The results returned by model.evaluate on my hold-out test set after training gave similar accuracy and loss values as the corresponding validation values in the last epoch and the precision values for each class were also nearly identical to the corresponding validation precisions.
The lower, but still high enough, validation accuracy leads me to believe there is no overfitting problem as the model can generalize.
My first question is how can the validation loss be so much lower than the training loss?
Furthermore, when I created a confusion matrix using:
import numpy as np

test_images = np.array([x[0].numpy() for x in list(labeled_ds_test)])
test_labels = np.array([x[1].numpy() for x in list(labeled_ds_test)])
test_predictions = model.predict(test_images, batch_size=32)
print(tf.math.confusion_matrix(tf.argmax(test_labels, 1), tf.argmax(test_predictions, 1)))
The results I got back were:
tf.Tensor(
[[ 42  85 109   3]
 [ 72 137 177   4]
 [ 91 171 228  11]
 [  9  12  16   1]], shape=(4, 4), dtype=int32)
This shows an accuracy of only 35%!!
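That figure can be checked directly from the matrix (correct predictions on the diagonal divided by the total):

import numpy as np

cm = np.array([[ 42,  85, 109,   3],
               [ 72, 137, 177,   4],
               [ 91, 171, 228,  11],
               [  9,  12,  16,   1]])

accuracy = np.trace(cm) / cm.sum()   # 408 / 1168 ≈ 0.349
print(accuracy)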
My second question is therefore this: how can the accuracy given by model.predict be so small when during training and evaluation the values seemed to indicate that my model was quite precise with its predictions?
Am I using the predict method wrong or is my theoretical understanding of what's expected to happen completely off?
I am at a bit of a loss here and would greatly appreciate any feedback. Thanks for reading this.

I agree, @gallen. There are several reasons that can cause overfitting and several methods for preventing it. One good solution is adding dropout between layers. There are Stack Overflow answers and Towards Data Science articles covering this.

There is some overfitting, of course, but let's answer the questions.
For the first question, the small size of the validation set plays a role in why its loss is lower than the training loss, since the loss aggregates the differences between y_true and y_pred over the examples.
As for the second question: how can the test accuracy be lower than expected even though validation doesn't show any sign of overfitting?
The distribution of the validation set must be the same as that of the test set, otherwise the validation results are misleading.
So my advice is to check the class distribution of the train, validation, and test datasets separately and make sure they are the same.
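A quick way to do that check, assuming the one-hot labels of each split are available as arrays (train_labels, val_labels and test_labels are placeholder names):

import numpy as np

def class_counts(one_hot_labels):
    # Count examples per class from a one-hot label array.
    return np.bincount(np.argmax(one_hot_labels, axis=1), minlength=4)

for name, labels in [('train', train_labels), ('val', val_labels), ('test', test_labels)]:
    counts = class_counts(labels)
    print(name, counts, counts / counts.sum())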

You need to divide your dataset properly, e.g. 70% training and 30% validation, and then check your model on a new set of data as test data. This might be helpful, as machine learning is all about trial and error.
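One common way to get such a split while keeping the class distribution consistent across subsets is a stratified split, e.g. with scikit-learn (a sketch; images and labels are placeholder arrays, with labels as integer class indices):

from sklearn.model_selection import train_test_split

# Stratify on the class labels so each split keeps the same class ratios.
x_train, x_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.3, stratify=labels, random_state=42)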

Related

CNN model did not learn anything from the training data. Where are the mistakes I made?

The shape of the train/test data is (samples, 256, 256, 1). The training dataset has around 1400 samples, the validation dataset has 150 samples, and the test dataset has 250 samples. I then built a CNN model for a six-object classification task. However, no matter how much I tune the parameters or add/remove layers (conv & dense), I get chance-level accuracy all the time (around 16.5%). Thus, I would like to know whether I made some fatal mistake while building the model, or whether there is something wrong with the data itself rather than the CNN model.
Code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_cnn_model(input_shape, activation='relu'):
    model = Sequential()
    # 3 convolution layers with max pooling
    model.add(Conv2D(64, (5, 5), activation=activation, padding='same', input_shape=input_shape))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(256, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    # 3 fully connected layers
    model.add(Dense(1024, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(512, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(6, activation='softmax'))  # 6 classes
    # summarize the model
    print(model.summary())
    return model

def compile_and_fit_model(model, X_train, y_train, X_vali, y_vali, batch_size, n_epochs, LR=0.01):
    # compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss='sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'])
    # fit the model
    history = model.fit(x=X_train,
                        y=y_train,
                        batch_size=batch_size,
                        epochs=n_epochs,
                        verbose=1,
                        validation_data=(X_vali, y_vali))
    return model, history
I transformed the MEG data my professor recorded into magnitude scalograms using CWT; pywt.cwt(data, scales, wavelet) was used. If I plot the coefficients I get from cwt, I get a graph like this (I merged 62 channels into one graph; plot omitted).
I used the coefficients as train/test data for the CNN model. However, no matter how I tuned the parameters or added/removed layers, the classification accuracy was unchanged. Thus, I want to know where I made mistakes: in building the CNN model, or in the CWT step (the way I handled the data)?
Please give me some advice, thank you.
How is the accuracy on the training data? If you have a small dataset and the model does not overfit after training for a while, then something is wrong with the model. You can also test with existing datasets that the model should be able to handle, like Fashion-MNIST (see the sketch below).
Testing whether you handled the data correctly is harder. Did you write unit tests for the different steps in the preprocessing pipeline?
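A sketch of such a sanity check, reusing the helpers from the question and restricting Fashion-MNIST to its first six classes so the Dense(6) head fits (hyperparameters are illustrative):

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Keep only the first 6 classes so the Dense(6) output layer from the question fits.
train_mask, test_mask = y_train < 6, y_test < 6
x_train, y_train = x_train[train_mask][..., None] / 255.0, y_train[train_mask]
x_test, y_test = x_test[test_mask][..., None] / 255.0, y_test[test_mask]

model = build_cnn_model(input_shape=(28, 28, 1))          # helper from the question
model, history = compile_and_fit_model(model, x_train, y_train,
                                       x_test, y_test,
                                       batch_size=64, n_epochs=3, LR=0.001)
# If accuracy stays at chance level here too, the model code is suspect;
# if it learns, the MEG/CWT preprocessing is the more likely culprit.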

Sigmoid activation output layer produces many near-1 values

:)
I have a dataset of ~16,000 .wav recordings from 70 bird species.
I'm training a model using TensorFlow to classify the mel-spectrograms of these recordings using convolution-based architectures.
One of the architectures used is the simple multi-layer convolutional network described below.
The pre-processing phase includes:
extract mel-spectrograms and convert to dB scale
segment the audio into 1-second segments (pad with zeros or Gaussian noise if the residual is longer than 250 ms, discard otherwise)
z-score normalization of the training data - subtract the mean and divide the result by the std
Pre-processing at inference time:
same as described above
z-score normalization BY the training data - subtract the mean (of the training data) and divide the result by the std (of the training data), as sketched below
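A minimal sketch of that normalization step (train_specs and test_specs are placeholder arrays of spectrogram segments):

import numpy as np

# Statistics computed on the training spectrograms only.
train_mean = train_specs.mean()
train_std = train_specs.std()

train_specs_norm = (train_specs - train_mean) / train_std
# At inference time the *training* statistics are reused.
test_specs_norm = (test_specs - train_mean) / train_std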
I understand that with a sigmoid activation the output layer's probabilities are not supposed to sum to 1, but I get many (8-10) very high prediction probabilities (~0.999), and some are exactly 0.5.
The current test-set correct classification rate is ~84%, tested with 10-fold cross-validation, so it seems that the network mostly operates well.
Notes:
1. I understand there are similar features in the vocalizations of different bird species, but the received probabilities don't seem to reflect them correctly.
2. Example probabilities for a recording of natural noise:
Natural noise: 0.999
Mallard: 0.981
I'm trying to understand the reason for these results, whether it's related to the data (e.g. extensive mislabeling - probably not) or comes from another source.
Any help will be much appreciated! :)
EDIT: I use sigmoid because the probabilities of all classes are necessary, and I don't need them to sum to 1.
import tensorflow as tf
from tensorflow.keras import optimizers
from tensorflow.keras.layers import (InputLayer, Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dropout, Dense)

def convnet1(input_shape, numClasses, activation='softmax'):
    # Define the network
    model = tf.keras.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    # model.add(Augmentations1(p=0.5, freq_type='mel', max_aug=2))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 1)))
    model.add(Conv2D(128, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(256, (5, 5), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(Flatten())
    # model.add(Dense(numClasses, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(numClasses, activation='sigmoid'))
    model.compile(
        loss='categorical_crossentropy',
        metrics=['accuracy'],
        optimizer=optimizers.Adam(learning_rate=0.001),
        run_eagerly=False)  # this parameter allows debugging and using regular functions inside layers: print(), save(), etc.
    return model
For future searches - this problem was solved, and the reason was found(!).
The initial batch size used was 256 or 512. Reducing the batch size to 16 or 32 SOLVED THE PROBLEM, and now the probabilities are as expected for both training AND test set samples - very high for the correct label and very low for the other classes.

CNN overfitting heavily despite dropout layers?

For some background, my dataset is roughly 75000+ images, 200x200 greyscale, with 26 classes (the letters of the alphabet). My model is:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(26, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=[tf.keras.metrics.CategoricalAccuracy()])
model.fit(X_train, y_train, epochs=1, batch_size=64, verbose=1, validation_data=(X_test, y_test))
The output of the model.fit is:
Train on 54600 samples, validate on 23400 samples
Epoch 1/1
54600/54600 [==============================] - 54s 984us/step - loss: nan - categorical_accuracy: 0.9964 - val_loss: nan - val_categorical_accuracy: 0.9996
99.9+% validation accuracy. When I run a test, it gets all the predictions incorrect. So I assume it is overfitting. Why is this happening despite the added dropout layers? What other options do I have to fix this? Thank you!
The only way you would get all the predictions on a held-out test set incorrect while simultaneously getting almost 100% validation accuracy is if you have a data leak, i.e. your training data must contain the same images as your validation data (or images so similar as to be practically identical).
Or the data in your test set is very different from your training and validation datasets.
To fix this, ensure that no single image exists in more than one of the datasets. Also ensure that the images are generally similar, i.e. if training on cell-phone photos, do not then test with images taken with a DSLR or watermarked images pulled from Google.
It is also odd that your loss is nan. It may be due to using categorical accuracy with a sparse loss. To fix this, just set the metric to 'accuracy'; this will dynamically determine the best accuracy variant to use - one of binary, categorical, or sparse_categorical (see the sketch below).
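A minimal sketch of that change, keeping the rest of the compile call from the question:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])   # let Keras pick the matching accuracy variant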
Hope this helps.
It's not overfitting at all - look at your loss: it's equal to nan. That means your gradients exploded during training. To see what's really happening, I recommend you look at the loss after every mini-batch and see at what point the loss becomes nan.
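One way to do that is a small callback that prints the loss after every batch, optionally combined with tf.keras.callbacks.TerminateOnNaN to stop training as soon as the loss turns nan; a sketch using the arrays from the question:

import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    # Print the loss after every training batch to spot where it blows up.
    def on_train_batch_end(self, batch, logs=None):
        print(f"batch {batch}: loss = {logs['loss']:.4f}")

model.fit(X_train, y_train, epochs=1, batch_size=64,
          validation_data=(X_test, y_test),
          callbacks=[BatchLossLogger(), tf.keras.callbacks.TerminateOnNaN()])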

Validation accuracy is low and not increasing while training accuracy is increasing

I am a newbie to Keras and machine learning in general. I'm trying to build a classification model using the Sequential model. After some experiments, I see that my validation accuracy is very low and not increasing, although the training accuracy is good. I added regularization parameters to the layers and also dropout between the layers. Still, the behaviour persists. Here's my code.
import keras
from keras.regularizers import l2

model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME",
                              activation="relu", input_shape=[512, 1],
                              kernel_regularizer=keras.regularizers.l2(l=0.1)))  # be sure to specify the input shape
keras.layers.Dropout=0.35
model.add(keras.layers.MaxPool1D(pool_size=1, activity_regularizer=l2(0.01)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation="softmax", activity_regularizer=l2(0.01)))
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
Ahistory = model.fit(train_x, trainy, epochs=300,
                     validation_split=0.2,
                     batch_size=16)
And here are the final results I got.
What is the reason behind this? How do I fine-tune the model?
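For reference, dropout in tf.keras is applied by adding a Dropout layer instance between other layers rather than by assigning to keras.layers.Dropout; a minimal sketch mirroring the code above:

import keras

model = keras.models.Sequential()
model.add(keras.layers.Conv1D(filters=32, kernel_size=1, strides=1, padding="SAME",
                              activation="relu", input_shape=[512, 1]))
model.add(keras.layers.Dropout(0.35))   # dropout inserted as a layer between the others
model.add(keras.layers.MaxPool1D(pool_size=1))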

Keras - text classification, overfitting, and how to improve my model?

I am developing a text classification neural network
based on these two articles - https://github.com/jiegzhan/multi-class-text-classification-cnn-rnn
https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
For training I am using text data in Russian (the language essentially doesn't matter, because the text contains a lot of specialized professional terms, so sadly using an existing word2vec model won't be an option).
The training data has the following parameters:
Maximum length of an article - 969 words
Size of vocabulary - 53886
Number of labels - 12 (sadly they are distributed quite unevenly; for instance, the first label has around 5000 examples, while the second has only 1500)
Size of the training set - only 9876 entries. It's the biggest problem, because sadly I can't increase the size of the training set by any means (the only way out is to wait another year, but even that would only double the amount of training data, and even double isn't enough)
Here is my code -
import keras
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense
from sklearn.model_selection import train_test_split

x, x_test, y, y_test = train_test_split(x_, y_, test_size=0.1)
x_train, x_dev, y_train, y_dev = train_test_split(x, y, test_size=0.1)
embedding_vecor_length = 100
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=5, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=7, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=9, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=12, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=15, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(LSTM(200,dropout=0.3, recurrent_dropout=0.3))
model.add(Dense(labels_count, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, epochs=25, batch_size=30)
scores = model.evaluate(x_, y_)
I tried different parameters and it gets really high accuracy in training (up to 98%).
But it performs really badly on the test set. The maximum I managed to achieve was around 74%; the usual result is around 64%.
And the best result was achieved with a small embedding_vecor_length and a small batch_size.
I know that my test set is only 10 percent of the data, and that the overall dataset size is the biggest problem, but I want to find a way around it.
So my questions are:
1) Is this model built correctly for text classification purposes? (It works.)
Do I need to use simultaneous convolutions and merge the results instead? (A sketch of that pattern is given after the questions.)
I just don't get how the text information doesn't get lost in the process of convolution with different filter sizes (like in my example).
Can you explain how convolution works with text data?
There are mainly articles about image recognition.
2) I obviously have a problem with overfitting my model. How can I make the performance better?
I have already added Dropout layers. What can I do next?
3) Maybe I need something different? I mean a pure RNN without convolution?
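On question 1: "simultaneous convolutions with merged results" usually means parallel Conv1D branches with different kernel sizes whose pooled outputs are concatenated. A rough sketch with the Keras functional API, reusing the variable names from the code above and with placeholder hyperparameters:

import keras
from keras import layers

inputs = keras.Input(shape=(max_review_length,), dtype='int32')
x = layers.Embedding(top_words, embedding_vecor_length)(inputs)

# One branch per kernel size; each branch sees the full sequence in parallel.
branches = []
for kernel_size in (3, 4, 5):
    b = layers.Conv1D(32, kernel_size, padding='same', activation='relu')(x)
    b = layers.GlobalMaxPooling1D()(b)
    branches.append(b)

merged = layers.Concatenate()(branches)
merged = layers.Dropout(0.3)(merged)
outputs = layers.Dense(labels_count, activation='softmax')(merged)

model = keras.Model(inputs, outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])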
