CNN overfitting heavily despite dropout layers? - python

For some background, my dataset is roughly 75000+ images, 200x200 greyscale, with 26 classes (the letters of the alphabet). My model is:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(26, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=[tf.keras.metrics.CategoricalAccuracy()])
model.fit(X_train, y_train, epochs=1, batch_size=64, verbose=1, validation_data=(X_test, y_test))
The output of the model.fit is:
Train on 54600 samples, validate on 23400 samples
Epoch 1/1
54600/54600 [==============================] - 54s 984us/step - loss: nan - categorical_accuracy: 0.9964 - val_loss: nan - val_categorical_accuracy: 0.9996
99.9%+ validation accuracy. When I run a test, it gets all the predictions incorrect. So I assume it is overfitting. Why is this happening despite adding the dropout layers? What other options do I have to fix this? Thank you!

The only way you would get all the predictions on a held-out test set incorrect while simultaneously getting almost 100% validation accuracy is if you have a data leak, i.e. your training data must contain the same images as your validation data (or images so similar they are effectively identical).
Alternatively, the data in your test set is very different from your training and validation datasets.
To fix this, ensure that no single image exists in more than one of your datasets. Also ensure that the images are generally similar: if training on cell-phone photos, do not then test on images taken with a DSLR or on watermarked images pulled from Google.
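If it helps, here is a quick check for exact duplicates leaking between splits. This is only a sketch and it assumes your images are NumPy arrays named X_train and X_test as in the question:
import hashlib

def image_hashes(images):
    # hash the raw bytes of each image so identical images map to the same key
    return {hashlib.md5(img.tobytes()).hexdigest() for img in images}

leaked = image_hashes(X_train) & image_hashes(X_test)
print(f"{len(leaked)} identical images appear in both sets")
Note that this only catches byte-identical copies; near-duplicates (resized or re-encoded images) would need something like perceptual hashing.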
It is also odd that your loss is nan. It may be due to using categorical accuracy as the metric while the loss is sparse categorical cross-entropy. To fix this, just set the metric to 'accuracy'; Keras will then dynamically pick the matching accuracy, one of binary, categorical, or sparse_categorical.
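A minimal sketch of that change, keeping everything else from the question as-is:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])  # Keras resolves 'accuracy' to sparse categorical accuracy here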
Hope this helps.

It's totally not overfitting; look at your loss, it's equal to nan. That means your gradients have exploded during training. To see what's really happening, I recommend you look at the loss after every mini-batch and see at what point the loss becomes nan.
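For example, a minimal callback sketch (my own helper, assuming the model and data names from the question) that prints the loss after every mini-batch:
import tensorflow as tf

class BatchLossLogger(tf.keras.callbacks.Callback):
    def on_train_batch_end(self, batch, logs=None):
        # logs['loss'] is the running training loss after this mini-batch
        print(f"batch {batch}: loss = {logs['loss']}")

model.fit(X_train, y_train, epochs=1, batch_size=64,
          validation_data=(X_test, y_test), callbacks=[BatchLossLogger()])
Keras also ships a TerminateOnNaN callback that stops training as soon as the loss becomes nan.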

Related

CNN model did not learn anything from the training data. Where are the mistakes I made?

The shape of the train/test data is (samples, 256, 256, 1). The training dataset has around 1400 samples, the validation dataset has 150 samples, and the test dataset has 250 samples. Then I build a CNN model for a six-object classification task. However, no matter how much I tune the parameters or add/remove layers (conv & dense), I get chance-level accuracy all the time (around 16.5%). Thus, I would like to know whether I made some deadly mistakes while building the model, or whether there is something wrong with the data itself rather than the CNN model.
Code:
def build_cnn_model(input_shape, activation='relu'):
    model = Sequential()
    # 3 convolution layers with max pooling
    model.add(Conv2D(64, (5, 5), activation=activation, padding='same', input_shape=input_shape))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(256, (5, 5), activation=activation, padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    # 3 fully connected layers
    model.add(Dense(1024, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(512, activation=activation))
    model.add(Dropout(0.5))
    model.add(Dense(6, activation='softmax'))  # 6 classes
    # summarize the model
    print(model.summary())
    return model
def compile_and_fit_model(model, X_train, y_train, X_vali, y_vali, batch_size, n_epochs, LR=0.01):
    # compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss='sparse_categorical_crossentropy',
        metrics=['sparse_categorical_accuracy'])
    # fit the model
    history = model.fit(x=X_train,
                        y=y_train,
                        batch_size=batch_size,
                        epochs=n_epochs,
                        verbose=1,
                        validation_data=(X_vali, y_vali))
    return model, history
I transformed the MEG data my professor recorded into magnitude scalograms using CWT; pywt.cwt(data, scales, wavelet) was used. If I plot the coefficients I got from cwt, I get a graph like this (I merged 62 channels into one graph).
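For context, a call along those lines might look like the following sketch; the signal, scales and wavelet here are placeholders, not the asker's actual settings:
import numpy as np
import pywt

signal = np.random.randn(1000)      # placeholder for one MEG channel
scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(signal, scales, 'morl')
scalogram = np.abs(coeffs)          # magnitude scalogram used as the CNN input image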
I used the coefficients as train/test data for the CNN model. However, I tuned the parameters and tried to add/remove layers for the CNN model, and the classification accuracy was unchanged. Thus, I want to know where I made mistakes. Did I make mistakes with building the CNN model, or did I make mistakes with CWT (the way I handled data)?
Please give me some advice, thank you.
What is the accuracy on the training data? If you have a small dataset and the model does not overfit after training for a while, then something is wrong with the model. You can also test with existing datasets that the model should be able to handle (like Fashion MNIST).
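For instance, a rough sanity-check sketch that reuses the functions from the question (with the final Dense(6) changed to Dense(10), since Fashion MNIST has 10 classes):
import tensorflow as tf

(x_tr, y_tr), (x_va, y_va) = tf.keras.datasets.fashion_mnist.load_data()
x_tr = x_tr[..., None] / 255.0      # shape (N, 28, 28, 1), scaled to [0, 1]
x_va = x_va[..., None] / 255.0

sanity_model = build_cnn_model(input_shape=(28, 28, 1))   # change Dense(6) to Dense(10) first
sanity_model, hist = compile_and_fit_model(sanity_model, x_tr, y_tr, x_va, y_va,
                                           batch_size=64, n_epochs=3, LR=0.001)
If the same architecture cannot beat chance here either, the model code itself is suspect.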
Testing whether you handled the data correctly is harder. Did you write unit tests for the different steps in the preprocessing pipeline?

Tensorflow Keras - High accuracy during training, low accuracy during prediction

I have a very basic multiclass CNN model for classifying vehicles into 4 classes [pickup, sedan, suv, van] that I have written using Tensorflow 2.0 tf.keras:
he_initialiser = tf.keras.initializers.VarianceScaling()
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), input_shape=(3,128,128), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format=cfg_data_fmt))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu', padding='same', data_format='channels_first', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.MaxPooling2D((2, 2), data_format='channels_first'))
model.add(tf.keras.layers.Flatten(data_format='channels_first'))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer=he_initialiser))
model.add(tf.keras.layers.Dense(4, activation='softmax', kernel_initializer=he_initialiser))
I use the following configuration for training:
Image size: 3x128x128 (planar data)
Number of epochs: 45
Batch size: 32
Loss function: tf.keras.losses.CategoricalCrossentropy(from_logits=True)
Optimizer: optimizer=tf.optimizers.Adam
training data size: 67.5% of all data
validation data size: 12.5% of all data
test data size: 20% of all data
I have an unbalanced dataset, which has the following distribution:
pickups: 1202
sedans: 1954
suvs: 2510
vans: 196
For this reason I have used class weights to mitigate this imbalance:
pickup_weight: 4.87
sedan_weight: 3.0
suv_weight: 2.33
van_weight: 30.0
This seems like a small dataset, but I am using it for fine-tuning, since I first train the model on a larger dataset of 16k images of these classes, though with images of vehicles taken from different angles compared to my fine-tuning dataset.
Now the questions that I'm having stem from the following observations:
At the end of the final epoch, the results returned by model.fit gave:
training accuracy of 0.9229
training loss of 3.5055
validation accuracy of 0.7906
validation loss of 0.9382
training precision for class pickup of 0.9186
training precision for class sedan of 0.9384
training precision for class suv of 0.9196
training precision for class van of 0.8378
validation precision for class pickup of 0.7805
validation precision for class sedan of 0.8026
validation precision for class suv of 0.8029
validation precision for class van of 0.4615
The results returned by model.evaluate on my hold-out test set after training gave similar accuracy and loss values as the corresponding validation values in the last epoch and the precision values for each class were also nearly identical to the corresponding validation precisions.
The lower, but still high enough, validation accuracy leads me to believe there is no overfitting problem as the model can generalize.
My first question is how can the validation loss be so much lower than the training loss?
Furthermore, when I created a confusion matrix using:
test_images = np.array([x[0].numpy() for x in list(labeled_ds_test)])
test_labels = np.array([x[1].numpy() for x in list(labeled_ds_test)])
test_predictions = model.predict(test_images, batch_size=32)
print(tf.math.confusion_matrix(tf.argmax(test_labels, 1), tf.argmax(test_predictions, 1)))
The results I got back were:
tf.Tensor(
[[ 42 85 109 3]
[ 72 137 177 4]
[ 91 171 228 11]
[ 9 12 16 1]], shape=(4, 4), dtype=int32)
This shows an accuracy of only 35%!!
My second question is therefore this: how can the accuracy given by model.predict be so small when during training and evaluation the values seemed to indicate that my model was quite precise with its predictions?
Am I using the predict method wrong or is my theoretical understanding of what's expected to happen completely off?
I am at a bit of a loss here and would greatly appreciate any feedback. Thanks for reading this.
I agree @gallen. There are several reasons that can cause overfitting and several methods for preventing it. One of the good solutions is adding dropout between layers. See the Stack Overflow answer and the Towards Data Science article on this topic.
There is overfitting, of course, but let's answer the questions.
For the first question, the small size of the validation set plays a role in why its loss is lower than the training loss, as the loss is the sum of all the differences between y_true and y_pred.
As for the second question: how can the test accuracy be lower than expected even if the validation results don't show any sign of overfitting?
The distribution of the validation set must be the same as that of the test set for it not to be misleading.
So my advice is to check the distributions of the train, validation, and test datasets separately and make sure that they are the same.
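A small sketch of such a check; the label arrays y_train, y_val and y_test are placeholder names, assumed to be one-hot encoded as in the question:
import numpy as np

for name, labels in [('train', y_train), ('val', y_val), ('test', y_test)]:
    counts = np.bincount(np.argmax(labels, axis=1), minlength=4)
    print(name, counts / counts.sum())   # class proportions should look similar across splits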
You need to divide your dataset properly, e.g. 70% training and 30% validation, and then check your model on a new set of data as test data. This might be helpful, as machine learning is all about trial and error.

Significance of loss in classification with Keras

I am training a three layer neural network with keras:
model = models.Sequential()
model.add(Conv2D(32, (3, 3), padding="same",
                 input_shape=input_shape, strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3), padding="same",
                 input_shape=input_shape, strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(Conv2D(128, (3, 3), padding="same",
                 input_shape=input_shape, strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(layers.Flatten())
model.add(layers.Dense(neurons, activation='relu', kernel_regularizer=l2(reg)))
model.add(Dropout(0.50))
model.add(Dense(2))
model.add(Activation("softmax"))
My data has two classes, and I am using sparse categorical cross entropy:
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
history = model.fit(x=X, y=y, batch_size=batch_size, epochs=epochs, validation_data=(X_val, y_val),
                    shuffle=True,
                    callbacks=callbacks,
                    verbose=1)
My data has the following shape:
X: (232, 100, 150, 3)
y: (232,)
where X contains the images and y is either 1 or 0, because I am using the sparse loss function.
The loss is very high for both training and validation, even when the training accuracy is 1! I get values over 20 for the loss, which I understand is not reasonable.
If I train the model for a few epochs, output the predictions and the true labels, and compute the categorical cross-entropy from them, the value I get is < 1, as expected, even when I do the calculation with Keras' own function (I switch to the categorical version because the sparse one gives an error):
21/21 [==============================] - 7s 313ms/step - loss: 44.1764 - acc: 1.0000 - val_loss: 44.7084 - val_acc: 0.7857
cce = tf.keras.losses.CategoricalCrossentropy()
pred = model.predict(x=X_val, batch_size=len(X_val))
loss = cce(true_categorical, pred)
Categorical loss 0.6077293753623962
Is there a way to know exactly how this loss is calculated and why the values are so high? Batch size is 8.
The loss printed by Keras is the total loss.
Regularization is also a loss added to the model based on the value of the weights.
Since you have a lot of weights, you also have a lot of contributions to the total loss.
That is why it's big.
If you remove the regularization, you will see the final loss equal to the categorical crossentropy loss.
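One way to see this (a sketch, assuming the model and validation arrays from the question) is to compare the data term with the regularization term that Keras collects in model.losses:
import tensorflow as tf

reg_term = tf.add_n(model.losses)    # sum of all l2(reg) penalties from the regularized layers
data_term = tf.keras.losses.SparseCategoricalCrossentropy()(y_val, model.predict(X_val))
print("regularization:", float(reg_term), "cross-entropy:", float(data_term))
The printed training loss is roughly the sum of these two terms, which is why it can sit above 20 while the cross-entropy part alone is below 1.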

Getting strange accuracy in sentiment analysis with keras

I'm working on a sentiment analysis project in Python with word2vec as the embedding method. (In my non-English corpus I labeled every negative tweet 0, positive = 1 and neutral = 2.) I have 2 questions.
Assuming that my corpus is completely balanced and I set 9000 tweets for train and 900 for test:
1. 8900/8900 [==============================] - 15s 2ms/step - loss: 0.5896 - acc: 0.6330 - val_loss: 0.0000e+00 - val_acc: 1.0000
As you can see, the validation accuracy (val_acc) is 1.0000!!
2. While val_acc is 1, my model predicts all sentences as negative!! How can I solve this?
nb_epochs = 100
batch_size = 32
model = Sequential()
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same', input_shape=(max_tweet_length,vector_size)))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Dropout(0.25))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Dropout(0.25))
model.add(Dense(256, activation='tanh'))
model.add(Dense(256, activation='tanh'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
thank you
There are several issues with your question; attempting to offer a general answer here, since there are parts of your process you do not show.
Assuming that my corpus is completely balanced and I set 9000 tweets for train and 900 for test
This is not what your output shows; the Keras output during training
8900/8900
clearly says that your training set consists of 8900 samples, and not of 9000 - 900 = 8100, as you claim. So, if your initial dataset is indeed 9000 samples, this leaves you with only 100 samples for the validation set. Very small validation sets are not a good idea, and in extreme cases they can lead to spurious reported accuracies, like here (notice that, not only your validation accuracy is a perfect 1.0, it is also significantly higher than your training accuracy).
Additionally to the above, I have seen similar cases when there are duplicates in the initial data; in such a case, a random split procedure can easily lead a sample being in the training set, while its duplicate(s) being also present in the validation set, further jeopardizing the whole process and leading to absurd results. So, check for duplicates and remove them before splitting.
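A tiny sketch of that duplicate check (texts and labels here are placeholder names for your raw data):
import pandas as pd

df = pd.DataFrame({'text': texts, 'label': labels})
print('duplicate tweets:', df.duplicated(subset='text').sum())
df = df.drop_duplicates(subset='text')   # drop duplicates before any train/validation split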
Finally, as @today remarks in the comments, since you have 3 classes, your output layer should have 3 units, not 2; this seems irrelevant to your issue, but I am not sure it really is...
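Putting the two points together, here is a hedged sketch (variable names are illustrative, not taken from the question):
from sklearn.model_selection import train_test_split
from keras.layers import Dense

# hold out a reasonably sized, stratified validation set instead of a tiny leftover split
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2,
                                                  stratify=y, random_state=42)

# with 3 classes (negative=0, positive=1, neutral=2) the last layer needs 3 units
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])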

Keras - text classification, overfitting, and how to improve my model?

I am developing a text classification neural network
based on these two articles: https://github.com/jiegzhan/multi-class-text-classification-cnn-rnn
https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
For training I am using text data in Russian (the language essentially doesn't matter, because the text contains a lot of specialized professional terms, and sadly employing an existing word2vec model won't be an option).
My training data has the following parameters:
Maximum length of an article - 969 words
Size of vocabulary - 53886
Number of labels - 12 (sadly they are distributed quite unevenly; for instance, the first label has around 5000 examples, while the second contains only 1500 examples)
Size of the training dataset - only 9876 entries. It's the biggest problem, because sadly I can't increase the size of the training set by any means (the only way out is to wait another year☻, but even that will only double the size of the training data, and even double the amount isn't enough).
Here is my code -
x, x_test, y, y_test = train_test_split(x_, y_, test_size=0.1)
x_train, x_dev, y_train, y_dev = train_test_split(x, y, test_size=0.1)
embedding_vecor_length = 100
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=5, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=7, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=9, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=12, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(Conv1D(filters=32, kernel_size=15, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(keras.layers.Dropout(0.3))
model.add(LSTM(200,dropout=0.3, recurrent_dropout=0.3))
model.add(Dense(labels_count, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, epochs=25, batch_size=30)
scores = model.evaluate(x_, y_)
I tried different parameters, and it gets really high accuracy in training (up to 98%).
But it performs really badly on the test set. The maximum I managed to achieve was around 74%; the usual result is something around 64%.
And the best result was achieved with a small embedding_vecor_length and a small batch_size.
I know that my test set is only 10 percent of the whole dataset and that the overall dataset size is the biggest problem, but I want to find a way around this problem.
So my questions are:
1) Is this a correctly built model for text classification purposes? (It does work.)
Do I need to use simultaneous convolutions and merge the results instead (see the sketch after these questions)?
I just don't get how the text information doesn't get lost in the process of convolution with different filter sizes (like in my example).
Can you explain how convolution works with text data?
There are mainly articles about image recognition...
2) I obviously have a problem with overfitting my model. How can I make the performance better?
I have already added Dropout layers. What can I do next?
3) Maybe I need something different? I mean a pure RNN without convolution?
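For question 1), here is a rough sketch (my own, not taken from the linked articles) of what "simultaneous convolutions with merged results" could look like with the Keras functional API, reusing top_words, max_review_length and labels_count from the code above:
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Concatenate, Dropout, Dense

inputs = Input(shape=(max_review_length,))
embedded = Embedding(top_words, 100)(inputs)

# apply several kernel sizes to the same embedded sequence in parallel, then concatenate
branches = []
for kernel_size in (3, 4, 5):
    conv = Conv1D(64, kernel_size, padding='same', activation='relu')(embedded)
    branches.append(GlobalMaxPooling1D()(conv))

merged = Dropout(0.5)(Concatenate()(branches))
outputs = Dense(labels_count, activation='softmax')(merged)

model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Each branch sees the full sequence, so information is not lost by stacking pool layers one after another; the different kernel sizes simply capture n-grams of different lengths.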
