Getting strange accuracy in sentiment analysis with keras - python

I'm working on a sentiment analysis project in Python, with word2vec as the embedding method. (In my non-English corpus I labeled every negative tweet as 0, positive as 1, and neutral as 2.) I have 2 questions.
Assuming that my corpus is completely balanced, I set 9000 tweets for training and 900 for testing.
1. 8900/8900 [==============================] - 15s 2ms/step - loss: 0.5896 - acc: 0.6330 - val_loss: 0.0000e+00 - val_acc: 1.0000
As you can see, the validation accuracy (val_acc) is 1.0000!
2. While val_acc is 1, my model predicts every sentence as negative! How can I solve this?
from keras.models import Sequential
from keras.layers import Conv1D, Dropout, Flatten, Dense

nb_epochs = 100
batch_size = 32
model = Sequential()
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same', input_shape=(max_tweet_length,vector_size)))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same'))
model.add(Dropout(0.25))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Conv1D(32, kernel_size=2, activation='elu', padding='same'))
model.add(Dropout(0.25))
model.add(Dense(256, activation='tanh'))
model.add(Dense(256, activation='tanh'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
thank you

There are several issues with your question; I'll attempt to offer a general answer here, since there are parts of your process you do not show.
Assuming that my corpus is completely balanced, I set 9000 tweets for training and 900 for testing.
This is not what your output shows; the Keras output during training,
8900/8900
clearly says that your training set consists of 8900 samples, and not of 9000 - 900 = 8100, as you claim. So, if your initial dataset is indeed 9000 samples, this leaves you with only 100 samples for the validation set. Very small validation sets are not a good idea, and in extreme cases they can lead to spurious reported accuracies, as here (notice that not only is your validation accuracy a perfect 1.0, it is also significantly higher than your training accuracy).
In addition to the above, I have seen similar cases when there are duplicates in the initial data; in such a case, a random split can easily put a sample in the training set while its duplicate(s) end up in the validation set, further jeopardizing the whole process and leading to absurd results. So, check for duplicates and remove them before splitting.
Finally, as #today remarks in the comments, since you have 3 classes, your output layer should have 3 units, not 2; this seems irrelevant to your issue, but I am not sure it really is...
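For illustration, here is a minimal sketch of the fixes above (removing duplicates before splitting, using a larger stratified validation split, and a 3-unit softmax output). The names texts, X and y are assumptions for the sketch, not taken from the question; max_tweet_length and vector_size are the variables already used in your code.
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Conv1D, Dropout, Flatten, Dense

# Assumed inputs: `texts` is the list of raw tweets, `X` their word2vec
# encodings with shape (n_samples, max_tweet_length, vector_size) as a
# numpy array, and `y` the integer labels 0/1/2.
# Drop duplicate tweets *before* splitting, so a tweet cannot end up in both sets.
_, unique_idx = np.unique(texts, return_index=True)
X, y = X[unique_idx], y[unique_idx]

# A stratified split keeps the class balance and gives a reasonably sized validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = Sequential()
model.add(Conv1D(32, kernel_size=3, activation='elu', padding='same',
                 input_shape=(max_tweet_length, vector_size)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))  # 3 units for the 3 classes
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer labels 0/1/2
              metrics=['accuracy'])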

Related

improve Keras model accuracy Conv1D and LSTM

I have a dataset where I need to predict a target that is 0 or 1,
and it is useful for me to know when the prediction is near 0 (like 0.20) or near 1 (like 0.89), and so on.
My model structure is this:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout, BatchNormalization
from keras import regularizers

model = Sequential()
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1, strides=1))
model.add(LSTM(128, return_sequences=True, recurrent_dropout=0.2,activation='relu'))
model.add(Dense(128, activation="relu",
                kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dropout(0.4))
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1, strides=1))
model.add(LSTM(64, return_sequences=True,activation='relu'))
model.add(Dense(64, activation="relu", kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dropout(0.4))
model.add(Conv1D(filters=32, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=1, strides=1))
model.add(LSTM(32, return_sequences=True, recurrent_dropout=0.2, activation='relu'))
model.add(Dense(32, activation="relu", kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4),
                bias_regularizer=regularizers.l2(1e-4),
                activity_regularizer=regularizers.l2(1e-5)))
model.add(Dropout(0.4))
model.add(BatchNormalization())
model.add(Dense(1, activation='linear'))
from keras.metrics import categorical_accuracy
model.compile(optimizer='rmsprop',loss="mse",metrics=['accuracy'])
model.fit(X_train,y_train,epochs=1000, batch_size=16, verbose=1, validation_split=0.1, callbacks=callback)
Summary of model is here: https://pastebin.com/Ba6ErEzj
The training output looks like this:
Epoch 58/1000
277/277 [==============================] - 1s 5ms/step - loss: 0.2510 - accuracy: 0.4937 - val_loss: 0.2523 - val_accuracy: 0.4878
Epoch 59/1000
277/277 [==============================] - 1s 5ms/step - loss: 0.2515 - accuracy: 0.4941 - val_loss: 0.2504 - val_accuracy: 0.5122
How can I improve that? Accuracy around 0.50 on a 0-or-1 output is useless.
This is my Colab code.
To wrap up the suggestions (some already offered in the comments), with some justification...
Mistakes. You are in a binary classification setting, so:
Using MSE is wrong; you should use loss='binary_crossentropy'
In your last single-node layer, you should use activation='sigmoid'.
Best practices. Things like dropout, batch normalization, and kernel & bias regularizers are used for regularization, i.e. (roughly speaking) to avoid overfitting. They should not be used by default, and doing so is well known to prevent learning (as seems to be the case here):
Remove all dropout layers
Remove all batch normalization layers
Remove all kernel, bias, and activity regularizers.
You can consider adding some of these back step by step later, but only if you see signs of overfitting.
General advice. Nowadays, usually the first choice for an optimizer is Adam, so change to optimizer='adam' as a first approach.
That said, at the end of the day, everything depends on your data (both their quantity & quality) and the particular problem to be addressed. Experimentation is king (but keeping in mind the general principles stated above).
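Putting those suggestions together, here is a minimal sketch of a stripped-down version of your model. The input shape (timesteps, n_features) and the training arrays are placeholders, not taken from your Colab code:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

timesteps, n_features = 100, 8  # placeholder input shape; use your own

# Stripped-down architecture: no dropout, no batch norm, no regularizers.
model = Sequential([
    Conv1D(32, kernel_size=2, padding='same', activation='relu',
           input_shape=(timesteps, n_features)),
    MaxPooling1D(pool_size=2),
    LSTM(64),                        # last LSTM returns a single vector per sample
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),  # probability of class 1
])
model.compile(optimizer='adam',            # Adam as the default first choice
              loss='binary_crossentropy',  # correct loss for binary classification
              metrics=['accuracy'])
# X_train / y_train are assumed to be your prepared arrays.
model.fit(X_train, y_train, epochs=100, batch_size=16,
          validation_split=0.1, verbose=1)
Once a simple model like this shows clear signs of overfitting (training accuracy well above validation accuracy), you can start adding the regularization pieces back one at a time.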

Why is my val_loss going down but my val_accuracy stagnates

So I train my model with a dataset, and for each epoch I can see the loss and val_loss go down (it is important to note that val_loss only goes down to a certain point and then stagnates as well, with some minor ups and downs) and the accuracy go up, but for some reason my val_accuracy stays at roughly 0.33.
I browsed around and it seems to be an overfitting problem, so I added Dropout layers and l2 regularization on some layers of the model, but it seems to have no effect. Therefore I would like to ask what you think I could improve in my model so that val_loss keeps going down and val_accuracy stops stagnating and keeps going up.
I've tried using more images, but the problem seems to be the same. Not sure if my increase in images was enough, though.
Should I add Dropout layers in the Conv2D layers?
Should I use less or more l2 regularization?
Should I use even more images?
Just some questions that might have something to do with my problem.
My model is below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout, Reshape

model = Sequential()
model.add(Conv2D(16, kernel_size=(3, 3), input_shape=(580, 360, 1), padding='same', activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.05)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.05)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.02)))
model.add(BatchNormalization())
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.02)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.05)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(532, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(266, activation='softmax'))
model.add(Reshape((7, 38)))
print(model.summary())
optimizer = keras.optimizers.SGD(lr=0.00001)
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])
Thanks in advance!
PS: Here is the graph of training:
PS2: Here is the end of training:
Epoch 40/40
209/209 [==============================] - 68s 327ms/step - loss: 0.7421 - accuracy: 0.9160 - val_loss: 3.8159 - val_accuracy: 0.3152
This seems to be a classic overfitting problem.
It would be nice to have a more detailed description of the problem: is it a classification task? Are your images grayscale? What is the purpose of this network?
With this information, I would say that any proper regularization of the network should help. Some items you could try:
For conv layers I recommend using SpatialDropout layers (see the sketch after this list).
Get more data (if possible)
Use data augmentation (if possible)
Increase the rate of the dropout layers
Try reducing the complexity of your model architecture (maybe fewer layers, fewer filters in general, fewer neurons in dense layers, etc.)
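As a rough illustration of the SpatialDropout suggestion (a sketch only; the filter count and dropout rate are arbitrary, and the input shape is taken from your model):
from keras.models import Sequential
from keras.layers import Conv2D, SpatialDropout2D, MaxPooling2D

# SpatialDropout2D drops entire feature maps rather than individual activations,
# which tends to regularize conv layers better than plain Dropout.
block = Sequential([
    Conv2D(16, kernel_size=(3, 3), padding='same', activation='relu',
           input_shape=(580, 360, 1)),
    SpatialDropout2D(0.2),
    MaxPooling2D(pool_size=(2, 2)),
])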
Hope this helps!
Just a hint:
You have a problem with your CNN architecture: the number of filters should get smaller with each convolution, but in your case it is growing, with 16, 32, 64, 64, 128. You should do it the other way around. Starting from input_shape=(580, 360), you could go to, say, 256, 128, 64, 32 filters for the Conv2D layers.

CNN having high overfitting despite having dropout layers?

For some background, my dataset is roughly 75,000 images, 200x200 greyscale, with 26 classes (the letters of the alphabet). My model is:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(200, 200, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(26, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=[tf.keras.metrics.CategoricalAccuracy()])
model.fit(X_train, y_train, epochs=1, batch_size=64, verbose=1, validation_data=(X_test, y_test))
The output of the model.fit is:
Train on 54600 samples, validate on 23400 samples
Epoch 1/1
54600/54600 [==============================] - 54s 984us/step - loss: nan - categorical_accuracy: 0.9964 - val_loss: nan - val_categorical_accuracy: 0.9996
99.9+% validation accuracy. When I run a test, it gets all the predictions incorrect. So I assume it is overfitting. Why is this happening despite adding the dropout layers? What other options do I have to fix this? Thank you!
The only way you would get all the predictions on a held-out test set incorrect while simultaneously getting almost 100% validation accuracy is if you have a data leak, i.e. your training data must contain the same images as your validation data (or images so similar as to be effectively identical).
Or the data in your test set is very different from your training and validation datasets.
To fix this, ensure that no single image exists in more than one of your datasets. Also ensure that the images are generally similar, i.e. if training with cell-phone photos, do not then test with images taken with a DSLR or with watermarked images pulled from Google.
It is also odd that your loss is nan. It may be due to using CategoricalAccuracy as the metric. To fix this, just set the metric to 'accuracy'; Keras will then dynamically pick the accuracy that matches your loss, one of binary, categorical, or sparse_categorical.
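For instance, a possible corrected compile call would look like this (a sketch; the rest of your model is assumed unchanged):
# 'accuracy' lets Keras pick the variant matching the loss,
# here sparse_categorical_accuracy for sparse_categorical_crossentropy.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])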
Hope this helps.
It's not overfitting at all; look at your loss, it's equal to nan. That means your gradients exploded during training. To see what's really happening, I recommend you look at the loss after every mini-batch and see at what point it becomes nan.
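One way to do that (a sketch, not part of the answer above) is a small custom callback that prints the loss per batch and stops training once it turns into nan; Keras also ships a ready-made TerminateOnNaN callback for the stopping part:
import numpy as np
from tensorflow.keras.callbacks import Callback

class NanMonitor(Callback):
    """Print the loss after every mini-batch and stop when it becomes NaN."""
    def on_train_batch_end(self, batch, logs=None):
        loss = (logs or {}).get('loss')
        print(f'batch {batch}: loss = {loss}')
        if loss is None or np.isnan(loss):
            print(f'loss became NaN at batch {batch}')
            self.model.stop_training = True

# X_train / y_train etc. are the arrays from the question.
model.fit(X_train, y_train, epochs=1, batch_size=64, verbose=0,
          validation_data=(X_test, y_test), callbacks=[NanMonitor()])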

Significance of loss in classification with Keras

I am training a three layer neural network with keras:
from keras import models, layers
from keras.layers import Conv2D, BatchNormalization, Activation, Dropout, Dense
from keras.regularizers import l2

model = models.Sequential()
model.add(Conv2D(32, (3, 3), padding="same",
                 input_shape=input_shape, strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3), padding="same",
                 input_shape=input_shape, strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(Conv2D(128, (3, 3), padding="same",
                 input_shape=input_shape, strides=2, kernel_regularizer=l2(reg)))
model.add(BatchNormalization(axis=channels))
model.add(Activation("relu"))
model.add(layers.Flatten())
model.add(layers.Dense(neurons, activation='relu', kernel_regularizer=l2(reg)))
model.add(Dropout(0.50))
model.add(Dense(2))
model.add(Activation("softmax"))
My data has two classes, and I am using sparse categorical cross entropy:
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
history = model.fit(x=X, y=y, batch_size=batch_size, epochs=epochs,
                    validation_data=(X_val, y_val),
                    shuffle=True,
                    callbacks=callbacks,
                    verbose=1)
My data has the following shape:
X: (232, 100, 150, 3)
y: (232,)
where X are the images and y is either 1 or 0, because I am using the sparse loss function.
The loss is very high for both training and validation, even when the training accuracy is 1! I get loss values over 20, which I understand is not reasonable.
If I train the model for a few epochs and output the predictions alongside the true labels, and then compute the categorical cross entropy from them, the value I get is < 1, as expected, even when I do the calculation with Keras' own function (I switch to categorical because the sparse version gives an error):
21/21 [==============================] - 7s 313ms/step - loss: 44.1764 - acc: 1.0000 - val_loss: 44.7084 - val_acc: 0.7857
cce = tf.keras.losses.CategoricalCrossentropy()
pred = model.predict(x=X_val, batch_size=len(X_val))
loss = cce(true_categorical, pred)
Categorical loss 0.6077293753623962
Is there a way to know exactly how this loss is calculated and why the values are so high? The batch size is 8.
The loss printed by Keras is the total loss.
Regularization adds a loss to the model based on the values of the weights, and since you have a lot of weights, you also have a lot of contributions to the total loss. That is why it's big.
If you remove the regularization, you will see the final loss equal the categorical crossentropy loss.
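If you want to verify this, here is a sketch (reusing the names from your question) that separates the data loss from the regularization penalties; model.losses holds the per-layer penalties added by kernel_regularizer=l2(reg):
import tensorflow as tf

cce = tf.keras.losses.CategoricalCrossentropy()
pred = model.predict(x=X_val, batch_size=len(X_val))
data_loss = cce(true_categorical, pred)  # the ~0.61 value computed above
reg_loss = tf.add_n(model.losses)        # sum of all the l2 penalties
print('cross-entropy loss:', float(data_loss))
print('regularization loss:', float(reg_loss))
print('total (roughly what Keras reports):', float(data_loss + reg_loss))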

Convolutional Neural Network with Non-Existent Loss and Accuracy of 0

I am attempting to train a simple convolutional neural network shown below.
from keras.models import Sequential
from keras.layers import Conv1D, Activation, MaxPooling1D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv1D(32, 3, padding='same', input_shape=(700, 7)))
model.add(Activation('relu'))
model.add(Conv1D(32,3))
model.add(Activation('relu'))
model.add(MaxPooling1D())
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
I fit it for 100 epochs with a validation split of 0.2, on input data shaped [1000L, 700L, 7L]. Every single one of my epochs displayed the following:
loss: nan - acc:0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
So my question is: what went wrong, and how do I fix it? Is the problem with the network, or with how my data is being input and fit to the model?
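Without seeing how the data is prepared it is hard to say for certain, but a quick sanity check on the inputs is usually the first step when the loss is nan from the very first epoch (a sketch; X and y are the input array and labels described above):
import numpy as np

# NaN/inf values in the inputs, or labels outside {0, 1}, will both
# produce a nan loss with binary_crossentropy.
print('NaNs in X:', np.isnan(X).any(), ' infs in X:', np.isinf(X).any())
print('label values:', np.unique(y))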
