Keras: My model loss and accuracy randomly drop to zero

I have a rather complex sequence to sequence encoder decoder model. I run into an issue where my loss and accuracy drop to zero and I can't reproduce this error. It has nothing to do with the training data as it happens with different sets.
It seems to be learning as the loss slowly drops. Below is what it is like just before:
Epoch 1/2
5000/5000 [==============================] - 235s 47ms/step - loss: 0.9825 - acc: 0.7077
Epoch 2/2
5000/5000 [==============================] - 235s 47ms/step - loss: 0.9443 - acc: 0.7177
And here is what it is like during the next model.fit() iteration:
Epoch 1/2
2882/2882 [==============================] - 136s 47ms/step - loss: 0.7033 - acc: 0.4399
Epoch 2/2
2882/2882 [==============================] - 136s 47ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
After this, the loss and accuracy remain the same:
Epoch 1/2
5000/5000 [==============================] - 278s 56ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
Epoch 2/2
5000/5000 [==============================] - 279s 56ms/step - loss: 1.1921e-07 - acc: 0.0000e+00
The reason I have to train in such a manner is because I have variable input sizes and output sizes. So I have to make batches of my training data with fixed input size before I train.
sgd = optimizers.SGD(lr=0.015, decay=0.002)
out2 = model.compile(loss='categorical_crossentropy',
                     optimizer=sgd,
                     metrics=['accuracy'])
I need to use curriculum learning to reach sentence level predictions, so I am doing the following:
I initially train my model to output a "1 word + end" token sequence. Training on this works fine. When I start to train on "2 words + end", this problem starts to arise.
After training on 1 word, I save the model. Then I define a new model with output size for 2 words, and use the following:
new_model = createModel(...,num_output_words)
new_model.set_weights(old_model.get_weights())
I have to do this as I can't define a model with variable output length.
I can provide more information if needed. I can't find any information online.

Related

High train accuracy poor test accuracy

I have a neural network that classifies images into 3 classes. My dataset is very small: I have 340 images for training and 60 images for testing. I built a model, and when I train it my result is this:
Epoch 97/100
306/306 [==============================] - 46s 151ms/step - loss: 0.2453 - accuracy: 0.8824 - val_loss: 0.3557 - val_accuracy: 0.8922
Epoch 98/100
306/306 [==============================] - 47s 152ms/step - loss: 0.2096 - accuracy: 0.9031 - val_loss: 0.3795 - val_accuracy: 0.8824
Epoch 99/100
306/306 [==============================] - 47s 153ms/step - loss: 0.2885 - accuracy: 0.8627 - val_loss: 0.4501 - val_accuracy: 0.7745
Epoch 100/100
306/306 [==============================] - 46s 152ms/step - loss: 0.1998 - accuracy: 0.9150 - val_loss: 0.4586 - val_accuracy: 0.8627
When I predict on the test images, the test accuracy is poor.
What should I do? I also use ImageDataGenerator for data augmentation, but the result is the same. Is it because I have a small dataset?

You can use regularization on the fully connected layers. But since you already have high validation accuracy, the problem is probably your data: your training data might not fully represent your test data. Try to analyze that, and make sure you apply the same preprocessing to the test data before testing as you did to the training data.

Why there's a bad accuracy on dataset when it's used both for validation and training?

I trained a model with ResNet50 and got an amazing accuracy of 95% on training set.
I took the same training set for validation and the accuracy seems very bad (<0.05%).
from keras.preprocessing.image import ImageDataGenerator
train_set = ImageDataGenerator(horizontal_flip=True,rescale=1./255,shear_range=0.2,zoom_range=0.2).flow_from_directory(data,target_size=(256,256),classes=['airplane','airport','baseball_diamond',
'basketball_court','beach','bridge',
'chaparral','church','circular_farmland',
'commercial_area','dense_residential','desert',
'forest','freeway','golf_course','ground_track_field',
'harbor','industrial_area','intersection','island',
'lake','meadow','medium_residential','mobile_home_park',
'mountain','overpass','parking_lot','railway','rectangular_farmland',
'roundabout','runway'],batch_size=31)
from keras.applications import ResNet50
from keras.applications.resnet50 import preprocess_input
from keras import layers,Model
conv_base = ResNet50(
    include_top=False,
    weights='imagenet')
for layer in conv_base.layers:
    layer.trainable = False
x = conv_base.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
predictions = layers.Dense(31, activation='softmax')(x)
model = Model(conv_base.input, predictions)
# here you will write the path for train data or if you create your val data then you can test using that too.
# test_dir = ""
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
data,
target_size=(256, 256), classes=['airplane','airport','baseball_diamond',
'basketball_court','beach','bridge',
'chaparral','church','circular_farmland',
'commercial_area','dense_residential','desert',
'forest','freeway','golf_course','ground_track_field',
'harbor','industrial_area','intersection','island',
'lake','meadow','medium_residential','mobile_home_park',
'mountain','overpass','parking_lot','railway','rectangular_farmland',
'roundabout','runway'],batch_size=1,shuffle=True)
model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
model.fit_generator(train_set,steps_per_epoch=1488//31,epochs=10,verbose=True,validation_data = test_generator,
validation_steps = test_generator.samples // 31)
Epoch 1/10
48/48 [==============================] - 27s 553ms/step - loss: 1.9631 - acc: 0.4825 - val_loss: 4.3134 - val_acc: 0.0208
Epoch 2/10
48/48 [==============================] - 22s 456ms/step - loss: 0.6395 - acc: 0.8212 - val_loss: 4.7584 - val_acc: 0.0833
Epoch 3/10
48/48 [==============================] - 23s 482ms/step - loss: 0.4325 - acc: 0.8810 - val_loss: 5.3852 - val_acc: 0.0625
Epoch 4/10
48/48 [==============================] - 23s 476ms/step - loss: 0.2925 - acc: 0.9153 - val_loss: 6.0963 - val_acc: 0.0208
Epoch 5/10
48/48 [==============================] - 23s 477ms/step - loss: 0.2275 - acc: 0.9341 - val_loss: 5.6571 - val_acc: 0.0625
Epoch 6/10
48/48 [==============================] - 23s 478ms/step - loss: 0.1855 - acc: 0.9489 - val_loss: 6.2440 - val_acc: 0.0208
Epoch 7/10
48/48 [==============================] - 23s 483ms/step - loss: 0.1704 - acc: 0.9543 - val_loss: 7.4446 - val_acc: 0.0208
Epoch 8/10
48/48 [==============================] - 23s 487ms/step - loss: 0.1828 - acc: 0.9476 - val_loss: 7.5198 - val_acc: 0.0417
What could be the reason?!

You have configured train_set and test_datagen differently. In particular, one applies random flips, shear and zoom where the other doesn't. As I mentioned in my comment, if it's the same data it will have the same accuracy. You can see whether a model is overfitting when you use validation correctly and use unseen data for validation. Using the same data will always give the same accuracy for training and validation.

I'm not sure exactly what is wrong, but it is NOT an overfitting issue. It is clear that your validation data (the same as your training data) is not going in correctly. For one thing, you set the validation batch size to 1 but set validation_steps = test_generator.samples // 31. If test_generator.samples = 1488 then you have 48 steps, but with a batch size of 1 you will only validate 48 samples. You want to set the batch size and steps so that batch_size x validation_steps equals the total number of samples. That way you go through the validation set exactly once. I also recommend setting shuffle=False for the test generator. Also, why bother entering all the class names? If your class directories are labeled 'airplane', 'airport', 'baseball_diamond', etc., you don't need to define the classes explicitly; flow_from_directory will do that for you automatically. See the documentation below.
classes: Optional list of class subdirectories (e.g. ['dogs', 'cats']). Default: None. If not provided, the list of classes will be automatically inferred from the subdirectory names/structure under directory, where each subdirectory will be treated as a different class (and the order of the classes, which will map to the label indices, will be alphanumeric). The dictionary containing the mapping from class names to class indices can be obtained via the attribute class_indices.
Your training data is actually different from your validation data because you are using data augmentation in the training generator. That's OK; it may lead to a small difference between your training and validation accuracy, but your validation accuracy should be pretty close once you get the validation data to go in correctly.
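The batch size and step arithmetic from the answer above can be checked with a small sketch. The helper names validation_steps and samples_covered are hypothetical (they are not part of Keras), and the numbers 1488 and 31 are taken from the question's code.

```python
import math


def validation_steps(num_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    return math.ceil(num_samples / batch_size)


def samples_covered(num_samples, batch_size, steps):
    """How many distinct samples are seen in `steps` batches of one pass."""
    return min(num_samples, batch_size * steps)


# The question's setup: batch_size=1 but steps computed as samples // 31.
print(samples_covered(1488, batch_size=1, steps=1488 // 31))

# Consistent choices that cover the whole set once.
print(validation_steps(1488, batch_size=31))
print(validation_steps(1488, batch_size=1))
```

With the question's settings only 48 of the 1488 images are validated per epoch; using the same batch size in both places (or steps equal to the sample count for a batch size of 1) covers the full set.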

How to understand loss acc val_loss val_acc in Keras model fitting

I'm new to Keras and have some questions about how to understand my model results. Here is my result (for your convenience, I only paste the loss, acc, val_loss and val_acc after each epoch here):
Train on 4160 samples, validate on 1040 samples as below:
Epoch 1/20
4160/4160 - loss: 3.3455 - acc: 0.1560 - val_loss: 1.6047 - val_acc: 0.4721
Epoch 2/20
4160/4160 - loss: 1.7639 - acc: 0.4274 - val_loss: 0.7060 - val_acc: 0.8019
Epoch 3/20
4160/4160 - loss: 1.0887 - acc: 0.5978 - val_loss: 0.3707 - val_acc: 0.9087
Epoch 4/20
4160/4160 - loss: 0.7736 - acc: 0.7067 - val_loss: 0.2619 - val_acc: 0.9442
Epoch 5/20
4160/4160 - loss: 0.5784 - acc: 0.7690 - val_loss: 0.2058 - val_acc: 0.9433
Epoch 6/20
4160/4160 - loss: 0.5000 - acc: 0.8065 - val_loss: 0.1557 - val_acc: 0.9750
Epoch 7/20
4160/4160 - loss: 0.4179 - acc: 0.8296 - val_loss: 0.1523 - val_acc: 0.9606
Epoch 8/20
4160/4160 - loss: 0.3758 - acc: 0.8495 - val_loss: 0.1063 - val_acc: 0.9712
Epoch 9/20
4160/4160 - loss: 0.3202 - acc: 0.8740 - val_loss: 0.1019 - val_acc: 0.9798
Epoch 10/20
4160/4160 - loss: 0.3028 - acc: 0.8788 - val_loss: 0.1074 - val_acc: 0.9644
Epoch 11/20
4160/4160 - loss: 0.2696 - acc: 0.8923 - val_loss: 0.0581 - val_acc: 0.9856
Epoch 12/20
4160/4160 - loss: 0.2738 - acc: 0.8894 - val_loss: 0.0713 - val_acc: 0.9837
Epoch 13/20
4160/4160 - loss: 0.2609 - acc: 0.8913 - val_loss: 0.0679 - val_acc: 0.9740
Epoch 14/20
4160/4160 - loss: 0.2556 - acc: 0.9022 - val_loss: 0.0599 - val_acc: 0.9769
Epoch 15/20
4160/4160 - loss: 0.2384 - acc: 0.9053 - val_loss: 0.0560 - val_acc: 0.9846
Epoch 16/20
4160/4160 - loss: 0.2305 - acc: 0.9079 - val_loss: 0.0502 - val_acc: 0.9865
Epoch 17/20
4160/4160 - loss: 0.2145 - acc: 0.9185 - val_loss: 0.0461 - val_acc: 0.9913
Epoch 18/20
4160/4160 - loss: 0.2046 - acc: 0.9183 - val_loss: 0.0524 - val_acc: 0.9750
Epoch 19/20
4160/4160 - loss: 0.2055 - acc: 0.9120 - val_loss: 0.0440 - val_acc: 0.9885
Epoch 20/20
4160/4160 - loss: 0.1890 - acc: 0.9236 - val_loss: 0.0501 - val_acc: 0.9827
Here are my understandings:
The two losses (both loss and val_loss) are decreasing and the two accuracies (acc and val_acc) are increasing. So this indicates the model is being trained in a good way.
The val_acc is the measure of how good the predictions of your model are. So for my case, it looks like the model was trained pretty well after 6 epochs, and the rest training is not necessary.
My Questions are:
The acc (the accuracy on the training set) is always smaller, actually much smaller, than val_acc. Is this normal? Why does this happen? In my mind, acc should usually be similar to or better than val_acc.
After 20 epochs, the acc is still increasing. So should I use more epochs and stop when acc stops increasing? Or I should stop where val_acc stops increasing, regardless of the trends of acc?
Is there any other thoughts on my results?
Thanks!

Answering your questions:
As described in the official Keras FAQ:
the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
Training should be stopped when val_acc stops increasing; otherwise your model will probably overfit. You can use the EarlyStopping callback to stop training.
Your model seems to achieve very good results. Keep up the good work.
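To make the stopping rule concrete, here is a minimal pure-Python sketch of patience-based early stopping applied to the val_loss values from the question's log. The function simulate_early_stopping is a hypothetical illustration of the idea, not the Keras EarlyStopping implementation (which has more options, such as min_delta), and the patience values are assumptions.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is None if the patience budget is never exhausted.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best value: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# val_loss per epoch, copied from the training log in the question.
val_loss = [1.6047, 0.7060, 0.3707, 0.2619, 0.2058, 0.1557, 0.1523,
            0.1063, 0.1019, 0.1074, 0.0581, 0.0713, 0.0679, 0.0599,
            0.0560, 0.0502, 0.0461, 0.0524, 0.0440, 0.0501]

print(simulate_early_stopping(val_loss, patience=3))
print(simulate_early_stopping(val_loss, patience=5))
```

A small patience stops on a temporary plateau, while a larger one lets training continue to later improvements, so the value is worth tuning against how noisy the validation curve is.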

What are loss and val_loss?
In deep learning, the loss is the value that a neural network is trying to minimize: it's the distance between the ground truth and the predictions. In order to minimize this distance, the neural network learns by adjusting weights and biases in a manner that reduces the loss.
For instance, in regression tasks, you have a continuous target, e.g., height. What you want to minimize is the difference between your predictions, and the actual height. You can use mean_absolute_error as loss so the neural network knows this is what it needs to minimize.
In classification, it's a little more complicated, but very similar. Predicted classes are based on probability. The loss is therefore also based on probability. In classification, the neural network is penalized for assigning a low probability to the actual class. The loss is typically categorical_crossentropy.
loss and val_loss differ because the former is computed on the training set and the latter on the validation set. As such, the latter is a good indication of how the model performs on unseen data. You can get a validation set by using validation_data=(x_test, y_test) or validation_split=0.2.
It's best to rely on the val_loss to prevent overfitting. Overfitting is when the model fits the training data too closely, and the loss keeps decreasing while the val_loss stalls or increases.
In Keras, you can use the EarlyStopping callback to stop training when the val_loss stops decreasing.
Read more about deep learning losses here: Loss and Loss Functions for Training Deep Learning Neural Networks.
What are acc and val_acc?
Accuracy is a metric only for classification. It makes no sense on a task with a continuous target. It gives the percentage of instances that are correctly classified.
Once again, acc is on the training data, and val_acc is on the validation data. It's best to rely on val_acc for a fair representation of model performance, because a large enough neural network can end up fitting the training data at 100% yet perform poorly on unseen data.
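As a rough illustration of the two quantities described above, the following pure-Python sketch computes a categorical cross-entropy loss and an accuracy for one-hot targets. The function names and the toy predictions are made up for the example; Keras computes these on tensors and its exact clipping details may differ.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(t * log(p)) over samples, for one-hot targets."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, eps))
                      for t, p in zip(true_row, pred_row))
    return total / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class is the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred)
                  if t.index(max(t)) == p.index(max(p)))
    return correct / len(y_true)


# Two samples, three classes (toy values).
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]

print(categorical_crossentropy(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

The second sample is misclassified and gets a low probability for its true class, so it lowers the accuracy and contributes the larger share of the loss.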

What does the standard Keras model output mean? What is epoch and loss in Keras?

I have just built my first model using Keras and this is the output. It looks like the standard output you get after building any Keras artificial neural network. Even after looking in the documentation, I do not fully understand what the epoch is and what the loss is which is printed in the output.
What is epoch and loss in Keras?
(I know it's probably an extremely basic question, but I couldn't seem to locate the answer online, and if the answer is really that hard to glean from the documentation I thought others would have the same question and thus decided to post it here.)
Epoch 1/20
1213/1213 [==============================] - 0s - loss: 0.1760
Epoch 2/20
1213/1213 [==============================] - 0s - loss: 0.1840
Epoch 3/20
1213/1213 [==============================] - 0s - loss: 0.1816
Epoch 4/20
1213/1213 [==============================] - 0s - loss: 0.1915
Epoch 5/20
1213/1213 [==============================] - 0s - loss: 0.1928
Epoch 6/20
1213/1213 [==============================] - 0s - loss: 0.1964
Epoch 7/20
1213/1213 [==============================] - 0s - loss: 0.1948
Epoch 8/20
1213/1213 [==============================] - 0s - loss: 0.1971
Epoch 9/20
1213/1213 [==============================] - 0s - loss: 0.1899
Epoch 10/20
1213/1213 [==============================] - 0s - loss: 0.1957
Epoch 11/20
1213/1213 [==============================] - 0s - loss: 0.1923
Epoch 12/20
1213/1213 [==============================] - 0s - loss: 0.1910
Epoch 13/20
1213/1213 [==============================] - 0s - loss: 0.2104
Epoch 14/20
1213/1213 [==============================] - 0s - loss: 0.1976
Epoch 15/20
1213/1213 [==============================] - 0s - loss: 0.1979
Epoch 16/20
1213/1213 [==============================] - 0s - loss: 0.2036
Epoch 17/20
1213/1213 [==============================] - 0s - loss: 0.2019
Epoch 18/20
1213/1213 [==============================] - 0s - loss: 0.1978
Epoch 19/20
1213/1213 [==============================] - 0s - loss: 0.1954
Epoch 20/20
1213/1213 [==============================] - 0s - loss: 0.1949

Just to answer the questions more specifically, here's a definition of epoch and loss:
Epoch: A full pass over all of your training data.
For example, in your view above, you have 1213 observations. So an epoch concludes when it has finished a training pass over all 1213 of your observations.
Loss: A scalar value that we attempt to minimize during our training of the model. The lower the loss, the closer our predictions are to the true labels.
This is usually Mean Squared Error (MSE), as David Maust said above, or often in Keras, categorical cross-entropy.
What you'd expect to see from running fit on your Keras model, is a decrease in loss over n number of epochs. Your training run is rather abnormal, as your loss is actually increasing. This could be due to a learning rate that is too large, which is causing you to overshoot optima.
As jaycode mentioned, you will want to look at your model's performance on unseen data, as this is the general use case of Machine Learning.
As such, you should include a list of metrics in your compile method, which could look like:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
As well as run your model on validation during the fit method, such as:
model.fit(data, labels, validation_split=0.2)
There's a lot more to explain, but hopefully this gets you started.
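The remark about an overly large learning rate overshooting the optimum can be seen in a toy setting. This sketch runs plain gradient descent on f(x) = x^2, which is an assumption chosen for illustration and has nothing to do with the network in the question; the two learning rates are arbitrary.

```python
def gradient_descent_losses(lr, steps=10, x0=1.0):
    """Minimize f(x) = x**2 and return the loss before each update."""
    x = x0
    losses = []
    for _ in range(steps):
        losses.append(x * x)
        grad = 2.0 * x      # derivative of x**2
        x -= lr * grad      # gradient descent update
    return losses


small_lr = gradient_descent_losses(lr=0.1)
large_lr = gradient_descent_losses(lr=1.1)

print("lr=0.1:", [round(v, 4) for v in small_lr])
print("lr=1.1:", [round(v, 4) for v in large_lr])
```

With the small rate each step shrinks x and the loss falls; with the large rate each update jumps past the minimum to a point farther away, so the loss grows from step to step.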

One epoch ends when your model has processed every sample in the training data once, updating the weights along the way to move toward a lower loss. For the loss, smaller is better. In your case, since the loss is higher in later epochs, it "seems" the model was better after the first epoch.
I said "seems" since we can't tell for sure yet: the model has not been tested using a proper cross-validation method, i.e. it is evaluated only against its training data.
Ways to improve your model:
Use cross-validation with your Keras model to find out how the model actually performs: does it generalize well when predicting new data it has never seen before?
Adjust your learning rate, the structure of the network, the number of hidden units / layers, and the init, optimizer, and activation parameters used in your model, among myriad other things.
Combining sklearn's GridSearchCV with Keras can automate this process.
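For readers unfamiliar with cross-validation, here is a minimal sketch of how k-fold splitting partitions the sample indices. The helper k_fold_indices is hypothetical and unshuffled; in practice a library splitter (for example the one GridSearchCV uses internally) would normally be used instead. The sample count 1213 comes from the question's log, and k=5 is an assumption.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(1213, k=5))
for i, (train, val) in enumerate(folds, start=1):
    print(f"fold {i}: train on {len(train)} samples, validate on {len(val)}")
```

Each sample is used for validation exactly once, and the model would be retrained from scratch on the training part of every fold before the scores are averaged.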

How to train and tune an artificial multilayer perceptron neural network using Keras?

I am building my first artificial multilayer perceptron neural network using Keras.
This is my input data:
This is my code which I used to build my initial model which basically follows the Keras example code:
# Imports assumed for the old Keras API used here (init=, nb_epoch=)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(64, input_dim=14, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, init='uniform'))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(2, init='uniform'))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
Output:
Epoch 1/20
1213/1213 [==============================] - 0s - loss: 0.1760
Epoch 2/20
1213/1213 [==============================] - 0s - loss: 0.1840
Epoch 3/20
1213/1213 [==============================] - 0s - loss: 0.1816
Epoch 4/20
1213/1213 [==============================] - 0s - loss: 0.1915
Epoch 5/20
1213/1213 [==============================] - 0s - loss: 0.1928
Epoch 6/20
1213/1213 [==============================] - 0s - loss: 0.1964
Epoch 7/20
1213/1213 [==============================] - 0s - loss: 0.1948
Epoch 8/20
1213/1213 [==============================] - 0s - loss: 0.1971
Epoch 9/20
1213/1213 [==============================] - 0s - loss: 0.1899
Epoch 10/20
1213/1213 [==============================] - 0s - loss: 0.1957
Epoch 11/20
1213/1213 [==============================] - 0s - loss: 0.1923
Epoch 12/20
1213/1213 [==============================] - 0s - loss: 0.1910
Epoch 13/20
1213/1213 [==============================] - 0s - loss: 0.2104
Epoch 14/20
1213/1213 [==============================] - 0s - loss: 0.1976
Epoch 15/20
1213/1213 [==============================] - 0s - loss: 0.1979
Epoch 16/20
1213/1213 [==============================] - 0s - loss: 0.2036
Epoch 17/20
1213/1213 [==============================] - 0s - loss: 0.2019
Epoch 18/20
1213/1213 [==============================] - 0s - loss: 0.1978
Epoch 19/20
1213/1213 [==============================] - 0s - loss: 0.1954
Epoch 20/20
1213/1213 [==============================] - 0s - loss: 0.1949
How do I train and tune this model and get my code to output my best predictive model? I am new to neural networks and am just wholly confused as to what is the next step after building the model. I know I want to optimize it, but I'm not sure which features to tweak or if I am supposed to do it manually or how to write code to do so.

Some things that you could do are:
Change your loss function from mean_squared_error to binary_crossentropy. mean_squared_error is intended for regression, but you want to classify your data.
Add show_accuracy=True to your fit() function, which outputs the accuracy of your model at every epoch (in newer Keras versions, pass metrics=['accuracy'] to compile() instead). That information is probably more useful to you than just the loss value.
Add validation_split=0.2 to your fit() function. Currently you are only training on a training set and validating on nothing. That's a no-go in machine learning as you can't be sure that your model hasn't simply memorized the correct answers for your dataset (without really understanding why these answers are correct).
Change from Obama/Romney to Democrat/Republican and add data from previous elections. ~1200 examples is a pretty small dataset for neural networks. Also add columns with valuable information, like unemployment rate or population density. Note that quite a few of the values (like population number) are probably similar to providing the name of the state, so e.g. your net will likely learn that Texas means Republican.
If you haven't done that already, normalize all your values to the range of 0 to 1 (by subtracting from each value the minimum of the column and then dividing by the (max - min) of the column). Neural networks can handle normalized data better than unnormalized data.
Try Adam and Adagrad instead of SGD. Sometimes they perform better. (See documentation about optimizers.)
Try Activation('relu'), LeakyReLU, PReLU and ELU instead of Activation('tanh'). Tanh is rarely the best choice. (See advanced activation functions.)
Try increasing/decreasing your dense layers sizes (e.g. from 64 to 128). Also try adding/removing layers.
Try adding BatchNormalization layers (before the Activation layers). (See documentation.)
Try changing the dropout rates (e.g. from 0.5 to 0.25).
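The min-max normalization suggested in the list above can be sketched in a few lines of plain Python. The function name min_max_normalize and the sample column are made up for illustration; with real data you would typically apply the same idea per column using NumPy or a library scaler, and reuse the training set's min and max for the validation data.

```python
def min_max_normalize(column):
    """Scale values to [0, 1] via (value - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


population = [10, 20, 30]          # toy column
print(min_max_normalize(population))
```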
