I'm creating a very simple two-layer feed-forward network, but I'm finding that the loss is not updating at all. I have some ideas, but I wanted to get additional feedback/guidance.
Details about the data:
X_train:
(336876, 158)
X_dev:
(42109, 158)
Y_train counts:
0 285793
1 51083
Name: default, dtype: int64
Y_dev counts:
0 35724
1 6385
Name: default, dtype: int64
And here is my model architecture:
# define the architecture of the network
model = Sequential()
model.add(Dense(50, input_dim=X_train.shape[1], init="uniform", activation="relu"))
model.add(Dense(3))

print("[INFO] compiling model...")
adam = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=adam,
              metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128, verbose=1)
model.add(Dense(1, activation='sigmoid'))
Now, with this, my loss over the first few epochs is as follows:
Epoch 1/12
336876/336876 [==============================] - 8s - loss: 2.4441 - acc: 0.8484
Epoch 2/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 3/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
Epoch 4/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 5/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 6/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 7/12
336876/336876 [==============================] - 7s - loss: 2.4441 - acc: 0.8484
Epoch 8/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
Epoch 9/12
336876/336876 [==============================] - 6s - loss: 2.4441 - acc: 0.8484
And when I test the model afterwards, my f1_score is 0. My main thought was that I may need more data, but I'd still expect it to perform better than it does now on the test set. Could it be that it is overfitting? I added Dropout, but no luck there either.
Any help would be much appreciated.
At first glance, I believe that your learning rate is too high. Also, please consider normalizing your data, especially if different features have different ranges of values (look at scaling). Also, please consider changing your layer activations depending on whether your labels are multi-class or not. Assuming your code is of this form (you seem to have some typos in the problem description):
# define the architecture of the network
model = Sequential()
# also, what is the init="uniform" argument? I did not find it in the Keras
# documentation (Keras 2 calls this kernel_initializer); consider removing it.
model.add(Dense(50, input_dim=X_train.shape[1], init="uniform",
                activation="relu"))
model.add(Dense(1, activation='sigmoid'))

# a slightly more conservative learning rate; play around with this.
adam = Adam(lr=0.0001)
model.compile(loss="binary_crossentropy", optimizer=adam,
              metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128,
          verbose=1)
This should lead the loss to converge. If not, please consider deepening your neural net (think about how many parameters you may need).
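On the normalization point: a minimal sketch using scikit-learn's StandardScaler (an assumption on my part that your features are continuous; fit the scaler on the training split only):

from sklearn.preprocessing import StandardScaler

# fit on the training data only, then reuse the same statistics on the
# dev set so no dev information leaks into preprocessing
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_dev = scaler.transform(X_dev)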
Consider adding the classification layer before compiling your model. In the code you posted, model.add(Dense(1, activation='sigmoid')) appears after model.fit, so the sigmoid output layer never takes part in training.
model.add(Dense(1, activation = 'sigmoid'))
adam = Adam(lr=0.01)
model.compile(loss="binary_crossentropy", optimizer=adam,
metrics=['accuracy'])
model.fit(np.array(X_train), np.array(Y_train), epochs=12, batch_size=128, verbose=1)
I have an LSTM model that predicts tomorrow's water outflow volume based on today's outflow volume, temperature, and precipitation.
model = Sequential()
model.add(LSTM(units=24, return_sequences=True,
               input_shape=(X_Train.shape[1], X_Train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(X_Train, Y_Train, epochs=8,
                    validation_data=(X_Test, Y_Test))
While training I got:
Epoch 1/8
4638/4638 [==============================] - 78s 17ms/step - loss: 1.9951e-04 - val_loss: 1.5074e-04
Epoch 2/8
4638/4638 [==============================] - 77s 17ms/step - loss: 9.6735e-05 - val_loss: 1.0922e-04
Epoch 3/8
4638/4638 [==============================] - 78s 17ms/step - loss: 6.5202e-05 - val_loss: 5.9079e-05
Epoch 4/8
4638/4638 [==============================] - 77s 17ms/step - loss: 5.1011e-05 - val_loss: 4.9478e-05
Epoch 5/8
4638/4638 [==============================] - 77s 17ms/step - loss: 4.3992e-05 - val_loss: 5.1148e-05
Epoch 6/8
4638/4638 [==============================] - 77s 17ms/step - loss: 3.9901e-05 - val_loss: 4.2351e-05
Epoch 7/8
4638/4638 [==============================] - 74s 16ms/step - loss: 3.6884e-05 - val_loss: 4.0763e-05
Epoch 8/8
4638/4638 [==============================] - 74s 16ms/step - loss: 3.5287e-05 - val_loss: 3.6736e-05
But when I manually calculate the mean squared error, I get a different result:

mean_square_root = mean_squared_error(predicted_y_values_unnor, Y_test_actual)
130.755469707972

I wanted to know why the validation loss would be different during training than when calculated manually. How is the loss calculated during training?
The loss you have chosen is mean_squared_error in the line
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
That is the loss your LSTM model is minimizing.
The Mean Squared Error, or MSE, loss is the default loss to use for regression problems. Mean squared error is calculated as the average of the squared differences between the predicted and actual values. The result is always positive regardless of the sign of the predicted and actual values and a perfect value is 0.0. The squaring means that larger mistakes result in more error than smaller mistakes, meaning that the model is punished for making larger mistakes.
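To make that concrete, here is MSE computed by hand with NumPy (made-up numbers, purely for illustration):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# average of the squared differences between predicted and actual values
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375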
LSTM is a general model, and you can choose from many different loss functions. Here is the list of Keras' built-in losses:
https://keras.io/api/losses/
You need to select a loss function as per your problem.
I was just having the exact same problem, and I solved it with flatten().
When you do mean_squared_error(predicted_y_values_unnor, Y_test_actual), the shape of Y_test_actual is a one-dimensional array, for example array([297, 290, 308, 308, 214]), but the other argument is a two-dimensional array in which one of the dimensions is 1, for example array([[300.53693], [299.9416], [295.61334], [218.2563], [219.74983]], dtype=float32).
You just have to call flatten() on predicted_y_values_unnor.
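A minimal NumPy sketch of why the shapes matter, using the example values above (whether this bites depends on how the MSE is computed; a hand-rolled NumPy version silently broadcasts):

import numpy as np

y_true = np.array([297., 290., 308., 308., 214.])                 # shape (5,)
y_pred = np.array([[300.5], [299.9], [295.6], [218.3], [219.7]])  # shape (5, 1)

# (5,) minus (5, 1) broadcasts to a (5, 5) matrix, so this "MSE"
# averages 25 mismatched pairs instead of 5 matched ones
bad = np.mean((y_true - y_pred) ** 2)

# flattening the predictions restores the element-wise pairing
good = np.mean((y_true - y_pred.flatten()) ** 2)
print(bad, good)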
I'm trying to design a CNN in Keras to classify small images of emojis in other images. Below is an example of one of the 13 classes. All images are the same size, and all the emojis are the same size as well. I would think that one should rather easily be able to achieve VERY high accuracy when classifying, as emojis from one class are exactly the same! My intuition told me that if an emoji is 50x50, I could create a convolutional layer of the same size to match one type of emoji. My supervisor did not think that was feasible, however. Anyway, my problem is that, no matter how I design my model, I always get the same validation accuracy for each epoch, which corresponds to 1/13 (i.e., simply guessing that every emoji belongs to the same class).
My model looks like this:
model = Sequential()
model.add(Conv2D(16, kernel_size=3, activation="relu", input_shape=IMG_SIZE))
model.add(Dropout(0.5))
model.add(Conv2D(32, kernel_size=3, activation="relu"))
model.add(Conv2D(64, kernel_size=3, activation="relu"))
model.add(Conv2D(128, kernel_size=3, activation="relu"))
#model.add(Conv2D(256, kernel_size=3, activation="relu"))
model.add(Dropout(0.5))
model.add(Flatten())
#model.add(Dense(256, activation="relu"))
model.add(Dense(128, activation="relu"))
model.add(Dense(64, activation="relu"))
model.add(Dense(NUM_CLASSES, activation='softmax', name="Output"))
And I train it like this:
# ------------------ Compile and train ---------------
sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
rms = optimizers.RMSprop(lr=0.004, rho=0.9, epsilon=None, decay=0.0)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=["accuracy"]) # TODO Read more about this
train_hist = model.fit_generator(
train_generator,
steps_per_epoch=train_generator.n // BATCH_SIZE,
validation_steps=validation_generator.n // BATCH_SIZE, # TODO what is this for?
epochs=EPOCHS,
validation_data=validation_generator,
#callbacks=[EarlyStopping(patience=3, restore_best_weights=True)]
)
Even with this model, which has over 200 million parameters, I get exactly 0.0773 validation accuracy in every epoch:
Epoch 1/10
56/56 [==============================] - 21s 379ms/step - loss: 14.9091 - acc: 0.0737 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 2/10
56/56 [==============================] - 6s 108ms/step - loss: 14.9308 - acc: 0.0737 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 3/10
56/56 [==============================] - 6s 108ms/step - loss: 14.7869 - acc: 0.0826 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 4/10
56/56 [==============================] - 6s 108ms/step - loss: 14.8948 - acc: 0.0759 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 5/10
56/56 [==============================] - 6s 109ms/step - loss: 14.8897 - acc: 0.0762 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 6/10
56/56 [==============================] - 6s 109ms/step - loss: 14.8178 - acc: 0.0807 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 7/10
56/56 [==============================] - 6s 108ms/step - loss: 15.0747 - acc: 0.0647 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 8/10
56/56 [==============================] - 6s 108ms/step - loss: 14.7509 - acc: 0.0848 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 9/10
56/56 [==============================] - 6s 108ms/step - loss: 14.8948 - acc: 0.0759 - val_loss: 14.8719 - val_acc: 0.0773
Epoch 10/10
56/56 [==============================] - 6s 108ms/step - loss: 14.8228 - acc: 0.0804 - val_loss: 14.8719 - val_acc: 0.0773
Because it's not learning anything, I'm starting to think that it's not my model's fault, but maybe the dataset or how I train it. I have tried training with "adam" as well, but I get the same result. I tried changing the input size of the images, but still, same result. Below is a sample from my dataset. Do you guys have any ideas what could be wrong?
I think the main issue currently is that your model has way too many parameters relative to how few samples you have for training. For image classification nowadays, you generally want to just have conv layers, a GlobalSomethingPooling layer, and then a single Dense layer for your outputs. You just need to make sure that your conv section ends up with a large enough receptive field to be able to find all the features you need.
The first thing to think about is making sure that you have a large enough receptive field (further reading about that here). There are three main ways to achieve that: pooling, stride >= 2, and/or dilation >= 2. Because you have only 13 "features" that you want to identify, and all of them will always be pixel-perfect, I'm thinking dilation will be the way to go so that the model can easily "overfit" on those 13 "features". If we use 4 conv layers with dilations of 1, 2, 4, and 8, respectively, then we'll end up with a receptive field of 31. This should be enough to easily recognize 50-pixel emoji.
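A quick check of that receptive-field arithmetic: with kernel size 3, each conv layer adds dilation * (kernel_size - 1) to the receptive field.

kernel_size = 3
dilations = [1, 2, 4, 8]
receptive_field = 1 + sum(d * (kernel_size - 1) for d in dilations)
print(receptive_field)  # 1 + 2 * (1 + 2 + 4 + 8) = 31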
Next, how many filters should each layer have? Normally, you start with a few filters, and increase as you go through the model, as you are doing here. However, we want to "overfit" on specific features, so we should probably increase the amounts in the earlier layers. Just to make it easy, let's give all layers 64 filters.
Last, how do we convert this all to a single prediction? Rather than using a dense layer, which would use a ton of parameters and would not be translation invariant, people nowadays use GlobalAveragePooling or GlobalMaxPooling. GlobalAveragePooling is more common, because it's good for helping to find combinations of many features. However, we just want to find 13 or so exact features here, so GlobalMaxPooling might work even better. Then a single dense layer after that will be enough to get a prediction. Because we're using GlobalMaxPooling, it doesn't need to be flattened first—global pooling already does that for us.
Resulting model (as a runnable sketch; note the Keras layer is called GlobalMaxPooling2D):

from keras.models import Sequential
from keras.layers import Conv2D, GlobalMaxPooling2D, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation="relu", input_shape=IMG_SIZE))
model.add(Conv2D(64, kernel_size=3, dilation_rate=2, activation="relu"))
model.add(Conv2D(64, kernel_size=3, dilation_rate=4, activation="relu"))
model.add(Conv2D(64, kernel_size=3, dilation_rate=8, activation="relu"))
model.add(GlobalMaxPooling2D())
model.add(Dense(NUM_CLASSES, activation='softmax', name="Output"))
Try that. You'll probably also want to add BatchNormalization layers in there after every layer except the last. Once you get decent training accuracy, check if it's overfitting. If it is (which is likely), try these steps (sketched in the code after this list):
batch norm if you haven't already
weight decay on all conv and dense layers
SpatialDropout2D after the conv layers and regular Dropout after the pooling layer
augment your data with an ImageDataGenerator
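Here is a hedged sketch of those remedies combined (the regularization strengths, dropout rates, and shift ranges are placeholder values to tune, not recommendations; only the first two conv blocks are shown):

from keras.models import Sequential
from keras.layers import (Conv2D, GlobalMaxPooling2D, Dense,
                          BatchNormalization, SpatialDropout2D, Dropout)
from keras.regularizers import l2
from keras.preprocessing.image import ImageDataGenerator

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation="relu",
                 kernel_regularizer=l2(1e-4), input_shape=IMG_SIZE))
model.add(BatchNormalization())
model.add(SpatialDropout2D(0.2))
model.add(Conv2D(64, kernel_size=3, dilation_rate=2, activation="relu",
                 kernel_regularizer=l2(1e-4)))
model.add(BatchNormalization())
model.add(SpatialDropout2D(0.2))
# ... dilation_rate=4 and dilation_rate=8 blocks as above ...
model.add(GlobalMaxPooling2D())
model.add(Dropout(0.5))
model.add(Dense(NUM_CLASSES, activation='softmax', name="Output"))

# keep augmentation mild, since the emoji are pixel-perfect
datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1)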
Designing nets like this is almost more of an art than a science, so change stuff around if you think you should. Eventually, you will get an intuition for what is more or less likely to work in any given situation.
I'm learning deep learning in Keras and I have a problem.
The loss isn't decreasing and it's very high, about 650.
I'm working on the MNIST dataset from tensorflow.keras.datasets.mnist.
There is no error; my NN just isn't learning.
Here is my model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
import tensorflow.nn as tfnn
inputdim = 28 * 28
model = Sequential()
model.add(Flatten())
model.add(Dense(inputdim, activation = tfnn.relu))
model.add(Dense(128, activation = tfnn.relu))
model.add(Dense(10, activation = tfnn.softmax))
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(X_train, Y_train, epochs = 4)
and my output:
Epoch 1/4
60000/60000 [==============================] - 32s 527us/sample - loss: 646.0926 - acc: 6.6667e-05
Epoch 2/4
60000/60000 [==============================] - 39s 652us/sample - loss: 646.1003 - acc: 0.0000e+00
Epoch 3/4
60000/60000 [==============================] - 35s 590us/sample - loss: 646.1003 - acc: 0.0000e+00
Epoch 4/4
60000/60000 [==============================] - 33s 544us/sample - loss: 646.1003 - acc: 0.0000e+00
OK, I added BatchNormalization between the layers and changed the loss function to 'sparse_categorical_crossentropy'. This is how my NN looks now:
model = Sequential()
model.add(Flatten())
model.add(BatchNormalization(axis = 1, momentum = 0.99))
model.add(Dense(inputdim, activation = tfnn.relu))
model.add(BatchNormalization(axis = 1, momentum = 0.99))
model.add(Dense(128, activation = tfnn.relu))
model.add(BatchNormalization(axis = 1, momentum = 0.99))
model.add(Dense(10, activation = tfnn.softmax))
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
and these are the results:
Epoch 1/4
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.2045 - acc: 0.9374
Epoch 2/4
60000/60000 [==============================] - 55s 916us/sample - loss: 0.1007 - acc: 0.9689
Thanks for your help!
You may try the sparse_categorical_crossentropy loss function. Also, what is your batch size? And, as has already been suggested, you may want to increase the number of epochs.
I'm trying to fit a Keras network, but in each epoch the loss is 'nan' and the accuracy doesn't change... I tried changing the number of epochs, the layer count, the neuron count, the learning rate, and the optimizer; I checked for nan values in the datasets and normalized the data in different ways, but the problem was not solved. Thanks for your help.
np.random.seed(1337)
# example of input vector: [-1.459746, 0.2694708, ... 0.90043]
# example of output vector: [1, 0] or [0, 1]
model = Sequential()
model.add(Dense(1000, activation='tanh', init='normal', input_dim=503))
model.add(Dense(2, init='normal', activation='softmax'))
opt = optimizers.sgd(lr=0.01)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=['accuracy'])
print(model.summary())
model.fit(x_train, y_train, batch_size=1000, nb_epoch=100, verbose=1)
Epoch 1/100
99804/99804 [==============================] - 5s 49us/step - loss: nan - acc: 0.4938
Epoch 2/100
99804/99804 [==============================] - 5s 51us/step - loss: nan - acc: 0.4938
Epoch 3/100
99804/99804 [==============================] - 5s 52us/step - loss: nan - acc: 0.4938
Epoch 4/100
99804/99804 [==============================] - 5s 52us/step - loss: nan - acc: 0.4938
Epoch 5/100
99804/99804 [==============================] - 5s 51us/step - loss: nan - acc: 0.4938
...
Oh, the problem has been found! After normalization, a NaN appeared in one of the input vectors.
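For anyone hitting the same thing, a quick way to locate NaNs after preprocessing (a minimal NumPy sketch, assuming x_train is a NumPy array):

import numpy as np

# True if any NaN slipped in during normalization
print(np.isnan(x_train).any())

# indices of the input columns that contain at least one NaN
print(np.where(np.isnan(x_train).any(axis=0))[0])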
First convert your output to categorical, as described in the Keras documentation:
Note: when using the categorical_crossentropy loss, your targets should be in categorical format. In order to convert integer targets into categorical targets, you can use the Keras utility to_categorical:
from keras.utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
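Applied to the code in the question, it would look something like this (a sketch; it assumes y_train holds integer class labels rather than the one-hot vectors already described):

from keras.utils import to_categorical

# one-hot encode integer labels so they match the 2-unit softmax
# output expected by categorical_crossentropy
y_train_cat = to_categorical(y_train, num_classes=2)
model.fit(x_train, y_train_cat, batch_size=1000, nb_epoch=100, verbose=1)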
I want to check the performance of my model against various optimizers
(sgd, rmsprop, adam, adamax, etc.),
so I define a Keras sequential model and then do this:
epochs = 50

print('--sgd start---')
model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
checkpointer_sgd = ModelCheckpoint(filepath='my_model_sgd.h5',
                                   verbose=1, save_best_only=True)
history_sgd = model.fit(X_train, y_train,
                        validation_split=0.2, epochs=epochs, batch_size=32,
                        callbacks=[checkpointer_sgd], verbose=1)
print('--sgd end---')
print('--------------------------------------------')

print('--rmsprop start---')
model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])
checkpointer_rmsprop = ModelCheckpoint(filepath='my_model_rmsprop.h5',
                                       verbose=1, save_best_only=True)
history_rmsprop = model.fit(X_train, y_train,
                            validation_split=0.2,
                            epochs=epochs, batch_size=32,
                            callbacks=[checkpointer_rmsprop], verbose=1)
print('--rmsprop end---')
I do this for all the optimizers (the code above shows only sgd and rmsprop) and then execute the statements. What happens is that the first optimizer starts from low accuracy, and the accuracy increases as more epochs pass. But each subsequent optimizer starts from an already high accuracy. Is the above code correct, or do I need to reset the model every time before I compile?
See below the first epoch output for different optimizers
--sgd start---
Train on 1712 samples, validate on 428 samples
Epoch 1/50
1712/1712 [==============================] - 46s 27ms/step - loss: 0.0510 - acc: 0.2985 - val_loss: 0.0442 - val_acc: 0.6986
--rmsprop start---
Train on 1712 samples, validate on 428 samples
Epoch 1/50
1712/1712 [==============================] - 46s 27ms/step - loss: 0.0341 - acc: 0.5940 - val_loss: 0.0148 - val_acc: 0.6963
--adagrad start---
Train on 1712 samples, validate on 428 samples
Epoch 1/50
1712/1712 [==============================] - 44s 26ms/step - loss: 0.0068 - acc: 0.6951 - val_loss: 0.0046 - val_acc: 0.6963
--adadelta start---
Train on 1712 samples, validate on 428 samples
Epoch 1/50
1712/1712 [==============================] - 52s 30ms/step - loss: 8.0430e-04 - acc: 0.8125 - val_loss: 9.4660e-04 - val_acc: 0.7850
--adam start---
Train on 1712 samples, validate on 428 samples
Epoch 1/50
1712/1712 [==============================] - 47s 27ms/step - loss: 7.7599e-04 - acc: 0.8201 - val_loss: 9.8981e-04 - val_acc: 0.7757
--adamax start---
Train on 1712 samples, validate on 428 samples
Epoch 1/50
1712/1712 [==============================] - 54s 31ms/step - loss: 6.4941e-04 - acc: 0.8359 - val_loss: 9.2495e-04 - val_acc: 0.7991
Use K.clear_session(), which will clean up everything.
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def get_model():
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model
model = get_model()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
K.clear_session()  # destroys the current Keras/TF graph state
model1 = get_model()
model1.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
model1.fit(X, Y, epochs=150, batch_size=10, verbose=0)
K.clear_session()
This solution should solve your problem. Let me know if it works.
Recompiling the model does not change its state. Weights learned before compilation will be the same after compilation. You need to delete the model object to clear the weights, and create a new one before compiling again.
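A minimal sketch of that approach (build_model() is a hypothetical stand-in for however you construct your network):

# train with the first optimizer
model = build_model()
model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
model.fit(X_train, y_train, validation_split=0.2, epochs=epochs, batch_size=32)

# delete the trained model and build a fresh one, so the next optimizer
# starts from newly initialized weights instead of the trained ones
del model
model = build_model()
model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])
model.fit(X_train, y_train, validation_split=0.2, epochs=epochs, batch_size=32)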