My model stops training after the 4th epoch even though I expect it to continue training beyond that. I've set monitor to validation loss and patience to 2, which I thought means that training stops after validation loss increases consecutively for 2 epochs. However, training seems to stop before that happens.
I've defined EarlyStopping as follows:
from keras.callbacks import EarlyStopping

callbacks = [
    EarlyStopping(monitor='val_loss', patience=2, verbose=0),
]
And in the fit function I use it like this:
hist = model.fit_generator(
    generator(imgIds, batch_size=batch_size, is_train=True),
    validation_data=generator(imgIds, batch_size=batch_size, is_val=True),
    validation_steps=steps_per_val,
    steps_per_epoch=steps_per_epoch,
    epochs=epoch_count,
    verbose=verbose_level,
    callbacks=callbacks)
I don't understand why training ends after the 4th epoch.
Epoch 1/30
675/675 [==============================] - 1149s - loss: 0.1513 - val_loss: 0.0860
Epoch 2/30
675/675 [==============================] - 1138s - loss: 0.0991 - val_loss: 0.1096
Epoch 3/30
675/675 [==============================] - 1143s - loss: 0.1096 - val_loss: 0.1040
Epoch 4/30
675/675 [==============================] - 1139s - loss: 0.1072 - val_loss: 0.1019
Finished training intermediate1.
I think your interpretation of the EarlyStopping callback is a little off; it stops when the loss doesn't improve on the best loss it has ever seen for patience epochs. The best loss your model had was 0.0860 at epoch 1, and the loss did not improve in epochs 2 and 3, so training should have stopped after epoch 3. However, it continues to train for one more epoch due to what I would call an off-by-one error, given what the docs say about patience:
patience: number of epochs with no improvement after which training will be stopped.
From the Keras source code (edited slightly for clarity):
class EarlyStopping(Callback):
    def on_epoch_end(self, epoch, logs=None):
        current = logs.get(self.monitor)
        if np.less(current - self.min_delta, self.best):
            self.best = current
            self.wait = 0
        else:
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True
            self.wait += 1
Notice how self.wait isn't incremented until after the check against self.patience, so while your model should have stopped training after epoch 3, it continued for one more epoch.
Unfortunately, it seems that if you want a callback that behaves the way you described, stopping only after patience consecutive epochs without improvement over the previous epoch, you'd have to write it yourself. But I think you could accomplish this by modifying the EarlyStopping callback slightly; a rough sketch follows below.
Edit: The off-by-one error is fixed.
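For reference, here is a rough, untested sketch of a callback with the behaviour described in the question: it stops only after patience consecutive epochs in which the monitored value fails to improve on the immediately preceding epoch, rather than on the best value seen so far. The class name and attributes below are my own, not part of Keras.
import numpy as np
from keras.callbacks import Callback

class ConsecutiveEarlyStopping(Callback):
    def __init__(self, monitor='val_loss', patience=0):
        super().__init__()
        self.monitor = monitor
        self.patience = patience
        self.previous = np.inf  # monitored value from the previous epoch
        self.wait = 0           # consecutive epochs without improvement

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is None:
            return
        if current < self.previous:  # improved on the *previous* epoch
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.model.stop_training = True
        self.previous = current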
Related
I am using Keras for the first time on a regression problem. I have set up an early stopping callback, monitoring val_loss (which is mean squared error) with patience=3. However, the training stops even if val_loss is decreasing for the last few epochs. Either there is a bug in my code, or I fail to understand the true meaning of my callback. Can anyone understand what is going on? I provide the training progress and the model building code below.
As you see below, the training stopped at epoch 8, but val_loss has been decreasing since epoch 6 and I think it should have continued running. There was only one time when val_loss increased (from epoch 5 to 6), and patience is 3.
Epoch 1/100
35849/35849 - 73s - loss: 11317667.0000 - val_loss: 7676812.0000
Epoch 2/100
35849/35849 - 71s - loss: 11095449.0000 - val_loss: 7635795.0000
Epoch 3/100
35849/35849 - 71s - loss: 11039211.0000 - val_loss: 7627178.5000
Epoch 4/100
35849/35849 - 71s - loss: 10997918.0000 - val_loss: 7602583.5000
Epoch 5/100
35849/35849 - 65s - loss: 10955304.0000 - val_loss: 7599179.0000
Epoch 6/100
35849/35849 - 59s - loss: 10914252.0000 - val_loss: 7615204.0000
Epoch 7/100
35849/35849 - 59s - loss: 10871920.0000 - val_loss: 7612452.0000
Epoch 8/100
35849/35849 - 59s - loss: 10827388.0000 - val_loss: 7603128.5000
The model is built as follows:
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras import initializers

# create model
model = Sequential()
model.add(Dense(len(predictors), input_dim=len(predictors), activation='relu', name='input',
                kernel_initializer=initializers.he_uniform(seed=seed_value)))
model.add(Dense(155, activation='relu', name='hidden1',
                kernel_initializer=initializers.he_uniform(seed=seed_value)))
model.add(Dense(1, activation='linear', name='output',
                kernel_initializer=initializers.he_uniform(seed=seed_value)))

callback = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')

# Fit the model
history = model.fit(X, y, validation_split=0.2, epochs=100,
                    batch_size=50, verbose=2, callbacks=[callback])
After experimenting with some of the hyperparameters, such as the activation functions, I keep having the same problem. It doesn't always stop at epoch 8, though. I also tried changing patience.
Details:
Ubuntu 18.04
Tensorflow 2.6.0
Python 3.8.5
You are misunderstanding how Keras defines improvement. You are correct that val_loss decreased in epochs 7 and 8 and only increased in epoch 6. What you are missing, though, is that the improvements in epochs 7 and 8 did not beat the current best value from epoch 5 (7599179.0000). The best loss occurred in epoch 5, and your callback waited 3 epochs to see if anything could beat it, NOT whether there would be an improvement within those 3 epochs. In epoch 8, when the loss still had not dipped below the epoch 5 value, the callback terminated training.
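To make this concrete, here is a small standalone sketch (not the actual Keras implementation) that replays your logged val_loss values through the same best-so-far logic, assuming min_delta is 0:
import numpy as np

val_losses = [7676812.0, 7635795.0, 7627178.5, 7602583.5,
              7599179.0, 7615204.0, 7612452.0, 7603128.5]
patience = 3
best, wait = np.inf, 0
for epoch, loss in enumerate(val_losses, start=1):
    if loss < best:   # only beating the best value so far counts as improvement
        best, wait = loss, 0
    else:
        wait += 1
        if wait >= patience:
            print(f"stopping after epoch {epoch}; best val_loss was {best}")
            break
# prints: stopping after epoch 8; best val_loss was 7599179.0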
I'm training a CNN model using keras.
At the end of each epoch, I save the weights as a checkpoint if the validation accuracy has improved.
from keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint(checkpoint_path, monitor='val_accuracy', mode='max',
                             save_best_only=True, verbose=1)
callbacks = [checkpoint]
# load checkpoints if existing
import os
num_of_epochs = 65
epochs_done = 0
if os.path.exists(checkpoint_path):
    model.load_weights(checkpoint_path)
    num_of_epochs = num_of_epochs - epochs_done
    print('checkpoints loaded')
When I restart training after stopping, this is what my first epoch output looks like.
Epoch 1/65
425/425 [==============================] - 224s 526ms/step - loss: 2.1739 - accuracy: 0.2939 - val_loss: 2.1655 - val_accuracy: 0.2985
Epoch 00001: val_accuracy improved from -inf to 0.29846, saving model to checkpoints-finetuning.hdf5
I noticed this happening at the first epoch every time I restart training. Why does it happen? Does my checkpoint file get overwritten by less accurate weights each time I restart?
This is because the callback instance is recreated every time you run the script; its state isn't saved with the model. As such, on every fresh run the first epoch is compared against the callback's default best value, which is np.Inf or -np.Inf depending on the mode, as set when the callback is constructed. So yes, with save_best_only=True the first epoch of a restarted run will overwrite your checkpoint even if its validation accuracy is worse than what you had before.
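If you want a restarted run to respect the previously saved best value, one possible workaround is to evaluate the reloaded model on your validation data and seed the callback's internal best attribute before calling fit. This is only a sketch: it relies on an internal attribute rather than the public API, it assumes the model was compiled with metrics=['accuracy'], and val_x/val_y stand in for your validation data.
if os.path.exists(checkpoint_path):
    model.load_weights(checkpoint_path)
    # current accuracy of the saved weights on the validation set
    _, val_acc = model.evaluate(val_x, val_y, verbose=0)
    # otherwise the callback starts comparing against -inf for mode='max'
    checkpoint.best = val_acc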
I have managed to implement early stopping into my Keras model, but I am not sure how I can view the loss of the best epoch.
es = EarlyStopping(monitor='val_out_soft_loss',
                   mode='min',
                   restore_best_weights=True,
                   verbose=2,
                   patience=10)

model.fit(tr_x,
          tr_y,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=[es],
          validation_data=(val_x, val_y))

loss = model.history.history["val_out_soft_loss"][-1]
return model, loss
The way I have defined the loss score means that the returned score comes from the final epoch, not the best epoch.
Example:
from sklearn.model_selection import train_test_split, KFold
import numpy as np

losses = []
models = []
for k in range(2):
    kfold = KFold(5, random_state=42 + k, shuffle=True)
    for k_fold, (tr_inds, val_inds) in enumerate(kfold.split(train_y)):
        print("-----------")
        print("-----------")
        model, loss = get_model(64, 100)
        models.append(model)
        print(k_fold, loss)
        losses.append(loss)

print("-------")
print(losses)
print(np.mean(losses))
Epoch 23/100
18536/18536 [==============================] - 7s 362us/step - loss: 0.0116 - out_soft_loss: 0.0112 - out_reg_loss: 0.0393 - val_loss: 0.0131 - val_out_soft_loss: 0.0127 - val_out_reg_loss: 0.0381
Epoch 24/100
18536/18536 [==============================] - 7s 356us/step - loss: 0.0116 - out_soft_loss: 0.0112 - out_reg_loss: 0.0388 - val_loss: 0.0132 - val_out_soft_loss: 0.0127 - val_out_reg_loss: 0.0403
Restoring model weights from the end of the best epoch
Epoch 00024: early stopping
0 0.012735568918287754
So in this example, I would like to see the loss at Epoch 00014 (which is 0.0124).
I also have a separate question: How can I set the decimal places for the val_out_soft_loss score?
Assign the result of the fit() call in Keras to a variable so you can track the metrics through the epochs.
history = model.fit(tr_x, ...
fit() returns a History object whose history attribute is a dictionary; access it like this:
loss_hist = history.history['loss']
Then use min() to get the minimum loss and argmin() to get the best epoch (zero-based):
np.min(loss_hist)
np.argmin(loss_hist)
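Applied to the metric monitored in the question (assuming the 'val_out_soft_loss' key is present in history.history), that looks something like the sketch below; the format specifier in the print also controls how many decimal places are shown, which covers the side question:
import numpy as np

val_hist = history.history['val_out_soft_loss']
best_epoch = np.argmin(val_hist)  # zero-based index of the best epoch
best_loss = np.min(val_hist)
print(f"best epoch: {best_epoch + 1}, val_out_soft_loss: {best_loss:.4f}")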
I'm programming a neural network in tf.keras, with 3 layers. My dataset is the MNIST dataset. I decreased the number of examples in the dataset, so the runtime is lower. This is my code:
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import pandas as pd
!git clone https://github.com/DanorRon/data
%cd data
!ls
batch_size = 32
epochs = 10
alpha = 0.0001
lambda_ = 0
h1 = 50
train = pd.read_csv('/content/first-repository/mnist_train.csv.zip')
test = pd.read_csv('/content/first-repository/mnist_test.csv.zip')
train = train.loc['1':'5000', :]
test = test.loc['1':'2000', :]
train = train.sample(frac=1).reset_index(drop=True)
test = test.sample(frac=1).reset_index(drop=True)
x_train = train.loc[:, '1x1':'28x28']
y_train = train.loc[:, 'label']
x_test = test.loc[:, '1x1':'28x28']
y_test = test.loc[:, 'label']
x_train = x_train.values
y_train = y_train.values
x_test = x_test.values
y_test = y_test.values
nb_classes = 10
targets = y_train.reshape(-1)
y_train_onehot = np.eye(nb_classes)[targets]
nb_classes = 10
targets = y_test.reshape(-1)
y_test_onehot = np.eye(nb_classes)[targets]
model = tf.keras.Sequential()
model.add(layers.Dense(784, input_shape=(784,)))
model.add(layers.Dense(h1, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))
model.add(layers.Dense(10, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))
model.compile(optimizer=tf.train.GradientDescentOptimizer(alpha),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train_onehot, epochs=epochs, batch_size=batch_size)
Whenever I run it, one of 3 things happens:
1. The loss decreases and the accuracy increases for a few epochs, until the loss becomes NaN for no apparent reason and the accuracy plummets.
2. The loss and accuracy stay the same for each epoch. Usually the loss is 2.3025 and the accuracy is 0.0986.
3. The loss starts at NaN (and stays that way), while the accuracy stays low.
Most of the time, the model does one of these things, but sometimes it does something random. It seems like the type of erratic behavior that occurs is completely random. I have no idea what the problem is. How do I fix this problem?
Edit: Sometimes, the loss decreases, but the accuracy stays the same. Also, sometimes the loss decreases and the accuracy increases, then after a while the accuracy decreases while the loss still decreases. Or, the loss decreases and the accuracy increases, then it switches and the loss goes up fast while the accuracy plummets, eventually ending with loss: 2.3025 acc: 0.0986.
Edit 2: This is an example of something that sometimes happens:
Epoch 1/100
49999/49999 [==============================] - 5s 92us/sample - loss: 1.8548 - acc: 0.2390
Epoch 2/100
49999/49999 [==============================] - 5s 104us/sample - loss: 0.6894 - acc: 0.8050
Epoch 3/100
49999/49999 [==============================] - 4s 90us/sample - loss: 0.4317 - acc: 0.8821
Epoch 4/100
49999/49999 [==============================] - 5s 104us/sample - loss: 2.2178 - acc: 0.1345
Epoch 5/100
49999/49999 [==============================] - 5s 90us/sample - loss: 2.3025 - acc: 0.0986
Epoch 6/100
49999/49999 [==============================] - 4s 90us/sample - loss: 2.3025 - acc: 0.0986
Epoch 7/100
49999/49999 [==============================] - 4s 89us/sample - loss: 2.3025 - acc: 0.0986
Edit 3: I changed the loss to mean squared error and the network works well now. Is there a way to keep it in cross entropy without it converging to a local minimum?
I changed the loss to mean squared error and the network works well now
MSE is not the appropriate loss function for such classification problems; you should certainly stick to loss = 'categorical_crossentropy'.
Most probably, the issue is due to your MNIST data not being normalized; you should normalize the final variables as
x_train = x_train.values/255
x_test = x_test.values/255
Not normalizing input data is a known cause of exploding gradient problems, which is probably what is happening here.
Other advice: set activation='relu' for your first dense layer, and get rid of both the regularizer & initializer arguments from all layers (the default glorot_uniform is actually a better initializer, while regularization here may actually be harmful for the performance).
As a general advice, try not to reinvent the wheel - start with a Keras example using the built-in MNIST data...
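For reference, here is a minimal sketch of loading the built-in MNIST data and applying the normalization suggested above (the reshape to 784-dimensional vectors matches the flat input layer in the question):
import tensorflow as tf

# built-in MNIST data, no CSV parsing needed
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# flatten the 28x28 images to 784-dimensional vectors and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255

# one-hot encode the labels for categorical_crossentropy
y_train_onehot = tf.keras.utils.to_categorical(y_train, 10)
y_test_onehot = tf.keras.utils.to_categorical(y_test, 10)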
The frustration you're feeling towards the seemingly random output of your code is understandable and correctly identified. Every time the model begins training, it randomly initializes the weights; depending on this initialization, you see one of your three output scenarios.
The issue is most likely due to vanishing gradients. This is a phenomenon that occurs when backpropagation repeatedly multiplies very small weights by small numbers, producing values that are almost infinitely small. The solution is to add a small jitter (1e-10) to each of your gradients (from within the cost function) so that they never reach zero.
There are plenty of more detailed blog posts about vanishing gradients online, and for an implementation example check out line 217 of this TensorFlow network.
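One possible reading of "jitter from within the cost function" is to clip the predictions away from 0 and 1 before taking the logarithm, so the cross-entropy and its gradients stay finite. A sketch, where the function name and epsilon value are my own rather than anything from a library:
import tensorflow as tf

def stable_categorical_crossentropy(y_true, y_pred, eps=1e-10):
    # keep log() away from zero so the loss and its gradients never become NaN
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    return -tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1)

model.compile(optimizer='adam',
              loss=stable_categorical_crossentropy,
              metrics=['accuracy'])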
I am running:
D.fit(X_train, y_train, nb_epoch=12, validation_data=(X_train, y_train))
But I get outputs like:
Train on 61936 samples, validate on 61936 samples
Epoch 1/12
61936/61936 [==============================] - 10s 166us/step - loss: 0.0021 - val_loss: 1.5650e-04
Epoch 2/12
61936/61936 [==============================] - 10s 165us/step - loss: 0.0014 - val_loss: 6.6482e-04
...
Epoch 10/12
61936/61936 [==============================] - 11s 170us/step - loss: 0.0104 - val_loss: 9.6666e-05
This is a known issue:
https://github.com/keras-team/keras/issues/605
The other reason that the results are different is because the model is being trained while the "loss" is being computed, whereas the model is fixed while "val_loss" is being computed. Since the model is training, "loss" is typically going to be larger than the true training set loss at the end of the epoch. I.e. "loss" is the average loss during the epoch, and "val_loss" is the average loss after the end of the epoch. Since the model changes during the epoch, the loss changes.
These will never match. Validation loss is computed on the whole dataset at once (with the weights fixed), while training loss is the average of the loss across batches (the weights change after every batch). If you want the real loss on the training set, you should run model.evaluate(X_train, y_train).
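For example, with the variable names from the question (a quick sketch; assuming no extra metrics were compiled, evaluate() returns just the loss):
# loss on the training set with the weights fixed after training; this number
# is comparable to val_loss, unlike the running per-epoch "loss"
train_loss = D.evaluate(X_train, y_train, verbose=0)
print(f"training loss with fixed weights: {train_loss:.6f}")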