Keras: EarlyStopping save best model

When I use the EarlyStopping callback, does Keras save the best model in terms of val_loss, or does it save the model at save_epoch = [best epoch in terms of val_loss] + YEARLY_STOPPING_PATIENCE_EPOCHS?
If it's the second option, how do I save only the best model?
Here is the code snippet:
import pandas as pd
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=YEARLY_STOPPING_PATIENCE_EPOCHS)
history = model.fit_generator(
train_generator,
steps_per_epoch=100, # 1 epoch = BATCH_SIZE * steps_per_epoch samples
epochs=N_EPOCHS,
validation_data=test_generator,
validation_steps=20,
callbacks=[early_stopping])
#Save train log to .csv
pd.DataFrame(history.history).to_csv('vgg16_binary_crossentropy_train_log.csv', index=False)
model.save('vgg16_binary_crossentropy.h5')

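In v2.2.4+ of Keras, EarlyStopping has a restore_best_weights parameter which, when set to True, restores the model weights from the epoch with the best value of the monitored quantity (here, the lowest val_loss). For example:
EarlyStopping(monitor='val_loss', patience=YEARLY_STOPPING_PATIENCE_EPOCHS, restore_best_weights=True)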
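To make the difference between the two options in the question concrete, here is a plain-Python simulation of the stopping rule. It does not call Keras, and the function name simulate_early_stopping is hypothetical. It assumes EarlyStopping's defaults of mode='min' and min_delta=0, and models the rule as: reset a wait counter whenever val_loss improves, stop once the counter reaches patience. Whether restore_best_weights also applies when training runs to the full epoch count without stopping has varied between versions; the sketch simply restores in that case too.

```python
def simulate_early_stopping(val_losses, patience, restore_best_weights=False):
    """Return (last_epoch, weights_epoch) for a min-mode, min_delta=0 rule.

    Epochs are 0-based indices into val_losses. This mimics the logic of
    Keras' EarlyStopping; it is a sketch, not the library code.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    last_epoch = len(val_losses) - 1  # if never stopped, training runs to the end
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                last_epoch = epoch
                break
    # Without restoring, the model keeps the weights from the last trained epoch.
    # (Assumption: restoring also happens when training never stopped early.)
    weights_epoch = best_epoch if restore_best_weights else last_epoch
    return last_epoch, weights_epoch


history = [0.90, 0.70, 0.60, 0.65, 0.66, 0.64, 0.67]
print(simulate_early_stopping(history, patience=3))  # → (5, 5)
print(simulate_early_stopping(history, patience=3, restore_best_weights=True))  # → (5, 2)
```

With patience=3 the best epoch is 2 and training stops after epoch 5 (= 2 + 3), so by default the model holds epoch 5's weights; only restore_best_weights=True brings it back to epoch 2.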

In my experience, the EarlyStopping callback does not save the model automatically; it just stops training, and when you then save the model manually you get the second option you describe (the weights from the last epoch, patience epochs after the best one).
To have your model saved each time val_loss decreases, see the following documentation page:
https://keras.io/callbacks/ and look at the "Example: model checkpoints" section, which shows exactly what to do.
Note that if you want to reuse your saved model, I have had better luck using save_weights in combination with saving the architecture as JSON. YMMV.
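The checkpoint approach relies on ModelCheckpoint(..., save_best_only=True): the file on disk is overwritten only when the monitored quantity improves, so after training it holds the best epoch regardless of when training stopped. Below is a plain-Python sketch of that rule, not the Keras code itself; the helper name best_checkpoint_epoch is hypothetical, mode='min' suits losses and mode='max' suits accuracies.

```python
import operator


def best_checkpoint_epoch(values, mode="min"):
    """Return the 0-based epoch whose model a save_best_only checkpoint
    would hold at the end of training, or None if nothing was saved."""
    better = operator.lt if mode == "min" else operator.gt
    # Start from +inf (min) or -inf (max) so the first epoch always counts
    # as an improvement.
    best_value = float("inf") if mode == "min" else float("-inf")
    saved_epoch = None
    for epoch, value in enumerate(values):
        if better(value, best_value):
            best_value = value
            saved_epoch = epoch  # the checkpoint file is overwritten here
    return saved_epoch


val_loss = [0.90, 0.70, 0.60, 0.65, 0.66, 0.64]
val_acc = [0.55, 0.70, 0.81, 0.79, 0.83, 0.80]
print(best_checkpoint_epoch(val_loss, mode="min"))  # → 2
print(best_checkpoint_epoch(val_acc, mode="max"))   # → 4
```

Using this kind of checkpoint together with EarlyStopping gives both behaviours: training stops after patience epochs without improvement, while the file on disk still holds the best epoch.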

Related

Why doesn't Keras' ModelCheckPoint save my best model with the highest validation accuracy during training?

I am training a ResNet18 with Keras. As shown below, I used ModelCheckPoint to save the best model based on the validation accuracy.
model = ResNet18(2)
model.build(input_shape = (None,128,128,3))
model.summary()
model.save_weights('./Adam_resnet18_original.hdf5')
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
mcp_save = ModelCheckpoint('Adam_resnet18_weights.hdf5', save_best_only=True, monitor='val_accuracy', mode='max')
batch_size = 128
model.fit(generator(batch_size, x_train, y_train), steps_per_epoch = len(x_train) // batch_size, validation_data = generator(batch_size, x_valid, y_valid), validation_steps = len(x_valid) // batch_size, callbacks=[mcp_save], epochs = 300)
As shown in the picture below, the validation accuracy could go up to 0.8281 during training.
[Figure: Training history]
However, when I loaded the saved weights and computed the validation accuracy with the code below, I got an accuracy of only 0.78109. Can anybody enlighten me as to what the problem might be? Thanks a lot!
model.load_weights('Adam_resnet18_weights.hdf5')
predictions_validation = model.predict(generator(batch_size, x_valid, y_valid), steps = len(x_valid) // batch_size + 1)
predictions_validation_label = np.argmax(predictions_validation, axis=1)
Y_valid_label = np.argmax(Y_valid, axis=1)
accuracy_validation_conventional = accuracy_score(Y_valid_label, predictions_validation_label[:len(Y_valid_label)])
print(f'Accuracy on the validation set: {accuracy_validation_conventional}')

The biggest clue here is that the training accuracy is stuck at 1.000 for the last couple of epochs. From this, it appears that the model is overfitting. An intuitive picture of overfitting is a student taking the exact same test over and over again, to the point where they just memorize the answers to each question and are unable to adapt to small changes in wording. The net has "memorized" the training data but is unable to adapt to unseen data.
It's a little tricky to figure out the best approach since I don't know the size of the dataset you are working with or the details of the model. I am assuming that the dataset is of a decent size (if not, try data augmentation) and that you have defined a multi-layered net yourself (if you are importing this model from Keras, your options may be a little more limited). Here are some suggestions though:
Stop earlier. Set epochs to a smaller number to prevent overtraining. This is the simplest and easiest solution, and it makes sense in your case since training accuracy is already at 1.00 for the last several epochs. If you graph accuracy and loss over time, you will be able to visually pinpoint the epoch where overfitting begins, as you can see in this example. There are fancier ways to implement early stopping, but simply running for fewer epochs will probably be sufficient for your purposes.
Add dropout layers. Put simply, dropout randomly "turns off" a fraction of the units (neurons) at each training step, which prevents the network from over-relying on a small subset of nodes. This is also a common technique to prevent overfitting.
A fuller explanation along with other suggestions can be found here. Hope this was helpful!
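For intuition about what a dropout layer does, here is a plain-Python sketch of inverted dropout, the scheme used by Keras' Dropout layer: during training each unit's activation is zeroed with probability rate and the survivors are scaled by 1 / (1 - rate) so the expected activation is unchanged; at inference the input passes through untouched. The function name and the list-of-floats representation are illustrative assumptions, not Keras API.

```python
import random


def inverted_dropout(activations, rate, training, rng=None):
    """Drop each activation with probability `rate` during training.

    Surviving activations are scaled by 1 / (1 - rate) so that the expected
    value of every unit matches what it would be without dropout.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: identity
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * scale for a in activations]


layer_output = [0.2, 1.5, -0.7, 0.9, 0.4]
# Training: a random subset is zeroed, the rest doubled (rate=0.5).
print(inverted_dropout(layer_output, rate=0.5, training=True, rng=random.Random(0)))
# Inference: unchanged.
print(inverted_dropout(layer_output, rate=0.5, training=False))
```

Because a different random subset is dropped at every step, no single unit can be relied on, which is the regularizing effect the answer describes.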

Is there any method/function within Keras to recover the weights of a model during different training epochs?

I have a model that I want to train for 10 epochs with a certain hyperparameter setting. After training, I will use the history.history object to find the epoch where the validation loss was at a minimum. Once I have this best-scoring epoch, I would like to retrieve that model and use it to predict the test data. Now, imagine that my best-scoring epoch was not the last one. Is there any option within the Keras History object (such as history.model) to retrieve past values of the weights? If there is not, I imagine I would have to create a dictionary and temporarily store the model for each epoch until training finishes and I can find the best one. But model.fit has no option to store the model at each epoch, right? How would you do this?
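The dictionary idea from the question can be sketched as follows. In Keras the snapshot would typically be taken in a custom Callback's on_epoch_end via self.model.get_weights() and applied later with model.set_weights(); to keep the sketch runnable without TensorFlow, a hypothetical train_one_epoch function and plain lists stand in for the model and its weights. Keeping one copy per epoch costs memory proportional to the number of epochs; the answers below instead write only the best model to disk.

```python
import copy


def train_one_epoch(weights, epoch):
    """Hypothetical stand-in for one epoch of model.fit: nudges the weights
    and returns a made-up validation loss that bottoms out at epoch 3."""
    new_weights = [w + 0.1 for w in weights]
    val_loss = (epoch - 3) ** 2 + 1.0
    return new_weights, val_loss


weights = [0.0, 0.0]
snapshots = {}   # epoch -> copy of the weights after that epoch
val_losses = {}  # epoch -> validation loss after that epoch
for epoch in range(10):
    weights, val_loss = train_one_epoch(weights, epoch)
    # Deep copy so later updates cannot mutate the stored snapshot.
    snapshots[epoch] = copy.deepcopy(weights)
    val_losses[epoch] = val_loss

best_epoch = min(val_losses, key=val_losses.get)
best_weights = snapshots[best_epoch]  # in Keras: model.set_weights(best_weights)
print(best_epoch, best_weights)
```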

Keras offers the option of evaluating your model on validation data after each epoch.
After dividing your data into training, test, and validation sets, you can train your model like this:
import numpy as np
from keras.callbacks import ModelCheckpoint
from keras.models import load_model
# modelmlp, hidden, epochs, batch and verbose are defined elsewhere in your code
model=modelmlp(np.shape(x_train)[0],hidden,4)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
hist=model.fit(x_train,y_train,epochs=epochs,batch_size=batch,
validation_data=(x_valid,y_valid),verbose=verbose[1],
callbacks=[ModelCheckpoint(filepath='bestweigths.hdf5',
monitor='val_loss',verbose=verbose[2],save_best_only=True,mode='min')])
model=load_model('bestweigths.hdf5')
This code will train your model and evaluate it on the validation data after each epoch. Every time the result on the validation data improves, the model is saved to a file.
After the training process ends, you just need to load the model from the file.

You can use Keras' callback classes for this.
You can save the model based on whichever metric you need. For example, say you want to save the model with the minimum validation loss; you'll have to define a ModelCheckpoint.
First, before training the model, define the checkpoint in the format given below:
from keras.callbacks import ModelCheckpoint
callbacks = [ModelCheckpoint(filepath='resNet_centering2.h5', monitor='val_loss', mode='min', save_best_only=True)]
Now that you have defined the callback, pass it to the model.fit call:
history = model.fit(
x=X_train,
y=Y_train,
callbacks=callbacks,
batch_size=4,
epochs=100,
verbose=1,
validation_data=(X_test, Y_test))
This will save the best model (architecture and weights) to the defined filepath, and you can load it back with the call below:
from keras.models import load_model
model=load_model('resNet_centering2.h5')
I hope this solves your problem.

How to save a TensorFlow model after a certain amount of epochs?

I have a model that trains on images, and I want to know how to save the model after a certain number of epochs so that I have multiple reference points rather than just one saved model at the end. Also, how do I specify the folder or directory in which I would like to save the model?
Here's an example. Where would I add the new code to save after a number of epochs? (Also, a side question: would the model.save command at the end work? I haven't started training, and I don't want to get to the end only to find that the model is not saving.)
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])
# Adam optimizer
# loss function will be categorical cross entropy
# evaluation metric will be accuracy
step_size_train=train_generator.n//train_generator.batch_size
model.fit_generator(generator=train_generator,
steps_per_epoch=step_size_train,
epochs=15)
model.save('C:\Users\Omar\Desktop\trainedmodel.h5')

You can use the Keras ModelCheckpoint callback. Here is the code (period=5 saves a checkpoint every 5 epochs):
checkpoint = keras.callbacks.ModelCheckpoint('model{epoch:08d}.h5', period=5)
Then pass it to fit_generator like this:
model.fit_generator(generator=train_generator,
steps_per_epoch=step_size_train,
epochs=15,
callbacks=[checkpoint])
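On the directory question: ModelCheckpoint fills the {epoch} placeholder in its filepath with the 1-based epoch number (plus any logged metrics you reference), so putting a folder in front of the filename pattern is enough. Separately, a Windows path written as an ordinary string such as 'C:\Users\...' fails to compile in Python 3, because \U begins a Unicode escape; a raw string (r'...') or forward slashes avoids that. The sketch below only computes which files period=5 would produce over 15 epochs in a hypothetical checkpoints folder; it does not call Keras. Note also that newer tf.keras versions deprecate period in favour of save_freq, whose integer form counts batches rather than epochs.

```python
from pathlib import PureWindowsPath

# Hypothetical target folder; the raw-string prefix keeps "\U" from being
# read as a Unicode escape.
save_dir = PureWindowsPath(r"C:\Users\Omar\Desktop\checkpoints")
pattern = "model{epoch:08d}.h5"  # same pattern as in the answer above


def checkpoint_files(n_epochs, period):
    """List the files a period-based checkpoint would write, assuming a save
    happens every `period` epochs counted from the start of training."""
    files = []
    for epoch in range(1, n_epochs + 1):  # Keras fills {epoch} 1-based
        if epoch % period == 0:
            files.append(str(save_dir / pattern.format(epoch=epoch)))
    return files


for path in checkpoint_files(n_epochs=15, period=5):
    print(path)
```

In the real callback you would pass the full pattern as a string, e.g. keras.callbacks.ModelCheckpoint(str(save_dir / pattern), period=5); str() leaves the {epoch:08d} placeholder intact for Keras to fill in.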

Keras load model after saving makes random predictions in a new python session

I am using TensorFlow version '2.0.0' and Keras version '2.3.0' to develop the model. Here's how I set the random seeds:
seed = 1234
random.seed(seed)
np.random.seed(seed)
tf.compat.v1.random.set_random_seed(seed)
I then save the entire model as instructed here:
model.save('some_model_name.h5')
I am getting an accuracy of about 95% during training. When I load the model in a different Python session, like this:
# Recreate the exact same model
new_model = load_model('some_model_name.h5', custom_objects={'SeqSelfAttention': SeqSelfAttention})
score = new_model.evaluate([x_img_train, x_txt_train], y_train, verbose=2)
print("%s: %.2f%%" % (new_model.metrics_names[1], score[1]*100))
The accuracy now is about 4%. Please note that I have batch norm and dropout layers. How can I make the predictions of my model consistent across different sessions?

First, I downgraded TensorFlow to version 1.13.1, owing to stability issues with 2.0.0.
Second, I had to ensure a few things before I could achieve some level of reproducibility:
Using the Adagrad optimizer instead of Adam gave me performance comparable to the training session. With Adam, the predictions varied a lot every time I loaded the session.
Loading the architecture from JSON and then loading the model weights gave me different results compared to saving and loading the weights only. The former approach seemed to produce performance comparable to training.
Using a tf.Session to train, saving it, and restoring the tf.Session in a new Python session did the trick.
There was no variation in the results with or without dropout or batch norm.
Please note that following these steps gave me some level of consistency, although it's not 100% reproducible. If you're facing a similar issue, perhaps these insights could help.

After loading the model in a new kernel instance, make sure to configure the loss and metrics again with .compile(), in the same way you did before saving.
For example:
import tensorflow as tf
old_model = tf.keras.Sequential([ ... ])
old_model.compile(loss = 'mean_squared_error', optimizer = 'sgd', metrics = ['accuracy'])
old_model.fit(train_ds, validation_data=valid_ds, epochs=3)
old_model.evaluate(test_ds)
old_model.save('some_model_name.h5')
Then in the new kernel:
from tensorflow.keras.models import load_model
new_model = load_model("some_model_name.h5")
new_model.compile(loss = 'mean_squared_error', optimizer = 'sgd', metrics = ['accuracy'])
new_model.evaluate(test_ds) # should be the same now

when to call compile while training a tensorflow (2.0) model in incremental fashion?

I am writing a neural network to train incrementally (not online). Here is a snippet of the code:
output = create_model()
model = Model(inputs=values, outputs=output)
if start_epoch > 1:
    weights_list = load_model_from_pickle()
    model.set_weights(weights_list)
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(data , label, epochs=1, verbose=1, batch_size=1024, shuffle=False)
In essence, I want to load previously trained weights and train for a few more epochs. I read in an SO answer that calling compile changes the weights. Is that true, and is there any other way to do it? Does it make sense to set the weights after calling compile? Will the answer change if I run my model in a multi-GPU setting?
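The question's load_model_from_pickle helper is not shown. A minimal sketch of such a pair, assuming the pickled object is the list returned by model.get_weights() (in Keras, a list of NumPy arrays, which pickle handles), might look like the following. Both function names and the path parameter are hypothetical, and plain lists stand in for the arrays so the sketch runs without TensorFlow.

```python
import os
import pickle
import tempfile


def save_weights_to_pickle(weights, path):
    """Serialize a list of weight arrays (e.g. from model.get_weights())."""
    with open(path, "wb") as f:
        pickle.dump(weights, f)


def load_model_from_pickle(path):
    """Load the list back; pass it to model.set_weights() before fitting."""
    with open(path, "rb") as f:
        return pickle.load(f)


weights = [[0.1, -0.2], [0.3]]  # stand-in for model.get_weights()
path = os.path.join(tempfile.mkdtemp(), "weights_epoch_1.pkl")
save_weights_to_pickle(weights, path)
restored = load_model_from_pickle(path)
print(restored == weights)  # → True
```

Unlike model.save(), this stores only the weights, not the optimizer state, so an optimizer such as Adam restarts its moment estimates each time training resumes.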

You need to compile the model once; after training, when you reload the model, you don't need to compile it again. Read more here.
The compile function defines the optimizer, loss function, and metrics you want. It does not change any weights. For more detailed information, read here.
