keras model.save() isn't saving - python

I have a Keras NN that I want to train and validate using two sets of data, and then test its final performance on a third set. To avoid rerunning the training every time I restart my Google Colab runtime or change my test data, I want to save the final state of the model after training in one script and then load it again in another script.
I've looked everywhere and it seems that model.save("content/drive/My Drive/Directory/ModelName", save_format='tf') should do the trick, but even though it outputs INFO:tensorflow:Assets written to: content/drive/My Drive/Directory/ModelName/assets nothing appears in my Google Drive, so I assume it isn't actually saving.
Please can someone help me solve this issue?
Thanks in advance!

The standard way of saving and retrieving your model's state after Google Colab terminates your connection is to use the ModelCheckpoint callback. This Keras callback runs after each epoch and can save your model, for instance whenever there is an improvement. Here are the steps needed to accomplish what you want:
Connect to Google Drive
Use this code to mount your Google Drive:
from google.colab import drive
drive.mount('/content/gdrive')
Give access to Google Colab
You'll then be presented with a link; open it, authorize Google Colab, and paste the given authorization code into the text box that appears.
Define your ModelCheckpoint
This is how you could define your ModelCheckpoint's callback:
from keras.callbacks import ModelCheckpoint
filepath="/content/gdrive/My Drive/MyCNN/epochs:{epoch:03d}-val_acc:{val_acc:.3f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
Use it as a callback while training the model
Then you need to tell your model to run this callback after each epoch so that it saves the model's state:
model.fit(X_train, y_train,
          batch_size=64,
          epochs=epochs,
          verbose=1,
          validation_data=(X_val, y_val),
          callbacks=callbacks_list)
Load the model after Google Colab terminates
Finally, after your session has been terminated, you can restore your previous model's state by simply running the following code. Don't forget to re-define your model first, and only load the weights at this stage.
model.load_weights('/content/gdrive/My Drive/MyCNN/epochs:047-val_acc:0.905.hdf5')
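If you still want to save the whole model once with model.save(), as attempted in the question, the key point is that Google Drive must be mounted and the path must be absolute: the question's path "content/drive/..." has no leading slash, so it is written relative to the Colab working directory rather than into Drive. A minimal sketch, assuming the drive is mounted at /content/gdrive as above and using an illustrative file name:

# Sketch: save the full trained model to the mounted Drive (path is illustrative).
model.save('/content/gdrive/My Drive/MyCNN/final_model', save_format='tf')

# Later, in another script/session, after mounting Drive again:
from tensorflow.keras.models import load_model
model = load_model('/content/gdrive/My Drive/MyCNN/final_model')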
Hope that this answers your question.

Related

How to continue printing the intermediate results in Jupyter after reconnection?

Today, I used Jupyter to run a deep learning model remotely.
After the browser was disconnected for some time, I reconnected to the running kernel, but Jupyter did not continue to print the intermediate output.
From the GPU usage and the Jupyter command line, I can see that the kernel is still running.
Is there any way I can continue to observe the intermediate output of the kernel?
(Screenshot: the state of the running kernel.)
A Google Colab session with the browser open usually lasts about 12 hours.
The best way to protect your progress is to checkpoint your deep learning model so you don't lose the last trained state.
This is an example of how you can use the checkpoint callback while training; more examples and details can be found here.
import os
import tensorflow as tf

# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
batch_size = 32
# Create a callback that saves the model's weights periodically.
# Note: save_freq is measured in batches, so 5*batch_size means
# "every 5*batch_size batches", which only equals 5 epochs if an
# epoch happens to contain batch_size batches.
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=5*batch_size)
# Create a new model instance (create_model() is assumed to build and compile it)
model = create_model()
# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))
# Train the model with the new callback
model.fit(train_images,
          train_labels,
          epochs=50,
          batch_size=batch_size,
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
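Since the question is specifically about seeing intermediate output after reconnecting, a complementary option (a sketch, not part of the quoted example) is to also log the per-epoch metrics to a file with CSVLogger; you can then open or tail that file at any time, even after the browser reconnects.

# Sketch: also write per-epoch metrics to a log file so they survive a disconnect.
from tensorflow.keras.callbacks import CSVLogger

csv_logger = CSVLogger("training.log", append=True)

model.fit(train_images,
          train_labels,
          epochs=50,
          batch_size=batch_size,
          callbacks=[cp_callback, csv_logger],
          validation_data=(test_images, test_labels),
          verbose=0)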

Apply any callback in a custom training loop in TF2.0

I am writing a custom CycleGAN training loop following TF's documentation. I'd like to use several existing callbacks, and would prefer not to rewrite their logic.
I've found this SO question, whose answers unfortunately focus on EarlyStopping rather than on the broad range of callbacks.
I further found this reddit post, which suggested calling them manually. Several callbacks, however, work with an internal model object (they call self.model, which is the model they are applied to).
How can I simply use the existing callbacks in a custom training loop? I appreciate any code outlines or further recommendations!
If you want to use the existing callbacks, you create the callbacks you want and pass them to the callbacks argument of model.fit().
For example, to use ModelCheckpoint and EarlyStopping:
checkpoint = keras.callbacks.ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
es = keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=1)
Create a callbacks_list
callbacks_list = [checkpoint, es]
Pass the callbacks_list to the callbacks argument while training the model:
model.fit(x_train,y_train,validation_data=(x_test, y_test),epochs=10,callbacks=callbacks_list)
Please refer to this gist for a complete code example. Thank you.
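If you really do need a custom training loop (as in the question) and cannot go through model.fit(), a rough sketch of driving the callbacks manually is below. It assumes you already have a compiled tf.keras model, a dataset, and your own train_step(batch) function returning a dict of metrics; those names are hypothetical. Wiring up self.model with set_model() is what lets callbacks like ModelCheckpoint and EarlyStopping work outside fit().

import tensorflow as tf

# Hypothetical setup: `model`, `dataset` and `train_step(batch) -> dict of metrics`
# are assumed to exist; the callbacks below are just examples.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint("ckpt.h5", save_weights_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="loss", patience=2),
]

for cb in callbacks:
    cb.set_model(model)        # gives each callback the self.model it expects

for cb in callbacks:
    cb.on_train_begin()

for epoch in range(10):
    for cb in callbacks:
        cb.on_epoch_begin(epoch)

    logs = {}
    for step, batch in enumerate(dataset):
        for cb in callbacks:
            cb.on_train_batch_begin(step)
        logs = train_step(batch)              # e.g. {"loss": 0.31}
        for cb in callbacks:
            cb.on_train_batch_end(step, logs)

    for cb in callbacks:
        cb.on_epoch_end(epoch, logs)          # ModelCheckpoint/EarlyStopping act here

    if getattr(model, "stop_training", False):  # EarlyStopping sets this flag
        break

for cb in callbacks:
    cb.on_train_end()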

Restoring correct version of tensorflow

A few weeks ago, I was working on a project and installed an older version of TensorFlow to try to fix a problem I was having. It didn't work as I had hoped, so I pip installed the newest version of TensorFlow, but now I'm regularly getting error messages related to TensorFlow being out of date. They don't stop program execution, but they are there. As far as I know, I have the most recent version installed, but I think I must be missing something. This is an example of one of the errors I'm getting: WARNING: tensorflow: Can save best model only with val_loss available, skipping. This happens when I try to save a Keras model using ModelCheckpoint. I get a different message when I use model.save(). It seems the issues arise whenever I try to save any model in any way. If anyone has any advice, I would love it.
I'm using Python on Google Colab. Please let me know if you need more info from me.
Edit: Adding code for ModelCheckpoint:
save=ModelCheckpoint("/content/drive/My Drive/Colab Notebooks/cavity data/Frequency Model.h5", save_best_only=True, verbose=1)
it was then called in model.fit() like this:
model.fit(X_train, Y_train, epochs=500, callbacks=[save, stop], verbose=1)
The default monitor for ModelCheckpoint is the validation loss or "val_loss".
As the warning suggests, the key "val_loss" is missing because you didn't use validation data in model.fit().
Either specify validation_split or validation_data in model.fit(), or just use the training loss or accuracy as the monitor for ModelCheckpoint, as in my example below.
monitor = "accuracy" # or "loss"
save = ModelCheckpoint("/content/drive/My Drive/Colab Notebooks/cavity data/Frequency Model.h5", monitor=monitor, save_best_only=True, verbose=1)
model.fit(X_train, Y_train, epochs=500, callbacks=[save, stop], verbose=1)
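Alternatively, if you do want to keep monitoring the validation loss, here is a minimal sketch of the first option (the validation_split value is just illustrative; passing explicit validation_data works the same way):

save = ModelCheckpoint("/content/drive/My Drive/Colab Notebooks/cavity data/Frequency Model.h5",
                       monitor="val_loss", save_best_only=True, verbose=1)
model.fit(X_train, Y_train, validation_split=0.2, epochs=500, callbacks=[save, stop], verbose=1)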

Python crashes when saving Keras model

I am building a model in Keras that contains roughly 4.2M parameters. When I try to save the model using ModelCheckpoint or using model.save('best_model.hdf5'), Python crashes.
The model runs without any issues when I comment out the code that saves the model, so there isn't any other issue that could be causing Python to crash.
My reasoning here is that the large number of parameters is causing Python to crash.
I have looked but haven't been able to find any solution.
Are there any alternatives available to save my model and reuse it in Keras? Or is there a way to fix this issue?
checkpoint = ModelCheckpoint(filepath, monitor='val_mean_squared_error', verbose=1, save_best_only=True, mode='max')
model.save(filepath)
Python doesn't shout out any error. This is all that pops up -
(Screenshot: Python error popup.)

How can I get a Keras model's history after loading it from a file in Python?

I have saved a Keras model as an HDF5 file and now want to load it from disk.
When training the model I use:
from keras.models import Sequential
model = Sequential()
H = model.fit(....)
When the model is trained, I want to load it from disk with
model = load_model()
How can I get H from the model variable? It unfortunately does not have a history parameter that I can just call. Is it because the save_model function doesn't save history?
Unfortunately, it seems that Keras hasn't implemented the possibility of loading the history directly from a loaded model; instead, you have to set it up in advance. This is how I solved it using CSVLogger (it's actually very convenient to store the entire training history in a separate file: that way you can always come back later and plot whatever you want, instead of depending on a variable held in RAM that is easy to lose).
First, we have to set up the logger before starting the training:
from keras.callbacks import CSVLogger
csv_logger = CSVLogger('training.log', separator=',', append=False)
model.fit(X_train, Y_train, callbacks=[csv_logger])
The entire training history will now be stored in the file 'training.log' (the same information you would get, in your case, by calling H.history). When training is finished, the next step is simply to load the data stored in this file. You can do that with pandas read_csv:
import pandas as pd
log_data = pd.read_csv('training.log', sep=',', engine='python')
From here on you can treat the data stored in log_data just as you would the data in H.history.
More information in Keras callbacks docs.
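For example, a small sketch of plotting from the logged data (the available column names depend on which metrics model.fit() actually tracked, e.g. 'loss', 'accuracy', 'val_loss'):

import matplotlib.pyplot as plt

# Plot the training loss recorded by CSVLogger; add more columns as needed.
log_data["loss"].plot()
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()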
Using pickle to save the History object itself threw a whole host of errors. As it turns out, you can pickle H.history instead of H to save your history!
Kind of annoying having to keep both a model file and a history file, but whatever.
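A minimal sketch of that approach (the file name is just illustrative):

import pickle

# After training, pickle the plain dict held in H.history (not the History object).
H = model.fit(X_train, Y_train, epochs=10)
with open("history.pkl", "wb") as f:
    pickle.dump(H.history, f)

# Later, alongside load_model(), restore the per-epoch metrics:
with open("history.pkl", "rb") as f:
    history = pickle.load(f)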
