Resume Training tf.keras Tensorboard - python

I encountered some problems when I continued training my model and visualized the progress on tensorboard.
My question is how do I resume training from the same step without specifying any epoch manually? If possible, simply by loading the saved model, it somehow could read the global_step from the optimizer saved and continue training from there.
I have provided some codes below to reproduce similar errors.
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import load_model
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, callbacks=[Tensorboard()])
model.save('./final_model.h5', include_optimizer=True)
del model
model = load_model('./final_model.h5')
model.fit(x_train, y_train, epochs=10, callbacks=[Tensorboard()])
You can run the tensorboard by using the command:
tensorboard --logdir ./logs

You can set the parameter initial_epoch in the function model.fit() to the number of the epoch you want your training to start from. Take into account that the model trains until the epoch of index epochs is reached (and not a number of iterations given by epochs).
In your example, if you want to train for 10 epochs more, it should be:
model.fit(x_train, y_train, initial_epoch=9, epochs=19, callbacks=[Tensorboard()])
It will allow you to visualise your plots on Tensorboard in a correct manner.
More extensive information about these parameters can be found in the docs.

Here is sample code in case someone needs it. It implements the idea proposed by Abhinav Anand:
mca = ModelCheckpoint(join(dir, 'model_{epoch:03d}.h5'),
monitor = 'loss',
save_best_only = False)
tb = TensorBoard(log_dir = join(dir, 'logs'),
write_graph = True,
write_images = True)
files = sorted(glob(join(fold_dir, 'model_???.h5')))
if files:
model_file = files[-1]
initial_epoch = int(model_file[-6:-3])
print('Resuming using saved model %s.' % model_file)
model = load_model(model_file)
else:
model = nn.model()
initial_epoch = 0
model.fit(x_train,
y_train,
epochs = 100,
initial_epoch = initial_epoch,
callbacks = [mca, tb])
Replace nn.model() with your own function for defining the model.

It's very simple. Create checkpoints while training the model and then use those checkpoints to resume training from where you left of.
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, callbacks=[Tensorboard()])
model.save('./final_model.h5', include_optimizer=True)
model = load_model('./final_model.h5')
callbacks = list()
tensorboard = Tensorboard()
callbacks.append(tensorboard)
file_path = "model-{epoch:02d}-{loss:.4f}.hdf5"
# now here you can create checkpoints and save according to your need
# here period is the no of epochs after which to save the model every time during training
# another option is save_weights_only, for your case it should be false
checkpoints = ModelCheckpoint(file_path, monitor='loss', verbose=1, period=1, save_weights_only=False)
callbacks.append(checkpoints)
model.fit(x_train, y_train, epochs=10, callbacks=callbacks)
After this just load the checkpoint from where you want to resume training again
model = load_model(checkpoint_of_choice)
model.fit(x_train, y_train, epochs=10, callbacks=callbacks)
And you are done.
Let me know if you have more questions about this.

Related

Why can't I save my model with tensorflow lite?

Once the training is finished, what I need is to save and convert the model to later export it, but I get the following error:
converter = tf.lite.TFLiteConverter.from_keras_model_file('models/modelo.h5')
AttributeError: type object 'TFLiteConverterV2' has no attribute 'from_keras_model_file'
to be honest I found a problem similar to this on the web but it doesn't suit my problem. Also who gives the answer is not very explicit.
here my code:
import tensorflow as tf
from tensorflow import keras
#dataset
mnist = keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
#class myCallback(tf.keras.callbacks.Callback):
# def on_epoch_end(self, epoch, logs={}):
# If you are using Tensorflow 1.x, replace 'accuracy' for 'acc' in the next line
# if(logs.get('accuracy')>0.99):
# print("\nReached 99.0% accuracy so cancelling training!")
# self.model.stop_training = True
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
model.fit(x_train,
y_train,
epochs=25,)
# callbacks=[myCallback()])
# Evaluate the model
model.evaluate(x_test, y_test)
# Save the model
model.save('models/modelo.h5')
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model_file('models/modelo.h5')
tflite_model = converter.convert()
open("models/converted_mnist_model.tflite", "wb").write(tflite_model)

Any difference?

I have 2 pieces of code written using tensorflow.
One is this:
import tensorflow as tf
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if(logs.get('accuracy')>0.99):
print("\nReached 99% accuracy so cancelling training!")
self.model.stop_training = True
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
callbacks = myCallback()
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, callbacks=[callbacks])
The other one is this:
import tensorflow as tf
def train_mnist():
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if(logs.get('accuracy')>99):
print("\n Se incheie antrenamentul")
self.model.stop_training = True
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
callbacks = myCallback()
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# model fitting
history = model.fit(x_train, y_train, epochs = 10, callbacks=[callbacks])
# model fitting
return history.epoch, history.history['acc'][-1]
train_mnist()
The first one gives an accuracy of 0.99 after 3 or 4 epochs. The second one gives an accuracy of 0.91 after 10 epochs. Why? They both look the same to me. Any ideas?
They both are almost identical. I have just checked both accuracy. The only reason why your accuracy is showing different is because in 2nd code you have returned
history.history['acc'][-1]
instead of
history.history['accuracy'][-1]
Also you need to save the history of 1st code for comparision like this:
history = model.fit(x_train, y_train, epochs = 10, callbacks=[callbacks])
Also figured out that you have stopped model training outside of if condition in 1st code.
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if(logs.get('accuracy')>0.99):
print("\nReached 99% accuracy so cancelling training!")
self.model.stop_training = True
It should be like this:
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if(logs.get('accuracy')>0.99):
print("\nReached 99% accuracy so cancelling training!")
self.model.stop_training = True
Both code must be giving around 0.99% accuracy.
Since your 2nd code is not showing the exact accuracy. I am posting the whole modified code for your 2nd code
import tensorflow as tf
from os import path, getcwd, chdir
path = f"{getcwd()}/../tmp2/mnist.npz"
def train_mnist():
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if(logs.get('accuracy')>99):
print("\n Se incheie antrenamentul")
self.model.stop_training = True
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
callbacks = myCallback()
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# model fitting
history = model.fit(x_train, y_train, epochs = 10, callbacks=[callbacks])
# model fitting
return history.epoch, history.history['accuracy'][-1]
train_mnist()

How to get other metrics in Tensorflow 2.0 (not only accuracy)?

I'm new in the world of Tensorflow and I'm working on the simple example of mnist dataset classification. I would like to know how can I obtain other metrics (e.g precision, recall etc) in addition to accuracy and loss (and possibly to show them). Here's my code:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import TensorBoard
import os
#load mnist dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
#create and compile the model
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
#model checkpoint (only if there is an improvement)
checkpoint_path = "logs/weights-improvement-{epoch:02d}-{accuracy:.2f}.hdf5"
cp_callback = ModelCheckpoint(checkpoint_path, monitor='accuracy',save_best_only=True,verbose=1, mode='max')
#Tensorboard
NAME = "tensorboard_{}".format(int(time.time())) #name of the model with timestamp
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME))
#train the model
model.fit(x_train, y_train, callbacks = [cp_callback, tensorboard], epochs=5)
#evaluate the model
model.evaluate(x_test, y_test, verbose=2)
Since I get only accuracy and loss, how can i get other metrics?
Thank you in advance, I'm sorry if it is a simple question or If was already answered somewhere.
I am adding another answer because this is the cleanest way in order to compute these metrics correctly on your test set (as of 22nd of March 2020).
The first thing you need to do is to create a custom callback, in which you send your test data:
import tensorflow as tf
from tensorflow.keras.callbacks import Callback
from sklearn.metrics import classification_report
class MetricsCallback(Callback):
def __init__(self, test_data, y_true):
# Should be the label encoding of your classes
self.y_true = y_true
self.test_data = test_data
def on_epoch_end(self, epoch, logs=None):
# Here we get the probabilities
y_pred = self.model.predict(self.test_data))
# Here we get the actual classes
y_pred = tf.argmax(y_pred,axis=1)
# Actual dictionary
report_dictionary = classification_report(self.y_true, y_pred, output_dict = True)
# Only printing the report
print(classification_report(self.y_true,y_pred,output_dict=False)
In your main, where you load your dataset and add the callbacks:
metrics_callback = MetricsCallback(test_data = my_test_data, y_true = my_y_true)
...
...
#train the model
model.fit(x_train, y_train, callbacks = [cp_callback, metrics_callback,tensorboard], epochs=5)
Starting from TensorFlow 2.X, precision and recall are both available as built-in metrics.
Therefore, you do not need to implement them by hand. In addition to this, they were removed before in Keras 2.X versions because they were misleading --- as they were being computed in a batch-wise manner, the global(true) values of precision and recall would be actually different.
You can have a look here : https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Recall
Now they have a built-in accumulator, which ensures the correct calculation of those metrics.
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy',tf.keras.metrics.Precision(),tf.keras.metrics.Recall()])
There is a list of available metrics in the Keras documentation. It includes recall, precision, etc.
For instance, recall:
model.compile('adam', loss='binary_crossentropy',
metrics=[tf.keras.metrics.Recall()])
I could not get Timbus' answer to work and I found a very interesting explanation here.
It says:
The meaning of 'accuracy' depends on the loss function. The one that corresponds to sparse_categorical_crossentropy is tf.keras.metrics.SparseCategoricalAccuracy(), not tf.metrics.Accuracy().
Which makes a lot of sense.
So what metrics you can use depend on the loss you chose. E.g. using the metric 'TruePositives' won't work in the case of SparseCategoricalAccuracy, because that loss means you're working with more than 1 class, which in turn means True Positives cannot be defined because it is only used in binary classification problems.
A loss like tf.keras.metrics.CategoricalCrossentropy() will work because it is designed with multiple classes in mind! Example:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import TensorBoard
import time
import os
#load mnist dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
#create and compile the model
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model.summary()
# This will work because it makes sense
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=[tf.keras.metrics.SparseCategoricalAccuracy(),
tf.keras.metrics.CategoricalCrossentropy()])
# This will not work because it isn't designed for the multiclass classification problem
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=[tf.keras.metrics.SparseCategoricalAccuracy(),
tf.keras.metrics.TruePositives()])
#model checkpoint (only if there is an improvement)
checkpoint_path = "logs/weights-improvement-{epoch:02d}-{accuracy:.2f}.hdf5"
cp_callback = ModelCheckpoint(checkpoint_path,
monitor='accuracy',
save_best_only=True,
verbose=1,
mode='max')
#Tensorboard
NAME = "tensorboard_{}".format(int(time.time())) # name of the model with timestamp
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME))
#train the model
model.fit(x_train, y_train, epochs=5)
#evaluate the model
model.evaluate(x_test, y_test, verbose=2)
In my case the other 2 answers gave me shape mismatches.
For a list of supported metrics, see:
tf.keras Metrics
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy',tf.keras.metrics.Precision(),tf.keras.metrics.Recall()])

Load epochs logs Keras model

I load the Keras model I have been training with 150 epochs
tbCallBack = tensorflow.keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
my_model.fit(X_train, X_train,
epochs=200,
batch_size=100,
shuffle=True,
validation_data = (X_test, X_test),
callbacks=[tbCallBack]
)
# Save the model
my_model.save('my_model.hdf5')
Then, I will load the Keras model
my_model = load_model("my_model.hdf5")
Is there a way to load all the epochs logs (loss, accuracy.. ) ?
You can use the keras callback called CSVLogger.
According to the documentation, it streams the results from each epoch into a csv file.
This is the code from the documentation of it.
from keras.callbacks import CSVLogger
csv_logger = CSVLogger('training.log')
model.fit(X_train, Y_train, callbacks=[csv_logger])
You can then manipulate it as a normal CSV file, for your needs.

How can I restart model.fit after x epochs if loss remains high?

I am having trouble sometimes getting a fit on my data, and when I restart fit (with shuffle=true) then I sometimes get a good fit.
See my previous question:
https://datascience.stackexchange.com/questions/62516/why-does-my-model-sometimes-not-learn-well-from-same-data
As a work around, I want to automatically restart the fitting process, if loss is high after x epochs. How can I achieve this?
I assume I would need to use a custom version of EarlyStopping callback? How could I differentiate between ES because of finding a low loss ( < 0.5) so training is finished, or because loss > 0.5 after x epochs so need to restart training?
Here is a simplified structure:
def train_till_good():
while not_finished:
train()
def train():
load_data()
model = VerySimpleNet2();
checkpoint = keras.callbacks.ModelCheckpoint(filepath=images_root + dataset_name + '\\CheckPoint.hdf5')
myOpt = keras.optimizers.Adam(lr=0.001,decay=0.01)
model.compile(optimizer=myOpt, loss='categorical_crossentropy', metrics=['accuracy'])
LRS = CyclicLR(base_lr=0.000005, max_lr=0.0003, step_size=200.)
tensorboard = keras.callbacks.TensorBoard(log_dir='C:\\Tensorflow', histogram_freq=0,write_graph=True, write_images=False)
ES = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)
model.fit(train_images, train_labels, shuffle=True, epochs=num_epochs,
callbacks=[checkpoint,
tensorboard,
ES,
LRS],
validation_data = (test_images, test_labels)
)
def VerySimpleNet2():
model = keras.Sequential([
keras.layers.Dense(112, activation=tf.nn.relu, input_shape=(224, 224, 3)),
keras.layers.Dropout(0.4),
keras.layers.Flatten(),
keras.layers.Dense(3, activation=tf.nn.softmax)
])
return model

Categories

Resources