Google Colab not saving model or model checkpoints?

I've been using Colab to train my models, but it's quite infuriating that so far I have only been able to save the weights to my Google Drive, not the whole model, or even model checkpoints.
I mounted Google Drive with:
from google.colab import drive
drive.mount('/content/gdrive')
And I know that I can read files from the Drive as this code works:
import numpy as np
with np.load("/content/gdrive/MyDrive/trainingData.npz") as f:
    dataX = f["dataX"]
    dataY = f["dataY"]
And I set up the TPU using the following:
%tensorflow_version 2.x
import tensorflow as tf
print("Tensorflow version " + tf.__version__)
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
But when I run the following code, no model checkpoints get saved:
with tpu_strategy.scope():
    model = Sequential()
    model.add(LSTM(256, input_shape=(dataX.shape[1], dataX.shape[2])))
    model.add(Dropout(0.2))
    model.add(Dense(dataY.shape[1], activation="softmax"))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
filepath="/content/gdrive/MyDrive/weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
model.fit(dataX, dataY, epochs=50, batch_size=128)
I can't even just save the model normally: model.save("/content/gdrive/MyDrive/model") gives:
UnimplementedError: File system scheme '[local]' not implemented (file: 'model/variables/variables_temp/part-00000-of-00001')
Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
The interesting thing is that I can still save model weights, via model.save_weights("/content/gdrive/MyDrive/model.h5")
However, as I want to be able to save the whole model for future training, just saving the weights is not satisfactory.
What errors have I made and how can I save my model?
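
One thing visible in the snippet above is that callbacks_list is built but never passed to model.fit, so the ModelCheckpoint callback never runs; passing callbacks=callbacks_list is needed before any checkpoint can be written. Separately, the placeholders in filepath are filled in with Python's str.format from the epoch number and the logged metrics. A minimal sketch of that expansion, using made-up values (the names epoch and logs and the numbers are assumptions for illustration, not real training output):

```python
# Template taken from the question; Keras fills it in with str.format.
filepath = "/content/gdrive/MyDrive/weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"

# Hypothetical values for one epoch (not real training output).
epoch = 3
logs = {"loss": 1.23456}

# Each placeholder is replaced by the matching keyword argument.
checkpoint_name = filepath.format(epoch=epoch, **logs)
print(checkpoint_name)
```

For the UnimplementedError, TensorFlow has tf.saved_model.SaveOptions(experimental_io_device='/job:localhost'), which can be passed as options= to model.save; it may help when saving from a TPU runtime to a locally mounted path, though this has not been checked against the setup in the question.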

Related

Tensorflow Callback, Multiple Issues on Saving and Loading Weights

I'm training a model and using the tensorflow callbacks function to save my training logs and I have a model checkpoint to save my model's weights.
During training, every epoch I run prints "WARNING:tensorflow: Can save best model only with val_acc available, skipping". This is issue 1.
Here is the code I include in callbacks[] during model.fit.
def create_tensorboard_callback(dir_name, experiment_name):
    """
    Creates a TensorBoard callback instance to store log files.
    Stores log files with the filepath:
    "dir_name/experiment_name/current_datetime/"
    Args:
    dir_name: target directory to store TensorBoard log files
    experiment_name: name of experiment directory (e.g. efficientnet_model_1)
    """
    log_dir = dir_name + "/" + experiment_name + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    tensorboard_callback = tf.keras.callbacks.TensorBoard(
        log_dir=log_dir
    )
    print(f"Saving TensorBoard log files to: {log_dir}")
    return tensorboard_callback
# Create ModelCheckpoint callback to save model's progress
checkpoint_path = "model_checkpoints/cp.ckpt"
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                      monitor="val_acc",
                                                      save_best_only=True, #SAVING BEST ONLY
                                                      save_weights_only=True,
                                                      verbose=0)
Code for fitting the model with callbacks:
history_101_food_classes_feature_extract = model.fit(train_data,
    epochs=3,
    steps_per_epoch=len(train_data),
    validation_data=test_data,
    validation_steps=int(0.15 * len(test_data)),
    callbacks=[create_tensorboard_callback("training_logs",
                                           "efficientnetb0_101_classes_all_data_feature_extract"),
               model_checkpoint])
Also, I cloned my model and used cloned_model.load_weights(checkpoint_path), then evaluated both the original and the cloned model with model.evaluate(test_data). The original model scores 70+% accuracy, while cloned_model always returns exactly the same accuracy (0.54). This is issue 2.
My guess was that I had previously trained and saved a very high-accuracy model, hence issue 1, where it refuses to save at every epoch. But my model_checkpoint path looks clean to me.
And if I had previously saved a high-accuracy model to my checkpoint_path, why would a cloned model with weights loaded from that path give 0.54 accuracy every time and not something higher? (Issue 2)
I need help. Let me know if you need more info from my side to solve this issue; I'm happy to answer. Thanks. If you want to see the full code, here's the link to it.
https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/07_food_vision_milestone_project_1.ipynb
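
The warning in issue 1 is what ModelCheckpoint prints when the monitored name is missing from the metrics logged at the end of an epoch. Below is a simplified, pure-Python sketch of that check. It is not the Keras source, and the logged names assume the model was compiled with metrics=["accuracy"], which the question does not show. The function name should_save and all values are hypothetical.

```python
def should_save(logs, monitor, best):
    """Return (save, new_best) for a 'higher is better' metric."""
    current = logs.get(monitor)
    if current is None:
        # The monitored key is absent, so there is nothing to compare.
        print(f"Can save best model only with {monitor} available, skipping.")
        return False, best
    if current > best:
        return True, current
    return False, best

# Hypothetical end-of-epoch logs; recent TensorFlow versions log the full
# metric name ("val_accuracy") when compiled with metrics=["accuracy"].
epoch_logs = {"loss": 1.1, "accuracy": 0.62, "val_loss": 1.0, "val_accuracy": 0.70}

skipped, _ = should_save(epoch_logs, "val_acc", best=float("-inf"))
saved, best = should_save(epoch_logs, "val_accuracy", best=float("-inf"))
print(skipped, saved, best)
```

If that is the cause here, printing history.history.keys() after training shows the names that are actually available. In that case the run never wrote a checkpoint, so whatever cloned_model loads from checkpoint_path cannot be the weights from this training run.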

Error when converting a tf model to TFlite model

I am currently building a model to run on my Nano 33 BLE Sense board. It predicts the weather from humidity, pressure, and temperature measurements, and there are 5 classes.
I used a Kaggle dataset to train it.
df_labels = to_categorical(df.pop('Summary'))
df_features = np.array(df)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_features, df_labels, test_size=0.15)
normalize = preprocessing.Normalization()
normalize.adapt(X_train)
activ_func = 'gelu'
model = tf.keras.Sequential([
    normalize,
    tf.keras.layers.Dense(units=6, input_shape=(3,)),
    tf.keras.layers.Dense(units=100,activation=activ_func),
    tf.keras.layers.Dense(units=100,activation=activ_func),
    tf.keras.layers.Dense(units=100,activation=activ_func),
    tf.keras.layers.Dense(units=100,activation=activ_func),
    tf.keras.layers.Dense(units=5, activation='softmax')
])
model.compile(optimizer='adam',#tf.keras.optimizers.Adagrad(lr=0.001),
              loss='categorical_crossentropy',metrics=['acc'])
model.summary()
model.fit(x=X_train,y=y_train,verbose=1,epochs=15,batch_size=32, use_multiprocessing=True)
Once the model is trained, I want to convert it into a TFLite model. When I run the conversion, I get the following message:
# Convert the model to the TensorFlow Lite format without quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
# Save the model to disk
open("gesture_model.tflite", "wb").write(tflite_model)
import os
basic_model_size = os.path.getsize("gesture_model.tflite")
print("Model is %d bytes" % basic_model_size)
<unknown>:0: error: failed while converting: 'main': Ops that can be supported by the flex runtime (enabled via setting the -emit-select-tf-ops flag):
tf.Erf {device = ""}
For your information I use google colab to design the model.
If anyone has any idea or solution to this issue, I would be glad to hear it !

This often happens when you have not set the converter's supported Operations.
Here is an example:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
    tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)
The list of supported operations changes constantly, so if the error still appears you can also try enabling the experimental converter as follows:
converter.experimental_new_converter = True

I solved the problem! The 'gelu' activation function was not yet supported by TFLite. I changed it to 'relu' and the error went away.
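
The tf.Erf in the error message comes from the exact form of GELU, which is defined through the error function. The sketch below uses only Python's math module (not TensorFlow) to compare the exact form with the common tanh approximation; the function names are illustrative. In Keras the approximation is available as tf.keras.activations.gelu(x, approximate=True), which may be an alternative to switching to 'relu', though whether it converts cleanly depends on the TensorFlow version.

```python
import math

def gelu_exact(x):
    # Exact GELU: x * Phi(x), written with the error function (erf).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation of GELU; it does not need erf.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Print both forms side by side for a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, round(gelu_exact(x), 4), round(gelu_tanh(x), 4))
```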

How to access log file with tensorboard on anaconda

I've made a NN model with Keras in an Anaconda environment (I'm using Jupyter).
I want to access the log file that I'm writing with TensorBoard so I can see the accuracy and loss graphs.
However, when I try to access the log file from the terminal, this error occurs: AttributeError: module 'tensorboard.util' has no attribute 'PersistentOpEvaluator'
Can anyone help me write these graphs and view them in TensorBoard?
This is my code:
hidden_size = 256
sl_model = keras.models.Sequential()
[...]
sl_model.add(keras.layers.Dense(max_length, activation='softmax'))
optimizer = keras.optimizers.Adam()
sl_model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['acc'])
batch_size = 128
epochs = 3
# Let's print a summary of the model
sl_model.summary()
#I'd like to access to this file
cbk = keras.callbacks.TensorBoard("logging/keras_model")
print("\nStarting training...")
sl_model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size,
             shuffle=True, validation_data=(x_dev, y_dev), callbacks=[cbk])
How can I fix this? Thank you!

Delete the tensorboard directory in site-packages, then run pip install tensorboard --upgrade. This assumes your TensorFlow version is up to date.
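
An AttributeError like this may point to a tensorboard install that does not match the installed tensorflow. Here is a small sketch for checking which versions are present before and after reinstalling, using only the standard library (Python 3.8+). The helper name installed_version is made up for this example.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Print whatever versions are found in the current environment.
for name in ("tensorflow", "tensorboard"):
    print(name, installed_version(name))
```

Run it inside the same Anaconda environment that Jupyter uses, since a different environment can report different versions.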

Exporting/Importing Keras Model to Tensorflow fails when using multi_gpu_model

I'm currently struggling with importing my exported Keras model into TensorFlow. The code worked fine with a sequential model: I was able to train the model in Python and then import it into my C++ application. Since I needed more resources, I decided to distribute the model onto several GPUs. Afterwards I was not able to import the model.
This is how I created my model before:
input_img = Input(shape=(imgDim, imgDim, 1))
# add several layers to net
model = Model(input_img, net)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          epochs=100,
          batch_size=100,
          shuffle=True,
          validation_data=(x_test, y_test))
saveKerasModelAsProtobuf(model, outpath)
This is how I export my model:
def saveKerasModelAsProtobuf(model, outputPath):
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'image': model.input}, outputs={'scores': model.output})
    builder = tf.saved_model.builder.SavedModelBuilder(outputPath)
    builder.add_meta_graph_and_variables(
        sess=keras.backend.get_session(),
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature
        }
    )
    builder.save()
    return
This is how I changed the code to run on multiple GPUs:
input_img = Input(shape=(imgDim, imgDim, 1))
# add several layers to net
model = Model(input_img, net)
parallel_model = multi_gpu_model(model, gpus=4)
parallel_model.compile(optimizer='adam',
                       loss='binary_crossentropy',
                       metrics=['accuracy'])
parallel_model.fit(x_train, y_train,
                   epochs=100,
                   batch_size=100,
                   shuffle=True,
                   validation_data=(x_test, y_test))
# export model rather than parallel_model:
saveKerasModelAsProtobuf(model, outpath)
When I try to import the model in C++ on a single GPU machine I get the following error, indicating that it's not actually the sequential model (as I would expect) but the parallel_model:
Cannot assign a device for operation 'replica_3/lambda_4/Shape': Operation was explicitly assigned to /device:GPU:3 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
[[Node: replica_3/lambda_4/Shape = Shape[T=DT_FLOAT, _output_shapes=[[4]], out_type=DT_INT32, _device="/device:GPU:3"](input_1)]]
From what I read, they should share the same weights, but not the internal structure. What am I doing wrong? Is there a better/more generic way to export the model?
Thanks!

How to save final model using keras?

I use KerasClassifier to train the classifier.
The code is below:
import numpy
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
#print("encoded_Y")
#print(encoded_Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)
#print("dummy_y")
#print(dummy_y)
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
    #model.add(Dense(4, init='normal', activation='relu'))
    model.add(Dense(3, init='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
#global_model = baseline_model()
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
But how do I save the final model for future prediction?
I usually use the code below to save a model:
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
But I don't know how to fit this model-saving code into the KerasClassifier code.
Thank you.

The model has a save method, which saves all the details necessary to reconstitute the model. An example from the keras documentation:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')

You can save the model architecture as JSON and the weights in an HDF5 file.
# keras library import for Saving and loading model and weights
from keras.models import model_from_json
from keras.models import load_model
# serialize model to JSON
# the keras model which is trained is defined as 'model' in this example
model_json = model.to_json()
with open("model_num.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model_num.h5")
This creates the files "model_num.json" and "model_num.h5", which contain the model architecture and the weights, respectively.
To reuse the trained model later, load both files and use the restored model for prediction on new data.
Here's how to load the model from the saved files.
# load json and create model
json_file = open('model_num.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("model_num.h5")
print("Loaded model from disk")
loaded_model.save('model_num.hdf5')
loaded_model=load_model('model_num.hdf5')
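To predict on new data, you can use:
# x_test is a placeholder for your own test data.
# predict_classes was removed in TensorFlow 2.6; on newer versions use
# np.argmax(loaded_model.predict(x_test), axis=-1) instead.
loaded_model.predict_classes(x_test)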

You can use model.save(filepath) to save a Keras model into a single HDF5 file which will contain:
the architecture of the model, allowing to re-create the model.
the weights of the model.
the training configuration (loss, optimizer)
the state of the optimizer, allowing to resume training exactly where you left off.
In your Python code, the last line should probably be:
model.save("m.hdf5")
This allows you to save the entirety of the state of a model in a single file.
Saved models can be reinstantiated via keras.models.load_model().
The model returned by load_model() is a compiled model ready to be used (unless the saved model was never compiled in the first place).
model.save() arguments:
filepath: String, path to the file to save the model to.
overwrite: Whether to silently overwrite any existing file at the target location, or provide the user with a manual prompt.
include_optimizer: If True, save optimizer's state together.

You can save and load the model this way.
from keras.models import Sequential, load_model
from keras_contrib.layers import CRF
from keras_contrib.losses import crf_loss
from keras_contrib.metrics import crf_viterbi_accuracy
# To save model
model.save('my_model_01.hdf5')
# To load the model
custom_objects={'CRF': CRF,'crf_loss':crf_loss,'crf_viterbi_accuracy':crf_viterbi_accuracy}
# To load a persisted model that uses the CRF layer
model1 = load_model("/home/abc/my_model_01.hdf5", custom_objects = custom_objects)

Generally, we save the model and weights in the same file by calling the save() function.
For saving,
model.compile(optimizer='adam',
              loss = 'categorical_crossentropy',
              metrics = ["accuracy"])
model.fit(X_train, Y_train,
          batch_size = 32,
          epochs= 10,
          verbose = 2,
          validation_data=(X_test, Y_test))
# Here I have used "my_model" as the filename; you can choose whatever you want.
model.save("my_model.h5") #using h5 extension
print("model saved!!!")
For Loading the model,
from keras.models import load_model
model = load_model('my_model.h5')
model.summary()
In this case, we can simply save and load the model without re-compiling our model again.
Note - This is the preferred way for saving and loading your Keras model.

Saving a Keras model:
model = ... # Get model (Sequential, Functional Model, or Model subclass)
model.save('path/to/location')
Loading the model back:
from tensorflow import keras
model = keras.models.load_model('path/to/location')
For more information, see the Keras documentation.

You can save the best model using keras.callbacks.ModelCheckpoint()
Example:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_checkpoint_callback = keras.callbacks.ModelCheckpoint("best_Model.h5",save_best_only=True)
history = model.fit(x_train,y_train,
                    epochs=10,
                    validation_data=(x_valid,y_valid),
                    callbacks=[model_checkpoint_callback])
This will save the best model in your working directory.

Since the Keras syntax for saving a model has changed over the years, I will post a fresh answer. In principle, the earliest answer by bogatron (posted Mar 13 '17 at 12:10) is still good if you want to save your model, including the weights, in one file.
model.save("my_model.h5")
This will save the model in the older Keras H5 format.
However, there is a new format, the TensorFlow SavedModel format, which will be used if you do not specify the extension .h5, .hdf5 or .keras after the filename.
The syntax in this case is
model.save("path/to/folder")
If the given folder name does not yet exist, it will be created. Two files and two folders will be created within this folder:
keras_metadata.pb, saved_model.pb, assets, variables
For now, you can still choose between storing your model in a single file and storing it in a folder containing files and subfolders. (See the Keras documentation at www.tensorflow.org.)
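
The extension rule described in this answer can be summarised in a few lines. This is only an illustration of the behaviour as stated here, not the Keras implementation. The function name and the returned labels are invented, and newer Keras releases may handle paths without an extension differently.

```python
import os

# Extensions that, per the answer above, select a single-file format.
SINGLE_FILE_EXTENSIONS = {".h5", ".hdf5", ".keras"}

def describe_save_target(path):
    """Classify a save path according to the rule in the answer above."""
    extension = os.path.splitext(path)[1].lower()
    if extension in SINGLE_FILE_EXTENSIONS:
        return "single file"
    return "SavedModel folder"

print(describe_save_target("my_model.h5"))
print(describe_save_target("path/to/folder"))
```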
