Unable to load and use multiple keras models

Unable to load and use multiple keras models - python

I'm trying to load three different models in the same process. Only the first one works as expected, the rest of them return like random results.
Basically the order is as follows:
define and compile first model
load trained weights before
rename layers
the same process for the second model
the same process for the third model
So, something like:
model1 = Model(inputs=Input(shape=input_size_im) , outputs=layers_firstmodel)
model1.compile(optimizer='sgd', loss='mse')
model1.load_weights(weights_first, by_name=True)
# rename layers but didn't work
model2 = Model(inputs=Input(shape=input_size_im) , outputs=layers_secondmodel)
model2.compile(optimizer='sgd', loss='mse')
model2.load_weights(weights_second, by_name=True)
# rename layers but didn't work
model3 = Model(inputs=Input(shape=input_size_im) , outputs=layers_thirdmodel)
model3.compile(optimizer='sgd', loss='mse')
model3.load_weights(weights_third, by_name=True)
# rename layers but didn't work
for im in list_images:
results_firstmodel = model1.predict(im)
results_secondmodel = model2.predict(im)
results_thirdmodel = model2.predict(im)
I'd like to perform some inference over a bunch of images. To do that the idea consists in looping over the images and perform inference with these three algorithms, and return the results.
I have tried to rename all layers to make them unique with no success. Also I created a different graph for each network, and with a different session do the inference. This works but it's very inefficient (in addition I have to set their weights every time because of sess.run(tf.global_variables_initializer()) removes them). Each time it's created a session tensorflow prints "creating tensorflow device (/device:GPU:0)".
I am running Tensorflow 1.4.0-rc0, Keras 2.1.1 and Ubuntu 16.04 kernel 4.14.

The OP is correct here. There is a serious bug when you try to load multiple weight files in the same script. The above answer doesn't solve this. If you actually interrogate the weights when loading weights for multiple models in the same script you will notice that the weights are different than when you just load weights for one model on its own. This is where the randomness is the OP observes coming from.
EDIT: To solve this problem you have to encapsulate the model.load_weight command within a function and the randomness that you are experiencing should go away. The problem is that something weird screws up when you have multiple load_weight commands in the same script like you have above. If you load those model weights with a function you issues should go away.

From the Keras docs we have this explanation for the user of load_weights:
loads the weights of the model from a HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use by_name=True to load only those layers with the same name.
Therefore, if your architecture is unchanged you should drop the by_name=True or make it False (its default value). This could be causing the inconsistencies that you are facing, as your weights are not being loaded probably due to having different names on your layers.
Another important thing to consider is the nature of your HDF5 file, and the way you created it. If it indeed contains only the weights (created with save_weights as the docs point out) then there should be no problem in proceeding as explained before.
Now, if that HDF5 contains weights and architecture in the same file, then you should be loading it with keras.models.load_model instead (further reading if you like here). If this is the case then this would also explain those inconsistencies.
As a side suggestion, I prefer to save my models using Callbacks, like the ModelCheckpoint or the EarlyStopping if you want to automatically determine when to stop training. This not only gives you greater flexibility when training and saving your models (as you can stop them on the optimal training epoch or when you desire), but also makes loading those models easily, as you can simply use the load_model method to load both architecture and weights to your desired variable.
Finally, here is one useful SO post where saving (and loading) Keras models is explained.

Related

Increasing number of classes on TensorFlow checkpoint

For a project I'm working on I have to compare the performance of a small, simulated dataset to the performance of well-known datasets (e.g. ImageNet, CIFAR-10) using two neural networks, ResNet v2 at different depths and Inception v3.
To account for the imbalance between my set and the big popular ones, one of the methods I want to employ is augmenting the big sets using either a class from my simulated set or the same class but instead with real-life images. Then, using a pre-trained model, I want to train on both augmented sets and later compare performance.
The issue
The augmentation works fine: I copy over the .tfrecord files containing only images of the extra class to the original dataset and I modify the labels.txt file to contain the appropriate extra labels. However, the issue occurs when trying to train on this augmented dataset:
I0125 16:40:31.279634 140094914221888 coordinator.py:224] Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [258] rhs shape= [257]
which is to be expected, as the graph cannot handle the extra class.
My attempted solution
To alleviate this issue I tried applying the same method as was posed in this answer, using the Python script below (modified for brevity):
checkpoint = {}
saver = tf.compat.v1.train.import_meta_graph(f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}.meta")
with tf.Session() as sess:
saver.restore(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
# get appropriate nodes where number of classes is one of its dimensions
if "inception_v3" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
elif "resnet_v2" in TRAIN_PATH:
target_nodes = [
'<nodes where one of its dimensions is equal to the number of classes>'
]
# select the nodes previously defined
nodes = [node for node in tf.global_variables() if node.name in target_nodes]
for node in nodes:
# load the values of those nodes from the checkpoint
checkpoint[node.name] = node.eval()
# extend dimensions for extra class
checkpoint.update({node.name: np.insert(checkpoint[node.name], checkpoint[node.name].shape[-1], 0.0, axis=-1)})
# assign new nodes to the checkpoint
sess.run(tf.assign(node, checkpoint[node.name], validate_shape=False))
saver.save(sess, f"{TRAIN_PATH}/model.ckpt-{CHECKPOINT_NUM}")
In short, this script loads the checkpoint and isolates all nodes where one of the dimensions is equal to the number of classes and increases it. I have verified these are always equal to the number of classes using the checkpoints from the different datasets the models have been trained on. In fact, it seems that those tensors are the only thing that's different between the checkpoints.
Although this method does execute correctly and allows me to run the training, it feels very boneheaded and I'm almost positive it is not at all what I'm supposed to do. However I'm curious as to what exactly is wrong with this (apart from initializing the new values as 0.0), as except from a wild fluctuation in loss at the start of the tranining it seems to work as I had hoped.
Other solutions
When looking further into this issue I also found other users' answers on similar questions suggesting that it is in fact impossible to modify a checkpoint in the way that I want to and suggested transfer learning or modifying the output layer. I am very new to neural network training and I don't know who to believe, as the linked answer seems to suggest that what I'm trying to do should work.
My question
So I ask: which approach is correct? Could my initial approach work, or should I try a different method to fix this issue? Starting from scratch would not be ideal, as the progress so far has taken over a month of continuous training.
I am using a modified version the TensorFlow Model Garden as my environment to train the networks, with the modificiations pertaining to creating custom datasets and using them using the supplied training scripts.

Is it possible to load only weights to TF-TRT model?

I have two models with the exact same architecture, but different weights as the same network is used for two different problems. We're using TF-TRT to optimize the model in order to use it on edge devices.
We'd like to be able to switch from one model to the other as fast as possible. As of now, we load the next model using tf.saved_model.load(), however, this reloads the entire model including the architecture. In order to speed up the process, we'd like to simply load the weights & switch them in the model architecture.
From what I've seen, it is possible in Keras by loading a .w1 file, but we don't have such file after converting to TF-TRT.
I've found out that TRT has a Refitter object but I don't think we can use it in this case.
I'd like to know if it is possible to switch weights of a TF-TRT model, perhaps there is something I'm missing out.
Thank you for your help.

Loading checkpoints while training a Faster-RCNN model on a custom dataset

I'm trying to load checkpoints and populate model weights using The Faster-RCNN architecture (Faster R-CNN ResNet50 V1 640x640 to be precise, from here. I'm trying to load the weights for this network similar to how it's done in the example notebook for RetinaNet, where they do the following:
fake_box_predictor = tf.compat.v2.train.Checkpoint(
_base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
_box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
_feature_extractor=detection_model._feature_extractor,
_box_predictor=fake_box_predictor
)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()
I'm trying to get a similar checkpoint loading mechanism going for the Faster-RCNN network I want to use, but the properties like _base_tower_layers_for_heads, _box_prediction_head only exist for the architecture used in the example, and not for anything else.
I also couldn't find documentation on which parts of the model to populate using Checkpoint for my particular use case. Would greatly appreciate any help on how to approach this!

As you said the main problem that you have is that you don't have a layer tensor to the layers that you want to do transfer learning on it.
This is part of the original implementation of the Faster R-CNN ResNet50 V1 640x640 copy in the Zoo. They didn't name the layers, or maybe they did name it but they didn't published the names (or the code).
To solve this you need to find out which layers you want to keep and which you want to relearn. You can print out all the layers in a network using (ref):
[n.name for n in tf.get_default_graph().as_graph_def().node]
Names to layer can manually added but tf reserve default names for each node. This list can be long and exhausting however you need to find the tensor to start your transfer learning. Therefore you need to follow the list and try to understand from which layers you want to freeze and which you want to continue the learning process. Freezing a layer (ref):
if layer.name == 'layer_name':
layer.trainable = False

Save entire model but load weights only

I have defined a deep learning model my_unet() in tensorflow. During training I set save_weigths=False since I wanted to save the entire model (not only the wieghts bu the whole configuration). The generated file is path_to_model.hdf5.
However, when loading back the model I used the earlier version (I forgot to update it) in which I first called the model and then load the model using:
model = my_unet()
model.load_weights(path_to_model.hdf5)
Instead of simply using: model = tf.keras.models.load_model(path_to_model.hdf5) to load the entire model.
Both ways of loading the model and the weights provided the same predictions when run in some dummy data and there were no errors.
My question is: Why loading the entire model using model.load_weights() does not generate any problem? What is the structure of the hdf5 file and how theese two ways of loading exactly work? Where can I find this information?

You can please see the documentation here for any future reference: http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/03_DataModel.html

keras problems loading custom model from yolov2

I've searched around for a couple of answers regarding the load_model from keras but I still have a question.
I am following this model really closely (https://github.com/experiencor/keras-yolo2), and am training on a custom dataset.
I have done the training which gives me a yolov2.h5 file, basically model weights to fit into the keras model. But I am encountering some problems with the loading of the model.
After loading the model (in a separate.py file)
model = load_model('file_dir/yolov2.h5')
First I encounter the issue
NameError: name 'tf' is not defined
Which I then searched up to modify my code to add custom objects as such:
model = load_model('file_dir/yolov2.h5', custom_objects={'tf':tf})
This clears the first error but results in another
ValueError: Unknown loss function : custom_loss
I used the custom_loss function from the yolov2 (https://github.com/experiencor/keras-yolo2/blob/master/frontend.py), so i tried to solve it by
from frontend import YOLO
model = load_model('file_dir/yolov2.h5' custom_objects={'tf':tf, 'custom_loss':YOLO.custom_loss)
But ran into another error:
TypeError: custom_loss() missing 1 required positional argument
I got rather stuck here because I have no idea how to fit in the parameters for custom_loss. Seek some help regarding this (Don't particularly understand this part since I'm loading my model in a different python script separate.py). Thank you so much!
(Edit: This fix doesn't work for me either)
model = load_model('file_dir/yolov2.h5', compile = False)

To resolve this problem, as you already have the network at hand, only save trained weights (like what keras trainer does in callback).
For testing, make model, no need to compile, and then load trained weights using model.load_weights(path/to/saved/weights).
You also can use "by_name=True" if you make the network in a different way, this time you should keep layer names.
Another option id to manually set weights; for this you will load .h5 file bu "h5py" (h5py.File(path/to/weights, mode='r')) for example (have look how keras do that), then try to correspond layer names of the model and loaded weights.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.