I'm trying to load checkpoints and populate model weights using the Faster R-CNN architecture (Faster R-CNN ResNet50 V1 640x640 to be precise, from here). I'm trying to load the weights for this network similarly to how it's done in the example notebook for RetinaNet, where they do the following:
fake_box_predictor = tf.compat.v2.train.Checkpoint(
_base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
_box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
_feature_extractor=detection_model._feature_extractor,
_box_predictor=fake_box_predictor
)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()
I'm trying to get a similar checkpoint loading mechanism going for the Faster R-CNN network I want to use, but attributes like _base_tower_layers_for_heads and _box_prediction_head only exist for the architecture used in the example, not for any other one.
I also couldn't find documentation on which parts of the model to populate using Checkpoint for my particular use case. Would greatly appreciate any help on how to approach this!
As you said, the main problem is that you don't have handles to the layers you want to do transfer learning on. That comes from the original implementation of the Faster R-CNN ResNet50 V1 640x640 model in the Zoo: either the layers weren't named, or they were named but the names (and the code) weren't published.
To solve this you need to find out which layers you want to keep and which you want to retrain. You can print out all the node names in a network using (ref):
[n.name for n in tf.get_default_graph().as_graph_def().node]
Layer names can be added manually, but TF assigns default names to every node, so this list can be long and exhausting. Still, you need to walk through it to find the tensor where your transfer learning should start, i.e. decide which layers you want to freeze and which you want to keep training. Freezing a layer (ref):
for layer in model.layers:
    if layer.name == 'layer_name':
        layer.trainable = False
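If you're deciding which parts of an Object Detection API checkpoint to restore, it can also help to list what the checkpoint itself contains. A minimal sketch using tf.train.list_variables, where checkpoint_path is the same path you pass to ckpt.restore above:

import tensorflow as tf

# Print every (variable name, shape) pair stored in the checkpoint, so you can
# see which sub-modules of the detection model are available to restore.
for name, shape in tf.train.list_variables(checkpoint_path):
    print(name, shape)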
I have a deep learning model that uses BERT layer from Huggingface library (TF=2.0, Transformers=2.8.0).
The model consists of: BERT embeddings -> Attention -> Softmax. Also I am using tf.keras.callbacks.ModelCheckpoint to save the best model during training.
I'm trying to slice the model to get the attention weights.
My problem is that, if I try to access the output of any layer after loading the saved model using output1 = model.layers[3].output, I get this error:
AttributeError: Layer tf_bert_model has no inbound nodes.
but if I do the same on the original model without saving it (immediately after finishing model.fit()), I have no problem.
This is how I load the saved model:
model = tf.keras.models.load_model(model_dir)
Is there a problem with loading the model this way, given that it works for prediction?
Answering here for the benefit of the community since the user has already found the solution.
The issue was resolved with the following steps:
Loading the saved model
Creating a new model with the same structure
Copying the weights from the saved model to the newly created one
Then accessing the needed layer without any problem
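In Keras the weight-copying step is usually just get_weights/set_weights. A rough sketch, assuming the new model is rebuilt with exactly the same architecture (build_model is a placeholder for the OP's own model-building code; model_dir is the same path as in the question):

import tensorflow as tf

saved_model = tf.keras.models.load_model(model_dir)  # 1. load the saved model
new_model = build_model()                            # 2. rebuild the same architecture (placeholder)
new_model.set_weights(saved_model.get_weights())     # 3. copy the weights over
attention_output = new_model.layers[3].output        # 4. the layer output is now accessible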
I am trying to use an RNN network in PyTorch for a regression task. In the training phase the model is trained; I want to use the trained model in the testing phase. For this purpose I have saved the trained model with:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For the testing phase I must use this loaded model, but I want to know what the value of the first hidden state should be. I can initialize the first hidden state to zeros, but I think this may not be correct. Is there any function, other than torch.save, that can return the last hidden state of the trained model? Then I could restore that hidden state and use it as the first hidden state of the loaded model in the testing phase.
Thanks in advance.
Your question is a bit unclear. As far as I understand, you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use the model's state_dict, which is basically a Python dictionary object that maps each layer to its parameter tensor. Read more about it from here.
for param in loaded_model.state_dict():
    print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights and bias of the output layer using the code below:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']
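If what you actually need is the final hidden state tensor itself (to seed the testing phase), note that torch.save(learned_model, ...) does not store it; you would have to capture and save it yourself. A minimal sketch, assuming the model's forward returns (output, hidden) the way PyTorch's RNN modules do (the file name and the input/hidden variables are illustrative):

import torch

# During training: keep the last hidden state returned by the model
output, last_hidden = learned_model(train_inputs, initial_hidden)
torch.save({'model': learned_model, 'last_hidden': last_hidden.detach()}, "model_and_hidden.pt")

# During testing: restore both, and seed the RNN with the saved hidden state
checkpoint = torch.load("model_and_hidden.pt")
loaded_model = checkpoint['model']
output, _ = loaded_model(test_inputs, checkpoint['last_hidden'])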
I am working on a project that requires me to add new units to the output layer of a neural network to implement a form of transfer learning. I was wondering if I could do this and set the units' weights using either Keras or TensorFlow.
Specifically I would like to append an output neuron to the output layer of the Keras model and set that neuron's initial weights and bias.
Stumbled upon the answer to my own question. Thanks everyone for the answers/comments.
https://keras.io/layers/about-keras-layers/
The first few lines of this source detail how to load and set weights.
Essentially, appending an output neuron to a Keras model can be accomplished by grabbing the weights of the old output layer, removing that layer, appending the weights for the new neuron, and setting the combined weights on a new output layer. Code is below.
# Load weights of the previous output layer, then drop that layer (Sequential model)
old_kernel, old_bias = model.layers[-1].get_weights()  # kernel shape: (bottleneck_size, old_num_classes)
model.pop()
# Initial weights/bias for the appended neuron (use whatever values you want, in these shapes)
new_neuron_weights = np.zeros((bottleneck_size, 1))
new_neuron_bias = np.zeros(1)
# Add the new, wider output layer, then overwrite its weights with old + new
new_layer = Dense(num_classes)  # num_classes = old_num_classes + 1
model.add(new_layer)
new_layer.set_weights([np.concatenate([old_kernel, new_neuron_weights], axis=1),
                       np.concatenate([old_bias, new_neuron_bias])])
You can add new units to the output layer of a pre-trained neural network. This form of transfer learning is often described as using the bottleneck features of the pre-trained network, and it can be implemented both in TensorFlow and in Keras.
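For example, a minimal Keras sketch of the idea (VGG16 and the two-class head are purely illustrative choices):

from keras.applications import VGG16
from keras.layers import Dense
from keras.models import Sequential

num_classes = 2  # illustrative
base = VGG16(weights='imagenet', include_top=False, pooling='avg')
for layer in base.layers:
    layer.trainable = False  # keep the pretrained "bottleneck" feature extractor frozen

model = Sequential([base, Dense(num_classes, activation='softmax')])  # only the new head trains
model.compile(optimizer='sgd', loss='categorical_crossentropy')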
Please find the tutorial in Keras below:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Also, find the tutorial for tensorflow below:
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb
Hope this helps!
I'm trying to load three different models in the same process. Only the first one works as expected; the rest of them return what look like random results.
Basically the order is as follows:
define and compile first model
load trained weights before
rename layers
the same process for the second model
the same process for the third model
So, something like:
model1 = Model(inputs=Input(shape=input_size_im) , outputs=layers_firstmodel)
model1.compile(optimizer='sgd', loss='mse')
model1.load_weights(weights_first, by_name=True)
# rename layers but didn't work
model2 = Model(inputs=Input(shape=input_size_im) , outputs=layers_secondmodel)
model2.compile(optimizer='sgd', loss='mse')
model2.load_weights(weights_second, by_name=True)
# rename layers but didn't work
model3 = Model(inputs=Input(shape=input_size_im) , outputs=layers_thirdmodel)
model3.compile(optimizer='sgd', loss='mse')
model3.load_weights(weights_third, by_name=True)
# rename layers but didn't work
for im in list_images:
    results_firstmodel = model1.predict(im)
    results_secondmodel = model2.predict(im)
    results_thirdmodel = model3.predict(im)
I'd like to perform inference over a bunch of images. The idea is to loop over the images, run inference with these three models, and return the results.
I have tried renaming all the layers to make them unique, with no success. I also created a different graph for each network and ran the inference in a separate session for each. This works but is very inefficient (in addition, I have to set the weights every time, because sess.run(tf.global_variables_initializer()) wipes them), and each time a session is created TensorFlow prints "creating tensorflow device (/device:GPU:0)".
I am running Tensorflow 1.4.0-rc0, Keras 2.1.1 and Ubuntu 16.04 kernel 4.14.
The OP is correct here. There is a serious bug when you try to load multiple weight files in the same script. The above answer doesn't solve this. If you actually interrogate the weights when loading weights for multiple models in the same script, you will notice that the weights are different than when you load weights for just one model on its own. This is where the randomness the OP observes is coming from.
EDIT: To solve this problem, encapsulate the model.load_weights call within a function, and the randomness you are experiencing should go away. The problem is that something goes wrong when you have multiple load_weights calls at the top level of the same script, as above. If you load each model's weights inside a function, the issue should go away.
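A rough sketch of that suggestion, reusing the OP's setup and Keras imports (build_layers_firstmodel and friends are placeholders for each model's layer-building code):

def build_and_load(build_layers, weights_path):
    # Everything for one model lives inside its own function scope
    inp = Input(shape=input_size_im)
    model = Model(inputs=inp, outputs=build_layers(inp))
    model.compile(optimizer='sgd', loss='mse')
    model.load_weights(weights_path, by_name=True)
    return model

model1 = build_and_load(build_layers_firstmodel, weights_first)
model2 = build_and_load(build_layers_secondmodel, weights_second)
model3 = build_and_load(build_layers_thirdmodel, weights_third)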
From the Keras docs we have this explanation for load_weights:
loads the weights of the model from a HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use by_name=True to load only those layers with the same name.
Therefore, if your architecture is unchanged you should drop the by_name=True or set it to False (its default value). This could be causing the inconsistencies you are facing, as your weights may not be getting loaded because your layers have different names.
Another important thing to consider is the nature of your HDF5 file, and the way you created it. If it indeed contains only the weights (created with save_weights as the docs point out) then there should be no problem in proceeding as explained before.
Now, if that HDF5 contains weights and architecture in the same file, then you should be loading it with keras.models.load_model instead (further reading if you like here). If this is the case then this would also explain those inconsistencies.
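To make that distinction concrete, a small sketch (the file names are illustrative and assume the files were created with save_weights and save respectively):

from keras.models import load_model

# File created with model.save_weights(...): weights only, architecture must already match
model.load_weights('weights_only.h5')
# File created with model.save(...): architecture + weights (+ optimizer state) in one file
model = load_model('full_model.h5')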
As a side suggestion, I prefer to save my models using callbacks, like ModelCheckpoint or EarlyStopping if you want to automatically determine when to stop training. This not only gives you greater flexibility when training and saving your models (as you can stop on the optimal training epoch or whenever you desire), but also makes loading those models easier, as you can simply use the load_model method to load both architecture and weights into your desired variable.
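For instance, a minimal sketch of that workflow (the file name, patience and epoch count are illustrative):

from keras.callbacks import ModelCheckpoint, EarlyStopping
from keras.models import load_model

callbacks = [ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
             EarlyStopping(monitor='val_loss', patience=5)]
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=100, callbacks=callbacks)

best_model = load_model('best_model.h5')  # architecture + weights in one call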
Finally, here is one useful SO post where saving (and loading) Keras models is explained.
I want to use a pretrained neural network and just fine-tune it to my specific needs. I wanted to use Python and the Lasagne framework for this. On:
https://github.com/Lasagne/Recipes/blob/master/examples/ImageNet%20Pretrained%20Network%20%28VGG_S%29.ipynb
I found an example of how to use a pretrained network for specific images. My problem is that I would like to use the network described in the link above as a starting point and add a final layer to it so that it implements a TWO CLASS classifier, which is what I need. I therefore want to keep all the layers in the network frozen and allow training ONLY in my last added layer.
Apparently there is a way to indicate that layers should be "nontrainable" in Lasagne, but I have found no examples of how to do this on the web.
Any thoughts on this would be highly appreciated.
Set the learning rate to 0 for the layers you want to freeze, and a nonzero learning rate only for the layers you want to fine-tune. There is no online example yet, but you should check this thread: https://groups.google.com/forum/#!topic/lasagne-users/2z-6RrgiHkE
Remove the trainable tag from all parameters of the layers that you want to keep frozen:
def freeze_layer(layer):
    for param in layer.params.values():
        param.remove('trainable')
To freeze all your network up to a certain layer you can simply iterate over its lower layers:
from lasagne.layers import get_all_layers

def freeze_net(net):
    layers = get_all_layers(net)
    for l in layers:
        freeze_layer(l)
Code untested. See this discussion for more info.
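As a usage note, once the tags are removed the frozen parameters simply stop showing up when you collect trainable parameters for your update rule. A small sketch (lower_net, output_layer and loss are placeholders for your own network and loss expression):

import lasagne

freeze_net(lower_net)  # freeze everything below the new output layer
params = lasagne.layers.get_all_params(output_layer, trainable=True)  # only the unfrozen params remain
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)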