I have a deep learning model that uses a BERT layer from the Hugging Face library (TF=2.0, Transformers=2.8.0).
The model consists of: BERT embeddings -> Attention -> Softmax. I am also using tf.keras.callbacks.ModelCheckpoint to save the best model during training.
I'm trying to slice the model to get the attention weights.
My problem is that if I try to access the output of any layer after loading the saved model, using output1 = model.layers[3].output, I get this error:
AttributeError: Layer tf_bert_model has no inbound nodes.
But if I do the same on the original model without saving it (immediately after model.fit() finishes), I have no problem.
This is how I load the saved model:
model = tf.keras.models.load_model(model_dir)
Is there a problem with loading it this way, given that it works for prediction?
Answering here for the benefit of the community since the user has already found the solution.
The issue was resolved with the following steps:
1. Load the saved model.
2. Create a new model with the same structure.
3. Copy the weights from the saved model to the newly created one.
4. Access the needed layer on the new model, which now works without any problem.
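The rebuild-and-copy approach described in this answer can be sketched roughly as follows. This is only an illustration under assumptions: build_model is a hypothetical helper with two small Dense layers standing in for the real BERT -> Attention -> Softmax architecture, and loaded_model stands in for whatever tf.keras.models.load_model(model_dir) returns.

```python
import numpy as np
import tensorflow as tf


def build_model():
    """Hypothetical stand-in architecture; the real one would wrap the BERT layer."""
    inputs = tf.keras.Input(shape=(4,), name="features")
    hidden = tf.keras.layers.Dense(8, activation="relu", name="hidden")(inputs)
    outputs = tf.keras.layers.Dense(3, activation="softmax", name="probs")(hidden)
    return tf.keras.Model(inputs, outputs)


# Assumption: this object stands in for tf.keras.models.load_model(model_dir).
loaded_model = build_model()

# Rebuild the same structure in code, then copy the weights across.
fresh_model = build_model()
fresh_model.set_weights(loaded_model.get_weights())

# The rebuilt model has a connected graph, so intermediate outputs are reachable.
probe = tf.keras.Model(fresh_model.input, fresh_model.get_layer("hidden").output)

batch = np.random.rand(2, 4).astype("float32")
hidden_values = probe.predict(batch, verbose=0)
print(hidden_values.shape)
```

Selecting the layer by name with get_layer is a design choice here; indexing model.layers, as in the question, should work the same way on the rebuilt model.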
I'm trying to load checkpoints and populate model weights using the Faster-RCNN architecture (Faster R-CNN ResNet50 V1 640x640 to be precise, from here). I'm trying to load the weights for this network similar to how it's done in the example notebook for RetinaNet, where they do the following:
fake_box_predictor = tf.compat.v2.train.Checkpoint(
    _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
    _box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
    _feature_extractor=detection_model._feature_extractor,
    _box_predictor=fake_box_predictor
)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()
I'm trying to get a similar checkpoint loading mechanism going for the Faster-RCNN network I want to use, but the properties like _base_tower_layers_for_heads, _box_prediction_head only exist for the architecture used in the example, and not for anything else.
I also couldn't find documentation on which parts of the model to populate using Checkpoint for my particular use case. Would greatly appreciate any help on how to approach this!
As you said, the main problem is that you don't have a handle on the layers (tensors) that you want to use for transfer learning.
This comes from the original implementation of the Faster R-CNN ResNet50 V1 640x640 model in the Zoo. They didn't name the layers, or maybe they did but didn't publish the names (or the code).
To solve this, you need to find out which layers you want to keep and which you want to retrain. You can print all the node names in the graph using (ref); under TF 2.x the call lives in tf.compat.v1:
[n.name for n in tf.compat.v1.get_default_graph().as_graph_def().node]
Layer names can be added manually, but TF reserves a default name for each node. The list can be long and tedious to read, but you need to find the tensor at which to start your transfer learning. So go through the list and work out which layers you want to freeze and which should keep learning. Freezing a layer (ref):
# Assumes `model` is a Keras model; 'layer_name' is a placeholder.
for layer in model.layers:
    if layer.name == 'layer_name':
        layer.trainable = False
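For the question's actual goal, finding out which attributes a checkpoint expects, one option is to list the keys stored in the checkpoint and mirror only the part you want in a nested tf.train.Checkpoint, as the RetinaNet snippet does. The sketch below is an illustration under assumptions: Block and Detector are hypothetical tf.Module classes, not the Object Detection API model, and the attribute names feature_extractor and head are made up for the demo.

```python
import os
import tempfile

import tensorflow as tf


class Block(tf.Module):
    """Hypothetical sub-module holding one weight matrix."""

    def __init__(self, rows, cols, value):
        super().__init__()
        self.kernel = tf.Variable(tf.fill([rows, cols], value), name="kernel")


class Detector(tf.Module):
    """Hypothetical model with a feature extractor and a prediction head."""

    def __init__(self, value):
        super().__init__()
        self.feature_extractor = Block(4, 4, value)
        self.head = Block(4, 2, value)


# Save a checkpoint from a "pre-trained" model whose weights are all 1.0.
pretrained = Detector(value=1.0)
prefix = os.path.join(tempfile.mkdtemp(), "ckpt")
save_path = tf.train.Checkpoint(model=pretrained).save(prefix)

# The stored keys reveal the attribute path expected for each variable.
stored_keys = [name for name, _shape in tf.train.list_variables(save_path)]
print("\n".join(stored_keys))

# Restore only the feature extractor into a new model whose weights are all 0.0.
target = Detector(value=0.0)
partial_view = tf.train.Checkpoint(feature_extractor=target.feature_extractor)
tf.train.Checkpoint(model=partial_view).restore(save_path).expect_partial()
```

The printed keys follow the attribute path from the root object, so running tf.train.list_variables on the downloaded Faster R-CNN checkpoint should show which attribute names a mirrored Checkpoint would need for that architecture.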
According to the demo code
"Image similarity estimation using a Siamese Network with a contrastive loss"
https://keras.io/examples/vision/siamese_contrastive/
I'm trying to save the model with model.save to h5 or hdf5; however, after I call load_model (I also tried load_weights),
it shows an error message: unknown opcode
Googling suggests it's a Python version mismatch between py3.5 and py3.6,
but I actually use only Python 3.8.
Other sources say that some extra work is needed either when building the model or when calling load_model.
It would be very kind if anyone could provide the save and load part
to make this demo code more complete.
Thanks!
The example relies on two pieces that Keras treats as custom objects.
Custom objects:
contrastive loss
embedding layer: where the euclidean_distance is computed.
Saving model:
Saving the model is straightforward:
<model_name>.save("siamese_contrastive.h5")
Loading model:
Here is the catch: the model will not load directly, because the loader does not know about two things: your custom layer and your loss.
model = tf.keras.models.load_model('siamese_contrastive.h5', custom_objects={ })
In the custom_objects dictionary above, you have to provide the definitions of those two objects.
After that, the model will load and can be run on its own at inference time.
Still figuring out how?
Have a look at my implementation let me know if you still have any questions: https://github.com/anukash/Keras_siamese_contrastive
I am using a model (SimCLR) to learn representations from images. While pre-training, the model was trained against a single dummy label. Now I want to fine-tune the model with 8-class data.
While loading the pre-trained model checkpoint into the yet-to-be-fine-tuned model with an 8-class head, I am encountering a ValueError.
ValueError: Tensor's shape (2048, 1) is not compatible with supplied shape [2048, 8]
Is there a way to exclude the last head layer's weights when loading the checkpoint for fine-tuning the model?
System information
TensorFlow version: 2.5.0
Python version: 3.7.3
Well, for the pre-trained checkpoint to load, each tensor in it needs to have exactly the shape the new model expects, and the head saved from the old single-label model does not match your 8-class head (that is what the ValueError reports). To make your 8-class data work, you need to change the model itself to handle 8 classes. This will likely require editing the model's attributes, and without seeing the code it is hard to say exactly where to make that change.
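One way to approach what the question asks for (keep the pre-trained encoder weights, leave the new head freshly initialized) is to copy weights layer by layer and skip any layer whose shapes differ. The sketch below is not the SimCLR code: build_model is a hypothetical Keras stand-in, and the layer sizes are made up so the example stays small.

```python
import numpy as np
import tensorflow as tf


def build_model(num_outputs):
    """Hypothetical stand-in: a small encoder followed by a linear head."""
    inputs = tf.keras.Input(shape=(32,), name="image_features")
    encoded = tf.keras.layers.Dense(16, activation="relu", name="encoder")(inputs)
    logits = tf.keras.layers.Dense(num_outputs, name="head")(encoded)
    return tf.keras.Model(inputs, logits)


pretrained = build_model(num_outputs=1)  # trained against the single dummy label
finetune = build_model(num_outputs=8)    # new 8-class head

copied, skipped = [], []
for source, target in zip(pretrained.layers, finetune.layers):
    source_weights = source.get_weights()
    if not source_weights:
        continue  # layers such as the input layer carry no weights
    target_shapes = [w.shape for w in target.get_weights()]
    if [w.shape for w in source_weights] == target_shapes:
        target.set_weights(source_weights)
        copied.append(target.name)
    else:
        skipped.append(target.name)  # shape mismatch: keep the fresh initialization

print("copied:", copied)
print("skipped:", skipped)
```

Comparing shapes rather than hard-coding the head's name keeps the loop generic, at the cost of silently skipping any layer that differs, so printing the skipped list is worth keeping.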
I've searched around for a couple of answers regarding the load_model from keras but I still have a question.
I am following this model really closely (https://github.com/experiencor/keras-yolo2), and am training on a custom dataset.
I have done the training which gives me a yolov2.h5 file, basically model weights to fit into the keras model. But I am encountering some problems with the loading of the model.
After loading the model (in a separate.py file)
model = load_model('file_dir/yolov2.h5')
First I encounter the issue
NameError: name 'tf' is not defined
Which I then searched up to modify my code to add custom objects as such:
model = load_model('file_dir/yolov2.h5', custom_objects={'tf':tf})
This clears the first error but results in another
ValueError: Unknown loss function : custom_loss
I used the custom_loss function from the yolov2 repository (https://github.com/experiencor/keras-yolo2/blob/master/frontend.py), so I tried to solve it with:
from frontend import YOLO
model = load_model('file_dir/yolov2.h5', custom_objects={'tf':tf, 'custom_loss':YOLO.custom_loss})
But ran into another error:
TypeError: custom_loss() missing 1 required positional argument
I got rather stuck here because I have no idea how to supply the parameters for custom_loss. I'd appreciate some help with this (I don't particularly understand this part, since I'm loading my model in a different Python script, separate.py). Thank you so much!
(Edit: This fix doesn't work for me either)
model = load_model('file_dir/yolov2.h5', compile = False)
To resolve this problem, since you already have the network code at hand, save only the trained weights (as the Keras trainer does in its callback).
For testing, build the model (no need to compile it) and then load the trained weights using model.load_weights(path/to/saved/weights).
You can also use "by_name=True" if you build the network in a different way; in that case you should keep the layer names the same.
Another option is to set the weights manually: load the .h5 file with "h5py" (h5py.File(path/to/weights, mode='r')), for example (have a look at how Keras does that), then match the layer names of the model to the loaded weights.
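The weights-only route from this answer might look like the sketch below. It is an illustration under assumptions: build_network is a hypothetical stand-in for the code that constructs the YOLO model, and the file name demo.weights.h5 is made up (the .weights.h5 suffix is used because newer Keras releases require it for save_weights).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_network():
    """Hypothetical stand-in for the code that constructs the YOLO network."""
    return tf.keras.Sequential(
        [
            tf.keras.Input(shape=(6,)),
            tf.keras.layers.Dense(4, activation="relu", name="features"),
            tf.keras.layers.Dense(2, name="detections"),
        ]
    )


# Training script: keep only the weights, not the compiled model or its loss.
trained = build_network()
weights_path = os.path.join(tempfile.mkdtemp(), "demo.weights.h5")
trained.save_weights(weights_path)

# Separate test script: rebuild the network (no compile needed) and load the weights.
restored = build_network()
restored.load_weights(weights_path)

sample = np.ones((1, 6), dtype="float32")
same_output = np.allclose(
    trained.predict(sample, verbose=0), restored.predict(sample, verbose=0)
)
print(same_output)
```

Because the loss function is never serialized on this path, the loading script does not need custom_loss or its arguments at all.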
I am trying to use an RNN in PyTorch for a regression task. The model is learned in the training phase, and I want to use the trained model in the testing phase. For this purpose I have saved the learned model with:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For the testing phase I must use this loaded model, but I want to know: what is the value of the first hidden state of the model? I can initialize the first hidden state with zeros, but I think this may not be correct. Is there any function other than torch.save that can return the last hidden state of the learned model? Then I could restore that hidden state and use it as the first hidden state of the loaded model in the testing phase.
Thanks in advance.
Your question is a bit unclear. As far as I understand you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use model's state_dict, which is basically a python dictionary object that maps each layer to its parameter tensor. Read more about it from here.
for param in loaded_model.state_dict():
    print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights of the last hidden layer using the code below:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']
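Note that state_dict holds learned parameters only; the hidden state is an activation and is not stored in it. A small sketch of the difference follows, under assumptions: Regressor is a hypothetical model whose attribute names (rnn, out) were chosen to match the sample output above, and the sizes are arbitrary. It also shows that nn.RNN falls back to a zero initial hidden state when none is passed.

```python
import torch
from torch import nn


class Regressor(nn.Module):
    """Hypothetical model whose attribute names match the sample output above."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=3, hidden_size=5, batch_first=True)
        self.out = nn.Linear(5, 1)

    def forward(self, x, h0=None):
        states, h_last = self.rnn(x, h0)
        # Predict from the last time step and also return the final hidden state.
        return self.out(states[:, -1]), h_last


torch.manual_seed(0)
model = Regressor().eval()

# Parameter names only; no hidden state appears here.
param_names = list(model.state_dict().keys())
print(param_names)

x = torch.randn(2, 7, 3)  # (batch, time, features)
with torch.no_grad():
    pred_default, h_last = model(x)  # h0 omitted
    zeros = torch.zeros(1, 2, 5)     # (num_layers, batch, hidden_size)
    pred_zero, _ = model(x, zeros)   # h0 given explicitly as zeros

print(torch.allclose(pred_default, pred_zero))
print(h_last.shape)
```

If the final training hidden state really should seed the test run, it would have to be returned from forward as above and saved separately, for example with torch.save on the h_last tensor.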