I am using a model (SimCLR) to learn representations from images. During pre-training, the model was trained against a single dummy label. Now I want to fine-tune the model on 8-class data.
While loading the pre-trained checkpoint into the yet-to-be-fine-tuned model with the 8-class head, I encounter a ValueError.
ValueError: Tensor's shape (2048, 1) is not compatible with supplied shape [2048, 8]
Is there a way to exclude the final head layer's weights when loading the checkpoint, so that the rest of the model can be fine-tuned?
System information
TensorFlow version: 2.5.0
Python version: 3.7.3
The shape mismatch is in the classification head, not the inputs: the checkpoint stores a (2048, 1) kernel from the old single-dummy-label head, while your new model expects a [2048, 8] kernel. To fine-tune on 8-class data you need to restore only the layers whose shapes still match (the backbone/encoder) and leave the new 8-class head randomly initialized. Without a view of your code it is hard to say exactly where to make that change, but the general pattern is to restore the encoder variables alone and attach a fresh head.
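A minimal sketch of that pattern, assuming the backbone is a ResNet-50-style encoder saved with tf.train.Checkpoint; the builder function, checkpoint key, and path below are placeholders, not your actual code:
import tensorflow as tf

def build_encoder():
    # placeholder for however the SimCLR backbone is actually constructed
    return tf.keras.applications.ResNet50(include_top=False, pooling="avg")

encoder = build_encoder()

# Restore only the encoder variables; the old 1-unit head is never created,
# so its (2048, 1) kernel in the checkpoint is simply skipped.
ckpt = tf.train.Checkpoint(encoder=encoder)
ckpt.restore("path/to/pretrained/ckpt").expect_partial()

# Attach a fresh, randomly initialized 8-class head for fine-tuning.
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(8, activation="softmax"),
])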
Related
I have a deep learning model that uses a BERT layer from the Hugging Face library (TF 2.0, Transformers 2.8.0).
The model consists of BERT embeddings -> Attention -> Softmax. I am also using tf.keras.callbacks.ModelCheckpoint to save the best model during training.
I'm trying to slice the model to get the attention weights.
My problem is that if I try to access the output of any layer after loading the saved model, using output1 = model.layers[3].output, I get this error:
AttributeError: Layer tf_bert_model has no inbound nodes.
but if I do the same on the original model without saving it (immediately after model.fit() finishes), there is no problem.
This is how I load the saved model:
model = tf.keras.models.load_model(model_dir)
Is there a problem with loading the model this way, given that it works for prediction?
Answering here for the benefit of the community since the user has already found the solution.
The issue was resolved with the following steps (sketched in code below):
1. Load the saved model.
2. Create a new model with the same structure.
3. Copy the weights from the saved model to the newly created one.
4. Access the needed layer without any problem.
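A minimal sketch of those steps, assuming a build_model() function that recreates the original architecture (BERT embeddings -> Attention -> Softmax); build_model and model_dir are placeholders for your own code:
import tensorflow as tf

saved_model = tf.keras.models.load_model(model_dir)  # step 1: load the saved model
fresh_model = build_model()                          # step 2: same structure, fresh graph
fresh_model.set_weights(saved_model.get_weights())   # step 3: copy the trained weights

# Step 4: the rebuilt model has proper inbound nodes, so slicing works.
attention_output = fresh_model.layers[3].output
probe = tf.keras.Model(inputs=fresh_model.input, outputs=attention_output)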
I retrained ssd_mobilenet_v2_coco_2018_03_29 from the TensorFlow detection model zoo on my own dataset, following the procedures defined in the TensorFlow Object Detection API. When I try to export the model with
python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=.../pipeline.config --trained_checkpoint_prefix=.../model.ckpt-200000 --output_directory=.../export-model-200000 --input_shape=1,300,300,3
it says "20 ops no flops stats due to incomplete shapes." although I overwrite input_shape with 1,300,300,3.
How can I overcome this problem?
My final goal is to convert this model into an IR representation. During that conversion I run into shape-inference problems, whereas converting the original model I used for transfer learning into IR works without issues.
Thanks
Considering the example of image classification on ImageNet, how do I update a pre-trained model using new data points?
I have loaded the pre-trained model. I have new data points whose distribution is quite different from that of the original training data, so I would like to update/fine-tune the model with them. How do I go about doing this? I am using PyTorch 0.4.0, running on a Tesla K40C GPU.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on the new samples. Here's sample code for this case, adapted from PyTorch's autograd mechanics notes:
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze all pre-trained parameters
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)

# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
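From there, a standard training step updates only the new layer; a minimal sketch on a dummy batch standing in for the new samples:
import torch

inputs = torch.randn(4, 3, 224, 224)  # dummy ImageNet-sized batch
labels = torch.randint(0, 100, (4,))  # dummy labels for the 100-class head above

criterion = nn.CrossEntropyLoss()
loss = criterion(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # only model.fc's parameters receive updates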
Is there a way to take a trained TensorFlow model and convert all the tf.Variables and their respective weights (either from within a running tf.Session or from a checkpoint) into tf.constants with those values, so that the model can be run on a new input tensor without initializing or restoring weights in a session? In other words, can I condense a trained model into a fixed and immutable TensorFlow operation?
Yes, there is a freeze_graph.py tool just for that purpose.
It is described (briefly) in the Tool Developer's Guide, and you can find a usage example in the Preparing models for mobile deployment section.
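Under the hood, freeze_graph.py relies on convert_variables_to_constants, so if you already have a live session you can apply the same transformation in-process. A minimal TF 1.x sketch, assuming an output node named "softmax" (the node name and file path are placeholders):
import tensorflow as tf

with tf.Session() as sess:
    # ... build or restore the trained model here, so variables hold their trained values ...
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["softmax"])

# The serialized graph is variable-free and runs without any restore step.
with tf.gfile.GFile("frozen_graph.pb", "wb") as f:
    f.write(frozen_graph_def.SerializeToString())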
I want to utilize not only the feature extractor's pre-trained weights but also the feature-map layers' classifier/localization pre-trained weights when fine-tuning TensorFlow object detection models (SSD) with the TensorFlow Object Detection API. When my new model has a different number of classes from the pre-trained model I'm using as the fine-tuning checkpoint, how does the TensorFlow Object Detection API handle the classification weight tensors?
When fine-tuning pre-trained ML object detection models like SSD, I can initialize not only the feature-extractor weights from the pre-trained weights but also the feature maps' localization-layer and classification-layer weights, keeping only the pre-trained class weights of my choice. That way I can reduce the number of classes the model initially identifies (for example, from the 90 MSCOCO classes down to a chosen subset of them, such as cars and pedestrians only):
https://github.com/pierluigiferrari/ssd_keras/blob/master/weight_sampling_tutorial.ipynb
This is how it's done in Keras models (i.e. in h5 files), and I want to do the same in the TensorFlow Object Detection API. It seems that at training time I can specify the number of classes the new model will have in the config protobuf file, but since I'm new to the API (and TensorFlow) I haven't been able to follow the source structure and understand how that number is handled at fine-tuning time. Most SSD models I know simply ignore and re-initialize the classification weight tensor when the pre-trained model's class-weight shape differs from the new model's, but I want to retain the needed classification weights and train on top of them (see the sketch below). How would I do that within the API structure?
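For reference, a minimal sketch of the weight-sampling idea from the linked notebook, using a simplified (features, classes) kernel; real SSD conv heads interleave default boxes and classes, and the class indices below are hypothetical:
import numpy as np

n_old_classes = 91               # e.g. MSCOCO classes incl. background
classes_of_interest = [0, 3, 1]  # hypothetical subset: background, car, pedestrian

# Stand-ins for a pre-trained classification head's weights
old_kernel = np.random.randn(2048, n_old_classes)
old_bias = np.random.randn(n_old_classes)

# Keep only the slices that correspond to the classes of interest
new_kernel = old_kernel[:, classes_of_interest]
new_bias = old_bias[classes_of_interest]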
Thanks!
Reading through the code, I found the responsible logic: a pre-trained variable is retained only if the shapes of the corresponding layers in the newly defined model and the pre-trained model match. So if I change the number of classes, the shapes of the classifier layers change and those pre-trained weights are not retained.
https://github.com/tensorflow/models/blob/master/research/object_detection/utils/variables_helper.py#L133
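A simplified sketch of that check, condensed from get_variables_available_in_checkpoint in variables_helper.py (names and structure are abridged, not the verbatim implementation):
import tensorflow as tf

def filter_restorable_variables(model_variables, checkpoint_path):
    reader = tf.train.NewCheckpointReader(checkpoint_path)
    ckpt_shapes = reader.get_variable_to_shape_map()
    restorable = {}
    for name, var in model_variables.items():
        # A variable is restored only if the checkpoint contains the same name
        # AND the same shape; a resized classifier head fails the shape test
        # and is left randomly initialized instead.
        if name in ckpt_shapes and ckpt_shapes[name] == var.shape.as_list():
            restorable[name] = var
    return restorable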