duplicated weights in trained tensorflow model - python

Python 3.7, TensorFlow 2.9.1.
After training a deep learning model and running
model.weights
I see duplicated weights for the embedding vectors.
When I print the model summary and compare it with a freshly built model of the same architecture,
both show the same number of parameters in each layer. However,
the total params and trainable params reported at the bottom are different.
Unfortunately, I cannot share the model architecture.
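One way to narrow this down, sketched under the assumption that every weight can be described by a (name, shape) pair, for example built with `[(w.name, tuple(w.shape)) for w in model.weights]`: list the names that occur more than once and compare the raw parameter count with the count after de-duplication. The helper names and the sample entries below are hypothetical and not taken from the model in question.

```python
from collections import Counter

def _size(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n

def find_duplicate_names(weights):
    """Return the weight names that appear more than once, sorted."""
    counts = Counter(name for name, _ in weights)
    return sorted(name for name, n in counts.items() if n > 1)

def count_params(weights, dedupe=False):
    """Sum element counts over all weights; optionally count each name once."""
    seen = set()
    total = 0
    for name, shape in weights:
        if dedupe and name in seen:
            continue
        seen.add(name)
        total += _size(shape)
    return total

# Hypothetical weight list in which the embedding matrix is listed twice.
weights = [
    ("embedding/embeddings:0", (1000, 16)),
    ("dense/kernel:0", (16, 4)),
    ("dense/bias:0", (4,)),
    ("embedding/embeddings:0", (1000, 16)),
]

print(find_duplicate_names(weights))
print(count_params(weights), count_params(weights, dedupe=True))
```

If the two totals differ by exactly the size of the duplicated entries, the mismatch at the bottom of the summary probably comes from the same variable being tracked twice rather than from a difference in the layers themselves.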

Related

How to implement DEEPExpectation (DEX) age detection?

First of all, I'm new to deep learning, so please correct me if I have made any mistakes.
I'm trying to implement age detection using the DEX method. As far as I understand, the authors trained a CNN based on the VGG-16 architecture. I'm using the IMDB-WIKI dataset, as they suggest in their paper.
I'm using TensorFlow and Keras to train my model in Python 3.
My steps to train the model (I am starting with the IMDB set only):
Load the IMDB mat file and split it into training data and validation data (10% of the total dataset).
Create a VGG-16 model with ImageNet weights (I believe it is a large dataset).
As ImageNet has 1000 classes, remove the last layer of the model and put my single age output layer in its place.
Also add a dropout layer on top of the output layer (frankly, I don't know how it works).
My experiments start from here :)
Freeze the layers already pretrained in the VGG-16 architecture, except my newly added layers, so that some parameters are now non-trainable. In that case my training age accuracy is just 19%, which is too poor; for real age detection I was hoping for 50-56%.
Seeing that, I guessed the cause might be that I didn't train all the layers. I unfroze all the layers and tried to train, but that gave me an out-of-memory exception. After that I froze just 8 layers of the whole architecture, and after training for 40 epochs the age accuracy was 11%, which is lower than before :(
Can anyone please help me understand this paper properly?
Thank you.
Regarding the dropout layer on top of the output layer: that is plainly wrong. A dropout layer multiplies randomly selected outputs by 0, which makes both the activation and its gradient 0. If you use it as your final layer with a rate of k percent, then during training your result will be rubbish in k percent of cases, which drags your accuracy down. Just remove it and the results should improve.
This is already implemented in the deepface package for Python:
#!pip install deepface
from deepface import DeepFace
obj = DeepFace.analyze("img1.jpg", actions = ["age", "gender"])
print(obj)
The model is trained following the instructions in the DEX paper. It builds a VGG model in the background and loads pre-trained weights. It runs on the TensorFlow framework and the Keras API.
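For context, and to the best of my understanding of the DEX paper, the network does not end in a single age unit: age estimation is posed as classification over 101 age classes (0 to 100), and the predicted age is the expected value of the softmax distribution. A minimal pure-Python sketch of that last step follows; the function names and the logits are hypothetical, and in practice the logits would come from the network's final layer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_age(logits, ages=None):
    """Expected value of the age distribution implied by per-class logits."""
    if ages is None:
        ages = range(len(logits))  # class i is taken to mean age i
    probs = softmax(logits)
    return sum(a * p for a, p in zip(ages, probs))

# Hypothetical logits over 101 classes (ages 0..100), peaked at age 30.
logits = [-abs(age - 30) / 2.0 for age in range(101)]
print(round(expected_age(logits), 2))
```

Under this formulation, "accuracy" on exact age classes is expected to be low; the mean absolute error of the expected age is usually the more informative number.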

Updating pre-trained Deep Learning model with respect to new data points

Taking image classification on ImageNet as an example, how can I update a pre-trained model using new data points?
I have loaded the pre-trained model. I have a new data point that is quite different from the distribution of the original data on which the model was trained, so I would like to update/fine-tune the model with this new data point. How should I go about it? I am using PyTorch 0.4.0, running on a Tesla K40C GPU.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with the new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on the new samples. Here is sample code for this case from PyTorch's autograd mechanics notes:
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
# Freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

ELMo - How to train trainable parameters

I am new to tensorflow-hub and came across the ELMo model (https://www.tensorflow.org/hub/modules/google/elmo/2).
According to the original paper, the ELMo representation is a weighted average of hidden state activations, and these weights are trainable for the task at hand, i.e. they are task-specific. As expected, I can see the 4 trainable parameters when I use tf.trainable_variables(). How exactly do I train these variables in TensorFlow?
The documentation only mentions that these weights are trainable. But who should train them: me, or does the ELMo model train them itself? The paper seems to suggest that I should be training them. If so, how do I do that in TensorFlow?
You can start by importing the module into your model with trainable=True, then train the model as you would any other TF model. During this training, the weights imported as part of the module will be trained as well. You can also use this tutorial as a good starting point and just replace the nnlm embedding with ELMo.
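To make the "weighted average" concrete, here is a minimal pure-Python sketch of the scalar mix described in the ELMo paper, assuming the four trainable variables correspond to three per-layer weights s and one scale gamma. The function name and the sample activations are hypothetical; in a real model s and gamma would be updated by the optimizer together with the rest of the task model.

```python
import math

def scalar_mix(layers, s, gamma):
    """Combine per-layer vectors as gamma * sum_j softmax(s)[j] * layers[j]."""
    m = max(s)
    exps = [math.exp(x - m) for x in s]
    z = sum(exps)
    weights = [e / z for e in exps]  # softmax-normalized layer weights
    dim = len(layers[0])
    return [
        gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
        for i in range(dim)
    ]

# Hypothetical activations of three layers for one token (dimension 2).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scalar_mix(layers, s=[0.0, 0.0, 0.0], gamma=1.0))
```

With all s equal, every layer contributes equally; training shifts s so that the layers most useful for the downstream task dominate the mix.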

Usage of 'learning_phase' in keras for tensorflow backend?

I am trying to train a resnet network using keras backend in tensorflow. The feed dictionary for each batch update is written as:
feed_dict= {x:X_train[indices[start:end]], y:Y_train[indices[start:end]], keras.backend.learning_phase():1}
I am using the keras backend (keras.backend.set_session(sess)) because the original resnet network is defined with keras. As the model contains dropout and batch_norm layers, it requires a learning phase to distinguish between training and testing.
I observe that whenever I set keras.backend.learning_phase():1, the model's train/test accuracy hardly increases above 10%. In contrast, if the learning phase is not set, i.e. the feed dictionary is defined as:
feed_dict= {x:X_train[indices[start:end]], y:Y_train[indices[start:end]]}
then, as expected, the model accuracy keeps increasing with epochs in the standard way.
I would appreciate it if someone could clarify whether the learning phase is unnecessary or whether something else is wrong. The Keras 2.0 documentation seems to suggest using the learning phase with dropout and batch_norm layers.
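Set the learning phase to 1 (training):
from keras import backend as K
K.set_learning_phase(1)
Then you need to set training=False for all batch normalization layers:
# model is assumed to be the Keras ResNet model from the question
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)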
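As a rough illustration of why the flag matters for batch normalization, the pure-Python sketch below normalizes with the statistics of the current batch in training mode and with stored moving statistics otherwise. It is a simplified stand-in, not the Keras implementation: the learned scale and offset and the update of the moving averages are omitted, and the function name and numbers are hypothetical.

```python
def batch_norm(batch, moving_mean, moving_var, training, eps=1e-3):
    """Normalize a list of floats with batch statistics or moving statistics."""
    if training:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = moving_mean, moving_var
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

batch = [10.0, 12.0, 14.0]
# Moving statistics that do not match the data, as for a mismatched pretrained model.
print(batch_norm(batch, moving_mean=0.0, moving_var=1.0, training=True))
print(batch_norm(batch, moving_mean=0.0, moving_var=1.0, training=False))
```

The same inputs produce very different activations in the two modes whenever the stored statistics do not match the data, which is one plausible reason for accuracy changing when the learning phase is switched.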

TensorFlow object detection api: classification weights initialization when changing number of classes at training using pre-trained models

I want to utilize not only the feature-extractor pre-trained weights but also the feature-map layers' classifier/localization pre-trained weights for fine-tuning tensorflow object detection models (SSD) using tensorflow object detection API. When my new model has a different number of classes from the pre-trained model that I'm using for the fine-tuning checkpoint, how would the TensorFlow object detection API handle the classification weight tensors?
When fine-tuning pre-trained object detection models like SSD, I can initialize not only the feature-extractor weights with the pre-trained weights but also the feature-map's localization layer weights and classification layer weights, with the latter keeping only the pre-trained weights of the classes I choose. That way I can reduce the number of classes the model can initially identify (for example, from the 90 MSCOCO classes to a chosen subset of them, such as cars and pedestrians only).
https://github.com/pierluigiferrari/ssd_keras/blob/master/weight_sampling_tutorial.ipynb
This is how it's done for Keras models (i.e. in h5 files), and I want to do the same in the TensorFlow object detection API. It seems that at training time I can specify the number of classes the new model is going to have in the config protobuf file, but since I'm new to the API (and to TensorFlow) I haven't been able to follow the source structure and understand how that number is handled during fine-tuning. Most SSD models I know simply ignore the pre-trained classification weight tensor and initialize a new one when its shape differs from the new model's classification weight shape, but I want to retain the necessary classification weights and train from those. Also, how would I do that within the API structure?
Thanks!
Reading through the source, I found the responsible code: it retains the pre-trained model's weights only where the shapes of the layers in the newly defined model and the pre-trained model match. So if I change the number of classes, the shape of the classifier layers changes, and the pre-trained weights for those layers are not retained.
https://github.com/tensorflow/models/blob/master/research/object_detection/utils/variables_helper.py#L133
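The behaviour described above can be sketched in plain Python as a filter over variable shapes. This is not the API's actual code, only an illustration of the idea; the function name, variable names and shapes are made up for the example.

```python
def restorable_variables(model_shapes, checkpoint_shapes):
    """Return names whose shape in the model equals the shape in the checkpoint."""
    return sorted(
        name
        for name, shape in model_shapes.items()
        if checkpoint_shapes.get(name) == shape
    )

# Hypothetical shapes: the class predictor differs because the class count changed.
checkpoint_shapes = {
    "FeatureExtractor/conv1/weights": (3, 3, 3, 32),
    "BoxPredictor/ClassPredictor/weights": (1, 1, 512, 546),
}
model_shapes = {
    "FeatureExtractor/conv1/weights": (3, 3, 3, 32),
    "BoxPredictor/ClassPredictor/weights": (1, 1, 512, 18),
}

print(restorable_variables(model_shapes, checkpoint_shapes))
```

Only the feature-extractor variable survives the filter here; the class predictor would be initialized from scratch, which matches the behaviour reported in the answer.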
