I want to use a pretrained neural network and just fine-tune it to my specific needs. I wanted to use Python and the Lasagne framework for this. On:
https://github.com/Lasagne/Recipes/blob/master/examples/ImageNet%20Pretrained%20Network%20%28VGG_S%29.ipynb
I found an example of how to use a pretrained network for specific images. My problem is that I would like to use
the network described in the link above as a starting point and add a final layer to it that makes it implement a TWO CLASS
classifier which is what I need. I therefore wanted to keep all the layers in the network frozen and allow training ONLY in my last added layer.
Apparently there is a way to mark layers as "nontrainable" in Lasagne, but I have found no examples of how to do this on the web.
Any thoughts on this would be highly appreciated.
Set the learning rate to 0 for the layers you want to freeze, and to a nonzero value only for the layers you want to fine-tune. There is no online example yet, but you should check this thread: https://groups.google.com/forum/#!topic/lasagne-users/2z-6RrgiHkE
Remove the trainable tag from all parameters of the layers that you want to keep frozen:
def freeze_layer(layer):
    # layer.params maps each parameter to its set of tags
    for param in layer.params:
        layer.params[param].discard('trainable')
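To freeze your whole network up to a certain layer, pass that layer to the function below, which iterates over it and all the layers beneath it (do this before adding the new layer you want to train):
from lasagne.layers import get_all_layers
def freeze_net(net):
    # get_all_layers returns net and every layer feeding into it
    layers = get_all_layers(net)
    for l in layers:
        freeze_layer(l)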
Code untested. See this discussion for more info.
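To make the tag mechanism concrete, here is a minimal sketch that does not use Lasagne at all. ToyLayer and trainable_params are hypothetical stand-ins: in real Lasagne the keys of layer.params are Theano shared variables rather than strings, and the filtering is done by lasagne.layers.get_all_params(net, trainable=True).

```python
from collections import OrderedDict


class ToyLayer:
    """Hypothetical stand-in for a Lasagne layer: maps parameter names to tag sets."""

    def __init__(self, name):
        self.params = OrderedDict([
            (name + ".W", {"trainable", "regularizable"}),
            (name + ".b", {"trainable"}),
        ])


def freeze_layer(layer):
    # discard() is safe even if a parameter never had the tag
    for tags in layer.params.values():
        tags.discard("trainable")


def trainable_params(layers):
    # Mimics what a tag-based filter would return to the optimizer
    return [p for layer in layers
            for p, tags in layer.params.items() if "trainable" in tags]


layers = [ToyLayer("conv1"), ToyLayer("conv2"), ToyLayer("new_output")]
for layer in layers[:-1]:  # freeze everything except the newly added last layer
    freeze_layer(layer)

trainable = trainable_params(layers)
print(trainable)  # ['new_output.W', 'new_output.b']
```

Only the parameters that still carry the trainable tag would be handed to the update rule, which is why the frozen layers keep their pretrained weights.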
I'm trying to load checkpoints and populate model weights using the Faster R-CNN architecture (Faster R-CNN ResNet50 V1 640x640 to be precise, from here). I'm trying to load the weights for this network the same way it's done in the example notebook for RetinaNet, where they do the following:
fake_box_predictor = tf.compat.v2.train.Checkpoint(
    _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
    _box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
    _feature_extractor=detection_model._feature_extractor,
    _box_predictor=fake_box_predictor
)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()
I'm trying to get a similar checkpoint loading mechanism going for the Faster-RCNN network I want to use, but the properties like _base_tower_layers_for_heads, _box_prediction_head only exist for the architecture used in the example, and not for anything else.
I also couldn't find documentation on which parts of the model to populate using Checkpoint for my particular use case. Would greatly appreciate any help on how to approach this!
As you said, the main problem is that you have no handle on the layers you want to use for transfer learning.
This comes from how the Faster R-CNN ResNet50 V1 640x640 model in the Zoo was originally implemented. Either they didn't name the layers, or they named them but didn't publish the names (or the code).
To solve this you need to find out which layers you want to keep and which you want to relearn. You can print out all the layers in a network using (ref):
[n.name for n in tf.get_default_graph().as_graph_def().node]
Layer names can be set manually, but TensorFlow assigns a default name to every node. The list can be long and tedious to read, but you need to find the tensor from which to start your transfer learning. Go through the list and work out which layers you want to freeze and which should continue to learn. Freezing a layer (ref):
for layer in model.layers:  # model: your Keras model
    if layer.name == 'layer_name':
        layer.trainable = False
I use transfer learning with efficientnet_B0, and what I'm trying to do is gradually unfreeze layers while the network is learning. At first, I train one dense layer on top of the whole network while every other layer is frozen. I use this code to freeze the layers:
for layer in model_base.layers[:-2]:
    layer.trainable = False
Then I unfreeze the whole model and freeze exactly the layers I need using this code:
model.trainable = True
for layer in model_base.layers[:-13]:
    layer.trainable = False
Everything works fine. I call model.compile one more time and training resumes from where it left off, great. But then, when I unfreeze all layers one more time with
model.trainable = True
and try to fine-tune, my model starts to learn from scratch.
I tried different approaches and ways to fix this, but nothing seems to work. I tried to use layer.training = False and layer.trainable = False for all batch_normalization layers in the model too, but it doesn't help either.
In addition to the previous answer, I would like to point out one often overlooked factor: whether to freeze or unfreeze also depends on the problem you are trying to solve, i.e.
On how similar your own dataset is to the dataset on which the network was pre-trained.
On the size of the new dataset.
You should consult the diagram below before making a decision.
Moreover, note that if you are constrained by the hardware, you can opt for leaving some of the layers completely frozen, since in this way you have a smaller number of trainable parameters.
Picture taken from here (although I remember having seen it in several blogs): https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
This tends to be application-specific and not every problem can benefit from retraining the whole neural network.
my model start to learn from the scratch
While this is most likely not the case (weights are not reinitialized), it can definitely seem like that. Your model has been fine-tuned to some other task and now you are forcing it to retrain itself to do something different.
If you are observing behavior like that, the most likely cause is that you are simply using a large learning rate which will destroy those fine-tuned weights of the original model.
Retraining the whole model as you have described (the final step) should be done very, very carefully, with a very small learning rate (I have seen instances where Adam with a 10^-8 learning rate was too much).
My advice is to keep lowering the learning rate until the model starts improving instead of damaging the weights, but this may lead to a learning rate so small that it is of no practical use.
The way you freeze and unfreeze your layers is correct, and it is how it is done on the official website:
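The effect of the learning rate can be sketched with a toy problem that involves no neural network at all. The one-dimensional quadratic loss, the curvature value and the weights below are all assumptions chosen for illustration; the point is only that the same starting weight improves with a small step size and is thrown far away from the optimum with a large one.

```python
def loss(w, target, curvature):
    # Toy loss: 0.5 * curvature * (w - target)^2
    return 0.5 * curvature * (w - target) ** 2


def fine_tune(w, target, curvature, lr, steps):
    """Plain gradient descent on the toy loss."""
    for _ in range(steps):
        grad = curvature * (w - target)
        w -= lr * grad
    return w


pretrained_w = 1.0  # weight inherited from the pretrained model (toy value)
target_w = 1.2      # optimum for the new task, close to the pretrained weight
curvature = 10.0

start_loss = loss(pretrained_w, target_w, curvature)
loss_large_lr = loss(fine_tune(pretrained_w, target_w, curvature, lr=0.5, steps=5),
                     target_w, curvature)
loss_small_lr = loss(fine_tune(pretrained_w, target_w, curvature, lr=0.01, steps=5),
                     target_w, curvature)

print(start_loss, loss_small_lr, loss_large_lr)
```

With lr=0.5 every step multiplies the distance to the optimum by -4, so the loss grows; with lr=0.01 the distance shrinks by a factor of 0.9 per step. From the outside, the first case looks very much like a model that has started learning from scratch.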
Setting layer.trainable to False moves all the layer's weights from
trainable to non-trainable.
From https://keras.io/guides/transfer_learning/
As discussed in the other answers, the problem you encounter is indeed theoretical and has nothing to do with the way you programmed it.
I've encountered this problem before. It seems that if I construct my model with Sequential API, the network will start learning from scratch when I set base_model.trainable = True. But if I create my model with Functional API, it seems that everything is okay. The way I create my model is like the one in this official tutorial https://www.tensorflow.org/tutorials/images/transfer_learning
I am creating a model somewhat similar to the one mentioned below:
[image: model]
I am using Keras to create such a model but have hit a dead end, as I have not been able to find a way to add a softmax to the outputs of the LSTM units. So far all the tutorials and supporting material only cover outputting a single class, even in the case of image captioning, as in this link.
So is it possible to apply a softmax to every LSTM unit (with return_sequences set to True), or do I have to move to PyTorch?
The answer is: yes, it is possible to apply it to each unit of the LSTM, and no, you do not have to move to PyTorch.
While in Keras 1.X you needed to explicitly state that you add a TimeDistributed layer, in Keras 2.X you can just write:
model.add(LSTM(50,activation='relu',return_sequences=True))  # return_sequences=True gives one output per time step
model.add(Dense(number_of_classes,activation='softmax'))
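What "a Dense softmax layer applied to every time step" computes can be sketched in plain Python, without Keras. The helper names, sizes and random values below are assumptions for illustration; the same weight matrix is reused at each time step, and each step gets its own probability distribution over the classes.

```python
import math
import random


def softmax(scores):
    # Subtract the max for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax_per_step(sequence, weights):
    """Apply the same weights followed by a softmax to every time step."""
    outputs = []
    for hidden in sequence:  # hidden: the LSTM output vector at one time step
        scores = [sum(h * w for h, w in zip(hidden, class_weights))
                  for class_weights in weights]
        outputs.append(softmax(scores))
    return outputs


random.seed(0)
time_steps, hidden_size, number_of_classes = 4, 5, 3
sequence = [[random.uniform(-1, 1) for _ in range(hidden_size)]
            for _ in range(time_steps)]
# One row of weights per class (bias omitted for brevity)
weights = [[random.uniform(-1, 1) for _ in range(hidden_size)]
           for _ in range(number_of_classes)]

probs = dense_softmax_per_step(sequence, weights)
print(len(probs), len(probs[0]))  # 4 3
```

The output has one row per time step and one column per class, which matches the (time_steps, number_of_classes) shape you get per sample when the LSTM returns sequences.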
I'm studying different object detection algorithms for my interest.
My main reference is Andrej Karpathy's slides on object detection (slides here).
I would like to start from some reference implementation, in particular something that lets me directly test some of the networks mentioned on my data (mainly onboard camera footage from car and bike races).
Unfortunately, I have already used a pretrained network (a repo forked from JunshengFu's, where I slightly adapted YOLO to my use case), but the classification accuracy is rather poor, I guess because there were not many training instances of racing cars such as Formula 1.
For this reason I would like to retrain the networks, and this is where I'm running into the most issues:
Properly training some of the networks requires either hardware (powerful GPUs) or time I don't have, so I was wondering whether I could retrain just part of the network, in particular the classification network, and whether there is any repo that already allows that.
Thank you in advance
That is called fine-tuning, or transfer learning. You can do it with any network you find (provided the problem domains are similar), and depending on how much data you have you either fine-tune the whole network or freeze some layers and train only the last ones. In your case you would probably freeze the whole network except the last fully-connected layers, which perform the classification (and which you will actually replace with new ones matching your number of classes). I don't know which library you use, but TensorFlow has an official tutorial on transfer learning. However, it's not very clear, to be honest.
You can find a more user-friendly tutorial by an enthusiast here: tutorial. A code repository is available as well. One correction you need, though: the author fine-tunes the whole network, whereas if you want to freeze some layers you need to get the list of trainable variables, remove those you want to freeze, and pass the resulting list to the optimizer (so that it ignores the removed variables), like this:
import tensorflow as tf  # TF 1.x API
slim = tf.contrib.slim
all_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='InceptionResnetV2')
to_train = all_vars[-6:]  # better to select them explicitly by name, but this still works
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001)
train_op = slim.learning.create_train_op(total_loss, optimizer, variables_to_train=to_train)
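The comment in the snippet above suggests selecting variables by name rather than by position. A minimal sketch of that selection logic follows, using a namedtuple as a stand-in for TensorFlow variables so that it runs without TensorFlow. The Variable type, the select_by_scope helper and the variable names are illustrative assumptions; check the names in your own graph before relying on any prefix.

```python
from collections import namedtuple

# Stand-in for a TensorFlow variable: only the .name attribute is used here
Variable = namedtuple("Variable", "name")

all_vars = [
    Variable("InceptionResnetV2/Conv2d_1a_3x3/weights"),
    Variable("InceptionResnetV2/Conv2d_2a_3x3/weights"),
    Variable("InceptionResnetV2/Logits/Logits/weights"),
    Variable("InceptionResnetV2/Logits/Logits/biases"),
]


def select_by_scope(variables, scopes):
    """Keep only variables whose name starts with one of the given scope prefixes."""
    return [v for v in variables
            if any(v.name.startswith(scope) for scope in scopes)]


to_train = select_by_scope(all_vars, ["InceptionResnetV2/Logits"])
print([v.name for v in to_train])
```

Selecting by prefix keeps working if layers are added or reordered, whereas a slice such as all_vars[-6:] silently picks different variables when the graph changes.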
Further, tensorflow has a so called model zoo (bunch of trained models you can use for your purposes and transfer-learning). You can find it here.
I'm trying to replicate the code in this blog article How convolutional neural networks see the world
It works well in a CNN where there's no dropout layer but when there's one (or more) dropout layers, I can't directly use the layer.output line because it expects a learning phase.
When I use the recommended way to extract the output of a layer:
get_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                              [model.layers[layer_index].output])
layer_output = get_layer_output([input_img, 0])[0]  # 0 = test phase
The problem is that I can't put a placeholder in input_img because it expects "real" data but if I put directly "real" data then the rest of the code doesn't work (creating the loss, gradients and iterating needs a placeholder).
Is there a way I can make this work?
I'm using the Tensorflow backend.
EDIT : I solved my issue by using the K.set_learning_phase() method before doing anything like building my model (I had to start from a new environment and I used the method right after the imports).
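For readers wondering why a dropout layer needs to know the learning phase at all, here is a toy sketch in plain Python. The dropout function below is a simplified illustration of inverted dropout, not Keras's implementation: in the test phase the layer is the identity, while in the training phase it randomly zeroes values and rescales the rest.

```python
import random


def dropout(values, rate, training, rng):
    """Toy inverted dropout: identity at test time, random masking when training."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    # Kept values are scaled by 1/keep so the expected activation stays the same
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

test_out = dropout(activations, rate=0.5, training=False, rng=rng)
train_out = dropout(activations, rate=0.5, training=True, rng=rng)

print(test_out)  # [1.0, 2.0, 3.0, 4.0]
```

Fixing the phase up front removes that extra input from the graph, which is why layer.output can then be used directly when building the loss and gradients.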