I trained a model with several layers, then for each layer in model.layers I set
layer.trainable = False
I added several layers to this model and called
model.compile(...)
and trained this new model for several epochs with part of the layers frozen.
Later I decided to unfreeze the layers and ran:
for layer in model.layers:
    layer.trainable = True
model.compile(...)
When I start training the model with the unfrozen layers, I get a very high loss value, even though I just wanted to continue training from the previously learned weights. I also checked that after model.compile(...) the model still predicts well (the previously learned weights are not reset), but as soon as training starts, everything gets 'erased' and I start as if from scratch.
Could someone clarify whether this behavior is expected? How do I recompile the model without starting from scratch?
P.S. I also tried manually saving the weights and assigning them back to the newly compiled model using layer.get_weights() and layer.set_weights().
I used the same compile parameters (the same optimizer and the same loss).
You might need to lower your learning rate when you start fine-tuning the previously trained layers. For example, a learning rate of 0.01 might work for your new dense layers (the top) while all other layers are set to untrainable, but after setting all layers to trainable you might need to reduce the learning rate to, say, 0.001. There is no need to manually copy or set weights.
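A minimal sketch of that two-stage recipe, assuming a tf.keras model whose lower layers come from a pre-trained base_model and a dataset train_ds (both names are placeholders):

from tensorflow import keras

# Stage 1: only the new top layers are trainable; larger learning rate.
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, epochs=10)

# Stage 2: unfreeze everything and recompile with a lower learning rate.
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, epochs=10)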
I use transfer learning with efficientnet_B0, and what I'm trying to do is to gradually unfreeze layers while the network is learning. At first, I train one dense layer on top of the whole network, while every other layer is frozen. I use this code to freeze the layers:
for layer in model_base.layers[:-2]:
    layer.trainable = False
Then I unfreeze the whole model and freeze the exact layers I need using this code:
model.trainable = True
for layer in model_base.layers[:-13]:
    layer.trainable = False
Everything works fine. I call model.compile one more time and it starts to train from where it left off, great. But then, when I unfreeze all layers one more time with
model.trainable = True
and try to do fine-tuning, my model starts to learn from scratch.
I tried different approaches and ways to fix this, but nothing seems to work. I also tried setting layer.training = False and layer.trainable = False for all batch_normalization layers in the model, but that doesn't help either.
In addition to the previous answer, I would like to point out one often-overlooked factor: freezing/unfreezing also depends on the problem you are trying to solve, i.e.:
- The level of similarity between your own dataset and the dataset on which the network was pre-trained.
- The size of the new dataset.
You should consult the following diagram before making a decision:
Moreover, note that if you are constrained by hardware, you can leave some of the layers completely frozen, since that way you have a smaller number of trainable parameters.
Picture taken from here (although I remember having seen it in several blogs): https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
This tends to be application-specific and not every problem can benefit from retraining the whole neural network.
my model starts to learn from scratch
While this is most likely not the case (the weights are not reinitialized), it can definitely seem like it. Your model was fine-tuned for some other task, and now you are forcing it to retrain itself to do something different.
If you are observing behavior like that, the most likely cause is that you are simply using a large learning rate, which destroys the fine-tuned weights of the original model.
Retraining the whole model as you have described (the final step) should be done very, very carefully with a very small learning rate (I have seen instances where Adam with a 10^-8 learning rate was too much).
My advice is to keep lowering the learning rate until it starts improving instead of damaging the weights, but this may lead to a learning rate so small that it is of no practical use.
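As a purely illustrative sketch of that final step (the optimizer choice, the 1e-6 value, and the train_ds name are assumptions, not recommendations from the answer above):

from tensorflow import keras

model.trainable = True  # unfreeze the whole model
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-6),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_ds, epochs=3)  # watch whether the loss improves or blows up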
The way you freeze and unfreeze your layers is correct, and that is how it is done in the official guide:
Setting layer.trainable to False moves all the layer's weights from trainable to non-trainable.
From https://keras.io/guides/transfer_learning/
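One quick way to see that bookkeeping in action is to count a layer's weights before and after flipping the flag (a small sketch, assuming an already-built model):

layer = model.layers[0]
print(len(layer.trainable_weights), len(layer.non_trainable_weights))
layer.trainable = False
print(len(layer.trainable_weights), len(layer.non_trainable_weights))
# The weights move from the first list to the second. Remember that the
# change only takes effect for training after you call compile() again.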
As discussed in the other answers, the problem you encounter is indeed theoretical and has nothing to do with the way you programmed it.
I've encountered this problem before. It seems that if I construct my model with the Sequential API, the network starts learning from scratch when I set base_model.trainable = True. But if I create my model with the Functional API, everything seems to be okay. The way I create my model is like the one in this official tutorial: https://www.tensorflow.org/tutorials/images/transfer_learning
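For comparison, here is a sketch of the Functional-API construction in the style of that tutorial (the base network and shapes follow the tutorial and should be treated as illustrative):

from tensorflow import keras

base_model = keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                            include_top=False,
                                            weights='imagenet')
base_model.trainable = False

inputs = keras.Input(shape=(160, 160, 3))
# training=False keeps BatchNormalization in inference mode, even after
# base_model.trainable is later set back to True for fine-tuning.
x = base_model(inputs, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)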
I followed a blog on how to implement a VGG16 model from scratch and want to do the same with the pretrained model from Keras. I looked at some other blogs but can't find a fitting solution, I think. My task is to classify integrated-circuit images as defect or non-defect.
I have seen in a paper that they used a pretrained ImageNet model of VGG16 for fabric defect detection, where they froze the first seven layers and fine-tuned the last nine for their own problem.
(Source: https://journals.sagepub.com/doi/full/10.1177/1558925019897396)
I have already seen examples of how to freeze all layers except the fully connected ones, but how can I freeze the first x layers and fine-tune the others for my problem?
VGG16 is fairly easy to implement from scratch, but for models like ResNet or Xception it gets a little trickier.
It is not necessary to implement a model from scratch to freeze a few layers. You can do this on pre-trained models as well. In Keras, you'd use trainable = False.
For example, let's say you want to use the pre-trained Xception model from keras and want to freeze the first x layers:
# In your imports
from keras.applications import Xception

# Since you're using the model for a different task, remove the top
base_model = Xception(weights='imagenet', include_top=False)

# Freeze layers 0 to x (x is the number of layers you want to freeze)
for layer in base_model.layers[0:x]:
    layer.trainable = False

# To see all the layers in detail and to check the trainable parameters
base_model.summary()
Ideally you'd want to add another layer on top of this model with the output as your classes. For more details, you can check this keras guide: https://keras.io/guides/transfer_learning/
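For instance, a possible head on top of the frozen base (the pooling layer and the two-class softmax are assumptions for a defect/non-defect task like yours):

from keras import layers, models

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation='softmax'),  # defect / non-defect
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])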
A lot of times the pre-trained weights can be very useful in other classification tasks, but in case you want to train a model from scratch on your dataset, you can load the model without the ImageNet weights. Or, better, load the weights but don't freeze any layers. This will retrain every layer, taking the ImageNet weights as the initialization.
I hope I've answered your question.
So I'm going through this GAN tutorial, and the author sets up a discriminator like this:
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
where net_discriminator is another Sequential model.
He then sets up the adversarial model like this:
model_adversarial = Sequential()
model_adversarial.add(net_generator)
# Disable layers in discriminator
for layer in net_discriminator.layers:
    layer.trainable = False
model_adversarial.add(net_discriminator)
where net_generator is another Sequential model.
Both models are trained at the same time using train_on_batch.
What I don't understand is how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator. To me, they're two separate networks, and training one model which contains the layers of net_discriminator should not affect the other. Also, the layers are frozen in the adversarial model, so isn't that supposed to stop them from being trained?
Can someone provide me a lower level explanation of how this works? Thanks!
The answer to your first question has already been given by the author of the tutorial in the following lines, where he says:
It is important to note that we add the discriminator network to a new Sequential model and do not directly compile the discriminator itself. We do this because the discriminator is also required in the next step and we are able to do so by adding it to a new model before compiling.

Our adversarial model uses random noise as its input, and outputs the eventual prediction of the discriminator on the generated images. This is why we added the discriminator to a new model in the previous step; by doing so we are able to reuse the network here.
So, I think that the way he creates model_discriminator, by adding the net_discriminator model to a new Sequential() model, is the reason the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator: during the training of model_discriminator, it is actually the net_discriminator part of it that is getting trained.
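The underlying mechanism is that adding a model to a Sequential does not copy it: both outer models hold references to the same layer objects, so they share one set of weights. A small sketch to convince yourself (the names are illustrative):

from keras.models import Sequential
from keras.layers import Dense

net = Sequential([Dense(4, input_shape=(8,))])
wrapper = Sequential([net])

assert wrapper.layers[0] is net                        # same object, not a copy
assert wrapper.layers[0].weights[0] is net.weights[0]  # same weight tensors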
Answer to the second question:
According to the author,
If we would use normal back propagation here on the full adversarial model we would slowly push the discriminator to update itself and start classifying fake images as real. Namely, the target vector of the adversarial model consists of all ones. To prevent this we must freeze the part of the model that belongs to the discriminator.
So, the above explanation by the author clearly suggests why we want to freeze the layers of the discriminator part of the adversarial model. The adversarial model contains both the generator and the discriminator networks. The adversarial model uses random noise as its input and outputs the eventual prediction of the discriminator on the generated images. So, here the already-trained discriminator network is used just for prediction; there is no need to involve it in training.
I am trying to train a ResNet network using the Keras backend in TensorFlow. The feed dictionary for each batch update is written as:
feed_dict = {x: X_train[indices[start:end]], y: Y_train[indices[start:end]], keras.backend.learning_phase(): 1}
I am using the Keras backend (keras.backend.set_session(sess)) because the original ResNet network is defined with Keras. As the model contains dropout and batch_norm layers, it requires a learning phase to distinguish between training and testing.
I observe that whenever I set keras.backend.learning_phase(): 1, the model's train/test accuracy hardly increases above 10%. In contrast, if the learning phase is not set, i.e., the feed dictionary is defined as:
feed_dict = {x: X_train[indices[start:end]], y: Y_train[indices[start:end]]}
then, as expected, the model accuracy keeps increasing with epochs in the standard way.
I would appreciate it if someone could clarify whether the learning phase is unnecessary here or whether something else is wrong. The Keras 2.0 documentation seems to suggest using the learning phase with dropout and batch_norm layers.
Set the learning phase to 1 (training):

from keras import backend as K
K.set_learning_phase(1)

Then you need to set training=False for all batch normalization layers:

# assuming the BatchNormalization layers are the ones whose names start with 'bn'
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)
I would like to train a GAN in Keras. My final target is BEGAN, but I'm starting with the simplest one. Understanding how to freeze weights properly is necessary here, and that's what I'm struggling with.
During generator training time, the discriminator weights must not be updated. I would like to alternately freeze and unfreeze the discriminator, so that the generator and the discriminator are trained in turn. The problem is that setting the trainable parameter to False on the discriminator model, or even on its weights, doesn't stop the model from training (and the weights from updating). On the other hand, when I compile the model after setting trainable to False, the weights can no longer be unfrozen. I can't compile the model after each iteration, because that negates the idea of the whole training.
Because of this problem, it seems that many Keras implementations are bugged, or that they only work because of some non-intuitive trick in an old version or something.
I've tried this example code a couple months ago and it worked:
https://github.com/fchollet/keras/blob/master/examples/mnist_acgan.py
It's not the simplest form of GAN, but as far as I remember, it's not too difficult to remove the classification loss and turn the model into a GAN.
You don't need to turn the discriminator's trainable property on and off and recompile. Simply create and compile two model objects: one with trainable=True (discriminator in the code) and another with trainable=False (combined in the code).
When you're updating the discriminator, call discriminator.train_on_batch(). When you're updating the generator, call combined.train_on_batch().
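Condensed, that two-model pattern looks like the following sketch (the layer sizes are placeholders; the point is the two separate compiles):

from keras.models import Sequential, Model
from keras.layers import Dense, Input

generator = Sequential([Dense(784, activation='tanh', input_dim=100)])
discriminator = Sequential([Dense(1, activation='sigmoid', input_dim=784)])

# Compiled while trainable: this object is used to update D.
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Frozen before this second compile: this object only ever updates G.
discriminator.trainable = False
z = Input(shape=(100,))
combined = Model(z, discriminator(generator(z)))
combined.compile(optimizer='adam', loss='binary_crossentropy')

# Then alternate discriminator.train_on_batch(...) and combined.train_on_batch(...);
# the trainable flag is captured separately at each compile, so both keep working.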
Can you use tf.stop_gradient to conditionally freeze weights?
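In graph-mode TensorFlow, that idea could be sketched roughly as follows (hypothetical: freeze and w are made-up names, and this guards a single variable rather than a whole Keras model):

import tensorflow as tf

freeze = tf.placeholder(tf.bool)        # toggled per training step via feed_dict
w = tf.Variable(tf.zeros([10, 10]))

# In the frozen branch, stop_gradient blocks gradients from reaching w.
w_used = tf.cond(freeze,
                 lambda: tf.stop_gradient(w),
                 lambda: tf.identity(w))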
Maybe your adversarial net (generator plus discriminator) is written as a single 'Model'.
However, even if you set d.trainable = False, only the independent d net is set to non-trainable; the d inside the whole adversarial net is still trainable.
You can call d_on_g.summary() before and after setting d.trainable = False and you will see what I mean (pay attention to the trainable parameters).
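For example (illustrative; note that in some Keras versions the counts printed by summary() reflect the state at the last compile):

d_on_g.summary()   # note the trainable parameter count

d.trainable = False
d_on_g.compile(optimizer='adam', loss='binary_crossentropy')
d_on_g.summary()   # the discriminator's parameters now show up as non-trainable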