I would like to train a GAN in Keras. My final target is BEGAN, but I'm starting with the simplest one. Understanding how to freeze weights properly is necessary here and that's what I'm struggling with.
While the generator is being trained, the discriminator weights must not be updated. I would like to freeze and unfreeze the discriminator alternately, so that the generator and the discriminator are trained in turn. The problem is that setting trainable to False on the discriminator model, or even on its weights, doesn't stop the model from training (the weights still update). On the other hand, when I compile the model after setting trainable to False, the weights can no longer be unfrozen. I can't recompile the model after each iteration because that defeats the purpose of the whole training.
Because of this problem it seems that many Keras implementations are either buggy or only work because of some non-intuitive trick in an old version.
I've tried this example code a couple months ago and it worked:
https://github.com/fchollet/keras/blob/master/examples/mnist_acgan.py
It's not the simplest form of GAN, but as far as I remember, it's not too difficult to remove the classification loss and turn the model into a plain GAN.
You don't need to turn on/off the discriminator's trainable property and recompile. Simply create and compile two model objects, one with trainable=True (discriminator in the code) and another one with trainable=False (combined in the code).
When you're updating the discriminator, call discriminator.train_on_batch(). When you're updating the generator, call combined.train_on_batch().
Can you use tf.stop_gradient to conditionally freeze weights?
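On the idea of controlling gradients directly: one option is a custom training step with tf.GradientTape, where only one network's weights are handed to the optimizer, so the other network cannot move regardless of its trainable flag. The sketch below is a toy under stated assumptions (the layer sizes, optimizers, loss and random data are all made up for illustration) and it does not use tf.stop_gradient itself.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

latent_dim, data_dim = 4, 8  # hypothetical sizes, for illustration only

generator = keras.Sequential(
    [keras.Input(shape=(latent_dim,)), keras.layers.Dense(data_dim)])
discriminator = keras.Sequential(
    [keras.Input(shape=(data_dim,)), keras.layers.Dense(1)])

g_opt = keras.optimizers.Adam(learning_rate=1e-3)
d_opt = keras.optimizers.Adam(learning_rate=1e-3)
bce = keras.losses.BinaryCrossentropy(from_logits=True)


def discriminator_step(real_batch):
    """Update only the discriminator weights."""
    noise = tf.random.normal((real_batch.shape[0], latent_dim))
    fake_batch = generator(noise, training=False)
    with tf.GradientTape() as tape:
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        loss = (bce(tf.ones_like(real_logits), real_logits)
                + bce(tf.zeros_like(fake_logits), fake_logits))
    # Gradients are taken with respect to the discriminator only.
    grads = tape.gradient(loss, discriminator.trainable_weights)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_weights))
    return loss


def generator_step(batch_size):
    """Update only the generator weights; the discriminator is just used."""
    noise = tf.random.normal((batch_size, latent_dim))
    with tf.GradientTape() as tape:
        fake_logits = discriminator(generator(noise, training=True),
                                    training=False)
        loss = bce(tf.ones_like(fake_logits), fake_logits)
    # Only generator weights reach the optimizer, so the discriminator
    # stays fixed during this step.
    grads = tape.gradient(loss, generator.trainable_weights)
    g_opt.apply_gradients(zip(grads, generator.trainable_weights))
    return loss


real = np.random.rand(16, data_dim).astype("float32")

d_before = [w.copy() for w in discriminator.get_weights()]
g_before = [w.copy() for w in generator.get_weights()]
generator_step(batch_size=16)
d_after = discriminator.get_weights()
g_after = generator.get_weights()
```

Alternating discriminator_step(real) and generator_step(16) in a loop then gives the usual GAN schedule without any recompiling.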
Maybe your adversarial net (generator plus discriminator) is written as a 'Model'.
However, even if you set d.trainable=False, only the standalone d net becomes non-trainable; the d inside the whole adversarial net is still trainable.
You can call d_on_g.summary() before and after setting d.trainable=False to see what I mean (pay attention to the trainable variables).
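A quick way to do that check in code is to count the trainable weights of the stacked model before and after flipping the flag. This is only a sketch: g, d and d_on_g here are tiny hypothetical stand-ins, not the models from the question. Note that these counts only show what the flag currently says; which weights a compiled model really updates can also depend on when it was compiled, and on the Keras version.

```python
from tensorflow import keras

# Tiny hypothetical stand-ins for the generator and discriminator.
g = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(8)], name="g")
d = keras.Sequential([keras.Input(shape=(8,)), keras.layers.Dense(1)], name="d")
d_on_g = keras.Sequential([keras.Input(shape=(4,)), g, d], name="d_on_g")

before = len(d_on_g.trainable_weights)  # kernel + bias of g and of d
d.trainable = False
after = len(d_on_g.trainable_weights)   # only kernel + bias of g remain

print(before, after)  # 4 2
# d_on_g.summary() prints the same information as parameter counts.
```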
I use transfer learning with efficientnet_B0, and what I'm trying to do is to gradually unfreeze layers while the network is learning. At first, I train one dense layer on top of the whole network, while every other layer is frozen. I use this code to freeze layers:
for layer in model_base.layers[:-2]:
    layer.trainable = False
then I unfreeze the whole model and freeze the exact layers I need using this code:
model.trainable = True
for layer in model_base.layers[:-13]:
    layer.trainable = False
Everything works fine. I call model.compile one more time and training resumes from where it left off, great. But then, when I unfreeze all layers one more time with
model.trainable = True
and try to do fine-tuning, my model starts to learn from scratch.
I tried different approaches and ways to fix this, but nothing seems to work. I tried to use layer.training = False and layer.trainable = False for all batch_normalization layers in the model too, but it doesn't help either.
In addition to the previous answer, I would like to point out one often overlooked factor: the choice of what to freeze and unfreeze also depends on the problem you are trying to solve, i.e. on:
The level of similarity between your own dataset and the dataset on which the network was pre-trained.
The size of the new dataset.
You should consult the following diagram before making a decision.
Moreover, note that if you are constrained by the hardware, you can opt for leaving some of the layers completely frozen, since in this way you have a smaller number of trainable parameters.
Picture taken from here (although I remember having seen it in several blogs): https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
This tends to be application-specific and not every problem can benefit from retraining the whole neural network.
my model starts to learn from scratch
While this is most likely not the case (weights are not reinitialized), it can definitely seem like that. Your model has been fine-tuned to some other task and now you are forcing it to retrain itself to do something different.
If you are observing behavior like that, the most likely cause is that you are simply using a large learning rate which will destroy those fine-tuned weights of the original model.
Retraining the whole model as you have described (the final step) should be done very, very carefully with a very small learning rate (I have seen instances where Adam with a 10^-8 learning rate was too much).
My advice is to keep lowering the learning rate until training starts improving the weights instead of damaging them, but this may lead to such a small learning rate that it is of no practical use.
The way you freeze and unfreeze your layers is correct, and that is how it is done on the official website:
Setting layer.trainable to False moves all the layer's weights from
trainable to non-trainable.
From https://keras.io/guides/transfer_learning/
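To see what that quote means in practice, here is a minimal sketch on a single Dense layer (the layer sizes are arbitrary and chosen only for illustration):

```python
from tensorflow import keras

layer = keras.layers.Dense(3)
layer.build((None, 4))  # creates the kernel and the bias

# Both weights start out in the trainable list.
print(len(layer.trainable_weights), len(layer.non_trainable_weights))  # 2 0

layer.trainable = False

# Freezing moves them to the non-trainable list; their values are untouched.
print(len(layer.trainable_weights), len(layer.non_trainable_weights))  # 0 2
```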
As discussed in the other answers, the problem you encounter is indeed theoretical and has nothing to do with the way you programmed it.
I've encountered this problem before. It seems that if I construct my model with the Sequential API, the network starts learning from scratch when I set base_model.trainable = True. But if I create my model with the Functional API, everything seems to be okay. The way I create my model is like the one in this official tutorial: https://www.tensorflow.org/tutorials/images/transfer_learning
I have a model that is essentially an Auxiliary Conditional GAN; the first part of the model is the Generator, the last part is the Discriminator. The Discriminator makes multiclass (k=10) predictions.
Following the work of http://arxiv.org/abs/1912.07768 (pp3 for a helpful diagram, but note I ignore network structure modifications for the purposes of this question) I train the entire model for T=32 iterations by generating synthetic input and class labels (The 'inner loop'). I can predict on real data and labels using just the Discriminator(Learner) to get losses. However I need to back-propagate the Discriminator's error all the way back through the inner loop to the Generator.
How can I achieve this with Keras? Is it possible to do loop unrolling in Keras? How can I provide an arbitrary loss and backprop this down the unrolled layers?
Update: There's now one implementation, in PyTorch, which uses Facebook's 'higher' library. This appears to mean that the updates made during the inner loop must be 'unwrapped' in order for the final meta-loss to be applied throughout the entire network. Is there a Keras way of achieving this? https://github.com/GoodAI/GTN
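For what it's worth, one way to unroll an inner loop in TensorFlow is to keep the learner weights as plain tensors, apply the inner SGD updates by hand, and wrap the whole loop in an outer tf.GradientTape. The following is only a toy sketch, not the GTN implementation: the linear learner, the synthetic_y variable standing in for the generator output, the data and inner_lr are all assumptions made up for illustration (only T=32 comes from the question).

```python
import tensorflow as tf

tf.random.set_seed(0)

T = 32           # inner-loop length, as in the question
inner_lr = 0.05  # hypothetical inner learning rate

# Hypothetical stand-in for the generator: learnable synthetic targets.
x_syn = tf.random.normal((4, 2))
synthetic_y = tf.Variable(tf.zeros((4, 1)))

# Made-up "real" data for the outer (meta) loss.
x_real = tf.random.normal((16, 2))
y_real = x_real @ tf.constant([[1.0], [-2.0]])


def meta_loss_and_grad():
    with tf.GradientTape() as outer_tape:
        w = tf.zeros((2, 1))  # learner weights kept as a plain tensor
        for _ in range(T):
            with tf.GradientTape() as inner_tape:
                inner_tape.watch(w)
                inner_loss = tf.reduce_mean(
                    tf.square(x_syn @ w - synthetic_y))
            # Calling gradient() inside outer_tape records the update itself,
            # so the outer gradient can flow back through every inner step.
            grad_w = inner_tape.gradient(inner_loss, w)
            w = w - inner_lr * grad_w
        meta_loss = tf.reduce_mean(tf.square(x_real @ w - y_real))
    # Gradient of the final loss with respect to the synthetic data.
    return meta_loss, outer_tape.gradient(meta_loss, synthetic_y)


meta_loss, meta_grad = meta_loss_and_grad()
```

The meta_grad tensor can then be passed to an optimizer that updates synthetic_y (or, in a real setup, the generator weights that produced it). Memory grows with T because every inner step is kept on the tape.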
This seems to be a Generative Adversarial Network (GAN), in which both models learn: one classifies and the other generates.
To summarize, the output of the Generator is fed as input to the Discriminator, which then feeds its output back to the Generator.
There is also a discussion and implementation of this in TensorFlow Keras, using the MNIST digit dataset, in the TensorFlow GAN/DCGAN documentation in this link.
So I'm going through this GAN tutorial, and the author sets up a discriminator like this:
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
where net_discriminator is another Sequential model.
He then sets up the adversarial model like this:
model_adversarial = Sequential()
model_adversarial.add(net_generator)
# Disable layers in discriminator
for layer in net_discriminator.layers:
    layer.trainable = False
model_adversarial.add(net_discriminator)
where net_generator is another sequential model.
Both models are trained at the same time using train_on_batch.
What I don't understand is how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator. To me, they're two separate networks, and training one model which contains the layers of net_discriminator should not affect the other. Also, the layers are frozen in the adversarial model, so isn't that supposed to stop them from being trained?
Can someone provide me a lower level explanation of how this works? Thanks!
The answer to your first question has already been given by the author of the tutorial in the following lines:
It is important to note that we add the discriminator network to a
new Sequential model and do not directly compile the discriminator
itself. We do this because the discriminator is also required in the
next step and we are able to do so by adding it to a new model before
compiling.
Our adversarial model uses random noise as its input, and outputs the
eventual prediction of the discriminator on the generated images. This why we
added the discriminator to a new model in the previous step, by doing so we
are able to reuse the network here.
So I think the reason is the way he creates model_discriminator: it is just a new Sequential() model wrapping net_discriminator. Training model_discriminator actually trains the net_discriminator layers inside it, and because model_adversarial contains the same net_discriminator object, its discriminator part sees the updated weights too.
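The weight sharing can be checked with a small sketch. The layer sizes below are made up and the models are much smaller than in the tutorial; only the names net_generator, net_discriminator, model_discriminator and model_adversarial follow the question.

```python
import numpy as np
from tensorflow import keras

# Tiny hypothetical networks, just to show the sharing.
net_generator = keras.Sequential(
    [keras.Input(shape=(4,)), keras.layers.Dense(8)])
net_discriminator = keras.Sequential(
    [keras.Input(shape=(8,)), keras.layers.Dense(1)])

# Both wrappers hold a reference to the same net_discriminator object.
model_discriminator = keras.Sequential(
    [keras.Input(shape=(8,)), net_discriminator])
model_adversarial = keras.Sequential(
    [keras.Input(shape=(4,)), net_generator, net_discriminator])

# Overwrite the discriminator weights through the inner network only.
dense = net_discriminator.layers[0]
kernel, bias = dense.get_weights()
dense.set_weights([np.ones_like(kernel), np.zeros_like(bias)])

x = np.random.rand(3, 8).astype("float32")
z = np.random.rand(3, 4).astype("float32")

# The wrapper sees the new weights: with an all-ones kernel and zero bias,
# its output is simply the row sum of the input.
out_wrapped = np.asarray(model_discriminator(x))
expected = x.sum(axis=1, keepdims=True)

# The adversarial model uses the very same discriminator weights.
out_adversarial = np.asarray(model_adversarial(z))
out_manual = np.asarray(net_discriminator(net_generator(z)))
```

There is only one set of discriminator variables; the wrappers differ only in which of them each compiled model treats as trainable.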
Answer to second question:
According to the author,
If we would use normal back propagation here on the full adversarial
model we would slowly push the discriminator to update itself and
start classifying fake images as real. Namely, the target vector of
the adversarial model consists of all ones. To prevent this we must
freeze the part of the model that belongs to the discriminator.
So, the above explanation by the author clearly suggests why we want to freeze the layers of the discriminator part of the adversarial model. The adversarial model contains both the generator and the discriminator networks. It uses random noise as its input and outputs the eventual prediction of the discriminator on the generated images. So here the already trained discriminator network is used just for prediction; there is no need to involve it in training.
I'm trying to include a Dense layer that is not trainable and initialized as an identity matrix, as part of my tensorflow Estimator. The intuition is for this Dense layer to pass through its inputs during standard training and a fine tuning step afterward. The catch is that I don't want these weights updated at all during the initial round, only during fine tuning.
I can do several things to make these weights non-trainable, including using the trainable argument in the Dense constructor or by filtering out anything with dense in its name before passing to MomentumOptimizer.compute_gradients().
But in either case (make dense non-trainable or just don't pass it to optimizer), tf will throw an error saying that it cannot find a key related to the dense layer.
I understand that on the first run, where dense is non-trainable, it won't be persisted in the checkpoint file. Likewise, if it's filtered out before being passed to compute_gradients, the same issue occurs.
Is there any method to just persist untrained variables, even with just their initialized values, across runs?
NotFoundError (see above for traceback): Key dense/kernel/Momentum not
found in checkpoint
I will answer my own question here because it wasn't immediately obvious to me, and the tf documentation doesn't seem to make it clear. If you want to introduce a new trainable variable in a later run, that later run fundamentally needs to be a different model. Thus, in order to handle fine-tuning of existing weights, those existing weights in the new model must be resolved from the warm start settings.
So train a model, conditionally not including the fine-tune layer when your Estimator's model function runs. Train the existing model and then create another model that is separate. Technically this just means you need to use a fresh model directory, but warm start settings should point to the model you trained beforehand.
On fine-tuning runs, your model function should conditionally include the fine-tuning layers, but it should restore the weights from the previous run by setting the warm start settings to look at the previous model directory.
I trained a model with several layers, then for each layer in model.layers set
layer.trainable = False
I added several layers to this model, called
model.compile(...)
And trained this new model for several epochs with part of the layers frozen.
Later I decided to unfreeze layers and ran
for layer in model.layers:
    layer.trainable = True
model.compile(...)
When I start training the model with unfrozen layers I get a very high loss value, even though I just wanted to continue training from the previously learned weights. I also checked that after model.compile(...) the model still predicts well (so the previously learned weights are not reset), but as soon as the learning process starts everything gets 'erased' and I start as if from scratch.
Could someone clarify, whether this behavior is ok? How to recompile the model and not start from scratch?
P.S. I also tried manually saving weights and assigning them back to a newly compiled model using layer.get_weights() and layer.set_weights()
I used the same compile parameters (similar optimizer and similar loss)
You might need to lower your learning rate when you start fine-tuning the trained layers. For example, a learning rate of 0.01 might work for your new dense layers (top) with all other layers set to untrainable. But when setting all layers to be trainable, you might need to reduce the learning rate to, say, 0.001. There is no need to manually copy or set weights.
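As a rough sketch of the two phases: the tiny model, the random data and the SGD optimizer below are made up for illustration, and only the 0.01 and 0.001 learning rates come from the answer above. It also shows that recompiling keeps the current weights.

```python
import numpy as np
from tensorflow import keras

# Hypothetical stand-ins for a pretrained base and a new head.
base = keras.Sequential(
    [keras.Input(shape=(8,)), keras.layers.Dense(16, activation="relu")],
    name="base",
)
model = keras.Sequential(
    [keras.Input(shape=(8,)), base, keras.layers.Dense(1)], name="full")

x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# Phase 1: base frozen, larger learning rate for the new head.
base.trainable = False
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")
base_initial = [w.copy() for w in base.get_weights()]
model.fit(x, y, epochs=2, verbose=0)
base_after_phase1 = [w.copy() for w in base.get_weights()]

# Phase 2: unfreeze everything, then recompile with a smaller learning rate.
base.trainable = True
weights_before_compile = [w.copy() for w in model.get_weights()]
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001), loss="mse")
weights_after_compile = [w.copy() for w in model.get_weights()]
model.fit(x, y, epochs=2, verbose=0)
base_after_phase2 = base.get_weights()
```

If the loss still jumps at the start of phase 2, lowering the second learning rate further is the first thing to try.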