So I'm going through this GAN tutorial, and the author sets up a discriminator like this:
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
where net_discriminator is another Sequential model.
He then sets up the adversarial model like this:
model_adversarial = Sequential()
model_adversarial.add(net_generator)
# Disable layers in discriminator
for layer in net_discriminator.layers:
layer.trainable = False
model_adversarial.add(net_discriminator)
where net_generator is another sequential model.
Both models are trained at the same time using train_on_batch.
What I don't understand is how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator. To me, they're two separate networks and training one model which contains the layers of net_discriminator should not effect the other. Also, the layers are frozen in the adversarial model so isn't that supposed to stop them from being trained?
Can someone provide me a lower level explanation of how this works? Thanks!
Answer to your first question is already been given by the author of the tutorial in the following lines where he says:
It is important to note that we add the discriminator network to a
new Sequential model and do not directly compile the discriminator
itself. We do this because the discriminator is also required in the
next step and we are able to do so by adding it to a new model before
compiling.
Our adversarial model uses random noise as its input, and outputs the
eventual prediction of the discriminator on the generated images. This why we
added the discriminator to a new model in the previous step, by doing so we
are able to reuse the network here.
So, I think the way he is creating model_discriminator model by adding net_discriminator model to a new Sequential() layer is the reason how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator, as during the training of model_discriminator, it's actually net_discriminator part of it which is getting trained.
Answer to second question:
According to the author,
If we would use normal back propagation here on the full adversarial
model we would slowly push the discriminator to update itself and
start classifying fake images as real. Namely, the target vector of
the adversarial model consists of all ones. To prevent this we must
freeze the part of the model that belongs to the discriminator.
So, the above expaination by the author clearly suggests why we want to freeze layers of discriminator part of the adverserial model. The adverserial model contains both generator and discriminator networks. The adverserial model uses random noise as its input and outputs the eventual prediction of the discriminator on the generated images. So, here the already trained discriminator network is used just for prediction, no need to involve it in training.
Related
I followed this tutorial to train a pytorch model for instance segmentation:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
I would not like to train a model on entirely different data and classes, totally unrelated to COCO. What changes do I need to make to retrain the model. From my reading I'm guessing besides have the correct number of classes I just need to train this line:
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
to
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
But I notice there is another parameters: pretrained_backbone=True, trainable_backbone_layers=None should they be changed too?
The function signature is
torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs)
Setting pretrained=False will tell PyTorch not to download model pre-trained on COCO train2017. You want it as you're interested in training.
Usually, this is enough if you want to train on a different dataset.
When you set pretrained=False, PyTorch will download pretrained ResNet50 on ImageNet. And by default, it'll freeze first two blocks named conv1 and layer1. This is how it was done in Faster R-CNN paper which frooze the initial layers of pretrained backbone.
(Just print model to check its structure).
layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
Now, if you don't even want the first two layers to freeze, you can set trainable_backbone_layers=5 (done automatically, when you set pretrained_backbone=False), which will train the entire resnet backbone from scratch.
Check PR#2160.
From maskrcnn_resnet50_fpn document:
pretrained (bool) – If True, returns a model pre-trained on COCO train2017
pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet
trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
So for training from scratch using:
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, trainable_backbone_layers=5, num_classes=your_num_classes)
or:
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, num_classes=your_num_classes)
because in source code of maskrcnn_resnet50_fpn:
if not (pretrained or pretrained_backbone):
trainable_backbone_layers = 5
I followed a blog on how to implement a vgg16-model from scratch and want to do the same with the pretrained model from Keras. I looked up some other blogs but can't find a fitting solution I think. My task is to classify integrated circuit images into defect or non defects.
I have seen on a paper that they used pretrained imagenet model of vgg16 for fabric defect detection, where they freezed the first seven layers and fine tuned the last nine for their own problem.
(Source: https://journals.sagepub.com/doi/full/10.1177/1558925019897396)
I have already seen examples on how to freeze all layers except the fully connected layers, but how can I try the example with freezing first x layers and fine tune the others for my problem?
The VGG16 is fairly easy to implement from scratch but for models like resnet or xception it is getting a little trickier.
It is not necessary to implement a model from scratch to freeze a few layers. You can do this on pre-trained models as well. In keras, you'd use trainable = False.
For example, let's say you want to use the pre-trained Xception model from keras and want to freeze the first x layers:
#In your includes
from keras.applications import Xception
#Since you're using the model for a different task, you'd want to remove the top
base_model = Xception(weights='imagenet', include_top=False)
#Freeze layers 0 to x
for layer in base_model.layers[0:x]:
layer.trainable = False
#To see all the layers in detail and to check trainable parameters
base_model.summary()
Ideally you'd want to add another layer on top of this model with the output as your classes. For more details, you can check this keras guide: https://keras.io/guides/transfer_learning/
A lot of times the pre-trained weights can be very useful in other classification tasks but in case you want to train a model from scratch on your dataset, you can load the model without the imagenet weights. Or better, load the weights but don't freeze any layers. This will retrain every layer taking imagenet weights as an initialization.
I hope I've answered your question.
I am trying to use an RNN network in PyTorch for regression task. In the training phase the model is learned. I want to use the trained model in testing phase. For this purpose I have saved the learned model by:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For testing phase I must use this loaded model but I want to know what is the value of the first hidden state of the model? I can initialize the first hidden state by zero but I think maybe this is not correct. Is there any function other than torch.save which can return the last hidden state in the learned mode? Then I can restore that hidden state and use it as the first hidden state in the loaded model for testing phase.
Thanks in advance.
Your question is a bit unclear. As far as I understand you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use model's state_dict, which is basically a python dictionary object that maps each layer to its parameter tensor. Read more about it from here.
for param in loaded_model.state_dict():
print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights of the last hidden layer using below code:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']
Considering the example of Image classification on ImageNet, How to update the pre-trained model using the new data points.
I have loaded the pre-trained model. I have a new data point that is quite different from the distribution of the original data on which the model was previously trained. So, I would like to update/fine-tune the model with the help of new data point. How to go about doing it? Can anyone help me out in doing it? I am using pytorch 0.4.0 for implementation, running on GPU Tesla K40C.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on new samples. Here's a sample code for this case from PyTorch's autograd mechanics notes:
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
I trained a model with several layers than for each layer in model.layers set
layer.trainable = False
I added several layers to this model, called
model.compile(...)
And trained this new model for several epochs with part of the layers frozen.
Later I decided to unfreeze layers and ran
for layer in model.layers:
layer.trainable = True
model.compile(...)
When I start learning the model with unfrozen layers I get loss function value very high even though I just wanted to continue training from previously learned weights. I also checked that after model.compile(...) model still predicts well (not resetting previously learned weights) but as soon as learning process starts everything gets 'erased' and I start as from scratch.
Could someone clarify, whether this behavior is ok? How to recompile the model and not start from scratch?
P.S. I also asked manually saving weights and assigning them back to a newly compiled model using layer.get_weights() and layer.set_weights()
I used the same compile parameters (similar optimizer and similar loss)
You might need to lower your learning rate while starting fine-tuning the trained layers. For example, a learning rate of 0.01 might work for your new dense layers (top) with all others layers set to untrainable. But when setting all layers to be trainable, you might need to reduce the learning rate to say 0.001 There is no need to manually copy or set weights.