Considering the example of Image classification on ImageNet, How to update the pre-trained model using the new data points.
I have loaded the pre-trained model. I have a new data point that is quite different from the distribution of the original data on which the model was previously trained. So, I would like to update/fine-tune the model with the help of new data point. How to go about doing it? Can anyone help me out in doing it? I am using pytorch 0.4.0 for implementation, running on GPU Tesla K40C.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on new samples. Here's a sample code for this case from PyTorch's autograd mechanics notes:
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
Related
I followed this tutorial to train a pytorch model for instance segmentation:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
I would not like to train a model on entirely different data and classes, totally unrelated to COCO. What changes do I need to make to retrain the model. From my reading I'm guessing besides have the correct number of classes I just need to train this line:
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
to
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
But I notice there is another parameters: pretrained_backbone=True, trainable_backbone_layers=None should they be changed too?
The function signature is
torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs)
Setting pretrained=False will tell PyTorch not to download model pre-trained on COCO train2017. You want it as you're interested in training.
Usually, this is enough if you want to train on a different dataset.
When you set pretrained=False, PyTorch will download pretrained ResNet50 on ImageNet. And by default, it'll freeze first two blocks named conv1 and layer1. This is how it was done in Faster R-CNN paper which frooze the initial layers of pretrained backbone.
(Just print model to check its structure).
layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
Now, if you don't even want the first two layers to freeze, you can set trainable_backbone_layers=5 (done automatically, when you set pretrained_backbone=False), which will train the entire resnet backbone from scratch.
Check PR#2160.
From maskrcnn_resnet50_fpn document:
pretrained (bool) – If True, returns a model pre-trained on COCO train2017
pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet
trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
So for training from scratch using:
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, trainable_backbone_layers=5, num_classes=your_num_classes)
or:
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False, num_classes=your_num_classes)
because in source code of maskrcnn_resnet50_fpn:
if not (pretrained or pretrained_backbone):
trainable_backbone_layers = 5
I want to use a set of pre-trained weights to train my model for MNIST classification. More specifically, I train my model on one dataset. I want to use the final weights as the starting weights to train the model on a different dataset. To do this, I use
intial_weights = model1.get_weights()
model2 = create_model()
model2.set_weights(initial_weights)
model2.fit(x=x_train59,y=y_train59, epochs=20,callbacks = [cp_callback2])
My question is that whether model.fit() will ignore the initial weights set using model2.set_weights() or not. And if it does ignore, is there a way to make sure that model2.fit() uses the weights obtained previously. Also, is there a way to visualize the starting weights before model.fit() starts training. Thanks much in advance!
When you do model2.set_weights, you changed the weights of model2. That's all.
You can see the weights the same way: w2 = model2.get_weights(). Then print w2 in a convenient way.
So I'm going through this GAN tutorial, and the author sets up a discriminator like this:
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
where net_discriminator is another Sequential model.
He then sets up the adversarial model like this:
model_adversarial = Sequential()
model_adversarial.add(net_generator)
# Disable layers in discriminator
for layer in net_discriminator.layers:
layer.trainable = False
model_adversarial.add(net_discriminator)
where net_generator is another sequential model.
Both models are trained at the same time using train_on_batch.
What I don't understand is how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator. To me, they're two separate networks and training one model which contains the layers of net_discriminator should not effect the other. Also, the layers are frozen in the adversarial model so isn't that supposed to stop them from being trained?
Can someone provide me a lower level explanation of how this works? Thanks!
Answer to your first question is already been given by the author of the tutorial in the following lines where he says:
It is important to note that we add the discriminator network to a
new Sequential model and do not directly compile the discriminator
itself. We do this because the discriminator is also required in the
next step and we are able to do so by adding it to a new model before
compiling.
Our adversarial model uses random noise as its input, and outputs the
eventual prediction of the discriminator on the generated images. This why we
added the discriminator to a new model in the previous step, by doing so we
are able to reuse the network here.
So, I think the way he is creating model_discriminator model by adding net_discriminator model to a new Sequential() layer is the reason how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator, as during the training of model_discriminator, it's actually net_discriminator part of it which is getting trained.
Answer to second question:
According to the author,
If we would use normal back propagation here on the full adversarial
model we would slowly push the discriminator to update itself and
start classifying fake images as real. Namely, the target vector of
the adversarial model consists of all ones. To prevent this we must
freeze the part of the model that belongs to the discriminator.
So, the above expaination by the author clearly suggests why we want to freeze layers of discriminator part of the adverserial model. The adverserial model contains both generator and discriminator networks. The adverserial model uses random noise as its input and outputs the eventual prediction of the discriminator on the generated images. So, here the already trained discriminator network is used just for prediction, no need to involve it in training.
I want extract 1000 image features using pretrained Xception model.
but xception models last layer(avg_pool) give 2048 features.
Can i reduce final output feature number without additional training?
I want image features before the softmax not predcition result.
base_model = xception.Xception(include_top=True, weights='imagenet')
base_model.summary()
self.model = Model(inputs = base_model.input, outputs=base_model.get_layer('avg_pool').output)
This model was trained to produce embeddings in 2048-dimensional space to the classifier after it. There is no sense in trying to reduce the dimensionality of the embedding space, unless you are combining very complex and inflexible models. If you are just doing simple transfer learning with no memory constraints, just snap your new classifier (extra layers) on top of it, and retrain after freezing (or not) all layers in the original Xception. That should work regardless of Xception output_shape. See the keras docs.
That said, if you REALLY need to reduce dimensionality to 1000-d, you will need a method that preserves (or at least tries to preserve) the original topology of the embedding space, otherwise your model will not benefit at all from transfer learning. Take a look at PCA, SVD, or T-SNE.
I am working on a project that requires me to add new units to the output layer of a neural network to implement a form of transfer learning. I was wondering if I could do this and set the units' weights using either Keras or TensorFlow.
Specifically I would like to append an output neuron to the output layer of the Keras model and set that neuron's initial weights and bias.
Stumbled upon the answer to my own question. Thanks everyone for the answers/comments.
https://keras.io/layers/about-keras-layers/
The first few lines of this source detail how to load and set weights.
Essentially, appending an output neuron to a Keras model can be accomplished by loading the old output layer, appending the new weights, and setting weights for a new layer. Code is below.
# Load weights of previous output layer, set weights for new layer
old_layer_weights = model.layers.pop().get_weights()
new_neuron_weights = np.ndarray(shape=[1,bottleneck_size])
# Set new weights
# Append new weights, add new layer
new_layer = Dense(num_classes).set_weights(np.append(old_layer_weights,new_neuron_weights))
model.add(new_layer)
You could add new units to the output layer of a pre-trained neural network. This form of transfer learning is said to be called using the bottleneck features of a pre-trained network. This could be implemented both in tensorflow as well as in Keras.
Please find the tutorial in Keras below:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Also, find the tutorial for tensorflow below:
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb
Hope this helps!