I am working on a project that requires me to add new units to the output layer of a neural network to implement a form of transfer learning. I was wondering if I could do this and set the units' weights using either Keras or TensorFlow.
Specifically I would like to append an output neuron to the output layer of the Keras model and set that neuron's initial weights and bias.
Stumbled upon the answer to my own question. Thanks everyone for the answers/comments.
https://keras.io/layers/about-keras-layers/
The first few lines of that page describe how to get and set layer weights.
Essentially, appending an output neuron to a Keras model can be accomplished by loading the old output layer's weights, appending the new neuron's weights, and setting the combined weights on a new output layer. Code is below.
import numpy as np
from keras.layers import Dense

# Pop the old output layer and grab its kernel and bias
old_kernel, old_bias = model.layers.pop().get_weights()
# Initialize the new neuron: one extra kernel column and one extra bias entry
new_column = np.random.normal(scale=0.01, size=(bottleneck_size, 1))
new_kernel = np.concatenate([old_kernel, new_column], axis=1)
new_bias = np.append(old_bias, 0.0)
# Add the new layer first (set_weights only works on a built layer),
# then set the combined weights; num_classes includes the new neuron
new_layer = Dense(num_classes)
model.add(new_layer)
new_layer.set_weights([new_kernel, new_bias])
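Note that, depending on the Keras version, model.layers.pop() may not rebuild the model's underlying graph; if the new layer does not take effect, it may be safer to construct a fresh model that reuses every layer except the old output layer.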
You can add new units to the output layer of a pre-trained neural network. This form of transfer learning is often described as using the bottleneck features of a pre-trained network, and it can be implemented in both TensorFlow and Keras.
Here is a tutorial for Keras:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
And here is a tutorial for TensorFlow:
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb
Hope this helps!
I'm trying to load checkpoints and populate model weights using the Faster R-CNN architecture (Faster R-CNN ResNet50 V1 640x640, to be precise, from here). I'm trying to load the weights for this network similarly to how it's done in the example notebook for RetinaNet, where they do the following:
fake_box_predictor = tf.compat.v2.train.Checkpoint(
    _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
    _box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
    _feature_extractor=detection_model._feature_extractor,
    _box_predictor=fake_box_predictor
)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()
I'm trying to get a similar checkpoint loading mechanism going for the Faster R-CNN network I want to use, but properties like _base_tower_layers_for_heads and _box_prediction_head only exist for the architecture used in the example, not for anything else.
I also couldn't find documentation on which parts of the model to populate using Checkpoint for my particular use case. Would greatly appreciate any help on how to approach this!
As you said, the main problem is that you don't have tensors pointing to the layers you want to do transfer learning on.
This is a consequence of the original implementation of the Faster R-CNN ResNet50 V1 640x640 copy in the Zoo: the layers were either never named, or they were named but the names (and the code) were not published.
To solve this you need to find out which layers you want to keep and which you want to relearn. You can print all of the node names in the graph with (ref):
[n.name for n in tf.get_default_graph().as_graph_def().node]
Layer names can be added manually, but TF assigns a default name to every node. The list can be long and exhausting, but you need to go through it to find the tensor where your transfer learning should start, and to decide which layers to freeze and which to keep training. Freezing a layer (ref):
if layer.name == 'layer_name':
    layer.trainable = False
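Extending that idea, here is a minimal sketch for freezing every layer up to a chosen cutoff. This assumes a Keras-style model with a layers list; the cutoff name fine_tune_from is a hypothetical layer name taken from the printed list:
fine_tune_from = 'conv4_block1_out'  # hypothetical name found in the printed layer list
trainable = False
for layer in model.layers:
    if layer.name == fine_tune_from:
        trainable = True  # unfreeze from this layer onward
    layer.trainable = trainable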
I am trying to use an RNN network in PyTorch for a regression task. The model is trained during the training phase, and I want to use the trained model in the testing phase. For this purpose I have saved the learned model with:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For the testing phase I must use this loaded model, but I want to know what the value of the first hidden state should be. I can initialize the first hidden state to zero, but I am not sure that is correct. Is there any function, other than torch.save, that can return the last hidden state of the learned model? Then I could restore that hidden state and use it as the first hidden state of the loaded model in the testing phase.
Thanks in advance.
Your question is a bit unclear. As far as I understand, you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use the model's state_dict, which is basically a Python dictionary object that maps each layer to its parameter tensors. Read more about it here.
for param in loaded_model.state_dict():
    print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights and bias of the last (output) layer using the code below:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']
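If you want the last hidden state tensor itself rather than the weights, one option is to save it explicitly. A minimal sketch, assuming the model's forward pass returns (output, hidden) the way torch.nn.RNN does, and with train_inputs/test_inputs as hypothetical input tensors:
import torch

# During training, keep the final hidden state from the last forward pass
output, last_hidden = learned_model(train_inputs)
torch.save(last_hidden, "hidden_state_path")

# In the testing phase, restore it and pass it as the initial hidden state
h0 = torch.load("hidden_state_path")
test_output, _ = loaded_model(test_inputs, h0)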
Considering the example of image classification on ImageNet, how can I update a pre-trained model using new data points?
I have loaded the pre-trained model. I have a new data point that is quite different from the distribution of the original data on which the model was previously trained, so I would like to update/fine-tune the model with the help of this new data point. How should I go about doing it? Can anyone help me out? I am using PyTorch 0.4.0 for the implementation, running on a Tesla K40C GPU.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only that layer on the new samples. Here's sample code for this case from PyTorch's autograd mechanics notes:
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
# Freeze all the pre-trained parameters
for param in model.parameters():
    param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
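To actually fine-tune on the new samples, the usual training loop applies. Here dataloader is a hypothetical torch.utils.data.DataLoader over the new images, and cross-entropy is just an example loss choice:
criterion = nn.CrossEntropyLoss()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)  # only model.fc receives gradient updates
    loss.backward()
    optimizer.step()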
I just got started with Keras and built a Q-learning example program. I created a tensorboard callback and I include it in the call to model.fit, but the only things that appear in TensorBoard are the scalar summary for the loss and the network graph. Interestingly, if I open up the dense layer in the graph, I see a little summary icon labeled "bias_0" and one labeled "kernel_0", but I don't see these appearing in the distributions or histograms tabs in TensorBoard like I did when I built a model in pure tensorflow.
Do I need to do something else to enable these in Tensorboard? Do I need to look into the details of the model that Keras produces and add my own tensor_summary() calls?
You can get the weights and biases per layer and for the entire model with .get_weights().
For example, if the first layer of your model is the dense layer whose weights and biases you want, you can get them with:
weights, biases = model.layers[0].get_weights()
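The same method also works at the model level: model.get_weights() returns a list of all the weight and bias arrays in the model.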
I debugged this and found that the problem was that I was not providing any validation data when I called fit(). The TensorBoard callback only reports on the weights when validation data is provided. That seems a bit restrictive, but at least I have something that works.
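For reference, a minimal sketch of a fit() call that makes the histograms show up (x_train, y_train, x_val, y_val are hypothetical NumPy arrays; histogram_freq must be non-zero):
from keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='./logs', histogram_freq=1)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=10,
          callbacks=[tb])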
I'm trying to replicate the code in this blog article: How convolutional neural networks see the world.
It works well on a CNN without dropout layers, but when there's one (or more) dropout layer, I can't directly use the layer.output line because it expects a learning phase.
When I use the recommended way to extract the output of a layer:
get_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                              [model.layers[layer_index].output])
layer_output = get_layer_output([input_img, 0])[0]
The problem is that I can't pass a placeholder as input_img because it expects "real" data, but if I pass "real" data directly then the rest of the code doesn't work (creating the loss, the gradients, and iterating needs a placeholder).
Is there a way I can make this work?
I'm using the Tensorflow backend.
EDIT: I solved my issue by calling the K.set_learning_phase() method before doing anything else, such as building my model (I had to start from a fresh environment and call the method right after the imports).
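For reference, a minimal sketch of that fix; the call has to happen right after the imports, before any part of the model is built:
from keras import backend as K

K.set_learning_phase(0)  # 0 = test phase; fixes the phase before any graph is built
# ... build the model and the K.function extractors after this point ...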