saving and loading RNN hidden states in PyTorch

saving and loading RNN hidden states in PyTorch - python

I am trying to use an RNN network in PyTorch for regression task. In the training phase the model is learned. I want to use the trained model in testing phase. For this purpose I have saved the learned model by:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For testing phase I must use this loaded model but I want to know what is the value of the first hidden state of the model? I can initialize the first hidden state by zero but I think maybe this is not correct. Is there any function other than torch.save which can return the last hidden state in the learned mode? Then I can restore that hidden state and use it as the first hidden state in the loaded model for testing phase.
Thanks in advance.

Your question is a bit unclear. As far as I understand you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use model's state_dict, which is basically a python dictionary object that maps each layer to its parameter tensor. Read more about it from here.
for param in loaded_model.state_dict():
print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights of the last hidden layer using below code:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']

Related

Keras question about training with frozen layers

So I'm going through this GAN tutorial, and the author sets up a discriminator like this:
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
where net_discriminator is another Sequential model.
He then sets up the adversarial model like this:
model_adversarial = Sequential()
model_adversarial.add(net_generator)
# Disable layers in discriminator
for layer in net_discriminator.layers:
layer.trainable = False
model_adversarial.add(net_discriminator)
where net_generator is another sequential model.
Both models are trained at the same time using train_on_batch.
What I don't understand is how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator. To me, they're two separate networks and training one model which contains the layers of net_discriminator should not effect the other. Also, the layers are frozen in the adversarial model so isn't that supposed to stop them from being trained?
Can someone provide me a lower level explanation of how this works? Thanks!

Answer to your first question is already been given by the author of the tutorial in the following lines where he says:
It is important to note that we add the discriminator network to a
new Sequential model and do not directly compile the discriminator
itself. We do this because the discriminator is also required in the
next step and we are able to do so by adding it to a new model before
compiling.
Our adversarial model uses random noise as its input, and outputs the
eventual prediction of the discriminator on the generated images. This why we
added the discriminator to a new model in the previous step, by doing so we
are able to reuse the network here.
So, I think the way he is creating model_discriminator model by adding net_discriminator model to a new Sequential() layer is the reason how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator, as during the training of model_discriminator, it's actually net_discriminator part of it which is getting trained.
Answer to second question:
According to the author,
If we would use normal back propagation here on the full adversarial
model we would slowly push the discriminator to update itself and
start classifying fake images as real. Namely, the target vector of
the adversarial model consists of all ones. To prevent this we must
freeze the part of the model that belongs to the discriminator.
So, the above expaination by the author clearly suggests why we want to freeze layers of discriminator part of the adverserial model. The adverserial model contains both generator and discriminator networks. The adverserial model uses random noise as its input and outputs the eventual prediction of the discriminator on the generated images. So, here the already trained discriminator network is used just for prediction, no need to involve it in training.

Updating pre-trained Deep Learning model with respect to new data points

Considering the example of Image classification on ImageNet, How to update the pre-trained model using the new data points.
I have loaded the pre-trained model. I have a new data point that is quite different from the distribution of the original data on which the model was previously trained. So, I would like to update/fine-tune the model with the help of new data point. How to go about doing it? Can anyone help me out in doing it? I am using pytorch 0.4.0 for implementation, running on GPU Tesla K40C.

If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on new samples. Here's a sample code for this case from PyTorch's autograd mechanics notes:
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
param.requires_grad = False
# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)
# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

How to persist non-trainable variables in tf.Estimator checkpoint?

I'm trying to include a Dense layer that is not trainable and initialized as an identity matrix, as part of my tensorflow Estimator. The intuition is for this Dense layer to pass through its inputs during standard training and a fine tuning step afterward. The catch is that I don't want these weights updated at all during the initial round, only during fine tuning.
I can do several things to make these weights non-trainable, including using the trainable argument in the Dense constructor or by filtering out anything with dense in its name before passing to MomentumOptimizer.compute_gradients().
But in either case (make dense non-trainable or just don't pass it to optimizer), tf will throw an error saying that it cannot find a key related to the dense layer.
I understand that since on the first run, where dense is non-trainable, that it won't be persisted in the checkpoint file. Likewise, if it's filtered out from being passed to compute_gradients, then the same issue occurs.
Is there any method to just persist untrained variables, even with just their initialized values, across runs?
NotFoundError (see above for traceback): Key dense/kernel/Momentum not
found in checkpoint

I will answer my own question here because it wasn't immediately obvious to me, as tf documentation doesn't seem to make it clear. If you want to introduce a new trainable variable, then it will need to fundamentally be a different model in later ones. Thus in order to handle fine-tuning of existing weights, those existing weights in the new model must be resolved from the warm start settings.
So train a model, conditionally not including the fine-tune layer when your Estimator's model function runs. Train the existing model and then create another model that is separate. Technically this just means you need to use a fresh model directory, but warm start settings should point to the model you trained beforehand.
On fine-tuning runs, your model function should conditionally include the fine-tuning layers, but it should restore the weights from the previous run setting the warm start settings to look at previous model directory.

Adding new units to a Keras model layer and changing their weights

I am working on a project that requires me to add new units to the output layer of a neural network to implement a form of transfer learning. I was wondering if I could do this and set the units' weights using either Keras or TensorFlow.
Specifically I would like to append an output neuron to the output layer of the Keras model and set that neuron's initial weights and bias.

Stumbled upon the answer to my own question. Thanks everyone for the answers/comments.
https://keras.io/layers/about-keras-layers/
The first few lines of this source detail how to load and set weights.
Essentially, appending an output neuron to a Keras model can be accomplished by loading the old output layer, appending the new weights, and setting weights for a new layer. Code is below.
# Load weights of previous output layer, set weights for new layer
old_layer_weights = model.layers.pop().get_weights()
new_neuron_weights = np.ndarray(shape=[1,bottleneck_size])
# Set new weights
# Append new weights, add new layer
new_layer = Dense(num_classes).set_weights(np.append(old_layer_weights,new_neuron_weights))
model.add(new_layer)

You could add new units to the output layer of a pre-trained neural network. This form of transfer learning is said to be called using the bottleneck features of a pre-trained network. This could be implemented both in tensorflow as well as in Keras.
Please find the tutorial in Keras below:
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Also, find the tutorial for tensorflow below:
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb
Hope this helps!

Keras Functional API changing layer names in every API

When I run the functional API in the model for k-fold cross-validation, the numbers in the naming the dense layer is increased in the return fitted model of each fold.
Like in the first fold it’s dense_2_acc, then in 2nd fold its dense_5_acc.
By my model summary shows my model is correct. why is it changing the names in the fitted model history object of each fold?

This is a really good question which shows something really important about keras. The reason why names change in such manner is that keras is not clearing previously defined variables even when you overwrite the model. You can easily check that variables are still in session.graph by calling:
from keras import backend as K
K.get_session().graph.get_collection('variables')
In order to clear previous model variables one may call:
K.clear_session()
However - be careful - as you might lose an existing model. If you want to keep names the same you can simply name your layers by adding name parameter to your layer instantiation, e.g.:
Dense(10, activation='softmax', name='output')

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.