I'm trying to include a Dense layer that is not trainable and initialized as an identity matrix, as part of my tensorflow Estimator. The intuition is for this Dense layer to pass through its inputs during standard training and a fine tuning step afterward. The catch is that I don't want these weights updated at all during the initial round, only during fine tuning.
I can do several things to make these weights non-trainable, either by using the trainable argument in the Dense constructor or by filtering out anything with dense in its name before passing to MomentumOptimizer.compute_gradients().
But in either case (making dense non-trainable or simply not passing it to the optimizer), tf throws an error saying that it cannot find a key related to the dense layer.
I understand that, since the dense layer is non-trainable on the first run, it won't be persisted in the checkpoint file. Likewise, if it's filtered out before being passed to compute_gradients, the same issue occurs.
Is there any method to just persist untrained variables, even with just their initialized values, across runs?
NotFoundError (see above for traceback): Key dense/kernel/Momentum not found in checkpoint
I will answer my own question here because it wasn't immediately obvious to me, and the tf documentation doesn't seem to make it clear. If you want to introduce a new trainable variable on a later run, that later run is fundamentally a different model. So, in order to fine-tune the existing weights, the new model has to resolve those weights from its warm start settings.
So train a model whose Estimator model function conditionally omits the fine-tune layer. Then create another, separate model for fine-tuning. Technically this just means using a fresh model directory, with the warm start settings pointing at the model you trained beforehand.
On fine-tuning runs, the model function should conditionally include the fine-tuning layers, and it should restore the weights from the previous run by setting the warm start settings to point at the previous model directory.
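To make that concrete, here is a minimal sketch of the two-round workflow. It assumes a TF 1.x Estimator recent enough to have tf.estimator.WarmStartSettings; the model_fn, layer names, directories, and the is_fine_tuning flag are hypothetical stand-ins rather than the original code.

import numpy as np
import tensorflow as tf


def model_fn(features, labels, mode, params):
    # Toy model_fn: only the TRAIN path is sketched.
    net = tf.layers.dense(features["x"], 4, activation=tf.nn.relu, name="hidden")
    if params["is_fine_tuning"]:
        # Identity-initialized pass-through layer, present only in the fine-tune model.
        net = tf.layers.dense(net, 4, use_bias=False,
                              kernel_initializer=tf.initializers.identity(),
                              name="pass_through")
    output = tf.layers.dense(net, 1, name="out")
    loss = tf.losses.mean_squared_error(labels, output)
    optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


def train_input_fn():
    x = np.random.rand(64, 4).astype(np.float32)
    y = np.sum(x, axis=1, keepdims=True)
    return tf.data.Dataset.from_tensor_slices(({"x": x}, y)).repeat().batch(8)


# Round 1: train without the fine-tune layer, in its own model directory.
base = tf.estimator.Estimator(model_fn, model_dir="/tmp/base_model",
                              params={"is_fine_tuning": False})
base.train(train_input_fn, steps=200)

# Round 2: a fresh model directory, with the fine-tune layer included, and
# warm-start settings pointing at the round-1 checkpoint. Only the variables
# that actually exist in that checkpoint are warm-started; the new
# pass-through layer keeps its identity initialization.
ws = tf.estimator.WarmStartSettings(ckpt_to_initialize_from="/tmp/base_model",
                                    vars_to_warm_start="hidden|out")
fine_tune = tf.estimator.Estimator(model_fn, model_dir="/tmp/fine_tune_model",
                                   params={"is_fine_tuning": True},
                                   warm_start_from=ws)
fine_tune.train(train_input_fn, steps=200)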
Related
I am trying to use an RNN in PyTorch for a regression task. The model is learned in the training phase. I want to use the trained model in the testing phase. For this purpose I have saved the learned model with:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For the testing phase I must use this loaded model, but I want to know what the value of the model's first hidden state should be. I can initialize the first hidden state with zeros, but I think maybe this is not correct. Is there any function, other than torch.save, that can return the last hidden state of the learned model? Then I could restore that hidden state and use it as the first hidden state of the loaded model in the testing phase.
Thanks in advance.
Your question is a bit unclear. As far as I understand, you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use the model's state_dict, which is basically a Python dictionary object that maps each layer to its parameter tensor. Read more about it here.
for param in loaded_model.state_dict():
    print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights of the last hidden layer using the code below:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']
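The entries are ordinary tensors, so you can inspect them directly. For example (the key names are taken from the sample output above and depend on how the model was defined):

state = loaded_model.state_dict()
print(state['out.weight'].shape, state['out.bias'].shape)
print(state['rnn.weight_hh_l0'].shape)  # recurrent (hidden-to-hidden) weights of layer 0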
I have a problem when training a neural net with Keras in Jupyter Notebook. I created a sequential model with several hidden layers. After training the model and saving the results, I want to delete this model and create a new model in the same session, as I have a for loop that checks the results for different parameters. But as far as I understand the errors I get when changing the parameters, each time I loop over I am just adding layers to the existing model (even though I initialise it again with network = Sequential() inside the loop). So my question is, how can I completely clear the previous model, or how can I initialise a completely new model in the same session?
keras.backend.clear_session() should clear the previous model. From https://keras.io/backend/:
Destroys the current TF graph and creates a new one.
Useful to avoid clutter from old models / layers.
I know this is a bit of an old thread, but I was looking for a way to clear the session. For TensorFlow 2.8 I think you need to use tf.keras.backend.clear_session().
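As an illustration, here is a minimal sketch of the kind of loop described in the question, with the session cleared before each new model is built (the hyperparameter values and dummy data are made up for the example):

import numpy as np
import tensorflow as tf

x_train = np.random.rand(100, 10).astype("float32")  # dummy data for the sketch
y_train = np.random.rand(100, 1).astype("float32")

results = {}
for units in [32, 64, 128]:            # hypothetical hyperparameter sweep
    tf.keras.backend.clear_session()   # drop the previous model's graph and layer-name counters
    network = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    network.compile(optimizer="adam", loss="mse")
    history = network.fit(x_train, y_train, epochs=2, verbose=0)
    results[units] = history.history["loss"][-1]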
When I run my functional API model for k-fold cross-validation, the number in the name of the dense layer increases in the fitted model returned for each fold.
For example, in the first fold it's dense_2_acc, then in the 2nd fold it's dense_5_acc.
But my model summary shows that my model is correct. Why is it changing the names in the fitted model's history object for each fold?
This is a really good question which shows something really important about Keras. The reason the names change in this manner is that Keras does not clear previously defined variables even when you overwrite the model. You can easily check that the variables are still in session.graph by calling:
from keras import backend as K
K.get_session().graph.get_collection('variables')
In order to clear previous model variables one may call:
K.clear_session()
However, be careful, as you might lose an existing model. If you want to keep the names the same, you can simply name your layers by adding the name parameter to your layer instantiation, e.g.:
Dense(10, activation='softmax', name='output')
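For example, a small sketch combining both suggestions in a k-fold-style loop (the layer sizes and the loop itself are made up for illustration):

import numpy as np
from keras import backend as K
from keras.layers import Dense, Input
from keras.models import Model

for fold in range(3):
    K.clear_session()   # without this, layer names keep counting up: dense_3, dense_4, ...
    inp = Input(shape=(8,))
    hidden = Dense(16, activation='relu')(inp)
    out = Dense(3, activation='softmax', name='output')(hidden)  # fixed name
    model = Model(inp, out)
    print(fold, [layer.name for layer in model.layers])  # identical names on every fold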
I'm trying to load three different models in the same process. Only the first one works as expected; the rest of them return what look like random results.
Basically the order is as follows:
define and compile first model
load trained weights before
rename layers
the same process for the second model
the same process for the third model
So, something like:
model1 = Model(inputs=Input(shape=input_size_im), outputs=layers_firstmodel)
model1.compile(optimizer='sgd', loss='mse')
model1.load_weights(weights_first, by_name=True)
# rename layers but didn't work
model2 = Model(inputs=Input(shape=input_size_im), outputs=layers_secondmodel)
model2.compile(optimizer='sgd', loss='mse')
model2.load_weights(weights_second, by_name=True)
# rename layers but didn't work
model3 = Model(inputs=Input(shape=input_size_im), outputs=layers_thirdmodel)
model3.compile(optimizer='sgd', loss='mse')
model3.load_weights(weights_third, by_name=True)
# rename layers but didn't work
for im in list_images:
    results_firstmodel = model1.predict(im)
    results_secondmodel = model2.predict(im)
    results_thirdmodel = model3.predict(im)
I'd like to perform inference over a bunch of images. To do that, the idea is to loop over the images, perform inference with these three models, and return the results.
I have tried renaming all layers to make them unique, with no success. I also created a different graph for each network and ran inference for each one in its own session. This works, but it's very inefficient (in addition, I have to set the weights every time because sess.run(tf.global_variables_initializer()) resets them). Each time a session is created, TensorFlow prints "creating tensorflow device (/device:GPU:0)".
I am running Tensorflow 1.4.0-rc0, Keras 2.1.1 and Ubuntu 16.04 kernel 4.14.
The OP is correct here. There is a serious bug when you try to load multiple weight files in the same script. The above answer doesn't solve this. If you actually interrogate the weights when loading weights for multiple models in the same script, you will notice that the weights are different than when you just load weights for one model on its own. This is where the randomness the OP observes is coming from.
EDIT: To solve this problem you have to encapsulate the model.load_weights command within a function, and the randomness that you are experiencing should go away. The problem is that something weird goes wrong when you have multiple load_weights commands in the same script like you have above. If you load those model weights inside a function, your issues should go away.
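For example, a rough sketch of that encapsulation, reusing the weight-file names from the question; build_first_outputs, build_second_outputs and build_third_outputs are hypothetical builder functions, each taking the input tensor and returning that architecture's output tensor:

from keras.layers import Input
from keras.models import Model


def build_and_load(build_outputs, weights_path, input_size_im):
    # Build, compile and load one model entirely inside its own function scope.
    inp = Input(shape=input_size_im)
    model = Model(inputs=inp, outputs=build_outputs(inp))
    model.compile(optimizer='sgd', loss='mse')
    model.load_weights(weights_path, by_name=True)
    return model


model1 = build_and_load(build_first_outputs, weights_first, input_size_im)
model2 = build_and_load(build_second_outputs, weights_second, input_size_im)
model3 = build_and_load(build_third_outputs, weights_third, input_size_im)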
From the Keras docs we have this explanation for the use of load_weights:
loads the weights of the model from a HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use by_name=True to load only those layers with the same name.
Therefore, if your architecture is unchanged you should drop by_name=True or set it to False (its default value). This could be causing the inconsistencies you are facing, as your weights are probably not being loaded due to your layers having different names.
Another important thing to consider is the nature of your HDF5 file, and the way you created it. If it indeed contains only the weights (created with save_weights as the docs point out) then there should be no problem in proceeding as explained before.
Now, if that HDF5 contains weights and architecture in the same file, then you should be loading it with keras.models.load_model instead (further reading if you like here). If this is the case then this would also explain those inconsistencies.
As a side suggestion, I prefer to save my models using callbacks, like ModelCheckpoint or EarlyStopping if you want to automatically determine when to stop training. This not only gives you greater flexibility when training and saving your models (as you can stop at the optimal training epoch or whenever you desire), but also makes loading those models easier, as you can simply use the load_model method to load both architecture and weights into your desired variable.
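For instance, a minimal sketch of that callback-based workflow (the dummy model, data, file name, and patience value are all arbitrary choices for the example):

import numpy as np
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Dense
from keras.models import Sequential, load_model

# Dummy model and data, just to make the sketch runnable.
model = Sequential([Dense(8, activation='relu', input_shape=(4,)), Dense(1)])
model.compile(optimizer='sgd', loss='mse')
x, y = np.random.rand(100, 4), np.random.rand(100, 1)

callbacks = [
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
    EarlyStopping(monitor='val_loss', patience=5),
]
model.fit(x, y, validation_split=0.2, epochs=50, callbacks=callbacks, verbose=0)

# Later: load_model restores architecture and weights together in one call.
best_model = load_model('best_model.h5')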
Finally, here is one useful SO post where saving (and loading) Keras models is explained.
I would like to train a GAN in Keras. My final target is BEGAN, but I'm starting with the simplest one. Understanding how to freeze weights properly is necessary here and that's what I'm struggling with.
During generator training the discriminator weights must not be updated. I would like to freeze and unfreeze the discriminator alternately, for training the generator and the discriminator in turn. The problem is that setting the trainable parameter to False on the discriminator model, or even on its individual weights, doesn't stop the model from training (and the weights from updating). On the other hand, when I compile the model after setting trainable to False, the weights do freeze, but then they can only be unfrozen by compiling again. I can't compile the model after each iteration, because that negates the idea of the whole training.
Because of that problem it seems that many Keras GAN implementations are either buggy, or they only work because of some non-intuitive trick in an old version or something.
I tried this example code a couple of months ago and it worked:
https://github.com/fchollet/keras/blob/master/examples/mnist_acgan.py
It's not the simplest form of GAN, but as far as I remember, it's not too difficult to remove the classification loss and turn the model into a GAN.
You don't need to turn on/off the discriminator's trainable property and recompile. Simply create and compile two model objects, one with trainable=True (discriminator in the code) and another one with trainable=False (combined in the code).
When you're updating the discriminator, call discriminator.train_on_batch(). When you're updating the generator, call combined.train_on_batch().
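A bare-bones sketch of that two-model setup (the layer sizes, latent dimension, and the random "real" data are arbitrary placeholders, not from the linked example):

import numpy as np
from keras.layers import Dense, Input
from keras.models import Model, Sequential

latent_dim, data_dim, batch = 8, 16, 32   # arbitrary sizes for the sketch

# Discriminator: compiled on its own while trainable.
discriminator = Sequential([Dense(32, activation='relu', input_shape=(data_dim,)),
                            Dense(1, activation='sigmoid')])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Generator (never compiled on its own).
generator = Sequential([Dense(32, activation='relu', input_shape=(latent_dim,)),
                        Dense(data_dim)])

# Combined model: the discriminator is frozen *before* this compile, so
# training "combined" only updates the generator.
discriminator.trainable = False
z = Input(shape=(latent_dim,))
combined = Model(z, discriminator(generator(z)))
combined.compile(optimizer='adam', loss='binary_crossentropy')

# One alternating training step ("real" data is just noise here).
real = np.random.rand(batch, data_dim)
noise = np.random.rand(batch, latent_dim)
fake = generator.predict(noise)

# The discriminator object was compiled while trainable, so these calls update it.
discriminator.train_on_batch(real, np.ones((batch, 1)))
discriminator.train_on_batch(fake, np.zeros((batch, 1)))

# "combined" was compiled with the discriminator frozen, so this updates only the generator.
combined.train_on_batch(noise, np.ones((batch, 1)))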
Can you use tf.stop_gradient to conditionally freeze weights?
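For reference, tf.stop_gradient blocks backpropagation through a tensor rather than freezing a layer as such; a tiny TF 1.x-style (graph mode) illustration of the idea:

import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[0.5], [0.5]])

frozen_out = tf.stop_gradient(tf.matmul(x, w))   # treated as a constant during backprop
loss = tf.reduce_sum(frozen_out ** 2)
print(tf.gradients(loss, w))                     # [None]: no gradient flows into w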
Maybe your adversarial net (generator plus discriminator) is written as a single Model.
However, even if you set d.trainable=False, only the independent d net is set to non-trainable; the d inside the whole adversarial net is still trainable.
You can call d_on_g.summary() before and after setting d.trainable=False and you will see what I mean (pay attention to the trainable variables).
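In other words, using the d and d_on_g names from this answer, something like:

d_on_g.summary()     # note the "Trainable params" line
d.trainable = False
d_on_g.summary()     # compare the counts to see which weights the combined model will actually update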