Keras Functional API changing layer names in every API - python

When I run the functional API in the model for k-fold cross-validation, the numbers in the naming the dense layer is increased in the return fitted model of each fold.
Like in the first fold it’s dense_2_acc, then in 2nd fold its dense_5_acc.
By my model summary shows my model is correct. why is it changing the names in the fitted model history object of each fold?

This is a really good question which shows something really important about keras. The reason why names change in such manner is that keras is not clearing previously defined variables even when you overwrite the model. You can easily check that variables are still in session.graph by calling:
from keras import backend as K
K.get_session().graph.get_collection('variables')
In order to clear previous model variables one may call:
K.clear_session()
However - be careful - as you might lose an existing model. If you want to keep names the same you can simply name your layers by adding name parameter to your layer instantiation, e.g.:
Dense(10, activation='softmax', name='output')

Related

Does model get retrained entirely using .fit() in sklearn and tensorflow

I am trying to use machine learning in Python. Right now I am using sklearn and TensorFlow. I was wondering what to do if I have a model that needs updating when new data comes. For example, I have financial data. I built an LSTM model with TensorFlow and trained it. But new data comes in every day, and I don't want to retrain the model every day. Is there a way just to update the model and not retrain it from scratch?
In sklearn, the documentation for .fit() method (using DecisionTreeClassifier as an example) says that it
Build a decision tree classifier from the training set (X, y).
So it seems like it will retrain the entire model from scratch.
In tensorflow, .fit() method (using Sequential as an example) say
Trains the model for a fixed number of epochs (iterations on a
dataset).
So it seems like it does update the model instead of retraining. But I am not sure if my understanding is correct. I would be grateful for some clarification. And if sklearn indeed retrains the entire model using .fit(), is there a function that would just update the model instead of retraining from scratch?
When you say update and not train. Is it just updating the weights using the new data?
If so you can adopt two approaches with Transfer learning.
Finetune: Initialise a model with the weights from old model and retrain it on new data.
Add a new layer: Add a new layer and update the weights in this layer only while freezing the remaining weights in the network.
for more details read the tensorflow guide on tansferlearning
In tensorflow, there is a method called train_on_batch() that you can call on your model.
Say you defined your model as sequential, and you initially trained it on the existing initial_dataset using the fit method.
Now, you have new data in your hand -> call it X_new_train,y_new_train
so you can update the existing model using train_on_batch()
An example would be:
#generate some X_new_train (one batch)
X_new_train = tf.random.normal(shape=[no_of_samples_in_one_batch,100])
#generate corresponding y_new_train
y_new_train = tf.constant([[1.0]]*no_of_samples_in_one_batch)
model.train_on_batch(X_new_train,y_new_train)
Note that the idea of no_of_samples_in_one_batch (also called batch size) is not so important here. I mean whatever number of samples that you have in your data will be considered as one batch!
Now, coming to sklearn, I am not sure whether all machine learning models can incrementally learn (update weights from new examples). There is a list of models that support incremental learning:
https://scikit-learn.org/0.15/modules/scaling_strategies.html#incremental-learning
In sklearn, the .fit() method retrains on the dataset i.e as you use .fit() on any dataset, any info pertaining to previous training will all be discarded. So assuming you have new data coming in every day you will have to retrain each time in the case of most sklearn algorithms.
Although, If you like to retrain the sklearn models instead of training from scratch, some algorithms of sklearn (like SGDClassifier) provide a method called partial_fit(). These can be used to retrain and update the weights of an existing model.
As per Tensorflow, the .fit() method actually trains the model without discarding any info pertaining to previous trainings. Hence each time .fit() is used via TF it will actually retrain the model.
Tip: you can use SaveModel from TF to save the best model and reload and re-train the model as and when more data keeps flowing in.

Can I run a metric on part of the model in keras?

I have a sequential model with a custom loss function for training. For prediction and validation however, I want to remove one layer. Is there any way to do this? The easiest thing I could think would be within a custom metric by being able to get the value of output from a previous layer without access to the input. Alternatively, I could run prediction and verification on a separate model, but I worry about constructing a separate model because I want the weights to be saved. Any suggestions? I have spent a lot of time with this and any thing I try has involved scope issues. I took a look at this: Keras, How to get the output of each layer? but every answer I see requires me to know the inputs.
You can create separate models. Each model will need to be compiled. My solution was of this form...
inputs = Input(input_shape)
model = Conv2D(32, [3,3])(inputs)
# pass the model through some layers
# finish the model
model = Model(inputs=inputs, outputs=model)
input_2 = Input(input_shape)
second_model = model(input_2)
# pass the second model through some layers
second_model = Model(inputs=inputs, outputs=second_model)
model.compile(...
second_model.compile(...
Now any training done to second_model affects the weights of model, allowing you to do training off of second_model and predictions with model.

saving and loading RNN hidden states in PyTorch

I am trying to use an RNN network in PyTorch for regression task. In the training phase the model is learned. I want to use the trained model in testing phase. For this purpose I have saved the learned model by:
torch.save(learned_model, "model_path")
Then I can load the model again by:
loaded_model = torch.load("model_path")
For testing phase I must use this loaded model but I want to know what is the value of the first hidden state of the model? I can initialize the first hidden state by zero but I think maybe this is not correct. Is there any function other than torch.save which can return the last hidden state in the learned mode? Then I can restore that hidden state and use it as the first hidden state in the loaded model for testing phase.
Thanks in advance.
Your question is a bit unclear. As far as I understand you want to know the weights of the last hidden layer in the trained model, i.e. loaded_model. In that case, you can simply use model's state_dict, which is basically a python dictionary object that maps each layer to its parameter tensor. Read more about it from here.
for param in loaded_model.state_dict():
print(param)
Sample output:
rnn.weight_ih_l0
rnn.weight_hh_l0
rnn.bias_ih_l0
rnn.bias_hh_l0
out.weight
out.bias
After that, you can get the weights of the last hidden layer using below code:
out_weights, out_bias = loaded_model.state_dict()['out.weight'], loaded_model.state_dict()['out.bias']

How to persist non-trainable variables in tf.Estimator checkpoint?

I'm trying to include a Dense layer that is not trainable and initialized as an identity matrix, as part of my tensorflow Estimator. The intuition is for this Dense layer to pass through its inputs during standard training and a fine tuning step afterward. The catch is that I don't want these weights updated at all during the initial round, only during fine tuning.
I can do several things to make these weights non-trainable, including using the trainable argument in the Dense constructor or by filtering out anything with dense in its name before passing to MomentumOptimizer.compute_gradients().
But in either case (make dense non-trainable or just don't pass it to optimizer), tf will throw an error saying that it cannot find a key related to the dense layer.
I understand that since on the first run, where dense is non-trainable, that it won't be persisted in the checkpoint file. Likewise, if it's filtered out from being passed to compute_gradients, then the same issue occurs.
Is there any method to just persist untrained variables, even with just their initialized values, across runs?
NotFoundError (see above for traceback): Key dense/kernel/Momentum not
found in checkpoint
I will answer my own question here because it wasn't immediately obvious to me, as tf documentation doesn't seem to make it clear. If you want to introduce a new trainable variable, then it will need to fundamentally be a different model in later ones. Thus in order to handle fine-tuning of existing weights, those existing weights in the new model must be resolved from the warm start settings.
So train a model, conditionally not including the fine-tune layer when your Estimator's model function runs. Train the existing model and then create another model that is separate. Technically this just means you need to use a fresh model directory, but warm start settings should point to the model you trained beforehand.
On fine-tuning runs, your model function should conditionally include the fine-tuning layers, but it should restore the weights from the previous run setting the warm start settings to look at previous model directory.

How can I clear a model created with Keras and Tensorflow(as backend)?

I have a problem when training a neural net with Keras in Jupyter Notebook. I created a sequential model with several hidden layers. After training the model and saving the results, I want to delete this model and create a new model in the same session, as I have a for loop that checks the results for different parameters. But as I understand the errors I get, when changing the parameters, when I loop over, I am just adding layers to the model (even though I initialise it again with network = Sequential() inside the loop). So my question is, how can I completely clear the previous model or how can I initialise a completely new model in the same session?
keras.backend.clear_session() should clear the previous model. From https://keras.io/backend/:
Destroys the current TF graph and creates a new one.
Useful to avoid clutter from old models / layers.
I know it is a bit old thread, but I was looking for something to clear the session. For TensorFlow 2.8 I think you need to use tf.keras.backend.clear_session()

Categories

Resources