I have a Sequential model with a custom loss function for training. For prediction and validation, however, I want to remove one layer. Is there any way to do this? The easiest approach I could think of would be a custom metric that can read the output of a previous layer without access to the input. Alternatively, I could run prediction and validation on a separate model, but I worry about constructing a separate model because I want the weights to be saved. Any suggestions? I have spent a lot of time on this, and everything I try runs into scope issues. I looked at "Keras, How to get the output of each layer?", but every answer there requires knowing the inputs.
You can create separate models. Each model will need to be compiled. My solution was of this form...
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

inputs = Input(input_shape)
model = Conv2D(32, [3, 3])(inputs)
# pass the model through some layers
# finish the model
model = Model(inputs=inputs, outputs=model)

input_2 = Input(input_shape)
second_model = model(input_2)
# pass the second model through some layers
second_model = Model(inputs=input_2, outputs=second_model)

model.compile(...)
second_model.compile(...)
Now any training done on second_model updates the weights of model, allowing you to train with second_model and run predictions with model.
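Because the layers of model are reused inside second_model, the weights you need for prediction can be saved from model directly. A minimal sketch of the workflow, where x_train, y_train and x_new are hypothetical arrays:

# Training the outer model updates the shared inner weights
second_model.fit(x_train, y_train, epochs=10)
# Prediction with the inner model uses those same updated weights
predictions = model.predict(x_new)
# Saving model persists the shared weights for later reuse
model.save_weights('shared_weights.h5')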
Related
I'm currently developing a model using Keras + TensorFlow to determine the temperature range of a set of proteins. What I first did was create a pre-trained model that converts the proteins into embeddings and then predicts their respective temperatures.
What I want to do now is incorporate this pre-trained model, together with its weights, into a new model, then fit on a new dataset and predict once again. The code for the new top model is:
UPDATED CODE
'Load Pretrained Model'
loaded_model = keras.models.load_model('pretrained_model')
#Freeze all model layer weights
loaded_model.trainable = False
input1 = np.expand_dims(x_train['input1'],1)
input2 = np.expand_dims(x_train['input2'], 1)
input3 = x_train['input3']
#Redefine Input Layers for ANN
input1 = Input(shape = (input1.shape[1],), name = "input1")
input2 = Input(shape = (input2.shape[1],), name = "input2")
input3 = Input(shape = (input3.shape[1],), name = "input3")
base_inputs = [input1, input2, input3]
x = loaded_model(base_inputs, training = False)
x = Dense(64, activation = "relu", kernel_regularizer=regularizers.l2(0.01))(x)
output = Dense(1, activation = "sigmoid")(x)
top_model = Model(inputs = base_inputs, outputs = output)
# Compile the Model
top_model.compile(loss='mse', optimizer = Adam(lr = 0.0001), metrics = ['mse'])
This is not working correctly, and I'm not sure how to get it up and running. I keep running into this error:
AttributeError: 'Dense' object has no attribute 'shape'
Any thoughts?
Could you try initializing the inputs with Keras layers instead? You have initialized the input shapes using NumPy arrays, but if I am right, the Dense layer imported from Keras does not support this (it leads to errors such as 'Dense' object has no attribute 'op'). Note that Input is a Keras layer. Could you initialize the Keras inputs as described in the Functional API guide (https://keras.io/guides/functional_api/)?
As an example:
input1 = keras.Input(shape=(1,))
input2 = keras.Input(shape=(1,))
input3 = keras.Input(shape=(1,))
Whether to make layers trainable or not depends on your architecture. In transfer learning, you reuse the trained weights from a pre-trained model and train your new network on top of them; in that case you have to freeze the layers of the pre-trained model, hence trainable = False. The frozen weights still take part in the forward computation of the layers you add in your custom architecture.
From your code snippet, it looks like you are not using any hidden layers such as LSTM or RNN cells for your sequential data. You are also passing the initialized NumPy arrays as inputs to the pre-trained model, which I don't think is the right way to do it. And you are making the pre-trained layers non-trainable, but then trying to train the model.
If I understand correctly, you want to train on a new set of data using the pre-trained model. If so, have a look at the transfer learning guide (https://keras.io/guides/transfer_learning/).
Given your problem, a transfer learning approach would be a feasible solution: you reuse a model trained on one dataset to solve a similar problem on another dataset. How to make layers trainable, freeze them, and fine-tune is explained in the same guide (https://keras.io/guides/transfer_learning/).
Concerning the AttributeError, have a look at the Functional API guide (https://keras.io/guides/functional_api/).
First, you have to initialize the Keras input node with the shape of the data you will feed to the model during training. For example:
inputs = keras.Input(shape=(784,))
or, if you are providing image data, something like:
img_inputs = keras.Input(shape=(32, 32, 3))
The Dense layer expects its input to have a specific shape, which you can determine from your data. If you are not sure about it, analyse the data first; it will give you a lot of information on how to proceed.
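Putting the pieces together, here is a minimal sketch of the top model from the question rewritten with Keras Input layers. The input shapes and the three-input signature of loaded_model are assumptions based on the snippet above, and n_features stands for the width of the third input:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Load and freeze the pretrained model
loaded_model = keras.models.load_model('pretrained_model')
loaded_model.trainable = False

# Symbolic Keras inputs, not NumPy arrays (shapes are assumed here)
input1 = keras.Input(shape=(1,), name="input1")            # matches np.expand_dims(..., 1)
input2 = keras.Input(shape=(1,), name="input2")
input3 = keras.Input(shape=(n_features,), name="input3")   # n_features = x_train['input3'].shape[1]
base_inputs = [input1, input2, input3]

x = loaded_model(base_inputs, training=False)
x = layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.01))(x)
output = layers.Dense(1, activation="sigmoid")(x)

top_model = keras.Model(inputs=base_inputs, outputs=output)
top_model.compile(loss='mse', optimizer=keras.optimizers.Adam(learning_rate=0.0001), metrics=['mse'])

The NumPy arrays themselves are then passed only when calling top_model.fit(), keyed by the input names.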
Is it possible in Keras for the training of some outputs in a multi-output model to start at a later epoch than the others? For example, one of the outputs takes other outputs as its input, but at the beginning those outputs are quite premature, and this puts a huge computational burden on the model. The output whose training I would like to postpone is a custom layer that applies some image-processing operations to an image generated by another output; in the first epochs that generated image is quite meaningless, so applying the custom layer is just a waste of time. Is there a way to do this? Just as we have weights over each output's loss, can we have a different starting point for calculating each output's loss?
1. Build a model that does not contain the later output.
2. Train that model to the degree you want.
3. Build a new model that incorporates the old model into it.
4. Compile the new model with the new loss functions you want.
5. Train that model.
To elaborate on step 3: Keras models can be used like layers in Keras' functional API.
You can build a normal model like so:
input = Input((100,))
x = Dense(50)(input)
x = Dense(1, activation='sigmoid')(x)
model = Model(input, x)
However, if you have another standard Keras model, it can be used just like any other layer. For example, if we have a model (created with Sequential(), Model(), or keras.models.load_model()) called model1, we can put it in like this:
input = Input((100,))
x = model1(input)
x = Dense(1, activation='sigmoid')(x)
model = Model(input, x)
This would be the equivalent of putting in each layer in model1 individually.
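To make the steps concrete for the question above, here is a minimal sketch. The layer sizes, losses, and the arrays x_train, y_main, y_extra are placeholders, and the final Dense layer merely stands in for the custom image-processing output:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Steps 1-2: build and train the model without the later output
inp = Input((100,))
h = Dense(50, activation='relu')(inp)
main_out = Dense(1, activation='sigmoid', name='main_out')(h)
model1 = Model(inp, main_out)
model1.compile(loss='binary_crossentropy', optimizer='adam')
model1.fit(x_train, y_main, epochs=10)  # hypothetical data

# Steps 3-5: wrap the trained model and add the postponed output
inp2 = Input((100,))
main_out2 = model1(inp2)
extra_out = Dense(1, name='extra_out')(main_out2)  # stands in for the custom layer
model2 = Model(inp2, [main_out2, extra_out])
model2.compile(loss=['binary_crossentropy', 'mse'],
               loss_weights=[1.0, 1.0],
               optimizer='adam')
model2.fit(x_train, [y_main, y_extra], epochs=10)

The postponed output's loss only starts contributing once model2 is trained, which is the effect asked about in the question.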
So I'm going through this GAN tutorial, and the author sets up a discriminator like this:
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
where net_discriminator is another Sequential model.
He then sets up the adversarial model like this:
model_adversarial = Sequential()
model_adversarial.add(net_generator)
# Disable layers in discriminator
for layer in net_discriminator.layers:
    layer.trainable = False
model_adversarial.add(net_discriminator)
where net_generator is another sequential model.
Both models are trained at the same time using train_on_batch.
What I don't understand is how the weights of the net_discriminator part of model_adversarial get updated by training model_discriminator. To me, they're two separate networks, and training one model which contains the layers of net_discriminator should not affect the other. Also, the layers are frozen in the adversarial model, so isn't that supposed to stop them from being trained?
Can someone provide me a lower level explanation of how this works? Thanks!
The answer to your first question has already been given by the author of the tutorial, where he says:
It is important to note that we add the discriminator network to a new Sequential model and do not directly compile the discriminator itself. We do this because the discriminator is also required in the next step and we are able to do so by adding it to a new model before compiling.
Our adversarial model uses random noise as its input, and outputs the eventual prediction of the discriminator on the generated images. This is why we added the discriminator to a new model in the previous step; by doing so we are able to reuse the network here.
So, the way he creates model_discriminator, by adding the net_discriminator model to a new Sequential() model, is how the weights of the net_discriminator part of model_adversarial get updated when model_discriminator is trained: during training of model_discriminator, it is actually the net_discriminator part of it that gets trained.
Answer to second question:
According to the author,
If we would use normal back propagation here on the full adversarial model we would slowly push the discriminator to update itself and start classifying fake images as real. Namely, the target vector of the adversarial model consists of all ones. To prevent this we must freeze the part of the model that belongs to the discriminator.
So, the author's explanation above clearly states why we want to freeze the discriminator layers in the adversarial model. The adversarial model contains both the generator and discriminator networks; it takes random noise as input and outputs the discriminator's prediction on the generated images. Here the already trained discriminator network is used only for prediction, so there is no need to involve it in training.
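A minimal sketch of how the two compiled models share one discriminator. The layer sizes and optimizer are placeholders; only the wiring mirrors the tutorial:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Shared sub-networks (architectures are placeholders)
net_generator = Sequential([Dense(128, activation='relu', input_shape=(100,)),
                            Dense(784, activation='tanh')])
net_discriminator = Sequential([Dense(128, activation='relu', input_shape=(784,)),
                                Dense(1, activation='sigmoid')])

# Discriminator training model: wraps net_discriminator, weights trainable here
model_discriminator = Sequential()
model_discriminator.add(net_discriminator)
model_discriminator.compile(loss='binary_crossentropy', optimizer='adam')

# Adversarial model: generator followed by the *same* net_discriminator object,
# whose layers are frozen so that only the generator is updated through this model
for layer in net_discriminator.layers:
    layer.trainable = False
model_adversarial = Sequential()
model_adversarial.add(net_generator)
model_adversarial.add(net_discriminator)
model_adversarial.compile(loss='binary_crossentropy', optimizer='adam')

Because model_discriminator is compiled before the layers are frozen, it still updates net_discriminator's weights, while model_adversarial, compiled after the freeze, leaves them untouched. Both outer models hold references to the same weight tensors, which is why training one is visible in the other.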
I want to add new layers to a pre-trained model, using Tensorflow and Keras. The problem is, those new layers are not to be added on top of the model, but at the start. I want to create a triple-siamese model, which takes 3 different inputs and gives 3 different outputs, using the pre-trained network as the core of the model. For that, I need to insert 3 new input layers at the beginning of the model.
The default path would be to just chain the layers and the model, but this method treats the pre-trained model as a new layer (when a new model with the new inputs and the pre-trained model is created, the new model only contains 4 layers: the 3 input layers and the whole pre-trained model):
input_1 = tf.keras.layers.Input(shape = (224,224,3))
input_2 = tf.keras.layers.Input(shape = (224,224,3))
input_3 = tf.keras.layers.Input(shape = (224,224,3))
output_1 = pre_trained_model(input_1)
output_2 = pre_trained_model(input_2)
output_3 = pre_trained_model(input_3)
new_model = tf.keras.Model([input_1, input_2, input_3], [output_1, output_2, output_3])
new_model has only 4 layers, due to the Keras API considering the pre_trained_model a layer.
I know that the above option works, as I have seen it in many code samples, but I wonder if there is a better option. It feels awkward to me, because access to the inner layers of the final model is messed up, not to mention that the model keeps an extra input layer beyond the 3 added inputs (the input layer from the pre-trained model is still intact and is totally unnecessary).
No, this does not add layers; you are making a multi-input, multi-output model where each siamese branch shares weights. There is no other API in Keras to do this, so this is your only option.
And you can always access the layers of the inner model through the pre_trained_model variable, so there is nothing lost.
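If you need a specific inner layer later, for inspection or as an extra output, you can still reach it through the nested model. A short sketch, where 'conv_block_1' is a hypothetical layer name:

# List the layers of the nested pre-trained model
for layer in pre_trained_model.layers:
    print(layer.name, layer.output_shape)

# Grab one inner layer by name (the name here is hypothetical)
inner = pre_trained_model.get_layer('conv_block_1')
print(inner.weights)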
Consider the example of image classification on ImageNet: how do I update the pre-trained model using new data points?
I have loaded the pre-trained model. I have new data points that are quite different from the distribution of the original data on which the model was trained, so I would like to update/fine-tune the model with them. How do I go about doing it? I am using PyTorch 0.4.0, running on a Tesla K40C GPU.
If you don't want to change the output of the classifier (i.e. the number of classes), then you can simply continue training the model with new example images, assuming that they are reshaped to the same shape that the pretrained model accepts.
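A minimal sketch of that first option, assuming a torchvision ResNet-18, labels that still correspond to the original classes, and a hypothetical DataLoader new_loader that yields images already resized and normalized to the shape the network expects:

import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.train()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# Continue training the whole network on the new data points
for images, labels in new_loader:  # new_loader is a hypothetical DataLoader
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()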
On the other hand, if you want to change the number of classes in a pre-trained model, then you can replace the last fully connected layer with a new one and train only this specific layer on new samples. Here's a sample code for this case from PyTorch's autograd mechanics notes:
import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)

# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)