Training parts of a neural net with independent optimizers - python

Consider a neural net that consists of two parts:
from torch import nn
class Model(nn.Module):
def __init__(self, x_dim, out_dim, h_dim=42):
super().__init__()
self.part1 = nn.Sequential(
nn.Linear(x_dim, h_dim),
nn.ReLU(),
...
)
self.part2 = nn.Sequential(
nn.Linear(h_dim, h_dim),
nn.ReLU(),
...
)
def forward(self, x):
x = self.part1(x)
x = self.part2(x)
return x
In principle, would it be possible to train it's model parameters with multiple optimizers - one for each part, such that
model = Model(...)
opt1 = Adam(model.part1.parameters(), lr=1e-5)
opt2 = Adam(model.part2.parameters(), lr=1e-3)
The rationale is that if we assume that model.part1 is a pre-trained network. When we train the composite Model for a downstream task, we would like to adjust the parameters of part1 to a lower degree than part2, because we expect the pre-trained parameters of part1 to be already close to a minimum.
Would such an approach work, or are there other ways of implementing that logic?

what you need is
opt = Adam([
{'params': model.part1.parameters(), 'lr': 1e-5},
{'params': model.part2.parameters(), 'lr': 1e-3))}, ...)
Where ... are keyword arguments to default to when a per-group parameter is unspecified.

Related

The Number of Classes in Pytorch Pretrained Model

I want to use the pre-trained models in Pytorch to do image classification in my own datasets, but how should I change the number of classes while freezing the parameters of the feature extraction layer?
These are the models I want to include:
resnet18 = models.resnet18(pretrained=True)
densenet161 = models.densenet161(pretrained=True)
inception_v3 = models.inception_v3(pretrained=True)
shufflenet_v2_x1_0 = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
mnasnet1_0 = models.mnasnet1_0(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
Thanks a lot in advance!
New codes I added:
import torch
from torchvision import models
class MyResModel(torch.nn.Module):
def __init__(self):
super(MyResModel, self).__init__()
self.classifier = nn.Sequential(
nn.Linear(512,256),
nn.ReLU(),
nn.Dropout(p=0.5),
nn.Linear(256,3),
)
def forward(self, x):
return self.classifier(x)
resnet18 = models.resnet18(pretrained=True)
resnet18.fc = MyResModel()
for param in resnet18.parameters():
param.requires_grad_(False)
You have to change the final Linear layer of the respective model.
For example in the case of resnet, when we print the model, we see that the last layer is a fully connected layer as shown below:
(fc): Linear(in_features=512, out_features=1000, bias=True)
Thus, you must reinitialize model.fc to be a Linear layer with 512 input features and 2 output features with:
model.fc = nn.Linear(512, num_classes)
For other models you can check here
To freeze the parameters of the network you have to use the following code:
for name, param in model.named_parameters():
if 'fc' not in name:
print(name, param.requires_grad)
param.requires_grad=False
To validate:
for name, param in model.named_parameters():
print(name,param.requires_grad)
Note that for this example 'fc' was the name of the classification layer. This is not the case for other models. You have to inspect the model in order to find the name of the classification layer.

Pytorch Deep Learning - Class Model() and training function

I am new to Pytorch and I am going through this tutorial to figure out how to do deep learning with this library. I have problem figuring out part of the code.
There is a class called Net and an object called model instantiated from it. Then there is the training function called train(epoch). In the next line in the train function body I see this: model.train() which I can't make any sense of. Would you please help me in understanding this part of the code? how we are calling a method of a class when this method has not been defined inside the class? and how come the method has the exact same name as function it has been called inside? This is the class definition:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 32, kernel_size=3)
self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(2304, 256)
self.fc2 = nn.Linear(256, 17)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(x.size(0), -1) # Flatten layer
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
return F.sigmoid(x)
model = Net() # On CPU
# model = Net().cuda() # On GPU
And this is the function defined after this class:
def train(epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
# data, target = data.cuda(async=True), target.cuda(async=True) # On GPU
data, target = Variable(data), Variable(target)
optimizer.zero_grad()
output = model(data)
loss = F.binary_cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % 10 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.data[0]))
train vs model.train
The def train(epochs): ... is the method to train the model and is not an attribute of Net class. model is an object of Net class, that inherits from nn.Module.
In PyTorch, all layers inherit from nn.Module and that gives them a lot of common functionality like model.children() or layer.children(), model.apply(), etc.
The model.train() is similarly implemented in nn.Module. It doesn't actually train the model or any layer that inherits from nn.Module but sets it in train mode, so say it's equivalent to doing model.set_train().
There is an equivalent method model.eval() you would call in to set the evaluation mode before testing the model.
Why do you even need it?
There can be some parameters in a layer/model that are suppose to act differently while training and evaluation modes. The obvious example are BatchNorm's γ and β

Why listing model components in pyTorch is not useful?

I am trying to create Feed forward neural networks with N layers
So idea is suppose If I want 2 inputs 3 hidden and 2 outputs than I will just pass [2,3,2] to neural network class and neural network model will get created so if I want [100,1000,1000,2]
where in this case 100 is inputs, two hidden layers contains 1000 neuron each and 2 outputs so I want fully connected neural network where I just wanted to pass list which contains number of neuron in each layer.
So for that I have written following code
class FeedforwardNeuralNetModel(nn.Module):
def __init__(self, layers):
super(FeedforwardNeuralNetModel, self).__init__()
self.fc=[]
self.sigmoid=[]
self.activationValue = []
self.layers = layers
for i in range(len(layers)-1):
self.fc.append(nn.Linear(layers[i],layers[i+1]))
self.sigmoid.append(nn.Sigmoid())
def forward(self, x):
out=x
for i in range(len(self.fc)):
out=self.fc[i](out)
out = self.sigmoid[i](out)
return out
when I tried to use it I found it kind of empty model
model=FeedforwardNeuralNetModel([3,5,10,2])
print(model)
>>FeedforwardNeuralNetModel()
and when I used following code
class FeedforwardNeuralNetModel(nn.Module):
def __init__(self, input_dim, hidden_dim, output_dim):
super(FeedforwardNeuralNetModel, self).__init__()
# Linear function
self.fc1 = nn.Linear(input_dim, hidden_dim)
# Non-linearity
self.tanh = nn.Tanh()
# Linear function (readout)
self.fc2 = nn.Linear(hidden_dim, output_dim)
def forward(self, x):
# Linear function
out = self.fc1(x)
# Non-linearity
out = self.tanh(out)
# Linear function (readout)
out = self.fc2(out)
return out
and when I tried to print this model I found following result
print(model)
>>FeedforwardNeuralNetModel(
(fc1): Linear(in_features=3, out_features=5, bias=True)
(sigmoid): Sigmoid()
(fc2): Linear(in_features=5, out_features=10, bias=True)
)
in my code I am just creating lists that is what difference
I just wanted to understand why in torch listing model components is not useful?
If you do print(FeedForwardNetModel([1,2,3]) it gives the following error
AttributeError: 'FeedforwardNeuralNetModel' object has no attribute '_modules'
which basically means that the object is not able to recognize modules that you have declared.
Why does this happen?
Currently, modules are declared in self.fc which is list and hence torch has no way of knowing if it is a model unless it does a deep search which is bad and inefficient.
How can we let torch know that self.fc is a list of modules?
By using nn.ModuleList (See modified code below). ModuleList and ModuleDict are python list and dictionaries respectively, but they tell torch that the list/dict contains a nn module.
#modified init function
def __init__(self, layers):
super().__init__()
self.fc=nn.ModuleList()
self.sigmoid=[]
self.activationValue = []
self.layers = layers
for i in range(len(layers)-1):
self.fc.append(nn.Linear(layers[i],layers[i+1]))
self.sigmoid.append(nn.Sigmoid())

Find input that maximises output of a neural network using Keras and TensorFlow

I have used Keras and TensorFlow to classify the Fashion MNIST following this tutorial .
It uses the AdamOptimizer to find the value for model parameters that minimize the loss function of the network. The input for the network is a 2-D tensor with shape [28, 28], and output is a 1-D tensor with shape [10] which is the result of a softmax function.
Once the network has been trained, I want to use the optimizer for another task: find an input that maximizes one of the elements of the output tensor. How can this be done? Is it possible to do so using Keras or one have to use a lower level API?
Since the input is not unique for a given output, it would be even better if we could impose some constraints on the values the input can take.
The trained model has the following format
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation=tf.nn.relu),
keras.layers.Dense(10, activation=tf.nn.softmax)
])
I feel you would want to backprop with respect to the input freezing all the weights to your model. What you could do is:
Add a dense layer after the input layer with the same dimensions as input and set it as trainable
Freeze all the other layers of your model. (except the one you added)
As an input, feed an identity matrix and train your model based on whatever output you desire.
This article and this post might be able to help you if you want to backprop based on the input instead. It's a bit like what you are aiming for but you can get the intuition.
It would be very similar to the way that filters of a Convolutional Network is visualized: we would do gradient ascent optimization in input space to maximize the response of a particular filter.
Here is how to do it: after training is finished, first we need to specify the output and define a loss function that we want to maximize:
from keras import backend as K
output_class = 0 # the index of the output class we want to maximize
output = model.layers[-1].output
loss = K.mean(output[:,output_class]) # get the average activation of our desired class over the batch
Next, we need to take the gradient of the loss we have defined above with respect to the input layer:
grads = K.gradients(loss, model.input)[0] # the output of `gradients` is a list, just take the first (and only) element
grads = K.l2_normalize(grads) # normalize the gradients to help having an smooth optimization process
Next, we need to define a backend function that takes the initial input image and gives the values of loss and gradients as outputs, so that we can use it in the next step to implement the optimization process:
func = K.function([model.input], [loss, grads])
Finally, we implement the gradient ascent optimization process:
import numpy as np
input_img = np.random.random((1, 28, 28)) # define an initial random image
lr = 1. # learning rate used for gradient updates
max_iter = 50 # number of gradient updates iterations
for i in range(max_iter):
loss_val, grads_val = func([input_img])
input_img += grads_val * lr # update the image based on gradients
Note that, after this process is finished, to display the image you may need to make sure that all the values in the image are in the range [0, 255] (or [0,1]).
After the hints Saket Kumar Singh gave in his answer, I wrote the following that seems to solve the question.
I create two custom layers. Maybe Keras offers already some classes that are equivalent to them.
The first on is a trainable input:
class MyInputLayer(keras.layers.Layer):
def __init__(self, output_dim, **kwargs):
self.output_dim = output_dim
super(MyInputLayer, self).__init__(**kwargs)
def build(self, input_shape):
self.kernel = self.add_weight(name='kernel',
shape=self.output_dim,
initializer='uniform',
trainable=True)
super(MyInputLayer, self).build(input_shape)
def call(self, x):
return self.kernel
def compute_output_shape(self, input_shape):
return self.output_dim
The second one gets the probability of the label of interest:
class MySelectionLayer(keras.layers.Layer):
def __init__(self, position, **kwargs):
self.position = position
self.output_dim = 1
super(MySelectionLayer, self).__init__(**kwargs)
def build(self, input_shape):
super(MySelectionLayer, self).build(input_shape)
def call(self, x):
mask = np.array([False]*x.shape[-1])
mask[self.position] = True
return tf.boolean_mask(x, mask,axis=1)
def compute_output_shape(self, input_shape):
return self.output_dim
I used them in this way:
# Build the model
layer_flatten = keras.layers.Flatten(input_shape=(28, 28))
layerDense1 = keras.layers.Dense(128, activation=tf.nn.relu)
layerDense2 = keras.layers.Dense(10, activation=tf.nn.softmax)
model = keras.Sequential([
layer_flatten,
layerDense1,
layerDense2
])
# Compile the model
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# Train the model
# ...
# Freeze the model
layerDense1.trainable = False
layerDense2.trainable = False
# Build another model
class_index = 7
layerInput = MyInputLayer((1,784))
layerSelection = MySelectionLayer(class_index)
model_extended = keras.Sequential([
layerInput,
layerDense1,
layerDense2,
layerSelection
])
# Compile it
model_extended.compile(optimizer=tf.train.AdamOptimizer(),
loss='mean_absolute_error')
# Train it
dummyInput = np.ones((1,1))
target = np.ones((1,1))
model_extended.fit(dummyInput, target,epochs=300)
# Retrieve the weights of layerInput
layerInput.get_weights()[0]
Interesting. Maybe a solution would be to feed all your data to the network and for each sample save the output_layer after softmax.
This way, for 3 classes, where you want to find the best input for class 1, you are looking for outputs where the first component is high. For example: [1 0 0]
Indeed the output means the probability, or the confidence of the network, for the sample being one of the classes.
Funny coincident I was just working on the same "problem". I'm interested in the direction of adversarial training etc. What I did was to insert a LocallyConnected2D Layer after the input and then train with data which is all one and has as targets the class of interest.
As model I use
batch_size = 64
num_classes = 10
epochs = 20
input_shape = (28, 28, 1)
inp = tf.keras.layers.Input(shape=input_shape)
conv1 = tf.keras.layers.Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal')(inp)
pool1 = tf.keras.layers.MaxPool2D((2, 2))(conv1)
drop1 = tf.keras.layers.Dropout(0.20)(pool1)
flat = tf.keras.layers.Flatten()(drop1)
fc1 = tf.keras.layers.Dense(128, activation='relu')(flat)
norm1 = tf.keras.layers.BatchNormalization()(fc1)
dropfc1 = tf.keras.layers.Dropout(0.25)(norm1)
out = tf.keras.layers.Dense(num_classes, activation='softmax')(dropfc1)
model = tf.keras.models.Model(inputs = inp , outputs = out)
model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
model.summary()
after training I insert the new layer
def insert_intermediate_layer_in_keras(model,position, before_layer_id):
layers = [l for l in model.layers]
if(before_layer_id==0) :
x = new_layer
else:
x = layers[0].output
for i in range(1, len(layers)):
if i == before_layer_id:
x = new_layer(x)
x = layers[i](x)
else:
x = layers[i](x)
new_model = tf.keras.models.Model(inputs=layers[0].input, outputs=x)
return new_model
def fix_model(model):
for l in model.layers:
l.trainable=False
fix_model(model)
new_layer = tf.keras.layers.LocallyConnected2D(1, kernel_size=(1, 1),
activation='linear',
kernel_initializer='he_normal',
use_bias=False)
new_model = insert_intermediate_layer_in_keras(model,new_layer,1)
new_model.compile(loss=tf.keras.losses.categorical_crossentropy,
optimizer=tf.keras.optimizers.RMSprop(),
metrics=['accuracy'])
and finally rerun training with my fake data.
X_fake = np.ones((60000,28,28,1))
print(Y_test.shape)
y_fake = np.ones((60000))
Y_fake = tf.keras.utils.to_categorical(y_fake, num_classes)
new_model.fit(X_fake, Y_fake, epochs=100)
weights = new_layer.get_weights()[0]
imshow(weights.reshape(28,28))
plt.show()
Results are not yet satisfying but I'm confident of the approach and guess I need to play around with the optimiser.

Building recurrent neural network with feed forward network in pytorch

I was going through this tutorial. I have a question about the following class code:
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(RNN, self).__init__()
self.input_size = input_size
self.hidden_size = hidden_size
self.output_size = output_size
self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
self.i2o = nn.Linear(input_size + hidden_size, output_size)
self.softmax = nn.LogSoftmax()
def forward(self, input, hidden):
combined = torch.cat((input, hidden), 1)
hidden = self.i2h(combined)
output = self.i2o(combined)
output = self.softmax(output)
return output, hidden
def init_hidden(self):
return Variable(torch.zeros(1, self.hidden_size))
This code was taken from Here. There it was mentioned that
Since the state of the network is held in the graph and not in the layers, you can simply create an nn.Linear and reuse it over and over again for the recurrence.
What I don't understand is, how can one just increase input feature size in nn.Linear and say it is a RNN. What am I missing here?
The network is recurrent, because you evaluate multiple timesteps in the example.
The following code is also taken from the pytorch tutorial you linked to.
loss_fn = nn.MSELoss()
batch_size = 10
TIMESTEPS = 5
# Create some fake data
batch = torch.randn(batch_size, 50)
hidden = torch.zeros(batch_size, 20)
target = torch.zeros(batch_size, 10)
loss = 0
for t in range(TIMESTEPS):
# yes! you can reuse the same network several times,
# sum up the losses, and call backward!
hidden, output = rnn(batch, hidden)
loss += loss_fn(output, target)
loss.backward()
So the network itself is not recurrent, but in this loop you use it as a recurrent network by feeding the hidden state of the previous forward step together with your batch-input multiple times.
You could also use it non-recurrent by just backpropagating the loss in every step and ignoring the hidden state.
Since the state of the network is held in the graph and not in the layers, you can simply create an nn.Linear and reuse it over and over again for the recurrence.
This means, that the information to compute the gradient is not held in the model itself, so you can append multiple evaluations of the module to the graph and then backpropagate through the full graph.
This is described in the previous paragraphs of the tutorial.

Categories

Resources