Why is listing model components in PyTorch not useful? - python

I am trying to create feed-forward neural networks with N layers.
The idea is that I pass a list with the number of neurons in each layer to the neural network class and the model gets built from it. For example, [2, 3, 2] means 2 inputs, one hidden layer with 3 neurons, and 2 outputs; [100, 1000, 1000, 2] means 100 inputs, two hidden layers with 1000 neurons each, and 2 outputs. I want a fully connected neural network where I just pass a list containing the number of neurons in each layer.
So for that I have written the following code:
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, layers):
        super(FeedforwardNeuralNetModel, self).__init__()
        self.fc = []
        self.sigmoid = []
        self.activationValue = []
        self.layers = layers
        for i in range(len(layers) - 1):
            self.fc.append(nn.Linear(layers[i], layers[i + 1]))
            self.sigmoid.append(nn.Sigmoid())

    def forward(self, x):
        out = x
        for i in range(len(self.fc)):
            out = self.fc[i](out)
            out = self.sigmoid[i](out)
        return out
When I tried to use it, I found that the model is essentially empty:
model=FeedforwardNeuralNetModel([3,5,10,2])
print(model)
>>FeedforwardNeuralNetModel()
And when I used the following code instead:
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Non-linearity
        self.tanh = nn.Tanh()
        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Linear function
        out = self.fc1(x)
        # Non-linearity
        out = self.tanh(out)
        # Linear function (readout)
        out = self.fc2(out)
        return out
and then printed the model, I got the following result:
print(model)
>>FeedforwardNeuralNetModel(
  (fc1): Linear(in_features=3, out_features=5, bias=True)
  (tanh): Tanh()
  (fc2): Linear(in_features=5, out_features=10, bias=True)
)
In my first version I am just storing the components in plain Python lists; that is the only difference.
I just want to understand why, in torch, listing model components like this is not useful.

If you do print(FeedforwardNeuralNetModel([1,2,3])) it gives the following error
AttributeError: 'FeedforwardNeuralNetModel' object has no attribute '_modules'
which basically means that the object is not able to recognize the modules that you have declared.
Why does this happen?
Currently, the modules are stored in self.fc, which is a plain Python list, so torch has no way of knowing that it holds modules unless it does a deep search, which would be bad and inefficient.
How can we let torch know that self.fc is a list of modules?
By using nn.ModuleList (see the modified code below). ModuleList and ModuleDict behave like a Python list and dictionary respectively, but they tell torch that the list/dict contains nn modules.
# modified __init__ function
def __init__(self, layers):
    super().__init__()
    self.fc = nn.ModuleList()
    self.sigmoid = []
    self.activationValue = []
    self.layers = layers
    for i in range(len(layers) - 1):
        self.fc.append(nn.Linear(layers[i], layers[i + 1]))
        self.sigmoid.append(nn.Sigmoid())
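With self.fc registered as a ModuleList, printing the model now enumerates the layers. A quick check, assuming the rest of the class from the question is unchanged:

model = FeedforwardNeuralNetModel([3, 5, 10, 2])
print(model)
# FeedforwardNeuralNetModel(
#   (fc): ModuleList(
#     (0): Linear(in_features=3, out_features=5, bias=True)
#     (1): Linear(in_features=5, out_features=10, bias=True)
#     (2): Linear(in_features=10, out_features=2, bias=True)
#   )
# )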


Training parts of a neural net with independent optimizers

Consider a neural net that consists of two parts:
from torch import nn
class Model(nn.Module):
def __init__(self, x_dim, out_dim, h_dim=42):
super().__init__()
self.part1 = nn.Sequential(
nn.Linear(x_dim, h_dim),
nn.ReLU(),
...
)
self.part2 = nn.Sequential(
nn.Linear(h_dim, h_dim),
nn.ReLU(),
...
)
def forward(self, x):
x = self.part1(x)
x = self.part2(x)
return x
In principle, would it be possible to train its parameters with multiple optimizers, one for each part, such that
model = Model(...)
opt1 = Adam(model.part1.parameters(), lr=1e-5)
opt2 = Adam(model.part2.parameters(), lr=1e-3)
The rationale is this: assume that model.part1 is a pre-trained network. When we train the composite Model for a downstream task, we would like to adjust the parameters of part1 to a lesser degree than those of part2, because we expect the pre-trained parameters of part1 to already be close to a minimum.
Would such an approach work, or are there other ways of implementing this logic?
What you need is
opt = Adam([
    {'params': model.part1.parameters(), 'lr': 1e-5},
    {'params': model.part2.parameters(), 'lr': 1e-3},
], ...)
where ... are the keyword arguments to default to when a per-group parameter is unspecified.
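For completeness, the two-optimizer approach from the question also works; you just have to zero and step both optimizers. A minimal sketch, assuming loader and loss_fn are placeholders for your data loader and loss function:

opt1 = Adam(model.part1.parameters(), lr=1e-5)
opt2 = Adam(model.part2.parameters(), lr=1e-3)

for x, y in loader:
    opt1.zero_grad()
    opt2.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()  # fills .grad on the parameters of both parts
    opt1.step()      # updates part1 with lr=1e-5
    opt2.step()      # updates part2 with lr=1e-3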

How to use python spektral library DisjointLoader to feed FC network instead of Graph Isomorphism Network?

I want to compare the performance on a classification problem using GIN vs. a fully connected network. I started from the spektral library example TUDataset classification with GIN. I created a custom dataset for my problem, and it is loaded using DisjointLoader from spektral.data.
My supervised learning shows good results on this data using the GIN network. However, to compare these results with a fully connected network, I am having trouble feeding inputs from the dataset into the FC network. The dataset is stored in graph format with a node attributes matrix and an adjacency matrix. There are 18 nodes in the graph and each node has 7 attributes in the attribute matrix.
I have tried feeding the FC network just the node attributes matrix, but I get a shape mismatch error.
Here is the FC network I defined in place of the GIN0 network from the example above:
class FCN0(Model):
    def __init__(self, channels, outputs):
        super().__init__()
        self.dense1 = Dense(channels, activation="relu")
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(channels*3, activation="relu")
        self.dense3 = Dense(outputs, activation="relu")

    def call(self, inputs):
        x, a, i = inputs
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)
The error message is as follows:
File "/home/xx/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [126,1] vs. [7,1]
    [[node gradient_tape/mean_squared_error/BroadcastGradientArgs (defined at dut_gin_vs_circuits.py:147) ]] [Op:__inference_train_step_588]
Function call stack:
train_step
I would highly appreciate it if someone could help me identify what transformations are needed at the input to fit the data into the FC network while loading it with the same dataset loader.
The problem is that your FC network does not have a global pooling layer (also sometimes called "readout"), and so the output of the network will have shape (batch_size * 18, 1) instead of (batch_size, 1) which is the shape of the target.
Essentially, your FC network is suitable for node-level prediction, but not graph-level prediction.
To fix this, you can introduce a global pooling layer as follows:
from spektral.layers import GlobalSumPool

class FCN0(Model):
    def __init__(self, channels, outputs):
        super().__init__()
        self.dense1 = Dense(channels, activation="relu")
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(channels*3, activation="relu")
        self.dense3 = Dense(outputs, activation="relu")
        self.pool = GlobalSumPool()

    def call(self, inputs):
        x, a, i = inputs
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.dense3(x)
        return self.pool([x, i])  # Only pass `i` if in disjoint mode
You can place the pooling layer anywhere in your computational graph; the important thing is that at some point you reduce the node-level representation to a graph-level representation.
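As a quick sanity check of the shapes in disjoint mode, here is a minimal sketch (the node count matches the question; the feature size and batch of two graphs are assumptions for illustration):

import numpy as np
import tensorflow as tf
from spektral.layers import GlobalSumPool

# Two graphs with 18 nodes each and 4 features per node
x = tf.random.normal((36, 4))
i = tf.constant(np.repeat([0, 1], 18))  # graph index of every node

pooled = GlobalSumPool()([x, i])
print(pooled.shape)  # (2, 4): one row per graph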
Cheers

The Number of Classes in Pytorch Pretrained Model

I want to use the pre-trained models in PyTorch to do image classification on my own datasets, but how should I change the number of classes while freezing the parameters of the feature-extraction layers?
These are the models I want to include:
resnet18 = models.resnet18(pretrained=True)
densenet161 = models.densenet161(pretrained=True)
inception_v3 = models.inception_v3(pretrained=True)
shufflenet_v2_x1_0 = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
mnasnet1_0 = models.mnasnet1_0(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
Thanks a lot in advance!
New code I added:
import torch
import torch.nn as nn
from torchvision import models

class MyResModel(torch.nn.Module):
    def __init__(self):
        super(MyResModel, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(256, 3),
        )

    def forward(self, x):
        return self.classifier(x)

resnet18 = models.resnet18(pretrained=True)
resnet18.fc = MyResModel()

for param in resnet18.parameters():
    param.requires_grad_(False)
You have to change the final Linear layer of the respective model.
For example in the case of resnet, when we print the model, we see that the last layer is a fully connected layer as shown below:
(fc): Linear(in_features=512, out_features=1000, bias=True)
Thus, you must reinitialize model.fc to be a Linear layer with 512 input features and num_classes output features:
model.fc = nn.Linear(512, num_classes)
For other models you can check here
To freeze the parameters of the network you have to use the following code:
for name, param in model.named_parameters():
    if 'fc' not in name:
        print(name, param.requires_grad)
        param.requires_grad = False
To validate:
for name, param in model.named_parameters():
    print(name, param.requires_grad)
Note that for this example 'fc' was the name of the classification layer. This is not the case for other models. You have to inspect the model in order to find the name of the classification layer.
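Putting both steps together for resnet18, a minimal sketch (num_classes is an assumed placeholder for the number of classes in your dataset):

import torch.nn as nn
from torchvision import models

num_classes = 3  # assumption: set to the number of classes in your dataset

model = models.resnet18(pretrained=True)

# Freeze the feature-extraction layers first...
for param in model.parameters():
    param.requires_grad = False

# ...then replace the head; the new Linear layer is created with
# requires_grad=True, so only the classifier will be trained
model.fc = nn.Linear(512, num_classes)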

Naming weights of a layer within a custom layer

I have Dense sublayers within a custom layer. I want to be able to name the weights of these sublayers. However, using name="my_dense" on the sublayer initializer doesn't seem to do this; the weights simply get named after the outer custom layer.
To illustrate the problem, suppose I want a custom layer that simply stacks two dense layers. I'll print the names of the weights of this custom layer.
class DoubleDense(keras.layers.Layer):
    def __init__(self, units, **kwargs):
        self.dense1 = keras.layers.Dense(units, name="first_dense")
        self.dense2 = keras.layers.Dense(units, name="second_dense")
        super(DoubleDense, self).__init__(**kwargs)

    def build(self, input_shape):
        self.dense1.build(input_shape)
        self.dense2.build(self.dense1.units)

    def call(self, input):
        hidden = self.dense1(input)
        return self.dense2(hidden)
dd = DoubleDense(3)
# We need to evaluate the layer once to build the weights
trivial_input = tf.ones((1,10))
output = dd(trivial_input)
# Print the names of all variables in the DoubleDense layer
print([weight.name for weight in dd.weights])
The output is this:
['double_dense_1/kernel:0',
'double_dense_1/bias:0',
'double_dense_1/kernel:0',
'double_dense_1/bias:0']
...but I was expecting something more like this:
['double_dense_1/first_dense_1/kernel:0',
'double_dense_1/first_dense_1/bias:0',
'double_dense_1/second_dense_1/kernel:0',
'double_dense_1/second_dense_1/bias:0']
So, Keras has named these weights ambiguously; there is no way to tell whether a weight tensor belongs to dd.dense1 or dd.dense2 by its name alone. I realise I could select the layer first and then the weights (dd.dense1.weights), but I would prefer not to do this in my application.
Is there a way to name the weights of a sublayer of a custom layer?
If you want names for the sublayers, you need to wrap each sublayer's build call in a tf.name_scope.
Below is the modified code, which will give names for each sublayer in the output.
class DoubleDense(keras.layers.Layer):
    def __init__(self, units, **kwargs):
        self.dense1 = keras.layers.Dense(units)
        self.dense2 = keras.layers.Dense(units)
        super(DoubleDense, self).__init__(**kwargs)

    def build(self, input_shape):
        with tf.name_scope("first_dense"):
            self.dense1.build(input_shape)
        with tf.name_scope("second_dense"):
            self.dense2.build(self.dense1.units)

    def call(self, input):
        hidden = self.dense1(input)
        return self.dense2(hidden)
dd = DoubleDense(3)
# We need to evaluate the layer once to build the weights
trivial_input = tf.ones((1,10))
output = dd(trivial_input)
# Print the names of all variables in the DoubleDense layer
print([weight.name for weight in dd.weights])
Output:
['double_dense/first_dense/kernel:0', 'double_dense/first_dense/bias:0', 'double_dense/second_dense/kernel:0', 'double_dense/second_dense/bias:0']
Hope this answers your question, Happy Learning!

Building recurrent neural network with feed forward network in pytorch

I was going through this tutorial. I have a question about the following class code:
import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size))
This code was taken from Here. There it was mentioned that
Since the state of the network is held in the graph and not in the layers, you can simply create an nn.Linear and reuse it over and over again for the recurrence.
What I don't understand is: how can one just increase the input feature size of nn.Linear and call it an RNN? What am I missing here?
The network is recurrent, because you evaluate multiple timesteps in the example.
The following code is also taken from the pytorch tutorial you linked to.
loss_fn = nn.MSELoss()

batch_size = 10
TIMESTEPS = 5

# Create some fake data
batch = torch.randn(batch_size, 50)
hidden = torch.zeros(batch_size, 20)
target = torch.zeros(batch_size, 10)

loss = 0
for t in range(TIMESTEPS):
    # yes! you can reuse the same network several times,
    # sum up the losses, and call backward!
    hidden, output = rnn(batch, hidden)
    loss += loss_fn(output, target)
loss.backward()
So the network itself is not recurrent, but in this loop you use it as a recurrent network by feeding in the hidden state of the previous forward step together with your batch input, multiple times.
You could also use it non-recurrently by just backpropagating the loss in every step and ignoring the hidden state.
Since the state of the network is held in the graph and not in the layers, you can simply create an nn.Linear and reuse it over and over again for the recurrence.
This means that the information needed to compute the gradient is not held in the model itself, so you can append multiple evaluations of the module to the graph and then backpropagate through the full graph.
This is described in the previous paragraphs of the tutorial.
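To make the unrolling concrete with the RNN class from the question, here is a minimal sketch (the sizes and the random sequence are assumptions for illustration):

rnn = RNN(input_size=10, hidden_size=20, output_size=5)
hidden = rnn.init_hidden()

sequence = torch.randn(6, 1, 10)  # 6 timesteps, batch of 1, 10 features
for t in range(sequence.size(0)):
    # the same two Linear layers are applied at every step;
    # only the autograd graph grows with each iteration
    output, hidden = rnn(sequence[t], hidden)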
