Softmax vs ReLU in output layer - python

I have a dataset with 2 classes.
By simply using SGDClassifier from sklearn, I get a perfect confusion matrix on the test data:
array([[1081,    0],
       [   0,  982]])
I want to reproduce this result in PyTorch, but set up as a two-class problem rather than a dedicated binary classifier.
The code looks something like this:
class TestClassificator(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.input_layer = nn.Linear(12, 128)
        self.hidden_layer = nn.Linear(128, 128)
        self.output_layer = nn.Linear(128, 2)
        self.hidden_activation = nn.ReLU()
        self.output_activation = nn.Softmax(dim=1)

    def forward(self, inputs):
        #
        x = self.input_layer(inputs)
        x = self.hidden_activation(x)
        #
        x = self.hidden_layer(x)
        x = self.output_activation(x)
        #
        x = self.output_layer(x)
        return x
I want to use softmax so that the sum of all outputs is 1. In this case, the confusion matrix is:
confusion_matrix(DataFrame(labels).values.argmax(axis=1), DataFrame(y_pred_list).values.argmax(axis=1))
array([[1072,    9],
       [   0,  982]])
If I use ReLU at the output instead, I get a perfect result again.
What's my mistake? Why does softmax work worse in this case?
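For reference, a minimal sketch of the conventional wiring, assuming the layer names from the class above: the softmax is applied to the output of the final linear layer, so the rows of the returned tensor sum to 1 (note that the posted forward() applies output_activation before output_layer).

    def forward(self, inputs):
        x = self.hidden_activation(self.input_layer(inputs))
        x = self.hidden_activation(self.hidden_layer(x))
        logits = self.output_layer(x)
        return self.output_activation(logits)  # softmax over dim=1, rows sum to 1

A common alternative during training is to return the raw logits and use nn.CrossEntropyLoss, which applies log-softmax internally.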

Related

How to initialise (and check sanity) weights efficiently of layers within complex (nested) modules in PyTorch?

Looking for an efficient way to access nested Modules and Layers to set their weights
I am replicating the DCGAN paper and my code works as expected. I found out that in the paper, the authors say:
All weights were initialized from a zero-centered Normal distribution
with standard deviation 0.02
This awesome answer explains that it can be done using torch.nn.init.normal_(nn.Conv2d(1, 1, 1, 1, 1).weight.data, 0.0, 0.02), but I have a more complex structure that uses ModuleList and other containers. What is the most efficient way of doing this?
By "complex", I mean the implementation below:
'''
Implement the Deep Convolution GAN, AKA DCGAN, in PyTorch: paper at https://arxiv.org/pdf/1511.06434v2.pdf
'''
import torch
import torch.nn as nn


class GeneratorBlock(nn.Module):
    '''
    Generator Block uses TransposedConv2D -> Batch Norm (except LAST block) -> ReLU
    Note: kernel_size = 4, stride = 2, padding = 1 is used in the paper. When BatchNorm is used, bias is not used for Conv2D
    '''
    def __init__(self, in_channels, out_channels, kernel_size=4, stride=2, padding=1, use_batchnorm: bool = True):
        super().__init__()
        self.use_batchnorm = use_batchnorm
        self.transpose_conv = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=not self.use_batchnorm)
        self.batch_norm = nn.BatchNorm2d(out_channels) if self.use_batchnorm else None
        self.activation = nn.ReLU()  # Paper uses ReLU in the Generator network

    def forward(self, x):
        x = self.transpose_conv(x)
        return self.activation(self.batch_norm(x)) if self.use_batchnorm else self.activation(x)


class Generator(nn.Module):
    '''
    Generate images using transposed convolutions. Input is random noise of dimension [Batch, 100, 1, 1], which is then upsampled
    '''
    def __init__(self, input_features=100, base_feature=128, final_channels: int = 1):
        '''
        We use nn.Sequential here just to show the workings. If you want to build the layers dynamically using a loop, see nn.ModuleList() in the Discriminator block. Both work the same.
        So we'll use 'base_feature = 64' as a base for input and output channels
        args:
            input_features: The shape of the random noise from which an image will be generated
            base_feature: The number of channels which will act as our base. Other inputs and outputs will be calculated based on this
            final_channels: The channels / features which will be sent to the Discriminator as input
        '''
        super(Generator, self).__init__()
        # in the Discriminator, we do the same work using ModuleList(). Uses 4 blocks
        self.blocks = nn.Sequential(
            GeneratorBlock(in_channels=input_features, out_channels=base_feature * 8, stride=1, padding=0),  # from random noise, generate 1024 features
            GeneratorBlock(in_channels=base_feature * 8, out_channels=base_feature * 4),  # 1024 -> 512 features
            GeneratorBlock(in_channels=base_feature * 4, out_channels=base_feature * 2),  # 512 -> 256 features
            GeneratorBlock(in_channels=base_feature * 2, out_channels=base_feature),      # 256 -> 128 features
            nn.ConvTranspose2d(base_feature, final_channels, kernel_size=4, stride=2, padding=1)  # 128 -> final features. It is just GeneratorBlock without ReLU and BatchNorm ;)
        )
        self.activation = nn.Tanh()  # To keep the outputs between [-1, 1]

    def forward(self, x):
        '''
        Takes random noise as input and generates features from it
        '''
        return self.activation(self.blocks(x))


class DiscriminatorBlock(nn.Module):
    '''
    Discriminator Block uses Conv2D -> Batch Norm (except FIRST block) -> LeakyReLU
    Note: kernel_size = 4, stride = 2, padding = 1 is used in the paper. When BatchNorm is used, bias is not used for Conv2D
    '''
    def __init__(self, in_channels, out_channels, kernel_size=4, stride=2, padding=1, use_batchnorm: bool = True):
        super().__init__()
        self.use_batchnorm = use_batchnorm
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding, bias=not self.use_batchnorm)
        self.batch_norm = nn.BatchNorm2d(out_channels) if self.use_batchnorm else None
        self.activation = nn.LeakyReLU(0.2)

    def forward(self, x):
        x = self.conv(x)
        return self.activation(self.batch_norm(x)) if self.use_batchnorm else self.activation(x)


class Discriminator(nn.Module):
    '''
    CNN to classify whether the images generated by the Generator are as good as the real ones
    Features change as :: 1 -> 64 -> 128 -> 256 -> 512 -> 1
    '''
    def __init__(self, input_features=1, output_features=1, middle_features=[64, 128, 256]):
        '''
        In the paper, they take in a feature map of [Batch, 1, 64, 64] from the Generator and then output a single number per sample in the batch
        '''
        super().__init__()
        self.layers = nn.ModuleList()  # Just a fancy method of stacking layers using a loop
        # in the paper, the first layer does not use BatchNorm
        self.layers.append(DiscriminatorBlock(input_features, middle_features[0], use_batchnorm=False))  # 1 -> 64, because the input has 1 channel
        for i, channel in enumerate(middle_features):  # 4 blocks in total are used in the paper; 1 has already been added above, these are the other 3
            self.layers.append(DiscriminatorBlock(channel, channel * 2))  # 64 -> 128 --- 128 -> 256 --- 256 -> 512
        self.final_conv = nn.Conv2d(in_channels=middle_features[-1] * 2, out_channels=output_features, kernel_size=4, stride=2, padding=0)  # input from previous layer: 512 -> 1
        self.sigmoid_layer = nn.Sigmoid()  # gives whether an image is real or fake, or more precisely, how CLOSE it is to a real image

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.sigmoid_layer(self.final_conv(x))


def test_DCGAN_code():
    noise = torch.rand(10, 100, 1, 1)
    image = Generator()(noise)
    result = Discriminator()(image)
    print('Model Built Successfully!!! Generating 10 random samples and their end results')
    print(f"'Z' random noise shape: {noise.shape} || Generator output shape: {image.shape} || Discriminator output shape: {result.shape}")
You can simply iterate over all submodules, at the end of your __init__ method:
class Generator(nn.Module):
    def __init__(self, ....):
        # all code here
        # ...

        # init weights, at the very bottom of __init__
        for sm in self.modules():
            if isinstance(sm, nn.Conv2d):
                # only Conv2d layers will be initialized in this way
                torch.nn.init.normal_(sm.weight.data, 0.0, 0.02)
done.
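A small check of this against the classes in the question (my own sketch, not part of the answer): self.modules() recurses into nn.Sequential and nn.ModuleList children, so the nested blocks are reached. Note that the Generator above contains nn.ConvTranspose2d layers rather than nn.Conv2d, so the isinstance check has to include that type for its weights to be touched as well.

    import torch.nn as nn

    gen = Generator()
    targets = [m for m in gen.modules()
               if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d))]
    for m in targets:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    print(len(targets))  # every conv / transposed-conv layer inside the nested blocks is listed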
Found an answer to this. Just want to know if this is the right approach:
def initialise_weights(m):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
        nn.init.normal_(m.weight.data, 0.0, 0.02)

def check_sanity(m):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
        print(m.weight.data.mean(), m.weight.data.std())

gen = Generator()
gen = gen.apply(initialise_weights)
gen = gen.apply(check_sanity)
The accepted answer is the best answer (an alternative would be going into the _ConvNd class and modifying the source, in other words replacing init.kaiming_uniform_(self.weight, a=math.sqrt(5))). All said and done, though, the best practice is to define another method called reset_parameters(), call it at the end of your __init__(self, *args), and change the parameters there:
class Generator(nn.Module):
    def __init__(self, *args) -> None:
        ...
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # comments
        for sm in self.modules():
            if isinstance(sm, nn.Conv2d):
                torch.nn.init.normal_(
                    sm.weight.data,
                    mean=0.0,
                    std=0.02
                )

How to allow complex inputs, and complex weights to a Pytorch model?

Assume even the simplest model (taken from here)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
When feeding complex data to the model,
output = model(data.complex())
it gives
ret = torch.addmm(bias, input, weight.t())
RuntimeError: expected scalar type Float but found ComplexDouble
(I didn't copy the entire stack trace, nor the entire training code, for question simplicity)
Doing self.complex() after the model's __init__, as I normally would do self.double(), doesn't work; it fails with
torch.nn.modules.module.ModuleAttributeError: 'Net' object has no attribute 'complex'
How to allow model's weights to be complex?
How to allow complex input to a model?
Which built-in activation functions support this?
Is anything also supported for 1d operations?
EDIT:
In the meantime, I found this paper. Still reading it.
Just as you would normally call self.double(), there is self.type(dst_type); see https://pytorch.org/docs/stable/generated/torch.nn.Module.html
In my case, self.type(torch.complex64) works for me.
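A minimal hedged sketch of that answer, using linear layers only (whether built-in operations such as relu, max_pool2d or log_softmax accept complex tensors depends on your PyTorch version, so check those separately):

    import torch
    import torch.nn as nn

    # cast the module's parameters to complex64 and feed complex input
    net = nn.Sequential(nn.Linear(9216, 128), nn.Linear(128, 10)).type(torch.complex64)

    data = torch.randn(4, 9216, dtype=torch.complex64)   # complex input
    out = net(data)
    print(out.dtype, out.shape)                          # torch.complex64 torch.Size([4, 10])

    # One common workaround for unsupported activations is to apply them to the real and
    # imaginary parts separately, e.g. torch.complex(torch.relu(z.real), torch.relu(z.imag)).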

How to restructure the output tensor of a cnn layer for use by a linear layer in a simple pytorch model

Given a pytorch input dataset with dimensions:
dat.shape = torch.Size([128, 3, 64, 64])
This is a supervised learning problem: we have a separate labels.txt file containing one of C classes for each input observation. The value of C is calculated from the number of distinct values in the labels file and is presently in the single digits.
I could use assistance on how to mesh the layers of a simple network that mixes convolutional and linear layers to perform multiclass classification. The intent is to pass through:
two cnn layers with maxpooling after each
a linear "readout" layer
softmax activation before the output/labels
Here is the core of my (faulty/broken) network. I am unable to determine the proper size/shape required of:
Output of Convolutional layer -> Input of Linear [Readout] layer
class CNNClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.linear1 = nn.Linear(32*16*16, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(x)  # Size mismatch error HERE
        x = self.softmax1(x)
        return x
Training of the model is started by:
Xout = model(dat)
This results in :
RuntimeError: size mismatch, m1: [128 x 1568], m2: [8192 x 6]
at the linear1 input. What is needed here? Note I have seen uses of wildcard input sizes, e.g. via a view:
..
x = x.view(x.size(0), -1)
x = self.linear1(x) # Size mismatch error HERE
If that is included then the error changes to
RuntimeError: size mismatch, m1: [28672 x 7], m2: [8192 x 6]
Some pointers on how to think about and calculate the cnn layer / linear layer input/output sizes would be much appreciated.
The error
You have miscalculated the output size of the convolutional stack. It is actually [batch, 32, 7, 7] instead of [batch, 32, 16, 16].
You have to use reshape (or view), as the output from Conv2d has 4 dimensions ([batch, channels, height, width]), while the input to nn.Linear is required to have 2 dimensions ([batch, features]).
Use this for nn.Linear:
self.linear1 = nn.Linear(32 * 7 * 7, C)
And this in forward:
x = self.linear1(x.view(x.shape[0], -1))
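A related trick, not from this answer: if you would rather not hand-compute the 7 * 7, you can run a dummy tensor through the convolutional stack once inside __init__ (after defining conv1, conv2 and maxpool) and read the flattened size off the result:

    with torch.no_grad():
        dummy = torch.zeros(1, 3, 64, 64)                      # one fake input image
        dummy = self.maxpool(F.leaky_relu(self.conv1(dummy)))
        dummy = self.maxpool(F.leaky_relu(self.conv2(dummy)))
        n_features = dummy.flatten(1).shape[1]                 # 32 * 7 * 7 = 1568 here
    self.linear1 = nn.Linear(n_features, C)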
Other possibilities
Newer architectures use pooling over the spatial dimensions (usually called global pooling). In PyTorch this is torch.nn.AdaptiveAvgPool2d (or the Max variant). This approach lets the height and width of your input image vary, as only one value per channel is used as input to nn.Linear. This is how it looks:
class CNNClassifier(torch.nn.Module):
    def __init__(self, C=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.pooling = torch.nn.AdaptiveAvgPool2d(output_size=1)
        self.linear1 = nn.Linear(32, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(self.pooling(x).view(x.shape[0], -1))
        x = self.softmax1(x)
        return x
So now images of torch.Size([128, 3, 64, 64]) and torch.Size([128, 3, 128, 128]) can be passed to the network.
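A quick usage check of that claim (a sketch; a small batch is used just to keep it cheap):

    model = CNNClassifier(C=6)
    for size in (64, 128):
        x = torch.rand(8, 3, size, size)
        print(model(x).shape)   # torch.Size([8, 6]) for both input sizes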
So the issue is with the way you defined the nn.Linear. You set the input size to 32*16*16, which is not the shape of the output feature map; the numbers 32 and 16 are the numbers of channels that the Conv2d layers expect as input and produce as output.
If you add print(x.shape) right before the fully connected layer, you will get:
torch.Size([Batch, 32, 7, 7])
So your calculation should have been 7*7*32:
self.linear1 = nn.Linear(32*7*7, C)
And then using:
x = x.view(x.size(0), -1)
x = self.linear1(x)
will work perfectly fine. You can read about what view does in: How does the "view" method work in PyTorch?
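As a side note (my own sketch, not from the answer): torch.nn.Flatten does the same thing as the view call, keeping the batch dimension and flattening the rest, so it can also be used as a layer inside the model:

    import torch
    import torch.nn as nn

    flatten = nn.Flatten()            # defaults to start_dim=1, i.e. x.view(x.size(0), -1)
    x = torch.rand(128, 32, 7, 7)     # the shape coming out of the conv stack
    print(flatten(x).shape)           # torch.Size([128, 1568])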

Classifier Loss function dimension out of range error

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, stride=2)
        self.fc = nn.Linear(884736, 1000)
        self.fc1 = nn.Linear(1000, 600)
        self.fc2 = nn.Linear(600, 200)
        self.fc3 = nn.Linear(200, 6)
        self.pooling = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = nn.functional.relu(x)
        x = self.pooling(x)
        x = self.conv2(x)
        x = torch.flatten(nn.functional.relu(x))
        x = self.fc(x)
        x = nn.functional.relu(x)
        # import pdb; pdb.set_trace()
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        # x = torch.softmax(x)
        return x

# model = torch.nn.Sequential(
# )
model = MyModel()

# Training
dataiter = iter(trainloader)
total_epochs = 5
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

for epoch in tqdm(range(total_epochs)):
    # initialize batch
    gc.collect()
    input_, label_ = dataiter.next()
    # forward
    out = model.forward(input_)
    # backward
    print(out, out.shape)
    print(label_, label_.shape)
    # out = out.unsqueeze(dim=0)
    # label_ = label_.type_as(out)
    loss = criterion(out, label_)
    loss.backward()
    optimizer.zero_grad()
    optimizer.step()
    print('batch_loss:', str(loss.item()))
    print('Epochs completed:', epoch+1, '\n')
    print('epoch_loss:' + loop_loss/float(batch_size))
I have a dataset of different breeds of dogs (120 classes)
http://vision.stanford.edu/aditya86/ImageNetDogs/images.tar
The labels are int values ranging from 1 to 120
I need to make a classifier
Getting an error at loss computation
Dimension out of range (expected to be in range of [-1, 0], but got 1)
What could be wrong?
The output of the model has only a single dimension; it has the size [6], but nn.CrossEntropyLoss expects a size of [batch_size, num_classes].
In the model you flatten the output of the convolutions. You have to preserve the batch dimension, as the samples are independent of each other and flattening completely would combine them into a single one. torch.flatten accepts a start_dim argument (second argument), which decides from which dimension it starts to flatten. By setting it to 1, it will start with the second dimension, leaving the first dimension (the batch dimension) unchanged.
# Flatten everything but the first dimension
# From: [batch_size, channels, height, width] (4D)
# To: [batch_size, channels * height * width] (2D)
x = torch.flatten(nn.functional.relu(x), 1)
The output of the model must also have the same number of classes as your dataset. Since you have 120 classes, the output of the last linear layer must be 120.
self.fc3 = nn.Linear(200, 120)
Also, the labels need to be in range [0, 119], because they are the indices of the classes and like every indexing in Python, it is zero-based. If your labels are in range [1, 120], you can simply subtract one from them.
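A minimal sketch of those two points together (the shapes here are made up to match the answer, not taken from your dataset):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    logits = torch.randn(4, 120)              # [batch_size, num_classes]
    labels = torch.tensor([1, 57, 120, 33])   # labels as stored, in range [1, 120]
    loss = criterion(logits, labels - 1)      # shift to zero-based class indices [0, 119]
    print(loss.item())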

RuntimeError: expected stride to be a single integer value

I am new to PyTorch, sorry for the basic question. The model gives me a dimension mismatch error; how do I solve this?
There may be more than one problem in it.
Any help would be appreciated.
Thanks
class PR(nn.Module):
    def __init__(self):
        super(PR, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 1, kernel_size=2)
        self.dens1 = nn.Linear(300, 256)
        self.dens2 = nn.Linear(256, 256)
        self.dens3 = nn.Linear(512, 24)
        self.drop = nn.Dropout()

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(x)
        out = self.dens1(x)
        out = self.dens2(x)
        out = self.dens3(x)
        return out

model = PR()
input = torch.rand(28, 28, 3)
output = model(input)
Please have a look at the corrected code. I numbered the lines where I did corrections and described them below.
class PR(torch.nn.Module):
    def __init__(self):
        super(PR, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 6, kernel_size=5)   # (2a) in 3x28x28, out 6x24x24
        self.conv2 = torch.nn.Conv2d(6, 1, kernel_size=2)   # (2b) in 6x24x24, out 1x23x23
        self.dens1 = torch.nn.Linear(529, 256)              # (3a)
        self.dens2 = torch.nn.Linear(256, 256)
        self.dens3 = torch.nn.Linear(256, 24)               # (4)
        self.drop = torch.nn.Dropout()

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)        # (5)
        out = out.view(-1, 529)      # (3b)
        out = self.dens1(out)
        out = self.dens2(out)
        out = self.dens3(out)
        return out

model = PR()
ins = torch.rand(1, 3, 28, 28)       # (1)
output = model(ins)
(1) First of all, PyTorch handles image tensors (you perform a 2d convolution, therefore I assume this is an image input) as follows: [batch_size x image_depth x height x width].
(2) It is important to understand how convolution with a given kernel, padding and stride works. In your case kernel_size is 5 and you have no padding (and stride 1). This means that the dimensions of the feature map get reduced (as depicted). In your case the first conv layer takes a 3x28x28 tensor and produces a 6x24x24 tensor; the second one takes 6x24x24 and outputs 1x23x23. I find it very useful to have comments with the in and out tensor dimensions next to the definitions of the conv layers (see the code above).
(3) Here you need to "flatten" the [batch_size x depth x height x width] tensor to [batch_size x fully connected input]. This can be done via tensor.view().
(4) There was a wrong input size for the linear layer.
(5) Each operation in the forward pass took the original input value x; instead, you want to pass the result of each layer to the next one.
Although this code is now runnable, that does not mean it makes perfect sense. The most important thing (for neural networks in general, I would say) is activation functions. These are missing completely.
For getting started with neural networks in pytorch I can highly recommend the great pytorch tutorials: https://pytorch.org/tutorials/ (I would start with the 60min blitz tutorial)
Hope this helps!
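To make that last point concrete, here is a hedged sketch of the corrected forward() with ReLU activations added after the conv and first two dense layers (ReLU is just one reasonable choice, not something prescribed by the question):

    import torch.nn.functional as F

    def forward(self, x):
        out = F.relu(self.conv1(x))      # 3x28x28 -> 6x24x24
        out = F.relu(self.conv2(out))    # 6x24x24 -> 1x23x23
        out = out.view(-1, 529)          # flatten for the dense layers
        out = F.relu(self.dens1(out))
        out = F.relu(self.dens2(out))
        out = self.dens3(out)            # raw 24-dimensional output
        return out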
There are a few problems with your code. I've reviewed and corrected it below:
class PR(nn.Module):
    def __init__(self):
        super(PR, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 1, kernel_size=2)
        # 300 does not match the shape of the previous layer's output:
        # for the specified input, the output of conv2 is [1, 1, 23, 23].
        # This output should be flattened before feeding it to the dense layers;
        # the shape then becomes [1, 529], which should match the input shape of dens1.
        # self.dens1 = nn.Linear(300, 256)
        self.dens1 = nn.Linear(529, 256)
        self.dens2 = nn.Linear(256, 256)
        # The input should match the output of the previous layer, which is 256
        # self.dens3 = nn.Linear(512, 24)
        self.dens3 = nn.Linear(256, 24)
        self.drop = nn.Dropout()

    def forward(self, x):
        # The output of each layer should be fed to the next layer
        x = self.conv1(x)
        x = self.conv2(x)
        # The output should be flattened before feeding it to the dense layers
        x = x.view(x.size(0), -1)
        x = self.dens1(x)
        x = self.dens2(x)
        x = self.dens3(x)
        return x

model = PR()
# The input shape should be (N, Cin, H, W)
# where N is the batch size, Cin is input channels, H and W are height and width respectively,
# so the input should be torch.rand(1, 3, 28, 28)
# input = torch.rand(28,28,3)
input = torch.rand(1, 3, 28, 28)
output = model(input)
Let me know if you have any follow-up questions.
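A quick shape check of the corrected model (sketch):

    model = PR()
    x = torch.rand(1, 3, 28, 28)
    print(model(x).shape)   # torch.Size([1, 24]): one sample, 24 outputs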
