Input dimension of Pytorch CNN model

I have input data for my 2D CNN model, say X_train with shape torch.Size([716, 50, 50]).
My model is:
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=4, stride=1, padding=1)
        self.mp1 = nn.MaxPool2d(kernel_size=4, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=1)
        self.mp2 = nn.MaxPool2d(kernel_size=4, stride=2)
        self.fc1 = nn.Linear(2304, 256)
        self.dp1 = nn.Dropout(p=0.2)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp1(self.conv1(x)))
        x = F.relu(self.mp2(self.conv2(x)))
        x = x.view(in_size, -1)
        x = F.relu(self.fc1(x))
        x = self.dp1(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
but when I run the model, I always get this error:
---> x = F.relu(self.mp1(self.conv1(x)))
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 1, 4, 4], but got 3-dimensional input of size [64, 50, 50] instead
I understand that my input to the model has size 64 (the batch size) by 50*50 (the size of each input, in this case a signal picture).
But I don't understand why it still requires a 4-dimensional input when I have set in_channels for nn.Conv2d to 1.
How can I solve this input dimension problem, or change the dimension requirement of the model input?

Whether in_channels is 1 or 42 does not matter: the channel is still a separate dimension of the input. It is useful to read the nn.Conv2d documentation in this respect.
Input and output are of the form (N, C, H, W):
N: batch size
C: channels
H: height in pixels
W: width in pixels
So you need to add the dimension in your case:
# Add a dimension at index 1
x = x.unsqueeze(1)
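As a quick check of the suggestion above, here is a minimal sketch that pushes a random batch through the question's first convolution. The random tensor is only a stand-in for a real batch of X_train, and the variable names are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical batch shaped like the one in the error message: 64 samples of 50x50.
x = torch.rand(64, 50, 50)

# Same first layer as in the question.
conv1 = nn.Conv2d(1, 32, kernel_size=4, stride=1, padding=1)

# Insert the channel dimension at index 1: (N, H, W) -> (N, C=1, H, W).
x4d = x.unsqueeze(1)
out = conv1(x4d)

print(x4d.shape)  # torch.Size([64, 1, 50, 50])
print(out.shape)  # torch.Size([64, 32, 49, 49])
```

The same unsqueeze can be applied once to the whole dataset before building the DataLoader, or inside forward; either way the convolution sees a 4-dimensional tensor.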

That's the problem.
Setting in_channels=1 does not mean the channel dimension can be left out; it still has to exist in the input.
Expanding your data to shape [64, 1, 50, 50] should solve your problem.
Use .view() on the input tensor, for example x.view(-1, 1, 50, 50).

Related

How to feed an image sequence into a CNN while keeping input images independent?

I'm using a convolutional neural network (CNN) to preprocess my input for a Long Short-Term Memory (LSTM). I have the following input dimensions: 128 x 10 x 3 x 32 x 32 (batch size, sequence length, color channel, height, width) and would like to obtain the following output dimensions: 128 x 10 x 480 (batch size, sequence length, output CNN / input LSTM size). Is the code below maintaining the sequence dimension correctly? Am I processing multiple images independently here, or are the dimensions getting mixed up? Inputs and outputs are shaped as they should be, but I'm uncertain about the intermediate steps.
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.conv3 = nn.Conv2d(20, 30, 5)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, i):
        # merge batch and sequence dimensions: (B, T, C, H, W) -> (B*T, C, H, W)
        x = i.reshape(-1, i.shape[2], i.shape[3], i.shape[4])
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # split them again: (B*T, 30, 4, 4) -> (B, T, 480)
        x = x.view(i.shape[0], i.shape[1], -1)
        return x
The goal is to leave the time dimension for the LSTM.
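One way to check the independence question empirically is to compare the batched output for one (sequence, time step) slot with the result of running that single frame through the same layers on its own. The sketch below copies the layers from the question; the class name FrameCNN, the features helper, and the small batch size of 4 are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameCNN(nn.Module):  # hypothetical name; layers copied from the question
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 10, 5)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.conv3 = nn.Conv2d(20, 30, 5)
        self.pool = nn.MaxPool2d(2, 2)

    def features(self, x):
        # x: (N, 3, 32, 32) -> (N, 30, 4, 4)
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        return self.pool(F.relu(self.conv3(x)))

    def forward(self, i):
        b, t = i.shape[0], i.shape[1]
        # Merge batch and time so every frame is treated as a separate sample.
        x = i.reshape(b * t, i.shape[2], i.shape[3], i.shape[4])
        return self.features(x).reshape(b, t, -1)

torch.manual_seed(0)
model = FrameCNN().eval()
seq = torch.rand(4, 10, 3, 32, 32)  # small batch instead of 128 to keep it quick

with torch.no_grad():
    out = model(seq)
    # Run one frame on its own and compare with its slot in the batched output.
    single = model.features(seq[2, 7].unsqueeze(0)).reshape(-1)

print(out.shape)  # torch.Size([4, 10, 480])
same = torch.allclose(out[2, 7], single, atol=1e-5)
print(same)
```

If the comparison holds for the slots you try, the reshape is keeping frames separate: convolution and pooling layers never mix information across the first dimension, and the row-major reshape puts frame t of sequence b back at position [b, t].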

Pytorch identifying batch size as number of channels in Conv2d layer

I am a total newbie to neural networks using Pytorch to create a VAE model. I've used a bit of tensorflow before, but I have no idea what "in_channels" and "out_channels" are, as arguments to nn.Conv2d/nn.Conv1d.
Disclaimers aside, currently, my model takes in a dataloader with batch size 128 and where each input is a 248 by 46 tensor (so, a 128 x 248 x 46 tensor).
My encoder looks like this right now -- I chopped it down so I could focus on where the error was coming from.
class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        self.conv1 = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print(x.size())
        x = F.relu(self.conv1(x))
        return x
The Conv2d layer was meant to reduce the 248 by 46 input into a 50 by 46 tensor. However, I get this error:
RuntimeError: Given groups=1, weight of size [46, 248, 9, 9], expected input[1, 128, 248, 46] to have 248 channels, but got 128 channels instead
...even though I print x.size() and it displays as torch.Size([128, 248, 46]).
I am unsure a) why the error shows that the layer is adding on an extra dimension to x, and b) whether I am even understanding channels correctly. Should 46 be the real number of channels? Why doesn't Pytorch simply request my input size as a tuple or something, like in=(248, 46)?
Or c) if this is an issue with the way I loaded in my data to the model. I have a numpy array data of shape (-1, 248, 46) and then started training my model as follows.
tensor_data = torch.from_numpy(data)
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)
...
for epoch in range(20):
    for x_train, y_train in train_dl:
        x_train = x_train.to(device).float()
        optimizer.zero_grad()
        x_pred, mu, log_var = vae(x_train)
        bce_loss = train.BCE(y_train, x_pred)
        kl_loss = train.KL(mu, log_var)
        loss = bce_loss + kl_loss
        loss.backward()
        optimizer.step()
Any thoughts appreciated!

In pytorch, nn.Conv2d assumes the input (mostly image data) is shaped like: [B, C_in, H, W], where B is the batch size, C_in is the number of channels, H and W are the height and width of the image. The output has a similar shape [B, C_out, H_out, W_out]. Here, C_in and C_out are in_channels and out_channels, respectively. (H_out, W_out) is the output image size, which may or may not equal (H, W), depending on the kernel size, the stride and the padding.
However, it is not clear why you would apply Conv2d to reduce [128, 248, 46] inputs to [128, 50, 46]. Is this image data with height 248 and width 46? If so, you can reshape the inputs to [128, 1, 248, 46] and use in_channels = 1 and out_channels = 1 in Conv2d.
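A minimal sketch of that reshape, assuming the data really is single-channel with height 248 and width 46. The kernel size, stride and padding are taken from the question; the random tensor stands in for one batch from the DataLoader.

```python
import torch
import torch.nn as nn

x = torch.rand(128, 248, 46)   # stand-in for one batch from the DataLoader
x = x.unsqueeze(1)             # (128, 248, 46) -> (128, 1, 248, 46)

conv = nn.Conv2d(in_channels=1, out_channels=1,
                 kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))
y = conv(x)

# H_out = floor((248 + 2*5 - 9) / 5) + 1 = 50
# W_out = floor((46 + 2*4 - 9) / 1) + 1 = 46
print(y.shape)  # torch.Size([128, 1, 50, 46])
```

Calling y.squeeze(1) afterwards would give the [128, 50, 46] shape the question was aiming for.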

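Let's say your model takes a single-channel 28*28 image. Flattened, this becomes 784 values, which would be the in_features of an nn.Linear layer, and the out_features of the final layer is the number of classes your model wants to predict. For nn.Conv2d the arguments mean something different: in_channels for such an image is 1, and out_channels is the number of filters the layer learns.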

You need to add an extra dimension for the number of channels (1) with the view function. The code below should work:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print("encoder input size: " + str(x.shape))
        # x.shape[0] is the number of samples in the batch
        # target shape: (samples in the batch, channels, height, width)
        x = x.view(x.shape[0], 1, 248, 46)
        print("encoder input size after adding 1 channel to shape: " + str(x.shape))
        x = F.relu(self.conv1(x))
        return x

# a test dataset with 128 samples, height 248 and width 46
test_dataset = torch.rand(128, 248, 46)
# prints shape of dataset
print(test_dataset.shape)
model = Encoder()
model(test_dataset)
# a single sample (e.g. for plotting) still needs the batch and channel dimensions
test_dataset2 = torch.rand(1, 248, 46)
model(test_dataset2.view(test_dataset2.shape[0], 1, 248, 46))

Error in transformation of EMNIST data through Pytorch

I was trying to train my model for prediction of EMNIST by using Pytorch.
Edit:- Here's the link of colab notebook for the problem.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(28, 64, (5, 5), padding=2)
        self.conv1_bn = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 128, 2, padding=2)
        self.fc1 = nn.Linear(2048, 1024)
        self.dropout = nn.Dropout(0.3)
        self.fc2 = nn.Linear(1024, 512)
        self.bn = nn.BatchNorm1d(1)
        self.fc3 = nn.Linear(512, 128)
        self.fc4 = nn.Linear(128, 47)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = self.conv1_bn(x)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 2048)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        x = x.view(-1, 1, 512)
        x = self.bn(x)
        x = x.view(-1, 512)
        x = self.fc3(x)
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)
I am getting this type of error as shown below, whenever I am training my model.
<ipython-input-11-07c68cf1cac2> in forward(self, x)
24 def forward(self, x):
25 x = F.relu(self.conv1(x))
---> 26 x = F.max_pool2d(x, 2, 2)
27 x = self.conv1_bn(x)
RuntimeError: Given input size: (64x28x1). Calculated output size: (64x14x0). Output size is too small
I searched for solutions and found that I should transform the data first. So I tried the most common suggestion:
transform_valid = transforms.Compose(
    [
        transforms.ToTensor(),
    ])
But then again I am getting the error mentioned below. Maybe the problem lies here in the transformation part.
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py:469: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/utils/tensor_numpy.cpp:141.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
I wanted to make that particular NumPy array writable using "ndarray.setflags(write=None, align=None, uic=None)", but I can't figure out where to do this or which array I should make writable, since I'm loading the dataset directly with:
"datasets.EMNIST(root, split="balanced", train=False, download=True, transform=transform_valid)"

Welcome to Stack Overflow!
Your problem is not related to the ToTensor transform. The error comes from the dimensions of the tensor you pass to your max-pool: it clearly states that you are trying to max-pool a tensor in which one of the dimensions is 1 (64, 28, 1), so the output would have a dimension of 0 (64, 14, 0), which makes no sense.
You need to check the dimensions of the tensors you feed into your model. They are definitely too small. Maybe you made a mistake with a view somewhere (hard to tell without a minimal reproducible example).
If I can try to guess, you start with a tensor of size 28x28x1 (typical MNIST) and put it into a convolution that expects a tensor of dims BxCxHxW (batch_size, channels, height, width), i.e. something like (B, 1, 28, 28), but you have confused the width (28) with the input channels (nn.Conv2d(->28<-, 64, (5, 5), padding=2)).
I believe you want your first layer to be nn.Conv2d(1, 64, (5, 5), padding=2), and you need to reshape your tensors to (B, 1, 28, 28) (the value of B is up to you) before giving them to the network.
Side note: the warning about writable NumPy arrays is completely unrelated. It just means that writing to the tensor could modify the underlying "non-writable" data of your NumPy array. If you don't care about this NumPy array being modified, you can ignore the warning.
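To illustrate the shapes involved, here is a small sketch that feeds a random (B, 1, 28, 28) batch through the corrected first layer and the first max-pool. The random tensor and the batch size of 8 are assumptions standing in for real EMNIST batches, and only the first two operations are checked; the later layer sizes in the question's network are not verified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for one batch of 28x28 grayscale images: (B, 1, 28, 28).
x = torch.rand(8, 1, 28, 28)

conv1 = nn.Conv2d(1, 64, (5, 5), padding=2)  # in_channels=1, as suggested above

h = F.relu(conv1(x))
print(h.shape)  # torch.Size([8, 64, 28, 28])

h = F.max_pool2d(h, 2, 2)
print(h.shape)  # torch.Size([8, 64, 14, 14])
```

With padding=2 and a 5x5 kernel the convolution keeps the 28x28 spatial size, so the 2x2 max-pool has enough pixels to work with and halves each side to 14.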

How to restructure the output tensor of a cnn layer for use by a linear layer in a simple pytorch model

Given a pytorch input dataset with dimensions:
dat.shape = torch.Size([128, 3, 64, 64])
This is a supervised learning problem: we have a separate labels.txt file containing one of C classes for each input observation. The value of C is the number of distinct values in the labels file and is presently in the single digits.
I could use assistance on how to mesh the layers of a simple mix of convolutional and linear layers network that is performing multiclass classification. The intent is to pass through:
two cnn layers with maxpooling after each
a linear "readout" layer
softmax activation before the output/labels
Here is the core of my (faulty/broken) network. I am unable to determine the proper size/shape required of:
Output of Convolutional layer -> Input of Linear [Readout] layer
class CNNClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.linear1 = nn.Linear(32*16*16, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(x)  # Size mismatch error HERE
        x = self.softmax1(x)
        return x
Training of the model is started by :
Xout = model(dat)
This results in :
RuntimeError: size mismatch, m1: [128 x 1568], m2: [8192 x 6]
at the linear1 input. What is needed here ? Note I have seen uses of wildcard input sizes e.g via a view:
..
x = x.view(x.size(0), -1)
x = self.linear1(x) # Size mismatch error HERE
If that is included then the error changes to
RuntimeError: size mismatch, m1: [28672 x 7], m2: [8192 x 6]
Some pointers on how to think about and calculate the cnn layer / linear layer input/output sizes would be much appreciated.

The error
You have miscalculated the output size of the convolutional stack. It is actually [batch, 32, 7, 7] instead of [batch, 32, 16, 16].
You have to use reshape (or view) because the output of Conv2d has 4 dimensions ([batch, channels, height, width]), while nn.Linear here expects a 2-dimensional input ([batch, features]).
Use this for nn.Linear:
self.linear1 = nn.Linear(32 * 7 * 7, C)
And this in forward:
x = self.linear1(x.view(x.shape[0], -1))
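The 7x7 figure can be reproduced by hand with the standard output-size formula for convolution and pooling layers, floor((size + 2*padding - kernel) / stride) + 1. The helper below is a sketch for illustration (conv_out is a hypothetical name, and it assumes dilation 1); note that nn.MaxPool2d uses stride = kernel_size when no stride is given.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64
size = conv_out(size, kernel=3)                        # conv1   -> 62
size = conv_out(size, kernel=3, stride=3, padding=1)   # maxpool -> 21
size = conv_out(size, kernel=3)                        # conv2   -> 19
size = conv_out(size, kernel=3, stride=3, padding=1)   # maxpool -> 7

print(size)              # 7
print(32 * size * size)  # 1568
```

The second number, 1568, is the in_features that nn.Linear needs after flattening, which matches 32 * 7 * 7.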
Other possibilities
Many recent architectures use global pooling, which reduces each channel's feature map to a single value by pooling over its height and width. In PyTorch this is available as torch.nn.AdaptiveAvgPool2d (or AdaptiveMaxPool2d for max pooling). This approach lets the height and width of your input image vary, because only one value per channel is passed to nn.Linear. This is how it looks:
class CNNClassifier(torch.nn.Module):
    def __init__(self, C=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.pooling = torch.nn.AdaptiveAvgPool2d(output_size=1)
        self.linear1 = nn.Linear(32, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(self.pooling(x).view(x.shape[0], -1))
        x = self.softmax1(x)
        return x
So now images of torch.Size([128, 3, 64, 64]) and torch.Size([128, 3, 128, 128]) can be passed to the network.

So the issue is with the way you defined the nn.Linear. You set the input size to 32*16*16, which is not the size of the flattened feature map: the numbers 32 and 16 are the channel counts that the Conv2d layers produce and expect, not the spatial size of the output.
If you add print(x.shape) just before the fully connected layer you will get:
torch.Size([Batch, 32, 7, 7])
So your calculation should have been 7*7*32:
self.linear1 = nn.Linear(32*7*7, C)
And then using:
x = x.view(x.size(0), -1)
x = self.linear1(x)
will work perfectly fine. You can read about what view does in: How does the "view" method work in PyTorch?
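Instead of printing the shape and hard-coding the result, the flattened size can also be measured once with a dummy input. This is a sketch of that idea, not part of the original answers: the nn.Sequential wrapper and the variable names are assumptions, and C = 6 is taken from the error message in the question.

```python
import torch
import torch.nn as nn

# Convolutional stack from the question, wrapped in a Sequential for brevity.
features = nn.Sequential(
    nn.Conv2d(3, 16, 3),
    nn.LeakyReLU(),
    nn.MaxPool2d(kernel_size=3, padding=1),
    nn.Conv2d(16, 32, 3),
    nn.LeakyReLU(),
    nn.MaxPool2d(kernel_size=3, padding=1),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 64, 64)  # one fake image of the expected size
    n_features = features(dummy).flatten(1).shape[1]

print(n_features)  # 1568

C = 6  # number of classes; 6 is taken from the error message in the question
linear1 = nn.Linear(n_features, C)
out = linear1(features(torch.rand(128, 3, 64, 64)).flatten(1))
print(out.shape)  # torch.Size([128, 6])
```

This keeps the linear layer in sync with the convolutional stack if the kernel sizes or the input resolution change later.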

RuntimeError: expected stride to be a single integer value

I am new to PyTorch, so sorry for the basic question. The model gives me a dimension mismatch error. How can I solve this?
There may be more than one problem in it.
Any help would be appreciated.
Thanks
class PR(nn.Module):
    def __init__(self):
        super(PR, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 1, kernel_size=2)
        self.dens1 = nn.Linear(300, 256)
        self.dens2 = nn.Linear(256, 256)
        self.dens3 = nn.Linear(512, 24)
        self.drop = nn.Dropout()

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(x)
        out = self.dens1(x)
        out = self.dens2(x)
        out = self.dens3(x)
        return out

model = PR()
input = torch.rand(28, 28, 3)
output = model(input)

Please have a look at the corrected code. I numbered the lines where I did corrections and described them below.
class PR(torch.nn.Module):
    def __init__(self):
        super(PR, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 6, kernel_size=5)  # (2a) in 3x28x28 out 6x24x24
        self.conv2 = torch.nn.Conv2d(6, 1, kernel_size=2)  # (2b) in 6x24x24 out 1x23x23 (6)
        self.dens1 = torch.nn.Linear(529, 256)  # (3a)
        self.dens2 = torch.nn.Linear(256, 256)
        self.dens3 = torch.nn.Linear(256, 24)  # (4)
        self.drop = torch.nn.Dropout()

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)  # (5)
        out = out.view(-1, 529)  # (3b)
        out = self.dens1(out)
        out = self.dens2(out)
        out = self.dens3(out)
        return out

model = PR()
ins = torch.rand(1, 3, 28, 28)  # (1)
output = model(ins)
(1) First of all, PyTorch handles image tensors (you perform 2D convolution, therefore I assume this is an image input) as follows: [batch_size x image_depth x height x width]
(2) It is important to understand how the convolution with kernel, padding and stride works. In your case kernel_size is 5 and you have no padding (and stride 1). This means that the dimensions of the feature map get reduced. The first conv layer takes a 3x28x28 tensor and produces a 6x24x24 tensor; the second one takes 6x24x24 and produces 1x23x23. I find it very useful to put comments with the input and output tensor dimensions next to the definition of the conv layers (see the code above).
(3) Here you need to "flatten" the [batch_size x depth x height x width] tensor to [batch_size x fully connected input]. This can be done via tensor.view().
(4) The input size of the last linear layer was wrong: dens2 outputs 256 features, so dens3 must take 256 inputs, not 512.
(5) Each operation in the forward pass took the input value x; I think you want to pass the result of each layer to the next one instead.
(6) Although this code is now runnable, that does not mean it makes perfect sense. The most important things (for neural networks in general, I would say) are activation functions. These are missing completely.
For getting started with neural networks in pytorch I can highly recommend the great pytorch tutorials: https://pytorch.org/tutorials/ (I would start with the 60min blitz tutorial)
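As a rough sketch of what adding activation functions could look like, the variant below applies ReLU after every layer except the last and uses the otherwise unused dropout layer after the first dense layer. The class name, the choice of ReLU and the placement of dropout are assumptions for illustration, not part of the corrected code above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PRWithActivations(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)   # 3x28x28 -> 6x24x24
        self.conv2 = nn.Conv2d(6, 1, kernel_size=2)   # 6x24x24 -> 1x23x23
        self.dens1 = nn.Linear(529, 256)
        self.dens2 = nn.Linear(256, 256)
        self.dens3 = nn.Linear(256, 24)
        self.drop = nn.Dropout()

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.relu(self.conv2(out))
        out = out.view(out.size(0), -1)           # flatten to (batch, 529)
        out = self.drop(F.relu(self.dens1(out)))  # dropout placement is a guess
        out = F.relu(self.dens2(out))
        return self.dens3(out)  # raw scores; no activation on the last layer

model = PRWithActivations()
output = model(torch.rand(4, 3, 28, 28))
print(output.shape)  # torch.Size([4, 24])
```

Leaving the last layer without an activation is a common choice when the loss function (for example nn.CrossEntropyLoss) applies the softmax itself.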
Hope this helps!

There are a few problems with your code. I've reviewed and corrected it below:
class PR(nn.Module):
    def __init__(self):
        super(PR, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 1, kernel_size=2)
        # 300 does not match the shape of the previous layer's output,
        # for the specified input, the output of conv2 is [1, 1, 23, 23]
        # this output should be flattened before feeding it to the dense layers
        # the shape then becomes [1, 529], which should match the input shape of dens1
        # self.dens1 = nn.Linear(300, 256)
        self.dens1 = nn.Linear(529, 256)
        self.dens2 = nn.Linear(256, 256)
        # The input should match the output of the previous layer, which is 256
        # self.dens3 = nn.Linear(512, 24)
        self.dens3 = nn.Linear(256, 24)
        self.drop = nn.Dropout()

    def forward(self, x):
        # The output of each layer should be fed to the next layer
        x = self.conv1(x)
        x = self.conv2(x)
        # The output should be flattened before feeding it to the dense layers
        x = x.view(x.size(0), -1)
        x = self.dens1(x)
        x = self.dens2(x)
        x = self.dens3(x)
        return x

model = PR()
# The input shape should be (N, Cin, H, W)
# where N is the batch size, Cin is input channels, H and W are height and width respectively
# so the input should be torch.rand(1, 3, 28, 28)
# input = torch.rand(28, 28, 3)
input = torch.rand(1, 3, 28, 28)
output = model(input)
Let me know if you have any follow-up questions.
