I have a model that I am trying to get working. I am working through the errors, but now I think it has come down to the values in my layers. I get this error:
RuntimeError: Given groups=1, weight of size 24 1 3 3, expected input[512, 50, 50, 3] to have 1 channels, but got 50 channels instead
My parameters are:
LR = 5e-2
N_EPOCHS = 30
BATCH_SIZE = 512
DROPOUT = 0.5
My image information is:
depth=24
channels=3
original height = 1600
original width = 1200
resized to 50x50
This is the size of my data:
Train shape (743, 50, 50, 3) (743, 7)
Test shape (186, 50, 50, 3) (186, 7)
Train pixels 0 255 188.12228712427097 61.49539262385051
Test pixels 0 255 189.35559211469533 60.688278787628775
I looked at https://towardsdatascience.com/pytorch-layer-dimensions-what-sizes-should-they-be-and-why-4265a41e01fd to try to understand what values each layer is expecting, but when I put in what it says, it gives me errors about wrong channels and kernels.
I found torch_summary to get a better understanding of the outputs, but it only raises more questions.
This is my torch_summary code:
from torchvision import models
from torchsummary import summary
import torch
import torch.nn as nn
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 24, kernel_size=5)  # output (n_examples, 16, 26, 26)
        self.convnorm1 = nn.BatchNorm2d(24)  # channels from prev layer
        self.pool1 = nn.MaxPool2d((2, 2))  # output (n_examples, 16, 13, 13)
        self.conv2 = nn.Conv2d(24, 48, kernel_size=5)  # output (n_examples, 32, 11, 11)
        self.convnorm2 = nn.BatchNorm2d(48)  # 2*channels?
        self.pool2 = nn.AvgPool2d((2, 2))  # output (n_examples, 32, 5, 5)
        self.linear1 = nn.Linear(400, 120)  # input will be flattened to (n_examples, 32 * 5 * 5)
        self.linear1_bn = nn.BatchNorm1d(400)  # features?
        self.drop = nn.Dropout(DROPOUT)
        self.linear2 = nn.Linear(400, 10)
        self.act = torch.relu

    def forward(self, x):
        x = self.pool1(self.convnorm1(self.act(self.conv1(x))))
        x = self.pool2(self.convnorm2(self.act(self.conv2(x))))
        x = self.drop(self.linear1_bn(self.act(self.linear1(x.view(len(x), -1)))))
        return self.linear2(x)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model=CNN().to(device)
summary(model, (3, 50, 50))
This is what it gave me:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 24 1 5 5, expected input[2, 3, 50, 50] to have 1 channels, but got 3 channels instead
When I run my whole code and unsqueeze_(0) my data, like so: x_train = torch.from_numpy(x_train).unsqueeze_(0), I get this error:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 24 1 5 5, but got 5-dimensional input of size [1, 743, 50, 50, 3] instead
I don't know how to figure out how to fill in the proper values in the layers. Will someone please help me find the correct values and understand how to work them out? I do know that the output of one layer should be the input of the next layer. Nothing is matching up with what I thought I knew.
Thanks in advance!!
It seems the axes of your input tensor x are in the wrong order.
As you can see in the Conv2d documentation, the input must be shaped (N, C, H, W), where
N is the batch size, C is the number of channels, H is the height of the input planes in pixels, and W is the width in pixels.
So, to make it right, use permute to swap the axes in the forward pass.
...
def forward(self, x):
    x = x.permute(0, 3, 1, 2)
    ...
    ...
    return self.linear2(x)
...
Example of permute:
t = torch.rand(512, 50, 50, 3)
t.size()
torch.Size([512, 50, 50, 3])
t = t.permute(0, 3, 1, 2)
t.size()
torch.Size([512, 3, 50, 50])
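Alternatively, you can reorder the axes once when you build the tensors from your numpy arrays instead of inside forward. A minimal sketch, assuming x_train keeps the (743, 50, 50, 3) layout shown in your Train shape output:

import torch

x_train = torch.from_numpy(x_train).float()
x_train = x_train.permute(0, 3, 1, 2)  # (743, 50, 50, 3) -> (743, 3, 50, 50), i.e. (N, C, H, W)

Either way, conv1 then needs in_channels=3 (not 1) to match your 3 color channels, e.g. nn.Conv2d(3, 24, kernel_size=5).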
I am a total newbie to neural networks, using PyTorch to create a VAE model. I've used a bit of TensorFlow before, but I have no idea what "in_channels" and "out_channels" are as arguments to nn.Conv2d/nn.Conv1d.
Disclaimers aside, currently, my model takes in a dataloader with batch size 128 and where each input is a 248 by 46 tensor (so, a 128 x 248 x 46 tensor).
My encoder looks like this right now -- I chopped it down so I could focus on where the error was coming from.
class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        self.conv1 = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print(x.size())
        x = F.relu(self.conv1(x))
        return x
The Conv2d layer was meant to reduce the 248 by 46 input into a 50 by 46 tensor. However, I get this error:
RuntimeError: Given groups=1, weight of size [46, 248, 9, 9], expected input[1, 128, 248, 46] to have 248 channels, but got 128 channels instead
...even though I print x.size() and it displays as torch.Size([128, 248, 46]).
I am unsure a) why the error shows that the layer is adding on an extra dimension to x, and b) whether I am even understanding channels correctly. Should 46 be the real number of channels? Why doesn't Pytorch simply request my input size as a tuple or something, like in=(248, 46)?
Or c) if this is an issue with the way I loaded in my data to the model. I have a numpy array data of shape (-1, 248, 46) and then started training my model as follows.
tensor_data = torch.from_numpy(data)
dataset = TensorDataset(tensor_data, tensor_data)
train_dl = DataLoader(dataset, batch_size=128, shuffle=True)
...
for epoch in range(20):
    for x_train, y_train in train_dl:
        x_train = x_train.to(device).float()
        optimizer.zero_grad()
        x_pred, mu, log_var = vae(x_train)
        bce_loss = train.BCE(y_train, x_pred)
        kl_loss = train.KL(mu, log_var)
        loss = bce_loss + kl_loss
        loss.backward()
        optimizer.step()
Any thoughts appreciated!
In pytorch, nn.Conv2d assumes the input (mostly image data) is shaped like: [B, C_in, H, W], where B is the batch size, C_in is the number of channels, H and W are the height and width of the image. The output has a similar shape [B, C_out, H_out, W_out]. Here, C_in and C_out are in_channels and out_channels, respectively. (H_out, W_out) is the output image size, which may or may not equal (H, W), depending on the kernel size, the stride and the padding.
However, it is confusing to apply conv2d to reduce [128, 248, 46] inputs to [128, 50, 46]. Are they image data with height 248 and width 46? If so you can reshape the inputs to [128, 1, 248, 46] and use in_channels = 1 and out_channels = 1 in conv2d.
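A minimal sketch of that reshape, using a random batch with the same [128, 248, 46] shape as in the question:

import torch
import torch.nn as nn

x = torch.rand(128, 248, 46)                     # batch shaped like the question's data
x = x.unsqueeze(1)                               # -> [128, 1, 248, 46]
conv = nn.Conv2d(in_channels=1, out_channels=1,
                 kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))
print(conv(x).shape)                             # torch.Size([128, 1, 50, 46])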
As a side note: for a single-channel 28x28 image, a Conv2d would take in_channels=1; flattening to 28*28 = 784 only applies to an nn.Linear, where 784 would be the in_features and out_features could be the number of classes your model wants to predict.
You need to add an extra dimension for the number of channels (1) with the view function. The below code will work!
class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print("encoder input size: " + str(x.shape))
        # x.shape[0] is the batch size when x comes from the DataLoader as [batch, 248, 46]
        # target shape: (number of samples in a batch, number of channels, height, width)
        x = x.view(x.shape[0], 1, 248, 46)
        print("encoder input size after adding 1 channel to shape: " + str(x.shape))
        x = F.relu(self.conv1(x))
        return x
# a test dataset with 128 samples, each of height 248 and width 46
test_dataset = torch.rand(128, 248, 46)
# prints the shape of the dataset
test_dataset.shape
model = Encoder()
model(test_dataset)
# if you are passing only one sample to the model (i.e. to plot) you need to do this instead
test_dataset2 = torch.rand(1,248,46)
model(test_dataset2.view(test_dataset2.shape[0],1,248,46))
The size of my input images is 68 x 224 x 3 (HxWxC), and the first Conv2d layer is defined as
conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9)).
Why is the size of the output feature volume 16 x 15 x 54? I get that there are 16 filters, so there is a 16 in the front, but if I use [(W−K+2P)/S]+1 to calculate the dimensions, the results are not whole numbers.
Can someone please explain?
The formula for the feature-map size is [(W−K+2P)/S]+1, where the [] brackets denote the floor operation. In your example the padding is zero, so the height is [(68−9+2*0)/4]+1 = [14.75]+1 = 14+1 = 15 and the width is [(224−9+2*0)/4]+1 = [53.75]+1 = 53+1 = 54.
import torch
conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9))
input = torch.rand(1, 3, 68, 224)
print(conv1(input).shape)
# torch.Size([1, 16, 15, 54])
You may see different formulas to calculate feature maps.
In PyTorch, the Conv2d documentation gives:
H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
In general, you may see this:
H_out = floor((H_in - kernel_size + 2*padding) / stride) + 1
However, with dilation = 1 the result of both formulas is the same.
I was having the same kind of trouble estimating the output size of a tensor after a convolutional layer. Check out a helper function I implemented at https://github.com/tuttelikz/conv_output_size.
Example:
import torch
import torch.nn as nn
from conv_output_size import conv2d_output_size
c_i, c_o = 3, 16
k, s, p = 3, 2, 1
sample_2d_tensor = torch.ones((c_i, 64, 64))
c2d = nn.Conv2d(in_channels=c_i, out_channels=c_o, kernel_size=k,
                stride=s, padding=p)
output_size = conv2d_output_size(
    sample_2d_tensor.shape, out_channels=c_o, kernel_size=k, stride=s, padding=p)
print("After conv2d")
print("Dummy input size:", sample_2d_tensor.shape)
print("Calculated output size:", output_size)
print("Real output size:", c2d(sample_2d_tensor).detach().numpy().shape)
>>> After conv2d
>>> Dummy input size: torch.Size([3, 64, 64])
>>> Calculated output size: (16, 32, 32)
>>> Real output size: (16, 32, 32)
I have input data for my 2D CNN model, say X_train with shape torch.Size([716, 50, 50]).
my model is:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=4, stride=1, padding=1)
        self.mp1 = nn.MaxPool2d(kernel_size=4, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=1)
        self.mp2 = nn.MaxPool2d(kernel_size=4, stride=2)
        self.fc1 = nn.Linear(2304, 256)
        self.dp1 = nn.Dropout(p=0.2)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp1(self.conv1(x)))
        x = F.relu(self.mp2(self.conv2(x)))
        x = x.view(in_size, -1)
        x = F.relu(self.fc1(x))
        x = self.dp1(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
but when I run the model, I always get this error:
---> x = F.relu(self.mp1(self.conv1(x)))
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 1, 4, 4], but got 3-dimensional input of size [64, 50, 50] instead
I understand that my input to the model has batch size 64 and that each input is a 50x50 signal picture.
But I don't understand why it still requires a 4-dimensional input when I have set in_channels for nn.Conv2d to 1.
How to solve this input dimension problem or to change the dimension requirement of model input?
Whether in_channels is 1 or 42 does not matter: the channel dimension still has to be present in the input. It is useful to read the documentation in this respect.
In- and output are of the form N, C, H, W
N: batch size
C: channels
H: height in pixels
W: width in pixels
So you need to add the dimension in your case:
# Add a dimension at index 1
x = x.unsqueeze(1)
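A quick check of the shape change, a minimal sketch using a random batch the same size as yours:

import torch

x = torch.rand(64, 50, 50)   # batch of 64 single-channel 50x50 inputs
x = x.unsqueeze(1)           # add the channel dimension at index 1
print(x.shape)               # torch.Size([64, 1, 50, 50])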
That's the problem: you've set in_channels=1, but that doesn't mean the channel dimension doesn't exist in the input.
Expanding the dimensions of your data to [64, 1, 50, 50] should solve your problem; use .view() on the input tensor.
Given a pytorch input dataset with dimensions:
dat.shape = torch.Size([128, 3, 64, 64])
This is a supervised learning problem: we have a separate labels.txt file containing one of C classes for each input observation. The value of C is calculated from the number of distinct values in the labels file and is presently in the single digits.
I could use assistance on how to connect the layers of a simple network mixing convolutional and linear layers that performs multiclass classification. The intent is to pass through:
two cnn layers with maxpooling after each
a linear "readout" layer
softmax activation before the output/labels
Here is the core of my (faulty/broken) network. I am unable to determine the proper size/shape required of:
Output of Convolutional layer -> Input of Linear [Readout] layer
class CNNClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.linear1 = nn.Linear(32*16*16, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(x)  # Size mismatch error HERE
        x = self.softmax1(x)
        return x
Training of the model is started by:
Xout = model(dat)
This results in:
RuntimeError: size mismatch, m1: [128 x 1568], m2: [8192 x 6]
at the linear1 input. What is needed here? Note that I have seen uses of wildcard input sizes, e.g. via a view:
..
x = x.view(x.size(0), -1)
x = self.linear1(x) # Size mismatch error HERE
If that is included then the error changes to
RuntimeError: size mismatch, m1: [28672 x 7], m2: [8192 x 6]
Some pointers on how to think about and calculate the cnn layer / linear layer input/output sizes would be much appreciated.
The error
You have miscalculated the output size from the convolutional stack. It is actually [batch, 32, 7, 7] instead of [batch, 32, 16, 16].
You have to use reshape (or view) as output from Conv2d has 4 dimensions ([batch, channels, width, height]), while input to nn.Linear is required to have 2 dimensions ([batch, features]).
Use this for nn.Linear:
self.linear1 = nn.Linear(32 * 7 * 7, C)
And this in forward:
x = self.linear1(x.view(x.shape[0], -1))
Other possibilities
Many current architectures use pooling across the spatial dimensions (usually called global pooling). In PyTorch there is a torch.nn.AdaptiveAvgPool2d (or AdaptiveMaxPool2d). Using this approach allows you to have a variable height and width of your input image, as only one value per channel is used as input to nn.Linear. This is how it looks:
class CNNClassifier(torch.nn.Module):
    def __init__(self, C=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.pooling = torch.nn.AdaptiveAvgPool2d(output_size=1)
        self.linear1 = nn.Linear(32, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(self.pooling(x).view(x.shape[0], -1))
        x = self.softmax1(x)
        return x
So now images of torch.Size([128, 3, 64, 64]) and torch.Size([128, 3, 128, 128]) can be passed to the network.
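A quick check of that claim, assuming torch and torch.nn.functional (as F) are imported and keeping the C=10 default from the constructor above:

model = CNNClassifier(C=10)
print(model(torch.rand(128, 3, 64, 64)).shape)    # torch.Size([128, 10])
print(model(torch.rand(128, 3, 128, 128)).shape)  # torch.Size([128, 10])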
So the issue is with the way you defined the nn.Linear. You set the input size to 32*16*16, but that is not the shape of the feature map reaching the linear layer; the 32 and 16 in your Conv2d definitions are the numbers of "channels" those layers expect as input and produce as output, not spatial sizes.
If you will add print(x.shape) before the entrance to the fully connected layer you will get:
torch.Size([Batch, 32, 7, 7])
So your calculation should have been 7*7*32:
self.linear1 = nn.Linear(32*7*7, C)
And then using:
x = x.view(x.size(0), -1)
x = self.linear1(x)
This will work perfectly fine. You can read about what view does in: How does the "view" method work in PyTorch?
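If you ever need to find the flattened size without doing the arithmetic by hand, a minimal sketch is to push a dummy batch through the convolutional stack (reusing the layer definitions from the question) and print the shape:

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(3, 16, 3)
conv2 = nn.Conv2d(16, 32, 3)
maxpool = nn.MaxPool2d(kernel_size=3, padding=1)

x = torch.rand(1, 3, 64, 64)             # dummy batch of one 64x64 RGB image
x = maxpool(F.leaky_relu(conv1(x)))      # -> [1, 16, 21, 21]
x = maxpool(F.leaky_relu(conv2(x)))      # -> [1, 32, 7, 7]
print(x.shape)                           # torch.Size([1, 32, 7, 7])
print(x.view(1, -1).shape[1])            # 1568 = 32 * 7 * 7, matching m1: [128 x 1568] in the error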
I am going through tutorials to train/test a convolutional neural network (CNN), and I am having an issue with prepping a test image to run through the trained network. My initial guess is that it has something to do with having the correct format of the tensor input for the net.
Here is the code for the Net.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.init as I
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()

        ## 1. This network takes in a square (same width and height), grayscale image as input
        ## 2. It ends with a linear layer that represents the keypoints
        ## this last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs

        # input size 224 x 224
        # after the first conv layer, (W-F)/S + 1 = (224-5)/1 + 1 = 220
        # after one pool layer, this becomes (32, 110, 110)
        self.conv1 = nn.Conv2d(1, 32, 5)

        # maxpool layer
        # pool with kernel_size = 2, stride = 2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 32 inputs, 64 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
        ## output dimension: (64, 108, 108)
        ## after another pool layer, this becomes (64, 54, 54)
        self.conv2 = nn.Conv2d(32, 64, 3)

        # third conv layer: 64 inputs, 128 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
        ## output dimension: (128, 52, 52)
        ## after another pool layer, this becomes (128, 26, 26)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.conv_drop = nn.Dropout(p=0.2)
        self.fc_drop = nn.Dropout(p=0.4)

        # 128 outputs * 26x26 filtered/pooled map = 86528
        self.fc1 = nn.Linear(128*26*26, 1000)
        self.fc2 = nn.Linear(1000, 1000)
        self.fc3 = nn.Linear(1000, 136)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.conv_drop(x)

        # prep for linear layer: flattening
        x = x.view(x.size(0), -1)

        # two linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc_drop(x)
        x = self.fc2(x)
        x = self.fc_drop(x)
        x = self.fc3(x)

        return x
Maybe my calculation of the layer inputs is wrong?
And here is the test-running code block: (you can think of 'roi' as a standard numpy image.)
# loop over the detected faces from your haar cascade
for i, (x, y, w, h) in enumerate(faces):
    plt.figure(figsize=(10, 5))
    ax = plt.subplot(1, len(faces), i+1)

    # Select the region of interest that is the face in the image
    roi = image_copy[y:y+h, x:x+w]

    ## TODO: Convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)

    ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = np.multiply(roi, 1/255)

    ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
    roi = cv2.resize(roi, (244,244))
    roi = roi.reshape(roi.shape[0], roi.shape[1], 1)
    roi = roi.transpose((2, 0, 1))

    ## TODO: Change to tensor
    roi = torch.from_numpy(roi)
    roi = roi.type(torch.FloatTensor)
    roi = roi.unsqueeze(0)
    print(roi.shape)

    ## TODO: run it through the net
    output_pts = net(roi)
And I get the error message saying:
RuntimeError: size mismatch, m1: [1 x 100352], m2: [86528 x 1000] at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:2033
The caveat is that if I run my trained network in the provided test suite (where the tensor inputs are already prepped), it gives no errors and runs as it is supposed to. I think that means there's nothing wrong with the design of the network architecture itself. I think there's something wrong with the way I am prepping the image.
The output of the 'roi.shape' is:
torch.Size([1, 1, 244, 244])
This should be okay, because it is ([batch_size, color_channel, height, width]).
UPDATE: I have printed out the shapes at each layer while running through the net. It turns out the input dimensions to the first FC layer are different for my test image and for the given test images from the test suite. So I'm almost 80% sure that the way I'm prepping the input image for the net is wrong. But how can they have different dimensions at the FC layer if the input tensor for both has the exact same dimension ([1,1,244,244])?
when using the provided test suite (where it runs fine):
input: torch.Size([1, 1, 224, 224])
layer before 1st CV: torch.Size([1, 1, 224, 224])
layer after 1st CV pool: torch.Size([1, 32, 110, 110])
layer after 2nd CV pool: torch.Size([1, 64, 54, 54])
layer after 3rd CV pool: torch.Size([1, 128, 26, 26])
flattend layer for the 1st FC: torch.Size([1, 86528])
When prepping/running the test image:
input: torch.Size([1, 1, 244, 244])
layer before 1st CV: torch.Size([1, 1, 244, 244])
layer after 1st CV pool: torch.Size([1, 32, 120, 120]) #<- what happened here??
layer after 2nd CV pool: torch.Size([1, 64, 59, 59])
layer after 3rd CV pool: torch.Size([1, 128, 28, 28])
flattend layer for the 1st FC: torch.Size([1, 100352])
Did you notice you have this line in the image preparation?
## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
roi = cv2.resize(roi, (244,244))
So you just resized it to 244x244 and not to 224x224.
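The fix is a one-line change in the preprocessing, assuming everything else in the loop stays the same:

## Rescale the detected face to the square size the network was designed for (224x224)
roi = cv2.resize(roi, (224, 224))

With a 224x224 input, the flattened feature size becomes 128*26*26 = 86528, which matches what fc1 expects.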