Output Dimensions of convolution in PyTorch

The size of my input images is 68 x 224 x 3 (H x W x C), and the first Conv2d layer is defined as
conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9)).
Why is the size of the output feature volume 16 x 15 x 54? I get that there are 16 filters, so there is a 16 in front, but when I use [(W−K+2P)/S]+1 to calculate the dimensions, the result is not a whole number.
Can someone please explain?

The formula for the size of the feature maps is [(W−K+2P)/S]+1, where the [] brackets denote the floor operation (you floor the quotient and then add 1). In your example the padding is zero, so the calculation is [(68−9+2*0)/4]+1 = [14.75]+1 = 14+1 = 15 and [(224−9+2*0)/4]+1 = [53.75]+1 = 53+1 = 54.
import torch
conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9))
input = torch.rand(1, 3, 68, 224)
print(conv1(input).shape)
# torch.Size([1, 16, 15, 54])
You may see different formulas to calculate the feature maps.
In PyTorch (from the nn.Conv2d documentation):
H_out = [(H_in + 2*padding − dilation*(kernel_size − 1) − 1)/stride + 1]
In general, you may see this:
O = [(W − K + 2P)/S] + 1
However, the result of both is the same (with the default dilation=1).
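For example, a quick check that the two formulas agree for the shapes above:
import math

def out_general(w, k, s, p):
    # O = [(W - K + 2P) / S] + 1, where [] = floor
    return math.floor((w - k + 2 * p) / s) + 1

def out_pytorch(w, k, s, p, d=1):
    # H_out = [(H_in + 2*padding - dilation*(kernel_size - 1) - 1)/stride + 1]
    return math.floor((w + 2 * p - d * (k - 1) - 1) / s + 1)

for w in (68, 224):
    print(out_general(w, k=9, s=4, p=0), out_pytorch(w, k=9, s=4, p=0))
# 15 15
# 54 54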

I was having the same kind of trouble estimating the output size of a tensor after a convolutional layer. Check out a helper function I implemented at https://github.com/tuttelikz/conv_output_size.
Example:
import torch
import torch.nn as nn
from conv_output_size import conv2d_output_size

c_i, c_o = 3, 16
k, s, p = 3, 2, 1
sample_2d_tensor = torch.ones((c_i, 64, 64))
c2d = nn.Conv2d(in_channels=c_i, out_channels=c_o, kernel_size=k,
                stride=s, padding=p)
output_size = conv2d_output_size(
    sample_2d_tensor.shape, out_channels=c_o, kernel_size=k, stride=s, padding=p)
print("After conv2d")
print("Dummy input size:", sample_2d_tensor.shape)
print("Calculated output size:", output_size)
print("Real output size:", c2d(sample_2d_tensor).detach().numpy().shape)
>>> After conv2d
>>> Dummy input size: torch.Size([3, 64, 64])
>>> Calculated output size: (16, 32, 32)
>>> Real output size: (16, 32, 32)
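If you only need the shape and don't mind a forward pass, you can also let PyTorch compute it for you. Below is a minimal sketch; output_shape is a hypothetical helper written here for illustration, not part of any library:
import torch
import torch.nn as nn

def output_shape(module, input_shape):
    # Hypothetical helper: run a dummy batch of one through the module
    # and read the resulting shape.
    with torch.no_grad():
        dummy = torch.zeros(1, *input_shape)   # prepend a batch dimension
        return tuple(module(dummy).shape[1:])  # drop the batch dimension again

c2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)
print(output_shape(c2d, (3, 64, 64)))
# (16, 32, 32)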

Related

How padding=zeros works in pytorch in functional.conv1d

The following code gives an output of shape (1, 1, 3) when the shape of xodd is (1, 1, 2). The given kernel shape is (112, 1, 1).
from torch.nn import functional as F
output = F.conv1d(xodd, kernel, padding=zeros)
How does padding=zeros work?
Also, how can I write equivalent code in TensorFlow so that the output is the same as the above?
What is padding=zeros?
If we set padding=0, no values are added at the left and the right of the tensor.
Padding=0:
from torch.nn import functional as F
import torch
inputs = torch.randn(33, 16, 6) # (minibatch,in_channels,features)
filters = torch.randn(20, 16, 5) # (out_channels, in_channels, kernel_size)
out_tns = F.conv1d(inputs, filters, stride=1, padding=0)
print(out_tns.shape)
# torch.Size([33, 20, 2]) # (minibatch,out_channels,(features-kernel_size+1))
Padding=2 (we want to add two values at the left and the right of the tensor):
inputs = torch.randn(33, 16, 6) # (minibatch,in_channels,features)
filters = torch.randn(20, 16, 5) # (out_channels, in_channels, kernel_size)
out_tns = F.conv1d(inputs, filters, stride=1, padding=2)
print(out_tns.shape)
# torch.Size([33, 20, 6]) # (minibatch,out_channels,(features-kernel_size+1+2+2))
The equivalent code in TensorFlow:
import tensorflow as tf

input_shape = (33, 6, 16)
x = tf.random.normal(input_shape)
out_tf = tf.keras.layers.Conv1D(filters=20,
                                kernel_size=5,
                                strides=1,
                                input_shape=input_shape[1:])(x)
print(out_tf.shape)
# TensorShape([33, 2, 20])

# If you want the tensor to have exactly the same shape as in PyTorch, you can transpose:
tf.transpose(out_tf, [0, 2, 1]).shape
# TensorShape([33, 20, 2])
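For the padding=2 case above, Keras' padding="same" (with strides=1 and kernel_size=5) pads two values on each side, matching PyTorch's padding=2:
out_tf_same = tf.keras.layers.Conv1D(filters=20,
                                     kernel_size=5,
                                     strides=1,
                                     padding="same")(x)
print(out_tf_same.shape)
# TensorShape([33, 6, 20])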

TensorFlow vs PyTorch convolution confusion

I am confused on how to replicate Keras (TensorFlow) convolutions in PyTorch.
In Keras, I can do something like this (the input size is (256, 237, 1, 21) and the output size is (256, 237, 1, 1024)):
import tensorflow as tf
x = tf.random.normal((256,237,1,21))
y = tf.keras.layers.Conv1D(filters=1024, kernel_size=5,padding="same")(x)
print(y.shape)
(256, 237, 1, 1024)
However, in PyTorch, when I try to do the same thing I get a different output size:
import torch
import torch.nn as nn
x = torch.randn(256,237,1,21)
m = nn.Conv1d(in_channels=237, out_channels=1024, kernel_size=(1,5))
y = m(x)
print(y.shape)
torch.Size([256, 1024, 1, 17])
I want PyTorch to give me the same output size that Keras does:
This previous question seems to imply that Keras filters are PyTorch's out_channels, but that's what I have. I tried adding padding in PyTorch with padding=(0,503), but that gives me torch.Size([256, 1024, 1, 1023]), which is still not correct. This also takes much longer than Keras does, so I suspect I have assigned a parameter incorrectly.
How can I replicate what Keras did with convolution in PyTorch?
In TensorFlow, tf.keras.layers.Conv1D takes in a tensor of shape (batch_shape + (steps, input_dim)). Which means that what is commonly known as channels appears on the last axis. For instance in 2D convolution you would have (batch, height, width, channels). This is different from PyTorch where the channel dimension is right after the batch axis: torch.nn.Conv1d takes in shapes of (batch, channel, length). So you will need to permute two axes.
For torch.nn.Conv1d:
in_channels is the number of channels in the input tensor
out_channels is the number of filters, i.e. the number of channels the output will have
stride the step size of the convolution
padding the zero-padding added to both sides
In PyTorch there is no option for padding='same'; you will need to choose the padding yourself. Here stride=1, so the padding must equal kernel_size//2 (i.e. padding=2) in order to maintain the length of the tensor.
In your example, since x has a shape of (256, 237, 1, 21), in TensorFlow's terminology it will be considered as an input with:
a batch shape of (256, 237),
steps=1, so the length of your 1D input is 1,
21 input channels.
Whereas in PyTorch, x of shape (256, 237, 1, 21) would be:
batch shape of (256, 237),
1 input channel
a length of 21.
I have kept the input in both examples below (TensorFlow vs. PyTorch) as x.shape=(256, 237, 21), assuming 256 is the batch size, 237 is the length of the input sequence, and 21 is the number of channels (i.e. the input dimension, what I see as the dimension on each timestep).
In TensorFlow:
>>> x = tf.random.normal((256, 237, 21))
>>> m = tf.keras.layers.Conv1D(filters=1024, kernel_size=5, padding="same")
>>> y = m(x)
>>> y.shape
TensorShape([256, 237, 1024])
In PyTorch:
>>> x = torch.randn(256, 237, 21)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> y = m(x.permute(0, 2, 1))
>>> y.permute(0, 2, 1).shape
torch.Size([256, 237, 1024])
So in the latter, you would simply work with x = torch.randn(256, 21, 237)...
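That way no permutes are needed at all:
>>> x = torch.randn(256, 21, 237)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> m(x).shape
torch.Size([256, 1024, 237])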
PyTorch now has a 'same' convolution option out of the box; take a look at this link: [Same convolution][1]
class InceptionNet(nn.Module):
    def __init__(self, in_channels, in_1x1, in_3x3reduce, in_3x3, in_5x5reduce, in_5x5, in_1x1pool):
        super(InceptionNet, self).__init__()
        self.incep_1 = ConvBlock(in_channels, in_1x1, kernel_size=1, padding='same')
Note that a 'same' convolution only supports the default stride value of 1; anything else won't work.
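A minimal standalone example (assuming PyTorch 1.9 or newer, where nn.Conv2d accepts padding='same'):
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, padding='same')  # stride must stay at 1
x = torch.randn(1, 3, 64, 64)
print(conv(x).shape)
# torch.Size([1, 16, 64, 64]) -- the spatial size is preserved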
[1]: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

How to restructure the output tensor of a cnn layer for use by a linear layer in a simple pytorch model

Given a pytorch input dataset with dimensions:
dat.shape = torch.Size([128, 3, 64, 64])
This is a supervised learning problem: we have a separate labels.txt file containing one of C classes for each input observation. The value of C is calculated from the number of distinct values in the labels file and is presently in the single digits.
I could use assistance on how to connect the layers of a simple network that mixes convolutional and linear layers and performs multiclass classification. The intent is to pass through:
two cnn layers with maxpooling after each
a linear "readout" layer
softmax activation before the output/labels
Here is the core of my (faulty/broken) network. I am unable to determine the proper size/shape required of:
Output of Convolutional layer -> Input of Linear [Readout] layer
class CNNClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.linear1 = nn.Linear(32*16*16, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(x)  # Size mismatch error HERE
        x = self.softmax1(x)
        return x
Training of the model is started by:
Xout = model(dat)
This results in:
RuntimeError: size mismatch, m1: [128 x 1568], m2: [8192 x 6]
at the linear1 input. What is needed here? Note I have seen uses of wildcard input sizes, e.g. via a view:
..
x = x.view(x.size(0), -1)
x = self.linear1(x) # Size mismatch error HERE
If that is included then the error changes to
RuntimeError: size mismatch, m1: [28672 x 7], m2: [8192 x 6]
Some pointers on how to think about and calculate the cnn layer / linear layer input/output sizes would be much appreciated.
The error
You have miscalculated the output size of the convolutional stack. It is actually [batch, 32, 7, 7] instead of [batch, 32, 16, 16].
You have to use reshape (or view), as the output from Conv2d has 4 dimensions ([batch, channels, height, width]), while the input to nn.Linear is required to have 2 dimensions ([batch, features]).
Use this for nn.Linear:
self.linear1 = nn.Linear(32 * 7 * 7, C)
And this in forward:
x = self.linear1(x.view(x.shape[0], -1))
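A quick way to verify the [batch, 32, 7, 7] figure is to trace the shapes with a dummy batch (a sketch using the same layers as above):
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(128, 3, 64, 64)
x = nn.MaxPool2d(kernel_size=3, padding=1)(F.leaky_relu(nn.Conv2d(3, 16, 3)(x)))
print(x.shape)  # torch.Size([128, 16, 21, 21])
x = nn.MaxPool2d(kernel_size=3, padding=1)(F.leaky_relu(nn.Conv2d(16, 32, 3)(x)))
print(x.shape)  # torch.Size([128, 32, 7, 7])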
Other possibilities
Current new architectures use pooling across channels (usually called global pooling). In PyTorch there is torch.nn.AdaptiveAvgPool2d (or the max-pooling variant). Using this approach allows your input images to have variable height and width, as only one value per channel is used as input to nn.Linear. This is how it looks:
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNClassifier(torch.nn.Module):
    def __init__(self, C=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.pooling = torch.nn.AdaptiveAvgPool2d(output_size=1)
        self.linear1 = nn.Linear(32, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(self.pooling(x).view(x.shape[0], -1))
        x = self.softmax1(x)
        return x
So now images of torch.Size([128, 3, 64, 64]) and torch.Size([128, 3, 128, 128]) can be passed to the network.
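A quick check (a sketch, assuming the CNNClassifier above):
model = CNNClassifier(C=10)
for size in (64, 128):
    x = torch.randn(128, 3, size, size)
    print(model(x).shape)
# torch.Size([128, 10])
# torch.Size([128, 10])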
So the issue is with the way you defined the nn.Linear. You set the input size to 32*16*16, which is not the shape of the output feature map; the numbers 32 and 16 are the sizes of the "channels" dimension that the Conv2d layers expect as input and produce as output.
If you add print(x.shape) just before the entrance to the fully connected layer, you will get:
torch.Size([Batch, 32, 7, 7])
So your calculation should have been 7*7*32:
self.linear1 = nn.Linear(32*7*7, C)
And then using:
x = x.view(x.size(0), -1)
x = self.linear1(x)
Will work perfectly fine. You can read about what view does in: How does the "view" method work in PyTorch?

how to set values for layers in pytorch nn.module?

I have a model that I am trying to get working. I am working through the errors, but now I think it has come down to the values in my layers. I get this error:
RuntimeError: Given groups=1, weight of size 24 1 3 3, expected input[512, 50, 50, 3] to have 1 channels, but got 50 channels instead
My parameters are:
LR = 5e-2
N_EPOCHS = 30
BATCH_SIZE = 512
DROPOUT = 0.5
My image information is:
depth=24
channels=3
original height = 1600
original width = 1200
resized to 50x50
This is the size of my data:
Train shape (743, 50, 50, 3) (743, 7)
Test shape (186, 50, 50, 3) (186, 7)
Train pixels 0 255 188.12228712427097 61.49539262385051
Test pixels 0 255 189.35559211469533 60.688278787628775
I looked at https://towardsdatascience.com/pytorch-layer-dimensions-what-sizes-should-they-be-and-why-4265a41e01fd to try to understand what values each layer is expecting, but when I put in what it says there, it gives me errors about wrong channels and kernels.
I found torch_summary to give me more understanding about the outputs, but it only poses more questions.
This is my torch_summary code:
from torchvision import models
from torchsummary import summary
import torch
import torch.nn as nn
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 24, kernel_size=5)   # output (n_examples, 16, 26, 26)
        self.convnorm1 = nn.BatchNorm2d(24)            # channels from prev layer
        self.pool1 = nn.MaxPool2d((2, 2))              # output (n_examples, 16, 13, 13)
        self.conv2 = nn.Conv2d(24, 48, kernel_size=5)  # output (n_examples, 32, 11, 11)
        self.convnorm2 = nn.BatchNorm2d(48)            # 2*channels?
        self.pool2 = nn.AvgPool2d((2, 2))              # output (n_examples, 32, 5, 5)
        self.linear1 = nn.Linear(400, 120)             # input will be flattened to (n_examples, 32 * 5 * 5)
        self.linear1_bn = nn.BatchNorm1d(400)          # features?
        self.drop = nn.Dropout(DROPOUT)
        self.linear2 = nn.Linear(400, 10)
        self.act = torch.relu

    def forward(self, x):
        x = self.pool1(self.convnorm1(self.act(self.conv1(x))))
        x = self.pool2(self.convnorm2(self.act(self.conv2(x))))
        x = self.drop(self.linear1_bn(self.act(self.linear1(x.view(len(x), -1)))))
        return self.linear2(x)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model=CNN().to(device)
summary(model, (3, 50, 50))
This is what it gave me:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 24 1 5 5, expected input[2, 3, 50, 50] to have 1 channels, but got 3 channels instead
When I run my whole code and unsqueeze_(0) my data, like so: x_train = torch.from_numpy(x_train).unsqueeze_(0), I get this error:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 24 1 5 5, but got 5-dimensional input of size [1, 743, 50, 50, 3] instead
I can't figure out how to fill in the proper values in the layers. Will someone please help me find the correct values and understand how to reason about this? I do know that the output of one layer should be the input of the next layer, but nothing is matching up with what I thought I knew.
Thanks in advance!!
It seems you have the wrong order of axes in your input x tensor.
As you can see in the doc Conv2d input must be (N, C, H, W)
N is a batch size, C denotes a number of channels, H is a height of input planes in pixels, and W is width in pixels.
So, to make it right, use torch.permute to swap the axes in the forward pass.
...
def forward(self, x):
    x = x.permute(0, 3, 1, 2)
    ...
    return self.linear2(x)
...
Example of permute:
t = torch.rand(512, 50, 50, 3)
t.size()
torch.Size([512, 50, 50, 3])
t = t.permute(0, 3, 1, 2)
t.size()
torch.Size([512, 3, 50, 50])

Pytorch: Size Mismatch during running a test image through a trained CNN

I am going through tutorials to train/test a convolutional neural network (CNN), and I am having an issue with prepping a test image to run through the trained network. My initial guess is that it has something to do with having the correct format of the tensor input for the net.
Here is the code for the Net.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.init as I
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ## 1. This network takes in a square (same width and height), grayscale image as input
        ## 2. It ends with a linear layer that represents the keypoints
        ##    this last layer outputs 136 values, 2 for each of the 68 keypoint (x, y) pairs

        # input size 224 x 224
        # after the first conv layer, (W-F)/S + 1 = (224-5)/1 + 1 = 220
        # after one pool layer, this becomes (32, 110, 110)
        self.conv1 = nn.Conv2d(1, 32, 5)

        # maxpool layer
        # pool with kernel_size = 2, stride = 2
        self.pool = nn.MaxPool2d(2, 2)

        # second conv layer: 32 inputs, 64 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (110-3)/1 + 1 = 108
        ## output dimension: (64, 108, 108)
        ## after another pool layer, this becomes (64, 54, 54)
        self.conv2 = nn.Conv2d(32, 64, 3)

        # third conv layer: 64 inputs, 128 outputs, 3x3 conv
        ## output size = (W-F)/S + 1 = (54-3)/1 + 1 = 52
        ## output dimension: (128, 52, 52)
        ## after another pool layer, this becomes (128, 26, 26)
        self.conv3 = nn.Conv2d(64, 128, 3)

        self.conv_drop = nn.Dropout(p=0.2)
        self.fc_drop = nn.Dropout(p=0.4)

        # 128 outputs * 26x26 pooled map = 86528
        self.fc1 = nn.Linear(128*26*26, 1000)
        self.fc2 = nn.Linear(1000, 1000)
        self.fc3 = nn.Linear(1000, 136)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.conv_drop(x)
        # prep for linear layer: flattening
        x = x.view(x.size(0), -1)
        # two linear layers with dropout in between
        x = F.relu(self.fc1(x))
        x = self.fc_drop(x)
        x = self.fc2(x)
        x = self.fc_drop(x)
        x = self.fc3(x)
        return x
Maybe my calculation of the layer inputs is wrong?
And here is the test-running code block: (you can think of 'roi' as a standard numpy image.)
# loop over the detected faces from your haar cascade
for i, (x, y, w, h) in enumerate(faces):
    plt.figure(figsize=(10, 5))
    ax = plt.subplot(1, len(faces), i+1)

    # Select the region of interest that is the face in the image
    roi = image_copy[y:y+h, x:x+w]

    ## TODO: Convert the face region from RGB to grayscale
    roi = cv2.cvtColor(roi, cv2.COLOR_RGB2GRAY)

    ## TODO: Normalize the grayscale image so that its color range falls in [0,1] instead of [0,255]
    roi = np.multiply(roi, 1/255)

    ## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
    roi = cv2.resize(roi, (244, 244))
    roi = roi.reshape(roi.shape[0], roi.shape[1], 1)
    roi = roi.transpose((2, 0, 1))

    ## TODO: Change to tensor
    roi = torch.from_numpy(roi)
    roi = roi.type(torch.FloatTensor)
    roi = roi.unsqueeze(0)
    print(roi.shape)

    ## TODO: run it through the net
    output_pts = net(roi)
And I get the error message saying:
RuntimeError: size mismatch, m1: [1 x 100352], m2: [86528 x 1000] at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:2033
The caveat is that if I run my trained network in the provided test suite (where the tensor inputs are already prepped), it gives no errors and runs as it is supposed to. I think that means there's nothing wrong with the design of the network architecture itself, and that there's something wrong with the way I am prepping the image.
The output of the 'roi.shape' is:
torch.Size([1, 1, 244, 244])
Which should be okay because ([batch_size, color_channel, x, y]).
UPDATE: I have printed out the shapes of the layers while running through the net. It turns out the input dimensions reaching the first FC layer are different for my test image and for the given test images from the test suite. So I'm almost 80% sure that my prepping of the input image for the net is wrong. But how can they have different dimensions if the input tensors for both have the exact same dimension ([1,1,244,244])?
when using the provided test suite (where it runs fine):
input: torch.Size([1, 1, 224, 224])
layer before 1st CV: torch.Size([1, 1, 224, 224])
layer after 1st CV pool: torch.Size([1, 32, 110, 110])
layer after 2nd CV pool: torch.Size([1, 64, 54, 54])
layer after 3rd CV pool: torch.Size([1, 128, 26, 26])
flattend layer for the 1st FC: torch.Size([1, 86528])
When prepping/running the test image:
input: torch.Size([1, 1, 244, 244])
layer before 1st CV: torch.Size([1, 1, 244, 244])
layer after 1st CV pool: torch.Size([1, 32, 120, 120]) #<- what happened here??
layer after 2nd CV pool: torch.Size([1, 64, 59, 59])
layer after 3rd CV pool: torch.Size([1, 128, 28, 28])
flattend layer for the 1st FC: torch.Size([1, 100352])
Did you notice you have this line in the image preparation?
## TODO: Rescale the detected face to be the expected square size for your CNN (224x224, suggested)
roi = cv2.resize(roi, (244,244))
So you resized it to 244x244 and not to 224x224.
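A minimal fix for the prep step (a sketch, assuming the same roi, net and imports as above):
## Rescale the detected face to the size the network was designed for
roi = cv2.resize(roi, (224, 224))   # 224, not 244
roi = roi.reshape(1, 1, 224, 224)   # (batch, channel, H, W) in one step
roi = torch.from_numpy(roi).float()
output_pts = net(roi)               # flattened FC input is now 1 x 86528, matching fc1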
