TensorFlow vs PyTorch convolution confusion - python

I am confused on how to replicate Keras (TensorFlow) convolutions in PyTorch.
In Keras, I can do something like this (the input size is (256, 237, 1, 21) and the output size is (256, 237, 1, 1024)):
import tensorflow as tf
x = tf.random.normal((256,237,1,21))
y = tf.keras.layers.Conv1D(filters=1024, kernel_size=5,padding="same")(x)
print(y.shape)
(256, 237, 1, 1024)
However, in PyTorch, when I try to do the same thing I get a different output size:
import torch
import torch.nn as nn
x = torch.randn(256,237,1,21)
m = nn.Conv1d(in_channels=237, out_channels=1024, kernel_size=(1,5))
y = m(x)
print(y.shape)
torch.Size([256, 1024, 1, 17])
I want PyTorch to give me the same output size that Keras does.
This previous question seems to imply that Keras filters are PyTorch's out_channels, but that's what I have. I tried adding padding=(0,503) in PyTorch, but that gives me torch.Size([256, 1024, 1, 1023]), which is still not correct. This also takes much longer than Keras does, so I suspect I have assigned a parameter incorrectly.
How can I replicate what Keras did with convolution in PyTorch?

In TensorFlow, tf.keras.layers.Conv1D takes a tensor of shape (batch_shape + (steps, input_dim)), which means that what is commonly known as the channel dimension sits on the last axis. For instance, in 2D convolution you would have (batch, height, width, channels). This differs from PyTorch, where the channel dimension comes right after the batch axis: torch.nn.Conv1d takes inputs of shape (batch, channel, length). So you will need to permute the last two axes.
For torch.nn.Conv1d:
in_channels is the number of channels in the input tensor
out_channels is the number of filters, i.e. the number of channels the output will have
stride is the step size of the convolution
padding is the zero-padding added to both sides
Older PyTorch versions have no padding='same' option, so you need to choose the padding yourself. Here stride=1 and the kernel size is odd, so padding must equal kernel_size//2 (i.e. padding=2) in order to maintain the length of the tensor.
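The padding rule can be checked with plain arithmetic. The sketch below uses a hypothetical helper, conv1d_out_len, that is not part of PyTorch; it only mirrors the output-length formula given in the torch.nn.Conv1d documentation.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution with symmetric zero-padding.

    Hypothetical helper for illustration; it follows the formula from the
    torch.nn.Conv1d documentation.
    """
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Without padding, a length-21 axis shrinks when kernel_size=5.
print(conv1d_out_len(21, kernel_size=5))

# padding = kernel_size // 2 keeps the length for stride=1 and an odd kernel.
print(conv1d_out_len(237, kernel_size=5, padding=5 // 2))

# With an even kernel, the same rule overshoots by one element.
print(conv1d_out_len(237, kernel_size=4, padding=4 // 2))
```

The first call gives 17, which matches the last dimension of the output shape reported in the question.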
In your example, since x has a shape of (256, 237, 1, 21), in TensorFlow's terminology it will be considered as an input with:
a batch shape of (256, 237),
steps=1, so the length of your 1D input is 1,
21 input channels.
Whereas in PyTorch, x of shape (256, 237, 1, 21) would be:
batch shape of (256, 237),
1 input channel
a length of 21.
I have kept the input in both examples below (TensorFlow vs. PyTorch) as x.shape=(256, 237, 21), assuming 256 is the batch size, 237 is the length of the input sequence, and 21 is the number of channels (i.e. the input dimension, which I see as the dimension at each timestep).
In TensorFlow:
>>> x = tf.random.normal((256, 237, 21))
>>> m = tf.keras.layers.Conv1D(filters=1024, kernel_size=5, padding="same")
>>> y = m(x)
>>> y.shape
TensorShape([256, 237, 1024])
In PyTorch:
>>> x = torch.randn(256, 237, 21)
>>> m = nn.Conv1d(in_channels=21, out_channels=1024, kernel_size=5, padding=2)
>>> y = m(x.permute(0, 2, 1))
>>> y.permute(0, 2, 1).shape
torch.Size([256, 237, 1024])
So in PyTorch you would simply keep your data channels-first, i.e. work with x = torch.randn(256, 21, 237), and skip the permutes.
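To follow the shapes through the PyTorch version step by step, here is a small shape-only sketch. It does not call PyTorch; permute_shape and conv1d_out_shape are hypothetical helpers that just reproduce the shape bookkeeping.

```python
def permute_shape(shape, order):
    """Shape that results from reordering the axes (like Tensor.permute)."""
    return tuple(shape[i] for i in order)


def conv1d_out_shape(shape, out_channels, kernel_size, padding=0, stride=1):
    """Output shape of a channels-first 1D convolution on (batch, channels, length)."""
    batch, _in_channels, length = shape
    out_len = (length + 2 * padding - (kernel_size - 1) - 1) // stride + 1
    return (batch, out_channels, out_len)


x_shape = (256, 237, 21)                                # (batch, steps, channels), Keras layout
channels_first = permute_shape(x_shape, (0, 2, 1))      # (batch, channels, length)
y_shape = conv1d_out_shape(channels_first, out_channels=1024, kernel_size=5, padding=2)
channels_last = permute_shape(y_shape, (0, 2, 1))       # back to the Keras layout

print(channels_first, y_shape, channels_last)
```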

PyTorch now supports same convolution out of the box; take a look at this link: [Same convolution][1]
class InceptionNet(nn.Module):
    def __init__(self, in_channels, in_1x1, in_3x3reduce, in_3x3, in_5x5reduce, in_5x5, in_1x1pool):
        super(InceptionNet, self).__init__()
        self.incep_1 = ConvBlock(in_channels, in_1x1, kernel_size=1, padding='same')
Note that a same convolution only supports the default stride value of 1; any other stride won't work.
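As a rough sketch of what 'same' padding has to achieve for stride 1: the total padding must equal dilation * (kernel_size - 1). The helper names below are hypothetical, and the way the total is split between the two sides for even kernels is an assumption made for illustration.

```python
def same_padding(kernel_size, dilation=1):
    """Left/right zero-padding that keeps the length unchanged for stride 1.

    Hypothetical helper; putting the extra element on the right for even
    kernels is an assumption of this sketch.
    """
    total = dilation * (kernel_size - 1)
    left = total // 2
    return left, total - left


def padded_out_len(length, kernel_size, dilation=1):
    """Output length of a stride-1 convolution after applying same_padding."""
    left, right = same_padding(kernel_size, dilation)
    return length + left + right - dilation * (kernel_size - 1)


print(same_padding(5))          # symmetric padding for an odd kernel
print(same_padding(4))          # asymmetric padding for an even kernel
print(padded_out_len(237, 4))   # length is preserved in both cases
```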
[1]: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

Related

Matrix 2D on Convolutional Network

This may be a silly question, but I want to use a convolutional neural network in my deep reinforcement learning project and I ran into a problem I don't understand.
In my project I want to feed a 6x7 matrix into the network, which should be equivalent to a black-and-white picture of size 6x7 (42 pixels), right?
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Sequential()
        self.model.add_module("conv_1", torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=4, stride = 1))
        self.model.add_module("relu_1", torch.nn.ReLU())
        self.model.add_module("max_pool", torch.nn.MaxPool2d(2))
        self.model.add_module("conv_2", torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=4, stride = 1))
        self.model.add_module("relu_2", torch.nn.ReLU())
        self.model.add_module("flatten", torch.nn.Flatten())
        self.model.add_module("linear", torch.nn.Linear(in_features=16*16*16, out_features=7))

    def forward(self, x):
        x = self.model(x)
        return x
In conv_1, in_channels=1 because I have only one matrix (in image recognition that would mean one color channel). The other in_channels and out_channels values are more or less arbitrary up to the linear layer. I have no idea where I should specify the size of the matrix, but the final output should have size 7, which I set in the linear layer.
The error I get is:
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [6, 7]
There are a few problems with your code. First, the reason you're getting that error message is because the CNN is expecting a tensor with shape (N, Cin, Hin, Win), where:
N is the batch size
Cin is the number of input channels
Hin is the input image pixel height
Win is the input image pixel width
You're only providing the width and height dimensions. You need to explicitly add a channels and batch dimension, even if the value of those dimensions is only 1:
model = CNN()
example_input = torch.randn(size=(6, 7))  # this is your input image
print(example_input.shape)  # should be (6, 7)
output = model(example_input)  # raises your original error
example_input = example_input.unsqueeze(0).unsqueeze(0)  # adds batch and channels dimensions
print(example_input.shape)  # should now be (1, 1, 6, 7)
output = model(example_input)  # the input-dimension error is gone
You'll note, however, that you get a different error now:
RuntimeError: Calculated padded input size per channel: (1 x 2). Kernel size: (4 x 4). Kernel size can't be greater than actual input size
This is because after the first conv layer and the max pooling, your data has shape 1x2, but the kernel size of the second conv layer is 4, which makes the operation impossible. An input image of size 6x7 is quite small; either reduce the kernel size to something that works, or use larger images.
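The 1x2 figure can be reproduced by hand. The following is a shape-only sketch (no PyTorch needed); conv_out, pool_out and after_first_block are hypothetical helpers that apply the usual output-size formula for unpadded convolution and max pooling.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size, stride=None):
    """Spatial output size of max pooling; stride defaults to the kernel size."""
    stride = stride or kernel_size
    return (size - kernel_size) // stride + 1


def after_first_block(h, w, kernel_size):
    """Height and width after conv_1 followed by MaxPool2d(2)."""
    h, w = conv_out(h, kernel_size), conv_out(w, kernel_size)
    return pool_out(h, 2), pool_out(w, 2)


h, w = after_first_block(6, 7, kernel_size=4)
print((h, w))                    # spatial size that reaches conv_2
print(h >= 4 and w >= 4)         # can a 4x4 kernel still fit?
print(after_first_block(6, 7, kernel_size=2))
```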
Here's a working example:
import torch
from torch import nn


class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Sequential()
        self.model.add_module(
            "conv_1",
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=2, stride=1),
        )
        self.model.add_module("relu_1", torch.nn.ReLU())
        self.model.add_module("max_pool", torch.nn.MaxPool2d(2))
        self.model.add_module(
            "conv_2",
            torch.nn.Conv2d(in_channels=16, out_channels=16, kernel_size=2, stride=1),
        )
        self.model.add_module("relu_2", torch.nn.ReLU())
        self.model.add_module("flatten", torch.nn.Flatten())
        self.model.add_module("linear", torch.nn.Linear(in_features=32, out_features=7))

    def forward(self, x):
        x = self.model(x)
        return x


model = CNN()
x = torch.randn(size=(6, 7))
x = x.unsqueeze(0).unsqueeze(0)
output = model(x)
print(output.shape)  # has shape (1, 7)
Note that I changed kernel_size to 2, and the final linear layer now has an input size of 32. Also, the output has shape (1, 7); the 1 is the batch size, which in our case is only 1. If you want just the 7 output features, use output = torch.squeeze(output).
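For reference, the value 32 for in_features can be derived rather than guessed. This is a minimal sketch with a hypothetical helper, flatten_features, that hard-codes the layer sequence of the working example (conv, 2x2 max pool, conv, flatten).

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flatten_features(h, w, channels=16, kernel_size=2):
    """Number of features after conv_1 -> MaxPool2d(2) -> conv_2 -> Flatten."""
    h, w = conv_out(h, kernel_size), conv_out(w, kernel_size)   # conv_1
    h, w = (h - 2) // 2 + 1, (w - 2) // 2 + 1                   # MaxPool2d(2)
    h, w = conv_out(h, kernel_size), conv_out(w, kernel_size)   # conv_2
    return channels * h * w


print(flatten_features(6, 7))     # in_features for a 6x7 input
print(flatten_features(28, 28))   # the value changes with the input size
```

Because the result depends on the input size, the linear layer has to be sized for the specific input the network will receive.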

Error in transformation of EMNIST data through Pytorch

I was trying to train my model to classify EMNIST using PyTorch.
Edit: here is the link to the Colab notebook for the problem.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(28, 64, (5, 5), padding=2)
        self.conv1_bn = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64, 128, 2, padding=2)
        self.fc1 = nn.Linear(2048, 1024)
        self.dropout = nn.Dropout(0.3)
        self.fc2 = nn.Linear(1024, 512)
        self.bn = nn.BatchNorm1d(1)
        self.fc3 = nn.Linear(512, 128)
        self.fc4 = nn.Linear(128, 47)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = self.conv1_bn(x)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 2048)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        x = x.view(-1, 1, 512)
        x = self.bn(x)
        x = x.view(-1, 512)
        x = self.fc3(x)
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)
        return x
I am getting the error shown below whenever I train my model.
<ipython-input-11-07c68cf1cac2> in forward(self, x)
24 def forward(self, x):
25 x = F.relu(self.conv1(x))
---> 26 x = F.max_pool2d(x, 2, 2)
27 x = self.conv1_bn(x)
RuntimeError: Given input size: (64x28x1). Calculated output size: (64x14x0). Output size is too small
I searched for solutions and found that I should transform the data first, so I tried the most common suggestion:
transform_valid = transforms.Compose(
    [
        transforms.ToTensor(),
    ])
But then I get the message shown below. Maybe the problem lies in the transformation part.
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py:469: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1595629403081/work/torch/csrc/utils/tensor_numpy.cpp:141.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
I wanted to make that particular NumPy array writable using "ndarray.setflags(write=None, align=None, uic=None)", but I can't figure out where to do this or which array to make writable, since I'm loading the dataset directly using:
"datasets.EMNIST(root, split="balanced", train=False, download=True, transform=transform_valid)"
Welcome to Stack Overflow!
Your problem is not related to the ToTensor transform. The error comes from the dimensions of the tensor you feed into the max pool: it states that you are trying to max-pool a tensor with one dimension of size 1 (64, 28, 1), so the output would have a dimension of size 0 (64, 14, 0), which makes no sense.
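The numbers in the error message can be reproduced with shape arithmetic alone. The sketch below assumes (this is a guess about the input, not something confirmed by the question) that an unbatched 28x28x1 image is being read as channels=28, height=28, width=1; conv_out and pool_out are hypothetical helpers.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size, stride):
    """Spatial output size of max pooling along one axis (floor mode)."""
    return (size - kernel_size) // stride + 1


# Assumption: a 28x28x1 tensor interpreted as (C=28, H=28, W=1).
h, w = 28, 1

# conv1: 64 filters, 5x5 kernel, padding=2 keeps both spatial sizes.
h, w = conv_out(h, 5, padding=2), conv_out(w, 5, padding=2)
after_conv = (64, h, w)

# F.max_pool2d(x, 2, 2): a width of 1 cannot fit a 2-wide window.
after_pool = (64, pool_out(h, 2, 2), pool_out(w, 2, 2))

print(after_conv, after_pool)
```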
You need to check the dimensions of the tensors you feed into your model. They are definitely too small. Maybe you made a mistake with a view somewhere (hard to tell without a minimal reproducible example).
If I can try to guess, you start with a tensor of size 28x28x1 (typical MNIST) and feed it into a convolution that expects a tensor of dims BxCxHxW (batch_size, channels, height, width), i.e. something like (B, 1, 28, 28), but you confused the width (28) with the input channels (nn.Conv2d(->28<-, 64, (5, 5), padding=2)).
I believe you want your first layer to be nn.Conv2d(1, 64, (5, 5), padding=2), and you need to reshape your tensors to (B, 1, 28, 28) (the value of B is up to you) before giving them to the network.
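With the first layer changed as suggested, it is worth tracing the remaining shapes as well. The sketch below is shape arithmetic only and assumes a (B, 1, 28, 28) input with the layer settings from the question; emnist_feature_shape is a hypothetical helper.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size, stride):
    """Spatial output size of max pooling along one axis (floor mode)."""
    return (size - kernel_size) // stride + 1


def emnist_feature_shape(size=28):
    """(channels, height, width) that reaches the first linear layer."""
    size = conv_out(size, 5, padding=2)   # conv1: 5x5 kernel, padding=2
    size = pool_out(size, 2, 2)           # max_pool2d(x, 2, 2)
    size = conv_out(size, 2, padding=2)   # conv2: 2x2 kernel, padding=2
    size = pool_out(size, 2, 2)           # max_pool2d(x, 2, 2)
    return (128, size, size)


channels, height, width = emnist_feature_shape()
print((channels, height, width), channels * height * width)
```

Under these assumptions the flattened size comes out different from the 2048 used in x.view(-1, 2048), so printing x.shape before the view is a sensible check.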
Sidenote: the warning about writable NumPy arrays is completely unrelated. It just means that writing to the tensor could modify the underlying "non-writable" NumPy array. If you don't care about this array being modified, you can ignore the warning.

How to restructure the output tensor of a cnn layer for use by a linear layer in a simple pytorch model

Given a pytorch input dataset with dimensions:
dat.shape = torch.Size([128, 3, 64, 64])
This is a supervised learning problem: we have a separate labels.txt file containing one of C classes for each input observation. The value of C is the number of distinct values in the labels file and is presently in the single digits.
I could use assistance on how to mesh the layers of a simple network that mixes convolutional and linear layers and performs multiclass classification. The intent is to pass through:
two cnn layers with maxpooling after each
a linear "readout" layer
softmax activation before the output/labels
Here is the core of my (faulty/broken) network. I am unable to determine the proper size/shape required of:
Output of Convolutional layer -> Input of Linear [Readout] layer
class CNNClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3,padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        self.linear1 = nn.Linear(32*16*16, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.linear1(x) # Size mismatch error HERE
        x = self.softmax1(x)
        return x
Training of the model is started by:
Xout = model(dat)
This results in:
RuntimeError: size mismatch, m1: [128 x 1568], m2: [8192 x 6]
at the linear1 input. What is needed here? Note that I have seen uses of wildcard input sizes, e.g. via a view:
..
x = x.view(x.size(0), -1)
x = self.linear1(x) # Size mismatch error HERE
If that is included then the error changes to
RuntimeError: size mismatch, m1: [28672 x 7], m2: [8192 x 6]
Some pointers on how to think about and calculate the cnn layer / linear layer input/output sizes would be much appreciated.
The error
You have miscalculated the output size from convolutional stack. It is actually [batch, 32, 7, 7] instead of [batch, 32, 16, 16].
You have to use reshape (or view), as the output of Conv2d has 4 dimensions ([batch, channels, height, width]), while nn.Linear operates on the last dimension only and here expects a 2-dimensional input ([batch, features]).
Use this for nn.Linear:
self.linear1 = nn.Linear(32 * 7 * 7, C)
And this in forward:
x = self.linear1(x.view(x.shape[0], -1))
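The 7x7 figure follows from applying the output-size formula layer by layer. Here is a shape-only sketch; conv_out, pool_out and trace_sizes are hypothetical helpers, and the pooling stride is taken to equal the kernel size, which is the MaxPool2d default.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size, padding=0, stride=None):
    """Spatial output size of max pooling; stride defaults to the kernel size."""
    stride = stride or kernel_size
    return (size + 2 * padding - kernel_size) // stride + 1


def trace_sizes(size=64):
    """Spatial size after each layer of conv1 -> maxpool -> conv2 -> maxpool."""
    sizes = [size]
    sizes.append(conv_out(sizes[-1], 3))              # conv1, 3x3 kernel
    sizes.append(pool_out(sizes[-1], 3, padding=1))   # maxpool, kernel 3, padding 1
    sizes.append(conv_out(sizes[-1], 3))              # conv2, 3x3 kernel
    sizes.append(pool_out(sizes[-1], 3, padding=1))   # maxpool again
    return sizes


sizes = trace_sizes(64)
print(sizes)
print(32 * sizes[-1] * sizes[-1])   # in_features for nn.Linear
```

The final product matches the 1568 that appears in the size-mismatch error from the question.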
Other possibilities
Many recent architectures pool over the spatial dimensions of each channel (usually called global pooling). In PyTorch there is torch.nn.AdaptiveAvgPool2d (or its max-pooling counterpart, AdaptiveMaxPool2d). This approach lets you vary the height and width of your input image, as only one value per channel is fed to nn.Linear. This is how it looks:
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNClassifier(torch.nn.Module):
    def __init__(self, C=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)
        self.maxpool = nn.MaxPool2d(kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3)
        # Reduce every channel to a single value, whatever the spatial size
        self.pooling = torch.nn.AdaptiveAvgPool2d(output_size=1)
        self.linear1 = nn.Linear(32, C)
        self.softmax1 = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxpool(F.leaky_relu(x))
        x = self.conv2(x)
        x = self.maxpool(F.leaky_relu(x))
        # (batch, 32, 1, 1) -> (batch, 32)
        x = self.linear1(self.pooling(x).view(x.shape[0], -1))
        x = self.softmax1(x)
        return x
So now images of torch.Size([128, 3, 64, 64]) and torch.Size([128, 3, 128, 128]) can be passed to the network.
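To see why the input size no longer matters, here is a pure-Python sketch of global average pooling on nested lists. The function global_avg_pool is a hypothetical stand-in for illustration, not the PyTorch implementation.

```python
def global_avg_pool(feature_maps):
    """Average each channel over its spatial grid.

    feature_maps is a list of channels, each a 2D list (height x width).
    Returns one value per channel, regardless of height and width.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two channels of size 2x2.
small = [[[1.0, 3.0], [5.0, 7.0]],
         [[2.0, 2.0], [2.0, 2.0]]]

# Two channels of size 3x4.
large = [[[1.0] * 4 for _ in range(3)],
         [[0.0, 6.0, 0.0, 6.0] for _ in range(3)]]

print(global_avg_pool(small))
print(global_avg_pool(large))
```

Both inputs yield a list with one entry per channel, which is why the following linear layer can keep a fixed in_features equal to the channel count.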
So the issue is with the way you defined nn.Linear. You set the input size to 32*16*16, which is not the shape of the output feature map: 16 and 32 are the channel counts that the Conv2d layers expect as input and produce as output, not spatial sizes.
If you add print(x.shape) before the fully connected layer, you will get:
torch.Size([Batch, 32, 7, 7])
So your calculation should have been 7*7*32:
self.linear1 = nn.Linear(32*7*7, C)
And then using:
x = x.view(x.size(0), -1)
x = self.linear1(x)
this will work perfectly fine. You can read about what view does in: How does the "view" method work in PyTorch?
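As a rough illustration of what the -1 in view does, the sketch below infers the missing dimension from the total element count. infer_view_shape is a hypothetical helper that operates on plain shape tuples, not on tensors.

```python
from math import prod


def infer_view_shape(shape, target):
    """Resolve a single -1 in target so that the element count matches shape."""
    if target.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    total = prod(shape)
    known = prod(d for d in target if d != -1)
    if -1 in target:
        if total % known != 0:
            raise ValueError("shape is not compatible with the element count")
        return tuple(total // known if d == -1 else d for d in target)
    if known != total:
        raise ValueError("shape is not compatible with the element count")
    return tuple(target)


# x.view(x.size(0), -1) on the conv output keeps the batch axis and flattens the rest.
print(infer_view_shape((128, 32, 7, 7), (128, -1)))
```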

Determining the dimensions of weights for a 3-d Convolution with a 4-d Kernel

As the title says I'm looking at determining the proper dimensions for my CNN architecture. First, I obtain the next element of my dataset:
train_ds = iter(model.train_dataset)
feature, label = next(train_ds)
Where feature has dimensions (32, 64, 64, 4), corresponding to a batch size of 32, a height of 64, a length of 64, and an extended batch size of 4 (not a channel dimension). I initialize a 4-d kernel to pass over my 3-d matrix, as I do not want the extended batch dimension to be convolved. What I mean is that in practice I want a 2-d kernel of size (1, 1) to pass over each 64 x 64 image, and to do the same for every entry of the extended batch without convolving the extended batch entries together. So I am in fact doing a (1, 1) convolution for each image, in parallel. So far I was able to initialize the kernel and feed it to conv2d like so:
kernel = tf.constant(np.ones((1, 1, 4, 4)), dtype=tf.float32)
output = tf.nn.conv2d(feature, kernel, strides=[1, 1, 1, 1], padding='SAME')
Doing this produces my expected output, (32, 64, 64, 4). But I have absolutely no idea how to initialize the weights so that they work with this architecture. I have something like this:
w_init = tf.random_normal_initializer()
input_dim = (4, 1, 1, 4)
w = tf.Variable(
    initial_value=w_init(shape=(input_dim), dtype="float32"),
    trainable=True)
tf.matmul(output, w)
But I'm receiving an incompatible batch dimensions error, as I don't know what input_dim should be. I know it should be something like (num_filters * filter_size * filter_size * num_channels) + num_filters according to this answer, but I'm pretty sure that doesn't work for my scenario.
After tinkering around, I was able to come up with a solution when the weights have shape (1, 1, 4, 4), or (num_filters * num_channels * filter_size * filter_size). If anyone wants to provide a mathematical or similar explanation, it would be much appreciated!
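One way to think about it: a 1x1 convolution with a filter of shape (1, 1, C_in, C_out) is a matrix product over the last axis, applied independently at every pixel. The pure-Python sketch below shows this for a single pixel; conv1x1_pixel and the example values are hypothetical.

```python
def conv1x1_pixel(pixel, kernel):
    """Apply a 1x1 convolution at one pixel.

    pixel is a list of C_in values; kernel is a C_in x C_out matrix, i.e. the
    last two axes of a (1, 1, C_in, C_out) filter.
    """
    c_out = len(kernel[0])
    return [sum(pixel[i] * kernel[i][j] for i in range(len(pixel)))
            for j in range(c_out)]


pixel = [1.0, 2.0, 3.0, 4.0]   # the 4 values along the last axis at one pixel

ones = [[1.0] * 4 for _ in range(4)]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

print(conv1x1_pixel(pixel, ones))       # every output is the sum of all 4 inputs
print(conv1x1_pixel(pixel, identity))   # every output depends on one input only
```

Note that the all-ones kernel mixes the four entries of the last axis together. If those entries must stay independent, a diagonal kernel like the identity above is one option that keeps them separate; this is an observation about the arithmetic, not a claim about the original model.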

Why conv2d in tensorflow gives an output has the same shape as input

According to this Deep Learning course (http://cs231n.github.io/convolutional-networks/#conv), if an input x with shape [W,W] (where W = width = height) goes through a convolutional layer with filter shape [F,F] and stride S, the layer returns an output with shape [(W-F)/S +1, (W-F)/S +1].
However, when I follow the TensorFlow tutorial (https://www.tensorflow.org/versions/r0.11/tutorials/mnist/pros/index.html), the function tf.nn.conv2d(inputs, filter, stride) seems to behave differently.
No matter how I change my filter size, conv2d always returns a value with the same shape as the input.
In my case, I am using the MNIST dataset, in which every image has size [28,28] (ignoring channel_num = 1),
but after defining the first conv1 layer, I used conv1.get_shape() to inspect its output, and it gives me [28,28, num_of_filters].
Why is this? I thought the return value should follow the formula above.
Appendix: Code snippet
#reshape x from 2d to 4d
x_image = tf.reshape(x, [-1, 28, 28, 1]) #[num_samples, width, height, channel_num]
## define the shape of weights and bias
w_shape = [5, 5, 1, 32] #patch_w, patch_h, in_channel, output_num(out_channel)
b_shape = [32] #bias only need to be consistent with output_num
## init weights of conv1 layers
W_conv1 = weight_variable(w_shape)
b_conv1 = bias_variable(b_shape)
## first layer x_image->conv1/relu->pool1
#Our convolutions uses a stride of one
#and are zero padded
#so that the output is the same size as the input
h_conv1 = tf.nn.relu(
    conv2d(x_image, W_conv1) + b_conv1
)
print 'conv1.shape=',h_conv1.get_shape()
## conv1.shape= (?, 28, 28, 32)
## I thought conv1.shape should be (?, (28-5)/1+1, 24 ,32)
h_pool1 = max_pool_2x2(h_conv1) #output 32 num
print 'pool1.shape=',h_pool1.get_shape() ## pool1.shape= (?, 14, 14, 32)
It depends on the padding parameter. 'SAME' keeps the output at WxW (assuming stride=1), while 'VALID' shrinks the output to (W-F+1)x(W-F+1).
conv2d has a parameter called padding (see here).
If you set padding to "VALID", it will satisfy your formula. The conv2d helper in the tutorial uses "SAME", which pads the image with zeroes (like adding a border around it) so that the output keeps the same shape as the input.
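The two padding modes can be compared with a small calculation. This is a sketch of the output-size rules TensorFlow documents for 'SAME' and 'VALID'; conv2d_out_size is a hypothetical helper, not a TensorFlow function.

```python
import math


def conv2d_out_size(size, filter_size, stride=1, padding="SAME"):
    """Output size along one spatial axis for TensorFlow-style padding modes."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv2d_out_size(28, 5, padding="SAME"))    # output keeps the input size
print(conv2d_out_size(28, 5, padding="VALID"))   # matches (W - F) / S + 1
```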
