How padding=zeros works in PyTorch's functional.conv1d - python

The following code gives an output of shape (1, 1, 3) when the shape of xodd is (1, 1, 2). The given kernel shape is (112, 1, 1).
from torch.nn import functional as F
output = F.conv1d(xodd, kernel, padding=zeros)
How does padding=zeros work?
Also, how can I write equivalent code in TensorFlow so that the output is the same as above?

What is padding=zeros?
If we set padding=0, no numbers are added at the left or the right of the tensor.
Padding=0:
from torch.nn import functional as F
import torch
inputs = torch.randn(33, 16, 6) # (minibatch,in_channels,features)
filters = torch.randn(20, 16, 5) # (out_channels, in_channels, kernel_size)
out_tns = F.conv1d(inputs, filters, stride=1, padding=0)
print(out_tns.shape)
# torch.Size([33, 20, 2]) # (minibatch,out_channels,(features-kernel_size+1))
Padding=2 (two zeros are added at the left and the right of the tensor):
inputs = torch.randn(33, 16, 6) # (minibatch,in_channels,features)
filters = torch.randn(20, 16, 5) # (out_channels, in_channels, kernel_size)
out_tns = F.conv1d(inputs, filters, stride=1, padding=2)
print(out_tns.shape)
# torch.Size([33, 20, 6]) # (minibatch,out_channels,(features-kernel_size+1+2+2))
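To see what padding=2 does, you can reproduce it by hand with F.pad; a minimal sketch of the equivalence (standard zero-padding, not part of the original answer):
import torch
from torch.nn import functional as F

inputs = torch.randn(33, 16, 6)
filters = torch.randn(20, 16, 5)

# Pad two zeros on each end of the feature dimension, then convolve with padding=0.
padded = F.pad(inputs, (2, 2))  # shape: (33, 16, 10)
out_manual = F.conv1d(padded, filters, stride=1, padding=0)
out_auto = F.conv1d(inputs, filters, stride=1, padding=2)
print(torch.allclose(out_manual, out_auto))  # True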
How to write equivalent code in TensorFlow:
import tensorflow as tf

input_shape = (33, 6, 16)  # (minibatch, features, in_channels) - TF is channels-last
x = tf.random.normal(input_shape)
out_tf = tf.keras.layers.Conv1D(filters=20,
                                kernel_size=5,
                                strides=1,
                                input_shape=input_shape[1:])(x)
print(out_tf.shape)
# TensorShape([33, 2, 20])

# If you want the tensor to have exactly the same shape as in PyTorch, transpose:
tf.transpose(out_tf, [0, 2, 1]).shape
# TensorShape([33, 20, 2])
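If you also want the padding=2 case from above in Keras: for kernel_size=5 and strides=1, padding='same' pads two values on each side, which matches PyTorch's padding=2. A sketch, assuming the same x as above:
out_tf_same = tf.keras.layers.Conv1D(filters=20,
                                     kernel_size=5,
                                     strides=1,
                                     padding='same')(x)
print(out_tf_same.shape)
# TensorShape([33, 6, 20])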

Related

Tensorflow Multi Head Attention on Inputs: 4 x 5 x 20 x 64 with attention_axes=2 throwing mask dimension error (tf 2.11.0)

The expectation here is that the attention is applied on the 2nd dimension of (4, 5, 20, 64). I am trying to apply self-attention using the following code (the issue is reproducible with this code):
import numpy as np
import tensorflow as tf
from keras import layers as tfl

class Encoder(tfl.Layer):
    def __init__(self):
        super().__init__()
        self.embed_layer = tfl.Embedding(4500, 64, mask_zero=True)
        self.attn_layer = tfl.MultiHeadAttention(num_heads=2,
                                                 attention_axes=2,
                                                 key_dim=16)

    def call(self, x):
        # Input shape: (4, 5, 20) (Batch size: 4)
        x = self.embed_layer(x)  # Output: (4, 5, 20, 64)
        x = self.attn_layer(query=x, key=x, value=x)  # Output: (4, 5, 20, 64)
        return x

eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
enc = Encoder()
enc(eg_input)
However, the layer defined above throws the following error. Could someone please explain why this is happening and how to fix it?
{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [4,5,2,20,20] vs. [4,5,1,5,20] [Op:AddV2]
Call arguments received by layer 'softmax_2' (type Softmax):
• inputs=tf.Tensor(shape=(4, 5, 2, 20, 20), dtype=float32)
• mask=tf.Tensor(shape=(4, 5, 1, 5, 20), dtype=bool)
PS: If I set mask_zero=False when defining the embedding layer, the code runs fine as expected, without any issues.
Just concatenate the input along axis=0:
import numpy as np
import tensorflow as tf
from keras import layers as tfl

class Encoder(tfl.Layer):
    def __init__(self):
        super().__init__()
        self.embed_layer = tfl.Embedding(4500, 64, mask_zero=True)
        self.attn_layer = tfl.MultiHeadAttention(num_heads=2,
                                                 key_dim=16,
                                                 attention_axes=2)

    def call(self, x):
        x = self.embed_layer(x)  # Output: (4, 5, 20, 64)
        x = tf.concat(x, axis=0)
        x, attention_scores = self.attn_layer(
            query=x, key=x, value=x,
            return_attention_scores=True)  # Output: (4, 5, 20, 64)
        return x, attention_scores

eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
enc = Encoder()
scores, attentions = enc(eg_input)
scores.shape, attentions.shape
# (TensorShape([4, 5, 20, 64]), TensorShape([4, 5, 2, 20, 20]))
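To see where the incompatible shapes in the error come from, you can inspect the mask that the embedding layer propagates. A minimal sketch (the diagnosis is inferred from the error message above, not stated in the original answer):
import numpy as np
import tensorflow as tf
from keras import layers as tfl

eg_input = tf.constant(np.random.randint(0, 150, (4, 5, 20)))
emb = tfl.Embedding(4500, 64, mask_zero=True)
_ = emb(eg_input)  # build the layer
# The propagated boolean mask has shape (4, 5, 20); with attention_axes=2 the
# attention scores have shape (4, 5, 2, 20, 20), and the two cannot be broadcast.
print(emb.compute_mask(eg_input).shape)  # (4, 5, 20)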

Output Dimensions of convolution in PyTorch

The size of my input images is 68 x 224 x 3 (HxWxC), and the first Conv2d layer is defined as
conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9)).
Why is the size of the output feature volume 16 x 15 x 54? I get that there are 16 filters, so there is a 16 in front, but when I use [(W−K+2P)/S]+1 to calculate the dimensions, they do not come out as whole numbers.
Can someone please explain?
The formula for the feature-map size is [(W−K+2P)/S]+1, where the brackets [] denote the floor operation. In your example the padding is zero, so the calculation is [(68-9+2*0)/4]+1 = [14.75]+1 = 14+1 = 15 and [(224-9+2*0)/4]+1 = [53.75]+1 = 53+1 = 54.
import torch
conv1 = torch.nn.Conv2d(3, 16, stride=4, kernel_size=(9,9))
input = torch.rand(1, 3, 68, 224)
print(conv1(input).shape)
# torch.Size([1, 16, 15, 54])
You may see different formulas to calculate feature-map sizes. In the PyTorch documentation the output size is given as
H_out = [(H_in + 2*padding − dilation*(kernel_size − 1) − 1)/stride] + 1
while in general you may see
H_out = [(H_in − K + 2P)/S] + 1
With dilation = 1, both formulas give the same result.
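As a quick sanity check, the formula is easy to evaluate in plain Python. A minimal sketch (the helper name conv_out is just for illustration):
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

print(conv_out(68, 9, stride=4))   # 15
print(conv_out(224, 9, stride=4))  # 54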
I was having the same kind of trouble estimating the output size of a tensor after a convolutional layer, so I implemented a helper function: https://github.com/tuttelikz/conv_output_size.
Example:
import torch
import torch.nn as nn
from conv_output_size import conv2d_output_size

c_i, c_o = 3, 16
k, s, p = 3, 2, 1

sample_2d_tensor = torch.ones((c_i, 64, 64))
c2d = nn.Conv2d(in_channels=c_i, out_channels=c_o, kernel_size=k,
                stride=s, padding=p)

output_size = conv2d_output_size(
    sample_2d_tensor.shape, out_channels=c_o, kernel_size=k, stride=s, padding=p)

print("After conv2d")
print("Dummy input size:", sample_2d_tensor.shape)
print("Calculated output size:", output_size)
print("Real output size:", c2d(sample_2d_tensor).detach().numpy().shape)

>>> After conv2d
>>> Dummy input size: torch.Size([3, 64, 64])
>>> Calculated output size: (16, 32, 32)
>>> Real output size: (16, 32, 32)

What is the difference between unsqueeze_ in PyTorch and expand_dims in Keras, and what will be the shape of the output after using it?

I am a beginner in Keras and I have PyTorch code that I need to convert to Keras, but I could not understand some parts of it, especially the sizes of the output shapes. The shape of the image is (:, 3, 32, 32), where the first dimension is the batch size. Now, my questions are: what does this line do, and what is the output shape?
image_yuv_ch = image[:, channel, :, :].unsqueeze_(1)
Does it add a dimension at position 1? What is the output shape?
The size of filters was (64, 8, 8) and then we have filters.unsqueeze_(1); does this mean the new shape of filters is (64, 1, 8, 8)?
What does this line do: image_conv = F.conv2d(image_yuv_ch, filters, stride=8)? Is it the same as conv2d in Keras, and what is the shape of the output tensor? I also could not understand what view does. I know it presents the tensor with a new shape, but in the code below I could not follow the output shape after each unsqueeze_, permute, or view. Could you please tell me the output shape of each line? Thank you in advance.
import torch
import torch.nn.functional as F

def apply_conv(self, image, filter_type: str):
    if filter_type == 'dct':
        filters = self.dct_conv_weights
    elif filter_type == 'idct':
        filters = self.idct_conv_weights
    else:
        raise ValueError('Unknown filter_type value.')

    image_conv_channels = []
    for channel in range(image.shape[1]):
        image_yuv_ch = image[:, channel, :, :].unsqueeze_(1)
        image_conv = F.conv2d(image_yuv_ch, filters, stride=8)
        image_conv = image_conv.permute(0, 2, 3, 1)
        image_conv = image_conv.view(image_conv.shape[0], image_conv.shape[1],
                                     image_conv.shape[2], 8, 8)
        image_conv = image_conv.permute(0, 1, 3, 2, 4)
        image_conv = image_conv.contiguous().view(image_conv.shape[0],
                                                  image_conv.shape[1] * image_conv.shape[2],
                                                  image_conv.shape[3] * image_conv.shape[4])
        image_conv.unsqueeze_(1)
        # image_conv = F.conv2d()
        image_conv_channels.append(image_conv)
    image_conv_stacked = torch.cat(image_conv_channels, dim=1)
    return image_conv_stacked
It seems like you are a Keras or Tensorflow user trying to learn PyTorch.
You should go to the PyTorch documentation website to understand more about each operation.
unsqueeze adds a dimension of size 1 to the tensor at the given position. The underscore in unsqueeze_() means it is the in-place version of the function.
view() can be understood as .reshape() in Keras.
permute() switches multiple dimensions of a tensor at once. For example:
x = torch.randn(1, 2, 3)  # shape [1, 2, 3]
x = x.permute(2, 0, 1)    # shape [3, 1, 2]
To know the shape of the tensor after each operation, simply add print(x.size()). For example:
image_conv = image_conv.permute(0, 2, 3, 1)
print(image_conv.size())
image_conv = image_conv.view(image_conv.shape[0], image_conv.shape[1],
                             image_conv.shape[2], 8, 8)
print(image_conv.size())
image_conv = image_conv.permute(0, 1, 3, 2, 4)
print(image_conv.size())
The big difference between PyTorch and Tensorflow (the back end of Keras) is that PyTorch generates a dynamic graph rather than a static graph as Tensorflow does. Also, your way of defining the model would not work properly in PyTorch, since the conv weights would not be registered in model.parameters() and therefore could not be optimized during backpropagation.
One more comment: please check this link to learn how to define a proper model in PyTorch:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
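As a quick usage check (a minimal sketch, not part of the linked tutorial): the layers registered in __init__ show up in model.parameters(), which is what the optimizer needs.
import torch

model = Model()
out = model(torch.randn(1, 1, 32, 32))
print(out.shape)                      # torch.Size([1, 20, 24, 24])
print(len(list(model.parameters())))  # 4: weight and bias for conv1 and conv2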
The code for the comment:
import torch

x = torch.randn(8, 3, 32, 32)
print(x.shape)
# torch.Size([8, 3, 32, 32])
channel = 1
y = x[:, channel, :, :]
print(y.shape)
# torch.Size([8, 32, 32])
y = y.unsqueeze_(1)
print(y.shape)
# torch.Size([8, 1, 32, 32])
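To answer the shape questions directly, here is a minimal walk-through of the apply_conv body for a single channel (a sketch, assuming a batch of 8 images of size 32x32 and 64 filters of size 8x8, as in the question):
import torch
import torch.nn.functional as F

image = torch.randn(8, 3, 32, 32)
filters = torch.randn(64, 8, 8).unsqueeze_(1)  # (64, 1, 8, 8)

ch = image[:, 0, :, :].unsqueeze_(1)       # (8, 1, 32, 32)
conv = F.conv2d(ch, filters, stride=8)     # (8, 64, 4, 4): 64 maps of 4x4 blocks
conv = conv.permute(0, 2, 3, 1)            # (8, 4, 4, 64)
conv = conv.view(8, 4, 4, 8, 8)            # (8, 4, 4, 8, 8): split 64 into 8x8
conv = conv.permute(0, 1, 3, 2, 4)         # (8, 4, 8, 4, 8)
conv = conv.contiguous().view(8, 32, 32)   # (8, 32, 32): blocks tiled back
print(conv.unsqueeze_(1).shape)            # torch.Size([8, 1, 32, 32]), ready for torch.cat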
Hope this helps and enjoy your learning!

Comparing Conv2D with padding between Tensorflow and PyTorch

I am trying to import weights saved from a Tensorflow model into PyTorch. So far the results have been very similar, but I ran into a snag when the model calls for conv2d with stride=2.
To verify the mismatch, I set up a very simple comparison between TF and PyTorch. First, I compare conv2d with stride=1.
import tensorflow as tf
import numpy as np
import torch
import torch.nn.functional as F

np.random.seed(0)
sess = tf.Session()

# Create random weights and input
weights = torch.empty(3, 3, 3, 8)
torch.nn.init.constant_(weights, 5e-2)
x = np.random.randn(1, 3, 10, 10)

weights_tf = tf.convert_to_tensor(weights.numpy(), dtype=tf.float32)
# PyTorch adopts [outputC, inputC, kH, kW]
weights_torch = torch.Tensor(weights.permute((3, 2, 0, 1)))

# Tensorflow defaults to NHWC
x_tf = tf.convert_to_tensor(x.transpose((0, 2, 3, 1)), dtype=tf.float32)
x_torch = torch.Tensor(x)

# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
                         weights_tf,
                         strides=[1, 1, 1, 1],
                         padding="SAME")

# PyTorch Conv2D
torch_conv2d = F.conv2d(x_torch, weights_torch, padding=1, stride=1)

sess.run(tf.global_variables_initializer())
tf_result = sess.run(tf_conv2d)

diff = np.mean(np.abs(tf_result.transpose((0, 3, 1, 2)) - torch_conv2d.detach().numpy()))
print('Mean of Abs Diff: {0}'.format(diff))
The result of this execution is:
Mean of Abs Diff: 2.0443112092038973e-08
When I change the stride to 2, the results start to vary.
# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
                         weights_tf,
                         strides=[1, 2, 2, 1],
                         padding="SAME")

# PyTorch Conv2D
torch_conv2d = F.conv2d(x_torch, weights_torch, padding=1, stride=2)
The result of this execution is:
Mean of Abs Diff: 0.2104552686214447
According to the PyTorch documentation, conv2d uses zero-padding defined by the padding argument, so zeros are added to the left, top, right, and bottom of the input in my example.
If PyTorch simply adds padding on both sides based on the input parameter, it should be easy to replicate in Tensorflow.
# Manually add padding - consistent with PyTorch
paddings = tf.constant([[0, 0], [1, 1], [1, 1], [0, 0]])
x_tf = tf.convert_to_tensor(x.transpose((0, 2, 3, 1)), dtype=tf.float32)
x_tf = tf.pad(x_tf, paddings, "CONSTANT")

# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
                         weights_tf,
                         strides=[1, 2, 2, 1],
                         padding="VALID")
The result of this comparison is:
Mean of Abs Diff: 1.6035047067930464e-08
What this tells me is that if I can replicate the default padding behavior of Tensorflow in PyTorch, then my results will be similar.
This question inspected the behavior of padding in Tensorflow. The TF documentation explains how padding is added for "SAME" convolutions: the total padding is chosen so that the output size is ceil(input/stride), and when the total is odd the extra pixel goes to the bottom/right. In this example (input 10, kernel 3, stride 2) TF pads a total of 1 pixel, only at the bottom/right, while PyTorch's padding=1 pads one pixel on every side, which explains the mismatch. I discovered these links while writing this question.
Now that I know the padding strategy of Tensorflow, I can implement it in PyTorch.
To replicate the behavior, the padding sizes are calculated as described in the Tensorflow documentation. Here, I test the padding behavior by setting stride=2 and padding the PyTorch input.
import tensorflow as tf
import numpy as np
import torch
import torch.nn.functional as F

np.random.seed(0)
sess = tf.Session()

# Create random weights and input
weights = torch.empty(3, 3, 3, 8)
torch.nn.init.constant_(weights, 5e-2)
x = np.random.randn(1, 3, 10, 10)

weights_tf = tf.convert_to_tensor(weights.numpy(), dtype=tf.float32)
weights_torch = torch.Tensor(weights.permute((3, 2, 0, 1)))

# Tensorflow padding behavior. Assuming that kH == kW to keep this simple.
stride = 2
if x.shape[2] % stride == 0:
    pad = max(weights.shape[0] - stride, 0)
else:
    pad = max(weights.shape[0] - (x.shape[2] % stride), 0)

if pad % 2 == 0:
    pad_val = pad // 2
    padding = (pad_val, pad_val, pad_val, pad_val)
else:
    pad_val_start = pad // 2
    pad_val_end = pad - pad_val_start
    padding = (pad_val_start, pad_val_end, pad_val_start, pad_val_end)

x_tf = tf.convert_to_tensor(x.transpose((0, 2, 3, 1)), dtype=tf.float32)
x_torch = torch.Tensor(x)
x_torch = F.pad(x_torch, padding, "constant", 0)

# TF Conv2D
tf_conv2d = tf.nn.conv2d(x_tf,
                         weights_tf,
                         strides=[1, stride, stride, 1],
                         padding="SAME")

# PyTorch Conv2D
torch_conv2d = F.conv2d(x_torch, weights_torch, padding=0, stride=stride)

sess.run(tf.global_variables_initializer())
tf_result = sess.run(tf_conv2d)

diff = np.mean(np.abs(tf_result.transpose((0, 3, 1, 2)) - torch_conv2d.detach().numpy()))
print('Mean of Abs Diff: {0}'.format(diff))
The output is:
Mean of Abs Diff: 2.2477470551507395e-08
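For reuse, the padding logic above can be wrapped in a small helper. A minimal sketch (pad_same is an illustrative name; it assumes a square kernel and NCHW input, as in the example):
import torch.nn.functional as F

def pad_same(x, kernel_size, stride):
    # Pad so the output size is ceil(input / stride), with any odd pixel
    # going to the bottom/right, as TF "SAME" padding does.
    in_size = x.shape[2]
    if in_size % stride == 0:
        pad = max(kernel_size - stride, 0)
    else:
        pad = max(kernel_size - (in_size % stride), 0)
    start, end = pad // 2, pad - pad // 2
    return F.pad(x, (start, end, start, end))

# e.g. F.conv2d(pad_same(x_torch, 3, 2), weights_torch, padding=0, stride=2)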
I wasn't quite sure why this was happening when I started writing this question, but a bit of reading clarified this very quickly. I hope this example can help others.

convolution after dynamic transpose (tensorflow)

I'm trying to implement the architecture in this paper: https://arxiv.org/pdf/1701.05957.pdf (at the top of page 3).
The problem is that my dataset contains images of different sizes, and when I try to apply a convolution after a transposed convolution with dynamic shapes, the unknown shapes lead to an error:
ValueError: Shape of a new variable (fuse01/weights) must be fully defined, but instead was (3, 3, ?, 10).
This is my code:
import tensorflow as tf
import numpy as np

slim = tf.contrib.slim

def conv(input_batch, nb_kernel, nb_row, nb_col, scope_name, strides=None):
    if strides is None:
        strides = 1
    with slim.arg_scope([slim.conv2d], padding='SAME', stride=strides):
        out = slim.conv2d(input_batch, nb_kernel, [nb_row, nb_col], scope=scope_name)
    return out

def conv_trans(input_batch, nb_kernel, nb_row, nb_col, name_scope, stride, output_like):
    with tf.name_scope(name_scope):
        weights = get_weights([nb_row, nb_col, nb_kernel, input_batch.get_shape()[3].value])
        out_shape = tf.shape(output_like)
        out_shape = [input_batch.get_shape()[0].value, out_shape[1], out_shape[2], nb_kernel]
        output = tf.nn.conv2d_transpose(input_batch, weights, out_shape, [1, stride, stride, 1])
    return output

def get_weights(shape):
    initializer = tf.contrib.layers.xavier_initializer_conv2d(dtype=tf.float32)
    variable = tf.Variable(initializer(shape=shape), name='weights')
    return variable

a = np.ones([1, 165, 167, 3], np.float32)
x = tf.placeholder(tf.float32, [1, None, None, 3])

net = conv(x, 10, 3, 3, 'conv1')
net = conv(net, 20, 3, 3, 'conv2', strides=2)
skip_01 = net
net = conv(net, 40, 3, 3, 'conv3', strides=2)
skip_02 = net
net = conv(net, 80, 3, 3, 'conv4', strides=2)
skip_03 = net
net = conv(net, 160, 3, 3, 'conv5', strides=2)

up_01 = conv_trans(net, 30, 3, 3, 'test', 2, skip_03)  # shape: (?, ?, ?, ?)
fuse_01 = tf.concat([skip_03, up_01], 3)  # shape: (1, ?, ?, ?)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(fuse_01)
print(sess.run(fuse_01, feed_dict={x: a}))

fuse_01 = conv(fuse_01, 10, 3, 3, 'fuse01')  # this will cause the error
Is there any way to get the specific shape of the tensor after applying conv_transpose?
The problem is that my dataset contains images of different sizes, and while trying to ....
The easiest approach is simply to convert all the images to the same size at the very beginning. You are trying to remove rain/snow from images, so the exact size of the image should not matter.
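A minimal sketch of that suggestion with the TF1-era API used in the question (the target size 256x256 is an arbitrary choice):
import tensorflow as tf

x = tf.placeholder(tf.float32, [1, None, None, 3])
# Resize every image to one fixed size before the network, so all later
# shapes are fully defined and the conv after conv2d_transpose can be built.
x_fixed = tf.image.resize_images(x, [256, 256])
print(x_fixed.shape)  # (1, 256, 256, 3)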
