I have a TensorFlow model with my truth data in the shape (N, 32, 32, 5), i.e. 32x32 images with 5 channels.
Inside the loss function I would like to calculate, for each pixel, the sum of the values of the neighboring pixels for each channel, generating a new (N, 32, 32, 5) tensor.
The tf.nn.pool function does something similar but not exactly what I need. I was trying to see if tf.nn.conv2d could get me there but I'm not sure what I'd need to use as the filter parameter in this case.
Is there a specific function for this? Or can I use conv2d somehow?
You can do that with tf.nn.separable_conv2d, like this:
import tensorflow as tf
input = tf.placeholder(tf.float32, [None, 32, 32, 5])
# Depthwise filter adds the neighborhood of each pixel per channel
depthwise_filter = tf.ones([3, 3, 5, 1], input.dtype)
# Pointwise filter does not do anything
pointwise_filter = tf.eye(5, batch_shape=[1, 1], dtype=input.dtype)
output = tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter,
                                strides=[1, 1, 1, 1], padding='SAME')
print(output.shape)
# (?, 32, 32, 5)
The following approach using tf.nn.conv2d is equivalent:
import tensorflow as tf
input = tf.placeholder(tf.float32, [None, 32, 32, 5])
# Each filter adds the neighborhood for a different channel
filter = tf.eye(5, batch_shape=[3, 3], dtype=input.dtype)
output = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
A new convolutional layer with a filter size of 3x3 and filters initialized to 1 will do the job. Just be careful to declare this special filter as a non-trainable variable, otherwise your optimizer will change its contents. Additionally, set padding to "SAME" so that the output of the convolutional layer has the same size as the input. The pixels at the edges will have zero-valued neighbors in that case.
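A minimal sketch of such a layer, assuming tf.keras and using a DepthwiseConv2D so that each channel is summed separately (which matches the per-channel requirement in the question):
import tensorflow as tf

# 3x3 all-ones kernel: sums the 3x3 neighborhood of each pixel, per channel,
# without mixing channels. trainable=False keeps the optimizer away from it.
neighborhood_sum = tf.keras.layers.DepthwiseConv2D(
    kernel_size=3,
    padding='same',                 # edge pixels get zero-valued neighbors
    use_bias=False,
    depthwise_initializer='ones',
    trainable=False)

x = tf.random.normal([8, 32, 32, 5])
y = neighborhood_sum(x)
print(y.shape)  # (8, 32, 32, 5)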
Related
I have a PyTorch tensor with the shape [1, 3, 64, 64], and I want to convert it to the shape [1, 4, 64, 64] while setting the value of the newly added layer to be the same as the previous layer in the same dimension (e.g. newtensor[0][3] = oldtensor[0][2]).
Note that my tensor has requires_grad=True, so I cannot use resize_()
How can I do this?
Get a slice from the old tensor, and concatenate it to the new tensor along dimension 1.
tslice = old[:,-1:,:,:]
new = torch.cat((old,tslice), dim = 1)
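A quick sketch to check that the shape, the copied channel, and the autograd graph come out as expected:
import torch

old = torch.randn(1, 3, 64, 64, requires_grad=True)
tslice = old[:, -1:, :, :]                # -1: keeps dim 1, so shape is [1, 1, 64, 64]
new = torch.cat((old, tslice), dim=1)

print(new.shape)                          # torch.Size([1, 4, 64, 64])
print(torch.equal(new[0][3], old[0][2]))  # True
print(new.requires_grad)                  # True -- slicing and cat keep the graph intact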
This will work perfectly. @DerekG's code had an error with the -1 index, but his idea is correct. Here tensor is your tensor data:
new = torch.cat((tensor, tensor[:, 0:1, :, :]), dim=1)
As the title says, I'm looking at determining the proper dimensions for my CNN architecture. First, I obtain the next element of my dataset:
train_ds = iter(model.train_dataset)
feature, label = next(train_ds)
Where feature has dimensions (32, 64, 64, 4) corresponding to a batch size of 32, height of 64, width of 64, and an extended batch size of 4 (not a channel dimension). I initialize my 4-d kernel to pass over my 3-d data, as I do not want the extended batch dimension to be convolved. What I mean by this is that in practice I want a 2-d kernel of size (1, 1) to pass over each 64 x 64 image, and to do the same for each slice of the extended batch dimension without convolving those slices together. So I am in fact doing a (1, 1) convolution for each image in parallel with the others. So far I was able to initialize the kernel and feed the conv2d like so:
kernel = tf.constant(np.ones((1, 1, 4, 4)), dtype=tf.float32)
output = tf.nn.conv2d(feature, kernel, strides=[1, 1, 1, 1], padding='SAME')
Doing this produces my expected output, (32, 64, 64, 4). But I have absolutely no idea how to initialize the weights so that they work with this architecture. I have something like this:
w_init = tf.random_normal_initializer()
input_dim = (4, 1, 1, 4)
w = tf.Variable(
    initial_value=w_init(shape=input_dim, dtype="float32"),
    trainable=True)
tf.matmul(output, w)
But I'm getting an error about incompatible batch dimensions, as I don't know what the input_dim should be. I know it should be something like (num_filters * filter_size * filter_size * num_channels) + num_filters according to this answer, but I'm pretty sure that doesn't work for my scenario.
After tinkering around I was able to come up with a solution: it works when the weights are of size (1, 1, 4, 4), i.e. (filter_size, filter_size, num_channels, num_filters). If anyone wants to provide a mathematical or similar explanation, it would be much appreciated!
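A rough sketch of what that could look like with a trainable filter of shape (1, 1, 4, 4), using the variable directly as the conv2d kernel instead of a separate matmul (the feature tensor here is a random stand-in for a real batch):
import tensorflow as tf

w_init = tf.random_normal_initializer()
# [filter_height, filter_width, in_channels, out_channels]
w = tf.Variable(initial_value=w_init(shape=(1, 1, 4, 4), dtype="float32"),
                trainable=True)

feature = tf.random.normal([32, 64, 64, 4])   # stand-in for the real (32, 64, 64, 4) batch
output = tf.nn.conv2d(feature, w, strides=[1, 1, 1, 1], padding='SAME')
print(output.shape)  # (32, 64, 64, 4)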
At a certain stage in a ResNet, I have 6 features per image, i.e. each example has shape 1x8x8x6. I want to convolve each feature with 4 constant filters (DWT) of size 1x2x2x1 with a stride of 2, to get 24 features in the next layer and have the image become 1x4x4x24. However, I am unable to use tf.nn.conv2d or tf.nn.convolution for this purpose; conv2d requires the fourth dimension of the input to be equal to the third dimension of the filter. I tried doing it for just the first filter, but even this doesn't work:
x_in = np.random.randn(1,8,8,6)
kernel_in = np.array([[[[1],[1]],[[1],[1]]]])
kernel_in.shape
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.convolution(x, kernel, strides=[1, 1, 1, 1], padding='VALID')
Try it this way:
x_in = np.random.randn(1,8,8,6) # [batch, in_height, in_width, in_channels]
kernel_in = np.ones((2,2,6,24)) # [filter_height, filter_width, in_channels, out_channels]
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)
tf.nn.conv2d(x, kernel, strides=[1, 2, 2, 1], padding='VALID')
# <tf.Tensor: shape=(1, 4, 4, 24), dtype=float32, numpy=....>
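If the goal is to keep the 6 input features separate and apply the same 4 fixed 2x2 filters to each of them (as the question describes), tf.nn.depthwise_conv2d is one possible alternative; a rough sketch:
import numpy as np
import tensorflow as tf

x_in = np.random.randn(1, 8, 8, 6)   # [batch, in_height, in_width, in_channels]
kernel_in = np.ones((2, 2, 6, 4))    # [filter_height, filter_width, in_channels, channel_multiplier]

x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)

# Each of the 6 channels is convolved with its own 4 filters; channels are not mixed.
tf.nn.depthwise_conv2d(x, kernel, strides=[1, 2, 2, 1], padding='VALID')
# <tf.Tensor: shape=(1, 4, 4, 24), dtype=float32, numpy=....>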
A simple example of how to fill predefined values into the filters of a Keras Conv2D layer in TF2:
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential()
# one 3x3 filter
model.add(layers.Conv2D(1, (3, 3), input_shape=(None, None, 1)))
# access the target layer
layer = model.layers[0]
current_w, current_bias = layer.get_weights()  # see the current weights
new_w = tf.constant([[1., 2., 3.],
                     [4., 5., 6.],
                     [7., 8., 9.]])
new_w = tf.reshape(new_w, current_w.shape)  # fix the shape to (3, 3, 1, 1)
new_bias = tf.constant([0.])
layer.set_weights([new_w, new_bias])
model.summary()
# let's see ..
tf.print(model.layers[0].get_weights())
I am a beginner in Keras and I have PyTorch code that I need to convert to Keras, but I could not understand some parts of it. In particular, I have problems with the size of the output shape. The shape of the image is (:, 3, 32, 32), and the first dimension of the image is the batch size. Now, my question is: what does this line do and what is the output shape:
image_yuv_ch = image[:, channel, :, :].unsqueeze_(1)
Does it add a dimension at position 1? What is the output shape?
The size of filters was (64, 8, 8) and then we have filters.unsqueeze_(1); does this mean the new shape of filters is (64, 1, 8, 8)?
What does this line do: image_conv = F.conv2d(image_yuv_ch, filters, stride=8)? Is it the same as conv2d in Keras, and what is the shape of the output tensor from it? I also could not understand what view does. I know it tries to show the tensor in a new shape, but in the code below I could not understand the output shape after each unsqueeze_, permute or view. Could you please tell me what the output shape is after each line? Thank you in advance.
import torch
import torch.nn.functional as F

def apply_conv(self, image, filter_type: str):
    if filter_type == 'dct':
        filters = self.dct_conv_weights
    elif filter_type == 'idct':
        filters = self.idct_conv_weights
    else:
        raise ValueError('Unknown filter_type value.')

    image_conv_channels = []
    for channel in range(image.shape[1]):
        image_yuv_ch = image[:, channel, :, :].unsqueeze_(1)
        image_conv = F.conv2d(image_yuv_ch, filters, stride=8)
        image_conv = image_conv.permute(0, 2, 3, 1)
        image_conv = image_conv.view(image_conv.shape[0], image_conv.shape[1], image_conv.shape[2], 8, 8)
        image_conv = image_conv.permute(0, 1, 3, 2, 4)
        image_conv = image_conv.contiguous().view(image_conv.shape[0],
                                                  image_conv.shape[1] * image_conv.shape[2],
                                                  image_conv.shape[3] * image_conv.shape[4])
        image_conv.unsqueeze_(1)
        # image_conv = F.conv2d()
        image_conv_channels.append(image_conv)

    image_conv_stacked = torch.cat(image_conv_channels, dim=1)
    return image_conv_stacked
It seems like you are a Keras or TensorFlow user trying to learn PyTorch.
You should go to the PyTorch documentation website to understand more about each operation.
unsqueeze expands the tensor by one dimension. The underscore in unsqueeze_() means it is an in-place function.
view() can be understood as .reshape() in Keras.
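A small illustration of these two (the shapes are shown in the comments):
import torch

x = torch.randn(2, 3)   # shape [2, 3]
x = x.unsqueeze(0)      # shape [1, 2, 3]
x = x.view(1, 3, 2)     # same elements, reshaped to [1, 3, 2]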
permute() switches multiple dimensions of a tensor. For example:
x = torch.randn(1,2,3) # shape [1,2,3]
x = x.permute(2, 0, 1) # shape [3,1,2]
In order to know the shape of the tensor after each operation, simply add print(x.size()). For example:
image_conv = image_conv.permute(0, 2, 3, 1)
print(image_conv.size())
image_conv = image_conv.view(image_conv.shape[0], image_conv.shape[1],
                             image_conv.shape[2], 8, 8)
print(image_conv.size())
image_conv = image_conv.permute(0, 1, 3, 2, 4)
print(image_conv.size())
The big difference between PyTorch and TensorFlow (the back-end of Keras) is that PyTorch generates a dynamic graph rather than a static graph like TensorFlow. Your way of defining a model would not work properly in PyTorch, since the weights of the conv layers would not be saved in model.parameters() and therefore could not be optimized during backpropagation.
One more comment: please check this link to learn how to define a proper model in PyTorch:
import torch.nn as nn
import torch.nn.functional as F
class Model(nn.Module):
def __init__(self):
super(Model, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5)
self.conv2 = nn.Conv2d(20, 20, 5)
def forward(self, x):
x = F.relu(self.conv1(x))
return F.relu(self.conv2(x))
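A quick sanity check (a sketch, assuming a 1-channel 32x32 input) showing that the conv weights are now registered in model.parameters():
import torch

model = Model()
out = model(torch.randn(1, 1, 32, 32))
print(out.shape)                        # torch.Size([1, 20, 24, 24])
print(len(list(model.parameters())))    # 4: two conv weights and two biases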
The code for the comment:
import torch
x = torch.randn(8, 3, 32, 32)
print(x.shape)   # torch.Size([8, 3, 32, 32])
channel = 1
y = x[:, channel, :, :]
print(y.shape)   # torch.Size([8, 32, 32])
y = y.unsqueeze_(1)
print(y.shape)   # torch.Size([8, 1, 32, 32])
Hope this helps and enjoy your learning!
According to this Deep Learning course http://cs231n.github.io/convolutional-networks/#conv, if an input x with shape [W, W] (where W = width = height) goes through a convolutional layer with filter shape [F, F] and stride S, the layer will return an output with shape [(W-F)/S + 1, (W-F)/S + 1].
However, when I try to follow the TensorFlow tutorial https://www.tensorflow.org/versions/r0.11/tutorials/mnist/pros/index.html, the function tf.nn.conv2d(inputs, filter, stride) seems to behave differently.
No matter how I change my filter size, conv2d constantly returns a value with the same shape as the input.
In my case, I am using the MNIST dataset, where every image has size [28, 28] (ignoring channel_num = 1),
but after defining the first conv1 layer, I used conv1.get_shape() to see its output, and it gives me [28, 28, num_of_filters].
Why is this? I thought the return value should follow the formula above.
Appendix: Code snippet
#reshape x from 2d to 4d
x_image = tf.reshape(x, [-1, 28, 28, 1]) #[num_samples, width, height, channel_num]
## define the shape of weights and bias
w_shape = [5, 5, 1, 32] #patch_w, patch_h, in_channel, output_num(out_channel)
b_shape = [32] #bias only need to be consistent with output_num
## init weights of conv1 layers
W_conv1 = weight_variable(w_shape)
b_conv1 = bias_variable(b_shape)
## first layer x_image->conv1/relu->pool1
#Our convolutions use a stride of one
#and are zero padded
#so that the output is the same size as the input
h_conv1 = tf.nn.relu(
    conv2d(x_image, W_conv1) + b_conv1
)
print 'conv1.shape=',h_conv1.get_shape()
## conv1.shape= (?, 28, 28, 32)
## I thought conv1.shape should be (?, (28-5)/1+1, 24, 32)
h_pool1 = max_pool_2x2(h_conv1) #output 32 num
print 'pool1.shape=',h_pool1.get_shape() ## pool1.shape= (?, 14, 14, 32)
It depends on the padding parameter: 'SAME' will keep the output at WxW (assuming stride = 1), while 'VALID' will shrink the size of the output to (W-F+1)x(W-F+1).
conv2d has a parameter called padding, see here.
If you set padding to "VALID" it will satisfy your formula. The tutorial's conv2d helper uses "SAME", which pads the image (as if adding a border of zeroes around it) so that the output keeps the same shape as the input.
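For completeness, a small sketch (written against the current TensorFlow eager API rather than the r0.11 API in the question) showing both padding modes on a 28x28 input:
import tensorflow as tf

x = tf.random.normal([1, 28, 28, 1])   # one 28x28 single-channel image
w = tf.random.normal([5, 5, 1, 32])    # 5x5 filters, 32 output channels

same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')

print(same.shape)   # (1, 28, 28, 32) -- zero-padded, size preserved
print(valid.shape)  # (1, 24, 24, 32) -- (28 - 5)/1 + 1 = 24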