Problem with solving equations of TransposedConv2D for finding its parameters - python

Regarding the answer posted here, I run into problems when I try to use the equations to obtain the parameter values of the transposed convolution. For example, I have a tensor of size [16, 256, 16, 160, 160] and I want to upsample it to size [16, 256, 16, 224, 224]. Based on the equation of the transposed convolution, when I solve the equation for the height with stride=2 and look for k (the kernel size), I get the following equation, in which the kernel size has a large and also negative value.
224 = (160 - 2) x 2 + 1 x (k - 1) + 1
What is wrong with my calculations, and how can I find the parameters?

I don't think you applied the formula incorrectly; I think the main issue is that the input and output dimensions you want are not possible with stride=2.
Transposed or dilated convolutions scale the output really quickly. Say, for example, you took these parameters for your transposed convolution (I'm simplifying the values here to 1D just to make the calculations clear):
Input Size = 160
Stride = 2
Kernel = 1
Padding = 0
Output Padding = 0
Now we apply the formula from the official docs for calculating output shape:
H_out =(H_in − 1)×stride[0]−2×padding[0]+dilation[0]×(kernel_size[0]−1)+output_padding[0]+1
Or we can simplify the formula a bit:
Output Size = ((Input Size - 1) * Stride) - (2 * Padding) + Filter_Size + Output Padding
Here, Filter_Size = dilation_factor * (kernel_size - 1) + 1, to make the formula seem less scary.
Now let's take our example and plug the values in to see what transposed output size we get with stride=2 and the smallest possible kernel size, that is, kernel=1:
Output_Size = ((160-1)*2) - (2*0) + (1*(1-1) + 1) + 0
Output_Size = 318 - 0 + 1 + 0
Output_Size = 319
So, with the stride you want and no padding, the output size will be at least 319, and you want 224, hence the negative kernel_size.
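To make the arithmetic easy to re-check, here is a minimal sketch in plain Python of the output-size formula from the PyTorch docs (including its trailing +1), plus a brute-force search for settings that map 160 to 224. The names conv_transpose_out and find_params are hypothetical helpers, not part of PyTorch, and the search only checks the arithmetic; it does not claim that every combination is a sensible layer.

```python
def conv_transpose_out(size_in, stride, kernel, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one axis (docs formula)."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

# Smallest output for stride 2 without padding: kernel size 1.
print(conv_transpose_out(160, stride=2, kernel=1))  # 319

def find_params(size_in, size_out, strides=(1, 2), max_kernel=70, max_padding=60):
    """Hypothetical helper: list (stride, kernel, padding) triples hitting size_out."""
    hits = []
    for s in strides:
        for k in range(1, max_kernel + 1):
            for p in range(max_padding + 1):
                if conv_transpose_out(size_in, s, k, p) == size_out:
                    hits.append((s, k, p))
    return hits

hits = find_params(160, 224)
print(hits[:3])
```

With stride 1 a kernel of size 65 reaches 224 without padding. With stride 2 the target is only reachable with a padding of 48 or more per side, which amounts to cropping most of what the layer upsampled.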
I hope that answers your question.
Ref Links to understand Transposed Convolution calculations better with an example:
Paperspace: Transpose Convolution Explained for Up-Sampling Images
Calculating the Output Size of Convolutions and Transpose Convolutions

There is no good constructive answer to this question.
Being in some sense the inverse of conv2d, which downsamples an image by a factor of stride, transposed_conv2d upsamples by a factor of stride. It cannot be used for an arbitrary resize with uniformly good results; use torchvision.transforms.Resize or adaptive pooling for that.
torchvision.transforms.Resize is the default choice: it is simple and flexible, and it accepts either a PIL image or a torch.Tensor. Use the former if input sizes vary dynamically and the latter if not.
Adaptive pooling, usually AdaptiveAvgPool2d, is more sophisticated and is meant to be part of the architecture. Inserted at the beginning of a network, it works as a (batched) ImageResize. There is no magic: it is usually implemented on the CPU, and one will have a hard time implementing it on tensor hardware. In embedded solutions it is typical to have a dedicated image processor for such work.
You could still formally solve the task with transposed_conv2d by playing with the padding, but that would just cut off part of the image, probably losing information, or insert a lot of useless spacing.

Related

pytorch gradient / derivative / difference along axis like numpy.diff

I have been struggling with this for quite some time. All I want is a torch.diff() function. However, many matrix operations do not appear to be easily compatible with tensor operations.
I have tried an enormous amount of various pytorch operation combinations, yet none of them work.
Due to the fact that pytorch hasn't implemented this basic feature, I started by simply trying to subtract the element i+1 from element i along a specific axis.
However, you can't simply do this element-wise (due to the tensor limitations), so I tried to construct another tensor, with the elements shifted along one axis:
ix_plus_one = [0]+list(range(0,prediction.size(1)-1))
ix_differential_tensor = torch.LongTensor(ix_plus_one)
diff_one_tensor = prediction[:,ix_differential_tensor]
But now we have a different problem: indexing in pytorch doesn't really mimic numpy the way it advertises, so you can't index with a "list-like" Tensor like this. I also tried using the tensor scatter functions.
So I'm still stuck with this simple problem of trying to get a gradient on a pytorch tensor.
All of my searching leads to the marvelous capabilities of pytorch's "autograd" function, which has nothing to do with this problem.
A 1D convolution with a fixed filter should do the trick:
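import numpy as np
import torch
# Conv1d with a fixed [-1, 1] kernel computes x[i+1] - x[i]
filter = torch.nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=1, padding=1, groups=1, bias=False)
# float32 so the weight dtype matches PyTorch's default float tensors
kernel = np.array([-1.0, 1.0], dtype=np.float32)
kernel = torch.from_numpy(kernel).view(1,1,2)
filter.weight.data = kernel
filter.weight.requires_grad = False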
Then use filter like you would any other layer in torch.nn.
Also, you might want to change padding to suit your specific needs.
There appears to be a simpler solution to this (I needed something similar), referenced here: https://discuss.pytorch.org/t/equivalent-function-like-numpy-diff-in-pytorch/35327/2
diff = x[1:] - x[:-1]
which can be done along different dimensions such as
diff = polygon[:, 1:] - polygon[:, :-1]
I would recommend writing a unit test that verifies identical behavior though.
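A minimal sketch of such a test, written with NumPy arrays so it can be checked against numpy.diff directly. The helper name diff_along_axis is hypothetical, and the assumption is that the same slicing expressions carry over unchanged to torch tensors.

```python
import numpy as np

def diff_along_axis(x, axis=-1):
    """First-order difference x[i+1] - x[i] along `axis`, built from two shifted views."""
    n = x.shape[axis]
    upper = np.take(x, np.arange(1, n), axis=axis)      # elements 1 .. n-1
    lower = np.take(x, np.arange(0, n - 1), axis=axis)  # elements 0 .. n-2
    return upper - lower

rng = np.random.default_rng(0)
polygon = rng.normal(size=(4, 7))

# The slicing one-liner, the helper and numpy.diff should all agree.
sliced = polygon[:, 1:] - polygon[:, :-1]
assert np.array_equal(diff_along_axis(polygon, axis=1), sliced)
assert np.allclose(sliced, np.diff(polygon, axis=1))
```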
For all those running into the question after March 2021
As of torch 1.8 there's torch.diff that works exactly as expected by the OP

Correct way to vectorise tensorflow step

I'm working with 4 dimensional tensors and need to do a couple of calculations that work like the following example. Take A to be a tensor with shape (6,64,64,64). I want to use the function tf.where to obtain the voxels of each (64,64,64) volume that has a value larger than 0.75. The only way I have managed to do this is like this:
X = tf.convert_to_tensor([tf.where(A[i,:,:,:] > 0.75) for i in range(A.shape[0])])
This seems to be a very crude solution. Is there a better way to achieve this?
The problem with what you are trying to do is that it requires that each (64, 64, 64) volume has the same number of values greater than 0.75. If that is the case, you could just do the following:
X = tf.reshape(tf.where(A > 0.75)[:, 1:], (A.shape[0], -1, A.shape.ndims - 1))
But if that is not the case, you just cannot have a tensor like that because the second dimension would need to have multiple sizes.

Indexing for 3 dimensional Numpy Arrays (convolutional network)

I'm trying to write a function that performs Convolution, and I'm getting a little challenged trying to create the output volume using numpy. Specifically, I have an input image that is represented as an array of dimensions (150,150,3). Now, I want to convolve over this image with a set of kernels num_kernels, which are arrays of dimension (4,4,3), and I want these kernels to move over the image with a stride of 2. My thought process has been:
(1) I'll create an output array which is comprised of taking (4,4,3) size chunks out of the input array and stretching these out into rows, and ultimately making a large matrix of these.
(2) Then, I'll create a parameter array composed of all of my (4,4,3) kernels stretched out into rows, which will also make a large matrix.
(3) Then I can dot product these matrices together and reshape the output matrix into the proper dimensions.
My rough pseudo-code start to number (1) is as follows.
def Convolution(input, filter_size, num_filters, stride):
    X = input
    output_Volume = np.zeros(#dimensions)
    weights = np.zeros(#dimensions)
    #get weights from other function
    for width in range(0,150,2):
        for height in range(0,150,2):
            row = X(#indexes here to take out chunk).flatten
            output_Volume.append(row) #something of this sort
    return #dot product output volume and weights
If someone could provide a specific code example of how to implement this (most helpful would be answers to (1) and (2)) in Python (I'm using numpy), it would be much appreciated. Thank you!
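One way steps (1) to (3) could be sketched in NumPy is shown below. The function names im2col and convolve, the random test data and the choice of 5 kernels are illustrative assumptions, not something taken from the question. The sketch computes a cross-correlation (no kernel flip), as is usual in convolutional networks, and does no padding.

```python
import numpy as np

def im2col(image, k, stride):
    """Step (1): stack every (k, k, C) patch of `image` as one flattened row."""
    h, w, c = image.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    rows = np.empty((out_h * out_w, k * k * c), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            rows[i * out_w + j] = patch.ravel()
    return rows, out_h, out_w

def convolve(image, kernels, stride):
    """kernels has shape (num_kernels, k, k, C); returns (out_h, out_w, num_kernels)."""
    num_kernels, k = kernels.shape[0], kernels.shape[1]
    rows, out_h, out_w = im2col(image, k, stride)
    weights = kernels.reshape(num_kernels, -1)                    # step (2)
    return (rows @ weights.T).reshape(out_h, out_w, num_kernels)  # step (3)

rng = np.random.default_rng(0)
image = rng.random((150, 150, 3))
kernels = rng.random((5, 4, 4, 3))
out = convolve(image, kernels, stride=2)
print(out.shape)  # (74, 74, 5)
```

The output is 74 pixels wide because (150 - 4) // 2 + 1 = 74. Looping the start index over range(0, 150, 2), as in the pseudo-code, would run past the image edge.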

Keras: Multiply with a (constant) numpy-matrix

Is it somehow possible in Keras (neural network library) to do a multiplication with a fixed / given numpy array?
I would like to multiply the output of a 2D convolution with a matrix. I tried to use Backend.dot, but it does not seem to work (I always get errors like numpy.ndarray object has no attribute get_shape).
Thank you
-edit-
I found a solution (already yesterday; anyway, thanks for the comments). I wanted to convert a 2D spectrum to a 2D mel spectrum inside the network (not trainable). The 2D spectrogram may be seen as a large matrix, and the mel spectrogram can then be calculated via a matrix multiplication.
Long story short: I used a TimeDistributed Dense layer without a bias. I just load the weights (= the constant matrix) and that's it:
# Change time / frequency axis (required for TimeDistributed)
nw = input_layer
nw = Permute((2, 1))(nw)
# Create a layer that computes the Mel spectrogram
mel_basis = librosa.filters.mel(22050, N_FFT)[:, :spectrogram_freqs] # The spectrogram has only spectrogram_freqs frequencies
mel_layer = TimeDistributed(Dense(mel_basis.shape[0], bias=False, trainable=False))
nw = mel_layer(nw)
mel_layer.set_weights([np.transpose(mel_basis)])
# Change the time / frequency axis
nw = Permute((2, 1))(nw)

Using strides for an efficient moving average filter

I recently learned about strides in the answer to this post, and was wondering how I could use them to compute a moving average filter more efficiently than what I proposed in this post (using convolution filters).
This is what I have so far. It takes a view of the original array then rolls it by the necessary amount and sums the kernel values to compute the average. I am aware that the edges are not handled correctly, but I can take care of that afterward... Is there a better and faster way? The objective is to filter large floating point arrays up to 5000x5000 x 16 layers in size, a task that scipy.ndimage.filters.convolve is fairly slow at.
Note that I am looking for 8-neighbour connectivity, that is a 3x3 filter takes the average of 9 pixels (8 around the focal pixel) and assigns that value to the pixel in the new image.
import numpy, scipy
filtsize = 3
a = numpy.arange(100).reshape((10,10))
b = numpy.lib.stride_tricks.as_strided(a, shape=(a.size,filtsize), strides=(a.itemsize, a.itemsize))
for i in range(0, filtsize-1):
    if i > 0:
        b += numpy.roll(b, -(pow(filtsize,2)+1)*i, 0)
filtered = (numpy.sum(b, 1) / pow(filtsize,2)).reshape((a.shape[0],a.shape[1]))
scipy.misc.imsave("average.jpg", filtered)
EDIT Clarification on how I see this working:
Current code:
1) Use stride_tricks to generate an array like [[0,1,2],[1,2,3],[2,3,4]...] which corresponds to the top row of the filter kernel.
2) Roll along the vertical axis to get the middle row of the kernel [[10,11,12],[11,12,13],[12,13,14]...] and add it to the array I got in 1).
3) Repeat to get the bottom row of the kernel [[20,21,22],[21,22,23],[22,23,24]...]. At this point, I take the sum of each row and divide it by the number of elements in the filter, giving me the average for each pixel (shifted by 1 row and 1 col, and with some oddities around edges, but I can take care of that later).
What I was hoping for is a better use of stride_tricks to get the 9 values or the sum of the kernel elements directly, for the entire array, or that someone can convince me of another more efficient method...
For what it's worth, here's how you'd do it using "fancy" striding tricks. I was going to post this yesterday, but got distracted by actual work! :)
@Paul & @eat both have nice implementations using various other ways of doing this. Just to continue things from the earlier question, I figured I'd post the N-dimensional equivalent.
You're not going to be able to significantly beat scipy.ndimage functions for >1D arrays, however. (scipy.ndimage.uniform_filter should beat scipy.ndimage.convolve, though)
Moreover, if you're trying to get a multidimensional moving window, you risk having memory usage blow up whenever you inadvertently make a copy of your array. While the initial "rolling" array is just a view into the memory of your original array, any intermediate steps that copy the array will make a copy that is orders of magnitude larger than your original array (i.e. Let's say that you're working with a 100x100 original array... The view into it (for a filter size of (3,3)) will be 98x98x3x3 but use the same memory as the original. However, any copies will use the amount of memory that a full 98x98x3x3 array would!!)
Basically, using crazy striding tricks is great for when you want to vectorize moving window operations on a single axis of an ndarray. It makes it really easy to calculate things like a moving standard deviation, etc with very little overhead. When you want to start doing this along multiple axes, it's possible, but you're usually better off with more specialized functions. (Such as scipy.ndimage, etc)
At any rate, here's how you do it:
import numpy as np
def rolling_window_lastaxis(a, window):
    """Directly taken from Erik Rigtorp's post to numpy-discussion.
    <http://www.mail-archive.com/numpy-discussion@scipy.org/msg29450.html>"""
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
def rolling_window(a, window):
    """Apply the rolling window along each axis whose window size is > 1."""
    if not hasattr(window, '__iter__'):
        return rolling_window_lastaxis(a, window)
    for i, win in enumerate(window):
        if win > 1:
            a = a.swapaxes(i, -1)
            a = rolling_window_lastaxis(a, win)
            a = a.swapaxes(-2, i)
    return a
filtsize = (3, 3)
a = np.zeros((10,10), dtype=float)  # np.float was removed from NumPy; the builtin float is equivalent
a[5:7,5] = 1
b = rolling_window(a, filtsize)
blurred = b.mean(axis=-1).mean(axis=-1)
So what we get when we do b = rolling_window(a, filtsize) is an 8x8x3x3 array, that's actually a view into the same memory as the original 10x10 array. We could have just as easily used different filter size along different axes or operated only along selected axes of an N-dimensional array (i.e. filtsize = (0,3,0,3) on a 4-dimensional array would give us a 6 dimensional view).
We can then apply an arbitrary function to the last axis repeatedly to effectively calculate things in a moving window.
However, because we're storing temporary arrays that are much bigger than our original array on each step of mean (or std or whatever), this is not at all memory efficient! It's also not going to be terribly fast, either.
The equivalent for ndimage is just:
blurred = scipy.ndimage.uniform_filter(a, filtsize, output=a)
This will handle a variety of boundary conditions, do the "blurring" in-place without requiring a temporary copy of the array, and be very fast. Striding tricks are a good way to apply a function to a moving window along one axis, but they're not a good way to do it along multiple axes, usually....
Just my $0.02, at any rate...
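For readers on a recent NumPy (1.20 or later), the same windowed view can be sketched with numpy.lib.stride_tricks.sliding_window_view, which does the shape and stride bookkeeping itself. This is an alternative to the hand-written as_strided helper, not the approach the answer used, and the memory caveats about intermediate copies apply here as well.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.zeros((10, 10), dtype=float)
a[5:7, 5] = 1

# Read-only view of shape (8, 8, 3, 3) that shares memory with `a`.
windows = sliding_window_view(a, (3, 3))

# Mean over the two window axes gives the 3x3 moving average (valid region only).
blurred = windows.mean(axis=(-2, -1))

print(windows.shape, blurred.shape, np.shares_memory(a, windows))
```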
I'm not familiar enough with Python to write out code for that, but the two best ways to speed up convolutions are to either separate the filter or to use the Fourier transform.
Separated filter : Convolution is O(M*N), where M and N are the numbers of pixels in the image and the filter, respectively. Since average filtering with a 3-by-3 kernel is equivalent to filtering first with a 3-by-1 kernel and then a 1-by-3 kernel, two consecutive convolutions with 1-d kernels need only (3+3)/(3*3) = 2/3 of the operations, roughly a 30% speed improvement (this obviously gets better as the kernel gets larger). You may still be able to use stride tricks here, of course.
Fourier Transform : conv(A,B) is equivalent to ifft(fft(A)*fft(B)), i.e. a convolution in direct space becomes a multiplication in Fourier space, where A is your image and B is your filter. Since the (element-wise) multiplication of the Fourier transforms requires that A and B are the same size, B is an array of size(A) with your kernel at the very center of the image and zeros everywhere else. To place a 3-by-3 kernel at the center of an array, you may have to pad A to odd size. Depending on your implementation of the Fourier transform, this can be a lot faster than the convolution (and if you apply the same filter multiple times, you can pre-compute fft(B), saving another 30% of computation time).
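The separable-filter idea can be illustrated with a small NumPy sketch. The function names box_filter_2d and box_filter_separable are hypothetical, edges are simply dropped (valid region only), and the point is only to show that the two-pass version reproduces the direct k-by-k average with 2k instead of k*k array additions.

```python
import numpy as np

def box_filter_2d(a, k=3):
    """Direct k x k mean over the valid region: k*k shifted additions."""
    m, n = a.shape[0] - k + 1, a.shape[1] - k + 1
    out = np.zeros((m, n), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += a[dy:dy + m, dx:dx + n]
    return out / (k * k)

def box_filter_separable(a, k=3):
    """Same result from a k x 1 pass followed by a 1 x k pass: 2*k additions."""
    m, n = a.shape[0] - k + 1, a.shape[1] - k + 1
    rows = np.zeros((m, a.shape[1]), dtype=float)
    for dy in range(k):          # vertical pass
        rows += a[dy:dy + m, :]
    out = np.zeros((m, n), dtype=float)
    for dx in range(k):          # horizontal pass
        out += rows[:, dx:dx + n]
    return out / (k * k)

a = np.arange(100, dtype=float).reshape(10, 10)
assert np.allclose(box_filter_2d(a), box_filter_separable(a))
```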
Let's see:
It's not so clear from your question, but I'm assuming that you'd like to significantly improve this kind of averaging.
import numpy as np
from numpy.lib import stride_tricks as st
def mf(A, k_shape= (3, 3)):
    # m, n are the valid output size for a 3x3 kernel
    m= A.shape[0]- 2
    n= A.shape[1]- 2
    strides= A.strides+ A.strides
    new_shape= (m, n, k_shape[0], k_shape[1])
    A= st.as_strided(A, shape= new_shape, strides= strides)
    return np.sum(np.sum(A, -1), -1)/ np.prod(k_shape)
if __name__ == '__main__':
    A= np.arange(100).reshape((10, 10))
    print(mf(A))
Now, what kind of performance improvements you would actually expect?
Update:
First of all, a warning: the code in its current state does not adapt properly to the 'kernel' shape. However, that's not my primary concern right now (the idea of how to adapt it properly is already there anyway).
I chose the new shape of the 4D A intuitively; to me it makes sense to think of a 2D 'kernel' centered on each grid position of the original 2D A.
But that 4D shaping may not actually be the 'best' one. I think the real problem here is the performance of the summing. One should be able to find the 'best order' (of the 4D A) in order to fully utilize your machine's cache architecture. However, that order may not be the same for 'small' arrays, which kind of 'co-operate' with your machine's cache, and for larger ones, which don't (at least not in such a straightforward manner).
Update 2:
Here is a slightly modified version of mf. Clearly it's better to reshape to a 3D array first and then, instead of summing, just do a dot product (this also has the advantage that the kernel can be arbitrary). However, it's still some 3x slower (on my machine) than Paul's updated function.
def mf(A):
    k_shape= (3, 3)
    k= np.prod(k_shape)
    m= A.shape[0]- 2
    n= A.shape[1]- 2
    strides= A.strides* 2
    new_shape= (m, n)+ k_shape
    A= st.as_strided(A, shape= new_shape, strides= strides)
    w= np.ones(k)/ k
    return np.dot(A.reshape((m, n, -1)), w)
One thing I am confident needs to be fixed is your view array b.
It has a few items from unallocated memory, so you'll get crashes.
Given your new description of your algorithm, the first thing that needs fixing is the fact that you are striding outside the allocation of a:
bshape = (a.size-filtsize+1, filtsize)
bstrides = (a.itemsize, a.itemsize)
b = numpy.lib.stride_tricks.as_strided(a, shape=bshape, strides=bstrides)
Update
Because I'm still not quite grasping the method and there seem to be simpler ways to solve the problem, I'm just going to put this here:
import numpy
A = numpy.arange(100, dtype=float).reshape((10,10))  # float, so the in-place division below is valid
shifts = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
B = A[1:-1, 1:-1].copy()
for dx,dy in shifts:
    xstop = -1+dx or None
    ystop = -1+dy or None
    B += A[1+dx:xstop, 1+dy:ystop]
B /= 9
...which just seems like the straightforward approach. The only extraneous operation is that it has to allocate and populate B, and only once. All the addition, division and indexing has to be done regardless. If you are doing 16 bands, you still only need to allocate B once if your intent is to save an image. Even if this is no help, it might clarify why I don't understand the problem, or at least serve as a benchmark to time the speedups of other methods. This runs in 2.6 sec on my laptop on a 5k x 5k array of float64's, 0.5 of which is the creation of B.
