In my case I am applying this unfold operation on a tensor A as given below:
A.shape = torch.Size([16, 309, 128])
A = A.unsqueeze(1)  # I guess this makes it 4-dimensional for the unfold operation
A_out = F.unfold(A, (7, 128), stride=(1, 128), dilation=(3, 1))
A_out.shape = torch.Size([16, 896, 291])
I am not getting where this 291 comes from. If the dilation factor were not there, it would be [16, 896, 303], right?
But if dilation=3, how does it become 291? Also here stride is not mentioned, so the default is 1, but what if it is also mentioned, like 4? Please guide.
Also here stride is not mentioned, so the default is 1, but what if it is also mentioned, like 4.
Your code already has stride=(1, 128). If stride were set to just 4, it would be used as (4, 4) in this case. This can easily be verified with the formula below; a quick check is also included after the output.
If the dilation factor were not there, it would be [16, 896, 303], right?
Yes. Example below.
But if dilation=3, how does it become 291?
Following the formula given in the PyTorch docs, it comes to 291. After A.unsqueeze(1) the shape becomes [16, 1, 309, 128], i.e. N=16, C=1, H=309, W=128.
The output shape is (N, C * prod(kernel_size), L). With kernel_size=(7, 128) this becomes (16, 1 * 7 * 128, L) = (16, 896, L).
L is the product of one factor per spatial dimension, each factor given by
floor((spatial_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1
L = d_H * d_W
Over the height dimension: spatial_size = 309, padding = 0 (default), dilation = 3, kernel_size = 7, stride = 1.
d_H = floor((309 + 2 * 0 - 3 * (7 - 1) - 1) / 1) + 1
    = 291
Over the width dimension: spatial_size = 128, padding = 0 (default), dilation = 1, kernel_size = 128, stride = 128.
d_W = floor((128 + 2 * 0 - 1 * (128 - 1) - 1) / 128) + 1
    = 1
So, using the formula above, L = 291 * 1 = 291.
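As a quick sanity check of that arithmetic, the per-dimension factor can be computed directly (a minimal sketch of the docs formula; this helper is mine, not part of the original answer):
import math

def unfold_out_len(size, kernel, stride=1, padding=0, dilation=1):
    # One factor of L per spatial dimension, per the torch.nn.Unfold docs formula.
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1

d_h = unfold_out_len(309, 7, stride=1, dilation=3)      # 291
d_w = unfold_out_len(128, 128, stride=128, dilation=1)  # 1
print(d_h * d_w)  # L = 291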
Code
import torch
from torch.nn import functional as F
A = torch.randn([16, 309, 128])
print(A.shape)
A = A.unsqueeze(1)
print(A.shape)
A_out = F.unfold(A, kernel_size=(7, 128), stride=(1, 128), dilation=(3, 1))
print(A_out.shape)
Output
torch.Size([16, 309, 128])
torch.Size([16, 1, 309, 128])
torch.Size([16, 896, 291])
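To check the stride=4 case mentioned above (my own addition, not from the original post): passing a single integer applies the same stride to both spatial dimensions, and the formula gives floor((309 - 3 * (7 - 1) - 1) / 4) + 1 = 73 along the height and 1 along the width.
A_out2 = F.unfold(A, kernel_size=(7, 128), stride=4, dilation=(3, 1))
print(A_out2.shape)  # expected: torch.Size([16, 896, 73])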
Links
https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html
https://pytorch.org/docs/stable/generated/torch.nn.functional.unfold.html
Related
I have a torch CUDA tensor A of shape (100, 80000, 4), and another CUDA tensor B of shape (1, 80000, 1). What I want to do is, for each element i in the second dimension (from 0 to 79999), take the value of tensor B (which will be between 0 and 99 and points to which index in the first dimension of A to take).
An additional problem is that for this element of B (B[0, i, 0]), I want to take a slice from A that is A[lower_bound:upper_bound, i:i+1, :].
To sum it up, my tensor B holds the indices of the centers of the slices that I would like to cut from A. I am wondering whether there is a way to do what the code below does faster (e.g. using CUDA).
A       # tensor of shape (100, 80000, 4)
B       # tensor of shape (1, 80000, 1) which holds the center indices
k = 3   # width of the slice to take (or half of it to be exact)
A_list = []
for i in range(80000):
    lower_bound = max(0, B[0, i, 0] - k)
    upper_bound = min(100, B[0, i, 0] + k + 1)
    A_mean = A[lower_bound:upper_bound, i:i+1, :].mean(0, keepdim=True)
    A_list.append(A_mean)
A = torch.cat(A_list, dim=1)
Something similar to this can work if k = 1 (requires torch >= 1.7):
a = torch.rand((10, 20, 4))
b = torch.randint(10, (20, 1))
b2 = torch.cat((b - 1, b, b + 1), dim=1)         # window of 3 indices around each center
b2 = torch.minimum(9 * torch.ones_like(b2), b2)  # clamp to the upper bound
b2 = torch.maximum(0 * torch.ones_like(b2), b2)  # clamp to the lower bound
a[:, b2, :]                                      # shape (10, 20, 3, 4)
and then reshape to get the right size
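For a general k, one option is to build the full index window and use advanced indexing. Below is a minimal sketch under the shapes from the question (my own addition; note that the clamping repeats border rows instead of shrinking the window, so the means at the boundaries differ slightly from the loop version):
import torch

A = torch.rand(100, 80000, 4)           # (D, N, C)
B = torch.randint(100, (1, 80000, 1))   # center indices into dim 0 of A
k = 3

offsets = torch.arange(-k, k + 1)                 # (2k+1,)
idx = (B[0, :, 0, None] + offsets).clamp(0, 99)   # (N, 2k+1) window indices per position
cols = torch.arange(A.shape[1])[:, None]          # (N, 1), broadcasts against idx
windows = A[idx, cols, :]                         # (N, 2k+1, C)
result = windows.mean(dim=1).unsqueeze(0)         # (1, N, C), like torch.cat(A_list, dim=1)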
In order to really understand convolutional layers, I have reimplemented the forward method of a single Keras Conv2D layer in basic numpy. The outputs of both seem almost identical, but there are some minor differences.
Getting the keras output:
inp = K.constant(test_x)
true_output = model.layers[0].call(inp).numpy()
My output:
def relu(x):
    return np.maximum(0, x)

def forward(inp, filter_weights, filter_biases):
    result = np.zeros((1, 64, 64, 32))
    inp_with_padding = np.zeros((1, 66, 66, 1))
    inp_with_padding[0, 1:65, 1:65, :] = inp
    for filter_num in range(32):
        single_filter_weights = filter_weights[:, :, 0, filter_num]
        for i in range(64):
            for j in range(64):
                prod = single_filter_weights * inp_with_padding[0, i:i+3, j:j+3, 0]
                filter_sum = np.sum(prod) + filter_biases[filter_num]
                result[0, i, j, filter_num] = relu(filter_sum)
    return result
my_output = forward(test_x, filter_weights, biases_weights)
The results are largely the same, but here are some examples of differences:
Mine: 2.6608338356018066
Keras: 2.660834312438965
Mine: 1.7892705202102661
Keras: 1.7892701625823975
Mine: 0.007190803997218609
Keras: 0.007190565578639507
Mine: 4.970898151397705
Keras: 4.970897197723389
I've tried converting everything to float32, but that does not solve it. Any ideas?
Edit:
I plotted the distribution over errors, and it might give some insight into what is happening. As can be seen, the errors all have very similar values, falling into four groups. However, these errors are not exactly these four values, but are almost all unique values around these four peaks.
I am very interested in how to get my implementation to exactly match the keras one. Unfortunately, the errors seem to increase exponentially when implementing multiple layers. Any insight would help me out a lot!
Given how small the differences are, I would say that they are rounding errors.
I recommend using np.isclose (or math.isclose) to check if floats are "equal".
Floating point operations are not associative: doing the same operations in a different order can give a slightly different result. Here is an example:
In [19]: 1.2 - 1.0 - 0.2
Out[19]: -5.551115123125783e-17
In [21]: 1.2 - 0.2 - 1.0
Out[21]: 0.0
So if you want completely identical results, it is not enough to do the same computation analytically; you also need to do the operations in the exact same order, with the same datatypes and the same rounding behavior.
To debug this, start with the Keras code and change it line by line towards your code, until you see a difference.
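As a concrete illustration of the tolerance-based comparison suggested above (the tolerances here are my own choice, not values from the original answer):
import numpy as np

# Elementwise comparison of the two conv outputs within float32-scale tolerances.
print(np.allclose(my_output, true_output, rtol=1e-5, atol=1e-6))
# The individual values quoted in the question also compare equal under a relative tolerance:
print(np.isclose(2.6608338356018066, 2.660834312438965, rtol=1e-5))  # True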
The first thing to check is whether you're using padding='same'. You seem to be using 'same' padding in your implementation.
If the Keras layer uses another type of padding, including the default padding='valid', there will be a difference.
Another possibility is that you may be accumulating errors because of the triple loop of little sums.
You could do it all at once and see whether the result differs. Compare this implementation with your own, for instance:
def forward2(inp, filter_weights, filter_biases):
    # inp: (batch, 64, 64, in)
    # w:   (3, 3, in, out)
    # b:   (out,)

    padded_input = np.pad(inp, ((0,0), (1,1), (1,1), (0,0)))  # (batch, 66, 66, in)

    stacked_input = np.stack([
        padded_input[:,  :-2],
        padded_input[:, 1:-1],
        padded_input[:, 2:  ]], axis=1)         # (batch, 3, 64, 64, in)

    stacked_input = np.stack([
        stacked_input[:, :, :,  :-2],
        stacked_input[:, :, :, 1:-1],
        stacked_input[:, :, :, 2:  ]], axis=2)  # (batch, 3, 3, 64, 64, in)

    stacked_input = stacked_input.reshape((-1, 3, 3, 64, 64, 1, 1))
    w = filter_weights.reshape((1, 3, 3, 1, 1, 1, 32))
    b = filter_biases.reshape((1, 1, 1, 32))

    result = stacked_input * w          # (-1, 3, 3, 64, 64, 1, 32)
    result = result.sum(axis=(1,2,-2))  # (-1, 64, 64, 32)
    result += b

    result = relu(result)
    return result
A third possibility is to check whether you're using the GPU, and switch everything to the CPU for the test. Some GPU algorithms are even non-deterministic.
For any kernel size:
def forward3(inp, filter_weights, filter_biases):
    inShape = inp.shape            # (batch, imgX, imgY, ins)
    wShape = filter_weights.shape  # (wx, wy, ins, out)
    bShape = filter_biases.shape   # (out,)

    ins = inShape[-1]
    out = wShape[-1]
    wx = wShape[0]
    wy = wShape[1]
    imgX = inShape[1]
    imgY = inShape[2]

    assert imgX >= wx
    assert imgY >= wy
    assert inShape[-1] == wShape[-2]
    assert bShape[-1] == wShape[-1]

    # you may need to invert this padding, exchange L with R
    loseX = wx - 1
    padXL = loseX // 2
    padXR = padXL + (1 if loseX % 2 > 0 else 0)
    loseY = wy - 1
    padYL = loseY // 2
    padYR = padYL + (1 if loseY % 2 > 0 else 0)

    padded_input = np.pad(inp, ((0,0), (padXL,padXR), (padYL,padYR), (0,0)))
    # (batch, paddedX, paddedY, in)

    stacked_input = np.stack([padded_input[:, i:imgX + i] for i in range(wx)],
                             axis=1)  # (batch, wx, imgX, imgY, in)
    stacked_input = np.stack([stacked_input[:, :, :, i:imgY + i] for i in range(wy)],
                             axis=2)  # (batch, wx, wy, imgX, imgY, in)

    stacked_input = stacked_input.reshape((-1, wx, wy, imgX, imgY, ins, 1))
    w = filter_weights.reshape((1, wx, wy, 1, 1, ins, out))
    b = filter_biases.reshape((1, 1, 1, out))

    result = stacked_input * w
    result = result.sum(axis=(1,2,-2))
    result += b

    result = relu(result)
    return result
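A quick shape check of forward3 with made-up inputs (a sketch; the weight layout (wx, wy, ins, out) follows the comments above, and relu is the function defined in the question):
inp = np.random.rand(1, 64, 64, 1).astype(np.float32)
w = np.random.rand(5, 5, 1, 32).astype(np.float32)
b = np.random.rand(32).astype(np.float32)
print(forward3(inp, w, b).shape)  # (1, 64, 64, 32): 'same'-style padding preserves the spatial size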
I want to vectorize the following code:
def style_noise(self, y, style):
    n = torch.randn(y.shape)
    for i in range(n.shape[0]):
        n[i] = (n[i] - n.mean(dim=(1, 2, 3))[i]) * style.std(dim=(1, 2, 3))[i] / n.std(dim=(1, 2, 3))[i] + style.mean(dim=(1, 2, 3))[i]
    noise = Variable(n, requires_grad=False).to(y.device)
    return noise
I didn't find a nice way of doing so.
y and style are 4d tensors, say style.shape = y.shape = [64, 3, 128, 128].
I want to return the noise tensor, noise.shape = [64, 3, 128, 128].
Please let me know in the comments if the question is not clear.
Your use case is exactly why the .mean and .std methods come with a keepdim parameter. You can make use of this to enable broadcasting semantics to vectorize things for you:
def style_noise(self, y, style):
    n = torch.randn(y.shape)
    n_mean = n.mean(dim=(1, 2, 3), keepdim=True)
    n_std = n.std(dim=(1, 2, 3), keepdim=True)
    style_mean = style.mean(dim=(1, 2, 3), keepdim=True)
    style_std = style.std(dim=(1, 2, 3), keepdim=True)
    n = (n - n_mean) * style_std / n_std + style_mean
    noise = Variable(n, requires_grad=False).to(y.device)
    return noise
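A quick way to sanity-check the vectorized expression against the original loop, with small made-up shapes (my own addition, not part of the answer):
import torch

y = torch.randn(4, 3, 8, 8)
style = torch.randn(4, 3, 8, 8)
n = torch.randn(y.shape)

# Original per-sample loop.
loop = torch.empty_like(n)
for i in range(n.shape[0]):
    loop[i] = (n[i] - n.mean(dim=(1, 2, 3))[i]) * style.std(dim=(1, 2, 3))[i] / n.std(dim=(1, 2, 3))[i] + style.mean(dim=(1, 2, 3))[i]

# Vectorized version relying on keepdim=True broadcasting.
vec = (n - n.mean(dim=(1, 2, 3), keepdim=True)) * style.std(dim=(1, 2, 3), keepdim=True) / n.std(dim=(1, 2, 3), keepdim=True) + style.mean(dim=(1, 2, 3), keepdim=True)

print(torch.allclose(loop, vec))  # True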
To calculate the mean and std of the whole tensor, call the methods with no dim argument:
m = t.mean(); print(m)  # mean over the whole tensor
s = t.std(); print(s)   # std over the whole tensor
Then, if your tensor has shape (2, 2, 2) for instance, you can create tensors of that shape for the broadcast subtraction and division:
mm = torch.empty(2, 2, 2).fill_(m.item())
print(mm)
ss = torch.empty(2, 2, 2).fill_(s.item())
print(ss)
At the moment keepdim does not work as expected when you don't pass dim:
m = t.mean(); print(m)        # over the whole tensor
s = t.std(); print(s)         # over the whole tensor
m = t.mean(dim=0); print(m)   # mean over dim 0
s = t.std(dim=0); print(s)    # std over dim 0
m = t.mean(dim=1); print(m)   # mean over dim 1
s = t.std(dim=1); print(s)    # std over dim 1
m = t.mean(keepdim=True); print(m)  # will not work
s = t.std(keepdim=True); print(s)   # will not work
If you pass dim as a tuple, the mean and std are returned over those axes only, not over the whole tensor.
In short: I am looking for a simple numpy (maybe one-liner) implementation of MaxPool: the maximum over a window of a numpy.ndarray, for every location of the window across the dimensions.
In more detail: I am implementing a convolutional neural network ("CNN"); one of the typical layers in such a network is a MaxPool layer (look for example here). Writing
y = MaxPool(x, S), where x is an input ndarray and S is a parameter, the output of MaxPool is given in pseudocode by:
y[b, h, w, c] = max(x[b, s*h + i, s*w + j, c]) over i = 0, ..., S-1; j = 0, ..., S-1.
That is, y is an ndarray where the value at indices b, h, w, c equals the maximum taken over a window of size S x S along the second and third dimensions of the input x, with the window "corner" placed at the indices b, h, w, c.
Some additional details: the network is implemented using numpy. The CNN has many "layers" where the output of one layer is the input to the next layer. The inputs to the layers are numpy.ndarrays called "tensors". In my case the tensors are 4-dimensional numpy.ndarrays x, i.e. x.shape is a tuple (B, H, W, C). The sizes of the dimensions change after the tensor is processed by a layer; for example, the input to layer i = 4 can have size B = 10, H = 24, W = 24, C = 3, while the output, i.e. the input to layer i + 1, has B = 10, H = 12, W = 12, C = 5. As indicated in the comments, the size after applying MaxPool is (B, H - S + 1, W - S + 1, C).
For concreteness: if I use
import numpy as np
y = np.amax(x, axis = (1,2))
where x.shape is, say, (2, 3, 3, 4), this gives me what I want, but only for the degenerate case where the window I am maximizing over has size 3 x 3, i.e. the full size of the second and third dimensions of x, which is not exactly what I want.
Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows, resulting in a 6D array of shape (B, H-S+1, W-S+1, S, S, C), and then simply performing max along the fourth and fifth axes, resulting in an output array of shape (B, H-S+1, W-S+1, C). The intermediate 6D array would be a view into the input array and as such won't occupy any more memory. The subsequent max operation, being a reduction, would efficiently utilize the sliding views.
Thus, an implementation would be -
# Based on http://stackoverflow.com/a/41850409/3293881
def patchify(img, patch_shape):
    a, X, Y, b = img.shape
    x, y = patch_shape
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)

out = patchify(x, (S, S)).max(axis=(3, 4))
Sample run -
In [224]: x = np.random.randint(0,9,(10,24,24,3))
In [225]: S = 5
In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True
In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)
In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)
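If a stride s greater than 1 is wanted, as in the pseudocode in the question, one option (my own sketch, not part of the original answer) is to subsample the window view before the reduction, keeping only windows whose corner lands on a multiple of s:
s = 2
out_strided = patchify(x, (S, S))[:, ::s, ::s].max(axis=(3, 4))
# shape: (B, (H - S)//s + 1, (W - S)//s + 1, C)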
I think this issue boils down to my lack of understanding of how Theano works. I'm in a situation where I want to create a variable that is the result of a subtraction between a distribution and a numpy array. This works fine when I specify the shape parameter as 1:
import pymc3 as pm
import numpy as np
import theano.tensor as T
X = np.random.randint(low = -10, high = 10, size = 100)
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 1)
    det = pm.Deterministic('det', nl - X)

nl.dshape
(1,)
However, this breaks when I specify shape > 1:
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 2)
    det = pm.Deterministic('det', nl - X)

ValueError: Input dimension mis-match. (input[0].shape[0] = 2, input[1].shape[0] = 100)
nl.dshape
(2,)
X.shape
(100,)
I tried transposing X to make it broadcastable
X2 = X.reshape(-1, 1).transpose()
X2.shape
(1, 100)
But now it declares a mismatch at .shape[1] instead of .shape[0]
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 2)
    det = pm.Deterministic('det', nl - X2)

ValueError: Input dimension mis-match. (input[0].shape[1] = 2, input[1].shape[1] = 100)
I can make this work if I loop over the elements of the distribution:
distShape = 2
with pm.Model() as model:
    nl = pm.Normal('nl', shape = distShape)
    det = {}
    for i in range(distShape):
        det[i] = pm.Deterministic('det' + str(i), nl[i] - X)

det
{0: det0, 1: det1}
However, this feels inelegant and constrains me to use loops for the rest of the model. I was wondering whether there is a way to specify this operation so that it works the same way as it does between two distributions:
distShape = 2
with pm.Model() as model:
    nl0 = pm.Normal('nl1', shape = distShape)
    nl1 = pm.Normal('nl2', shape = 1)
    det = pm.Deterministic('det', nl0 - nl1)
You can do
X = np.random.randint(low = -10, high = 10, size = 100)
X = X[:, None]  # or X.reshape(-1, 1)
and then
with pm.Model() as model:
    nl = pm.Normal('nl', shape = 2)
    det = pm.Deterministic('det', nl - X)

In this case the shapes of nl and X will be (2,) and (100, 1), respectively, which are broadcastable, and det gets shape (100, 2).
Notice we get the same behavior with two NumPy arrays (not only with one Theano tensor and one NumPy array):
a0 = np.array([1,2])
b0 = np.array([1,2,3,5])
a0 = a0[:,None] # comment/uncomment this line
print(a0.shape, b0.shape)
b0-a0
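For reference (my own addition): with the a0 = a0[:,None] line kept, the shapes broadcast and the subtraction gives a (2, 4) result; with it commented out, NumPy raises the same kind of shape mismatch error.
print(a0.shape, b0.shape)  # (2, 1) (4,)
print(b0 - a0)
# [[ 0  1  2  4]
#  [-1  0  1  3]]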