“*” means convolution
Hello,
I am trying to find a way to merge two 2D convolutions together.
Assume that I have an image “Img” of dimensions (1x20x20) and two kernels “k1” and “k2” both of dimensions (1x3x3).
Normally you would first convolve Img with k1 and then convolve the result with k2:
(Img * k1) * k2
My goal is to find a kernel k3 that, when applied to Img, gives the same result as the expression above.
Since convolutions are linear operators, this is possible. To do it (at least mathematically speaking), we can simply convolve k1 with k2 first and then apply the result to Img:
k3 = k1 * k2
(Img * k1) * k2 = Img * (k1 * k2) = Img * k3
Although this formula works well in the mathematical world, it doesn't work at all at the implementation level. Take the example above: both k1 and k2 have dimensions (1x3x3). If I blindly apply the formula and convolve k1 with k2, the output has dimensions (1x1x1), which is clearly not what I want. So even in this very simple scenario the formula is “wrong” as written. What we are supposed to do in this case is pad k1 with 2 pixels so that the convolution produces the correct (1x5x5) kernel k3 we are looking for.
I’ve found code that does this here.
I’ll report the code here for simplicity:
import torch

def merge_conv_kernels(k1, k2):
    """
    :input k1: A tensor of shape ``(out1, in1, s1, s1)``
    :input k2: A tensor of shape ``(out2, in2, s2, s2)``
    :returns: A tensor of shape ``(out2, in1, s1+s2-1, s1+s2-1)``
      so that convolving with it equals convolving with k1 and
      then with k2.
    """
    padding = k2.shape[-1] - 1
    # Flip because conv2d is actually correlation, and permute to adapt to BCHW
    k3 = torch.conv2d(k1.permute(1, 0, 2, 3), k2.flip(-1, -2),
                      padding=padding).permute(1, 0, 2, 3)
    return k3
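As a quick sanity check of this function (a minimal sketch with random tensors standing in for Img, k1, and k2; the shapes here are my own choice), the merged kernel should reproduce applying the two convolutions one after the other with no padding and stride 1:

import torch
import torch.nn.functional as F

img = torch.randn(1, 1, 20, 20)    # a batch containing one (1x20x20) image
k1 = torch.randn(4, 1, 3, 3)       # (out1, in1, 3, 3)
k2 = torch.randn(2, 4, 3, 3)       # (out2, out1, 3, 3)

two_step = F.conv2d(F.conv2d(img, k1), k2)            # no padding, stride 1
one_step = F.conv2d(img, merge_conv_kernels(k1, k2))  # k3 has shape (2, 1, 5, 5)
print(torch.allclose(two_step, one_step, atol=1e-5))  # should print True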
However, this code doesn’t work at all when the two convolutions have different paddings and strides.
I was wondering if it is still possible to merge convolutions when paddings and strides are taken into consideration, and if someone could provide a hint on how to do it, or working code for this more complicated scenario (PyTorch).
Thank you
When the first convolution pads enough (padding equal to kernel size - 1) and has stride 1, you can merge it with a second convolution of any padding/stride with:
def merge_conv_kernels(k1, k2, s2, p2):
    # Assuming p1 = k1.shape[-1] - 1 and s1 = 1
    kernel_pad = k2.shape[-1] - 1
    k3 = torch.conv2d(k1.permute(1, 0, 2, 3), k2.flip(-1, -2),
                      padding=kernel_pad,
                      stride=1).permute(1, 0, 2, 3)
    p3 = k1.shape[-1] - 1 + p2
    s3 = s2
    return k3, s3, p3
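As an illustration (a sketch with made-up shapes and parameters, not taken from the question), you can check the merged kernel and the adjusted stride/padding against running the two convolutions sequentially, as long as the first convolution uses padding k1 - 1 and stride 1:

import torch
import torch.nn.functional as F

img = torch.randn(1, 1, 20, 20)
k1 = torch.randn(4, 1, 3, 3)       # first conv: padding 2 (= 3 - 1), stride 1
k2 = torch.randn(2, 4, 5, 5)       # second conv: padding p2, stride s2
p2, s2 = 1, 2

two_step = F.conv2d(F.conv2d(img, k1, padding=k1.shape[-1] - 1, stride=1),
                    k2, padding=p2, stride=s2)
k3, s3, p3 = merge_conv_kernels(k1, k2, s2, p2)
one_step = F.conv2d(img, k3, padding=p3, stride=s3)
print(torch.allclose(two_step, one_step, atol=1e-5))  # should print True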
If you have pad=0 in the first convolution, you can find counter-examples. For instance, on a 3x3 image with:
kernel1 = tensor([[1, 0, -1],
                  [1, 0, -1],
                  [1, 0, -1]])
p1, s1 = 0, 1

kernel2 = ones(3, 3)
p2, s2 = 2, 1
Basically, the combination of the two convolutions applies kernel 1 and copies the resulting value into every cell of a 3x3 output. You can't get that with a single convolution. First, the merged kernel would have to be of size 4 or 5 with padding 1 or 2 so that it sees all the input values at every output position (larger than 5 also works, but that just adds never-used values to the kernel). Now, by considering each output pixel and an arbitrary input matrix, we see that the merged kernel would have to contain kernel 1 in all of its 3x3 sub-matrices, which is impossible because of the asymmetry of kernel 1.
You can find similar problems whenever the first convolution has stride > 1 or padding < kernel_size - 1, i.e. whenever it decreases the size of its output.
Related
I have a torch CUDA tensor A of shape (100, 80000, 4) and another CUDA tensor B of shape (1, 80000, 1). What I want to do is, for each element i in the second dimension (from 0 to 79999), take the value of tensor B (which will be from 0 to 99 and points to which value in the first dimension of A to take).
An additional problem is that for this element of B (B[0, i, 0]), I want to take a slice from A, namely A[lower_bound:upper_bound, i:i+1, :].
To sum it up, tensor B holds the indices of the centers of the slices I would like to cut from A. I am wondering whether there is a way to do what the code below does faster (e.g. on CUDA).
A  # tensor of shape (100, 80000, 4)
B  # tensor of shape (1, 80000, 1) which has the indices of the slice centers
k = 3  # width of the slice to take (or half of it to be exact)

A_list = []
for i in range(80000):
    lower_bound = max(0, B[0, i, 0]-k)
    upper_bound = min(100, B[0, i, 0]+k+1)
    A_mean = A[lower_bound:upper_bound, i:i+1, :].mean(0, keepdim=True)
    A_list.append(A_mean)
A = torch.cat(A_list, dim=1)
Something similar to this can work if k = 1 (requires torch >= 1.7):
a = torch.rand((10, 20, 4))
b = torch.randint(10, (20, 1))
b2 = torch.cat((b-1, b, b+1), dim=1)
b2 = torch.minimum(9*torch.ones_like(b2), b2)
b2 = torch.maximum(0*torch.ones_like(b2), b2)
a[b2, torch.arange(20).unsqueeze(1), :]  # shape (20, 3, 4): one 3-wide window per column
and then take the mean over the window dimension and reshape to get the right size.
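For general k, one way (a sketch, assuming B holds integer centers and that, as in your loop, out-of-range positions are simply excluded from the mean) is to build the window indices with broadcasting, clamp them, and mask out the invalid positions before averaging:

import torch

def windowed_mean(A, B, k):
    # A: (M, N, C), B: (1, N, 1) with integer centers in [0, M)
    M, N, C = A.shape
    centers = B[0, :, 0].long()                           # (N,)
    offsets = torch.arange(-k, k + 1, device=A.device)    # (2k+1,)
    idx = centers.unsqueeze(1) + offsets                  # (N, 2k+1)
    valid = (idx >= 0) & (idx < M)                        # mask positions outside [0, M)
    idx = idx.clamp(0, M - 1)
    cols = torch.arange(N, device=A.device).unsqueeze(1)  # (N, 1), broadcasts against idx
    window = A[idx, cols, :]                              # (N, 2k+1, C) via advanced indexing
    window = window * valid.unsqueeze(-1).to(A.dtype)     # zero out the clamped duplicates
    mean = window.sum(dim=1) / valid.sum(dim=1, keepdim=True)
    return mean.unsqueeze(0)                              # (1, N, C), like the torch.cat result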
A 2D convolution kernel, K, of shape (k1, k2, n_channel, n_filter) applies on a 2D vector, A, of shape (m1, m2, n_channel) and generates another 2D vector, B, of shape (m1 - k1 + 1, m2 - k2 + 1, n_filter) (with valid padding).
It is also true that for each K, there exists a W_K of shape (m1 - k1 + 1, m2 - k2 + 1, n_filter, m1, m2, n_channel), such that tensor dot of W_K and A is equal to B. i.e. B = np.tensordot(W_K, A, 3).
I am trying to find a pure NumPy solution to generate this W_K from K without using any python loops.
I can see W_K[i,j,f] == np.pad(K[...,f], ((i,m1-i-k1), (j,m2-j-k2)), 'constant', constant_values=0) or simply W_K[i, j, f, i:i+k1, j:j+k2, ...] == K[..., f].
What I'm looking for is similar to a Toeplitz matrix, but in multiple dimensions.
Example in loopy code:
import numpy as np

# 5x5 image with 3 channels
A = np.random.random((5,5,3))
# 2x2 Conv2D kernel with 2 filters for A
K = np.random.random((2,2,3,2))

# W_K should have shape (4,4,2,5,5,3), but I create it this way for convenience
# and move the axis at the end.
W_K = np.empty((4,4,5,5,3,2))
for i, j in np.ndindex(4, 4):
    W_K[i, j] = np.pad(K, ((i, 5-i-2), (j, 5-j-2), (0, 0), (0, 0)), 'constant', constant_values=0)

# the above lines can also be rewritten as
W_K = np.zeros((4,4,5,5,3,2))
for i, j in np.ndindex(4, 4):
    W_K[i, j, i:i+2, j:j+2, ...] = K[...]

W_K = np.moveaxis(W_K, -1, 2)

# now I can do
B = np.tensordot(W_K, A, 3)
What you want needs a bit of fancy indexing gymnastics but it's not very cumbersome to code. The idea is to create 4-dimensional index arrays that apply the W_K[i, j, i:i+2, j:j+2, ...] part of your second loopy example.
Here's a slightly modified version of your example, just to make sure that some relevant dimensions differ (because this makes bugs easier to find: they would be proper errors rather than mangled values):
import numpy as np

# parameter setup
k1, k2, nch, nf = 2, 4, 3, 2
m1, m2 = 5, 6
w1, w2 = m1 - k1 + 1, m2 - k2 + 1
K = np.random.random((k1, k2, nch, nf))
A = np.random.random((m1, m2, nch))

# your loopy version for comparison
W_K = np.zeros((w1, w2, nf, m1, m2, nch))
for i, j in np.ndindex(w1, w2):
    W_K[i, j, :, i:i+k1, j:j+k2, ...] = K.transpose(-1, 0, 1, 2)

W_K2 = np.zeros((w1, w2, m1, m2, nch, nf))  # to be transposed back
i, j = np.mgrid[:w1, :w2][..., None, None]  # shape (w1, w2, 1, 1)
k, l = np.mgrid[:k1, :k2]                   # shape (k1, k2) ~ (1, 1, k1, k2)
W_K2[i, j, i+k, j+l, ...] = K
W_K2 = np.moveaxis(W_K2, -1, 2)

print(np.array_equal(W_K, W_K2))  # True
We first create an index mesh i,j that spans the first two dimensions of W_K2, then create two similar meshes k,l for the kernel window so that i+k and j+l span its (pre-moveaxis) third and fourth dimensions. By injecting two trailing singleton dimensions into the former, we end up with 4d index arrays that together span the first four dimensions of W_K2.
All that's left is to assign to this slice using the original K and then move the last dimension back into place. Due to how advanced indexing changes behaviour when the advanced indices in an expression are not all next to one another (i.e. when a slice sits between them), this is much easier to do with your moveaxis approach. I first tried to create W_K2 with its final dimensions, but then we'd have W_K[i, j, :, i+k, j+l, ...], which has subtly different behaviour (in particular, a different shape).
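To make that last point concrete with a toy shape check (an example added here, not part of the code above): when two advanced indices are separated by a slice, the broadcast dimensions of the result move to the front:

import numpy as np

x = np.zeros((3, 5, 4, 6, 6))
i = np.arange(3)[:, None]   # shape (3, 1)
j = np.arange(4)[None, :]   # shape (1, 4)
print(x[i, j].shape)        # (3, 4, 4, 6, 6): adjacent advanced indices, dims stay in place
print(x[i, :, j].shape)     # (3, 4, 5, 6, 6): split by a slice, broadcast dims move to the front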
I have two discrete probability distributions. These are represented as TensorFlow 1D tensors p1 and p2, each of length len. I want to generate pairs of indices (i, j), where i is drawn from the first distribution and j from the second, and keep generating pairs until there are len distinct pairs in total. How can I achieve this in TensorFlow using a while loop or scan?
To remove duplicates, one might start by removing the duplicates from the 1D tensors p1 and p2:
# tf.unique returns (values, indices); keep only the unique values
p1, _ = tf.unique(tf.constant([1, 2, 3]))
p2, _ = tf.unique(tf.constant([3, 4, 5]))
Then compute the pairs:
p1 = tf.constant([1, 2, 3])
p2 = tf.constant([3, 4, 5])

# adding zero columns
t1 = tf.stack([p1, tf.zeros([p1.shape[0]], dtype="int32")], axis=1)
t2 = tf.stack([tf.zeros([p2.shape[0]], dtype="int32"), p2], axis=1)

x = tf.expand_dims(t1, 0)
y = tf.expand_dims(t2, 1)

# broadcasting with addition to concatenate the two tensors,
# then reshaping to get the 2d tensor of pairs
c = tf.reshape(tf.add(x, y), [-1, 2])
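For the example tensors above, c ends up holding every combination as a row: [1, 3], [2, 3], [3, 3], [1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5].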
In short: I am looking for a simple numpy (maybe one-liner) implementation of MaxPool - the maximum over a window of a numpy.ndarray, for every location of the window across the dimensions.
In more detail: I am implementing a convolutional neural network ("CNN"); one of the typical layers in such a network is a MaxPool layer (look for example here). Writing y = MaxPool(x, S), where x is an input ndarray and S is a parameter, in pseudocode the output of the MaxPool is given by:
y[b,h,w,c] = max(x[b, s*h + i, s*w + j, c]) over i = 0,..., S-1; j = 0,...,S-1.
That is, y is an ndarray where the value at indices b,h,w,c equals the maximum taken over a window of size S x S along the second and third dimensions of the input x, with the window's "corner" placed at indices b,h,w,c.
Some additional details: the network is implemented using numpy. The CNN has many "layers" where the output of one layer is the input to the next. The inputs to a layer are numpy.ndarrays called "tensors". In my case the tensors are 4-dimensional numpy.ndarrays x, i.e. x.shape is a tuple (B,H,W,C). The sizes of the dimensions change after the tensor is processed by a layer; for example, the input to layer i = 4 can have size B = 10, H = 24, W = 24, C = 3, while the output (aka the input to layer i+1) has B = 10, H = 12, W = 12, C = 5. As indicated in the comments, the size after applying MaxPool is (B, H - S + 1, W - S + 1, C).
For concreteness: if I use
import numpy as np
y = np.amax(x, axis = (1,2))
where x.shape is, say, (2,3,3,4), this gives me what I want, but only for the degenerate case where the window I am maximizing over has size 3 x 3, i.e. the full size of the second and third dimensions of x, which is not exactly what I want.
Here's a solution using np.lib.stride_tricks.as_strided to create sliding windows, resulting in a 6D array of shape (B, H-S+1, W-S+1, S, S, C), and then simply performing max along the fourth and fifth axes, resulting in an output array of shape (B, H-S+1, W-S+1, C). The intermediate 6D array is a view into the input array and as such won't occupy any more memory. The subsequent max, being a reduction, efficiently uses the sliding views.
Thus, an implementation would be -
# Based on http://stackoverflow.com/a/41850409/3293881
def patchify(img, patch_shape):
    a, X, Y, b = img.shape
    x, y = patch_shape
    shape = (a, X - x + 1, Y - y + 1, x, y, b)
    a_str, X_str, Y_str, b_str = img.strides
    strides = (a_str, X_str, Y_str, X_str, Y_str, b_str)
    return np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)

out = patchify(x, (S,S)).max(axis=(3,4))
Sample run -
In [224]: x = np.random.randint(0,9,(10,24,24,3))
In [225]: S = 5
In [226]: np.may_share_memory(patchify(x, (S,S)), x)
Out[226]: True
In [227]: patchify(x, (S,S)).shape
Out[227]: (10, 20, 20, 5, 5, 3)
In [228]: patchify(x, (S,S)).max(axis=(3,4)).shape
Out[228]: (10, 20, 20, 3)
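If you need non-overlapping pooling with stride S instead (a common MaxPool configuration; this is a sketch on top of the stride-1 windows above), you can slice the same view before reducing:

# keep every S-th window along the two spatial axes before taking the max;
# when H and W are multiples of S the output has shape (B, H//S, W//S, C)
# (otherwise the trailing rows/columns that don't fill a window are dropped)
out_strided = patchify(x, (S, S))[:, ::S, ::S].max(axis=(3, 4))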
I want to compute the pairwise squared distances of a batch of features in TensorFlow. I have a simple implementation using + and * operations and tiling the original tensor:
def pairwise_l2_norm2(x, y, scope=None):
    with tf.op_scope([x, y], scope, 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]

        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.pack([1, 1, size_y]))

        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.pack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])

        diff = tf.sub(xx, yy)
        square_diff = tf.square(diff)

        square_dist = tf.reduce_sum(square_diff, 1)

        return square_dist
This function takes as input two matrices of size (m,d) and (n,d) and computes the squared distance between each pair of row vectors. The output is a matrix of size (m,n) with elements d_ij = dist(x_i, y_j).
The problem is that I have a large batch and high-dimensional features (large m, n, d), so replicating the tensor consumes a lot of memory.
I'm looking for another way to implement this without increasing the memory usage, storing only the final distance tensor - something like double-looping over the original tensor.
You can use some linear algebra to turn it into matrix ops. Note that what you need is the matrix D where, with a[i] the ith row of your original matrix,
D[i,j] = (a[i]-a[j])(a[i]-a[j])'
You can rewrite that into
D[i,j] = r[i] - 2 a[i]a[j]' + r[j]
where r[i] is the squared norm of the ith row of the original matrix.
In a system that supports standard broadcasting rules you can treat r as a column vector and write D as
D = r - 2 A A' + r'
In TensorFlow you could write this as
A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)
# turn r into column vector
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)
sess = tf.Session()
sess.run(D)
result
array([[0, 2, 8],
       [2, 0, 2],
       [8, 2, 0]], dtype=int32)
Using squared_difference:
def squared_dist(A):
    expanded_a = tf.expand_dims(A, 1)
    expanded_b = tf.expand_dims(A, 0)
    distances = tf.reduce_sum(tf.squared_difference(expanded_a, expanded_b), 2)
    return distances
One thing I noticed is that this solution using tf.squared_difference gives me out of memory (OOM) for very large vectors, while the approach by @YaroslavBulatov doesn't. So decomposing the operation yields a smaller memory footprint (I had thought squared_difference would handle this better under the hood).
Here is a more general solution for two tensors of coordinates A and B:
def squared_dist(A, B):
    assert A.shape.as_list() == B.shape.as_list()

    row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
    row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.

    row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
    row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.

    return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B
Note that this is the squared distance. If you want the Euclidean distance instead, apply tf.sqrt to the result; in that case, don't forget to add a small constant to compensate for floating-point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
If you want to compute some other distance metric, change the TF ops accordingly:
def compute_euclidean_distance(x, y):
    size_x = x.shape.dims[0]
    size_y = y.shape.dims[0]
    for i in range(size_x):
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d