Numpy way to generate linear operation matrix from a convolution kernel - python

A 2D convolution kernel, K, of shape (k1, k2, n_channel, n_filter) applies to a 2D multi-channel array, A, of shape (m1, m2, n_channel) and generates another array, B, of shape (m1 - k1 + 1, m2 - k2 + 1, n_filter) (with valid padding).
It is also true that for each K, there exists a W_K of shape (m1 - k1 + 1, m2 - k2 + 1, n_filter, m1, m2, n_channel), such that tensor dot of W_K and A is equal to B. i.e. B = np.tensordot(W_K, A, 3).
I am trying to find a pure NumPy solution to generate this W_K from K without using any python loops.
I can see W_K[i,j,f] == np.pad(K[...,f], ((i,m1-i-k1), (j,m2-j-k2)), 'constant', constant_values=0) or simply W_K[i, j, f, i:i+k1, j:j+k2, ...] == K[..., f].
What I'm looking for is similar to a Toeplitz matrix, but in multiple dimensions.
Example in loopy code:
import numpy as np
# 5x5 image with 3-channels
A = np.random.random((5,5,3))
# 2x2 Conv2D kernel with 2 filters for A
K = np.random.random((2,2,3,2))
# The result should have shape (4,4,2,5,5,3); I build it as (4,4,5,5,3,2)
# for convenience and move the filter axis afterwards.
W_K = np.empty((4,4,5,5,3,2))
for i, j in np.ndindex(4, 4):
    W_K[i, j] = np.pad(K, ((i, 5-i-2), (j, 5-j-2), (0, 0), (0, 0)), 'constant', constant_values=0)
# the above can also be written as
W_K = np.zeros((4,4,5,5,3,2))
for i, j in np.ndindex(4, 4):
    W_K[i, j, i:i+2, j:j+2, ...] = K[...]
W_K = np.moveaxis(W_K, -1, 2)
# now I can do
B = np.tensordot(W_K, A, 3)

What you want needs a bit of fancy indexing gymnastics but it's not very cumbersome to code. The idea is to create 4-dimensional index arrays that apply the W_K[i, j, i:i+2, j:j+2, ...] part of your second loopy example.
Here's a slightly modified version of your example, just to make sure that some relevant dimensions differ (because this makes bugs easier to find: they would be proper errors rather than mangled values):
import numpy as np
# parameter setup
k1, k2, nch, nf = 2, 4, 3, 2
m1, m2 = 5, 6
w1, w2 = m1 - k1 + 1, m2 - k2 + 1
K = np.random.random((k1, k2, nch, nf))
A = np.random.random((m1, m2, nch))
# your loopy version for comparison
W_K = np.zeros((w1, w2, nf, m1, m2, nch))
for i, j in np.ndindex(w1, w2):
    W_K[i, j, :, i:i+k1, j:j+k2, ...] = K.transpose(-1, 0, 1, 2)
W_K2 = np.zeros((w1, w2, m1, m2, nch, nf)) # to be transposed back
i,j = np.mgrid[:w1, :w2][..., None, None] # shape (w1, w2, 1, 1)
k,l = np.mgrid[:k1, :k2] # shape (k1, k2) ~ (1, 1, k1, k2)
W_K2[i, j, i+k, j+l, ...] = K
W_K2 = np.moveaxis(W_K2, -1, 2)
print(np.array_equal(W_K, W_K2)) # True
We first create index meshes i,j that span the first two dimensions of W_K, then create two similar meshes that span its (pre-moveaxis) second and third dimensions. By injecting two trailing singleton dimensions into the former we end up with 4d index arrays that together span the first four dimensions of W_K.
All that's left is to assign to this slice using the original K, and move the filter dimension back. Due to how advanced indexing changes behaviour when the sliced (non-advanced) indices in an expression are not all next to one another, this is much easier to do with your moveaxis approach. I first tried to create W_K2 with its final dimensions, but then we'd have W_K[i, j, :, i+k, j+l, ...] which has subtly different behaviour (in particular, a different shape).
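As a quick sanity check (a minimal sketch using the variables defined above), W_K2 contracted with A should reproduce the valid convolution computed directly from K:
B = np.tensordot(W_K2, A, 3)  # contracts the last three axes (m1, m2, nch) of W_K2 with A
B_ref = np.empty((w1, w2, nf))  # direct loopy convolution for comparison
for i, j, f in np.ndindex(w1, w2, nf):
    B_ref[i, j, f] = np.sum(K[..., f] * A[i:i+k1, j:j+k2, :])
print(np.allclose(B, B_ref))  # True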

Related

How to merge two 2D convolutions together

“*” means convolution
Hello,
I am trying to find a way to merge two 2D convolutions together.
Assume that I have an image “Img” of dimensions (1x20x20) and two kernels “k1” and “k2” both of dimensions (1x3x3).
Normally you would first convolve Img with k1 and then convolve the result with k2:
(Img * k1) * k2
My goal is to find a kernel k3 that if applied to Img does the same thing of the expression above.
Since convolutions are linear operators this is possible. In order to do that (at least mathematically speaking) we can just first convolve k1 with k2 and then apply the result over Img:
k3 = k1 * k2
(Img * k1) * k2 = Img * (k1 * k2) = Img * k3
Although this formula works well in the mathematical world, it doesn't work at all at the implementation level. Take for instance the example above. Both k1 and k2 are of dimensions (1x3x3). If I just blindly apply the formula and convolve k1 with k2, the output will be of dimension (1x1x1). This is clearly not what I want, so even in this very simple scenario the formula is “wrong”. What we are supposed to do in this case is to pad k1 with 2 pixels in order to obtain the correct kernel k3 we are looking for.
I've found code that does this here.
I’ll report the code here for simplicity:
import torch

def merge_conv_kernels(k1, k2):
    """
    :input k1: A tensor of shape ``(out1, in1, s1, s1)``
    :input k2: A tensor of shape ``(out2, in2, s2, s2)``
    :returns: A tensor of shape ``(out2, in1, s1+s2-1, s1+s2-1)``
      so that convolving with it equals convolving with k1 and
      then with k2.
    """
    padding = k2.shape[-1] - 1
    # Flip because conv2d actually computes correlation, and permute to adapt to the BCHW layout
    k3 = torch.conv2d(k1.permute(1, 0, 2, 3), k2.flip(-1, -2),
                      padding=padding).permute(1, 0, 2, 3)
    return k3
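As a quick sanity check (a hedged sketch, assuming single-channel tensors, no padding and stride 1), the merged kernel reproduces the two-step result:
import torch
import torch.nn.functional as F
img = torch.randn(1, 1, 20, 20)  # (batch, channels, H, W)
k1 = torch.randn(1, 1, 3, 3)
k2 = torch.randn(1, 1, 3, 3)
two_step = F.conv2d(F.conv2d(img, k1), k2)            # (1, 1, 16, 16)
one_step = F.conv2d(img, merge_conv_kernels(k1, k2))  # (1, 1, 16, 16)
print(torch.allclose(two_step, one_step, atol=1e-5))  # True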
However, this code doesn’t work at all when the two convolutions have different paddings and strides.
I was wondering if it is still possible to merge convolutions together when paddings and strides are taken into consideration and if someone could provide a hint on how to do it or a working code for this more complicated scenario (PyTorch).
Thank you
When the first convolution pads enough (padding = kernel size - 1) and uses stride 1, you can merge it with a second convolution that has any padding/stride:
def merge_conv_kernels(k1, k2, s2, p2):
    # Assuming p1 = k1.shape[-1] - 1 and s1 = 1
    kernel_pad = k2.shape[-1] - 1
    k3 = torch.conv2d(k1.permute(1, 0, 2, 3), k2.flip(-1, -2),
                      padding=kernel_pad,
                      stride=1).permute(1, 0, 2, 3)
    p3 = k1.shape[-1] - 1 + p2
    s3 = s2
    return k3, s3, p3
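A hedged sketch of how this can be checked (single-channel tensors, first convolution with full padding and stride 1, arbitrary padding/stride for the second):
import torch
import torch.nn.functional as F
img = torch.randn(1, 1, 20, 20)
k1 = torch.randn(1, 1, 3, 3)
k2 = torch.randn(1, 1, 3, 3)
p2, s2 = 1, 2
first = F.conv2d(img, k1, padding=k1.shape[-1] - 1, stride=1)
two_step = F.conv2d(first, k2, padding=p2, stride=s2)
k3, s3, p3 = merge_conv_kernels(k1, k2, s2, p2)
one_step = F.conv2d(img, k3, padding=p3, stride=s3)
print(torch.allclose(two_step, one_step, atol=1e-5))  # True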
If the first convolution has pad = 0, you can find counter-examples. For instance, on a 3x3 image with:
kernel1 = torch.tensor([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])
p1, s1 = 0, 1
kernel2 = torch.ones(3, 3)
p2, s2 = 2, 1
Basically, the combination of the two convolutions applies kernel 1 and copies the resulting value into a 3x3 output. You can't get this with a single convolution. First, the kernel would have to be of size 4 or 5 with padding 1 or 2 to see all the input values every time (a size above 5 just leads to kernel values that are never used). Then, by considering each output pixel for an arbitrary input, the kernel would have to contain kernel 1 in all of its 3x3 sub-matrices, which is impossible due to the asymmetry of kernel 1.
You run into similar problems whenever the first convolution has stride > 1 or padding < kernel_size - 1, i.e. whenever it shrinks the output.

How to quickly cut slices from cuda tensor wrt to another tensors values

I have a torch cuda tensor A of shape (100, 80000, 4), and another cuda tensor B of shape (1, 80000, 1). What I want to do is, for each element i in the second dimension (from 0 to 79999), take the value of tensor B (which will be from 0 to 99), which points to the value in the first dimension of A to take.
An additional problem is that for this element of B (B[0, i, 0]), I want to take a slice from A that is A[lower_bound:upper_bound, i:i+1, :].
To sum it up, my tensor B has the indices of the centers of slices that I would like to cut from A. I am wondering, whether there is a way, to do what I am doing with the below code faster (eg. using cuda)
A  # tensor of shape (100, 80000, 4)
B  # tensor of shape (1, 80000, 1) holding indices (0-99) into the first dimension of A
k = 3  # half-width of the slice to take
A_list = []
for i in range(80000):
    lower_bound = max(0, B[0, i, 0]-k)
    upper_bound = min(100, B[0, i, 0]+k+1)
    A_mean = A[lower_bound:upper_bound, i:i+1, :].mean(0, keepdim=True)
    A_list.append(A_mean)
A = torch.cat(A_list, dim=1)
Something similar to this can work if k = 1 (requires torch >= 1.7):
a = torch.rand((10, 20, 4))
b = torch.randint(10, (20, 1))
b2 = torch.cat((b-1, b, b+1), dim=1)
b2 = torch.minimum(9*torch.ones_like(b2), b2)
b2 = torch.maximum(0*torch.ones_like(b2), b2)
a[:, b2, :]
and then reshape to get the right size
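For a general half-width k, a hedged sketch along the same lines (note that clamping the indices at the borders repeats edge rows instead of shrinking the window, so border values differ slightly from the loopy version):
import torch
A = torch.rand(10, 20, 4)          # small stand-in for (100, 80000, 4)
B = torch.randint(10, (1, 20, 1))  # small stand-in for (1, 80000, 1)
k = 3
offsets = torch.arange(-k, k + 1)                             # (2k+1,)
idx = (B[0, :, 0, None] + offsets).clamp(0, A.shape[0] - 1)   # (20, 2k+1)
cols = torch.arange(A.shape[1]).unsqueeze(1)                  # (20, 1), broadcasts against idx
A_mean = A[idx, cols, :].mean(dim=1)                          # (20, 4)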

distinct pairs of indices from two 1d tensors

I have two discrete probability distributions. These are represented as TensorFlow 1D tensors p1 and p2, each of length len. I want to generate pairs of indices (i, j), where i is generated from the first probability distribution and j is from the second. I want to generate a lot of pairs until there are in total len distinct pairs. How can I achieve this in TensorFlow using a while loop or scan?
To remove duplicates, one might start by removing them from the 1D tensors p1 and p2 (tf.unique returns a (values, indices) pair, so keep the y field):
p1 = tf.unique(tf.constant([1, 2, 3])).y
p2 = tf.unique(tf.constant([3, 4, 5])).y
Computing the pairs
p1 = tf.constant([1, 2, 3])
p2 = tf.constant([3, 4, 5])
# add a zero column to each so the pairs can be formed by addition
t1 = tf.stack([p1, tf.zeros([p1.shape[0]], dtype="int32")], axis=1)
t2 = tf.stack([tf.zeros([p2.shape[0]], dtype="int32"), p2], axis=1)
x = tf.expand_dims(t1, 0)
y = tf.expand_dims(t2, 1)
# broadcasting with addition concatenates the two tensors;
# reshape to get the 2D tensor of pairs
c = tf.reshape(tf.add(x, y), [-1, 2])  # shape (9, 2): one row per pair (p1[i], p2[j])

Compute x**k with x, k being arrays of arbitrary dimensionality

I have two numpy arrays: One array x with shape (n, a0, a1, ...) and one array k with shape (n, b0, b1, ...). I would like to compute an array of exponentials such that the output has dimension (a0, a1, ..., b0, b1, ...) and
out[i0, i1, ..., j0, j1, ...] == prod(x[:, i0, i1, ...] ** k[:, j0, j1, ...])
If there is only one a_i and one b_j, broadcasting does the trick via
import numpy
x = numpy.random.rand(2, 31)
k = numpy.random.randint(1, 10, size=(2, 101))
out = numpy.prod(x[..., None]**k[:, None], axis=0)
If x has a few dimensions more, more Nones have to be added:
x = numpy.random.rand(2, 31, 32, 33)
k = numpy.random.randint(1, 10, size=(2, 101))
out = numpy.prod(x[..., None]**k[:, None, None, None], axis=0)
If k has a few dimensions more, the Nones have to be added at other places:
x = numpy.random.rand(2, 31)
k = numpy.random.randint(1, 10, size=(2, 51, 51))
out = numpy.prod(x[..., None, None]**k[:, None], axis=0)
How to make the computation of out generic with respect to the dimensionality of the input arrays?
Here's one approach: reshape the two arrays so that they are broadcastable against each other, then perform the exponentiation and the prod reduction along the first axis -
k0_shp = [k.shape[0]] + [1]*(x.ndim-1) + list(k.shape[1:])
x0_shp = list(x.shape) + [1]*(k.ndim-1)
out = (x.reshape(x0_shp) ** k.reshape(k0_shp)).prod(0)
Here's another way: reshape both inputs to 3D with one singleton dim each so that they are broadcastable against each other, perform the prod reduction to get a 2D array, then reshape back to the multi-dimensional output -
s = x.shape[1:] + k.shape[1:] # output shape
out = (x.reshape(x.shape[0],-1,1)**k.reshape(k.shape[0],1,-1)).prod(0).reshape(s)
It must be noted that reshaping merely creates a view into the input array and as such is virtually free both memory-wise and performance-wise.
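A minimal sketch to sanity-check the reshape approach against the definition (shapes chosen arbitrarily for illustration):
import numpy as np
x = np.random.rand(2, 31, 32)
k = np.random.randint(1, 10, size=(2, 5, 7))
k0_shp = [k.shape[0]] + [1]*(x.ndim-1) + list(k.shape[1:])
x0_shp = list(x.shape) + [1]*(k.ndim-1)
out = (x.reshape(x0_shp) ** k.reshape(k0_shp)).prod(0)
print(out.shape)  # (31, 32, 5, 7)
print(np.isclose(out[3, 4, 1, 2], np.prod(x[:, 3, 4] ** k[:, 1, 2])))  # True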
Without fully understanding the math of what you're doing, it seems that you just need a fixed number of Nones determined by the dimensionality of x and k.
Does something like this work?
out = numpy.prod(x[(...,) + (None,)*(k.ndim-1)] ** k[(slice(None),) + (None,)*(x.ndim-1)], axis=0)
Here are the index expressions separately so they're a bit easier to read (note that NumPy wants a tuple, not a list, for this kind of multi-dimensional index):
x[ (...,) + (None,)*(k.ndim-1) ]
k[ (slice(None),) + (None,)*(x.ndim-1) ]
Compatibility note: the bare ... literal inside a tuple is only valid in Python 3.x. If you are using 2.7 (I haven't tested lower), substitute Ellipsis instead:
x[ (Ellipsis,) + (None,)*(k.ndim-1) ]

Compute pairwise distance in a batch without replicating tensor in Tensorflow?

I want to compute the pairwise squared distance of a batch of features in TensorFlow. I have a simple implementation using + and * operations by tiling the original tensor:
def pairwise_l2_norm2(x, y, scope=None):
    with tf.op_scope([x, y], scope, 'pairwise_l2_norm2'):
        size_x = tf.shape(x)[0]
        size_y = tf.shape(y)[0]
        xx = tf.expand_dims(x, -1)
        xx = tf.tile(xx, tf.pack([1, 1, size_y]))
        yy = tf.expand_dims(y, -1)
        yy = tf.tile(yy, tf.pack([1, 1, size_x]))
        yy = tf.transpose(yy, perm=[2, 1, 0])
        diff = tf.sub(xx, yy)
        square_diff = tf.square(diff)
        square_dist = tf.reduce_sum(square_diff, 1)
        return square_dist
This function takes as input two matrices of size (m,d) and (n,d) and computes the squared distance between each pair of row vectors. The output is a matrix of size (m,n) with element 'd_ij = dist(x_i, y_j)'.
The problem is that with a large batch and high-dimensional features (large m, n, d), replicating the tensor consumes a lot of memory.
I'm looking for another way to implement this without increasing the memory usage, storing only the final distance tensor, as if double-looping over the original tensor.
You can use some linear algebra to turn it into matrix ops. Note that you need the matrix D where a[i] is the ith row of your original matrix and
D[i,j] = (a[i]-a[j])(a[i]-a[j])'
You can rewrite that as
D[i,j] = r[i] - 2 a[i]a[j]' + r[j]
where r[i] is the squared norm of the ith row of the original matrix.
In a system that supports standard broadcasting rules you can treat r as a column vector and write D as
D = r - 2 A A' + r'
In TensorFlow you could write this as
A = tf.constant([[1, 1], [2, 2], [3, 3]])
r = tf.reduce_sum(A*A, 1)
# turn r into column vector
r = tf.reshape(r, [-1, 1])
D = r - 2*tf.matmul(A, tf.transpose(A)) + tf.transpose(r)
sess = tf.Session()
sess.run(D)
result
array([[0, 2, 8],
       [2, 0, 2],
       [8, 2, 0]], dtype=int32)
Using squared_difference:
def squared_dist(A):
    expanded_a = tf.expand_dims(A, 1)
    expanded_b = tf.expand_dims(A, 0)
    distances = tf.reduce_sum(tf.squared_difference(expanded_a, expanded_b), 2)
    return distances
One thing I noticed is that this solution using tf.squared_difference gives me out-of-memory (OOM) errors for very large vectors, while the approach by @YaroslavBulatov doesn't. So decomposing the operation yields a smaller memory footprint (which I had thought squared_difference would handle better under the hood).
Here is a more general solution for two tensors of coordinates A and B:
def squared_dist(A, B):
    assert A.shape.as_list() == B.shape.as_list()
    row_norms_A = tf.reduce_sum(tf.square(A), axis=1)
    row_norms_A = tf.reshape(row_norms_A, [-1, 1])  # Column vector.
    row_norms_B = tf.reduce_sum(tf.square(B), axis=1)
    row_norms_B = tf.reshape(row_norms_B, [1, -1])  # Row vector.
    return row_norms_A - 2 * tf.matmul(A, tf.transpose(B)) + row_norms_B
Note that this is the square distance. If you want to change this to the Euclidean distance, perform a tf.sqrt on the result. If you want to do that, don't forget to add a small constant to compensate for the floating point instabilities: dist = tf.sqrt(squared_dist(A, B) + 1e-6).
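A hedged usage sketch, reusing the small example matrix from the first answer (cast to float so the square root works):
A = tf.constant([[1., 1.], [2., 2.], [3., 3.]])
with tf.Session() as sess:
    print(sess.run(squared_dist(A, A)))
    # [[0. 2. 8.]
    #  [2. 0. 2.]
    #  [8. 2. 0.]]
    print(sess.run(tf.sqrt(squared_dist(A, A) + 1e-6)))  # Euclidean distances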
If you want to compute it with a different method, you can rearrange the tf ops, for example row by row:
def compute_euclidean_distance(x, y):
    size_x = x.shape.dims[0]
    size_y = y.shape.dims[0]
    for i in range(size_x):
        tile_one = tf.reshape(tf.tile(x[i], [size_y]), [size_y, -1])
        eu_one = tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.pow(tf.subtract(tile_one, y), 2), axis=1)), axis=0)
        if i == 0:
            d = eu_one
        else:
            d = tf.concat([d, eu_one], axis=0)
    return d
