Compute multiplication (element-wise) over multiple batches of images using broadcasting - python

I am currently working on the mnist dataset to create a CNN.
My input is
X: Array of shape (batch_size, n_channels, image_height, image_width)
F: The filter to apply. Array of shape (n_channels, filter_height, filter_width)
I am able to compute the element-wise multiplication on a single filter as below:
index: tuple pointing to the top-left corner of where the kernel is to be placed
f_shape = np.shape(F)
np.multiply(X[:, :, index[0]:index[0] + f_shape[1], index[1]:index[1] + f_shape[2]], F)
But now, I want to compute the element-wise multiplication over multiple filters.
So my input will be:
X: Array of shape (batch_size, n_channels, image_height, image_width)
F: The filter to apply. Array of shape (n_filters, n_channels, filter_height, filter_width)
I am not able to figure out an efficient numpy operation using broadcasting to solve this.

You want skimage.util.view_as_windows for both cases. In addition, np.multiply does not do dot products; np.dot does. Or, in this case (when tracking many dimensions), np.einsum.
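import numpy as np
from skimage.util import view_as_windows
# window_shape needs one entry per dimension of X; the 1 keeps the batch samples separate
x_window = view_as_windows(X, (1, n_channels, filter_height, filter_width)).squeeze()
# x_window has shape (batch_size, out_height, out_width, n_channels, filter_height, filter_width)
single_filter_mult = np.einsum('ijkmnp, mnp -> ijkmnp', x_window, F)
single_filter_dot = np.einsum('ijkmnp, mpq -> ijkmnq', x_window, F)
multi_filter_mult = np.einsum('ijkmnp, lmnp -> ijklmnp', x_window, F_multi)
multi_filter_dot = np.einsum('ijkmnp, lmpq -> ijklmnq', x_window, F_multi)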
Now *_filter_*[:, index[0], index[1]] will give the expected output for every image in the batch.
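If only a single kernel position is needed, plain broadcasting may be enough. The following is a minimal sketch under that assumption, not part of the answer above; the helper name multiply_filters_at and the random test arrays are made up for illustration.

```python
import numpy as np

def multiply_filters_at(X, F_multi, index):
    """Element-wise product of every filter with the image patch at `index`.

    X:       (batch_size, n_channels, image_height, image_width)
    F_multi: (n_filters, n_channels, filter_height, filter_width)
    Returns: (batch_size, n_filters, n_channels, filter_height, filter_width)
    """
    _, _, fh, fw = F_multi.shape
    top, left = index
    patch = X[:, :, top:top + fh, left:left + fw]
    # Insert a filter axis into the patch and a batch axis into the filters,
    # so broadcasting pairs every image with every filter.
    return patch[:, np.newaxis] * F_multi[np.newaxis]

rng = np.random.default_rng(0)
X = rng.random((4, 3, 28, 28))        # hypothetical batch of 4 three-channel images
F_multi = rng.random((5, 3, 3, 3))    # hypothetical bank of 5 filters
out = multiply_filters_at(X, F_multi, (2, 7))
print(out.shape)  # (4, 5, 3, 3, 3)
```

Summing the result over the last three axes would give the convolution response of each filter at that position.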


Softmax and its derivative along an axis

I'm trying to implement a Softmax activation that can be applied to arrays of any dimension and softmax can be obtained along a specified axis.
Let's suppose I've an array [[1,2],[3,4]], then if I need the softmax along the rows, I extract each row and apply softmax individually on it through np.apply_along_axis with axis=1. So for the example given above applying softmax to each of [1,2] and [3,4] we get the output as softmax = [[0.26894142, 0.73105858], [0.26894142, 0.73105858]]. So far so good.
Now for the backward pass, let's suppose I have the gradient from the upper layer as upper_grad = [[1,1],[1,1]]. I compute the Jacobian jacobian = [[0.19661193, -0.19661193],[-0.19661193, 0.19661193]] of shape (2,2) for each of the 1D arrays of shape (2,) in softmax and then dot it with the corresponding 1D array in upper_grad of shape (2,). The result of each dot product is an array of shape (2,), so the final derivative will be grads = [[0. 0.],[0. 0.]]
I definitely know I'm going wrong somewhere, because while doing gradient checking, I get ~0.90, which is absolutely bonkers. Could someone please help with what is wrong in my approach and how I can resolve it?
import numpy as np

def softmax(arr, axis):
    # implementation of softmax for a 1d array
    def calc_softmax(arr_1d):
        exponentiated = np.exp(arr_1d - np.max(arr_1d))
        sum_val = np.sum(exponentiated)
        return exponentiated / sum_val
    # split the given array of multiple dims into 1d arrays along axis and
    # apply calc_softmax to each of those 1d arrays
    result = np.apply_along_axis(calc_softmax, axis, arr)
    return result

def softmax_backward(arr, axis, upper_grad):
    result = softmax(arr, axis)
    counter = 0
    upper_grad_slices = []
    def get_ug_slices(arr_1d, upper_grad_slices):
        upper_grad_slices.append(arr_1d)
    def backward(arr_1d, upper_grad_slices, counter):
        local_grad = -np.broadcast_to(arr_1d, (arr_1d.size, arr_1d.size))  # local_grad is the jacobian
        np.fill_diagonal(local_grad, 1 + np.diagonal(local_grad))
        local_grad *= arr_1d.reshape(arr_1d.size, 1)
        grads = np.dot(local_grad, upper_grad_slices[counter])  # grads is 1d array because (2,2) dot (2,)
        counter += 1  # increment the counter to access the next slice of upper_grad_slices
        return grads
    # since apply_along_axis doesn't give the index of the 1d array,
    # we take the slices of 1d array of upper_grad and store it in a list
    np.apply_along_axis(get_ug_slices, axis, upper_grad, upper_grad_slices)
    # Iterate over each 1d array in result along axis and calculate its local_grad (jacobian)
    # and dot it with the corresponding upper_grad slice
    grads = np.apply_along_axis(backward, axis, result, upper_grad_slices, counter)
    return grads

a = np.array([[1,2],[3,4]])
result = softmax(a, 1)
upper_grad = np.array([[1,1],[1,1]])
grads = softmax_backward(a, 1, upper_grad)
I'm so dumb. I was using the counter to get the next slice of upper_grad, but the counter was only getting updated locally, so this caused me to get the same slice of upper_grad each time, thus giving invalid gradient. Resolved it using pop method on upper_grad_slices
Updated code
import numpy as np

def softmax(arr, axis):
    # implementation of softmax for a 1d array
    def calc_softmax(arr_1d):
        exponentiated = np.exp(arr_1d - np.max(arr_1d))
        sum_val = np.sum(exponentiated)
        return exponentiated / sum_val
    # split the given array of multiple dims into 1d arrays along axis and
    # apply calc_softmax to each of those 1d arrays
    result = np.apply_along_axis(calc_softmax, axis, arr)
    return result

def softmax_backward(arr, axis, upper_grad):
    result = softmax(arr, axis)
    upper_grad_slices = []
    def get_ug_slices(arr_1d, upper_grad_slices):
        upper_grad_slices.append(arr_1d)
    def backward(arr_1d, upper_grad_slices):
        local_grad = -np.broadcast_to(arr_1d, (arr_1d.size, arr_1d.size))  # local_grad is the jacobian
        np.fill_diagonal(local_grad, 1 + np.diagonal(local_grad))
        local_grad *= arr_1d.reshape(arr_1d.size, 1)
        grads = np.dot(local_grad, upper_grad_slices.pop(0))  # grads is 1d array because (2,2) dot (2,)
        return grads
    # since apply_along_axis doesn't give the index of the 1d array,
    # we take the slices of 1d array of upper_grad and store it in a list
    np.apply_along_axis(get_ug_slices, axis, upper_grad, upper_grad_slices)
    # Iterate over each 1d array in result along axis and calculate its local_grad (jacobian)
    # and dot it with the corresponding upper_grad slice
    grads = np.apply_along_axis(backward, axis, result, upper_grad_slices)
    return grads

a = np.array([[1,2],[3,4]])
result = softmax(a, 1)
upper_grad = np.array([[1,1],[1,1]])
grads = softmax_backward(a, 1, upper_grad)
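For comparison, the per-slice Jacobian product can also be written without apply_along_axis. The sketch below is an alternative formulation, not the code from the post: it relies on the identity J·g = s * (g - sum(g * s)) for the softmax Jacobian J[i, j] = s_i * (delta_ij - s_j), and the example upper_grad values are arbitrary.

```python
import numpy as np

def softmax(arr, axis):
    # subtract the max along `axis` for numerical stability
    shifted = arr - np.max(arr, axis=axis, keepdims=True)
    exponentiated = np.exp(shifted)
    return exponentiated / np.sum(exponentiated, axis=axis, keepdims=True)

def softmax_backward(arr, axis, upper_grad):
    s = softmax(arr, axis)
    # Jacobian-vector product along `axis`, without building the Jacobian:
    # (J g)_i = s_i * g_i - s_i * sum_j(s_j * g_j)
    weighted = np.sum(upper_grad * s, axis=axis, keepdims=True)
    return s * (upper_grad - weighted)

a = np.array([[1.0, 2.0], [3.0, 4.0]])
upper_grad = np.array([[1.0, 0.0], [0.0, 1.0]])  # arbitrary non-constant gradient
grads = softmax_backward(a, 1, upper_grad)
```

Because no Python-level state is shared between slices, the ordering problem with the counter cannot occur in this form.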

Vectorised pairwise distance

TLDR: given two tensors t1 and t2 that represent b samples of a tensor with shape c,h,w (i.e., every tensor has shape b,c,h,w), I'm trying to calculate the pairwise distance between t1[i] and t2[j] for all i,j efficiently
some more context - I've extracted ResNet18 activations for both my train and test data (CIFAR10) and I'm trying to implement k-nearest-neighbours. A possible pseudo-code might be:
for te in test_activations:
    distances = []
    for tr in train_activations:
        distances.append(distance(te, tr))
    neighbors = k_smallest_elements(distances)
    prediction(te) = majority_vote(labels(neighbors))
I'm trying to vectorise this process given batches from the test and train activations datasets. I've tried iterating the batches (and not the samples) and using torch.cdist(train_batch,test_batch), but I'm not quite sure how this function handles multi-dimensional tensors, as in the documentation it states
torch.cdist(x1, x2,...):
If x1 has shape BxPxM and x2 has shape BxRxM then the output will have shape BxPxR
Which doesn't seem to handle my case (see below)
A minimal example can be found here:
import torch
b,c,h,w = 1000,128,28,28 # actual dimensions in my problem
train_batch = torch.randn(b,c,h,w)
test_batch = torch.randn(b,c,h,w)
d = torch.cdist(train_batch,test_batch)
You can think of test_batch and train_batch as the tensors in the for loop for test_batch in test: for train_batch in train:...
EDIT: I'm adding another example:
Both t1[i] and t2[j] are tensors shaped (c,h,w), and the distance between them is a scalar d. So for example, if we have
t1 = torch.randn(2,128,28,28)
t2 = torch.randn(2,128,28,28)
the distance matrix would look something like
[[d(t1[0],t2[0]), d(t1[0],t2[1])],
[d(t1[1],t2[0]), d(t1[1],t2[1])]]
and have a shape (2,2) (or (b,b) more generally)
where d is the scalar distance between the two tensors t1[i] and t2[j].
It is common to have to reshape your data before feeding it to a builtin PyTorch operator. As you've said torch.cdist works with two inputs shaped (B, P, M) and (B, R, M) and returns a tensor shaped (B, P, R).
Instead, you have two tensors shaped the same way: (b, c, h, w). If we match those dimensions we have: B=b, M=c, while P=h*w (from the 1st tensor) and R=h*w (from the 2nd tensor). This requires flattening the spatial dimensions together and swapping the last two axes. Something like:
>>> x1 = train_batch.flatten(2).transpose(1,2)
>>> x2 = test_batch.flatten(2).transpose(1,2)
>>> d = torch.cdist(x1, x2)
Now d contains distance between all possible pairs (train_batch[b, :, iy, ix], test_batch[b, :, jy, jx]) and is shaped (b, h*w, h*w).
You can then apply a k-NN using torch.topk with largest=False (or argmin when k=1) to retrieve the k closest neighbours from one element of the training batch to the test batch.
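If the goal is instead the (b, b) matrix from the question, with one scalar distance per pair of whole samples, one option is to flatten each sample into a single vector first. The sketch below shows the idea in NumPy with small made-up shapes; the function name pairwise_distances is hypothetical, and Euclidean distance is an assumption. In PyTorch, torch.cdist(t1.flatten(1), t2.flatten(1)) should give the same matrix.

```python
import numpy as np

def pairwise_distances(t1, t2):
    """Euclidean distance between every t1[i] and t2[j], each flattened to a vector."""
    x1 = t1.reshape(t1.shape[0], -1)
    x2 = t2.reshape(t2.shape[0], -1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once
    sq = (x1 ** 2).sum(axis=1)[:, None] + (x2 ** 2).sum(axis=1)[None, :] - 2.0 * (x1 @ x2.T)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives caused by rounding

rng = np.random.default_rng(0)
train = rng.standard_normal((6, 8, 4, 4))  # small stand-in for (b, c, h, w)
test = rng.standard_normal((3, 8, 4, 4))
d = pairwise_distances(train, test)
print(d.shape)  # (6, 3)

k = 2
# indices of the k nearest training samples for each test sample, shape (n_test, k)
nearest = np.argsort(d, axis=0)[:k].T
```

Flattening costs no extra memory beyond a view, but the flattened vectors are long (c*h*w entries), so batching the test set may still be needed at full size.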

Fancy indexing in tensorflow

I have implemented a 3D CNN with a custom loss function (Ax' - y)^2 where x' is a flattened and cropped vector of the 3D output from the CNN, y is the ground truth and A is a linear operator that takes an x and outputs a y. So I need a way to flatten the 3D output and crop it using fancy indexing before computing the loss.
Here is what I have tried:
This is the numpy code I am trying to replicate,
import numpy as np

def flatten_crop(img_vol, indices, vol_shape, N):
    """
    :param img_vol: shape (145, 59, 82, N)
    :param indices: shape (396929,)
    """
    nVx, nVy, nVz = vol_shape
    voxels = np.reshape(img_vol, (nVx * nVy * nVz, N), order='F')
    voxels = voxels[indices, :]
    return voxels
I tried using tf.gather_nd to perform the same action but I am unable to generalize it for an arbitrary batch size. Here is my tensorflow code for batch size of 1 (or a single 3D output):
voxels = tf.transpose(tf.reshape(tf.transpose(y_pred), (1, 145 * 59 * 82))) # to flatten and reshape using Fortran-like index order
voxels = tf.gather_nd(voxels, tf.stack([indices, tf.zeros(len(indices), dtype=tf.dtypes.int32)], axis=1)) # indexing
voxels = tf.reshape(voxels, (voxels.shape[0], 1))
Currently I have this piece of code in my custom loss function and I would like to be able to generalize to an arbitrary batch size. Also if you have an alternate suggestion to implement this (such as a custom layer instead of integrating with the loss function), I am all ears!
Thank you.
Try this code:
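import tensorflow as tf
y_pred = tf.random.uniform((10, 145, 59, 82))
indices = tf.random.uniform((396929,), 0, 145*59*82, dtype=tf.int32)
# reverse the spatial axes so the row-major tf.reshape walks each volume in Fortran-like index order
voxels = tf.transpose(y_pred, (0, 3, 2, 1))
voxels = tf.reshape(voxels, (-1, 145 * 59 * 82)) # one flattened volume per batch element
voxels = tf.gather(voxels, indices, axis=-1) # crop with the fancy indices
voxels = tf.transpose(voxels) # shape (len(indices), batch_size), as in the numpy version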
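The index-order question can be checked in NumPy on a small volume. The sketch below assumes a batch-first layout (batch, nVx, nVy, nVz) and compares a row-major reshape of the reversed spatial axes against the order='F' reshape from the question; the function name flatten_crop_batched and the tiny shapes are illustrative only.

```python
import numpy as np

def flatten_crop_batched(y_pred, indices):
    """y_pred: (batch, nVx, nVy, nVz). Returns an array of shape (len(indices), batch)."""
    batch = y_pred.shape[0]
    # A C-order reshape of the reversed spatial axes visits each volume
    # in the same order as a Fortran-order flatten of the original axes.
    flat = y_pred.transpose(0, 3, 2, 1).reshape(batch, -1)
    return flat[:, indices].T

rng = np.random.default_rng(0)
y_pred = rng.random((2, 4, 3, 5))      # tiny stand-in for (batch, 145, 59, 82)
indices = np.array([0, 7, 13, 59])
voxels = flatten_crop_batched(y_pred, indices)

# Reference: the order='F' reshape from the question, on a (nVx, nVy, nVz, N) layout
reference = np.reshape(np.moveaxis(y_pred, 0, -1), (4 * 3 * 5, 2), order='F')[indices, :]
print(np.allclose(voxels, reference))  # True
```

Without the axis reversal, the reshape would flatten each volume in C order and the indices would pick different voxels.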

how do I change the shape of a tensor with shape 64x4x4x3 to the same shape as the input to the network?

I am trying to implement JPEG compression as a noise layer in Keras. During my implementation I need to change a tensor's shape, and I am puzzled about how to do this, so let me explain what I did and what I want to do. First, the following function produces all the DCT coefficients of an 8x8 block, giving a filter bank of shape 64x8x8. Each of these 64 filters holds the DCT coefficients for one pixel of the final DCT transform output.
def gen_filters(size_x: int, size_y: int, dct_or_idct_fun: callable) -> np.ndarray:
    tile_size_x = 8
    filters = np.zeros((size_x * size_y, size_x, size_y))
    for k_y in range(size_y):
        for k_x in range(size_x):
            for n_y in range(size_y):
                for n_x in range(size_x):
                    filters[k_y * tile_size_x + k_x, n_y, n_x] = dct_or_idct_fun(n_y, k_y, size_y) * dct_or_idct_fun(n_x,
    return filters
Because we cannot use assignment in Keras, I have to implement the DCT transform with a convolution layer, using the following code.
image_conv = Kr.backend.conv2d(image_yuv_ch,filters,strides=(8,8),data_format='channels_first')
But I have a problem with the above code. If the input image_yuv_ch is 32x32x1 and filters is 64x8x8, how should I change the shapes in order to implement the convolution layer? With these shapes it produces an error. Does Keras treat the first number in the shape of filters as the number of filters and 8x8 as the filter size in the above code?
I also have another question. If the convolution layer produces what I want, i.e. an output of shape 64x4x4, then each position of the 4x4 grid holds a vector of length 64 containing the DCT values of one 8x8 block of the input image. I then need to reshape each 64-vector into an 8x8 block and place the blocks beside each other to make 32x32x1 from 64x4x4, but I really do not know how to do this. Do you have any suggestions? I look forward to hearing them. Thank you.
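The rearrangement described in the second question can be sketched in NumPy with a reshape and a transpose. This is only an illustration of the index bookkeeping, assuming the filter index is k = k_y * 8 + k_x as in gen_filters and using random stand-in filters; the names tile_coefficients and blocks are hypothetical. In Keras, the same steps would need the backend's reshape and permute operations.

```python
import numpy as np

def tile_coefficients(coeffs):
    """Rearrange (64, 4, 4) block coefficients into a (32, 32) image of 8x8 tiles."""
    n_by, n_bx = coeffs.shape[1:]
    tiles = coeffs.reshape(8, 8, n_by, n_bx)   # axes: (k_y, k_x, block_y, block_x)
    tiles = tiles.transpose(2, 0, 3, 1)        # axes: (block_y, k_y, block_x, k_x)
    return tiles.reshape(n_by * 8, n_bx * 8)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
filters = rng.random((64, 8, 8))               # random stand-in for the DCT filter bank

# Emulate the stride-8 convolution: one 8x8 block per output position
blocks = image.reshape(4, 8, 4, 8).transpose(0, 2, 1, 3)   # (block_y, block_x, n_y, n_x)
coeffs = np.einsum('kyx,abyx->kab', filters, blocks)       # (64, 4, 4)

tiled = tile_coefficients(coeffs)
print(tiled.shape)  # (32, 32)
```

After this, the coefficient with filter index k for block (block_y, block_x) sits at row block_y * 8 + k // 8 and column block_x * 8 + k % 8.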

numpy broadcast from first dimension

In NumPy, is there an easy way to broadcast two arrays of dimensions e.g. (x,y) and (x,y,z)? NumPy broadcasting typically matches dimensions from the last dimension, so usual broadcasting will not work (it would require the first array to have dimension (y,z)).
Background: I'm working with images, some of which are RGB (shape (h,w,3)) and some of which are grayscale (shape (h,w)). I generate alpha masks of shape (h,w), and I want to apply the mask to the image via mask * im. This doesn't work because of the above-mentioned problem, so I end up having to do e.g.
mask = mask.reshape(mask.shape + (1,) * (len(im.shape) - len(mask.shape)))
which is ugly. Other parts of the code do operations with vectors and matrices, which also run into the same issue: it fails trying to execute m + v where m has shape (x,y) and v has shape (x,). It's possible to use e.g. atleast_3d, but then I have to remember how many dimensions I actually wanted.
How about using transpose:
(a.T + c.T).T
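A small sketch of why the transpose trick works, using made-up arrays a of shape (x, y) and c of shape (x, y, z): transposing reverses the axes, so the shared dimensions end up trailing, which is where NumPy aligns them.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)        # shape (x, y)
c = np.arange(24.0).reshape(2, 3, 4)    # shape (x, y, z)

# a.T has shape (y, x) and c.T has shape (z, y, x): the trailing axes now match
total = (a.T + c.T).T
print(total.shape)  # (2, 3, 4)
```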
numpy functions often have blocks of code that check dimensions, reshape arrays into compatible shapes, all before getting down to the core business of adding or multiplying. They may reshape the output to match the inputs. So there is nothing wrong with rolling your own that do similar manipulations.
Don't dismiss offhand the idea of rotating the variable third dimension to the start of the dimensions. Doing so takes advantage of the fact that numpy automatically adds dimensions at the start.
For element-by-element multiplication, einsum is quite powerful. With an ellipsis in the subscripts, it will handle cases where im and mask are any mix of 2 or 3 dimensions (assuming the first 2 are always compatible). Unfortunately this does not generalize to addition or other operations.
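One way such an einsum call could look is sketched below. The subscript string is an assumption chosen for illustration, as is the helper name apply_mask: 'ij' labels the two shared axes and the ellipsis absorbs the channel axis when im has one.

```python
import numpy as np

def apply_mask(im, mask):
    # 'ij' names the rows and columns shared by im and mask;
    # '...' stands for zero or more trailing axes of im (e.g. RGB channels)
    return np.einsum('ij...,ij->ij...', im, mask)

rng = np.random.default_rng(0)
gs = rng.random((12, 12))        # grayscale image
rgb = rng.random((12, 12, 3))    # RGB image
mask = rng.random((12, 12))

print(apply_mask(gs, mask).shape)   # (12, 12)
print(apply_mask(rgb, mask).shape)  # (12, 12, 3)
```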
A while back I simulated einsum with a pure Python version. For that I used np.lib.stride_tricks.as_strided and np.nditer. Look into those functions if you want more power in mixing and matching dimensions.
As another angle: if you encounter this pattern frequently, it may be useful to create a utility function to enforce right-broadcasting:
def right_broadcasting(arr, target):
    return arr.reshape(arr.shape + (1,) * (target.ndim - arr.ndim))
Although if there are only two types of input (already having 3 dims or having only 2), I'd say the single if statement is preferable.
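A short usage sketch for the matrix-plus-vector case from the question, where m has shape (x, y) and v has shape (x,). The function is repeated so the snippet runs on its own, and the array values are made up.

```python
import numpy as np

def right_broadcasting(arr, target):
    # append length-1 axes until arr has as many dimensions as target
    return arr.reshape(arr.shape + (1,) * (target.ndim - arr.ndim))

m = np.zeros((3, 4))
v = np.array([10.0, 20.0, 30.0])

# v is reshaped to (3, 1), so v[i] is added to every element of row i
result = m + right_broadcasting(v, m)
print(result.shape)  # (3, 4)
```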
Indexing with np.newaxis creates a new axis in that place, i.e.
xyz = np.ones((2, 3, 4))  # some 3d array
xy = np.ones((2, 3))  # some 2d array
xyz_sum = xyz + xy[:,:,np.newaxis]
xyz_sum = xyz + xy[:,:,None]  # equivalent, since np.newaxis is None
Indexing in this way creates an axis with length 1 and stride 0 in this location.
Why not just decorate-process-undecorate:
import numpy as np

def flipflop(func):
    def wrapper(a, mask):
        if len(a.shape) == 3:
            mask = mask[..., None]
        b = func(a, mask)
        return np.squeeze(b)
    return wrapper

@flipflop
def f(x, mask):
    return x * mask
>>> N = 12
>>> gs = np.random.random((N, N))
>>> rgb = np.random.random((N, N, 3))
>>> mask = np.ones((N, N))
>>> f(gs, mask).shape
(12, 12)
>>> f(rgb, mask).shape
(12, 12, 3)
Easy, you just add a singleton dimension at the end of the smaller array. For example, if xyz_array has shape (x,y,z) and xy_array has shape (x,y), you can do
xyz_array + np.expand_dims(xy_array, xy_array.ndim)

