I have a stochastic process with Mmax trajectories. For each trajectory, I have to take the dot product of two matrices, A and B.
With a loop, it works great
A=np.zeros((2,Mmax),dtype=np.complex64)
B=np.zeros((2,2,Mmax),dtype=np.complex64)
C=np.zeros((2,Mmax),dtype=np.complex64)
for m in range(Mmax):
C[:,m]=B[:,:,m].dot(A[:,m])
(here are just 2x2 matrices to simplify it, when in reality they are much larger)
However, this loop is slow for a large number of trajectories. I want to optimize it by vectorizing it, and I have some problems when I try to implement it
B[:,:,:].dot(A[:,:])
It gives me the error 'shapes (2,2,10) and (2,10) not aligned: 10 (dim 2) != 2 (dim 0)', which makes sense. However, I would really need to vectorize this process, or at least optimize it as much as possible.
Is there any way to get this?
If speed is your concern, there is a way to have that multiplication non-vectorised and yet extremely fast - usually even significantly faster than that. It needs numba though:
import numpy as np
import numba as nb
#nb.njit
def mat_mul(A, B):
n, Mmax = A.shape
C = np.zeros((n, Mmax))
for m in range(Mmax):
for j in range(n):
for i in range(n):
C[j, m] += B[j, i, m]*A[i, m]
return C
Mmax = 100
A = np.ones((2, Mmax))
B = np.ones((2, 2, Mmax))
C = mat_mul(A, B)
Define sample arrays that aren't all zeros. We want to verify values as well as shapes.
In [82]: m = 5
In [83]: A = np.arange(2*m).reshape(2,m)
In [84]: B = np.arange(2*2*m).reshape(2,2,m)
Your iteration:
In [85]: C = np.zeros((2,m))
In [86]: for i in range(m):
...: C[:,i]=B[:,:,i].dot(A[:,i])
...:
In [87]: C
Out[87]:
array([[ 25., 37., 53., 73., 97.],
[ 75., 107., 143., 183., 227.]])
It's fairly easy to express that in einsum:
In [88]: np.einsum('ijk,jk->ik',B,A)
Out[88]:
array([[ 25, 37, 53, 73, 97],
[ 75, 107, 143, 183, 227]])
matmul/# is a variation on np.dot that handles 'batches' nicely. But the batch dimension has to be first (of 3). Your batch dimension, m, is last, so we have do some transposing to get the same result:
In [90]: (np.matmul(B.transpose(2,0,1),A.transpose(1,0)[...,None])[...,0]).T
Out[90]:
array([[ 25, 37, 53, 73, 97],
[ 75, 107, 143, 183, 227]])
I have a target array A, which represents isobaric pressure levels in NCEP reanalysis data.
I also have the pressure at which a cloud is observed as a long time series, B.
What I am looking for is a k-nearest neighbour lookup that returns the indices of those nearest neighbours, something like knnsearch in Matlab that could be represented the same in python such as: indices, distance = knnsearch(A, B, n)
where indices is the nearest n indices in A for every value in B, and distance is how far removed the value in B is from the nearest value in A, and A and B can be of different lengths (this is the bottleneck that I have found with most solutions so far, whereby I would have to loop each value in B to return my indices and distance)
import numpy as np
A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10]) # this is a fixed 17-by-1 array
B = np.array([923, 584.2, 605.3, 153.2]) # this can be any n-by-1 array
n = 2
What I would like returned from indices, distance = knnsearch(A, B, n) is this:
indices = [[1, 2],[4, 5] etc...]
where 923 in A is matched to first A[1]=925 and then A[2]=850
and 584.2 in A is matched to first A[4]=600 and then A[5]=500
distance = [[72, 77],[15.8, 84.2] etc...]
where 72 represents the distance between queried value in B to the nearest value in A e.g. distance[0, 0] == np.abs(B[0] - A[1])
The only solution I have been able to come up with is:
import numpy as np
def knnsearch(A, B, n):
indices = np.zeros((len(B), n))
distances = np.zeros((len(B), n))
for i in range(len(B)):
a = A
for N in range(n):
dif = np.abs(a - B[i])
ind = np.argmin(dif)
indices[i, N] = ind + N
distances[i, N] = dif[ind + N]
# remove this neighbour from from future consideration
np.delete(a, ind)
return indices, distances
array_A = np.array([1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10])
array_B = np.array([923, 584.2, 605.3, 153.2])
neighbours = 2
indices, distances = knnsearch(array_A, array_B, neighbours)
print(indices)
print(distances)
returns:
[[ 1. 2.]
[ 4. 5.]
[ 4. 3.]
[10. 11.]]
[[ 2. 73. ]
[ 15.8 84.2]
[ 5.3 94.7]
[ 3.2 53.2]]
There must be a way to remove the for loops, as I need the performance should my A and B arrays contain many thousands of elements with many nearest neighbours...
Please help! Thanks :)
The second loop can easily be vectorized. The most straightforward way to do it is to use np.argsort and select the indices corresponding to the n smallest dif values. However, for large arrays, as only n values should be sorted, it is better to use np.argpartition.
Therefore, the code would look like something like that:
def vector_knnsearch(A, B, n):
indices = np.empty((len(B), n))
distances = np.empty((len(B), n))
for i,b in enumerate(B):
dif = np.abs(A - b)
min_ind = np.argpartition(dif,n)[:n] # Returns the indexes of the 3 smallest
# numbers but not necessarily sorted
ind = min_ind[np.argsort(dif[min_ind])] # sort output of argpartition just in case
indices[i, :] = ind
distances[i, :] = dif[ind]
return indices, distances
As said in the comments, the first loop can also be removed using a meshgrid, however, the extra use of memory and computation time to construct the meshgrid makes this approach slower for the dimensions I tried (and this will probably get worse for large arrays and end up in Memory Error). In addition, the readability of the code decreases. Overall, this would probably do this approach less pythonic.
def mesh_knnsearch(A, B, n):
m = len(B)
rng = np.arange(m).reshape((m,1))
Amesh, Bmesh = np.meshgrid(A,B)
dif = np.abs(Amesh-Bmesh)
min_ind = np.argpartition(dif,n,axis=1)[:,:n]
ind = min_ind[rng,np.argsort(dif[rng,min_ind],axis=1)]
return ind, dif[rng,ind]
Not that it is important to define this rng as a 2d array in order to retrieve a[rng[0],ind[0]], a[rng[1],ind[1]], etc and maintain the dimensions of the array, as opposed to a[:,ind] which retrieves a[:,ind[0]], a[:,ind[1]], etc.
I tried to implement strided convolution of a 2D array using for loop i.e
arr = np.array([[2,3,7,4,6,2,9],
[6,6,9,8,7,4,3],
[3,4,8,3,8,9,7],
[7,8,3,6,6,3,4],
[4,2,1,8,3,4,6],
[3,2,4,1,9,8,3],
[0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
[1,0,2],
[-1,0,3]])
def stride_conv(arr1,arr2,s,p):
beg = 0
end = arr2.shape[0]
final = []
for i in range(0,arr1.shape[0]-1,s):
k = []
for j in range(0,arr1.shape[0]-1,s):
k.append(np.sum(arr1[beg+i : end+i, beg+j:end+j] * (arr2)))
final.append(k)
return np.array(final)
stride_conv(arr,arr2,2,0)
This results in 3*3 array:
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Is there a numpy function or scipy function to do the same? My approach is not that good. How can I vectorize this?
Ignoring the padding argument and trailing windows that won't have enough lengths for convolution against the second array, here's one way with np.lib.stride_tricks.as_strided -
def strided4D(arr,arr2,s):
strided = np.lib.stride_tricks.as_strided
s0,s1 = arr.strides
m1,n1 = arr.shape
m2,n2 = arr2.shape
out_shp = (1+(m1-m2)//s, m2, 1+(n1-n2)//s, n2)
return strided(arr, shape=out_shp, strides=(s*s0,s*s1,s0,s1))
def stride_conv_strided(arr,arr2,s):
arr4D = strided4D(arr,arr2,s=s)
return np.tensordot(arr4D, arr2, axes=((2,3),(0,1)))
Alternatively, we can use the scikit-image built-in view_as_windows to get those windows elegantly, like so -
from skimage.util.shape import view_as_windows
def strided4D_v2(arr,arr2,s):
return view_as_windows(arr, arr2.shape, step=s)
How about using signal.convolve2d from scipy?
My approach is similar to Jason's one but using indexing.
def strideConv(arr, arr2, s):
return signal.convolve2d(arr, arr2[::-1, ::-1], mode='valid')[::s, ::s]
Note that the kernal has to be reversed. For details, please see discussion here and here. Otherwise use signal.correlate2d.
Examples:
>>> strideConv(arr, arr2, 1)
array([[ 91, 80, 100, 84, 88],
[ 99, 106, 126, 92, 77],
[ 69, 98, 91, 93, 117],
[ 80, 79, 87, 93, 61],
[ 44, 72, 72, 63, 74]])
>>> strideConv(arr, arr2, 2)
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
I think we can do a "valid" fft convolution and pick out only those results at strided locations, like this:
def strideConv(arr,arr2,s):
cc=scipy.signal.fftconvolve(arr,arr2[::-1,::-1],mode='valid')
idx=(np.arange(0,cc.shape[1],s), np.arange(0,cc.shape[0],s))
xidx,yidx=np.meshgrid(*idx)
return cc[yidx,xidx]
This gives same results as other people's answers.
But I guess this only works if the kernel size is odd numbered.
Also I've flipped the kernel in arr2[::-1,::-1] just to stay consistent with others, you may want to omit it depending on context.
UPDATE:
We currently have a few different ways of doing 2D or 3D convolution using numpy and scipy alone, and I thought about doing some comparisons to give some idea on which one is faster on data of different sizes. I hope this won't be regarded as off-topic.
Method 1: FFT convolution (using scipy.signal.fftconvolve):
def padArray(var,pad,method=1):
if method==1:
var_pad=numpy.zeros(tuple(2*pad+numpy.array(var.shape[:2]))+var.shape[2:])
var_pad[pad:-pad,pad:-pad]=var
else:
var_pad=numpy.pad(var,([pad,pad],[pad,pad])+([0,0],)*(numpy.ndim(var)-2),
mode='constant',constant_values=0)
return var_pad
def conv3D(var,kernel,stride=1,pad=0,pad_method=1):
'''3D convolution using scipy.signal.convolve.
'''
var_ndim=numpy.ndim(var)
kernel_ndim=numpy.ndim(kernel)
stride=int(stride)
if var_ndim<2 or var_ndim>3 or kernel_ndim<2 or kernel_ndim>3:
raise Exception("<var> and <kernel> dimension should be in 2 or 3.")
if var_ndim==2 and kernel_ndim==3:
raise Exception("<kernel> dimension > <var>.")
if var_ndim==3 and kernel_ndim==2:
kernel=numpy.repeat(kernel[:,:,None],var.shape[2],axis=2)
if pad>0:
var_pad=padArray(var,pad,pad_method)
else:
var_pad=var
conv=fftconvolve(var_pad,kernel,mode='valid')
if stride>1:
conv=conv[::stride,::stride,...]
return conv
Method 2: Special conv (see this anwser):
def conv3D2(var,kernel,stride=1,pad=0):
'''3D convolution by sub-matrix summing.
'''
var_ndim=numpy.ndim(var)
ny,nx=var.shape[:2]
ky,kx=kernel.shape[:2]
result=0
if pad>0:
var_pad=padArray(var,pad,1)
else:
var_pad=var
for ii in range(ky*kx):
yi,xi=divmod(ii,kx)
slabii=var_pad[yi:2*pad+ny-ky+yi+1:1, xi:2*pad+nx-kx+xi+1:1,...]*kernel[yi,xi]
if var_ndim==3:
slabii=slabii.sum(axis=-1)
result+=slabii
if stride>1:
result=result[::stride,::stride,...]
return result
Method 3: Strided-view conv, as suggested by Divakar:
def asStride(arr,sub_shape,stride):
'''Get a strided sub-matrices view of an ndarray.
<arr>: ndarray of rank 2.
<sub_shape>: tuple of length 2, window size: (ny, nx).
<stride>: int, stride of windows.
Return <subs>: strided window view.
See also skimage.util.shape.view_as_windows()
'''
s0,s1=arr.strides[:2]
m1,n1=arr.shape[:2]
m2,n2=sub_shape[:2]
view_shape=(1+(m1-m2)//stride,1+(n1-n2)//stride,m2,n2)+arr.shape[2:]
strides=(stride*s0,stride*s1,s0,s1)+arr.strides[2:]
subs=numpy.lib.stride_tricks.as_strided(arr,view_shape,strides=strides)
return subs
def conv3D3(var,kernel,stride=1,pad=0):
'''3D convolution by strided view.
'''
var_ndim=numpy.ndim(var)
kernel_ndim=numpy.ndim(kernel)
if var_ndim<2 or var_ndim>3 or kernel_ndim<2 or kernel_ndim>3:
raise Exception("<var> and <kernel> dimension should be in 2 or 3.")
if var_ndim==2 and kernel_ndim==3:
raise Exception("<kernel> dimension > <var>.")
if var_ndim==3 and kernel_ndim==2:
kernel=numpy.repeat(kernel[:,:,None],var.shape[2],axis=2)
if pad>0:
var_pad=padArray(var,pad,1)
else:
var_pad=var
view=asStride(var_pad,kernel.shape,stride)
#return numpy.tensordot(aa,kernel,axes=((2,3),(0,1)))
if numpy.ndim(kernel)==2:
conv=numpy.sum(view*kernel,axis=(2,3))
else:
conv=numpy.sum(view*kernel,axis=(2,3,4))
return conv
I did 3 sets of comparisons:
convolution on 2D data, with different input size and different kernel size, stride=1, pad=0. Results below (color as time used for convolution repeated for 10 times):
So "FFT conv" is in general the fastest. "Special conv" and "Stride-view conv" get slow as kernel size increases, but decreases again as it approaches the size of input data. The last subplot shows the fastest method, so the big triangle of purple indicates FFT being the winner, but note there is a thin green column on the left side (probably too small to see, but it's there), suggesting that "Special conv" has advantage for very small kernels (smaller than about 5x5). And when kernel size approaches input, "stride-view conv" is fastest (see the diagonal line).
Comparison 2: convolution on 3D data.
Setup: pad=0, stride=2, input dimension=nxnx5, kernel shape=fxfx5.
I skipped computations of "Special Conv" and "Stride-view conv" when kernel size is in the mid of input. Basically "Special Conv" shows no advantage now, and "Stride-view" is faster than FFT for both small and large kernels.
One additional note: when sizes goes above 350, I notice considerable memory usage peaks for the "Stride-view conv".
Comparison 3: convolution on 3D data with larger stride.
Setup: pad=0, stride=5, input dimension=nxnx10, kernel shape=fxfx10.
This time I omitted the "Special Conv". For a larger area "Stride-view conv" surpasses FFT, and last subplots shows that the difference approaches 100 %.
Probably because as the stride goes up, the FFT approach will have more wasted numbers so the "stride-view" gains more advantages for small and large kernels.
Here is an O(N^d (log N)^d) fft-based approach. The idea is to chop up both operands into strides-spaced grids at all offsets modulo strides, do the conventional fft convolution between grids of corresponding offsets and then pointwise sum the results. It is a bit index-heavy but I'm afraid that can't be helped:
import numpy as np
from numpy.fft import fftn, ifftn
def strided_conv_2d(x, y, strides):
s, t = strides
# consensus dtype
cdt = (x[0, 0, ...] + y[0, 0, ...]).dtype
xi, xj = x.shape
yi, yj = y.shape
# round up modulo strides
xk, xl, yk, yl = map(lambda a, b: -a//b * -b, (xi,xj,yi,yj), (s,t,s,t))
# zero pad to avoid circular convolution
xp, yp = (np.zeros((xk+yk, xl+yl), dtype=cdt) for i in range(2))
xp[:xi, :xj] = x
yp[:yi, :yj] = y
# fold out strides
xp = xp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
yp = yp.reshape((xk+yk)//s, s, (xl+yl)//t, t)
# do conventional fft convolution
xf = fftn(xp, axes=(0, 2))
yf = fftn(yp, axes=(0, 2))
result = ifftn(xf * yf.conj(), axes=(0, 2)).sum(axis=(1, 3))
# restore dtype
if cdt in (int, np.int_, np.int64, np.int32):
result = result.real.round()
return result.astype(cdt)
arr = np.array([[2,3,7,4,6,2,9],
[6,6,9,8,7,4,3],
[3,4,8,3,8,9,7],
[7,8,3,6,6,3,4],
[4,2,1,8,3,4,6],
[3,2,4,1,9,8,3],
[0,1,3,9,2,1,4]])
arr2 = np.array([[3,4,4],
[1,0,2],
[-1,0,3]])
print(strided_conv_2d(arr, arr2, (2, 2)))
Result:
[[ 91 100 88 23 0 29]
[ 69 91 117 19 0 38]
[ 44 72 74 17 0 22]
[ 16 53 26 12 0 0]
[ 0 0 0 0 0 0]
[ 19 11 21 -9 0 6]]
As far as I know, there is no direct implementation of convolution filter in numpy or scipy that supports stride and padding so I think it's better to use a DL package such as torch or tensorflow, then cast the final result to numpy. a torch implementation might be:
import torch
import torch.nn.functional as F
arr = torch.tensor(np.expand_dims(arr, axis=(0,1))
arr2 = torch.tensor(np.expand_dims(arr2, axis=(0,1))
output = F.conv2d(arr, arr2, stride=2, padding=0)
output = output.numpy().squeeze()
output>
array([[ 91, 100, 88],
[ 69, 91, 117],
[ 44, 72, 74]])
Convolution which supports strides and dilation. numpy.lib.stride_tricks.as_strided is used.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def conv_view(X, F_s, dr, std):
X_s = np.array(X.shape)
F_s = np.array(F_s)
dr = np.array(dr)
Fd_s = (F_s - 1) * dr + 1
if np.any(Fd_s > X_s):
raise ValueError('(Dilated) filter size must be smaller than X')
std = np.array(std)
X_ss = np.array(X.strides)
Xn_s = (X_s - Fd_s) // std + 1
Xv_s = np.append(Xn_s, F_s)
Xv_ss = np.tile(X_ss, 2) * np.append(std, dr)
return as_strided(X, Xv_s, Xv_ss, writeable=False)
def convolve_stride(X, F, dr=None, std=None):
if dr is None:
dr = np.ones(X.ndim, dtype=int)
if std is None:
std = np.ones(X.ndim, dtype=int)
if not (X.ndim == F.ndim == len(dr) == len(std)):
raise ValueError('X.ndim, F.ndim, len(dr), len(std) must be the same')
Xv = conv_view(X, F.shape, dr, std)
return np.tensordot(Xv, F, axes=X.ndim)