Black voodoo of NumPy Einsum - python

I have some working code that uses the einsum function, but einsum is still black voodoo to me. I would like to know what this code is actually doing and whether it can be optimized using np.dot.
My data looks like this:
n, p, q = 40000, 8, 4
a = np.random.rand(n, p, q)
b = np.random.rand(n, p)
And my existing einsum calls look like this:
f1 = np.einsum("ijx,ijy->ixy", a, a)
f2 = np.einsum("ijx,ij->ix", a, b)
But what does it really do? I understand this much: each dimension (axis) is represented by a label. i is the first axis (n), j is the second axis (p), and x and y are different labels for the same axis (q).
So the order of the output array of f1 is ixy, and thus the output shape is (40000, 4, 4), i.e. (n, q, q).
But that's as far as I get. And

Let's play around with a couple of small arrays:
In [110]: a=np.arange(2*3*4).reshape(2,3,4)
In [111]: b=np.arange(2*3).reshape(2,3)
In [112]: np.einsum('ijx,ij->ix',a,b)
Out[112]:
array([[ 20, 23, 26, 29],
[200, 212, 224, 236]])
In [113]: np.diagonal(np.dot(b,a)).T
Out[113]:
array([[ 20, 23, 26, 29],
[200, 212, 224, 236]])
np.dot operates on the last dimension of the 1st array and the 2nd-to-last of the 2nd, so I have to switch the arguments so that the size-3 dimensions line up. dot(b,a) produces a (2,2,4) array. diagonal selects 2 of those 'rows', and the transpose cleans up. Another einsum expresses that cleanup nicely:
In [122]: np.einsum('iik->ik',np.dot(b,a))
Since np.dot is producing a larger array than the original einsum, it is unlikely to be faster, even if the underlying C code is tighter.
(Curiously I'm having trouble replicating np.dot(b,a) with einsum; it won't generate that (2,2,...) array).
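For what it's worth, np.dot(b,a) can be reproduced with einsum by giving the leading axes of b and a different labels, so that both survive into the output. A minimal sketch using the small arrays above (the names full and diag are introduced here for illustration only):

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
b = np.arange(2 * 3).reshape(2, 3)

# b has labels (i, j), a has labels (k, j, x); only j is summed over,
# so the result keeps i, k and x and has shape (2, 2, 4), like np.dot(b, a).
full = np.einsum('ij,kjx->ikx', b, a)

# Keeping only the entries with i == k (the "diagonal") recovers 'ijx,ij->ix'.
diag = np.einsum('iix->ix', full)
```

The (2,2,4) intermediate holds every pairing of a row of b with a block of a; the original einsum computes only the i == k pairings, which is why it never builds that larger array.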
For the a,a case we have to do something similar - roll the axes of one array so the last dimension lines up with the 2nd to last of the other, do the dot, and then cleanup with diagonal and transpose:
In [157]: np.einsum('ijx,ijy->ixy',a,a).shape
Out[157]: (2, 4, 4)
In [158]: np.einsum('ijjx->jix',np.dot(np.rollaxis(a,2),a))
In [176]: np.diagonal(np.dot(np.rollaxis(a,2),a),0,2).T
tensordot is another way of taking a dot over selected axes.
np.tensordot(a,a,(1,1))
np.diagonal(np.rollaxis(np.tensordot(a,a,(1,1)),1),0,2).T # with cleanup
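Both original expressions are per-i matrix products, so they can also be written with np.matmul (the @ operator), which treats the first axis as a batch and does not compute the unwanted cross terms. This is a sketch rather than a benchmark: which form is faster probably depends on the shapes and the NumPy build, so it is worth timing on the real data. A small n is assumed here to keep the example quick.

```python
import numpy as np

n, p, q = 5, 8, 4            # small n for illustration; the question uses 40000
rng = np.random.default_rng(0)
a = rng.random((n, p, q))
b = rng.random((n, p))

f1 = np.einsum("ijx,ijy->ixy", a, a)
f2 = np.einsum("ijx,ij->ix", a, b)

at = a.transpose(0, 2, 1)              # shape (n, q, p): each a[i] transposed
m1 = at @ a                            # (n, q, p) @ (n, p, q) -> (n, q, q)
m2 = (at @ b[:, :, None])[..., 0]      # (n, q, p) @ (n, p, 1) -> (n, q)
```

In words, f1[i] is a[i].T @ a[i] and f2[i] is a[i].T @ b[i], computed for every i at once.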

Related

3d array to matrix multiplication

I have a matrix called vec with two columns, vec[:,0] and vec[:,1]. P contains two matrices, P[0,:,:] and P[1,:,:]. I want to multiply P[0,:,:] with the first column of vec and multiply P[1,:,:] with the second column of vec. However, the operation P@vec also gives me the matrix product of P[0,:,:] with the second column of vec and the matrix product of P[1,:,:] with the first column of vec, which slows my code.
Is it possible to directly compute the pairs column 1 to matrix 1 and column 2 to matrix 2 without the "off" products?
import numpy as np
P=np.arange(50).reshape(2, 5, 5)
vec=np.arange(10).reshape(5,2)
have=P@vec
want=np.column_stack((have[0,:,0],have[1,:,1]))
have,want
There is a very powerful function in numpy called np.einsum. It can perform all kind of tensor contractions, axis reordering and matrix multiplication. For your example you could use
res = np.einsum('nij,jn->in', P, vec)
after which res is exactly like want.
How does this work:
You give the np.einsum function both your arrays as well as a signature (that 'nij,jn->in' string) that tells the function how to multiply the arrays. In short, you want the third axis of the P tensor to be contracted with the first axis of vec. Therefore you choose the same index j in the signature string and leave it out in the part after the ->. A mere broadcast is done if indices appear on the left and right hand side of the ->, which is done here for the n and i indices.
A more complete explanation of this very powerful function with many examples of how to use it can be found at the corresponding numpy documentation.
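To make the signature concrete, here is the same computation written as explicit loops. It is a sketch for checking one's reading of the signature, not something to use for speed; the name res_loop is introduced for this illustration only.

```python
import numpy as np

P = np.arange(50).reshape(2, 5, 5)
vec = np.arange(10).reshape(5, 2)

res = np.einsum('nij,jn->in', P, vec)

# 'nij,jn->in' read literally: res[i, n] = sum over j of P[n, i, j] * vec[j, n]
res_loop = np.zeros((5, 2), dtype=P.dtype)
for n in range(2):
    for i in range(5):
        for j in range(5):
            res_loop[i, n] += P[n, i, j] * vec[j, n]
```

The index j appears in both inputs but not in the output, so it is summed; n and i appear in the output, so they are kept.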
@/matmul handles batches nicely, but the rules are that for 3d arrays, the first dimension is the batch, and dot is done on the last 2 dimensions, with the usual "last of A with the second to the last of B" pairing.
It took a bit of reading to decipher your description, but it appears that you want the first dimension of P to be the batch, and the last dimension of vec to be the batch. That means vec needs to be transformed to a (2,5,1) array to work with the (2,5,5) P.
In [176]: P@vec.T[:,:,None]
Out[176]:
array([[[ 60],
[ 160],
[ 260],
[ 360],
[ 460]],
[[ 695],
[ 820],
[ 945],
[1070],
[1195]]])
The result is (2,5,1). We can squeeze out the last dimension to get (2,5), but apparently you want a (5,2):
In [179]: (P@vec.T[:,:,None])[...,0].T
Out[179]:
array([[ 60, 695],
[ 160, 820],
[ 260, 945],
[ 360, 1070],
[ 460, 1195]])
np.einsum('nij,jn->in', P, vec) does effectively the same, with the n as the batch dimension that is 'carried through' to the result, and sum-of-products on the shared j dimension.

Is there any way to vectorize a rolling cross-correlation in python based on my example?

Let's suppose I have two arrays that represent pixels in pictures.
I want to build an array of tensordot products of pixels of a smaller picture with a bigger picture as it "scans" the latter. By "scanning" I mean iteration over rows and columns while creating overlays with the original picture.
For instance, a 2x2 picture can be overlaid on top of 3x3 in four different ways, so I want to produce a four-element array that contains tensordot products of matching pixels.
Tensordot is calculated by multiplying a[i,j] with b[i,j] element-wise and summing the terms.
Please examine this code:
import numpy as np
a = np.array([[0,1,2],
              [3,4,5],
              [6,7,8]])
b = np.array([[0,1],
              [2,3]])
shape_diff = (a.shape[0] - b.shape[0] + 1,
              a.shape[1] - b.shape[1] + 1)
def compute_pixel(x,y):
    sub_matrix = a[x : x + b.shape[0],
                   y : y + b.shape[1]]
    return np.tensordot(sub_matrix, b, axes=2)
def process():
    arr = np.zeros(shape_diff)
    for i in range(shape_diff[0]):
        for j in range(shape_diff[1]):
            arr[i,j]=compute_pixel(i,j)
    return arr
print(process())
Computing a single pixel is very easy, all I need is the starting location coordinates within a. From there I match the size of the b and do a tensordot product.
However, because I need to do this all over again for each x and y location as I'm iterating over rows and columns I've had to use a loop, which is of course suboptimal.
In the next piece of code I have tried to use a handy feature of tensordot: it also accepts higher-dimensional tensors as arguments. In other words, I can feed it an array of arrays holding the different sub-windows of a, while keeping b the same.
However, to create that array of sub-windows I couldn't think of anything better than another loop, which seems silly in this case.
def try_vector():
    tensor = np.zeros(shape_diff + b.shape)
    for i in range(shape_diff[0]):
        for j in range(shape_diff[1]):
            tensor[i,j]=a[i: i + b.shape[0],
                          j: j + b.shape[1]]
    return np.tensordot(tensor, b, axes=2)
print(try_vector())
Note: the tensor shape is the concatenation of two tuples, which in this case gives (2, 2, 2, 2).
Regardless, even if I produced such an array, it would be prohibitively large to be of any practical use. Doing this for a 1000x1000 picture could probably consume all the available memory.
So, are there any other ways to avoid loops in this problem?
In [111]: process()
Out[111]:
array([[19., 25.],
[37., 43.]])
tensordot with axes=2 is the same as element-wise multiply and sum:
In [116]: np.tensordot(a[0:2,0:2],b, axes=2)
Out[116]: array(19)
In [126]: (a[0:2,0:2]*b).sum()
Out[126]: 19
A lower-memory way of generating your tensor is:
In [121]: np.lib.stride_tricks.sliding_window_view(a,(2,2))
Out[121]:
array([[[[0, 1],
[3, 4]],
[[1, 2],
[4, 5]]],
[[[3, 4],
[6, 7]],
[[4, 5],
[7, 8]]]])
We can do a broadcasted multiply, and sum on the last 2 axes:
In [129]: (Out[121]*b).sum((2,3))
Out[129]:
array([[19, 25],
[37, 43]])
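The multiply-and-sum can also be handed to einsum or tensordot directly, which avoids writing out the broadcasted product Out[121]*b as a separate step. A sketch, assuming NumPy 1.20 or later (where sliding_window_view is available); the variable names are illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(9).reshape(3, 3)
b = np.array([[0, 1],
              [2, 3]])

# windows[i, j] is the b-shaped patch of `a` whose top-left corner is (i, j).
# It is a read-only view of `a`, so no (2, 2, 2, 2) copy is made at this point.
windows = sliding_window_view(a, b.shape)

# Contract the two window axes (k, l) against b, keeping the position axes (i, j).
out_einsum = np.einsum('ijkl,kl->ij', windows, b)
out_tensordot = np.tensordot(windows, b, axes=2)
```

Whether the view stays copy-free inside these calls is an implementation detail (tensordot reshapes its inputs and may copy while doing so), so for large pictures it is worth measuring memory use rather than assuming it.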

Multiply and sum numpy arrays with shapes (x,y,z) and (x,)

So I have a 3D data-set (x,y,z), and I want to sum over one of the axes (x) with a set of weights, w = w(x). The start and end index I am summing over is different for every (y,z); I have solved this by masking the 3D array. The weights are constant with regard to the two variables I am not summing over. Both answers regarding implementation and mathematics are appreciated (is there a smart linalg way of doing this?).
I have a 3D masked array (A) of shape (x,y,z) and a 1D array (t) of shape (x,). Is there a good way to multiply every (y,z) element in A with the corresponding number in t without expanding t to a 3D array? My current solution is using np.tensordot to make a 3D array of the same shape as A that holds all the t-values, but it feels very unsatisfactory to spend runtime building the "new_t" array, which is essentially just y*z copies of t.
Example of current solution:
a1 = np.array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12]])
a2 = np.array([[0,1,2,3],
[4,5,6,7],
[8,9,10,11]])
#note: A is a masked array, mask is a 3D array of bools
A = np.ma.masked_array([a1,a2],mask)
t = np.array([10,11])
new_t = np.tensordot(t, np.ones(A[0].shape), axes = 0)
return np.sum(A*new_t, axis=0)
In essence I want to perform t*A[:,i,j] for all i,j with the shortest possible runtime, preferably without using too many other libraries than numpy and scipy.
Another way of producing desired output (again, with far too high run time):
B = [[t*A[:,i,j] for j in range(A.shape[2])] for i in range(A.shape[1])]
return np.sum(B,axis=2)
Inspired by @phipsgabler's comment:
arr1 = np.tensordot(A.T,t,axes=1).T
arr1
array([[ 10, 31, 52, 73],
[ 94, 115, 136, 157],
[178, 199, 220, 241]])
Thanks for the good answers! Using tensordot like @alyhosny proposed worked, but replacing masked values with zeros using
A = np.ma.MaskedArray.filled(A,0)
before summing with einsum (thanks @phipsgabler) gave half the run time. Final code:
A = np.ma.MaskedArray(A,mask)
A = np.ma.MaskedArray.filled(A,0)
return np.einsum('ijk,i->jk',A,t)
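The explicit new_t array from the question is not needed either way: indexing t as t[:, None, None] gives it shape (x, 1, 1), which broadcasts against A. The sketch below compares that with the einsum version. The mask is a made-up example, since the original mask is not shown.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a2 = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
t = np.array([10, 11])

# Hypothetical mask for illustration: hide one element of the first slice.
mask = np.zeros((2, 3, 4), dtype=bool)
mask[0, 0, 0] = True
A = np.ma.masked_array([a1, a2], mask)

# Broadcasting: t[:, None, None] has shape (2, 1, 1); masked entries are skipped by sum.
by_broadcast = (A * t[:, None, None]).sum(axis=0)

# einsum on a plain ndarray, with masked entries replaced by 0 first.
by_einsum = np.einsum('ijk,i->jk', A.filled(0), t)
```

The two agree wherever at least one entry along x is unmasked. A position that is masked for every x comes out masked in by_broadcast and as 0 in by_einsum, so which behaviour is wanted needs deciding up front.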

What are the efficient ways to loop over vectors along a specified axis in numpy ndarray?

I'm processing data by looping over vectors along an axis (could be any axis) of numpy ndarray (could be of any dimensions).
I don't work on the array directly because the data are not perfect. Each vector requires quality control; if a vector is not good, it is filled with zeros (or nan) and gets no real processing.
I found this Q similar, but my problem is much more difficult because
ndim is arbitrary.
For a 3D array, I can take vectors along axis 1 like this
x = np.arange(24).reshape(2,3,4)
for i in range(x.shape[0]):
    for k in range(x.shape[2]):
        process(x[i,:,k])
But if ndim and the chosen axis are not fixed, how do I take the vectors?
The axis for taking vectors is arbitrary.
One possible way I'm considering is
y = x.swapaxes(ax,-1)
# loop over vectors along last axis
for i in np.ndindex(y.shape[:-1]):
    process(y[i+(slice(None),)])
# then swap back
z = y.swapaxes(ax,-1)
But I doubt the efficiency of this method.
The best way to test efficiency is to do time tests on realistic examples. But %timeit (ipython) tests on toy examples are a start.
Based on experience from answering similar 'if you must iterate' questions, there isn't much difference in times. np.frompyfunc has a modest speed edge - but its pyfunc takes scalars, not arrays or slices. (np.vectorize is a nicer API to this function, and a bit slower).
But here you want to pass a 1d slice of an array to your function, while iterating over all the other dimensions. I don't think there's much difference in the alternative iteration methods.
Actions like swapaxes, transpose and ravel are fast, often just creating a new view with different shape and strides.
np.ndindex uses np.nditer (with the multi_index flag) to iterate over a range of dimensions. nditer is fast when used in C code, but isn't anything special when used in Python code.
np.apply_along_axis creates a (i,j,:,k) indexing tuple, and steps the variables. It's a nice general approach, but isn't doing anything special to speed things up. itertools.product is another way of generating the indices.
But usually it isn't the iteration mechanism that slows things down, it's the repeated call to your function. You can test the iteration mechanism by using a trivial function, e.g.
def foo(x):
    return x
===================
You don't need to swapaxes to use ndindex; you can use it to iterate on any combination of axes.
For example, make a 3d array, and sum along the middle dimension:
In [495]: x=np.arange(2*3*4).reshape(2,3,4)
In [496]: N=np.ndindex(2,4)
In [497]: [x[i,:,k].sum() for i,k in N]
Out[497]: [12, 15, 18, 21, 48, 51, 54, 57]
In [498]: x.sum(1)
Out[498]:
array([[12, 15, 18, 21],
[48, 51, 54, 57]])
I don't think it makes a difference in speed; the code's just simpler.
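That idea generalizes to any ndim and any axis by building the index tuple with a slice(None) in the chosen position. A minimal sketch; iter_vectors is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def iter_vectors(x, axis):
    """Yield (index_tuple, 1-D view) for every vector of x along `axis`."""
    axis = axis % x.ndim                          # allow negative axes
    other_shape = x.shape[:axis] + x.shape[axis + 1:]
    for idx in np.ndindex(*other_shape):
        # Re-insert a full slice at the position of the iterated axis.
        full = idx[:axis] + (slice(None),) + idx[axis:]
        yield full, x[full]

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)
sums = np.array([v.sum() for _, v in iter_vectors(x, 1)]).reshape(2, 4)
```

Because x[full] is a view, a quality-control step can write back in place, for example x[full] = 0 for a rejected vector. As noted above, this is unlikely to be faster than the other iteration methods; it only removes the need to swap axes.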
===================
Another possible tool is np.ma, masked arrays. With those you mark individual elements as masked (because they are nan or 0). It has code that evaluates things like sum, mean, product in such a way that the masked values don't harm the solution.
The 3d array again:
In [517]: x=np.arange(2*3*4).reshape(2,3,4)
add in some bad values:
In [518]: x[1,1,2]=99
In [519]: x[0,0,:]=99
those values mess up the normal sum:
In [520]: x.sum(axis=1)
Out[520]:
array([[111, 113, 115, 117],
[ 48, 51, 135, 57]])
but if we mask them, they are 'filtered out' of the solution (in this case, they are set temporarily to 0)
In [521]: xm=np.ma.masked_greater(x,50)
In [522]: xm
Out[522]:
masked_array(data =
[[[-- -- -- --]
[4 5 6 7]
[8 9 10 11]]
[[12 13 14 15]
[16 17 -- 19]
[20 21 22 23]]],
mask =
[[[ True True True True]
...
[False False False False]]],
fill_value = 999999)
In [523]: xm.sum(1)
Out[523]:
masked_array(data =
[[12 14 16 18]
[48 51 36 57]],
...)
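If the bad values are NaN rather than large sentinels, the same approach works with np.ma.masked_invalid, and np.nansum gives the same totals without a masked array. A sketch using a float copy of the array above, with NaN in place of 99:

```python
import numpy as np

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
x[1, 1, 2] = np.nan
x[0, 0, :] = np.nan

xm = np.ma.masked_invalid(x)     # masks NaN (and inf) entries
masked_sum = xm.sum(axis=1)      # masked entries are left out of the sum
nan_sum = np.nansum(x, axis=1)   # treats NaN as zero
```

Both give the same values as Out[523] here, as floats.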
Have you considered numpy.nditer?
See also Iterating over arrays.
EDIT: maybe another solution would just be to either use:
flatten
ravel
the flat 1D iterator
You can thus iterate 1D-like whatever the array's initial dim, and then reshape the array to its original shape.

Numpy matrix multiplication of 2d matrix to give 3d matrix

I have two numpy arrays, like
A = array([[0, 1],
           [2, 3],
           [4, 5]])
B = array([[ 6,  7],
           [ 8,  9],
           [10, 11]])
For each row of A and B, say Ra and Rb respectively, I want to calculate transpose(Ra)*Rb. So for the given values of A and B, I want the following answer:
array([[[ 0, 0],
[ 6, 7]],
[[ 16, 18],
[ 24, 27]],
[[ 40, 44],
[ 50, 55]]])
I have written the following code to do so:
x = np.outer(np.transpose(A[0]), B[0])
for i in range(1,len(A)):
    x = np.append(x,np.outer(np.transpose(A[i]), B[i]),axis=0)
Is there any better way to do this task?
You can extend the dimensions of A and B with np.newaxis/None to bring in broadcasting for a vectorized solution, like so:
A[...,None]*B[:,None,:]
Explanation: np.outer(np.transpose(A[i]), B[i]) basically does elementwise multiplications between a columnar version of A[i] and B[i]. You are repeating this for all rows in A against the corresponding rows in B. Please note that np.transpose() doesn't seem to have any effect, as np.outer takes care of the intended elementwise multiplications.
I would describe these steps in a vectorized language and thus implement, like so -
Extend dimensions of A and B to form 3D shapes for both of them such that we keep axis=0 aligned and keep as axis=0 in both of those extended versions too. Thus, we are left with deciding the last two axes.
To bring in the elementwise multiplications, push axis=1 of A in its original 2D version to axis=1 in its 3D version, thus creating a singleton dimension at axis=2 for extended version of A.
This last singleton dimension of 3D version of A has to align with the elements from axis=1 in original 2D version of B to let broadcasting happen. Thus, extended version of B would have the elements from axis=1 in its 2D version being pushed to axis=2 in its 3D version, thereby creating a singleton dimension for axis=1.
Finally, the extended versions would be A[...,None] and B[:,None,:], and multiplying them gives us the desired output.
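The same per-row outer product can be spelled with einsum, which some readers find easier to scan than the None indexing. A sketch for comparison, using the arrays from the question:

```python
import numpy as np

A = np.array([[0, 1], [2, 3], [4, 5]])
B = np.array([[6, 7], [8, 9], [10, 11]])

# (3, 2, 1) * (3, 1, 2) broadcasts to (3, 2, 2)
by_broadcast = A[..., None] * B[:, None, :]

# No index is summed: i is shared by both inputs, j and k are both kept.
by_einsum = np.einsum('ij,ik->ijk', A, B)
```

Both produce the (3, 2, 2) array shown in the question, with by_einsum[i] equal to np.outer(A[i], B[i]).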
