How does `numpy.einsum` work?


The correct way of writing a summation in terms of Einstein summation is a puzzle to me, so I want to try it in my code. I have succeeded in a few cases but mostly with trial and error.
Now there is a case that I cannot figure out. First, a basic question. For two matrices A and B that are Nx1 and 1xN, respectively, AB is NxN but BA is 1x1. When I want to calculate the NxN case with np.einsum I can do:
import numpy as np
a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print(np.einsum('ij,ji->ij', a, b))
and the final array is 2x2. However
a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print(np.einsum('ij,ij->ij', a, b))
returns a 1x2 array. I don't quite understand why this does not give the correct result.
For example for the above case numpy's guide says that arrows can be used to force summation or stop it from taking place. But that seems quite vague to me; in the above case I don't understand how numpy decides about the final size of the output array based on the order of indices (which apparently changes).
Formally I know the following: When there is nothing on the right side of the arrow, one can write the summation mathematically (for N x M arrays, with zero-based indices) as $\sum\limits_{i=0}^{N-1}\sum\limits_{j=0}^{M-1} A_{ij}B_{ij}$
for np.einsum('ij,ij',A,B), but when there is an arrow I am clueless how to interpret it in terms of a formal mathematical expression.

In [22]: a
Out[22]: array([[1, 2]])
In [23]: b
Out[23]: array([[2, 3]])
In [24]: np.einsum('ij,ij->ij',a,b)
Out[24]: array([[2, 6]])
In [29]: a*b
Out[29]: array([[2, 6]])
Here the repetition of the indices in all parts, including output, is interpreted as element by element multiplication. Nothing is summed. a[i,j]*b[i,j] = c[i,j] for all i,j.
In [25]: np.einsum('ij,ji->ij',a,b)
Out[25]:
array([[2, 4],
[3, 6]])
In [28]: np.dot(a.T,b).T
Out[28]:
array([[2, 4],
[3, 6]])
In [38]: np.outer(a,b)
Out[38]:
array([[2, 3],
[4, 6]])
Again no summation because the same indices appear on left and right sides. a[i,j]*b[j,i] = c[i,j], in other words:
[[1*2, 2*2],
[1*3, 2*3]]
In effect an outer product. A look at how a is broadcasted against b.T might help:
In [69]: np.broadcast_arrays(a,b.T)
Out[69]:
[array([[1, 2],
[1, 2]]),
array([[2, 2],
[3, 3]])]
On the left side of the statement, repeated indices indicate which dimensions are multiplied. Matching left and right sides determines whether they are summed or not.
np.einsum('ij,ji->j',a,b) # array([ 5, 10]) sum on i only
np.einsum('ij,ji->i',a,b) # array([6, 9]) sum on j only
np.einsum('ij,ji',a,b) # 15 sum on i and j
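To make the role of the arrow concrete, here is a minimal sketch of the rule written as explicit loops. The helper name einsum_ij_ji is hypothetical, and the arrays are chosen with no size-1 dimension so that no broadcasting is involved. Every index named after the arrow selects an output cell; every index left out is summed over.

```python
import numpy as np

def einsum_ij_ji(A, B, out):
    """Loop equivalent of np.einsum('ij,ji->' + out, A, B).

    Hypothetical helper; `out` is one of 'ij', 'ji', 'i', 'j' or ''.
    """
    n_i, n_j = A.shape
    sizes = {'i': n_i, 'j': n_j}
    result = np.zeros(tuple(sizes[s] for s in out), dtype=np.result_type(A, B))
    for i in range(n_i):
        for j in range(n_j):
            current = {'i': i, 'j': j}
            # Indices named in `out` pick the output cell;
            # indices missing from `out` are accumulated (summed) there.
            cell = tuple(current[s] for s in out)
            result[cell] += A[i, j] * B[j, i]
    return result

A = np.arange(6).reshape(2, 3)       # i has size 2, j has size 3
B = np.arange(1, 7).reshape(3, 2)    # indexed as B[j, i]

for out in ['ij', 'i', 'j', '']:
    matches = np.array_equal(einsum_ij_ji(A, B, out),
                             np.einsum('ij,ji->' + out, A, B))
    print(repr(out), matches)
```

In formula terms, 'ij,ji->j' corresponds to $c_j = \sum_i A_{ij} B_{ji}$, and 'ij,ji->ij' to $c_{ij} = A_{ij} B_{ji}$ with no sum at all.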
A while back I worked out a pure Python equivalent to einsum, with most of the focus on how it parses the string. The goal is to create an nditer with which it does a sum-of-products calculation. But it's not a trivial script to follow, even in Python:
https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py
A simpler sequence showing these summation rules:
In [53]: c=np.array([[1,2],[3,4]])
In [55]: np.einsum('ij',c)
Out[55]:
array([[1, 2],
[3, 4]])
In [56]: np.einsum('ij->i',c)
Out[56]: array([3, 7])
In [57]: np.einsum('ij->j',c)
Out[57]: array([4, 6])
In [58]: np.einsum('ij->',c)
Out[58]: 10
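The single-operand cases above line up with ordinary axis sums, which can be a handy way to check one's reading of a subscript string. A small sketch of that correspondence (the variable names are only illustrative):

```python
import numpy as np

c = np.array([[1, 2], [3, 4]])

row_sums = np.einsum('ij->i', c)     # j is missing on the right, so j is summed
col_sums = np.einsum('ij->j', c)     # i is missing on the right, so i is summed
total = np.einsum('ij->', c)         # both indices are summed
transposed = np.einsum('ij->ji', c)  # nothing is summed, the axes are reordered

print(np.array_equal(row_sums, c.sum(axis=1)))
print(np.array_equal(col_sums, c.sum(axis=0)))
print(total == c.sum())
print(np.array_equal(transposed, c.T))
```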
Using arrays that don't have a size-1 dimension removes the broadcasting complication (a2 here is another array of shape (2,3), as the error message shows):
In [71]: b2=np.arange(1,7).reshape(2,3)
In [72]: np.einsum('ij,ji',a2,b2)
...
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,3)->(2,3) (2,3)->(3,2)
Or should I say, it exposes the attempted broadcasting.
Ellipsis adds a level of complexity to the einsum interpretation. I developed the above-mentioned GitHub code while fixing a bug in the handling of the ellipsis (...). But I didn't put much effort into refining the documentation.
Ellipsis broadcasting in numpy.einsum
The ellipses are most useful when you want an expression that can handle various sizes of arrays. If your arrays are always 2D, it doesn't do anything extra.
By way of example, consider a generalization of the dot, one that multiplies the last dimension of A with the 2nd to the last of B. With ellipsis we can write an expression that can handle a mix of 2D, 3D and larger arrays (the comments give the shape of each result):
np.einsum('...ij,...jk',np.ones((2,3)),np.ones((3,4))) # (2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((5,3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((7,5,3,4))) # (7,5,2,4)
np.einsum('...ij,...jk->...ik',np.ones((5,2,3)),np.ones((7,5,3,4))) # (7,5,2,4)
The last expression uses the default right hand side indexing ...ik, ellipsis plus the non-summing indices.
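This ellipsis pattern has the same contraction and batch broadcasting as np.matmul, so the two can be compared directly. A minimal check, assuming random test arrays (rng, A and B are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 2, 3))
B = rng.random((7, 5, 3, 4))

via_einsum = np.einsum('...ij,...jk->...ik', A, B)
via_matmul = np.matmul(A, B)

print(via_einsum.shape)
print(np.allclose(via_einsum, via_matmul))
```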
Your original example could be written as
np.einsum('...j,j...->...j',a,b)
Effectively the ellipsis fills in the i (or more dimensions) to match the dimensions of the arrays.
The same expression would also work if a or b were 1d:
np.einsum('...j,j...->...j',a,b[0,:])
The np.dot way of generalizing to larger dimensions is different:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
is expressed in einsum as:
np.einsum('ijo,kom->ijkm',np.ones((2,3,4)),np.ones((3,4,2)))
which can be generalized with
np.einsum('...o,kom->...km',np.ones((4,)),np.ones((3,4,2)))
or
np.einsum('ijo,...om->ij...m',np.ones((2,3,4)),np.ones((3,4,2)))
But I don't think I can completely replicate it in einsum. That is, I can't tell it to fill in indices for A, followed by different ones for B.
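For fixed numbers of dimensions the correspondence with np.dot can be checked numerically. The sketch below assumes two random 3D arrays and also compares against np.tensordot, which names the contracted axes explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((2, 3, 4))
b = rng.random((3, 4, 2))

via_dot = np.dot(a, b)                                # sums a's last axis with b's 2nd-to-last
via_einsum = np.einsum('ijo,kom->ijkm', a, b)
via_tensordot = np.tensordot(a, b, axes=([2], [1]))   # same contraction, axes named

print(via_dot.shape)
print(np.allclose(via_dot, via_einsum))
print(np.allclose(via_dot, via_tensordot))
```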

Related

get a vector from a matrix and a vector of index in numpy

I have a matrix m = [[1,2,3],[4,5,6],[7,8,9]] and a vector v=[1,2,0] that contains the indices of the rows I want to return for each column of my matrix.
The result I expect is r=[4,8,3], but I cannot figure out how to get it using numpy.
By applying the vector as the row index for each column, I get m[v,[0,1,2]] = [4, 8, 3], which is roughly what I'm after.
To avoid hardcoding the columns, I'm using np.arange(m.shape[1]), so my final formula looks like r=m[v,np.arange(m.shape[1])]
This looks weird to me and a little complicated for something that should be quite common.
Is there a clean way to get this result?

In [157]: m = np.array([[1,2,3],[4,5,6],[7,8,9]]);v=np.array([1,2,0])
In [158]: m
Out[158]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [159]: v
Out[159]: array([1, 2, 0])
In [160]: m[v,np.arange(3)]
Out[160]: array([4, 8, 3])
We are choosing 3 elements, with indices (1,0),(2,1),(0,2).
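As a cross-check, the same three elements can be picked with an explicit loop and with np.take_along_axis. This is only a sketch; np.take_along_axis needs an index array with the same number of dimensions as m, hence the added axis:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])   # row to pick in each column

by_index = m[v, np.arange(m.shape[1])]
by_loop = np.array([m[v[col], col] for col in range(m.shape[1])])
# v[np.newaxis, :] has shape (1, 3); result[0, col] = m[v[col], col]
by_take = np.take_along_axis(m, v[np.newaxis, :], axis=0)[0]

print(by_index, by_loop, by_take)
```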
Closer to the MATLAB approach:
In [162]: np.ravel_multi_index((v,np.arange(3)),(3,3))
Out[162]: array([3, 7, 2])
In [163]: m.flat[_]
Out[163]: array([4, 8, 3])
Octave/MATLAB equivalent
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1]
v =
2 3 1
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1];
>> sub2ind([3,3],v,[1 2 3])
ans =
2 6 7
>> m(sub2ind([3,3],v,[1 2 3]))
ans =
4 8 3
The same broadcasting is used to access a block, as illustrated in this recent question:
Is there a way in Python to get a sub matrix as in Matlab?

Well, this 'weird/complicated' thing is actually mentioned as a "straight forward" scenario in the documentation of integer array indexing, which is a sub-topic under the broader topic of "Advanced Indexing".
To quote an extract:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. Advanced indexes always are broadcast and iterated as one. Note that the result shape is identical to the (broadcast) indexing array shapes
If it makes it seem any less complicated/weird, you could use range(m.shape[1]) instead of np.arange(m.shape[1]). It just needs to be any array or array-like structure.
Visualization / Intuition:
When I was learning this (integer array indexing), it helped me to visualize things in the following way:
I visualized the indexing arrays standing side-by-side, all having exactly the same shape (perhaps as a consequence of getting broadcasted together). I also visualized the result array, which also has the same shape as the indexing arrays. In each of these indexing arrays and the result array, I visualized a monkey, capable of doing a walk-through of its own array, hopping to successive elements of its own array. Note that, in general, this identical shape of the indexing arrays and the result array, can be n-dimensional, and this identical shape can be very different from the shape of the source array whose values are actually being indexed.
In your own example, the source array m has shape (3,3), and the indexing arrays and the result array each have a shape of (3,).
In your example, there is a monkey in each of those three arrays (the two indexing arrays and the result array). We then visualize the monkeys doing a walk-through of their respective array elements in tandem. Here, "in tandem" means all three monkeys start at the first element of their respective arrays, and whenever a monkey hops to the next element of its own array, the other monkeys also hop to the next element in their respective arrays. As it hops to each successive element, the monkey in each indexing array calls out the value of the element it has just visited. So the two monkeys in the two indexing arrays read out the values they've just visited in their respective indexing arrays. The monkey in the result array also hops in tandem with the monkeys in the indexing arrays. It hears the values being called out by the monkeys in the indexing arrays, uses those values as indices into the source array m, and thus determines the value to be picked from the source array m. The monkey in the result array picks up this value from the source array m and stores it in the result array, at the location it has just hopped to. Thus, for example, when all three monkeys are at the second element of their respective arrays, the second position of the result array gets its value determined.
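The walk-through can be sketched as a plain loop. The function name index_like_monkeys is hypothetical and the code is only an illustration of the rule, not how NumPy implements it: the index arrays are broadcast to a common shape, and one result element is filled per position.

```python
import numpy as np

def index_like_monkeys(source, rows, cols):
    """Loop version of source[rows, cols] for a 2D source (illustrative only)."""
    rows, cols = np.broadcast_arrays(rows, cols)   # indexing arrays get a common shape
    result = np.empty(rows.shape, dtype=source.dtype)
    for pos in np.ndindex(rows.shape):             # all "monkeys" hop in tandem
        # The values called out at this position are used as indices into source.
        result[pos] = source[rows[pos], cols[pos]]
    return result

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 2, 0])
print(index_like_monkeys(m, v, np.arange(3)))
```

The result shape follows the index arrays rather than the source: passing rows of shape (2,1) and cols of shape (1,2) yields a (2,2) result.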

As stated by the numpy documentation, I think the way you mentioned is the standard way to do this task:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]

What's the difference when indexing a numpy array between using an integer and a numpy scalar?

I didn't expect them to be different, until it just cost me 2 hours to find a bug. Here is an example showing the difference I noticed, but I couldn't make sense of it.
>>> a = np.array([[1, 2], [3, 4]])
>>> a[0][0]
1
>>> a[np.array(0)][np.array(0)]
1
>>> a[0][0] = 5
>>> a
array([[5, 2],
[3, 4]])
>>> a[np.array(0)][np.array(0)] = 6
>>> a
array([[5, 2],
[3, 4]])
It looks like the element can't be changed when a numpy scalar is used as the index. Is a copy of the original array element being returned instead of a reference?
However, with tuple indexing, the problem is gone.
>>> a[np.array(0), np.array(0)] = 6
>>> a
array([[6, 2],
[3, 4]])
What's happening here? I understand that semantically, chained bracket indexing and tuple indexing are different, but in principle shouldn't they both access the same element regardless?
Out of curiosity, I tried it with one dimensional array. The result is different.
>>> a = np.array([1, 2])
>>> a[np.array(0)] = 3
>>> a
array([3, 2])
This time the element has been modified.
The lesson I learned is that I should use tuple index for numpy arrays as much as possible just to be safe. But I would really like an explanation for these inconsistent effects. Thanks!

Looking at the databuffer location:
In [45]: a.__array_interface__['data']
Out[45]: (44666160, False)
In [46]: a[0].__array_interface__['data']
Out[46]: (44666160, False)
Same location for the a[0] case. Modifying a[0] will modify a.
But with the array index, the data buffer is different - this is a copy. Modifying this copy will not affect a.
In [47]: a[np.array(0)].__array_interface__['data']
Out[47]: (43467872, False)
a[i,j] indexing is more idiomatic than a[i][j]. In some cases they are the same. But there are enough cases where they differ that it is wise to avoid the latter unless you really know what it does, and why.
In [49]: a[0]
Out[49]: array([1, 2])
In [50]: a[np.array(0)]
Out[50]: array([1, 2])
In [51]: a[np.array([0])]
Out[51]: array([[1, 2]])
Indexing with np.array(0), a 0d array, is like indexing with np.array([0]), a 1d array. Both produce a copy, whose first dimension is sized like the index.
Admittedly this is tricky, and probably doesn't show up except when doing this sort of set.
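np.shares_memory gives a more direct check than comparing buffer addresses. The sketch below uses the 1d index np.array([0]), which the explanation above treats like the 0d case; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

view_row = a[0]               # basic indexing: a view of a
copy_rows = a[np.array([0])]  # integer array indexing: a new array

shares_view = np.shares_memory(a, view_row)
shares_copy = np.shares_memory(a, copy_rows)

# Chained form: the assignment lands in the temporary copy, so a is untouched.
a[np.array([0])][0, 0] = 99
after_chained = a.copy()

# Single indexing operation: the assignment goes straight into a.
a[np.array([0]), np.array([0])] = 6

print(shares_view, shares_copy)
print(after_chained)
print(a)
```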
When using np.matrix the choice of [i][j] versus [i,j] affects shape as well - python difference between the two form of matrix x[i,j] and x[i][j]

get indices of n-dimensional array when condition is True python [duplicate]

In Numpy, nonzero(a), where(a) and argwhere(a), with a being a numpy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
On argwhere the documentation says:
np.argwhere(a) is the same as np.transpose(np.nonzero(a)).
Why have a whole function that just transposes the output of nonzero ? When would that be so useful that it deserves a separate function?
What about the difference between where(a) and nonzero(a)? Wouldn't they return the exact same result?

nonzero and argwhere both give you information about where in the array the elements are True. where works the same as nonzero in the form you have posted, but it has a second form:
np.where(mask,a,b)
which can be roughly thought of as a numpy "ufunc" version of the conditional expression:
a[i] if mask[i] else b[i]
(with appropriate broadcasting of a and b).
As far as having both nonzero and argwhere, they're conceptually different. nonzero is structured to return an object which can be used for indexing. This can be lighter-weight than creating an entire boolean mask if the 0's are sparse:
mask = a == 0 # entire array of bools
mask = np.nonzero(a)
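Now you can use that mask to index other arrays, etc. However, as it is, it's not very nice conceptually to figure out which coordinates belong to which element, because the row and column indices sit in separate arrays. That's where argwhere comes in: it returns one row of coordinates per element.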
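A small side-by-side sketch of the two layouts, assuming a 2x2 example array:

```python
import numpy as np

a = np.array([[0, 4], [4, 0]])

as_tuple = np.nonzero(a)    # one index array per dimension, ready for indexing
as_rows = np.argwhere(a)    # one row of coordinates per nonzero element

picked = a[as_tuple]                     # the tuple form indexes directly
picked_from_rows = a[tuple(as_rows.T)]   # the row form has to be transposed back first

print(as_tuple)
print(as_rows)
print(picked, picked_from_rows)
```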

I can't comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on where vs nonzero. In its simplest use case, where is indeed the same as nonzero.
>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
or
>>> a = np.array([[1, 2],[3, 4]])
>>> np.where(a == 3)
(array([1]), array([0]))
>>> np.nonzero(a == 3)
(array([1]), array([0]))
where is different from nonzero in the case when you wish to pick elements from array a if some condition is True and from array b when that condition is False.
>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[6, 4], [300, 400]])
Again, I can't explain why they added the nonzero functionality to where, but this at least explains how the two are different.
EDIT: Fixed the first example... my logic was incorrect previously

Modifying block matrices in Python

I would like to take a matrix and modify blocks of it. For example, with a 4x4 matrix the {1,2},{1,2} block is the top left quadrant ([0,1;4,5] below). The {4,1},{4,1} block is the top left quadrant if we rearrange the matrix so the 4th row/column is in position 1 and the 1st in position 2.
Let's make such a 4x4 matrix:
a = np.arange(16).reshape(4, 4)
print(a)
## [[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]]
Now one way of selecting the block, where I specify which rows/columns I want beforehand, is as follows:
C=[3,0]
a[[[C[0],C[0]],[C[1],C[1]]],[[C[0],C[1]],[C[0],C[1]]]]
## array([[15, 12],
## [ 3, 0]])
Here's another way:
a[C,:][:,C]
## array([[15, 12],
## [ 3, 0]])
Yet, if I have a 2x2 array, call it b, setting
a[C,:][:,C]=b
doesn't work but
a[[[C[0],C[0]],[C[1],C[1]]],[[C[0],C[1]],[C[0],C[1]]]]=b
does.
Why is this? And is this second way the most efficient possible? Thanks!

The relevant section from the numpy docs is
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing
Advanced array indexing.
Adapting that example to your case:
In [213]: rows=np.array([[C[0],C[0]],[C[1],C[1]]])
In [214]: cols=np.array([[C[0],C[1]],[C[0],C[1]]])
In [215]: rows
array([[3, 3],
[0, 0]])
In [216]: cols
array([[3, 0],
[3, 0]])
In [217]: a[rows,cols]
array([[15, 12],
[ 3, 0]])
Due to broadcasting, you don't need to repeat duplicate indices, thus:
a[[[3],[0]],[3,0]]
does just fine. np.ix_ is a convenience function to produce just such a pair:
np.ix_(C,C)
(array([[3],
[0]]),
array([[3, 0]]))
thus a short answer is:
a[np.ix_(C,C)]
A related function is meshgrid, which constructs full indexing arrays:
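a[tuple(np.meshgrid(C,C,indexing='ij'))]
np.meshgrid(C,C,indexing='ij') produces the same arrays as your [rows, cols]; wrapping the result in tuple makes NumPy treat them as separate row and column indices. See the function's doc for the significance of the 'ij' parameter.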
np.meshgrid(C,C,indexing='ij',sparse=True) produces the same pair of arrays as np.ix_.
I don't think there's a serious difference in computational speed. Obviously some require less typing on your part.
a[:,C][C,:] works for viewing values, but not for modifying them. The details have to do with which actions make views and which make copies. The simple answer is, use only one layer of indexing if you want to modify values.
The indexing documentation:
Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing.
Thus a[1][3] += 7 works. But the doc also warns
Warning
The above is not true for advanced indexing.
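The difference between the two forms shows up as soon as you assign. A minimal sketch, with a hypothetical 2x2 block b of negative values so the writes are easy to spot:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
C = [3, 0]
b = np.array([[-1, -2], [-3, -4]])

chained = a.copy()
chained[C, :][:, C] = b      # chained[C, :] is a copy, so b is written into a temporary

single = a.copy()
single[np.ix_(C, C)] = b     # one indexing operation writes into `single` itself

print(np.array_equal(chained, a))
print(single)
```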

Indexing NumPy 2D array with another 2D array

I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried ix_ as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.

The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for python loops, or anything, with all the speed advantages numpy gives you. m.T is just needed because choose is really more a choice between the two arrays np.choose(select, (m[:,0], m[:,1])), but it's straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, np.take, which works on a 1D view (it's quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
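A quick check that the three approaches agree on the example data. The flat positions are written out as row * row_length + column, which is what the np.take expression computes:

```python
import numpy as np

m = np.array([[1, 2], [4, 5], [7, 8], [6, 2]])
select = np.array([0, 1, 0, 0])

by_choose = np.choose(select, m.T)
by_index = m[np.arange(len(select)), select]

# Position of element (row, col) in the flattened array is row * m.shape[1] + col.
flat_positions = np.arange(len(select)) * m.shape[1] + select
by_take = m.take(flat_positions)

print(by_choose, by_index, by_take)
```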

I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
create the condition array;
generate an index array by calling NP.where, passing in this condition array; and
apply this index array against the source array
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case, is the result from a single expression comprised of compound conditional expressions (first line above)
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).

What about using python?
result = array([subarray[index] for subarray, index in zip(m, select)])

IMHO, this is simplest variant:
m[np.arange(4), select]

Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# array of flat-index offsets, one per row of B, to be added to each row of inds
offset = np.arange(0, B.size, B.shape[1])
# numpy.take(B, C) "flattens" arrays B and C and selects elements from B based on indices in C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
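A worked sketch with small arrays where the two widths differ (k = 4 columns in B, m = 2 indices per row). The row offsets here are computed from B's row length, since flat indices into B advance by B.shape[1] per row; np.take_along_axis is included as a third option for comparison. All names are illustrative.

```python
import numpy as np

B = np.arange(12).reshape(3, 4)               # shape (n, k) = (3, 4)
inds = np.array([[0, 3], [1, 1], [2, 0]])     # shape (n, m) = (3, 2)

# Flat-index route: the start of row r in the flattened B is r * B.shape[1].
row_offsets = np.arange(B.shape[0]) * B.shape[1]
by_take = np.take(B, row_offsets[:, np.newaxis] + inds)

# Advanced-indexing route: a column of row numbers broadcast against inds.
rows = np.arange(B.shape[0])[:, np.newaxis]
by_index = B[rows, inds]

# Per-row selection along axis 1.
by_along = np.take_along_axis(B, inds, axis=1)

# The advanced-indexing form also works for assignment.
B_modified = B.copy()
B_modified[rows, inds] = -1

print(by_take)
print(B_modified)
```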

result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])
