How does np.einsum work?
Given arrays A and B, their matrix multiplication followed by transpose is computed using (A # B).T, or equivalently, using:
np.einsum("ij, jk -> ki", A, B)
(Note: this answer is based on a short blog post about einsum I wrote a while ago.)
What does einsum do?
Imagine that we have two multi-dimensional arrays, A and B. Now let's suppose we want to...
multiply A with B in a particular way to create new array of products; and then maybe
sum this new array along particular axes; and then maybe
transpose the axes of the new array in a particular order.
There's a good chance that einsum will help us do this faster and more memory-efficiently than combinations of the NumPy functions like multiply, sum and transpose will allow.
How does einsum work?
Here's a simple (but not completely trivial) example. Take the following two arrays:
A = np.array([0, 1, 2])
B = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
We will multiply A and B element-wise and then sum along the rows of the new array. In "normal" NumPy we'd write:
>>> (A[:, np.newaxis] * B).sum(axis=1)
array([ 0, 22, 76])
So here, the indexing operation on A lines up the first axes of the two arrays so that the multiplication can be broadcast. The rows of the array of products are then summed to return the answer.
Now if we wanted to use einsum instead, we could write:
>>> np.einsum('i,ij->i', A, B)
array([ 0, 22, 76])
The signature string 'i,ij->i' is the key here and needs a little bit of explaining. You can think of it in two halves. On the left-hand side (left of the ->) we've labelled the two input arrays. To the right of ->, we've labelled the array we want to end up with.
Here is what happens next:
A has one axis; we've labelled it i. And B has two axes; we've labelled axis 0 as i and axis 1 as j.
By repeating the label i in both input arrays, we are telling einsum that these two axes should be multiplied together. In other words, we're multiplying array A with each column of array B, just like A[:, np.newaxis] * B does.
Notice that j does not appear as a label in our desired output; we've just used i (we want to end up with a 1D array). By omitting the label, we're telling einsum to sum along this axis. In other words, we're summing the rows of the products, just like .sum(axis=1) does.
That's basically all you need to know to use einsum. It helps to play about a little; if we leave both labels in the output, 'i,ij->ij', we get back a 2D array of products (same as A[:, np.newaxis] * B). If we say no output labels, 'i,ij->, we get back a single number (same as doing (A[:, np.newaxis] * B).sum()).
The great thing about einsum however, is that it does not build a temporary array of products first; it just sums the products as it goes. This can lead to big savings in memory use.
A slightly bigger example
To explain the dot product, here are two new arrays:
A = array([[1, 1, 1],
[2, 2, 2],
[5, 5, 5]])
B = array([[0, 1, 0],
[1, 1, 0],
[1, 1, 1]])
We will compute the dot product using np.einsum('ij,jk->ik', A, B). Here's a picture showing the labelling of the A and B and the output array that we get from the function:
You can see that label j is repeated - this means we're multiplying the rows of A with the columns of B. Furthermore, the label j is not included in the output - we're summing these products. Labels i and k are kept for the output, so we get back a 2D array.
It might be even clearer to compare this result with the array where the label j is not summed. Below, on the left you can see the 3D array that results from writing np.einsum('ij,jk->ijk', A, B) (i.e. we've kept label j):
Summing axis j gives the expected dot product, shown on the right.
Some exercises
To get more of a feel for einsum, it can be useful to implement familiar NumPy array operations using the subscript notation. Anything that involves combinations of multiplying and summing axes can be written using einsum.
Let A and B be two 1D arrays with the same length. For example, A = np.arange(10) and B = np.arange(5, 15).
The sum of A can be written:
np.einsum('i->', A)
Element-wise multiplication, A * B, can be written:
np.einsum('i,i->i', A, B)
The inner product or dot product, np.inner(A, B) or np.dot(A, B), can be written:
np.einsum('i,i->', A, B) # or just use 'i,i'
The outer product, np.outer(A, B), can be written:
np.einsum('i,j->ij', A, B)
For 2D arrays, C and D, provided that the axes are compatible lengths (both the same length or one of them of has length 1), here are a few examples:
The trace of C (sum of main diagonal), np.trace(C), can be written:
np.einsum('ii', C)
Element-wise multiplication of C and the transpose of D, C * D.T, can be written:
np.einsum('ij,ji->ij', C, D)
Multiplying each element of C by the array D (to make a 4D array), C[:, :, None, None] * D, can be written:
np.einsum('ij,kl->ijkl', C, D)
Grasping the idea of numpy.einsum() is very easy if you understand it intuitively. As an example, let's start with a simple description involving matrix multiplication.
To use numpy.einsum(), all you have to do is to pass the so-called subscripts string as an argument, followed by your input arrays.
Let's say you have two 2D arrays, A and B, and you want to do matrix multiplication. So, you do:
np.einsum("ij, jk -> ik", A, B)
Here the subscript string ij corresponds to array A while the subscript string jk corresponds to array B. Also, the most important thing to note here is that the number of characters in each subscript string must match the dimensions of the array (i.e., two chars for 2D arrays, three chars for 3D arrays, and so on). And if you repeat the chars between subscript strings (j in our case), then that means you want the einsum to happen along those dimensions. Thus, they will be sum-reduced (i.e., that dimension will be gone).
The subscript string after this -> symbol represent the dimensions of our resultant array.
If you leave it empty, then everything will be summed and a scalar value is returned as the result. Else the resultant array will have dimensions according to the subscript string. In our example, it'll be ik. This is intuitive because we know that for the matrix multiplication to work, the number of columns in array A has to match the number of rows in array B which is what is happening here (i.e., we encode this knowledge by repeating the char j in the subscript string)
Here are some more examples illustrating the use/power of np.einsum() in implementing some common tensor or nd-array operations, succinctly.
Inputs
# a vector
In [197]: vec
Out[197]: array([0, 1, 2, 3])
# an array
In [198]: A
Out[198]:
array([[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34],
[41, 42, 43, 44]])
# another array
In [199]: B
Out[199]:
array([[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
1) Matrix multiplication (similar to np.matmul(arr1, arr2))
In [200]: np.einsum("ij, jk -> ik", A, B)
Out[200]:
array([[130, 130, 130, 130],
[230, 230, 230, 230],
[330, 330, 330, 330],
[430, 430, 430, 430]])
2) Extract elements along the main-diagonal (similar to np.diag(arr))
In [202]: np.einsum("ii -> i", A)
Out[202]: array([11, 22, 33, 44])
3) Hadamard product (i.e. element-wise product of two arrays) (similar to arr1 * arr2)
In [203]: np.einsum("ij, ij -> ij", A, B)
Out[203]:
array([[ 11, 12, 13, 14],
[ 42, 44, 46, 48],
[ 93, 96, 99, 102],
[164, 168, 172, 176]])
4) Element-wise squaring (similar to np.square(arr) or arr ** 2)
In [210]: np.einsum("ij, ij -> ij", B, B)
Out[210]:
array([[ 1, 1, 1, 1],
[ 4, 4, 4, 4],
[ 9, 9, 9, 9],
[16, 16, 16, 16]])
5) Trace (i.e. sum of main-diagonal elements) (similar to np.trace(arr))
In [217]: np.einsum("ii -> ", A)
Out[217]: 110
6) Matrix transpose (similar to np.transpose(arr))
In [221]: np.einsum("ij -> ji", A)
Out[221]:
array([[11, 21, 31, 41],
[12, 22, 32, 42],
[13, 23, 33, 43],
[14, 24, 34, 44]])
7) Outer Product (of vectors) (similar to np.outer(vec1, vec2))
In [255]: np.einsum("i, j -> ij", vec, vec)
Out[255]:
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
8) Inner Product (of vectors) (similar to np.inner(vec1, vec2))
In [256]: np.einsum("i, i -> ", vec, vec)
Out[256]: 14
9) Sum along axis 0 (similar to np.sum(arr, axis=0))
In [260]: np.einsum("ij -> j", B)
Out[260]: array([10, 10, 10, 10])
10) Sum along axis 1 (similar to np.sum(arr, axis=1))
In [261]: np.einsum("ij -> i", B)
Out[261]: array([ 4, 8, 12, 16])
11) Batch Matrix Multiplication
In [287]: BM = np.stack((A, B), axis=0)
In [288]: BM
Out[288]:
array([[[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34],
[41, 42, 43, 44]],
[[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
[ 3, 3, 3, 3],
[ 4, 4, 4, 4]]])
In [289]: BM.shape
Out[289]: (2, 4, 4)
# batch matrix multiply using einsum
In [292]: BMM = np.einsum("bij, bjk -> bik", BM, BM)
In [293]: BMM
Out[293]:
array([[[1350, 1400, 1450, 1500],
[2390, 2480, 2570, 2660],
[3430, 3560, 3690, 3820],
[4470, 4640, 4810, 4980]],
[[ 10, 10, 10, 10],
[ 20, 20, 20, 20],
[ 30, 30, 30, 30],
[ 40, 40, 40, 40]]])
In [294]: BMM.shape
Out[294]: (2, 4, 4)
12) Sum along axis 2 (similar to np.sum(arr, axis=2))
In [330]: np.einsum("ijk -> ij", BM)
Out[330]:
array([[ 50, 90, 130, 170],
[ 4, 8, 12, 16]])
13) Sum all the elements in array (similar to np.sum(arr))
In [335]: np.einsum("ijk -> ", BM)
Out[335]: 480
14) Sum over multiple axes (i.e. marginalization)
(similar to np.sum(arr, axis=(axis0, axis1, axis2, axis3, axis4, axis6, axis7)))
# 8D array
In [354]: R = np.random.standard_normal((3,5,4,6,8,2,7,9))
# marginalize out axis 5 (i.e. "n" here)
In [363]: esum = np.einsum("ijklmnop -> n", R)
# marginalize out axis 5 (i.e. sum over rest of the axes)
In [364]: nsum = np.sum(R, axis=(0,1,2,3,4,6,7))
In [365]: np.allclose(esum, nsum)
Out[365]: True
15) Double Dot Products (similar to np.sum(hadamard-product) cf. 3)
In [772]: A
Out[772]:
array([[1, 2, 3],
[4, 2, 2],
[2, 3, 4]])
In [773]: B
Out[773]:
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
In [774]: np.einsum("ij, ij -> ", A, B)
Out[774]: 124
16) 2D and 3D array multiplication
Such a multiplication could be very useful when solving linear system of equations (Ax = b) where you want to verify the result.
# inputs
In [115]: A = np.random.rand(3,3)
In [116]: b = np.random.rand(3, 4, 5)
# solve for x
In [117]: x = np.linalg.solve(A, b.reshape(b.shape[0], -1)).reshape(b.shape)
# 2D and 3D array multiplication :)
In [118]: Ax = np.einsum('ij, jkl', A, x)
# indeed the same!
In [119]: np.allclose(Ax, b)
Out[119]: True
On the contrary, if one has to use np.matmul() for this verification, we have to do couple of reshape operations to achieve the same result like:
# reshape 3D array `x` to 2D, perform matmul
# then reshape the resultant array to 3D
In [123]: Ax_matmul = np.matmul(A, x.reshape(x.shape[0], -1)).reshape(x.shape)
# indeed correct!
In [124]: np.allclose(Ax, Ax_matmul)
Out[124]: True
Bonus: Read more math here : Einstein-Summation and definitely here: Tensor-Notation
When reading einsum equations, I've found it the most helpful to just be able to
mentally boil them down to their imperative versions.
Let's start with the following (imposing) statement:
C = np.einsum('bhwi,bhwj->bij', A, B)
Working through the punctuation first we see that we have two 4-letter comma-separated blobs - bhwi and bhwj, before the arrow,
and a single 3-letter blob bij after it. Therefore, the equation produces a rank-3 tensor result from two rank-4 tensor inputs.
Now, let each letter in each blob be the name of a range variable. The position at which the letter appears in the blob
is the index of the axis that it ranges over in that tensor.
The imperative summation that produces each element of C, therefore, has to start with three nested for loops, one for each index of C.
for b in range(...):
for i in range(...):
for j in range(...):
# the variables b, i and j index C in the order of their appearance in the equation
C[b, i, j] = ...
So, essentially, you have a for loop for every output index of C. We'll leave the ranges undetermined for now.
Next we look at the left-hand side - are there any range variables there that don't appear on the right-hand side? In our case - yes, h and w.
Add an inner nested for loop for every such variable:
for b in range(...):
for i in range(...):
for j in range(...):
C[b, i, j] = 0
for h in range(...):
for w in range(...):
...
Inside the innermost loop we now have all indices defined, so we can write the actual summation and
the translation is complete:
# three nested for-loops that index the elements of C
for b in range(...):
for i in range(...):
for j in range(...):
# prepare to sum
C[b, i, j] = 0
# two nested for-loops for the two indexes that don't appear on the right-hand side
for h in range(...):
for w in range(...):
# Sum! Compare the statement below with the original einsum formula
# 'bhwi,bhwj->bij'
C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]
If you've been able to follow the code thus far, then congratulations! This is all you need to be able to read einsum equations. Notice in particular how the original einsum formula maps to the final summation statement in the snippet above. The for-loops and range bounds are just fluff and that final statement is all you really need to understand what's going on.
For the sake of completeness, let's see how to determine the ranges for each range variable. Well, the range of each variable is simply the length of the dimension(s) which it indexes.
Obviously, if a variable indexes more than one dimension in one or more tensors, then the lengths of each of those dimensions have to be equal.
Here's the code above with the complete ranges:
# C's shape is determined by the shapes of the inputs
# b indexes both A and B, so its range can come from either A.shape or B.shape
# i indexes only A, so its range can only come from A.shape, the same is true for j and B
assert A.shape[0] == B.shape[0]
assert A.shape[1] == B.shape[1]
assert A.shape[2] == B.shape[2]
C = np.zeros((A.shape[0], A.shape[3], B.shape[3]))
for b in range(A.shape[0]): # b indexes both A and B, or B.shape[0], which must be the same
for i in range(A.shape[3]):
for j in range(B.shape[3]):
# h and w can come from either A or B
for h in range(A.shape[1]):
for w in range(A.shape[2]):
C[b, i, j] += A[b, h, w, i] * B[b, h, w, j]
Another view on np.einsum
Most answers here explain by example, I thought I'd give an additional point of view.
You can see einsum as a generalized matrix summation operator. The string given contains the subscripts which are labels representing axes. I like to think of it as your operation definition. The subscripts provide two apparent constraints:
the number of axes for each input array,
axis size equality between inputs.
Let's take the initial example: np.einsum('ij,jk->ki', A, B). Here the constraints 1. translates to A.ndim == 2 and B.ndim == 2, and 2. to A.shape[1] == B.shape[0].
As you will see later down, there are other constraints. For instance:
labels in the output subscript must not appear more than once.
labels in the output subscript must appear in the input subscripts.
When looking at ij,jk->ki, you can think of it as:
which components from the input arrays will contribute to component [k, i] of the output array.
The subscripts contain the exact definition of the operation for each component of the output array.
We will stick with operation ij,jk->ki, and the following definitions of A and B:
>>> A = np.array([[1,4,1,7], [8,1,2,2], [7,4,3,4]])
>>> A.shape
(3, 4)
>>> B = np.array([[2,5], [0,1], [5,7], [9,2]])
>>> B.shape
(4, 2)
The output, Z, will have a shape of (B.shape[1], A.shape[0]) and could naively be constructed in the following way. Starting with a blank array for Z:
Z = np.zeros((B.shape[1], A.shape[0]))
for i in range(A.shape[0]):
for j in range(A.shape[1]):
for k range(B.shape[0]):
Z[k, i] += A[i, j]*B[j, k] # ki <- ij*jk
np.einsum is about accumulating contributions in the output array. Each (A[i,j], B[j,k]) pair is seen contributing to each Z[k, i] component.
You might have noticed, it looks extremely similar to how you would go about computing general matrix multiplications...
Minimal implementation
Here is a minimal implementation of np.einsum in Python. This should help understand what is really going on under the hood.
As we go along I will keep referring to the previous example. Defining inputs as [A, B].
np.einsum can actually take more than two inputs. In the following, we will focus on the general case: n inputs and n input subscripts. The main goal is to find the domain of iteration, i.e. the cartesian product of all our ranges.
We can't rely on manually writing for loops, simply because we don't know how many there will be. The main idea is this: we need to find all unique labels (I will use key and keys to refer to them), find the corresponding array shape, then create ranges for each one, and compute the product of the ranges using itertools.product to get the domain of study.
index
keys
constraints
sizes
ranges
1
'i'
A.shape[0]
3
range(0, 3)
2
'j'
A.shape[1] == B.shape[0]
4
range(0, 4)
0
'k'
B.shape[1]
2
range(0, 2)
The domain of study is the cartesian product: range(0, 2) x range(0, 3) x range(0, 4).
Subscripts processing:
>>> expr = 'ij,jk->ki'
>>> qry_expr, res_expr = expr.split('->')
>>> inputs_expr = qry_expr.split(',')
>>> inputs_expr, res_expr
(['ij', 'jk'], 'ki')
Find the unique keys (labels) in the input subscripts:
>>> keys = set([(key, size) for keys, input in zip(inputs_expr, inputs)
for key, size in list(zip(keys, input.shape))])
{('i', 3), ('j', 4), ('k', 2)}
We should be checking for constraints (as well as in the output subscript)! Using set is a bad idea but it will work for the purpose of this example.
Get the associated sizes (used to initialize the output array) and construct the ranges (used to create our domain of iteration):
>>> sizes = dict(keys)
{'i': 3, 'j': 4, 'k': 2}
>>> ranges = [range(size) for _, size in keys]
[range(0, 2), range(0, 3), range(0, 4)]
We need an list containing the keys (labels):
>>> to_key = sizes.keys()
['k', 'i', 'j']
Compute the cartesian product of the ranges
>>> domain = product(*ranges)
Note: [itertools.product][1] returns an iterator which gets consumed over time.
Initialize the output tensor as:
>>> res = np.zeros([sizes[key] for key in res_expr])
We will be looping over domain:
>>> for indices in domain:
... pass
For each iteration, indices will contain the values on each axis. In our example, that would provide i, j, and k as a tuple: (k, i, j). For each input (A and B) we need to determine which component to fetch. That's A[i, j] and B[j, k], yes! However, we don't have variables i, j, and k, literally speaking.
We can zip indices with to_key to create a mapping between each key (label) and its current value:
>>> vals = dict(zip(to_key, indices))
To get the coordinates for the output array, we use vals and loop over the keys: [vals[key] for key in res_expr]. However, to use these to index the output array, we need to wrap it with tuple and zip to separate the indices along each axis:
>>> res_ind = tuple(zip([vals[key] for key in res_expr]))
Same for the input indices (although there can be several):
>>> inputs_ind = [tuple(zip([vals[key] for key in expr])) for expr in inputs_expr]
We will use a itertools.reduce to compute the product of all contributing components:
>>> def reduce_mult(L):
... return reduce(lambda x, y: x*y, L)
Overall the loop over the domain looks like:
>>> for indices in domain:
... vals = {k: v for v, k in zip(indices, to_key)}
... res_ind = tuple(zip([vals[key] for key in res_expr]))
... inputs_ind = [tuple(zip([vals[key] for key in expr]))
... for expr in inputs_expr]
...
... res[res_ind] += reduce_mult([M[i] for M, i in zip(inputs, inputs_ind)])
>>> res
array([[70., 44., 65.],
[30., 59., 68.]])
That's pretty close to what np.einsum('ij,jk->ki', A, B) returns!
I found NumPy: The tricks of the trade (Part II) instructive
We use -> to indicate the order of the output array. So think of 'ij, i->j' as having left hand side (LHS) and right hand side (RHS). Any repetition of labels on the LHS computes the product element wise and then sums over. By changing the label on the RHS (output) side, we can define the axis in which we want to proceed with respect to the input array, i.e. summation along axis 0, 1 and so on.
import numpy as np
>>> a
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> b
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> d = np.einsum('ij, jk->ki', a, b)
Notice there are three axes, i, j, k, and that j is repeated (on the left-hand-side). i,j represent rows and columns for a. j,k for b.
In order to calculate the product and align the j axis we need to add an axis to a. (b will be broadcast along(?) the first axis)
a[i, j, k]
b[j, k]
>>> c = a[:,:,np.newaxis] * b
>>> c
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 0, 2, 4],
[ 6, 8, 10],
[12, 14, 16]],
[[ 0, 3, 6],
[ 9, 12, 15],
[18, 21, 24]]])
j is absent from the right-hand-side so we sum over j which is the second axis of the 3x3x3 array
>>> c = c.sum(1)
>>> c
array([[ 9, 12, 15],
[18, 24, 30],
[27, 36, 45]])
Finally, the indices are (alphabetically) reversed on the right-hand-side so we transpose.
>>> c.T
array([[ 9, 18, 27],
[12, 24, 36],
[15, 30, 45]])
>>> np.einsum('ij, jk->ki', a, b)
array([[ 9, 18, 27],
[12, 24, 36],
[15, 30, 45]])
>>>
Lets make 2 arrays, with different, but compatible dimensions to highlight their interplay
In [43]: A=np.arange(6).reshape(2,3)
Out[43]:
array([[0, 1, 2],
[3, 4, 5]])
In [44]: B=np.arange(12).reshape(3,4)
Out[44]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Your calculation, takes a 'dot' (sum of products) of a (2,3) with a (3,4) to produce a (4,2) array. i is the 1st dim of A, the last of C; k the last of B, 1st of C. j is 'consumed' by the summation.
In [45]: C=np.einsum('ij,jk->ki',A,B)
Out[45]:
array([[20, 56],
[23, 68],
[26, 80],
[29, 92]])
This is the same as np.dot(A,B).T - it's the final output that's transposed.
To see more of what happens to j, change the C subscripts to ijk:
In [46]: np.einsum('ij,jk->ijk',A,B)
Out[46]:
array([[[ 0, 0, 0, 0],
[ 4, 5, 6, 7],
[16, 18, 20, 22]],
[[ 0, 3, 6, 9],
[16, 20, 24, 28],
[40, 45, 50, 55]]])
This can also be produced with:
A[:,:,None]*B[None,:,:]
That is, add a k dimension to the end of A, and an i to the front of B, resulting in a (2,3,4) array.
0 + 4 + 16 = 20, 9 + 28 + 55 = 92, etc; Sum on j and transpose to get the earlier result:
np.sum(A[:,:,None] * B[None,:,:], axis=1).T
# C[k,i] = sum(j) A[i,j (,k) ] * B[(i,) j,k]
Once get familiar with the dummy index (the common or repeating index) and the summation along the dummy index in the Einstein Summation (einsum), the output -> shaping is easy. Hence focus on:
Dummy index, the common index j in np.einsum("ij,jk->ki", a, b)
Summation along the dummy index j
Dummy index
For einsum("...", a, b), element wise multiplication always happens in-between matrices a and b regardless there are common indices or not. We can have einsum('xy,wz', a, b) which has no common index in the subscripts 'xy,wz'.
If there is a common index, as j in "ij,jk->ki", then it is called a dummy index in the Einstein Summation.
Einstein Summation
An index that is summed over is a summation index, in this case "i". It is also called a dummy index since any symbol can replace "i" without changing the meaning of the expression provided that it does not collide with index symbols in the same term.
Summation along the dummy index
For np.einsum("ij,j", a, b) of the green rectangle in the diagram, j is the dummy index. The element-wise multiplication a[i][j] * b[j] is summed up along the j axis as Σ ( a[i][j] * b[j] ).
It is a dot product np.inner(a[i], b) for each i. Here being specific with np.inner() and avoiding np.dot as it is not strictly a mathematical dot product operation.
Einstein Summation Convention: an Introduction
The dummy index can appear anywhere as long as the rules (please see the youtube for details) are met.
For the dummy index i in np.einsum(“ik,il", a, b), it is a row index of the matrices a and b, hence a column from a and that from b are extracted to generate the dot products.
Output form
Because the summation occurs along the dummy index, the dummy index disappears in the result matrix, hence i from “ik,il" is dropped and form the shape (k,l). We can tell np.einsum("... -> <shape>") to specify the output form by the output subscript labels with -> identifier.
See the explicit mode in numpy.einsum for details.
In explicit mode the output can be directly controlled by specifying
output subscript labels. This requires the identifier ‘->’ as well as
the list of output subscript labels. This feature increases the
flexibility of the function since summing can be disabled or forced
when required. The call np.einsum('i->', a) is like np.sum(a, axis=-1), and np.einsum('ii->i', a) is like np.diag(a). The difference
is that einsum does not allow broadcasting by default. Additionally
np.einsum('ij,jh->ih', a, b) directly specifies the order of the
output subscript labels and therefore returns matrix multiplication,
unlike the example above in implicit mode.
Without a dummy index
An example for having no dummy index in the einsum.
A term (subscript Indices, e.g. "ij") selects an element in each array.
Each left-hand side element is applied on the element on the right-hand side for element-wise multiplication (hence multiplication always happens).
a has shape (2,3) each element of which is applied to b of shape (2,2). Hence it creates a matrix of shape (2,3,2,2) without no summation as (i,j), (k.l) are all free indices.
# --------------------------------------------------------------------------------
# For np.einsum("ij,kl", a, b)
# 1-1: Term "ij" or (i,j), two free indices, selects selects an element a[i][j].
# 1-2: Term "kl" or (k,l), two free indices, selects selects an element b[k][l].
# 2: Each a[i][j] is applied on b[k][l] for element-wise multiplication a[i][j] * b[k,l]
# --------------------------------------------------------------------------------
# for (i,j) in a:
# for(k,l) in b:
# a[i][j] * b[k][l]
np.einsum("ij,kl", a, b)
array([[[[ 0, 0],
[ 0, 0]],
[[10, 11],
[12, 13]],
[[20, 22],
[24, 26]]],
[[[30, 33],
[36, 39]],
[[40, 44],
[48, 52]],
[[50, 55],
[60, 65]]]])
Examples
dot products from matrix A rows and matrix B columns
A = np.matrix('0 1 2; 3 4 5')
B = np.matrix('0 -3; -1 -4; -2 -5');
np.einsum('ij,ji->i', A, B)
# Same with
np.diagonal(np.matmul(A,B))
(A*B).diagonal()
---
[ -5 -50]
[ -5 -50]
[[ -5 -50]]
I think the simplest example is in tensorflow docs
There are four steps to convert your equation to einsum notation. Lets take this equation as an example C[i,k] = sum_j A[i,j] * B[j,k]
First we drop the variable names. We get ik = sum_j ij * jk
We drop the sum_j term as it is implicit. We get ik = ij * jk
We replace * with ,. We get ik = ij, jk
The output is on the RHS and is separated with -> sign. We get ij, jk -> ik
The einsum interpreter just runs these 4 steps in reverse. All indices missing in the result are summed over.
Here are some more examples from the docs
# Matrix multiplication
einsum('ij,jk->ik', m0, m1) # output[i,k] = sum_j m0[i,j] * m1[j, k]
# Dot product
einsum('i,i->', u, v) # output = sum_i u[i]*v[i]
# Outer product
einsum('i,j->ij', u, v) # output[i,j] = u[i]*v[j]
# Transpose
einsum('ij->ji', m) # output[j,i] = m[i,j]
# Trace
einsum('ii', m) # output[j,i] = trace(m) = sum_i m[i, i]
# Batch matrix multiplication
einsum('aij,ajk->aik', s, t) # out[a,i,k] = sum_j s[a,i,j] * t[a, j, k]
Related
So I have a tensor that is M x B x C, where M is the number of models, B is the batch and C is the classes and each cell is the probability of a class for a given model and batch. Then I have a tensor of the correct answers which is just a 1D of size B we'll call "t". How do I use the 1D of size B to just return a M x B x 1, where the returned tensor is just the value at the correct class? Say the M x B x C tensor is called "blah" I've tried
blah[:, :, C]
for i in range(M):
blah[i, :, C]
blah[:, C, :]
The top 2 just return the values of indexes t in the 3rd dimension of every slice. The last one returns the values at t indexes in the 2nd dimension. How do I do this?
We can get the desired result by combining advanced and basic indexing
import torch
# shape [2, 3, 4]
blah = torch.tensor([
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
# shape [3]
t = torch.tensor([2, 1, 0])
b = torch.arange(blah.shape[1]).type_as(t)
# shape [2, 3, 1]
result = blah[:, b, t].unsqueeze(-1)
which results in
>>> result
tensor([[[ 2],
[ 5],
[ 8]],
[[14],
[17],
[20]]])
Here is one way to do it:
Suppose a is your M x B x C shaped tensor. I am taking some representative values below,
>>> M = 3
>>> B = 5
>>> C = 4
>>> a = torch.rand(M, B, C)
>>> a
tensor([[[0.6222, 0.6703, 0.0057, 0.3210],
[0.6251, 0.3286, 0.8451, 0.5978],
[0.0808, 0.8408, 0.3795, 0.4872],
[0.8589, 0.8891, 0.8033, 0.8906],
[0.5620, 0.5275, 0.4272, 0.2286]],
[[0.2419, 0.0179, 0.2052, 0.6859],
[0.1868, 0.7766, 0.3648, 0.9697],
[0.6750, 0.4715, 0.9377, 0.3220],
[0.0537, 0.1719, 0.0013, 0.0537],
[0.2681, 0.7514, 0.6523, 0.7703]],
[[0.5285, 0.5360, 0.7949, 0.6210],
[0.3066, 0.1138, 0.6412, 0.4724],
[0.3599, 0.9624, 0.0266, 0.1455],
[0.7474, 0.2999, 0.7476, 0.2889],
[0.1779, 0.3515, 0.8900, 0.2301]]])
Let's say the 1D class tensor is t, which gives the true class of each example in the batch. So it is a 1D tensor of shape (B, ) having class labels in the range {0, 1, 2, ..., C-1}.
>>> t = torch.randint(C, size = (B, ))
>>> t
tensor([3, 2, 1, 1, 0])
So basically you want to select the indices corresponding to t from the innermost dimension of a. This can be achieved using fancy indexing and broadcasting combined as follows:
>>> i = torch.arange(M).reshape(M, 1, 1)
>>> j = torch.arange(B).reshape(1, B, 1)
>>> k = t.reshape(1, B, 1)
Note that once you index anything by (i, j, k), they are going to expand and take the shape (M, B, 1) which is the desired output shape.
Now just indexing a by i, j and k gives:
>>> a[i, j, k]
tensor([[[0.3210],
[0.8451],
[0.8408],
[0.8891],
[0.5620]],
[[0.6859],
[0.3648],
[0.4715],
[0.1719],
[0.2681]],
[[0.6210],
[0.6412],
[0.9624],
[0.2999],
[0.1779]]])
So essentially, if you generate the index arrays conveying your access pattern beforehand, you can directly use them to extract some slice of the tensor.
You simply need to pass:
your index as the third slice
range(B) as the second slice
(i.e. which element in the 2nd dim each 3rd dim index corresponds to)
blah[:,range(B),t]
I want to sum up all elements (W * H) of 3D matrix and store it in 1D matrix with length=depth(third dimension of input matrix)
To make myself clear:
Input dimension = 1D in the form of (W * H * D).
Required output = 1D again with length=D
let's consider below 3D Matrix : 2 x 3 x 2.
Layer 1 Layer 2
[1, 2, 3 [7, 8, 9
4, 5, 6] 10, 11, 12]
output is 1D : [21, 57]
I am new to python and wrote like this:
def test(w, h, c, image_inp):
output = [image_inp[j * w + k] for i in enumerate(image_inp)
for j in range(0,h)
for k in range(0,w)
#image_inp[j * w + k] comment
]
printout(output)
I know this will copy the input array as it is to output array.
also output array length is not equal to Depth.
Some one please help me in getting this right
def test(w, h, c, image_inp):
output = [hwsum for i in enumerate(image_inp)
hwsum += wsum for j in range(0,h)
wsum += image_inp[j*w + k] for k in range(0,w)
#image_inp[j * w + k]
]
print "Calling outprint"
printout(output)
Note: I do not want to use numpy(with this it is working) or any math libraries.
reason being I am writing test code in python to evaluate a working on language.
EDIT:
input matrix will be entering the test function as 1D with w, h, c as arguments,
so it takes the form as:
[1,2,3,4,5,6,7,8,9,10,12],
with w, h, c have to compute considering input1D as 3D matrix.
thanks
Numpy is very suitable for slicing and manipulating single and multiple dimensional data. It is fast, easy to use and very "pythonic".
Following your example, you can just do:
>>> import numpy
>>> img3d=numpy.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,12,12]]])
>>> img3d.shape
(2, 2, 3)
You can see here that img3d has 2 layers, 2 rows and 3 columns. You can just slice using indexing like this:
>>> img3d[0,:,:]
array([[1, 2, 3],
[4, 5, 6]])
To go from 3D to 1D, just use numpy.flatten():
>>> f=img3d.flatten()
>>> f
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 12])
And reversed, use numpy.reshape():
>>> f.reshape((2,2,3))
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 12, 12]]])
Now add just jusing numpy.sum, giving the dimensions you want to add (in your case, dimensions 1 and 2 (dimensions being 0-indexed):
>>> numpy.sum(img3d,(1,2))
array([21, 58])
Just to summarize in a oneliner, you can do (variable names from your question):
>>> numpy.sum(numpy.array(image_inp).reshape(w,h,c),(1,2))
From the numpy manual on numpy.sum:
numpy.sum
numpy.sum(a, axis=None, dtype=None, out=None, keepdims=numpy._globals._NoValue>)
Sum of array elements over a given axis.
Parameters:
a : array_like Elements to sum.
axis : None or int or
tuple of ints, optional Axis or axes along which a sum is performed.
The default, axis=None, will sum all of the elements of the input
array. If axis is negative it counts from the last to the first axis.
New in version 1.7.0.: If axis is a tuple of ints, a sum is performed
on all of the axes specified in the tuple instead of a single axis or
all the axes as before.
If your matrix is set as your post implies with your "3D" matrix as an array of arrays:
M = [ [1, 2, 3,
4, 5, 6],
[ 7, 8, 9,
10,11,12],
]
array_of_sums = []
for pseudo_2D_matrix in M:
array_of_sums.append(sum(pseudo_2D_matrix))
If your 3D matrix, as a real three dimensional object, is set up as:
M = [
[ [ 1, 2, 3],
[ 4, 5, 6]
],
[ [ 7, 8, 9],
[10,11,12],
]
You could create a 1D array of sums by doing the following:
array_of_sums = []
for 2D_matrix in M:
s = 0
for row in 2D_matrix:
s += sum(row)
array_of_sums.append(s)
It's a bit unclear how your data are formatted, but hopefully you get the idea from these two examples.
EDIT:
In light of clarification on input you could easily accomplish this:
If dimensions w,h,c are given as dimensional breakout of the array [1,2,3,4,5,6,7,8,9,10,12], then you simply need to boundary off those regions and sum based on that:
input_array = [1,2,3,4,5,6,7,8,9,10,11,12]
w,h,c = 2,3,2
array_of_sums = []
i = 0
while i < w:
array_of_sums.append(sum(input_array[i*h*c:(i+1)*h*c]))
i += 1
as a simplified method:
def sum_2D_slices(w,h,c,matrix_3D):
return [sum(matrix_3D[i*h*c:(i+1)*h*c]) for i in range(w)]
Lets say, I have bunch of matrices As and vectors bs.
As = array([[[1, 7], [3, 8]],
[[2, 1], [5, 9]],
[[7, 2], [8, 3]]])
bs = array([[8, 0], [8, 8], [7, 3]])
When I do np.inner(As, bs), I get:
array([[[ 8, 64, 28], [ 24, 88, 45]],
[[ 16, 24, 17], [ 40, 112, 62]],
[[ 56, 72, 55], [ 64, 88, 65]]])
But I do not need all inner products. What I want is, to calculate each matrix with each vector once.
I can do something like this:
np.array(map(lambda (a, b): np.inner(a, b), zip(As, bs)))
Then I get the expected matrix:
array([[ 8, 24], [ 24, 112], [ 55, 65]])
Now I do not want to use zip, map etc. because I need this operation > 10**6 time (for image processing, exactly for GMM).
Is there any way to use numpy, scipy etc. that can do this for me? (fast and efficent)
You can use np.einsum -
np.einsum('ijk,ik->ij',As, bs)
Explanation
With np.array(map(lambda (a, b): np.inner(a, b), zip(As, bs))), we are selecting the first element off As as a and off bs as b and doing inner product. Thus, we are doing :
In [19]: np.inner(As[0],bs[0])
Out[19]: array([ 8, 24])
In [20]: np.inner(As[1],bs[1])
Out[20]: array([ 24, 112])
In [21]: np.inner(As[2],bs[2])
Out[21]: array([55, 65])
Think of it as a loop, we iterate 3 times, corresponding to the length of first axis of As, which is same as for bs. Thus, looking at the lambda expression, at each iteration, we have a = As[0] & b = bs[0], a = As[1] & b = bs[1] and so on.
As and bs being 3D and 2D, let's represent them as iterators imagining the inner-product in our minds. Thus, at iteration, we would have a : j,k and b : m. With that inner product between a and b, we would lose the second axis of a and first of b. Thus, we need to align k with m. Thus, we could assume b to have the same iterator as k. Referencing back from a to As and b to bs, in essence, we would lose the third axis from As and second from bs with the inner product/sum-reduction. That iterating along the first axis for As and bs signifies that we need to keep those aligned under these sum-reductions.
Let's summarize.
We have the iterators involved for the input arrays like so -
As : i x j x k
bs : i x k
Steps involved in the intended operation :
Keep the first axis of As aligned with first of bs.
Lose the third axis of As with sum-reduction against second of bs.
Thus, we would be left with the iterators i,j for the output.
np.einsum is a pretty efficient implementation and is specially handy when we need to keep one or more of the axes of the input arrays aligned against each other.
For more info on einsum, I would suggest following the docs link supplied earlier and also this Q&A could be helpful!
I am attempting to generalize some Python code to operate on arrays of arbitrary dimension. The operations are applied to each vector in the array. So for a 1D array, there is simply one operation, for a 2-D array it would be both row and column-wise (linearly, so order does not matter). For example, a 1D array (a) is simple:
b = operation(a)
where 'operation' is expecting a 1D array. For a 2D array, the operation might proceed as
for ii in range(0,a.shape[0]):
b[ii,:] = operation(a[ii,:])
for jj in range(0,b.shape[1]):
c[:,ii] = operation(b[:,ii])
I would like to make this general where I do not need to know the dimension of the array beforehand, and not have a large set of if/elif statements for each possible dimension.
Solutions that are general for 1 or 2 dimensions are ok, though a completely general solution would be preferred. In reality, I do not imagine needing this for any dimension higher than 2, but if I can see a general example I will learn something!
Extra information:
I have a matlab code that uses cells to do something similar, but I do not fully understand how it works. In this example, each vector is rearranged (basically the same function as fftshift in numpy.fft). Not sure if this helps, but it operates on an array of arbitrary dimension.
function aout=foldfft(ain)
nd = ndims(ain);
for k = 1:nd
nx = size(ain,k);
kx = floor(nx/2);
idx{k} = [kx:nx 1:kx-1];
end
aout = ain(idx{:});
In Octave, your MATLAB code does:
octave:19> size(ain)
ans =
2 3 4
octave:20> idx
idx =
{
[1,1] =
1 2
[1,2] =
1 2 3
[1,3] =
2 3 4 1
}
and then it uses the idx cell array to index ain. With these dimensions it 'rolls' the size 4 dimension.
For 5 and 6 the index lists would be:
2 3 4 5 1
3 4 5 6 1 2
The equivalent in numpy is:
In [161]: ain=np.arange(2*3*4).reshape(2,3,4)
In [162]: idx=np.ix_([0,1],[0,1,2],[1,2,3,0])
In [163]: idx
Out[163]:
(array([[[0]],
[[1]]]), array([[[0],
[1],
[2]]]), array([[[1, 2, 3, 0]]]))
In [164]: ain[idx]
Out[164]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
Besides the 0 based indexing, I used np.ix_ to reshape the indexes. MATLAB and numpy use different syntax to index blocks of values.
The next step is to construct [0,1],[0,1,2],[1,2,3,0] with code, a straight forward translation.
I can use np.r_ as a short cut for turning 2 slices into an index array:
In [201]: idx=[]
In [202]: for nx in ain.shape:
kx = int(np.floor(nx/2.))
kx = kx-1;
idx.append(np.r_[kx:nx, 0:kx])
.....:
In [203]: idx
Out[203]: [array([0, 1]), array([0, 1, 2]), array([1, 2, 3, 0])]
and pass this through np.ix_ to make the appropriate index tuple:
In [204]: ain[np.ix_(*idx)]
Out[204]:
array([[[ 1, 2, 3, 0],
[ 5, 6, 7, 4],
[ 9, 10, 11, 8]],
[[13, 14, 15, 12],
[17, 18, 19, 16],
[21, 22, 23, 20]]])
In this case, where 2 dimensions don't roll anything, slice(None) could replace those:
In [210]: idx=(slice(None),slice(None),[1,2,3,0])
In [211]: ain[idx]
======================
np.roll does:
indexes = concatenate((arange(n - shift, n), arange(n - shift)))
res = a.take(indexes, axis)
np.apply_along_axis is another function that constructs an index array (and turns it into a tuple for indexing).
If you are looking for a programmatic way to index the k-th dimension an n-dimensional array, then numpy.take might help you.
An implementation of foldfft is given below as an example:
In[1]:
import numpy as np
def foldfft(ain):
result = ain
nd = len(ain.shape)
for k in range(nd):
nx = ain.shape[k]
kx = (nx+1)//2
shifted_index = list(range(kx,nx)) + list(range(kx))
result = np.take(result, shifted_index, k)
return result
a = np.indices([3,3])
print("Shape of a = ", a.shape)
print("\nStarting array:\n\n", a)
print("\nFolded array:\n\n", foldfft(a))
Out[1]:
Shape of a = (2, 3, 3)
Starting array:
[[[0 0 0]
[1 1 1]
[2 2 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
Folded array:
[[[2 0 1]
[2 0 1]
[2 0 1]]
[[2 2 2]
[0 0 0]
[1 1 1]]]
You could use numpy.ndarray.flat, which allows you to linearly iterate over a n dimensional numpy array. Your code should then look something like this:
b = np.asarray(x)
for i in range(len(x.flat)):
b.flat[i] = operation(x.flat[i])
The folks above provided multiple appropriate solutions. For completeness, here is my final solution. In this toy example for the case of 3 dimensions, the function 'ops' replaces the first and last element of a vector with 1.
import numpy as np
def ops(s):
s[0]=1
s[-1]=1
return s
a = np.random.rand(4,4,3)
print '------'
print 'Array a'
print a
print '------'
for ii in np.arange(a.ndim):
a = np.apply_along_axis(ops,ii,a)
print '------'
print ' Axis',str(ii)
print a
print '------'
print ' '
The resulting 3D array has a 1 in every element on the 'border' with the numbers in the middle of the array unchanged. This is of course a toy example; however ops could be any arbitrary function that operates on a 1D vector.
Flattening the vector will also work; I chose not to pursue that simply because the book-keeping is more difficult and apply_along_axis is the simplest approach.
apply_along_axis reference page
I'm new to python and numpy in general. I read several tutorials and still so confused between the differences in dim, ranks, shape, aixes and dimensions. My mind seems to be stuck at the matrix representation. So if you say that A is a matrix that looks like this:
A =
1 2 3
4 5 6
then all I can think of is a 2x3 matrix (two rows and three columns). Here I understand that the shape is 2x3. But I really I am unable to go out side the thinking of a 2D matrices. I don't understand for example the dot() documentation when it says "For N dimensions it is a sum product over the last axis of a and the second-to-last of b". I'm so confused and unable to understand this. I don't understand like if V is a N:1 vector and M is N:N matrix, how dot(V,M) or dot(M,V) work and the difference between them.
Can anyone then please explain to me what is a N dimensional array, what's a shape, what's an axis and how does it relate to the documentation of the dot() function? It would be great if the explanation visualizes the ideas.
Dimensionality of NumPy arrays must be understood in the data structures sense, not the mathematical sense, i.e. it's the number of scalar indices you need to obtain a scalar value.(*)
E.g., this is a 3-d array:
>>> X = np.arange(24).reshape(2, 3, 4)
>>> X
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Indexing once gives a 2-d array (matrix):
>>> X[0]
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Indexing twice gives a 1-d array (vector), and indexing three times gives a scalar.
The rank of X is its number of dimensions:
>>> X.ndim
3
>>> np.rank(X)
3
Axis is roughly synonymous with dimension; it's used in broadcasting operations:
>>> X.sum(axis=0)
array([[12, 14, 16, 18],
[20, 22, 24, 26],
[28, 30, 32, 34]])
>>> X.sum(axis=1)
array([[12, 15, 18, 21],
[48, 51, 54, 57]])
>>> X.sum(axis=2)
array([[ 6, 22, 38],
[54, 70, 86]])
To be honest, I find this definition of "rank" confusing since it matches neither the name of the attribute ndim nor the linear algebra definition of rank.
Now regarding np.dot, what you have to understand is that there are three ways to represent a vector in NumPy: 1-d array, a column vector of shape (n, 1) or a row vector of shape (1, n). (Actually, there are more ways, e.g. as a (1, n, 1)-shaped array, but these are quite rare.) np.dot performs vector multiplication when both arguments are 1-d, matrix-vector multiplication when one argument is 1-d and the other is 2-d, and otherwise it performs a (generalized) matrix multiplication:
>>> A = np.random.randn(2, 3)
>>> v1d = np.random.randn(2)
>>> np.dot(v1d, A)
array([-0.29269547, -0.52215117, 0.478753 ])
>>> vrow = np.atleast_2d(v1d)
>>> np.dot(vrow, A)
array([[-0.29269547, -0.52215117, 0.478753 ]])
>>> vcol = vrow.T
>>> np.dot(vcol, A)
Traceback (most recent call last):
File "<ipython-input-36-98949c6de990>", line 1, in <module>
np.dot(vcol, A)
ValueError: matrices are not aligned
The rule "sum product over the last axis of a and the second-to-last of b" matches and generalizes the common definition of matrix multiplication.
(*) Arrays of dtype=object are a bit of an exception, since they treat any Python object as a scalar.
np.dot is a generalization of matrix multiplication.
In regular matrix multiplication, an (N,M)-shape matrix multiplied with a (M,P)-shaped matrix results in a (N,P)-shaped matrix. The resultant shape can be thought of as being formed by squashing the two shapes together ((N,M,M,P)) and then removing the middle numbers, M (to produce (N,P)). This is the property that np.dot preserves while generalizing to arrays of higher dimension.
When the docs say,
"For N dimensions it is a sum product over the last axis of a and the
second-to-last of b".
it is speaking to this point. An array of shape (u,v,M) dotted with an array of shape (w,x,y,M,z) would result in an array of shape (u,v,w,x,y,z).
Let's see how this rule looks when applied to
In [25]: V = np.arange(2); V
Out[25]: array([0, 1])
In [26]: M = np.arange(4).reshape(2,2); M
Out[26]:
array([[0, 1],
[2, 3]])
First, the easy part:
In [27]: np.dot(M, V)
Out[27]: array([1, 3])
There is no surprise here; this is just matrix-vector multiplication.
Now consider
In [28]: np.dot(V, M)
Out[28]: array([2, 3])
Look at the shape of V and M:
In [29]: V.shape
Out[29]: (2,)
In [30]: M.shape
Out[30]: (2, 2)
So np.dot(V,M) is like matrix multiplication of a (2,)-shaped matrix with a (2,2)-shaped matrix, which should result in a (2,)-shaped matrix.
The last (and only) axis of V and the second-to-last axis of M (aka the first axis of M) are multiplied and summed over, leaving only the last axis of M.
If you want to visualize this: np.dot(V, M) looks as though V has 1 row and 2 columns:
[[0, 1]] * [[0, 1],
[2, 3]]
and so, when V is multiplied by M, np.dot(V, M) equals
[[0*0 + 1*2], [2,
[0*1 + 1*3]] = 3]
However, I don't really recommend trying to visualize NumPy arrays this way -- at least I never do. I focus almost exclusively on the shape.
(2,) * (2,2)
\ /
\ /
(2,)
You just think about the "middle" axes being dotted, and disappearing from the resultant shape.
np.sum(arr, axis=0) tells NumPy to sum the elements in arr eliminating the 0th axis. If arr is 2-dimensional, the 0th axis are the rows. So for example, if arr looks like this:
In [1]: arr = np.arange(6).reshape(2,3); arr
Out[1]:
array([[0, 1, 2],
[3, 4, 5]])
then np.sum(arr, axis=0) will sum along the columns, thus eliminating the 0th axis (i.e. the rows).
In [2]: np.sum(arr, axis=0)
Out[2]: array([3, 5, 7])
The 3 is the result of 0+3, the 5 equals 1+4, the 7 equals 2+5.
Notice arr had shape (2,3), and after summing, the 0th axis is removed so the result is of shape (3,). The 0th axis had length 2, and each sum is composed of adding those 2 elements. The shape (2,3) "becomes" (3,). You can know the resultant shape in advance! This can help guide your thinking.
To test your understanding, consider np.sum(arr, axis=1). Now the 1-axis is removed. So the resultant shape will be (2,), and element in the result will be the sum of 3 values.
In [3]: np.sum(arr, axis=1)
Out[3]: array([ 3, 12])
The 3 equals 0+1+2, and the 12 equals 3+4+5.
So we see that summing an axis eliminates that axis from the result. This has bearing on np.dot, since the calculation performed by np.dot is a sum of products. Since np.dot performs a summing operation along certain axes, that axis is removed from the result. That is why applying np.dot to arrays of shape (2,) and (2,2) results in an array of shape (2,). The first 2 in both arrays is summed over, eliminating both, leaving only the second 2 in the second array.
In your case,
A is a 2D array, namely a matrix, with its shape being (2, 3). From docstring of numpy.matrix:
A matrix is a specialized 2-D array that retains its 2-D nature through operations.
numpy.rank return the number of dimensions of an array, which is quite different from the concept of rank in linear algebra, e.g. A is an array of dimension/rank 2.
np.dot(V, M), or V.dot(M) multiplies matrix V with M. Note that numpy.dot do the multiplication as far as possible. If V is N:1 and M is N:N, V.dot(M) would raise an ValueError.
e.g.:
In [125]: a
Out[125]:
array([[1],
[2]])
In [126]: b
Out[126]:
array([[2, 3],
[1, 2]])
In [127]: a.dot(b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-127-9a1f5761fa9d> in <module>()
----> 1 a.dot(b)
ValueError: objects are not aligned
EDIT:
I don't understand the difference between Shape of (N,) and (N,1) and it relates to the dot() documentation.
V of shape (N,) implies an 1D array of length N, whilst shape (N, 1) implies a 2D array with N rows, 1 column:
In [2]: V = np.arange(2)
In [3]: V.shape
Out[3]: (2,)
In [4]: Q = V[:, np.newaxis]
In [5]: Q.shape
Out[5]: (2, 1)
In [6]: Q
Out[6]:
array([[0],
[1]])
As the docstring of np.dot says:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D
arrays to inner product of vectors (without complex conjugation).
It also performs vector-matrix multiplication if one of the parameters is a vector. Say V.shape==(2,); M.shape==(2,2):
In [17]: V
Out[17]: array([0, 1])
In [18]: M
Out[18]:
array([[2, 3],
[4, 5]])
In [19]: np.dot(V, M) #treats V as a 1*N 2D array
Out[19]: array([4, 5]) #note the result is a 1D array of shape (2,), not (1, 2)
In [20]: np.dot(M, V) #treats V as a N*1 2D array
Out[20]: array([3, 5]) #result is still a 1D array of shape (2,), not (2, 1)
In [21]: Q #a 2D array of shape (2, 1)
Out[21]:
array([[0],
[1]])
In [22]: np.dot(M, Q) #matrix multiplication
Out[22]:
array([[3], #gets a result of shape (2, 1)
[5]])