Unexpected result with numpy.dot - python

I have two matrices:
>>> a.shape
(100, 3, 1)
>>> b.shape
(100, 3, 3)
I'd like to perform a dot product such that my end result is (100, 3, 1). However, currently I receive:
>>> c = np.dot(b, a)
>>> c.shape
(100, 3, 100, 1)
Could somebody explain what is happening? I'm reading the docs and can't figure it out.
Edit:
So per the docs (I overlooked it):
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
And that gives the desired result, but I'm still curious, what is taking place here? What rule of the np.dot function is being applied to yield a result of (100, 3, 100, 1)?

This is how dot works in your case:
dot(b, a)[i,j,k,m] = sum(b[i,j,:] * a[k,:,m])
Your output shape is exactly how the docs specify it:
(b.shape[0], b.shape[1], a.shape[0], a.shape[2])
If it's not what you expected, you might be looking for another matrix multiplication.
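You can check that rule directly; a quick sketch using the shapes from the question:
import numpy as np

a = np.random.rand(100, 3, 1)
b = np.random.rand(100, 3, 3)

# dot combines the leading axes of both arrays instead of pairing them
c = np.dot(b, a)
print(c.shape)                                                        # (100, 3, 100, 1)
print(c.shape == (b.shape[0], b.shape[1], a.shape[0], a.shape[2]))    # True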

dot will return all the possible products of the matrices stored in the last two dimensions of your arrays. Use matmul, a.k.a. the @ operator, to broadcast the leading dimensions instead of combining them:
np.matmul(b, a)
Or
b @ a
The Swiss Army knife of sum-products is einsum, so you can use that too:
np.einsum('aij,ajk->aik', b, a)
Or
np.einsum('ajk,aij->aik', a, b)
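Both spellings give the same stacked result; a quick sanity check (sketch):
import numpy as np

a = np.random.rand(100, 3, 1)
b = np.random.rand(100, 3, 3)

c1 = b @ a                              # same as np.matmul(b, a)
c2 = np.einsum('aij,ajk->aik', b, a)
print(c1.shape)                         # (100, 3, 1)
print(np.allclose(c1, c2))              # True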

No broadcasting for dot product

I tried this simple example in Python.
import numpy as np
a = np.array([1,2,3,4])
b = np.array([20])
a + b # broadcasting takes place!
np.dot(a,b) # no broadcasting here?!
I thought np.dot also uses broadcasting, but it seems it doesn't.
I wonder why, i.e., what is the philosophy behind this behavior?
Which operations in NumPy use broadcasting and which do not?
Is there another version of the dot function that actually uses broadcasting?
The reason it doesn't broadcast is because the docs say so. However, that's not a very good, or satisfying, answer to the question. So perhaps I can shed some light on why.
The point of broadcasting is to take operators and apply them pointwise to different shapes of data without the programmer having to explicitly write for loops all the time.
print(a + b)
is way shorter and just as readable as
my_new_list = []
for a_elem, b_elem in zip(a, b):
    my_new_list.append(a_elem + b_elem)
print(my_new_list)
The reason it works for +, and -, and all of those operators is, and I'm going to borrow some terminology from J here, that they're rank 0. What that means is that, in the absence of any broadcasting rules, + is intended to operate on scalars, i.e. ordinary numbers. The original point of the + operator is to act on numbers, so Numpy comes along and extends that rank 0 behavior to higher ranks, allowing it to work on vectors (rank 1) and matrices (rank 2) and tensors (rank 3 and beyond). Again, I'm borrowing J terminology here, but the concept is the same in Numpy.
Now, the fundamental difference is that dot doesn't work that way. The dot product function, in Numpy at least, is already special-cased to do different things for different rank arguments. For rank 1 vectors, it performs an inner product, what we usually call a "dot product" in a beginner calculus course. For rank 2 vectors, it acts like matrix multiplication. For higher-rank vectors, it's an appropriate generalization of matrix multiplication that you can read about in the docs linked above. But the point is that dot already works for all ranks. It's not an atomic operation, so we can't meaningfully broadcast it.
If dot were specialized to work only on rank 1 vectors, and it only performed the beginner-calculus rank 1 inner product, then we would call it a rank 1 operator, and it could be broadcast over higher-rank tensors. So, for instance, this hypothetical dot function, designed to work on two arguments each of shape (m,), could be applied to two arguments of shape (n, m) and (n, m), where the operation would be applied pointwise to each pair of rows. But Numpy's dot has different behavior. They decided (and probably rightly so) that dot should handle its own "broadcasting"-like behavior because it can do something smarter than just applying the operation pointwise.
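To make that concrete, here is a sketch of what such a row-wise, "broadcast" dot would look like if you wrote it yourself (NumPy spells it with einsum or an explicit sum):
import numpy as np

A = np.arange(12).reshape(3, 4)
B = np.ones((3, 4))

# one inner product per pair of rows
rowwise = np.einsum('ij,ij->i', A, B)
print(rowwise)                   # [ 6. 22. 38.]
print((A * B).sum(axis=1))       # same thing, written element-wise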
Your 2 arrays and their shapes:
In [21]: a = np.array([1,2,3,4])
...: b = np.array([20])
In [22]: a.shape, b.shape
Out[22]: ((4,), (1,))
By rules of broadcasting, for a binary operator like times or add, the (1,) broadcasts to (4,), and it does element-wise operation:
In [23]: a*b
Out[23]: array([20, 40, 60, 80])
dot raises this error:
In [24]: np.dot(a,b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 np.dot(a,b)
File <__array_function__ internals>:5, in dot(*args, **kwargs)
ValueError: shapes (4,) and (1,) not aligned: 4 (dim 0) != 1 (dim 0)
For 1d arrays dot expects an exact match in shapes, as in np.dot(a, a), the 'dot product' of a with itself: the sum of its squared elements. It does not expand the (1,) to (4,) as broadcasting would, and that fits the usual expectations of a linear-algebra inner product. Similarly with 2d, a (n,m) works with a (m,k) to produce a (n,k): the last dimension of A must match the second-to-last of B. Again, basic matrix-multiplication action: a sum of products over the shared m dimension.
Expanding a to (4,1) allows it to pair with the (1,) to produce a (4,). That's not broadcasting; the 1 is the sum-of-products dimension.
In [25]: np.dot(a[:,None],b)
Out[25]: array([20, 40, 60, 80])
dot also works with a scalar b - again that's documented.
In [26]: np.dot(a,20)
Out[26]: array([20, 40, 60, 80])
The np.dot docs mention the np.matmul/@ alternative several times. matmul behaves the same for 1d and 2d, though its explanation is a bit different. It does not accept a scalar argument.
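A quick illustration of that difference (a sketch; the exact error message depends on the NumPy version):
import numpy as np

a = np.array([1, 2, 3, 4])
print(np.dot(a, 20))        # [20 40 60 80]
try:
    np.matmul(a, 20)        # scalars are not allowed here
except ValueError as err:
    print('matmul rejected the scalar:', err)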
I think the simple answer in less-technical terms is that array broadcasting only makes sense for element-wise operations such as +, -, *, /, **.
Maybe this is what they mean by "arithmetic operations" in the documentation:
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations
I agree it would be nice if they were more explicit about which operators allow broadcasting.
The important characteristic of element-wise operations is that both arrays must be the same size. This makes broadcasting behaviour easier to predict because it should always do the obvious thing to make the sizes match.
For operators that take a and b of different sizes, it may not be clear at all what broadcasting should do. Indeed, there may be more than one possible expected result that may seem obvious.
For example,
a = np.array([[1, 2, 3]])
b = np.array([[10], [20], [30]])
print(a + b)
# [[11 12 13]
# [21 22 23]
# [31 32 33]]
This is quite clear.
But what should the result be if np.dot used broadcasting?:
np.dot(a, b)
# array([[140]]) # this is the actual result
# or
np.dot(a, np.repeat(b, 3, 1))
# array([[140, 140, 140]]) # with broadcasting of b
# or
np.dot(np.repeat(a, 3, 0), b)
# array([[140],
# [140],
# [140]]) # with broadcasting of a
# or
np.dot(np.repeat(a, 3, 0), np.repeat(b, 3, 1))
# array([[140, 140, 140],
# [140, 140, 140],
# [140, 140, 140]]) # with broadcasting of both

Python - matrix multiplication

I have an array y with shape (n,). I want to compute the inner-product matrix, which is an n * n matrix.
However, when I tried to do it in Python
np.dot(y , y)
I got the answer n, which is not what I am looking for.
I have also tried:
np.dot(np.transpose(y),y)
np.dot(y, np.transpose(y))
I always get the same answer n
I think you are looking for:
np.multiply.outer(y,y)
or equally:
y = y[None,:]
y.T @ y
example:
y = np.array([1,2,3])[None,:]
y.T @ y
output:
#[[1 2 3]
# [2 4 6]
# [3 6 9]]
You can try to reshape y from shape (70,) to (70,1) before multiplying the 2 matrices.
# Reshape
y = y.reshape(70,1)
# Either below code would work
y*y.T
np.matmul(y,y.T)
One-liner?
np.dot(a[:, None], a[None, :])
transpose doesn't work on 1-D arrays, because you need at least two axes to 'swap' them. This solution adds a new axis to the array: in the first argument it looks like a column vector and has two axes; in the second argument it still looks like a row vector but has two axes.
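To see the added axes (a small sketch):
import numpy as np

a = np.array([1, 2, 3, 4])
print(a.T.shape)          # (4,)   transposing a 1-D array changes nothing
print(a[:, None].shape)   # (4, 1) column vector
print(a[None, :].shape)   # (1, 4) row vector
print(np.dot(a[:, None], a[None, :]).shape)   # (4, 4) outer product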
Looks like what you need is the @ matrix multiplication operator. The dot method is only for computing the dot product between vectors; what you want is matrix multiplication.
>>> a = np.random.rand(70, 1)
>>> (a @ a.T).shape
(70, 70)
UPDATE:
The above answer is incorrect: dot does the same thing if the arrays are 2-D. See the docs here.
np.dot computes the dot product of two arrays. Specifically,
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
The simplest way to do what you want is to convert the vector to a matrix first using np.matrix and then use @. Although dot can also be used, @ is better because conventionally dot is used for vectors and @ for matrices.
>>> a = np.random.rand(70)
>>> a.shape
(70,)
>>> a = np.matrix(a).T
>>> a.shape
(70, 1)
>>> (a @ a.T).shape
(70, 70)

Create random numpy matrix of same size as another.

This question here was useful, but mine is slightly different.
I am trying to do something simple here, I have a numpy matrix A, and I simply want to create another numpy matrix B, of the same shape as A, but I want B to be created from numpy.random.randn() How can this be done? Thanks.
np.random.randn takes the shape of the array as its input which you can get directly from the shape property of the first array. You have to unpack a.shape with the * operator in order to get the proper input for np.random.randn.
a = np.zeros([2, 3])
print(a.shape)
# outputs: (2, 3)
b = np.random.randn(*a.shape)
print(b.shape)
# outputs: (2, 3)
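As an aside (a sketch of an equivalent route): np.random.standard_normal accepts the shape tuple directly, so no unpacking is needed:
import numpy as np

a = np.zeros([2, 3])
b = np.random.standard_normal(a.shape)   # shape tuple passed as-is
print(b.shape)
# outputs: (2, 3)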

Multiplying tensors containing images in numpy

I have the following 3rd-order tensors. Both tensors hold matrices: the first contains 100 10x9 matrices and the second contains 100 3x10 matrices (which I have just filled with ones for this example).
My aim is to multiply the matrices in one-to-one correspondence as they line up, which would result in a tensor with shape (100, 3, 9). This can be done with a for loop that zips up both tensors and takes the dot of each pair, but I am looking to do this with just numpy operations. So far, here are some failed attempts.
Attempt 1:
import numpy as np
T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))
print(T2.dot(T1).shape)
Output of attempt 1:
(100, 3, 100, 9)
Which means it tried all possible combinations ... which is not what I am after.
Actually, none of the other attempts even run. I tried np.tensordot and np.einsum (I read here https://jameshensman.wordpress.com/2010/06/14/multiple-matrix-multiplication-in-numpy that it is supposed to do the job, but I did not get the Einstein indices right). The same link also has some crazy tensor-cube reshaping method that I did not manage to visualize. Any suggestions or explanations on how to tackle this?
Did you try?
In [96]: np.einsum('ijk,ilj->ilk',T1,T2).shape
Out[96]: (100, 3, 9)
The way I figure this out is look at the shapes:
(100, 10, 9)  (i, j, k)
(100, 3, 10)  (i, l, j)
-------------
(100, 3, 9)   (i, l, k)
The two j's are summed over and drop out; the other indices carry through to the output.
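For completeness (a sketch, not part of the original answer): np.matmul, i.e. the @ operator, does the same stacked multiplication by broadcasting the leading axis:
import numpy as np

T1 = np.ones((100, 10, 9))
T2 = np.ones((100, 3, 10))

out = np.matmul(T2, T1)       # or T2 @ T1
print(out.shape)              # (100, 3, 9)
print(np.allclose(out, np.einsum('ijk,ilj->ilk', T1, T2)))   # True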
For 4d arrays, with dimensions like (100, 3, 2, 24), there are several options (a sketch follows this list):
Reshape to 3d, T1.reshape(300, 2, 24), and reshape the result back afterwards with R.reshape(100, 3, ...). Reshape is virtually costless and a good numpy tool.
Add an index to einsum: np.einsum('hijk,hilj->hilk', T1, T2), just a parallel usage to that of i.
Or use an ellipsis: np.einsum('...jk,...lj->...lk', T1, T2). This expression works with 3d, 4d, and up.
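A sketch of those three options side by side (the shape of the second 4-D array is assumed here, since only the first is given above):
import numpy as np

T1 = np.ones((100, 3, 2, 24))   # (h, i, j, k)
T2 = np.ones((100, 3, 5, 2))    # (h, i, l, j) -- the 5 is an assumed stack size

# option 1: collapse the leading axes, multiply, then restore them
R = np.einsum('ajk,alj->alk', T1.reshape(300, 2, 24), T2.reshape(300, 5, 2))
print(R.reshape(100, 3, 5, 24).shape)                    # (100, 3, 5, 24)

# option 2: extra leading index in einsum
print(np.einsum('hijk,hilj->hilk', T1, T2).shape)        # (100, 3, 5, 24)

# option 3: an ellipsis covers any number of leading axes
print(np.einsum('...jk,...lj->...lk', T1, T2).shape)     # (100, 3, 5, 24)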

Why does numpy.dot behave in this way?

I'm trying to understand why numpy's dot function behaves as it does:
M = np.ones((9, 9))
V1 = np.ones((9,))
V2 = np.ones((9, 5))
V3 = np.ones((2, 9, 5))
V4 = np.ones((3, 2, 9, 5))
Now np.dot(M, V1) and np.dot(M, V2) behave as
expected. But for V3 and V4 the result surprises
me:
>>> np.dot(M, V3).shape
(9, 2, 5)
>>> np.dot(M, V4).shape
(9, 3, 2, 5)
I expected (2, 9, 5) and (3, 2, 9, 5) respectively. On the other hand, np.matmul
does what I expect: the matrix multiply is broadcast
over the first N - 2 dimensions of the second argument and
the result has the same shape:
>>> np.matmul(M, V3).shape
(2, 9, 5)
>>> np.matmul(M, V4).shape
(3, 2, 9, 5)
So my question is this: what is the rationale for
np.dot behaving as it does? Does it serve some particular purpose,
or is it the result of applying some general rule?
From the docs for np.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
For np.dot(M, V3):
(9, [9]), (2, [9], 5) --> (9, 2, 5)
For np.dot(M, V4):
(9, [9]), (3, 2, [9], 5) --> (9, 3, 2, 5)
The bracketed dimensions (the last axis of M and the second-to-last axis of the other argument) are summed over, and are therefore not present in the result.
In contrast, np.matmul treats N-dimensional arrays as 'stacks' of 2D matrices:
The behavior depends on the arguments in the following way.
If both arguments are 2-D they are multiplied like conventional matrices.
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
The same reductions are performed in both cases, but the order of the axes is different. np.matmul essentially does the equivalent of:
for ii in range(V3.shape[0]):
    out1[ii, :, :] = np.dot(M[:, :], V3[ii, :, :])
and
for ii in range(V4.shape[0]):
    for jj in range(V4.shape[1]):
        out2[ii, jj, :, :] = np.dot(M[:, :], V4[ii, jj, :, :])
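A quick check that the loop really matches matmul (sketch):
import numpy as np

M = np.random.rand(9, 9)
V3 = np.random.rand(2, 9, 5)

out1 = np.empty((2, 9, 5))
for ii in range(V3.shape[0]):
    out1[ii, :, :] = np.dot(M, V3[ii, :, :])

print(np.allclose(out1, np.matmul(M, V3)))   # True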
From the documentation of numpy.matmul:
matmul differs from dot in two important ways.
Multiplication by scalars is not allowed.
Stacks of matrices are broadcast together as if the matrices were elements.
In conclusion, this is the standard matrix-matrix multiplication you would expect.
On the other hand, numpy.dot is only equivalent to the matrix-matrix multiplication for two-dimensional arrays. For larger dimensions, ...
it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
[source: documentation of numpy.dot]
This resembles the inner (dot) product. For vectors, numpy.dot returns the dot product. Higher-dimensional arrays are treated as collections of vectors, and the dot products between those vectors are returned.
As for the why:
dot and matmul are both generalizations of 2D*2D matrix multiplication, but there are many possible generalizations, depending on mathematical properties, broadcasting rules, and so on.
The choices made for dot and matmul are very different:
For dot, some dimensions are dedicated to the first array and the others to the second.
matmul requires the stacks to be aligned according to broadcasting rules.
NumPy was born in an image-analysis context, and dot can easily manage some tasks in an out = dot(image(s), transformation(s)) style (see the dot docs in an early version of the numpy book, p. 92).
As an illustration:
from pylab import *

image = imread('stackoverflow.png')
identity = eye(3)
NB = ones((3, 3)) / 3           # average the RGB channels (grayscale)
swap_rg = identity[[1, 0, 2]]   # swap the red and green channels
randoms = [rand(3, 3) for _ in range(6)]
transformations = [identity, NB, swap_rg] + randoms

# dot pairs the image's colour axis with the second-to-last axis
# of the stack of nine 3x3 transformations: (H, W, 3) -> (H, W, 9, 3)
out = dot(image, transformations)

for k in range(9):
    subplot(3, 3, k + 1)
    imshow(out[..., k, :])
The modern matmul can do the same thing as the old dot, but the stack of matrices must be taken into account (matmul(image, transformations[:, None]) here).
No doubt that it is better in other contexts.
The equivalent einsum expressions are:
In [92]: np.einsum('ij,kjm->kim',M,V3).shape
Out[92]: (2, 9, 5)
In [93]: np.einsum('ij,lkjm->lkim',M,V4).shape
Out[93]: (3, 2, 9, 5)
Expressed this way, the dot equivalent, 'ij,lkjm->ilkm', looks just as natural as the 'matmul' equivalent, 'ij,lkjm->lkim'.
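A quick check of the two spellings, using V3 (a sketch):
import numpy as np

M = np.ones((9, 9))
V3 = np.ones((2, 9, 5))

print(np.allclose(np.matmul(M, V3), np.einsum('ij,kjm->kim', M, V3)))   # True
print(np.allclose(np.dot(M, V3), np.einsum('ij,kjm->ikm', M, V3)))      # True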
