No broadcasting for dot product


I tried this simple example in Python.
import numpy as np
a = np.array([1,2,3,4])
b = np.array([20])
a + b # broadcasting takes place!
np.dot(a,b) # no broadcasting here?!
I thought np.dot also uses broadcasting, but it seems it doesn't.
I wonder why, i.e. what is the philosophy behind this behavior?
Which operations in NumPy use broadcasting and which don't?
Is there another version of the dot function, for the dot product, which actually uses broadcasting?

The reason it doesn't broadcast is that the docs say so. However, that's not a very good, or satisfying, answer to the question, so perhaps I can shed some light on why.
The point of broadcasting is to take operators and apply them pointwise to different shapes of data without the programmer having to explicitly write for loops all the time.
print(a + b)
is way shorter and just as readable as
my_new_list = []
for a_elem, b_elem in zip(a, b):
    my_new_list.append(a_elem + b_elem)
print(my_new_list)
The reason it works for +, -, and all of those operators is that they are rank 0 (I'm borrowing some terminology from J here). What that means is that, in the absence of any broadcasting rules, + is intended to operate on scalars, i.e. ordinary numbers. The original point of the + operator is to act on numbers, so NumPy comes along and extends that rank 0 behavior to higher ranks, allowing it to work on vectors (rank 1), matrices (rank 2), and tensors (rank 3 and beyond). Again, this is J terminology, but the concept is the same in NumPy.
Now, the fundamental difference is that dot doesn't work that way. The dot function, in NumPy at least, is already special-cased to do different things for arguments of different ranks. For rank 1 arrays, it performs an inner product, what we usually call a "dot product" in a beginner calculus course. For rank 2 arrays, it acts like matrix multiplication. For higher-rank arrays, it's a generalization of matrix multiplication that you can read about in the np.dot documentation. But the point is that dot already works for all ranks. It's not an atomic operation, so we can't meaningfully broadcast it.
If dot were specialized to only work on rank 1 vectors, and it only performed the beginner calculus rank 1 inner product, then we would call it a rank 1 operator, and it could be broadcast over higher-rank tensors. So, for instance, this hypothetical dot function, which is designed to work on two arguments, each of shape (n,), could be applied to two arguments of shape (m, n) and (m, n), where the operation would be applied to each pair of corresponding rows. But NumPy's dot has different behavior. They decided (and probably rightly so) that dot should handle its own "broadcasting"-like behavior because it can do something smarter than just apply the operation pointwise.
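The hypothetical rank 1 dot described above can be sketched with np.einsum, used here as a stand-in (this is not what np.dot does). The core dimension is the last axis, and any leading axes broadcast like an ordinary element-wise operation. The helper name rowwise_dot is made up for illustration.

```python
import numpy as np

def rowwise_dot(x, y):
    """Hypothetical helper: rank 1 inner product over the last axis,
    broadcast over any leading axes."""
    # '...i,...i->...' multiplies matching elements along the last axis i,
    # sums over i, and broadcasts the remaining leading axes.
    return np.einsum('...i,...i->...', x, y)

x = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (m, n) = (2, 3)
y = np.array([[1, 0, 0],
              [0, 1, 0]])        # shape (2, 3)

print(rowwise_dot(x, y))         # one inner product per row: [1 5]

# Leading axes broadcast: a single (3,) vector is paired with every row of x.
print(rowwise_dot(x, np.array([1, 1, 1])))   # [ 6 15]

# The same row-wise result with element-wise multiplication plus a sum:
print((x * y).sum(axis=-1))      # [1 5]
```

This is exactly the "apply pointwise to each row" behavior; np.dot instead treats a 2d argument as a matrix.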

Your 2 arrays and their shapes:
In [21]: a = np.array([1,2,3,4])
...: b = np.array([20])
In [22]: a.shape, b.shape
Out[22]: ((4,), (1,))
By rules of broadcasting, for a binary operator like times or add, the (1,) broadcasts to (4,), and it does element-wise operation:
In [23]: a*b
Out[23]: array([20, 40, 60, 80])
dot raises this error:
In [24]: np.dot(a,b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 np.dot(a,b)
File <__array_function__ internals>:5, in dot(*args, **kwargs)
ValueError: shapes (4,) and (1,) not aligned: 4 (dim 0) != 1 (dim 0)
For 1d arrays, dot expects an exact match in shapes, as in np.dot(a,a), which gives the 'dot product' of a with itself: the sum of its elements squared. It does not expand the (1,) to (4,) as broadcasting would. And that fits the usual expectations of a linear algebra inner product. Similarly with 2d, an (n,m) works with an (m,k) to produce an (n,k): the last dimension of A must match the second-to-last dimension of B. Again, that is basic matrix multiplication. It does a sum-of-products on the shared m dimension.
Expanding a to (4,1) allows it to pair with the (1,) to produce a (4,). That's not broadcasting; the 1 is the sum-of-products dimension.
In [25]: np.dot(a[:,None],b)
Out[25]: array([20, 40, 60, 80])
dot also works with a scalar b - again that's documented.
In [26]: np.dot(a,20)
Out[26]: array([20, 40, 60, 80])
The np.dot docs mention the np.matmul/@ alternative several times. matmul behaves the same for 1d and 2d, though its explanation is a bit different. It doesn't accept a scalar argument.
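A small sketch of the difference mentioned above, using only np.dot and np.matmul (the @ operator); the array values are arbitrary. It also touches the question's last point: matmul does broadcast, but only over the leading "stack" axes, while the last two axes still have to line up as in a matrix product.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# For 1d and 2d inputs, @ (np.matmul) and np.dot agree.
m = np.arange(6).reshape(2, 3)
v = np.array([1, 0, 2])
assert np.array_equal(m @ v, np.dot(m, v))

# np.dot accepts a scalar; np.matmul raises ValueError.
print(np.dot(a, 20))             # [20 40 60 80]
try:
    np.matmul(a, 20)
except ValueError as err:
    print("matmul rejects scalars:", err)

# For stacks of matrices, matmul broadcasts the leading "batch" axes,
# while dot pairs every matrix in the first stack with every matrix in the second.
s = np.ones((2, 3, 4))
t = np.ones((2, 4, 5))
print((s @ t).shape)             # (2, 3, 5)
print(np.dot(s, t).shape)        # (2, 3, 2, 5)
```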

I think the simple answer in less-technical terms is that array broadcasting only makes sense for element-wise operations such as +, -, *, /, **.
Maybe this is what they mean by "arithmetic operations" in the documentation:
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations
I agree it would be nice if they were more explicit about which operators allow broadcasting.
The important characteristic of element-wise operations is that both arrays must end up the same size. This makes broadcasting behaviour easier to predict, because it should always do the obvious thing to make the sizes match.
For operators that take a and b of different sizes, it may not be clear at all what broadcasting should do. Indeed, there may be more than one possible expected result that may seem obvious.
For example,
a = np.array([[1, 2, 3]])
b = np.array([[10], [20], [30]])
print(a + b)
# [[11 12 13]
# [21 22 23]
# [31 32 33]]
This is quite clear.
But what should the result be if np.dot used broadcasting?
np.dot(a, b)
# array([[140]]) # this is the actual result
# or
np.dot(a, np.repeat(b, 3, 1))
# array([[140, 140, 140]]) # with broadcasting of b
# or
np.dot(np.repeat(a, 3, 0), b)
# array([[140],
# [140],
# [140]]) # with broadcasting of a
# or
np.dot(np.repeat(a, 3, 0), np.repeat(b, 3, 1))
# array([[140, 140, 140],
# [140, 140, 140],
# [140, 140, 140]]) # with broadcasting of both

Related

Numpy vector and matrix addition

What is the best way to perform normal vector addition, where one of the operands is an n x 1 matrix?
Why do I care? Sometimes, a function that should return a vector returns an n x 1 matrix (because the function would equivalently work element-wise on a matrix). When I want to further work with the returned "vector", I always have to reshape - there must be a better way.
For example:
v = np.zeros(shape=(2,1))
w = np.array([1,1])
print('{}, {}'.format(v.shape,w.shape))
Prints: (2, 1), (2,)
print(v+w)
[[1. 1.]
[1. 1.]]
print(v+w.reshape((2,1)))
[[1.]
[1.]] (the desired output!)

Sounds a bit like you are coming from MATLAB, where everything is 2d (a scalar's size is (1,1)) and the trailing dimension is outermost. Or from a linear algebra background that treats 'vectors' as single-column matrices.
In numpy, 0d and 1d arrays are just as normal as 2d. A shape like (n,) is common. By the rules of broadcasting, adding a leading dimension (1,n) is automatic, but adding a trailing dimension requires user action. a[:,None] is the most idiomatic way to do that, though not the only option.
The v+w broadcasting logic is
(2,1) + (2,) => (2,1) + (1,2) => (2,2)
The automatic leading-dimension logic avoids ambiguity (what should happen if you try to add a (2,) to a (3,)?). And since leading dimensions are 'outermost', it makes the most sense to expand in that direction. MATLAB, on the other hand, 'naturally' expands and contracts the trailing dimensions.
So to some degree or other a (n,1) shape is more awkward in numpy, though it is still relatively easy to create.
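The shape logic above can be checked directly with np.broadcast_shapes (available in NumPy 1.20 and later), using the v and w from the question:

```python
import numpy as np

# Shapes are compared from the right; a missing leading axis counts as 1.
print(np.broadcast_shapes((2, 1), (2,)))   # (2, 2): (2,) is treated as (1, 2)

# A trailing axis is never added automatically, so (2,) and (3,) do not broadcast.
try:
    np.broadcast_shapes((2,), (3,))
except ValueError as err:
    print("not broadcastable:", err)

v = np.zeros((2, 1))
w = np.array([1, 1])
print((v + w).shape)            # (2, 2): the surprising result from the question
print((v + w[:, None]).shape)   # (2, 1): trailing axis added explicitly
```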
Another example of an auto leading dimension:
In [129]: np.atleast_2d(np.arange(3)).shape
Out[129]: (1, 3)
On the other hand, expand_dims lets us add dimensions all over the place:
In [132]: np.expand_dims(np.arange(3),(0,2,3)).shape
Out[132]: (1, 3, 1, 1)

If w has the desired shape of the result (e.g. (2,)), and v has the same size (e.g. (2,1), or (2,)), this is safe and easy:
w + v.reshape(w.shape)
Less generally, if all you want is to get rid of the last dimension, knowing it is of length 1, you can do:
w + v[..., 0]
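Both options above, applied to the v and w from the question. The last part shows what "safe" buys you: reshape refuses a size mismatch instead of silently broadcasting.

```python
import numpy as np

v = np.zeros((2, 1))    # the "n x 1 matrix" returned by some function
w = np.array([1, 1])    # the 1d vector we want to add it to

r1 = w + v.reshape(w.shape)   # reshape v to exactly w's shape, (2,)
r2 = w + v[..., 0]            # drop v's last axis (assumes it has length 1)
print(r1, r2)                 # [1. 1.] [1. 1.]

# reshape raises on a size mismatch rather than broadcasting to (2, 2) or worse.
try:
    w + np.zeros((3, 1)).reshape(w.shape)
except ValueError as err:
    print("size mismatch:", err)
```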

understand numpy.dot with N-d array and 1-d array [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
To whoever voted to close this as unclear, here are the questions in my post:
Can anyone tell me what's the result of y?
Is there anything called sum product in Mathematics?
Is x subject to broadcasting?
Why is y a column/row vector?
What if x=np.array([[7],[2],[3]])?
w=np.array([[1,2,3],[4,5,6],[7,8,9]])
x=np.array([7,2,3])
y=np.dot(w,x)
Can anyone tell me what's the result of y?
I deliberately blurred (mosaicked) the screenshot so that you can pretend you are in a test and cannot run Python to get the result.
https://docs.scipy.org/doc/numpy-1.15.4/reference/generated/numpy.dot.html#numpy.dot says
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
Is there anything called sum product in Mathematics?
Is x subject to broadcasting?
Why is y a column/row vector?
What if x=np.array([[7],[2],[3]])?

np.dot is nothing but matrix multiplication if the dimensions match for multiplication (i.e. if w is 3x3 and x is 1x3, the matrix product WX is not defined, but XW is fine). In the first case:
>>> w=np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> x=np.array([7,2,3])
>>> w.shape
(3, 3)
>>> x.shape # 1d vector
(3,)
So in this case it returns the inner product of each row of W with X:
>>> [np.dot(ww,x) for ww in w]
[20, 56, 92]
>>> np.dot(w,x)
array([20, 56, 92]) # the same as above
Changing the order:
>>> y = np.dot(x,w) # matrix mult as usual
>>> y
array([36, 48, 60])
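One way to read both orders, sketched with the same w and x: np.dot(w, x) takes the inner product of each row of w with x, while np.dot(x, w) uses x to weight the rows of w, which is the same as w.T @ x.

```python
import numpy as np

w = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
x = np.array([7, 2, 3])

# np.dot(w, x): inner product of each row of w with x.
assert np.array_equal(np.dot(w, x), [np.dot(row, x) for row in w])

# np.dot(x, w): sum over i of x[i] * w[i, :], i.e. a weighted sum of the rows of w,
# which equals multiplying the transpose of w by x.
combo = sum(xi * row for xi, row in zip(x, w))
print(np.dot(x, w), combo, w.T @ x)   # all three are [36 48 60]
```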
In the second case:
>>> x=np.array([[7],[2],[3]])
>>> x.shape
(3, 1)
>>> y = np.dot(w,x) # matrix mult
>>> y
array([[20],
[56],
[92]])
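However, when the order is reversed this time, the dimensions match neither for matrix multiplication ((3x1) times (3x3)) nor for a row-by-row inner product (each row of x has length 1, each row of w has length 3), so it raises an error.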
>>> y = np.dot(x,w)
Traceback (most recent call last):
File "<ipython-input-110-dcddcf3bedd8>", line 1, in <module>
y = np.dot(x,w)
ValueError: shapes (3,1) and (3,3) not aligned: 1 (dim 1) != 3 (dim 0)

I don't think your question is unclear, but rather overly pedantic.
For example, why are you puzzled by 'sum product' in this nD by 1d case, when the docs use 'inner product' for the 1d by 1d case and 'matrix product' for the 2d by 2d case? Give yourself some freedom to read it as the sum of the products, as done in the inner product.
To make your example clearer, make w rectangular, to better distinguish row actions from column ones:
In [168]: w=np.array([[1,2,3],[4,5,6]])
...: x=np.array([7,2,3])
In [169]: w.shape
Out[169]: (2, 3)
In [170]: x.shape
Out[170]: (3,)
The dot and its equivalent einstein notation:
In [171]: np.dot(w,x)
Out[171]: array([20, 56])
In [172]: np.einsum('ij,j->i',w,x)
Out[172]: array([20, 56])
The sum of the products is being done on the repeated j dimension, without summation on i.
We can do the same thing with broadcasted elementwise multiplication:
In [173]: (w*x[None,:]).sum(axis=1)
Out[173]: array([20, 56])
While this equivalent operation does use broadcasting, it's better not to think of dot in those terms.
matmul gives another description of the same action, adding a dimension to x to form a 2d by 2d matrix product, followed by a squeeze to remove the extra dimension. I don't think dot does that under the covers, but the result is the same.
This may also be called matrix vector multiplication, provided you don't insist on calling the 1d x a row vector or column vector.
Now for a 2d x, with shape (3,1):
In [175]: x2 = x[:,None]
In [176]: x2
Out[176]:
array([[7],
[2],
[3]])
In [177]: x2.shape
Out[177]: (3, 1)
In [178]: np.dot(w,x2)
Out[178]:
array([[20],
[56]])
In [179]: np.einsum('ij,jk->ik',w,x2)
Out[179]:
array([[20],
[56]])
The sum is over j, the last axis of w and the second-to-last axis of x2. To do the same with element-wise multiplication, we have to use broadcasting to generate a 3d outer product, and then sum to reduce the dimension back to 2.
In [180]: (w[:,:,None]*x2[None,:,:]).sum(axis=1)
Out[180]:
array([[20],
[56]])
In this example, (2,3) dot (3,1) => (2,1). That's perfectly normal matrix product behavior. In the first case, (2,3) dot (3,) => (2,). To me this is a logical generalization. (3,) dot (3,) => a scalar (rather than a shape () array) is a bit more of a special case.
I suspect the first case is mainly a problem for people who see a (3,) shape and think (1,3), a row-vector. (2,3) dot (1,3) doesn't work, because of the mismatch between the 3 and the 1.

Differences between array class and matrix class in numpy for matrix operation

I was trying to do matrix dot products and transposes with NumPy, and I found that array can do many things matrix can do, such as dot product, point-wise product, and transpose.
When I have to create a matrix, I have to create an array first.
example:
import numpy as np
array = np.ones([3,1])
matrix = np.matrix(array)
Since I can do matrix transpose and dot product in array type, I don't have to convert array into matrix to do matrix operations.
For example, the following line is valid, where A is an ndarray :
dot_product = np.dot(A.T, A )
The previous matrix operation can be expressed with matrix class variable A
dot_product = A.T * A
For matrix, the operator * looks exactly like the point-wise product * for ndarray. This makes ndarray and matrix code almost indistinguishable and causes confusion.
The confusion is a serious problem, as said in PEP 465:
Writing code using numpy.matrix also works fine. But trouble begins as soon as we try to integrate these two pieces of code together. Code that expects an ndarray and gets a matrix, or vice-versa, may crash or return incorrect results. Keeping track of which functions expect which types as inputs, and return which types as outputs, and then converting back and forth all the time, is incredibly cumbersome and impossible to get right at any scale.
It would be very tempting to stick to ndarray, deprecate matrix, and support ndarray with matrix-operation methods such as .inverse(), .hermitian(), outerproduct(), etc., in the future.
The major reason I still have to use the matrix class is that it handles a 1d array as a 2d array, so I can transpose it.
So far it is very inconvenient to transpose a 1d array, since a 1d array of size n has shape (n,) instead of (1,n). For example, if I have to do the inner product of two arrays:
A = [[1,1,1],[2,2,2],[3,3,3]]
B = [[1,2,3],[1,2,3],[1,2,3]]
np.dot(A,B) works fine, but if
B = [1,1,1]
its transpose is still a row vector.
I have to handle this special case when the dimensions of the input variable are unknown.
I hope this helps some people with the same trouble, and I'd like to know if there is any better way to handle matrix operations like in MATLAB, especially with 1d arrays. Thanks.

Your first example is a column vector:
In [258]: x = np.arange(3).reshape(3,1)
In [259]: x
Out[259]:
array([[0],
[1],
[2]])
In [260]: xm = np.matrix(x)
dot produces the inner product, and the dimensions operate as (1,3),(3,1)=>(1,1):
In [261]: np.dot(x.T, x)
Out[261]: array([[5]])
the matrix product does the same thing:
In [262]: xm.T * xm
Out[262]: matrix([[5]])
(The same thing with 1d arrays produces a scalar value, np.dot([0,1,2],[0,1,2]) # 5)
Element-wise multiplication of the arrays (with broadcasting) produces the outer product (so do np.outer(x, x) and np.dot(x, x.T)):
In [263]: x.T * x
Out[263]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
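A quick check of the claim that these three routes give the same outer product for the column vector x:

```python
import numpy as np

x = np.arange(3).reshape(3, 1)   # column vector, shape (3, 1)

outer_bcast = x.T * x            # (1, 3) * (3, 1) broadcasts to (3, 3)
outer_fn = np.outer(x, x)        # np.outer flattens its inputs first
outer_dot = np.dot(x, x.T)       # (3, 1) dot (1, 3) -> (3, 3)

assert np.array_equal(outer_bcast, outer_fn)
assert np.array_equal(outer_bcast, outer_dot)
print(outer_bcast)
```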
For ndarray, * IS element-wise multiplication (the .* of MATLAB, but with broadcasting added). For element-wise multiplication of matrix, use np.multiply(xm,xm). (scipy sparse matrices have a multiply method, X.multiply(other).)
You quote from the PEP that added the @ operator (matmul). This, as well as np.tensordot and np.einsum, can handle larger dimensional arrays and other mixes of products. Those don't make sense with np.matrix, since that's restricted to 2d.
With your 3x3 A and B
In [273]: np.dot(A,B)
Out[273]:
array([[ 3, 6, 9],
[ 6, 12, 18],
[ 9, 18, 27]])
In [274]: C=np.array([1,1,1])
In [281]: np.dot(A,np.array([1,1,1]))
Out[281]: array([3, 6, 9])
Effectively this sums each row. np.dot(A,np.array([1,1,1])[:,None]) does the same thing, but returns a (3,1) array.
np.matrix was created years ago to make numpy (actually one of its predecessors) feel more like MATLAB. A key feature is that it is restricted to 2d. That's what MATLAB was like back in the 1990s. np.matrix and MATLAB don't have 1d arrays; instead they have single column or single row matrices.
If the fact that ndarrays can be 1d (or even 0d) is a problem there are many ways of adding that 2nd dimension. I prefer the [None,:] kind of syntax, but reshape is also useful. ndmin=2, np.atleast_2d, np.expand_dims also work.
np.sum and other operations that reduce dimensions have a keepdims=True parameter to counter that. The new @ gives an operator syntax for matrix multiplication. As far as I know, the np.matrix class does not have any compiled code of its own.
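The ways of adding a second dimension listed above, plus keepdims, side by side on a 1d array. Note that .T on a (1, n) array really does give an (n, 1) column, which is what the question was after.

```python
import numpy as np

v = np.array([1, 1, 1])              # 1d, shape (3,)

row1 = v[None, :]                    # (1, 3)
row2 = v.reshape(1, -1)              # (1, 3)
row3 = np.array(v, ndmin=2)          # (1, 3): ndmin pads dimensions on the left
row4 = np.atleast_2d(v)              # (1, 3)
col1 = v[:, None]                    # (3, 1)
col2 = np.expand_dims(v, axis=1)     # (3, 1)
print(row1.shape, row2.shape, row3.shape, row4.shape, col1.shape, col2.shape)

# Once the array is 2d, transposing a row gives a column.
print(row1.T.shape)                  # (3, 1)

A = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
s = A.sum(axis=1, keepdims=True)     # keeps the reduced axis: (3, 1), not (3,)
print(s.shape)
```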
============
The method that implements * for np.matrix uses np.dot:
def __mul__(self, other):
    if isinstance(other, (N.ndarray, list, tuple)) :
        # This promotes 1-D vectors to row vectors
        return N.dot(self, asmatrix(other))
    if isscalar(other) or not hasattr(other, '__rmul__') :
        return N.dot(self, other)
    return NotImplemented

NumPy - What is broadcasting?

I don't understand broadcasting. The documentation explains the rules of broadcasting but doesn't seem to define it in English. My guess is that broadcasting is when NumPy fills a smaller dimensional array with dummy data in order to perform an operation. But this doesn't work:
>>> x = np.array([1,3,5])
>>> y = np.array([2,4])
>>> x+y
*** ValueError: operands could not be broadcast together with shapes (3,) (2,)
The error message hints that I'm on the right track, though. Can someone define broadcasting and then provide some simple examples of when it works and when it doesn't?

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.
It's basically a way numpy can expand the domain of operations over arrays.
The only requirement for broadcasting is a way of aligning array dimensions (NumPy aligns them starting from the trailing dimension) such that, for each aligned pair, either:
Aligned dimensions are equal.
One of the aligned dimensions is 1.
So, for example if:
x = np.empty((4, 1, 3))
y = np.empty((3, 3))
You could not align x and y like so:
4 x 1 x 3
3 x 3
But you could like so:
4 x 1 x 3
    3 x 3
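The right-aligned version is what NumPy actually does, which np.broadcast_shapes (NumPy 1.20 and later) confirms. np.zeros is used here instead of uninitialized arrays just so the addition works on defined values.

```python
import numpy as np

x = np.zeros((4, 1, 3))
y = np.zeros((3, 3))

# NumPy always aligns shapes on the right:
#   4 x 1 x 3
#       3 x 3
# The 1 stretches to 3, and y gets a leading axis of length 4.
print(np.broadcast_shapes(x.shape, y.shape))   # (4, 3, 3)
print((x + y).shape)                            # (4, 3, 3)
```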
What would the result of an operation like this be?
Suppose we have:
x = np.array([[1, 2, 3]])
array([[1, 2, 3]])
y = np.ones((3, 3), dtype=int)
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
The operation x + y would result in:
array([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]])
I hope you caught the drift. If you did not, you can always check the official NumPy documentation on broadcasting.
Cheers!

1. What is Broadcasting?
Broadcasting is a tensor operation. It is helpful in neural networks (ML, AI).
2. What is the use of Broadcasting?
Without broadcasting, only tensors of identical dimension (shape) can be added.
Broadcasting gives us the flexibility to add two tensors of different dimensions.
For example, adding a 2D tensor to a 1D tensor is not possible without broadcasting.
Run the following Python example code to understand the concept:
import numpy as np
x = np.array([1,3,5,6,7,8])
y = np.array([2,4,5])
X=x.reshape(2,3)
x is reshaped to get a 2D tensor X of shape (2,3); adding this 2D tensor X to the 1D tensor y of shape (3,) gives a 2D tensor z of shape (2,3).
print("X =",X)
print("\n y =",y)
z=X+y
print("X + y =",z)
You are almost right about the smaller tensor, and there is no ambiguity: the smaller tensor is broadcast to match the shape of the larger tensor. (The smaller tensor is repeated, not filled with dummy data or zeros, to match the shape of the larger one.)
3. How broadcasting happens?
Broadcasting consists of two steps:
1 Broadcast axes are added to the smaller tensor to match the ndim of the larger tensor.
2 The smaller tensor is repeated along these new axes to match the full shape of the larger tensor.
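The two steps above can be spelled out by hand with the X and y from the example. This is only a mental model: NumPy does not physically copy the data, and np.broadcast_to returns a read-only view with the broadcast shape instead.

```python
import numpy as np

X = np.array([[1, 3, 5], [6, 7, 8]])   # shape (2, 3)
y = np.array([2, 4, 5])                # shape (3,)

# Step 1: add a broadcast axis so y has the same ndim as X: (3,) -> (1, 3).
y2 = y[np.newaxis, :]
# Step 2: repeat y along the new axis to match X's full shape: (1, 3) -> (2, 3).
y_full = np.repeat(y2, X.shape[0], axis=0)

print(X + y_full)
print(np.array_equal(X + y_full, X + y))   # True: same as letting NumPy broadcast

# Without copying: a read-only view with the broadcast shape.
print(np.broadcast_to(y, X.shape))
```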
4. Why is broadcasting not happening in your code?
Your code fails because broadcasting cannot happen here: the two tensors have different shapes, (3,) and (2,), but the same number of dimensions (1D).
In this two-step description, broadcast axes are only added when the numbers of dimensions differ. (An existing axis can also be stretched, but only if its length is 1, and neither of your arrays has such an axis.)
What you need to do is change the number of dimensions of one of the tensors; then you will see broadcasting.
5. Going in depth.
Broadcasting (repetition of the smaller tensor) occurs along broadcast axes, but since both tensors are 1-dimensional, there is no broadcast axis.
Don't confuse a tensor's number of dimensions with its shape; "dimension" for tensors does not mean the same thing as "dimension" for matrices.

Broadcasting is numpy trying to be smart when you tell it to perform an operation on arrays that aren't the same dimension. For example:
2 + np.array([1,3,5]) == np.array([3, 5, 7])
Here it decided you wanted to apply the operation using the lower dimensional array (0-D) on each item in the higher-dimensional array (1-D).
You can also add a 0-D array (scalar) or 1-D array to a 2-D array. In the first case, you just add the scalar to all items in the 2-D array, as before. In the second case, numpy will add row-wise:
In [34]: np.array([1,2]) + np.array([[3,4],[5,6]])
Out[34]:
array([[4, 6],
[6, 8]])
There are ways to tell numpy to apply the operation along a different axis as well. This can be taken even further with applying an operation between a 3-D array and a 1-D, 2-D, or 0-D array.
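One common way to pick the other axis is to give the 1d array a trailing axis with [:, None], so it lines up with the rows instead of the columns. The arrays are the ones from the example above; the 3d array t is an arbitrary addition for illustration.

```python
import numpy as np

m = np.array([[3, 4], [5, 6]])
v = np.array([1, 2])

print(v + m)              # v added to each row:    [[4 6] [6 8]]
print(v[:, None] + m)     # v added to each column: [[4 5] [7 8]]

# The same idea with a 3d array: give m a trailing axis of length 1.
t = np.zeros((2, 2, 3))
print((t + m[:, :, None]).shape)   # (2, 2, 3)
```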

>>> x = np.array([1,3,5])
>>> y = np.array([2,4])
>>> x+y
*** ValueError: operands could not be broadcast together with shapes (3,) (2,)
Broadcasting is how numpy does math operations with arrays of different shapes. The shape is the layout of the array: for example, the array you used, x, has 3 elements in 1 dimension; y has 2 elements in 1 dimension.
To perform broadcasting, each pair of aligned dimensions (compared from the right) must satisfy one of 2 rules:
1) The dimensions are the same, or
2) The dimension that doesn't match equals one.
For example, x has shape (2,3) [2 rows and 3 columns];
y has shape (2,1) [2 rows and 1 column].
Can you add them, x + y?
Answer: Yes, because the mismatched dimension is equal to 1 (the column in y). If y had shape (2,4), broadcasting would not be possible, because the mismatched dimension is not 1.
In the case you posted:
operands could not be broadcast together with shapes (3,) (2,)
it is because 3 and 2 do not match, although both arrays are 1-D.
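The two rules can be written as a tiny checker, cross-checked against NumPy. The function name can_broadcast is hypothetical (NumPy's own equivalent is np.broadcast_shapes, which raises ValueError instead of returning False).

```python
import numpy as np
from itertools import zip_longest

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the two rules, comparing dimensions from the right.
    A missing leading dimension is treated as 1."""
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da != db and da != 1 and db != 1:
            return False
    return True

print(can_broadcast((2, 3), (2, 1)))   # True
print(can_broadcast((2, 3), (2, 4)))   # False
print(can_broadcast((3,), (2,)))       # False: the case from the question

# Cross-check against NumPy's own rule.
print(np.broadcast_shapes((2, 3), (2, 1)))   # (2, 3)
```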

I would suggest trying np.broadcast_arrays; running some demos may give you an intuitive idea. The official documentation is also helpful. From my current understanding, numpy compares the dimensions from tail to head. If one dim is 1, it is broadcast in that dimension. If one array has more axes, e.g. a (256,256,3) array multiplied by a (1,) array, you can view the (1,) as (1,1,1), and broadcasting will make it (256,256,3).
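A short np.broadcast_arrays demo along those lines. The (256, 256, 3) array stands in for an image; all values are arbitrary.

```python
import numpy as np

img = np.ones((256, 256, 3))
scale = np.array([0.5])                   # shape (1,), treated as (1, 1, 1)

big, small = np.broadcast_arrays(img, scale)
print(big.shape, small.shape)             # (256, 256, 3) (256, 256, 3)

a = np.array([[1], [2]])                  # (2, 1)
b = np.array([10, 20, 30])                # (3,)
aa, bb = np.broadcast_arrays(a, b)        # both become (2, 3)
print(aa)
print(bb)
```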

Matrix multiplication problems - Numpy vs Matlab?

I am trying to translate some Matlab code I have into Python (using numpy). I have the following Matlab code:
(1/x)*eye(2)
x is simply 1000000. As I understand, * in Matlab indicates matrix multiplication, and the equivalent is .dot in numpy. So in Python, I have:
numpy.array([(1/x)]).dot(numpy.identity(2))
I get the error "shapes (1,) and (2,2) not aligned: 1 (dim 0) != 2 (dim 0)" when I try to run the numpy code.
Apparently I'm not understanding something. Does anybody know what the proper numpy code would be?

Since x is a scalar, if you multiply a matrix by a scalar in MATLAB it simply scales all of the entries by that value. There is no need for matrix multiplication.
If you want to achieve the same thing in numpy, you do the same operation as in MATLAB:
(1/x)*numpy.identity(2)
If x is a matrix of compatible dimensions, then yes you use numpy.dot:
(1/x).dot(numpy.identity(2))
As such, you need to make sure that you know what x is before you decide to do the operation.
numpy performs element-wise multiplication with the * operator, so if you want actual matrix multiplication, yes, use numpy.dot. You are getting incompatible dimensions because numpy.array([(1/x)]) is not a scalar but a 1-element array of shape (1,), and true matrix multiplication between that and a (2,2) matrix is not possible.
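A minimal sketch of the scalar case from the question: a plain Python scalar works with both * and np.dot, while wrapping it in np.array([...]) produces the (1,) array that triggers the error.

```python
import numpy as np

x = 1000000
I = np.identity(2)

r1 = (1 / x) * I                 # scalar times matrix: scales every entry
r2 = np.dot(1 / x, I)            # np.dot also accepts a plain scalar
assert np.array_equal(r1, r2)

# np.array([1/x]) is NOT a scalar: it is a 1d array of shape (1,),
# and a (1,) array cannot be matrix-multiplied with a (2, 2) array.
try:
    np.array([1 / x]).dot(I)
except ValueError as err:
    print(err)   # the "not aligned" error from the question
```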

Basically, in numpy the * operation and dot are different.
* performs an element-wise operation: each matrix element is multiplied by the corresponding element of the other matrix.
a.dot(b) performs actual mathematical matrix multiplication, which we studied in high school.
a = np.arange(9).reshape(3,3)
b = np.arange(10,19).reshape(3,3)
In [47]: a*b
Out[47]:
array([[ 0, 11, 24],
[ 39, 56, 75],
[ 96, 119, 144]])
In [48]: a.dot(b)
Out[48]:
array([[ 45, 48, 51],
[162, 174, 186],
[279, 300, 321]])
