I tried this simple example in Python.
import numpy as np
a = np.array([1,2,3,4])
b = np.array([20])
a + b # broadcasting takes place!
np.dot(a,b) # no broadcasting here?!
I thought np.dot also uses broadcasting, but it seems it doesn't.
I wonder why, i.e. what is the philosophy behind this behavior?
Which operations in NumPy use broadcasting and which not?
Is there another version of the dot function for dot product,
which actually uses broadcasting?
The reason it doesn't broadcast is because the docs say so. However, that's not a very good, or satisfying, answer to the question. So perhaps I can shed some light on why.
The point of broadcasting is to take operators and apply them pointwise to different shapes of data without the programmer having to explicitly write for loops all the time.
print(a + b)
is way shorter and just as readable as
my_new_list = []
for a_elem in a:
    # b holds a single value, so it is paired with every element of a
    my_new_list.append(a_elem + b[0])
print(my_new_list)
The reason it works for +, and -, and all of those operators is, and I'm going to borrow some terminology from J here, that they're rank 0. What that means is that, in the absence of any broadcasting rules, + is intended to operate on scalars, i.e. ordinary numbers. The original point of the + operator is to act on numbers, so Numpy comes along and extends that rank 0 behavior to higher ranks, allowing it to work on vectors (rank 1) and matrices (rank 2) and tensors (rank 3 and beyond). Again, I'm borrowing J terminology here, but the concept is the same in Numpy.
Now, the fundamental difference is that dot doesn't work that way. The dot product function, in Numpy at least, is already special-cased to do different things for different rank arguments. For rank 1 arrays (vectors), it performs an inner product, what we usually call a "dot product" in a beginner calculus course. For rank 2 arrays, it acts like matrix multiplication. For higher-rank arrays, it's an appropriate generalization of matrix multiplication that you can read about in the docs linked above. But the point is that dot already works for all ranks. It's not an atomic operation, so we can't meaningfully broadcast it.
If dot were specialized to only work on rank 1 vectors, and it only performed the beginner calculus rank 1 inner product, then we would call it a rank 1 operator, and it could be broadcast over higher-rank tensors. So, for instance, this hypothetical dot function, which is designed to work on two arguments, each of shape (n,), could be applied to two arguments of shape (m, n) and (m, n), where the operation would be applied pointwise to each row. But Numpy's dot has different behavior. They decided (and probably rightly so) that dot should handle its own "broadcasting"-like behavior because it can do something smarter than just apply the operation pointwise.
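To make the hypothetical rank 1 operator concrete, here is a minimal sketch. The name row_wise_dot is made up for illustration and is not a NumPy function; it simply combines np.broadcast_arrays with np.einsum so that an inner product over the last axis is applied to each row.

```python
import numpy as np

def row_wise_dot(x, y):
    """Hypothetical rank 1 'dot': inner product over the last axis,
    with the two inputs broadcast against each other first."""
    x, y = np.broadcast_arrays(x, y)
    return np.einsum('...i,...i->...', x, y)

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 0, 1], [0, 1, 0]])

rows = row_wise_dot(A, B)            # one inner product per pair of rows
single = row_wise_dot(A, [1, 1, 1])  # the (3,) vector is reused for every row
original = row_wise_dot(np.array([1, 2, 3, 4]), np.array([20]))  # (1,) stretched to (4,)
```

With the arrays from the question, this treats b as if it were [20, 20, 20, 20], which is one possible reading of "dot with broadcasting", not the only one.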
Your 2 arrays and their shapes:
In [21]: a = np.array([1,2,3,4])
...: b = np.array([20])
In [22]: a.shape, b.shape
Out[22]: ((4,), (1,))
By rules of broadcasting, for a binary operator like times or add, the (1,) broadcasts to (4,), and it does element-wise operation:
In [23]: a*b
Out[23]: array([20, 40, 60, 80])
dot raises this error:
In [24]: np.dot(a,b)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 np.dot(a,b)
File <__array_function__ internals>:5, in dot(*args, **kwargs)
ValueError: shapes (4,) and (1,) not aligned: 4 (dim 0) != 1 (dim 0)
For 1d arrays dot expects an exact match in shapes, as in np.dot(a,a), which gives the 'dot product' of a with itself: the sum of its elements squared. It does not expand the (1,) to (4,) as broadcasting would. And that fits the usual expectations of a linear algebra inner product. Similarly with 2d, a (n,m) works with a (m,k) to produce a (n,k). The last dimension of A must match the 2nd to last of B. Again, this is basic matrix multiplication: a sum-of-products on the shared m dimension.
Expanding a to (4,1) allows it to pair with the (1,) to produce a (4,). That's not broadcasting. The 1 is the sum-of-products dimension.
In [25]: np.dot(a[:,None],b)
Out[25]: array([20, 40, 60, 80])
dot also works with a scalar b - again that's documented.
In [26]: np.dot(a,20)
Out[26]: array([20, 40, 60, 80])
The np.dot docs mention the np.matmul/@ alternative several times. matmul behaves the same for 1d and 2d arrays, though its explanation is a bit different. It doesn't accept a scalar argument.
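A short sketch of those two matmul points. The variable names and the stacked shapes are only illustrative. It also shows the one place where matmul does broadcast: the leading "stack" dimensions, while the last two dimensions still follow the matrix multiplication rule.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# matmul rejects a scalar operand, unlike np.dot(a, 20)
try:
    np.matmul(a, 20)
    scalar_ok = True
except (ValueError, TypeError):
    scalar_ok = False

# Leading dimensions are broadcast; the last two are multiplied as matrices
stack = np.ones((3, 2, 4))   # three (2, 4) matrices
mat = np.ones((4, 5))        # a single (4, 5) matrix, reused for each of the three
batched = stack @ mat        # shape (3, 2, 5)
```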
I think the simple answer in less-technical terms is that array broadcasting only makes sense for element-wise operations such as +, -, *, /, **.
Maybe this is what they mean by "arithmetic operations" in the documentation:
The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations
I agree it would be nice if they were more explicit about which operators allow broadcasting.
The important characteristic of element-wise operations is that both arrays must be the same size. This makes broadcasting behaviour easier to predict because it should always do the obvious thing to make the sizes match.
For operators that take a and b of different sizes, it may not be clear at all what broadcasting should do. Indeed, there may be more than one possible expected result that may seem obvious.
For example,
a = np.array([[1, 2, 3]])
b = np.array([[10], [20], [30]])
print(a + b)
# [[11 12 13]
# [21 22 23]
# [31 32 33]]
This is quite clear.
But what should the result be if np.dot used broadcasting?:
np.dot(a, b)
# array([[140]]) # this is the actual result
# or
np.dot(a, np.repeat(b, 3, 1))
# array([[140, 140, 140]]) # with broadcasting of b
# or
np.dot(np.repeat(a, 3, 0), b)
# array([[140],
# [140],
# [140]]) # with broadcasting of a
# or
np.dot(np.repeat(a, 3, 0), np.repeat(b, 3, 1))
# array([[140, 140, 140],
# [140, 140, 140],
# [140, 140, 140]]) # with broadcasting of both
I suspect this is something very fundamental I don't know or understand about this code; my only excuse is that I am a complete beginner in python.
I am trying some of the cosine similarity matrix calculations from this post:
What's the fastest way in Python to calculate cosine similarity given sparse matrix data?
One of them requires the calculation of the reciprocal of the diagonal of the initial matrix product.
Say that the initial matrix is m, each row of which represents an 'object', whose 'coordinates' are in the columns of the matrix. So you want to calculate cosine similarities between rows.
Then, to use the matrix product method, you do something like mp = numpy.dot(m, m.T).
Now, if there are no rows with only 0's in m, the diagonal of mp can never have any zero values, as each of its elements is the sum of the squared elements of the corresponding row of m.
The m I am using in my calculations has indeed no rows with all 0's.
And indeed, when I do:
mp = np.dot(m, m.T)
mnorms2 = mp.diagonal()
I can easily test that:
mnorms2.min()
# 32
As I am using a sparse matrix (csr) for m, mp is also sparse, and I need only specific pairs of elements of mnorms2, which I obtain by:
mp_rows, mp_cols = mp.nonzero()
These are the indices of the elements of mnorms2 that I need to multiply together, take the square root of, and divide mp.data by.
I saw that the code in the method I was trying went through all the intermediate steps, but I thought it was only for illustration, so I tried to do it in one go instead, like:
mp.data = mp.data / numpy.sqrt(mnorms2[mp_rows] * mnorms2[mp_cols])
And this gave a division by zero error, although I know for sure that no element of mnorms2 is zero!
Worse, it did not do it systematically, but only for some m's, although in all cases these matrices had similar sparse structure and content.
In fact I even did:
denom = numpy.sqrt(mnorms2[mp_rows] * mnorms2[mp_cols])
and I found that:
denom.min()
# 0.0
How can the (element by element) product of two arrays that have no 0's have any 0's?
The only thing that worked in the end was:
inv = 1 / numpy.sqrt(mnorms2[mp_rows])
inv = inv / numpy.sqrt(mnorms2[mp_cols])
mp.data = mp.data * inv
I really don't understand why going step by step works, whereas the 'all in one go' method causes an error, as the operations should be the same in the end.
And there is clearly something strange going on, because when I try this:
mnorms2[0:5]
# array([71, 73, 77, 68, 72], dtype=uint8)
mnorms2[0:5] * mnorms2[0:5]
# array([177, 209, 41, 16, 64], dtype=uint8)
177 is not the square of 71... :/
What is going on here?
Any suggestions / ideas?
Thanks!
I think the problem is dtype
uint8 : Unsigned integer (0 to 255)
import numpy as np
mnorms2 = np.array([71, 73, 77, 68, 72], dtype='uint8')
mnorms2 * mnorms2
# array([177, 209, 41, 16, 64], dtype=uint8)
But if you change the dtype to np.float64:
mnorms2 = np.array([71, 73, 77, 68, 72], dtype=np.float64)
mnorms2 * mnorms2
# array([5041., 5329., 5929., 4624., 5184.])
To change dtype do:
mnorms2 = mnorms2.astype(np.float64)
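This also explains the zeros in denom. A small sketch with made-up values: a uint8 product is stored modulo 256, so any product that is an exact multiple of 256 wraps to 0 even though neither factor is 0.

```python
import numpy as np

norms = np.array([16, 71], dtype=np.uint8)   # illustrative values, none of them zero

wrapped = norms * norms                      # result stays uint8: 256 -> 0, 5041 -> 177
widened = norms.astype(np.float64) ** 2      # convert first, then square
```

The step-by-step version in the question avoids the problem because np.sqrt and the division produce floating-point values before anything is multiplied together.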
So I have a 3D data set (x,y,z), and I want to sum over one of the axes (x) with a set of weights, w = w(x). The start and end indices I am summing over are different for every (y,z); I have solved this by masking the 3D array. The weights are constant with regard to the two variables I am not summing over. Answers regarding both implementation and mathematics are appreciated (is there a smart linalg way of doing this?).
I have a 3D masked array (A) of shape (x,y,z) and a 1D array (t) of shape (x,). Is there a good way to multiply every (y,z) element in A with the corresponding number in t without expanding t to a 3D array? My current solution uses np.tensordot to make a 3D array of the same shape as A that holds all the t-values, but it feels very unsatisfactory to spend runtime building the "new_t" array, which is essentially just y*z copies of t.
Example of current solution:
a1 = np.array([[1,2,3,4],
[5,6,7,8],
[9,10,11,12]])
a2 = np.array([[0,1,2,3],
[4,5,6,7],
[8,9,10,11]])
#note: A is a masked array, mask is a 3D array of bools
A = np.ma.masked_array([a1,a2],mask)
t = np.array([10,11])
new_t = np.tensordot(t, np.ones(A[0].shape), axes = 0)
return np.sum(A*new_t, axis=0)
In essence I want to perform t*A[:,i,j] for all i,j with the shortest possible runtime, preferably without using too many libraries other than numpy and scipy.
Another way of producing desired output (again, with far too high run time):
B = [[t*A[:,i,j] for j in range(A.shape[2])] for i in range(A.shape[1])]
return np.sum(B,axis=2)
Inspired by @phipsgabler's comment:
arr1 = np.tensordot(A.T,t,axes=1).T
arr1
array([[ 10, 31, 52, 73],
[ 94, 115, 136, 157],
[178, 199, 220, 241]])
Thanks for the good answers! Using tensordot like @alyhosny proposed worked, but replacing masked values with zeros using
A = np.ma.MaskedArray.filled(A,0)
before summing with einsum (thanks @phipsgabler) gave half the run time. Final code:
A = np.ma.MaskedArray(A,mask)
A = np.ma.MaskedArray.filled(A,0)
return np.einsum('ijk,i->jk',A,t)
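For anyone who wants to try this, a self-contained sketch of the final approach. The mask below is an assumption made up for the example (the question does not show the real one). It also includes the plain broadcasting form t[:, None, None] for comparison, which multiplies without building a full-size copy of t by hand.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a2 = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])

# Hypothetical mask, not from the question: hide a single entry of the first slice
mask = np.zeros((2, 3, 4), dtype=bool)
mask[0, 0, 0] = True

A = np.ma.masked_array([a1, a2], mask=mask)
t = np.array([10, 11])

filled = A.filled(0)                                      # masked entries contribute 0
via_einsum = np.einsum('ijk,i->jk', filled, t)            # weighted sum over axis 0
via_broadcast = (filled * t[:, None, None]).sum(axis=0)   # same sum via broadcasting
```

Both give the same (3, 4) result; which one is faster will depend on the array sizes, so it is worth timing on the real data.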
I was trying to do matrix dot product and transpose with Numpy, and I found array can do many things matrix can do, such as dot product, point wise product, and transpose.
When I have to create a matrix, I have to create an array first.
example:
import numpy as np
array = np.ones([3,1])
matrix = np.matrix(array)
Since I can do matrix transpose and dot product in array type, I don't have to convert array into matrix to do matrix operations.
For example, the following line is valid, where A is an ndarray :
dot_product = np.dot(A.T, A )
The previous matrix operation can be expressed with matrix class variable A
dot_product = A.T * A
The operator * is exactly the same as the point-wise product for ndarray. Therefore, it makes ndarray and matrix almost indistinguishable and causes confusion.
The confusion is a serious problem, as said in PEP 465:
Writing code using numpy.matrix also works fine. But trouble begins as
soon as we try to integrate these two pieces of code together. Code
that expects an ndarray and gets a matrix, or vice-versa, may crash or
return incorrect results. Keeping track of which functions expect
which types as inputs, and return which types as outputs, and then
converting back and forth all the time, is incredibly cumbersome and
impossible to get right at any scale.
It would be very tempting to stick to ndarray, deprecate matrix, and in the future give ndarray matrix operation methods such as .inverse(), .hermitian(), outerproduct(), etc.
The major reason I still have to use the matrix class is that it treats a 1d array as a 2d array, so I can transpose it.
Transposing a 1d array is very inconvenient so far, since a 1d array of size n has shape (n,) instead of (1,n). For example, if I have to do the inner product of two arrays:
A = [[1,1,1],[2,2,2],[3,3,3]]
B = [[1,2,3],[1,2,3],[1,2,3]]
np.dot(A,B) works fine, but if
B = [1,1,1]
, its transpose is still a row vector.
I have to handle this exception when the dimensions of the input variable are unknown.
I hope this helps some people with the same trouble, and I hope to learn whether there is a better way to handle matrix operations like in Matlab, especially with 1d arrays. Thanks.
Your first example is a column vector:
In [258]: x = np.arange(3).reshape(3,1)
In [259]: x
Out[259]:
array([[0],
[1],
[2]])
In [260]: xm = np.matrix(x)
dot produces the inner product, and dimensions operate as: (1,3),(3,1)=>(1,1)
In [261]: np.dot(x.T, x)
Out[261]: array([[5]])
the matrix product does the same thing:
In [262]: xm.T * xm
Out[262]: matrix([[5]])
(The same thing with 1d arrays produces a scalar value, np.dot([0,1,2],[0,1,2]) # 5)
element multiplication of the arrays produces the outer product (so does np.outer(x, x) and np.dot(x,x.T))
In [263]: x.T * x
Out[263]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
For ndarray, * IS element wise multiplication (the .* of MATLAB, but with broadcasting added). For element multiplication of matrix use np.multiply(xm,xm). (scipy sparse matrices have a multiply method, X.multiply(other))
You quote from the PEP that added the @ operator (matmul). This, as well as np.tensordot and np.einsum, can handle higher-dimensional arrays and other mixes of products. Those don't make sense with np.matrix since that's restricted to 2d.
With your 3x3 A and B
In [273]: np.dot(A,B)
Out[273]:
array([[ 3, 6, 9],
[ 6, 12, 18],
[ 9, 18, 27]])
In [274]: C=np.array([1,1,1])
In [281]: np.dot(A,np.array([1,1,1]))
Out[281]: array([3, 6, 9])
Effectively this sums each row. np.dot(A,np.array([1,1,1])[:,None]) does the same thing, but returns a (3,1) array.
np.matrix was created years ago to make numpy (actually one of its predecessors) feel more like MATLAB. A key feature is that it is restricted to 2d. That's what MATLAB was like back in the 1990s. np.matrix and MATLAB don't have 1d arrays; instead they have single column or single row matrices.
If the fact that ndarrays can be 1d (or even 0d) is a problem there are many ways of adding that 2nd dimension. I prefer the [None,:] kind of syntax, but reshape is also useful. ndmin=2, np.atleast_2d, np.expand_dims also work.
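A quick sketch of those options side by side, using the [1,1,1] vector from the question. The variable names are only for illustration.

```python
import numpy as np

v = np.array([1, 1, 1])           # shape (3,); v.T is still shape (3,)

row = v[None, :]                  # shape (1, 3)
col = v[:, None]                  # shape (3, 1)
col_reshape = v.reshape(-1, 1)    # shape (3, 1)
row_ndmin = np.array(v, ndmin=2)  # shape (1, 3), new axis is prepended
row_atleast = np.atleast_2d(v)    # shape (1, 3)
col_expand = np.expand_dims(v, axis=1)  # shape (3, 1)

A = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
as_column = np.dot(A, col)        # (3, 3) with (3, 1) gives a (3, 1) column
```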
np.sum and other operations that reduce dimensions have a keepdims=True parameter to counter that. The new @ gives an operator syntax for matrix multiplication. As far as I know, the np.matrix class does not have any compiled code of its own.
============
The method that implements * for np.matrix uses np.dot:
def __mul__(self, other):
    if isinstance(other, (N.ndarray, list, tuple)):
        # This promotes 1-D vectors to row vectors
        return N.dot(self, asmatrix(other))
    if isscalar(other) or not hasattr(other, '__rmul__'):
        return N.dot(self, other)
    return NotImplemented
In numpy, I have two "arrays", X is (m,n) and y is a vector (n,1)
using
X*y
I am getting the error
ValueError: operands could not be broadcast together with shapes (97,2) (2,1)
When (97,2)x(2,1) is clearly a legal matrix operation and should give me a (97,1) vector
EDIT:
I have corrected this using X.dot(y) but the original question still remains.
dot is matrix multiplication, but * does something else.
We have two arrays:
X, shape (97,2)
y, shape (2,1)
With Numpy arrays, the operation
X * y
is done element-wise, but one or both of the values can be expanded in one or more dimensions to make them compatible. This operation is called broadcasting. Dimensions, where size is 1 or which are missing, can be used in broadcasting.
In the example above the dimensions are incompatible, because:
97 2
2 1
Here there are conflicting numbers in the first dimension (97 and 2). That is what the ValueError above is complaining about. The second dimension would be ok, as number 1 does not conflict with anything.
For more information on broadcasting rules: http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
(Please note that if X and y are of type numpy.matrix, then asterisk can be used as matrix multiplication. My recommendation is to keep away from numpy.matrix, it tends to complicate more than simplifying things.)
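The shape comparison described above can be checked with a small sketch. It uses arrays of ones with the shapes from the question; the contents are placeholders.

```python
import numpy as np

X = np.ones((97, 2))
y = np.ones((2, 1))

# Element-wise: 97 vs 2 conflict in the first dimension, so broadcasting fails
try:
    X * y
    elementwise_ok = True
except ValueError:
    elementwise_ok = False

product = X.dot(y)   # matrix multiplication: (97, 2) with (2, 1) gives (97, 1)
scaled = X * y.T     # broadcasting: (97, 2) with (1, 2) gives (97, 2)
```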
Your arrays should be fine with numpy.dot; if you get an error on numpy.dot, you must have some other bug. If the shapes are wrong for numpy.dot, you get a different exception:
ValueError: matrices are not aligned
If you still get this error, please post a minimal example of the problem. An example multiplication with arrays shaped like yours succeeds:
In [1]: import numpy
In [2]: numpy.dot(numpy.ones([97, 2]), numpy.ones([2, 1])).shape
Out[2]: (97, 1)
Per numpy docs:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when:
they are equal, or
one of them is 1
In other words, if you are trying to multiply two matrices (in the linear algebra sense) then you want X.dot(y) but if you are trying to broadcast scalars from matrix y onto X then you need to perform X * y.T.
Example:
>>> import numpy as np
>>>
>>> X = np.arange(8).reshape(4, 2)
>>> y = np.arange(2).reshape(1, 2) # create a 1x2 matrix
>>> X * y
array([[0,1],
[0,3],
[0,5],
[0,7]])
You are looking for np.matmul(X, y). In Python 3.5+ you can use X @ y.
It's possible that the error didn't occur in the dot product, but after.
For example try this
a = np.random.randn(12,1)
b = np.random.randn(1,5)
c = np.random.randn(5,12)
d = np.dot(a,b) * c
np.dot(a,b) will be fine; however np.dot(a, b) * c is clearly wrong (12x1 X 1x5 = 12x5 which cannot element-wise multiply 5x12) but numpy will give you
ValueError: operands could not be broadcast together with shapes (12,1) (1,5)
The error is misleading; however there is an issue on that line.
Use np.mat(x) * np.mat(y), that'll work.
We might confuse ourselves that a * b is a dot product.
But in fact, it is broadcast.
Dot Product :
a.dot(b)
Broadcast:
The term broadcasting refers to how numpy treats arrays with different
dimensions during arithmetic operations which lead to certain
constraints, the smaller array is broadcast across the larger array so
that they have compatible shapes.
(m,n) +-/* (1,n) → (m,n) : the operation will be applied to m rows
Convert the arrays to matrices, and then perform the multiplication.
X = np.matrix(X)
y = np.matrix(y)
X*y
We should consider two points about broadcasting.
First: what is possible.
Second: how much of what is possible NumPy actually does.
I know this might look a bit confusing, but I will make it clear with an example.
Let's start from the beginning.
Suppose we have two arrays. The first (named A) has three dimensions and the second (named B) has five.
NumPy tries to match the last (trailing) dimensions, so the first two dimensions of B have nothing in A to be compared with.
Then NumPy compares those trailing dimensions with each other. If and only if they are equal or one of them is 1, NumPy says "O.K., you two match". If these conditions are not satisfied, NumPy says "sorry... it's not my job!".
You may say the comparison would be better done in a way that also handles sizes where one is divisible by the other (4 and 2, or 9 and 3), so that the smaller array could be replicated a whole number of times (2 or 3 in our example). I agree with you, and this is the reason I started my discussion with a distinction between what is possible and what NumPy is capable of.
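A small sketch of the divisible case, with made-up arrays of sizes 4 and 2. NumPy refuses to guess, and one reason may be that there is more than one sensible way to replicate the smaller array, so you have to pick one explicitly.

```python
import numpy as np

a = np.arange(4)          # shape (4,)
b = np.array([10, 20])    # shape (2,); 4 is divisible by 2, but that is not enough

try:
    a + b
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

tiled = a + np.tile(b, 2)       # b repeated as a whole: [10, 20, 10, 20]
repeated = a + np.repeat(b, 2)  # each element repeated: [10, 10, 20, 20]
```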
This is because X and y are not the same types. for example X is a numpy matrix and y is a numpy array!
Error: operands could not be broadcast together with shapes (2,3) (2,3,3)
This kind of error occurs when the two arrays do not have compatible shapes.
To correct this you need to reshape one array to match the other.
See the example below:
a2 = array([[1, 2, 3], ...]), with shape = (2,3)
a3 =array([[[1., 2., 3.],
[2., 3., 2.],
[2., 4., 5.]],
[[1., 0., 3.],
[2., 3., 7.],
[2., 4., 6.]]])
with shape = (2,3,3)
If I try to run np.multiply(a2,a3) it will return the error below:
Error: operands could not be broadcast together with shapes (2,3) (2,3,3)
To solve this, check out the broadcasting rules,
which state that two dimensions are compatible when:
#1. they are equal, or
#2. one of them is 1
Therefore let's reshape a2.
reshaped = a2.reshape(2,3,1)
Now try to run np.multiply(reshaped,a3);
the multiplication will run successfully!
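Here is a runnable sketch of the same steps. The values in a2 and a3 are placeholders I made up; only the shapes (2,3) and (2,3,3) come from the example above.

```python
import numpy as np

a2 = np.arange(6).reshape(2, 3)   # hypothetical values, shape (2, 3)
a3 = np.ones((2, 3, 3))           # hypothetical values, shape (2, 3, 3)

# Trailing dimensions are compared first: 3 vs 3 is fine, then 2 vs 3 conflicts
try:
    np.multiply(a2, a3)
    direct_ok = True
except ValueError:
    direct_ok = False

reshaped = a2.reshape(2, 3, 1)        # (2, 3, 1) is compatible with (2, 3, 3)
result = np.multiply(reshaped, a3)    # a2[i, j] scales every a3[i, j, :]
```

Note that where the new axis goes changes the meaning: a2[:, None, :] has shape (2, 1, 3) and would also broadcast, but it pairs a2[i, k] with a3[i, :, k] instead. Which one is right depends on what the data represents.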
ValueError: operands could not be broadcast together with shapes (x ,y) (a ,b)
where x, y, a and b are variables.
Basically this error occurs when y (the number of columns) does not match the size of the last dimension of the other array (and neither of the two is 1).
Now let's go through an example.
Code:
import numpy as np
arr1 = np.arange(12).reshape(3, 4)
Output of arr1:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
arr2= np.arange(4).reshape(1,4)
or (a 1-D array with the same 4 elements also works):
arr2 = np.arange(4)
Output of arr2:
array([0, 1, 2, 3])
The number of elements in arr2 equals the number of columns in arr1, so the following will execute:
for x, y in np.nditer([arr1, arr2]):
    print(x, y)
output =>
0 0
1 1
2 2
3 3
4 0
5 1
6 2
7 3
8 0
9 1
10 2
11 3
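The pairing printed above can also be inspected without a loop. A minimal sketch, assuming arr1 has shape (3, 4) as in the output shown earlier: np.broadcast_to gives the stretched view of arr2 that each row of arr1 is matched against.

```python
import numpy as np

arr1 = np.arange(12).reshape(3, 4)
arr2 = np.arange(4)

# Read-only view of arr2 repeated along the rows; no data is copied
stretched = np.broadcast_to(arr2, arr1.shape)

# Same pairs that the nditer loop prints, collected into a list
pairs = [(int(x), int(y)) for x, y in np.nditer([arr1, arr2])]
```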