One can use numpy.where for selecting values from two arrays depending on a condition:
import numpy
a = numpy.random.rand(5)
b = numpy.random.rand(5)
c = numpy.where(a > 0.5, a, b) # okay
If the array has more dimensions, however, this does not work anymore:
import numpy
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where(a[:, 0] > 0.5, a, b) # !
Traceback (most recent call last):
File "p.py", line 10, in <module>
c = numpy.where(a[:, 0] > 0.5, a, b) # !
File "<__array_function__ internals>", line 6, in where
ValueError: operands could not be broadcast together with shapes (5,) (5,2) (5,2)
I would have expected a numpy array of shape (5,2).
What's the issue here? How to work around it?
Remember that NumPy broadcasting aligns shapes from the right, starting at the trailing dimension. A (5,)-shaped array can therefore broadcast with a (2,5)-shaped array, but not with a (5,2)-shaped one. To broadcast against a (5,2)-shaped array, the condition needs shape (5,1), because a dimension of length 1 broadcasts with any length.
Thus, you need to keep the second dimension when indexing (indexing with a plain integer drops that axis). You can do this by putting the index in a one-element list:
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where(a[:, [0]] > 0.5, a, b) # works
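To make the shape difference concrete, here is a minimal sketch comparing a few indexing forms. The seeded generator and the variable names c_list, c_slice and c_newaxis are only assumptions for illustration; they are not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
a = rng.random((5, 2))
b = rng.random((5, 2))

# A plain integer index drops the axis; the other three forms keep it with length 1.
print(a[:, 0].shape)        # (5,)
print(a[:, [0]].shape)      # (5, 1)
print(a[:, :1].shape)       # (5, 1)
print(a[:, 0, None].shape)  # (5, 1)

# Any (5, 1) condition broadcasts against the (5, 2) operands.
c_list = np.where(a[:, [0]] > 0.5, a, b)
c_slice = np.where(a[:, :1] > 0.5, a, b)
c_newaxis = np.where(a[:, 0, None] > 0.5, a, b)
print(c_list.shape)         # (5, 2)
```

Each row of the result is taken whole from a or from b, depending on the first column of a.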
You can use c = numpy.where(a > 0.5, a, b) to select element by element.
However, if you want to use only the first column of a as the condition, you need to consider the shape of the mask.
Let's first look at the shape of the comparison:
(a[:, 0] > 0.5).shape # outputs (5,)
It is one-dimensional, while a and b have shape (5, 2), which is two-dimensional, so the mask cannot be broadcast against them.
The solution is to reshape the mask to shape (5, 1).
Your code should look like this:
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where((a[:, 0] > 0.5).reshape(-1, 1), a, b) # works
You can try:
import numpy
a = numpy.random.rand(5, 2)
b = numpy.random.rand(5, 2)
c = numpy.where(a > 0.5, a, b)
Instead of: c = np.where(a > 0.5, a, b)
you can use: c = np.choose((a > 0.5).astype(int), [b, a])
which works for multidimensional arrays if a and b have the same shape.
Related
I have 4 arrays: A, B, C, D. A and B have shape (n,n), and C and D have shape (n,n,m). I want to set it up so that wherever an element of A is greater than the corresponding element of B, the length-m array at that position is taken from C. In essence:
C_new = np.where(A > B, C, D), D_new = np.where(A < B, D, C). However, this gives me a ValueError (operands could not be broadcast together with shapes).
Can I use where here instead of looping over each element?
Edit: example:
A = np.ones((2,2))
B = 2*np.eye(2)
C = np.ones((2,2,3))
D = np.zeros((2,2,3))
# Cnew = np.where(A > B, C,D)-> ValueError: operands could not be broadcast together with shapes (2,2) (2,2,3) (2,2,3)
The Cnew would be zeros in the (0,0) and (1,1) index.
You need to add a new axis at the end of the condition in order for it to broadcast correctly:
C_new = np.where((A > B)[..., np.newaxis], C, D)
D_new = np.where((A < B)[..., np.newaxis], D, C)
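Applied to the example arrays from the question, this can be checked with a short sketch. The intermediate name cond is introduced here only for illustration.

```python
import numpy as np

A = np.ones((2, 2))
B = 2 * np.eye(2)
C = np.ones((2, 2, 3))
D = np.zeros((2, 2, 3))

# The trailing length-1 axis lets the (2, 2) condition line up with (2, 2, 3).
cond = (A > B)[..., np.newaxis]
C_new = np.where(cond, C, D)

print(cond.shape, C_new.shape)   # (2, 2, 1) (2, 2, 3)
print(C_new[0, 0], C_new[0, 1])  # [0. 0. 0.] [1. 1. 1.]
```

The diagonal positions, where A is not greater than B, come from D (zeros), which matches the expected result described in the question.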
Suppose I have a 5x10x3 array, which I interpret as 5 'sub-arrays', each consisting of 10 rows and 3 columns. I also have a separate 1D array of length 5, which I call b.
I am trying to insert a new column into each sub-array, where the column inserted into the ith (i=0,1,2,3,4) sub-array is a 10x1 vector where each element is equal to b[i].
For example:
import numpy as np
np.random.seed(777)
A = np.random.rand(5,10,3)
b = np.array([2,4,6,8,10])
A[0] should look like:
A[1] should look like:
And similarly for the other 'sub-arrays'.
(Notice b[0]=2 and b[1]=4)
What about this?
# Make an array B with the same number of dimensions as A
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0) # shape: (5, 10, 1)
# Concatenate both
np.concatenate([A, B], axis=-1) # shape: (5, 10, 4)
One method would be np.pad:
np.pad(A, ((0,0),(0,0),(0,1)), 'constant', constant_values=[[[],[]],[[],[]],[[],b[:, None,None]]])
# array([[[9.36513084e-01, 5.33199169e-01, 1.66763960e-02, 2.00000000e+00],
# [9.79060284e-02, 2.17614285e-02, 4.72452812e-01, 2.00000000e+00],
# etc.
Or (more typing but probably faster):
i,j,k = A.shape
res = np.empty((i,j,k+1), np.result_type(A, b))
res[...,:-1] = A
res[...,-1] = b[:, None]
Or dstack after broadcast_to:
np.dstack([A,np.broadcast_to(b[:,None],A.shape[:2])])
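As a rough cross-check, the following sketch runs three of the approaches from these answers side by side and compares the results. The names res_tile, res_fill and res_dstack are made up for this comparison, and the random data comes from a seeded generator rather than the np.random.seed call in the question.

```python
import numpy as np

rng = np.random.default_rng(777)
A = rng.random((5, 10, 3))
b = np.array([2, 4, 6, 8, 10])

# 1) tile b into a (5, 10, 1) block, then concatenate on the last axis
B = np.tile(b, (1, 10, 1)).transpose(2, 1, 0)
res_tile = np.concatenate([A, B], axis=-1)

# 2) preallocate the result and fill both parts
res_fill = np.empty(A.shape[:2] + (A.shape[2] + 1,), np.result_type(A, b))
res_fill[..., :-1] = A
res_fill[..., -1] = b[:, None]  # (5, 1) broadcasts to (5, 10)

# 3) broadcast b to (5, 10) and let dstack add the third axis
res_dstack = np.dstack([A, np.broadcast_to(b[:, None], A.shape[:2])])

print(res_tile.shape)  # (5, 10, 4)
print(np.array_equal(res_tile, res_fill), np.array_equal(res_tile, res_dstack))
```

All three should give a (5, 10, 4) array whose last column in sub-array i is filled with b[i].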
We all know that dot product between vectors must return a scalar:
import numpy as np
a = np.array([1,2,3])
b = np.array([3,4,5])
print(a.shape) # (3,)
print(b.shape) # (3,)
a.dot(b) # 26
b.dot(a) # 26
Perfect. But why does the numpy dot product raise a dimension error if we use a "real" row vector or column vector (see "Difference between numpy.array shape (R, 1) and (R,)")?
arow = np.array([[1,2,3]])
brow = np.array([[3,4,5]])
print(arow.shape) # (1,3)
print(brow.shape) # (1,3)
arow.dot(brow) # ERROR
brow.dot(arow) # ERROR
acol = np.array([[1,2,3]]).reshape(3,1)
bcol = np.array([[3,4,5]]).reshape(3,1)
print(acol.shape) # (3,1)
print(bcol.shape) # (3,1)
acol.dot(bcol) # ERROR
bcol.dot(acol) # ERROR
Because by explicitly adding a second dimension, you are no longer working with vectors but with two dimensional matrices. When taking the dot product of matrices, the inner dimensions of the product must match.
You therefore need to transpose one of your matrices. Which one you transpose will determine the meaning and shape of the result.
A 1x3 times a 3x1 matrix will result in a 1x1 matrix (i.e., a scalar). This is the inner product. A 3x1 times a 1x3 matrix will result in a 3x3 outer product.
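A small sketch of both products, reusing the values from the question. Note that the inner product of two 2-D arrays comes back as a 1x1 array, so .item() is used here to pull out the plain number.

```python
import numpy as np

arow = np.array([[1, 2, 3]])    # shape (1, 3)
bcol = np.array([[3, 4, 5]]).T  # shape (3, 1)

inner = arow.dot(bcol)  # (1, 3) times (3, 1) gives (1, 1)
outer = bcol.dot(arow)  # (3, 1) times (1, 3) gives (3, 3)

print(inner)            # [[26]]
print(inner.item())     # 26
print(outer.shape)      # (3, 3)
```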
You can also use the @ operator, which performs matrix multiplication.
In this case, as with the dot product, you need to be aware of the array shapes (the dimensions must always be compatible), but it's more readable:
>>> a = np.array([1,2,3])
>>> a.shape
(3,)
>>> b= np.array([[1,2,3]])
>>> b.shape
(1, 3)
>>> a@b
Traceback (most recent call last):
File "<input>", line 1, in <module>
ValueError: shapes (3,) and (1,3) not aligned: 3 (dim 0) != 1 (dim 0)
>>> a@b.T
array([14])
You can also do it like this:
import numpy as npy
Vector1 = npy.array([0,2,3])
Vector2 = npy.array([3,5,1])
print("Dot Product of", Vector1, "and", Vector2)
def DotProduct(a,b):
    # Sum the element-wise products of two equal-length vectors
    NetValue = 0
    for i in range(len(a)):
        NetValue += a[i]*b[i]
    return NetValue
ans = DotProduct(Vector1,Vector2)
print("The answer is =",ans)
I have a three dimensional array A, with shape (5774,15,100) and another 1 D array B with shape (5774,). I want to add these in order to get the another matrix C with shape (5774,15,101).
I am using hstack as
C = hstack((A ,np.array(B)[:,None]))
I am getting the error below. Any suggestions?
ValueError: could not broadcast input array from shape (5774,15,100) into shape (5774)
You'd need to use np.concatenate, which can join arrays whose sizes differ along the concatenation axis. Then, you need to use np.broadcast_to to expand that (5774,)-shaped array to (5774, 15, 1), because concatenate still needs all the arrays to have the same number of dimensions and matching sizes on the other axes.
C = np.concatenate((A,
np.broadcast_to(np.array(B)[:, None, None], A.shape[:-1] + (1,))),
axis = -1)
Checking:
A = np.random.rand(5774, 15, 100)
B = np.random.rand(5774)
C = np.concatenate((A,
np.broadcast_to(np.array(B)[:, None, None], A.shape[:-1] + (1,))),
axis = -1)
C.shape
Out: (5774, 15, 101)
I am using Python 3.5 and I have a question: why does np.dot() behave like this?
>>> a = np.array([[1,2,3,4]])
>>> b = np.array([123])
>>> np.dot(a,b)
Traceback (most recent call last):
File "<input>", line 1, in <module>
ValueError: shapes (1,4) and (1,) not aligned: 4 (dim 1) != 1 (dim 0)
>>> np.dot(b,a)
array([123, 246, 369, 492])
From help(np.dot), we learn that np.dot(x,y) is a sum product over the last axis of x and the second-to-last axis of y (or the only axis of y, if y is 1-D).
In the case of np.dot(a, b), the last axis of a is 4 and the length of the only axis of b is 1. They don't match: fail.
In the case of np.dot(b, a), the last axis of b is 1 and the 2nd to last of a is 1. They match: success.
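The axis rule can be written down as a small check. The function dot_axes_match below is a hypothetical helper made up for this sketch (it is not part of NumPy) and only covers arrays with at least one dimension.

```python
import numpy as np

def dot_axes_match(x, y):
    """Hypothetical helper: do the axes that np.dot(x, y) sums over have equal length?

    Compares the last axis of x with the second-to-last axis of y,
    or with the only axis of y when y is 1-D.
    """
    x, y = np.asarray(x), np.asarray(y)
    y_axis = -2 if y.ndim >= 2 else -1
    return x.shape[-1] == y.shape[y_axis]

a = np.array([[1, 2, 3, 4]])  # shape (1, 4)
b = np.array([123])           # shape (1,)

print(dot_axes_match(a, b))   # False: 4 != 1
print(dot_axes_match(b, a))   # True: 1 == 1
print(np.dot(b, a))           # [123 246 369 492]
```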
Workarounds
Depending on what your intention is for np.dot(a,b), you may want:
>>> np.dot(a, np.resize(b,a.shape[-1]))
array([1230])
From the documentation for numpy.dot(x, y):
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors... For N dimensions it is a sum product over the last axis of x and the second-to-last of y:
So, where you have:
a = np.array([[1,2,3,4]]) # shape is (1, 4), 2-D array (matrix)
b = np.array([123]) # shape is (1,), 1-D array (vector)
np.dot(b, a) works ((1,) * (1, 4), the relevant dimensions agree)
np.dot(a, b) doesn't ((1, 4) * (1,), the relevant dimensions don't agree, the operation is undefined. Note that the 'second-to-last' axis of (1,) corresponds to its one and only axis)
This is the same behaviour as if you have two 2-D arrays, i.e. matrices:
a = np.array([[1,2,3,4]]) # shape is (1, 4)
b = np.array([[123]]) # shape is (1, 1)
np.dot(b, a) works ((1, 1) * (1, 4), inner matrix dimensions agree)
np.dot(a, b) doesn't ((1, 4) * (1, 1), inner matrix dimensions don't agree)
If however you have two 1-D arrays, i.e. vectors, neither operation works:
a = np.array([1,2,3,4]) # shape is (4,)
b = np.array([123]) # shape is (1,)
np.dot(b, a) doesn't work ((1,) * (4,), but can only define the inner product for vectors of the same length)
np.dot(a, b) doesn't work ((4,) * (1,), same)