numpy broadcasting to each column of the matrix separately - python

I have two matrices:
a = np.array([[6],[3],[4]])
b = np.array([1,10])
when I do:
c = a * b
c looks like this:
[ 6, 60]
[ 3, 30]
[ 4, 40]
which is good.
Now, let's say I add a column to a (for the sake of the example it's an identical column, but it doesn't have to be):
a = np.array([[6,6],[3,3],[4,4]])
b stays the same.
The result I want is two identical copies of c (since the columns are identical), stacked along a new axis:
new_c.shape == (3, 2, 2)
so that new_c[:,:,0] or new_c[:,:,1] gives the original c.
I tried adding new axes to both a and b using np.expand_dims but it did not help.

One way is using numpy.einsum:
>>> import numpy as np
>>> a = np.array([[6],[3],[4]])
>>> b = np.array([1,10])
>>> print(a * b)
[[ 6 60]
[ 3 30]
[ 4 40]]
>>> print(np.einsum('ij, j -> ij', a, b))
[[ 6 60]
[ 3 30]
[ 4 40]]
>>> a = np.array([[6,6],[3,3],[4,4]])
>>> print(np.einsum('ij, k -> ikj', a, b)[:, :, 0])
[[ 6 60]
 [ 3 30]
 [ 4 40]]
>>> print(np.einsum('ij, k -> ikj', a, b)[:, :, 1])
[[ 6 60]
 [ 3 30]
 [ 4 40]]
For more on how to use numpy.einsum, I recommend:
Understanding NumPy's einsum

You have multiple options here, one of which is numpy.einsum, as explained in the other answer. Another possibility is the array reshape method:
result = a.T.reshape((a.shape[1], a.shape[0], 1)) * b
result = result.reshape((-1, 2))
result
array([[ 6, 60],
[ 3, 30],
[ 4, 40],
[ 6, 60],
[ 3, 30],
[ 4, 40]])
Yet what is more intuitive to me is to stack arrays by means of np.vstack, with each column of a multiplied by b, as follows:
result = np.vstack([c[:, None] * b for c in a.T])
result
array([[ 6, 60],
[ 3, 30],
[ 4, 40],
[ 6, 60],
[ 3, 30],
[ 4, 40]])
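Both results above are stacked into a (6, 2) array rather than the (3, 2, 2) shape asked for in the question. As a minimal sketch, plain broadcasting can produce that shape directly by giving a a new middle axis and b a new trailing axis. The name new_c follows the question; the axis placement is an assumption based on the requirement that new_c[:, :, j] reproduce c for column j.

```python
import numpy as np

a = np.array([[6, 6], [3, 3], [4, 4]])   # shape (3, 2)
b = np.array([1, 10])                    # shape (2,)

# a[:, None, :] has shape (3, 1, 2); b[None, :, None] has shape (1, 2, 1).
# Broadcasting gives new_c[i, k, j] = a[i, j] * b[k], with shape (3, 2, 2).
new_c = a[:, None, :] * b[None, :, None]

print(new_c.shape)      # (3, 2, 2)
print(new_c[:, :, 0])   # same values as a[:, [0]] * b
```

This computes the same array as np.einsum('ij, k -> ikj', a, b), just spelled with explicit axes.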

Related

Element-wise numpy matrix multiplication

I have two numpy arrays A and B, both with the dimension [2,2,n], where n is a very large number. I want to matrix multiply A and B in the first two dimensions to get C, i.e. C=AB, where C has the dimension [2,2,n].
The simplest way to accomplish this is by using for loop, i.e.
for i in range(n):
    C[:,:,i] = np.matmul(A[:,:,i],B[:,:,i])
However, this is inefficient since n is very large. What's the most efficient way to do this with numpy?
You can do the following:
new_array = np.einsum('ijk,jlk->ilk', A, B)
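As a quick sanity check, the sketch below compares the einsum call with the explicit loop from the question, and also with np.matmul applied after moving the stack axis to the front. The small random inputs and the names C_loop, C_einsum and C_matmul are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.random((2, 2, n))
B = rng.random((2, 2, n))

# Reference: the explicit loop from the question.
C_loop = np.empty((2, 2, n))
for i in range(n):
    C_loop[:, :, i] = A[:, :, i] @ B[:, :, i]

# einsum: sum over the shared index j, keep the stack index k.
C_einsum = np.einsum('ijk,jlk->ilk', A, B)

# matmul broadcasts over leading axes, so move the stack axis to the front,
# multiply, and move it back to the end.
C_matmul = np.moveaxis(np.moveaxis(A, -1, 0) @ np.moveaxis(B, -1, 0), 0, -1)

print(np.allclose(C_loop, C_einsum), np.allclose(C_loop, C_matmul))
```

Which of the two vectorized forms is faster for a given n is worth timing rather than assuming.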
What you want is the default array multiplication in NumPy
In [22]: a = np.arange(8).reshape((2,2,2))+1 ; a[:,:,0], a[:,:,1]
Out[22]:
(array([[1, 3],
[5, 7]]),
array([[2, 4],
[6, 8]]))
In [23]: aa = a*a ; aa[:,:,0], aa[:,:,1]
Out[23]:
(array([[ 1, 9],
[25, 49]]),
array([[ 4, 16],
[36, 64]]))
Notice that I emphasized array because Numpy's arrays look like matrices but are indeed Numpy's ndarrays.
Post Scriptum
I guess that what you really want are arrays with shape (n,2,2), so that you can address individual 2×2 matrices using a single index, e.g.,
In [27]: n = 3
...: a = np.arange(n*2*2)+1 ; a_22n, a_n22 = a.reshape((2,2,n)), a.reshape((n,2,2))
...: print(a_22n[0])
...: print(a_n22[0])
[[1 2 3]
[4 5 6]]
[[1 2]
[3 4]]
Post Post Scriptum
Re semantically correct:
In [13]: import numpy as np
...: n = 3
...: a = np.arange(2*2*n).reshape((2,2,n))+1
...: p = lambda t,a,n:print(t,*(a[:,:,i]for i in range(n)),sep=',\n')
...: p('Original array', a, n)
...: p('Using `einsum("ijk,jlk->ilk", ...)`', np.einsum('ijk,jlk->ilk', a, a), n)
...: p('Using standard multiplication', a*a, n)
Original array,
[[ 1 4]
[ 7 10]],
[[ 2 5]
[ 8 11]],
[[ 3 6]
[ 9 12]]
Using `einsum("ijk,jlk->ilk", ...)`,
[[ 29 44]
[ 77 128]],
[[ 44 65]
[104 161]],
[[ 63 90]
[135 198]]
Using standard multiplication,
[[ 1 16]
[ 49 100]],
[[ 4 25]
[ 64 121]],
[[ 9 36]
[ 81 144]]

Merging rows in numpy to form new array

This is a sample of what I am trying to accomplish. I am very new to python and have searched for hours to find out what I am doing wrong. I haven't been able to find what my issue is. I am still new enough that I may be searching for the wrong phrases. If so, could you please point me in the right direction?
I want to combine n arrays to make one array. I want the first row from x as the first row in combined, the first row from y as the second row in combined, the first row from z as the third row in combined, then the second row in x as the fourth row in combined, etc.
So it would look something like this:
x = [x1 x2 x3]
[x4 x5 x6]
[x7 x8 x9]
y = [y1 y2 y3]
[y4 y5 y6]
[y7 y8 y9]
z = [z1 z2 z3]
[z4 z5 z6]
[z7 z8 z9]
combined = [x1 x2 x3]
[y1 y2 y3]
[z1 z2 z3]
[x4 x5 x6]
[...]
[z7 z8 z9]
The best I can come up with is the following:
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((9,3))
for rows in range(len(x)):
    combined[0::3] = x[rows,:]
    combined[1::3] = y[rows,:]
    combined[2::3] = z[rows,:]
print(combined)
All this does is write the last value of the input array to every third row in the output array instead of what I wanted. I am not sure if this is even the best way to do this. Any advice would help out.
I just figured out that this works, but if someone knows a higher-performance method, please let me know.
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
for rows in range(6):
    combined[rows*3,:] = x[rows,:]
    combined[rows*3+1,:] = y[rows,:]
    combined[rows*3+2,:] = z[rows,:]
print(combined)
You can do this using a list comprehension and zip:
combined = np.array([row for row_group in zip(x, y, z) for row in row_group])
Using vectorised operations only:
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
A = A[idx]
Here's a demo:
import numpy as np
x, y, z = np.random.rand(3,3), np.random.rand(3,3), np.random.rand(3,3)
print(x, y, z)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.50299357 0.35075811 0.47230915]
[ 0.751129 0.81839586 0.80554345]]
[[ 0.09469396 0.33848691 0.51550685]
[ 0.38233976 0.05280427 0.37778962]
[ 0.7169351 0.17752571 0.49581777]]
[[ 0.06056544 0.70273453 0.60681583]
[ 0.57830566 0.71375038 0.14446909]
[ 0.23799775 0.03571076 0.26917939]]
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
print(idx) # [0 3 6 1 4 7 2 5 8]
A = A[idx]
print(A)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.09469396 0.33848691 0.51550685]
[ 0.06056544 0.70273453 0.60681583]
[ 0.50299357 0.35075811 0.47230915]
[ 0.38233976 0.05280427 0.37778962]
[ 0.57830566 0.71375038 0.14446909]
[ 0.751129 0.81839586 0.80554345]
[ 0.7169351 0.17752571 0.49581777]
[ 0.23799775 0.03571076 0.26917939]]
I have changed your code a little bit to get the desired output
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
combined[0::3] = x
combined[1::3] = y
combined[2::3] = z
print(combined)
You had the shape of the combined matrix wrong and there is no real need for the for loop.
This might not be the most pythonic way to do it but you could
# Integer division: range() needs an int on Python 3.
for block in range(len(combined) // 3):
    # Each block of three output rows takes one row from each input.
    combined[block*3+0] = x[block,:]
    combined[block*3+1] = y[block,:]
    combined[block*3+2] = z[block,:]
A simple numpy solution is to stack the arrays on a new middle axis, and reshape the result to 2d:
In [5]: x = np.arange(9).reshape(3,3)
In [6]: y = np.arange(9).reshape(3,3)+10
In [7]: z = np.arange(9).reshape(3,3)+100
In [8]: np.stack((x,y,z),axis=1).reshape(-1,3)
Out[8]:
array([[ 0, 1, 2],
[ 10, 11, 12],
[100, 101, 102],
[ 3, 4, 5],
[ 13, 14, 15],
[103, 104, 105],
[ 6, 7, 8],
[ 16, 17, 18],
[106, 107, 108]])
It may be easier to see what's happening if we give each dimension a different value; e.g. two 3x4 arrays:
In [9]: x = np.arange(12).reshape(3,4)
In [10]: y = np.arange(12).reshape(3,4)+10
np.array combines them on a new 1st axis, making a 2x3x4 array. To get the interleaving you want, we can transpose the first 2 dimensions, producing a 3x2x4. Then reshape to a 6x4.
In [13]: np.array((x,y))
Out[13]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]]])
In [14]: np.array((x,y)).transpose(1,0,2)
Out[14]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
In [15]: np.array((x,y)).transpose(1,0,2).reshape(-1,4)
Out[15]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
np.vstack produces a 6x4, but with the wrong order. We can't transpose that directly.
np.stack with default axis behaves just like np.array. But with axis=1, it creates a 3x2x4, which we can reshape:
In [16]: np.stack((x,y), 1)
Out[16]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
The list zip in the accepted answer is a list version of transpose, creating a list of 3 2-element tuples.
In [17]: list(zip(x,y))
Out[17]:
[(array([0, 1, 2, 3]), array([10, 11, 12, 13])),
(array([4, 5, 6, 7]), array([14, 15, 16, 17])),
(array([ 8, 9, 10, 11]), array([18, 19, 20, 21]))]
np.array(list(zip(x,y))) produces the same thing as the stack, a 3x2x4 array.
As for speed, I suspect the allocate and assign (as in Ash's answer) is fastest:
In [27]: z = np.zeros((6,4),int)
...: for i, arr in enumerate((x,y)):
...:     z[i::2,:] = arr
...:
In [28]: z
Out[28]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
For serious timings, use much larger examples than this.
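The stack-and-reshape idea above extends to any number of equally shaped 2-D arrays. The sketch below wraps it in a helper; the name interleave_rows is hypothetical and not part of NumPy. It then cross-checks the result against the allocate-and-assign approach.

```python
import numpy as np

def interleave_rows(*arrays):
    """Interleave the rows of equally shaped 2-D arrays (hypothetical helper)."""
    stacked = np.stack(arrays, axis=1)             # shape (rows, n_arrays, cols)
    return stacked.reshape(-1, stacked.shape[-1])  # shape (rows * n_arrays, cols)

x = np.arange(9).reshape(3, 3)
y = x + 10
z = x + 100
combined = interleave_rows(x, y, z)

# Cross-check against the allocate-and-assign approach.
expected = np.zeros((9, 3), dtype=x.dtype)
expected[0::3], expected[1::3], expected[2::3] = x, y, z
print(np.array_equal(combined, expected))
```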

How can I create an numpy array from two different numpy arrays?

I want to create a numpy array from two different numpy arrays. For example:
Say I have 2 arrays a and b.
a = np.array([1,3,4])
b = np.array([[1,5,51,52],[2,6,61,62],[3,7,71,72],[4,8,81,82],[5,9,91,92]])
I want to loop through each value in array a, find it in the first column of array b, and save the matching row of b into c, like below:
c = np.array([[1,5,51,52],
[3,7,71,72],
[4,8,81,82]])
I have tried doing:
c=np.zeros(shape=(len(b),4))
for i in b:
    c[i]=a[b[i][:]]
but get this error "arrays used as indices must be of integer (or boolean) type"
Approach #1
If a is sorted, we can use np.searchsorted, like so -
idx = np.searchsorted(a,b[:,0])
idx[idx==a.size] = 0
out = b[a[idx] == b[:,0]]
Sample run -
In [160]: a
Out[160]: array([1, 3, 4])
In [161]: b
Out[161]:
array([[ 1, 5, 51, 52],
[ 2, 6, 61, 62],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82],
[ 5, 9, 91, 92]])
In [162]: out
Out[162]:
array([[ 1, 5, 51, 52],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82]])
If a is not sorted, we need to use the sorter argument with searchsorted.
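A minimal sketch of the unsorted case, assuming the same b as in the question and a deliberately shuffled a. The name sidx is only for illustration. Note that the selected rows come back in the order they appear in b, not in the order of a.

```python
import numpy as np

a = np.array([4, 1, 3])   # unsorted on purpose
b = np.array([[1, 5, 51, 52],
              [2, 6, 61, 62],
              [3, 7, 71, 72],
              [4, 8, 81, 82],
              [5, 9, 91, 92]])

sidx = np.argsort(a)                             # order that would sort a
idx = np.searchsorted(a, b[:, 0], sorter=sidx)   # positions within sorted a
idx[idx == a.size] = 0                           # clamp out-of-range positions
out = b[a[sidx[idx]] == b[:, 0]]                 # keep rows whose key is in a
print(out)
```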
Approach #2
We can also use np.in1d -
b[np.in1d(b[:,0],a)]
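The same membership test can be written with np.isin, which, as far as I know, is the spelling recommended over np.in1d in recent NumPy releases. A short sketch with the arrays from the question:

```python
import numpy as np

a = np.array([1, 3, 4])
b = np.array([[1, 5, 51, 52],
              [2, 6, 61, 62],
              [3, 7, 71, 72],
              [4, 8, 81, 82],
              [5, 9, 91, 92]])

mask = np.isin(b[:, 0], a)   # True where the first column of b occurs in a
c = b[mask]
print(c)
```

Unlike the searchsorted approach, this does not require a to be sorted.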

Remove rows from numpy array based on presence/absence in other arrays

I have 3 different numpy arrays, but they all start with two columns which contain the day of year and the time. For example:
dyn = [[ 83 12 7.10555687e-01 ..., 6.99242766e-01 6.868761e-01]
[ 83 13 8.28091972e-01 ..., 8.33734118e-01 8.47266838e-01]
[ 83 14 8.79437354e-01 ..., 8.73598144e-01 8.57156213e-01]
[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]
[ 162 1 2.51653976e-01 ..., 2.17209188e-01 1.42133495e-1]]
us = [[ 133 18 3.00483815e+02 ..., 1.94277561e+00 2.8168959e+00]
[ 133 19 2.98832620e+02 ..., 2.42506475e+00 2.99730800e+00]
[ 133 20 2.96706105e+02 ..., 3.16851622e+00 4.41187088e+00]
[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]
[ 162 2 2.86992180e+02 ..., 7.08996730e-02 2.6403210e-01]]
I need to remove any rows whose date and time aren't present in all 3 arrays, so that I'm left with 3 arrays where the first 2 columns are identical in each of them.
So the resulting smaller arrays would be:
dyn= [[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]]
us= [[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]]
(But then also limited by what's in the third array)
I've tried using sort/zip but not sure that it should be applied to 2D array like that:
X= dyn
Y = us
xsorted=[x for (y,x) in sorted(zip(Y[:,1],X[:,1]), key=lambda pair: pair[0])]
And also a loop but that only works when the same times/days are in the same position within the array, which isn't helpful
for i in range(100):
    dyn_small=dyn[dyn[:,0]==us[i,0]]
Assuming A, B and C as the input arrays, here's a vectorized approach making heavy usage of broadcasting -
# Get masks comparing all rows of A with B and then B with C
M1 = (A[:,None,:2] == B[:,:2])
M2 = (B[:,None,:2] == C[:,:2])
# Get a joint 3D mask of those two masks and get the indices of matches.
# These indices (I,J,K) of the 3D mask basically tells us the row numbers
# corresponding to each of the input arrays that are present in all of them.
# Thus, in (I,J,K), I would be the matching row number in A, J in B & K in C.
I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
# Finally, select rows of A, B and C with I, J and K respectively
A_new = A[I]
B_new = B[J]
C_new = C[K]
Sample run -
1) Inputs :
In [116]: A
Out[116]:
array([[ 83, 12, 443],
[ 83, 13, 565],
[ 83, 14, 342],
[161, 23, 431],
[162, 0, 113],
[162, 1, 313]])
In [117]: B
Out[117]:
array([[161, 23, 999],
[ 5, 1, 13],
[ 83, 12, 15],
[162, 0, 12],
[ 4, 3, 11]])
In [118]: C
Out[118]:
array([[ 11, 23, 143],
[162, 0, 113],
[161, 23, 545]])
2) Run solution code to get matching row IDs and thus extract the rows :
In [119]: M1 = (A[:,None,:2] == B[:,:2])
...: M2 = (B[:,None,:2] == C[:,:2])
...:
In [120]: I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
In [121]: A[I]
Out[121]:
array([[161, 23, 431],
[162, 0, 113]])
In [122]: B[J]
Out[122]:
array([[161, 23, 999],
[162, 0, 12]])
In [123]: C[K]
Out[123]:
array([[161, 23, 545],
[162, 0, 113]])
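The broadcasting masks above grow with the product of the row counts, which can get large. As a hedged alternative sketch, each (day, hour) pair can be encoded as a single integer key and intersected with np.intersect1d. This assumes integer hours in 0..23 and at most one row per (day, hour) in each array; the helper names row_keys and keep_common are hypothetical.

```python
from functools import reduce
import numpy as np

def row_keys(arr):
    """Encode (day, hour) as one integer; assumes integer hours in 0..23."""
    return arr[:, 0].astype(int) * 24 + arr[:, 1].astype(int)

def keep_common(arrays):
    """Keep rows whose (day, hour) appears in every array, sorted by key."""
    keys = [row_keys(arr) for arr in arrays]
    common = reduce(np.intersect1d, keys)      # keys present in all arrays
    out = []
    for arr, key in zip(arrays, keys):
        mask = np.isin(key, common)
        order = np.argsort(key[mask], kind="stable")
        out.append(arr[mask][order])           # sort so rows line up across arrays
    return out

A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15],
              [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

A_new, B_new, C_new = keep_common([A, B, C])
print(A_new, B_new, C_new, sep="\n")
```

Sorting each result by key is a design choice here so that row i refers to the same date and time in all three outputs.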
The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:
import numpy as np
import numpy_indexed as npi
dyn = np.array(dyn)
us = np.array(us)
dyn_index = npi.as_index(dyn[:, :2])
us_index = npi.as_index(us[:, :2])
common = npi.intersection(dyn_index, us_index)
print(common)
print(dyn[npi.contains(common, dyn_index)])
print(us[npi.contains(common, us_index)])
Note that the performance is O(N log N) in the worst case, and linear insofar as the arguments to as_index are already in sorted order. By contrast, the currently accepted answer is quadratic in input size.

How to get numpy array from multiple lists of same length and sort along an axis?

I have a very simple question: how do I get a numpy array from multiple lists of the same length and sort it along an axis?
I'm looking for something like:
a = [1,1,2,3,4,5,6]
b = [10,10,11,09,22,20,20]
c = [100,100,111,090,220,200,200]
d = np.asarray(a,b,c)
print d
>>>[[1,10,100],[1,10,100],[2,11,111].........[6,20,200]]
2nd question: if this can be achieved, can I sort it along an axis (e.g. on the values of list b)?
3rd question: can the sorting be done over a range? E.g. for values between b+10 and b-10, while looking at list c for further sorting, like:
[[1,11,111][1,10,122][1,09,126][1,11,154][1,11,191]
[1,20,110][1,25,122][1,21,154][1,21,155][1,21,184]]
You can zip to get the array:
a = [1, 1, 2, 3, 4, 5, 6]
b = [10, 10, 11, 9, 22, 20, 20]
c = [100, 100, 111, 90, 220, 200, 200]
d = np.asarray(list(zip(a,b,c)))  # list() is needed on Python 3, where zip is lazy
print(d)
[[ 1 10 100]
[ 1 10 100]
[ 2 11 111]
[ 3 9 90]
[ 4 22 220]
[ 5 20 200]
[ 6 20 200]]
print(d[np.argsort(d[:, 1])]) # a sorted copy
[[ 3 9 90]
[ 1 10 100]
[ 1 10 100]
[ 2 11 111]
[ 5 20 200]
[ 6 20 200]
[ 4 22 220]]
I don't know how you would do an inplace sort without doing something like:
d = np.asarray(list(zip(a,b,c)))  # list() is needed on Python 3, where zip is lazy
d.dtype = [("0", int), ("1", int), ("2", int)]
d.shape = d.size
d.sort(order="1")
A leading 0 marks an octal literal in Python 2 (where 09 and 090 are invalid, since 9 is not an octal digit) and is a syntax error in Python 3, so I removed it.
You can also sort the zipped elements before you pass them to np.asarray:
from operator import itemgetter
zipped = sorted(zip(a,b,c),key=itemgetter(1))
d = np.asarray(zipped)
print(d)
[[ 3 9 90]
[ 1 10 100]
[ 1 10 100]
[ 2 11 111]
[ 5 20 200]
[ 6 20 200]
[ 4 22 220]]
You can use np.dstack and np.lexsort. For example, to sort by b (the second column) first, then by a, and then by c (np.lexsort takes its keys from least to most significant, so the primary key goes last):
>>> d=np.dstack((a,b,c))[0]
>>> indices=np.lexsort((d[:,2],d[:,0],d[:,1]))
>>> d[indices]
array([[ 3, 9, 90],
[ 1, 10, 100],
[ 1, 10, 100],
[ 2, 11, 111],
[ 5, 20, 200],
[ 6, 20, 200],
[ 4, 22, 220]])
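Because the key order in np.lexsort is easy to get backwards, here is a small self-contained sketch with the lists from the question. It uses np.column_stack to build the (7, 3) array, which is a swapped-in alternative to the np.dstack call above, and sorts primarily by b.

```python
import numpy as np

a = [1, 1, 2, 3, 4, 5, 6]
b = [10, 10, 11, 9, 22, 20, 20]
c = [100, 100, 111, 90, 220, 200, 200]

d = np.column_stack((a, b, c))   # shape (7, 3), one row per (a, b, c) triple

# Keys go from least to most significant: the last key (column b) is primary,
# column a breaks ties in b, and column c breaks ties in a.
order = np.lexsort((d[:, 2], d[:, 0], d[:, 1]))
d_sorted = d[order]
print(d_sorted)
```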
