I have two numpy arrays A and B, both with the dimension [2,2,n], where n is a very large number. I want to matrix multiply A and B in the first two dimensions to get C, i.e. C=AB, where C has the dimension [2,2,n].
The simplest way to accomplish this is with a for loop, i.e.
for i in range(n):
    C[:,:,i] = np.matmul(A[:,:,i],B[:,:,i])
However, this is inefficient since n is very large. What's the most efficient way to do this with numpy?
You can do the following:
new_array = np.einsum('ijk,jlk->ilk', A, B)
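As an alternative (this is not part of the original answer), `np.matmul` treats every axis except the last two as a batch of matrices. Moving the `n` axis to the front therefore lets it compute the same stacked product, and the result is then moved back to `[2,2,n]`. The random `A` and `B` below are only for illustration.

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
A = rng.random((2, 2, n))
B = rng.random((2, 2, n))

# np.matmul broadcasts over all axes except the last two, so bring the
# batch axis (n) to the front: (2, 2, n) -> (n, 2, 2)
A_n = np.moveaxis(A, -1, 0)
B_n = np.moveaxis(B, -1, 0)

# Batched 2x2 products, then move the batch axis back: (n, 2, 2) -> (2, 2, n)
C = np.moveaxis(A_n @ B_n, 0, -1)

# For comparison: the einsum answer above
C_einsum = np.einsum('ijk,jlk->ilk', A, B)
```

If you can store the data as `(n, 2, 2)` from the start, you can skip the `moveaxis` calls and write `A_n @ B_n` directly.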
What you want is the default array multiplication in Numpy
In [22]: a = np.arange(8).reshape((2,2,2))+1 ; a[:,:,0], a[:,:,1]
Out[22]:
(array([[1, 3],
[5, 7]]),
array([[2, 4],
[6, 8]]))
In [23]: aa = a*a ; aa[:,:,0], aa[:,:,1]
Out[23]:
(array([[ 1, 9],
[25, 49]]),
array([[ 4, 16],
[36, 64]]))
Notice that I emphasized array because Numpy's arrays look like matrices but are in fact Numpy's ndarrays.
Post Scriptum
I guess that what you really want are arrays with shape (n,2,2), so that you can address individual 2×2 matrices using a single index, e.g.,
In [27]: n = 3
...: a = np.arange(n*2*2)+1 ; a_22n, a_n22 = a.reshape((2,2,n)), a.reshape((n,2,2))
...: print(a_22n[0])
...: print(a_n22[0])
[[1 2 3]
[4 5 6]]
[[1 2]
[3 4]]
Post Post Scriptum
Regarding semantic correctness:
In [13]: import numpy as np
...: n = 3
...: a = np.arange(2*2*n).reshape((2,2,n))+1
...: p = lambda t,a,n:print(t,*(a[:,:,i]for i in range(n)),sep=',\n')
...: p('Original array', a, n)
...: p('Using `einsum("ijk,jlk->ilk", ...)`', np.einsum('ijk,jlk->ilk', a, a), n)
...: p('Using standard multiplication', a*a, n)
Original array,
[[ 1 4]
[ 7 10]],
[[ 2 5]
[ 8 11]],
[[ 3 6]
[ 9 12]]
Using `einsum("ijk,jlk->ilk", ...)`,
[[ 29 44]
[ 77 128]],
[[ 44 65]
[104 161]],
[[ 63 90]
[135 198]]
Using standard multiplication,
[[ 1 16]
[ 49 100]],
[[ 4 25]
[ 64 121]],
[[ 9 36]
[ 81 144]]
Related
I have two NumPy arrays that I would like to multiply with each other across every row. To illustrate what I mean I have put the code below:
import numpy as np
a = np.array([
[1,2],
[3,4],
[5,6],
[7,8]])
b = np.array([
[1,2],
[4,4],
[5,5],
[7,10]])
final_product=[]
for i in range(a.shape[0]):
    product=a[i,:]*b
    final_product.append(product)
Rather than using loops and lists, is there a more direct, faster and more elegant way of doing the above row-wise multiplication in NumPy?
By using proper reshaping and repetition you can achieve what you are looking for, here is a simple implementation:
a.reshape(4,1,2) * ([b]*4)
If the length is dynamic you can do this:
a.reshape(a.shape[0],1,a.shape[1]) * ([b]*a.shape[0])
Note: make sure a.shape[1] and b.shape[1] are equal; a.shape[0] and b.shape[0] can differ.
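You don't need to build the repetition `[b]*4` explicitly, because broadcasting does it for you. Indexing `a` with `None` inserts a length-1 axis, and NumPy stretches that axis to match `b`. Here is a minimal sketch using the arrays from the question:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
b = np.array([[1, 2], [4, 4], [5, 5], [7, 10]])

# a[:, None, :] has shape (4, 1, 2); b has shape (4, 2) and is treated as (1, 4, 2).
# Broadcasting gives result[i, k, j] = a[i, j] * b[k, j], with shape (4, 4, 2).
result = a[:, None, :] * b

# The loop from the question, for comparison
loop_result = np.array([a[i, :] * b for i in range(a.shape[0])])
```

This also works when `a.shape[0]` and `b.shape[0]` differ, as long as the column counts match.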
This type of problem can be handled by np.einsum (see the Doc & this post for more understanding). It is one of the most efficient ways to do this:
np.einsum("ij, kj->ikj", a, b)
Try:
n = b.shape[0]
print(np.multiply(np.repeat(a, n, axis=0).reshape((a.shape[0], n, -1)), b))
Prints:
[[[ 1 4]
[ 4 8]
[ 5 10]
[ 7 20]]
[[ 3 8]
[12 16]
[15 20]
[21 40]]
[[ 5 12]
[20 24]
[25 30]
[35 60]]
[[ 7 16]
[28 32]
[35 40]
[49 80]]]
I have two matrices:
a = np.array([[6],[3],[4]])
b = np.array([1,10])
when I do:
c = a * b
c looks like this:
[ 6, 60]
[ 3, 30]
[ 4, 40]
which is good.
Now, let's say I add a column to a (for the sake of the example it's an identical column, but it doesn't have to be):
a = np.array([[6,6],[3,3],[4,4]])
b stays the same.
The result I want is 2 identical copies of c (since the columns are identical), stacked along a new axis:
new_c.shape == (3, 2, 2)
so that new_c[:,:,0] or new_c[:,:,1] gives the original c.
I tried adding new axes to both a and b using np.expand_dims but it did not help.
One way is using numpy.einsum:
>>> import numpy as np
>>> a = np.array([[6],[3],[4]])
>>> b = np.array([1,10])
>>> print(a * b)
[[ 6 60]
[ 3 30]
[ 4 40]]
>>> print(np.einsum('ij, j -> ij', a, b))
[[ 6 60]
[ 3 30]
[ 4 40]]
>>> a = np.array([[6,6],[3,3],[4,4]])
>>> print(np.einsum('ij, k -> ikj', a, b)[:, :, 0])
[[ 6 60]
[ 3 30]
[ 4 40]]
>>> print(np.einsum('ij, k -> ikj', a, b)[:, :, 1])
[[ 6 60]
[ 3 30]
[ 4 40]]
For more on numpy.einsum usage, I recommend:
Understanding NumPy's einsum
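Plain broadcasting produces the same `(3, 2, 2)` array as `np.einsum('ij, k -> ikj', a, b)`. The idea is to put `b` on the middle axis and the columns of `a` on the last axis. This is a sketch only, and it uses the name `new_c` from the question:

```python
import numpy as np

a = np.array([[6, 6], [3, 3], [4, 4]])
b = np.array([1, 10])

# a[:, None, :] has shape (3, 1, 2); b[None, :, None] has shape (1, 2, 1).
# The product has shape (3, 2, 2), with new_c[i, k, j] = a[i, j] * b[k].
new_c = a[:, None, :] * b[None, :, None]
```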
You have multiple options here, one of which is using numpy.einsum as explained in the other answer. Another possibility is using the array reshape method:
result = a.T.reshape((a.shape[1], a.shape[0], 1)) * b
result = result.reshape((-1, 2))
result
array([[ 6, 60],
[ 3, 30],
[ 4, 40],
[ 6, 60],
[ 3, 30],
[ 4, 40]])
Yet what is more intuitive to me is to stack arrays by means of np.vstack, with each column of a multiplied by b, as follows:
result = np.vstack([c[:, None] * b for c in a.T])
result
array([[ 6, 60],
[ 3, 30],
[ 4, 40],
[ 6, 60],
[ 3, 30],
[ 4, 40]])
I have the following array:
import numpy as np
a = np.array([[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3]])
I understand that np.random.shuffle(a.T) will shuffle the array along the row, but what I need is for it to shuffle each row independently. How can this be done in numpy? Speed is critical as there will be several million rows.
For this specific problem, each row will contain the same starting population.
import numpy as np
np.random.seed(2018)
def scramble(a, axis=-1):
    """
    Return an array with the values of `a` independently shuffled along the
    given axis
    """
    b = a.swapaxes(axis, -1)
    n = a.shape[axis]
    idx = np.random.choice(n, n, replace=False)
    b = b[..., idx]
    return b.swapaxes(axis, -1)
a = np.arange(4*9).reshape(4, 9)
# array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8],
# [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23, 24, 25, 26],
# [27, 28, 29, 30, 31, 32, 33, 34, 35]])
print(scramble(a, axis=1))
yields
[[ 3 8 7 0 4 5 1 2 6]
[12 17 16 9 13 14 10 11 15]
[21 26 25 18 22 23 19 20 24]
[30 35 34 27 31 32 28 29 33]]
while scrambling along the 0-axis:
print(scramble(a, axis=0))
yields
[[18 19 20 21 22 23 24 25 26]
[ 0 1 2 3 4 5 6 7 8]
[27 28 29 30 31 32 33 34 35]
[ 9 10 11 12 13 14 15 16 17]]
This works by first swapping the target axis with the last axis:
b = a.swapaxes(axis, -1)
This is a common trick used to standardize code which deals with one axis.
It reduces the general case to the specific case of dealing with the last axis.
Since in NumPy version 1.10 or higher swapaxes returns a view, there is no copying involved and so calling swapaxes is very quick.
Now we can generate a new index order for the last axis:
n = a.shape[axis]
idx = np.random.choice(n, n, replace=False)
Now we can shuffle b (independently along the last axis):
b = b[..., idx]
and then reverse the swapaxes to return an a-shaped result:
return b.swapaxes(axis, -1)
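`scramble` draws a single `idx` and applies it to every row. That is why all the rows in the output above share the same ordering. If each row needs its own permutation, one vectorized option is to sort a matrix of random keys along the axis and pass the resulting indices to `np.take_along_axis`. This is a different technique from the answer above, and `scramble_independent` is a hypothetical name used only for illustration:

```python
import numpy as np

def scramble_independent(a, axis=-1, rng=None):
    """Return a copy of `a` where each 1-D slice along `axis` gets its own random permutation."""
    rng = np.random.default_rng() if rng is None else rng
    # argsort of i.i.d. uniform keys yields an independent random permutation per slice
    idx = np.argsort(rng.random(a.shape), axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

a = np.arange(4 * 9).reshape(4, 9)
shuffled = scramble_independent(a, axis=1, rng=np.random.default_rng(2018))
```

The cost is one `argsort` over an array the size of `a`, which avoids a Python-level loop over millions of rows.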
If you don't want a return value and want to operate on the array directly, you can specify the indices to shuffle.
>>> import numpy as np
>>>
>>>
>>> a = np.array([[1,2,3], [1,2,3], [1,2,3]])
>>>
>>> # Shuffle row `2` independently
>>> np.random.shuffle(a[2])
>>> a
array([[1, 2, 3],
[1, 2, 3],
[3, 2, 1]])
>>>
>>> # Shuffle column `0` independently
>>> np.random.shuffle(a[:,0])
>>> a
array([[3, 2, 3],
[1, 2, 3],
[1, 2, 1]])
If you want a return value as well, you can use numpy.random.permutation, in which case replace np.random.shuffle(a[n]) with a[n] = np.random.permutation(a[n]).
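Warning: do not do a[n] = np.random.shuffle(a[n]). shuffle does not return anything (it returns None), so for a float array the row/column you end up "shuffling" will be filled with nan instead; for an integer array like this one, the assignment raises a TypeError.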
Good answer above. But I will throw in a quick and dirty way:
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
ignore_list_output = [np.random.shuffle(x) for x in a]
Then, a can be something like this
array([[2, 1, 3],
[4, 6, 5],
[9, 7, 8]])
Not very elegant but you can get this job done with just one short line.
Building on my comment to @Hun's answer, here's the fastest way to do this:
def shuffle_along(X):
    """Minimal in place independent-row shuffler."""
    [np.random.shuffle(x) for x in X]
This works in-place and can only shuffle rows. If you need more options:
def shuffle_along(X, axis=0, inline=False):
    """More elaborate version of the above."""
    if not inline:
        X = X.copy()
    if axis == 0:
        [np.random.shuffle(x) for x in X]
    if axis == 1:
        [np.random.shuffle(x) for x in X.T]
    if not inline:
        return X
This, however, has the limitation of only working on 2d-arrays. For higher dimensional tensors, I would use:
def shuffle_along(X, axis=0, inline=True):
    """Shuffle along any axis of a tensor."""
    if not inline:
        X = X.copy()
    np.apply_along_axis(np.random.shuffle, axis, X)  # <-- I just changed this
    if not inline:
        return X
You can do it with numpy without any loop or extra function, and much faster. E.g., say we have an array of size (6, 2) and we want a sub-array of size (2, 2) with an independent random index for each column.
import numpy as np
test = np.array([[1, 1],
[2, 2],
[0.5, 0.5],
[0.3, 0.3],
[4, 4],
[7, 7]])
id_rnd = np.random.randint(6, size=(2, 2))  # random row indices, drawn with replacement; use np.random.choice(6, size=(2, 2), replace=False) if you don't want replacement
new = np.take_along_axis(test, id_rnd, axis=0)
Out:
array([[2. , 2. ],
[0.5, 2. ]])
It works for any number of dimensions.
As of NumPy 1.20.0, released in January 2021, we have a permuted() method on the new Generator type (introduced with the new random API in NumPy 1.17.0, released in July 2019). This does exactly what you need:
import numpy as np
rng = np.random.default_rng()
a = np.array([
[1, 2, 3],
[1, 2, 3],
[1, 2, 3],
])
shuffled = rng.permuted(a, axis=1)
This gives you something like
>>> print(shuffled)
[[2 3 1]
[1 3 2]
[2 1 3]]
As you can see, the rows are permuted independently. This is in sharp contrast with both rng.permutation() and rng.shuffle().
If you want an in-place update you can pass the original array as the out keyword argument. And you can use the axis keyword argument to choose the direction along which to shuffle your array.
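Here is a minimal sketch of the in-place variant, which passes the same array as `out`. It assumes NumPy 1.20 or later. The rows here are made distinct, so that the independent per-row shuffles are easy to see:

```python
import numpy as np

rng = np.random.default_rng()
a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Shuffle each row independently and write the result back into `a`
result = rng.permuted(a, axis=1, out=a)
```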
I have 3 different numpy arrays, but they all start with two columns which contain the day of year and the time. For example:
dyn = [[ 83 12 7.10555687e-01 ..., 6.99242766e-01 6.868761e-01]
[ 83 13 8.28091972e-01 ..., 8.33734118e-01 8.47266838e-01]
[ 83 14 8.79437354e-01 ..., 8.73598144e-01 8.57156213e-01]
[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]
[ 162 1 2.51653976e-01 ..., 2.17209188e-01 1.42133495e-1]]
us = [[ 133 18 3.00483815e+02 ..., 1.94277561e+00 2.8168959e+00]
[ 133 19 2.98832620e+02 ..., 2.42506475e+00 2.99730800e+00]
[ 133 20 2.96706105e+02 ..., 3.16851622e+00 4.41187088e+00]
[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]
[ 162 2 2.86992180e+02 ..., 7.08996730e-02 2.6403210e-01]]
I need to be able to remove any rows where a specific date and time isn't present in all 3 arrays. In other words, I want to be left with 3 arrays where the first 2 columns are identical in each of the 3 arrays.
So the resulting smaller arrays would be:
dyn= [[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]]
us= [[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]]
(But then also limited by what's in the third array)
I've tried using sort/zip, but I'm not sure it should be applied to a 2D array like that:
X= dyn
Y = us
xsorted=[x for (y,x) in sorted(zip(Y[:,1],X[:,1]), key=lambda pair: pair[0])]
I also tried a loop, but that only works when the same times/days are in the same position within the array, which isn't helpful:
for i in range(100):
    dyn_small=dyn[dyn[:,0]==us[i,0]]
Assuming A, B and C as the input arrays, here's a vectorized approach making heavy usage of broadcasting -
# Get masks comparing all rows of A with B and then B with C
M1 = (A[:,None,:2] == B[:,:2])
M2 = (B[:,None,:2] == C[:,:2])
# Get a joint 3D mask of those two masks and get the indices of matches.
# These indices (I,J,K) of the 3D mask basically tell us the row numbers
# corresponding to each of the input arrays that are present in all of them.
# Thus, in (I,J,K), I would be the matching row number in A, J in B & K in C.
I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
# Finally, select rows of A, B and C with I, J and K respectively
A_new = A[I]
B_new = B[J]
C_new = C[K]
Sample run -
1) Inputs :
In [116]: A
Out[116]:
array([[ 83, 12, 443],
[ 83, 13, 565],
[ 83, 14, 342],
[161, 23, 431],
[162, 0, 113],
[162, 1, 313]])
In [117]: B
Out[117]:
array([[161, 23, 999],
[ 5, 1, 13],
[ 83, 12, 15],
[162, 0, 12],
[ 4, 3, 11]])
In [118]: C
Out[118]:
array([[ 11, 23, 143],
[162, 0, 113],
[161, 23, 545]])
2) Run solution code to get matching row IDs and thus extract the rows :
In [119]: M1 = (A[:,None,:2] == B[:,:2])
...: M2 = (B[:,None,:2] == C[:,:2])
...:
In [120]: I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
In [121]: A[I]
Out[121]:
array([[161, 23, 431],
[162, 0, 113]])
In [122]: B[J]
Out[122]:
array([[161, 23, 999],
[162, 0, 12]])
In [123]: C[K]
Out[123]:
array([[161, 23, 545],
[162, 0, 113]])
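The combined mask `(M1[:,:,None,:] & M2)` has `len(A) * len(B) * len(C) * 2` elements, so it can become very large. One lower-memory alternative, which is not part of this answer, encodes each (day, hour) pair as a single integer key and intersects the keys with `np.intersect1d`. This sketch assumes three things:

- The first two columns hold integer-valued day of year and hour (0-23).
- Each pair occurs at most once per array.
- `align_on_time` is a hypothetical helper name.

```python
import numpy as np
from functools import reduce

def align_on_time(*arrays):
    """Keep only rows whose (day, hour) pair occurs in every input array."""
    # Assumption: column 0 = day of year, column 1 = hour (0-23), integer-valued,
    # and each (day, hour) pair is unique within an array.
    keys = [a[:, 0].astype(np.int64) * 24 + a[:, 1].astype(np.int64) for a in arrays]
    # Sorted keys that are present in all arrays
    common = reduce(np.intersect1d, keys)
    aligned = []
    for a, k in zip(arrays, keys):
        # Indices into `k` of each common key, in the sorted order of `common`
        _, _, idx = np.intersect1d(common, k, return_indices=True)
        aligned.append(a[idx])
    return aligned

A = np.array([[83, 12, 443], [83, 13, 565], [83, 14, 342],
              [161, 23, 431], [162, 0, 113], [162, 1, 313]])
B = np.array([[161, 23, 999], [5, 1, 13], [83, 12, 15], [162, 0, 12], [4, 3, 11]])
C = np.array([[11, 23, 143], [162, 0, 113], [161, 23, 545]])

A_new, B_new, C_new = align_on_time(A, B, C)
```

The output rows come out sorted by (day, hour), and they line up across the three arrays.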
The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems in an elegant and efficient/vectorized manner:
import numpy as np
import numpy_indexed as npi
dyn = np.array(dyn)
us = np.array(us)
dyn_index = npi.as_index(dyn[:, :2])
us_index = npi.as_index(us[:, :2])
common = npi.intersection(dyn_index, us_index)
print(common)
print(dyn[npi.contains(common, dyn_index)])
print(us[npi.contains(common, us_index)])
Note that the performance is N log N in the worst case, and linear insofar as the arguments to as_index are already in sorted order. By contrast, the currently accepted answer is quadratic in input size.
The question is the inverse of this question. I'm looking for a generic method to form the original big array from small arrays:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]]])
->
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
I am currently developing a solution and will post it when it's done, but I would like to see other (better) ways.
import numpy as np
def blockshaped(arr, nrows, ncols):
    """
    Return an array of shape (n, nrows, ncols) where
    n * nrows * ncols = arr.size
    If arr is a 2D array, the returned array looks like n subblocks with
    each subblock preserving the "physical" layout of arr.
    """
    h, w = arr.shape
    return (arr.reshape(h//nrows, nrows, -1, ncols)
               .swapaxes(1,2)
               .reshape(-1, nrows, ncols))
def unblockshaped(arr, h, w):
    """
    Return an array of shape (h, w) where
    h * w = arr.size
    If arr is of shape (n, nrows, ncols), n subblocks of shape (nrows, ncols),
    then the returned array preserves the "physical" layout of the subblocks.
    """
    n, nrows, ncols = arr.shape
    return (arr.reshape(h//nrows, -1, nrows, ncols)
               .swapaxes(1,2)
               .reshape(h, w))
For example,
c = np.arange(24).reshape((4,6))
print(c)
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
print(blockshaped(c, 2, 3))
# [[[ 0 1 2]
# [ 6 7 8]]
# [[ 3 4 5]
# [ 9 10 11]]
# [[12 13 14]
# [18 19 20]]
# [[15 16 17]
# [21 22 23]]]
print(unblockshaped(blockshaped(c, 2, 3), 4, 6))
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]]
Note that there is also superbatfish's blockwise_view. It arranges the blocks in a different format (using more axes), but it has the advantages of (1) always returning a view and (2) being capable of handling arrays of any dimension.
Yet another (simple) approach:
threedarray = ...
twodarray = np.array([x.flatten() for x in threedarray])  # a list, since map() returns a lazy iterator in Python 3
print(twodarray.shape)
I hope I understand you correctly. Let's say we have a and b:
>>> a = np.array([[1,2] ,[3,4]])
>>> b = np.array([[5,6] ,[7,8]])
>>> a
array([[1, 2],
[3, 4]])
>>> b
array([[5, 6],
[7, 8]])
To make them one big 2D array, use numpy.concatenate:
>>> c = np.concatenate((a,b), axis=1 )
>>> c
array([[1, 2, 5, 6],
[3, 4, 7, 8]])
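The same concatenate idea extends to a whole grid of blocks with `np.block`, which joins a nested list of arrays along both rows and columns. The sketch below undoes the `(4, 2, 3)` example from the question. It assumes the blocks are stored in row-major order and that the block grid has `n_block_rows` rows. Both `unblock_with_np_block` and `n_block_rows` are hypothetical names:

```python
import numpy as np

def unblock_with_np_block(blocks, n_block_rows):
    """Reassemble an (n, nrows, ncols) stack of row-major blocks into one 2-D array."""
    n_block_cols = len(blocks) // n_block_rows
    # Build a nested list with one inner list per row of blocks;
    # np.block concatenates inner lists along columns and outer list along rows
    grid = [[blocks[r * n_block_cols + c] for c in range(n_block_cols)]
            for r in range(n_block_rows)]
    return np.block(grid)

blocks = np.array([[[0, 1, 2], [6, 7, 8]],
                   [[3, 4, 5], [9, 10, 11]],
                   [[12, 13, 14], [18, 19, 20]],
                   [[15, 16, 17], [21, 22, 23]]])
big = unblock_with_np_block(blocks, n_block_rows=2)
```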
It works for the images I have tested so far; I will update it if further tests are made. It is, however, a solution that takes no account of speed and memory usage.
import numpy as np
def unblockshaped(blocks, h, w):
    n, nrows, ncols = blocks.shape
    bpc = w // ncols  # number of blocks per row of the output
    bpr = h // nrows  # number of block rows in the output
    reconstructed = np.zeros((h, w), dtype=blocks.dtype)
    t = 0
    for i in range(bpr):
        for j in range(bpc):
            # copy block t into its (i, j) position in the grid
            reconstructed[i*nrows:(i+1)*nrows, j*ncols:(j+1)*ncols] = blocks[t]
            t = t + 1
    return reconstructed
Here is a solution for anyone wishing to create tiles of a matrix:
from itertools import product
import numpy as np
def tiles(arr, nrows, ncols):
    """
    If arr is a 2D array, the returned list contains nrows*ncols numpy arrays
    with each array preserving the "physical" layout of arr.
    When the array shape (rows, cols) is not divisible by (nrows, ncols) then
    some of the array dimensions can change according to numpy.array_split.
    """
    rows, cols = arr.shape
    col_arr = np.array_split(range(cols), ncols)
    row_arr = np.array_split(range(rows), nrows)
    return [arr[r[0]: r[-1]+1, c[0]: c[-1]+1]
            for r, c in product(row_arr, col_arr)]