Consider the two toy arrays below:
import numpy as np
k = np.random.randint(1, 25, (5, 2, 3))
l = np.random.randint(25, 50, (7, 3))
In [27]: k
Out[27]:
array([[[14, 15, 24],
[21, 24, 5]],
[[22, 19, 9],
[21, 1, 11]],
[[ 1, 23, 5],
[16, 14, 2]],
[[ 7, 3, 16],
[23, 2, 8]],
[[12, 24, 4],
[ 2, 15, 20]]])
In [28]: l
Out[28]:
array([[47, 31, 42],
[28, 27, 26],
[45, 32, 49],
[29, 34, 32],
[40, 36, 25],
[44, 27, 31],
[27, 35, 26]])
I can get the multiplicative sum that I am interested in as follows:
f = np.array([np.sum( k * x, axis = 2) for x in l])
In [29]: f
Out[29]:
array([[[2131, 1941],
[2001, 1480],
[ 970, 1270],
[1094, 1479],
[1476, 1399]],
[[1421, 1366],
[1363, 901],
[ 779, 878],
[ 693, 906],
[1088, 981]],
[[2286, 1958],
[2039, 1516],
[1026, 1266],
[1195, 1491],
[1504, 1550]],
[[1684, 1585],
[1572, 995],
[ 971, 1004],
[ 817, 991],
[1292, 1208]],
[[1700, 1829],
[1789, 1151],
[ 993, 1194],
[ 788, 1192],
[1444, 1120]],
[[1765, 1727],
[1760, 1292],
[ 820, 1144],
[ 885, 1314],
[1300, 1113]],
[[1527, 1537],
[1493, 888],
[ 962, 974],
[ 710, 899],
[1268, 1099]]])
How can I calculate this sum without resorting to comprehension?
This is a good use case for np.einsum:
np.einsum('ijk,lk->lij', k, l)
list_comp = np.array([np.sum( k * x, axis = 2) for x in l])
np.allclose(np.einsum('ijk,lk->lij', k, l), list_comp)
# True
Or using broadcasting:
(l[:,None,None]*k).sum(-1)
Although, from a quick check on timings, np.einsum runs about 3 times faster.
You can also do that with np.tensordot:
import numpy as np
np.random.seed(0)
k = np.random.randint(1, 25, (5, 2, 3))
l = np.random.randint(25, 50, (7, 3))
f = np.tensordot(l, k, [-1, -1])
f_comp = np.array([np.sum(k * x, axis=2) for x in l])
print(np.allclose(f, f_comp))
# True
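As a cross-check, the approaches mentioned in these answers (np.einsum, broadcasting and np.tensordot) can be compared against the original comprehension in one place. This is only a minimal sketch on the same toy shapes; the seed value and the names f_einsum, f_bcast and f_tdot are assumptions made for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility
k = np.random.randint(1, 25, (5, 2, 3))
l = np.random.randint(25, 50, (7, 3))

# Reference result from the list comprehension, shape (7, 5, 2)
f_comp = np.array([np.sum(k * x, axis=2) for x in l])

# Sum over the shared last axis, keeping l's first axis in front
f_einsum = np.einsum('ijk,lk->lij', k, l)

# (7, 1, 1, 3) * (5, 2, 3) broadcasts to (7, 5, 2, 3); then sum the last axis
f_bcast = (l[:, None, None, :] * k).sum(-1)

# Contract the last axis of l with the last axis of k
f_tdot = np.tensordot(l, k, axes=[-1, -1])

for name, f in [("einsum", f_einsum), ("broadcast", f_bcast), ("tensordot", f_tdot)]:
    print(name, f.shape, np.array_equal(f, f_comp))
```

The broadcasting version builds the full (7, 5, 2, 3) intermediate array before summing, which is one plausible reason it can be slower on larger inputs.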
I have value X of type ndarray with shape: (40000, 2)
The second column of X contains list of 50 numbers
Example:
[17, [1, 2, 3, ...]],
[39, [44, 45, 45, ...]], ...
I want to convert it to an ndarray of shape (40000, 51):
the first column stays the same
every element of the list goes into its own column.
for my example:
[17, 1, 2, 3, ....],
[39, 44, 45, 45, ...]
How can I do it?
np.hstack((arr[:,0].reshape(-1,1), np.array(arr[:,1].tolist())))
Example:
>>> arr
array([[75, list([90, 39, 63])],
[20, list([82, 92, 22])],
[80, list([12, 6, 89])],
[79, list([11, 96, 74])],
[96, list([26, 37, 65])]], dtype=object)
>>> np.hstack((arr[:,0].reshape(-1,1),np.array(arr[:,1].tolist()))).astype(int)
array([[75, 90, 39, 63],
[20, 82, 92, 22],
[80, 12, 6, 89],
[79, 11, 96, 74],
[96, 26, 37, 65]])
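The same conversion can also be sketched with np.column_stack, which accepts the 1-D first column directly so no reshape is needed. The small object array below is built by hand purely for illustration (its values and the names first, rest and flat are assumptions), and the sketch assumes every list has the same length.

```python
import numpy as np

# Build a small (3, 2) object array: an int in column 0, a list in column 1
arr = np.empty((3, 2), dtype=object)
arr[:, 0] = [17, 39, 5]
for i, values in enumerate([[1, 2, 3], [44, 45, 45], [7, 8, 9]]):
    arr[i, 1] = values  # store the list itself as a single element

first = arr[:, 0].astype(int)          # shape (3,)
rest = np.array(arr[:, 1].tolist())    # shape (3, 3), one row per list
flat = np.column_stack((first, rest))  # shape (3, 4)

print(flat)
```

If the lists had different lengths, np.array(arr[:, 1].tolist()) would not produce a regular 2-D array, so this only applies to the equal-length case described in the question.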
You can do this for each row of your ndarray. Here is an example:
# X = [39, [44, 45, 45, ...]]
newX = numpy.ndarray(shape=(51,))
newX[0] = X[0] # adding the first element
# now the rest of the elements
i = 1
for e in X[1]:
    newX[i] = e
    i = i + 1
You can wrap this process in a function and apply it to every row:
newArray = numpy.ndarray(shape=(40000,51))
i = 0
for x in oldArray:
    Process(newArray[i], x) # Process fills the given row in place, as above
    i = i + 1
I defined the source array (with shorter lists in column 1) as:
X = np.array([[17, [1, 2, 3, 4]], [39, [44, 45, 45, 46]]], dtype=object)
To do your task, define the following function:
def myExplode(row):
    tbl = [row[0]]
    tbl.extend(row[1])
    return tbl
Then apply it to each row:
np.apply_along_axis(myExplode, axis=1, arr=X)
The result is:
array([[17, 1, 2, 3, 4],
[39, 44, 45, 45, 46]])
I have a structure like this :
data = [[2,5,6,9,12,45,32] , [43,23,12,76,845,1] ,[65,23,1,54,22,123] ,
[323,23,412,656,2,3] , [8,5,3,9,12,45,32] , [60,23,12,76,845,1] ,
[5,23,1,54,22,123] , [35,2,12,56,22,34] ]
and I want to order these lists based on another list of positions
order = [5,4,1,3,0,6,7, 2]
the result would be :
data_ordered = [[60,23,12,76,845,1],[8,5,3,9,12,45,32], [43,23,12,76,845,1],
[323,23,412,656,2,3] , [2,5,6,9,12,45,32] , [5,23,1,54,22,123] ,
[35,2,12,56,22,34] ,[65,23,1,54,22,123] ]
Any idea?
data_ordered = [ data[i] for i in order]
Pretty basic list comprehension.
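For comparison, the same reordering can be sketched with operator.itemgetter from the standard library. The shortened data and order lists below are assumptions chosen to keep the example small.

```python
from operator import itemgetter

data = [[2, 5], [43, 23], [65, 1], [323, 412]]
order = [2, 0, 3, 1]

# List comprehension, as in the answer above
by_comprehension = [data[i] for i in order]

# itemgetter with several indices returns a tuple of the selected items
by_itemgetter = list(itemgetter(*order)(data))

print(by_comprehension)
print(by_itemgetter == by_comprehension)
```

Note that itemgetter called with a single index returns the item itself rather than a one-element tuple, so the comprehension is the safer choice when order might have length one.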
import numpy as np
data_ordered = np.array(data, dtype=object)[np.array(order)].tolist()
That does it. dtype=object is needed because the sublists differ in length. A full example is given below:
import numpy as np
data = [[2,5,6,9,12,45,32] , [43,23,12,76,845,1] ,[65,23,1,54,22,123] ,
[323,23,412,656,2,3] , [8,5,3,9,12,45,32] , [60,23,12,76,845,1] ,
[5,23,1,54,22,123] , [35,2,12,56,22,34] ]
order = [5,4,1,3,0,6,7, 2]
data_ordered = np.array(data, dtype=object)[np.array(order)].tolist()
print(data_ordered)
Output is
[[60, 23, 12, 76, 845, 1], [8, 5, 3, 9, 12, 45, 32], [43, 23, 12, 76, 845, 1], [323, 23, 412, 656, 2, 3], [2, 5, 6, 9, 12, 45, 32], [5, 23, 1, 54, 22, 123], [35, 2, 12, 56, 22, 34], [65, 23, 1, 54, 22, 123]]
Use numpy to solve it.
Say I have an array myarr such that myarr.shape = (2,64,64,2). Now if I define myarr2 = myarr[[0,1,0,0,1],...], then the following is true
myarr2.shape #(5,64,64,2)
myarr2[0,...] == myarr[0,...] # = True
myarr2[1,...] == myarr[1,...] # = True
myarr2[2,...] == myarr[0,...] # = True
...
Can this be generalized so the slices are arrays? That is, is there a way to make the following hypothetical code work?
myarr2 = myarr[...,[20,30,40]:[30,40,50],[15,25,35]:[25,35,45],..]
myarr2[0,] == myarr[...,20:30,15:25,...] # = True
myarr2[1,] == myarr[...,30:40,25:35,...] # = True
myarr2[2,] == myarr[...,40:50,35:45,...] # = True
You can feed the coordinates of the subarrays to a loop that cuts the subarrays out of myarray. I don't know how you store the indices of the subarrays, so I put them into a nested list idx_list:
idx_list = [[[20,30,40],[30,40,50]],[[15,25,35],[25,35,45]]] # assuming 2D cutouts
idx_array = np.array(idx_list) # shape (2, 2, 3): axis, start/stop, cutout
idx_array = idx_array.reshape(4, -1).T # one (a, b, c, d) row per cutout
myarray2 = np.array([myarray[a:b,c:d] for a,b,c,d in idx_array]) # cut and combine
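The cut-and-combine idea can be sketched end to end for the 4-D array from the question. The array contents, and the names row_bounds and col_bounds, are assumptions made for illustration; the sketch also assumes that all cutouts have the same size, since np.stack needs equal shapes.

```python
import numpy as np

# Stand-in for the (2, 64, 64, 2) array from the question
myarr = np.arange(2 * 64 * 64 * 2).reshape(2, 64, 64, 2)

# (start, stop) pairs for the two middle axes, one pair per cutout
row_bounds = [(20, 30), (30, 40), (40, 50)]
col_bounds = [(15, 25), (25, 35), (35, 45)]

# Slice each cutout and stack them along a new leading axis
myarr2 = np.stack([
    myarr[:, r0:r1, c0:c1, :]
    for (r0, r1), (c0, c1) in zip(row_bounds, col_bounds)
])

print(myarr2.shape)  # (3, 2, 10, 10, 2)
```

Each slice inside the loop is a view, but np.stack copies the data, so myarr2 does not share memory with myarr.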
Let's simplify the problem a bit; first by removing the two outer dimensions that don't affect the core indexing issue; and by reducing the size so we can see and understand the results.
The setup
In [540]: arr = np.arange(7*7).reshape(7,7)
In [541]: arr
Out[541]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48]])
In [542]: idx =np.array([[0,2,4,6],[1,3,5,7]])
Now a straightforward iteration approach:
In [543]: alist = []
     ...: for i in range(idx.shape[1]-1):
     ...:     j,k = idx[:,i]
     ...:     sub = arr[j:j+2, k:k+2]
     ...:     alist.append(sub)
     ...:
In [544]: np.array(alist)
Out[544]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
In [545]: _.shape
Out[545]: (3, 2, 2)
I simplified the iteration from:
     ...: for i in range(idx.shape[1]-1):
     ...:     sub = arr[idx[0,i]:idx[0,i+1],idx[1,i]:idx[1,i+1]]
     ...:     alist.append(sub)
to highlight the fact that we are generating ranges of a consistent size, and make the next transformation more obvious.
So I start with a (7,7) array, and create 3 (2,2) slices.
As I demonstrated in Slicing a different range at each index of a multidimensional numpy array, we can use linspace to expand a set of slices, or ranges.
In [567]: ranges = np.linspace(idx[:,:3],idx[:,:3]+1,2).astype(int)
In [568]: ranges
Out[568]:
array([[[0, 2, 4],
[1, 3, 5]],
[[1, 3, 5],
[2, 4, 6]]])
So ranges[0] expands on the idx[0] slices, etc. But if I simply index with these, I get the 'diagonal' values from Out[544]:
In [569]: arr[ranges[0], ranges[1]]
Out[569]:
array([[ 1, 17, 33],
[ 9, 25, 41]])
To get blocks, I have to add a dimension to the first indices:
In [570]: arr[ranges[0,:,None], ranges[1]]
Out[570]:
array([[[ 1, 17, 33],
[ 2, 18, 34]],
[[ 8, 24, 40],
[ 9, 25, 41]]])
These are the same values as in Out[544], but they need to be transposed:
In [571]: _.transpose(2,0,1)
Out[571]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
The code's a bit clunky and needs to be generalized, but it gives the general idea of how one can substitute a single indexing operation for the iterative one, provided the slices are regular enough. For this small example it probably isn't faster, but it will probably come out ahead as the problem size gets larger.
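One way the indexing idea might be generalized is sketched below. It swaps linspace for np.arange offsets and arranges the index arrays so that no transpose is needed; the function name extract_blocks and its parameters are hypothetical, and it assumes square blocks of one fixed size that all fit inside the array.

```python
import numpy as np

def extract_blocks(arr, row_starts, col_starts, size):
    """Return an (n, size, size) array of blocks cut from a 2-D array."""
    row_starts = np.asarray(row_starts)
    col_starts = np.asarray(col_starts)
    offsets = np.arange(size)
    # rows has shape (n, size, 1) and cols has shape (n, 1, size);
    # advanced indexing broadcasts them to (n, size, size)
    rows = row_starts[:, None, None] + offsets[None, :, None]
    cols = col_starts[:, None, None] + offsets[None, None, :]
    return arr[rows, cols]

arr = np.arange(7 * 7).reshape(7, 7)
blocks = extract_blocks(arr, [0, 2, 4], [1, 3, 5], 2)
print(blocks.shape)  # (3, 2, 2)
print(blocks[0])
```

Because this uses advanced indexing, the result is a copy rather than a view, unlike the individual slices produced in the loop.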
I'd like to write a function like split(arr, i, j) that divides the array arr along axes i and j, but I do not know how to do it.
The method below uses array_split. Merely dividing an N-dimensional array into (N-1)-dimensional arrays does not give me the two-dimensional arrays I am looking for.
import numpy as np
arr = np.arange(36).reshape(4,9)
dim = arr.ndim
ax = np.arange(dim)
arritr = [np.array_split(arr, arr.shape[ax[i]], ax[i]) for i in range(dim)]
print(arritr[0])
print(arritr[1])
How can I achieve this?
I believe you would like to slice the array by axis (row, column). Here is the documentation: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
arr[1,:] # will return all values at index 1(row index 1)
arr[:,1] # will return all values at column 1
I'm guessing a bit here, but it sounds like you want to divide the array into 4 blocks.
In [120]: arr = np.arange(36).reshape(6,6)
In [122]: [arr[:3,:4], arr[:3,4:], arr[3:,:4], arr[3:,4:]]
Out[122]:
[array([[ 0, 1, 2, 3],
[ 6, 7, 8, 9],
[12, 13, 14, 15]]),
array([[ 4, 5],
[10, 11],
[16, 17]]),
array([[18, 19, 20, 21],
[24, 25, 26, 27],
[30, 31, 32, 33]]),
array([[22, 23],
[28, 29],
[34, 35]])]
Don't worry about efficiency. array_split does the same sort of slicing. Check its code to verify that.
If you want more slices, you could add more arr[i1:i2, j1:j2], for any mix of indices.
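The four-block split can also be sketched with nested np.split calls, which take the cut positions as a list. The names row_cuts and col_cuts are assumptions made for illustration.

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)

row_cuts = [3]  # split rows into [:3] and [3:]
col_cuts = [4]  # split columns into [:4] and [4:]

# First split into horizontal bands, then split each band into columns
blocks = [
    np.split(band, col_cuts, axis=1)
    for band in np.split(arr, row_cuts, axis=0)
]

for band in blocks:
    for block in band:
        print(block.shape)
```

Adding more positions to row_cuts or col_cuts produces a finer grid, and blocks[i][j] is the block in band i, column group j.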
Are you looking for something like MATLAB's mat2cell? Then you could do:
import numpy as np
def ndsplit(a, splits):
    assert len(splits) <= a.ndim
    splits = [np.r_[0, s, m] for s, m in zip(splits, a.shape)]
    return np.frompyfunc(lambda *x: a[tuple(slice(s[i],s[i+1]) for s, i in zip(splits, x))], len(splits), 1)(*np.indices(tuple(len(s) - 1 for s in splits)))
# demo
a = np.arange(56).reshape(7, 8)
print(ndsplit(a, [(2, 4), (1, 5, 6)]))
# [[array([[0],
# [8]])
# array([[ 1, 2, 3, 4],
# [ 9, 10, 11, 12]])
# array([[ 5],
# [13]]) array([[ 6, 7],
# [14, 15]])]
# [array([[16],
# [24]])
# array([[17, 18, 19, 20],
# [25, 26, 27, 28]])
# array([[21],
# [29]]) array([[22, 23],
# [30, 31]])]
# [array([[32],
# [40],
# [48]])
# array([[33, 34, 35, 36],
# [41, 42, 43, 44],
# [49, 50, 51, 52]])
# array([[37],
# [45],
# [53]])
# array([[38, 39],
# [46, 47],
# [54, 55]])]]
I'm working with numpy 1.6.2 and python 2.7.
I am given an N x M x D matrix A and a matrix I that contains a list of indices.
I have to fill a zeros matrix ACopy with sums of elements of A according to the indices found in I (see code).
Here is my code:
ACopy = zeros(A.shape)
for j in xrange(0, size(A, 0)):
    i = I[j]
    ACopy[j, i, :] = A[j, i, :] + A[j, i + 1, :]
Indices matrix:
I = array([2, 0, 3, 2, 1])
A matrix:
A = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]],
[[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]],
[[30, 31, 32],
[33, 34, 35],
[36, 37, 38],
[39, 40, 41],
[42, 43, 44]],
[[45, 46, 47],
[48, 49, 50],
[51, 52, 53],
[54, 55, 56],
[57, 58, 59]],
[[60, 61, 62],
[63, 64, 65],
[66, 67, 68],
[69, 70, 71],
[72, 73, 74]]])
I tried to improve my code by avoiding the for loop, like this:
r = r_[0:len(I)]
ACopy[r, I, :] = A[r, I, :] + A[r, I + 1, :]
I noticed that the output matrices ACopy are different and I can't understand why. Any idea?
Thank you all!
EDIT: I'm computing a lot of matrices and I compare them with np.array_equal(ACopy1, ACopy2), where ACopy1 is the output of the first method and ACopy2 the output of the second method. Sometimes the matrices are the same, but not every time. Should the output of the two methods always be the same, or are there borderline cases?
EDIT2: I noticed that this strange behaviour happens only when the matrix height is greater than 256.
Here is my test suite:
from numpy import *
w = 5
h = 257
for i in xrange(1000):
    Z = random.rand(w, h, 5)
    I = (random.rand(w) * h - 1).astype(uint8)
    r = r_[0:w]
    ZCopy = zeros(Z.shape)
    ZCopy2 = zeros(Z.shape)
    for j in xrange(0, size(Z, 0)):
        i = I[j]
        ZCopy[j, i, :] = Z[j, i, :] + Z[j, i + 1, :]
    ZCopy2[r, I, :] = Z[r, I, :] + Z[r, I + 1, :]
    if (ZCopy - ZCopy2).any() > 0:
        print(ZCopy, ZCopy2, I)
        raise ValueError
I found the problem!
I cast the matrix I to uint8, so its elements are limited to the range 0 to 255, and in the vectorized version I + 1 wraps around to 0 whenever an element is 255.
I resolved it using I = (random.rand(w) * h - 1).astype(uint32)
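The wraparound is easy to reproduce in isolation. This is a minimal sketch written for current Python and NumPy rather than the versions used in the question; the names I8, wrapped and safe are assumptions made for illustration.

```python
import numpy as np

# Index array stored as uint8, with one element at the maximum value 255
I8 = np.array([2, 0, 255], dtype=np.uint8)

# Adding 1 keeps the uint8 dtype, so 255 + 1 wraps around to 0
wrapped = I8 + 1

# Converting to a wider integer type first gives the intended index 256
safe = I8.astype(np.intp) + 1

print(wrapped, wrapped.dtype)
print(safe, safe.dtype)
```

Keeping index arrays in the platform index type (np.intp) from the start avoids this class of bug regardless of the array height.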