I looked into other posts about indexing a numpy array with another numpy array, but still could not work out how to accomplish the following:
a = [[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
b = [[[1,0],[0,1]],[[1,1],[0,1]]]
a[b] = [[[7,8,9],[4,5,6]],[[10,11,12],[4,5,6]]]
a is an image represented by a 3D numpy array of shape 2 * 2 * 3, with RGB values in the last dimension. b contains, for each pixel, the index of the pixel in the original image that it should take its value from. For instance, pixel (0,0) should map to index (1,0) of the original image, which gives pixel values [7,8,9]. Is there a way to achieve this? Thanks!
Here's one way:
In [54]: a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
In [55]: b = np.array([[[1, 0], [0, 1]], [[1, 1], [0, 1]]])
In [56]: a[b[:, :, 0], b[:, :, 1]]
Out[56]:
array([[[ 7, 8, 9],
[ 4, 5, 6]],
[[10, 11, 12],
[ 4, 5, 6]]])
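The same lookup can be written without spelling out each index column. This is only a minimal sketch, assuming the last axis of b always holds (row, column) pairs; the names rows, cols and picked are made up for illustration:

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
b = np.array([[[1, 0], [0, 1]], [[1, 1], [0, 1]]])

# Move the (row, col) axis to the front and unpack it into two index arrays.
rows, cols = np.moveaxis(b, -1, 0)

# Two integer arrays index the first two axes of a; the RGB axis is kept whole.
picked = a[rows, cols]
print(picked.shape)  # (2, 2, 3)
```

Here rows is the same array as b[:, :, 0] and cols the same as b[:, :, 1], so the result matches the answer above.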
I have a matrix of indices I, e.g.
I = np.array([[1, 0, 2], [2, 1, 0]])
Each index in the i-th row of I selects an element from the i-th row of another matrix M.
So having M e.g.
M = np.array([[6, 7, 8], [9, 10, 11])
M[I] should select:
[[7, 6, 8], [11, 10, 9]]
I could have:
I1 = np.repeat(np.arange(0, I.shape[0]), I.shape[1])
I2 = np.ravel(I)
Result = M[I1, I2].reshape(I.shape)
but this looks very complicated and I am looking for a more elegant solution. Preferably without flattening and reshaping.
In the example I used numpy, but I am actually using jax. So if there is a more efficient solution in jax, feel free to share.
I had to add a ']' to M.
In [108]: I = np.array([[1, 0, 2], [2, 1, 0]])
...: M = np.array([[6, 7, 8], [9, 10, 11]])
...:
...: I,M
Out[108]:
(array([[1, 0, 2],
[2, 1, 0]]),
array([[ 6, 7, 8],
[ 9, 10, 11]]))
Advanced indexing with broadcasting:
In [110]: M[np.arange(2)[:,None],I]
Out[110]:
array([[ 7, 6, 8],
[11, 10, 9]])
The first index has shape (2,1), which broadcasts with the (2,3) shape of I to select a (2,3) block of values.
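To make the pairing of shapes visible, here is a small sketch that expands both index arrays explicitly with np.broadcast_arrays. The names rows, rows_b and cols_b are only illustrative; NumPy does this expansion implicitly when indexing:

```python
import numpy as np

I = np.array([[1, 0, 2], [2, 1, 0]])
M = np.array([[6, 7, 8], [9, 10, 11]])

rows = np.arange(M.shape[0])[:, None]           # shape (2, 1): one row number per row
rows_b, cols_b = np.broadcast_arrays(rows, I)   # both now have shape (2, 3)

print(rows_b)
# [[0 0 0]
#  [1 1 1]]

# Element [i, j] of the result is M[rows_b[i, j], cols_b[i, j]].
result = M[rows_b, cols_b]                      # same values as M[rows, I]
```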
How about this one-liner? The idea is to enumerate the rows of M together with their row numbers, so that each row can be indexed with the corresponding row of the indexing matrix.
import numpy as np
I = np.array([[1, 0, 2], [2, 1, 0]])
M = np.array([[6, 7, 8], [9, 10, 11]])
Result = np.array([row[I[i]] for i, row in enumerate(M)])
print(Result)
Output:
[[ 7 6 8]
[11 10 9]]
np.take_along_axis can also be used here to take values of M using indices I over axis=1:
>>> np.take_along_axis(M, I, axis=1)
array([[ 7, 6, 8],
[11, 10, 9]])
I am dealing with very large multi-dimensional data, but let me take a 2D array as an example. Given a value array that changes every iteration,
arr = np.array([[ 1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]) # a*b
and an index array that stays fixed,
idx = np.array([[[0, 1, 1], [-1, -1, -1]],
[[5, 1, 3], [1, -1, -1]]]) # n*h*w, where n = a*b,
Here -1 means that no index is applied. I wish to get the result
res = np.array([[1+2+2, 0],
[5+2+4, 2]]) # h*w
In practice, I am working with a very large 3D tensor (n ~ trillions) and a very sparse idx (i.e. lots of -1). As idx is fixed, my current solution is to pre-compute an n*(h*w) array index_tensor filled with 0s and 1s, and then do
tmp = arr.reshape(1, n)
res = (tmp @ index_tensor).reshape([h,w])
It works fine but takes a huge amount of memory to store index_tensor. Is there an approach that takes advantage of the sparsity and immutability of idx to reduce the memory cost while keeping a fair running speed in Python (numpy or pytorch would be best)? Thanks in advance!
Ignoring the -1 complication for the moment, the straightforward indexing and summation is:
In [58]: arr = np.array([[ 1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])
In [59]: idx = np.array([[[0, 1, 1], [2, 4, 6]],
...: [[5, 1, 3], [1, -1, -1]]])
In [60]: arr.flat[idx]
Out[60]:
array([[[1, 2, 2],
[3, 5, 6]],
[[5, 2, 4],
[2, 9, 9]]])
In [61]: _.sum(axis=-1)
Out[61]:
array([[ 5, 14],
[11, 20]])
One way (not necessarily fast or memory efficient) of dealing with the -1 is with a masked array:
In [62]: mask = idx<0
In [63]: mask
Out[63]:
array([[[False, False, False],
[False, False, False]],
[[False, False, False],
[False, True, True]]])
In [65]: ma = np.ma.masked_array(Out[60],mask)
In [67]: ma
Out[67]:
masked_array(
data=[[[1, 2, 2],
[3, 5, 6]],
[[5, 2, 4],
[2, --, --]]],
mask=[[[False, False, False],
[False, False, False]],
[[False, False, False],
[False, True, True]]],
fill_value=999999)
In [68]: ma.sum(axis=-1)
Out[68]:
masked_array(
data=[[5, 14],
[11, 2]],
mask=[[False, False],
[False, False]],
fill_value=999999)
Masked arrays deal with operations like the sum by replacing the masked values with something neutral, such as 0 for the case of sums.
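As a rough check of that statement, the sketch below fills the masked slots with 0 by hand and compares the result with the masked sum. It reuses the arr and idx from this answer; by_fill and by_mask are illustrative names only:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])
idx = np.array([[[0, 1, 1], [2, 4, 6]],
                [[5, 1, 3], [1, -1, -1]]])

vals = arr.flat[idx]                      # -1 wraps around to the last element (9)
ma = np.ma.masked_array(vals, idx < 0)    # hide the entries that came from -1

# Replacing masked slots with 0 and summing gives the same totals as ma.sum.
by_fill = ma.filled(0).sum(axis=-1)
by_mask = ma.sum(axis=-1)
print(by_fill)
# [[ 5 14]
#  [11  2]]
```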
(I may revisit this in the morning).
sum with matrix product
In [72]: np.einsum('ijk,ijk->ij',Out[60],~mask)
Out[72]:
array([[ 5, 14],
[11, 2]])
This is more direct, and faster, than the masked array approach.
You haven't elaborated on constructing the index_tensor so I won't try to compare it.
Another possibility is to pad the array with a 0, and adjust indexing:
In [83]: arr1 = np.hstack((0,arr.ravel()))
In [84]: arr1
Out[84]: array([0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9])
In [85]: arr1[idx+1]
Out[85]:
array([[[1, 2, 2],
[3, 5, 6]],
[[5, 2, 4],
[2, 0, 0]]])
In [86]: arr1[idx+1].sum(axis=-1)
Out[86]:
array([[ 5, 14],
[11, 2]])
sparse
A first stab at using a sparse matrix:
Reshape idx to 2d:
In [141]: idx1 = np.reshape(idx,(4,3))
Make a sparse matrix from that. For a start I'll go with the iterative lil approach, though constructing coo (or even csr) inputs directly is usually faster:
In [142]: M = sparse.lil_matrix((4,10),dtype=int)
...: for i in range(4):
...:     for j in range(3):
...:         v = idx1[i,j]
...:         if v>=0:
...:             M[i,v] = 1
...:
In [143]: M
Out[143]:
<4x10 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in List of Lists format>
In [144]: M.A
Out[144]:
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]])
This can then be used for a sum of products:
In [145]: M@arr.ravel()
Out[145]: array([ 3, 14, 11, 2])
Using M.A@arr.ravel() is essentially what you do. While M is sparse, arr is not. For this case M.A@ is faster than M@.
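Note that assigning M[i,v] = 1 records an index that is repeated within a group only once, which is why the first entry above is 3 rather than 1+2+2 = 5. The direct coo construction mentioned earlier could look roughly like the sketch below. It assumes scipy is available and that idx has shape (h, w, k); the names keep, out_rows, src_cols and S are made up for illustration:

```python
import numpy as np
from scipy import sparse

arr = np.array([[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])
idx = np.array([[[0, 1, 1], [2, 4, 6]],
                [[5, 1, 3], [1, -1, -1]]])

h, w, k = idx.shape
idx1 = idx.reshape(h * w, k)
keep = idx1 >= 0                                   # drop the -1 placeholders

# Output row number for every kept entry, and the source position it reads from.
out_rows = np.broadcast_to(np.arange(h * w)[:, None], idx1.shape)[keep]
src_cols = idx1[keep]
ones = np.ones(src_cols.size, dtype=arr.dtype)

# Duplicate (row, col) pairs are summed on conversion to CSR,
# so an index listed twice in a group gets weight 2.
S = sparse.coo_matrix((ones, (out_rows, src_cols)),
                      shape=(h * w, arr.size)).tocsr()

res = (S @ arr.ravel()).reshape(h, w)
print(res)
# [[ 5 14]
#  [11  2]]
```

Since idx is fixed, S only has to be built once; each iteration then costs one sparse product, and only the non-negative entries of idx are stored.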
I want to initialise a numpy array of a specific shape such that, when I append numbers to it, it will 'fill up' in that shape.
The length of the array will vary (I do not mind how long it is), but I want it to have 4 columns. Ideally something similar to the following:
array = np.array([:, 4])
print(array)
array = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
Again, the actual length of the array would not be defined. That way, if I were to append a different array, it would work as follows:
test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
array = np.append(array, test_array)
print(array)
array = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
Is there any way to do this?
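If I understand your issue correctly, I think you do not need to initialize your array.
You should first check that your array size is divisible by 4.
import numpy as np
# the 16 values from the question, as an array
test_array = np.arange(1, 17)
cols = 4
# reshape only works if the values fill whole rows
assert test_array.size % cols == 0
# integer division: reshape needs integer dimensions
rows = test_array.size // cols
my_array = np.reshape(test_array, (rows, cols))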
The kind of behavior that you seek is unusual. You should explain why you need it. If you want something that grows readily, use a Python list. numpy arrays have a fixed size. Values can be assigned to an array in various ways, but to grow it, you need to create a new array with some version of concatenate. (Yes, there is a resize function/method, but that's not commonly used.)
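For the list route, a minimal sketch might look like this. The chunks being appended are invented for illustration, and it assumes the final count is a multiple of 4:

```python
import numpy as np

values = []                      # a Python list grows cheaply
for chunk in ([1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]):
    values.extend(chunk)         # append however many numbers arrive

# Convert once at the end; -1 lets NumPy work out the number of rows.
table = np.array(values).reshape(-1, 4)
print(table.shape)  # (4, 4)
```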
I'll illustrate the value assignment options:
Initialize an array with a known size. In your case the 5 could be larger than anticipated, and the 4 is the desired number of 'columns'.
In [1]: arr = np.zeros((5,4), dtype=int)
In [2]: arr
Out[2]:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
Assign 4 values to one row:
In [3]: arr[0] = [1,2,3,4]
Assign 3 values starting at a given point in a flat view of the array:
In [4]: arr.flat[4:7] = [1,2,3]
In [5]: arr
Out[5]:
array([[1, 2, 3, 4],
[1, 2, 3, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
This array, while defined with shape (5,4), can be viewed as a (20,) 1d array. I had to choose the appropriate slice values in the flat view.
More commonly we assign values to a block of rows (or a variety of other indexed areas). arr[2:, :] is a (3,4) portion of arr, so we need to assign a (3,4) array to it (or an equivalent list structure). To get the full benefit of this sort of assignment you need to read up on broadcasting.
In [6]: arr[2:,:] = np.reshape(list(range(10,22)),(3,4))
In [7]: arr
Out[7]:
array([[ 1, 2, 3, 4],
[ 1, 2, 3, 0],
[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]])
In [8]: arr.ravel()
Out[8]:
array([ 1, 2, 3, 4, 1, 2, 3, 0, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21])
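As a small illustration of the broadcasting mentioned above, the value being assigned does not need the full (3,4) shape as long as it broadcasts to it. The values here are arbitrary:

```python
import numpy as np

arr = np.zeros((5, 4), dtype=int)

arr[2:, :] = [10, 20, 30, 40]        # a (4,) list is repeated down the 3 rows
arr[:2, :] = np.array([[1], [2]])    # a (2, 1) column is repeated across the 4 columns
print(arr)
# [[ 1  1  1  1]
#  [ 2  2  2  2]
#  [10 20 30 40]
#  [10 20 30 40]
#  [10 20 30 40]]
```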
Let's say we have a 3D array like:
array = np.arange(8).reshape(2,2, 2)
new_array = np.zeros((2, 2, 2))
and let's assume we have some new random x,y,z indices for our array
x,y,z = np.meshgrid(array, array, array)
What is the fastest way to re-index our array?
A simple solution given here:
for x in range(0, 2):
    for y in range(0, 2):
        for z in range(0, 2):
            new_x = x_coord[x,y,z]
            new_y = y_coord[x,y,z]
            new_z = z_coord[x,y,z]
            new_array[x,y,z] = array[new_x, new_y, new_z]
Is there a one-liner for this that I am not aware of?
EDIT
Yes, there is... very easy:
vol = np.arange(8).reshape(2,2, 2)
arr = np.arange(2)
x,y,z = np.meshgrid(arr, arr, arr)
print(vol)
print(vol[y, x, z]) ### ---> You have to swap the axes here tho. Does anyone know why?
[[[0 1]
[2 3]]
[[4 5]
[6 7]]]
[[[0 1]
[2 3]]
[[4 5]
[6 7]]]
Also, it is very slow. Any ideas how to improve the performance?
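On the axis swap: np.meshgrid defaults to indexing='xy' (Cartesian order), in which the first output varies along axis 1 and the second along axis 0. Passing indexing='ij' gives outputs in array-index order. A short sketch comparing the two, with i, j, k as illustrative names:

```python
import numpy as np

vol = np.arange(8).reshape(2, 2, 2)
arr = np.arange(2)

# Default indexing='xy': x varies along axis 1, y along axis 0.
x, y, z = np.meshgrid(arr, arr, arr)
# indexing='ij': the n-th output varies along axis n, matching index order.
i, j, k = np.meshgrid(arr, arr, arr, indexing='ij')

print(np.array_equal(vol[y, x, z], vol))  # True
print(np.array_equal(vol[i, j, k], vol))  # True
```

With the default ordering, vol[x, y, z] returns vol with its first two axes swapped, which is why the swap was needed above.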
Setup:
In [54]: arr = np.arange(9).reshape(3,3)
In [55]: x = np.random.randint(0,3,(3,3))
In [56]: y = np.random.randint(0,3,(3,3))
In [57]: x
Out[57]:
array([[2, 0, 1],
[0, 2, 1],
[0, 0, 1]])
In [58]: y
Out[58]:
array([[0, 0, 0],
[0, 1, 1],
[0, 1, 0]])
The simplest application of these indexing arrays:
In [59]: arr[x,y]
Out[59]:
array([[6, 0, 3],
[0, 7, 4],
[0, 1, 3]])
The iterative equivalent:
In [60]: out = np.empty_like(arr)
In [61]: for i in range(3):
...:     for j in range(3):
...:         out[i,j] = arr[x[i,j], y[i,j]]
...:
In [62]: out
Out[62]:
array([[6, 0, 3],
[0, 7, 4],
[0, 1, 3]])
Your code isn't the same, because it is modifying the source array as it iterates:
In [63]: arr1 = arr.copy()
In [64]: for i in range(3):
...:     for j in range(3):
...:         arr1[i,j] = arr1[x[i,j], y[i,j]]
...:
In [65]: arr1
Out[65]:
array([[6, 6, 3],
[6, 7, 7],
[6, 6, 6]])
There isn't a simple equivalent.
You can index with arr[x_coord,y_coord,z_coord] as long as the indexing arrays broadcast together. When they all have the same shape, that is trivial.
In [68]: x1 = np.random.randint(0,3,(2,4))
In [69]: x1
Out[69]:
array([[2, 0, 2, 0],
[0, 0, 0, 2]])
In [70]: arr[x1,x1]
Out[70]:
array([[8, 0, 8, 0],
[0, 0, 0, 8]])
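Applied to the 3D case from the question, this might look like the sketch below. The coordinate arrays are random stand-ins (the question does not show how x_coord, y_coord and z_coord are built), and the loop is included only to check the one-line result:

```python
import numpy as np

rng = np.random.default_rng(0)
vol = np.arange(8).reshape(2, 2, 2)

# Hypothetical coordinate arrays, one per axis, each with the output's shape.
x_coord, y_coord, z_coord = rng.integers(0, 2, size=(3, 2, 2, 2))

fast = vol[x_coord, y_coord, z_coord]        # one advanced-indexing call

slow = np.empty_like(vol)                    # explicit loop, writing to a new array
for i, j, k in np.ndindex(vol.shape):
    slow[i, j, k] = vol[x_coord[i, j, k], y_coord[i, j, k], z_coord[i, j, k]]

print(np.array_equal(fast, slow))  # True
```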
A simpler way of picking random values from an array is to create random row and column selectors, and use ix_ to create arrays that broadcast together:
In [71]: x1 = np.random.randint(0,3,(3))
In [72]: y1 = np.random.randint(0,3,(3))
In [75]: np.ix_(x1,y1)
Out[75]:
(array([[2],
[1],
[1]]), array([[2, 2, 1]]))
In [76]: arr[np.ix_(x1,y1)]
Out[76]:
array([[8, 8, 7],
[5, 5, 4],
[5, 5, 4]])
Almost sounds like you just want to shuffle the values of the array, like:
In [95]: arr
Out[95]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [96]: np.random.shuffle(arr.ravel())
In [97]: arr
Out[97]:
array([[0, 1, 2],
[7, 4, 3],
[6, 5, 8]])
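The in-place shuffle above relies on arr.ravel() returning a view, which is the case for a contiguous array like this one. If a shuffled copy is preferred, something along these lines should work, with shuffled as an illustrative name:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(9).reshape(3, 3)

# permutation returns a shuffled copy of the flattened values; arr is untouched.
shuffled = rng.permutation(arr.ravel()).reshape(arr.shape)
print(shuffled)
```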