Numpy Multidimensional Subsetting - python

I have searched long and hard for an answer to this question, but haven't found anything that quite fits the bill. I have a multidimensional numpy array containing data (in my case 3 dimensional) and another array (2 dimensional) that contains information on which value I want along the last dimension of the original array. For instance, here is a simple example illustrating the problem. I have an array a of data, and another array b containing indices along dimension 2 of a. I want a new two dimensional array c where c[i, j] = a[i, j, b[i, j]]. The only way that I can think to do it is with a loop, as outlined below. However, this seems clunky and slow.
In [3]: a = np.arange(8).reshape((2, 2, 2))
In [4]: a
Out[4]:
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
In [6]: b = np.array([[0, 1], [1, 1]])
In [8]: c = np.zeros_like(b)
In [9]: for i in range(2):
   ...:     for j in range(2):
   ...:         c[i, j] = a[i, j, b[i, j]]
In [10]: c
Out[10]:
array([[0, 3],
[5, 7]])
Is there a more pythonic way of doing this, perhaps some numpy indexing feature of which I am unaware?

When you fancy-index a multidimensional array with multidimensional arrays, the indices for each dimension are broadcast together. With that in mind, you can do:
>>> rows = np.arange(a.shape[0])
>>> cols = np.arange(a.shape[1])
>>> a[rows[:, None], cols, b]
array([[0, 3],
[5, 7]])
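As a hedged cross-check of the broadcasting rule above, the sketch below compares the loop from the question with the broadcast index arrays, and also with np.take_along_axis (added in NumPy 1.15, and not part of the answer itself). The names looped, fancy and taken are illustrative only.

```python
import numpy as np

a = np.arange(8).reshape((2, 2, 2))
b = np.array([[0, 1], [1, 1]])

# Reference result: the explicit double loop from the question.
looped = np.zeros_like(b)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j, b[i, j]]

# Broadcast indexing: index arrays of shape (2, 1), (2,) and (2, 2)
# broadcast together to the common shape (2, 2).
rows = np.arange(a.shape[0])[:, None]
cols = np.arange(a.shape[1])
fancy = a[rows, cols, b]

# take_along_axis needs an index array with the same ndim as `a`,
# so add a trailing axis and drop it again afterwards.
taken = np.take_along_axis(a, b[..., None], axis=2)[..., 0]

print(fancy)
```

All three results should agree; the broadcast form generalizes to any leading shape as long as the index arrays broadcast to the shape of b.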

In [40]: a = np.arange(8).reshape((2, 2, 2))
In [41]: b = np.array([[0, 1], [1, 1]])
In [42]: i = np.array([[0,0],[1,1]])
In [43]: a[i,i.T,b]
Out[43]:
array([[0, 3],
[5, 7]])
or using ix_ to generate the indexes:
In [47]: j = np.ix_([0,1],[0,1])
In [48]: a[j[0],j[1],b]
Out[48]:
array([[0, 3],
[5, 7]])
In [49]: j
Out[49]:
(array([[0],
[1]]), array([[0, 1]]))
or with ogrid
In [101]: i = list(np.ogrid[0:2,0:2])
In [102]: i.append(b)
In [103]: a[tuple(i)]
Out[103]:
array([[0, 3],
[5, 7]])

Related

Selecting whole subarrays given a multidimensional index [duplicate]

For example, I have two numpy arrays,
A = np.array(
[[0,1],
[2,3],
[4,5]])
B = np.array(
[[1],
[0],
[1]], dtype='int')
and I want to extract one element from each row of A, and that element is indexed by B, so I want the following results:
C = np.array(
[[1],
[2],
[5]])
I tried A[:, B.ravel()], but it'll broadcast B, not what I want. Also looked into np.take, seems not the right solution to my problem.
However, I could use np.choose by transposing A,
np.choose(B.ravel(), A.T)
but is there a better solution?
You can use NumPy's purely integer array indexing -
A[np.arange(A.shape[0]),B.ravel()]
Sample run -
In [57]: A
Out[57]:
array([[0, 1],
[2, 3],
[4, 5]])
In [58]: B
Out[58]:
array([[1],
[0],
[1]])
In [59]: A[np.arange(A.shape[0]),B.ravel()]
Out[59]: array([1, 2, 5])
Please note that if B is a 1D array or a list of such column indices, you could simply skip the flattening operation with .ravel().
Sample run -
In [186]: A
Out[186]:
array([[0, 1],
[2, 3],
[4, 5]])
In [187]: B
Out[187]: [1, 0, 1]
In [188]: A[np.arange(A.shape[0]),B]
Out[188]: array([1, 2, 5])
C = np.array([A[i][j] for i,j in enumerate(B)])
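If the column shape of C from the question should be kept, one possible alternative (an assumption on my part, not something the answers above use) is np.take_along_axis, available from NumPy 1.15. The sketch below compares it with the integer array indexing shown earlier.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 3],
              [4, 5]])
B = np.array([[1],
              [0],
              [1]], dtype=int)

# Integer array indexing: one (row, column) pair per row, giving a 1D result.
flat = A[np.arange(A.shape[0]), B.ravel()]

# take_along_axis keeps the (3, 1) shape of B in the result.
C = np.take_along_axis(A, B, axis=1)

print(flat)  # 1D result
print(C)     # column result, same shape as B
```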

Is there a better way to vstack a numpy array from an empty array, like a list array?

I wish to vstack a numpy.array (like building a list), but I cannot initialize the numpy.array with the correct shape to use numpy.append (numpy.empty, numpy.zeros, numpy.empty_like, etc. did not do the trick). Finally, I came up with the two pieces of code below. Is there something more pythonic? I am using Python 3.6.9.
import numpy as np
a=[]
n=4
for i in range(n):
    '''
    some calculation resulting, for example, in a numpy.array([i,i+1,i+2])
    '''
    a.append(np.array([i,i+1,i+2]))
a=np.array(a).reshape(n,3)
print(a)
or, because I prefer to maintain it as a numpy array inside the loop:
import numpy as np
a=np.array([])
n=4
for i in range(n):
    '''
    some calculation resulting, for example, in a numpy.array [i,i+1,i+2]
    '''
    if a.size == 0:
        a=np.array([i,i+1,i+2])
    else:
        a=np.vstack((a,np.array([i,i+1,i+2])))
print(a)
both output:
[[0 1 2]
[1 2 3]
[2 3 4]
[3 4 5]]
Your first use, with list append:
In [146]: alist=[]
In [147]: for i in range(4):
     ...:     alist.append(np.arange(i,i+3))
     ...:
In [148]: alist
Out[148]: [array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5])]
and make the array:
In [149]: np.array(alist)
Out[149]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
or since vstack is happy with a list of arrays:
In [150]: np.vstack(alist)
Out[150]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
You could use vstack in the loop:
In [151]: arr = np.zeros((0,3),int)
In [152]: for i in range(4):
     ...:     arr = np.vstack((arr, np.arange(i,i+3)))
     ...:
In [153]: arr
Out[153]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
This has two problems:
It is slower: list append operates in place, simply adding a pointer to the list, whereas vstack makes a whole new array each time, with a full copy!
It is harder to initialize, as you found out. You actually have to understand array shapes, and what concatenate does when it combines 2 or more arrays. Here I started with a (0,3) array.
np.array([np.arange(i, i+3) for i in range(n)])
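When the number of rows is known before the loop starts, a further option (not mentioned in the answers above, so treat it as a sketch under that assumption) is to preallocate the result and assign one row per iteration. This avoids both the repeated copying of vstack and the intermediate list.

```python
import numpy as np

n = 4

# Preallocate the final (n, 3) array once; np.empty leaves the values
# uninitialized, which is fine because every row is overwritten below.
arr = np.empty((n, 3), dtype=int)
for i in range(n):
    # Stand-in for "some calculation" that yields a length-3 array.
    arr[i] = np.arange(i, i + 3)

# For this particular toy calculation the loop can be replaced entirely
# by broadcasting a column of offsets against a row of steps.
vectorized = np.arange(n)[:, None] + np.arange(3)

print(arr)
```

If the row count is not known in advance, collecting arrays in a list and calling np.array or np.vstack once at the end, as shown above, remains the simpler choice.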

using boolean array for indexing in numpy for 2D arrays

I use boolean indexing to select elements from a numpy array as
x = y[t<tmax]
where t is a numpy array with as many elements as y. My question is: how can I do the same with 2D numpy arrays? I tried
x = y[t<tmax][t<tmax]
This does not seem to work however since it seems to select first the rows and then complains that the second selection has the wrong dimension.
IndexError: boolean index did not match indexed array along dimension 0; dimension is 50 but corresponding boolean dimension is 200
Here is an example
x1D = np.array([1,2,3], np.int32)
x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
print(x1D[x1D<3]) --> [1 2]
print(x2D[x1D<3][x1D<3]) --> error
The second print statement produces an error similar to the error shown above. If I use
print(x2D[x1D<3])
I get
[[1 2 3]
[1 2 3]]
but I want
[[1 2]
[1 2]]
In [28]: x1D = np.array([1,2,3], np.int32)
...: x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
The 1d mask:
In [29]: x1D<3
Out[29]: array([ True, True, False])
applied to the 1d array (same size):
In [30]: x1D[_]
Out[30]: array([1, 2], dtype=int32)
applied to the 2d it selects 2 rows:
In [31]: x2D[_29]
Out[31]:
array([[1, 2, 3],
[1, 2, 3]], dtype=int32)
It can be used again to select columns - but note the : place holder for the row index:
In [32]: _[:, _29]
Out[32]:
array([[1, 2],
[1, 2]], dtype=int32)
If we generate an indexing array from that mask, we can do the indexing with one step:
In [37]: idx = np.nonzero(x1D<3)
In [38]: idx
Out[38]: (array([0, 1]),)
In [39]: x2D[idx[0][:,None], idx[0]]
Out[39]:
array([[1, 2],
[1, 2]], dtype=int32)
An alternate way of writing this '2d' indexing:
In [41]: x2D[ [[0],[1]], [[0,1]] ]
Out[41]:
array([[1, 2],
[1, 2]], dtype=int32)
ix_ is a convenient tool for tweaking the indexing dimensions:
In [42]: x2D[np.ix_(idx[0], idx[0])]
Out[42]:
array([[1, 2],
[1, 2]], dtype=int32)
Or passing the boolean mask to ix_:
In [44]: np.ix_(_29, _29)
Out[44]:
(array([[0],
[1]]), array([[0, 1]]))
In [45]: x2D[np.ix_(_29, _29)]
Out[45]:
array([[1, 2],
[1, 2]], dtype=int32)
Writing In[32] so it's close to your try:
In [46]: x2D[x1D<3][:, x1D<3]
Out[46]:
array([[1, 2],
[1, 2]], dtype=int32)
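The examples above reuse one mask for both axes. As a minimal sketch, assuming a non-square array with a separate mask per axis (the names y, row_mask and col_mask are made up for illustration), the same two approaches look like this:

```python
import numpy as np

y = np.arange(12).reshape(3, 4)
row_mask = np.array([True, False, True])          # one entry per row
col_mask = np.array([False, True, True, False])   # one entry per column

# Two-step selection: rows first, then columns with ':' as row placeholder.
two_step = y[row_mask][:, col_mask]

# One-step selection: np.ix_ turns the boolean masks into an open mesh
# of integer indices that broadcast against each other.
one_step = y[np.ix_(row_mask, col_mask)]

print(one_step)
```

Each mask has to match the length of the axis it is applied to, which is what the IndexError in the question is complaining about.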

Numpy sort two arrays together with one array as the keys in axis 1 [duplicate]

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[1,3,9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
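To make the 'short hand' remark concrete, the sketch below uses np.broadcast_arrays to expand the column index into the full j array. This is only an illustration of what broadcasting does implicitly; the indexing itself never needs the expanded array.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
i = np.argsort(a, axis=1)

# Column-shaped row index, shape (2, 1).
rows = np.arange(a.shape[0])[:, None]

# Explicitly broadcast the two index arrays to their common shape (2, 3).
j, i_full = np.broadcast_arrays(rows, i)

print(j)                 # the full row index for every element
print(a[rows, i])        # indexing with the short-hand column
print(a[j, i])           # indexing with the expanded array
```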
def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
       [2, 8, 9]])
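Since the question also asks to rearrange a second array b in the same way, here is a short sketch of how the same argsort result could be reused with take_along_axis. It assumes a and b have the same shape, as stated in the question; the variable name order is illustrative.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

# Compute the per-row sort order once, from the key array.
order = np.argsort(a, axis=-1)

# Apply that order to the keys and to any other array of the same shape.
a_sorted = np.take_along_axis(a, order, axis=-1)
b_sorted = np.take_along_axis(b, order, axis=-1)

print(a_sorted)
print(b_sorted)
```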
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
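The linear-index construction can be checked on the small example: for a C-contiguous (M, N) array, element (r, c) sits at flat position r*N + c. The sketch below only verifies that both approaches agree; it makes no claim about timings, which will depend on the NumPy version and machine.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])
M, N = a.shape

# Per-row column order, then shift each row by its offset in the flat array.
cols = a.argsort(axis=1)
linear = cols + np.arange(M)[:, None] * N

# Index the flattened array with the 2D array of flat positions.
via_linear = b.ravel()[linear]

# Same selection with a broadcast row index, for comparison.
via_fancy = b[np.arange(M)[:, None], cols]

print(linear)
print(via_linear)
```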

Numpy 2D array indexing by other 2D along specific axis

I have a 2D array:
>>> in_arr = np.array([[1,2],[4,3]])
array([[1, 2],
[4, 3]])
and I find the sorted indices by columns to yield another 2D array:
>>> col_sort = np.argsort(in_arr, axis=1)
array([[0, 1],
[1, 0]])
I would like to know the efficient numpy slice to index the first by the second:
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(in_arr, col_sort, axis=1)
array([[1, 2],
[3, 4]])
The intention is to then perform a (more complicated) function on the array by column, e.g.:
>>> arr_with_function = reordered_in_arr ** np.array([1,2])
array([[1, 4],
[3, 16]])
and return the elements to their original position in the array
>>> return_order = np.argsort(col_sort, axis=1)
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(arr_with_function, return_order, axis=1)
array([[1, 4],
[16, 3]])
OK, so thinking about it as I type I might just use apply_over_axis, but I would still like to know how to do the above efficiently in case it is of value later.
If you want to do all those operations in-place then you don't need argsort(). Numpy supports in-place operations in such situations:
In [12]: in_arr = np.array([[1,2],[4,3]])
In [13]: in_arr.sort(axis=1)
In [14]: in_arr **= [1, 2]
In [15]: in_arr
Out[15]:
array([[ 1, 4],
[ 3, 16]])
But if you need the indices of the sorted items you can get the expected result with a simple indexing.
In [18]: in_arr[np.arange(2)[:,None], col_sort]
Out[18]:
array([[1, 2],
[3, 4]])
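For the full round trip described in the question (sort each row, apply a function, then put the elements back), one way to sketch it is with np.take_along_axis from NumPy 1.15 or later, which has the (array, indices, axis) shape of the SOME_SLICE_METHOD placeholder. This is an illustration under that assumption, not the method used in the answer above.

```python
import numpy as np

in_arr = np.array([[1, 2], [4, 3]])

# Sort order within each row, and the rows reordered accordingly.
col_sort = np.argsort(in_arr, axis=1)
reordered = np.take_along_axis(in_arr, col_sort, axis=1)

# Apply the per-column function to the sorted rows.
arr_with_function = reordered ** np.array([1, 2])

# The argsort of a permutation is its inverse, so this undoes the sort.
return_order = np.argsort(col_sort, axis=1)
restored = np.take_along_axis(arr_with_function, return_order, axis=1)

print(restored)
```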
