This question already has answers here:
Indexing one array by another in numpy
(4 answers)
Closed 6 years ago.
For example, I have two numpy arrays,
A = np.array(
[[0,1],
[2,3],
[4,5]])
B = np.array(
[[1],
[0],
[1]], dtype='int')
and I want to extract one element from each row of A, and that element is indexed by B, so I want the following results:
C = np.array(
[[1],
[2],
[5]])
I tried A[:, B.ravel()], but it'll broadcast B, not what I want. Also looked into np.take, seems not the right solution to my problem.
However, I could use np.choose by transposing A,
np.choose(B.ravel(), A.T)
but any other better solution?
You can use NumPy's purely integer array indexing -
A[np.arange(A.shape[0]),B.ravel()]
Sample run -
In [57]: A
Out[57]:
array([[0, 1],
[2, 3],
[4, 5]])
In [58]: B
Out[58]:
array([[1],
[0],
[1]])
In [59]: A[np.arange(A.shape[0]),B.ravel()]
Out[59]: array([1, 2, 5])
Please note that if B is a 1D array or a list of such column indices, you could simply skip the flattening operation with .ravel().
Sample run -
In [186]: A
Out[186]:
array([[0, 1],
[2, 3],
[4, 5]])
In [187]: B
Out[187]: [1, 0, 1]
In [188]: A[np.arange(A.shape[0]),B]
Out[188]: array([1, 2, 5])
C = np.array([A[i][j] for i,j in enumerate(B)])
Related
I use boolean indexing to select elements from a numpy array as
x = y[t<tmax]
where t a numpy array with as many elements as y. My question is how can I do the same with 2D numpy arrays? I tried
x = y[t<tmax][t<tmax]
This does not seem to work however since it seems to select first the rows and then complains that the second selection has the wrong dimension.
IndexError: boolean index did not match indexed array along dimension 0; dimension is 50 but corresponding boolean dimension is 200
#
Here is an example
x1D = np.array([1,2,3], np.int32)
x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
print(x1D[x1D<3]) --> [1 2]
print(x2D[x1D<3][x1D<3]) --> error
The second print statement produces an error similar to the error shown above. I use
print(x2D[x1D<3])
I get
[[1 2 3]
[1 2 3]]
but I want
[[1 2]
[1 2]]
In [28]: x1D = np.array([1,2,3], np.int32)
...: x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
The 1d mask:
In [29]: x1D<3
Out[29]: array([ True, True, False])
applied to the 1d array (same size):
In [30]: x1D[_]
Out[30]: array([1, 2], dtype=int32)
applied to the 2d it selects 2 rows:
In [31]: x2D[_29]
Out[31]:
array([[1, 2, 3],
[1, 2, 3]], dtype=int32)
It can be used again to select columns - but note the : place holder for the row index:
In [32]: _[:, _29]
Out[32]:
array([[1, 2],
[1, 2]], dtype=int32)
If we generate an indexing array from that mask, we can do the indexing with one step:
In [37]: idx = np.nonzero(x1D<3)
In [38]: idx
Out[38]: (array([0, 1]),)
In [39]: x2D[idx[0][:,None], idx[0]]
Out[39]:
array([[1, 2],
[1, 2]], dtype=int32)
An alternate way of writing this '2d' indexing:
In [41]: x2D[ [[0],[1]], [[0,1]] ]
Out[41]:
array([[1, 2],
[1, 2]], dtype=int32)
ix_ is a convenient tool for tweaking the indexing dimensions:
In [42]: x2D[np.ix_(idx[0], idx[0])]
Out[42]:
array([[1, 2],
[1, 2]], dtype=int32)
Or passing the boolean mask to ix_:
In [44]: np.ix_(_29, _29)
Out[44]:
(array([[0],
[1]]), array([[0, 1]]))
In [45]: x2D[np.ix_(_29, _29)]
Out[45]:
array([[1, 2],
[1, 2]], dtype=int32)
Writing In[32] so it's close to to your try:
In [46]: x2D[x1D<3][:, x1D<3]
Out[46]:
array([[1, 2],
[1, 2]], dtype=int32)
I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[9,3,1]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
i = np.argsort(a, axis=1)
return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1 2 3]
[2 8 9]])
I found the answer here, with someone having the same problem. They key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Rumtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
I have a 2D array:
>>> in_arr = np.array([[1,2],[4,3]])
array([[1, 2],
[4, 3]])
and I find the sorted indices by columns to yield another 2D array:
>>> col_sort = np.argsort(in_arr, axis=1)
array([[0, 1],
[1, 0]])
I would like to know the efficient numpy slice to index the first by the second:
>>> redordered_in_arr = np.*SOME_SLICE_METHOD*(in_arr, col_sort, axis=1)
array([[1, 2],
[3, 4]])
The intention is to then perform a (more complicated) function on the array by column, e.g.:
>>> arr_with_function = reordered_in_arr ** np.array([1,2])
array([[1, 4],
[3, 16]])
and return the elements to their original position in the array
>>> return_order = np.argsort(col_sort, axis=1)
>>> redordered_in_arr = np.*SOME_SLICE_METHOD*(arr_with_function, return_order, axis=1)
array([[1, 4],
[16, 3]])
Ok so thinking about it as I type I might just use apply_over_axis, but I would still like know how to the above efficiently in case it is of value later..
If you want to do all those operations in-place then you don't need argsort(). Numpy supports in-place operations in such situations:
In [12]: in_arr = np.array([[1,2],[4,3]])
In [13]: in_arr.sort(axis=1)
In [14]: in_arr **= [1, 2]
In [15]: in_arr
Out[15]:
array([[ 1, 4],
[ 3, 16]])
But if you need the indices of the sorted items you can get the expected result with a simple indexing.
In [18]: in_arr[np.arange(2)[:,None], col_sort]
Out[18]:
array([[1, 2],
[3, 4]])
I want to reduce the dimensions of an array after converting it to a list
a = np.array([[1,2],[3,4]])
print a.shape
b = np.array([[1],[3,4]])
print b.shape
Output:
(2, 2)
(2,)
I want a to have the same shape as b i.e. (2,)
>>> a = np.array([[1,2],[3,4], None])[:2]
>>> a
array([[1, 2], [3, 4]], dtype=object)
>>> a.shape
(2,)
Works, though is probably the wrong way to do it (I'm a numpy newb).
Do you understand what b is?
b = np.array([[1],[3,4]])
print(repr(b))
array([[1], [3, 4]], dtype=object)
b is a 1d array with 2 elements, each a list. np.array does this way because the 2 sublists have different length, so it can't create a 2d array.
a = np.array([[1,2],[3,4]])
print(repr(a))
array([[1, 2],
[3, 4]])
Here the 2 sublists have the same length, so it can create a 2d array. Each element is an integer. np.array tries to create the highest dimensional array that the input allows.
Probably the best way to create another array like b is to make a copy, and insert the desired lists.
a1 = b.copy()
a1[0] = [1,2]
# a1[1] = [3,4]
print(repr(a1))
array([[1, 2], [3, 4]], dtype=object)
You have to use this convoluted method because you trying to do something 'unnatural'.
You comment about using vstack. Both work:
In [570]: np.vstack((a,b)) # (3,2) array
Out[570]:
array([[1, 2],
[3, 4],
[[1], [3, 4]]], dtype=object)
In [571]: np.vstack((a1,b)) # (2,2) array
Out[571]:
array([[[1, 2], [3, 4]],
[[1], [3, 4]]], dtype=object)
Your array b is little more than the original list in an array wrapper. Is that really what you need? The 2d a is a normal numpy array. b is an oddball construction.
I have searched long and hard for an answer to this question, but haven't found anything that quite fits the bill. I have a multidimensional numpy array containing data (in my case 3 dimensional) and another array (2 dimensional) that contains information on which value I want along the last dimension of the original array. For instance, here is a simple example illustrating the problem. I have an array a of data, and another array b containing indices along dimension 2 of a. I want a new two dimensional array c where c[i, j] = a[i, j, b[i, j]].The only way that I can think to do it is with a loop, as outlined below. However, this seems clunky and slow.
In [3]: a = np.arange(8).reshape((2, 2, 2))
In [4]: a
Out[4]:
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
In [6]: b = np.array([[0, 1], [1, 1]])
In [8]: c = np.zeros_like(b)
In [9]: for i in xrange(2):
...: for j in xrange(2):
...: c[i, j] = a[i, j, b[i, j]]
In [10]: c
Out[10]:
array([[0, 3],
[5, 7]])
Is there a more pythonic way of doing this, perhaps some numpy indexing feature of which I am unaware?
When you fancy-index a multidimensional array with multidimensional arrays, the indices for each dimension are broadcasted together. With that in mind, you can do:
>>> rows = np.arange(a.shape[0])
>>> cols = np.arange(a.shape[1])
>>> a[rows[:, None], cols, b]
array([[0, 3],
[5, 7]])
In [40]: a = np.arange(8).reshape((2, 2, 2))
In [41]: b = np.array([[0, 1], [1, 1]])
In [42]: i = np.array([[0,0],[1,1]])
In [43]: a[i,i.T,b]
Out[43]:
array([[0, 3],
[5, 7]])
or using ix_ to generate the indexes:
In [47]: j = np.ix_([0,1],[0,1])
In [48]: a[j[0],j[1],b]
Out[48]:
array([[0, 3],
[5, 7]])
In [49]: j
Out[49]:
(array([[0],
[1]]), array([[0, 1]]))
or with ogrid
In [101]: i = np.ogrid[0:2,0:2]
In [102]: i.append(b)
In [103]: a[i]
Out[103]:
array([[0, 3],
[5, 7]])