using boolean array for indexing in numpy for 2D arrays - python

I use boolean indexing to select elements from a numpy array as
x = y[t<tmax]
where t is a numpy array with as many elements as y. My question is how can I do the same with 2D numpy arrays? I tried
x = y[t<tmax][t<tmax]
This does not seem to work however since it seems to select first the rows and then complains that the second selection has the wrong dimension.
IndexError: boolean index did not match indexed array along dimension 0; dimension is 50 but corresponding boolean dimension is 200
Here is an example
x1D = np.array([1,2,3], np.int32)
x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
print(x1D[x1D<3]) --> [1 2]
print(x2D[x1D<3][x1D<3]) --> error
The second print statement produces an error similar to the one shown above. If I use
print(x2D[x1D<3])
I get
[[1 2 3]
[1 2 3]]
but I want
[[1 2]
[1 2]]

In [28]: x1D = np.array([1,2,3], np.int32)
...: x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
The 1d mask:
In [29]: x1D<3
Out[29]: array([ True, True, False])
applied to the 1d array (same size):
In [30]: x1D[_]
Out[30]: array([1, 2], dtype=int32)
applied to the 2d it selects 2 rows:
In [31]: x2D[_29]
Out[31]:
array([[1, 2, 3],
[1, 2, 3]], dtype=int32)
It can be used again to select columns - but note the : place holder for the row index:
In [32]: _[:, _29]
Out[32]:
array([[1, 2],
[1, 2]], dtype=int32)
If we generate an indexing array from that mask, we can do the indexing with one step:
In [37]: idx = np.nonzero(x1D<3)
In [38]: idx
Out[38]: (array([0, 1]),)
In [39]: x2D[idx[0][:,None], idx[0]]
Out[39]:
array([[1, 2],
[1, 2]], dtype=int32)
An alternate way of writing this '2d' indexing:
In [41]: x2D[ [[0],[1]], [[0,1]] ]
Out[41]:
array([[1, 2],
[1, 2]], dtype=int32)
ix_ is a convenient tool for tweaking the indexing dimensions:
In [42]: x2D[np.ix_(idx[0], idx[0])]
Out[42]:
array([[1, 2],
[1, 2]], dtype=int32)
Or passing the boolean mask to ix_:
In [44]: np.ix_(_29, _29)
Out[44]:
(array([[0],
[1]]), array([[0, 1]]))
In [45]: x2D[np.ix_(_29, _29)]
Out[45]:
array([[1, 2],
[1, 2]], dtype=int32)
Writing In[32] so it's close to your attempt:
In [46]: x2D[x1D<3][:, x1D<3]
Out[46]:
array([[1, 2],
[1, 2]], dtype=int32)
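
The two-step and one-step variants above can be collected into one self-contained script. This is only a sketch on the question's sample arrays; the names mask, two_step, one_step and paired are introduced here for illustration.

```python
import numpy as np

x1D = np.array([1, 2, 3], np.int32)
x2D = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]], np.int32)

mask = x1D < 3  # 1d boolean mask: [True, True, False]

# Two steps: select rows, then select columns of the intermediate result.
two_step = x2D[mask][:, mask]

# One step: np.ix_ turns the masks into broadcastable integer index arrays.
one_step = x2D[np.ix_(mask, mask)]

# Passing the mask twice directly pairs the indices element by element,
# so it picks x2D[0, 0] and x2D[1, 1] instead of a 2x2 block.
paired = x2D[mask, mask]

print(two_step)
print(one_step)
print(paired)
```

The last line is included as a caution: x2D[mask, mask] is not equivalent to the other two forms, because it returns a 1d array rather than the 2x2 block.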

Related

Question about a array as another array index

I am confused about a numpy operation that uses a matrix as an index, like:
# a and b both are matrix:
a[b] = 0.0
The complete code is as follows.
import numpy as np
a = np.mat('1 1; 1 2')
b = np.mat('0 0; 1 1')
print("a: ", a)
print("b: ", b)
a[~b] = 0.0
print("a: ", a)
I get the following result, but I don't know why.
a:
[[1 1]
[1 2]]
b:
[[0 0]
[1 1]]
a:
[[0 0]
[0 0]]
The ~ is not something we use much with numpy, so let's check that by itself:
In [184]: ~b
Out[184]:
matrix([[-1, -1],
[-2, -2]], dtype=int32)
Negative indices "count" from the end.
In [185]: a[~b]
Out[185]:
matrix([[[1, 2],
[1, 2]],
[[1, 1],
[1, 1]]])
In [186]: a[~b].shape
Out[186]: (2, 2, 2)
This result puzzles me, because np.matrix is supposed to be restricted to 2d.
We are trying to discourage the use of np.matrix, but if I convert a and b to ndarray, the results are the same:
In [189]: a = np.mat('1 1; 1 2').A
...: b = np.mat('0 0; 1 1').A
In [192]: a[~b]
Out[192]:
array([[[1, 2],
[1, 2]],
[[1, 1],
[1, 1]]])
Indexing with an array acts on just one dimension, here the first. So a (2,2) array is indexing the first dimension of a, which produces a (2,2,2) result. This would be clearer if the dimensions weren't all 2.
In [197]: a[~b, :]
Out[197]:
array([[[1, 2], # index is [-1,-1], the last row
[1, 2]],
[[1, 1], # index is [-2,-2], the 2nd to last row (first)
[1, 1]]])
Since this indexing selects both rows, when used as a setter, both rows are set to 0. So this is puzzling largely because ~b produces a numeric index array.
I wonder if instead, this code was meant to do boolean array indexing.
In [203]: b.astype(bool)
Out[203]:
array([[False, False],
[ True, True]])
Now ~ is a logical not:
In [204]: ~(b.astype(bool))
Out[204]:
array([[ True, True],
[False, False]])
Indexing with a boolean array of matching shape selects (or skips) elements one by one:
In [205]: a[~(b.astype(bool))]
Out[205]: array([1, 1])
In [206]: a[(b.astype(bool))]
Out[206]: array([1, 2])
Now the 0 assignment just sets the first row.
In [207]: a[~(b.astype(bool))]=0
In [208]: a
Out[208]:
array([[0, 0],
[1, 2]])
The boolean array indexing would be clear with this example:
In [211]: b = np.array([[0,1],[1,0]], bool)
In [212]: b
Out[212]:
array([[False, True],
[ True, False]])
In [213]: a = np.arange(1,5).reshape(2,2); a
Out[213]:
array([[1, 2],
[3, 4]])
In [214]: a[b] # select the opposite corners
Out[214]: array([2, 3])
In [215]: a[~b] # select the diagonal
Out[215]: array([1, 4])
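
The difference between the two meanings of ~ can be put side by side. This is a small sketch using plain ndarrays rather than np.matrix; the names int_index, bool_mask, a_int and a_bool are made up for this illustration.

```python
import numpy as np

a = np.array([[1, 1], [1, 2]])
b = np.array([[0, 0], [1, 1]])

# On an integer array, ~ is bitwise NOT: ~0 == -1 and ~1 == -2.
int_index = ~b

# On a boolean array, ~ is logical NOT.
bool_mask = ~b.astype(bool)

a_int = a.copy()
a_int[int_index] = 0   # integer indexing on axis 0: rows -1 and -2 are zeroed

a_bool = a.copy()
a_bool[bool_mask] = 0  # boolean mask: only elements where b == 0 are zeroed

print(int_index)
print(a_int)
print(a_bool)
```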

Selecting whole subarrays given a multidimensional index [duplicate]

For example, I have two numpy arrays,
A = np.array(
[[0,1],
[2,3],
[4,5]])
B = np.array(
[[1],
[0],
[1]], dtype='int')
and I want to extract one element from each row of A, and that element is indexed by B, so I want the following results:
C = np.array(
[[1],
[2],
[5]])
I tried A[:, B.ravel()], but that broadcasts B, which is not what I want. I also looked into np.take, but it does not seem to be the right solution to my problem.
However, I could use np.choose by transposing A,
np.choose(B.ravel(), A.T)
but is there a better solution?
You can use NumPy's purely integer array indexing -
A[np.arange(A.shape[0]),B.ravel()]
Sample run -
In [57]: A
Out[57]:
array([[0, 1],
[2, 3],
[4, 5]])
In [58]: B
Out[58]:
array([[1],
[0],
[1]])
In [59]: A[np.arange(A.shape[0]),B.ravel()]
Out[59]: array([1, 2, 5])
Please note that if B is a 1D array or a list of such column indices, you could simply skip the flattening operation with .ravel().
Sample run -
In [186]: A
Out[186]:
array([[0, 1],
[2, 3],
[4, 5]])
In [187]: B
Out[187]: [1, 0, 1]
In [188]: A[np.arange(A.shape[0]),B]
Out[188]: array([1, 2, 5])
C = np.array([A[i][j] for i,j in enumerate(B)])
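
For NumPy 1.15 or later, np.take_along_axis may be another option. It is not used in the answers above, so treat this as a hedged sketch: because B already has one column index per row and the same number of dimensions as A, it can be passed as is, and the result keeps the column shape asked for in the question.

```python
import numpy as np

A = np.array([[0, 1],
              [2, 3],
              [4, 5]])
B = np.array([[1],
              [0],
              [1]])

# For each row r, pick A[r, B[r, 0]]; the output has the shape of B.
C = np.take_along_axis(A, B, axis=1)
print(C)
```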

Numpy sort two arrays together with one array as the keys in axis 1 [duplicate]

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5, 4, 0],
       [1, 3, 9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
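
The claim that the column index is shorthand for j can be checked with np.broadcast_arrays. This is a sketch for illustration only; the names col and i_b do not appear in the answer.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
i = np.argsort(a, axis=1)

col = np.arange(a.shape[0])[:, None]  # shape (2, 1): [[0], [1]]

# Broadcasting col against i produces the explicit row-index array j.
j, i_b = np.broadcast_arrays(col, i)
print(j)

# Both forms select the same elements.
print(a[col, i])
print(a[j, i])
```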
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
       [2, 8, 9]])
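
The same index array can be reused for the second array b from the question's edit. A minimal sketch, assuming NumPy 1.15 or later; the names order, a_sorted and b_reordered are chosen here for illustration.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

# Compute the per-row sort order once, then apply it to both arrays.
order = np.argsort(a, axis=-1)
a_sorted = np.take_along_axis(a, order, axis=-1)
b_reordered = np.take_along_axis(b, order, axis=-1)

print(a_sorted)
print(b_reordered)
```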
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
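
To make the linear indices concrete, here they are printed for the small sample arrays. This is only an illustrative sketch; the names row_offsets and lin are not part of the answer.

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])
M, N = a.shape

# Row r of the flattened array starts at offset r * N.
row_offsets = np.arange(M)[:, None] * N  # [[0], [3]]

# Per-row column order plus row offset gives positions in b.ravel().
lin = a.argsort(1) + row_offsets
print(lin)
print(b.ravel()[lin])
```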

Numpy 2D array indexing by other 2D along specific axis

I have a 2D array:
>>> in_arr = np.array([[1,2],[4,3]])
array([[1, 2],
[4, 3]])
and I find the sorted indices by columns to yield another 2D array:
>>> col_sort = np.argsort(in_arr, axis=1)
array([[0, 1],
[1, 0]])
I would like to know the efficient numpy slice to index the first by the second:
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(in_arr, col_sort, axis=1)
array([[1, 2],
[3, 4]])
The intention is to then perform a (more complicated) function on the array by column, e.g.:
>>> arr_with_function = reordered_in_arr ** np.array([1,2])
array([[1, 4],
[3, 16]])
and return the elements to their original position in the array
>>> return_order = np.argsort(col_sort, axis=1)
>>> reordered_in_arr = np.*SOME_SLICE_METHOD*(arr_with_function, return_order, axis=1)
array([[1, 4],
[16, 3]])
OK, thinking about it as I type, I might just use apply_over_axis, but I would still like to know how to do the above efficiently in case it is of value later.
If you want to do all those operations in-place then you don't need argsort(). Numpy supports in-place operations in such situations:
In [12]: in_arr = np.array([[1,2],[4,3]])
In [13]: in_arr.sort(axis=1)
In [14]: in_arr **= [1, 2]
In [15]: in_arr
Out[15]:
array([[ 1, 4],
[ 3, 16]])
But if you need the indices of the sorted items you can get the expected result with a simple indexing.
In [18]: in_arr[np.arange(2)[:,None], col_sort]
Out[18]:
array([[1, 2],
[3, 4]])
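
The full round trip from the question (reorder, apply a per-column function, restore the original positions) could be sketched as follows. Here np.take_along_axis, available from NumPy 1.15, stands in for the question's SOME_SLICE_METHOD placeholder; that substitution is an assumption of this example, not something the answer above proposes, and the names reordered, transformed and restored are illustrative.

```python
import numpy as np

in_arr = np.array([[1, 2], [4, 3]])

# Sort order within each row, and the reordered array.
col_sort = np.argsort(in_arr, axis=1)
reordered = np.take_along_axis(in_arr, col_sort, axis=1)

# Apply a per-column operation to the sorted data.
transformed = reordered ** np.array([1, 2])

# argsort of a permutation gives its inverse, which undoes the reordering.
return_order = np.argsort(col_sort, axis=1)
restored = np.take_along_axis(transformed, return_order, axis=1)

print(reordered)
print(transformed)
print(restored)
```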

Numpy Multidimensional Subsetting

I have searched long and hard for an answer to this question, but haven't found anything that quite fits the bill. I have a multidimensional numpy array containing data (in my case 3 dimensional) and another array (2 dimensional) that contains information on which value I want along the last dimension of the original array. For instance, here is a simple example illustrating the problem. I have an array a of data, and another array b containing indices along dimension 2 of a. I want a new two dimensional array c where c[i, j] = a[i, j, b[i, j]]. The only way that I can think to do it is with a loop, as outlined below. However, this seems clunky and slow.
In [3]: a = np.arange(8).reshape((2, 2, 2))
In [4]: a
Out[4]:
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
In [6]: b = np.array([[0, 1], [1, 1]])
In [8]: c = np.zeros_like(b)
In [9]: for i in range(2):
   ...:     for j in range(2):
   ...:         c[i, j] = a[i, j, b[i, j]]
In [10]: c
Out[10]:
array([[0, 3],
[5, 7]])
Is there a more pythonic way of doing this, perhaps some numpy indexing feature of which I am unaware?
When you fancy-index a multidimensional array with multidimensional arrays, the indices for each dimension are broadcasted together. With that in mind, you can do:
>>> rows = np.arange(a.shape[0])
>>> cols = np.arange(a.shape[1])
>>> a[rows[:, None], cols, b]
array([[0, 3],
[5, 7]])
In [40]: a = np.arange(8).reshape((2, 2, 2))
In [41]: b = np.array([[0, 1], [1, 1]])
In [42]: i = np.array([[0,0],[1,1]])
In [43]: a[i,i.T,b]
Out[43]:
array([[0, 3],
[5, 7]])
or using ix_ to generate the indexes:
In [47]: j = np.ix_([0,1],[0,1])
In [48]: a[j[0],j[1],b]
Out[48]:
array([[0, 3],
[5, 7]])
In [49]: j
Out[49]:
(array([[0],
[1]]), array([[0, 1]]))
or with ogrid
In [101]: i = np.ogrid[0:2,0:2]
In [102]: i.append(b)
In [103]: a[i]
Out[103]:
array([[0, 3],
[5, 7]])
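
Two variants that may be more robust across NumPy versions are sketched below. Unpacking the ogrid result avoids indexing with a Python list of arrays, which I believe newer NumPy releases no longer treat like a tuple of indices. The take_along_axis form assumes NumPy 1.15 or later. The names rows, cols, c_fancy and c_take are illustrative.

```python
import numpy as np

a = np.arange(8).reshape((2, 2, 2))
b = np.array([[0, 1], [1, 1]])

# Open grid: rows has shape (2, 1), cols has shape (1, 2).
rows, cols = np.ogrid[0:2, 0:2]
c_fancy = a[rows, cols, b]

# take_along_axis needs an index with the same number of dimensions as a,
# so add a trailing axis to b and drop it again afterwards.
c_take = np.take_along_axis(a, b[..., None], axis=2)[..., 0]

print(c_fancy)
print(c_take)
```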
