Question about using an array as another array's index - python

I am confused about a numpy operation that uses a matrix as an index, like:
# a and b are both matrices:
a[b] = 0.0
The complete code is as follows.
import numpy as np
a = np.mat('1 1; 1 2')
b = np.mat('0 0; 1 1')
print("a: ", a)
print("b: ", b)
a[~b] = 0.0
print("a: ", a)
I get the following result, but I don't know why.
a:
[[1 1]
[1 2]]
b:
[[0 0]
[1 1]]
a:
[[0 0]
[0 0]]

The ~ is not something we use much with numpy, so let's check it by itself:
In [184]: ~b
Out[184]:
matrix([[-1, -1],
[-2, -2]], dtype=int32)
Negative indices "count" from the end.
In [185]: a[~b]
Out[185]:
matrix([[[1, 2],
[1, 2]],
[[1, 1],
[1, 1]]])
In [186]: a[~b].shape
Out[186]: (2, 2, 2)
This result puzzles me, because np.matrix is supposed to be restricted to 2d.
We are trying to discourage the use of np.matrix, but if I convert a and b to ndarray, the results are the same:
In [189]: a = np.mat('1 1; 1 2').A
...: b = np.mat('0 0; 1 1').A
In [192]: a[~b]
Out[192]:
array([[[1, 2],
[1, 2]],
[[1, 1],
[1, 1]]])
Indexing with an array acts on just one dimension, here the first; so it's using a (2,2) array to index the first dimension of a, resulting in a (2,2,2) array. This action would be clearer if the dimensions weren't all 2.
In [197]: a[~b, :]
Out[197]:
array([[[1, 2], # index is [-1,-1], the last row
[1, 2]],
[[1, 1], # index is [-2,-2], the 2nd to last row (first)
[1, 1]]])
Since this indexing selects both rows, when used as a setter, both rows are set to 0. So this is puzzling largely because ~b produces a numeric index array.
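The shape rule is easier to see when the dimensions differ. Here is a minimal sketch, assuming a hypothetical (3,4) array and a (2,2) integer index (neither comes from the question):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)      # hypothetical (3, 4) array
idx = np.array([[0, 1], [1, 2]])     # hypothetical (2, 2) integer index

neg = ~idx                           # bitwise NOT on integers: ~n == -n - 1
print(neg)                           # [[-1 -2]
                                     #  [-2 -3]]

out = a[neg]                         # indexes only the first axis of a
print(out.shape)                     # (2, 2, 4): idx.shape + a.shape[1:]
print(out[0, 0])                     # a[-1], the last row: [ 8  9 10 11]
```

The result has the shape of the index array followed by the remaining dimensions of a, which is hard to spot when every dimension is 2.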
I wonder if instead, this code was meant to do boolean array indexing.
In [203]: b.astype(bool)
Out[203]:
array([[False, False],
[ True, True]])
Now ~ is a logical not:
In [204]: ~(b.astype(bool))
Out[204]:
array([[ True, True],
[False, False]])
Indexing with a boolean array that matches in shape selects (or skips) on an element-by-element basis:
In [205]: a[~(b.astype(bool))]
Out[205]: array([1, 1])
In [206]: a[(b.astype(bool))]
Out[206]: array([1, 2])
Now the 0 assignment just sets the first row.
In [207]: a[~(b.astype(bool))]=0
In [208]: a
Out[208]:
array([[0, 0],
[1, 2]])
Boolean array indexing may be clearer with this example:
In [211]: b = np.array([[0,1],[1,0]], bool)
In [212]: b
Out[212]:
array([[False, True],
[ True, False]])
In [213]: a = np.arange(1,5).reshape(2,2); a
Out[213]:
array([[1, 2],
[3, 4]])
In [214]: a[b] # select the opposite corners
Out[214]: array([2, 3])
In [215]: a[~b] # select the diagonal
Out[215]: array([1, 4])
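If b really is meant as a 0/1 flag array, one way to sidestep the integer ~ pitfall is to build the boolean mask explicitly. This is a small sketch using the question's values as plain ndarrays; the comparison b == 0 is my assumption about the intent, not something the question states:

```python
import numpy as np

a = np.array([[1, 1], [1, 2]])
b = np.array([[0, 0], [1, 1]])

mask = (b == 0)          # boolean mask, True where b is 0
a[mask] = 0              # element-wise assignment, only where mask is True
print(a)                 # [[0 0]
                         #  [1 2]]
```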

Related

Why is there a difference between the output type of these two Numpy slice commands

The two commands below give arrays of different shapes. I would appreciate an explanation of why, and a reference if there is one; I searched the internet but did not find a clear explanation.
data.shape
(11,2)
# outputs the values in column-0 in an (1x11) array.
data[:,0]
array([-7.24070e-01, -2.40724e+00, 2.64837e+00, 3.60920e-01,
6.73120e-01, -4.54600e-01, 2.20168e+00, 1.15605e+00,
5.06940e-01, -8.59520e-01, -5.99700e-01])
# outputs the values in column-0 in an (11x1) array
data[:,:-1]
array([[-7.24070e-01],
[-2.40724e+00],
[ 2.64837e+00],
[ 3.60920e-01],
[ 6.73120e-01],
[-4.54600e-01],
[ 2.20168e+00],
[ 1.15605e+00],
[ 5.06940e-01],
[-8.59520e-01],
[-5.99700e-01]])
I'll try to consolidate the comments into an answer.
First look at Python list indexing
In [92]: alist = [1,2,3]
selecting an item:
In [93]: alist[0]
Out[93]: 1
making a copy of the whole list:
In [94]: alist[:]
Out[94]: [1, 2, 3]
or a slice of length 2, or 1 or 0:
In [95]: alist[:2]
Out[95]: [1, 2]
In [96]: alist[:1]
Out[96]: [1]
In [97]: alist[:0]
Out[97]: []
Arrays follow the same basic rules
In [98]: x = np.arange(12).reshape(3,4)
In [99]: x
Out[99]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Select a row:
In [100]: x[0]
Out[100]: array([0, 1, 2, 3])
or a column:
In [101]: x[:,0]
Out[101]: array([0, 4, 8])
x[0,1] selects a single element.
https://numpy.org/doc/stable/user/basics.indexing.html#single-element-indexing
Indexing with a slice returns multiple rows:
In [103]: x[0:2]
Out[103]:
array([[0, 1, 2, 3],
[4, 5, 6, 7]])
In [104]: x[0:1] # it retains the dimensions, even if only 1 (or even 0)
Out[104]: array([[0, 1, 2, 3]])
Likewise for columns:
In [106]: x[:,0:1]
Out[106]:
array([[0],
[4],
[8]])
subslices on both dimensions:
In [107]: x[0:2,1:3]
Out[107]:
array([[1, 2],
[5, 6]])
https://numpy.org/doc/stable/user/basics.indexing.html
x[[0]] also returns a 2d array, but that gets into "advanced" indexing (which doesn't have a list equivalent).
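Applied to the shapes in the question, the same rules can be sketched as follows. The array contents are a stand-in I made up; only the (11,2) shape comes from the question:

```python
import numpy as np

data = np.arange(22.0).reshape(11, 2)   # stand-in for the (11, 2) array in the question

print(data[:, 0].shape)      # (11,)   an integer index drops that axis
print(data[:, :-1].shape)    # (11, 1) a slice keeps the axis, here with length 1
print(data[:, 0:1].shape)    # (11, 1) the same column, written as 0:1
print(data[:, [0]].shape)    # (11, 1) a list index ("advanced" indexing) also keeps it
```

So data[:,0] is a 1d array of 11 values, while data[:,:-1] is a 2d array with a single column.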

using boolean array for indexing in numpy for 2D arrays

I use boolean indexing to select elements from a numpy array as
x = y[t<tmax]
where t is a numpy array with as many elements as y. My question is how can I do the same with 2D numpy arrays? I tried
x = y[t<tmax][t<tmax]
This does not seem to work however since it seems to select first the rows and then complains that the second selection has the wrong dimension.
IndexError: boolean index did not match indexed array along dimension 0; dimension is 50 but corresponding boolean dimension is 200
Here is an example
x1D = np.array([1,2,3], np.int32)
x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
print(x1D[x1D<3]) # --> [1 2]
print(x2D[x1D<3][x1D<3]) # --> error
The second print statement produces an error similar to the one shown above. If I use
print(x2D[x1D<3])
I get
[[1 2 3]
[1 2 3]]
but I want
[[1 2]
[1 2]]
In [28]: x1D = np.array([1,2,3], np.int32)
...: x2D = np.array([[1,2,3],[1,2,3],[1,2,3]], np.int32)
The 1d mask:
In [29]: x1D<3
Out[29]: array([ True, True, False])
applied to the 1d array (same size):
In [30]: x1D[_]
Out[30]: array([1, 2], dtype=int32)
applied to the 2d it selects 2 rows:
In [31]: x2D[_29]
Out[31]:
array([[1, 2, 3],
[1, 2, 3]], dtype=int32)
It can be used again to select columns - but note the : place holder for the row index:
In [32]: _[:, _29]
Out[32]:
array([[1, 2],
[1, 2]], dtype=int32)
If we generate an indexing array from that mask, we can do the indexing with one step:
In [37]: idx = np.nonzero(x1D<3)
In [38]: idx
Out[38]: (array([0, 1]),)
In [39]: x2D[idx[0][:,None], idx[0]]
Out[39]:
array([[1, 2],
[1, 2]], dtype=int32)
An alternate way of writing this '2d' indexing:
In [41]: x2D[ [[0],[1]], [[0,1]] ]
Out[41]:
array([[1, 2],
[1, 2]], dtype=int32)
ix_ is a convenient tool for tweaking the indexing dimensions:
In [42]: x2D[np.ix_(idx[0], idx[0])]
Out[42]:
array([[1, 2],
[1, 2]], dtype=int32)
Or passing the boolean mask to ix_:
In [44]: np.ix_(_29, _29)
Out[44]:
(array([[0],
[1]]), array([[0, 1]]))
In [45]: x2D[np.ix_(_29, _29)]
Out[45]:
array([[1, 2],
[1, 2]], dtype=int32)
Writing In[32] so that it's close to your attempt:
In [46]: x2D[x1D<3][:, x1D<3]
Out[46]:
array([[1, 2],
[1, 2]], dtype=int32)
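To see why the chained attempt in the question fails while the variants above work, here is a short sketch using the question's arrays (the variable names mask, rows, failed and both are mine):

```python
import numpy as np

x1D = np.array([1, 2, 3], np.int32)
x2D = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]], np.int32)
mask = x1D < 3                       # array([ True,  True, False])

rows = x2D[mask]                     # shape (2, 3): rows 0 and 1 only
try:
    rows[mask]                       # length-3 mask against a first axis of length 2
    failed = False
except IndexError:
    failed = True

both = x2D[np.ix_(mask, mask)]       # rows and columns selected in one step
print(failed)                        # True
print(both)                          # [[1 2]
                                     #  [1 2]]
```

The second [mask] is applied to the first axis again, which by then has only 2 rows, hence the dimension mismatch.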

Numpy sort two arrays together with one array as the keys in axis 1 [duplicate]

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[1,3,9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcast against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
[2, 8, 9]])
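The same index array can be reused to rearrange b, which is what the edit to the question asks for. A minimal sketch, assuming NumPy 1.15 or later (the names i, a_sorted and b_sorted are mine):

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

i = np.argsort(a, axis=-1)                     # per-row sort order of a
a_sorted = np.take_along_axis(a, i, axis=-1)
b_sorted = np.take_along_axis(b, i, axis=-1)   # b rearranged with a's order
print(a_sorted)   # [[1 2 3]
                  #  [2 8 9]]
print(b_sorted)   # [[5 4 0]
                  #  [1 3 9]]
```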
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop
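Timings like these depend on the machine and the NumPy version, so it may be worth checking first that the approaches agree before comparing speed. A small sketch on made-up random data (the seed, the array sizes and the variable names are my choices; default_rng needs NumPy 1.17 or later):

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator, chosen for reproducibility
a = rng.random((5, 7))
b = rng.random((5, 7))
M, N = a.shape

order = a.argsort(1)
by_pairs = b[np.arange(M)[:, None], order]                  # row/column index pairs
by_linear = b.ravel()[order + np.arange(M)[:, None] * N]    # flat (linear) indices
by_take = np.take_along_axis(b, order, axis=1)              # NumPy >= 1.15

print(np.array_equal(by_pairs, by_linear) and np.array_equal(by_pairs, by_take))  # True
```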

How does this numpy advanced indexing code work?

I am learning the numpy framework. I don't understand this piece of code.
import numpy as np
a =np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])
print(a)
row = np.array([[0,0],[3,3]])
col = np.array([[0,2],[0,2]])
b = a[row,col]
print("This is b array:",b)
This b array returns the corner values of the array a, that is, b equals [[0,2],[9,11]].
When indexing is done using an array or "array-like", to access/modify the elements of an array, then it's called advanced indexing.
In [37]: a
Out[37]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
In [38]: row
Out[38]:
array([[0, 0],
[3, 3]])
In [39]: col
Out[39]:
array([[0, 2],
[0, 2]])
In [40]: a[row, col]
Out[40]:
array([[ 0, 2],
[ 9, 11]])
That's what you got. Below is an explanation:
Indices of a[row, col]:
a[row[0,0], col[0,0]]   a[row[0,1], col[0,1]]   =   a[0, 0]   a[0, 2]
a[row[1,0], col[1,0]]   a[row[1,1], col[1,1]]       a[3, 0]   a[3, 2]
In each pair, the first index comes from the row-idx array and the second from the column-idx array.
You're indexing a using two equally shaped 2d-arrays, hence your output array will also have the same shape as col and row. To better understand how array indexing works you can check the docs, which show that indexing with 1d-arrays over the existing axes of a given array works as follows:
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
The same logic applies when indexing with 2d-arrays over each axis: the result takes the (broadcast) shape of the index arrays, and each of its elements is looked up with the indices found at the same position in each index array.
So going back to your example, each entry of row is paired with the entry of col at the same position, and that pair selects one element of a. You might find it more intuitive to translate the row and column indices into (row, column) coordinates:
(0,0), (0,2)
(3,0), (3,2)
Which, by accordingly selecting from a, results in the output array:
print(a[row,col])
array([[ 0, 2],
[ 9, 11]])
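The formula quoted from the docs can be checked with an explicit loop. This is only an illustrative sketch of what a[row, col] computes, not how NumPy implements it internally:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])
row = np.array([[0, 0], [3, 3]])
col = np.array([[0, 2], [0, 2]])

# result[i, j] == a[row[i, j], col[i, j]] for every position (i, j)
result = np.empty(row.shape, dtype=a.dtype)
for i in range(row.shape[0]):
    for j in range(row.shape[1]):
        result[i, j] = a[row[i, j], col[i, j]]

print(np.array_equal(result, a[row, col]))   # True
```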
You can understand it by trying more examples.
If you have one dimensional index:
In [58]: np.arange(10)[np.array([1,3,4,6])]
Out[58]: array([1, 3, 4, 6])
In case of two dimensional index:
In [57]: np.arange(10)[np.array([[1,3],[4,6]])]
Out[57]:
array([[1, 3],
[4, 6]])
If you use 3 dimensional index:
In [59]: np.arange(10)[np.array([[[1],[3]],[[4],[6]]])]
Out[59]:
array([[[1],
[3]],
[[4],
[6]]])
As you can see, if you nest the index array, the output is nested in the same way.
Proceeding by steps:
import numpy as np
a = np.array([[0,1,2],[3,4,5],[6,7,8],[9,10,11]])
print(a)
gives 2d array a:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Then:
row = np.array([[0,0],[3,3]])
assigns to 2d array row values [0,0] and [3,3]:
array([[0, 0],
[3, 3]])
Then:
col = np.array([[0,2],[0,2]])
assigns to 2d array col values [0,2] and [0,2]:
array([[0, 2],
[0, 2]])
Finally:
b = a[row,col]
assigns to b values given by a[0,0], a[0,2] for the first row, a[3,0], a[3,2] for the second row, that is:
array([[ 0, 2],
[ 9, 11]])
Where does b[0,0] <-- a[0,0] come from? It comes from the combination of row[0,0] which is 0 and col[0,0] which is 0.
What about b[0,1] <-- a[0,2]? It comes from the combination of row[0,1] which is 0 and col[0,1] which is 2.
And so forth.
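Since row repeats each value across a row and col repeats each value down a column, the same corners can also be picked with smaller index arrays that broadcast against each other. A sketch with hypothetical names rows and cols:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]])

rows = np.array([[0], [3]])      # shape (2, 1): one row number per output row
cols = np.array([0, 2])          # shape (2,):   one column number per output column
b = a[rows, cols]                # the index arrays broadcast to shape (2, 2)
print(b)                         # [[ 0  2]
                                 #  [ 9 11]]
print(np.array_equal(b, a[np.ix_([0, 3], [0, 2])]))   # True
```

np.ix_ builds the same pair of broadcastable index arrays from two plain lists.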

Boolean masking on multiple axes with numpy

I want to apply boolean masking both to rows and columns.
With
X = np.array([[1,2,3],[4,5,6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])
X[mask1, mask2]
I expect the output to be
array([[1,2],[4,5]])
instead of
array([1,5])
It's known that
X[:, mask2]
can be used here but that's not a solution for the general case.
I would like to know how it works under the hood and why in this case the result is array([1,5]).
X[mask1, mask2] is described in Boolean Array Indexing Doc as the equivalent of
In [249]: X[mask1.nonzero()[0], mask2.nonzero()[0]]
Out[249]: array([1, 5])
In [250]: X[[0,1], [0,1]]
Out[250]: array([1, 5])
In effect it is giving you X[0,0] and X[1,1] (pairing the 0s and 1s).
What you want instead is:
In [251]: X[[[0],[1]], [0,1]]
Out[251]:
array([[1, 2],
[4, 5]])
np.ix_ is a handy tool for creating the right mix of dimensions
In [258]: np.ix_([0,1],[0,1])
Out[258]:
(array([[0],
[1]]), array([[0, 1]]))
In [259]: X[np.ix_([0,1],[0,1])]
Out[259]:
array([[1, 2],
[4, 5]])
That's effectively a column vector for the 1st axis and row vector for the second, together defining the desired rectangle of values.
Trying to broadcast the boolean arrays themselves, as in X[mask1[:,None], mask2], does not work.
However, that reference section says:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
In [260]: X[np.ix_(mask1, mask2)]
Out[260]:
array([[1, 2],
[4, 5]])
In [261]: np.ix_(mask1, mask2)
Out[261]:
(array([[0],
[1]], dtype=int32), array([[0, 1]], dtype=int32))
The boolean section of ix_:
if issubdtype(new.dtype, _nx.bool_):
    new, = new.nonzero()
So it works with a mix like X[np.ix_(mask1, [0,2])]
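For instance, mixing the boolean row mask with an integer column list might look like this (a sketch; the column list [0,2] and the names rows and cols are just for illustration):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])

rows, cols = np.ix_(mask1, [0, 2])   # the boolean mask is converted via nonzero()
print(rows.shape, cols.shape)        # (2, 1) (1, 2)
print(X[rows, cols])                 # [[1 3]
                                     #  [4 6]]
```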
One solution would be to use sequential integer indexing, getting the integers for example from np.where (rows first with mask1, then columns with mask2):
>>> X[np.where(mask1)[0]][:, np.where(mask2)[0]]
array([[1, 2],
[4, 5]])
or, as @user2357112 pointed out in the comments, np.ix_ could be used as well. For example:
>>> X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]
array([[1, 2],
[4, 5]])
Another idea would be to broadcast your masks and do the selection in one step, which requires a reshape afterwards:
>>> X[np.where(mask1[:, None] * mask2)]
array([1, 2, 4, 5])
>>> X[np.where(mask1[:, None] * mask2)].reshape(2, 2)
array([[1, 2],
[4, 5]])
In a more general sense, your question is about finding the subpart of an array containing certain rows and columns.
main_array = np.array([[1,2,3],[4,5,6]])
mask_ax_0 = np.array([True, True]) # which rows I want
mask_ax_1 = np.array([True, True, False]) # which columns I want
Answer:
mask_2d = np.logical_and(mask_ax_0.reshape(-1,1), mask_ax_1.reshape(1,-1))
sub_array = main_array[mask_2d].reshape(np.sum(mask_ax_0), np.sum(mask_ax_1))
print(sub_array)
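You could also use the numpy.ma module.
In particular, mask_rowcols masks every row and/or column that contains a masked value. Flag one element per unwanted row and per unwanted column, then let it spread the mask along each axis:
import numpy as np
import numpy.ma as ma
X = np.array([[1,2,3],[4,5,6]])
linesmask = np.array([True, True])
colsmask = np.array([True, True, False])
# flag column 0 of every row to drop, then mask those whole rows (axis=0)
rowflags = np.zeros(X.shape, dtype=bool)
rowflags[:, 0] = ~linesmask
rows = ma.mask_rowcols(ma.masked_array(X, mask=rowflags), axis=0)
# flag row 0 of every column to drop, then mask those whole columns (axis=1)
colflags = np.zeros(X.shape, dtype=bool)
colflags[0, :] = ~colsmask
cols = ma.mask_rowcols(ma.masked_array(X, mask=colflags), axis=1)
# combine both masks; masked entries are the ones excluded by either mask
X = ma.masked_array(X, mask=ma.getmaskarray(rows) | ma.getmaskarray(cols))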
