two_d = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
first = np.array((True, True, False, False, False))
second = np.array((False, False, False, True, True))
Now, when I enter:
two_d[first, second]
I get:
array([3,9])
which doesn't make a whole lot of sense to me. Can anybody explain that simply?
When given multiple boolean arrays to index with, NumPy pairs up the indices of the True values. The first True value in first is paired with the first True value in second, and so on. NumPy then fetches the element at each of these (row, column) indices.
This means that two_d[first, second] is equivalent to:
two_d[[0, 1], [3, 4]]
In other words, you're retrieving the values at index (0, 3) and index (1, 4): 3 and 9. Note that if the two arrays had different numbers of True values, an error would be raised!
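You can check the equivalence directly with the arrays defined above:
>>> two_d[[0, 1], [3, 4]]
array([3, 9])
>>> np.array_equal(two_d[first, second], two_d[[0, 1], [3, 4]])
True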
The documentation on advanced indexing mentions this behaviour briefly and suggests np.ix_ as a 'less surprising' alternative:
Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.
Hence you may be looking for:
>>> two_d[np.ix_(first, second)]
array([[3, 4],
[8, 9]])
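Under the hood, np.ix_ converts each boolean array to the integer indices of its True values and reshapes them so that they broadcast against each other, selecting the full cross product of rows and columns:
>>> np.ix_(first, second)
(array([[0],
        [1]]), array([[3, 4]]))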
Check the documentation on boolean indexing.
two_d[first, second] is the same as two_d[first.nonzero(), second.nonzero()], where:
>>> first.nonzero()
(array([0, 1]),)
>>> second.nonzero()
(array([3, 4]),)
Used as indices, this will select 3 and 9 because
>>> two_d[0,3]
3
>>> two_d[1,4]
9
and
>>> two_d[[0,1],[3,4]]
array([3, 9])
Also mildly related: NumPy indexing using List?
So, I create a numpy array:
a = np.arange(25).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
A conventional slice a[1:3,1:3] returns
array([[ 6, 7],
[11, 12]])
as does using a list in the second index, a[1:3,[1,2]]
array([[ 6, 7],
[11, 12]])
However, a[[1,2],[1,2]] returns
array([ 6, 12])
Obviously I am not understanding something here. That said, slicing with a list might on occasion be very useful.
Cheers,
keng
You observed the effect of so-called advanced indexing. Let's consider the example from the linked documentation:
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)
[[1 2]
[3 4]
[5 6]]
print(x[[0, 1, 2], [0, 1, 0]]) # [1 4 5]
You might think of this as providing lists of (Cartesian) coordinates into the grid, as in
print(x[0,0]) # 1
print(x[1,1]) # 4
print(x[2,0]) # 5
In the last case, the two individual lists are treated as separate indexing operations (this is really awkward wording so please bear with me).
Numpy sees two lists of two integers and decides that you are therefore asking for two values. The row index of each value comes from the first list, while the column index of each value comes from the second list. Therefore, you get a[1,1] and a[2,2]. The : notation not only expands to the list you've accurately deduced, but also tells numpy that you want all the rows/columns in that range.
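You can confirm this element by element:
>>> a[1, 1], a[2, 2]
(6, 12)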
If you provide manually curated list indices, they have to be of the same size, because the size of each/any list is the number of elements you'll get back. For example, if you wanted the elements in columns 1 and 2 of rows 1,2,3:
>>> a[1:4,[1,2]]
array([[ 6, 7],
[11, 12],
[16, 17]])
But
>>> a[[1,2,3],[1,2]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
The former tells numpy that you want a range of rows and specific columns, while the latter says "get me the elements at (1,1), (2,2), and (3, hey! what the?! where's the other index?)"
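If you actually want the rectangular block picked out by two integer lists of different lengths, np.ix_ reshapes them so they broadcast into a cross product (a small sketch using the same a):
>>> a[np.ix_([1, 2, 3], [1, 2])]
array([[ 6,  7],
       [11, 12],
       [16, 17]])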
NumPy reads a[[1,2],[1,2]] as: I want a[1,1] and a[2,2]. There are a few ways around this, and I likely don't even have the best ones, but you could try
a[[1,1,2,2],[1,2,1,2]]
This will give you a flattened version of the above.
a[[1,2]][:,[1,2]]
This will give you the correct slice; it works by taking the rows [1,2] and then the columns [1,2].
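For reference, running the two workarounds on the a above should give:
>>> a[[1, 1, 2, 2], [1, 2, 1, 2]]
array([ 6,  7, 11, 12])
>>> a[[1, 2]][:, [1, 2]]
array([[ 6,  7],
       [11, 12]])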
It triggers advanced indexing, so the first array gives the row indices and the second the column indices. For each row index, it selects the corresponding column.
a[[1,2], [1,2]] -> [a[1, 1], a[2, 2]] -> [6, 12]
I would like to know the most efficient way to test whether each element of one 1-d array exists in the corresponding row of another 2-d array, using Python.
Specifically, I have two arrays. The first is a 1-d array of integers. The second is a 2-d array of integers.
Sample input:
[1, 4, 12, 9] # array 1
[[1, 12, 299],
[2, 5, 11],
[1, 3, 11],
[0, 1, 9]] # array 2
Expected output:
[True, False, False, True]
You can reshape a into a 2-d column vector, compare it with b, and then check whether there's any True in each row:
np.equal(np.reshape(a, (-1,1)), b).any(axis=1)
a = [1, 4, 12, 9] # array 1
b = [[1, 12, 299],
[2, 5, 11],
[1, 3, 11],
[0, 1, 9]]
np.equal(np.reshape(a, (-1,1)), b).any(1)
# array([ True, False, False, True], dtype=bool)
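Broadcasting is what makes this work: np.reshape(a, (-1, 1)) turns a into a (4, 1) column, which is compared element-wise against the (4, 3) array b, and .any(axis=1) then reduces each row to a single flag. The intermediate boolean matrix looks like this:
np.equal(np.reshape(a, (-1, 1)), b)
# array([[ True, False, False],
#        [False, False, False],
#        [False, False, False],
#        [False, False,  True]])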
This is a fairly pythonic solution to your problem. Note that this works virtually identically for numpy arrays and standard lists. Definitely not the most efficient solution for huge numpy arrays, but I doubt this would be any type of performance bottleneck, and pursuing premature optimization over readability is a cardinal sin.
a = [1, 4, 12, 9]
b = [
[1, 12, 299],
[2, 5, 11],
[1, 3, 11],
[0, 1, 9]
]
c = [x in y for x, y in zip(a,b)]
I have a list of integers which represent positions in a matrix (centre). For example,
centre = [2, 50, 100, 89]
I also have two numpy matrices, X and Y. I need to delete from each matrix all of the rows whose index appears in centre. I can do this:
for each in centre:
x = numpy.delete(x, (each), axis=0)
However, as rows get removed the remaining indices shift, so the later deletions will be off. So, how can I do this?
Just do the delete in one call:
In [266]: B
Out[266]:
array([[ 2, 4, 6],
[ 8, 10, 12],
[14, 16, 18],
[20, 22, 24]])
In [267]: B1=np.delete(B,[1,3],axis=0)
In [268]: B1
Out[268]:
array([[ 2, 4, 6],
[14, 16, 18]])
Your question is a little confusing. I'm assuming that you want to delete rows by index number, not by some sort of content match (not like a list find).
However, if you must iterate (as with a list), do it in reverse order; that way the indexing doesn't get messed up. You may have to sort the indices first (np.delete doesn't require that).
In [269]: B1=B.copy()
In [270]: for i in [1,3][::-1]:
...: B1=np.delete(B1,i,axis=0)
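Checking the result, this should leave the same two rows as the single-call version:
In [271]: B1
Out[271]:
array([[ 2,  4,  6],
       [14, 16, 18]])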
A list example that has to be iterative:
In [276]: B1=list(range(10))
In [277]: for i in [1,3,5,7][::-1]:
...: del B1[i]
In [278]: B1
Out[278]: [0, 2, 4, 6, 8, 9]
=============
With a list input like this, np.delete does the equivalent of:
In [285]: mask=np.ones((4,),bool)
In [286]: mask[[1,3]]=False
In [287]: mask
Out[287]: array([ True, False, True, False], dtype=bool)
In [288]: B[mask,:]
Out[288]:
array([[ 2, 4, 6],
[14, 16, 18]])
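Applied to the question's setup (a sketch, assuming X and Y are the 2-d arrays and centre holds the row indices to drop), a single call per array is all that's needed:
X = np.delete(X, centre, axis=0)
Y = np.delete(Y, centre, axis=0)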
I have a multidimensional array, say of shape (4, 3) that looks like
a = np.array([(1,2,3),(4,5,6),(7,8,9),(10,11,12)])
If I have a list of fixed conditions
conditions = [True, False, False, True]
How can I return the list
array([(1,2,3),(10,11,12)])
Using np.extract returns
>>> np.extract(conditions, a)
array([1, 4])
which only returns the first element along each nested array, as opposed to the array itself. I wasn't sure if or how I could do this with np.where. Any help is much appreciated, thanks!
Let's define your variables:
>>> import numpy as np
>>> a = np.array([(1,2,3),(4,5,6),(7,8,9),(10,11,12)])
>>> conditions = [True, False, False, True]
Now, let's select the elements that you want:
>>> a[np.array(conditions)]
array([[ 1, 2, 3],
[10, 11, 12]])
Aside
Note that the simpler a[conditions] has some ambiguity:
>>> a[conditions]
-c:1: FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
array([[4, 5, 6],
[1, 2, 3],
[1, 2, 3],
[4, 5, 6]])
As you can see, conditions is treated here as a list of (integer-like) index values, which is not what we wanted.
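Note that this FutureWarning comes from older NumPy releases; in current versions a plain list of booleans is treated as a boolean mask (the behaviour the warning announces), so a[conditions] should now give the same result as a[np.array(conditions)]:
>>> a[conditions]  # recent NumPy
array([[ 1,  2,  3],
       [10, 11, 12]])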
You can use simple list indexing with np.where. It's more or less made specifically for this situation.
>>> a[np.where(conditions)]
array([[ 1,  2,  3],
       [10, 11, 12]])
I have two arrays, say,
n = [1,2,3,4,5,6,7,8,9]
nc = [3,0,2,0,1,2,0,0,0]
The nonzero elements in nc are ncz = [3,2,1,2]. The elements in n corresponding to the nonzero elements in nc are p = [1,3,5,6]. I need to create a new array with the elements of p[1:] inserted after ncz.cumsum()[:-1]+1, i.e. after [4,6,7].
Is there any way to do this without using np.insert or a for loop?
Suppose I have m such pairs of arrays. Would I be able to do the same thing for each pair without using a loop? The resulting arrays can be zero padded to bring them to the same shape.
The result would be [1, 2, 3, 4, 3, 5, 6, 5, 7, 6, 8, 9]
To do it using np.insert, one would do:
n = np.array([1,2,3,4,5,6,7,8,9])
nc = np.array([3,0,2,0,1,2,0,0,0])
p1 = n[nc.nonzero()][1:]
ncz1 = nc[nc.nonzero()][:-1].cumsum()
result = np.insert(n,ncz1+1,p1)
I know how to do this using numpy's insert operation, but I need to replicate it in Theano, and Theano doesn't have an insert op.
Because of its generality, np.insert is rather complex (but its source is available for study). For your case, with a 1-d array and ordered insert points, it can be simplified to
np.insert(n, i, p1) with:
In [688]: n
Out[688]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [689]: p1
Out[689]: array([13, 15, 16])
In [690]: i
Out[690]: array([4, 6, 7], dtype=int32)
Target array, z, and the insertion points in that array:
In [691]: j=i+np.arange(len(i))
In [692]: z=np.zeros(len(n)+len(i),dtype=n.dtype)
Make a boolean mask: True where the n values go, False where the p1 values go.
In [693]: ind=np.ones(z.shape,bool)
In [694]: ind[j]=False
In [695]: ind
Out[695]:
array([ True, True, True, True, False, True, True, False, True,
False, True, True], dtype=bool)
Copy the values into the right slots:
In [696]: z[ind]=n
In [697]: z[~ind]=p1 # z[j]=p1 would also work
In [698]: z
Out[698]: array([ 1, 2, 3, 4, 13, 5, 6, 15, 7, 16, 8, 9])
This is typical of array operations that return a new array of a different size. Make the target, and copy the appropriate values. This is true even when the operations are done in compiled numpy code (e.g. concatenate).
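Putting those steps together, a minimal helper (a sketch, with the hypothetical name insert_1d, assuming a 1-d array n and sorted insert points i) looks like:

import numpy as np

def insert_1d(n, i, p1):
    # Equivalent of np.insert(n, i, p1) for a 1-d array n and sorted insert points i.
    i = np.asarray(i)
    j = i + np.arange(len(i))              # slots in the output where p1 values go
    z = np.zeros(len(n) + len(i), dtype=n.dtype)
    mask = np.ones(z.shape, dtype=bool)    # True where values of n go
    mask[j] = False
    z[mask] = n                            # original values
    z[~mask] = p1                          # inserted values (z[j] = p1 also works)
    return z

For example:

>>> insert_1d(np.arange(1, 10), [4, 6, 7], [13, 15, 16])
array([ 1,  2,  3,  4, 13,  5,  6, 15,  7, 16,  8,  9])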