Answering this question, some others and I were wrong in assuming that the following would work:
Say one has
test = [ [ [0], 1 ],
         [ [1], 1 ] ]
import numpy as np
nptest = np.array(test)
What is the reason behind
>>> nptest[:,0]==[1]
array([False, False], dtype=bool)
while one has
>>> nptest[0,0]==[1],nptest[1,0]==[1]
(False, True)
or
>>> nptest==[1]
array([[False,  True],
       [False,  True]], dtype=bool)
or
>>> nptest==1
array([[False,  True],
       [False,  True]], dtype=bool)
Is it the degeneracy in terms of dimensions that causes this?
nptest is a 2D array of object dtype, and the first element of each row is a list.
nptest[:, 0] is a 1D array of object dtype, each of whose elements is a list.
When you do nptest[:,0]==[1], NumPy does not perform an elementwise comparison of each element of nptest[:,0] against the list [1]. It creates as high-dimensional an array as it can from [1], producing the 1D array np.array([1]), and then broadcasts the comparison, comparing each element of nptest[:,0] against the integer 1.
Since no list in nptest[:, 0] is equal to 1, all elements of the result are False.
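To make the distinction concrete, here is a minimal sketch (dtype=object is spelled out, since newer NumPy versions require it for ragged input) contrasting the broadcast comparison with a true elementwise comparison against the list:
import numpy as np
nptest = np.array([[[0], 1],
                   [[1], 1]], dtype=object)  # 2D object array; column 0 holds lists
col = nptest[:, 0]
# NumPy turns [1] into np.array([1]) and broadcasts, so each list in
# col is compared against the integer 1:
print(col == [1])                          # [False False]
# A genuine element-by-element comparison against the list [1]:
print(np.array([x == [1] for x in col]))   # [False  True]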
Related
I have a 1D numpy array of False booleans, and a 2D numpy array containing the min,max indices of values in the first array to change to True.
An example:
import numpy
my_data = numpy.zeros((10,), dtype=bool)
inds2true = numpy.array([[1, 3], [8, 9]])
And I want the following result:
out = numpy.array([False, True, True, True, False, False, False, False, True, True])
How is this possible in Python with Numpy?
Edit: I would like this to be performed in one step (i.e. no looping).
There's one rule-breaking hack:
>>> my_data[inds2true] = True
>>> my_data = np.cumsum(my_data) % 2 == 1
>>> my_data
array([False,  True,  True, False, False, False, False, False,  True, False])
The most common practice is to change the indices within np.arange(1, 3) and np.arange(8, 9), i.e. not including 3 or 9. If you still want to include the end indices, do in addition: my_data[inds2true[:, 1]] = True
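Putting both pieces together, a minimal sketch (using the arrays from the question) that yields the requested out, end indices included:
import numpy as np
my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])
my_data[inds2true] = True                # mark all start/end positions
my_data = np.cumsum(my_data) % 2 == 1    # toggle True between the marks
my_data[inds2true[:, 1]] = True          # re-include the end indices 3 and 9
# my_data: [False, True, True, True, False, False, False, False, True, True]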
If you're looking for other options to do it in one go, they will most probably involve np.cumsum tricks.
import numpy as np

my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])

indices = []
for ix_range in inds2true:
    # include both endpoints of each [min, max] pair
    indices += list(range(ix_range[0], ix_range[1] + 1))
my_data[indices] = True
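If the explicit Python loop is a concern, a sketch of a variant that builds all the indices with NumPy primitives (it still iterates over the index pairs, but not over the data itself):
import numpy as np
my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])
# one arange per [min, max] pair, endpoints included, concatenated
indices = np.concatenate([np.arange(lo, hi + 1) for lo, hi in inds2true])
my_data[indices] = True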
This seems to be a simple question, but I have been struggling with errors for quite some time.
Imagine an array
a = np.array([2,3,4,5,6])
I want to test which elements in the array belong to another list
[2,3,6]
If I do
a in [2,3,6]
Python raises "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
In return, I would like to get a boolean array like
array([ True, True, False, False, True], dtype=bool)
Use np.isin to create a boolean mask, then use np.argwhere on this mask to find the indices of the array elements that are non-zero:
lst = [2, 3, 6]
m = np.isin(a, lst)
indices = np.argwhere(m)
# print(m)
array([ True,  True, False, False,  True])
# print(indices)
array([[0],
       [1],
       [4]])
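If a flat index array is preferred over the column vector that np.argwhere returns, np.flatnonzero (also used in a later answer below) gives the same positions as a 1D array:
import numpy as np
a = np.array([2, 3, 4, 5, 6])
m = np.isin(a, [2, 3, 6])
print(np.flatnonzero(m))   # [0 1 4]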
import numpy as np
arr1 = np.array([2,3,4,5,6])
arr2 = np.array([2,3,6])
arr_result = [bool(a1 in arr2) for a1 in arr1]
print(arr_result)
I have used simple list-comprehension logic to do this.
Output:
[True, True, False, False, True]
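As a side note on why a1 in arr2 works at all: ndarray.__contains__ is, to a first approximation, (arr2 == a1).any(), a scalar membership test evaluated once per element of arr1, so the comprehension stays a Python-level loop; np.isin from the previous answer is the vectorized equivalent. Roughly:
import numpy as np
arr2 = np.array([2, 3, 6])
# `4 in arr2` behaves roughly like:
print((arr2 == 4).any())   # False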
Python allows for a simple check if a string is contained in another string:
'ab' in 'abcd'
which evaluates to True.
Now take a numpy array of strings and you can do this:
import numpy as np
A0 = np.array(['z', 'u', 'w'],dtype=object)
A0[:,None] != A0
Resulting in a boolean array:
array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
Let's now take another array:
A1 = np.array(['u_w', 'u_z', 'w_z'],dtype=object)
I want to check where a string of A0 is not contained in a string in A1, essentially creating unique combinations, but the following does not yield a boolean array, only a single boolean, regardless of how I write the indices:
A0[:,None] not in A1
I also tried using numpy.in1d and np.ndarray.__contains__ but those methods don't seem to do the trick either.
Performance is an issue here so I want to make full use of numpy's optimizations.
How do I achieve this?
EDIT:
I found it can be done like this:
fv = np.vectorize(lambda x,y: x not in y)
fv(A0[:,None],A1)
But as the numpy docs state:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
So this is the same as just looping over the array, and it would be nice to solve this without explicit or implicit for-loop.
We can convert to string dtype and then use one of the NumPy-based string functions.
Thus, using np.char.count, one solution would be -
np.char.count(A1.astype(str),A0.astype(str)[:,None])==0
Alternative using np.char.find -
np.char.find(A1.astype(str),A0.astype(str)[:,None])==-1
One more using np.char.rfind -
np.char.rfind(A1.astype(str),A0.astype(str)[:,None])==-1
If we are converting one to str dtype, we can skip the conversion for the other array, as internally it would be done anyway. So, the last method could be simplified to -
np.char.rfind(A1.astype(str),A0[:,None])==-1
Sample run -
In [97]: A0
Out[97]: array(['z', 'u', 'w'], dtype=object)
In [98]: A1
Out[98]: array(['u_w', 'u_z', 'w_z', 'zz'], dtype=object)
In [99]: np.char.rfind(A1.astype(str),A0[:,None])==-1
Out[99]:
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)
# Loopy solution using np.vectorize for verification
In [100]: fv = np.vectorize(lambda x,y: x not in y)
In [102]: fv(A0[:,None],A1)
Out[102]:
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)
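One detail worth noting about the astype(str) calls above: they turn the object arrays into fixed-width unicode arrays, which is the representation the np.char functions operate on. A quick check:
import numpy as np
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)
print(A1.astype(str).dtype)   # <U3, i.e. fixed-width unicode of length 3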
I am trying to gain a better understanding of numpy and have come across something I can't quite understand when it comes to indexing.
Let's say we have this first array of random booleans
bools = np.random.choice([True, False],(7),p=[0.5,0.5])
array([False, True, False, False, True, False, False], dtype=bool)
Then let's also say we have this second array of random numbers selected from a normal distribution
data = np.random.randn(7,3)
array([[ 2.24116809, -0.41761776, -0.69026077],
       [-0.85450123,  0.98218741,  0.0233551 ],
       [-1.3157436 , -0.79753471,  1.77393444],
       [-0.26672724, -0.9532758 ,  0.67114247],
       [-1.34177843,  1.220083  , -0.35341168],
       [ 0.49629327,  1.73943962,  0.59050431],
       [ 0.01609382,  0.91396293,  0.3754827 ]])
Using the numpy chain indexing I can do this
data[bools, 2:]
array([[ 0.0233551 ],
       [-0.35341168]])
Now let's say I want to simply grab the first element, I can do this
data[bools, 2:][0]
array([ 0.0233551])
But why does this, data[bools, 2:, 0] not work?
But why does this, data[bools, 2:, 0] not work?
Because the input is a 2D array, so you don't have a third dimension there to use something like [bools, 2:, 0].
To achieve what you are trying to do, you could store the indices corresponding to the True values in the mask bools and then use it as a whole, or one element from it, for indexing.
A sample run to make things clear -
Inputs :
In [40]: data
Out[40]:
array([[ 1.02429045,  1.74104271, -0.54634826],
       [-0.48451969,  0.83455196,  1.94444857],
       [ 0.66504345,  0.41821317,  2.52517305],
       [ 2.11428982, -0.05769528,  0.84432614],
       [ 0.9251009 , -0.74646199, -0.93573164],
       [ 0.07321257, -0.10708067,  1.78107884],
       [-0.12961046, -0.5787856 ,  0.2189466 ]])
In [41]: bools
Out[41]: array([ True, True, False, False, False, False, True], dtype=bool)
Store the valid indices :
In [42]: idx = np.flatnonzero(bools)
In [43]: idx
Out[43]: array([0, 1, 6])
Use as a whole or its first element :
In [44]: data[idx, 2:] # Same as data[bools, 2:]
Out[44]:
array([[-0.54634826],
       [ 1.94444857],
       [ 0.2189466 ]])
In [45]: data[idx[0], 2:]
Out[45]: array([-0.54634826])
I haven't seen 2D numpy indexing called 'chaining'.
data is 2D, and thus can be indexed with a 2-element tuple:
data[bools, 2:]
data[(bools, slice(2, None, None))]
That can also be expressed as
data[bools,:][:,2:]
where it first selects from rows, and then from columns.
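A quick check (with made-up data of the question's shape) that the three spellings select the same values:
import numpy as np
data = np.arange(21.0).reshape(7, 3)
bools = np.array([True, True, False, False, False, False, True])
a = data[bools, 2:]                       # 2D indexing
b = data[(bools, slice(2, None, None))]   # the equivalent explicit tuple
c = data[bools, :][:, 2:]                 # rows first, then columns
assert (a == b).all() and (a == c).all()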
Notice that your indexing produces a (2,1) array; 2 from the number of True values in bools, and 1 from the length of the 2: slice.
Your 2nd indexing with [0] is really a row selection:
data[bools, 2:][0]
data[bools, 2:][0,:]
The result is a (1,) array, the size of the 2nd dimension of the intermediate array.
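To make the shapes explicit, a small sketch with stand-in data of the question's sizes (7 rows, 2 True entries in the mask):
import numpy as np
data = np.zeros((7, 3))
bools = np.array([False, True, False, False, True, False, False])
print(data[bools, 2:].shape)      # (2, 1): 2 True rows, 1 sliced column
print(data[bools, 2:][0].shape)   # (1,): first row of the intermediate array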
I have a list of lists composed of bools, let's say l = [[False, False], [True, False]], and I need to convert l to a numpy array of arrays of booleans. I converted every sublist into a bool array, and the whole list to a numpy array too. My current real list has 121 sublists, but the result of np.any() gives just five results, not the 121 expected. My code is this:
>>> result = np.array([ np.array(extracted[aindices[i]:aindices[i + 1]]) for i in range(len(aux_regions)) ])
>>> np.any(result)
[false, false, false, false, false]
extracted[aindices[i]:aindices[i + 1]] is the sublist which I convert to a bool array. The list generated in the whole line is converted to an array too.
In the first example l, the expected result (assuming the list is converted) should be [False, True] for every subarray.
What is the problem with using np.any? Or are the data types of the converted list not the right ones?
If you have a list of list of bools, you could skip numpy and use a simple comprehension:
In [1]: l = [[False, False], [True, False]]
In [2]: [any(subl) for subl in l]
Out[2]: [False, True]
If the sublists are all the same length, you can pass the list directly to np.array to get a numpy array of bools:
In [3]: import numpy as np
In [4]: result = np.array(l)
In [5]: result
Out[5]:
array([[False, False],
       [ True, False]], dtype=bool)
Then you can use the any method on axis 1 to get the result for each row:
In [6]: result.any(axis=1) # or `np.any(result, axis=1)`
Out[6]: array([False, True], dtype=bool)
If the sublists are not all the same length, then a numpy array might not be the best data structure for this problem.
This part of my answer should be considered a "side bar" to what I wrote above. If the sublists have variable lengths, the list comprehension given above is my recommendation. The following is an alternative that uses an advanced numpy feature. I only suggest it because it looks like you already have the data structures needed to use numpy's reduceat function. It works without having to explicitly form the list of lists.
From reading your code, I infer the following:
extracted is a list of bools. You are splitting this up into sublists.
aindices is a list of integers. Each consecutive pair of integers in aindices specifies a range in extracted that is a sublist.
len(aux_regions) is the number of sublists; I'll call this n. The length of aindices is n+1, and the last value in aindices is the length of extracted.
For example, if the data looks like this:
In [74]: extracted
Out[74]: [False, True, False, False, False, False, True, True, True, True, False, False]
In [75]: aindices
Out[75]: [0, 3, 7, 10, 12]
it means there are four sublists:
In [76]: extracted[0:3]
Out[76]: [False, True, False]
In [77]: extracted[3:7]
Out[77]: [False, False, False, True]
In [78]: extracted[7:10]
Out[78]: [True, True, True]
In [79]: extracted[10:12]
Out[79]: [False, False]
With these data structures, you are set up to use the reduceat feature of numpy. The ufunc in this case is logical_or. You can compute the result with this one line:
In [80]: np.logical_or.reduceat(extracted, aindices[:-1])
Out[80]: array([ True, True, True, False], dtype=bool)
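As a sanity check, a short sketch that cross-checks reduceat against the plain-Python comprehension from earlier in this answer:
import numpy as np
extracted = [False, True, False, False, False, False, True, True, True, True, False, False]
aindices = [0, 3, 7, 10, 12]
result = np.logical_or.reduceat(extracted, aindices[:-1])
# each pair of consecutive aindices delimits one sublist
expected = [any(extracted[i:j]) for i, j in zip(aindices[:-1], aindices[1:])]
assert list(result) == expected   # [True, True, True, False]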