Using numpy any() on a bool array of arrays - python

I have a list of lists composed of bools, say l = [[False, False], [True, False]], and I need to convert l to a numpy array of arrays of booleans. I converted every sublist into a bool array, and the whole list into a numpy array as well. My actual list has 121 sublists, but np.any() returns just five results, not the 121 I expected. My code is this:
>>> result = np.array([ np.array(extracted[aindices[i]:aindices[i + 1]]) for i in range(len(aux_regions)) ])
>>> np.any(result)
[False, False, False, False, False]
extracted[aindices[i]:aindices[i + 1]] is the sublist that I convert to a bool array; the list built in that line is converted to an array too.
For the first example l, the expected result (assuming the list is converted) is [False, True], one value per subarray.
What is the problem with using np.any? Or are the data types of the converted list not the right ones?

If you have a list of lists of bools, you could skip numpy and use a simple comprehension:
In [1]: l = [[False, False], [True, False]]
In [2]: [any(subl) for subl in l]
Out[2]: [False, True]
If the sublists are all the same length, you can pass the list directly to np.array to get a numpy array of bools:
In [3]: import numpy as np
In [4]: result = np.array(l)
In [5]: result
Out[5]:
array([[False, False],
       [ True, False]], dtype=bool)
Then you can use the any method on axis 1 to get the result for each row:
In [6]: result.any(axis=1) # or `np.any(result, axis=1)`
Out[6]: array([False, True], dtype=bool)
If the sublists are not all the same length, then a numpy array might not be the best data structure for this problem.
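As a small illustration of the ragged case (hypothetical data; note that recent NumPy versions, 1.24 and later, refuse to build a ragged array unless you pass dtype=object), the comprehension still works fine:
import numpy as np

l = [[False, False], [True, False, True]]  # sublists of different lengths
ragged = np.array(l, dtype=object)         # dtype=object required for ragged input
print([any(subl) for subl in ragged])      # [False, True]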
This part of my answer should be considered a "sidebar" to what I wrote above. If the sublists have variable lengths, the list comprehension given above is my recommendation. The following is an alternative that uses an advanced numpy feature. I only suggest it because it looks like you already have the data structures needed to use numpy's reduceat function. It works without having to explicitly form the list of lists.
From reading your code, I infer the following:
extracted is a list of bools. You are splitting this up into sublists.
aindices is a list of integers. Each consecutive pair of integers in aindices specifies a range in extracted that is a sublist.
len(aux_regions) is the number of sublists; I'll call this n. The length of aindices is n+1, and the last value in aindices is the length of extracted.
For example, if the data looks like this:
In [74]: extracted
Out[74]: [False, True, False, False, False, False, True, True, True, True, False, False]
In [75]: aindices
Out[75]: [0, 3, 7, 10, 12]
it means there are four sublists:
In [76]: extracted[0:3]
Out[76]: [False, True, False]
In [77]: extracted[3:7]
Out[77]: [False, False, False, True]
In [78]: extracted[7:10]
Out[78]: [True, True, True]
In [79]: extracted[10:12]
Out[79]: [False, False]
With these data structures, you are set up to use the reduceat feature of numpy. The ufunc in this case is logical_or. You can compute the result with this one line:
In [80]: np.logical_or.reduceat(extracted, aindices[:-1])
Out[80]: array([ True, True, True, False], dtype=bool)
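For completeness, here is the reduceat example above as one self-contained snippet (all data taken from the session shown):
import numpy as np

extracted = [False, True, False, False, False, False,
             True, True, True, True, False, False]
aindices = [0, 3, 7, 10, 12]

# one logical_or reduction per [aindices[i], aindices[i+1]) slice
print(np.logical_or.reduceat(extracted, aindices[:-1]))
# [ True  True  True False]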

Related

Numpy 2D indexing of a 1D array with known min, max indices

I have a 1D numpy array of False booleans, and a 2D numpy array containing the min, max indices of values in the first array to change to True.
An example:
my_data = numpy.zeros((10,), dtype=bool)
inds2true = numpy.array([[1, 3], [8, 9]])
And I want the following result:
out = numpy.array([False, True, True, True, False, False, False, False, True, True])
How is this possible in Python with Numpy?
Edit: I would like this to be performed in one step (i.e. no looping).
There's one rule-breaking hack:
>>> my_data[inds2true] = True
>>> my_data = np.cumsum(my_data) % 2 == 1
>>> my_data
array([False,  True,  True, False, False, False, False, False,  True, False])
The most common practice is to treat the ranges as np.arange(1, 3) and np.arange(8, 9), i.e. not including 3 or 9. If you still want to include them, additionally do my_data[inds2true[:, 1]] = True after the cumsum step.
If you're looking for other options to do it in one go, they will most probably involve np.cumsum tricks.
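Putting the hack and the endpoint fix together, a minimal sketch that reproduces the desired output from the question:
import numpy as np

my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])

my_data[inds2true] = True
my_data = np.cumsum(my_data) % 2 == 1
my_data[inds2true[:, 1]] = True  # re-include the (exclusive) end indices
print(my_data)
# [False  True  True  True False False False False  True  True]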
import numpy as np

my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])

indices = []
for ix_range in inds2true:
    indices += list(range(ix_range[0], ix_range[1] + 1))
my_data[indices] = True
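For a genuinely one-step, loop-free version, here is a broadcasting sketch, assuming (as in the question's example) that the end indices are meant to be inclusive:
import numpy as np

my_data = np.zeros((10,), dtype=bool)
inds2true = np.array([[1, 3], [8, 9]])

idx = np.arange(my_data.size)
# mark every position that falls inside any [start, end] pair
out = ((idx >= inds2true[:, [0]]) & (idx <= inds2true[:, [1]])).any(axis=0)
print(out)
# [False  True  True  True False False False False  True  True]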

Check if each element in a numpy array is in a separate list

I'd like to do something like this:
>>> y = np.arange(5)
>>> y in (0, 1, 2)
array([True, True, True, False, False])
This syntax doesn't work. What's the best way to achieve the desired result?
(I'm looking for a general solution. Obviously in this specific case I could do y < 3.)
I'll spell this out a little more clearly, since at least a few people seem to be confused.
Here is a long way of getting my desired behavior:
new_y = np.empty_like(y, dtype=bool)  # dtype=bool so the result is boolean
for i in range(len(y)):
    if y[i] in (0, 1, 2):
        new_y[i] = True
    else:
        new_y[i] = False
I'm looking for this behavior in a more compact form.
Here's another solution:
new_y = np.array([True if item in (0, 1, 2) else False for item in y])
Again, just looking for a simpler way.
A good general purpose tool is a broadcasted, or 'outer', comparison between elements of two arrays:
In [35]: y=np.arange(5)
In [36]: x=np.array([0,1,2])
In [37]: y[:,None]==x
Out[37]:
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True],
       [False, False, False],
       [False, False, False]])
This is doing a fast comparison between every element of y and every element of x. Depending on your needs, you can condense this array along one of the axes:
In [38]: (y[:,None]==x).any(axis=1)
Out[38]: array([ True, True, True, False, False])
A comment suggested in1d. I think it's a good idea to look at its code. It has several strategies depending on the relative sizes of the inputs.
In [40]: np.in1d(y,x)
Out[40]: array([ True, True, True, False, False])
In [41]: np.array([True if item in x else False for item in y])
Out[41]: array([ True, True, True, False, False])
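As a side note, newer NumPy versions (1.13 and later, to the best of my knowledge) expose the same check as np.isin, which reads more naturally and also accepts multidimensional input; with the y and x from the session above:
np.isin(y, x)  # array([ True,  True,  True, False, False])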
Which is fastest may depend on the size of the inputs. If you are starting from lists, your list comprehension might be faster. This pure-list version is by far the fastest:
[True if item in (0,1,2) else False for item in (0,1,2,3,4)]
[item in (0,1,2) for item in (0,1,2,3,4)] # simpler

What is going on behind this numpy selection behavior?

While answering this question, some others and I were actually wrong in assuming that the following would work:
Say one has
test = [[[0], 1],
        [[1], 1]]
import numpy as np
nptest = np.array(test)
What is the reason behind
>>> nptest[:,0]==[1]
array([False, False], dtype=bool)
while one has
>>> nptest[0,0]==[1],nptest[1,0]==[1]
(False, True)
or
>>> nptest==[1]
array([[False,  True],
       [False,  True]], dtype=bool)
or
>>> nptest==1
array([[False,  True],
       [False,  True]], dtype=bool)
Is it the degeneracy in terms of dimensions that causes this?
nptest is a 2D array of object dtype, and the first element of each row is a list.
nptest[:, 0] is a 1D array of object dtype, each of whose elements are lists.
When you do nptest[:,0]==[1], NumPy does not perform an elementwise comparison of each element of nptest[:,0] against the list [1]. It creates as high-dimensional an array as it can from [1], producing the 1D array np.array([1]), and then broadcasts the comparison, comparing each element of nptest[:,0] against the integer 1.
Since no list in nptest[:, 0] is equal to 1, all elements of the result are False.
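If you do want the elementwise comparison of the object column against a list, one workaround is to loop over the objects yourself (a sketch; note that recent NumPy versions require dtype=object to build this ragged array at all):
import numpy as np

test = [[[0], 1],
        [[1], 1]]
nptest = np.array(test, dtype=object)

# compare each object in the first column against the list [1] explicitly
col = nptest[:, 0]
print(np.array([x == [1] for x in col]))  # [False  True]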

How to compare two numpy arrays of strings with the "in" operator to get a boolean array using array broadcasting?

Python allows for a simple check if a string is contained in another string:
'ab' in 'abcd'
which evaluates to True.
Now take a numpy array of strings and you can do this:
import numpy as np
A0 = np.array(['z', 'u', 'w'],dtype=object)
A0[:,None] != A0
Resulting in a boolean array:
array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
Let's now take another array:
A1 = np.array(['u_w', 'u_z', 'w_z'],dtype=object)
I want to check where a string of A0 is not contained in a string in A1, essentially creating unique combinations, but the following does not yield a boolean array, only a single boolean, regardless of how I write the indices:
A0[:,None] not in A1
I also tried using numpy.in1d and np.ndarray.__contains__ but those methods don't seem to do the trick either.
Performance is an issue here so I want to make full use of numpy's optimizations.
How do I achieve this?
EDIT:
I found it can be done like this:
fv = np.vectorize(lambda x,y: x not in y)
fv(A0[:,None],A1)
But as the numpy docs state:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
So this is the same as just looping over the array, and it would be nice to solve this without an explicit or implicit for-loop.
We can convert to string dtype and then use one of the NumPy-based string functions.
Thus, using np.char.count, one solution would be -
np.char.count(A1.astype(str),A0.astype(str)[:,None])==0
Alternative using np.char.find -
np.char.find(A1.astype(str),A0.astype(str)[:,None])==-1
One more using np.char.rfind -
np.char.rfind(A1.astype(str),A0.astype(str)[:,None])==-1
If we are converting one to str dtype, we can skip the conversion for the other array, as internally it would be done anyway. So, the last method could be simplified to -
np.char.rfind(A1.astype(str),A0[:,None])==-1
Sample run -
In [97]: A0
Out[97]: array(['z', 'u', 'w'], dtype=object)
In [98]: A1
Out[98]: array(['u_w', 'u_z', 'w_z', 'zz'], dtype=object)
In [99]: np.char.rfind(A1.astype(str),A0[:,None])==-1
Out[99]:
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)
# Loopy solution using np.vectorize for verification
In [100]: fv = np.vectorize(lambda x,y: x not in y)
In [102]: fv(A0[:,None],A1)
Out[102]:
array([[ True, False, False, False],
       [False, False,  True,  True],
       [False,  True, False,  True]], dtype=bool)
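Another option in the same spirit as np.vectorize (still a Python-level loop under the hood, so consider it a sketch rather than a fast path) is np.frompyfunc, which broadcasts like a ufunc and returns an object array that needs a final cast:
import numpy as np

A0 = np.array(['z', 'u', 'w'], dtype=object)
A1 = np.array(['u_w', 'u_z', 'w_z'], dtype=object)

notin = np.frompyfunc(lambda x, y: x not in y, 2, 1)
print(notin(A0[:, None], A1).astype(bool))
# [[ True False False]
#  [False False  True]
#  [False  True False]]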

How to map numpy arrays to one another?

I have two boolean arrays (A, B) of the same shape; the shape is finite but arbitrarily large, and only known at runtime.
I want to calculate the value of a boolean function of corresponding elements in A and B and store the results in C. Finally, I need a list of tuples of the indices where C is true.
How do I get there?
I don't want to iterate over the individual elements, because I don't know how many dimensions there are; there must be a better way.
>>> A = array([True, False, True, False, True, False]).reshape(2,3)
>>> B = array([True, True, False, True, True, False]).reshape(2,3)
>>> A == B
array([[ True, False, False],
       [False,  True,  True]], dtype=bool)
as wanted, but:
>>> A and B
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
How do I get "A and B"?
I tried "map", "zip", "nditer" and searched for other methods unsuccessfully.
As for the things with the tuples, I need something like "argmax" for booleans, but didn't find anything either.
Do you know somethign, that might help?
You can also use the & operator:
In [5]: A & B
array([[ True, False, False],
       [False,  True, False]], dtype=bool)
The big win with the logical_and call is that you can use the out parameter:
In [6]: C = np.empty_like(A)
In [7]: np.logical_and(A, B, C)
array([[ True, False, False],
       [False,  True, False]], dtype=bool)
Yes, there is a function in NumPy:
numpy.logical_and(A,B)
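The list-of-tuples part of the question isn't covered above; here is a minimal sketch using np.argwhere, which collects the index of every True entry regardless of the number of dimensions:
import numpy as np

A = np.array([True, False, True, False, True, False]).reshape(2, 3)
B = np.array([True, True, False, True, True, False]).reshape(2, 3)

C = np.logical_and(A, B)  # elementwise "A and B"
coords = [tuple(ix) for ix in np.argwhere(C)]
print(coords)  # [(0, 0), (1, 1)]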
