In NumPy, I can generate a boolean array like this:
>>> arr = np.array([1, 2, 1, 2, 3, 6, 9])
>>> arr > 2
array([False, False, False, False, True, True, True], dtype=bool)
How can we chain comparisons together? For example, I would like the following to work:
>>> 6 > arr > 2
array([False, False, False, False, True, False, False], dtype=bool)
Attempting to do so results in the error message
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
AFAIK the closest you can get is to use &, |, and ^:
>>> arr = np.array([1, 2, 1, 2, 3, 6, 9])
>>> (2 < arr) & (arr < 6)
array([False, False, False, False, True, False, False], dtype=bool)
>>> (2 < arr) | (arr < 6)
array([ True, True, True, True, True, True, True], dtype=bool)
>>> (2 < arr) ^ (arr < 6)
array([ True, True, True, True, False, True, True], dtype=bool)
I don't think you'll be able to get a < b < c-style chaining to work.
You can use the numpy logical operators to do something similar.
>>> arr = np.array([1, 2, 1, 2, 3, 6, 9])
>>> arr > 2
array([False, False, False, False, True, True, True], dtype=bool)
>>> np.logical_and(arr > 2, arr < 6)
array([False, False, False, False, True, False, False], dtype=bool)
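If more than two conditions need to be combined, np.logical_and.reduce accepts a sequence of boolean arrays. A minimal sketch follows; the third condition (odd values) is an assumption added purely for illustration and is not part of the question:

```python
import numpy as np

arr = np.array([1, 2, 1, 2, 3, 6, 9])

# Two conditions: same result as (arr > 2) & (arr < 6)
two = np.logical_and(arr > 2, arr < 6)

# Three or more conditions: reduce over a list of boolean arrays.
# The "odd" condition is hypothetical, only here to show a third operand.
three = np.logical_and.reduce([arr > 2, arr < 10, arr % 2 == 1])

print(two)    # [False False False False  True False False]
print(three)  # [False False False False  True False  True]
```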
Chained comparisons are not supported by NumPy arrays. You need to write the left and right comparisons separately and combine them with bitwise operators. You also need to parenthesise both expressions because of operator precedence (|, & and ^ bind more tightly than comparison operators such as < and >). In this case, since you want both conditions to be satisfied, you need a bitwise AND (&):
(2<arr) & (arr<6)
# array([False, False, False, False, True, False, False])
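To see why the parentheses matter, here is a small sketch of what happens without them. Because & binds more tightly than <, the unparenthesised form turns back into a chained comparison and fails in the same way:

```python
import numpy as np

arr = np.array([1, 2, 1, 2, 3, 6, 9])

# Without parentheses this parses as 2 < (arr & arr) < 6,
# which is a chained comparison again and raises ValueError.
try:
    2 < arr & arr < 6
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison is evaluated first, then combined.
mask = (2 < arr) & (arr < 6)

print(raised)     # True
print(arr[mask])  # [3]
```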
PEP 535 proposed making this possible, though it remains deferred. The PEP also explains why the error occurs. As posed in the question, chaining comparisons in this way yields:
2<arr<6
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
The problem here is that Python internally expands the above to:
2<arr and arr<6
This is what causes the error: and implicitly calls bool on its left operand, and NumPy only permits implicit coercion to a boolean value for single-element arrays (not arrays with size > 1), because a boolean array with many values is neither clearly True nor clearly False. Because of this ambiguity, evaluating a multi-element array in a boolean context always raises a ValueError.
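The coercion rule can be checked directly. The sketch below uses a small helper, truth_value, which is a hypothetical name introduced only for this illustration:

```python
import numpy as np

arr = np.array([1, 2, 1, 2, 3, 6, 9])

def truth_value(x):
    """Return bool(x), or the error text if the coercion is ambiguous."""
    try:
        return bool(x)
    except ValueError as exc:
        return f"ValueError: {exc}"

print(truth_value(np.array([3]) > 2))  # single element: coerces to True
print(truth_value(arr > 2))            # many elements: ambiguous, error text

# Explicit reductions state which question is actually being asked.
print((arr > 2).any())  # True: at least one element is greater than 2
print((arr > 2).all())  # False: not every element is greater than 2
```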
While using numpy today I was writing a couple of lines to pull elements from a 1D array that holds several different identifying integers. My filter is fiveseventy_idx, but I got a deprecation warning. How should I do this in the future?
fiveseventy_idx = np.where(clusters == 1)
clusters = clusters[fiveseventy_idx]
<ipython-input-44-fd1ca1277d36>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
fiveseventy_idx = np.where(clusters == [1,570])
Say, hypothetically, clusters = np.array([2, 4, 2, 7, 7, 7, 1, 1, 3, 570, 1]), and I only want specific integers. I also need the filter for another array, so that I can get the associated values in the same order as before. So I would want [1, 1, 1] after applying my filter.
Comparing two arrays of different lengths gives a scalar False, along with the warning:
In [146]: np.arange(10) == np.array([2, 5])
<ipython-input-146-888c04a597c2>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
np.arange(10) == np.array([2, 5])
Out[146]: False
So it's just saying that no, the two arrays are not equal. Normally, though, numpy does elementwise comparisons. It may, at one time, have just truncated the longer array to match the size of the shorter.
Anyway, here is a broadcasted comparison:
In [147]: np.arange(10)[:, None] == np.array([2, 5])
Out[147]:
array([[False, False],
[False, False],
[ True, False],
[False, False],
[False, False],
[False, True],
[False, False],
[False, False],
[False, False],
[False, False]])
In [148]: (np.arange(10)[:, None] == np.array([2, 5])).any(axis=1)
Out[148]:
array([False, False, True, False, False, True, False, False, False,
False])
In [149]: np.nonzero((np.arange(10)[:, None] == np.array([2, 5])).any(axis=1))
Out[149]: (array([2, 5]),)
In other cases, all can be used instead of any, to test for rows that are True in both columns.
Another way:
In [151]: np.isin(np.arange(10),np.array([2,5]))
Out[151]:
array([False, False, True, False, False, True, False, False, False,
False])
or
In [152]: (np.arange(10)==2)|(np.arange(10)==5)
Out[152]:
array([False, False, True, False, False, True, False, False, False,
False])
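Applied to the clusters array from the question, the same idea might look like the sketch below. The values array is a hypothetical companion array, assumed to be aligned element for element with clusters:

```python
import numpy as np

clusters = np.array([2, 4, 2, 7, 7, 7, 1, 1, 3, 570, 1])
# Hypothetical companion array, aligned element-for-element with clusters.
values = np.arange(clusters.size) * 10

mask_one = clusters == 1                 # scalar comparison broadcasts cleanly
mask_many = np.isin(clusters, [1, 570])  # membership in several labels

print(clusters[mask_one])   # [1 1 1]
print(values[mask_one])     # [ 60  70 100]
print(clusters[mask_many])  # [  1   1 570   1]
```

A boolean mask can be reused on any array of the same length, so there is no need to go through np.where unless the integer positions themselves are required.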
Take the following example. I have an array test and want to get a boolean mask with True's for all elements that are equal to elements of ref.
import numpy as np
test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5 ,4]])
ref = np.array([3, 4, 5])
I am looking for something equivalent to
mask = (test == ref[0]) | (test == ref[1]) | (test == ref[2])
which in this case should yield
>>> print(mask)
[[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]]
but without having to resort to any loops.
Numpy comes with a function, isin, that does exactly this:
np.isin(test, ref)
which returns
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
You can use numpy broadcasting:
mask = (test[:,None] == ref[:,None]).any(1)
output:
array([[False, True, False, False],
[ True, True, False, True],
[False, False, True, True]])
NB. this can be faster than numpy.isin, but it creates an (X, R, Y)-sized intermediate array, where (X, Y) is the shape of test and R is the length of ref, so it will consume a lot of memory on very large arrays.
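As a quick sanity check, the broadcasted comparison can be compared against np.isin. The sketch below is a variant that puts the ref axis last (test[..., None] == ref) and uses a ref of a different length than in the question, both of which are assumptions made for illustration:

```python
import numpy as np

test = np.array([[2, 3, 1, 0], [5, 4, 2, 3], [6, 7, 5, 4]])
ref = np.array([3, 4])  # deliberately not the same length as test.shape[0]

# Compare every element of test against every element of ref.
# The intermediate has shape test.shape + (ref.size,), here (3, 4, 2).
pairwise = test[..., None] == ref
mask = pairwise.any(axis=-1)

print(pairwise.shape)                            # (3, 4, 2)
print(np.array_equal(mask, np.isin(test, ref)))  # True
```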
I recently came across the ~ operator in a data analysis book that I should have read a long time ago. It is used in general conditions, but I don't understand it.
When applied to a numpy array with boolean dtype, it is the logical_not operator:
In [607]: np.array([True, False, True])
Out[607]: array([ True, False, True])
In [608]: ~np.array([True, False, True])
Out[608]: array([False, True, False])
In [611]: np.logical_not(np.array([True, False, True]))
Out[611]: array([False, True, False])
That's not the case with Python booleans:
In [613]: ~True
Out[613]: -2
In [614]: not True
Out[614]: False
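In practice ~ is mostly used to invert a mask before indexing. A short sketch, which also shows that on integer arrays ~ is a bitwise NOT rather than a logical one:

```python
import numpy as np

arr = np.array([1, 2, 1, 2, 3, 6, 9])
mask = arr > 2

# On a boolean array, ~ flips every element, so ~mask selects the complement.
print(arr[~mask])  # [1 2 1 2]

# On an integer array, ~ is bitwise NOT (~x == -x - 1), not logical negation.
print(~np.array([0, 1, 2]))  # [-1 -2 -3]

# np.logical_not treats any nonzero value as True, whatever the dtype.
print(np.logical_not(np.array([0, 1, 2])))  # [ True False False]
```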
As the title says, I want np.where() to return a coordinate multiple times if it comes across the same value. Example:
import numpy as np
a = 2*np.arange(5)
b = [8,8]
condition = np.isin(a,b)
print(np.where(condition))
# (array([4], dtype=int64),)
It returns [4] because a[4] = 8, but since b has two 8s, I want it to return [4, 4]. Is there a way to do this without iterating through each b value?
With your a,b:
In [687]: condition = np.isin(a, b)
In [688]: condition
Out[688]: array([False, False, False, False, True], dtype=bool)
where just tells us the index of that one True value.
Switch the test, and you find that both items of b are in a.
In [697]: np.isin(b, a)
Out[697]: array([ True, True], dtype=bool)
You could use a simple broadcasted comparison:
In [700]: a[:,None]==b
Out[700]:
array([[False, False],
[False, False],
[False, False],
[False, False],
[ True, True]], dtype=bool)
In [701]: np.where(a[:,None]==b)
Out[701]: (array([4, 4], dtype=int32), array([0, 1], dtype=int32))
isin (and in1d, which it uses) worries about uniqueness, but you don't. So testing with == on the arrays gives you more control.
Test if both values in b match the same a element:
In [703]: (a[:,None]==b).all(axis=1)
Out[703]: array([False, False, False, False, True], dtype=bool)
Test if any value matches, which is essentially what in1d does:
In [704]: (a[:,None]==b).any(axis=1)
Out[704]: array([False, False, False, False, True], dtype=bool)
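If the indices into a should follow the order of b, the two index arrays returned by the broadcasted comparison can be reordered. A minimal sketch; the extra value 2 in b is an assumption added to make the ordering visible:

```python
import numpy as np

a = 2 * np.arange(5)     # [0, 2, 4, 6, 8]
b = np.array([8, 8, 2])  # the extra 2 is an assumption, added for illustration

# rows indexes into a, cols indexes into b; one (row, col) pair per match.
rows, cols = np.nonzero(a[:, None] == b)

# Reorder so that the indices into a follow the order of b.
order = np.argsort(cols, kind="stable")
print(rows[order])  # [4 4 1]
```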
Starting from an array:
a = np.array([1,1,1,2,3,4,5,5])
and a filter:
m = np.array([1,5])
I am now building a mask with:
b = np.in1d(a,m)
that correctly returns:
array([ True, True, True, False, False, False, True, True], dtype=bool)
I need to limit the number of Trues per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask would then be one of the following (it does not matter which of the matching positions are kept):
array([ True, True, False, False, False, False, True, True], dtype=bool)
or
array([ True, False, True, False, False, False, True, True], dtype=bool)
or
array([ False, True, True, False, False, False, True, True], dtype=bool)
Ideally this is a kind of "random" masking with a limited frequency per value. So far I have tried randomly selecting the original unique elements in the array, but the mask still selects the True values regardless of their frequency.
For a generic case with unsorted input array, here's one approach based on np.searchsorted -
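N = 2  # Maximum number of duplicates allowed per value
sortidx = a.argsort()
# Positions (in sorted order) of the first N slots for each value in m
idx = np.searchsorted(a, m, sorter=sortidx)[:, None] + np.arange(N)
# Actual count of each value of m in a, capped at N
lim_counts = (a[:, None] == m).sum(0).clip(max=N)
# Keep only the slots that correspond to real occurrences
idx_clipped = idx[lim_counts[:, None] > np.arange(N)]
# Build the mask in sorted order, then map it back to the original order
out = np.in1d(np.arange(a.size), idx_clipped)[sortidx.argsort()]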
Sample run -
In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])
In [38]: m
Out[38]: [1, 2, 5]
In [39]: N
Out[39]: 2
In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
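For the "random" variant asked about in the question, a simpler loop-based alternative (not the searchsorted approach above) could look like the sketch below. limited_mask is a hypothetical helper, not part of NumPy, and it loops over the values in m, so it assumes m is small:

```python
import numpy as np

def limited_mask(a, m, n, rng=None):
    """Mask elements of a that occur in m, keeping at most n per value.

    Hypothetical helper for illustration; positions are chosen at random
    when a value occurs more than n times.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros(a.shape, dtype=bool)
    for value in np.unique(m):
        positions = np.flatnonzero(a == value)
        if positions.size > n:
            # Randomly keep n of the matching positions.
            positions = rng.choice(positions, size=n, replace=False)
        out[positions] = True
    return out

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])
mask = limited_mask(a, m, n=2, rng=np.random.default_rng(0))
print(mask)
```

Which two of the three 1s are kept depends on the random generator, but both 5s are always kept and no more than n positions are ever selected per value.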