Numpy `logical_or` for more than two arguments - python

Numpy's logical_or function takes no more than two arrays to compare. How can I find the union of more than two arrays? (The same question could be asked with regard to Numpy's logical_and and obtaining the intersection of more than two arrays.)

If you're asking about numpy.logical_or, then no, as the docs explicitly say, the only parameters are x1, x2, and optionally out:
numpy.logical_or(x1, x2[, out]) = <ufunc 'logical_or'>
You can of course chain together multiple logical_or calls like this:
>>> x = np.array([True, True, False, False])
>>> y = np.array([True, False, True, False])
>>> z = np.array([False, False, False, False])
>>> np.logical_or(np.logical_or(x, y), z)
array([ True, True, True, False], dtype=bool)
The way to generalize this kind of chaining in NumPy is with reduce:
>>> np.logical_or.reduce((x, y, z))
array([ True, True, True, False], dtype=bool)
And of course this will also work if you have one multi-dimensional array instead of separate arrays—in fact, that's how it's meant to be used:
>>> xyz = np.array((x, y, z))
>>> xyz
array([[ True, True, False, False],
[ True, False, True, False],
[False, False, False, False]], dtype=bool)
>>> np.logical_or.reduce(xyz)
array([ True, True, True, False], dtype=bool)
But a tuple of three equal-length 1D arrays is an array_like in NumPy terms, and can be used as a 2D array.
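As a small sketch of how the reduction axis matters once the inputs are stacked into a 2D array (the variable names across_arrays and within_each are only illustrative):

```python
import numpy as np

x = np.array([True, True, False, False])
y = np.array([True, False, True, False])
z = np.array([False, False, False, False])
xyz = np.array((x, y, z))

# axis=0 (the default for ufunc.reduce): combine the three arrays element by element.
across_arrays = np.logical_or.reduce(xyz, axis=0)

# axis=1: collapse each input array to a single "is anything True?" flag.
within_each = np.logical_or.reduce(xyz, axis=1)

print(across_arrays)
print(within_each)
```

For the union of several arrays, the default axis=0 is the one that matches the chained logical_or calls.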
Outside of NumPy, you can also use Python's reduce:
>>> functools.reduce(np.logical_or, (x, y, z))
array([ True, True, True, False], dtype=bool)
However, unlike NumPy's reduce, Python's is not often needed. For most cases, there's a simpler way to do things—e.g., to chain together multiple Python or operators, don't reduce over operator.or_, just use any. And when there isn't, it's usually more readable to use an explicit loop.
And in fact NumPy's any can be used for this case as well, although it's not quite as trivial; if you don't explicitly give it an axis, you'll end up with a scalar instead of an array. So:
>>> np.any((x, y, z), axis=0)
array([ True, True, True, False], dtype=bool)
As you might expect, logical_and is similar: you can chain it, reduce it with np.logical_and.reduce, functools.reduce it, or substitute np.all with an explicit axis.
What about other operations, like logical_xor? Again, same deal… except that in this case there is no all/any-type function that applies. (What would you call it? odd?)
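A minimal sketch of what reducing with logical_xor computes: the result is True wherever an odd number of the inputs are True. The sample arrays and the cross-check by counting are illustrative assumptions, not part of the original answer.

```python
import numpy as np

x = np.array([True, True, False, False])
y = np.array([True, False, True, False])
z = np.array([True, True, True, False])

# XOR across all three arrays, element by element.
odd = np.logical_xor.reduce((x, y, z))

# Cross-check: count the True values per position and test for an odd count.
odd_by_count = np.sum((x, y, z), axis=0) % 2 == 1

print(odd)
print(odd_by_count)
```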

In case someone still needs this: say you have three Boolean arrays a, b, c with the same shape. This gives an element-wise and:
a * b * c
and this gives an element-wise or:
a + b + c
Is this what you want?
Stacking a lot of logical_and or logical_or is not practical.

Building on abarnert's answer for n-dimensional case:
TL;DR: np.logical_or.reduce(np.array(list))

As boolean algebras are both commutative and associative by definition, the following statements are equivalent for boolean values of a, b and c.
a or b or c
(a or b) or c
a or (b or c)
(b or a) or c
So if you have a "logical_or" which is dyadic and you need to pass it three arguments (a, b, and c), you can call
logical_or(logical_or(a, b), c)
logical_or(a, logical_or(b, c))
logical_or(c, logical_or(b, a))
or whatever permutation you like.
Back to Python: if you want to test whether a condition (given by a function test that takes a testee and returns a boolean value) applies to a or b or c or any element of a list L, you normally use
any(test(x) for x in L)
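A small runnable sketch of that pattern. The predicate is_negative and the sample list are hypothetical stand-ins for test and L:

```python
def is_negative(value):
    """Hypothetical predicate standing in for `test`."""
    return value < 0

L = [3, 7, -2, 5]

# any() stops at the first element for which the predicate is True.
found = any(is_negative(x) for x in L)

# all() is the matching tool for chaining `and`.
every = all(is_negative(x) for x in L)

print(found, every)
```

Note that this yields a single boolean for the whole list, not an element-wise array.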

I use this workaround which can be extended to n arrays:
>>> a = np.array([False, True, False, False])
>>> b = np.array([True, False, False, False])
>>> c = np.array([False, False, False, True])
>>> d = (a + b + c > 0) # That's an "or" between multiple arrays
>>> d
array([ True, True, False, True], dtype=bool)

I've tried the following three different methods to get the logical_and of a list l of k arrays of size n:
Using a recursive numpy.logical_and (see below)
Using numpy.logical_and.reduce(l)
Using numpy.vstack(l).all(axis=0)
Then I did the same for the logical_or function. Surprisingly enough, the recursive method is the fastest one.
import numpy
import perfplot
def and_recursive(*l):
    if len(l) == 1:
        return l[0].astype(bool)
    elif len(l) == 2:
        return numpy.logical_and(l[0], l[1])
    elif len(l) > 2:
        return and_recursive(and_recursive(*l[:2]), and_recursive(*l[2:]))
def or_recursive(*l):
    if len(l) == 1:
        return l[0].astype(bool)
    elif len(l) == 2:
        return numpy.logical_or(l[0], l[1])
    elif len(l) > 2:
        return or_recursive(or_recursive(*l[:2]), or_recursive(*l[2:]))
def and_reduce(*l):
    return numpy.logical_and.reduce(l)
def or_reduce(*l):
    return numpy.logical_or.reduce(l)
def and_stack(*l):
    return numpy.vstack(l).all(axis=0)
def or_stack(*l):
    return numpy.vstack(l).any(axis=0)
k = 10  # number of arrays to be combined
perfplot.plot(
    setup=lambda n: [numpy.random.choice(a=[False, True], size=n) for j in range(k)],
    kernels=[
        lambda l: and_recursive(*l),
        lambda l: and_reduce(*l),
        lambda l: and_stack(*l),
        lambda l: or_recursive(*l),
        lambda l: or_reduce(*l),
        lambda l: or_stack(*l),
    ],
    labels=['and_recursive', 'and_reduce', 'and_stack', 'or_recursive', 'or_reduce', 'or_stack'],
    n_range=[2 ** j for j in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
    equality_check=None
)
[Plots not shown: timings for k = 4 and for k = 10.]
It seems that there is an approximately constant time overhead also for higher n.
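If perfplot is not available, a rough comparison can be sketched with the standard-library timeit module. This is only an assumption-laden sketch: the array size, the repeat count, and the helper and_loop (an in-place loop, not the recursive version above) are my own choices, and the timings will vary by machine and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
k, n = 10, 10_000  # number of arrays, length of each array
arrays = [rng.random(n) < 0.5 for _ in range(k)]

def and_reduce(arrs):
    return np.logical_and.reduce(arrs)

def and_stack(arrs):
    return np.vstack(arrs).all(axis=0)

def and_loop(arrs):
    # Hypothetical alternative: accumulate in place, one pairwise call per array.
    result = arrs[0].copy()
    for arr in arrs[1:]:
        np.logical_and(result, arr, out=result)
    return result

repeats = 100
for fn in (and_reduce, and_stack, and_loop):
    seconds = timeit.timeit(lambda: fn(arrays), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e6:.1f} us per call")
```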

Using the sum function:
a = np.array([True, False, True])
b = np.array([False, False, True])
c = np.vstack([a, b, b])
c
Out[172]:
array([[ True, False, True],
[False, False, True],
[False, False, True]], dtype=bool)
np.sum(c,axis=0)>0
Out[173]: array([ True, False, True], dtype=bool)
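As a sketch of why the sum trick works: summing Boolean rows counts the True values per column, so "count > 0" matches any and "count == number of rows" matches all. The comparison against any/all is added here for illustration.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([False, False, True])
c = np.vstack([a, b, b])

# Number of True values in each column.
counts = np.sum(c, axis=0)

or_via_sum = counts > 0             # at least one True
and_via_sum = counts == c.shape[0]  # every row is True

print(counts)
print(or_via_sum, c.any(axis=0))
print(and_via_sum, c.all(axis=0))
```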

a = np.array([True, False, True])
b = np.array([False, False, True])
c = np.array([True, True, True])
d = np.array([True, True, True])
# logical or
lor = (a+b+c+d).astype(bool)
# logical and
land = (a*b*c*d).astype(bool)
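One caveat worth sketching: the arithmetic shortcut is only safe when the inputs really are Boolean arrays. With numeric inputs (an assumption made here purely to show the failure mode), values can cancel in the sum, whereas logical_or treats every nonzero value as True.

```python
import numpy as np

# Numeric "masks" instead of Boolean ones.
a = np.array([1, 0, 2])
b = np.array([-1, 0, 1])

by_sum = (a + b).astype(bool)      # 1 + (-1) cancels to 0 in the first position
by_logical = np.logical_or(a, b)   # any nonzero input counts as True

print(by_sum)
print(by_logical)

# For genuine Boolean arrays, + stays Boolean and behaves like "or".
bools = np.array([True, False]) + np.array([True, True])
print(bools, bools.dtype)
```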

If you want a short (maybe not optimal) function for performing logical AND on multidimensional boolean masks, you may use this recursive lambda function:
masks_and = lambda *masks : masks[0] if len(masks) == 1 else masks_and(np.logical_and(masks[0], masks[-1]), *masks[1:-1])
result = masks_and(mask1, mask2, ...)
You can also generalize the lambda function to apply any associative operator (a function of 2 arguments, such as multiplication/AND, sum/OR and so on) to any objects. The arguments are combined from left to right, so this also works when the order matters:
fn2args_reduce = lambda fn2args, *args : args[0] if len(args) == 1 else fn2args_reduce(fn2args, fn2args(args[0], args[1]), *args[2:])
result = fn2args_reduce(np.dot, matrix1, matrix2, ... matrixN)
which gives you the same result as nesting the NumPy calls by hand:
np.dot(...(np.dot(np.dot(matrix1, matrix2), matrix3)...), matrixN)
For example fn2args_reduce(lambda a,b: a+b, 1,2,3,4,5) gives you 15 - sum of these numbers (of course you have a much more efficient sum function for this, but I like it).
Even more generalized model for functions of N arguments could look like this:
fnNargs_reduce = lambda fnNargs, N, *args : args[0] if len(args) == 1 else fnNargs_reduce(fnNargs, N, fnNargs(*args[:N]), *args[N:])
fnNargs = lambda x1, x2, x3=neutral, ..., xN=neutral: x1 (?) x2 (?) ... (?) xN
Where neutral means it is neutral element for (?) operator, eg. 0 for +, 1 for * etc.
Why? Just for fun :-)
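For comparison, the standard library's functools.reduce performs the same left-to-right fold as fn2args_reduce. The matrices and masks below are made-up sample data:

```python
from functools import reduce
import operator

import numpy as np

# Same result as fn2args_reduce(lambda a, b: a + b, 1, 2, 3, 4, 5).
total = reduce(operator.add, [1, 2, 3, 4, 5])

# Left-to-right chain of matrix products: dot(dot(m1, m2), m3).
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[0, 1], [1, 0]])
m3 = np.array([[2, 0], [0, 2]])
chained = reduce(np.dot, [m1, m2, m3])

# Element-wise AND over any number of Boolean masks.
masks = [np.array([True, True, False]),
         np.array([True, False, False]),
         np.array([True, True, True])]
combined = reduce(np.logical_and, masks)

print(total)
print(chained)
print(combined)
```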

Related

Check if each element in a numpy array is in a separate list

I'd like to do something like this:
>>> y = np.arange(5)
>>> y in (0, 1, 2)
array([True, True, True, False, False])
This syntax doesn't work. What's the best way to achieve the desired result?
(I'm looking for a general solution. Obviously in this specific case I could do y < 3.)
I'll spell this out a little more clearly for you guys, since at least a few people seem to be confused.
Here is a long way of getting my desired behavior:
new_y = np.empty_like(y, dtype=bool)  # Boolean result array
for i in range(len(y)):
    if y[i] in (0, 1, 2):
        new_y[i] = True
    else:
        new_y[i] = False
I'm looking for this behavior in a more compact form.
Here's another solution:
new_y = np.array([True if item in (0, 1, 2) else False for item in y])
Again, just looking for a simpler way.
A good general purpose tool is a broadcasted, or 'outer', comparison between elements of two arrays:
In [35]: y=np.arange(5)
In [36]: x=np.array([0,1,2])
In [37]: y[:,None]==x
Out[37]:
array([[ True, False, False],
[False, True, False],
[False, False, True],
[False, False, False],
[False, False, False]])
This is doing a fast comparison between every element of y and every element of x. Depending on your needs, you can condense this array along one of the axes:
In [38]: (y[:,None]==x).any(axis=1)
Out[38]: array([ True, True, True, False, False])
A comment suggested in1d. I think it's a good idea to look at its code. It has several strategies depending on the relative sizes of the inputs.
In [40]: np.in1d(y,x)
Out[40]: array([ True, True, True, False, False])
In [41]: np.array([True if item in x else False for item in y])
Out[41]: array([ True, True, True, False, False])
Which is fastest may depend on the size of the inputs. Starting from lists, your list comprehension might be faster. This pure-list version is by far the fastest:
[True if item in (0,1,2) else False for item in (0,1,2,3,4)]
[item in (0,1,2) for item in (0,1,2,3,4)] # simpler
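Newer NumPy releases also offer np.isin (added in NumPy 1.13), which does the same membership test as in1d but keeps the shape of the input; in1d itself is deprecated as of NumPy 2.0. A minimal sketch, with a made-up 2D array to show the shape behaviour:

```python
import numpy as np

y = np.arange(5)

# Element-wise "y in (0, 1, 2)".
mask = np.isin(y, (0, 1, 2))
print(mask)

# Unlike in1d, isin preserves the shape of a multi-dimensional input.
grid = np.arange(6).reshape(2, 3)
grid_mask = np.isin(grid, (0, 1, 2))
print(grid_mask)

# invert=True gives the "not in" version.
not_mask = np.isin(y, (0, 1, 2), invert=True)
print(not_mask)
```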

How to compare two numpy arrays of strings with the "in" operator to get a boolean array using array broadcasting?

Python allows for a simple check if a string is contained in another string:
'ab' in 'abcd'
which evaluates to True.
Now take a numpy array of strings and you can do this:
import numpy as np
A0 = np.array(['z', 'u', 'w'],dtype=object)
A0[:,None] != A0
Resulting in a boolean array:
array([[False, True, True],
[ True, False, True],
[ True, True, False]], dtype=bool)
Let's now take another array:
A1 = np.array(['u_w', 'u_z', 'w_z'],dtype=object)
I want to check where a string of A0 is not contained in a string in A1, essentially creating unique combinations, but the following does not yield a boolean array, only a single boolean, regardless of how I write the indices:
A0[:,None] not in A1
I also tried using numpy.in1d and np.ndarray.__contains__ but those methods don't seem to do the trick either.
Performance is an issue here so I want to make full use of numpy's optimizations.
How do I achieve this?
EDIT:
I found it can be done like this:
fv = np.vectorize(lambda x,y: x not in y)
fv(A0[:,None],A1)
But as the numpy docs state:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
So this is the same as just looping over the array, and it would be nice to solve this without explicit or implicit for-loop.
We can convert to string dtype and then use one of those NumPy based string functions.
Thus, using np.char.count, one solution would be -
np.char.count(A1.astype(str),A0.astype(str)[:,None])==0
Alternative using np.char.find -
np.char.find(A1.astype(str),A0.astype(str)[:,None])==-1
One more using np.char.rfind -
np.char.rfind(A1.astype(str),A0.astype(str)[:,None])==-1
If we are converting one to str dtype, we can skip the conversion for the other array, as internally it would be done anyway. So, the last method could be simplified to -
np.char.rfind(A1.astype(str),A0[:,None])==-1
Sample run -
In [97]: A0
Out[97]: array(['z', 'u', 'w'], dtype=object)
In [98]: A1
Out[98]: array(['u_w', 'u_z', 'w_z', 'zz'], dtype=object)
In [99]: np.char.rfind(A1.astype(str),A0[:,None])==-1
Out[99]:
array([[ True, False, False, False],
[False, False, True, True],
[False, True, False, True]], dtype=bool)
# Loopy solution using np.vectorize for verification
In [100]: fv = np.vectorize(lambda x,y: x not in y)
In [102]: fv(A0[:,None],A1)
Out[102]:
array([[ True, False, False, False],
[False, False, True, True],
[False, True, False, True]], dtype=bool)

Numpy search for elements of an array in a subset

Suppose I have numpy arrays
a = np.array([1,3,5,7,9,11,13])
b = np.array([3,5,7,11,13])
and I want to create a boolean array of the size of a where each entry is True or False depending on whether the element of a is also in b.
So in this case, I want
a_b = np.array([False,True,True,True,False,True,True]).
I can do this when b consists of one element as a == b[0]. Is there a quick way to do this when b has length greater than 1?
Use numpy.in1d:
In [672]: np.in1d([1,2,3,4], [1,2])
Out[672]: array([ True, True, False, False], dtype=bool)
For your data:
In [674]: np.in1d(a, b)
Out[674]: array([False, True, True, True, False, True, True], dtype=bool)
This is available in version 1.4.0 or later according to the docs. The docs also describe how the operation might look in pure Python:
in1d can be considered as an element-wise function version of the python keyword in, for 1-D sequences. in1d(a, b) is roughly equivalent to np.array([item in b for item in a]).
The docs for this function are worthwhile to read as there is the invert keyword argument and the assume_unique keyword argument -- each of which can be quite useful in some situations.
I also found it interesting to create my own version using np.vectorize and operator.contains:
from operator import contains
v_in = np.vectorize(lambda x,y: contains(y, x), excluded={1,})
and then:
In [696]: v_in([1,2,3, 2], [1, 2])
Out[696]: array([ True, True, False, True], dtype=bool)
Because operator.contains flips the arguments, I needed the lambda to make the calling convention match your use case -- but you could skip this if it was okay to call with b first then a.
But note that you need to use the excluded option for vectorize since you want whichever argument represents the b sequence (the sequence to check for membership within) to actually remain as a sequence (so if you chose not to flip the contains arguments with the lambda then you would want to exclude index 0 not 1).
The way with in1d will surely be much faster and is a much better way since it relies on a well-known built-in. But it's good to know how to do these tricks with operator and vectorize sometimes.
You could even create a Python Infix recipe instance for this and then use v_in as an "infix" operation:
v_in = Infix(np.vectorize(lambda x,y: contains(y, x), excluded={1,}))
# even easier: v_in = Infix(np.in1d)
and example usage:
In [702]: v_in([1, 2, 3, 2], [1, 2])
Out[702]: array([ True, True, False, True], dtype=bool)
In [704]: [1, 2, 3, 2] <<v_in>> [1, 2]
Out[704]: array([ True, True, False, True], dtype=bool)
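The Infix class itself is not shown above. A minimal sketch of one common way to write such a recipe follows; the class body is an assumption about how the recipe works, and np.isin is swapped in for np.in1d. It relies on the left operand being a plain list, which does not define << itself.

```python
import numpy as np

class Infix:
    """Sketch of an infix wrapper: `left <<op>> right` calls op.function(left, right)."""

    def __init__(self, function):
        self.function = function

    def __rlshift__(self, left):
        # `left << op`: remember the left operand and wait for the right one.
        return Infix(lambda right: self.function(left, right))

    def __rshift__(self, right):
        # `partial >> right`: supply the right operand and evaluate.
        return self.function(right)

    def __call__(self, left, right):
        # Allow ordinary calls as well: op(left, right).
        return self.function(left, right)

v_in = Infix(np.isin)

print(v_in([1, 2, 3, 2], [1, 2]))
print([1, 2, 3, 2] <<v_in>> [1, 2])
```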

Combining logic statements AND in numpy array

What would be the way to select elements when two conditions are True in a matrix?
In R, it is basically possible to combine vectors of booleans.
So what I'm aiming for:
A = np.array([2,2,2,2,2])
A < 3 and A > 1 # A < 3 & A > 1 does not work either
Evals to:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
It should eval to:
array([True,True,True,True,True])
My workaround usually is to sum these boolean vectors and equate to 2, but there must be a better way. What is it?
you could just use &, eg:
x = np.arange(10)
(x<8) & (x>2)
gives
array([False, False, False, True, True, True, True, True, False, False], dtype=bool)
A few details:
This works because & is shorthand for the numpy ufunc bitwise_and, which for the bool type is the same as logical_and. That is, this could also be spelled out as bitwise_and(less(x,8), greater(x,2))
You need the parentheses because in Python & has higher precedence than < and >
and does not work because it is ambiguous for numpy arrays, so rather than guess, numpy raises an exception.
There's a function for that:
In [8]: np.logical_and(A < 3, A > 1)
Out[8]: array([ True, True, True, True, True], dtype=bool)
Since you can't override the and operator in Python it always tries to cast its arguments to bool. That's why the code you have gives an error.
Numpy has defined the __and__ function for arrays which overrides the & operator. That's what the other answer is using.
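Putting the two answers side by side, as a sketch with a made-up array: the parenthesised & form and logical_and agree, the unparenthesised form fails, and logical_and.reduce extends the idea to more than two conditions.

```python
import numpy as np

A = np.array([0, 1, 2, 3, 4])

both = (A < 3) & (A > 1)
same = np.logical_and(A < 3, A > 1)
print(both)
print(same)

# Without parentheses this parses as A < (3 & A) > 1, a chained comparison
# that needs the truth value of a whole array.
try:
    A < 3 & A > 1
except ValueError as err:
    print("ValueError:", err)

# More than two conditions at once.
many = np.logical_and.reduce([A < 4, A > 0, A % 2 == 0])
print(many)
```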
While this is primitive, what is wrong with
A = [2, 2, 2, 2, 2]
b = []
for x in A:
    # test each element itself, not A[x]
    b.append(x > 1 and x < 3)
print(b)
The output is [True, True, True, True, True]

Python list of booleans comparison gives strange results

I try:
[True,True,False] and [True,True,True]
and get
[True, True, True]
but
[True,True,True] and [True,True,False]
gives
[True,True,False]
Not too sure why it's giving those strange results, even after taking a look at some other python boolean comparison questions. Integer does the same (replace True -> 1 and False ->0 above and the results are the same). What am I missing? I obviously want
[True,True,False] and [True,True,True]
to evaluate to
[True,True,False]
Others have explained what's going on. Here are some ways to get what you want:
>>> a = [True, True, True]
>>> b = [True, True, False]
Use a listcomp:
>>> [ai and bi for ai,bi in zip(a,b)]
[True, True, False]
Use the and_ function with a map:
>>> from operator import and_
>>> list(map(and_, a, b))  # list() is needed on Python 3, where map is lazy
[True, True, False]
Or my preferred way (although this does require numpy):
>>> from numpy import array
>>> a = array([True, True, True])
>>> b = array([True, True, False])
>>> a & b
array([ True, True, False], dtype=bool)
>>> a | b
array([ True, True, True], dtype=bool)
>>> a ^ b
array([False, False, True], dtype=bool)
Any populated list evaluates to True. True and x produces x, the second list.
From the Python documentation:
The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.
You're getting the second value returned.
P.S. I had never seen this behavior before either, I had to look it up myself. My naive expectation was that a boolean expression would yield a boolean result.
and returns its last operand if all of the operands evaluate to True.
>>> 1 and 2 and 3
3
The same is valid for lists, which evaluate to True if they are not empty (as in your case).
[True, True, False] is being evaluated as a boolean (because of the and operator), and evaluates to True since it is non-empty. Same with [True, True, True]. The result of either statement is then just whatever is after the and operator.
You could do something like [ai and bi for ai, bi in zip(a, b)] for lists a and b.
As far as I know, you need to zip through the list. Try a list comprehension of this sort:
l1 = [True,True,False]
l2 = [True,True,True]
res = [ x and y for (x,y) in zip(l1, l2)]
print(res)
Python short-circuits its boolean operators and returns the last operand it evaluated as the value of the expression.
A populated list is truthy, so and returns the second list. See what happens when I simply swap your first and second lists:
In [3]: [True,True,True] and [True, True, False]
Out[3]: [True, True, False]
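A short sketch that puts the behaviours described in these answers next to each other (the variable names are illustrative):

```python
first = [True, True, False]
second = [True, True, True]

# `and` returns the second operand when the first is truthy (non-empty list).
print(first and second)

# ...and the first operand when it is falsy (empty list).
print([] and second)

# `or` returns the first truthy operand.
print(first or second)

# Element-wise AND needs an explicit pairing of the elements.
elementwise = [x and y for x, y in zip(first, second)]
print(elementwise)
```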
