Is there an efficient way to count all the values in a NumPy array which are between 0 and 1?
I know this is easily countable with a for loop, but that seems pretty inefficient to me. I tried to play around with the count_nonzero() function but I couldn't make it work the way I wanted.
Greetings
One quick and easy method is to use the logical_and() function, which returns a boolean mask array. Then simply use the .sum() function to sum the True values.
Example:
import numpy as np
a = np.array([0, .1, .2, .3, 1, 2])
np.logical_and(a>0, a<1).sum()
Output:
3
Example 2:
Or, if you'd prefer a more 'low-level' (non-helper function) approach, the & logical operator can be used:
((a > 0) & (a < 1)).sum()
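Since the question mentions count_nonzero(), that works on the same boolean mask too; a minimal sketch with the same array a as above:
import numpy as np
a = np.array([0, .1, .2, .3, 1, 2])
# count_nonzero counts the True entries of the mask directly
np.count_nonzero((a > 0) & (a < 1))   # 3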
Here is one more way. You can easily replace <= and >= with strict inequalities if you wish.
>>> import numpy as np
>>> a = np.random.randn(3,3)
>>> a
array([[-2.17470114,  0.59575531,  0.06795138],
       [-0.57380035,  0.05663369,  1.12636801],
       [ 0.55363332, -0.04039947,  1.14837819]])
>>> inds1 = a >= 0
>>> inds2 = a <= 1
>>> inds = inds1 * inds2
>>> inds
array([[False,  True,  True],
       [False,  True, False],
       [ True, False, False]])
>>> inds.sum()
4
I have 2 arrays as follows:
x = array(['2019-02-28', '2019-03-01'], dtype=object)
z = array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
I'm trying to use np.where to determine at which indices the 2 arrays match.
I'm doing
i = np.where(z == x), but it doesn't work: I get an empty array as a result. It looks like it's checking whether the whole array is equal to the other whole array, whereas I'm looking for element-wise matches and would like to get matching results between the 2. How should I do it?
Thanks
Regards
edit: the expected outcome is [True, False, False]
The where result is only as good as the boolean it searches. If the argument does not have any True values, where returns empty:
In [308]: x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
...: z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
In [309]: x==z
/usr/local/bin/ipython3:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
#!/usr/bin/python3
Out[309]: False
If you aren't concerned about order:
In [311]: np.isin(z,x)
Out[311]: array([ True, False, True])
or trimming z:
In [312]: x==z[:2]
Out[312]: array([ True, False])
To extend x you could first use np.pad (see the sketch at the end of this answer), or use itertools.zip_longest:
In [353]: list(itertools.zip_longest(x,z))
Out[353]:
[('2019-02-28', '2019-02-28'),
('2019-03-01', '2019-03-02'),
(None, '2019-03-01')]
In [354]: [i==j for i,j in itertools.zip_longest(x,z)]
Out[354]: [True, False, False]
zip_longest accepts other fill values if that makes the comparison better.
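Here is a minimal sketch of the np.pad alternative mentioned above, assuming constant_values accepts an empty string for object arrays; the fill value '' is an arbitrary choice that just has to compare unequal to every real date string:
import numpy as np
x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
# pad x on the right so both arrays have the same length, then compare element-wise
x_padded = np.pad(x, (0, z.size - x.size), constant_values='')
x_padded == z   # array([ True, False, False])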
Is this what you need:
print([i for i, (xi, zi) in enumerate(zip(x, z)) if xi == zi])
As the two arrays have different sizes, zip only compares over the minimum of the two sizes.
Edit:
I just reread the question and comments.
result= np.zeros( max(x.size, z.size), dtype=bool) # result size of the biggest array.
size = min(x.size, z.size)
result[:size] = z[:size] == x[:size] # Comparison at smallest size.
result
# array([ True, False, False])
This gives the boolean mask the comment asks for.
Original answer
import numpy as np
x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
size = min(x.size, z.size)
np.where(z[:size]==x[:size]) # Select the common range
# (array([0], dtype=int64),)
On my machine this is slower than the list comprehension from #U10-Forward for dtype=object but faster if numpy selects the dtype, 'Unicode 10'.
x = np.array(['2019-02-28', '2019-03-01'])
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'])
x = np.empty([2], dtype=object)
> array([None, None], dtype=object)
x[0] = 'a'
> array(['a', None], dtype=object)
I'm trying to get a boolean array [False, True] from this object-dtype ndarray, with True where the element is None.
Things that don't work: x is None, x.isfinite(), x == None, np.isnan(x). The array may have n dimensions, which makes for-loop iteration unpleasant to look at.
In NumPy 1.12 and earlier, you'll need to explicitly call numpy.equal to get a broadcasted equality comparison. Leave a comment, so future readers understand why you're doing it:
# Comparisons to None with == don't broadcast (yet, as of NumPy 1.12).
# We need to use numpy.equal explicitly.
numpy.equal(x, None)
In NumPy 1.13 and later, x == None will give you a broadcasted equality comparison, but you can still use numpy.equal(x, None) if you want backward compatibility with earlier versions.
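A minimal sketch of both spellings on the array from the question (behaviour as per the version notes above):
import numpy as np
x = np.empty(2, dtype=object)
x[0] = 'a'
np.equal(x, None)   # array([False,  True]) on old and new NumPy
x == None           # array([False,  True]) on NumPy 1.13+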
You can wrap None in a list or array to force element-wise comparisons:
>>> x == [None]
array([False, True], dtype=bool)
>>> x == np.array([None])
array([False, True], dtype=bool)
A few possible ways to do that are:
x < 0
x!='a'
array([ True, False], dtype=bool)
I need to compare some numpy arrays which should have the same elements in the same order, except for some NaN values in the second one.
I need a function more or less like this:
def func( array1, array2 ):
    if ???:
        return True
    else:
        return False
Example:
x = np.array( [ 1, 2, 3, 4, 5 ] )
y = np.array( [ 11, 2, 3, 4, 5 ] )
z = np.array( [ 1, 2, np.nan, 4, 5] )
func( x, z ) # returns True
func( y, z ) # returns False
The arrays always have the same length, and the NaN values are always in the third one, z (x and y always contain numbers only). I imagine there is a function or something for this already, but I just can't find it.
Any ideas?
You can use masked arrays, which have the behaviour you're asking for when combined with np.all:
zm = np.ma.masked_where(np.isnan(z), z)
np.all(x == zm) # returns True
np.all(y == zm) # returns False
Or you could just write out your logic explicitly, noting that numpy has to use | instead of or, and the difference in operator precedence that results:
def func(a, b):
    return np.all((a == b) | np.isnan(a) | np.isnan(b))
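Applied to the arrays from the question, this behaves as requested:
func(x, z)   # returns True
func(y, z)   # returns False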
You could use isclose to check for equality (or closeness to within a given tolerance -- this is particularly useful when comparing floats) and use isnan to check for NaNs in the second array.
Combine the two with bitwise-or (|), and use all to demand every pair is either close or contains a NaN to obtain the desired result:
In [62]: np.isclose(x,z)
Out[62]: array([ True, True, False, True, True], dtype=bool)
In [63]: np.isnan(z)
Out[63]: array([False, False, True, False, False], dtype=bool)
So you could use:
def func(a, b):
    return (np.isclose(a, b) | np.isnan(b)).all()
In [67]: func(x, z)
Out[67]: True
In [68]: func(y, z)
Out[68]: False
What about:
from math import isnan
def fun(array1,array2):
    return all(isnan(x) or isnan(y) or x == y for x,y in zip(array1,array2))
This function works in both directions (if there are NaNs in the first list, these are also ignored). If you do not want that (which would be a bit odd, since equality usually works in both directions), you can define:
from math import isnan
def fun(array1,array2):
    return all(isnan(y) or x == y for x,y in zip(array1,array2))
The code works as follows: we use zip to emit tuples of elements of both arrays. Next we check if either the element of the first list is NaN, or the second, or they are equal.
Given that you want to write a really robust function, you had better also perform a length check:
from math import isnan
def fun(array1,array2):
    return len(array1) == len(array2) and all(isnan(y) or x == y for x,y in zip(array1,array2))
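For reference, applied to the arrays from the question this gives the desired results:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([11, 2, 3, 4, 5])
z = np.array([1, 2, np.nan, 4, 5])
fun(x, z)   # True  (the NaN position is ignored)
fun(y, z)   # False (11 != 1 in the first position)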
numpy.isclose() now provides an equal_nan argument for this case!
>>> import numpy as np
>>> np.isclose([1.0, np.nan], [1.0, np.nan])
array([ True, False])
>>> np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
array([ True, True])
Docs: https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
Numpy's logical_or function takes no more than two arrays to compare. How can I find the union of more than two arrays? (The same question could be asked with regard to Numpy's logical_and and obtaining the intersection of more than two arrays.)
If you're asking about numpy.logical_or, then no, as the docs explicitly say, the only parameters are x1, x2, and optionally out:
numpy.logical_or(x1, x2[, out]) = <ufunc 'logical_or'>
You can of course chain together multiple logical_or calls like this:
>>> x = np.array([True, True, False, False])
>>> y = np.array([True, False, True, False])
>>> z = np.array([False, False, False, False])
>>> np.logical_or(np.logical_or(x, y), z)
array([ True, True, True, False], dtype=bool)
The way to generalize this kind of chaining in NumPy is with reduce:
>>> np.logical_or.reduce((x, y, z))
array([ True, True, True, False], dtype=bool)
And of course this will also work if you have one multi-dimensional array instead of separate arrays—in fact, that's how it's meant to be used:
>>> xyz = np.array((x, y, z))
>>> xyz
array([[ True,  True, False, False],
       [ True, False,  True, False],
       [False, False, False, False]], dtype=bool)
>>> np.logical_or.reduce(xyz)
array([ True, True, True, False], dtype=bool)
But a tuple of three equal-length 1D arrays is an array_like in NumPy terms, and can be used as a 2D array.
Outside of NumPy, you can also use Python's reduce:
>>> functools.reduce(np.logical_or, (x, y, z))
array([ True, True, True, False], dtype=bool)
However, unlike NumPy's reduce, Python's is not often needed. For most cases, there's a simpler way to do things—e.g., to chain together multiple Python or operators, don't reduce over operator.or_, just use any. And when there isn't, it's usually more readable to use an explicit loop.
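For instance, with plain Python booleans (the list here is just for illustration):
>>> import functools, operator
>>> functools.reduce(operator.or_, [False, True, False])
True
>>> any([False, True, False])
True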
And in fact NumPy's any can be used for this case as well, although it's not quite as trivial; if you don't explicitly give it an axis, you'll end up with a scalar instead of an array. So:
>>> np.any((x, y, z), axis=0)
array([ True, True, True, False], dtype=bool)
As you might expect, logical_and is similar: you can chain it, np.logical_and.reduce it, functools.reduce it, or substitute all with an explicit axis.
What about other operations, like logical_xor? Again, same deal… except that in this case there is no all/any-type function that applies. (What would you call it? odd?)
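For completeness, a quick sketch with the same x, y, z as above; note that logical_xor.reduce still works even though there is no any/all-style shortcut for it:
>>> np.logical_and.reduce((x, y, z))
array([False, False, False, False], dtype=bool)
>>> np.all((x, y, z), axis=0)
array([False, False, False, False], dtype=bool)
>>> np.logical_xor.reduce((x, y, z))
array([False,  True,  True, False], dtype=bool)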
In case someone still needs this: say you have three Boolean arrays a, b, c with the same shape, this gives element-wise and:
a * b * c
and this gives element-wise or:
a + b + c
Is this what you want?
Stacking a lot of logical_and or logical_or is not practical.
Building on abarnert's answer for the n-dimensional case:
TL;DR: np.logical_or.reduce(np.array(list))
As boolean algebras are both commutative and associative by definition, the following statements are equivalent for boolean values of a, b and c.
a or b or c
(a or b) or c
a or (b or c)
(b or a) or c
So if you have a "logical_or" which is dyadic and you need to pass it three arguments (a, b, and c), you can call
logical_or(logical_or(a, b), c)
logical_or(a, logical_or(b, c))
logical_or(c, logical_or(b, a))
or whatever permutation you like.
Back to python, if you want to test whether a condition (yielded by a function test that takes a testee and returns a boolean value) applies to a or b or c or any element of list L, you normally use
any(test(x) for x in L)
I use this workaround which can be extended to n arrays:
>>> a = np.array([False, True, False, False])
>>> b = np.array([True, False, False, False])
>>> c = np.array([False, False, False, True])
>>> d = (a + b + c > 0) # That's an "or" between multiple arrays
>>> d
array([ True, True, False, True], dtype=bool)
I've tried the following three different methods to get the logical_and of a list l of k arrays of size n:
Using a recursive numpy.logical_and (see below)
Using numpy.logical_and.reduce(l)
Using numpy.vstack(l).all(axis=0)
Then I did the same for the logical_or function. Surprisingly enough, the recursive method is the fastest one.
import numpy
import perfplot
def and_recursive(*l):
    if len(l) == 1:
        return l[0].astype(bool)
    elif len(l) == 2:
        return numpy.logical_and(l[0], l[1])
    elif len(l) > 2:
        return and_recursive(and_recursive(*l[:2]), and_recursive(*l[2:]))

def or_recursive(*l):
    if len(l) == 1:
        return l[0].astype(bool)
    elif len(l) == 2:
        return numpy.logical_or(l[0], l[1])
    elif len(l) > 2:
        return or_recursive(or_recursive(*l[:2]), or_recursive(*l[2:]))

def and_reduce(*l):
    return numpy.logical_and.reduce(l)

def or_reduce(*l):
    return numpy.logical_or.reduce(l)

def and_stack(*l):
    return numpy.vstack(l).all(axis=0)

def or_stack(*l):
    return numpy.vstack(l).any(axis=0)
k = 10 # number of arrays to be combined
perfplot.plot(
    setup=lambda n: [numpy.random.choice(a=[False, True], size=n) for j in range(k)],
    kernels=[
        lambda l: and_recursive(*l),
        lambda l: and_reduce(*l),
        lambda l: and_stack(*l),
        lambda l: or_recursive(*l),
        lambda l: or_reduce(*l),
        lambda l: or_stack(*l),
    ],
    labels=['and_recursive', 'and_reduce', 'and_stack', 'or_recursive', 'or_reduce', 'or_stack'],
    n_range=[2 ** j for j in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
    equality_check=None
)
Below are the performances for k = 4.
And below are the performances for k = 10.
It seems that there is an approximately constant time overhead also for higher n.
Using the sum function:
a = np.array([True, False, True])
b = np.array([False, False, True])
c = np.vstack([a,b,b])
Out[172]:
array([[ True, False,  True],
       [False, False,  True],
       [False, False,  True]], dtype=bool)
np.sum(c,axis=0)>0
Out[173]: array([ True, False, True], dtype=bool)
a = np.array([True, False, True])
b = np.array([False, False, True])
c = np.array([True, True, True])
d = np.array([True, True, True])
# logical or
lor = (a+b+c+d).astype(bool)
# logical and
land = (a*b*c*d).astype(bool)
If you want a short (maybe not optimal) function for performing logical AND on multidimensional boolean masks, you may use this recursive lambda function:
masks_and = lambda *masks : masks[0] if len(masks) == 1 else masks_and(np.logical_and(masks[0], masks[-1]), *masks[1:-1])
result = masks_and(mask1, mask2, ...)
You can also generalize the lambda function for applying any associative operator (any function of 2 arguments, such as multiplication/AND, sum/OR and so on), preserving the argument order, to any objects like this:
fn2args_reduce = lambda fn2args, *args : args[0] if len(args) == 1 else fn2args_reduce(fn2args, fn2args(args[0], args[1]), *args[2:])
result = fn2args_reduce(np.dot, matrix1, matrix2, ... matrixN)
which gives you the same result as if you used the @ numpy operator:
np.dot(...(np.dot(np.dot(matrix1, matrix2), matrix3)...), matrixN)
For example fn2args_reduce(lambda a,b: a+b, 1,2,3,4,5) gives you 15, the sum of these numbers (of course you have a much more efficient sum function for this, but I like it).
Even more generalized model for functions of N arguments could look like this:
fnNargs_reduce = lambda fnNargs, N, *args : args[0] if len(args) == 1 else fnNargs_reduce(fnNargs, N, fnNargs(*args[:N]), *args[N:])
fnNargs = lambda x1, x2, x3=neutral, ..., xN=neutral: x1 (?) x2 (?) ... (?) xN
Where neutral means the neutral element for the (?) operator, e.g. 0 for +, 1 for *, etc.
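As a concrete instance of this pattern (illustrative names only), take (?) to be + with neutral element 0 and N = 3:
add3 = lambda x1, x2=0, x3=0: x1 + x2 + x3    # a 3-argument "+" whose defaults are the neutral element 0
fnNargs_reduce(add3, 3, 1, 2, 3, 4, 5)        # 15, same as summing the numbers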
Why? Just for fun :-)