Counting the number of None values using a lambda function - python

I have an array consisting of a bunch of values, where some of them are NaN and others are None. I want to count each kind. I can achieve this with a simple for loop, as shown:
xx = np.array([2,3,4,None,np.nan,None])
count_None = 0
count_nan = 0
for i in xx:
    if i is None:
        count_None += 1
    if i is np.nan:
        count_nan += 1
I want to find out if I can achieve the same result in one line, perhaps using a lambda function. I tried writing it like this, but of course the syntax is incorrect. Any ideas?
lambda xx: count_None =+1 if xx is None

One way of achieving it as a one-liner is:
len([i for i in xx if i is None])
# or, if xx is a plain list rather than an array, the count method
xx.count(None)
Or you can use numpy.count_nonzero:
np.count_nonzero(xx == None)
Using a lambda function, you can create a list.count()-like function:
>>> counter = lambda x,y:len([i for i in x if i == y])
>>> counter(xx,None)
2

This isn't a lambda but it creates a new list of just the None values and counts the length of that list.
import numpy as np
xx = np.array([2,3,4,None,np.nan,None])
print(len([elem for elem in xx if elem is None]))
If you don't need it to be a NumPy array, you can use the list count method:
xx = [2,3,4,None,np.nan,None]
print(xx.count(None))

A third approach:
>>> nan_count, none_count = np.sum([i is np.nan for i in xx]), np.sum([i is None for i in xx])
>>> print(nan_count, none_count)
1 2
I'd tend to prefer two lines (one for each computation), but this works. It works by adding 1 for each True value, and 0 for each False value.
Another approach if you really want to use a lambda is to use functools.reduce which will perform the sum iteratively. Here, we start with a value of 0, and add 1 for each element that evaluates true:
>>> import functools
>>> functools.reduce(lambda x,y: x+(y is np.nan), xx, 0)
1
>>> functools.reduce(lambda x,y: x+(y is None), xx, 0)
2

l= len(list(filter(lambda x:x is None, xx)))
It will return the number of None values. filter accepts any iterable, so it works on the array as well as on a list.
You can use this approach if you want to use a lambda.
I prefer using the NumPy function (np.count_nonzero).

lambda is just a restricted format for creating a function. It is 'one-line' and returns a value: its body must be a single expression. It should not be used for side effects. Your use of counter += 1 is a side effect (and an assignment statement), so it can't be used in a lambda.
A lambda that identifies the None values, can be used with map:
In [27]: alist = [2,3,4,None,np.nan,None]
In [28]: list(map(lambda x: x is None, alist))
Out[28]: [False, False, False, True, False, True]
map returns an iterable, which has to be expanded with list, or with sum:
In [29]: sum(map(lambda x: x is None, alist))
Out[29]: 2
But as others have shown, the list count method is simpler.
In [43]: alist.count(None)
Out[43]: 2
In [44]: alist.count(np.nan)
Out[44]: 1
An array containing None will be object dtype. Iteration on such an array is slower than iteration on the list:
In [45]: arr = np.array(alist)
In [46]: arr
Out[46]: array([2, 3, 4, None, nan, None], dtype=object)
The array doesn't have the count method. Also testing for np.nan is trickier.
In [47]: arr == None
Out[47]: array([False, False, False, True, False, True])
In [48]: arr == np.nan
Out[48]: array([False, False, False, False, False, False])
There is a np.isnan function, but that only works for float dtype arrays.
In [51]: arr.astype(float)
Out[51]: array([ 2., 3., 4., nan, nan, nan])
In [52]: np.isnan(arr.astype(float))
Out[52]: array([False, False, False, True, True, True])
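As a rough sketch of how the two counts could be kept separate on the object array: the names none_mask and nan_mask are made up for illustration, and the NaN test assumes that NaN is the only float value that is not equal to itself.

```python
import numpy as np

xx = np.array([2, 3, 4, None, np.nan, None], dtype=object)

# None: elementwise comparison on an object array (NumPy 1.13 or later).
none_mask = xx == None  # noqa: E711

# NaN: a float that compares unequal to itself. None is skipped by the
# isinstance check, so the two masks never overlap.
nan_mask = np.array([isinstance(v, float) and v != v for v in xx])

none_count = int(np.count_nonzero(none_mask))
nan_count = int(np.count_nonzero(nan_mask))
print(none_count, nan_count)  # 2 1
```

Unlike arr.astype(float), this does not turn None into NaN, so the two kinds of missing value stay distinguishable.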

Related

Count all values of an array that are between 0 and 1

Is there an effective way to count all the values in a numpy array which are between 0 and 1?
I know this is easily countable with a for loop, but that seems pretty inefficient to me. I tried to play around with the count_nonzero() function but I couldn't make it work the way I wanted.
Greetings
One quick and easy method is to use the logical_and() function, which returns a boolean mask array. Then simply use the .sum() function to sum the True values.
Example:
import numpy as np
a = np.array([0, .1, .2, .3, 1, 2])
np.logical_and(a>0, a<1).sum()
Output:
>>> 3
Example 2:
Or, if you'd prefer a more 'low-level' (non-helper function) approach, the & operator, which acts as an element-wise AND on boolean arrays, can be used:
((a > 0) & (a < 1)).sum()
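Since the question mentions count_nonzero, here is a minimal sketch of the same count done with it on the boolean mask. It assumes strict inequalities, as in the example above; the variable names are illustrative only.

```python
import numpy as np

a = np.array([0, .1, .2, .3, 1, 2])

# Boolean mask of the elements strictly between 0 and 1.
mask = (a > 0) & (a < 1)

# count_nonzero counts the True entries of the mask.
count = int(np.count_nonzero(mask))
print(count)  # 3
```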
This might be one way. You can easily replace <= and >= with strict inequalities as per your wish.
>>> import numpy as np
>>> a = np.random.randn(3,3)
>>> a
array([[-2.17470114, 0.59575531, 0.06795138],
[-0.57380035, 0.05663369, 1.12636801],
[ 0.55363332, -0.04039947, 1.14837819]])
>>> inds1 = a >= 0
>>> inds2 = a <= 1
>>> inds = inds1 * inds2
>>> inds
array([[False, True, True],
[False, True, False],
[ True, False, False]])
>>> inds.sum()
4

python numpy - unable to compare 2 arrays

I have the 2 arrays as follows:
x = array(['2019-02-28', '2019-03-01'], dtype=object)
z = array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
I'm trying to use np.where to determine at which indices the 2 arrays are aligned.
I'm doing
i = np.where (z == x) but it doesn't work; I get an empty array as a result. It looks like it's checking whether the whole array is equal to the other whole array, whereas I'm looking for the matching values and would like to get the matching results between the 2. How should I do it?
Thanks
Regards
edit: expected outcome is yes [True, False, False]
The where result is only as good as the boolean it searches. If the argument does not have any True values, where returns empty:
In [308]: x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
...: z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
In [309]: x==z
/usr/local/bin/ipython3:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
#!/usr/bin/python3
Out[309]: False
If you aren't concerned about order:
In [311]: np.isin(z,x)
Out[311]: array([ True, False, True])
or trimming z:
In [312]: x==z[:2]
Out[312]: array([ True, False])
To extend x you could first use np.pad, or use itertools.zip_longest:
In [353]: list(itertools.zip_longest(x,z))
Out[353]:
[('2019-02-28', '2019-02-28'),
('2019-03-01', '2019-03-02'),
(None, '2019-03-01')]
In [354]: [i==j for i,j in itertools.zip_longest(x,z)]
Out[354]: [True, False, False]
zip_longest accepts other fill values if that makes the comparison better.
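For the order-independent case above, the np.isin mask can be handed to np.where to recover positions, which is what the question was originally reaching for. This is only a sketch; idx and matched are illustrative names.

```python
import numpy as np

x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)

# True wherever an element of z also occurs somewhere in x.
mask = np.isin(z, x)

# np.where on a 1-D boolean mask returns a one-element tuple of index arrays.
idx = np.where(mask)[0]
matched = z[idx]
print(idx)      # [0 2]
print(matched)  # ['2019-02-28' '2019-03-01']
```

Note that this reports membership, not positional alignment, so it gives a different answer from the [True, False, False] mask requested in the edit to the question.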
Is this what you need:
print([i for i, (x, y) in enumerate(zip(x, z)) if x == y])
As the two arrays have different sizes compare over the minimum of the two sizes.
Edit:
I just reread the question and comments.
result= np.zeros( max(x.size, z.size), dtype=bool) # result size of the biggest array.
size = min(x.size, z.size)
result[:size] = z[:size] == x[:size] # Comparison at smallest size.
result
# array([ True, False, False])
This gives the boolean mask the comment asks for.
Original answer
import numpy as np
x = np.array(['2019-02-28', '2019-03-01'], dtype=object)
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'], dtype=object)
size = min(x.size, z.size)
np.where(z[:size]==x[:size]) # Select the common range
# (array([0], dtype=int64),)
On my machine this is slower than the list comprehension from @U10-Forward for dtype=object, but faster if NumPy selects the dtype itself (a 10-character Unicode string, '<U10'):
x = np.array(['2019-02-28', '2019-03-01'])
z = np.array(['2019-02-28', '2019-03-02', '2019-03-01'])

Elementwise comparison to None with ndarray of object dtype

x = np.empty([2], dtype=object)
> array([None, None], dtype=object)
x[0] = 'a'
> array(['a', None], dtype=object)
I'm trying to get a boolean array [False, True] from this object typed ndarray where the object type is None.
Things that don't work: x is None, x.isfinite(), x == None, np.isnan(x). The array may be in n dimensions, making for loop iterations unpleasant to look at.
In NumPy 1.12 and earlier, you'll need to explicitly call numpy.equal to get a broadcasted equality comparison. Leave a comment, so future readers understand why you're doing it:
# Comparisons to None with == don't broadcast (yet, as of NumPy 1.12).
# We need to use numpy.equal explicitly.
numpy.equal(x, None)
In NumPy 1.13 and later, x == None will give you a broadcasted equality comparison, but you can still use numpy.equal(x, None) if you want backward compatibility with earlier versions.
You can wrap None in a list or array to force element-wise comparisons:
>>> x == [None]
array([False, True], dtype=bool)
>>> x == np.array([None])
array([False, True], dtype=bool)
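If an identity test (is None) is preferred over equality, one possible sketch for the n-dimensional case uses np.frompyfunc to apply the test element by element. The helper name is_none is an assumption made for this example, not something from the answers above.

```python
import numpy as np

# np.empty with dtype=object is filled with None.
x = np.empty((2, 2), dtype=object)
x[0, 0] = 'a'

# frompyfunc wraps a Python callable as a ufunc that returns an object
# array, so the result is cast to bool afterwards.
is_none = np.frompyfunc(lambda v: v is None, 1, 1)
mask = is_none(x).astype(bool)
print(mask)
```

The mask has the same shape as x, so no explicit loops over the dimensions are needed.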
A few possible ways to do that are (x < 0 relies on Python 2's ordering of mixed types and raises TypeError on Python 3):
x < 0
x!='a'
array([False, True], dtype=bool)

How to compare two numpy arrays with some NaN values?

I need to compare some numpy arrays which should have the same elements in the same order, except for some NaN values in the second one.
I need a function more or less like this:
def func( array1, array2 ):
    if ???:
        return True
    else:
        return False
Example:
x = np.array( [ 1, 2, 3, 4, 5 ] )
y = np.array( [ 11, 2, 3, 4, 5 ] )
z = np.array( [ 1, 2, np.nan, 4, 5] )
func( x, z ) # returns True
func( y, z ) # returns False
The arrays always have the same length and the NaN values are always in the third one, z (x and y always contain numbers only). I imagine there is a function for this already, but I can't find it.
Any ideas?
You can use masked arrays, which have the behaviour you're asking for when combined with np.all:
zm = np.ma.masked_where(np.isnan(z), z)
np.all(x == zm) # returns True
np.all(y == zm) # returns False
Or you could just write out your logic explicitly, noting that numpy has to use | instead of or, and the difference in operator precedence that results:
def func(a, b):
    return np.all((a == b) | np.isnan(a) | np.isnan(b))
You could use isclose to check for equality (or closeness to within a given tolerance -- this is particularly useful when comparing floats) and use isnan to check for NaNs in the second array.
Combine the two with bitwise-or (|), and use all to demand every pair is either close or contains a NaN to obtain the desired result:
In [62]: np.isclose(x,z)
Out[62]: array([ True, True, False, True, True], dtype=bool)
In [63]: np.isnan(z)
Out[63]: array([False, False, True, False, False], dtype=bool)
So you could use:
def func(a, b):
    return (np.isclose(a, b) | np.isnan(b)).all()
In [67]: func(x, z)
Out[67]: True
In [68]: func(y, z)
Out[68]: False
What about:
from math import isnan
def fun(array1,array2):
    return all(isnan(x) or isnan(y) or x == y for x,y in zip(array1,array2))
This function works in both directions (if there are NaNs in the first list, these are also ignored). If you do not want that (which is a bit odd, since equality is usually bidirectional), you can define:
from math import isnan
def fun(array1,array2):
    return all(isnan(y) or x == y for x,y in zip(array1,array2))
The code works as follows: we use zip to emit tuples of elements of both arrays. Next we check whether the element of the first list is NaN (in the first version only), or the element of the second list is NaN, or the two are equal.
If you want a more robust function, you had better also perform a length check:
from math import isnan
def fun(array1,array2):
    return len(array1) == len(array2) and all(isnan(y) or x == y for x,y in zip(array1,array2))
numpy.isclose() now provides an argument equal_nan for this case!
>>> import numpy as np
>>> np.isclose([1.0, np.nan], [1.0, np.nan])
array([ True, False])
>>> np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
array([ True, True])
docs https://numpy.org/doc/stable/reference/generated/numpy.isclose.html
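Note that equal_nan=True only treats NaN as equal to NaN, which is not quite the case in the question (numbers in one array, NaN in the other). One possible sketch for that case compares only the positions where the second array is defined. The helper name equal_where_defined is hypothetical, and it assumes both inputs have the same shape.

```python
import numpy as np

def equal_where_defined(a, b):
    """Compare a and b only at positions where b is not NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    valid = ~np.isnan(b)  # positions of b that hold a real number
    return bool(np.array_equal(a[valid], b[valid]))

x = np.array([1, 2, 3, 4, 5])
y = np.array([11, 2, 3, 4, 5])
z = np.array([1, 2, np.nan, 4, 5])

print(equal_where_defined(x, z))  # True
print(equal_where_defined(y, z))  # False
```

This uses exact equality; swapping np.array_equal for np.allclose would give a tolerance-based comparison instead.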

Numpy `logical_or` for more than two arguments

Numpy's logical_or function takes no more than two arrays to compare. How can I find the union of more than two arrays? (The same question could be asked with regard to Numpy's logical_and and obtaining the intersection of more than two arrays.)
If you're asking about numpy.logical_or, then no, as the docs explicitly say, the only parameters are x1, x2, and optionally out:
numpy.logical_or(x1, x2[, out]) = <ufunc 'logical_or'>
You can of course chain together multiple logical_or calls like this:
>>> x = np.array([True, True, False, False])
>>> y = np.array([True, False, True, False])
>>> z = np.array([False, False, False, False])
>>> np.logical_or(np.logical_or(x, y), z)
array([ True, True, True, False], dtype=bool)
The way to generalize this kind of chaining in NumPy is with reduce:
>>> np.logical_or.reduce((x, y, z))
array([ True, True, True, False], dtype=bool)
And of course this will also work if you have one multi-dimensional array instead of separate arrays—in fact, that's how it's meant to be used:
>>> xyz = np.array((x, y, z))
>>> xyz
array([[ True, True, False, False],
[ True, False, True, False],
[False, False, False, False]], dtype=bool)
>>> np.logical_or.reduce(xyz)
array([ True, True, True, False], dtype=bool)
But a tuple of three equal-length 1D arrays is an array_like in NumPy terms, and can be used as a 2D array.
Outside of NumPy, you can also use Python's reduce:
>>> functools.reduce(np.logical_or, (x, y, z))
array([ True, True, True, False], dtype=bool)
However, unlike NumPy's reduce, Python's is not often needed. For most cases, there's a simpler way to do things—e.g., to chain together multiple Python or operators, don't reduce over operator.or_, just use any. And when there isn't, it's usually more readable to use an explicit loop.
And in fact NumPy's any can be used for this case as well, although it's not quite as trivial; if you don't explicitly give it an axis, you'll end up with a scalar instead of an array. So:
>>> np.any((x, y, z), axis=0)
array([ True, True, True, False], dtype=bool)
As you might expect, logical_and is similar: you can chain it, reduce it with np.logical_and.reduce, functools.reduce it, or substitute all with an explicit axis.
What about other operations, like logical_xor? Again, same deal… except that in this case there is no all/any-type function that applies. (What would you call it? odd?)
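To make the last two points concrete, here is a small sketch that reuses the x, y, z arrays from above. The result names all_of, odd_of and same_as_all are made up for illustration.

```python
import numpy as np

x = np.array([True, True, False, False])
y = np.array([True, False, True, False])
z = np.array([False, False, False, False])

# Intersection: True only where every input is True.
all_of = np.logical_and.reduce((x, y, z))

# np.all with an explicit axis gives the same array.
same_as_all = np.all((x, y, z), axis=0)

# XOR across the inputs: True where an odd number of inputs is True.
odd_of = np.logical_xor.reduce((x, y, z))

print(all_of)  # [False False False False]
print(odd_of)  # [False  True  True False]
```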
In case someone still needs this: say you have three Boolean arrays a, b, c with the same shape. This gives an element-wise and:
a * b * c
this gives or:
a + b + c
Is this what you want?
Stacking a lot of logical_and or logical_or is not practical.
Building on abarnert's answer for n-dimensional case:
TL;DR: np.logical_or.reduce(np.array(list))
As boolean algebras are both commutative and associative by definition, the following statements are equivalent for boolean values of a, b and c.
a or b or c
(a or b) or c
a or (b or c)
(b or a) or c
So if you have a "logical_or" which is dyadic and you need to pass it three arguments (a, b, and c), you can call
logical_or(logical_or(a, b), c)
logical_or(a, logical_or(b, c))
logical_or(c, logical_or(b, a))
or whatever permutation you like.
Back to python, if you want to test whether a condition (yielded by a function test that takes a testee and returns a boolean value) applies to a or b or c or any element of list L, you normally use
any(test(x) for x in L)
I use this workaround which can be extended to n arrays:
>>> a = np.array([False, True, False, False])
>>> b = np.array([True, False, False, False])
>>> c = np.array([False, False, False, True])
>>> d = (a + b + c > 0) # That's an "or" between multiple arrays
>>> d
array([ True, True, False, True], dtype=bool)
I've tried the following three different methods to get the logical_and of a list l of k arrays of size n:
Using a recursive numpy.logical_and (see below)
Using numpy.logical_and.reduce(l)
Using numpy.vstack(l).all(axis=0)
Then I did the same for the logical_or function. Surprisingly enough, the recursive method is the fastest one.
import numpy
import perfplot
def and_recursive(*l):
    if len(l) == 1:
        return l[0].astype(bool)
    elif len(l) == 2:
        return numpy.logical_and(l[0],l[1])
    elif len(l) > 2:
        return and_recursive(and_recursive(*l[:2]),and_recursive(*l[2:]))
def or_recursive(*l):
    if len(l) == 1:
        return l[0].astype(bool)
    elif len(l) == 2:
        return numpy.logical_or(l[0],l[1])
    elif len(l) > 2:
        return or_recursive(or_recursive(*l[:2]),or_recursive(*l[2:]))
def and_reduce(*l):
    return numpy.logical_and.reduce(l)
def or_reduce(*l):
    return numpy.logical_or.reduce(l)
def and_stack(*l):
    return numpy.vstack(l).all(axis=0)
def or_stack(*l):
    return numpy.vstack(l).any(axis=0)
k = 10 # number of arrays to be combined
perfplot.plot(
    setup=lambda n: [numpy.random.choice(a=[False, True], size=n) for j in range(k)],
    kernels=[
        lambda l: and_recursive(*l),
        lambda l: and_reduce(*l),
        lambda l: and_stack(*l),
        lambda l: or_recursive(*l),
        lambda l: or_reduce(*l),
        lambda l: or_stack(*l),
    ],
    labels = ['and_recursive', 'and_reduce', 'and_stack', 'or_recursive', 'or_reduce', 'or_stack'],
    n_range=[2 ** j for j in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
    equality_check=None
)
[Plot: performances for k = 4.]
[Plot: performances for k = 10.]
It seems that there is an approximately constant time overhead also for higher n.
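Because the benchmark above passes equality_check=None, it does not verify that the approaches agree. A minimal sketch of such a check, without perfplot, could look like this; the seed, array size and variable names are arbitrary choices for illustration.

```python
import functools
import numpy as np

rng = np.random.default_rng(0)
k, n = 10, 1000
arrays = [rng.integers(0, 2, size=n).astype(bool) for _ in range(k)]

# Three ways of combining k boolean arrays with AND.
by_reduce = np.logical_and.reduce(arrays)
by_stack = np.vstack(arrays).all(axis=0)
by_fold = functools.reduce(np.logical_and, arrays)

agree = np.array_equal(by_reduce, by_stack) and np.array_equal(by_reduce, by_fold)
print(agree)  # True
```

The same pattern applies to the OR variants by swapping in np.logical_or and any.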
Using the sum function:
a = np.array([True, False, True])
b = np.array([ False, False, True])
c = np.vstack([a,b,b])
Out[172]:
array([[ True, False, True],
[False, False, True],
[False, False, True]], dtype=bool)
np.sum(c,axis=0)>0
Out[173]: array([ True, False, True], dtype=bool)
a = np.array([True, False, True])
b = np.array([False, False, True])
c = np.array([True, True, True])
d = np.array([True, True, True])
# logical or
lor = (a+b+c+d).astype(bool)
# logical and
land = (a*b*c*d).astype(bool)
If you want a short (maybe not optimal) function for performing logical AND on multidimensional boolean masks, you may use this recursive lambda function:
masks_and = lambda *masks : masks[0] if len(masks) == 1 else masks_and(np.logical_and(masks[0], masks[-1]), *masks[1:-1])
result = masks_and(mask1, mask2, ...)
You can also generalize the lambda function to apply any associative operator (a function of 2 arguments, such as multiplication/AND, sum/OR and so on) to any sequence of objects, keeping the operands in order, like this:
fn2args_reduce = lambda fn2args, *args : args[0] if len(args) == 1 else fn2args_reduce(fn2args, fn2args(args[0], args[1]), *args[2:])
result = fn2args_reduce(np.dot, matrix1, matrix2, ... matrixN)
which gives you the same result as nesting the NumPy calls by hand:
np.dot(...(np.dot(np.dot(matrix1, matrix2), matrix3)...), matrixN)
For example fn2args_reduce(lambda a,b: a+b, 1,2,3,4,5) gives you 15 - sum of these numbers (of course you have a much more efficient sum function for this, but I like it).
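For comparison, the standard library's functools.reduce performs the same left-to-right fold without recursion. The sketch below uses small made-up matrices purely for illustration.

```python
import functools
import operator
import numpy as np

# Same result as fn2args_reduce(lambda a, b: a + b, 1, 2, 3, 4, 5).
total = functools.reduce(operator.add, [1, 2, 3, 4, 5])
print(total)  # 15

# Left fold of np.dot over a sequence of matrices: dot(dot(m1, m2), m3).
m1 = np.eye(2)
m2 = np.array([[1, 2], [3, 4]])
m3 = np.array([[0, 1], [1, 0]])
product = functools.reduce(np.dot, [m1, m2, m3])
print(product)
```

Operand order is preserved, which matters for non-commutative operations such as np.dot.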
Even more generalized model for functions of N arguments could look like this:
fnNargs_reduce = lambda fnNargs, N, *args : args[0] if len(args) == 1 else fnNargs_reduce(fnNargs, N, fnNargs(*args[:N]), *args[N:])
fnNargs = lambda x1, x2, x3=neutral, ..., xN=neutral: x1 (?) x2 (?) ... (?) xN
Where neutral means it is neutral element for (?) operator, eg. 0 for +, 1 for * etc.
Why? Just for fun :-)
