Can numpy/pandas handle boolean operators acting on null values?


If I use the standard Python boolean operators and/or/not, one nice feature is that they treat None the way I would logically expect. That is, not only
True and True == True
True and False == False
but also
True and None == None
False and None == False
True or None == True
False or None == None
This follows the logic that, for instance, if A is False and B is unknown, (A and B) must still be False, while (A or B) is unknown.
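For reference, the plain-Python behaviour described above can be checked directly. The operands are parenthesised here because == binds more tightly than and/or, so True and None == None actually parses as True and (None == None). Because and/or simply return one of their operands, the result also depends on operand order when None comes first, which this sketch shows as well:

```python
# and/or return one of their operands, so None can "propagate" as unknown
results = {
    "True and None": (True and None),    # None
    "False and None": (False and None),  # False
    "True or None": (True or None),      # True
    "False or None": (False or None),    # None
    # With None on the left the result no longer matches three-valued logic:
    "None and False": (None and False),  # None (three-valued logic says False)
    "None or False": (None or False),    # False (three-valued logic says unknown)
}
for expr, value in results.items():
    print(f"{expr:15} -> {value}")
```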
I needed to perform boolean operations on Pandas DataFrames with missing data, and was hoping I'd be able to use the same logic. For boolean logic on numpy arrays and Pandas series, we need to use bitwise operators &/|/~. Pandas seems to have behaviour that is partially the same as and/or/not, but partially different. In short, it seems to return False when the value should logically be unknown.
For example:
import pandas as pd
a = pd.Series([True,False,True,False])
b = pd.Series([True,True,None,None])
Then we get
> a & b
0 True
1 False
2 False
3 False
dtype: bool
and
> a | b
0 True
1 True
2 True
3 False
I would expect that the output of a & b should be a Series [True,False,None,False] and that the output of a | b should be a Series [True,True,True,None]. The actual result matches what I'd expect, except that it returns False wherever the value should be missing.
Finally, ~b just gives a TypeError:
TypeError: bad operand type for unary ~: 'NoneType'
which seems odd since & and | at least partially work.
Is there a better way to carry out boolean logic in this situation? Is this a bug in Pandas?
Analogous tests with numpy arrays just give type errors, so I assume Pandas is handling the logic itself here.

You might need something like this:
c = pd.Series([x and y for x,y in zip(a,b)])
print(c)
Output:
0 True
1 False
2 None
3 False
And correspondingly, for the second expression:
d = pd.Series([x or y for x,y in zip(a,b)])
print(d)
Output:
0 True
1 True
2 True
3 None
Also read up on the difference between the logical operator and and the bitwise operator &.
If you want to and two columns a and b of a dataframe df, one way is to define a function and apply it to df:
import pandas as pd
df = pd.DataFrame({'a':[True,False,True,False], 'b':[True,True,None,None]})
def and_(row):
    return row['a'] and row['b']
df.loc[:, 'a_and_b'] = df.apply(and_, axis=1)
print(df)
Output:
a b a_and_b
0 True True True
1 False True False
2 True None None
3 False None False
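A different option from the answer above, assuming pandas 1.0 or later: the nullable "boolean" extension dtype implements &, | and ~ with three-valued (Kleene) logic and uses pd.NA for unknown values. A minimal sketch using the question's data:

```python
import pandas as pd

# The nullable "boolean" dtype stores missing values as pd.NA
a = pd.Series([True, False, True, False], dtype="boolean")
b = pd.Series([True, True, None, None], dtype="boolean")

r_and = a & b  # values: True, False, <NA>, False
r_or = a | b   # values: True, True, True, <NA>
r_not = ~b     # values: False, False, <NA>, <NA>  (no TypeError)

print(pd.DataFrame({"a": a, "b": b, "a & b": r_and, "a | b": r_or, "~b": r_not}))
```

An existing object-dtype column holding True/False/None, like the question's b, can usually be converted with b.astype("boolean") before applying the operators.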

Related

Create new pandas dataframe column containing boolean output from searching for substrings

I'd like to create a new column where if a substring is found in an existing column, it will return True and vice versa.
So in this example, I'd like to search for the substring "abc" in column a and create a Boolean column b whether column a contained the string or not.
a b
zabc True
wxyz False
abcy True
defg False
I've tried something like
df['b'] = df['a'].map(lambda x: True if 'abc' in x else False)
But this gave me an error saying "argument of type 'NoneType' is not iterable"
I also tried
df['b'] = False
df['b'][df['a'].str.contains('abc')] = True
But I got the error "cannot index with vector containing NA / NaN values"
Can someone explain the errors and what I can do about it. I have confirmed that ['a'] exists and contains values. But there are rows that contain None values.

This is how to do it:
df["b"] = df["a"].str.contains("abc")
Regarding your error:
It seems that you have np.nan values in column a. For those rows, str.contains returns np.nan instead of True or False, and since you then try to index with an array containing np.nan values, pandas tells you that this is not possible.
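A minimal sketch of the same one-liner made robust to missing values, using the na parameter of str.contains (the sample data here is made up, with one None row added):

```python
import pandas as pd

df = pd.DataFrame({"a": ["zabc", "wxyz", None, "abcy", "defg"]})

# Without na=..., a missing "a" gives a missing result, which cannot be used
# as a boolean mask. na=False treats missing values as "substring not found".
df["b"] = df["a"].str.contains("abc", na=False)
print(df)

# The column is now a clean boolean mask, so it can be used for indexing
print(df[df["b"]])
```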

Not the best solution, but you can check for null values with pd.isnull(), or convert null values to strings with str():
import pandas as pd
df = pd.DataFrame({'a':['zabc', None, 'abcy', 'defg']})
df['a'].map(lambda x: True if 'abc' in str(x) else False)
or
df['a'].map(lambda x: False if pd.isnull(x) or 'abc' not in x else True)
Result:
0 True
1 False
2 True
3 False
Name: a, dtype: bool

Your first code works as long as the column contains no None values; here it is on my sample data:
import pandas as pd
s = pd.Series(['cat','hat','dog','fog','pet'])
d = pd.DataFrame(s, columns=['test'])
d['b'] = d['test'].map(lambda x: True if 'og' in x else False)
d

Why does the AND, OR operator gives one sided output for data types like Boolean and sets in python ?

# FOR SETS
a=set([1,2,3])
b=set([4,5,6])
print(a and b) #always prints right side value that is b here
print(a or b) #always prints left side value that is a here
print(b and a) #prints a as it's on the right
print(b or a) #prints b as it's on the left
#FOR BOOLEANS
print(False and 0) #prints False as it is on the left
print(0 and False) #prints 0; the same operator is used, so why a different output?
print(False or '') #prints '' as it is on the right
print('' or False) #prints False as now it is on the right
print(1 or True) #prints 1
print(True or 1) #prints True
print(True and 1)#prints 1
print(1 and True)#prints True
AND always prints the left-side value and OR always prints the right-side value for false values. The reverse happens with true values.
When applied to any number of sets, OR gives the leftmost value and AND gives the rightmost value. Why?

The Python reference answers your question. https://docs.python.org/2/reference/expressions.html#boolean-operations
The expression x and y first evaluates x; if x is false, its value is
returned; otherwise, y is evaluated and the resulting value is
returned.
The expression x or y first evaluates x; if x is true, its value is
returned; otherwise, y is evaluated and the resulting value is
returned.
Values of container type (such as sets and lists) are considered false if they are empty, and true otherwise.
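The documented rule can be written out as two small functions. py_and and py_or are hypothetical helper names used only for illustration; unlike the real operators they evaluate both arguments eagerly, so they model the returned value but not the short-circuiting:

```python
def py_and(x, y):
    # x and y: if x is false, return x itself; otherwise return y
    return x if not x else y

def py_or(x, y):
    # x or y: if x is true, return x itself; otherwise return y
    return x if x else y

a, b = {1, 2, 3}, {4, 5, 6}
# Both sets are non-empty (true), so "and" returns the right operand
# and "or" returns the left one, exactly as observed in the question.
print(py_and(a, b) is b, py_or(a, b) is a)  # True True
print(repr(py_and(0, False)), repr(py_or(False, '')))  # 0 ''
```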

I don't think it's due to the left or right side, but rather to how the and and or operators are evaluated.
First of all, in Python 0 is equal to False and 1 is equal to True. (In Python 2 the names True and False can be reassigned, but by default 0 == False and 1 == True are both True.)
With that in mind, look at how and and or evaluate their operands.
To summarise:
The and operator is looking for a false (0) value. If it finds one in the first operand, it stops there; if the first operand is True (1), it has to evaluate the second operand to see whether it is false, because False and anything else will always be false. This table might help: whenever an operand is 0 (False), the answer is 0 (False).
*and* truth table
0 0 = 0
0 1 = 0
1 0 = 0
1 1 = 1
The or operator is a bit different but follows the same mechanism:
it looks for the first true (1) value it finds. If the first operand is false, it evaluates the second one; if the first operand is true (1), it stops.
Here, whenever an operand is 1 (True), the answer is 1 (True).
*or* truth table
0 0 = 0
0 1 = 1
1 0 = 1
1 1 = 1
For other operators, search for "operator truth table" and you will find many other, more detailed examples.
For your example:
#FOR BOOLEANS
print(False and 0) #prints False because *and* only needs one false value
print(0 and False) #prints 0, because 0 is false and *and* stops at the first false value
print(False or '') #prints '' because *or* is looking for a true value, but both are false, so it returns the last one
print('' or False) #prints False for the same reason: it returns the last one
print(1 or True) #prints 1 because it's *or* and it found a true value (1) first
print(True or 1) #prints True for the same reason: True is found first
print(True and 1) #prints 1 because *and* is looking for a false value and the first operand is true, so it has to check the second one
print(1 and True) #prints True, same as above: 1 is true, so *and* checks the second operand

Precise Membership Test in Python

The in operator tests for equivalence using comparison, but Python's comparison isn't precise in the sense that True == 1 and 0 == False, yielding -
>>> True in [ 1 ]
True
>>> False in [ 0 ]
True
>>> 1 in [ True ]
True
>>> 0 in [ False ]
True
whereas I need a precise comparison (similar to === in other languages) that would yield False in all of the above examples. I could of course iterate over the list:
res = False
for member in mylist:
    if subject == member and type( subject ) == type( member ):
        res = True
        break
This is obviously much less efficient than using the builtin in operator, even if I pack it as a list comprehension. Is there some native alternative to in such as a list method or some way to tweak in's behavior to get the required result?
The in operator is used in my case for testing the uniqueness of all list members, so a native uniqueness test would do as well.
Important note: The list may contain mutable values, so using set isn't an option.
Python version is 3.4, would be great for the solution to work on 2.7 too.
EDIT TO ALL THOSE WHO SUGGEST USING IS:
I look for a non-iterating, native alternative to a in b.
The is operator is not relevant for this case. For example, in the following situation in works just fine but is won't:
>>> [1,2] in [[1,2]]
True
Please, do read the question before answering it...

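in doesn't test for type-strict equivalence. It checks whether an item is in a container (for lists and tuples, an element matches if it is the same object or compares equal with ==). Example: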
>>> 5 in [1,2,3,4,5]
True
>>> 6 in [1,2,3,4,5]
False
>>> True in {True, False}
True
>>> "k" in ("b","c")
False
What you are looking for is is.
>>> True == 1
True
>>> True is 1
False
>>> False == 0
True
>>> False is 0
False
EDIT
After reading your edit, I don't think there is anything built into Python's libraries that suits your needs. What you want is basically to differentiate between int and bool (True, False). But Python itself treats True and False as integers, because bool is a subclass of int. That is why True == 1 and False == 0 both evaluate to True. You can even do:
>>> isinstance ( True, int)
True
I cannot think of anything better than your own solution. However, if your list is certain to contain each item at most once, you can use list.index():
def precise_in(subject, mylist):
    # precise_in is an illustrative name; assumes each value occurs at most once in mylist
    try:
        index_val = mylist.index(subject)
    except ValueError:
        return False
    return type(subject) == type(mylist[index_val])
Since index is built-in, it might be a little faster, though rather inelegant.

Python in operator is precise and the behavior you're complaining of is perfectly expected, since bool is a subclass of int.
Below is the excerpt of the official Python documentation describing the boolean type:
Booleans
These represent the truth values False and True. The two objects representing the values False and True are the only Boolean objects. The Boolean type is a subtype of plain integers, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings "False" or "True" are returned, respectively.
You can also have a look at PEP 285.
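A few lines make the consequences of bool being a subtype of int concrete, including why containers treat True and 1 as the same member:

```python
# bool is a subclass of int, so True and False behave like 1 and 0
print(issubclass(bool, int))             # True
print(True + True)                       # 2
print(True == 1, hash(True) == hash(1))  # True True

# Because they compare and hash equal, sets and dicts treat them as one key
print(len({1, True}))            # 1
print({1: "int", True: "bool"})  # {1: 'bool'}
```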

You're looking for the is operator:
if any(x is True for x in l):
...
is, however, isn't exactly === from other languages. is checks identity, not just equality without type coercion. Although CPython interns some strings and small integers, two objects that are equal are not necessarily the same object:
In [19]: a = '12'
In [20]: b = '123'
In [21]: a += '3'
In [22]: a is b
Out[22]: False
In [23]: a == b
Out[23]: True
In [27]: 100001 is 100000 + 1
Out[27]: False
In [28]: 100001 == 100000 + 1
Out[28]: True
In Python 3, None, True, and False are essentially singletons, so using is for discerning True from 1 will work perfectly fine. In Python 2, however, this is possible:
In [29]: True = 1
In [31]: True is 1
Out[31]: True
Equality can be overridden with the __eq__ method, so you can define an object that is equal to any other object:
In [1]: %paste
class Test(object):
    def __eq__(self, other):
        return True
## -- End pasted text --
In [2]: x = Test()
In [3]: x == None
Out[3]: True
In [4]: x == True
Out[4]: True
In [5]: x == False
Out[5]: True
In this case, how would === work? There is no general solution, so Python has no built-in method of lists that does what you want.
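For the specific bool-versus-int case in the question, one workaround is to require both the same type and equal values. strict_in and strict_all_unique are hypothetical helpers, not built-ins; they still loop in Python, so they do not meet the "non-iterating" requirement and only make the intended semantics concrete. They also check the type of top-level elements only, so nested values such as [1] and [True] would still match:

```python
def strict_in(subject, seq):
    # True only if some element has exactly the same type AND compares equal;
    # "type(...) is type(...)" rejects bool-vs-int matches such as True vs 1.
    return any(type(item) is type(subject) and item == subject for item in seq)

def strict_all_unique(seq):
    # Quadratic uniqueness test that also works for unhashable (mutable) items
    return not any(strict_in(item, seq[i + 1:]) for i, item in enumerate(seq))

print(strict_in(True, [1]), strict_in(0, [False]))  # False False
print(strict_in([1, 2], [[1, 2]]))                  # True
print(strict_all_unique([1, True, [1, 2]]))         # True
print(strict_all_unique([[1, 2], [1, 2]]))          # False
```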

why functions are false in python?

I can't figure out why:
f = lambda x: x
In [8]: f is True
Out[8]: False
In [9]: not f is True
Out[9]: True
In [10]: f is False
Out[10]: False
In [11]: f is True
Out[11]: False
In [12]: not f
Out[12]: False
In [13]: not f is True
Out[13]: True
In [14]: not f is False
Out[14]: True
OK. So far I could imagine this is due to the use of "is" instead of "==", as shown here:
In [15]: 0.00000 is 0
Out[15]: False
In [16]: 0.00000 == 0
Out[16]: True
OK. But then why does this happen when I do it on the function:
In [17]: not f == False
Out[17]: True
In [18]: not f == True
Out[18]: True
In [19]: f ==True
Out[19]: False
In [20]: f ==False
Out[20]: False
In [21]: f
Out[21]: <function __main__.<lambda>>
I was trying to explain it as due to "is" instead of "==" but examples 19 and 20 crushed my logic. Can someone explain?

is tests for object identity. Comparing anything other than True with is True is always going to be false.
Your next set of tests checks not (f == False) and not (f == True). A function is not equal to either boolean (True and False only compare equal to themselves and to the numbers 1 and 0), so f == False and f == True are both False, and negating them gives True.
You want to use bool() instead to test if something is true or false:
>>> bool(f)
True
>>> bool(0)
False
Don't use equality testing to see if something is true or false.
Note that only None, False, numeric zero, and empty containers and strings are considered false in Python. Everything else, by default, is considered true in a boolean context. Custom types can implement either the __nonzero__ method (when numeric) or the __len__ method (to implement a container) to alter that behaviour. Python 3 replaced __nonzero__ with the __bool__ method.
Functions do not define __nonzero__ (__bool__ in Python 3) or __len__, so they are always considered true.
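A minimal Python 3 sketch of these rules; the class names Empty and AlwaysFalse are made up for illustration:

```python
class Empty:
    # Without __bool__, truthiness falls back to __len__
    def __len__(self):
        return 0

class AlwaysFalse:
    # Python 3 uses __bool__ (Python 2 used __nonzero__)
    def __bool__(self):
        return False

f = lambda x: x

print(bool(f), bool(Empty()), bool(AlwaysFalse()))  # True False False
# Truthiness is not equality: a function equals neither boolean
print(f == True, f == False)  # False False
```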

If you check the "truthiness" of a function, you will see it is True.
>>> f = lambda x: x
>>> bool(f)
True
You were simply comparing the function itself to True or False, which it will never be equal to, since it is a function.

== checks for equivalence; is checks identity.
A function is not a falsey value, but it is not equivalent to True either:
def xyz():
    pass

if xyz:
    print("this runs, since a function is not a falsey value")

xyz == True   # False: it is not equal to True
xyz == False  # False: it is not equal to False either
xyz is True   # False: it certainly is not the same object as True
xyz is False  # False: it is not the same object as False either

Your own example shows that f is False is false so I'm confused by your title.
Why would you expect a function to evaluate as equal to either Boolean value? Wouldn't that be kind of weird behaviour?

Is it safe to replace '==' with 'is' to compare Boolean-values

I did several Boolean Comparisons:
>>> (True or False) is True
True
>>> (True or False) == True
True
It sounds like == and is are interchangeable for Boolean-values.
Sometimes it's clearer to use is.
I want to know that:
Are True and False pre-allocated in python?
Does bool(var) always return the same pre-allocated True (or False)?
Is it safe to replace == with is to compare Boolean-values?
It's not about Best-Practice.
I just want to know the Truth.

You probably shouldn't ever need to compare booleans. If you are doing something like:
if some_bool == True:
...
...just change it to:
if some_bool:
...
No is or == needed.
As commenters have pointed out, there are valid reasons to compare booleans. If both booleans are unknown and you want to know if one is equal to the other, you should use == or != rather than is or is not (the reason is explained below). Note that this is logically equivalent to xnor and xor respectively, which don't exist as logical operators in Python.
Internally, there should only ever be two boolean literal objects (see also the C API), and bool(x) is True should be True if bool(x) == True for any Python program. Two caveats:
This does not mean that x is True if x == True, however (e.g. x = 1).
This is true for the usual implementation of Python (CPython) but might not be true in other implementations. Hence == is a more reliable comparison.
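Both points can be checked directly in CPython with a short sketch:

```python
values = [0, 1, 2, "", "x", [], [0], None, True, False]

for v in values:
    # bool() always returns one of the two bool objects, so is and == agree on it
    assert (bool(v) is True) == (bool(v) == True)

x = 1
print(x == True, x is True)  # True False: equal, but not the same object
```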

Watch out for what else you may be comparing.
>>> 1 == True
True
>>> 1 is True
False
True and False will have stable object ids for their lifetime in your python instance.
>>> id(True)
4296106928
>>> id(True)
4296106928
is compares object identity, i.e. whether the two ids are the same.
EDIT: adding or
Since the OP uses or in the question, it may be worth pointing this out.
or that evaluates True: returns the first 'True' object.
>>> 1 or True
1
>>> 'a' or True
'a'
>>> True or 1
True
or that evaluates False: returns the last 'False' object
>>> False or ''
''
>>> '' or False
False
and that evaluates to True: returns the last 'True' object
>>> True and 1
1
>>> 1 and True
True
and that evaluates to False: returns the first 'False' object
>>> '' and False
''
>>> False and ''
False
This is an important python idiom and it allows concise and compact code for dealing with boolean logic over regular python objects.
>>> bool([])
False
>>> bool([0])
True
>>> bool({})
False
>>> bool({False: False})
True
>>> bool(0)
False
>>> bool(-1)
True
>>> bool('False')
True
>>> bool('')
False
Basically 'empty' objects are False, 'non empty' are True.
Combining this with detly's answer and the other answers should give some insight into how to use if and bools in Python.
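As a sketch of the idiom, or is often used to fall back to a default when a value is falsy. greet is a hypothetical function; note that the fallback also kicks in for other falsy values such as '' or 0, which is not always what you want:

```python
def greet(name=None):
    # "or" returns name if it is truthy, otherwise the default string
    return "Hello, " + (name or "stranger")

print(greet("Ada"))  # Hello, Ada
print(greet())       # Hello, stranger
print(greet(""))     # Hello, stranger (the empty string is falsy too)
```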

Yes. There are guaranteed to be exactly two bools, True and False:
Class bool cannot be subclassed
further. Its only instances are False
and True.
That means if you know both operands are bool, == and is are equivalent. However, as detly notes, there's usually no reason to use either in this case.
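The quoted guarantee is easy to observe: trying to subclass bool raises TypeError, and bool() hands back one of the two existing objects:

```python
# bool cannot be subclassed, so True and False are its only instances
try:
    class MyBool(bool):
        pass
except TypeError as exc:
    print("TypeError:", exc)

# bool() on any value returns one of those two objects
print(bool([1]) is True, bool([]) is False)  # True True
```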

It seems that all answers deal with True and False as defined after interpreter startup. Before booleans became part of Python, they were often defined as part of a program. Even in Python 2 (here 2.6.6) they are only names that can be pointed to different objects; in Python 3 they are keywords and cannot be reassigned:
>>> True = 1
>>> (2 > 1)
True
>>> (2 > 1) == True
True
>>> (2 > 1) is True
False
If you have to deal with older software, be aware of that.

The == operator tests for equality. The is keyword tests for object identity, i.e. whether we are talking about the same object. Note that several variables may refer to the same object.
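A short illustration of the difference, including two variables referring to the same object:

```python
a = [1, 2]
b = [1, 2]
c = a          # c is another name for the same list object as a

print(a == b, a is b)  # True False: equal contents, different objects
print(a is c)          # True: both names refer to one object

c.append(3)            # mutating through c is visible through a
print(a)               # [1, 2, 3]
```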

== and is are both comparison operators that return a boolean value, True or False. True has a numeric value of 1 and False has a numeric value of 0.
The == operator compares the values of two objects. The objects compared are most often of the same type (int vs int, float vs float); objects of different types usually compare unequal, although numeric types compare by value (1 == 1.0 and 1 == True are both True). The is operator tests for object identity: x is y is true if x and y have the same id, that is, if they are the same object.
So, if you are comparing values, use ==, and if you are checking whether two names refer to the same object (boolean or anything else), use is.
In CPython, 42 is 42 evaluates to True, the same result as 42 == 42, because small integers are cached; this does not hold for integers in general.

Another reason to compare values using == is that both None and False are “falsy” values, and sometimes it’s useful to use None to mark a value as “not defined” or “no value” while treating True and False as real values to work with:
def some_function(val=None):
    """This function does an awesome thing."""
    if val is None:
        pass  # Value was not defined.
    elif val == False:
        pass  # User passed a false boolean value (note: 0 == False, so 0 also lands here).
    elif val == True:
        pass  # User passed a true boolean value (1 also lands here).
    else:
        pass  # Quack quack.
Somewhat related question: Python != operation vs “is not”.
