How to check if all values in a dataframe are True - python

pd.DataFrame.all and pd.DataFrame.any convert to bool all values and than assert all identities with the keyword True. This is ok as long as we are fine with the fact that non-empty lists and strings evaluate to True. However let assume that this is not the case.
>>> pd.DataFrame([True, 'a']).all().item()
True # Wrong
A workaround is to assert equality with True, but a comparison to True does not sound pythonic.
>>> (pd.DataFrame([True, 'a']) == True).all().item()
False # Right
Question: can we assert for identity with True without using == True

First of all, I do not advise this. Please do not use mixed dtypes inside your dataframe columns - that defeats the purpose of dataframes and they are no more useful than lists and no more performant than loops.
Now, addressing your actual question, spolier alert, you can't get over the ==. But you can hide it using the eq function. You may use
df.eq(True).all()
Or,
df.where(df.eq(True), False).all()
Note that
df.where(df.eq(True), False)
0
0 True
1 False
Which you may find useful if you want to convert non-"True" values to False for any other reason.

I would actually use
(pd.DataFrame([True, 'a']) == True).all().item()
This way, you're checking for the value of the object, not just checking the "truthy-ness" of it.
This seems perfectly pythonic to me because you're explicitly checking for the value of the object, not just whether or not it's a truthy value.

Related

What's the difference between If not (variable) and if (variable) == false?

I'm learning Python and I just started learning conditionals with booleans
I am very confused though as to the specific topic of "If Not". Could someone please explain to me the difference between :
x = False
if not x:
print("hello")
if x == False:
print("hello")
When testing this code on a Python compiler, I receive "hello" twice. I can assume this means that they both mean the same thing to the computer.
Could someone please explain to me why one would use one method over the other method?
It depends™. Python doesn't know what any of its operators should do. It calls magic methods on objects and lets them decide. We can see this with a simple test
class Foo:
"""Demonstrates the difference between a boolean and equality test
by overriding the operations that implement them."""
def __bool__(self):
print("bool")
return True
def __eq__(self, other):
print("eq", repr(other))
return True
x = Foo()
print("This is a boolean operation without an additional parameter")
if not x:
print("one")
print("This is an equality operation with a parameter")
if x == False:
print("two")
Produces
This is a boolean operation without an additional parameter
bool
This is an equality operation with a parameter
eq False
two
In the first case, python did a boolean test by calling __bool__, and in the second, an equality test by calling __eq__. What this means depends on the class. Its usually obvious but things like pandas may decide to get tricky.
Usually not x is faster than x == False because the __eq__ operator will typically do a second boolean comparison before it knows for sure. In your case, when x = False you are dealing with a builtin class written in C and its two operations will be similar. But still, the x == False comparison needs to do a type check against the other side, so it will be a bit slower.
There are already several good answers here, but none discuss the general concept of "truthy" and "falsy" expressions in Python.
In Python, truthy expressions are expression that return True when converted to bool, and falsy expressions are expressions that return False when converted to bool. (Ref: Trey Hunner's regular expression tutorial; I'm not affiliated with Hunner, I just love his tutorials.)
Truthy stuff:
What's important here is that 0, 0.0, [], None and False are all falsy.
When used in an if statement, they will fail the test, and they will pass the test in an if not statement.
Falsy stuff:
Non-zero numbers, non-empty lists, many objects (but read #tdelaney's answer for more details here), and True are all truthy, so they pass if and fail if not tests.
Equality tests
When you use equality tests, you're not asking about the truthiness of an expression, you're asking whether it is equal to the other thing you provide, which is much more restrictive than general truthiness or falsiness.
EDIT: Additional references
Here are more references on "Truthy" and "Falsy" values in Python:
Truth value testing in the Python manual
The exhaustive list of Falsy values
Truthy and Falsy tutorial from freeCodeCamp
In one case you are checking for equality with the value "False", on the other you are performing a boolean test on a variable. In Python, several variables pass the "if not x" test but only x = False passes "if x == False".
See example code:
x = [] # an empty list
if not x: print("Here!")
# x passes this test
if x == False: print("There!")
# x doesn't pass this test
Try it with x = None: not x would be True then and x == False would be False. Unlike with x = False when both of these are True. not statement also accounts for an empty value.

Weird behavior when checking for NaNs in a pandas dataframe

I want to loop over all the rows in a df, checking that two conditions hold and, if they do, replace the value in a column with something else. I've attempted to do this two different ways:
if (sales.iloc[idx]['shelf'] in ("DRY BLENDS","LIQUID BLENDS")) & np.isnan(sales.iloc[idx]['traceable_blend']):
sales.iloc[idx]['traceable_blend'] = False
and:
if (sales.iloc[idx]['shelf'] in ("DRY BLENDS","LIQUID BLENDS")) & (sales.iloc[idx]['traceable_blend'] == np.NaN):
sales.iloc[idx]['traceable_blend'] = False
By including print statements we've verified that the if statement is actually functional, but no assignment ever takes place. Once we've run the loop, there are True and NaN values in the 'traceable_blend' column, but never False. Somehow the assignment is failing.
It looks like this might've worked:
if (sales.iloc[idx]['shelf'] in ("DRY BLENDS","LIQUID BLENDS")) & np.isnan(sales.iloc[idx]['traceable_blend']):
sales.at[idx, 'traceable_blend'] = False
But I would still like to understand what's happening.
This, sales.iloc[idx]['traceable_blend']=False, is index chaining, and will almost never work. In fact, you don't need to loop:
sales['traceable_blend'] = sales['traceable_blend'].fillna(sales['shelf'].isin(['DRY BLENDS', 'LIQUID BLENDS']))
Pandas offers two functions for checking for missing data (NaN or null): isnull() and notnull() - They return a boolean value. I suggest to try these instead of isnan()
You can also determine if any value is missing in your series by chaining .values.any()

Difference between if and if not python

Can anybody please tell me here what is the exact difference between if and if not here in the code.
def postordertraversse(self,top):
m=[]
if(top):
if not self.postordertraversse(top.left):
m.append(top.root)
top_most=m.pop(0)
conv=createlist();
conv.postordertraversse(conv.top)
What i can understand is if top means if top object instance exists then move inside the block and check if not i.e till top.left is not null keep appending.
if x: means "if x is truthy".
if not x: means "if x is falsey".
Whether something is truthy or falsey depends on what kind of object it is.
For numbers, 0 is falsey, and all other values are truthy.
For boolean values, True is truthy and False is falsey (obviously!)
For collections (lists, tuples, dictionaries, strings, etc), empty ones are falsey and non-empty ones are truthy.
So in your example code, the two if statements are saying:
if top is truthy:
if the result of self.postordertraversse(top.left) is falsey:
not in python is like negation in other programming languages.
'not' statement just converts further expression. For example:
not True - False
not False - True

True/False in lists [duplicate]

False is equivalent to 0 and True is equivalent 1 so it's possible to do something like this:
def bool_to_str(value):
"""value should be a bool"""
return ['No', 'Yes'][value]
bool_to_str(True)
Notice how value is bool but is used as an int.
Is this this kind of use Pythonic or should it be avoided?
I'll be the odd voice out (since all answers are decrying the use of the fact that False == 0 and True == 1, as the language guarantees) as I claim that the use of this fact to simplify your code is perfectly fine.
Historically, logical true/false operations tended to simply use 0 for false and 1 for true; in the course of Python 2.2's life-cycle, Guido noticed that too many modules started with assignments such as false = 0; true = 1 and this produced boilerplate and useless variation (the latter because the capitalization of true and false was all over the place -- some used all-caps, some all-lowercase, some cap-initial) and so introduced the bool subclass of int and its True and False constants.
There was quite some pushback at the time since many of us feared that the new type and constants would be used by Python newbies to restrict the language's abilities, but Guido was adamant that we were just being pessimistic: nobody would ever understand Python so badly, for example, as to avoid the perfectly natural use of False and True as list indices, or in a summation, or other such perfectly clear and useful idioms.
The answers to this thread prove we were right: as we feared, a total misunderstanding of the roles of this type and constants has emerged, and people are avoiding, and, worse!, urging others to avoid, perfectly natural Python constructs in favor of useless gyrations.
Fighting against the tide of such misunderstanding, I urge everybody to use Python as Python, not trying to force it into the mold of other languages whose functionality and preferred style are quite different. In Python, True and False are 99.9% like 1 and 0, differing exclusively in their str(...) (and thereby repr(...)) form -- for every other operation except stringification, just feel free to use them without contortions. That goes for indexing, arithmetic, bit operations, etc, etc, etc.
I'm with Alex. False==0 and True==1, and there's nothing wrong with that.
Still, in Python 2.5 and later I'd write the answer to this particular question using Python's conditional expression:
def bool_to_str(value):
return 'Yes' if value else 'No'
That way there's no requirement that the argument is actually a bool -- just as if x: ... accepts any type for x, the bool_to_str() function should do the right thing when it is passed None, a string, a list, or 3.14.
surely:
def bool_to_str(value):
"value should be a bool"
return 'Yes' if value else 'No'
is more readable.
Your code seems inaccurate in some cases:
>>> def bool_to_str(value):
... """value should be a bool"""
... return ['No', 'Yes'][value]
...
>>> bool_to_str(-2)
'No'
And I recommend you to use just the conditional operator for readability:
def bool_to_str(value):
"""value should be a bool"""
return "Yes" if value else "No"
It is actually a feature of the language that False == 0 and True == 1 (it does not depend on the implementation): Is False == 0 and True == 1 in Python an implementation detail or is it guaranteed by the language?
However, I do agree with most of the other answers: there are more readable ways of obtaining the same result as ['No', 'Yes'][value], through the use of the … if value else … or of a dictionary, which have the respective advantages of hinting and stating that value is a boolean.
Plus, the … if value else … follows the usual convention that non-0 is True: it also works even when value == -2 (value is True), as hinted by dahlia. The list and dict approaches are not as robust, in this case, so I would not recommend them.
Using a bool as an int is quite OK because bool is s subclass of int.
>>> isinstance(True, int)
True
>>> isinstance(False, int)
True
About your code: Putting it in a one-line function like that is over the top. Readers need to find your function source or docs and read it (the name of the function doesn't tell you much). This interrupts the flow. Just put it inline and don't use a list (built at run time), use a tuple (built at compile time if the values are constants). Example:
print foo, bar, num_things, ("OK", "Too many!)[num_things > max_things]
Personally I think it depends on how do you want to use this fact, here are two examples
Just simply use boolean as conditional statement is fine. People do this all the time.
a = 0
if a:
do something
However say you want to count how many items has succeed, the code maybe not very friendly for other people to read.
def succeed(val):
if do_something(val):
return True
else:
return False
count = 0
values = [some values to process]
for val in values:
count += succeed(val)
But I do see the production code look like this.
all_successful = all([succeed(val) for val in values])
at_least_one_successful = any([succeed(val) for val in values])
total_number_of_successful = sum([succeed(val) for val in values])

String comparison in Python: is vs. == [duplicate]

This question already has answers here:
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Closed 9 years ago.
I noticed a Python script I was writing was acting squirrelly, and traced it to an infinite loop, where the loop condition was while line is not ''. Running through it in the debugger, it turned out that line was in fact ''. When I changed it to !='' rather than is not '', it worked fine.
Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values? I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id.
For all built-in Python objects (like
strings, lists, dicts, functions,
etc.), if x is y, then x==y is also
True.
Not always. NaN is a counterexample. But usually, identity (is) implies equality (==). The converse is not true: Two distinct objects can have the same value.
Also, is it generally considered better to just use '==' by default, even
when comparing int or Boolean values?
You use == when comparing values and is when comparing identities.
When comparing ints (or immutable types in general), you pretty much always want the former. There's an optimization that allows small integers to be compared with is, but don't rely on it.
For boolean values, you shouldn't be doing comparisons at all. Instead of:
if x == True:
# do something
write:
if x:
# do something
For comparing against None, is None is preferred over == None.
I've always liked to use 'is' because
I find it more aesthetically pleasing
and pythonic (which is how I fell into
this trap...), but I wonder if it's
intended to just be reserved for when
you care about finding two objects
with the same id.
Yes, that's exactly what it's for.
I would like to show a little example on how is and == are involved in immutable types. Try that:
a = 19998989890
b = 19998989889 +1
>>> a is b
False
>>> a == b
True
is compares two objects in memory, == compares their values. For example, you can see that small integers are cached by Python:
c = 1
b = 1
>>> b is c
True
You should use == when comparing values and is when comparing identities. (Also, from an English point of view, "equals" is different from "is".)
The logic is not flawed. The statement
if x is y then x==y is also True
should never be read to mean
if x==y then x is y
It is a logical error on the part of the reader to assume that the converse of a logic statement is true. See http://en.wikipedia.org/wiki/Converse_(logic)
See This question
Your logic in reading
For all built-in Python objects (like
strings, lists, dicts, functions,
etc.), if x is y, then x==y is also
True.
is slightly flawed.
If is applies then == will be True, but it does NOT apply in reverse. == may yield True while is yields False.

Categories

Resources