Understanding the truthiness of strings - python

I understand that Python built-in types have a "truthiness" value, and the empty string is considered False, while any non-empty string is considered True.
This makes sense
I can check this using the built-in function bool.
>>> bool("")
False
>>> bool("dog")
True
I can also make use of these truthiness values when using conditionals. For example:
>>> if "dog":
...     print("yes")
...
yes
This is confusing
This doesn't work with the == operator though:
>>> "dog" == True
False
>>> "dog" == False
False
Can anyone explain why == seems to act differently than a conditional?

See the truth value testing and comparisons sections of the documentation, excerpted below.
In a nutshell, most things are truthy by default, which is why bool("dog") is true. The == operator compares two objects for equality, as opposed to comparing their truthinesses, as I assume you had expected.
4.1. Truth Value Testing
Any object can be tested for truth value, for use in an if or while condition or as operand of the Boolean operations below.
By default, an object is considered true unless its class defines
either a __bool__() method that returns False or a __len__() method
that returns zero, when called with the object.
Here are most of the built-in objects considered false:
constants defined to be false: None and False
zero of any numeric type: 0, 0.0, 0j, Decimal(0), Fraction(0, 1)
empty sequences and collections: '', (), [], {}, set(), range(0)
Operations and built-in functions that have a Boolean result always
return 0 or False for false and 1 or True for true, unless otherwise
stated. (Important exception: the Boolean operations or and and
always return one of their operands.)
4.3. Comparisons
Objects of different types, except different numeric types, never
compare equal.
...
Non-identical instances of a class normally compare as non-equal
unless the class defines the __eq__() method.
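The default rule quoted above can be tried out with a few throwaway classes. This is only a minimal sketch; the class names NoHooks, EmptyBox, and AlwaysFalse are made up for illustration and do not come from the documentation.

```python
class NoHooks:
    """Defines neither __bool__ nor __len__, so instances are truthy."""


class EmptyBox:
    """Falsy because __len__ returns zero."""

    def __len__(self):
        return 0


class AlwaysFalse:
    """Falsy because __bool__ returns False."""

    def __bool__(self):
        return False


truthiness = {
    "NoHooks": bool(NoHooks()),
    "EmptyBox": bool(EmptyBox()),
    "AlwaysFalse": bool(AlwaysFalse()),
}
print(truthiness)

# Equality is a separate question: a truthy string is still not equal to True.
truthy_but_not_equal = (bool("dog"), "dog" == True)
print(truthy_but_not_equal)  # (True, False)
```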

The basics
I believe your confusion might come from comparing Python to languages such as JavaScript where there is a == and a === operator. Python does not work this way.
In Python the only way to compare for equality is with == and this compares both value and type.
Thus if you compare True == "dog", then the expression is immediately False because the types bool and str are not types that can be compared.
Although, note that it does not mean that there are no types that are comparable between themselves. Examples are set and frozenset:
frozenset({1,2,3}) == {1,2,3} # True
Or simply int and float
1 == 1.0 # True
This is the behaviour for most built-in types.
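One way to see why "dog" == True is False while 1 == 1.0 is True is to call __eq__ directly. The sketch below assumes CPython: when a built-in type does not know how to compare itself with the other operand, its __eq__ returns NotImplemented, Python then asks the other operand, and if both decline, == falls back to an identity check.

```python
# str does not know how to compare itself with a bool.
str_vs_bool = "dog".__eq__(True)

# int.__eq__ declines a float, but float.__eq__ accepts an int,
# which is why 1 == 1.0 still ends up True.
int_vs_float = (1).__eq__(1.0)
float_vs_int = (1.0).__eq__(1)

print(str_vs_bool, int_vs_float, float_vs_int)
print("dog" == True, 1 == 1.0, frozenset({1, 2, 3}) == {1, 2, 3})
```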
The classy part
In the case where you define your own types, i.e. when you define classes, you can write the __eq__ method, which is called when you compare an instance of the class to another value.
For example, you could do this (which, by the way, was pointed out in the comments as a terrible idea; you should not inherit from built-in types).
class WeirdString(str):
    def __eq__(self, other):
        return str(self) == str(other) or bool(self) == bool(other)

s = WeirdString("dog")
s == True # True
In the case where you do not define __eq__, Python falls back on checking whether the two objects are the same object, as the is operator does.
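The identity fallback can be seen with a class that defines no __eq__ at all. A small sketch, with a made-up class name NoEq:

```python
class NoEq:
    """No __eq__ defined, so == behaves like an identity check."""

    def __init__(self, value):
        self.value = value


a = NoEq(1)
b = NoEq(1)

print(a == b)  # False: same contents, but two distinct objects
print(a == a)  # True: an object is always identical to itself
```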

When you compare "dog" == True, you are also comparing the type of these objects and not just their boolean value.
Now as True has a type bool and "dog" has a type str, they are not equivalent according to the == operator, irrespective of their boolean values being equal.
Note: both the objects' types and their values are being checked here, not their truthiness.

Related

What's the difference between If not (variable) and if (variable) == false?

I'm learning Python and I just started learning conditionals with booleans
I am very confused though as to the specific topic of "If Not". Could someone please explain to me the difference between :
x = False
if not x:
    print("hello")
if x == False:
    print("hello")
When testing this code on a Python compiler, I receive "hello" twice. I can assume this means that they both mean the same thing to the computer.
Could someone please explain to me why one would use one method over the other method?
It depends™. Python doesn't know what any of its operators should do. It calls magic methods on objects and lets them decide. We can see this with a simple test
class Foo:
    """Demonstrates the difference between a boolean and equality test
    by overriding the operations that implement them."""
    def __bool__(self):
        print("bool")
        return True
    def __eq__(self, other):
        print("eq", repr(other))
        return True
x = Foo()
print("This is a boolean operation without an additional parameter")
if not x:
    print("one")
print("This is an equality operation with a parameter")
if x == False:
    print("two")
Produces
This is a boolean operation without an additional parameter
bool
This is an equality operation with a parameter
eq False
two
In the first case, Python did a boolean test by calling __bool__, and in the second, an equality test by calling __eq__. What this means depends on the class. It's usually obvious, but things like pandas may decide to get tricky.
Usually not x is faster than x == False because the __eq__ operator will typically do a second boolean comparison before it knows for sure. In your case, when x = False you are dealing with a builtin class written in C and its two operations will be similar. But still, the x == False comparison needs to do a type check against the other side, so it will be a bit slower.
There are already several good answers here, but none discuss the general concept of "truthy" and "falsy" expressions in Python.
In Python, truthy expressions are expressions that return True when converted to bool, and falsy expressions are expressions that return False when converted to bool. (Ref: Trey Hunner's regular expression tutorial; I'm not affiliated with Hunner, I just love his tutorials.)
Falsy stuff:
What's important here is that 0, 0.0, [], None and False are all falsy.
When used in an if statement, they will fail the test, and they will pass the test in an if not statement.
Truthy stuff:
Non-zero numbers, non-empty lists, many objects (but read @tdelaney's answer for more details here), and True are all truthy, so they pass if and fail if not tests.
Equality tests
When you use equality tests, you're not asking about the truthiness of an expression, you're asking whether it is equal to the other thing you provide, which is much more restrictive than general truthiness or falsiness.
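As a quick illustration of the two groups, the sketch below sorts a handful of sample values by truthiness. The sample list is arbitrary and chosen only for this example.

```python
values = [0, 0.0, [], None, False, "", 1, -2.5, [0], "False", True, object()]

# `not v` and `v` apply the same truth test that an if statement uses.
falsy = [v for v in values if not v]
truthy = [v for v in values if v]

print("falsy: ", falsy)
print("truthy:", truthy[:-1], "plus a bare object()")
```

Note that the non-empty string "False" and the non-empty list [0] land in the truthy group: only the container's emptiness matters, not what it contains.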
EDIT: Additional references
Here are more references on "Truthy" and "Falsy" values in Python:
Truth value testing in the Python manual
The exhaustive list of Falsy values
Truthy and Falsy tutorial from freeCodeCamp
In one case you are checking for equality with the value False; in the other you are performing a boolean test on a variable. In Python, many values pass the "if not x" test, but only values equal to False (False itself and numeric zeros such as 0) pass "if x == False".
See example code:
x = [] # an empty list
if not x: print("Here!")
# x passes this test
if x == False: print("There!")
# x doesn't pass this test
Try it with x = None: not x would be True then, and x == False would be False, unlike with x = False, when both of these are True. The not test also accounts for empty values.
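To compare the two tests side by side, here is a small sketch over a few falsy values (the dictionary of candidates is just for illustration). One detail worth noticing is that 0 and 0.0 also compare equal to False, because bool is a subclass of int.

```python
candidates = {"False": False, "0": 0, "0.0": 0.0, "None": None, "[]": [], "''": ""}

for label, x in candidates.items():
    print(label, "| not x:", not x, "| x == False:", x == False)

# Every candidate passes `not x`, but only these pass `x == False`.
passes_equality = [label for label, x in candidates.items() if x == False]
print(passes_equality)  # ['False', '0', '0.0']
```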

Python3 comparison operators

Python 3 does not support comparison between different data types.
1 < '1' fails with:
`TypeError: '<' not supported between instances of 'int' and 'str'`
But why does 1 == '1' (or something like 156 == ['foo']) return False?
from the docs:
The default behavior for equality comparison (== and !=) is based on
the identity of the objects. Hence, equality comparison of instances
with the same identity results in equality, and equality comparison of
instances with different identities results in inequality. A
motivation for this default behavior is the desire that all objects
should be reflexive (i.e. x is y implies x == y).
Sometimes we would like to know if two variables are the same, meaning that they refer to the same object, e.g. True is True will return True, but on the other hand "True" is True returns False, hence it makes sense that "True" == True returns False (I didn't provide the best use case for the is operator, and this example will raise a SyntaxWarning in Python 3.8+, but that's the main idea).
Because it makes sense to check if something is equal to something else (or is it something else) even if they are not of the same type. However, it doesn't make much sense to check which "quantity" is larger if they aren't of the same type because "quantity" may be defined in a different way for each type (in other words, the "quantity" might measure a different quality of the object).
A non-code example: an apple clearly can not be == to an orange. However, if we define the "quantity" of an apple to be its "redness", and the "quantity" of an orange to be its "taste", we can not check if an apple is > than an orange. > will try to compare different qualities of these objects.
Back to code:
It is clear that 4 is not (or is not equal to) the list [4]. But what meaning would a check like 4 > [4] have? What does it mean for an integer to be "smaller" or "larger" than a list?
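The asymmetry between == and < can be checked directly. In this sketch the helper name try_compare is made up; it reports the result of a == b and either the result of a < b or the string 'TypeError' when ordering is not supported.

```python
def try_compare(a, b):
    """Return (a == b, a < b), with 'TypeError' in place of an unsupported <."""
    equal = a == b
    try:
        ordered = a < b
    except TypeError:
        ordered = "TypeError"
    return equal, ordered


print(try_compare(1, "1"))   # (False, 'TypeError')
print(try_compare(4, [4]))   # (False, 'TypeError')
print(try_compare(1, 1.5))   # (False, True): int and float can be ordered
```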

How to check if all values in a dataframe are True

pd.DataFrame.all and pd.DataFrame.any convert all values to bool and then assert all identities with the keyword True. This is OK as long as we are fine with the fact that non-empty lists and strings evaluate to True. However, let us assume that this is not the case.
>>> pd.DataFrame([True, 'a']).all().item()
True # Wrong
A workaround is to assert equality with True, but a comparison to True does not sound pythonic.
>>> (pd.DataFrame([True, 'a']) == True).all().item()
False # Right
Question: can we assert for identity with True without using == True
First of all, I do not advise this. Please do not use mixed dtypes inside your dataframe columns - that defeats the purpose of dataframes and they are no more useful than lists and no more performant than loops.
Now, addressing your actual question, spoiler alert, you can't get over the ==. But you can hide it using the eq function. You may use
df.eq(True).all()
Or,
df.where(df.eq(True), False).all()
Note that
df.where(df.eq(True), False)
       0
0   True
1  False
Which you may find useful if you want to convert non-"True" values to False for any other reason.
I would actually use
(pd.DataFrame([True, 'a']) == True).all().item()
This way, you're checking for the value of the object, not just checking the "truthy-ness" of it.
This seems perfectly pythonic to me because you're explicitly checking for the value of the object, not just whether or not it's a truthy value.
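Outside pandas, the difference between "truthy" and "is exactly True" can be sketched in plain Python. This is only an analogy on a list, not a pandas API, and the helper name all_exactly_true is made up. It would not carry over to DataFrame cells directly, because NumPy booleans (numpy.bool_) are not the same object as the built-in True.

```python
def all_exactly_true(values):
    """True only if every element is the bool True itself, not merely truthy."""
    return all(v is True for v in values)


mixed = [True, "a"]

print(all(mixed))                      # True: "a" is truthy
print(all_exactly_true(mixed))         # False: "a" is not the object True
print(all_exactly_true([True, True]))  # True
```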

False or None vs. None or False

In [20]: print None or False
-------> print(None or False)
False
In [21]: print False or None
-------> print(False or None)
None
This behaviour confuses me. Could someone explain to me why is this happening like this? I expected them to both behave the same.
The expression x or y evaluates to x if x is true, or y if x is false.
Note that "true" and "false" in the above sentence are talking about "truthiness", not the fixed values True and False. Something that is "true" makes an if statement succeed; something that's "false" makes it fail. "false" values include False, None, 0 and [] (an empty list).
The or operator returns the value of its first operand, if that value is true in the Pythonic boolean sense (aka its "truthiness"), otherwise it returns the value of its second operand, whatever it happens to be. See the subsection titled Boolean operations in the section on Expressions in the current online documentation.
In both your examples, the first operand is considered false, so the value of the second one becomes the result of evaluating the expression.
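In Python 3 syntax, the rule can be spelled out with a tiny helper. The function or_like is a made-up name, and it only mimics the result of x or y for operands that have already been evaluated; the real operator also skips evaluating y when x is truthy.

```python
def or_like(x, y):
    """Mimic the result of `x or y` for already-evaluated operands."""
    return x if x else y


print(None or False, or_like(None, False))   # False False
print(False or None, or_like(False, None))   # None None

# A common use of the same rule: falling back to a default value.
print("" or "default")                       # default
```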
EXPLANATION
You must realize that True, False, and None are all singletons in Python, which means that there exists, and can only exist, one single instance of a singleton object, hence the name singleton. Also, you can't modify a singleton object because its state is set in stone, if I may use that phrase.
Now, let me explain how those Python singletons are meant to be used.
Let's have a Python object named foo that has a value None, then if foo is not None is saying that foo has a value other than None. This is close to, but not the same as, saying if foo, which tests bool(foo) rather than foo == True: values such as 0 and "" are not None, yet they are still falsy.
In a boolean context, not None behaves like True, and None behaves like False.
>>> foo = not None
>>> bool(foo)
True
>>> foo = 5 # Giving an arbitrary value here
>>> bool(foo)
True
>>> foo = None
>>> bool(foo)
False
>>> foo = 5 # Giving an arbitrary value here
>>> bool(foo)
True
The critical thing to realize and be aware of when coding is that when comparing two objects, None needs is, but True and False need ==. Avoid using if foo == None, only use if foo is None. Also, avoid using if foo != None and only use if foo is not None.
WARNING
If you are using if foo or if not foo when the value of foo happens to be None, beware of potential bugs in your code. So, don't check for a potential None value in conditional statements this way. Be on the safe side by checking for it as explained above, i.e., if foo is None or if foo is not None. It is very important to follow best practices shared by Python experts.
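The kind of bug this warning is about can be sketched as follows. Both functions and the timeout parameter are hypothetical, invented for this example: None is meant to signal "not set", while 0 is a legitimate value.

```python
def describe_timeout(timeout=None):
    """Treat only None as 'not set'; 0 is a valid timeout."""
    if timeout is None:
        return "no timeout configured"
    return f"timeout is {timeout} seconds"


def describe_timeout_buggy(timeout=None):
    """Uses `not timeout`, which also catches 0 by mistake."""
    if not timeout:
        return "no timeout configured"
    return f"timeout is {timeout} seconds"


print(describe_timeout(0))        # timeout is 0 seconds
print(describe_timeout_buggy(0))  # no timeout configured
```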
REMINDER
True is a 1 and False is a 0.
In the old days of Python, we only had the integer 1 to represent a truthy value and we had the integer 0 to represent a falsy value. However, it is more understandable and human-friendly to say True instead of 1 and it is more understandable and human-friendly to say False instead of 0.
GOOD TO KNOW
The bool type (i.e., the values True and False) is a subtype of the int type. So, if you use type hints and you annotate that a function/method returns either an int or a bool (i.e., -> int | bool or -> Union[int, bool]), then a static type checker such as mypy treats every bool as an int as well, so the annotation tells it nothing more than -> int would. That's something you need to be aware of. There's no fix for this.
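The relationship between bool and int is easy to confirm at the interpreter. A short sketch; the flags list is just sample data:

```python
print(issubclass(bool, int))      # True
print(isinstance(True, int))      # True
print(True == 1, False == 0)      # True True

# Arithmetic on bools produces plain ints.
two = True + True
print(two, type(two).__name__)    # 2 int

# This is why summing a list of bools counts the True values.
flags = [True, False, True]
count = sum(flags)
print(count)                      # 2
```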
A closely related topic: Python's or and and short-circuit. In a logical or operation, if any argument is true, then the whole thing will be true and nothing else needs to be evaluated; Python promptly returns that "true" value. If it finishes and nothing was true, it returns the last argument it handled, which will be a "false" value.
and is the opposite, if it sees any false values, it will promptly exit with that "false" value, or if it gets through it all, returns the final "true" value.
>>> 1 or 2 # first value TRUE, second value doesn't matter
1
>>> 1 and 2 # first value TRUE, second value might matter
2
>>> 0 or 0.0 # first value FALSE, second value might matter
0.0
>>> 0 and 0.0 # first value FALSE, second value doesn't matter
0
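Short-circuiting also means the second operand may never be evaluated at all. The sketch below makes that visible by recording which operands were actually computed; the helper check and the labels are made up for this example.

```python
calls = []


def check(label, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return result


first = check("a", 1) or check("b", 2)    # "b" is skipped: 1 is already truthy
second = check("c", 0) and check("d", 5)  # "d" is skipped: 0 is already falsy

print(first, second, calls)  # 1 0 ['a', 'c']
```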
From a boolean point of view they both behave the same, both return a value that evaluates to false.
or just "reuses" the values that it is given, returning the left one if that was true and the right one otherwise.
Condition1 or Condition2
If Condition1 is false, then evaluate and return Condition2.
None evaluates to False.

When is the `==` operator not equivalent to the `is` operator? (Python)

I noticed I can use the == operator to compare all the native data types (integers, strings, booleans, floating point numbers etc) and also lists, tuples, sets and dictionaries which contain native data types. In these cases the == operator checks if two objects are equal. But in some other cases (trying to compare instances of classes I created) the == operator just checks if the two variables reference the same object (so in these cases the == operator is equivalent to the is operator)
My question is: When does the == operator do more than just comparing identities?
EDIT: I'm using Python 3
In Python, the == operator is implemented in terms of the magic method __eq__, which by default implements it by identity comparison. You can, however, override the method in order to provide your own concept of object equality. Note that if you do so, you will usually also want to override __hash__, which computes a hash code for the instance, because in Python 3 a class that defines __eq__ without __hash__ becomes unhashable. In Python 3, __ne__ (which implements the != operator) defaults to the inverse of __eq__, so it only needs overriding in Python 2.
I found it very helpful, even in Python, to make my __eq__ implementations comply with the rules set out in the Java language for implementations of the equals method, namely:
It is reflexive: for any non-null reference value x, x.equals(x) should return true.
It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
For any non-null reference value x, x.equals(null) should return false.
The last one should probably replace null with None, but the rules are not as easy here in Python as in Java.
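A value class that tries to follow these rules might look like the sketch below. Point is a hypothetical class, not something from the question; returning NotImplemented for foreign types lets Python fall back to its default handling, which makes comparison with None come out False.

```python
class Point:
    """Hypothetical value class with equality based on coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must hash equally so sets and dicts behave sensibly.
        return hash((self.x, self.y))


p, q = Point(1, 2), Point(1, 2)

print(p == q, p is q)      # True False
print(p != Point(3, 4))    # True: != is derived from __eq__ in Python 3
print(p == None)           # False
print(len({p, q}))         # 1: equal points collapse in a set
```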
== and is are always conceptually distinct: the former delegates to the left-hand object's __eq__ [1], the latter always checks identity, without any delegation. What seems to be confusing you is that object.__eq__ (which gets inherited by default by user-coded classes that don't override it, of course!) is implemented in terms of identity (after all, a bare object has absolutely nothing to check except its identity, so what else could it possibly do?!-).
[1] omitting for simplicity the legacy concept of the __cmp__ method, which is just a marginal complication and changes nothing important in the paragraph's gist;-).
The == does more than comparing identity when ints are involved. It doesn't just check that the two ints are the same object; it actually ensures their values match. Consider:
>>> x=10000
>>> y=10000
>>> x==y,x is y
(True, False)
>>> del x
>>> del y
>>> x=10000
>>> y=x
>>> x==y,x is y
(True, True)
The "standard" Python implementation does some stuff behind the scenes for small ints, so when testing with small values you may get something different. Compare this to the equivalent 10000 case:
>>> del y
>>> del x
>>> x=1
>>> y=1
>>> x==y,x is y
(True, True)
Maybe the most important point is that the recommendation is to always use:
if myvalue is None:
not
if myvalue == None:
And never to use:
if myvalue is True:
but use:
if myvalue:
This latter point is not so super clear to me, as I think there are times to separate the boolean True from other true values like "Alex Martelli". Say, there is no False in "Alex Martelli" (absolutely not, it even raises an exception :) ), but there is '' in "Alex Martelli" (as there is in any other string).
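Both observations can be checked with a few lines. This is just a sketch of the behaviour described above; the variable names are arbitrary.

```python
name = "Alex Martelli"

# The empty string counts as a substring of every string.
print("" in name)  # True

# Asking whether a bool is "in" a string is a type error, not False.
raised = False
try:
    False in name
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# `is True` singles out the bool True; a plain `if value:` accepts any truthy value.
samples = (True, 1, name)
strictly_true = [value is True for value in samples]
print([bool(value) for value in samples], strictly_true)
```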
