String comparison in Python: is vs. == [duplicate]

This question already has answers here:
Why does comparing strings using either '==' or 'is' sometimes produce a different result?
(15 answers)
Closed 9 years ago.
I noticed a Python script I was writing was acting squirrelly, and traced it to an infinite loop, where the loop condition was while line is not ''. Running through it in the debugger, it turned out that line was in fact ''. When I changed it to !='' rather than is not '', it worked fine.
Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values? I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id.
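A minimal sketch of the loop from the question, assuming the lines come from a file-like object whose readline() returns '' at end of input (io.StringIO stands in for a real file; the function names are hypothetical). Comparing with != checks the value, so it does not matter whether the '' returned at EOF happens to be the same object as the literal ''. For plain line-by-line reading, iterating over the stream avoids the sentinel check entirely.

```python
import io


def read_lines(stream):
    """Collect lines (without trailing newlines) until readline() signals EOF."""
    lines = []
    line = stream.readline()
    # readline() returns '' only at end of input; compare by value, not identity.
    while line != '':
        lines.append(line.rstrip('\n'))
        line = stream.readline()
    return lines


def read_lines_idiomatic(stream):
    """Same result without a sentinel: iterate over the stream directly."""
    return [line.rstrip('\n') for line in stream]


print(read_lines(io.StringIO("first\n\nthird\n")))            # ['first', '', 'third']
print(read_lines_idiomatic(io.StringIO("first\n\nthird\n")))  # ['first', '', 'third']

# Equal strings need not be the same object: join() builds a new string at runtime.
built = ''.join(['ab', 'c'])
literal = 'abc'
print(built == literal)  # True
print(built is literal)  # implementation detail; typically False in CPython
```

Note that a blank line in the middle of the input is returned as '\n', not '', so it does not end the loop.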

For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True.
Not always. NaN is a counterexample. But usually, identity (is) implies equality (==). The converse is not true: Two distinct objects can have the same value.
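The NaN counterexample can be checked directly. The sketch also shows that list comparison takes an identity shortcut before calling ==, which is why a list holding the same NaN object still compares equal to itself.

```python
import math

nan = float('nan')

print(nan is nan)        # True: an object is always identical to itself
print(nan == nan)        # False: IEEE 754 NaN is not equal to anything, even itself
print(math.isnan(nan))   # True: the reliable way to test for NaN

# List comparison checks identity first, so the same NaN object "matches" itself.
print([nan] == [nan])                    # True
print([float('nan')] == [float('nan')])  # False: two distinct NaN objects
```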
Also, is it generally considered better to just use '==' by default, even when comparing int or Boolean values?
You use == when comparing values and is when comparing identities.
When comparing ints (or immutable types in general), you pretty much always want the former. There's an optimization that allows small integers to be compared with is, but don't rely on it.
For boolean values, you shouldn't be doing comparisons at all. Instead of:
if x == True:
    pass  # do something
write:
if x:
    pass  # do something
For comparing against None, is None is preferred over == None.
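One reason is None is preferred: == calls the left operand's __eq__, which any class can override, while is cannot be overridden. The class below is hypothetical and deliberately badly behaved, only to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: __eq__ is consulted, and it lies
print(obj is None)  # False: the identity check cannot be overridden
```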
I've always liked to use 'is' because I find it more aesthetically pleasing and pythonic (which is how I fell into this trap...), but I wonder if it's intended to just be reserved for when you care about finding two objects with the same id.
Yes, that's exactly what it's for.

Here is a small example of how is and == behave with immutable types. Try this in the interactive interpreter:
>>> a = 19998989890
>>> b = 19998989889 + 1
>>> a is b
False
>>> a == b
True
is checks whether two names refer to the same object in memory; == compares their values. For example, you can see that small integers are cached by CPython:
>>> c = 1
>>> b = 1
>>> b is c
True
You should use == when comparing values and is when comparing identities. (Also, from an English point of view, "equals" is different from "is".)

The logic is not flawed. The statement
if x is y then x==y is also True
should never be read to mean
if x==y then x is y
It is a logical error on the part of the reader to assume that the converse of a logic statement is true. See http://en.wikipedia.org/wiki/Converse_(logic)

See this question.
Your logic in reading "For all built-in Python objects (like strings, lists, dicts, functions, etc.), if x is y, then x==y is also True." is slightly flawed.
If is returns True, then == will (NaN aside) also be True, but the implication does NOT hold in reverse: == may yield True while is yields False.
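A quick way to see == returning True while is returns False: two lists built separately always have distinct identities, even with equal contents. Mutable objects make this deterministic, unlike the small-integer caching mentioned in another answer.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # c is another name for the same list object as a

print(a == b, a is b)  # True False: equal values, different objects
print(a == c, a is c)  # True True: same object, so also equal

# Identity matters for mutation: changing a is visible through c, but not through b.
a.append(4)
print(c)  # [1, 2, 3, 4]
print(b)  # [1, 2, 3]
```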

Related

Where is the logical 'or' equivalent in the 'operator' module?

Adding or multiplying a large list of numbers in Python can elegantly be done by folding the list with the addition or multiplication operator:
import functools, operator
lst = range(1, 100)
total = functools.reduce(operator.add, lst)  # named total to avoid shadowing the built-in sum()
prod = functools.reduce(operator.mul, lst)
This needs the function equivalents of the operators + and *, which are provided by the operator module as operator.add and operator.mul, respectively.
If I want to use the same idiom with the operator or:
ingredients = ['onion', 'celery', 'cyanide', 'chicken stock']
soup_is_poisonous = functools.reduce(operator.or, map(is_poisonous, ingredients))
... then I discover that operator doesn't have a function equivalent of the logical and and or operators (though it has one for logical not, operator.not_).
Of course, I can trivially write one that works:
def operator_or(x, y):
    return x or y
But I wonder: why are there no operator.or and operator.and in operator? Bitwise and and or are there, but not the logical ones.
Of course this is just a minor annoyance, and the answer may well be the same as with the missing identity function: that it is easy to write one. But this holds for * and + as well, so why the difference?
To wrap up all your helpful answers and comments, in order of somewhat decreasing (to me) convincingness:
The addition of operator.or would break an important promise made by the module.
For all operators <op> that have function equivalents operator.op in the operator module, it is the case that a <op> b is equivalent to (i.e. can always, without changing program behaviour, replace or be replaced by) operator.op(a, b). This equivalence is actually mentioned in the module docstring. This is impossible to do for the operators and and or, as their evaluation is short-circuiting while Python function calls are always evaluated after all of their arguments are.
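The short-circuiting argument can be demonstrated with the operator_or helper from the question. With the or operator, the right operand is never evaluated when the left one is truthy; a function call evaluates all of its arguments first. The expensive() function is a hypothetical stand-in that records each time it runs.

```python
calls = []


def expensive():
    """Hypothetical costly check that records each time it is evaluated."""
    calls.append('expensive')
    return True


def operator_or(x, y):
    """The two-argument helper from the question."""
    return x or y


result_op = True or expensive()             # right operand is skipped
print(len(calls))                           # 0

result_fn = operator_or(True, expensive())  # argument is evaluated before the call
print(len(calls))                           # 1
print(result_op, result_fn)                 # True True: same value, different evaluation
```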
On the values True and False, | and & (and hence also the existing bitwise operator.or_ and operator.and_) already return the same results as or and and, respectively (if they return at all, that is).
If is_poisonous() returns either True or False (not an unreasonable requirement), I could use
soup_is_poisonous = functools.reduce(operator.or_, map(is_poisonous, ingredients), False)
in the example from the original question. However, many Python programs conveniently use any "truthy" value as True in idioms like
your_model_T_color = "black" or any_color_you_like
Using | or operator.or_ instead of or here will result in a TypeError or, even worse, some unexpected value (if the operands are ints).
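A sketch of both points, assuming a hypothetical is_poisonous() that returns a real bool. Folding with operator.or_ and a False start value agrees with any() on bools, but on ints operator.or_ is bitwise OR, and on strings it raises TypeError.

```python
import functools
import operator


def is_poisonous(ingredient):
    """Hypothetical predicate standing in for the one in the question."""
    return ingredient == 'cyanide'


ingredients = ['onion', 'celery', 'cyanide', 'chicken stock']
folded = functools.reduce(operator.or_, map(is_poisonous, ingredients), False)
print(folded)                               # True
print(any(map(is_poisonous, ingredients)))  # True

# On ints, | is bitwise OR, not logical or:
print(2 or 4)              # 2: the first truthy operand
print(operator.or_(2, 4))  # 6: 0b010 | 0b100

# On strings, | is not defined at all:
try:
    operator.or_("black", "red")
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```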
The functions any and all can be used instead of functools.reduce(operator.or, ....)
I'm not convinced by this argument: operator functions are used in many more contexts than as a first argument to reduce. Moreover, any always returns either True or False, not the first truthy value:
any([0,0,0,5,6,7])  # returns True
functools.reduce(lambda x, y: x or y, [0,0,0,5,6,7])  # returns 5
so any and reduce(operator.or, ...) would not really be equivalent.
any([x,y]) does the same (and more, as it accepts iterables) as operator.or(x,y) would.
That is not quite true (see above): any([0,5]) returns True while operator.or(0,5) would return 5. Moreover, the number of arguments matters greatly if we use a function as an argument to another function like reduce().
all is short-circuiting logical-and.
any is short-circuiting logical-or.
No need to put versions that take exactly two arguments (instead of an iterable) into the operator module, I guess.
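Both any() and all() stop consuming their iterable as soon as the result is known. The sketch records which items a hypothetical check() actually inspected. Since they return a plain bool, the last line shows one way (next() over filter()) to get the first truthy value instead.

```python
inspected = []


def check(x):
    """Hypothetical predicate that records every item it is asked about."""
    inspected.append(x)
    return x > 2


print(any(check(x) for x in [1, 3, 5, 7]))  # True, stops at 3
print(inspected)                            # [1, 3]

inspected.clear()
print(all(check(x) for x in [3, 1, 5, 7]))  # False, stops at 1
print(inspected)                            # [3, 1]

# First truthy value (matches reduce(lambda x, y: x or y, ...) when one exists), else None:
values = [0, 0, 0, 5, 6, 7]
print(next(filter(None, values), None))     # 5
```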

Python3 comparison operators

Python 3 does not support ordering comparisons between unrelated data types.
1 < '1' fails with:
`TypeError: '<' not supported between instances of 'int' and 'str'`
But why does 1 == '1' (or something like 156 == ['foo']) return False?
From the docs:
The default behavior for equality comparison (== and !=) is based on the identity of the objects. Hence, equality comparison of instances with the same identity results in equality, and equality comparison of instances with different identities results in inequality. A motivation for this default behavior is the desire that all objects should be reflexive (i.e. x is y implies x == y).
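The behavior described in the docs can be observed directly. int.__eq__ does not know how to compare with a str and returns NotImplemented, str.__eq__ does the same, and Python then falls back to the identity-based default, giving False. Ordering operators have no such fallback, so they raise TypeError.

```python
print((1).__eq__('1'))  # NotImplemented: int does not handle str
print('1'.__eq__(1))    # NotImplemented: str does not handle int
print(1 == '1')         # False: falls back to the identity comparison
print(156 == ['foo'])   # False, for the same reason

message = None
try:
    1 < '1'
except TypeError as exc:
    message = str(exc)
print(message)  # '<' not supported between instances of 'int' and 'str'
```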
Sometimes we would like to know if two variables are the same, meaning that they refer to the same object: e.g. True is True returns True, but "True" is True returns False, hence it makes sense that "True" == True returns False. (I didn't provide the best use case for the is operator, and this example raises a SyntaxWarning in Python 3.8+, but that's the main idea.)
Because it makes sense to check if something is equal to something else (or is it something else) even if they are not of the same type. However, it doesn't make much sense to check which "quantity" is larger if they aren't of the same type because "quantity" may be defined in a different way for each type (in other words, the "quantity" might measure a different quality of the object).
A non-code example: an apple clearly cannot be == to an orange. However, if we define the "quantity" of an apple to be its "redness" and the "quantity" of an orange to be its "taste", we cannot check whether an apple is > an orange: > would compare different qualities of these objects.
Back to code:
It is clear that 4 is not (nor is it equal to) the list [4]. But what meaning would a check like 4 > [4] have? What does it mean for an integer to be "smaller" or "larger" than a list?

Usage of arithmetic operations on bool values True and False

In Python, True and False can be added, subtracted, etc.
Are there any examples where this can be useful?
Is there any real benefit from this feature, for example, when:
it increases productivity
it makes the code more concise (without losing speed)
etc
While in most cases it would just be confusing and completely unwarranted to (ab)use this functionality, I'd argue that there are a few cases that are exceptions.
One example would be counting. True behaves as the integer 1 (bool is a subclass of int), so you can count the number of elements that pass some criterion in this fashion while remaining concise and readable. An example of this would be:
valid_elements = sum(is_valid(element) for element in iterable)
As mentioned in the comments, this could be accomplished via:
valid_elements = list(map(is_valid, iterable)).count(True)
but to use .count(...), the object must be a list, which imposes a linear space complexity (iterable may have been a constant space generator for all we know).
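A short sketch of the counting idiom, with a hypothetical is_valid() predicate. Because bool is a subclass of int, sum() adds True as 1 and False as 0, and the generator expression never builds an intermediate list.

```python
def is_valid(element):
    """Hypothetical predicate: here, 'valid' means a non-negative number."""
    return element >= 0


data = [3, -1, 0, 7, -5]
print(isinstance(True, int))                  # True: bool subclasses int
print(sum(is_valid(e) for e in data))         # 3
print(list(map(is_valid, data)).count(True))  # 3, but builds a full list first
```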
Another case where this functionality might be usable is as a play on the ternary operator for sequences, where you either want the sequence or an empty sequence depending on the value. Say you want to return the resulting list if a condition holds, otherwise an empty list:
return result_list * (not return_empty)
or if you are doing a conditional string concatenation:
result = str1 + str2 * do_concatenate
Of course, both of these could be solved by using Python's ternary operator:
return [] if return_empty else result_list
...
result = str1 + str2 if do_concatenate else str1
The point being, this behavior does provide other options in a few scenarios where it isn't all too unreasonable. It's just a matter of using your best judgement as to whether it'll cause confusion for future readers (yourself included).
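A sketch comparing the multiplication trick with the conditional expression; the function names and flags are hypothetical. Multiplying a sequence by a flag keeps the sequence when the flag is True, so an "empty when the flag is set" condition needs the flag negated.

```python
def maybe_result(result_list, return_empty):
    """Multiplication trick: list * True -> copy of the list, list * False -> []."""
    return result_list * (not return_empty)


def maybe_result_ternary(result_list, return_empty):
    """Equivalent conditional expression (returns the original list, not a copy)."""
    return [] if return_empty else result_list


def join_maybe(str1, str2, do_concatenate):
    # str2 * True == str2 and str2 * False == '', so str2 is appended only when asked.
    return str1 + str2 * do_concatenate


print(maybe_result([1, 2], False), maybe_result_ternary([1, 2], False))  # [1, 2] [1, 2]
print(maybe_result([1, 2], True), maybe_result_ternary([1, 2], True))    # [] []
print(join_maybe('foo', 'bar', True), join_maybe('foo', 'bar', False))   # foobar foo
```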
I would avoid it at all costs. It is confusing and goes against typing. Python being permissive does not mean you should do it...

Why does [] and bool return []? [duplicate]

This question already has answers here:
and / or operators return value [duplicate]
(4 answers)
Closed 4 years ago.
I am trying to return a boolean in a function like this:
return mylist and any(condition(x) for x in mylist)
The behavior should be to return True if the list is non-empty and any element in it meets the condition. I am using the first operand as a short circuit so that an empty list gives False.
I would expect [] and boolval to return False since the list is empty, but to my surprise it returns [] whether boolval is True or False. I would expect the first operand to be automatically converted to a boolean since it is involved in a boolean operation, but that is not what happens.
I am not really asking how to solve my problem, which is easily done by an explicit type conversion: bool(mylist), but rather asking what is happening and why.
edit: when I ask "why" this is happening I am not looking for the "facts" only, as they are already explained in the linked duplicate question, but also the reasons behind the implementation of this behavior.
The and and or operators do not return True/False. They return the last value evaluated (that's the case in other dynamic languages too, e.g. JavaScript).
The official documentation states that they return:
for and, the first falsy value, or the last operand
for or, the first truthy value, or the last operand
That's by design, so you can write expressions like return username or 'guest'. So, if you want to guarantee that a boolean value is returned, you have to write
return bool(x or y)
instead of
return x or y
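A few checks of what and and or actually return, using the expression from the question with a hypothetical condition() predicate:

```python
def condition(x):
    """Hypothetical predicate from the question: true for even numbers."""
    return x % 2 == 0


def has_match(mylist):
    # `and` returns its first falsy operand, so an empty list comes back as [].
    return mylist and any(condition(x) for x in mylist)


def has_match_bool(mylist):
    # Converting the list with bool() guarantees a real True/False result.
    return bool(mylist) and any(condition(x) for x in mylist)


print([] and True)                # []
print([1] and False)              # False
print('' or 'guest')              # guest
print(has_match([]))              # []
print(has_match_bool([]))         # False
print(has_match_bool([1, 3, 4]))  # True
```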
Because as khelwood said:
x and y gives x if x is falsey, otherwise it gives y.
That's the point (and is not or :-)), so the best option is still:
return all([my_list, any(condition(x) for x in my_list)])
This has to do with how Python evaluates the expression.
An empty list is considered falsy by Python, which means that the code after 'and' will not be executed, as it cannot change the result.
Python does not need to convert the empty list into a bool, since it is not compared to anything; it just returns the empty list as is.
This shouldn't change anything for you: if you test the returned value of the function, it will be evaluated the same way as if the function had returned False.

When is the `==` operator not equivalent to the `is` operator? (Python)

I noticed I can use the == operator to compare all the native data types (integers, strings, booleans, floating point numbers etc) and also lists, tuples, sets and dictionaries which contain native data types. In these cases the == operator checks if two objects are equal. But in some other cases (trying to compare instances of classes I created) the == operator just checks if the two variables reference the same object (so in these cases the == operator is equivalent to the is operator)
My question is: When does the == operator do more than just comparing identities?
EDIT: I'm using Python 3
In Python, the == operator is implemented in terms of the magic method __eq__, which by default implements it by identity comparison. You can, however, override the method in order to provide your own concept of object equality. Note that if you do so, you should usually also define __hash__ (which computes a hash code for the instance), because in Python 3 a class that defines __eq__ without __hash__ becomes unhashable. Overriding __ne__ (which implements the != operator) is rarely needed in Python 3, since by default it inverts the result of __eq__.
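A minimal sketch of a value-based __eq__ on a hypothetical Point class. In Python 3, != is derived from __eq__ automatically, and defining __eq__ alone would make the class unhashable, so __hash__ is defined from the same fields.

```python
class Point:
    """Hypothetical 2-D point compared by value rather than identity."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let Python try the other operand, then fall back
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must have equal hashes: hash the same fields __eq__ uses.
        return hash((self.x, self.y))


p, q = Point(1, 2), Point(1, 2)
print(p == q, p is q)    # True False
print(p != Point(3, 4))  # True: __ne__ is derived from __eq__ in Python 3
print(len({p, q}))       # 1: equal points collapse into one set element
```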
I found it very helpful, even in Python, to make my __eq__ implementations comply with the rules set out in the Java language for implementations of the equals method, namely:
It is reflexive: for any non-null reference value x, x.equals(x) should return true.
It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
For any non-null reference value x, x.equals(null) should return false.
The last one should probably replace null with None, but the rules are not as easy here in Python as in Java.
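A sketch spot-checking those rules in Python on a hypothetical Money class. Returning NotImplemented for foreign types makes Python fall back to identity, so the "equals(null) is false" rule becomes x == None being False, and comparisons with unrelated types stay symmetric.

```python
class Money:
    """Hypothetical value type used to spot-check the equals() rules."""

    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return (self.amount, self.currency) == (other.amount, other.currency)

    def __hash__(self):
        return hash((self.amount, self.currency))


a, b, c = Money(5, 'EUR'), Money(5, 'EUR'), Money(5, 'EUR')
print(a == a)                        # reflexive: True
print(a == b, b == a)                # symmetric: True True
print(a == b and b == c and a == c)  # transitive: True
print(a == None, None == a)          # the "null" rule: False False
print(a == 5, 5 == a)                # unrelated type: False False
```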
== and is are always conceptually distinct: the former delegates to the left-hand object's __eq__ [1], the latter always checks identity, without any delegation. What seems to be confusing you is that object.__eq__ (which gets inherited by default by user-coded classes that don't override it, of course!) is implemented in terms of identity (after all, a bare object has absolutely nothing to check except its identity, so what else could it possibly do?!-).
[1] omitting for simplicity the legacy concept of the __cmp__ method, which is just a marginal complication and changes nothing important in the paragraph's gist;-).
The == does more than comparing identity when ints are involved. It doesn't just check that the two ints are the same object; it actually ensures their values match. Consider:
>>> x=10000
>>> y=10000
>>> x==y,x is y
(True, False)
>>> del x
>>> del y
>>> x=10000
>>> y=x
>>> x==y,x is y
(True, True)
The standard CPython implementation caches small ints (from -5 to 256), so when testing with small values you may get a different result. Compare this to the equivalent 10000 case:
>>> del y
>>> del x
>>> x=1
>>> y=1
>>> x==y,x is y
(True, True)
Perhaps the most important point is the recommendation to always use:
if myvalue is None:
not
if myvalue == None:
And never to use:
if myvalue is True:
but use:
if myvalue:
This latter point is not so super clear to me, as I think there are times to separate the boolean True from other truthy values like "Alex Martelli". For example, False is not in "Alex Martelli" (absolutely not, it even raises an exception :) ), but '' is in "Alex Martelli" (as it is in any other string).
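A sketch of the distinction being discussed: is True singles out the bool object itself, while a plain if tests truthiness. The describe() helper is hypothetical. The membership lines reproduce the "Alex Martelli" example: '' is a substring of every string, while a bool on the left of in with a string raises TypeError.

```python
name = "Alex Martelli"


def describe(value):
    """Hypothetical helper separating the bool True from other truthy values."""
    if value is True:
        return "the bool True"
    if value:
        return "some other truthy value"
    return "falsy"


print(describe(True))  # the bool True
print(describe(name))  # some other truthy value
print(describe(1))     # some other truthy value: 1 == True, but 1 is not True
print(describe(''))    # falsy

print('' in name)      # True: the empty string is a substring of every string

message = None
try:
    False in name
except TypeError as exc:
    message = str(exc)
print(message)  # 'in <string>' requires string as left operand, not bool
```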
