Python != operation vs "is not" - python

In a comment on this question, I saw a statement that recommended using
result is not None
vs
result != None
What is the difference? And why might one be recommended over the other?

== is an equality test. It checks whether the right hand side and the left hand side are equal objects (according to their __eq__ or __cmp__ methods.)
is is an identity test. It checks whether the right hand side and the left hand side are the very same object. No method calls are made, so objects can't influence the is operation.
You use is (and is not) for singletons, like None, where you don't care about objects that might want to pretend to be None or where you want to protect against objects breaking when being compared against None.
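As a minimal sketch of the singleton case (the names _MISSING and get_setting are made up for illustration), a module-level sentinel lets a function tell "no value supplied" apart from a legitimately stored None, and is is the natural test for it:

```python
# Hypothetical sentinel: a unique object that nothing else can be identical to.
_MISSING = object()

def get_setting(settings, key, default=_MISSING):
    """Return settings[key]; fall back to default only if one was supplied."""
    value = settings.get(key, _MISSING)
    if value is not _MISSING:      # identity test: cannot be faked via __eq__
        return value
    if default is _MISSING:
        raise KeyError(key)
    return default

settings = {"timeout": None}                  # None is a real, stored value here
print(get_setting(settings, "timeout"))       # None
print(get_setting(settings, "retries", 3))    # 3
```

Because the check uses identity, an object with an over-eager __eq__ cannot be mistaken for the sentinel.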

First, let me go over a few terms. If you just want your question answered, scroll down to "Answering your question".
Definitions
Object identity: When you create an object, you can assign it to a variable. You can then also assign it to another variable. And another.
>>> button = Button()
>>> cancel = button
>>> close = button
>>> dismiss = button
>>> print(cancel is close)
True
In this case, cancel, close, and dismiss all refer to the same object in memory. You only created one Button object, and all three variables refer to this one object. We say that cancel, close, and dismiss all refer to identical objects; that is, they refer to one single object.
Object equality: When you compare two objects, you usually don't care whether they refer to the exact same object in memory. With object equality, you can define your own rules for how two objects compare. When you write if a == b:, you are essentially saying if a.__eq__(b):. This lets you define an __eq__ method on a so that you can use your own comparison logic.
Rationale for equality comparisons
Rationale: Two objects have the exact same data, but are not identical. (They are not the same object in memory.)
Example: Strings
>>> greeting = "It's a beautiful day in the neighbourhood."
>>> a = unicode(greeting)
>>> b = unicode(greeting)
>>> a is b
False
>>> a == b
True
Note: I use unicode strings here because Python is smart enough to reuse regular strings without creating new ones in memory.
Here, I have two unicode strings, a and b. They have the exact same content, but they are not the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the unicode object has implemented the __eq__ method.
class unicode(object):
    # ...
    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for i, j in zip(self, other):
            if i != j:
                return False
        return True
Note: __eq__ on unicode is definitely implemented more efficiently than this.
Rationale: Two objects have different data, but are considered the same object if some key data is the same.
Example: Most types of model data
>>> import datetime
>>> a = Monitor()
>>> a.make = "Dell"
>>> a.model = "E770s"
>>> a.owner = "Bob Jones"
>>> a.warranty_expiration = datetime.date(2030, 12, 31)
>>> b = Monitor()
>>> b.make = "Dell"
>>> b.model = "E770s"
>>> b.owner = "Sam Johnson"
>>> b.warranty_expiration = datetime.date(2005, 8, 22)
>>> a is b
False
>>> a == b
True
Here, I have two Dell monitors, a and b. They have the same make and model, but they neither have the same data nor are the same object in memory. Still, when we compare them, we want them to compare equal. What's happening here is that the Monitor object implements the __eq__ method.
class Monitor(object):
    # ...
    def __eq__(self, other):
        return self.make == other.make and self.model == other.model
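A slightly more defensive variant of the same idea might look like the sketch below. The constructor and the __hash__ method are additions for illustration and are not part of the original Monitor; the point is that __eq__ can return NotImplemented for foreign types, and that equal objects should hash equal if they are ever put in a set or dict.

```python
class Monitor:
    def __init__(self, make, model, owner):
        self.make = make
        self.model = model
        self.owner = owner

    def __eq__(self, other):
        # Let Python try the other operand (or fall back to identity)
        # when compared with something that is not a Monitor.
        if not isinstance(other, Monitor):
            return NotImplemented
        return (self.make, self.model) == (other.make, other.model)

    def __hash__(self):
        # Equal objects must hash equal, so hash the same key fields.
        return hash((self.make, self.model))

a = Monitor("Dell", "E770s", "Bob Jones")
b = Monitor("Dell", "E770s", "Sam Johnson")
print(a == b)        # True
print(a is b)        # False
print(a == None)     # False
print(len({a, b}))   # 1
```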
Answering your question
When comparing to None, always use is not. None is a singleton in Python - there is only ever one instance of it in memory.
By comparing identity, this can be performed very quickly. Python checks whether the object you're referring to has the same memory address as the global None object - a very, very fast comparison of two numbers.
By comparing equality, Python has to look up whether your object has an __eq__ method. If it does not, it examines each superclass looking for an __eq__ method. If it finds one, Python calls it. This is especially bad if the __eq__ method is slow and doesn't immediately return when it notices that the other object is None.
Did you not implement __eq__? Then Python will probably find the __eq__ method on object and use that instead - which just checks for object identity anyway.
When comparing most other things in Python, you will be using !=.
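To make the difference in work visible, here is a small sketch with a hypothetical Tracked class that counts how often its __eq__ runs. It assumes Python 3, where != falls back to __eq__ and inverts the result when no __ne__ is defined.

```python
class Tracked:
    """Hypothetical class that records how often __eq__ runs."""
    calls = 0

    def __eq__(self, other):
        Tracked.calls += 1
        return self is other

    # Defining __eq__ disables the inherited __hash__; restore it.
    __hash__ = object.__hash__

obj = Tracked()

by_identity = obj is not None    # pure identity check, no method call
print(Tracked.calls)             # 0

by_equality = obj != None        # dispatches to __eq__ and inverts the result
print(Tracked.calls)             # 1
```

Both tests give True here, but only the equality test had to run user code to get there.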

Consider the following:
class Bad(object):
    def __eq__(self, other):
        return True
c = Bad()
c is None # False, equivalent to id(c) == id(None)
c == None # True, equivalent to c.__eq__(None)

None is a singleton, and therefore identity comparison will always work, whereas an object can fake the equality comparison via .__eq__().

>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True
Some objects are singletons, and thus is with them is equivalent to ==. Most are not.

Related

What does set() in Python use to test equality between objects?

I have written some code in Python which has a class called product, and I have overridden the magic methods __eq__ and __hash__. Now I need to make a set which should remove duplicates from the list based on the ID of the product. As you can see from the output of this code, the hashes of two objects are the same, yet when I make a set of those two objects the length is 2, not 1.
But when I change the __eq__ method of the code to this
def __eq__(self, b) -> bool:
    if self.id == b.id:
        return True
    return False
and use it with the same hash function, it works and the length of the set is 1. So I am confused whether the set data structure uses the __eq__ method or the __hash__ method to test for equality.
Equality tests can be expensive, so the set starts by comparing hashes. If the hashes are not equal, then the check ends. If the hashes are equal, the set then tests for equality. If it only used __eq__, it might have to do a lot of unnecessary work, but if it only used __hash__, there would be no way to resolve a hash collision.
Here's a simple example of using equality to resolve a hash collision. In CPython, small integers are their own hashes, except for -1:
>>> hash(-1)
-2
>>> hash(-2)
-2
>>> s = set()
>>> s.add(-1)
>>> -2 in s
False
Here's an example of the set skipping an equality check because the hashes aren't equal. Let's subclass int so that it returns a new hash every second:
>>> import time
>>> class TimedInt(int):
...     def __hash__(self):
...         return int(time.time())
...
>>> a = TimedInt(5)
>>> a == 5
True
>>> a == a
True
>>> s = set()
>>> s.add(a) # Now wait a few seconds...
>>> a in s
False
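Applied to the question's scenario, the hash-then-equality behaviour described above can be sketched as follows. The Product class, its fields, and the eq_calls counter are assumptions made for illustration, not the asker's actual code.

```python
class Product:
    """Hypothetical product that is identified by its id alone."""
    eq_calls = 0

    def __init__(self, product_id, name):
        self.id = product_id
        self.name = name

    def __hash__(self):
        # Must agree with __eq__: equal products need equal hashes.
        return hash(self.id)

    def __eq__(self, other):
        Product.eq_calls += 1
        if not isinstance(other, Product):
            return NotImplemented
        return self.id == other.id

p1 = Product(1, "kettle")
p2 = Product(1, "kettle (duplicate)")
p3 = Product(2, "toaster")

unique = {p1, p2, p3}
print(len(unique))              # 2
# __eq__ was needed at least once, to confirm that p2 duplicates p1.
print(Product.eq_calls >= 1)    # True
```

If __hash__ and __eq__ disagree (equal hashes but __eq__ returning False, as in the question's first version), the set keeps both objects.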

Best Practice for Equality in Python

Is there a best practice to determine the equality of two arbitrary Python objects?
Let's say I write a container for some sort of object and I need to figure out whether new objects are equal to the old ones stored into the container.
The problem is I cannot use "is" since this will only check if the variables are bound to the very same object (but we might have a deep copy of an object, which is in my sense equal to its original). I cannot use "==" either, since some of these objects return an element-wise equal, like numpy arrays.
Is there a best practice to determine the equality of any kind of objects?
For instance would
repr(objectA)==repr(objectB)
suffice?
Or is it common to use:
numpy.all(objectA==objectB)
Which probably fails if objectA == objectB evaluates to "[]"
Cheers,
Robert
EDIT:
Ok, regarding the 3rd comment, I elaborate more on
"What's your definition of "equal objects"?"
In the strong sense I don't have any definition of equality; I would rather let the objects decide whether they are equal or not. The problem is, as far as I understand, there is no well agreed standard for __eq__ or ==, respectively. The statement can return arrays or all kinds of things.
What I have in mind is to have some operator, let's call it SEQ (strong equality), in between __eq__ and "is".
SEQ is superior to __eq__ in the sense that it will always evaluate to a single boolean value (for numpy arrays that could mean all elements are equal, for example) and determine if the objects consider themselves equal or not. But SEQ would be inferior to "is" in the sense that objects that are distinct in memory can be equal as well.
I suggest you write a custom recursive equality-checker, something like this:
# Python 3: the ABCs live in collections.abc and str replaces basestring.
from collections.abc import Mapping, Sequence, Set
import numpy as np

def nested_equal(a, b):
    """
    Compare two objects recursively by element, handling numpy objects.
    Assumes hashable items are not mutable in a way that affects equality.
    """
    # Objects of different classes are never considered equal.
    if a.__class__ != b.__class__:
        return False
    # For types that implement their own custom strict equality checking.
    seq = getattr(a, "seq", None)
    if seq and callable(seq):
        return seq(b)
    # Check equality according to type.
    if isinstance(a, str):
        return a == b
    if isinstance(a, np.ndarray):
        # Shapes must match, otherwise == would broadcast or fail.
        return a.shape == b.shape and bool(np.all(a == b))
    if isinstance(a, Sequence):
        # zip() stops at the shorter input, so compare lengths first.
        return len(a) == len(b) and all(nested_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, Mapping):
        if set(a.keys()) != set(b.keys()):
            return False
        return all(nested_equal(a[k], b[k]) for k in a.keys())
    if isinstance(a, Set):
        return a == b
    return a == b
The assumption that hashable objects are not mutable in a way that affects equality is rather safe, since it would break dicts and sets if such objects were used as keys.
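The seq hook that nested_equal looks for could be supplied by a class like the one below. This is only a numpy-free sketch: ElementwiseVector is a made-up stand-in for any type whose == returns one result per element, and seq collapses that into the single boolean the question calls "strong equality".

```python
class ElementwiseVector:
    """Hypothetical container whose == compares element by element."""

    def __init__(self, *items):
        self.items = list(items)

    def __eq__(self, other):
        # Like numpy, return one result per element instead of a single bool.
        return [x == y for x, y in zip(self.items, other.items)]

    def seq(self, other):
        # "Strong equality": collapse the element-wise result to one bool.
        return len(self.items) == len(other.items) and all(self == other)

v = ElementwiseVector(1, 2, 3)
w = ElementwiseVector(1, 2, 3)
print(v == w)                             # [True, True, True]
print(v.seq(w))                           # True
print(v.seq(ElementwiseVector(1, 2)))     # False
```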

How is the 'is' keyword implemented in Python?

... the is keyword that can be used for equality in strings.
>>> s = 'str'
>>> s is 'str'
True
>>> s is 'st'
False
I tried both __is__() and __eq__() but they didn't work.
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __is__(self, s):
...         return self.s == s
...
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work
False
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __eq__(self, s):
...         return self.s == s
...
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work, but again failed
False
Testing strings with is only works when the strings are interned. Unless you really know what you're doing and explicitly interned the strings you should never use is on strings.
is tests for identity, not equality. That means Python simply compares the memory address an object resides in. is basically answers the question "Do I have two names for the same object?" - overloading that would make no sense.
For example, ("a" * 100) is ("a" * 100) can be False, depending on the interpreter version. Python usually writes each string into a different memory location; interning mostly happens for string literals.
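Explicit interning, as mentioned above, can be sketched with sys.intern. The strings are assembled at run time on purpose, so that the compiler has no chance to share a single constant between them.

```python
import sys

# Build two equal strings at run time so no shared constant is involved.
parts = ["inter", "ned ", "string"]
a = "".join(parts)
b = "".join(parts)

print(a == b)                            # True
# Whether `a is b` holds here is an implementation detail;
# in CPython it is typically False.

# sys.intern returns the canonical copy, so the interned results are identical.
print(sys.intern(a) is sys.intern(b))    # True
```

Even with interning available, == remains the right operator for comparing string contents.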
The is operator is equivalent to comparing id(x) values. For example:
>>> s1 = 'str'
>>> s2 = 'str'
>>> s1 is s2
True
>>> id(s1)
4564468760
>>> id(s2)
4564468760
>>> id(s1) == id(s2) # equivalent to `s1 is s2`
True
id is currently implemented to use pointers as the comparison. So you can't overload is itself, and AFAIK you can't overload id either.
So, you can't. Unusual in python, but there it is.
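What can be customised is ==. A minimal sketch of the asker's MyString with a working __eq__ follows; the constructor argument and the __hash__ method are additions for illustration.

```python
class MyString:
    def __init__(self, s="string"):
        self.s = s

    def __eq__(self, other):
        # Compare against another MyString or a plain str.
        if isinstance(other, MyString):
            return self.s == other.s
        if isinstance(other, str):
            return self.s == other
        return NotImplemented

    def __hash__(self):
        # Keep hashing consistent with the equality defined above.
        return hash(self.s)

m = MyString()
text = "string"
print(m == text)     # True
print(m == "ss")     # False
print(m is text)     # False
```

The last line is False no matter what MyString defines, because m and text are two different objects.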
The Python is keyword tests object identity. You should NOT use it to test for string equality. It may seem to work frequently because Python implementations, like those of many very high level languages, perform "interning" of strings. That is to say that string literals and values are internally kept in a hashed list, and those which are identical are rendered as references to the same object. (This is possible because Python strings are immutable.)
However, as with any implementation detail, you should not rely on this. If you want to test for equality use the == operator. If you truly want to test for object identity then use is --- and I'd be hard-pressed to come up with a case where you should care about string object identity. Unfortunately you can't count on whether two strings are somehow "intentionally" identical object references because of the aforementioned interning.
The is keyword compares objects (or, rather, compares if two references are to the same object).
Which is, I think, why there's no mechanism to provide your own implementation.
It happens to work sometimes on strings because Python stores strings 'cleverly', such that when you create two identical strings they are stored in one object.
>>> a = "string"
>>> b = "string"
>>> a is b
True
>>> c = "str"+"ing"
>>> a is c
True
You can hopefully see the reference vs data comparison in a simple 'copy' example:
>>> a = {"a":1}
>>> b = a
>>> c = a.copy()
>>> a is b
True
>>> a is c
False
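If you are not afraid of messing with bytecode, you can intercept and patch COMPARE_OP with the argument 8 ("is") to call your hook function on the objects being compared. (This applies to CPython versions that compile is to COMPARE_OP; CPython 3.9 and later use a dedicated IS_OP instruction.) Look at the dis module documentation as a starting point.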
And don't forget to intercept __builtin__.id() too if someone will do id(a) == id(b) instead of a is b.
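To see which instruction your interpreter actually emits for is before attempting anything like this, a quick inspection with the dis module might look like the sketch below. It only looks at the bytecode and does not patch it.

```python
import dis

# Compile a tiny expression and list the instructions it produces.
code = compile("a is b", "<demo>", "eval")
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# Older CPython releases emit COMPARE_OP for `is`; newer ones emit IS_OP.
uses_identity_op = any(name in ("IS_OP", "COMPARE_OP") for name in opnames)
print(uses_identity_op)
```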
'is' compares object identity whereas == compares values.
Example:
a=[1,2]
b=[1,2]
#a==b returns True
#a is b returns False
p=q=[1,2]
#p==q returns True
#p is q returns True
is fails to compare a string variable to a string value, and two string variables to each other, when the string starts with '-'. My Python version is 2.6.6.
>>> s = '-hi'
>>> s is '-hi'
False
>>> s = '-hi'
>>> k = '-hi'
>>> s is k
False
>>> '-hi' is '-hi'
True
You can't overload the is operator. What you want to overload is the == operator. This can be done by defining a __eq__ method in the class.
You are using identity comparison. == is probably what you want. The exception to this is when you want to be checking if one item and another are the EXACT same object and in the same memory position. In your examples, the items aren't the same, since one is of a different type (my_string) than the other (string). Also, there's no such thing as someclass.__is__ in python (unless, of course, you put it there yourself). If there were, comparing objects with is would no longer be a reliable way to simply compare the memory locations.
When I first encountered the is keyword, it confused me as well. I would have thought that is and == were no different. They produced the same output from the interpreter on many objects. This type of assumption is actually EXACTLY what is... is for. It's the python equivalent "Hey, don't mistake these two objects. they're different.", which is essentially what [whoever it was that straightened me out] said. Worded much differently, but one point == the other point.
For some helpful examples and some text to help with the sometimes confusing differences, visit a document from python.org's mail host written by "Danny Yoo" or, if that's offline, use the unlisted pastebin I made of its body.
In case they are both down, in some 20 or so blue moons (blue moons are a real event), I'll quote the code examples:
###
>>> my_name = "danny"
>>> your_name = "ian"
>>> my_name == your_name
0 #or False
###
###
>>> my_name[1:3] == your_name[1:3]
1 #or True
###
###
>>> my_name[1:3] is your_name[1:3]
0
###
Assertion errors can easily arise when comparing objects with the is keyword. For example, objects a and b might hold the same value without sharing the same memory address. Therefore, doing an
>>> a == b
is going to evaluate to
True
But if
>>> a is b
evaluates to
False
you should probably check
>>> type(a)
and
>>> type(b)
These might be different and a reason for failure.
Because of string interning, this could look strange:
a = 'hello'
'hello' is a #True
b= 'hel-lo'
'hel-lo' is b #False

What are the semantics of the 'is' operator in Python?

How does the is operator determine if two objects are the same? How does it work? I can't find it documented.
From the documentation:
Every object has an identity, a type
and a value. An object’s identity
never changes once it has been
created; you may think of it as the
object’s address in memory. The ‘is‘
operator compares the identity of two
objects; the id() function returns an
integer representing its identity
(currently implemented as its
address).
This would seem to indicate that it compares the memory addresses of the arguments, though the fact that it says "you may think of it as the object's address in memory" might indicate that the particular implementation is not guaranteed; only the semantics are.
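The documented guarantee that an object's identity never changes can be illustrated with a short sketch: mutating a list keeps its identity, while an operation that builds a new list and rebinds the name yields a different object.

```python
items = [1, 2]
before = id(items)

items.append(3)               # mutating changes the value, not the identity
print(id(items) == before)    # True

alias = items
items = items + [4]           # builds a new list and rebinds the name
print(items is alias)         # False
print(alias)                  # [1, 2, 3]
```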
Comparison Operators
is works by comparing the objects referenced, to see if the operands point to the same object.
>>> a = [1, 2]
>>> b = a
>>> a is b
True
>>> c = [1, 2]
>>> a is c
False
c is not the same list as a therefore the is relation is false.
To add to the other answers, you can think of a is b working as if it was is_(a, b):
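def is_(a, b):
    return id(a) == id(b)
Note that you cannot directly replace a is b with id(a) == id(b), because the id of a temporary object can be reused once that object has been freed; the above function avoids that by keeping both objects alive as parameters.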
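The caveat about temporaries can be sketched like this. The last line deliberately has no expected output, since its result depends on how the interpreter recycles memory.

```python
def is_(a, b):
    # Both arguments stay alive for the whole call, so their ids are distinct
    # unless they really are the same object.
    return id(a) == id(b)

x = [1, 2]
print(is_(x, x))              # True
print(is_([1, 2], [1, 2]))    # False

# Here each temporary list may be freed before the next one is created, so
# the two ids can coincide; the result depends on the implementation.
print(id([1, 2]) == id([1, 2]))
```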

Is there any difference between "foo is None" and "foo == None"?

Is there any difference between:
if foo is None: pass
and
if foo == None: pass
The convention that I've seen in most Python code (and the code I myself write) is the former, but I recently came across code which uses the latter. None is an instance (and the only instance, IIRC) of NoneType, so it shouldn't matter, right? Are there any circumstances in which it might?
is always returns True if it compares the same object instance
Whereas == is ultimately determined by the __eq__() method
i.e.
>>> class Foo(object):
...     def __eq__(self, other):
...         return True
...
>>> f = Foo()
>>> f == None
True
>>> f is None
False
You may want to read this object identity and equivalence.
The statement 'is' is used for object identity, it checks if objects refer to the same instance (same address in memory).
And the '==' statement refers to equality (same value).
A word of caution:
if foo:
    # do something
Is not exactly the same as:
if foo is not None:
    # do something
The former is a boolean value test and can evaluate to false in different contexts. A number of things are false in a boolean test, for example empty containers, zero, and False. None also evaluates to false in this situation, but other things do too.
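The gap between the two tests is easiest to see side by side. In this sketch, describe is a hypothetical helper that returns the result of the truthiness test and of the identity test for the same value.

```python
def describe(value=None):
    """Hypothetical helper contrasting the two tests."""
    truthy = bool(value)             # what `if value:` looks at
    present = value is not None      # what `if value is not None:` looks at
    return truthy, present

for candidate in (None, 0, "", [], False, "text"):
    print(repr(candidate), describe(candidate))
# None             -> (False, False)
# 0, '', [], False -> (False, True)
# 'text'           -> (True, True)
```

Only None fails both tests; every other falsy value is still "not None".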
(ob1 is ob2) is equivalent to (id(ob1) == id(ob2))
The reason foo is None is the preferred way is that you might be handling an object that defines its own __eq__, and that defines the object to be equal to None. So, always use foo is None if you need to see whether it is in fact None.
There is no difference because objects which are identical will of course be equal. However, PEP 8 clearly states you should use is:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
is tests for identity, not equality. For your statement foo is None, Python simply compares the memory address of objects. It means you are asking the question "Do I have two names for the same object?"
== on the other hand tests for equality as determined by the __eq__() method. It doesn't care about identity.
In [102]: x, y, z = 2, 2, 2.0
In [103]: id(x), id(y), id(z)
Out[103]: (38641984, 38641984, 48420880)
In [104]: x is y
Out[104]: True
In [105]: x == y
Out[105]: True
In [106]: x is z
Out[106]: False
In [107]: x == z
Out[107]: True
None is a singleton object, so None is None is always true.
In [101]: None is None
Out[101]: True
For None there shouldn't be a difference between equality (==) and identity (is). The NoneType probably returns identity for equality. Since None is the only instance you can make of NoneType (I think this is true), the two operations are the same. In the case of other types this is not always the case. For example:
list1 = [1, 2, 3]
list2 = [1, 2, 3]
if list1 == list2: print("Equal")
if list1 is list2: print("Same")
This would print "Equal" since lists have a comparison operation that is not the default returning of identity.
#Jason:
I recommend using something more along the lines of
if foo:
    # foo isn't None
else:
    # foo is None
I don't like using "if foo:" unless foo truly represents a boolean value (i.e. 0 or 1). If foo is a string or an object or something else, "if foo:" may work, but it looks like a lazy shortcut to me. If you're checking to see if x is None, say "if x is None:".
Some more details:
1. The is clause actually checks whether the two objects are at the same memory location, i.e. whether they both point to the same memory location and have the same id.
2. As a consequence of 1, is establishes whether or not the two lexically represented objects have identical attributes (attributes-of-attributes...).
3. In CPython, instances of primitive types like bool, int, string (with some exceptions) and NoneType that have the same value will often be at the same memory location. This is an implementation detail (small-integer caching, string interning) rather than a guarantee.
E.g.
>>> int(1) is int(1)
True
>>> str("abcd") is str("abcd")
True
>>> bool(1) is bool(2)
True
>>> bool(0) is bool(0)
True
>>> bool(0)
False
>>> bool(1)
True
And since NoneType can only have one instance of itself in Python's "look-up" table, the former and the latter are more a matter of the programming style of the developer who wrote the code (maybe for consistency) rather than there being any subtle logical reason to choose one over the other.
John Machin's conclusion that None is a singleton is a conclusion bolstered by this code.
>>> x = None
>>> y = None
>>> x == y
True
>>> x is y
True
>>>
Since None is a singleton, x == None and x is None would have the same result. However, in my aesthetic opinion, x == None is best.
a is b # returns True if a and b are true aliases
a == b # returns True if they are true aliases or have values that are deemed equivalent
a = [1,3,4]
b = a[:] # creating a copy of the list
a is b # gives False
False
a == b # gives True
True
