What are the semantics of the 'is' operator in Python? - python

How does the is operator determine if two objects are the same? How does it work? I can't find it documented.

From the documentation:
Every object has an identity, a type
and a value. An object’s identity
never changes once it has been
created; you may think of it as the
object’s address in memory. The ‘is‘
operator compares the identity of two
objects; the id() function returns an
integer representing its identity
(currently implemented as its
address).
This would seem to indicate that it compares the memory addresses of the arguments, though the fact that it says "you may think of it as the object's address in memory" might indicate that the particular implementation is not guaranteed; only the semantics are.

Comparison Operators
Is works by comparing the object referenced to see if the operands point to the same object.
>>> a = [1, 2]
>>> b = a
>>> a is b
True
>>> c = [1, 2]
>>> a is c
False
c is not the same list as a therefore the is relation is false.

To add to the other answers, you can think of a is b working as if it was is_(a, b):
def is_(a, b):
    return id(a) == id(b)
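Note that you cannot always directly replace a is b with id(a) == id(b): if either operand is a temporary, it may be destroyed before the ids are compared, and its id can then be reused by another object. The function above avoids that because its parameters keep both objects alive for the duration of the call.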
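As a rough illustration of why the helper is safe while an inline id() comparison is not, here is a minimal sketch. make_list is a hypothetical helper introduced only for this example, and the last result is implementation-dependent, so no particular value is claimed for it.

```python
def is_(a, b):
    # Both arguments stay alive for the duration of the call,
    # so their ids cannot be recycled while they are compared.
    return id(a) == id(b)


def make_list():
    # Hypothetical helper: returns a brand-new list on every call.
    return [1, 2]


x = make_list()
y = make_list()
same_object = x is y          # False: two separate lists
same_via_helper = is_(x, y)   # False: agrees with `is`

# With temporaries, `is` still compares two live objects ...
temp_identity = make_list() is make_list()   # always False

# ... but the first temporary may already be freed (and its id reused)
# by the time the second id() is taken, so this is not reliable.
temp_ids_equal = id(make_list()) == id(make_list())
```

In CPython the last comparison frequently comes out True, which is exactly the trap the wrapper function avoids.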

Related

Comparison operator "==" for value equality or reference equality?

So many tutorials have stated that the == comparison operator is for value equality, like in this answer, quote:
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
However, I found that the Python doc says that:
x==y calls x.__eq__(y). By default, object implements __eq__() by using is, returning NotImplemented in the case of a false comparison: True if x is y else NotImplemented.
It seems the default behavior of the == operator is to compare reference equality like the is operator, which contradicts what these tutorials say.
So what exactly should I use == for? Value equality or reference equality? Or does it just depend on how you implement the __eq__ method?
I think the doc of Value comparisons has illustrated this question clearly:
The operators <, >, ==, >=, <=, and != compare the values of two objects. The value of an object is a rather abstract notion in Python. Comparison operators implement a particular notion of what the value of an object is. One can think of them as defining the value of an object indirectly, by means of their comparison implementation.
The behavior of the default equality comparison, that instances with different identities are always unequal, may be in contrast to what types will need that have a sensible definition of object value and value-based equality. Such types will need to customize their comparison behavior, and in fact, a number of built-in types have done that.
The default behavior for equality comparison (== and !=) is based on the identity of the objects. Hence, equality comparison of instances with the same identity results in equality, and equality comparison of instances with different identities results in inequality. A motivation for this default behavior is the desire that all objects should be reflexive (i.e. x is y implies x == y).
It also includes a list that describes the comparison behavior of the most important built-in types like numbers, strings and sequences, etc.
It solely depends on what __eq__ does. The default __eq__ of type object behaves like is. Some builtin datatypes use their own implementation. For example, two lists are equal if all their values are equal. You just have to know this.
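To make the default behaviour concrete, here is a small sketch. Plain is a hypothetical class invented for this example; it deliberately does not define __eq__, so the comparison falls back to what object provides.

```python
class Plain:
    """Hypothetical class that does not define __eq__."""

    def __init__(self, value):
        self.value = value


p = Plain(1)
q = Plain(1)      # same attribute values, different object
alias = p         # another name for the same object

default_equal = (p == q)           # False: identity-based default
alias_equal = (p == alias)         # True: same object
raw_result = object.__eq__(p, q)   # NotImplemented, as the docs describe
```

So without a custom __eq__, == on these instances answers the same question as is.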
object implements __eq__() by using is, but many classes in the standard library implement __eq__() using value equality. E.g.:
>>> l1 = [1, 2, 3]
>>> l2 = [1, 2, 3]
>>> l3 = l1
>>> l1 is l2
False
>>> l1 == l2
True
>>> l1 is l3
True
In your own classes, you can implement __eq__() as it makes sense, e.g.:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare x with x and y with y
        return self.x == other.x and self.y == other.y
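A slightly more defensive variant of the Point class above might look like the sketch below. The isinstance check and the __hash__ method are additions of mine, not part of the original answer: returning NotImplemented lets Python fall back gracefully for unrelated types, and defining __hash__ keeps instances usable in sets and as dict keys (a class that defines __eq__ without __hash__ becomes unhashable).

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Only compare against other Point instances.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.x, self.y))


p1 = Point(1, 2)
p2 = Point(1, 2)
p3 = Point(1, 3)

equal_by_value = p1 == p2      # True: same coordinates
same_object = p1 is p2         # False: two separate instances
different = p1 == p3           # False: y differs
versus_string = p1 == "1,2"    # False: unrelated type, falls back to identity
```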
To add to your thought process with a simple definition:
The equality operator == compares the values of both operands and checks for value equality, whereas the is operator checks whether both operands refer to the same object (present in the same memory location).
In a nutshell, is checks whether two references point to the same object or not. == checks whether two objects have the same value or not.
For example:
a = [1, 2, 3]
b = a        # a and b point to the same object
c = list(a)  # c points to a different object
if a == b:
    print('#')     # output: #
if a is b:
    print('##')    # output: ##
if a == c:
    print('###')   # output: ###
if a is c:
    print('####')  # no output, as c and a point to different objects
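one = 1
a = one
b = one
if a == b:  # compares values, like: if 1 == 1
    print('equal')
if a is 1:  # compares identities: is a the very same object as the literal 1?
    print('identical')
Thus, a is 1 is not a reliable test: whether the literal 1 and the object bound to one are the same object is an implementation detail. CPython happens to cache small integers, so this particular comparison can be True there, and recent versions emit a SyntaxWarning for is with a literal.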

Is the "is" Python operator reliable to test reference equality of mutable objects?

I know the is operator in Python has an unexpected behavior on immutable objects like integers and strings. See "is" operator behaves unexpectedly with integers
>>> a = 0
>>> b = 0
>>> a is b
True # Unexpected, we assigned b independently from a
When it comes to mutable objects, are we guaranteed that two variables expected (as written in the code) to reference two distinct objects (with equal value), will not be internally bound to the same object ? (Until we mutate one of the two variables, then of course the references will differ.)
>>> a = [0]
>>> b = [0]
>>> a is b
# Is False guaranteed ?
Put in other words, if somewhere x is y returns True (x and y being mutable objects), are we guaranteed that mutating x will mutate y as well ?
So long as you think some "is" behavior is "unexpected", your mental model falls short of reality ;-)
Your question is really about when Python guarantees to create a new object. And when it doesn't. For mutable objects, yes, a constructor (including a literal) yielding a mutable object always creates a new object. That's why:
>>> a = [0]
>>> b = [0]
>>> a is b
is always False. Python could have said that it's undefined whether each instance of [0] creates a new object, but it doesn't: it guarantees each instance always creates a new object. is behavior is a consequence of that, not a driver of that.
Similarly,
>>> a = set()
>>> b = set()
>>> a is b
False
is also guaranteed. Because set() returns a mutable object, it always guarantees to create a new such object.
But for immutable objects, it's not defined. For example, the result of this is not defined:
>>> a = frozenset()
>>> b = frozenset()
>>> a is b
frozenset() - like integer literals - returns an immutable object, and it's up to the implementation whether to return a new object or reuse an existing one. In this specific example, a is b is True, because the implementation du jour happens to reuse an empty frozenset. But, e.g., it just so happens that
>>> a = frozenset([3])
>>> b = frozenset([3])
>>> a is b
False
today. It could just as well return True tomorrow (although that's unlikely - while an empty frozenset is an easy-to-detect special case, it would be expensive to ensure uniqueness across all frozenset objects).
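The practical consequence for the original question can be sketched as follows: two separately constructed lists never alias each other, while two names bound to one list always do. The variable names are only for illustration, and the frozenset identity is intentionally left unchecked because it is not defined.

```python
a = [0]
b = [0]        # a second list display: guaranteed to be a new object
alias = a      # no new object, just another name for a

distinct = a is b       # False, guaranteed for mutable objects
aliased = a is alias    # True

alias.append(1)         # mutate through one name ...
a_after = list(a)       # [0, 1]: ... and the change is visible through the other
b_after = list(b)       # [0]: the independent list is unaffected

empty_a = frozenset()
empty_b = frozenset()
equal_frozensets = empty_a == empty_b   # True: equality is well defined
# Whether empty_a is empty_b holds is up to the implementation,
# so the code does not depend on it.
```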

Python object references

I'm aware that in python every identifier or variable name is a reference to the actual object.
a = "hello"
b = "hello"
When I compare the two strings
a == b
the output is
True
If I write equivalent code in Java, the output would be false because the comparison is between references (which are different) but not the actual objects.
So what I see here is that the references (variable names) are replaced by actual objects by the interpreter at run time.
So, is it safe for me to assume that "Every time the interpreter sees an already assigned variable name, it replaces it with the object it is referring to"? I googled it but couldn't find the answer I was looking for.
If you actually ran that in Java, I think you'd find it probably prints out true because of string interning, but that's somewhat irrelevant.
I'm not sure what you mean by "replaces it with the object it is referring to". What actually happens is that when you write a == b, Python calls a.__eq__(b), which is just like any other method call on a with b as an argument.
If you want an equivalent to Java-like ==, use the is operator: a is b. That compares whether the name a refers to the same object as b, regardless of whether they compare as equal.
Python interning:
>>> a = "hello"
>>> b = "hello"
>>> c = "world"
>>> id(a)
4299882336
>>> id(b)
4299882336
>>> id(c)
4299882384
Short strings tend to get interned automatically, explaining why a is b evaluates to True. See here for more.
To show that equal strings don't always have the same id
>>> a = "hello"+" world"
>>> b = "hello world"
>>> c = a
>>> a == b
True
>>> a is b
False
>>> b is c
False
>>> a is c
True
also:
>>> str([]) == str("[]")
True
>>> str([]) is str("[]")
False
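If you really do need identity for equal strings, explicit interning is the documented route. The sketch below uses sys.intern from the standard library; the strings are built at run time with str.join so that the compiler has no chance to merge them as constants. Whether a is b holds before interning is an implementation detail, so the sketch does not rely on it.

```python
import sys

parts = ["hello", " ", "world"]
a = "".join(parts)   # built at run time
b = "".join(parts)   # built again, independently

equal = a == b                    # True: same characters

interned_a = sys.intern(a)
interned_b = sys.intern(b)
same_after_intern = interned_a is interned_b   # True: one shared object
```

For ordinary code, == remains the right comparison; interning is mainly a memory and dictionary-lookup optimisation.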

How is the 'is' keyword implemented in Python?

... the is keyword that can be used for equality in strings.
>>> s = 'str'
>>> s is 'str'
True
>>> s is 'st'
False
I tried both __is__() and __eq__() but they didn't work.
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __is__(self, s):
...         return self.s == s
...
>>>
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work
False
>>>
>>> class MyString:
...     def __init__(self):
...         self.s = 'string'
...     def __eq__(self, s):
...         return self.s == s
...
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work, but again failed
False
>>>
Testing strings with is only works when the strings are interned. Unless you really know what you're doing and explicitly interned the strings you should never use is on strings.
is tests for identity, not equality. That means Python simply compares the memory address an object resides in. is basically answers the question "Do I have two names for the same object?" - overloading that would make no sense.
For example, ("a" * 100) is ("a" * 100) can be False. Python usually writes each string into a different memory location; interning mostly happens for string literals.
The is operator is equivalent to comparing id(x) values. For example:
>>> s1 = 'str'
>>> s2 = 'str'
>>> s1 is s2
True
>>> id(s1)
4564468760
>>> id(s2)
4564468760
>>> id(s1) == id(s2) # equivalent to `s1 is s2`
True
id is currently implemented to use pointers as the comparison. So you can't overload is itself, and AFAIK you can't overload id either.
So, you can't. Unusual in python, but there it is.
The Python is keyword tests object identity. You should NOT use it to test for string equality. It may seem to work frequently because Python implementations, like those of many very high level languages, perform "interning" of strings. That is to say that string literals and values are internally kept in a hashed list and those which are identical are rendered as references to the same object. (This is possible because Python strings are immutable).
However, as with any implementation detail, you should not rely on this. If you want to test for equality use the == operator. If you truly want to test for object identity then use is --- and I'd be hard-pressed to come up with a case where you should care about string object identity. Unfortunately you can't count on whether two strings are somehow "intentionally" identical object references because of the aforementioned interning.
The is keyword compares objects (or, rather, compares if two references are to the same object).
Which is, I think, why there's no mechanism to provide your own implementation.
It happens to work sometimes on strings because Python stores strings 'cleverly', such that when you create two identical strings they are stored in one object.
>>> a = "string"
>>> b = "string"
>>> a is b
True
>>> c = "str"+"ing"
>>> a is c
True
You can hopefully see the reference vs data comparison in a simple 'copy' example:
>>> a = {"a":1}
>>> b = a
>>> c = a.copy()
>>> a is b
True
>>> a is c
False
If you are not afraid of messing with bytecode, you can intercept and patch COMPARE_OP with the 8 ("is") argument to call your hook function on the objects being compared (in CPython 3.9 and later, is compiles to a separate IS_OP instruction instead). Look at the dis module documentation as a starting point.
And don't forget to intercept __builtin__.id() too (builtins.id() in Python 3), in case someone does id(a) == id(b) instead of a is b.
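To see which instruction your interpreter actually emits for is, a quick look with the dis module is enough. This is only an inspection sketch, not a patching recipe; same is a hypothetical function name, and the opcode you find depends on the CPython version.

```python
import dis


def same(a, b):
    # The only interesting operation in this function is `is`.
    return a is b


# Collect the opcode names the compiler generated for the function.
opnames = [ins.opname for ins in dis.get_instructions(same)]

# Older CPython versions use COMPARE_OP for `is`; newer ones use IS_OP.
uses_identity_opcode = "IS_OP" in opnames or "COMPARE_OP" in opnames
```

Calling dis.dis(same) prints the full listing if you want to read it by eye.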
'is' compares object identity whereas == compares values.
Example:
a=[1,2]
b=[1,2]
#a==b returns True
#a is b returns False
p=q=[1,2]
#p==q returns True
#p is q returns True
is fails to compare a string variable to string value and two string variables when the string starts with '-'. My Python version is 2.6.6
>>> s = '-hi'
>>> s is '-hi'
False
>>> s = '-hi'
>>> k = '-hi'
>>> s is k
False
>>> '-hi' is '-hi'
True
You can't overload the is operator. What you want to overload is the == operator. This can be done by defining a __eq__ method in the class.
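Applied to the MyString class from the question, that could look roughly like the sketch below. The constructor argument, the isinstance checks and the __hash__ method are my own additions for the sake of a runnable example.

```python
class MyString:
    def __init__(self, s="string"):
        self.s = s

    def __eq__(self, other):
        # Compare against other MyString objects or plain str values.
        if isinstance(other, MyString):
            return self.s == other.s
        if isinstance(other, str):
            return self.s == other
        return NotImplemented

    def __hash__(self):
        # Keep hashing consistent with the value-based __eq__.
        return hash(self.s)


m = MyString()
text = "string"
other_text = "ss"

equal_to_text = m == text          # True: __eq__ compares the wrapped value
equal_to_other = m == other_text   # False
reflected = text == m              # True: str defers to MyString.__eq__
identical = m is text              # False: `is` ignores __eq__ entirely
```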
You are using identity comparison. == is probably what you want. The exception to this is when you want to check whether one item and another are the EXACT same object, in the same memory position. In your examples, the items aren't the same, since one is of a different type (MyString) than the other (str). Also, there's no such thing as someclass.__is__ in Python (unless, of course, you put it there yourself). If there were, comparing objects with is would no longer be a reliable way to simply compare memory locations.
When I first encountered the is keyword, it confused me as well. I would have thought that is and == were no different. They produced the same output from the interpreter on many objects. This type of assumption is actually EXACTLY what is... is for. It's the Python equivalent of "Hey, don't mistake these two objects. They're different.", which is essentially what [whoever it was that straightened me out] said. Worded much differently, but one point == the other point.
For some helpful examples and some text to help with the sometimes confusing differences, visit a document from python.org's mail host written by "Danny Yoo", or, if that's offline, use the unlisted pastebin I made of its body.
In case they are both down in some 20 or so blue moons (blue moons are a real event), I'll quote the code examples:
###
>>> my_name = "danny"
>>> your_name = "ian"
>>> my_name == your_name
0 #or False
###
###
>>> my_name[1:3] == your_name[1:3]
1 #or True
###
###
>>> my_name[1:3] is your_name[1:3]
0
###
Assertion errors can easily arise when the is keyword is used to compare objects. For example, objects a and b might hold the same value without being the same object in memory. Therefore, doing an
>>> a == b
is going to evaluate to
True
But if
>>> a is b
evaluates to
False
you should probably check
>>> type(a)
and
>>> type(b)
These might be different and a reason for failure.
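A short sketch of that situation: an int and a float can compare equal while being objects of different types, so they can never be identical. The variable names are just for the example.

```python
a = 1      # an int
b = 1.0    # a float with the same numeric value

equal = a == b                       # True: numeric comparison across types
identical = a is b                   # False: objects of different types are never the same object
same_type = type(a) is type(b)       # False: int vs float
```

Note that the reverse does not hold: two objects of the same type with the same value can still be distinct objects.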
Because of string interning, this could look strange:
a = 'hello'
'hello' is a #True
b= 'hel-lo'
'hel-lo' is b #False

Python != operation vs "is not"

In a comment on this question, I saw a statement that recommended using
result is not None
vs
result != None
What is the difference? And why might one be recommended over the other?
== is an equality test. It checks whether the right hand side and the left hand side are equal objects (according to their __eq__ or __cmp__ methods.)
is is an identity test. It checks whether the right hand side and the left hand side are the very same object. No method calls are made, so objects can't influence the is operation.
You use is (and is not) for singletons, like None, where you don't care about objects that might want to pretend to be None or where you want to protect against objects breaking when being compared against None.
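A typical place where this matters is an optional argument whose valid values include falsy ones such as 0 or an empty list. The function below is a hypothetical example, not taken from the question.

```python
def describe(result=None):
    # `is not None` distinguishes "no value given" from falsy values
    # such as 0, "" or [], which a plain truth test would swallow.
    if result is not None:
        return "got {!r}".format(result)
    return "no result"


with_zero = describe(0)       # "got 0"
with_empty = describe([])     # "got []"
with_nothing = describe()     # "no result"
```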
First, let me go over a few terms. If you just want your question answered, scroll down to "Answering your question".
Definitions
Object identity: When you create an object, you can assign it to a variable. You can then also assign it to another variable. And another.
>>> button = Button()
>>> cancel = button
>>> close = button
>>> dismiss = button
>>> print(cancel is close)
True
In this case, cancel, close, and dismiss all refer to the same object in memory. You only created one Button object, and all three variables refer to this one object. We say that cancel, close, and dismiss all refer to identical objects; that is, they refer to one single object.
Object equality: When you compare two objects, you usually don't care whether they refer to the exact same object in memory. With object equality, you can define your own rules for how two objects compare. When you write if a == b:, you are essentially saying if a.__eq__(b):. This lets you define a __eq__ method on a so that you can use your own comparison logic.
Rationale for equality comparisons
Rationale: Two objects have the exact same data, but are not identical. (They are not the same object in memory.)
Example: Strings
>>> greeting = "It's a beautiful day in the neighbourhood."
>>> a = unicode(greeting)
>>> b = unicode(greeting)
>>> a is b
False
>>> a == b
True
Note: I use unicode strings here because Python is smart enough to reuse regular strings without creating new ones in memory.
Here, I have two unicode strings, a and b. They have the exact same content, but they are not the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the unicode object has implemented the __eq__ method.
class unicode(object):
    # ...
    def __eq__(self, other):
        if len(self) != len(other):
            return False
        for i, j in zip(self, other):
            if i != j:
                return False
        return True
Note: __eq__ on unicode is definitely implemented more efficiently than this.
Rationale: Two objects have different data, but are considered the same object if some key data is the same.
Example: Most types of model data
>>> import datetime
>>> a = Monitor()
>>> a.make = "Dell"
>>> a.model = "E770s"
>>> a.owner = "Bob Jones"
>>> a.warranty_expiration = datetime.date(2030, 12, 31)
>>> b = Monitor()
>>> b.make = "Dell"
>>> b.model = "E770s"
>>> b.owner = "Sam Johnson"
>>> b.warranty_expiration = datetime.date(2005, 8, 22)
>>> a is b
False
>>> a == b
True
Here, I have two Dell monitors, a and b. They have the same make and model. However, they neither have the same data nor are the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the Monitor object implemented the __eq__ method.
class Monitor(object):
    # ...
    def __eq__(self, other):
        return self.make == other.make and self.model == other.model
Answering your question
When comparing to None, always use is not. None is a singleton in Python - there is only ever one instance of it in memory.
By comparing identity, this can be performed very quickly. Python checks whether the object you're referring to has the same memory address as the global None object - a very, very fast comparison of two numbers.
By comparing equality, Python has to look up whether your object has an __eq__ method. If it does not, it examines each superclass looking for an __eq__ method. If it finds one, Python calls it. This is especially bad if the __eq__ method is slow and doesn't immediately return when it notices that the other object is None.
Did you not implement __eq__? Then Python will probably find the __eq__ method on object and use that instead - which just checks for object identity anyway.
When comparing most other things in Python, you will be using !=.
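One way to convince yourself that is never calls into your object, while == does, is to count the calls. Noisy is a hypothetical class made up for this sketch.

```python
class Noisy:
    calls = 0   # class-level counter of __eq__ invocations

    def __eq__(self, other):
        Noisy.calls += 1
        return False


n = Noisy()

identity_result = n is None     # identity test: no method is called
calls_after_is = Noisy.calls    # 0

equality_result = n == None     # equality test: dispatches to __eq__
calls_after_eq = Noisy.calls    # 1
```

The counter only moves for ==, which is why an identity check against None cannot be slowed down or fooled by a custom __eq__.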
Consider the following:
class Bad(object):
    def __eq__(self, other):
        return True

c = Bad()
c is None  # False, equivalent to id(c) == id(None)
c == None  # True, equivalent to c.__eq__(None)
None is a singleton, and therefore identity comparison will always work, whereas an object can fake the equality comparison via .__eq__().
>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True
Some objects are singletons, and thus is with them is equivalent to ==. Most are not.
