How is the 'is' keyword implemented in Python? - python

... the is keyword that can be used for equality in strings.
>>> s = 'str'
>>> s is 'str'
True
>>> s is 'st'
False
I tried both __is__() and __eq__() but they didn't work.
>>> class MyString:
... def __init__(self):
... self.s = 'string'
... def __is__(self, s):
... return self.s == s
...
>>>
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work
False
>>>
>>> class MyString:
... def __init__(self):
... self.s = 'string'
... def __eq__(self, s):
... return self.s == s
...
>>>
>>> m = MyString()
>>> m is 'ss'
False
>>> m is 'string' # <--- Expected to work, but again failed
False
>>>

Testing strings with is only works when the strings are interned. Unless you really know what you're doing and explicitly interned the strings you should never use is on strings.
is tests for identity, not equality. That means Python simply compares the memory address a object resides in. is basically answers the question "Do I have two names for the same object?" - overloading that would make no sense.
For example, ("a" * 100) is ("a" * 100) is False. Usually Python writes each string into a different memory location, interning mostly happens for string literals.

The is operator is equivalent to comparing id(x) values. For example:
>>> s1 = 'str'
>>> s2 = 'str'
>>> s1 is s2
True
>>> id(s1)
4564468760
>>> id(s2)
4564468760
>>> id(s1) == id(s2) # equivalent to `s1 is s2`
True
id is currently implemented to use pointers as the comparison. So you can't overload is itself, and AFAIK you can't overload id either.
So, you can't. Unusual in python, but there it is.

The Python is keyword tests object identity. You should NOT use it to test for string equality. It may seem to work frequently because Python implementations, like those of many very high level languages, performs "interning" of strings. That is to say that string literals and values are internally kept in a hashed list and those which are identical are rendered as references to the same object. (This is possible because Python strings are immutable).
However, as with any implementation detail, you should not rely on this. If you want to test for equality use the == operator. If you truly want to test for object identity then use is --- and I'd be hard-pressed to come up with a case where you should care about string object identity. Unfortunately you can't count on whether two strings are somehow "intentionally" identical object references because of the aforementioned interning.

The is keyword compares objects (or, rather, compares if two references are to the same object).
Which is, I think, why there's no mechanism to provide your own implementation.
It happens to work sometimes on strings because Python stores strings 'cleverly', such that when you create two identical strings they are stored in one object.
>>> a = "string"
>>> b = "string"
>>> a is b
True
>>> c = "str"+"ing"
>>> a is c
True
You can hopefully see the reference vs data comparison in a simple 'copy' example:
>>> a = {"a":1}
>>> b = a
>>> c = a.copy()
>>> a is b
True
>>> a is c
False

If you are not afraid of messing up with bytecode, you can intercept and patch COMPARE_OP with 8 ("is") argument to call your hook function on objects being compared. Look at dis module documentation for start-in.
And don't forget to intercept __builtin__.id() too if someone will do id(a) == id(b) instead of a is b.

'is' compares object identity whereas == compares values.
Example:
a=[1,2]
b=[1,2]
#a==b returns True
#a is b returns False
p=q=[1,2]
#p==q returns True
#p is q returns True

is fails to compare a string variable to string value and two string variables when the string starts with '-'. My Python version is 2.6.6
>>> s = '-hi'
>>> s is '-hi'
False
>>> s = '-hi'
>>> k = '-hi'
>>> s is k
False
>>> '-hi' is '-hi'
True

You can't overload the is operator. What you want to overload is the == operator. This can be done by defining a __eq__ method in the class.

You are using identity comparison. == is probably what you want. The exception to this is when you want to be checking if one item and another are the EXACT same object and in the same memory position. In your examples, the item's aren't the same, since one is of a different type (my_string) than the other (string). Also, there's no such thing as someclass.__is__ in python (unless, of course, you put it there yourself). If there was, comparing objects with is wouldn't be reliable to simply compare the memory locations.
When I first encountered the is keyword, it confused me as well. I would have thought that is and == were no different. They produced the same output from the interpreter on many objects. This type of assumption is actually EXACTLY what is... is for. It's the python equivalent "Hey, don't mistake these two objects. they're different.", which is essentially what [whoever it was that straightened me out] said. Worded much differently, but one point == the other point.
the
for some helpful examples and some text to help with the sometimes confusing differences
visit a document from python.org's mail host written by "Danny Yoo"
or, if that's offline, use the unlisted pastebin I made of it's body.
in case they, in some 20 or so blue moons (blue moons are a real event), are both down, I'll quote the code examples
###
>>> my_name = "danny"
>>> your_name = "ian"
>>> my_name == your_name
0 #or False
###
###
>>> my_name[1:3] == your_name[1:3]
1 #or True
###
###
>>> my_name[1:3] is your_name[1:3]
0
###

Assertion Errors can easily arise with is keyword while comparing objects. For example, objects a and b might hold same value and share same memory address. Therefore, doing an
>>> a == b
is going to evaluate to
True
But if
>>> a is b
evaluates to
False
you should probably check
>>> type(a)
and
>>> type(b)
These might be different and a reason for failure.

Because string interning, this could look strange:
a = 'hello'
'hello' is a #True
b= 'hel-lo'
'hel-lo' is b #False

Related

Why use isinstance() instead of type()? [duplicate]

This question already has answers here:
What are the differences between type() and isinstance()?
(8 answers)
Closed 7 years ago.
I've seen people often recommend using isinstance() instead of type(), which I find odd because type() seems more readable to me.
>>> isinstance("String",str)
True
>>> type("String") is str
True
The latter seems more pythonic and clear while the former is slightly obfuscated and implies a specific use to me (for example, testing user built classes rather than built in types) but is there some other benefit that I'm unaware of? Does type() cause some erratic behaviour or miss some comparisons?
Let me give you a simple example why they can be different:
>>> class A(object): pass
>>> class B(A) : pass
>>> a = A()
>>> b = B()
So far, declared a variable A and a variable B derived from A.
>>> isinstance(a, A)
True
>>> type(a) == A
True
BUT
>>> isinstance(b, A)
True
>>> type(b) == A
False
You are precluding the possibility to use subclasses here.
Why limit yourself to just the one type when a subclass would satisfy the interface? If someone wants to use class mystr(str): ... for something, your code would still work and doesn't need to know that a subclass was used.
As such, using isinstance() is the more pythonic approach; your code should look for supported behaviours, not specific types.
Basic example for Python 2.x: Standard strings and unicode strings, with common base class basestring:
s = "123"
u = u"123"
isinstance(s, str) is True
type(u) == str # something which has string nature returns False for this test
# these tests are common-pattern in Python 2.x and behaves correctly
isinstance(s, basestring) is True
isinstance(u, basestring) is True
Here is the counter-example:
class A():
pass
a = A()
>>> isinstance(a, A)
True
>>> type(a) is A
False
Thus I consider isinstance as a more generic one.

Python object references

I'm aware that in python every identifier or variable name is a reference to the actual object.
a = "hello"
b = "hello"
When I compare the two strings
a == b
the output is
True
If I write an equivalent code in Java,the output would be false because the comparison is between references(which are different) but not the actual objects.
So what i see here is that the references(variable names) are replaced by actual objects by the interpreter at run time.
So,is is safe for me to assume that "Every time the interpreter sees an already assigned variable name,it replaces it with the object it is referring to" ? I googled it but couldn't find any appropriate answer I was looking for.
If you actually ran that in Java, I think you'd find it probably prints out true because of string interning, but that's somewhat irrelevant.
I'm not sure what you mean by "replaces it with the object it is referring to". What actually happens is that when you write a == b, Python calls a.__eq__(b), which is just like any other method call on a with b as an argument.
If you want an equivalent to Java-like ==, use the is operator: a is b. That compares whether the name a refers to the same object as b, regardless of whether they compare as equal.
Python interning:
>>> a = "hello"
>>> b = "hello"
>>> c = "world"
>>> id(a)
4299882336
>>> id(b)
4299882336
>>> id(c)
4299882384
Short strings tend to get interned automatically, explaining why a is b == True. See here for more.
To show that equal strings don't always have the same id
>>> a = "hello"+" world"
>>> b = "hello world"
>>> c = a
>>> a == b
True
>>> a is b
False
>>> b is c
False
>>> a is c
True
also:
>>> str([]) == str("[]")
True
>>> str([]) is str("[]")
False

What are the semantics of the 'is' operator in Python?

How does the is operator determine if two objects are the same? How does it work? I can't find it documented.
From the documentation:
Every object has an identity, a type
and a value. An object’s identity
never changes once it has been
created; you may think of it as the
object’s address in memory. The ‘is‘
operator compares the identity of two
objects; the id() function returns an
integer representing its identity
(currently implemented as its
address).
This would seem to indicate that it compares the memory addresses of the arguments, though the fact that it says "you may think of it as the object's address in memory" might indicate that the particular implementation is not guranteed; only the semantics are.
Comparison Operators
Is works by comparing the object referenced to see if the operands point to the same object.
>>> a = [1, 2]
>>> b = a
>>> a is b
True
>>> c = [1, 2]
>>> a is c
False
c is not the same list as a therefore the is relation is false.
To add to the other answers, you can think of a is b working as if it was is_(a, b):
def is_(a, b):
return id(a) == id(b)
Note that you cannot directly replace a is b with id(a) == id(b), but the above function avoids that through parameters.

Python != operation vs "is not"

In a comment on this question, I saw a statement that recommended using
result is not None
vs
result != None
What is the difference? And why might one be recommended over the other?
== is an equality test. It checks whether the right hand side and the left hand side are equal objects (according to their __eq__ or __cmp__ methods.)
is is an identity test. It checks whether the right hand side and the left hand side are the very same object. No methodcalls are done, objects can't influence the is operation.
You use is (and is not) for singletons, like None, where you don't care about objects that might want to pretend to be None or where you want to protect against objects breaking when being compared against None.
First, let me go over a few terms. If you just want your question answered, scroll down to "Answering your question".
Definitions
Object identity: When you create an object, you can assign it to a variable. You can then also assign it to another variable. And another.
>>> button = Button()
>>> cancel = button
>>> close = button
>>> dismiss = button
>>> print(cancel is close)
True
In this case, cancel, close, and dismiss all refer to the same object in memory. You only created one Button object, and all three variables refer to this one object. We say that cancel, close, and dismiss all refer to identical objects; that is, they refer to one single object.
Object equality: When you compare two objects, you usually don't care that it refers to the exact same object in memory. With object equality, you can define your own rules for how two objects compare. When you write if a == b:, you are essentially saying if a.__eq__(b):. This lets you define a __eq__ method on a so that you can use your own comparison logic.
Rationale for equality comparisons
Rationale: Two objects have the exact same data, but are not identical. (They are not the same object in memory.)
Example: Strings
>>> greeting = "It's a beautiful day in the neighbourhood."
>>> a = unicode(greeting)
>>> b = unicode(greeting)
>>> a is b
False
>>> a == b
True
Note: I use unicode strings here because Python is smart enough to reuse regular strings without creating new ones in memory.
Here, I have two unicode strings, a and b. They have the exact same content, but they are not the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the unicode object has implemented the __eq__ method.
class unicode(object):
# ...
def __eq__(self, other):
if len(self) != len(other):
return False
for i, j in zip(self, other):
if i != j:
return False
return True
Note: __eq__ on unicode is definitely implemented more efficiently than this.
Rationale: Two objects have different data, but are considered the same object if some key data is the same.
Example: Most types of model data
>>> import datetime
>>> a = Monitor()
>>> a.make = "Dell"
>>> a.model = "E770s"
>>> a.owner = "Bob Jones"
>>> a.warranty_expiration = datetime.date(2030, 12, 31)
>>> b = Monitor()
>>> b.make = "Dell"
>>> b.model = "E770s"
>>> b.owner = "Sam Johnson"
>>> b.warranty_expiration = datetime.date(2005, 8, 22)
>>> a is b
False
>>> a == b
True
Here, I have two Dell monitors, a and b. They have the same make and model. However, they neither have the same data nor are the same object in memory. However, when we compare them, we want them to compare equal. What's happening here is that the Monitor object implemented the __eq__ method.
class Monitor(object):
# ...
def __eq__(self, other):
return self.make == other.make and self.model == other.model
Answering your question
When comparing to None, always use is not. None is a singleton in Python - there is only ever one instance of it in memory.
By comparing identity, this can be performed very quickly. Python checks whether the object you're referring to has the same memory address as the global None object - a very, very fast comparison of two numbers.
By comparing equality, Python has to look up whether your object has an __eq__ method. If it does not, it examines each superclass looking for an __eq__ method. If it finds one, Python calls it. This is especially bad if the __eq__ method is slow and doesn't immediately return when it notices that the other object is None.
Did you not implement __eq__? Then Python will probably find the __eq__ method on object and use that instead - which just checks for object identity anyway.
When comparing most other things in Python, you will be using !=.
Consider the following:
class Bad(object):
def __eq__(self, other):
return True
c = Bad()
c is None # False, equivalent to id(c) == id(None)
c == None # True, equivalent to c.__eq__(None)
None is a singleton, and therefore identity comparison will always work, whereas an object can fake the equality comparison via .__eq__().
>>> () is ()
True
>>> 1 is 1
True
>>> (1,) == (1,)
True
>>> (1,) is (1,)
False
>>> a = (1,)
>>> b = a
>>> a is b
True
Some objects are singletons, and thus is with them is equivalent to ==. Most are not.

Is there any difference between "foo is None" and "foo == None"?

Is there any difference between:
if foo is None: pass
and
if foo == None: pass
The convention that I've seen in most Python code (and the code I myself write) is the former, but I recently came across code which uses the latter. None is an instance (and the only instance, IIRC) of NoneType, so it shouldn't matter, right? Are there any circumstances in which it might?
is always returns True if it compares the same object instance
Whereas == is ultimately determined by the __eq__() method
i.e.
>>> class Foo(object):
def __eq__(self, other):
return True
>>> f = Foo()
>>> f == None
True
>>> f is None
False
You may want to read this object identity and equivalence.
The statement 'is' is used for object identity, it checks if objects refer to the same instance (same address in memory).
And the '==' statement refers to equality (same value).
A word of caution:
if foo:
# do something
Is not exactly the same as:
if x is not None:
# do something
The former is a boolean value test and can evaluate to false in different contexts. There are a number of things that represent false in a boolean value tests for example empty containers, boolean values. None also evaluates to false in this situation but other things do too.
(ob1 is ob2) equal to (id(ob1) == id(ob2))
The reason foo is None is the preferred way is that you might be handling an object that defines its own __eq__, and that defines the object to be equal to None. So, always use foo is None if you need to see if it is infact None.
There is no difference because objects which are identical will of course be equal. However, PEP 8 clearly states you should use is:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
is tests for identity, not equality. For your statement foo is none, Python simply compares the memory address of objects. It means you are asking the question "Do I have two names for the same object?"
== on the other hand tests for equality as determined by the __eq__() method. It doesn't cares about identity.
In [102]: x, y, z = 2, 2, 2.0
In [103]: id(x), id(y), id(z)
Out[103]: (38641984, 38641984, 48420880)
In [104]: x is y
Out[104]: True
In [105]: x == y
Out[105]: True
In [106]: x is z
Out[106]: False
In [107]: x == z
Out[107]: True
None is a singleton operator. So None is None is always true.
In [101]: None is None
Out[101]: True
For None there shouldn't be a difference between equality (==) and identity (is). The NoneType probably returns identity for equality. Since None is the only instance you can make of NoneType (I think this is true), the two operations are the same. In the case of other types this is not always the case. For example:
list1 = [1, 2, 3]
list2 = [1, 2, 3]
if list1==list2: print "Equal"
if list1 is list2: print "Same"
This would print "Equal" since lists have a comparison operation that is not the default returning of identity.
#Jason:
I recommend using something more along the lines of
if foo:
#foo isn't None
else:
#foo is None
I don't like using "if foo:" unless foo truly represents a boolean value (i.e. 0 or 1). If foo is a string or an object or something else, "if foo:" may work, but it looks like a lazy shortcut to me. If you're checking to see if x is None, say "if x is None:".
Some more details:
The is clause actually checks if the two objects are at the same
memory location or not. i.e whether they both point to the same
memory location and have the same id.
As a consequence of 1, is ensures whether, or not, the two lexically represented objects have identical attributes (attributes-of-attributes...) or not
Instantiation of primitive types like bool, int, string(with some exception), NoneType having a same value will always be in the same memory location.
E.g.
>>> int(1) is int(1)
True
>>> str("abcd") is str("abcd")
True
>>> bool(1) is bool(2)
True
>>> bool(0) is bool(0)
True
>>> bool(0)
False
>>> bool(1)
True
And since NoneType can only have one instance of itself in the python's "look-up" table therefore the former and the latter are more of a programming style of the developer who wrote the code(maybe for consistency) rather then having any subtle logical reason to choose one over the other.
John Machin's conclusion that None is a singleton is a conclusion bolstered by this code.
>>> x = None
>>> y = None
>>> x == y
True
>>> x is y
True
>>>
Since None is a singleton, x == None and x is None would have the same result. However, in my aesthetical opinion, x == None is best.
a is b # returns true if they a and b are true alias
a == b # returns true if they are true alias or they have values that are deemed equivalence
a = [1,3,4]
b = a[:] #creating copy of list
a is b # if gives false
False
a == b # gives true
True

Categories

Resources