Python, variable storage in memory

>>> a = [1234, 1234]  # list
>>> a
[1234, 1234]
>>> id(a[0])
38032480
>>> id(a[1])
38032480
>>> b = 1234  # b is a variable of integer type
>>> id(b)
38032384
Why is id(b) not the same as id(a[0]) and id(a[1]) in Python?

When the CPython REPL executes a line, it parses and compiles it into a code object of bytecode, and then executes that bytecode.
The compilation result can be checked through the dis module:
>>> import dis
>>> dis.dis('a = [1234, 1234, 5678, 90123, 5678, 4321]')
  1           0 LOAD_CONST               0 (1234)
              2 LOAD_CONST               0 (1234)
              4 LOAD_CONST               1 (5678)
              6 LOAD_CONST               2 (90123)
              8 LOAD_CONST               1 (5678)
             10 LOAD_CONST               3 (4321)
             12 BUILD_LIST               6
             14 STORE_NAME               0 (a)
             16 LOAD_CONST               4 (None)
             18 RETURN_VALUE
Note that both 1234s are loaded with "LOAD_CONST 0", and both 5678s are loaded with "LOAD_CONST 1". These indices refer to the constant table associated with the code object. Here, the table is (1234, 5678, 90123, 4321, None).
The compiler knows that all the copies of 1234 in the code object are equal, so it allocates only one object for all of them.
Therefore, as OP observed, a[0] and a[1] do indeed refer to the same object: the same constant from the constant table of the code object of that line of code.
When you execute b = 1234, this will again be compiled and executed, independent of the previous line, so a different object will be allocated.
(You may read http://akaptur.com/blog/categories/python-internals/ for a brief introduction to how code objects are interpreted.)
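To reproduce this outside the REPL, a minimal sketch can compile each "line" separately with the built-in compile() and run it with exec(). This only approximates what the REPL does per input line; the namespace dict ns is an assumption made for the illustration:

```python
# Compile each "line" separately, roughly as the REPL does for each input line.
line1 = compile("a = [1234, 1234]", "<line 1>", "exec")
line2 = compile("b = 1234", "<line 2>", "exec")

ns = {}  # hypothetical namespace standing in for the REPL's globals
exec(line1, ns)
exec(line2, ns)

a, b = ns["a"], ns["b"]
# Both list items come from the same constant of line1's code object.
print(a[0] is a[1])  # True in CPython
# b comes from line2's own constant table, so it is usually a different object.
print(a[0] is b)
```

Only the first identity result is something the constant table explains directly; the second depends on implementation details and should not be relied on.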
Outside of the REPL, when you execute a *.py file, the module body is compiled into one code object and each function (including each lambda) into its own separate code object, so when we run:
a = [1234, 1234]
b = 1234
print(id(a[0]), id(a[1]))
print(id(b))
a = (lambda: [1234, 1234])()
b = (lambda: 1234)()
print(id(a[0]), id(a[1]))
print(id(b))
We may see something like:
4415536880 4415536880
4415536880
4415536912 4415536912
4415537104
The first three numbers share the same address 4415536880; they belong to the constants of the "__main__" code object.
Next, a[0] and a[1] have address 4415536912, a constant of the first lambda.
Finally, b has address 4415537104, a constant of the second lambda.
Also note that this result is valid for CPython only. Other implementations have different strategies for allocating constants. For instance, running the above code in PyPy gives:
19745 19745
19745
19745 19745
19745
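Because constants belong to the code object, a function hands out the very same constant objects on every call. A minimal CPython-oriented sketch (make_list is a hypothetical name):

```python
def make_list():
    # Both 1234 literals are merged into one constant of make_list's code object.
    return [1234, 1234]

first = make_list()
second = make_list()
print(first is second)        # False: each call builds a new list object
print(first[0] is second[1])  # True in CPython: the items are the shared constant
```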

There is no rule or guarantee stating that id(a[0]) should be equal to id(a[1]), so the question itself is moot. The question you should be asking is why id(a[0]) and id(a[1]) are in fact the same.
If you do a.append(1234) followed by id(a[2]), you may or may not get the same id. As @hiro protagonist has pointed out, these are just internal optimizations that you shouldn't depend on.
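For instance, appending a value computed at run time, rather than written as a literal, typically produces a new object in CPython. In this sketch, int("1234") simply stands in for any run-time computation:

```python
a = [1234, 1234]
a.append(int("1234"))  # built at run time, not taken from the constant table

print(a[2] == a[0])  # True: equal values
print(a[2] is a[0])  # usually False in CPython, but no rule guarantees either way
```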

A Python list is very much unlike a C array.
A C array is just a block of contiguous memory, so the address of its first (0-th) element is the address of the array itself, by definition. Array access in C is just pointer arithmetic, and the [] notation is a thin crust of syntactic sugar over that pointer arithmetic. In a function parameter list, int x[] is just another way of writing int *x.
For the sake of the example, let's assume that in Python, id(x) is the "memory address of x", as &x would be in C. (This is not true for all Python implementations, and not even guaranteed in CPython. It's just a unique number.)
In C, an int is just an architecture-dependent number of bytes, so for int x = 1 the expression &x points to these bytes.
Everything in Python is an object, including numbers. This is why id(1) refers to an object of type int describing number 1. You can call its methods: (1).__str__() will return a string '1'.
So, when you have x = [1, 2, 3], id(x) is a "pointer" to a list object with three elements. The list object itself is pretty complex. But x[0] is not the bytes that comprise the integer value 1; it's internally a reference to an int object for number 1. Thus id(x[0]) is a "pointer" to that object.
In C terms, the elements of the array could be seen as pointers to the objects stored in it, not the objects themselves.
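Since list items are references, two slots can point at the same object. Unlike the integer caching discussed here, this is guaranteed by the language, and a mutable object makes it easy to see. A short sketch:

```python
inner = [0]
outer = [inner, inner]  # two references to the same list object

print(outer[0] is outer[1])  # True
outer[0].append(1)           # mutate through the first reference...
print(outer[1])              # ...and the change shows through the second: [0, 1]
```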
Since there's no point in having two objects represent the same number 1, CPython keeps only one, so id(1) is always the same during an interpreter run. An illustration:
x = [1, 2, 3]
y = [1, 100, 1000]
assert id(x) != id(y) # obviously
assert id(x[0]) == id(y[0]) == id(1) # yes, the same int object
CPython actually preallocates objects for a few most-used small numbers (see comments here). For larger numbers, it's not so, which can lead to two 'copies' of a larger number having different id() values.

Note that id() gives the identity of the object that a variable or literal refers to. For every object used in your program (even one created inside the id() call itself), id() returns an identifier that is unique among all objects alive at the same time. This can be used by:
User: to check whether two variables refer to the same object, as in a is b
Python: to optimise memory, i.e. avoid unwanted duplicates of the same value in memory
As for your case, it isn't even guaranteed that a[0] and a[1] will give the same id, even though both have the same value. It depends on the order in which literals and variables are created during the program's run, which Python handles internally.
Case 1:
Type "help", "copyright", "credits" or "license" for more information.
>>> a=[1234,1234]
>>> id(a[0])
52687424
>>> id(a[1])
52687424
Case 2 (note that at the end of this case, a[0] and a[1] have the same value but different ids):
Type "help", "copyright", "credits" or "license" for more information.
>>> a=[1,1234]
>>> id(1)
1776174736
>>> id(1234)
14611088
>>> id(a[0])
1776174736
>>> id(a[1])
14611008
>>> a[0]=1234
>>> id(1234)
14611104
>>> id(a[0])
14611152
>>> id(a[1])
14611008
>>>

Related

For CPython implementation, id(x) is the memory address where x is stored? [duplicate]

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So two 256s are identical, but two 257s are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
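One way to observe that range on a given CPython build is to create each integer twice at run time, so that compile-time constant merging cannot interfere. This probes an implementation detail and should not be relied on; the helper name is_cached is made up for this sketch:

```python
def is_cached(n):
    # int(str(n)) builds the value at run time. Both results are alive during
    # the comparison, so they can only be identical if CPython returned a
    # preallocated small-int object.
    return int(str(n)) is int(str(n))

cached = [n for n in range(-10, 300) if is_cached(n)]
print(cached[0], cached[-1])  # -5 256 with CPython's default small-int cache
```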
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
Note that in CPython (the reference implementation of Python), an object's id is its location in memory, but that is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily implement id differently.
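The lifetime caveat in the documentation matters in practice: two objects alive at the same time always have different ids, but an id can be reused once its object is gone. A small sketch:

```python
first = []
second = []
# Both lists are alive here, so their identities must differ.
print(first is second)          # False
print(id(first) == id(second))  # False, for the same reason

# Temporaries: the first list may be freed before the second is created,
# so CPython can hand out the same address twice. Either result is allowed.
print(id([]) == id([]))
```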
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks whether the ids of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to, say, checking whether two very long strings are equal in value. But since it applies to the uniqueness of the object, it has limited use cases. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (it works in Python 2 and 3):
SENTINEL_SINGLETON = object() # this will only be created one time.
def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')
def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))
foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late, but do you want some source with your answer? I'll try to word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations
    CHECK_SMALL_INT(ival);
    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }
    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
    /* ... multi-digit cases omitted ... */
}
Now, we're no C master-code-haxxorz, but we're also not dumb: we can see CHECK_SMALL_INT(ival); peeking at us all seductively, and we can guess it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
return get_small_int((sdigit)ival); \
} while(0)
So it's a macro that calls function get_small_int if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along!
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [-NSMALLNEGINTS, NSMALLPOSINTS), you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
But, when are they allocated??
During initialization, in _PyLong_Init, Python will gladly enter a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
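To make the indexing concrete, here is a toy Python model of the lookup described above. It mirrors only the logic (the bounds check and the ival + NSMALLNEGINTS offset); Box, small_ints and from_long are names made up for this sketch, not CPython APIs:

```python
NSMALLNEGINTS = 5    # values from the CPython source quoted above
NSMALLPOSINTS = 257


class Box:
    """Hypothetical stand-in for a PyLongObject: it just holds a value."""

    def __init__(self, value):
        self.value = value


# Preallocate one Box per small value, like the loop in _PyLong_Init.
small_ints = [Box(ival) for ival in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def from_long(ival):
    """Toy model of PyLong_FromLong: reuse preallocated boxes for small values."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:   # the CHECK_SMALL_INT condition
        return small_ints[ival + NSMALLNEGINTS]  # the get_small_int indexing
    return Box(ival)                             # otherwise allocate a new object


print(len(small_ints))                   # 262 preallocated objects
print(from_long(256) is from_long(256))  # True: same preallocated Box
print(from_long(257) is from_long(257))  # False: two fresh Boxes
```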
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So is will return True.
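The same inspection works for any source string, and it avoids writing is against literals (Python 3.8+ warns about that). A sketch using compile() and the dis module; the variable names a and b are arbitrary:

```python
import dis

# One compilation unit containing two equal literals.
code = compile("a = 257\nb = 257", "<demo>", "exec")
print(code.co_consts)  # the two 257 literals share a single constant entry
dis.dis(code)          # both assignments use LOAD_CONST with the same index

ns = {}
exec(code, ns)
print(ns["a"] is ns["b"])  # True in CPython: same constant object
```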
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency.
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
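For example, a minimal sketch of a class that defines value equality (Point is a made-up example class). Two instances compare equal with == while remaining distinct objects for is:

```python
class Point:
    """Hypothetical value class: equality is defined by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # Python 3 derives != from __eq__ automatically; Python 2 needs __ne__ too.
    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ sets __hash__ to None in Python 3; restore a matching hash.
    def __hash__(self):
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)
print(p == q)  # True: equal values
print(p is q)  # False: two distinct objects
```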
As you can check in the source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is calculated as a different object.
It is better to use == for that purpose.
I think your hypothesis is correct. Experiment with id (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that small numbers (255 here; CPython's cache actually extends to 256) are shared, while larger ones are treated differently!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
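These pre-created values can also be checked without comparing literals directly. A CPython-specific sketch that builds the values at run time (the variable names are arbitrary):

```python
# Build the values at run time so no compile-time constant merging is involved.
empty_a = tuple()
empty_b = tuple([])
print(empty_a is empty_b)  # True in CPython: one shared empty tuple

char_a = chr(97)
char_b = "abc"[0]
print(char_a is char_b)    # True in CPython: cached single-character Latin-1 string
```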
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
You can see what's going on by disassembling the bytecode. Try defining a function that does i, j = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128):
>>> def f(): i, j = 128, 128
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1]), id(f.__code__.co_consts[2][0]), id(f.__code__.co_consts[2][1])
(4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. Which means that (non-empty) tuples actually don't end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts—there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
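If you genuinely need one canonical object per string value, you can request interning explicitly with sys.intern() from the standard library instead of relying on the automatic rules. A sketch:

```python
import sys

# Build equal strings at run time so neither is a compile-time constant.
left = "".join(["some", "str", "daasd ad"])
right = "".join(["somestr", "daasd", " ad"])
print(left == right)  # True
print(left is right)  # usually False: two separately built objects

# sys.intern returns one canonical object per distinct string value.
print(sys.intern(left) is sys.intern(right))  # True
```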
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for the documented singletons (like None) or for objects that are only created in one place in the code (like the _sentinel = object() idiom).
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity equality operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas' comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:
class Unequal:
    def __eq__(self, other):
        return False
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
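Here is a sketch of how that difference shows up. It repeats the Unequal class so the block is self-contained and adds a PHP-style check as described in the comment (php_identical is a made-up name). float('nan') is a built-in value that behaves the same way:

```python
class Unequal:
    """Absurd class from the answer above: never equal to anything."""

    def __eq__(self, other):
        return False


def php_identical(x, y):
    """Hypothetical emulation of PHP's === as described: equal value and type."""
    return x == y and type(x) == type(y)


u = Unequal()
print(u is u, php_identical(u, u))  # True False

nan = float("nan")  # a built-in example: NaN is never equal to itself
print(nan is nan, nan == nan)       # True False
```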
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class numbers.Number, but it has the same problem:
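import numpy, numbers
assert issubclass(int, numbers.Number)
# The author found numpy.int16 was not a numbers.Number subclass; current NumPy
# registers its scalar types with the numbers ABCs, so check your version:
print(issubclass(numpy.int16, numbers.Number))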
By the way, NumPy will produce separate instances of low numbers.
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
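As a rough approximation of the OP's rule (numbers compared by value, everything else by identity), here is a sketch built on numbers.Number. It inherits the incompleteness described above, since any number type that neither subclasses nor registers with numbers.Number falls back to identity. same_thing is a hypothetical name:

```python
import numbers


def same_thing(a, b):
    """Hypothetical helper: numbers compare by value, other objects by identity.

    Only types that subclass or are registered with numbers.Number count as
    numbers here, so some third-party number types may be missed.
    """
    if isinstance(a, numbers.Number) and isinstance(b, numbers.Number):
        return a == b
    return a is b


print(same_thing(257, int("257")))  # True: compared by value
print(same_thing([], []))           # False: two distinct lists
```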
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and
is not) are used with certain types of literals (e.g. strings, ints).
These can often work by accident in CPython, but are not guaranteed by
the language spec. The warning advises users to use equality tests (==
and !=) instead.
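On Python 3.8 or later you can watch the compiler emit that warning by using the standard warnings module. A sketch (the source strings are arbitrary examples):

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")         # record every warning, even repeats
    compile("x is 1000", "<demo>", "eval")  # identity test against a literal
    compile("x == 1000", "<demo>", "eval")  # equality test: no warning expected

print(any(issubclass(w.category, SyntaxWarning) for w in caught))  # True on 3.8+
```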

Multiple assignment and identity for integers in python [duplicate]

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So, integers 256 are identical, but 257 are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the referent implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would a cost in the lookup), perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (will work in Python 2 and 3) e.g.
SENTINEL_SINGLETON = object() # this will only be created one time.
def foo(keyword_argument=None):
if keyword_argument is None:
print('no argument given to foo')
bar()
bar(keyword_argument)
bar('baz')
def bar(keyword_argument=SENTINEL_SINGLETON):
# SENTINEL_SINGLETON tells us if we were not passed anything
# as None is a legitimate potential argument we could get.
if keyword_argument is SENTINEL_SINGLETON:
print('no argument given to bar')
else:
print('argument to bar: {0}'.format(keyword_argument))
foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations
    CHECK_SMALL_INT(ival);
    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }
    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
    // ... multi-digit case omitted ...
}
Now, we're no C master-code-haxxorz, but we're also not dumb: we can see CHECK_SMALL_INT(ival); peeking at us all seductively, and we can guess it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
return get_small_int((sdigit)ival); \
} while(0)
So it's a macro that calls the function get_small_int, and returns its result from PyLong_FromLong, if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
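If you want to see that range from Python, here is a rough probe, assuming CPython (other implementations may answer differently). It builds each int at run time with n + 0, so the compiler cannot merge the two operands into one constant:

```python
def small_int_cache_range(lo=-20, hi=300):
    """Return (smallest, largest) n in [lo, hi) for which two separately
    computed ints with value n turn out to be the same object."""
    shared = []
    for n in range(lo, hi):
        # n + 0 is evaluated at run time, so the compiler cannot merge the two
        # operands into one constant; both results are alive during the check.
        if (n + 0) is (n + 0):
            shared.append(n)
    return (min(shared), max(shared)) if shared else None

# On CPython this should print (-5, 256), matching NSMALLNEGINTS and NSMALLPOSINTS.
print(small_int_cache_range())
```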
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay: declare a PyObject pointer, assert that the previous condition holds, and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along:
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [-NSMALLNEGINTS, NSMALLPOSINTS), you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
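To make the mechanism concrete, here is a toy Python model of the small_ints table. ToyInt, SMALL_INTS and toy_from_long are hypothetical stand-ins invented for illustration, not CPython APIs; only the range check and the ival + NSMALLNEGINTS index arithmetic mirror the C code quoted above:

```python
NSMALLNEGINTS = 5    # same values as the C macros quoted above
NSMALLPOSINTS = 257

class ToyInt:
    """Hypothetical stand-in for PyLongObject, used only for illustration."""
    __slots__ = ('value',)
    def __init__(self, value):
        self.value = value

# Preallocated once, like the C array small_ints[NSMALLNEGINTS + NSMALLPOSINTS].
SMALL_INTS = [ToyInt(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]

def toy_from_long(ival):
    """Toy analogue of PyLong_FromLong: shared object for small values, new one otherwise."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:
        # Same index arithmetic as get_small_int: shift by NSMALLNEGINTS.
        return SMALL_INTS[ival + NSMALLNEGINTS]
    return ToyInt(ival)

print(toy_from_long(256) is toy_from_long(256))  # True: same preallocated slot
print(toy_from_long(257) is toy_from_long(257))  # False: two fresh objects
print(toy_from_long(-5).value, len(SMALL_INTS))  # -5 262
```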
But, when are they allocated??
During initialization, in _PyLong_Init, Python will gladly enter a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So is will return True.
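The same experiment as a script, as a sketch. The SyntaxWarning that Python 3.8+ raises for is with a literal is silenced here only so the output stays clean:

```python
import warnings

with warnings.catch_warnings():
    # Python 3.8+ emits a SyntaxWarning for `is` with a literal; silence it for this demo.
    warnings.simplefilter('ignore', SyntaxWarning)
    code_obj = compile('257 is 257', '<demo>', 'eval')

# A single constant-table entry serves both operands of the comparison.
print(code_obj.co_consts.count(257))
print(eval(code_obj))  # True: both operands load that one object
```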
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks whether they are the same object, not just equal. The small ints probably point to the same memory location for space efficiency.
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
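As a sketch of that, here is a hypothetical value class, Money, whose instances compare equal by value while remaining distinct objects:

```python
class Money:
    """Hypothetical value type: equal when amount and currency match."""
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # let Python try the other operand
        return (self.amount, self.currency) == (other.amount, other.currency)

    def __ne__(self, other):
        # Python 3 derives != from __eq__ automatically; Python 2 needs this.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

x = Money(5, 'EUR')
y = Money(5, 'EUR')
print(x == y, x != y, x is y)  # True False False
```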
As you can check in the source file intobject.c (Python 2), Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so each 257 is created as a separate object.
It is better to use == for that purpose.
I think your hypothesis is correct. Experiment with id (the identity of an object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that small numbers such as 255 are shared, while larger ones such as 257 are treated differently!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
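You can mimic the module-versus-REPL difference with compile() and exec(). In this sketch, one code object plays the role of a module and two separate code objects play the role of two REPL lines; only the first identity result is something CPython reliably does:

```python
# One code object: both names load the same merged 257 constant.
one_unit = {}
exec(compile('c = 257\nd = 257', '<one unit>', 'exec'), one_unit)
print(one_unit['c'] is one_unit['d'])    # True

# Two code objects, like two separate REPL lines: each carries its own 257 constant.
two_units = {}
exec(compile('c = 257', '<first line>', 'exec'), two_units)
exec(compile('d = 257', '<second line>', 'exec'), two_units)
print(two_units['c'] == two_units['d'])  # True: always equal in value
print(two_units['c'] is two_units['d'])  # typically False on CPython, but not guaranteed
```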
You can see what's going on by disassembling the bytecode. Try defining a function that does i, j = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128):
>>> def f(): i, j = 128, 128
>>> dis.dis(f)
1 0 LOAD_CONST 2 ((128, 128))
2 UNPACK_SEQUENCE 2
4 STORE_FAST 0 (i)
6 STORE_FAST 1 (j)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1]), id(f.__code__.co_consts[2][0]), id(f.__code__.co_consts[2][1])
(4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. At least in the CPython version used here, that also means (non-empty) tuples don't actually end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts: there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that holds the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike the compiler's constant merging, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
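If you really need shared string objects, sys.intern makes the interning explicit. A minimal sketch, building the strings at run time so the compiler cannot merge them as literals:

```python
import sys

# Build equal strings at run time so the compiler cannot merge them as constants.
parts = ['hello', 'world', '!' * 3]
s1 = ' '.join(parts)
s2 = ' '.join(parts)
print(s1 == s2)  # True: equal values
print(s1 is s2)  # usually False on CPython (two separate objects), not guaranteed

# sys.intern returns the canonical copy from the interpreter's intern table.
t1 = sys.intern(s1)
t2 = sys.intern(s2)
print(t1 is t2)  # True: both calls hand back the same interned object
```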
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for documented singletons (like None) or for values that are only created in one place in the code (like the _sentinel = object() idiom).
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity comparison operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons, some small integers happen to be memoized, so they will tend to be the same (this can be done since they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas' comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:
class Unequal:
    def __eq__(self, other):
        return False
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
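To see the difference concretely, here is a sketch with a hypothetical php_like_strict_equal() helper that implements the x == y and type(x) == type(y) reading of ===, compared against is:

```python
class Unequal:
    def __eq__(self, other):
        return False

def php_like_strict_equal(x, y):
    """Hypothetical helper: the 'equal value and equal type' reading of PHP's ===."""
    return x == y and type(x) == type(y)

u = Unequal()
print(php_like_strict_equal(u, u))  # False: __eq__ always says no
print(u is u)                       # True: it is the same object
print(php_like_strict_equal(1000, 1000), php_like_strict_equal(1, 1.0))  # True False
```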
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class numbers.Number, but it has the same problem:
import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)
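By the way, NumPy will produce separate instances of low numbers. (Newer NumPy releases do register their scalar types with the numbers ABCs, so the first assertion above fails there.)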
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
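For the record, here is roughly what the numbers.Number approach looks like; same_or_equal_number() is a hypothetical helper, and it inherits the completeness problem described above:

```python
import numbers

def same_or_equal_number(a, b):
    """Hypothetical helper for the OP's variant: value equality for numbers,
    object identity for everything else.

    Caveat: it relies on numbers.Number, so numeric types that are not
    registered with that ABC silently fall back to the identity test.
    """
    if isinstance(a, numbers.Number) and isinstance(b, numbers.Number):
        return a == b
    return a is b

big = 10 ** 6
print(same_or_equal_number(big, int('1000000')))  # True: numbers compare by value
print(same_or_equal_number([1], [1]))             # False: two distinct lists
obj = object()
print(same_or_equal_number(obj, obj))             # True: the very same object
```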
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and
is not) are used with certain types of literals (e.g. strings, ints).
These can often work by accident in CPython, but are not guaranteed by
the language spec. The warning advises users to use equality tests (==
and !=) instead.
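You can watch the compiler produce that warning without running the code. A sketch, assuming Python 3.8 or later (the exact message text varies between versions):

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')         # record every warning instead of hiding it
    compile('x is 1000', '<demo>', 'eval')  # compiled only, never executed

for w in caught:
    print(w.category.__name__, w.message)
```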

why is python's 'is not' inconsistent? [duplicate]

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So both 256s are the same object, but the two 257s are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.
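The lifetime caveat matters in practice. A small sketch: ids are only guaranteed to be distinct for objects that are alive at the same time:

```python
x = []
y = []
# Both lists are alive at this point, so their ids must differ.
print(id(x) != id(y))  # True

# Here the first temporary list can be freed before the second one exists,
# so CPython may reuse the same address; don't rely on either outcome.
print(id([]) == id([]))
```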
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or
is not, never the equality operators.
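One reason PEP8 insists on is for None: == can be overridden, while is cannot. A sketch with a deliberately odd class:

```python
class AlwaysEqual:
    """Deliberately odd class whose __eq__ claims to equal everything."""
    def __eq__(self, other):
        return True

weird = AlwaysEqual()
print(weird == None)  # True: the overridden __eq__ is consulted
print(weird is None)  # False: identity cannot be overridden
```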
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not create redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (will work in Python 2 and 3) e.g.
SENTINEL_SINGLETON = object() # this will only be created one time.
def foo(keyword_argument=None):
if keyword_argument is None:
print('no argument given to foo')
bar()
bar(keyword_argument)
bar('baz')
def bar(keyword_argument=SENTINEL_SINGLETON):
# SENTINEL_SINGLETON tells us if we were not passed anything
# as None is a legitimate potential argument we could get.
if keyword_argument is SENTINEL_SINGLETON:
print('no argument given to bar')
else:
print('argument to bar: {0}'.format(keyword_argument))
foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
// omitting declarations
CHECK_SMALL_INT(ival);
if (ival < 0) {
/* negate: cant write this as abs_ival = -ival since that
invokes undefined behaviour when ival is LONG_MIN */
abs_ival = 0U-(unsigned long)ival;
sign = -1;
}
else {
abs_ival = (unsigned long)ival;
}
/* Fast path for single-digit ints */
if (!(abs_ival >> PyLong_SHIFT)) {
v = _PyLong_New(1);
if (v) {
Py_SIZE(v) = sign;
v->ob_digit[0] = Py_SAFE_DOWNCAST(
abs_ival, unsigned long, digit);
}
return (PyObject*)v;
}
Now, we're no C master-code-haxxorz but we're also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
return get_small_int((sdigit)ival); \
} while(0)
So it's a macro that calls function get_small_int if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've know all along!:
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
But, when are they allocated??
During initialization in _PyLong_Init Python will gladly enter in a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intented).
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During complilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So is will return True.
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__, and __ne__ attributes.
As you can check in source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring the cached small integer, not a new object. 257 is not an small integer, so it is calculated as a different object.
It is better to use == for that purpose.
I think your hypotheses is correct. Experiment with id (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that numbers <= 255 are treated as literals and anything above is treated differently!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
You can see what's going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128)
>>> def f(): i, j = 258, 258
>>> dis.dis(f)
1 0 LOAD_CONST 2 ((128, 128))
2 UNPACK_SEQUENCE 2
4 STORE_FAST 0 (i)
6 STORE_FAST 1 (j)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1], f.__code__.co_consts[2][0], f.__code__.co_consts[2][1])
4305296480, 4305296480, 4305296480
You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. Which means that (non-empty) tuples actually don't end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts—there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for the documented singletons (like None) or that are only created in one place in the code (like the _sentinel = object() idiom).
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity equality operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas' comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:
class Unequal:
def __eq__(self, other):
return False
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class numbers.Number, but it has the same problem:
import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)
By the way, NumPy will produce separate instances of low numbers.
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
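If you genuinely need equal strings to be the same object (for example, to save memory on many repeated keys), you can ask for it explicitly with sys.intern instead of relying on CPython's automatic interning. A minimal sketch; building the strings with "".join is just a way to create them at runtime rather than as literals.

```python
import sys

parts = ["somestrdaasd ad ad asd as ", "dasddsg,dlfg ,;dflg, dfg a"]
s1 = "".join(parts)  # built at runtime: not interned automatically
b1 = "".join(parts)  # an equal string, usually a separate object

print(s1 == b1)  # equality holds no matter how the strings were created

# sys.intern returns the single canonical copy of an equal string,
# so the interned results are guaranteed to be the same object.
s1 = sys.intern(s1)
b1 = sys.intern(b1)
print(s1 is b1)
```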
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and is not) are used with certain types of literals (e.g. strings, ints). These can often work by accident in CPython, but are not guaranteed by the language spec. The warning advises users to use equality tests (== and !=) instead.
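On Python 3.8 or later you can watch the compiler emit this warning by compiling a snippet and recording warnings, without putting the offending comparison in your own module. A minimal sketch; the snippets and the helper name literal_is_warnings are made up for illustration.

```python
import sys
import warnings


def literal_is_warnings(source):
    """Compile `source` and return the SyntaxWarnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<snippet>", "eval")  # compiled only, never executed
    return [w for w in caught if issubclass(w.category, SyntaxWarning)]


if sys.version_info >= (3, 8):
    print(bool(literal_is_warnings("x is 1000")))  # True: `is` with an int literal
    print(bool(literal_is_warnings("x == 1000")))  # False: equality is fine
    print(bool(literal_is_warnings("x is None")))  # False: None is a singleton
```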

Why is the memory location of two variables the same for different integer variables declared separately in Python? [duplicate]

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So the two 256s are identical, but the two 257s are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
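You can observe the cache from Python by creating the integers at runtime (here with int(str(n))), so that the compiler's constant merging, described in other answers, cannot hide the effect. A minimal sketch that only describes CPython; shares_object is a hypothetical helper, and other implementations may answer differently.

```python
import platform


def shares_object(n):
    """Return True if two ints equal to n, created at runtime, are one object."""
    first = int(str(n))   # parsed at runtime, not loaded from co_consts
    second = int(str(n))  # a second, independent conversion
    return first is second


print(platform.python_implementation())
for n in (-6, -5, 0, 256, 257, 1000):
    # In CPython, expect True only for -5 through 256.
    print(n, shares_object(n))
```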
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.
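A small sketch of why comparing ids is only a faithful stand-in for is while both objects are alive: with temporaries, CPython may reuse the memory of a freed object, so two different objects can report the same id one after the other. The names below are just for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal value, but a separate list object
c = a           # a second name for the same object

# While a, b and c are all alive, `is` and comparing ids agree.
print(a is b, id(a) == id(b))   # False False
print(a is c, id(a) == id(c))   # True True

# With temporaries the first list may be freed before the second is
# created, so this can print True in CPython even though the two lists
# were never the same object. Don't rely on either result.
print(id([]) == id([]))
```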
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks whether the ids of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to, say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (it will work in Python 2 and 3):
SENTINEL_SINGLETON = object() # this will only be created one time.
def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')
def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))
foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late, but do you want some source with your answer? I'll try to word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations
    CHECK_SMALL_INT(ival);
    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }
    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
    // ... rest of the function omitted
}
Now, we're no C master-code-haxxorz, but we're also not dumb: we can see CHECK_SMALL_INT(ival); peeking at us all seductively, and we can tell it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)
So it's a macro that calls function get_small_int if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a PyObject pointer, assert that the previous condition holds, and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along!
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [-NSMALLNEGINTS, NSMALLPOSINTS), you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
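To make the C above concrete, here is a toy Python model of the same idea: a table preallocated for the range [-NSMALLNEGINTS, NSMALLPOSINTS), indexed with ival + NSMALLNEGINTS, and a fresh object for everything else. It only mirrors the logic of CHECK_SMALL_INT and get_small_int; Box and from_long are hypothetical names, and this is not how Python-level ints are actually created.

```python
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257


class Box:
    """Stand-in for a PyLongObject: it just wraps a value."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


# Preallocated once, like small_ints[NSMALLNEGINTS + NSMALLPOSINTS] in C.
small_ints = [Box(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]


def from_long(ival):
    """Toy version of PyLong_FromLong's small-int shortcut."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:   # CHECK_SMALL_INT
        return small_ints[ival + NSMALLNEGINTS]  # get_small_int
    return Box(ival)                             # otherwise a brand-new object


print(len(small_ints))                            # 262 slots: -5 .. 256
print(from_long(256) is from_long(256))           # True: same preallocated Box
print(from_long(257) is from_long(257))           # False: two fresh Boxes
print(small_ints[0].value, small_ints[-1].value)  # -5 256
```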
But, when are they allocated??
During initialization, in _PyLong_Init, Python will gladly enter a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So is will return True.
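The same check can be scripted without an is against a literal (which Python 3.8+ warns about): compile an expression that uses 257 twice and confirm that both loads point at a single entry in co_consts. A minimal sketch; f(257, 257) is only compiled, never run, so f need not exist, and exact opcode names and offsets differ between CPython versions.

```python
import dis

code = compile("f(257, 257)", "<demo>", "eval")
print(code.co_consts)   # 257 appears once, shared by both arguments

# Collect the constant-table index used by every load of 257.
indexes = [
    ins.arg
    for ins in dis.get_instructions(code)
    if ins.opname.startswith("LOAD_CONST") and ins.argval == 257
]
print(indexes)          # the same index, twice
```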
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency.
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
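A minimal sketch of that advice: a hypothetical Point class defines __eq__ so == compares values, while is still distinguishes the two instances. In Python 3, __ne__ defaults to the inverse of __eq__, so defining __eq__ alone is usually enough.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    # Defining __eq__ sets __hash__ to None; restore hashing consistent with ==.
    def __hash__(self):
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)
print(p == q, p != q)  # True False: compared by value
print(p is q)          # False: still two separate objects
```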
As you can check in the source file intobject.c (Python 2; in Python 3 the code lives in longobject.c), Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is calculated as a different object.
It is better to use == for that purpose.
I think your hypothesis is correct. Experiment with id (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that small numbers (up to 256 in CPython, as the other answers explain) are shared, and anything above is treated differently!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
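A small CPython-only sketch of those pre-created objects: values built at runtime, rather than written as literals, still come back as the shared instances. This relies on the implementation details described above, so treat it as an observation about CPython, not a guarantee.

```python
import platform

# Built at runtime rather than written as literals.
empty_a = tuple([])
empty_b = tuple()
print(empty_a is empty_b)   # CPython: True, one shared empty tuple

empty_s = "".join([])
print(empty_s is str())     # CPython: True, one shared empty str

# Single-character Latin-1 strings come from a small cache as well.
print(chr(97) is chr(97))   # CPython: True

print(platform.python_implementation())
```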
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
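A minimal sketch of the "compilation unit" point, using compile and exec to stand in for a module file versus REPL lines. Only the merged case is asserted, since whether separately compiled constants end up distinct is itself an implementation detail.

```python
module_source = "c = 257\nd = 257\n"
namespace = {}
exec(compile(module_source, "<module>", "exec"), namespace)
print(namespace["c"] is namespace["d"])   # True in CPython: one merged constant

# Compiling each statement separately mimics typing them at the REPL.
repl_ns = {}
exec(compile("c = 257", "<line 1>", "exec"), repl_ns)
exec(compile("d = 257", "<line 2>", "exec"), repl_ns)
print(repl_ns["c"] is repl_ns["d"])       # typically False in CPython, not guaranteed
```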
You can see what's going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128)
>>> import dis
>>> def f(): i, j = 128, 128
>>> dis.dis(f)
1 0 LOAD_CONST 2 ((128, 128))
2 UNPACK_SEQUENCE 2
4 STORE_FAST 0 (i)
6 STORE_FAST 1 (j)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> consts = f.__code__.co_consts
>>> id(consts[1]), id(consts[2][0]), id(consts[2][1])
(4305296480, 4305296480, 4305296480)
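You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. Which means that (non-empty) tuples actually don't end up merged, at least in the CPython version used here (newer versions also merge equal nested constant tuples, so this may print True there):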
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts—there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for the documented singletons (like None) or for values that are only created in one place in the code (like the _sentinel = object() idiom).
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity equality operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y), as per Paulo Freitas' comment. This will suffice for common numbers, but it differs from is for classes that define __eq__ in an absurd manner:
class Unequal:
    def __eq__(self, other):
        return False
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. In Python 2, the types module contains a StringTypes tuple but no NumberTypes. Since Python 2.6, the built-in number classes are registered with the abstract base class numbers.Number, but it has the same problem:
import numpy, numbers
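assert not issubclass(numpy.int16,numbers.Number)  # held for the NumPy of the time; recent NumPy registers its scalar types with the numbers ABCs, so this now fails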
assert issubclass(int,numbers.Number)
By the way, NumPy will produce separate instances of low numbers.
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and is not) are used with certain types of literals (e.g. strings, ints). These can often work by accident in CPython, but are not guaranteed by the language spec. The warning advises users to use equality tests (== and !=) instead.

Weird thing in python3 about shared reference [duplicate]

Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False
Here's what I found in the documentation for "Plain Integer Objects":
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
So the two 256s are identical, but the two 257s are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
Python's “is” operator behaves unexpectedly with integers?
In summary - let me emphasize: Do not use is to compare integers.
This isn't behavior you should have any expectations about.
Instead, use == and != to compare for equality and inequality, respectively. For example:
>>> a = 1000
>>> a == 1000 # Test integers like this,
True
>>> a != 5000 # or this!
True
>>> a is 1000 # Don't do this! - Don't use `is` to test integers!!
False
Explanation
To know this, you need to know the following.
First, what does is do? It is a comparison operator. From the documentation:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.
And so the following are equivalent.
>>> a is b
>>> id(a) == id(b)
From the documentation:
id
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.
So what is the use-case for is? PEP8 describes:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
The Question
You ask, and state, the following question (with code):
Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.
But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.
>>> 257 is 257
True # Yet the literal numbers compare properly
Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.
It might be even better if CPython could do this globally, if it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.
But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.
What is does
is checks whether the ids of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:
>>> a is b
is the same as
>>> id(a) == id(b)
Why would we want to use is then?
This can be a very fast check relative to, say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (it will work in Python 2 and 3):
SENTINEL_SINGLETON = object() # this will only be created one time.
def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')
def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))
foo()
Which prints:
no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz
And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late, but do you want some source with your answer? I'll try to word this in an introductory manner so more folks can follow along.
A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.
In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
(My italics)
Don't know about you but I see this and think: Let's find that array!
If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.
PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:
PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations
    CHECK_SMALL_INT(ival);
    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }
    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }
    // ... rest of the function omitted
}
Now, we're no C master-code-haxxorz, but we're also not dumb: we can see CHECK_SMALL_INT(ival); peeking at us all seductively, and we can tell it has something to do with this. Let's check it out:
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)
So it's a macro that calls function get_small_int if the value ival satisfies the condition:
if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)
So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
So our condition is if (-5 <= ival && ival < 257) call get_small_int.
Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):
PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);
Okay, declare a PyObject pointer, assert that the previous condition holds, and execute the assignment:
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along!
/* Small integers are preallocated in this array so that they
can be shared.
The integers that are preallocated are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
So yup, this is our guy. When you want to create a new int in the range [-NSMALLNEGINTS, NSMALLPOSINTS), you'll just get back a reference to an already existing object that has been preallocated.
Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
But, when are they allocated??
During initialization, in _PyLong_Init, Python will gladly enter a for loop to do this for you:
for (ival = -NSMALLNEGINTS; ival < NSMALLPOSINTS; ival++, v++) {
Check out the source to read the loop body!
I hope my explanation has made you C things clearly now (pun obviously intended).
But, 257 is 257? What's up?
This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:
>>> 257 is 257
During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:
>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)
When CPython does the operation, it's now just going to load the exact same object:
>>> import dis
>>> dis.dis(codeObj)
1 0 LOAD_CONST 0 (257) # dis
3 LOAD_CONST 0 (257) # dis again
6 COMPARE_OP 8 (is)
So is will return True.
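The same check can be done programmatically with dis.get_instructions. A minimal sketch: it compiles two assignments rather than 257 is 257, because Python 3.8+ emits a SyntaxWarning for is against a literal. The opcodes surrounding the loads vary between CPython versions, so only the LOAD_CONST entries that load 257 are inspected.

```python
import dis

code = compile("a = 257\nb = 257", "<demo>", "exec")

# Collect the constant-table index used by every load of 257.
indices = [
    instr.arg
    for instr in dis.get_instructions(code)
    if instr.opname.startswith("LOAD_CONST") and instr.argval == 257
]

print(indices)                    # both loads point at the same slot
print(code.co_consts.count(257))  # 257 is stored once in the table
```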
It depends on whether you're looking to see if 2 things are equal, or the same object.
is checks whether they are the same object, not just whether they are equal. In CPython, small ints are cached, so equal small ints point to the same memory location for space efficiency:
In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144
You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
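A short sketch of the difference, using a hypothetical Point class whose __eq__ compares coordinates: two instances built separately are equal but not identical.

```python
class Point:
    """Hypothetical value type: equality is defined by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # Python 3 derives != from __eq__, so __ne__ is only needed for
    # unusual behaviour. Defining __eq__ sets __hash__ to None, so
    # hashing is restored explicitly here.
    def __hash__(self):
        return hash((self.x, self.y))

p = Point(1, 2)
q = Point(1, 2)
print(p == q)  # same value
print(p is q)  # two separate objects
```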
As you can check in the source file intobject.c (longobject.c in Python 3), Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is created as a different object.
It is better to use == for that purpose.
I think your hypothesis is correct. Experiment with id (identity of object):
In [1]: id(255)
Out[1]: 146349024
In [2]: id(255)
Out[2]: 146349024
In [3]: id(257)
Out[3]: 146802752
In [4]: id(257)
Out[4]: 148993740
In [5]: a=255
In [6]: b=255
In [7]: c=257
In [8]: d=257
In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)
It appears that small numbers like 255 are shared while 257 is treated differently! (In CPython the cache actually runs from -5 through 256, so the cutoff sits just above 256.)
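Rather than inferring the boundary from two samples, you can probe it. A rough sketch for CPython, where is_cached and cached_range are hypothetical helpers: each value is built twice from a string and the two results are compared with is. On CPython 3 this should report (-5, 256).

```python
def is_cached(n):
    # Two independent run-time constructions; they can only be the same
    # object if the interpreter hands out a preallocated one.
    return int(str(n)) is int(str(n))

def cached_range(limit=1000):
    # Walk outward from 0 until identity stops holding.
    lo = 0
    while lo - 1 > -limit and is_cached(lo - 1):
        lo -= 1
    hi = 0
    while hi + 1 < limit and is_cached(hi + 1):
        hi += 1
    return lo, hi

print(cached_range())
```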
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
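More of these pre-created objects show up if you build the values at run time instead of writing literals. This relies on CPython implementation details (the empty singletons and the single-character Latin-1 cache mentioned above) and may not hold on other interpreters.

```python
# Each call constructs its value at run time, so any identity observed
# comes from CPython's pre-created objects, not from constant merging.
print(tuple() is tuple())    # empty tuple singleton
print(str() is str())        # empty str singleton
print(bytes() is bytes())    # empty bytes singleton
print(chr(97) is chr(97))    # 'a' from the single-character cache
print(chr(233) is chr(233))  # U+00E9 is Latin-1, so it is cached too
```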
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
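The compilation-unit rule can be reproduced outside the REPL by handing source strings to exec(). A minimal sketch of CPython behaviour, not a language guarantee: one exec() call compiles both assignments together, so both names get the single merged float constant, while two exec() calls compile the statements as separate units.

```python
together = {}
exec("g = 42.23e100\nh = 42.23e100", together)  # one compilation unit
print(together["g"] is together["h"])            # merged constant

apart = {}
exec("g = 42.23e100", apart)  # first unit
exec("h = 42.23e100", apart)  # second, independent unit
print(apart["g"] is apart["h"])  # typically False: two separate objects
```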
You can see what's going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128)
>>> def f(): i, j = 128, 128
>>> dis.dis(f)
1 0 LOAD_CONST 2 ((128, 128))
2 UNPACK_SEQUENCE 2
4 STORE_FAST 0 (i)
6 STORE_FAST 1 (j)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1]), id(f.__code__.co_consts[2][0]), id(f.__code__.co_consts[2][1])
(4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. That limited optimization also means that (non-empty) tuples don't actually end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts: there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that holds the two distinct equal tuples.
There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
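Interning can also be requested explicitly with sys.intern(), which returns the canonical copy of an equal string. A minimal sketch; the strings are built with str.join so they are not compile-time constants and, containing spaces, would not be interned automatically.

```python
import sys

left = " ".join(["some", "long", "string", "with spaces"])
right = " ".join(["some", "long", "string", "with spaces"])

print(left == right)  # equal values
print(left is right)  # typically False: join builds two new objects

# sys.intern maps equal strings to one shared object.
print(sys.intern(left) is sys.intern(right))
```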
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:
Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for the documented singletons (like None) or for objects that are only created in one place in the code (like the _sentinel = object() idiom).
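For reference, a minimal sketch of the _sentinel idiom: a private object() created once, so an is check against it cannot be confused with any value a caller passes in, including None. The get_setting function is hypothetical.

```python
_sentinel = object()  # created exactly once; nothing else can be this object

def get_setting(settings, key, default=_sentinel):
    """Hypothetical lookup that tells 'no default given' apart from None."""
    value = settings.get(key, _sentinel)
    if value is not _sentinel:
        return value
    if default is _sentinel:
        raise KeyError(key)
    return default

config = {"timeout": None}
print(get_setting(config, "timeout"))     # the stored None
print(get_setting(config, "retries", 3))  # the explicit default
```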
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
is is the identity equality operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).
PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas' comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:
class Unequal:
    def __eq__(self, other):
        return False
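To make the contrast concrete, here is the Unequal class next to a hypothetical php_identical helper that implements the x == y and type(x) == type(y) reading of ===. An object is always identical to itself, yet the PHP-style check rejects it.

```python
class Unequal:
    def __eq__(self, other):
        return False  # deliberately never equal, not even to itself

def php_identical(x, y):
    """Hypothetical emulation of PHP's === as described above."""
    return x == y and type(x) == type(y)

u = Unequal()
print(u is u)                 # identity always holds
print(php_identical(u, u))    # False, because __eq__ says so
print(php_identical(1, 1.0))  # False: equal values, different types
```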
PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.
We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built-in number classes have a base class numbers.Number, but it has the same problem:
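import numpy, numbers
# Held for the NumPy of the time; newer NumPy registers its scalar types
# with the numbers ABCs, so this first assert now fails.
assert not issubclass(numpy.int16, numbers.Number)
assert issubclass(int, numbers.Number)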
By the way, NumPy will produce separate instances of low numbers.
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.
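Being less particular, one workable compromise is to treat anything registered with numbers.Number as a number and fall back to identity otherwise. A hedged sketch with a hypothetical helper name; any numeric type that is not registered with the numbers ABCs will silently be compared by identity instead.

```python
import numbers
from fractions import Fraction

def same_or_equal(x, y):
    """Hypothetical: value equality for numbers, identity for everything else."""
    if isinstance(x, numbers.Number) and isinstance(y, numbers.Number):
        return x == y
    return x is y

print(same_or_equal(int("1000"), int("1000")))  # compared by value
print(same_or_equal(Fraction(1, 2), 0.5))       # numeric equality across types
print(same_or_equal([1], [1]))                  # two distinct lists
```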
In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
Now everything seems fine.
>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)
That's expected too.
>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)
>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)
Now that's unexpected.
What’s New In Python 3.8: Changes in Python behavior:
The compiler now produces a SyntaxWarning when identity checks (is and
is not) are used with certain types of literals (e.g. strings, ints).
These can often work by accident in CPython, but are not guaranteed by
the language spec. The warning advises users to use equality tests (==
and !=) instead.
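You can observe this warning outside the REPL by compiling a snippet while recording warnings. A minimal sketch, assuming Python 3.8 or later; the message wording differs between versions, so only the warning category is checked.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")         # do not let the filter hide it
    compile("x is 1000", "<demo>", "eval")  # identity test against an int literal

print([w.category.__name__ for w in caught])

with warnings.catch_warnings(record=True) as clean:
    warnings.simplefilter("always")
    compile("x is None", "<demo>", "eval")  # None is a singleton: no warning

print(len(clean))
```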
