understanding python id() uniqueness - python

Python documentation for id() function states the following:
This is an integer which is guaranteed to be unique and constant for
this object during its lifetime. Two objects with non-overlapping
lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
Although, the snippet below shows that id's are repeated. Since I didn't explicitly del the objects, I presume they are all alive and unique (I do not know what non-overlapping means).
>>> g = [0, 1, 0]
>>> for h in g:
... print(h, id(h))
...
0 10915712
1 10915744
0 10915712
>>> a=0
>>> b=1
>>> c=0
>>> d=[a, b,c]
>>> for e in d:
... print(e, id(e))
...
0 10915712
1 10915744
0 10915712
>>> id(a)
10915712
>>> id(b)
10915744
>>> id(c)
10915712
>>>
How can the id values for different objects be the same? Is it so because the value 0 (object of class int) is a constant and the interpreter/C compiler optimizes?
If I were to do a = c, then I understand c to have the same id as a since c would just be a reference to a (alias). I expected the objects a and c to have different id values otherwise, but, as shown above, they have the same values.
What's happening? Or am I looking at this the wrong way?
I would expect the id's for user-defined class' objects to ALWAYS be unique even if they have the exact same member values.
Could someone explain this behavior? (I looked at the other questions that ask uses of id(), but they steer in other directions)
EDIT (09/30/2019):
TO extend what I already wrote, I ran python interpreters in separate terminals and checked the id's for 0 on all of them, they were exactly the same (for the same interpreter); multiple instances of different interpreters had the same id for 0. Python2 vs Python3 had different values, but the same Python2 interpreter had same id values.
My question is because the id()'s documentation doesn't state any such optimizations, which seems misleading (I don't expect every quirk to be noted, but some note alongside the CPython note would be nice)...
EDIT 2 (09/30/2019):
The question is stemmed in understanding this behavior and knowing if there are any hooks to optimize user-define classes in a similar way (by modifying the __equals__ method to identify if two objects are same; perhaps the would point to the same address in memory i.e. same id? OR use some metaclass properties)

Ids are guaranteed to be unique for the lifetime of the object. If an object gets deleted, a new object can acquire the same id. CPython will delete items immediately when their refcount drops to zero. The garbage collector is only needed to break up reference cycles.
CPython may also cache and re-use certain immutable objects like small integers and strings defined by literals that are valid identifiers. This is an implementation detail that you should not rely upon. It is generally considered improper to use is checks on such objects.
There are certain exceptions to this rule, for example, using an is check on possibly-interned strings as an optimization before comparing them with the normal == operator is fine. The dict builtin uses this strategy for lookups to make them faster for identifiers.
a is b or a == b # This is OK
If the string happens to be interned, then the above can return true with a simple id comparison instead of a slower character-by-character comparison, but it still returns true if and only if a == b (because if a is b then a == b must also be true). However, a good implementation of .__eq__() would already do an is check internally, so at best you would only avoid the overhead of calling the .__eq__().
Thanks for the answer, would you elaborate around the uniqueness for user-defined objects, are they always unique?
The id of any object (be it user-defined or not) is unique for the lifetime of the object. It's important to distinguish objects from variables. It's possible to have two or more variables refer to the same object.
>>> a = object()
>>> b = a
>>> c = object()
>>> a is b
True
>>> a is c
False
Caching optimizations mean that you are not always guaranteed to get a new object in cases where one might naiively think one should, but this does not in any way violate the uniqueness guarantee of IDs. Builtin types like int and str may have some caching optimizations, but they follow exactly the same rules: If they are live at the same time, and their IDs are the same, then they are the same object.
Caching is not unique to builtin types. You can implement caching for your own objects.
>>> def the_one(it=object()):
... return it
...
>>> the_one() is the_one()
True
Even user-defined classes can cache instances. For example, this class only makes one instance of itself.
>>> class TheOne:
... _the_one = None
... def __new__(cls):
... if not cls._the_one:
... cls._the_one = super().__new__(cls)
... return cls._the_one
...
>>> TheOne() is TheOne() # There can be only one TheOne.
True
>>> id(TheOne()) == id(TheOne()) # This is what an is-check does.
True
Note that each construction expression evaluates to an object with the same id as the other. But this id is unique to the object. Both expressions reference the same object, so of course they have the same id.
The above class only keeps one instance, but you could also cache some other number. Perhaps recently used instances, or those configured in a way you expect to be common (as ints do), etc.

Related

Python identity operators [duplicate]

When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the is operator:
If the evaluation takes place in the function it returns True, if it is done outside it returns False.
>>> def func():
... a = 1000
... b = 1000
... return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)
Since the is operator evaluates the id()'s for the objects involved, this means that a and b point to the same int instance when declared inside of function func but, on the contrary, they point to a different object when outside of it.
Why is this so?
Note: I am aware of the difference between identity (is) and equality (==) operations as described in Understanding Python's "is" operator. In addition, I'm also aware about the caching that is being performed by python for the integers in range [-5, 256] as described in "is" operator behaves unexpectedly with integers.
This isn't the case here since the numbers are outside that range and I do want to evaluate identity and not equality.
tl;dr:
As the reference manual states:
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class definition.
Each command typed interactively is a block.
This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal
1000, so id(a) == id(b) will yield True.
In the second case, you have two distinct code objects each with their own different object for the literal 1000 so id(a) != id(b).
Take note that this behavior doesn't manifest with int literals only, you'll get similar results with, for example, float literals (see here).
Of course, comparing objects (except for explicit is None tests ) should always be done with the equality operator == and not is.
Everything stated here applies to the most popular implementation of Python, CPython. Other implementations might differ so no assumptions should be made when using them.
Longer Answer:
To get a little clearer view and additionally verify this seemingly odd behaviour we can look directly in the code objects for each of these cases using the dis module.
For the function func:
Along with all other attributes, function objects also have a __code__ attribute that allows you to peek into the compiled bytecode for that function. Using dis.code_info we can get a nice pretty view of all stored attributes in a code object for a given function:
>>> print(dis.code_info(func))
Name: func
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 2
Stack size: 2
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
1: 1000
Variable names:
0: a
1: b
We're only interested in the Constants entry for function func. In it, we can see that we have two values, None (always present) and 1000. We only have a single int instance that represents the constant 1000. This is the value that a and b are going to be assigned to when the function is invoked.
Accessing this value is easy via func.__code__.co_consts[1] and so, another way to view our a is b evaluation in the function would be like so:
>>> id(func.__code__.co_consts[1]) == id(func.__code__.co_consts[1])
Which, of course, will evaluate to True because we're referring to the same object.
For each interactive command:
As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.
We can get the code objects for each command via the compile built-in:
>>> com1 = compile("a=1000", filename="", mode="single")
>>> com2 = compile("b=1000", filename="", mode="single")
For each assignment statement, we will get a similar looking code object which looks like the following:
>>> print(dis.code_info(com1))
Name: <module>
Filename:
Argument count: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: NOFREE
Constants:
0: 1000
1: None
Names:
0: a
The same command for com2 looks the same but has a fundamental difference: each of the code objects com1 and com2 have different int instances representing the literal 1000. This is why, in this case, when we do a is b via the co_consts argument, we actually get:
>>> id(com1.co_consts[0]) == id(com2.co_consts[0])
False
Which agrees with what we actually got.
Different code objects, different contents.
Note: I was somewhat curious as to how exactly this happens in the source code and after digging through it I believe I finally found it.
During compilations phase the co_consts attribute is represented by a dictionary object. In compile.c we can actually see the initialization:
/* snippet for brevity */
u->u_lineno = 0;
u->u_col_offset = 0;
u->u_lineno_set = 0;
u->u_consts = PyDict_New();
/* snippet for brevity */
During compilation this is checked for already existing constants. See #Raymond Hettinger's answer below for a bit more on this.
Caveats:
Chained statements will evaluate to an identity check of True
It should be more clear now why exactly the following evaluates to True:
>>> a = 1000; b = 1000;
>>> a is b
In this case, by chaining the two assignment commands together we tell the interpreter to compile these together. As in the case for the function object, only one object for the literal 1000 will be created resulting in a True value when evaluated.
Execution on a module level yields True again:
As previously mentioned, the reference manual states that:
... The following are blocks: a module ...
So the same premise applies: we will have a single code object (for the module) and so, as a result, single values stored for each different literal.
The same doesn't apply for mutable objects:
Meaning that unless we explicitly initialize to the same mutable object (for example with a = b = []), the identity of the objects will never be equal, for example:
a = []; b = []
a is b # always evaluates to False
Again, in the documentation, this is specified:
after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists.
At the interactive prompt, entry are compiled in a single mode which processes one complete statement at a time. The compiler itself (in Python/compile.c) tracks the constants in a dictionary called u_consts that maps the constant object to its index.
In the compiler_add_o() function, you see that before adding a new constant (and incrementing the index), the dict is checked to see whether the constant object and index already exist. If so, they are reused.
In short, that means that repeated constants in one statement (such as in your function definition) are folded into one singleton. In contrast, your a = 1000 and b = 1000 are two separate statements, so no folding takes place.
FWIW, this is all just a CPython implementation detail (i.e. not guaranteed by the language). This is why the references given here are to the C source code rather than the language specification which makes no guarantees on the subject.
Hope you enjoyed this insight into how CPython works under the hood :-)

Python: why assigning a literal to multiple variables in a suite results in their pointing to the same address? [duplicate]

When playing around with the Python interpreter, I stumbled upon this conflicting case regarding the is operator:
If the evaluation takes place in the function it returns True, if it is done outside it returns False.
>>> def func():
... a = 1000
... b = 1000
... return a is b
...
>>> a = 1000
>>> b = 1000
>>> a is b, func()
(False, True)
Since the is operator evaluates the id()'s for the objects involved, this means that a and b point to the same int instance when declared inside of function func but, on the contrary, they point to a different object when outside of it.
Why is this so?
Note: I am aware of the difference between identity (is) and equality (==) operations as described in Understanding Python's "is" operator. In addition, I'm also aware about the caching that is being performed by python for the integers in range [-5, 256] as described in "is" operator behaves unexpectedly with integers.
This isn't the case here since the numbers are outside that range and I do want to evaluate identity and not equality.
tl;dr:
As the reference manual states:
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class definition.
Each command typed interactively is a block.
This is why, in the case of a function, you have a single code block which contains a single object for the numeric literal
1000, so id(a) == id(b) will yield True.
In the second case, you have two distinct code objects each with their own different object for the literal 1000 so id(a) != id(b).
Take note that this behavior doesn't manifest with int literals only, you'll get similar results with, for example, float literals (see here).
Of course, comparing objects (except for explicit is None tests ) should always be done with the equality operator == and not is.
Everything stated here applies to the most popular implementation of Python, CPython. Other implementations might differ so no assumptions should be made when using them.
Longer Answer:
To get a little clearer view and additionally verify this seemingly odd behaviour we can look directly in the code objects for each of these cases using the dis module.
For the function func:
Along with all other attributes, function objects also have a __code__ attribute that allows you to peek into the compiled bytecode for that function. Using dis.code_info we can get a nice pretty view of all stored attributes in a code object for a given function:
>>> print(dis.code_info(func))
Name: func
Filename: <stdin>
Argument count: 0
Kw-only arguments: 0
Number of locals: 2
Stack size: 2
Flags: OPTIMIZED, NEWLOCALS, NOFREE
Constants:
0: None
1: 1000
Variable names:
0: a
1: b
We're only interested in the Constants entry for function func. In it, we can see that we have two values, None (always present) and 1000. We only have a single int instance that represents the constant 1000. This is the value that a and b are going to be assigned to when the function is invoked.
Accessing this value is easy via func.__code__.co_consts[1] and so, another way to view our a is b evaluation in the function would be like so:
>>> id(func.__code__.co_consts[1]) == id(func.__code__.co_consts[1])
Which, of course, will evaluate to True because we're referring to the same object.
For each interactive command:
As noted previously, each interactive command is interpreted as a single code block: parsed, compiled and evaluated independently.
We can get the code objects for each command via the compile built-in:
>>> com1 = compile("a=1000", filename="", mode="single")
>>> com2 = compile("b=1000", filename="", mode="single")
For each assignment statement, we will get a similar looking code object which looks like the following:
>>> print(dis.code_info(com1))
Name: <module>
Filename:
Argument count: 0
Kw-only arguments: 0
Number of locals: 0
Stack size: 1
Flags: NOFREE
Constants:
0: 1000
1: None
Names:
0: a
The same command for com2 looks the same but has a fundamental difference: each of the code objects com1 and com2 have different int instances representing the literal 1000. This is why, in this case, when we do a is b via the co_consts argument, we actually get:
>>> id(com1.co_consts[0]) == id(com2.co_consts[0])
False
Which agrees with what we actually got.
Different code objects, different contents.
Note: I was somewhat curious as to how exactly this happens in the source code and after digging through it I believe I finally found it.
During compilations phase the co_consts attribute is represented by a dictionary object. In compile.c we can actually see the initialization:
/* snippet for brevity */
u->u_lineno = 0;
u->u_col_offset = 0;
u->u_lineno_set = 0;
u->u_consts = PyDict_New();
/* snippet for brevity */
During compilation this is checked for already existing constants. See #Raymond Hettinger's answer below for a bit more on this.
Caveats:
Chained statements will evaluate to an identity check of True
It should be more clear now why exactly the following evaluates to True:
>>> a = 1000; b = 1000;
>>> a is b
In this case, by chaining the two assignment commands together we tell the interpreter to compile these together. As in the case for the function object, only one object for the literal 1000 will be created resulting in a True value when evaluated.
Execution on a module level yields True again:
As previously mentioned, the reference manual states that:
... The following are blocks: a module ...
So the same premise applies: we will have a single code object (for the module) and so, as a result, single values stored for each different literal.
The same doesn't apply for mutable objects:
Meaning that unless we explicitly initialize to the same mutable object (for example with a = b = []), the identity of the objects will never be equal, for example:
a = []; b = []
a is b # always evaluates to False
Again, in the documentation, this is specified:
after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists.
At the interactive prompt, entry are compiled in a single mode which processes one complete statement at a time. The compiler itself (in Python/compile.c) tracks the constants in a dictionary called u_consts that maps the constant object to its index.
In the compiler_add_o() function, you see that before adding a new constant (and incrementing the index), the dict is checked to see whether the constant object and index already exist. If so, they are reused.
In short, that means that repeated constants in one statement (such as in your function definition) are folded into one singleton. In contrast, your a = 1000 and b = 1000 are two separate statements, so no folding takes place.
FWIW, this is all just a CPython implementation detail (i.e. not guaranteed by the language). This is why the references given here are to the C source code rather than the language specification which makes no guarantees on the subject.
Hope you enjoyed this insight into how CPython works under the hood :-)

Calculate a identifier for an object [duplicate]

This would be similar to the java.lang.Object.hashcode() method.
I need to store objects I have no control over in a set, and make sure that only if two objects are actually the same object (not contain the same values) will the values be overwritten.
id(x)
will do the trick for you. But I'm curious, what's wrong about the set of objects (which does combine objects by value)?
For your particular problem I would probably keep the set of ids or of wrapper objects. A wrapper object will contain one reference and compare by x==y <==> x.ref is y.ref.
It's also worth noting that Python objects have a hash function as well. This function is necessary to put an object into a set or dictionary. It is supposed to sometimes collide for different objects, though good implementations of hash try to make it less likely.
That's what "is" is for.
Instead of testing "if a == b", which tests for the same value,
test "if a is b", which will test for the same identifier.
As ilya n mentions, id(x) produces a unique identifier for an object.
But your question is confusing, since Java's hashCode method doesn't give a unique identifier. Java's hashCode works like most hash functions: it always returns the same value for the same object, two objects that are equal always get equal codes, and unequal hash values imply unequal hash codes. In particular, two different and unequal objects can get the same value.
This is confusing because cryptographic hash functions are quite different from this, and more like (though not exactly) the "unique id" that you asked for.
The Python equivalent of Java's hashCode method is hash(x).
You don't have to compare objects before placing them in a set. set() semantics already takes care of this.
class A(object):
a = 10
b = 20
def __hash__(self):
return hash((self.a, self.b))
a1 = A()
a2 = A()
a3 = A()
a4 = a1
s = set([a1,a2,a3,a4])
s
=> set([<__main__.A object at 0x222a8c>, <__main__.A object at 0x220684>, <__main__.A object at 0x22045c>])
Note: You really don't have to override hash to prove this behaviour :-)

What is an object reference in Python?

A introductory Python textbook defined 'object reference' as follows, but I didn't understand:
An object reference is nothing more than a concrete representation of the object’s identity (the memory address where the object is stored).
The textbook tried illustrating this by using an arrow to show an object reference as some sort of relation going from a variable a to an object 1234 in the assignment statement a = 1234.
From what I gathered off of Wikipedia, the (object) reference of a = 1234 would be an association between a and 1234 were a was "pointing" to 1234 (feel free to clarify "reference vs. pointer"), but it has been a bit difficult to verify as (1) I'm teaching myself Python, (2) many search results talk about references for Java, and (3) not many search results are about object references.
So, what is an object reference in Python? Thanks for the help!
Whatever is associated with a variable name has to be stored in the program's memory somewhere. An easy way to think of this, is that every byte of memory has an index-number. For simplicity's sake, lets imagine a simple computer, these index-numbers go from 0 (the first byte), upwards to however many bytes there are.
Say we have a sequence of 37 bytes, that a human might interpret as some words:
"The Owl and the Pussy-cat went to sea"
The computer is storing them in a contiguous block, starting at some index-position in memory. This index-position is most often called an "address". Obviously this address is absolutely just a number, the byte-number of the memory these letters are residing in.
#12000 The Owl and the Pussy-cat went to sea
So at address 12000 is a T, at 12001 an h, 12002 an e ... up to the last a at 12037.
I am labouring the point here because it's fundamental to every programming language. That 12000 is the "address" of this string. It's also a "reference" to it's location. For most intents and purposes an address is a pointer is a reference. Different languages have differing syntactic handling of these, but essentially they're the same thing - dealing with a block of data at a given number.
Python and Java try to hide this addressing as much as possible, where languages like C are quite happy to expose pointers for exactly what they are.
The take-away from this, is that an object reference is the number of where the data is stored in memory. (As is a pointer.)
Now, most programming languages distinguish between simple types: characters and numbers, and complex types: strings, lists and other compound-types. This is where the reference to an object makes a difference.
So when performing operations on simple types, they are independent, they each have their own memory for storage. Imagine the following sequence in python:
>>> a = 3
>>> b = a
>>> b
3
>>> b = 4
>>> b
4
>>> a
3 # <-- original has not changed
The variables a and b do not share the memory where their values are stored. But with a complex type:
>>> s = [ 1, 2, 3 ]
>>> t = s
>>> t
[1, 2, 3]
>>> t[1] = 8
>>> t
[1, 8, 3]
>>> s
[1, 8, 3] # <-- original HAS changed
We assigned t to be s, but obviously in this case t is s - they share the same memory. Wait, what! Here we have found out that both s and t are a reference to the same object - they simply share (point to) the same address in memory.
One place Python differs from other languages is that it considers strings as a simple type, and these are independent, so they behave like numbers:
>>> j = 'Pussycat'
>>> k = j
>>> k
'Pussycat'
>>> k = 'Owl'
>>> j
'Pussycat' # <-- Original has not changed
Whereas in C strings are definitely handled as complex types, and would behave like the Python list example.
The upshot of all this, is that when objects that are handled by reference are modified, all references-to this object "see" the change. So if the object is passed to a function that modifies it (i.e.: the content of memory holding the data is changed), the change is reflected outside that function too.
But if a simple type is changed, or passed to a function, it is copied to the function, so the changes are not seen in the original.
For example:
def fnA( my_list ):
my_list.append( 'A' )
a_list = [ 'B' ]
fnA( a_list )
print( str( a_list ) )
['B', 'A'] # <-- a_list was changed inside the function
But:
def fnB( number ):
number += 1
x = 3
fnB( x )
print( x )
3 # <-- x was NOT changed inside the function
So keeping in mind that the memory of "objects" that are used by reference is shared by all copies, and memory of simple types is not, it's fairly obvious that the two types operate differently.
Objects are things. Generally, they're what you see on the right hand side of an equation.
Variable names (often just called "names") are references to the actual object. When a name is on the right hand side of an equation1, the object that it references is automatically looked up and used in the equation. The result of the expression on the right hand side is an object. The name on the left hand side of the equation becomes a reference to this (possibly new) object.
Note, you can have object references that aren't explicit names if you are working with container objects (like lists or dictionaries):
a = [] # the name a is a reference to a list.
a.append(12345) # the container list holds a reference to an integer object
In a similar way, multiple names can refer to the same object:
a = []
b = a
We can demonstrate that they are the same object by looking at the id of a and b and noting that they are the same. Or, we can look at the "side-effects" of mutating the object referenced by a or b (if we mutate one, we mutate both because they reference the same object).
a.append(1)
print a, b # look mom, both are [1]!
1More accurately, when a name is used in an expression
In python, strictly speaking, the language has only naming references to the objects, that behave as labels. The assignment operator only binds to the name. The objects will stay in the memory until they are garbage collected
Ok, first things first.
Remember, there are two types of objects in python.
Mutable : Whose values can be changed. Eg: dictionaries, lists and user defined objects(unless defined immutable)
Immutable : Whose values can't be changed. Eg: tuples, numbers, booleans and strings.
Now, when python says PASS BY OBJECT REFERENECE, just remember that
If the underlying object is mutable, then any modifications done will persist.
and,
If the underlying object is immutable, then any modifications done will not persist.
If you still want examples for clarity, scroll down or click here .

Python: Identical strings (or numbers) with unique ids?

Python is wonderfully optimized, but I have a case where I'd like to work around it. It seems for small numbers and strings, python will automatically collapse multiple objects into one. For example:
>>> a = 1
>>> b = 1
>>> id(a) == id(b)
True
>>> a = str(a)
>>> b = str(b)
>>> id(a) == id(b)
True
>>> a += 'foobar'
>>> b += 'foobar'
>>> id(a) == id(b)
False
>>> a = a[:-6]
>>> b = b[:-6]
>>> id(a) == id(b)
True
I have a case where I'm comparing objects based on their Python ids. This is working really well except for the few cases where I run into small numbers. Does anyone know how to turn off this optimization for specific strings and integers? Something akin to an anti-intern()?
You shouldn't be relying on these objects to be different objects at all. There's no way to turn this behavior off without modifying and recompiling Python, and which particular objects it applies to is subject to change without notice.
You can't turn it off without re-compiling your own version of CPython.
But if you want to have "separate" versions of the same small integers, you can do that by maintaining your own id (for example a uuid4) associated with the object.
Since ints and strings are immutable, there's no obvious reason to do this - if you can't modify the object at all, you shouldn't care whether you have the "original" or a copy because there is no use-case where it can make any difference.
Related: How to create the int 1 at two different memory locations?
Sure, it can be done, but its never really a good idea:
#
Z =1
class MyString(string):
def __init__(self, *args):
global Z
super(MyString,
self).__init__(*args)
self.i = Z
Z += 1
>>> a = MyString("1")
>>> b = MyString("1")
>>> a is b
False
btw, to compare if objects have the same id just use a is b instead of id(a)==id(b)
The Python documentation on id() says
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So it's guaranteed to be unique, it must be intended as a way to tell if two variables are bound to the same object.
In a comment on StackOverflow here, Alex Martelli says the CPython implementation is not the authoritative Python, and other correct implementations of Python can and do behave differently in some ways - and that the Python Language Reference (PLR) is the closest thing Python has to a definitive specification.
In the PLR section on objects it says much the same:
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The ‘is‘ operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).
The language reference doesn't say it's guaranteed to be unique. It also says (re: the object's lifetime):
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
and:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
This isn't actually an answer, I was hoping this would end up somewhere conclusive. But I don't want to delete it now I've quoted and cited.
I'll go with turning your premise around: python will automatically collapse multiple objects into one. - no it willn't, they were never multiple objects, they can't be, because they have the same id().
If id() is Python's definitive answer on whether two objects are the same or different, your premise is incorrect - this isn't an optimization, it's a fundamental part of Python's view on the world.
This version accounts for wim's concerns about more aggressive internment in the future. It will use more memory, which is why I discarded it originally, but probably is more future proof.
>>> class Wrapper(object):
... def __init__(self, obj):
... self.obj = obj
>>> a = 1
>>> b = 1
>>> aWrapped = Wrapper(a)
>>> bWrapped = Wrapper(b)
>>> aWrapped is bWrapped
False
>>> aUnWrapped = aWrapped.obj
>>> bUnwrapped = bWrapped.obj
>>> aUnWrapped is bUnwrapped
True
Or a version that works like the pickle answer (wrap + pickle = wrapple):
class Wrapple(object):
def __init__(self, obj):
self.obj = obj
#staticmethod
def dumps(obj):
return Wrapple(obj)
def loads(self):
return self.obj
aWrapped = Wrapple.dumps(a)
aUnWrapped = Wrapple.loads(a)
Well, seeing as no one posted a response that was useful, I'll just let you know what I ended up doing.
First, some friendly advice to someone who might read this one day. This is not recommended for normal use, so if you're contemplating it, ask yourself if you have a really good reason. There are good reason, but they are rare, and if someone says there aren't, they just aren't thinking hard enough.
In the end, I just used pickle.dumps() on all the objects and passed the output in instead of the real object. On the other side I checked the id and then used pickle.loads() to restore the object. The nice part of this solution was it works for all types including None and Booleans.
>>> a = 1
>>> b = 1
>>> a is b
True
>>> aPickled = pickle.dumps(a)
>>> bPickled = pickle.dumps(b)
>>> aPickled is bPickled
False
>>> aUnPickled = pickle.loads(aPickled)
>>> bUnPickled = pickle.loads(bPickled)
>>> aUnPickled is bUnPickled
True
>>> aUnPickled
1

Categories

Resources