Why is id(A()) == id(A()) different from A() is A()? - python

I am very confused with the python code below:
>>> class A(): pass
...
>>> id(A()) == id(A())
True
>>> id(A())
19873304
>>> id(A())
19873304
>>> A() is A()
False
>>> a = A()
>>> b = A()
>>> id (a) == id (b)
False
>>> a is b
False
>>> id (a)
19873304
>>> id (b)
20333272
>>> def f():
...     print id(A())
...     print id(A())
...
>>> f()
20333312
20333312
I can't clearly explain to myself what Python is doing when it creates these objects.
Can anyone tell me more about what happened? Thanks!

When you say
print id(A()) == id(A())
you are creating an object of type A and passing it to the id function. When the function returns, there are no references left to the object that was created for the argument. So its reference count drops to zero and it becomes eligible for garbage collection.
When you evaluate id(A()) again in the same expression, you create another object of the same type. Python may reuse the memory location that was used for the previous object (if it has already been garbage collected); otherwise the new object lands somewhere else. So the ids may or may not be the same.
If you take,
print A() is A()
We create an object of type A and compare it against another object of type A. The first object is still referenced while the expression is being evaluated, so it cannot be garbage collected, and the two objects are therefore always distinct.
Suggestion: Never do anything like this in production code.
Quoting from the docs,
Due to automatic garbage-collection, free lists, and the dynamic
nature of descriptors, you may notice seemingly unusual behaviour in
certain uses of the is operator, like those involving comparisons
between instance methods, or constants. Check their documentation for
more info.

Two different objects can be at the same location in memory, if one of them is freed before the other is created.
That is to say -- if you allocate an object, take its id, and then have no more reference to it, it can then be freed, so another object can get the same id.
By contrast, if you retain a reference to the first object, any subsequent object allocated will necessarily have a different id.
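To make the timing of that reuse concrete, here is a minimal sketch (CPython-specific; whether the id is actually reused is an implementation detail):
class A:
    pass

# No reference to the first instance survives the inner id() call,
# so CPython is free to reuse its memory for the second instance:
print(id(A()) == id(A()))  # often True in CPython, but not guaranteed

# Keeping the first instance alive forces the second one elsewhere:
x = A()
print(id(x) == id(A()))    # False: both objects are alive at the same time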

A() creates a temporary object which is dropped as soon as the call to id returns,
so the next A() can land at the same id (the memory was just reclaimed, although this behavior is not guaranteed). Thus, when you print them, they show the same id.
A() is A()
has to keep two objects alive at the same time, and two live objects always have different ids.

Since id() is based on the object's address, it is only guaranteed to be unique while both objects are alive in memory. In some situations, the second instance of A simply reused the memory of the first (which had already been garbage collected).


How do I create a list of explicit references in Python? [duplicate]

Suppose you have something like:
x = "something"
b = x
l = [b]
How can you delete the object when you hold only one of its references, say x?
del x won't do the trick; the object is still reachable from b, for example.
No no no. Python has a garbage collector with a very strong sense of territory - it won't mess with you creating objects, and you don't mess with it deleting objects.
Simply put, it can't be done, and for a good reason.
If, for instance, your need comes from cases of, say, caching algorithms that keep references, but should not prevent data from being garbage collected once no one is using it, you might want to take a look at weakref.
The only solution I see right now is to make sure that you hold the only strong reference to x; everyone else must get, not x itself, but a weak reference pointing to x. Weak references are implemented in the weakref module and can be used like this:
>>> import weakref
>>> class TestClass(object):
...     def bark(self):
...         print "woof!"
...     def __del__(self):
...         print "destructor called"
...
>>> x = TestClass()
>>> b = weakref.proxy(x)
>>> b
<weakproxy at 0x7fa44dbddd08; to TestClass at 0x7fa44f9093d0>
>>> b.bark()
woof!
>>> del x
destructor called
>>> b.bark()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ReferenceError: weakly-referenced object no longer exists
However, note that not all objects can be weak-referenced. In particular, most built-in types (such as dict, list, and int) cannot be weak-referenced directly; a plain subclass of such a type, however, usually can be, because the subclass gains a __weakref__ slot.
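For instance, here is a minimal sketch of the dict case mentioned above (the name WeakDict is just illustrative):
import weakref

class WeakDict(dict):
    pass  # a plain subclass gains a __weakref__ slot, unlike dict itself

d = WeakDict(a=1)
r = weakref.ref(d)   # works for the subclass
print(r() is d)      # True while d is alive
# weakref.ref({})    # TypeError: cannot create weak reference to 'dict' object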
You don't. That's the entire point. Imagine if l is in a library outside of your control. It has every right to expect that the collection's elements don't disappear.
Also, imagine if it were otherwise. You'd have questions here on SO asking "How do I prevent others from deleting my objects?". As a language designer, you can't satisfy both demands.

understanding python id() uniqueness

Python documentation for id() function states the following:
This is an integer which is guaranteed to be unique and constant for
this object during its lifetime. Two objects with non-overlapping
lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
Although, the snippet below shows that id's are repeated. Since I didn't explicitly del the objects, I presume they are all alive and unique (I do not know what non-overlapping means).
>>> g = [0, 1, 0]
>>> for h in g:
...     print(h, id(h))
...
0 10915712
1 10915744
0 10915712
>>> a=0
>>> b=1
>>> c=0
>>> d=[a, b,c]
>>> for e in d:
...     print(e, id(e))
...
0 10915712
1 10915744
0 10915712
>>> id(a)
10915712
>>> id(b)
10915744
>>> id(c)
10915712
>>>
How can the id values for different objects be the same? Is it so because the value 0 (object of class int) is a constant and the interpreter/C compiler optimizes?
If I were to do a = c, then I understand c to have the same id as a since c would just be a reference to a (alias). I expected the objects a and c to have different id values otherwise, but, as shown above, they have the same values.
What's happening? Or am I looking at this the wrong way?
I would expect the ids of objects of user-defined classes to ALWAYS be unique, even if they have exactly the same member values.
Could someone explain this behavior? (I looked at the other questions that ask uses of id(), but they steer in other directions)
EDIT (09/30/2019):
To extend what I already wrote, I ran Python interpreters in separate terminals and checked the ids for 0 in all of them: they were exactly the same for a given interpreter, and multiple instances of the same interpreter reported the same id for 0. Python 2 and Python 3 gave different values, but the same Python 2 interpreter always gave the same id.
My question arises because the documentation of id() doesn't mention any such optimization, which seems misleading (I don't expect every quirk to be noted, but a remark alongside the CPython note would be nice)...
EDIT 2 (09/30/2019):
The question stems from wanting to understand this behavior and to know whether there are hooks to optimize user-defined classes in a similar way (by modifying the __eq__ method to identify two objects as the same, so that perhaps they would point to the same address in memory, i.e. share an id? Or by using some metaclass properties).
Ids are guaranteed to be unique for the lifetime of the object. If an object gets deleted, a new object can acquire the same id. CPython will delete items immediately when their refcount drops to zero. The garbage collector is only needed to break up reference cycles.
CPython may also cache and re-use certain immutable objects like small integers and strings defined by literals that are valid identifiers. This is an implementation detail that you should not rely upon. It is generally considered improper to use is checks on such objects.
There are certain exceptions to this rule, for example, using an is check on possibly-interned strings as an optimization before comparing them with the normal == operator is fine. The dict builtin uses this strategy for lookups to make them faster for identifiers.
a is b or a == b # This is OK
If the string happens to be interned, then the above can return true with a simple id comparison instead of a slower character-by-character comparison, but it still returns true if and only if a == b (because if a is b then a == b must also be true). However, a good implementation of .__eq__() would already do an is check internally, so at best you would only avoid the overhead of calling the .__eq__().
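For illustration, here is a sketch of the small-integer cache (a CPython implementation detail; the exact cached range is not guaranteed):
# CPython keeps a cache of small ints (roughly -5..256), so equal small
# ints are usually the same object. int("...") is used to sidestep any
# constant folding of literals by the compiler.
a, b = int("256"), int("256")
print(a is b)   # True in CPython: both names point at the cached 256

# Values outside the cache are created fresh each time:
c, d = int("257"), int("257")
print(c is d)   # False in CPython: two distinct objects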
Thanks for the answer; would you elaborate on uniqueness for user-defined objects - are their ids always unique?
The id of any object (be it user-defined or not) is unique for the lifetime of the object. It's important to distinguish objects from variables. It's possible to have two or more variables refer to the same object.
>>> a = object()
>>> b = a
>>> c = object()
>>> a is b
True
>>> a is c
False
Caching optimizations mean that you are not always guaranteed to get a new object in cases where one might naively think you should, but this does not in any way violate the uniqueness guarantee of ids. Builtin types like int and str may have some caching optimizations, but they follow exactly the same rules: if they are live at the same time and their ids are the same, then they are the same object.
Caching is not unique to builtin types. You can implement caching for your own objects.
>>> def the_one(it=object()):
...     return it
...
>>> the_one() is the_one()
True
Even user-defined classes can cache instances. For example, this class only makes one instance of itself.
>>> class TheOne:
...     _the_one = None
...     def __new__(cls):
...         if not cls._the_one:
...             cls._the_one = super().__new__(cls)
...         return cls._the_one
...
>>> TheOne() is TheOne() # There can be only one TheOne.
True
>>> id(TheOne()) == id(TheOne()) # This is what an is-check does.
True
Note that each construction expression evaluates to an object with the same id as the other. But this id is unique to the object. Both expressions reference the same object, so of course they have the same id.
The above class only keeps one instance, but you could also cache some other number. Perhaps recently used instances, or those configured in a way you expect to be common (as ints do), etc.
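As a sketch of that idea, a class can keep a cache keyed by its constructor arguments (the Point name and the unbounded cache policy here are just illustrative):
class Point:
    _cache = {}  # maps (x, y) -> existing instance

    def __new__(cls, x, y):
        key = (x, y)
        if key not in cls._cache:
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]

    def __init__(self, x, y):
        # __init__ runs again on a cache hit; it rewrites the same
        # values, so that is harmless here.
        self.x, self.y = x, y

assert Point(1, 2) is Point(1, 2)      # same cached instance, same id
assert Point(1, 2) is not Point(3, 4)  # different key, different object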

Is there a performance advantage to using the clear method of a dictionary or set?

I just discovered that in Python dictionaries and sets both have a clear method. The method literally just removes all the entries from the object.
Is there a good reason, or even situation, where it makes sense to call foo.clear() rather than foo = {}, or foo = set()?
I can imagine it might work more efficiently for garbage collection, but it seems to violate "There should be one-- and preferably only one --obvious way to do it."
When you have multiple references to the same set object, only clear() can empty the set without having to re-assign.
Compare:
>>> def clear(s):
...     s.clear()
...
>>> def clear_assignment(s):
...     s = set()
...
>>> foo = {'bar', 'baz'}
>>> clear_assignment(foo)
>>> foo
{'baz', 'bar'}
>>> clear(foo)
>>> foo
set()
Assignment rebinds one name to a new set() object, while set.clear() removes everything from the mutable object without re-assignment, so you can continue to use other references to that set.
There could be other references to the same object. If you rebind your name to a new object, the other references will still point to the original dictionary/set.
foo = {} does not delete the object; it just rebinds the name foo to a new empty object (the old one remains alive as long as other variables still reference it).
So, when other references exist, foo.clear() is the preferable way to do it, and it also avoids allocating a new object.
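A minimal dict analogue of the set demonstration above:
config = {"debug": True}
alias = config        # a second reference to the same dict

config.clear()        # empties the object in place
print(alias)          # {} - the alias sees the change

config = {"debug": True}
alias = config
config = {}           # rebinds the name only; the old object is untouched
print(alias)          # {'debug': True} - the alias still sees the old dict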

Python: Identical strings (or numbers) with unique ids?

Python is wonderfully optimized, but I have a case where I'd like to work around it. It seems for small numbers and strings, python will automatically collapse multiple objects into one. For example:
>>> a = 1
>>> b = 1
>>> id(a) == id(b)
True
>>> a = str(a)
>>> b = str(b)
>>> id(a) == id(b)
True
>>> a += 'foobar'
>>> b += 'foobar'
>>> id(a) == id(b)
False
>>> a = a[:-6]
>>> b = b[:-6]
>>> id(a) == id(b)
True
I have a case where I'm comparing objects based on their Python ids. This is working really well except for the few cases where I run into small numbers. Does anyone know how to turn off this optimization for specific strings and integers? Something akin to an anti-intern()?
You shouldn't be relying on these objects to be different objects at all. There's no way to turn this behavior off without modifying and recompiling Python, and which particular objects it applies to is subject to change without notice.
You can't turn it off without re-compiling your own version of CPython.
But if you want to have "separate" versions of the same small integers, you can do that by maintaining your own id (for example a uuid4) associated with the object.
Since ints and strings are immutable, there's no obvious reason to do this - if you can't modify the object at all, you shouldn't care whether you have the "original" or a copy because there is no use-case where it can make any difference.
Related: How to create the int 1 at two different memory locations?
Sure, it can be done, but it's never really a good idea:
Z = 1
class MyString(str):
    def __init__(self, *args):
        global Z
        super(MyString, self).__init__()
        self.i = Z  # each instance gets a distinct serial number
        Z += 1
>>> a = MyString("1")
>>> b = MyString("1")
>>> a is b
False
By the way, to check whether two objects are the same object, just use a is b instead of id(a) == id(b).
The Python documentation on id() says
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So it's guaranteed to be unique; it must be intended as a way to tell whether two variables are bound to the same object.
In a comment on StackOverflow here, Alex Martelli says the CPython implementation is not the authoritative Python, and other correct implementations of Python can and do behave differently in some ways - and that the Python Language Reference (PLR) is the closest thing Python has to a definitive specification.
In the PLR section on objects it says much the same:
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The ‘is‘ operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).
The language reference doesn't say it's guaranteed to be unique. It also says (re: the object's lifetime):
Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
and:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
This isn't actually an answer, I was hoping this would end up somewhere conclusive. But I don't want to delete it now I've quoted and cited.
I'll go with turning your premise around: python will automatically collapse multiple objects into one - no it won't; they were never multiple objects, and they can't be, because they have the same id().
If id() is Python's definitive answer on whether two objects are the same or different, your premise is incorrect - this isn't an optimization, it's a fundamental part of Python's view on the world.
This version accounts for wim's concerns about more aggressive interning in the future. It will use more memory, which is why I discarded it originally, but it is probably more future-proof.
>>> class Wrapper(object):
...     def __init__(self, obj):
...         self.obj = obj
...
>>> a = 1
>>> b = 1
>>> aWrapped = Wrapper(a)
>>> bWrapped = Wrapper(b)
>>> aWrapped is bWrapped
False
>>> aUnWrapped = aWrapped.obj
>>> bUnwrapped = bWrapped.obj
>>> aUnWrapped is bUnwrapped
True
Or a version that works like the pickle answer (wrap + pickle = wrapple):
class Wrapple(object):
    def __init__(self, obj):
        self.obj = obj

    @staticmethod
    def dumps(obj):
        return Wrapple(obj)

    def loads(self):
        return self.obj

aWrapped = Wrapple.dumps(a)
aUnWrapped = aWrapped.loads()
Well, seeing as no one posted a response that was useful, I'll just let you know what I ended up doing.
First, some friendly advice to anyone who might read this one day: this is not recommended for normal use, so if you're contemplating it, ask yourself whether you have a really good reason. There are good reasons, but they are rare; if someone says there aren't any, they just aren't thinking hard enough.
In the end, I just used pickle.dumps() on all the objects and passed the output around instead of the real object. On the other side, I checked the id and then used pickle.loads() to restore the object. The nice part of this solution is that it works for all types, including None and booleans.
>>> import pickle
>>> a = 1
>>> b = 1
>>> a is b
True
>>> aPickled = pickle.dumps(a)
>>> bPickled = pickle.dumps(b)
>>> aPickled is bPickled
False
>>> aUnPickled = pickle.loads(aPickled)
>>> bUnPickled = pickle.loads(bPickled)
>>> aUnPickled is bUnPickled
True
>>> aUnPickled
1

Weak References in python

I have been trying to understand how python weak reference lists/dictionaries work. I've read the documentation for it, however I cannot figure out how they work, and what they can be used for. Could anyone give me a basic example of what they do and an explanation of how they work?
(EDIT) Using Thomas's code, when I substitute [1, 2, 3] for obj, it throws:
Traceback (most recent call last):
File "C:/Users/nonya/Desktop/test.py", line 9, in <module>
r = weakref.ref(obj)
TypeError: cannot create weak reference to 'list' object
Theory
The reference count usually works as such: each time you create a reference to an object, it is increased by one, and whenever you delete a reference, it is decreased by one.
Weak references allow you to create references to an object that will not increase the reference count.
The reference count is how CPython decides when to free an object: as soon as an object's count drops to 0, it is deallocated. The cyclic garbage collector exists to clean up reference cycles, which reference counting alone cannot detect.
You would use weak references for expensive objects, or to avoid reference cycles (although the garbage collector usually handles those on its own).
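To watch the count itself, CPython exposes sys.getrefcount (note it reports one extra reference for its own argument); a weak reference indeed leaves the count unchanged:
import sys
import weakref

class Thing:
    pass

x = Thing()
print(sys.getrefcount(x))  # 2: the name x plus getrefcount's own argument

y = x                      # a second strong reference
print(sys.getrefcount(x))  # 3

r = weakref.ref(x)         # a weak reference does not bump the count
print(sys.getrefcount(x))  # still 3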
Usage
Here's a working example demonstrating their usage:
import weakref
import gc
class MyObject(object):
    def my_method(self):
        print 'my_method was called!'
obj = MyObject()
r = weakref.ref(obj)
gc.collect()
assert r() is obj #r() allows you to access the object referenced: it's there.
obj = 1 #Let's change what obj references to
gc.collect()
assert r() is None #There is no object left: it was gc'ed.
Just want to point out that weakref.ref does not work for the built-in list, because list has no __weakref__ slot.
For example, the following code defines a list container that supports weakref.
import weakref

class weaklist(list):
    __slots__ = ('__weakref__',)

l = weaklist()
r = weakref.ref(l)
The point is that they allow references to be retained to objects without preventing them from being garbage collected.
The two main reasons why you would want this are: where you do your own periodic resource management, e.g. closing files, but, because the time between such passes may be long, the garbage collector may do it for you; or where you create an object and it would be relatively expensive to track down where it lives in the program, but you still want to deal with the instances that actually exist.
The second case is probably the more common - it is appropriate when you are holding e.g. a list of objects to notify, and you don't want the notification system to prevent garbage collection.
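As a sketch of that notification case (the Broadcaster and on_event names are made up for illustration), weakref.WeakSet lets listeners drop out naturally:
import weakref

class Broadcaster:
    def __init__(self):
        # Listeners are held only weakly: once a listener is garbage
        # collected, it silently disappears from the set.
        self._listeners = weakref.WeakSet()

    def register(self, listener):
        self._listeners.add(listener)

    def send(self, event):
        for listener in self._listeners:
            listener.on_event(event)

class Listener:
    def on_event(self, event):
        print("got", event)

b = Broadcaster()
l = Listener()
b.register(l)
b.send("ping")   # got ping
del l            # the Broadcaster does not keep the listener alive
b.send("ping")   # nothing printed (in CPython the listener is gone at once)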
Here is an example comparing dict and WeakValueDictionary:
import weakref

class C: pass
ci=C()
print(ci)
wvd = weakref.WeakValueDictionary({'key' : ci})
print(dict(wvd), len(wvd)) #1
del ci
print(dict(wvd), len(wvd)) #0
ci2=C()
d=dict()
d['key']=ci2
print(d, len(d))
del ci2
print(d, len(d))
And here is the output:
<__main__.C object at 0x00000213775A1E10>
{'key': <__main__.C object at 0x00000213775A1E10>} 1
{} 0
{'key': <__main__.C object at 0x0000021306B0E588>} 1
{'key': <__main__.C object at 0x0000021306B0E588>} 1
Note how in the first case, once we del ci, the actual object is also removed from the dictionary wvd.
In the case of a regular Python dict, we may try to remove the object, but it is still there, as shown.
Note: if we use del, we do not need to call gc.collect() afterwards, since del drops the last reference and CPython then frees the object immediately.
