Weak References in Python

I have been trying to understand how Python weak reference lists/dictionaries work. I've read the documentation for them, but I cannot figure out how they work or what they can be used for. Could anyone give me a basic example of what they do and an explanation of how they work?
(EDIT) Using Thomas's code, when I replace obj with [1,2,3], it throws:
Traceback (most recent call last):
File "C:/Users/nonya/Desktop/test.py", line 9, in <module>
r = weakref.ref(obj)
TypeError: cannot create weak reference to 'list' object

Theory
The reference count usually works like this: each time you create a reference to an object, the count is increased by one, and whenever you delete a reference, it is decreased by one.
Weak references allow you to create references to an object that do not increase its reference count.
In CPython, an object is freed as soon as its reference count drops to 0. The separate garbage collector (the gc module) exists to find reference cycles, whose counts reference counting alone can never bring to 0.
You would use weak references for expensive objects, or to avoid circular references (although the garbage collector usually handles those on its own).
Usage
Here's a working example demonstrating their usage:
import weakref
import gc

class MyObject(object):
    def my_method(self):
        print('my_method was called!')

obj = MyObject()
r = weakref.ref(obj)
gc.collect()
assert r() is obj  # r() gives access to the referenced object: it's still there.
obj = 1  # Rebind obj, dropping the only strong reference to the MyObject instance
gc.collect()
assert r() is None  # No object is left: it was garbage collected.
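weakref.ref also accepts an optional callback, which is called with the weak reference object itself once the referent is about to be finalized. A minimal sketch; on_collect and events are illustrative names only.

```python
import gc
import weakref

class MyObject(object):
    pass

events = []

def on_collect(ref):
    # Receives the (now dead) weak reference object, not the referent.
    events.append(ref)

obj = MyObject()
r = weakref.ref(obj, on_collect)
assert r() is obj      # the referent is still alive
del obj                # drop the only strong reference
gc.collect()           # not needed in CPython, harmless elsewhere
assert r() is None     # the weak reference is now dead
print(len(events))     # 1: the callback ran once
```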

Just want to point out that weakref.ref does not work for the built-in list type, because list instances have no __weakref__ slot in which to keep track of weak references.
For example, the following code defines a list subclass that supports weak references.
import weakref

class weaklist(list):
    __slots__ = ('__weakref__',)

l = weaklist()
r = weakref.ref(l)
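A slightly fuller sketch of the same idea, reproducing the TypeError from the question for a plain list and then showing that the subclass both behaves like a list and can be weakly referenced:

```python
import weakref

class weaklist(list):
    # Declaring __weakref__ in __slots__ adds weak-reference support
    # without also giving every instance a __dict__.
    __slots__ = ('__weakref__',)

try:
    weakref.ref([1, 2, 3])
except TypeError as exc:
    print('plain list:', exc)   # cannot create weak reference to 'list' object

l = weaklist([1, 2, 3])
r = weakref.ref(l)
assert r() == [1, 2, 3]   # the subclass still behaves like a list
del l                     # in CPython, the object is freed with its last strong reference
assert r() is None
```

A subclass without any __slots__ declaration would also work, since such subclasses get __weakref__ (and a __dict__) automatically; the __slots__ version just avoids the per-instance dictionary.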

The point is that they allow references to be retained to objects without preventing them from being garbage collected.
There are two main reasons why you would want this. The first is when you do your own periodic resource management, e.g. closing files: because the time between such passes may be long, the garbage collector may do it for you in the meantime. The second is when you create an object that may be relatively expensive to track down later in the programme, but you still want to deal with the instances that actually exist.
The second case is probably the more common one: it is appropriate when you are holding e.g. a list of objects to notify, and you don't want the notification system to prevent garbage collection.
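The "list of objects to notify" case can be sketched with weakref.WeakSet. The Notifier and Listener classes and their methods are hypothetical, made up for this illustration.

```python
import weakref

class Notifier:
    """Hypothetical notifier that holds its listeners only weakly."""
    def __init__(self):
        self._listeners = weakref.WeakSet()

    def subscribe(self, listener):
        self._listeners.add(listener)

    def count(self):
        return len(self._listeners)

    def notify(self, message):
        # Iterating a WeakSet only yields listeners that are still alive.
        for listener in list(self._listeners):
            listener.receive(message)

class Listener:
    def __init__(self, name):
        self.name = name
        self.received = []

    def receive(self, message):
        self.received.append(message)

notifier = Notifier()
alice = Listener('alice')
bob = Listener('bob')
notifier.subscribe(alice)
notifier.subscribe(bob)

del bob                    # the notifier does not keep bob alive
notifier.notify('hello')
print(alice.received)      # ['hello']
print(notifier.count())    # 1 in CPython
```

The notification code never has to unsubscribe anyone explicitly; a listener drops out of the set as soon as nothing else refers to it.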

Here is an example comparing dict and WeakValueDictionary:
import weakref

class C: pass

ci = C()
print(ci)
wvd = weakref.WeakValueDictionary({'key': ci})
print(dict(wvd), len(wvd))  # 1
del ci
print(dict(wvd), len(wvd))  # 0
ci2 = C()
d = dict()
d['key'] = ci2
print(d, len(d))
del ci2
print(d, len(d))
And here is the output:
<__main__.C object at 0x00000213775A1E10>
{'key': <__main__.C object at 0x00000213775A1E10>} 1
{} 0
{'key': <__main__.C object at 0x0000021306B0E588>} 1
{'key': <__main__.C object at 0x0000021306B0E588>} 1
Note how in the first case, once we del ci, the object is also removed from the dictionary wvd.
In the case of a regular Python dict, we may try to remove the object, but it is still there, as shown.
Note: when we use del here, we do not need to call gc.collect() afterwards: in CPython, del drops the last reference and reference counting frees the object immediately.
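The mirror image is weakref.WeakKeyDictionary, where the keys are held weakly. A minimal sketch of using it to attach extra data to objects without keeping them alive; metadata is an illustrative name.

```python
import weakref

class C:
    pass

# Attach extra data to objects without keeping the objects alive.
metadata = weakref.WeakKeyDictionary()

c1 = C()
c2 = C()
metadata[c1] = 'first'
metadata[c2] = 'second'
print(len(metadata))   # 2

del c1                 # the entry for c1 disappears together with the object
print(len(metadata))   # 1 in CPython
print(metadata[c2])    # second
```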

Related

Circular reference in python

I am not sure how Python deals with circular references (reference cycles). I checked some answers and found this:
Python's standard reference counting mechanism cannot free cycles, so the structure in your example would leak.
The supplemental garbage collection facility, however, is enabled by default and should be able to free that structure, if none of its components are reachable from the outside anymore and they do not have __del__() methods.
I guess it means that if none of the instances in reference cycle are reachable outside, both of them will be cleaned up. Is this true?
On the other hand, there is a weakref module, which is often used for mapping dictionaries (such as WeakValueDictionary). I suppose the purpose of its existence is to avoid reference cycles.
In summary, can python deal with reference cycle automatically? If they can, why do we have to use weakref?
You do not have to worry about reference cycles if the objects in the cycle don't have a custom __del__ method, as Python can (and will) destroy the objects in any order.
If your objects do have a custom __del__ method, Python doesn't know whether one object's deletion affects another object's deletion at all. Say that when an object is deleted, it sets some global variable. So the objects stick around. As a quick test, you can make a __del__ method that prints something:
class Deletor(str):
    def __del__(self):
        print(self, 'has been deleted')

a = Deletor('a')  # refcount: 1
del a             # refcount: 0
Outputs:
a has been deleted
But if you had code like this:
a = Deletor('a') # refcount: 1
a.circular = a # refcount: 2
del a # refcount: 1
It outputs nothing, as Python can't safely delete a. It becomes an "uncollectable object", and can be found in gc.garbage †
There are two solutions to this. The first is a weak reference (which doesn't increase the reference count):
import weakref
                          # refcount: a b
a = Deletor('a')          # 1 0
b = Deletor('b')          # 1 1
b.a = a                   # 2 1
a.b = weakref.ref(b)      # 2 1
del a                     # 1 1
del b                     # 1 0
# del b kills b.a         # 0 0
Outputs:
b has been deleted
a has been deleted
(Note how b is deleted first; deleting b is what releases the last reference to a.)
The second solution is to break the cycles manually (if you can keep track of them):
# refcount a b
a = Deletor('a') # 1 0
b = Deletor('b') # 1 1
b.a = a # 2 1
a.b = b # 2 2
del b # 2 1
print('del b')
del a.b # 2 0
# b is deleted, now dead
# b.a now dead # 1 0
print('del a.b')
del a # 0 0
print('del a')
Outputs:
del b
b has been deleted
del a.b
a has been deleted
del a
Notice how b is deleted after a.b is deleted.
† Starting with Python 3.4, things changed due to PEP 442: __del__ may now be called on objects in a reference cycle and the cycle is then collected, so it is much harder for an object to become uncollectable.
A weakref is still helpful, as it is less intense on the garbage collector, and memory can be reclaimed earlier.
(Trying to answer why do we then have the weak references subquestion.)
Weakrefs do not only break circular references, but also prevent unwanted non-circular references.
My favourite example is counting simultaneous network connections (kind of load measuring) using a WeakSet. In this example every new connection has to be added to the WeakSet, but that's the only task the networking code needs to do. The connection can be closed by the server, by the client, or in an error handler, but none of these routines has the responsibility to remove the connection from the set, and that is because the additional references are weak.
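A minimal sketch of that load counter. The Connection class and the accept() function are hypothetical stand-ins for whatever the real networking code uses; the only point is that the WeakSet is the sole bookkeeping.

```python
import weakref

class Connection:
    """Hypothetical stand-in for a real network connection object."""
    def __init__(self, peer):
        self.peer = peer

active = weakref.WeakSet()   # the load counter holds only weak references

def accept(peer):
    conn = Connection(peer)
    active.add(conn)         # the only task the networking code has to do
    return conn

c1 = accept('10.0.0.1')
c2 = accept('10.0.0.2')
print(len(active))   # 2

# Whoever closes the connection just drops its reference;
# nobody has to remember to remove it from `active`.
del c1
print(len(active))   # 1 in CPython
```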
Variables are references to objects in memory.
my_var=10
The object 10 is stored in one of the memory slots, and my_var actually references the address of the memory slot where the 10 is stored. If you type:
id(my_var)
you will get the address of the slot in base 10 (in CPython, id() returns the object's memory address). hex(id(my_var)) will give the hex representation of the address.
Whenever we use my_var, the Python memory manager goes to memory and retrieves the value 10. The memory manager also keeps track of the number of references to this memory slot. If there are no references to this memory address left, it destroys the object and reuses the memory slot for new objects.
Imagine we have two classes:
class A:
    def __init__(self):
        self.b = B(self)
        print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b))))

class B:
    def __init__(self, a):
        self.a = a
        print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a))))
When you create an instance of class A:
my_var = A()
you will get this printed (on your system, the addresses will differ):
B: self: 0x1fc1eae44e0, a: 0x1fc1eae4908
A: self: 0x1fc1eae4908, b:0x1fc1eae44e0
Pay attention to the references: they are circular.
NOTE: in order to see those references, you have to disable the garbage collector; otherwise it will delete them automatically.
import gc
gc.disable()
Currently the reference count of the object my_var refers to (0x1fc1eae4908) is 2: my_var and the B instance (through its attribute a) both reference this address. If we change my_var:
my_var = None
my_var no longer points to the same memory address. Now the reference count of (0x1fc1eae4908) is 1, therefore this memory slot is not cleaned up.
Now we have a memory leak: memory that is no longer needed is not cleaned up.
The garbage collector will automatically identify such unreachable reference cycles and clean them up. However, before Python 3.4, if even one of the objects in the cycle had a destructor (__del__()), the garbage collector could not know the destruction order of the objects, so the cycle was marked uncollectable and not cleaned up, which leads to a memory leak.
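The walkthrough above can be checked with a weak reference as a probe. This sketch uses a variant of the A and B classes without the print calls; probe is an illustrative name. Note that gc.collect() still runs an explicit collection while automatic collection is disabled.

```python
import gc
import weakref

class A:
    def __init__(self):
        self.b = B(self)

class B:
    def __init__(self, a):
        self.a = a

gc.disable()                 # turn off automatic cycle collection for the demo
my_var = A()
probe = weakref.ref(my_var)  # watch the A instance without keeping it alive

my_var = None
print(probe() is None)       # False: the A <-> B cycle keeps it alive

gc.collect()                 # an explicit collection still works while disabled
print(probe() is None)       # True: the cycle has been freed
gc.enable()
```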
weakref is commonly used for caching purposes. Python's documentation on it is very good;
here is the reference for the weakref module:
weakref documentation

Call destructor of an element from a list

I have something like this:
a = [instance1, instance2, ...]
If I do
del a[1]
instance2 is removed from the list, but is instance2's destructor method called?
I'm interested in this because my code uses a lot of memory and I need to free memory by deleting instances from a list.
Coming from a language like C++ (as I did), this tends to be a subject many people find difficult to grasp when first learning Python.
The bottom line is this: when you do del XXX, you are never* deleting an object. You are only deleting an object reference. However, in practice, assuming there are no other references to the instance2 object lying around, deleting it from your list will free the memory as you desire.
If you don't understand the difference between an object and an object reference, read on.
Python: Pass by value, or pass by reference?
You are likely familiar with the concept of passing arguments to a function by reference, or by value. However, Python does things differently. Arguments are always passed by object reference. Read this article for a helpful explanation of what this means.
To summarize: when you pass a variable to a function, you are not passing a copy of the value of the variable (pass by value), nor are you passing the variable itself (pass by reference). You are passing a reference to the object the variable is bound to, and the function's parameter becomes a new name for that same object.
What does this have to do with del...?
Well, I'll tell you.
Say you do this:
def deleteit(thing):
    del thing

mylist = [1,2,3]
deleteit(mylist)
...what do you think will happen? Has mylist been deleted from the global namespace?
The answer is NO:
assert mylist # No error
The reason is that when you do del thing in the deleteit function, you are only deleting a local object reference to the object. That object reference exists ONLY inside of the function. As a sidebar, you might ask: is it possible to delete an object reference from the global namespace while inside a function? The answer is yes, but you have to declare the object reference to be part of the global namespace first:
def deletemylist():
    global mylist
    del mylist

mylist = [1,2,3]
deletemylist()
assert mylist  # NameError, as expected: the name mylist no longer exists
Putting it all together
Now to get back to your original question. When, in ANY namespace, you do this:
del XXX
...you have NOT deleted the object signified by XXX. You CAN'T do that. You have only deleted the object reference XXX, which refers to some object in memory. The object itself is managed by the Python memory manager. This is a very important distinction.
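A quick way to see that del removes names, not objects, is to watch an object through a weak reference while deleting the names bound to it. Thing and probe are illustrative names.

```python
import weakref

class Thing:
    pass

x = Thing()
y = x                    # a second name for the same object
probe = weakref.ref(x)

del x                    # removes the name x only
print(probe() is y)      # True: the object survives through y

del y                    # the last strong reference is gone
print(probe())           # None in CPython, where reference counting frees it at once
```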
Note that as a consequence, when you override the __del__ method of some object, which gets called when the object itself is deleted (NOT the object reference!):
class MyClass():
    def __del__(self):
        # object defines no __del__ of its own, so there is no super().__del__() to call
        print(self, "deleted")

m = MyClass()
del m
...the print statement inside the __del__ method does not necessarily occur immediately after you do del m. It only occurs at the point in time the object itself is deleted, and that is not up to you. It is up to the Python memory manager. When all object references in all the namespaces have been deleted, the __del__ method will eventually be executed. But not necessarily immediately.
The same is true when you delete an object reference that is part of a list, like in the original example. When you do del a[1], only the object reference to the object signified by a[1] is deleted, and the __del__ method of that object may or may not be called immediately (though as stated before, it will eventually be called once there are no more references to the object, and the object is garbage collected by the memory manager).
As a result of this, it is not recommended that you put things in the __del__ method that you want to happen immediately upon del mything, because it may not happen that way.
*I believe it is never. Inevitably someone will likely downvote my answer and leave a comment discussing the exception to the rule. But whatevs.
No. Calling del on a list element only removes a reference to an object from the list; it doesn't do anything (explicitly) to the object itself. However, if the reference in the list was the last one referring to the object, the object can now be destroyed and recycled. I think that the "normal" CPython implementation will immediately destroy and recycle the object; other implementations' behaviour can vary.
If your object is resource-heavy and you want to be sure that the resources are freed correctly, use the with statement (a context manager). It's very easy to leak resources when relying on destructors. See this SO post for more details.
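A minimal sketch of the with-statement approach, assuming a hypothetical Resource class whose close() releases the expensive part. Cleanup happens at the end of the block, independently of when the object itself is garbage collected.

```python
class Resource:
    """Hypothetical resource-heavy object managed with `with`."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # Release the expensive resource explicitly.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()       # runs even if the block raises
        return False       # do not suppress exceptions

with Resource('big-buffer') as res:
    pass  # work with res here

print(res.closed)  # True, regardless of when the object itself is collected
```

For objects that already have a close() method, contextlib.closing gives the same guarantee without writing __enter__/__exit__.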

Python Static Variable List __del__

I'm trying to create a class with a static list that collects all new instances of the class. The problem I'm facing is that as soon as I try to use a list the same way as, for example, an integer, I can't use the magic method __del__ anymore.
My Example:
class MyClass(object):
    count = 0
    #instances = []

    def __init__(self, a, b):
        self.a = a
        self.b = b
        MyClass.count += 1
        #MyClass.instances.append(self)

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return "a: " + str(self.a) + ", b: " + str(self.b)

    def __del__(self):
        MyClass.count -= 1
        #MyClass.instances.remove(self)

A = MyClass(1,'abc')
B = MyClass(2,'def')
print(MyClass.count)
del B
print(MyClass.count)
With those lines commented out, I get the correct answer:
2
1
But with those lines uncommented, i.e. now including the static object list MyClass.instances, I get the wrong answer:
2
2
It seems like MyClass can't reach its __del__ method anymore! How come?
From the docs,
del x doesn’t directly call x.__del__() — the former decrements the reference
count for x by one, and the latter is only called when x‘s reference count
reaches zero.
When you uncomment,
instances = []
...
...
MyClass.instances.append(self)
You are storing a reference to the current object in MyClass.instances. That means the object's reference count is incremented by 1, which is why __del__ is not called when you del B.
To resolve this problem, explicitly remove the item from the list like this:
MyClass.instances.remove(B)
del B
Now it will print
2
1
as expected.
There is one more way to fix this problem. That is to use weakref. From the docs,
A weak reference to an object is not enough to keep the object alive:
when the only remaining references to a referent are weak references,
garbage collection is free to destroy the referent and reuse its
memory for something else. A primary use for weak references is to
implement caches or mappings holding large objects, where it’s desired
that a large object not be kept alive solely because it appears in a
cache or mapping.
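So, having a weakref will not postpone the object's deletion. With weakref, this can be fixed like this:
MyClass.instances.append(weakref.ref(self))
...
...
# MyClass.instances.remove(weakref.ref(self))
MyClass.instances = [w_ref for w_ref in MyClass.instances
                     if w_ref() is not None and w_ref() is not self]
Instead of using the remove method, we call each of the weakref objects and keep only those that still return a live object other than self. A weakref that returns None points to an object that is already dead; in CPython, a weak reference to the object whose __del__ is currently running can still return that object, hence the extra check. The list comprehension does this filtering.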
So now, when you say del B, even though weakrefs to B exist, __del__ will be called (unless you have made some other variable point to the same object, e.g. by an assignment).
From http://docs.python.org/2.7/reference/datamodel.html#basic-customization I quote (paragraph in gray after object.__del__):
del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x‘s reference count reaches zero.
Here you call del B, but there is still a reference to B in MyClass.instances, so B is still referenced and hence not destroyed, and the __del__ method is not called.
If you call B.__del__() directly, the method runs (and decrements the count), but that does not destroy the object.
__del__ is only called when no more references to the object are left.
You should consider putting only weak refs into the MyClass.instances list.
This can be achieved with import weakref and then
either using a WeakSet for the list
or putting weakref.ref(self) into the list.
__del__ is automatically called whenever the last strong reference is removed. The weak references disappear automatically.
But be aware that there are some caveats on __del__ mentioned in the docs.
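A minimal sketch of the WeakSet option applied to the question's MyClass (the __str__ method is omitted for brevity):

```python
import gc
import weakref

class MyClass(object):
    count = 0
    instances = weakref.WeakSet()   # holds weak references only

    def __init__(self, a, b):
        self.a = a
        self.b = b
        MyClass.count += 1
        MyClass.instances.add(self)

    def __repr__(self):
        return "a: " + str(self.a) + ", b: " + str(self.b)

    def __del__(self):
        MyClass.count -= 1          # no manual list bookkeeping needed

A = MyClass(1, 'abc')
B = MyClass(2, 'def')
print(MyClass.count, len(MyClass.instances))   # 2 2
del B
gc.collect()   # not required in CPython, harmless
print(MyClass.count, len(MyClass.instances))   # 1 1
```

The WeakSet drops B's entry by itself once B is destroyed, so __del__ only has to maintain the counter.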
__del__ is called when the garbage collector removes an object from memory. If you add your object to MyClass.instances, the object is still referenced ("in use"), so it is never removed, and __del__ is never called.
You'd better use an explicit function (MyClass.del_element()) because you can't really predict when __del__ will be called (even if you don't add it to a list).

Why doesn't __del__ work properly

I think this is the most common question on interviews:
class A:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print(self.name, end=' ')

aa = [A(str(i)) for i in range(3)]
for a in aa:
    del a
So what is the output of this code, and why?
The output is nothing. Why?
Is that because a is a reference to an object in the list, so when we call del we remove this reference but not the object?
There are at least 2 references to the object that a references (variables are references to objects, they are not the objects themselves). There's the one reference inside the list, and then there's the reference "a". When you del a, you remove one reference (the variable a) but not the reference from inside the list.
Also note that Python doesn't guarantee that __del__ will ever be called ...
__del__ gets called when an object is destroyed; this will happen after the last possible reference to the object is removed from the program's accessible memory. Depending on the implementation this might happen immediately or might be after some time.
Your code just removes the local name a from the execution scope; the object remains in the list so is still accessible. Try writing del aa[0], for example.
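A sketch of that suggestion. Instead of printing, this variant records deletions in a class-level list (deleted is an illustrative name) so the effect of each del can be checked.

```python
class A:
    deleted = []   # names of instances whose __del__ has run

    def __init__(self, name):
        self.name = name

    def __del__(self):
        type(self).deleted.append(self.name)

aa = [A(str(i)) for i in range(3)]
for a in aa:
    del a          # removes only the loop variable's reference
print(A.deleted)   # []: the list still refers to every object

del aa[0]          # drops the list's reference to the first object
print(A.deleted)   # ['0'] in CPython, where reference counting frees it immediately
```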
From the docs:
Note del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x‘s reference count reaches zero.
__del__ is triggered when the garbage collector finds an object to be destroyed. The garbage collector will try to destroy objects with a reference count of 0. del just decouples the label in the local namespace, thereby decrementing the reference count for the object in the interpreter. The behavior of the garbage collector is for the most part considered an implementation detail of the interpreter, so there's no guarantee that __del__ on objects will be called in any specific order, or even at all. That's why the behavior of this code is undefined.

Preserving circular references after garbage collection

import weakref
import gc

class MyClass(object):
    def refer_to(self, thing):
        self.refers_to = thing

foo = MyClass()
bar = MyClass()
foo.refer_to(bar)
bar.refer_to(foo)
foo_ref = weakref.ref(foo)
bar_ref = weakref.ref(bar)
del foo
del bar
gc.collect()
print(foo_ref())
I want foo_ref and bar_ref to retain weak references to foo and bar respectively as long as they reference each other*, but this instead prints None. How can I prevent the garbage collector from collecting certain objects within reference cycles?
bar should be garbage-collected in this code because it is no longer part of the foo-bar reference cycle:
baz = MyClass()
baz.refer_to(foo)
foo.refer_to(baz)
gc.collect()
* I realize it might seem pointless to prevent circular references from being garbage-collected, but my use case requires it. I have a bunch of objects that refer to each other in a web-like fashion, along with a WeakValueDictionary that keeps a weak reference to each object in the bunch. I only want an object in the bunch to be garbage-collected when it is orphaned, i.e. when no other objects in the bunch refer to it.
Normally using weak references means that you cannot prevent objects from being garbage collected.
However, there is a trick you can use to prevent objects part of a reference cycle from being garbage collected: define a __del__() method on these.
From the gc module documentation:
gc.garbage
A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects). By default, this list
contains only objects with __del__() methods. Objects that have
__del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily
in the cycle but reachable only from it. Python doesn’t collect such
cycles automatically because, in general, it isn’t possible for Python
to guess a safe order in which to run the __del__() methods. If you
know a safe order, you can force the issue by examining the garbage
list, and explicitly breaking cycles due to your objects within the
list. Note that these objects are kept alive even so by virtue of
being in the garbage list, so they should be removed from garbage too.
For example, after breaking cycles, do del gc.garbage[:] to empty the
list. It’s generally better to avoid the issue by not creating cycles
containing objects with __del__() methods, and garbage can be examined
in that case to verify that no such cycles are being created.
When you define MyClass as follows:
class MyClass(object):
    def refer_to(self, thing):
        self.refers_to = thing

    def __del__(self):
        print('Being deleted now, bye-bye!')
then your example script prints:
<__main__.MyClass object at 0x108476a50>
but commenting out one of the .refer_to() calls results in:
Being deleted now, bye-bye!
Being deleted now, bye-bye!
None
In other words, by simply having defined a __del__() method, we prevented the reference cycle from being garbage collected, but any orphaned objects are being deleted.
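Note that in order for this to work, you need circular references; any object in your object graph that is not part of a reference cycle will be picked off regardless. Also note that this trick relies on behaviour from before Python 3.4: since PEP 442, cycles containing objects with __del__() methods are collected like any other cycle.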
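On Python 3.4 and later, the question's script with the __del__ method added shows the cycle being collected anyway, so the __del__ trick no longer preserves it:

```python
import gc
import weakref

class MyClass(object):
    def refer_to(self, thing):
        self.refers_to = thing

    def __del__(self):
        print('Being deleted now, bye-bye!')

foo = MyClass()
bar = MyClass()
foo.refer_to(bar)
bar.refer_to(foo)
foo_ref = weakref.ref(foo)
del foo
del bar
gc.collect()       # Python 3.4+: both __del__ methods run and the cycle is freed
print(foo_ref())   # None
print(gc.garbage)  # []: nothing is left uncollectable
```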
