Should I worry about circular references in Python?

Suppose I have code that maintains a parent/children structure. In such a structure I get circular references, where a child points to a parent and a parent points to a child. Should I worry about them? I'm using Python 2.5.
I am concerned that they will not be garbage collected and the application will eventually consume all memory.

"Worry" is misplaced, but if your program turns out to be slow, consume more memory than expected, or have strange inexplicable pauses, the cause is indeed likely to be in those garbage reference loops -- they need to be garbage collected by a different procedure than "normal" (acyclic) reference graphs, and that collection is occasional and may be slow if you have a lot of objects tied up in such loops (the cyclical-garbage collection is also inhibited if an object in the loop has a __del__ special method).
So, reference loops will not affect your program's correctness, but may affect its performance and/or footprint.
If and when you want to remove unwanted loops of references, you can often use the weakref module in Python's standard library.
If and when you want to exert more direct control (or perform debugging, see what exactly is happening) regarding cyclical garbage collection, use the gc module in Python's standard library.
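For instance, here is a minimal sketch (the Node class and its attribute names are invented for illustration) of a parent/children structure where the back-pointer is weak, so no reference cycle is formed in the first place:
import weakref

class Node(object):
    def __init__(self, parent=None):
        self.children = []  # strong references: parent -> child
        # Weak reference: child -> parent, so no cycle is formed.
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # None if there is no parent or it has already been collected.
        return self._parent() if self._parent is not None else None

root = Node()
child = Node(root)
assert child.parent is root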

Experimentally: you're fine:
import itertools

for i in itertools.count():
    a = {}
    b = {"a": a}
    a["b"] = b
It consistently stays at about 3.6 MB of RAM.

Python will detect the cycle and release the memory when there are no outside references.
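You can watch the cycle detector do this with the gc module; a quick sketch:
import gc

a = {}
b = {"a": a}
a["b"] = b
del a, b             # both dicts are now unreachable, but form a cycle
print(gc.collect())  # force a collection; reports the unreachable objects found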

Circular references are a normal thing to do, so I don't see a reason to be worried about them. Many tree algorithms require that each node have links to its children and its parent. They're also required to implement something like a doubly linked list.
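As an illustration, even the smallest doubly linked list is a cycle, and that's fine:
class DNode(object):
    """A doubly linked list node; prev/next links form reference cycles."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

head, tail = DNode(1), DNode(2)
head.next, tail.prev = tail, head   # head -> tail -> head: a cycle
del head, tail                      # the cycle collector reclaims both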

I don't think you should worry. Try the following program and you will see that it won't consume all memory:
while True:
    a = range(100)
    b = range(100)
    a.append(b)
    b.append(a)
    a.append(a)
    b.append(b)

There seems to be an issue with references to methods stored in lists on an instance. Here are two examples. The first one never calls __del__, because keeping the bound method in self.counters creates a reference cycle through the instance. The second one, with weakref, is OK as far as __del__ goes; the problem in this latter case is that you cannot usefully keep a plain weak reference to a bound method, since the bound-method object is discarded as soon as it is created: http://docs.python.org/2/library/weakref.html
import sys, weakref

class One():
    def __init__(self):
        # Storing the bound method creates a reference cycle through
        # the instance; with __del__ defined, Python 2's collector
        # won't reclaim it, so "__del__ called" is never printed.
        self.counters = [self.count]
    def __del__(self):
        print("__del__ called")
    def count(self):
        print(sys.getrefcount(self))

# (in the interactive interpreter these expressions echo their value)
sys.getrefcount(One)
one = One()
sys.getrefcount(One)
del one
sys.getrefcount(One)

class Two():
    def __init__(self):
        # The weak reference avoids the cycle, so __del__ is called,
        # but the referenced bound-method object dies immediately.
        self.counters = [weakref.ref(self.count)]
    def __del__(self):
        print("__del__ called")
    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(Two)
two = Two()
sys.getrefcount(Two)
del two
sys.getrefcount(Two)
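For what it's worth, later Python versions added weakref.WeakMethod (Python 3.4+) for exactly this case: it holds weak references to both the instance and the function and re-creates the bound method on demand, so it neither forms the cycle from One nor dies instantly like the plain weakref in Two. A minimal sketch (the class name Three is invented for illustration):
import sys, weakref

class Three(object):
    def __init__(self):
        # WeakMethod avoids both problems: no strong cycle through self,
        # and the reference stays usable while the instance lives.
        self.counters = [weakref.WeakMethod(self.count)]
    def __del__(self):
        print("__del__ called")
    def count(self):
        print(sys.getrefcount(self))

three = Three()
method = three.counters[0]()  # re-create the bound method (None if dead)
method()                      # prints the reference count
del method
del three                     # prints "__del__ called"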

Related

Is it safe to give a python WeakSet to a list constructor?

The question Safely iterating over WeakKeyDictionary and WeakValueDictionary did not put me at ease as I had hoped, and it's old enough that it's worth asking again rather than commenting.
Suppose I have a class MyHashable that's hashable, and I want to build a WeakSet:
from weakref import WeakSet

obj1 = MyHashable()
obj2 = MyHashable()
obj3 = MyHashable()
obj2.cycle_sibling = obj3
obj3.cycle_sibling = obj2
ws = WeakSet([obj1, obj2, obj3])
Then I delete some local variables, and convert to a list in preparation for a later loop:
del obj2
del obj3
list_remaining = list(ws)
The question I cite seems to claim this is just fine, but even without any kind of explicit for loop, have I not already risked the cyclic garbage collector kicking in during the constructor of list_remaining and changing the size of the set? I would expect this problem to be rare enough that it would be difficult to detect experimentally, but could crash my program once in a blue moon.
I don't even feel like the various commenters on that post really came to an agreement about whether something like
for obj in list(ws):
    ...
was ok, but they did all seem to assume that list(ws) itself can run all the way through without crashing, and I'm not even convinced of that. Does the list constructor avoid using iterators somehow and thus not care about set size changes? Can garbage collection not occur during a list constructor because list is built-in?
For the moment I've written my code to destructively pop items out of the WeakSet, thus avoiding iterators altogether. I don't mind doing it destructively because at that point in my code I'm done with the WeakSet anyway. But I don't know if I'm being paranoid.
The docs are frustratingly lacking in information on this, but looking at the implementation, we can see that WeakSet.__iter__ has a guard against this kind of problem.
During iteration over a WeakSet, weakref callbacks will add references to a list of pending removals rather than removing references from the underlying set directly. If an element dies before iteration reaches it, the iterator won't yield the element, but you're not going to get a segfault or a RuntimeError: Set changed size during iteration or anything.
Here's the guard (not threadsafe, despite what the comment says):
class _IterationGuard:
    # This context manager registers itself in the current iterators of the
    # weak container, such as to delay all removals until the context manager
    # exits.
    # This technique should be relatively thread-safe (since sets are).

    def __init__(self, weakcontainer):
        # Don't create cycles
        self.weakcontainer = ref(weakcontainer)

    def __enter__(self):
        w = self.weakcontainer()
        if w is not None:
            w._iterating.add(self)
        return self

    def __exit__(self, e, t, b):
        w = self.weakcontainer()
        if w is not None:
            s = w._iterating
            s.remove(self)
            if not s:
                w._commit_removals()
Here's where __iter__ uses the guard:
class WeakSet:
    ...
    def __iter__(self):
        with _IterationGuard(self):
            for itemref in self.data:
                item = itemref()
                if item is not None:
                    # Caveat: the iterator will keep a strong reference to
                    # `item` until it is resumed or closed.
                    yield item
And here's where the weakref callback checks the guard:
def _remove(item, selfref=ref(self)):
    self = selfref()
    if self is not None:
        if self._iterating:
            self._pending_removals.append(item)
        else:
            self.data.discard(item)
You can also see the same guard used in WeakKeyDictionary and WeakValueDictionary.
On old Python versions (3.0, or 2.6 and earlier), this guard is not present. If you need to support 2.6 or earlier, it looks like it should be safe to use keys, values, and items with the weak dict classes; I list no option for WeakSet because WeakSet didn't exist back then. If there's a safe, non-destructive option on 3.0, I haven't found one, but hopefully no one needs to support 3.0.
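The destructive approach from the question remains safe on any version, since it never touches an iterator. A minimal sketch (do_something is a stand-in for the real loop body):
# Drain the WeakSet without any iterator; elements that die before
# being popped simply never come out.
while True:
    try:
        obj = ws.pop()
    except KeyError:      # pop from an empty WeakSet
        break
    do_something(obj)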

Problems with the GC when using a WeakValueDictionary for caches

According to the official Python documentation for the weakref module the "primary use for weak references is to implement caches or mappings holding large objects,...". So, I used a WeakValueDictionary to implement a caching mechanism for a long running function. However, as it turned out, values in the cache never stayed there until they would actually be used again, but needed to be recomputed almost every time. Since there were no strong references between accesses to the values stored in the WeakValueDictionary, the GC got rid of them (even though there was absolutely no problem with memory).
Now, how am I then supposed to use the weak reference stuff to implement a cache? If I keep strong references somewhere explicitly to keep the GC from deleting my weak references, there would be no point using a WeakValueDictionary in the first place. There should probably be some option to the GC that tells it: delete everything that has no references at all and everything with weak references only when memory is running out (or some threshold is exceeded). Is there something like that? Or is there a better strategy for this kind of cache?
I'll attempt to answer your inquiry with an example of how to use the weakref module to implement caching. We'll keep our cache's weak references in a weakref.WeakValueDictionary, and the strong references in a collections.deque because it has a maxlen property that controls how many objects it holds on to. Implemented in function closure style:
import weakref, collections

def createLRUCache(factory, maxlen=64):
    weak = weakref.WeakValueDictionary()
    strong = collections.deque(maxlen=maxlen)
    notFound = object()
    def fetch(key):
        value = weak.get(key, notFound)
        if value is notFound:
            weak[key] = value = factory(key)
        strong.append(value)
        return value
    return fetch
The deque object will only keep the last maxlen entries, simply dropping references to the old entries once it reaches capacity (note that fetch appends on cache hits as well, which refreshes an entry's recency). When the old entries are dropped and garbage collected by Python, the WeakValueDictionary removes those keys from the map. Hence, the combination of the two objects helps us keep only the maxlen most recently used entries in our LRU cache.
class Silly(object):
    def __init__(self, v):
        self.v = v

# fib uses _fibCache, which is defined below; the name is looked up at call time.
def fib(i):
    if i > 1:
        return Silly(_fibCache(i-1).v + _fibCache(i-2).v)
    elif i:
        return Silly(1)
    else:
        return Silly(0)

_fibCache = createLRUCache(fib)
It looks like there is no way to overcome this limitation, at least in CPython 2.7 and 3.0.
Reflecting on the createLRUCache() solution:
The solution with createLRUCache(factory, maxlen=64) does not match my expectations. The idea of binding to 'maxlen' is something I would like to avoid. It would force me to specify some non-scalable constant here, or to create some heuristic for deciding which constant is better for this or that host's memory limits.
I would prefer that the GC eliminate unreferenced values from the WeakValueDictionary not straight away, but only on the condition used for regular GC:
When the number of allocations minus the number of deallocations exceeds threshold0, collection starts.
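CPython's gc has no such memory-pressure hook, and the thresholds only control when the cyclic collector runs; a weakly referenced value disappears the moment its last strong reference does, via reference counting, regardless of any threshold. The thresholds themselves are adjustable, for what it's worth:
import gc

print(gc.get_threshold())        # defaults, e.g. (700, 10, 10)
gc.set_threshold(10000, 10, 10)  # allow more allocations to pile up
                                 # before a generation-0 collection
So tuning gc cannot make a WeakValueDictionary behave like the requested memory-sensitive cache; keeping some strong references around (as createLRUCache does) is the usual workaround.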

Deleting attributes when deleting instance

class A:
    def __get(self):
        return self._x
    def __set(self, y):
        self._x = y
    def __delete_x(self):
        print('DELETING')
        del self._x
    x = property(__get, __set, __delete_x)

b = A()
# Here, when b is deleted, I'd like b.x to be deleted, i.e. __delete_x()
# called (and, as an immediate consequence, "DELETING" printed)
del b
The semantics of the del statement don't really lend themselves to what you want here. del b simply removes the reference to the A object you just instantiated from the local scope frame / dictionary; this does not directly cause any operation to be performed on the object itself. If that was the last reference to the object, then the reference count dropping to zero, or the garbage collector collecting a cycle, may cause the object to be deallocated. You could observe this by adding a __del__ method to the object, or by adding a weakref callback that performs the desired actions.
Neither of the latter two solutions seems like a great idea, though; __del__ methods prevent the garbage collector from collecting any cycles involving the object; and while weakrefs do not suffer from this problem, in either case you may be running in a strange environment (such as during program shutdown), which may make it difficult to get done what you want to accomplish.
If you can expand on your exact use case, it may be that there is an entirely different approach to accomplishing your desired end goal, but it is difficult to speculate based on such a general and limited example.
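For completeness, here's a minimal sketch of the weakref-callback route mentioned above, using weakref.finalize (Python 3.4+; on older versions you would pass a callback to weakref.ref instead). The names A and on_delete are invented for illustration:
import weakref

class A(object):
    pass

def on_delete():
    # Runs when the instance goes away; it must not hold a reference
    # to the instance itself, or it will keep the instance alive.
    print('DELETING')

b = A()
finalizer = weakref.finalize(b, on_delete)
del b  # prints: DELETING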
To control what happens when an instance of class A goes away (whether by being deleted or garbage collected), you can implement special method __del__(self) in A. If you want to have your code involved when a specific attribute of that instance goes away, you can either wrap that attribute with a wrapper class which has __del__, or, probably better in most cases, use the weakref module (however, not all types are subject to being target of weak references, so you may also need some wrapping for this case).
Avoiding __del__ is generally preferable, if you possibly can, because it can interfere with garbage collection and thereby cause "memory leaks" if and when you have circular references.
An ugly way to do it would be:
def __del__(self):
    for x in dir(self.__class__):
        if type(getattr(self.__class__, x)) == property:
            getattr(self.__class__, x).fdel(self)
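Dropped into the question's class A, that looks something like the sketch below (note that _x must have been set beforehand, or fdel will raise AttributeError inside __del__):
class A(object):
    def __get(self):
        return self._x
    def __set(self, y):
        self._x = y
    def __delete_x(self):
        print('DELETING')
        del self._x
    x = property(__get, __set, __delete_x)

    def __del__(self):
        # Call every property's deleter on the way out.
        for name in dir(self.__class__):
            attr = getattr(self.__class__, name)
            if isinstance(attr, property) and attr.fdel is not None:
                attr.fdel(self)

b = A()
b.x = 1
del b  # prints: DELETING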

Using explicit del in python on local variables

What are the best practices and recommendations for using the explicit del statement in Python? I understand that it is used to remove attributes or dictionary/list elements and so on, but sometimes I see it used on local variables in code like this:
def action(x):
    result = None
    something = produce_something(x)
    if something:
        qux = foo(something)
        result = bar(qux, something)
        del qux
    del something
    return result
Are there any serious reasons for writing code like this?
Edit: consider qux and something to be something "simple" without a __del__ method.
I don't remember when I last used del -- the need for it is rare indeed, and typically limited to such tasks as cleaning up a module's namespace after a needed import or the like.
In particular, it's not true, as another (now-deleted) answer claimed, that
Using del is the only way to make sure
an object's __del__ method is called
and it's very important to understand this. To help, let's make a class with a __del__ and check when it is called:
>>> class visdel(object):
...     def __del__(self): print 'del', id(self)
...
>>> d = visdel()
>>> a = list()
>>> a.append(d)
>>> del d
>>>
See? del doesn't "make sure" that __del__ gets called: del removes one reference, and only the removal of the last reference causes __del__ to be called. So, also:
>>> a.append(visdel())
>>> a[:] = [1, 2, 3]
del 550864
del 551184
When the last reference does go away (including in ways that don't involve del, such as a slice assignment as in this case, or other rebindings of names and other slots), then __del__ gets called -- whether del was ever involved in reducing the object's references or not makes absolutely no difference whatsoever.
So, unless you specifically need to clean up a namespace (typically a module's namespace, but conceivably that of a class or instance) for some specific reason, don't bother with del (it can be occasionally handy for removing an item from a container, but I've found that I'm often using the container's pop method or item or slice assignment even for that!-).
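The module-namespace case looks like this; a small sketch (the module's contents are invented for illustration):
# mymodule.py
import os

# `os` is only needed while the module body runs:
CACHE_DIR = os.path.join(os.path.expanduser('~'), '.myapp')

del os  # keep `mymodule.os` from leaking into the module's public namespace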
No.
I'm sure someone will come up with some silly reason to do this, e.g. to make sure someone doesn't accidentally use the variable after it's no longer valid. But probably whoever wrote this code was just confused. You can remove them.
When you are running programs handling really large amounts of data (in my experience, when the total memory consumption of the program approaches something like 1 GB), deleting some objects:
del largeObject1
del largeObject2
…
can give your program the necessary breathing room to function without running out of memory. This can be the easiest way to modify a given program in case of a "MemoryError" runtime error.
Actually, I just came across a use for this. If you use locals() to return a dictionary of local variables (useful when parsing things) then del is useful to get rid of a temporary that you don't want to return.
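A minimal sketch of that pattern (the parsing itself is invented for illustration):
def parse_assignment(line):
    key, _, rest = line.partition('=')
    tmp = rest.strip()            # scratch variable we don't want returned
    key, value = key.strip(), tmp.upper()
    del tmp, rest, _              # drop the temporaries...
    return locals()               # ...so only line, key and value remain

print(parse_assignment('colour = teal'))
# -> {'line': 'colour = teal', 'key': 'colour', 'value': 'TEAL'}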

How to do cleanup reliably in python?

I have some ctypes bindings, and for each body.New I should call body.Free. The library I'm binding doesn't have its allocation routines insulated from the rest of the code (they can be called from just about anywhere), and to use a couple of useful features I need to make cyclic references.
I think it would be solved if I could find a reliable way to hook a destructor to an object. (Weakrefs would help if they gave me the callback just before the data is dropped.)
So obviously this code megafails when I put in velocity_func:
class Body(object):
    def __init__(self, mass, inertia):
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)

    ...

    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
I also tried to solve it through weakrefs; with those, things seemed to get only worse and largely more unpredictable.
Even if I don't put in the velocity_func, cycles will appear at least when I do this:
class Toy(object):
    def __init__(self, body):
        self.body.owner = self

    ...

def collision(a, b, contacts):
    whatever(a.body.owner)
So how to make sure Structures will get garbage collected, even if they are allocated/freed by the shared library?
There's repository if you are interested about more details: http://bitbucket.org/cheery/ctypes-chipmunk/
What you want to do, that is, create an object that allocates things and then deallocates them automatically when the object is no longer in use, is almost impossible in Python, unfortunately. The __del__ method is not guaranteed to be called, so you can't rely on that.
The standard way in Python is simply:
try:
    allocate()
    dostuff()
finally:
    cleanup()
Or since 2.5 you can also create context-managers and use the with statement, which is a neater way of doing that.
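For the question's body.New/body.Free pair, such a context manager might look like the following sketch (body is the ctypes binding from the question; simulate is a stand-in for the real work):
from contextlib import contextmanager

@contextmanager
def new_body(mass, inertia):
    b = body.New(mass, inertia)
    try:
        yield b
    finally:
        body.Free(b)   # runs even if the with-block raises

# Usage:
# with new_body(mass, inertia) as b:
#     simulate(b)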
But both of these are primarily for when you allocate/lock at the beginning of a code snippet. If you want to have things allocated for the whole run of the program, you need to allocate the resource at startup, before the main code of the program runs, and deallocate afterwards. There is one situation which isn't covered here, and that is when you want to allocate and deallocate many resources dynamically and use them in many places in the code. For example, if you want a pool of memory buffers or similar. But most of those cases are for memory, which Python will handle for you, so you don't have to bother about those. There are of course cases where you want to have dynamic pool allocation of things that are NOT memory, and then you would want the type of deallocation you try in your example, and that is tricky to do with Python.
If weakrefs aren't broken, I guess this may work:
from ctypes import cast, c_void_p
from weakref import ref

pointers = set()

class Pointer(object):
    def __init__(self, cfun, ptr):
        pointers.add(self)
        self.ref = ref(ptr, self.cleanup)
        # Keep only the raw address; a strong reference to ptr would
        # defeat the weakref. ctypes' cast is smart, but it can't be
        # smarter than this.
        self.data = cast(ptr, c_void_p).value
        self.cfun = cfun

    def cleanup(self, obj):
        print 'cleanup 0x%x' % self.data
        self.cfun(self.data)
        pointers.remove(self)

def cleanup(cfun, ptr):
    Pointer(cfun, ptr)
I have yet to try it. The important piece is that the Pointer doesn't hold any strong reference to the foreign pointer, only an integer. This should work if ctypes doesn't free memory that I should free with the bindings. Yeah, it's basically a hack, but I think it may work better than the earlier things I've been trying.
Edit: Tried it, and it seems to work after some small fine-tuning of my code. A surprising thing is that even if I take del out of all of my structures, it still seems to fail. Interesting but frustrating.
Neither works; by some weird chance I've been able to drop the cyclic references in places, but things stay broken.
Edit: Well.. weakrefs WERE broken after all! So there's likely no solution for reliable cleanup in Python, except forcing it to be explicit.
In CPython, __del__ is a reliable destructor of an object, because it is always called when the reference count reaches zero (note: there may be cases - like circular references of items with a __del__ method defined - where the reference count never reaches zero, but that is another issue).
Update
From the comments, I understand the problem is related to the order of destruction of objects: body is a global object, and it is being destroyed before all other objects, thus it is no longer available to them.
Actually, using global objects is not good; not only because of issues like this one, but also because of maintenance.
I would then change your class to something like this:
class Body(object):
    def __init__(self, mass, inertia):
        self._bodyref = body   # keep the global module alive with the instance
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)

    ...

    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
A couple of notes:
The only change is adding a reference to the global body object, which will thus live at least as long as all the objects created from that class.
Still, using a global object is not good for unit testing and maintenance; it would be better to have a factory for the object that sets the correct "body" on the class, so that a unit test can easily substitute a mock object. But that's really up to you and how much effort you think makes sense for this project.
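Such a factory could be as simple as this sketch (make_body_class and MockBody are invented names for illustration):
def make_body_class(body_module):
    """Bind the `body` dependency explicitly instead of using a global."""
    class Body(object):
        def __init__(self, mass, inertia):
            self._body_module = body_module   # kept alive per instance
            self._body = body_module.New(mass, inertia)
        def __del__(self):
            self._body_module.Free(self._body)
    return Body

# Body = make_body_class(body)        # production
# Body = make_body_class(MockBody())  # unit tests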
