While coding a cache class for one of my projects I wanted to try out the weakref package as its functionality seems to fit this purpose very well. The class is supposed to cache blocks of data from disk as readable and writable buffers for ctypes.Structures. The blocks of data are supposed to be discarded when no structure is pointing to them, unless the buffer was modified due to some change to the structures.
To prevent dirty blocks from being garbage collected my idea was to set block.some_attr_name = block in the structures' __setattr__. Even when all structures are eventually garbage collected, the underlying block of data still has a reference count of at least 1 because block.some_attr_name references block.
I wanted to test this idea, so I opened up an IPython session and typed
import weakref
class Test:
    def __init__(self):
        self.self = self
ref = weakref.ref(Test(), lambda r: print("Test was trashed"))
As expected, this didn't print Test was trashed. But when I went to type del ref().self to see whether the referent will be discarded, while typing, before hitting Enter, Test was trashed appeared. Oddly enough, even hitting the arrow keys or resizing the command line window after assigning ref will cause the referent to be trashed, even though the referent's reference count cannot drop to zero because it is referencing itself. This behavior persists even if I artificially increase the reference count by replacing self.self = self with self.refs = [self for i in range(20)].
I couldn't reproduce this behavior in the standard python.exe interpreter (interactive session) which is why I assume this behavior to be tied to IPython (but I am not actually sure about this).
Is this behavior expected with the devil hiding somewhere in the details of IPython's implementation or is this behavior a bug?
Edit 1: It gets stranger. If in the IPython session I run
import weakref
class Test:
    def __init__(self):
        self.self = self
test = Test()
ref = weakref.ref(test, lambda r: print("Aaaand it's gone...", flush = True))
del test
the referent is not trashed immediately. But if I hold down any key, "typing" out "aaaa..." (~200 a's), suddenly Aaaand it's gone... appears. And since I added flush = True I can rule out buffering for the late response. I definitely wouldn't expect IPython to be decreasing reference counts just because I was holding down a key. Maybe Python itself checks for circular references in some garbage collection cycles?
(tested with IPython 7.30.1 running Python 3.10.1 on Windows 10 x64)
In Python's documentation on Extending and Embedding the Python Interpreter under subsection 1.10 Reference Counts the second to last paragraph reads:
While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
So I guess my idea of circular references to prevent garbage collection from eating my objects won't work out.
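A minimal sketch of that conclusion, assuming CPython. The automatic collector is switched off here only to make the timing deterministic, and `events` is a name made up for this example to record the weakref callback. The self-referencing object survives on reference counting alone, but a cycle-collection pass reclaims it. In a normal session such a pass runs automatically once enough container objects have been allocated (see `gc.get_threshold()`), which would be consistent with the referent vanishing while IPython is busy handling keystrokes, though that link is a guess and not something checked against IPython's source.

```python
import gc
import weakref


class Test:
    def __init__(self):
        self.self = self  # self-reference: the refcount never drops to zero


events = []  # hypothetical log of weakref callbacks

gc.disable()  # keep the automatic cycle detector out of the way
ref = weakref.ref(Test(), lambda r: events.append("Test was trashed"))

alive_before = ref() is not None  # still alive: only the cycle keeps it
gc.collect()                      # explicit cycle-detection pass
alive_after = ref() is not None   # the cycle was found and reclaimed

gc.enable()
print(alive_before, alive_after, events)
```

So a self-reference delays collection until the next cycle-detection pass but does not prevent it; to keep a dirty block alive, something reachable (for example a set of dirty blocks owned by the cache) would have to hold a strong reference to it.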
Related
I saw a class in which a __del__ method is defined. This method is used to destroy an instance of the class. However, I cannot find a place where this method is used. How is this method used? Like this: obj1.del()?
How do I call the __del__ method?
__del__ is a finalizer. It is called when an object is garbage collected which happens at some point after all references to the object have been deleted.
In a simple case this could be right after you say del x or, if x is a local variable, after the function ends. In particular, unless there are circular references, CPython (the standard Python implementation) will garbage collect immediately.*
However, this is an implementation detail of CPython. The only required property of Python garbage collection is that it happens after all references have been deleted, so this might not necessarily happen right after and might not happen at all.
Moreover, variables can live for a long time for many reasons; for example, a propagating exception or module introspection can keep a variable's reference count greater than 0. A variable can also be part of a reference cycle. CPython with garbage collection turned on breaks most, but not all, such cycles, and even then only periodically.
Since you have no guarantee that it is executed, you should never put code that must run into __del__(). Instead, that code belongs in the finally clause of a try statement or in a context manager used in a with statement. However, there are valid use cases for __del__: e.g. if an object X references Y and also keeps a copy of the Y reference in a global cache (cache['X -> Y'] = Y), then it would be polite for X.__del__ to also delete the cache entry.
If you know that the destructor provides (in violation of the above guideline) a required cleanup, you might want to call it directly, since there is nothing special about it as a method: x.__del__(). Obviously, you should only do so if you know it can be called twice. Or, as a last resort, you can redefine this method using
type(x).__del__ = my_safe_cleanup_method
* Reference:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable [...] Other implementations act differently and CPython may change.
I wrote up the answer for another question, though this is a more accurate question for it.
How do constructors and destructors work?
Here is a slightly opinionated answer.
Don't use __del__. This is not C++ or a language built for destructors. The __del__ method really should be gone in Python 3.x, though I'm sure someone will find a use case that makes sense. If you need to use __del__, be aware of the basic limitations per http://docs.python.org/reference/datamodel.html:
__del__ is called when the garbage collector happens to be collecting the objects, not when you lose the last reference to an object and not when you execute del object.
__del__ is responsible for calling any __del__ in a superclass, though it is not clear if this is in method resolution order (MRO) or just calling each superclass.
Having a __del__ means that the garbage collector gives up on detecting and cleaning any cyclic links, such as losing the last reference to a linked list (this applies to Python versions before 3.4; since PEP 442, such cycles are collected). You can get a list of the objects ignored from gc.garbage. You can sometimes use weak references to avoid the cycle altogether. This gets debated now and then: see http://mail.python.org/pipermail/python-ideas/2009-October/006194.html.
The __del__ function can cheat, saving a reference to an object, and stopping the garbage collection.
Exceptions explicitly raised in __del__ are ignored; a warning is printed to sys.stderr instead.
__del__ complements __new__ far more than __init__. This gets confusing. See http://www.algorithm.co.il/blogs/programming/python-gotchas-1-del-is-not-the-opposite-of-init/ for an explanation and gotchas.
__del__ is not a "well-loved" child in Python. You will notice that the sys.exit() documentation does not specify whether garbage is collected before exiting, and there are lots of odd issues. Calling __del__ on globals causes odd ordering issues, e.g., http://bugs.python.org/issue5099. Should __del__ be called even if __init__ fails? See http://mail.python.org/pipermail/python-dev/2000-March/thread.html#2423 for a long thread.
But, on the other hand:
__del__ means you do not forget to call a close statement. See http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python/ for a pro __del__ viewpoint. This is usually about freeing ctypes or some other special resource.
And my personal reasons for not liking the __del__ function:
Every time someone brings up __del__ it devolves into thirty messages of confusion.
It breaks these items in the Zen of Python:
Simple is better than complex.
Special cases aren't special enough to break the rules.
Errors should never pass silently.
In the face of ambiguity, refuse the temptation to guess.
There should be one – and preferably only one – obvious way to do it.
If the implementation is hard to explain, it's a bad idea.
So, find a reason not to use __del__.
The __del__ method will be called when the object is garbage collected. Note that it isn't guaranteed to be called, though. The following code by itself won't necessarily trigger it:
del obj
The reason being that del just decrements the reference count by one. If something else has a reference to the object, __del__ won't get called.
There are a few caveats to using __del__, though. Generally, __del__ methods just aren't very useful. It sounds to me more like you want to use a close method or maybe a with statement.
See the python documentation on __del__ methods.
One other thing to note: __del__ methods can inhibit garbage collection if overused. In particular, in Python versions before 3.4, a circular reference that has more than one object with a __del__ method won't get garbage collected. This is because the garbage collector doesn't know which one to call first. See the documentation on the gc module for more info.
The __del__ method (note spelling!) is called when your object is finally destroyed. Technically speaking (in CPython) that is when there are no more references to your object, e.g. when the last name bound to it goes out of scope.
If you want to delete your object and thus call the __del__ method use
del obj1
which will delete the object (provided there weren't any other references to it).
I suggest you write a small class like this
class T:
    def __del__(self):
        print("deleted")
And investigate in the Python interpreter, e.g.
>>> a = T()
>>> del a
deleted
>>> a = T()
>>> b = a
>>> del b
>>> del a
deleted
>>> def fn():
...     a = T()
...     print("exiting fn")
...
>>> fn()
exiting fn
deleted
>>>
Note that Jython and IronPython have different rules as to exactly when the object is deleted and __del__ is called. It isn't considered good practice to use __del__, though, because of this and the fact that the object and its environment may be in an unknown state when it is called. It isn't absolutely guaranteed that __del__ will be called either: the interpreter can exit in various ways without deleting all objects.
As mentioned earlier, the __del__ functionality is somewhat unreliable. In cases where it might seem useful, consider using the __enter__ and __exit__ methods instead. This will give a behaviour similar to the with open() as f: pass syntax used for accessing files. __enter__ is automatically called when entering the scope of with, while __exit__ is automatically called when exiting it. See this question for more details.
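As a rough sketch of that suggestion: the `Resource` class and its `close` method below are hypothetical stand-ins for whatever cleanup you would otherwise have put into `__del__`.

```python
class Resource:
    """Hypothetical resource that needs deterministic cleanup."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # The cleanup that would otherwise have lived in __del__.
        self.closed = True

    def __enter__(self):
        # Called when the with block is entered; the return value
        # is bound to the name after "as".
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the with block is left, even if it raised.
        self.close()
        return False  # do not swallow exceptions


with Resource("demo") as res:
    was_open_inside = not res.closed

print(was_open_inside, res.closed)
```

Unlike `__del__`, the cleanup here runs at a well-defined point (the end of the with block), regardless of how many references to the object still exist.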
I am writing a python class like this:
class MyImageProcessor:
    def __init__(self, image, metadata):
        self.image = image
        self.metadata = metadata
Both image and metadata are objects of a class written by a colleague. Now I need to make sure there is no waste of memory. I am thinking of defining a quit() method like this,
def quit(self):
    self.image = None
    self.metadata = None
    import gc
    gc.collect()
and suggest that users call quit() systematically. I would like to know whether this is the right way. In particular, do the instructions in quit() above guarantee that unused memory is properly collected?
Alternatively, I could rename quit() to the built-in __exit__() and suggest that users use the "with" syntax. But my question is more about whether the instructions in quit() indeed do the garbage collection work one would need in this situation.
Thank you for your help.
In Python every object has a built-in reference count; the variables (names) you create are only pointers to the objects. There are mutable and immutable objects (for example, if you change the value of an integer, the name will point to another integer object, while changing a list element does not rebind the list's name).
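A small sketch of the difference between rebinding a name and mutating an object; all variable names here are made up for illustration.

```python
# Integers are immutable: "changing" n rebinds the name to a new object.
n = 1000
alias = n
n += 1
int_rebound = n is not alias      # n now names a different int object

# Lists are mutable: changing an element keeps the same list object.
items = [1, 2]
alias_list = items
items[0] = 99
list_same = items is alias_list   # both names still point to one list

print(int_rebound, alias, list_same, alias_list)
```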
The reference count basically counts how many references (such as variables) point to that object, and it is incremented/decremented automatically.
The garbage collector will destroy the objects with zero references (actually not always, it takes extra steps to save time). You should check out this article.
Similarly to object constructors (__init__()), which are called on object creation, you can define destructors (__del__()), which are executed on object deletion (usually when the reference count drops to 0). According to this article, in Python they are not needed as much as in C++ because Python has a garbage collector that handles memory management automatically. You can check out those examples too.
Hope it helps :)
No need for quit() (Assuming you're using C-based python).
Python uses two methods of garbage collection, as alluded to in the other answers.
First, there's reference counting. Essentially each time you add a reference to an object it gets incremented & each time you remove the reference (e.g., it goes out of scope) it gets decremented.
From https://devguide.python.org/garbage_collector/:
When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference counts are decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on.
You can get information about current reference counts for an object using sys.getrefcount(x), but really, why bother.
The second way is through garbage collection (gc). [Reference counting is a type of garbage collection, but python specifically calls this second method "garbage collection" -- so we'll also use this terminology. ] This is intended to find those places where reference count is not zero, but the object is no longer accessible. ("Reference cycles") For example:
class MyObj:
    pass
x = MyObj()
x.self = x
Here, x refers to itself, so the actual reference count for x is more than 1. You can call del x but that merely removes it from your scope: it lives on because "someone" still has a reference to it.
gc, and specifically gc.collect() goes through objects looking for cycles like this and, when it finds an unreachable cycle (such as your x post deletion), it will deallocate the whole lot.
Back to your question: You don't need to have a quit() method because as soon as your MyImageProcessor object goes out of scope, it will decrement the reference counters for image and metadata. If that puts them to zero, they're deallocated. If it doesn't, well, someone else is using them.
Setting them to None first merely decrements the reference counts right then, but when MyImageProcessor goes out of scope, it won't decrement those reference counts again, because MyImageProcessor no longer holds the image or metadata objects! So you're just explicitly doing what Python already does for you for free: no more, no less.
You didn't create a cycle, so your calling gc.collect() is unlikely to change anything.
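To illustrate, here is a sketch assuming CPython's reference counting. `Payload` is a hypothetical stand-in for the colleague's image/metadata class, and a weak reference is used only to observe whether the image object still exists.

```python
import weakref


class Payload:
    """Hypothetical stand-in for the colleague's image/metadata class."""


class MyImageProcessor:
    def __init__(self, image, metadata):
        self.image = image
        self.metadata = metadata


image = Payload()
image_ref = weakref.ref(image)      # observe the object without owning it
proc = MyImageProcessor(image, Payload())

del image                           # the processor still holds a reference
alive_while_proc = image_ref() is not None

del proc                            # last owner gone: no quit() required
alive_after_proc = image_ref() is not None

print(alive_while_proc, alive_after_proc)
```

No call to gc.collect() is involved: once the processor goes away, the image's reference count drops to zero and it is deallocated right then.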
Check out https://devguide.python.org/garbage_collector/ if you are interested in more earthy details.
Not sure if it makes sense, but by my logic you could call gc.get_count() before and after gc.collect() to see if something has been removed.
what are count0, count1 and count2 values returned by the Python gc.get_count()
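A sketch of that check. Note that gc.get_count() reports the collector's internal per-generation counters, not a number of freed objects, so the return value of gc.collect() (the number of unreachable objects it found) is probably the more direct signal. The `Node` class is made up for this example, and the automatic collector is disabled only to keep the run deterministic.

```python
import gc


class Node:
    """Hypothetical class used to build a reference cycle."""


gc.disable()             # keep automatic collections out of the picture
gc.collect()             # start from a clean state

n = Node()
n.me = n                 # reference cycle
del n                    # unreachable now, but refcount is still non-zero

before = gc.get_count()  # (count0, count1, count2)
found = gc.collect()     # number of unreachable objects found
after = gc.get_count()

gc.enable()
print(before, found, after)
```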
Taking the following code as an example, does returning an object from a function lead to a memory leak?
I'm very curious about what happens to the object handle after it is used by the function use_age.
import MySQLdb

class Demo(object):
    def _get_mysql_handle(self):
        handle = MySQLdb.connect(host=self.conf["host"],
                                 port=self.conf["port"],
                                 user=self.conf["user"],
                                 passwd=self.conf["passwd"],
                                 db=self.conf["db"])
        return handle
    def use_age(self):
        cursor = self._get_mysql_handle().cursor()
if __name__ == "__main__":
    demo = Demo()
    demo.use_age()
No, that code won't lead to a memory leak.
CPython handles object lifetimes by reference counting. In your example the reference count drops back to 0 and the database connection object is deleted again.
The local name handle in _get_mysql_handle is one reference, it is dropped when _get_mysql_handle returns.
The stack holding the return value from self._get_mysql_handle() is another, it too is dropped when the expression result is completed.
.cursor() is a method, so it'll have another reference for the self argument to that method, until the method exits.
The return value from .cursor() probably stores a reference, it'll be dropped when the cursor itself is reaped. That then depends on the lifetime of the local cursor variable in the use_age() method. As a local it doesn't live beyond the use_age() function.
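The chain of references above can be sketched without a database. `FakeConnection` and `FakeCursor` are hypothetical stand-ins (they are not the MySQLdb API), built on the assumption that a cursor keeps a reference to its connection; the behaviour shown relies on CPython's reference counting.

```python
import weakref


class FakeCursor:
    def __init__(self, connection):
        self.connection = connection  # assumed: the cursor holds its connection


class FakeConnection:
    def cursor(self):
        return FakeCursor(self)


observed = []  # weak references, used only to watch the connection


def use_age():
    cursor = FakeConnection().cursor()
    observed.append(weakref.ref(cursor.connection))
    # The temporary connection is kept alive through the local cursor.
    return observed[-1]() is not None


alive_inside = use_age()
alive_after = observed[-1]() is not None  # cursor gone, so connection gone

print(alive_inside, alive_after)
```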
Other Python implementations use garbage collection strategies; Jython uses the Java runtime facilities, for example. The object may live a little longer, but won't 'leak'.
In Python versions < 3.4, you do need to watch out for creating circular references with custom classes that define a __del__ method. Those are the circular references that the gc module does not break. You can introspect such chains in the gc.garbage object.
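For comparison, here is a sketch of what happens on Python 3.4 or later (after PEP 442), where a cycle of objects with `__del__` methods is collected and nothing is left in gc.garbage. The `Node` class and the `finalized` list are made up for this example.

```python
import gc

finalized = []  # hypothetical log of finalizer calls


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


a, b = Node("a"), Node("b")
a.other, b.other = b, a      # two objects with __del__ in one cycle
del a, b

gc.collect()                 # the cycle detector finalizes and frees both
leftover = [o for o in gc.garbage if isinstance(o, Node)]

print(sorted(finalized), leftover)
```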
I want a Roach class to "die" when it reaches a certain amount of "hunger", but I don't know how to delete the instance. I may be making a mistake with my terminology, but what I mean to say is that I have a ton of "roaches" on the window and I want specific ones to disappear entirely.
I would show you the code, but it's quite long. I have the Roach instances being appended to a Mastermind class's roach population list.
In general:
Each binding (variable -> object) increases the object's internal reference counter.
There are several usual ways to decrease the reference count (that is, to remove a variable -> object binding):
exiting the block of code where the variable was declared (used for the first time)
destroying an object, which releases the references held by all of its attribute/method variable -> object bindings
calling del variable, which also deletes the reference in the current context
After all references to an object are removed (counter == 0), it becomes a good candidate for garbage collection, but it is not guaranteed that it will be processed (reference here):
CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
To find out how many references to an object exist, use sys.getrefcount.
The module for configuring/checking garbage collection is gc.
The GC will call the object.__del__ method when destroying an object (additional reference here).
Some immutable objects like strings are handled in a special way: e.g. if two variables contain the same string, it is possible that they reference the same object, but not always. See "identifying objects, why does the returned value from id(...) change?"
The id of an object can be found with the builtin function id.
The module memory_profiler looks interesting: a module for monitoring memory usage of a Python program.
There are lots of useful resources on the topic; one example: Find all references to an object in python
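The points about bindings and sys.getrefcount can be tried out with a short sketch like the following (CPython assumed; `Thing` and the variable names are made up). Only the differences between the counts matter, since sys.getrefcount itself temporarily adds a reference while it runs.

```python
import sys


class Thing:
    """Hypothetical class, used only as something to count references to."""


obj = Thing()
base = sys.getrefcount(obj)      # absolute value is not important here

alias = obj                      # a new name binding: +1
holder = [obj]                   # a container slot: +1
bumped = sys.getrefcount(obj)

del alias                        # removing the binding: -1
holder.clear()                   # emptying the container: -1
restored = sys.getrefcount(obj)

print(bumped - base, restored - base)
```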
You cannot force a Python object to be deleted; it will be deleted when nothing references it (or when it's in a cycle only referred to by the items in the cycle). You will have to tell your "Mastermind" to erase its reference.
del somemastermind.roaches[n]
for i, roach in enumerate(roachpopulation_list):
    if roach.hunger == 100:
        del roachpopulation_list[i]
        break
Remove the instance by deleting it from your population list (which contains all the roach instances).
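If several roaches can starve at once, a variant that rebuilds the list avoids deleting from a list while iterating over it. This is only a sketch: the `Roach` class, the hunger limit of 100, and the list name `roaches` are assumptions, since the original code was not posted.

```python
class Roach:
    """Hypothetical minimal Roach; the real class was not shown."""

    def __init__(self, hunger):
        self.hunger = hunger


roaches = [Roach(h) for h in (10, 100, 55, 100)]

# Keep only the roaches that are still below the hunger limit.
# Slice assignment updates the existing list object in place, so any
# other holder of the same list (e.g. a Mastermind) sees the change.
roaches[:] = [roach for roach in roaches if roach.hunger < 100]

print([roach.hunger for roach in roaches])
```

Once the list no longer refers to a roach and nothing else does either, the instance is reclaimed without any further action.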
If your Roaches are Sprites created in Pygame, then a simple command of .kill would remove the instance.
When a generator is not used any more, it should be garbage collected, right? I tried the following code but I am not sure which part I got wrong.
import weakref
import gc
def countdown(n):
    while n:
        yield n
        n-=1
cd = countdown(10)
cdw = weakref.ref(cd)()
print cd.next()
gc.collect()
print cd.next()
gc.collect()
print cdw.next()
On the second-to-last line, I called the garbage collector, and since cd is not used any more after that, gc should free cd, right? But when I call cdw.next(), it still prints 8. I tried a few more cdw.next() calls, and it successfully printed all the rest until StopIteration.
I tried this because I wanted to understand how generator and coroutine work. On slide 28 of David Beazley's PyCon presentation "A Curious Course on Coroutines and Concurrency", he said that a coroutine might run indefinitely and we should use .close() to shut it down. Then he said that garbage collector will call .close(). In my understanding, once we called .close() ourselves, gc will call .close() again. Will gc receive a warning that it can't call .close() on an already closed coroutine?
Thanks for any inputs.
Due to the dynamic nature of Python, the reference to cd isn't freed until you reach the end of the current routine because (at least) the CPython implementation of Python doesn't "read ahead". (If you don't know what Python implementation you're using, it's almost certainly CPython.) There are a number of subtleties that would make it virtually impossible for the interpreter to determine, in the general case, whether an object should be freed if it still exists in the current namespace (e.g. you can still reach it by a call to locals()).
In some less general cases, other Python implementations may be able to free an object before the end of the current stack frame, but CPython doesn't bother.
Try this code instead, which demonstrates that the generator is free to be cleaned up in CPython:
import weakref
def countdown(n):
    while n:
        yield n
        n-=1
def func():
    a = countdown(10)
    b = weakref.ref(a)
    print next(a)
    print next(a)
    return b
c = func()
print c()
Objects (including generators) are garbage collected when their reference count reaches 0 (in CPython; other implementations may work differently). In CPython, reference counts are only decremented when a reference is removed, for example by a del statement, or when an object goes out of scope because the current namespace changes.
The important thing is that once there are no more references to an object, it is free to be cleaned up by the garbage collector. The details of how the implementation determines that there are no more references are left to the implementers of the particular python distribution you're using.
In your example, the generator won't get garbage collected until the end of the script. Python doesn't know if you're going to be using cd again, so it can't throw it away. To put it precisely, there's still a reference to your generator in the global namespace.
A generator will get GCed when its reference count drops to zero, just like any other object. Even if the generator is not exhausted.
This can happen under lots of normal circumstances: if it's in a local name that falls out of scope, if it's del'ed, or if its owner gets GCed. But if any live objects (including namespaces) hold strong references to it, it won't get GCed.
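A sketch of this, assuming CPython: the `log` list and the try/finally inside the generator are additions made here purely to make the cleanup visible. It also touches on the question about .close(): closing a generator that is already closed is simply a no-op, so a later collection has nothing left to do.

```python
log = []  # hypothetical record of generator cleanups


def countdown(n):
    try:
        while n:
            yield n
            n -= 1
    finally:
        log.append("cleanup")  # runs when the generator is closed


# Case 1: an unexhausted generator loses its last reference.
gen = countdown(3)
next(gen)
del gen                  # CPython closes the generator right away
after_del = list(log)

# Case 2: explicit close(), twice, then the reference is dropped.
gen = countdown(3)
next(gen)
gen.close()
gen.close()              # already closed: nothing happens
del gen                  # nothing left to clean up

print(after_del, log)
```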
The Python garbage collector isn't quite that smart. Even though you don't refer to cd any more after that line, the reference is still live in local variables, so it can't be collected. (In fact, it's possible that some code you're using might dig around in your local variables and resurrect it. Unlikely, but possible. So Python can't make any assumptions.)
If you want to make the garbage collector actually do something here, try adding:
del cd
This will remove the local variable, allowing the object to be collected.
The other answers have explained that gc.collect() won't garbage collect anything that still has references to it. There is still a live reference cd to the generator, so it will not be gc'ed until cd is deleted.
However in addition, the OP is creating a SECOND strong reference to the object using this line, which calls the weak reference object:
cdw = weakref.ref(cd)()
So if one were to do del cd and call gc.collect(), the generator would still not be gc'ed because cdw is also a reference.
To obtain an actual weak reference, do not call the weakref.ref object. Simply do this:
cdw = weakref.ref(cd)
Now when cd is deleted and garbage collected, the reference count will be zero and calling the weak reference will result in None, as expected.
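Putting this together, a sketch written for Python 3 (so `next(cd)` instead of `cd.next()`) and assuming CPython, where the generator is freed as soon as its last strong reference disappears:

```python
import weakref


def countdown(n):
    while n:
        yield n
        n -= 1


cd = countdown(10)
cdw = weakref.ref(cd)        # keep the weakref object itself; do not call it here

first = next(cd)
same_object = cdw() is cd    # calling the weakref yields the live generator

del cd                       # drop the only strong reference
gone = cdw() is None         # the weakref is now dead

print(first, same_object, gone)
```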