When does a module scope variable reference get released by the interpreter?

I'm trying to implement a clean-up routine in a utility module I have. In looking around for solutions to my problem, I finally settled on using a weakref callback to do my cleanup. However, I'm concerned that it won't work as expected because of a strong reference to the object from within the same module. To illustrate:
foo_lib.py
import weakref

class Foo(object):
    _refs = {}

    def __init__(self, x):
        self.x = x
        self._weak_self = weakref.ref(self, Foo._clean)
        Foo._refs[self._weak_self] = x

    @classmethod
    def _clean(cls, ref):
        print('cleaned %s' % cls._refs[ref])

foo = Foo(42)
Other classes then reference foo_lib.foo. I did find an old document from 1.5.1 that sort of references my concerns (http://www.python.org/doc/essays/cleanup/) but nothing that makes me fully comfortable that foo will be released in such a way that the callback will be triggered reliably. Can anyone point me towards some docs that would clear this question up for me?

The right thing to do here is to explicitly release your strong reference at some point, rather than counting on shutdown to do it.
In particular, if the module is released, its globals will be released… but it doesn't seem to be documented anywhere that the module will get released. So, there may still be a reference to your object at shutdown. And, as Martijn Pieters pointed out:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
However, if you can ensure that there are no (non-weak) references to your object some time before the interpreter exits, you can guarantee that your cleanup runs.
You can use atexit handlers to clean up after yourself explicitly, or you can simply do it before falling off the end of your main module (or calling sys.exit, or finishing your last non-daemon thread, or whatever). The simplest thing to do is often to take your entire main function and wrap it in a with or try/finally.
Or, even more simply, don't try to put cleanup code into __del__ methods or weakref callbacks; just put the cleanup code itself into your with or finally or atexit.
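As a minimal sketch of that idea (the Resource class and its close() method are placeholders invented for this example, not anything from the question):

import atexit

class Resource(object):
    """Stand-in for whatever actually needs deterministic cleanup."""
    def close(self):
        print('cleaned up')

def main():
    res = Resource()
    atexit.register(res.close)        # fallback: runs at normal interpreter exit
    try:
        pass                          # ... real work goes here ...
    finally:
        res.close()                   # explicit cleanup before main() returns
        atexit.unregister(res.close)  # avoid closing twice at exit

if __name__ == '__main__':
    main()

The try/finally handles the normal path, and the atexit registration is only a safety net for code paths that never reach the finally block.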
In a comment on another answer:
what I'm actually trying to do is close out a subprocess that is normally kept open by a timer, but needs to be nuked when the program exits. Is the only really "reliable" way to do this to start a daemonic subprocess to monitor and kill the other process separately?
The usual way to do this kind of thing is to replace the timer with something signalable from outside. Without knowing your app architecture and what kind of timer you're using (e.g., a single-threaded async server where the reactor kicks the timer vs. a single-threaded async GUI app where an OS timer message kicks the timer vs. a multi-threaded app where the timer is just a thread that sleeps between intervals vs. …), it's hard to explain more specifically.
Meanwhile, you may also want to look at whether there's a simpler way to handle your subprocesses. For example, maybe using an explicit process group, and killing your process group instead of your process (which will kill all of the children, on both Windows and Unix… although the details are very different)? Or maybe give the subprocess a pipe and have it quit when the other end of the pipe goes down?
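As a rough illustration of the process-group idea on Unix (the command name here is a placeholder; on Windows you would reach for a job object or a new process group flag instead):

import os
import signal
import subprocess

# Start the child in its own session/process group (POSIX only).
# 'some_worker' is a made-up command for the example.
proc = subprocess.Popen(['some_worker', '--arg'], start_new_session=True)

# ... later, when the program is shutting down ...
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)   # signals the whole group
proc.wait()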
Note that the documentation also gives you no guarantees about the order in which left-over references are deleted, if they are. In fact, if you're using CPython, Py_Finalize specifically says that it's "done in random order".
The source is interesting. It's obviously not explicitly randomized, and it's not even entirely arbitrary. First it does GC collect until nothing is left, then it finalizes the GC itself, then it does a PyImport_Cleanup (which is basically just sys.modules.clear()), then there's another collect commented out (with some discussion as to why), and finally a _PyImport_Fini (which is defined only as "For internal use only").
But this means that, assuming your module really is holding the only (non-weak) reference(s) to your object, and there are no unbreakable cycles involving the module itself, your module will get cleaned up at shutdown, which will drop the last reference to your object, causing it to get cleaned up as well. (Of course you cannot count on anything other than builtins, extension modules, and things you have a direct reference to still existing at this point… but your code above should be fine, because foo can't be cleaned up before Foo, and it doesn't rely on any other non-builtins.)
Keep in mind that this is CPython-specific—and in fact CPython 3.3-specific; you will want to read the relevant equivalent source for your version to be sure. Again, the documentation explicitly says things get deleted "in random order", so that's what you have to expect if you don't want to rely on implementation-specific behavior.
Of course your cleanup code still isn't guaranteed to be called. For example, an unhandled signal (on Unix) or structured exception (on Windows) will kill the interpreter without giving it a chance to clean up anything. And even if you write handlers for that, someone could always pull the power cord. So, if you need a completely robust design, you need to be interruptable without cleanup at any point (by journaling, using atomic file operations, protocols with explicit acknowledgement, etc.).

Python modules are cleaned up when exiting, and any __del__ methods probably are called:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
Names starting with an underscore are cleared first:
Starting with version 1.5, Python guarantees that globals whose name begins with a single underscore are deleted from their module before other globals are deleted; if no other references to such globals exist, this may help in assuring that imported modules are still available at the time when the __del__() method is called.
Weak reference callbacks rely on the same mechanisms as __del__ methods do: the C deallocation functions (type->tp_dealloc).
The foo instance will retain a reference to the Foo._clean class method, but the global name Foo could be cleared already (it is assigned None in CPython); your method should be safe as it never refers to Foo once the callback has been registered.
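A quick way to convince yourself that the callback fires as soon as the last strong reference goes away (a minimal sketch, independent of the question's Foo class):

import weakref

class Thing(object):
    pass

def report(ref):
    # The callback receives the dead weakref; it deliberately avoids touching
    # module globals, which may already be cleared at shutdown.
    print('callback fired')

t = Thing()
r = weakref.ref(t, report)
del t    # last strong reference dropped; in CPython this prints immediately

On Python 3.4 and newer, weakref.finalize wraps this same pattern and is additionally invoked at interpreter exit if the object is still alive, which makes it a somewhat safer choice for cleanup hooks.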

Related

How can I check if a thread holds the GIL with sub-interpreters?

I am working on some changes to a library which embeds Python which require me to utilize sub-interpreters in order to support resetting the python state, while avoiding calling Py_Finalize (since calling Py_Initialize afterwards is a no-no).
I am only somewhat familiar with the library, but I am increasingly discovering places where PyGILState_Ensure and other PyGILState_* functions are being used to acquire the GIL in response to some external callback. Some of these callbacks originate from outside Python, so our thread certainly doesn't hold the GIL, but sometimes the callback originates from within Python, so we definitely hold the GIL.
After switching to sub-interpreters, I almost immediately saw a deadlock on a line calling PyGILState_Ensure: it called PyEval_RestoreThread even though it was clearly already executing from within Python (and so the GIL was held).
For what it's worth, I have verified that a line calling PyEval_RestoreThread does get executed before this call to PyGILState_Ensure (well before the first call back into Python).
I am using Python 3.8.2. Clearly, the documentation wasn't lying when it says:
Note that the PyGILState_* functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_* API is unsupported.
It would be quite a lot of work to refactor the library so that it tracks internally whether the GIL is held, and it seems rather silly. There should be a way to determine if the GIL is held! However, the only function I can find is PyGILState_Check, but that's a member of the forbidden PyGILState_* API, so I'm not sure it will work. Is there a canonical way to do this with sub-interpreters?
I've been pondering this line in the documentation:
Also note that combining this functionality with PyGILState_* APIs is delicate, because these APIs assume a bijection between Python thread states and OS-level threads, an assumption broken by the presence of sub-interpreters.
I suspect the issue is that the PyGILState_* API keeps its bookkeeping in thread-local storage that assumes a single interpreter.
I've come to think that it's actually not really possible to tell if the GIL is held by the application. There's no central static place where Python stores that the GIL is held, because it's either held by "you" (in your external code) or by the Python code. It's always held by someone. So the question of "is the GIL held" isn't the question the PyGILState API is asking. It's asking "does this thread hold the GIL", which makes it easier to have multiple non-Python threads interacting with the interpreter.
I overcame this issue by restoring the bijection as best I could by creating a separate thread per sub-interpreter, with the order of operations being very strictly as follows:
Grab the GIL in the main thread, either explicitly or with Py_Initialize (if this is the first time). Be very careful, the thread state from Py_Initialize must only ever be used in the main thread. Don't restore it to another thread: Some module might use the PyGILState_* API and the deadlock will happen again.
Create the thread. I just used std::thread.
Spawn the subinterpreter with Py_NewInterpreter. Be very careful, this will give you a new thread state. As with the main thread state, this thread state must only be used from this thread.
Release the GIL in the new thread when you're ready for Python to do its thing.
Now, there are some gotchas I discovered:
asyncio in Python 3.8-3.9 has a use-after-free bug where the first interpreter loading it manages some resources. So if that interpreter is ended (releasing those resources) and a new interpreter grabs asyncio, there will be a segfault. I overcame this by manually loading asyncio through the C API in the main interpreter, since that one lives forever.
Many libraries, including numpy, lxml, and several networking libraries, will have trouble with multiple sub-interpreters. I believe that Python itself is enforcing this: importing any of these libraries in a second sub-interpreter raises an ImportError with the message "Interpreter change detected - This module can only be loaded into one interpreter per process". This so far seems to be an insurmountable issue for me, since I do require numpy in my application.

Is __del__ really a destructor?

I do things mostly in C++, where the destructor method is really meant for the destruction of an acquired resource. Recently I started with Python (which is really fun and fantastic), and I came to learn that it has GC like Java.
Thus, there is no heavy emphasis on object ownership (construction and destruction).
As far as I've learned, the __init__() method makes more sense to me in Python than it does in Ruby, but the __del__() method: do we really need to implement this special method in our classes? Will my class lack something if I omit __del__()? The one scenario where I can see __del__() being useful is if I want to log something when an object is destroyed. Is there anything other than this?
In the Python 3 docs the developers have now made clear that destructor is in fact not the appropriate name for the method __del__.
object.__del__(self)
Called when the instance is about to be destroyed. This is also called a finalizer or (improperly) a destructor.
Note that the OLD Python 3 docs used to suggest that 'destructor' was the proper name:
object.__del__(self)
Called when the instance is about to be destroyed. This is also called a destructor. If a base class has a __del__() method, the derived class’s __del__() method, if any, must explicitly call it to ensure proper deletion of the base class part of the instance.
From other answers but also from the Wikipedia:
In a language with an automatic garbage collection mechanism, it would be difficult to deterministically ensure the invocation of a destructor, and hence these languages are generally considered unsuitable for RAII [Resource Acquisition Is Initialization]
So you should almost never implement __del__, but it gives you the opportunity to do so in some (rare?) use cases.
As the other answers have already pointed out, you probably shouldn't implement __del__ in Python. If you find yourself in the situation thinking you'd really need a destructor (for example if your class wraps a resource that needs to be explicitly closed) then the Pythonic way to go is using context managers.
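A minimal sketch of that approach (the ManagedResource class and its open/close calls are invented purely for illustration):

from contextlib import contextmanager

class ManagedResource(object):
    """Hypothetical wrapper around something that must be closed."""
    def open(self):
        print('resource opened')
    def close(self):
        print('resource closed')

@contextmanager
def managed_resource():
    res = ManagedResource()
    res.open()
    try:
        yield res
    finally:
        res.close()    # runs even if the body of the with block raises

with managed_resource() as res:
    pass               # use the resource; cleanup is deterministic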
Is __del__ really a destructor?
No, the __del__ method is not a destructor; it is just a normal method you can call whenever you want to perform any operation, but it is normally called just before the garbage collector destroys the object.
Think of it as a cleanup or 'last will' method.
It is so uncommon that I only learned about it today (and I've been into Python for a long time).
Memory is deallocated, files are closed, and so on by the GC, but you might need to perform some task with effects outside of the class.
My use case is about implementing some sort of RAII for some temporary directories. I'd like them to be removed no matter what.
Instead of removing them after the processing (which, after some change, was no longer being run), I moved the removal into the __del__ method, and it works as expected.
This is a very specific case, where we don't really care about when the method is called, as long as it's called before leaving the program. So, use with care.
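A rough sketch of that pattern (the class name is made up; tempfile.TemporaryDirectory or weakref.finalize would be more defensive choices):

import shutil
import tempfile

class ScratchDir(object):
    """Temporary directory that tries to remove itself when collected."""
    def __init__(self):
        self.path = tempfile.mkdtemp()

    def __del__(self):
        # Best effort only: not guaranteed to run at interpreter exit.
        shutil.rmtree(self.path, ignore_errors=True)

scratch = ScratchDir()
print(scratch.path)    # use the directory; it is removed when scratch goes away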

Is relying on __del__() for cleanup in Python unreliable?

I was reading about different ways to clean up objects in Python, and I have stumbled upon these questions (1, 2) which basically say that cleaning up using __del__() is unreliable and the following code should be avoid:
def __init__(self):
    rc.open()

def __del__(self):
    rc.close()
The problem is, I'm using exactly this code, and I can't reproduce any of the issues cited in the questions above. As far as I know, I can't use the with statement alternative, since I provide a Python module for closed-source software (testIDEA, anyone?). This software creates instances of particular classes and disposes of them, and these instances have to be ready to provide services in between. The only alternative to __del__() that I see is to manually call open() and close() as needed, which I assume will be quite bug-prone.
I understand that when I'll close the interpreter, there's no guarantee that my objects will be destroyed correctly (and it doesn't bother me much, heck, even Python authors decided it was OK). Apart from that, am I playing with fire by using __del__() for cleanup?
You observe the typical issue with finalizers in garbage collected languages. Java has it, C# has it, and they all provide a scope based cleanup method like the Python with keyword to deal with it.
The main issue is, that the garbage collector is responsible for cleaning up and destroying objects. In C++ an object gets destroyed when it goes out of scope, so you can use RAII and have well defined semantics. In Python the object goes out of scope and lives on as long as the GC likes. Depending on your Python implementation this may be different. CPython with its refcounting based GC is rather benign (so you rarely see issues), while PyPy, IronPython and Jython might keep an object alive for a very long time.
For example:
def bad_code(filename):
    return open(filename, 'r').read()

for i in xrange(10000):
    bad_code('some_file.txt')
bad_code leaks a file handle. In CPython it doesn't matter. The refcount drops to zero and it is deleted right away. In PyPy or IronPython you might get IOErrors or similar issues, as you exhaust all available file descriptors (up to ulimit on Unix or 509 handles on Windows).
Scope based cleanup with a context manager and with is preferable if you need to guarantee cleanup. You know exactly when your objects will be finalized. But sometimes you cannot enforce this kind of scoped cleanup easily. That's when you might use __del__, atexit or similar constructs to do a best effort at cleaning up. It is not reliable but better than nothing.
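For comparison, a version of bad_code that does not rely on the garbage collector to close the file (a sketch):

def good_code(filename):
    # The file is closed when the with block exits, on every implementation.
    with open(filename, 'r') as f:
        return f.read()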
You can either burden your users with explicit cleanup or enforce explicit scopes, or you can take the gamble with __del__ and see some oddities now and then (especially at interpreter shutdown).
There are a few problems with using __del__ to run code.
For one, it only works if you're actively keeping track of references, and even then, there's no guarantee that it will be run immediately unless you're manually kicking off garbage collections throughout your code. I don't know about you, but automatic garbage collection has pretty much spoiled me in terms of accurately keeping track of references. And even if you are super diligent in your code, you're also relying on other users that use your code being just as diligent when it comes to reference counts.
Two, there are lots of instances where __del__ is never going to run. Was there an exception while objects were being initialized and created? Did the interpreter exit? Is there a circular reference somewhere? Yep, lots that could go wrong here and very few ways to cleanly and consistently deal with it.
Three, even if it does run, exceptions raised inside it are not propagated (they are merely reported to stderr), so you can't handle them like you can with other code. It's also nearly impossible to guarantee that the __del__ methods from various objects will run in any particular order. So the most common use case for destructors - cleaning up and deleting a bunch of objects - is kind of pointless and unlikely to go as planned.
If you actually want code to run, there are much better mechanisms -- context managers, signals/slots, events, etc.
If you're using CPython, then __del__ fires perfectly reliably and predictably as soon as an object's reference count hits zero. The docs at https://docs.python.org/3/c-api/intro.html state:
When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference count is decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on.
You can easily test and see this immediate cleanup happening yourself:
>>> class Foo:
...     def __del__(self):
...         print('Bye bye!')
...
>>> x = Foo()
>>> x = None
Bye bye!
>>> for i in range(5):
...     print(Foo())
...
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
<__main__.Foo object at 0x7f037e6a0550>
Bye bye!
>>>
(Though if you want to test stuff involving __del__ at the REPL, be aware that the last evaluated expression's result gets stored as _, which counts as a reference.)
In other words, if your code is strictly going to be run in CPython, relying on __del__ is safe.

How can I protect a logging object from the garbage collector in a multiprocessing process?

I create a couple of worker processes using Python 2.6's multiprocessing module. In each worker I use the standard logging module (with log rotation and a file per worker) to keep an eye on the worker. I've noticed that after a couple of hours no more events are written to the log. The process doesn't appear to crash and still responds to commands via my queue. Using lsof I can see that the log file is no longer open. I suspect the log object may be killed by the garbage collector; if so, is there a way that I can mark it to protect it?
I agree with @THC4k. This doesn't seem like a GC issue. I'll give you my reasons why, and I'm sure somebody will vote me down if I'm wrong (if so, please leave a comment pointing out my error!).
If you're using CPython, it primarily uses reference counting, and objects are destroyed immediately when the ref count goes to zero (since 2.0, supplemental garbage collection is also provided to handle the case of circular references). Keep a reference to your log object and it won't be destroyed.
If you're using Jython or IronPython, the underlying VM does the garbage collection. Again, keep a reference and the GC shouldn't touch it.
Either way, it seems that either you're not keeping a reference to an object you need to keep alive, or you have some other error.
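As a sketch of what "keep a reference" might look like here (the Worker class and file names are invented; note also that the logging module keeps every logger returned by getLogger in an internal registry, so loggers themselves are not normally collected):

import logging
import logging.handlers

class Worker(object):
    def __init__(self, worker_id):
        # Keep the logger and handler as attributes so they stay strongly
        # referenced for as long as the worker object itself is alive.
        self.logger = logging.getLogger('worker-%d' % worker_id)
        self.logger.setLevel(logging.INFO)
        handler = logging.handlers.RotatingFileHandler(
            'worker-%d.log' % worker_id, maxBytes=1000000, backupCount=3)
        self.logger.addHandler(handler)

    def work(self):
        self.logger.info('still alive')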
http://docs.python.org/reference/datamodel.html#object.__del__
According to this documentation, the __del__() method is called on object destruction, and at this point you can create a new reference to the object to prevent it from being collected. I am not sure how to do this; hopefully it gives you some food for thought.
You could run gc.collect() immediately after fork() to see if that causes the log to be closed. But it's not likely garbage collection would take effect only after a few hours.

Is it really OK to do object closing/disposing in __del__?

I have been thinking about how I write classes in Python, more specifically how the constructor is implemented and how the object should be destroyed. I don't want to rely on CPython's reference counting to do object cleanup. This basically tells me I should use with statements to manage my object lifetimes and that I need an explicit close/dispose method (this method could be called from __exit__ if the object is also a context manager).
class Foo(object):
    def __init__(self):
        pass

    def close(self):
        pass
Now, if all my objects behave in this way and all my code uses with statements or explicit calls to close() (or dispose()), I don't really see the need for me to put any code in __del__. Should we really use __del__ to dispose of our objects?
Short answer : No.
Long answer: Using __del__ is tricky, mainly because it's not guaranteed to be called. That means you can't do things there that absolutely have to be done. This in turn means that __del__ can basically only be used for cleanups that would happen sooner or later anyway, like cleaning up resources that would be cleaned up when the process exits, so it doesn't matter if __del__ doesn't get called. Of course, these are also generally the same things Python will do for you. So that kinda makes __del__ useless.
Also, __del__ gets called when Python garbage collects, and you said you didn't want to wait for Python's garbage collection, which means you can't use __del__ anyway.
So, don't use __del__. Use __enter__/__exit__ instead.
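For example, the Foo class from the question could expose the context-manager protocol directly (a sketch; __exit__ just delegates to the existing close()):

class Foo(object):
    def __init__(self):
        pass

    def close(self):
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()        # always runs, even if the with body raises
        return False        # don't swallow exceptions

with Foo() as foo:
    pass                    # use foo; close() is called on exit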
FYI: Here is an example of a non-circular situation where the destructor did not get called:
class A(object):
    def __init__(self):
        print('Constructing A')

    def __del__(self):
        print('Destructing A')

class B(object):
    a = A()
OK, so it's a class attribute. Evidently that's a special case. But it just goes to show that making sure __del__ gets called isn't straightforward. I'm pretty sure I've seen more non-circular situations where __del__ isn't called.
Not necessarily. You'll encounter problems when you have cyclic references. Eli Bendersky does a good job of explaining this in his blog post:
Safely using destructors in Python
If you are sure you will not go into cyclic references, then using __del__ in that way is OK: as soon as the reference count goes to zero, the CPython VM will call that method and destroy the object.
If you plan to use cyclic references, please think it through very thoroughly, and check if weak references may help; in many cases, cyclic references are a first symptom of bad design.
If you have no control on the way your object is going to be used, then using __del__ may not be safe.
If you plan to use Jython or IronPython, __del__ is not reliable at all, because final object destruction will happen at garbage collection, and that's something you cannot control.
In sum, in my opinion, __del__ is usually perfectly safe and good; however, in many situations it could be better to take a step back and try to look at the problem from a different perspective; good use of try/except and of with contexts may be a more Pythonic solution.
