I have an application that relies on SIGINT for a graceful shutdown. I noticed that every once in a while it just keeps running. The cause turned out to be a generator in xml/etree/ElementTree.py.
If SIGINT arrives while that generator is being cleaned up, all exceptions are ignored (recall that the default action for SIGINT is to raise a KeyboardInterrupt). That's not unique to this particular generator, nor even to generators in general.
From the Python docs:
"Due to the precarious circumstances under which __del__() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead"
In over five years of programming in Python, this is the first time I've run into this issue.
If garbage collection can occur at any point, then SIGINT can also, in theory, be ignored at any point, and I can never rely on it. Is that correct? Have I just been lucky this whole time?
Or is it something about this particular package and this particular generator?
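For anyone who wants to see this without ElementTree: here's a minimal sketch (the class is made up) that reproduces the behavior. Press Ctrl-C during the sleep and the KeyboardInterrupt is swallowed:
import time

class Noisy:
    def __del__(self):
        # If Ctrl-C arrives during this sleep, the resulting
        # KeyboardInterrupt is ignored (a message goes to stderr)
        # and execution continues after the del statement.
        time.sleep(5)

obj = Noisy()
del obj                    # finalizer runs here; try Ctrl-C now
print("still running")     # reached even after a Ctrl-C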
Related
I'm trying to put a timeout on a function that runs in a thread in Python. My first attempt was using a signal, as suggested in any number of answers here, but since the function I want a timeout on will sometimes be run in a thread (as part of a larger function that is allowed to take some time), this turned out not to be an option.
So I then tried this answer, which involves using a Thread from the threading module. That worked fine in initial testing, but the comments on that answer indicate that it will hide any exception raised inside the function, and that it can be unsafe due to GIL interactions. Additionally, there is no way to kill a thread, so should the function never complete, I could end up with numerous hung threads. And even if it does eventually complete, it is still consuming resources for a result that will never be seen.
Digging around, it would appear a multiprocessing-based solution would be preferable, using the same basic structure as in the above-referenced answer and a Queue for communication back to the originating thread. There are no GIL issues to worry about (since it is a separate process), and I can kill off the process should the timeout expire. So I tried that, only to run into another problem: to put something on a Queue, it has to be picklable, and the item being returned from the function is not.
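For reference, my multiprocessing attempt was structured roughly like this (the names are my own); it works, but only when the return value is picklable:
import multiprocessing

def _call_and_enqueue(queue, func, args):
    # The result travels back through the queue, so it must be picklable.
    queue.put(func(*args))

def run_with_timeout(func, args=(), timeout=10.0):
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_call_and_enqueue,
                                   args=(queue, func, args))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()       # unlike a thread, a process can be killed
        proc.join()
        raise TimeoutError("function call timed out")
    return queue.get()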
So, given the situation:
running in a thread
return value is not picklable
want to avoid the issues presented with the threading answer
is there a solution that solves all of the above issues? Or do I just need to go with the above-referenced threading answer and live with its limitations?
The default handler for SIGINT raises KeyboardInterrupt. However, if a program is inside a __del__ method (because of ongoing garbage collection), the exception is ignored, and the following message is printed to stderr:
Exception KeyboardInterrupt in <...> ignored
As a result, the program continues to run despite receiving SIGINT. Of course, I can define my own handler for SIGINT that sets a global variable sigint_received to True, and then check the value of that variable frequently throughout my program. But this looks ugly.
Is there an elegant and reliable way to make sure that the python program gets interrupted after receiving SIGINT?
Before I dive into my solution, I want to highlight the scary red "Warning:" sidebar in the docs for object.__del__ (emphasis mine):
Due to the precarious circumstances under which __del__() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead. [...] __del__() methods should do the absolute minimum needed to maintain external invariants.
This suggests to me that any __del__ method that's at serious risk of being interrupted by an interactive user's Ctrl-C might be doing too much. So my first suggestion would be to look for ways to minimize your __del__ method, whatever it is.
Or to put it another way: If your __del__ method really does do "the absolute minimum needed", then how can it be safe to kill the process half-way through?
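One concrete way to follow that advice (a sketch, not your actual class) is to do the real cleanup in an explicit close() that callers invoke normally, and reduce __del__ to a last-resort check:
import warnings

class Resource:
    def __init__(self):
        self._closed = False

    def close(self):
        # Heavy, interruptible cleanup belongs here, where a
        # KeyboardInterrupt propagates like any other exception.
        self._closed = True

    def __del__(self):
        # The absolute minimum: note that cleanup was skipped.
        if not self._closed:
            warnings.warn("Resource was never close()d")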
Custom Signal Handler
The only solution I could find was indeed a custom signal handler for signal.SIGINT... but a lot of the obvious tricks didn't work:
Failed: sys.exit
Calling sys.exit from the signal handler just raised a SystemExit exception, which was ignored. Python's C API docs suggest that it is impossible for the Python interpreter to raise any exception during a __del__ method:
void PyErr_WriteUnraisable(PyObject *obj)
[Called when...] it is impossible for the interpreter to actually raise the exception [...] for example, when an exception occurs in an __del__() method.
Partial Success: Flag Variable
Your idea of setting a global "drop dead" variable inside the signal handler worked only partially: although it updated the variable, nothing got a chance to read that variable until after the __del__ method returned. So for several seconds, the Ctrl-C appeared to have done nothing.
This might be good enough if you just want to terminate the process "eventually", since it will exit whenever the __del__ method returns. But since you probably want to shut down the process without waiting (both SIGINT and KeyboardInterrupt typically come from an impatient user), this won't do.
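For concreteness, the handler I tested was something like this (the variable name is arbitrary):
import signal

sigint_received = False

def _note_sigint(signum, frame):
    # Records the request but cannot interrupt a running __del__;
    # the program must poll this flag to act on it.
    global sigint_received
    sigint_received = True

signal.signal(signal.SIGINT, _note_sigint)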
Success: os.kill
Since I couldn't find a way to convince the Python interpreter to kill itself, my solution was to have the (much more persuasive) operating system do it for me. This signal handler uses os.kill to send a stronger SIGTERM to its own process ID, causing the Python interpreter itself to exit.
import os
import signal

def _sigterm_this_process(signum, frame):
    # A Python-level exception would be swallowed inside __del__, but
    # SIGTERM's default action is handled by the OS, which terminates
    # the interpreter regardless of what Python is doing.
    pid = os.getpid()
    os.kill(pid, signal.SIGTERM)

# Elsewhere...
signal.signal(signal.SIGINT, _sigterm_this_process)
Once the custom signal handler was set, Ctrl-C caused the __del__ method (and the entire program) to exit immediately.
Obviously the GIL prevents threads from switching contexts in the middle of reference-counting operations, but is signal handling completely safe in CPython?
Signals in Python are caught by a very simple signal handler which, in effect, simply schedules the actual signal handler function to be called on the main thread. The C signal handler doesn't touch any Python objects, so it doesn't risk corrupting any state, while the Python signal handler is executed in-between bytecode op evaluations, so it too won't corrupt CPython's internal state.
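You can watch the second half of that happen: no matter which thread raises the signal, the Python-level handler runs in the main thread. A small illustrative sketch:
import os
import signal
import threading
import time

def report_thread(signum, frame):
    # Prints "MainThread": CPython defers Python-level handlers
    # to the main thread, between bytecode evaluations.
    print("handler ran in:", threading.current_thread().name)

signal.signal(signal.SIGINT, report_thread)

# Send SIGINT to our own process from a worker thread.
threading.Thread(target=os.kill, args=(os.getpid(), signal.SIGINT)).start()
time.sleep(1)   # give the main thread a chance to run the handler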
A signal can be delivered in the middle of a reference-counting operation, but the C-level handler only sets a flag; the Python-level handler doesn't run until the current bytecode operation has finished. In case you wonder why CPython doesn't use atomic CPU instructions for reference counting: they are too slow. Atomic operations use memory barriers to sync CPU caches (L1, L2, shared L3) and CPUs (ccNUMA). As you can imagine, this prevents lots of optimizations. Modern CPUs are insanely fast, so fast that they spend a lot of time doing nothing but waiting for data. Reference increments and decrements are very common operations in CPython, and memory barriers would prevent out-of-order execution, which is a very important optimization.
The reference-counting code is carefully written and takes multi-threading and signals into account. Signal handlers cannot observe a partly created or destroyed Python object, just as threads can't. Macros like Py_CLEAR take care of the edge cases, and the I/O functions take care of EINTR. Python 3.3 has an improved subprocess module that uses only async-signal-safe functions between fork() and execvpe().
You don't have to worry. We have some clever people that know their POSIX fu quite well.
Sometimes it happens that an ongoing ipython evaluation won't respond to one, or even several, Ctrl-C's from the keyboard¹.
Is there some other way to goose the ipython process to abort the current evaluation, and come back to its "read" state?
Maybe with kill -SOMESECRETSIGNAL <pid>? I've tried a few (SIGINT, SIGTERM, SIGUSR1, ...) to no avail: either they have no effect (e.g. SIGINT), or they kill the ipython process. Or maybe some arcane ipython configuration? Some sentinel file? ... ?
1"Promptly enough", that is. Of course, it is impossible to specify precisely how promptly is "promptly enough"; it depends on the situation, the reliability of the delay's duration, the temperament of the user, the day's pickings at Hacker News, etc.
It depends on where execution is occurring when you decide to interrupt (in a Python function, in a lower-level library, ...). If this commonly occurs within a function you have created, you can try putting a try/except block in the function and catching KeyboardInterrupt. It may not break out of a low-level library (if that is indeed where execution is stuck), but it should prevent the ipython interpreter from exiting.
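Something along these lines, where the loop body stands in for your real work:
import time

def long_running_task():
    try:
        while True:
            time.sleep(0.1)   # stand-in for one unit of real work
    except KeyboardInterrupt:
        # Ctrl-C lands here instead of propagating further,
        # so you drop back to the ipython prompt cleanly.
        print("interrupted; returning partial results")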
I've heard there are problems when calling os.waitpid from within a thread. I have not experienced such problems yet (especially when using the os.WNOHANG option), but I have not paid much attention to the performance implications of such use.
Are there any performance penalties or any other issues one should be aware of?
Does this have to do with os.waitpid (potentially) using signals?
I don't see how signals could be related, though, since otherwise (I suppose) I wouldn't be able to get os.waitpid to return when calling it from a non-main thread.
By default, when a child process dies, the parent is sent a SIGCHLD signal. The concern about calling os.waitpid() probably comes from this.
If you look in the Python "signal" module documentation the warning is pretty clear:
Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead.
http://docs.python.org/library/signal.html
BUT... if you leave the SIGCHLD signal alone, then you should be happily able to call os.waitpid() (or any other os.wait() variant) from a thread.
The main drawback then is that you'll need to use os.waitpid() with WNOHANG and poll periodically, if you want any way to cancel the operation. If you don't ever need to cancel the os.waitpid(), then you can just invoke it in blocking mode.
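A cancellable polling loop along those lines might look like this (cancel_event is assumed to be a threading.Event owned by the caller):
import os
import time

def wait_for_child(pid, cancel_event, poll_interval=0.1):
    # os.WNOHANG makes waitpid return immediately, so the loop
    # can be abandoned whenever cancel_event is set.
    while not cancel_event.is_set():
        waited_pid, status = os.waitpid(pid, os.WNOHANG)
        if waited_pid == pid:      # child exited; zombie is reaped
            return status
        time.sleep(poll_interval)
    return None                    # cancelled before the child exited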
My guess: people are just referring to calling waitpid() without WNOHANG, which of course obviates the reason you use multiple threads in the first place. (That is, of course, unless you are just using it to reap the zombies).