Unlabeled exception in threading - python

I have a chunk of code like this (g is defined elsewhere):

import thread

def f(x):
    try:
        g(x)
    except Exception, e:
        print "Exception %s: %s" % (x, e)

def h(x):
    thread.start_new_thread(f, (x,))
Once in a while, I get this:
Unhandled exception in thread started by
Error in sys.excepthook:
Original exception was:
Unlike the code sample, that's the complete text. I assume after the "by" there's supposed to be a thread ID and after the colon there are supposed to be stack traces, but nope, nothing. I don't know how to even start to debug this.

The error you're seeing means the interpreter was exiting (because the main thread exited) while another thread was still executing Python code. Python cleans up its environment on exit, tearing down and discarding all of the loaded modules (to make sure as many finalizers as possible execute), but unfortunately that means the still-running thread starts raising exceptions when it tries to use something that was already destroyed. That exception then propagates up to the start_new_thread function that started the thread, which tries to report it, only to find that the machinery it needs to report the exception is gone as well. That is what produces the confusing, empty error messages.
In your specific example, this is all caused by your thread being started and your main thread exiting right away. Whether the newly started thread gets a chance to run before, during or after the interpreter exits (and thus whether you see it run as normal, run partially and report an error or never see it run) is entirely up to the OS thread scheduler.
If you're using threads (which can be worth avoiding anyway), you probably don't want threads still running while the interpreter is exiting. The threading.Thread class is a better interface for starting new threads, and by default it makes the interpreter wait on exit for all non-daemon threads to finish. If you really don't want to wait for a thread to end, set the daemon flag on the Thread object to get the old behaviour, including the problem you see here.
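Here is a minimal sketch of that approach, reusing f from the question (the stub g is only a placeholder so the example runs on its own):

import threading

def g(x):
    pass  # stand-in; the question's real g lives elsewhere

def f(x):
    try:
        g(x)
    except Exception, e:
        print "Exception %s: %s" % (x, e)

def h(x):
    t = threading.Thread(target=f, args=(x,))
    # t.daemon = True  # uncomment to restore start_new_thread's
    #                  # fire-and-forget behaviour, shutdown races included
    t.start()
    return t

h("example")  # the interpreter now waits for this thread before exiting

With non-daemon threads the interpreter refuses to finalize until every one of them has finished, so f always runs (and reports) to completion.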

Related

Python: throw exception in main thread on Windows shutdown (similar to Ctrl+C)

I have a Python script for automating simple tasks. Its main loop looks like this:
while True:
    input = download_task_input()
    if input:
        output = process_task(input)
        upload_task_output(output)
    sleep(60)
Some local files are altered during task processing. They are modified when a task is started, and restored to their proper state when the task is done or if an exception is caught. Restoring these files on program exit is very important to me: leaving them in an altered state causes trouble later that I'd like to avoid.
When I want to terminate the script, I hit Ctrl+C. It raises a KeyboardInterrupt exception, which both stops task processing and triggers file restoration. However, if I hit Ctrl+Break, the program is simply terminated: if a task is being processed at that moment, the local files are left in an altered state (which is undesirable).
The question: I'm worried about the situation when Windows OS is shutdown by pressing the Power button. Is it possible to make Python handle it exactly like it handles Ctrl+C? I.e. I'd like to detect OS shutdown in Python script and raise Python exception on the main thread.
I know it is possible to call the SetConsoleCtrlHandler function from WinAPI and install my own handler for situations like Ctrl+C, Ctrl+Break, Shutdown, etc. However, this handler appears to be executed in an additional thread, and raising an exception in it achieves nothing. On the other hand, Python itself supposedly uses the same WinAPI feature to raise KeyboardInterrupt in the main thread on Ctrl+C, so it should be doable.
This is not a serious automation script, so I don't mind if a solution is hacky or not 100% reliable.
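One hacky route (and the asker explicitly tolerates hacky) would be to install a console control handler with ctypes and have it call thread.interrupt_main(), which asks the interpreter to raise KeyboardInterrupt in the main thread. This is only a sketch, with the event constants taken from the SetConsoleCtrlHandler documentation; note that interrupt_main() only takes effect once the main thread runs Python bytecode again, so a main loop stuck in a long sleep(60) may still wait the sleep out:

import ctypes
import ctypes.wintypes
import thread  # '_thread' on Python 3

CTRL_C_EVENT = 0
CTRL_BREAK_EVENT = 1
CTRL_CLOSE_EVENT = 2
CTRL_LOGOFF_EVENT = 5
CTRL_SHUTDOWN_EVENT = 6

_HandlerRoutine = ctypes.WINFUNCTYPE(ctypes.wintypes.BOOL, ctypes.wintypes.DWORD)

def _console_handler(event):
    if event in (CTRL_BREAK_EVENT, CTRL_CLOSE_EVENT,
                 CTRL_LOGOFF_EVENT, CTRL_SHUTDOWN_EVENT):
        thread.interrupt_main()  # schedule KeyboardInterrupt on the main thread
        return True              # claim the event as handled
    return False                 # leave Ctrl+C to the default handler

# Keep a module-level reference so the callback isn't garbage-collected.
_handler = _HandlerRoutine(_console_handler)
ctypes.windll.kernel32.SetConsoleCtrlHandler(_handler, True)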

Python ThreadPoolExecutor Suppress Exceptions

from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def div_zero(x):
    print('In div_zero')
    return x / 0

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = executor.submit(div_zero, 1)
    done, _ = wait([futures], return_when=ALL_COMPLETED)
    # print(done.pop().result())
    print('Done')
The program above runs to completion without any error message.
You only get the exception if you explicitly call future.result() or future.exception(), as in the commented-out line.
I wonder why this Python module chose this behaviour even though it hides problems. Because of it, I spent hours debugging a programming error (referencing a non-existent attribute in a class) that would have been obvious if the program had simply crashed with an exception, as it would in Java, for instance.
I suspect the reason is so that the entire pool does not crash because of one thread raising an exception. This way, the pool will process all the tasks and you can get the threads that raised exceptions separately if you need to.
Each thread is (mostly) isolated from the other threads, including the primary thread. The primary thread does not communicate with the other threads until you ask it to do so.
This includes errors. The result is what you are seeing: errors occurring in other threads do not interfere with the primary thread. You only need to handle them when you ask for the results.
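To surface the failures without letting one bad task take down the pool, you can inspect each finished future; a minimal sketch along the lines of the question's example:

from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def div_zero(x):
    return x / 0

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(div_zero, n) for n in range(4)]
    done, _ = wait(futures, return_when=ALL_COMPLETED)
    for future in done:
        exc = future.exception()  # returns the stored exception, or None
        if exc is not None:
            print('task failed: %r' % exc)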

How to avoid suppressing KeyboardInterrupt during garbage collection in python?

The default handler for SIGINT raises KeyboardInterrupt. However, if the program is inside a __del__ method (because of an ongoing garbage collection), the exception is ignored and the following message is printed to stderr:
Exception KeyboardInterrupt in <...> ignored
As a result, the program continues to work despite receiving SIGINT. Of course, I can define my own handler for SIGINT that sets a global variable sigint_received to True, and then check the value of that variable frequently throughout my program. But this looks ugly.
Is there an elegant and reliable way to make sure that the python program gets interrupted after receiving SIGINT?
Before I dive into my solution, I want to highlight the scary red "Warning:" sidebar in the docs for object.__del__ (emphasis mine):
Due to the precarious circumstances under which __del__() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead. [...] __del__() methods should do the absolute minimum needed to maintain external invariants.
This suggests to me that any __del__ method that's at serious risk of being interrupted by an interactive user's Ctrl-C might be doing too much. So my first suggestion would be to look for ways to minimize your __del__ method, whatever it is.
Or to put it another way: If your __del__ method really does do "the absolute minimum needed", then how can it be safe to kill the process half-way through?
Custom Signal Handler
The only solution I could find was indeed a custom signal handler for signal.SIGINT... but a lot of the obvious tricks didn't work:
Failed: sys.exit
Calling sys.exit from the signal handler just raised a SystemExit exception, which was ignored. Python's C API docs suggest that it is impossible for the Python interpreter to raise any exception during a __del__ method:
void PyErr_WriteUnraisable(PyObject *obj)
[Called when...] it is impossible for the interpreter to actually raise the exception [...] for example, when an exception occurs in an __del__() method.
Partial Success: Flag Variable
Your idea of setting a global "drop dead" variable inside the signal handler worked only partially: although it updated the variable, nothing got a chance to read that variable until after the __del__ method returned. So for several seconds, the Ctrl-C appeared to have done nothing.
This might be good enough if you just want to terminate the process "eventually", since it will exit whenever the __del__ method returns. But since you probably want to shut down the process without waiting (both SIGINT and KeyboardInterrupt typically come from an impatient user), this won't do.
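For reference, a minimal sketch of that flag-variable handler (sigint_received is the name from the question):

import signal

sigint_received = False

def _note_sigint(signum, frame):
    global sigint_received
    sigint_received = True  # nothing reads this until __del__ returns

signal.signal(signal.SIGINT, _note_sigint)

# The program then has to poll the flag at convenient points, e.g.:
# if sigint_received:
#     raise KeyboardInterrupt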
Success: os.kill
Since I couldn't find a way to convince the Python interpreter to kill itself, my solution was to have the (much more persuasive) operating system do it for me. This signal handler uses os.kill to send a stronger SIGTERM to its own process ID, causing the Python interpreter itself to exit.
import os
import signal

def _sigterm_this_process(signum, frame):
    pid = os.getpid()
    os.kill(pid, signal.SIGTERM)
    return

# Elsewhere...
signal.signal(signal.SIGINT, _sigterm_this_process)
Once the custom signal handler was set, Ctrl-C caused the __del__ method (and the entire program) to exit immediately.

Debugging techniques for shut-down problems in Python daemons

I am doing some gnarly stuff with Python threads including daemons.
I am getting an intermittent error on some tests:
Exception in thread myconsumerthread (most likely raised during interpreter shutdown):
Note that there are no stack trace/exception details provided.
Scrutinising my own code hasn't helped, but I am at a bit of a loss about the next step in debugging. What debugging techniques can I use to find out more about what exception might be bringing down the runtime during shutdown?
Fine print:
Windows, CPython 2.7.2 - not reproducible on Ubuntu.
The problem occurs about 3% of the time - so reproducible, just not reliably.
The code in myconsumerthread has a catch-all exception handler, which tries to write the name of the exception to sys.stderr. (Could sys already be shut down?)
I suspect the problem is related to daemon threads being shut down very quickly, before they have completely initialised. Something in this area, but I have little evidence - certainly insufficient to point at a Python bug.
Ha, I have discovered a new symptom that marks a turning point in my descent into insanity!
If I import time in my test harness (not the live code), and never use it, the frequency drops to about 0.5%.
If I import turtle in my test harness (I swear on my life, there are no turtle graphics in my code; I chose this as the most irrelevant library I could quickly find) the exception starts to be caught in a different thread, and it occurs in about a third of the runs.
I have encountered the same error on a few occasions, and I'm trying to locate / generate an example that displays the exact message.
Until then, if my memory serves me well, these were the areas I focused on:
Looking for ports, files, queues, etc. removed or closed outside the daemon threads.
Scrutinizing blocking calls in the daemon threads, e.g. a Queue.get(block=True) or a pyserial read() with timeout=None (see the sketch below).
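For instance, a consumer that blocks with a timeout returns to Python code regularly instead of parking forever in a C-level wait; a sketch, with handle passed in as a stand-in for the real work:

import Queue  # 'queue' on Python 3

def consume(q, handle):
    while True:
        try:
            item = q.get(block=True, timeout=1.0)  # wake up once a second
        except Queue.Empty:
            continue  # a natural point to notice a shutdown request
        handle(item)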
After digging a little more, I see the same types of errors popping up in relation to Queues; see the comments here.
I find it odd that it doesn't display the traceback. You might try commenting out the catch-all except and letting Python send the exception to stderr. Hopefully then you'll be able to see what's dying on you.
Update
I knew I had seen this issue before... Below you'll find an example that generates that error (many of them, actually). Note that there is no other traceback message either... For the sake of completeness: after you see the error messages, uncomment the queue.get lines and comment out the time.sleeps, and the errors should go away. After re-running this again, the errors do not appear... This is in line with the sporadic failure rates you have been seeing... You may need to run it a few times to see the errors.
I normally use time.sleep(x) to throttle threads when blocking IO such as get() and read() does not provide a timeout parameter, or when there is no blocking call to be used (user-interface refreshes, for example).
That being said, I believe there is a problem with a thread being shut down while waiting on a time.sleep() call. I believe that call is what has gotten me every time, but I do not know what actually causes it inside the sleep method. For all I know, other blocking calls display the same behaviour.
import time
import Queue
from threading import Thread

SLAVE_CNT = 50
OWNER_CNT = 10
MASTER_CNT = 2

class ThreadHungry(object):
    def __init__(self):
        self.rx_queue = Queue.Queue()

    def start(self):
        print "Adding Masters..."
        for x in range(MASTER_CNT):
            self.owners = []
            print "Starting slave owners..."
            for y in range(OWNER_CNT):
                owner = Thread(target=self.__owner_action)
                owner.daemon = True
                owner.start()
                self.owners.append(owner)

    def __owner_action(self):
        self.slaves = []
        print "\tStarting slaves..."
        for x in range(SLAVE_CNT):
            slave = Thread(target=self.__slave_action)
            slave.daemon = True
            slave.start()
            self.slaves.append(slave)
        while(1):
            time.sleep(1)
            #self.rx_queue.get(block=True)

    def __slave_action(self):
        while(1):
            time.sleep(1)
            #self.rx_queue.get(block=True)

if __name__ == "__main__":
    c = ThreadHungry()
    c.start()
    # Stop the threads abruptly after 5 seconds
    time.sleep(5)

How to stop multiprocess python-program cleanly after exception?

I have a Python program that has several processes (for now only 2) and threads (2 per process). I would like to catch every exception and, especially, shut my program down cleanly on Ctrl+C, but I can't get it to work. Every time an exception occurs, the program stops but does not shut down correctly, leaving me with an unusable command line.
What I have tried so far in pseudocode is:
try:
    for process in processes:
        process.join()
except:
    pass  # Just to suppress error messages, will be removed later
finally:
    for process in processes:
        process.terminate()
But as I already said, no luck. Also note that I get the exception error message for both processes, so they are both halted, I believe?
Maybe I should also mention that most of the threads are blocked in listening on a pipe.
EDIT
So I nearly got it working. I needed to wrap every thread in a try: block and make sure the threads are joined correctly. There is just one flaw: Exception KeyboardInterrupt in <module 'threading' from '/usr/lib64/python2.7/threading.pyc'> ignored is printed when shutting down. This is raised in the main thread of the main process, which has already finished, meaning it has passed the last line of code.
The problem (I expect) is that the exceptions are raised inside the processes, not in the join calls.
I suggest you try wrapping each process's main method in a try-except block. Then have a flag (e.g. an instance of multiprocessing.Value) that the except clause sets to False. Each process can check the value of the flag and stop (cleaning up after itself) when it becomes False.
Note that if you just terminate a process it won't clean up after itself, as terminate is the same as sending it a SIGTERM.
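A minimal sketch of that flag pattern (the worker body here is hypothetical; the multiprocessing.Value plumbing is the point):

import multiprocessing
import time

def worker(keep_running):
    try:
        while keep_running.value:
            time.sleep(0.1)  # stand-in for the real per-process work
    except Exception:
        keep_running.value = 0  # one failure asks every process to stop

if __name__ == '__main__':
    keep_running = multiprocessing.Value('i', 1)  # shared between processes
    processes = [multiprocessing.Process(target=worker, args=(keep_running,))
                 for _ in range(2)]
    for p in processes:
        p.start()
    try:
        for p in processes:
            p.join()
    except KeyboardInterrupt:
        keep_running.value = 0  # let the workers exit their loops cleanly
        for p in processes:
            p.join()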

Categories

Resources