Graceful python joblib kill - python

Is it possible to gracefully kill a joblib run (threading backend), and still return the results computed so far?
parallel = Parallel(n_jobs=4, backend="threading")
result = parallel(delayed(dummy_f)(x) for x in range(100))
For the moment I have come up with two solutions:
parallel._aborted = True, which waits for the already started jobs to finish (in my case that can take very long)
parallel._terminate_backend(), which hangs if jobs are still in the pipe (parallel._jobs not empty)
Is there a way to work around the library to do this?

As far as I know, Joblib does not provide methods to kill spawned threads.
As each child thread runs in its own context, it's actually difficult to perform graceful killing or termination.
That being said, there is a workaround that could be adopted.
Mimic .join() (of threading) functionality (kind of):
Create a shared dictionary shared_dict with keys corresponding to each thread id, and values that contain either the thread output or an Exception, e.g.:
shared_dict = {i: None for i in range(num_workers)}
Whenever an error is raised in any thread, catch the exception inside the worker and, instead of raising it immediately, store it in the shared dictionary
Create an exception handler which waits for all(shared_dict.values())
After all values are filled with either a result or an error, exit the program by raising the error, logging it, or whatever suits you. A sketch of this pattern follows below.
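A minimal sketch of that pattern, assuming the threading backend (so the dictionary really is shared memory) and a hypothetical worker dummy_f; every name here is illustrative, not part of the joblib API:

from joblib import Parallel, delayed

def dummy_f(x):
    return x * x

num_tasks = 100
shared_dict = {i: None for i in range(num_tasks)}   # task id -> result or Exception

def guarded(i, x):
    try:
        shared_dict[i] = dummy_f(x)                 # store the result
    except Exception as e:
        shared_dict[i] = e                          # store the error instead of raising

Parallel(n_jobs=4, backend="threading")(
    delayed(guarded)(i, x) for i, x in enumerate(range(num_tasks)))

# every slot is now filled; re-raise or log the stored errors --
# the successfully computed results stay available either way
errors = [v for v in shared_dict.values() if isinstance(v, Exception)]
results = {i: v for i, v in shared_dict.items() if not isinstance(v, Exception)}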

Related

How do I ensure children of a subprocess don't get SIGINT in Python (on Linux)?

I'm trying to use a custom job pool (using multiprocessing.Process), and it works nicely, except SIGINT gets passed on from the parent process all the way to the children of the pool workers, Chrome instances in this case. I used signal.signal(signal.SIGINT, signal.SIG_IGN) in the pool workers, which seems to keep them from getting (or at least responding to) the SIGINT, but I'm using third-party code (Selenium) which creates subprocesses that get the SIGINT and stop because of it. I want only the parent process to receive and handle the SIGINT. How do I do this?
My inclination would be to give the pool workers /dev/null as STDIN, so it would give that to its children rather than the terminal, which, if I understand correctly, will keep it from receiving SIGINT from Ctrl+C. However, it seems that multiprocessing.Process already does that, though presumably too late (after the process is created; using a double-fork should get past this). Is there a good way to do that (or anything else that blocks SIGINT) through multiprocessing, or do I need a different solution to work around that? Maybe there's an easy way to use multiprocessing.Queue with Popen(), so I can use that instead?
To be clear, I still need to be able to log to the console (indirectly through the parent process is fine, maybe better even).
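For what it's worth, a hedged sketch of the general pattern when you control how the children are launched (it does not cover processes that Selenium spawns on its own): the pool worker ignores SIGINT, gives its child /dev/null as stdin as suggested above, and starts it in a new session so a terminal Ctrl+C is delivered only to the parent. some_child_cmd is a placeholder, not a real command:

import signal
import subprocess
import multiprocessing
import time

def pool_worker(task_queue):
    signal.signal(signal.SIGINT, signal.SIG_IGN)   # the worker itself ignores Ctrl+C
    while True:
        task = task_queue.get()
        if task is None:
            break
        subprocess.run(["some_child_cmd", task],
                       stdin=subprocess.DEVNULL,    # /dev/null as stdin, as suggested above
                       start_new_session=True)      # child leaves the terminal's foreground process group

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=pool_worker, args=(queue,)) for _ in range(2)]
    for w in workers:
        w.start()
    try:
        queue.put("job-1")
        time.sleep(30)                              # stands in for the parent's real work
    except KeyboardInterrupt:
        print("parent handles SIGINT")              # only the parent reacts to Ctrl+C
    finally:
        for _ in workers:
            queue.put(None)
        for w in workers:
            w.join()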

Python ThreadPoolExecutor Suppress Exceptions

from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def div_zero(x):
    print('In div_zero')
    return x / 0

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = executor.submit(div_zero, 1)
    done, _ = wait([futures], return_when=ALL_COMPLETED)
    # print(done.pop().result())
print('Done')
The program above will run to completion without any error message.
You can only get the exception if you explicitly call future.result() or future.exception(), as in the commented-out line.
I wonder why this Python module chose this kind of behavior even though it hides problems. Because of it, I spent hours debugging
a programming error (referencing a non-existent attribute in a class) that would otherwise have been very obvious if the program had just crashed with an exception, as it would in Java, for instance.
I suspect the reason is so that the entire pool does not crash because of one thread raising an exception. This way, the pool will process all the tasks and you can get the threads that raised exceptions separately if you need to.
Each thread is (mostly) isolated from the other threads, including the primary thread. The primary thread does not communicate with the other threads until you ask it to do so.
This includes errors. The result is what you are seeing: errors occurring in other threads do not interfere with the primary thread. You only need to handle them when you ask for the results.
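A minimal sketch of surfacing those errors explicitly, reusing the div_zero example from above; iterating over the futures and calling result() re-raises whatever the worker raised:

from concurrent.futures import ThreadPoolExecutor, as_completed

def div_zero(x):
    return x / 0

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(div_zero, i) for i in range(4)]
    for future in as_completed(futures):
        try:
            print(future.result())          # re-raises the worker's exception here
        except ZeroDivisionError as exc:
            print('Task failed:', exc)      # or simply re-raise to crash loudly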

AttributeError 'DupFd' in 'multiprocessing.resource_sharer' | Python multiprocessing + threading

I'm trying to communicate between multiple threading.Thread(s) doing I/O-bound tasks and multiple multiprocessing.Process(es) doing CPU-bound tasks. Whenever a thread finds work for a process, it will be put on a multiprocessing.Queue, together with the sending end of a multiprocessing.Pipe(duplex=False). The processes then do their part and send results back to the threads via the Pipe. This procedure seems to work in roughly 70% of the cases; in the other 30% I receive an AttributeError: Can't get attribute 'DupFd' on <module 'multiprocessing.resource_sharer' from '/usr/lib/python3.5/multiprocessing/resource_sharer.py'>
To reproduce:
import multiprocessing
import threading
import time

def thread_work(work_queue, pipe):
    while True:
        work_queue.put((threading.current_thread().name, pipe[1]))
        received = pipe[0].recv()
        print("{}: {}".format(threading.current_thread().name, threading.current_thread().name == received))
        time.sleep(0.3)

def process_work(work_queue):
    while True:
        thread, pipe = work_queue.get()
        pipe.send(thread)

work_queue = multiprocessing.Queue()
for i in range(0, 3):
    receive, send = multiprocessing.Pipe(duplex=False)
    t = threading.Thread(target=thread_work, args=[work_queue, (receive, send)])
    t.daemon = True
    t.start()
for i in range(0, 2):
    p = multiprocessing.Process(target=process_work, args=[work_queue])
    p.daemon = True
    p.start()
time.sleep(5)
I had a look in the multiprocessing source code, but couldn't understand why this error occurs.
I tried using the queue.Queue, or a Pipe with duplex=True (the default), but couldn't find a pattern in the error. Does anyone have a clue how to debug this?
You are forking an already multi-threaded main process here. That is known to be problematic in general.
It is in fact problem-prone (and not just in Python). The rule is "thread after you fork, not before". Otherwise, the locks used by the thread executor will get duplicated across processes. If one of those processes dies while it has the lock, all of the other processes using that lock will deadlock (Raymond Hettinger).
The trigger for the error you get is apparently that the duplication of the file descriptor for the pipe fails in the child process.
To resolve this issue, either create your child processes while your main process is still single-threaded, or use another start_method for creating new processes, such as 'spawn' (default on Windows) or 'forkserver', if available.
forkserver
When the program starts and selects the forkserver start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use os.fork(). No unnecessary resources are inherited.
Available on Unix platforms which support passing file descriptors over Unix pipes. docs
You can specify another start_method with:
multiprocessing.set_start_method(method)
Set the method which should be used to start child processes. method can be 'fork', 'spawn' or 'forkserver'.
Note that this should be called at most once, and it should be protected inside the if __name__ == '__main__' clause of the main module. docs
For a benchmark of the specific start_methods (on Ubuntu 18.04) look here.
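A minimal sketch of the second option, adapting the reproducer above: select the start method once, in the main module, before any threads exist, and only start the threads afterwards.

import multiprocessing

def process_work(work_queue):
    while True:
        thread, pipe = work_queue.get()
        pipe.send(thread)

if __name__ == "__main__":
    # 'forkserver' (Unix) or 'spawn' avoids forking an already multi-threaded parent
    multiprocessing.set_start_method("forkserver")
    work_queue = multiprocessing.Queue()
    for _ in range(2):
        p = multiprocessing.Process(target=process_work, args=[work_queue])
        p.daemon = True
        p.start()
    # ... start the worker threads only after the processes exist ...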

Designing a good architecture for python multiprocessing

I have a program (say, "prog") written in C that makes many numerical operations. I want to write a "driver" utility in python that runs the "prog" with different configurations in a parallel way, reads its outputs and logs them. There are several issues to take into account:
All sorts of things can go wrong at any time, so logging has to be done as soon as possible after any prog instance finishes.
Several progs can finish simultaneously, so logging should be centralized
workers may be killed somehow and the driver has to handle that situation properly
all workers and the logger must be terminated correctly, without tons of backtraces, when KeyboardInterrupt is handled
The first two points make me think that all workers have to send their results to some centralized logger worker, for example through a multiprocessing.Queue. But it seems that the third point makes this solution a bad one, because if a worker is killed the queue is going to become corrupted. So the Queue is not suitable. Instead I can use multiple process-to-process pipes (i.e. every worker is connected to the logger through its own pipe). But then other problems arise:
reading from a pipe is a blocking operation, so one logger can't read asynchronously from several workers (use threads?)
if a worker is killed and a pipe is corrupted, how can the logger diagnose this?
P.S. point #4 seems to be solvable -- I have to
disable default SIGINT handling in all workers and the logger;
add a try/except block to the main process that calls pool.terminate(); pool.join() when the SIGINT exception is handled (see the sketch after this list).
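A minimal sketch of those two steps, assuming a hypothetical run_prog() worker function; the pool workers ignore SIGINT, so only the driver reacts to Ctrl+C:

import signal
import multiprocessing

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)   # 1. workers ignore SIGINT

def run_prog(config):
    return config                                  # placeholder for running "prog"

if __name__ == "__main__":
    pool = multiprocessing.Pool(4, initializer=init_worker)
    try:
        results = pool.map_async(run_prog, range(10)).get(timeout=3600)
        pool.close()
        pool.join()
    except KeyboardInterrupt:                      # 2. clean shutdown on Ctrl+C
        pool.terminate()
        pool.join()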
Could you please suggest a better design approach if possible and if not than how to tackle the problems described above?
P.S. python 2.7
You can start from the answer given here: https://stackoverflow.com/a/23369802/4323
The idea is to not use subprocess.call(), which is blocking, but instead subprocess.Popen, which is non-blocking. Capture the stdout of each instance, e.g. by redirecting it to a pipe or a file you create for each prog child. Spawn all the progs, wait for them, and write their output. It should not be far off from the code shown there.
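A minimal sketch along those lines, assuming a hypothetical ./prog binary that takes one configuration argument; each instance's output goes to its own temporary file to avoid pipe-buffer issues, and only the driver logs:

import subprocess
import tempfile

configs = ["cfg-a", "cfg-b", "cfg-c"]
running = []
for cfg in configs:
    out = tempfile.TemporaryFile()
    p = subprocess.Popen(["./prog", cfg], stdout=out, stderr=subprocess.STDOUT)
    running.append((cfg, p, out))

for cfg, p, out in running:        # wait for each instance in turn and log its output
    rc = p.wait()
    out.seek(0)
    print("{}: rc={} output={!r}".format(cfg, rc, out.read()))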

Gracefully Terminating Python Threads

I am trying to write a unix client program that is listening to a socket, stdin, and reading from file descriptors. I assign each of these tasks to an individual thread and have them successfully communicating with the "main" application using synchronized queues and a semaphore. The problem is that when I want to shut down these child threads they are all blocking on input. Also, the threads cannot register signal handlers, because in Python only the main thread of execution is allowed to do so.
Any suggestions?
There is no good way to work around this, especially when the thread is blocking.
I had a similar issue (Python: How to terminate a blocking thread) and the only way I was able to stop my threads was to close the underlying connection. That caused the blocking thread to raise an exception, which then allowed me to check the stop flag and close.
Example code:
import threading
from threading import Thread

# Connection, blockingcall and CommunicationException are placeholders for
# whatever connection object and blocking API you are actually using.

class Example(object):
    def __init__(self):
        self.stop = threading.Event()
        self.connection = Connection()
        self.mythread = Thread(target=self.dowork)
        self.mythread.start()

    def dowork(self):
        while not self.stop.is_set():
            try:
                blockingcall()
            except CommunicationException:
                pass

    def terminate(self):
        self.stop.set()
        self.connection.close()
        self.mythread.join()
Another thing to note is that blocking operations commonly offer a timeout. If you have that option, I would consider using it. My last comment is that you could always set the thread to be daemonic.
From the pydoc:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
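A tiny sketch of that daemonic option; the worker below stands in for a thread stuck on a blocking call, and the program still exits when the main thread returns:

import threading
import time

def worker():
    while True:
        time.sleep(60)              # stands in for a blocking call

t = threading.Thread(target=worker)
t.daemon = True                     # only daemon threads left => the interpreter may exit
t.start()
# the main thread falls off the end here and the program exits without joining the worker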
Also, the threads cannot register signal handlers
Using signals to kill threads is potentially horrible, especially in C, especially if you allocate memory as part of the thread, since it won't be freed when that particular thread dies (as it belongs to the heap of the process). There is no garbage collection in C, so if that pointer goes out of scope, the memory remains allocated. So just be careful with that one - only do it that way in C if you're going to actually kill all the threads and end the process so that the memory is handed back to the OS - adding and removing threads from a threadpool, for example, will give you a memory leak.
The problem is that when I want to shutdown these child threads they are all blocking on input.
Funnily enough I've been fighting with the same thing recently. The solution is literally don't make blocking calls without a timeout. So, for example, what you want ideally is:
def threadfunc(running):
    while running:
        blockingcall(timeout=1)
where running is passed from the controlling thread. I've never used threading, but I have used multiprocessing, and with this you actually need to pass an Event() object and check is_set(). But you asked for design patterns; that's the basic idea.
Then, when you want this thread to end, you run:
running.clear()
mythread.join()
and your main thread should then allow your client thread to handle its last call, and return, and the whole program folds up nicely.
What do you do if you have a blocking call without a timeout? Use the asynchronous option, and sleep (as in call whatever method you have to suspend the thread for a period of time so you're not spinning) if you need to. There's no other way around it.
See these answers:
Python SocketServer
How to exit a multithreaded program?
Basically, don't block on recv() by using select() with a timeout to check for readability of the socket, and poll a quit flag when select() times out.
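A minimal sketch of that select()-with-timeout pattern, assuming sock is an already-connected socket, stop is a threading.Event set by the main thread, and handle() is a hypothetical callback:

import select

def reader(sock, stop):
    while not stop.is_set():
        readable, _, _ = select.select([sock], [], [], 1.0)   # 1-second timeout
        if readable:
            data = sock.recv(4096)
            if not data:
                break                   # peer closed the connection
            handle(data)                # hypothetical handler for received data
    sock.close()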
