Consider the following example:
from multiprocessing import Queue, Pool

def work(*args):
    print('work')
    return 0

if __name__ == '__main__':
    queue = Queue()
    pool = Pool(1)
    result = pool.apply_async(work, args=(queue,))
    print(result.get())
This raises the following RuntimeError:
Traceback (most recent call last):
File "/tmp/test.py", line 11, in <module>
print(result.get())
[...]
RuntimeError: Queue objects should only be shared between processes through inheritance
But interestingly, the exception is only raised when I try to get the result, not when the "sharing" happens. Commenting out that line silences the error, even though I actually did share the queue (and work is never executed!).
So here is my question: why is this exception only raised when the result is requested, and not when apply_async is invoked, even though the error is apparently detected there, given that the target function work is never called?
It looks like the exception occurs in a different process and can only be made available to the main process when inter-process communication is performed in the form of requesting the result. If so, I'd like to know why such checks are not performed before dispatching to the other process.
(If I used the queue in both work and the main process for communication then this would (silently) introduce a deadlock.)
Python version is 3.5.2.
I have read the following questions:
Sharing many queues among processes in Python
How do you pass a Queue reference to a function managed by pool.map_async()?
Sharing a result queue among several processes
Python multiprocessing: RuntimeError: “Queue objects should only be shared between processes through inheritance”
Python sharing a lock between processes
This behavior results from the design of the multiprocessing.Pool.
Internally, when you call apply_async, your job is put in the Pool's task queue and you get back an AsyncResult object, which allows you to retrieve your computation result using get.
Another thread is then in charge of pickling your work. The RuntimeError happens in that thread, but by then you have already returned from the call to apply_async. The exception is therefore stored as the result in the AsyncResult and is re-raised when you call get.
This behavior, built on a kind of future result, is easier to understand if you try concurrent.futures, which has explicit future objects and, IMO, a better design for handling failures, as you can query the future object for a failure without calling get.
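As a minimal sketch of that difference (fail is a placeholder function that always raises):

from concurrent.futures import ProcessPoolExecutor

def fail():
    raise ValueError('boom')

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fail)
        # exception() waits for the task to finish and returns the raised
        # exception (or None on success) without re-raising it, unlike
        # result() or Pool's AsyncResult.get().
        print(repr(future.exception()))  # ValueError('boom')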
Related
multiprocessing.Pool().map() blocks the main process from moving ahead with its execution, and yet it is stated everywhere that join should be called after close as good practice. I would like to understand, through an example, in what scenario calling join after a multiprocessing.Pool().map() call actually makes sense.
Where does it state that "good practice"? If you have no further need of the pool, i.e. you do not plan on submitting any more tasks and your program is not terminating, but you want to release the resources used by the pool and "clean up" right away, you can just call terminate, either explicitly or implicitly, which is what happens if you use a with block as follows:
with Pool() as pool:
    ...
# terminate is called implicitly when the above block exits
But note that terminate will not wait for outstanding tasks, if any, to complete. If there are submitted tasks that are queued but not yet running, or that are currently running, they will be canceled.
Calling close prevents further tasks from being submitted and should only be called when you have no further use for the pool. Calling join, which requires that you first call close, will wait for any outstanding tasks to complete and the processes in the pool to terminate. But if you are using map, that by definition blocks until the submitted tasks complete. So unless you have submitted other tasks, there is no compelling need to call close followed by join. These calls are, however, useful to wait for outstanding tasks submitted with, for example, apply_async to complete without having to explicitly call get on the AsyncResult instance returned by that call:
pool = Pool()
pool.apply_async(worker1, args=(arg1, arg2))
pool.apply_async(worker2, args=(arg3,))
pool.apply_async(worker3)
# wait for all 3 tasks to complete
pool.close()
pool.join()
Of course, the above is only useful if you do not need any return values from the worker functions.
So to answer your question: not really; only if you happen to have other tasks submitted asynchronously whose completion you are awaiting. It is, however, one way of immediately releasing the pool's resources if you are not planning on exiting your program right away, the other being to call terminate.
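For completeness, here is a small runnable sketch of that pattern; the worker function and its one-second delay are made up for illustration:

from multiprocessing import Pool
import time

def worker(n):
    time.sleep(1)              # simulate some work
    print('finished task', n)

if __name__ == '__main__':
    pool = Pool()
    for n in range(3):
        pool.apply_async(worker, args=(n,))
    pool.close()               # no further tasks may be submitted
    pool.join()                # block until all outstanding tasks are done
    print('all tasks completed')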
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED
def div_zero(x):
    print('In div_zero')
    return x / 0

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = executor.submit(div_zero, 1)
    done, _ = wait([futures], return_when=ALL_COMPLETED)
    # print(done.pop().result())
    print('Done')
The program above will run to completion without any error message.
You can only get the exception if you explicitly call future.result() or future.exception(), as I did in the commented-out line.
I wonder why this Python module chose this kind of behavior, even though it hides problems. Because of this, I spent hours debugging a programming error (referencing a non-existent attribute in a class) that would otherwise have been very obvious if the program had just crashed with the exception, as it would in Java, for instance.
I suspect the reason is so that the entire pool does not crash because of one thread raising an exception. This way, the pool will process all the tasks and you can get the threads that raised exceptions separately if you need to.
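If you do need them, here is a minimal sketch of collecting exceptions per future; task is a made-up function that fails on odd inputs:

from concurrent.futures import ThreadPoolExecutor, as_completed

def task(x):
    if x % 2:
        raise ValueError('bad input: %d' % x)
    return x * x

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(task, x) for x in range(6)]
    for future in as_completed(futures):
        # exception() returns the raised exception (or None) without re-raising it
        error = future.exception()
        if error is not None:
            print('task failed:', error)
        else:
            print('task result:', future.result())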
Each thread is (mostly) isolated from the other threads, including the primary thread. The primary thread does not communicate with the other threads until you ask it to do so.
This includes errors. The result is what you are seeing: errors occurring in other threads do not interfere with the primary thread. You only need to handle them when you ask for the results.
I am embedding Python in a multi-threaded C++ application, is it safe to call
Py_Initialize() in multiple threads? Or should I call it in the main thread?
The Py_Initialize() code contains:
if (initialized)
    return;
initialized = 1;
The documentation for the function also says:
https://docs.python.org/2/c-api/init.html#c.Py_Initialize
This is a no-op when called for a second time (without calling Py_Finalize() first).
My recommendation, though, is that you only do it from the main thread, although depending on what you are doing, it can get complicated.
The problem is that signal handlers are only triggered in the context of the main Python thread, that is, whatever thread was the one to call Py_Initialize(). So if that is a transient thread which is only used once and then discarded, there is no chance for signal handlers ever to be called. You therefore have to give some thought to how you handle signals.
Also be careful about using lots of transient threads created in C code with the native thread API and calling into the Python interpreter, as each one creates data in the Python interpreter. That data will accumulate if you keep creating and discarding these external threads. If you call in from external threads, you should endeavour to use a thread pool instead and keep reusing prior threads.
Is it possible to gracefully kill a joblib process (threading backend), and still return the so far computed results ?
from joblib import Parallel, delayed

parallel = Parallel(n_jobs=4, backend="threading")
result = parallel(delayed(dummy_f)(x) for x in range(100))
For the moment I have come up with two solutions:
parallel._aborted = True which waits for the started jobs to finish (in my case it can be very long)
parallel._terminate_backend() which hangs if jobs are still in the pipe (parallel._jobs not empty)
Is there a way to workaround the lib to do this ?
As far as I know, Joblib does not provide methods to kill spawned threads.
As each child thread runs in its own context, it's actually difficult to perform graceful killing or termination.
That being said, there is a workaround that could be adopted.
Mimic .join() (of threading) functionality (kind of):
Create a shared-memory dictionary shared_dict with keys corresponding to each thread (or task) id, whose values will contain either the thread's output or the exception it raised, e.g.:
shared_dict = {i: None for i in range(num_workers)}
Whenever an error is raised in any thread, catch the exception in a handler and, instead of raising it immediately, store it in the shared dictionary.
Create an exception handler which waits for all(shared_dict.values()).
After all values are filled with either a result or an error, exit the program by raising the error, logging it, or whatever you prefer. A sketch of this idea follows below.
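A minimal sketch of this workaround with the threading backend (so a plain dict is effectively shared memory between the workers); slow_task and its simulated failure are made up for illustration:

from joblib import Parallel, delayed

num_tasks = 8
shared_dict = {i: None for i in range(num_tasks)}   # one slot per task

def slow_task(i):
    # placeholder for the real work; raise once to simulate a failure
    if i == 3:
        raise RuntimeError('task %d failed' % i)
    return i * i

def safe_task(i):
    # catch the exception instead of raising it, and store result or error
    try:
        shared_dict[i] = slow_task(i)
    except Exception as exc:
        shared_dict[i] = exc

Parallel(n_jobs=4, backend="threading")(
    delayed(safe_task)(i) for i in range(num_tasks))

# afterwards, decide what to do with results and errors
for i, value in shared_dict.items():
    if isinstance(value, Exception):
        print('task', i, 'failed:', value)
    else:
        print('task', i, 'returned', value)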
I have an application which polls a bunch of servers every few minutes. To do this, it spawns one thread per server to poll (15 servers) and writes back the data to an object:
import requests

class ServerResults(object):
    def __init__(self):
        self.results = []

    def add_server(self, some_argument):
        self.results.append(some_argument)

def poll_server(server, results):
    response = requests.get(server, timeout=10)
    results.add_server(response.status_code)

servers = ['1.1.1.1', '1.1.1.2']
results = ServerResults()
for s in servers:
    t = CallThreads(poll_server, s, results)
    t.daemon = True
    t.start()
The CallThreads class is a helper that calls a function (in this case poll_server()) with arguments (in this case s and results); you can see the source at my Github repo of Python utility functions. Most of the time this works fine, however sometimes a thread intermittently hangs. I'm not sure why, since I am using a timeout on the GET request. In any case, if a thread hangs, the hung threads build up over the course of hours or days, and then Python crashes:
File "/usr/lib/python2.7/threading.py", line 495, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
Exception in thread Thread-575 (most likely raised during interpreter shutdown)
Exception in thread Thread-1671 (most likely raised during interpreter shutdown)
Exception in thread Thread-831 (most likely raised during interpreter shutdown)
How might I deal with this? There seems to be no way to kill a blocking thread in Python. This application needs to run on a Raspberry Pi, so large libraries such as twisted won't fit, in fact I need to get rid of the requests library as well!
As far as I can tell, a possible scenario is that when a thread "hangs" for one given server, it will stay there "forever". The next time you query your servers, another thread is spawned (_start_new_thread), up to the point where Python crashes.
Probably not your (main) problem, but you should:
use a thread pool: this won't stress the limited resources of your system as much as spawning new threads again and again;
check that you use a "thread-compatible" mechanism to handle concurrent access to results, such as a semaphore or mutex to lock the atomic portions of your code. Probably better still would be a dedicated data structure such as a queue (see the sketch after this list).
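A minimal sketch combining both suggestions, assuming a fixed pool of worker threads and a Queue for handing results back to the main thread (Python 3 naming; in Python 2 the module is called Queue, and poll is a stand-in for the real request):

import queue
import threading

NUM_WORKERS = 4
servers = ['1.1.1.1', '1.1.1.2']

task_queue = queue.Queue()     # servers waiting to be polled
result_queue = queue.Queue()   # (server, status) pairs written by the workers

def poll(server):
    # stand-in for the real HTTP request with its timeout
    return 200

def worker():
    while True:
        server = task_queue.get()
        try:
            result_queue.put((server, poll(server)))
        finally:
            task_queue.task_done()

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for s in servers:
    task_queue.put(s)
task_queue.join()              # wait until every queued server has been handled

while not result_queue.empty():
    print(result_queue.get())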
Concerning the "hang" itself: beware that the timeout argument when "opening a URL" (urlopen) relates to the timeout for establishing the connection, not for downloading the actual data:
The optional timeout parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified, the
global default timeout setting will be used). This actually only works
for HTTP, HTTPS and FTP connections.
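The same caveat broadly applies to requests, which the question uses: its timeout bounds connecting and the wait between received chunks, not the total download time. If it helps, requests also accepts a (connect, read) timeout tuple; a small illustration with a placeholder URL:

import requests

# connect timeout of 3.05 s, read timeout of 10 s between received chunks;
# neither bounds the total time the whole download may take
response = requests.get('http://example.com/status', timeout=(3.05, 10))
print(response.status_code)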