I've been trying to parallelise some code using concurrent.futures.ProcessPoolExecutor, but I keep running into strange deadlocks that don't occur with ThreadPoolExecutor. A minimal example:
from concurrent import futures

def test():
    pass

with futures.ProcessPoolExecutor(4) as executor:
    for i in range(100):
        print('submitting {}'.format(i))
        executor.submit(test)
In Python 3.2.2 (on 64-bit Ubuntu), this hangs consistently after all the jobs have been submitted, and it seems to happen whenever the number of jobs submitted is greater than the number of workers. If I replace ProcessPoolExecutor with ThreadPoolExecutor, it works flawlessly.
As an attempt to investigate, I gave each future a callback to print the value of i:
from concurrent import futures

def test():
    pass

with futures.ProcessPoolExecutor(4) as executor:
    for i in range(100):
        print('submitting {}'.format(i))
        future = executor.submit(test)

        def callback(f):
            print('callback {}'.format(i))
        future.add_done_callback(callback)
This just confused me even more: the value of i printed by the callback is its value at the time the callback is called, rather than at the time the callback was defined (so I never see callback 0, but I get lots of callback 99s). Again, ThreadPoolExecutor prints the expected values.
Wondering if this might be a bug, I tried a recent development version of Python. Now the code at least seems to terminate, but I still get the wrong value of i printed out.
So can anyone explain:
what happened to ProcessPoolExecutor between Python 3.2 and the current dev version that apparently fixed this deadlock
why the 'wrong' value of i is being printed
EDIT: as jukiewicz pointed out below, of course printing i will print its value at the time the callback is called; I don't know what I was thinking... If I pass a callable object with the value of i as one of its attributes, that works as expected.
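(For reference, a minimal sketch of what I mean: the value of i is stored as an attribute on a callable object at submission time, so the callback no longer closes over the loop variable:)
from concurrent import futures

class Callback:
    def __init__(self, i):
        self.i = i  # i is captured here, when the future is submitted

    def __call__(self, future):
        print('callback {}'.format(self.i))

def test():
    pass

with futures.ProcessPoolExecutor(4) as executor:
    for i in range(100):
        executor.submit(test).add_done_callback(Callback(i))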
EDIT: a little more information: all of the callbacks are executed, so it looks like it is executor.shutdown (called by executor.__exit__) that is unable to tell that the processes have completed. This does seem to be completely fixed in the current Python 3.3, but there have been a lot of changes to multiprocessing and concurrent.futures, so I don't know what fixed this. Since I can't use 3.3 (it doesn't seem to be compatible with either the release or dev versions of numpy), I tried simply copying its multiprocessing and concurrent packages across to my 3.2 installation, which seems to work fine. Still, it seems a little weird that, as far as I can see, ProcessPoolExecutor is completely broken in the latest release version, yet nobody else seems to be affected.
I modified the code as follows, which solved both problems. The callback function was defined as a closure, so it would use the updated value of i every time. As for the deadlock, shutting down the Executor before all the tasks are complete is a likely cause. Waiting for the futures to complete solves that, too.
from concurrent import futures

def test(i):
    return i

def callback(f):
    print('callback {}'.format(f.result()))

with futures.ProcessPoolExecutor(4) as executor:
    fs = []
    for i in range(100):
        print('submitting {}'.format(i))
        future = executor.submit(test, i)
        future.add_done_callback(callback)
        fs.append(future)
    for _ in futures.as_completed(fs):
        pass
UPDATE: oh, sorry, I hadn't read your updates; this seems to have been solved already.
Related
I am currently trying to parallelize a part of an existing program, and have encountered a strange behaviour that causes my program to stall (the process is still running, but makes no further progress).
First, here is a minimal working example:
import multiprocessing

import torch

def foo(x):
    print(x)
    return torch.zeros(x, x)

size = 200

# first block
print("Run with single core")
res = []
for i in range(size):
    res.append(foo(i))

# second block
print("Run with multiprocessing")
data = [i for i in range(size)]
with multiprocessing.Pool(processes=1) as pool:
    res = pool.map(foo, data)
The problem is that the script stops running during multiprocessing, reliably at x = 182. Up to this point I could come up with some reasonable explanations, but now comes the strange part: if I run only the second block of code, the script works perfectly fine. It also works if I run the parallel version first and then the single-process code. The program only gets stuck when I run the first block and then the second. The same holds if I run the second block, then the first, and then the second again; in that case the first multiprocessing block works fine, and it is the second run of the multiprocessing version that gets stuck.
The problem does not seem to stem from a lack of memory, since I can increase size to much higher values (1000+) when running only the multiprocessing code. Additionally, I have no problem when I use np.zeros((x, x)) instead of torch.zeros. Removing the print call does not help either, so I kept it in for demonstration purposes.
This was also reproducible on other machines, which likewise stopped at x = 182 when running both blocks but worked fine when running only the second. The Python and PyTorch versions were, respectively, (3.7.3, 1.7.1), (3.8.9, 1.8.1), and (3.8.5, 1.9.0). All systems were Ubuntu-based.
Does anyone have an explanation for this behaviour, or did I miss any parameters/options I need to set for this to work?
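One possibly relevant avenue (an assumption on my part, not something I have verified): since fork copies the parent's state, and torch may have already spun up internal threads by the time the pool is created, a variant using the 'spawn' start method might behave differently:
import multiprocessing

import torch

def foo(x):
    print(x)
    return torch.zeros(x, x)

if __name__ == '__main__':
    size = 200
    # first block: single process
    res = [foo(i) for i in range(size)]
    # second block: freshly spawned workers instead of forked ones
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(processes=1) as pool:
        res = pool.map(foo, range(size))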
This is not the first time I have had problems with Pool() while trying to debug something.
Please note that I am not trying to debug the code that is parallelized. I want to debug code which is executed much later. The problem is that this code is never reached, because pool.map() does not terminate.
I have created a simple pipeline which does a few things. Among those things is a very simple preprocessing step for textual data.
To speed things up I am running:
print('Preprocess text')
with Pool() as pool:
    df_preprocessed['text'] = pool.map(preprocessor.preprocess, df.text)
But here is the thing:
For some reason this code runs only once. The second time, I end up in an endless loop in _handle_workers() of the pool.py module:
@staticmethod
def _handle_workers(pool):
    thread = threading.current_thread()

    # Keep maintaining workers until the cache gets drained, unless the pool
    # is terminated.
    while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
        pool._maintain_pool()
        time.sleep(0.1)
    # send sentinel to stop workers
    pool._taskqueue.put(None)
    util.debug('worker handler exiting')
Note the while loop. My script simply ends up there when my preprocessing function is called a second time.
Please note: This only happens during a debugging session! If the script is executed without a debugger everything is working fine.
What could be the reason for this?
Environment
$ python --version
Python 3.5.6 :: Anaconda, Inc.
Update
This could be a known bug: "infinite waiting in multiprocessing.Pool"
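In the meantime I am working around it by bypassing the pool entirely whenever a debugger is attached (my own stopgap, not a fix for the underlying bug; the DEBUG flag and the parallel_map helper are just names I made up):
import sys
from multiprocessing import Pool

DEBUG = sys.gettrace() is not None  # True when a tracer/debugger is active

def parallel_map(func, iterable):
    if DEBUG:
        return list(map(func, iterable))  # serial fallback, safe under the debugger
    with Pool() as pool:
        return pool.map(func, iterable)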
I wanted to use concurrency in Python for the first time. So I started reading a lot about Python concurrency (the GIL, threads vs. processes, multiprocessing vs. concurrent.futures vs. ...) and saw a lot of convoluted examples, even examples using the high-level concurrent.futures library.
So I decided to just start trying stuff and was surprised with the very, very simple code I ended up with:
from concurrent.futures import ThreadPoolExecutor

class WebHostChecker(object):

    def __init__(self, websites):
        self.webhosts = []
        for website in websites:
            self.webhosts.append(WebHost(website))

    def __iter__(self):
        return iter(self.webhosts)

    def check_all(self):
        # sequential:
        #for webhost in self:
        #    webhost.check()

        # threaded:
        with ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(lambda webhost: webhost.check(), self.webhosts)

class WebHost(object):

    def __init__(self, hostname):
        self.hostname = hostname

    def check(self):
        print("Checking {}".format(self.hostname))
        self.check_dns()   # only modifies internal state, i.e. sets self.dns
        self.check_http()  # only modifies internal state, i.e. sets self.http
Using the classes looks like this:
webhostchecker = WebHostChecker(["urla.com", "urlb.com"])
webhostchecker.check_all() # -> this calls .check() on all WebHost instances in parallel
The relevant multiprocessing/threading code is only 3 lines. I barely had to modify my existing code (which is what I hoped for when I first wrote the sequential version, but started to doubt after reading the many examples online).
And... it works! :)
It perfectly distributes the IO-waiting among multiple threads and runs in less than 1/3 of the time of the original program.
So, now, my question(s):
What am I missing here?
Could I implement this differently? (Should I?)
Why are other examples so convoluted? (Although I must say I couldn't find an exact example doing a method call on multiple objects)
Will this code get me in trouble when I expand my program with features/code I cannot predict right now?
I think I already know of one potential problem, and it would be nice if someone could confirm my reasoning: if WebHost.check() also becomes CPU bound, I won't be able to simply swap ThreadPoolExecutor for ProcessPoolExecutor, because every process will get cloned versions of the WebHost instances, and I would have to code something to sync those cloned instances back to the originals (see the sketch below).
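Something like this is what I imagine it would take (a sketch under my own assumptions - the check_and_return helper is hypothetical, and a module-level function is used because lambdas can't be pickled):
from concurrent.futures import ProcessPoolExecutor

def check_and_return(webhost):
    webhost.check()  # runs in the worker, on a pickled clone of the instance
    return webhost.dns, webhost.http

def check_all(self):
    with ProcessPoolExecutor(max_workers=10) as executor:
        # sync the clones' results back onto the original instances
        for webhost, (dns, http) in zip(self.webhosts,
                                        executor.map(check_and_return, self.webhosts)):
            webhost.dns, webhost.http = dns, http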
Any insights/comments/remarks/improvements/... that can bring me to greater understanding will be much appreciated! :)
Ok, so I'll add my own first gotcha:
If webhost.check() raises an Exception, the thread just ends and self.dns and/or self.http might NOT have been set. However, with the current code you won't see the Exception UNLESS you also access the executor.map() results! That left me wondering why some objects raised AttributeErrors after running check_all() :)
This can easily be fixed by evaluating every result (which is always None, because I'm not letting .check() return anything). You can do it after all threads have run or during. I chose to let Exceptions be raised during (i.e. within the with statement), so the program stops at the first unexpected error:
def check_all(self):
    with ThreadPoolExecutor(max_workers=10) as executor:
        # this alone works, but does not raise any exceptions from the threads:
        #executor.map(lambda webhost: webhost.check(), self.webhosts)
        for i in executor.map(lambda webhost: webhost.check(), self.webhosts):
            pass
I guess I could also use list(executor.map(lambda webhost: webhost.check(), self.webhosts)), but that would unnecessarily use up memory.
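(Another option I have since come across for exhausting an iterator without building a list: consuming it into a zero-length deque, which still re-raises any exceptions but stores nothing:)
import collections
from concurrent.futures import ThreadPoolExecutor

def check_all(self):
    with ThreadPoolExecutor(max_workers=10) as executor:
        # consumes every result (re-raising any exceptions) in O(1) memory
        collections.deque(
            executor.map(lambda webhost: webhost.check(), self.webhosts),
            maxlen=0)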
I have the following situation, where I create a pool in a for loop as follows (I know it's not very elegant, but I have to do this for pickling reasons). Assume that pathos.multiprocessing is equivalent to Python's multiprocessing library (it is, up to some details that are not relevant for this problem).
I have the following code I want to execute:
self.pool = pathos.multiprocessing.ProcessingPool(number_processes)
for i in range(5):
    all_responses = self.pool.map(wrapper_singlerun, range(self.no_of_restarts))
    pool._clear()
Now my problem: the loop successfully runs the first iteration. However, at the second iteration, the algorithm suddenly stops (it does not finish the pool.map operation). I suspected that zombie processes were being generated, or that the process was somehow switched. Below you will find everything I have tried so far.
for i in range(5):
    pool = pathos.multiprocessing.ProcessingPool(number_processes)
    all_responses = self.pool.map(wrapper_singlerun, range(self.no_of_restarts))
    pool._clear()
    gc.collect()
    for p in multiprocessing.active_children():
        p.terminate()
    gc.collect()
    print("We have so many active children: ", multiprocessing.active_children())  # Returns []
The above code works perfectly well on my Mac. However, when I run it on a cluster with the following specs, it gets stuck after the first iteration:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"
Here is the link to the pathos multiprocessing library file.
I am assuming that you are trying to call this from some function, which is not the correct way to use it.
You need to wrap it with:
if __name__ == '__main__':
    for i in range(5):
        pool = pathos.multiprocessing.Pool(number_processes)
        all_responses = pool.map(wrapper_singlerun,
                                 range(self.no_of_restarts))
If you don't, it will keep creating copies of itself and putting them on the stack, which will ultimately fill the stack and block everything. The reason it works on your Mac is that it has fork, while Windows does not.
I'm having problems with Python freezing when I try to use the multiprocessing module. I'm using Spyder 2.3.2 with Python 3.4.3 (I have previously encountered problems that were specific to IPython).
I've reduced it to the following MWE:
import multiprocessing

def test_function(arg1=1, arg2=2):
    print("arg1 = {0}, arg2 = {1}".format(arg1, arg2))
    return None

pool = multiprocessing.Pool(processes=3)
for i in range(6):
    pool.apply_async(test_function)
pool.close()
pool.join()
This, in its current form, should just produce six identical iterations of test_function. However, while I can enter the commands with no hassle, when I give the command pool.join(), IPython hangs and I have to restart the kernel.
The function works perfectly well when run serially (the next step in my MWE would be to use pool.apply_async(test_function, kwds=entry)):
for i in range(6):
    test_function()

arg_list = [{'arg1': 3, 'arg2': 4}, {'arg1': 5, 'arg2': 6}, {'arg1': 7, 'arg2': 8}]
for entry in arg_list:
    test_function(**entry)
I have (occasionally, and I'm unable to reliably reproduce it) come across an error message of ZMQError: Address already in use, which led me to this bug report, but preceding my code with either multiprocessing.set_start_method('spawn') or multiprocessing.set_start_method('forkserver') doesn't seem to work.
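(For completeness, my understanding is that the start method has to be set exactly once, under the __main__ guard and before the pool is created; this sketch shows the pattern as I understand it, in case I applied it incorrectly:)
import multiprocessing

def test_function(arg1=1, arg2=2):
    print("arg1 = {0}, arg2 = {1}".format(arg1, arg2))

if __name__ == '__main__':
    # must be called once, before the Pool is created
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(processes=3)
    for i in range(6):
        pool.apply_async(test_function)
    pool.close()
    pool.join()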
Can anyone offer any help/advice? Thanks in advance if so.
@Anarkopsykotik is correct: you must use a main, and you can get it to print by returning a result to the main thread.
Here's a working example.
import multiprocessing
import os

def test_function(arg1=1, arg2=2):
    string = "arg1 = {0}, arg2 = {1}".format(arg1, arg2) + " from process id: " + str(os.getpid())
    return string

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    for i in range(6):
        result = pool.apply_async(test_function)
        print(result.get(timeout=1))
    pool.close()
    pool.join()
Two things pop to mind that might cause problems.
First, in the docs there is a warning about using the interactive interpreter with the multiprocessing module:
https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the Pool examples will not work in the interactive interpreter.
Second: you might want to retrieve a string from your async function, and then display it from your main thread. I am not quite sure child processes have access to standard output, which might be locked to the main process.