Python: How to keep starting new parallel processes based on condition [duplicate] - python

I am reading various tutorials on the multiprocessing module in Python, and am having trouble understanding why/when to call process.join(). For example, I stumbled across this example:
nums = range(100000)
nprocs = 4
def worker(nums, out_q):
""" The worker function, invoked in a process. 'nums' is a
list of numbers to factor. The results are placed in
a dictionary that's pushed to a queue.
"""
outdict = {}
for n in nums:
outdict[n] = factorize_naive(n)
out_q.put(outdict)
# Each process will get 'chunksize' nums and a queue to put his out
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []
for i in range(nprocs):
p = multiprocessing.Process(
target=worker,
args=(nums[chunksize * i:chunksize * (i + 1)],
out_q))
procs.append(p)
p.start()
# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
resultdict.update(out_q.get())
# Wait for all worker processes to finish
for p in procs:
p.join()
print resultdict
From what I understand, process.join() will block the calling process until the process whose join method was called has completed execution. I also believe that the child processes which have been started in the above code example complete execution upon completing the target function, that is, after they have pushed their results to the out_q. Lastly, I believe that out_q.get() blocks the calling process until there are results to be pulled. Thus, if you consider the code:
resultdict = {}
for i in range(nprocs):
resultdict.update(out_q.get())
# Wait for all worker processes to finish
for p in procs:
p.join()
the main process is blocked by the out_q.get() calls until every single worker process has finished pushing its results to the queue. Thus, by the time the main process exits the for loop, each child process should have completed execution, correct?
If that is the case, is there any reason for calling the p.join() methods at this point? Haven't all worker processes already finished, so how does that cause the main process to "wait for all worker processes to finish?" I ask mainly because I have seen this in multiple different examples, and I am curious if I have failed to understand something.

Try to run this:
import math
import time
from multiprocessing import Queue
import multiprocessing
def factorize_naive(n):
factors = []
for div in range(2, int(n**.5)+1):
while not n % div:
factors.append(div)
n //= div
if n != 1:
factors.append(n)
return factors
nums = range(100000)
nprocs = 4
def worker(nums, out_q):
""" The worker function, invoked in a process. 'nums' is a
list of numbers to factor. The results are placed in
a dictionary that's pushed to a queue.
"""
outdict = {}
for n in nums:
outdict[n] = factorize_naive(n)
out_q.put(outdict)
# Each process will get 'chunksize' nums and a queue to put his out
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []
for i in range(nprocs):
p = multiprocessing.Process(
target=worker,
args=(nums[chunksize * i:chunksize * (i + 1)],
out_q))
procs.append(p)
p.start()
# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
resultdict.update(out_q.get())
time.sleep(5)
# Wait for all worker processes to finish
for p in procs:
p.join()
print resultdict
time.sleep(15)
And open the task-manager. You should be able to see that the 4 subprocesses go in zombie state for some seconds before being terminated by the OS(due to the join calls):
With more complex situations the child processes could stay in zombie state forever(like the situation you was asking about in an other question), and if you create enough child-processes you could fill the process table causing troubles to the OS(which may kill your main process to avoid failures).

At the point just before you call join, all workers have put their results into their queues, but they did not necessarily return, and their processes may not yet have terminated. They may or may not have done so, depending on timing.
Calling join makes sure that all processes are given the time to properly terminate.

I am not exactly sure of the implementation details, but join also seems to be necessary to reflect that a process has indeed terminated (after calling terminate on it for example). In the example here, if you don't call join after terminating a process, process.is_alive() returns True, even though the process was terminated with a process.terminate() call.

Related

multiprocessing hanging at join

Before anyone marks it as a duplicate question. I have been looking at StackOverflow posts for days, I haven't really found a good or satisfying answer.
I have a program that at some point will take individual strings (also many other arguments and objects), do some complicated processes on them, and spit 1 or more strings back. Because each string is processed separately, using multiprocessing seems natural here, especially since I work on machines with over 100 cores.
The following is a minimal example, which works with up to 12 to 15 cores, if I try to give it more cores, it hangs at p.join(). I know it's hanging at join because I tried to add some debug prints before and after join and it would stop at some point between the two print commands.
Minimal example:
import os, random, sys, time, string
import multiprocessing as mp
letters = string.ascii_uppercase
align_len = 1300
def return_string(queue):
n_strings = [1,2,3,4]
alignments = []
# generating 1 to 4 sequences randomly, each sequence of length 1300
# the original code might even produce more than 4, but 1 to 4 is an average case
# instead of the random string there will be some complicated function called
# in the original code
for i in range(random.choice(n_strings)):
alignment = ""
for i in range(align_len):
alignment += random.choice(letters)
alignments.append(alignment)
for a in alignments:
queue.put(a)
def run_string_gen(cores):
processes = []
queue = mp.Queue()
# running the target function 1000 time
for i in range(1000):
# print(i)
process = mp.Process(target=return_string, args = (queue,))
processes.append(process)
if len(processes) == cores:
counter = len(processes)
for p in processes:
p.start()
for p in processes:
p.join()
while queue.qsize() != 0:
a = queue.get()
# the original idea is that instead of print
# I will be writing to a file that is already open
print(a)
processes = []
queue = mp.Queue()
# any leftovers processes
if processes:
for p in processes:
p.start()
for p in processes:
p.join()
while queue.qsize() != 0:
a = queue.get()
print(a)
if __name__ == "__main__":
cores = int(sys.argv[1])
if cores > os.cpu_count():
cores = os.cpu_count()
start = time.perf_counter()
run_string_gen(cores)
print(f"it took {time.perf_counter() - start}")
The suspect is that the queue is getting full, but also it's not that many strings, when I give it 20 cores, it's hanging, but that's about 20*4=80 strings (if the choice was always 4), but is that many strings for the queue to get full?
Assuming the queue is getting full, I am not sure at which point I should check and empty it. Doing it inside return_string seems to be a bad idea as some other processes will also have the queue and might be emptying it/filling it at the same time. Do I use lock.acquire() and lock.release() then?
These strings will be added to a file, so I can avoid using a queue and output the strings to a file. However, because starting a process means copying objects, I cannot pass a _io.TextIOWrapper object (which is an open file to append to) but I need to open and close the file inside return_string while syncing using lock.acquire() and lock.release(), but this seems wasteful to keep opening and closing the output file to write to it.
Some of the suggested solutions out there:
1- De-queuing the queue before joining is one of the answers I found. However, I cannot anticipate how long each process will take, and adding a sleep command after p.start() loop and before p.join() is bad (at least for my code), because if they finish fast and I end up waiting, that's just a lot of time wasted, and the whole idea is to have speed here.
2- Add some kind of sentinal character e.g. none to know if one worker finished. But didn't get this part, if I run the target function 10 times for 10 cores, I will have 10 sentinels, but the problems is that it's hanging and can't get to the queue to empty and check for sentinal.
Any suggestions or ideas on what to do here?
Read carefully the documentation for `multiprocessing.Queue. Read the second warning, which says in part:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
In simple terms, your program violates this by joining the processes before it has read the items from the queue. You must reverse the order of operations. Then the problem becomes how does the main process know when to stop reading if the subprocesses are still running and writing to the queue. The simplest solution is for each subprocess to write a special sentinel record as the final item signaling that there are no more items that will be written by that process. The main process can then simply do blocking reads until it sees N sentinel records where N is the number of processes that it has started that will be writing to the queue. The sentinel record just has to be any unique record that cannot be mistaken for a normal item to be processed. None will suffice for that purpose:
import os, random, sys, time, string
import multiprocessing as mp
letters = string.ascii_uppercase
align_len = 1300
SENTINEL = None # no more records sentinel
def return_string(queue):
n_strings = [1,2,3,4]
alignments = []
# generating 1 to 4 sequences randomly, each sequence of length 1300
# the original code might even produce more than 4, but 1 to 4 is an average case
# instead of the random string there will be some complicated function called
# in the original code
for i in range(random.choice(n_strings)):
alignment = ""
for i in range(align_len):
alignment += random.choice(letters)
alignments.append(alignment)
for a in alignments:
queue.put(a)
# show this process is through writing records:
queue.put(SENTINEL)
def run_string_gen(cores):
processes = []
queue = mp.Queue()
# running the target function 1000 time
for i in range(1000):
# print(i)
process = mp.Process(target=return_string, args = (queue,))
processes.append(process)
if len(processes) == cores:
counter = len(processes)
for p in processes:
p.start()
seen_sentinel_count = 0
while seen_sentinel_count < len(processes):
a = queue.get()
if a is SENTINEL:
seen_sentinel_count += 1
# the original idea is that instead of print
# I will be writing to a file that is already open
else:
print(a)
for p in processes:
p.join()
processes = []
# The same queue can be reused:
#queue = mp.Queue()
# any leftovers processes
if processes:
for p in processes:
p.start()
seen_sentinel_count = 0
while seen_sentinel_count < len(processes):
a = queue.get()
if a is SENTINEL:
seen_sentinel_count += 1
else:
print(a)
for p in processes:
p.join()
if __name__ == "__main__":
cores = int(sys.argv[1])
if cores > os.cpu_count():
cores = os.cpu_count()
start = time.perf_counter()
run_string_gen(cores)
print(f"it took {time.perf_counter() - start}")
Prints:
...
NEUNBZVXNHCHVIGNDCEUXJSINEJQNCOWBMUJRTIASUEJHDJUWZIYHHZTJJSJXALZHOEVGMHSVVMMIFZGLGLJDECEWSVZCDRHZWVOMHCDLJVQLQIQCVKBEVOVDWTMFPWIWIQFOGWAOPTJUWKAFBXPWYDIENZTTJNFAEXDVZHXHJPNFDKACCTRTOKMVDGBQYJQMPSQZKDNDYFVBCFMWCSCHTVKURPJDBMRWFQAYIIALHDJTTMSIAJAPLHUAJNMHOKLZNUTRWWYURBTVQHWECAFHQPOZZLVOQJWVLFXUEQYKWEFXQPHKRRHBBCSYZOHUDIFOMBSRNDJNBHDUYMXSMKUOJZUAPPLOFAESZXIETOARQMBRYWNWTSXKBBKWYYKDNLZOCPHDVNLONEGMALL
it took 32.7125509
Update
The same code done using a multiprocessing pool, which obviates having to re-create processes:
import os, random, sys, time, string
import multiprocessing as mp
letters = string.ascii_uppercase
align_len = 1300
SENTINEL = None # no more records sentinel
def return_string():
n_strings = [1,2,3,4]
alignments = []
# generating 1 to 4 sequences randomly, each sequence of length 1300
# the original code might even produce more than 4, but 1 to 4 is an average case
# instead of the random string there will be some complicated function called
# in the original code
for i in range(random.choice(n_strings)):
alignment = ""
for i in range(align_len):
alignment += random.choice(letters)
alignments.append(alignment)
return alignments
def run_string_gen(cores):
def my_callback(result):
alignments = result
for alignment in alignments:
print(alignment)
pool = mp.Pool(cores)
for i in range(1000):
pool.apply_async(return_string, callback=my_callback)
# wait for completion of all tasks:
pool.close()
pool.join()
if __name__ == "__main__":
cores = int(sys.argv[1])
if cores > os.cpu_count():
cores = os.cpu_count()
start = time.perf_counter()
run_string_gen(cores)
print(f"it took {time.perf_counter() - start}")
Prints:
...
OMCRIHWCNDKYBZBTXUUYAGCMRBMOVTDOCDYFGRODBWLIFZZBDGEDVAJAJFXWJRFGQXTSCCJLDFKMOENGAGXAKKFSYXEQOICKWFPSKOHIMCRATLVLVLMGFAWBDIJMZMVMHCXMTVJBSWXTLDHEWYHUMSQZGGFWRMOHKKKGMTFEOTTJDOQMOWWLKTOWHKCIUNINHTGUZHTBGHROPVKQBNEHQWIDCZUOJGHUXLLDGHCNWIGFUCAQAZULAEZPIP
it took 2.1607988999999996
Note: the answer applies to Linux systems but I guess it will be similar on Windows.
The Queue is implemented using pipes and it seems you hit the capacity limit:
man pipe(7):
If a process attempts to read from an empty pipe, then read(2) will
block until data is available. If a process attempts to write to a
full pipe (see below), then write(2) blocks until sufficient data has
been read from the pipe to allow the write to complete.
However the Python queue will just enqueue the data to the underlying buffer and the queue thread will block on the writes to the pipe.
The Process.join method also blocks so you have to start to consume the data from the queue before that. You can try to create a consumer process or just simplify your code by using Pool.
A simple test case to reproduce the issue with a single process:
test.py:
import logging
import multiprocessing as mp
import os
logger = mp.log_to_stderr()
logger.setLevel(logging.DEBUG)
def worker(q, n):
q.put(os.urandom(2 ** n))
def main():
q = mp.Queue()
p = mp.Process(target=worker, args=(q, 17)) # > 65k bytes
p.start()
# p.join()
if __name__ == "__main__":
main()
Test:
$ python test.py
[DEBUG/MainProcess] created semlock with handle 140292518252544
[DEBUG/MainProcess] created semlock with handle 140292517982208
[DEBUG/MainProcess] created semlock with handle 140292517978112
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[INFO/MainProcess] calling join() for process Process-1
[DEBUG/Process-1] Queue._after_fork()
[INFO/Process-1] child process calling self.run()
[DEBUG/Process-1] Queue._start_thread()
[DEBUG/Process-1] doing self._thread.start()
[DEBUG/Process-1] starting thread to feed data to pipe
[DEBUG/Process-1] ... done self._thread.start()
[INFO/Process-1] process shutting down
[DEBUG/Process-1] running all "atexit" finalizers with priority >= 0
[DEBUG/Process-1] telling queue thread to quit
[DEBUG/Process-1] running the remaining "atexit" finalizers
[DEBUG/Process-1] joining queue thread
As you can see above, it blocks when joining the queue thread because it can't write to the pipe:
$ sudo strace -ttT -f -p 218650
strace: Process 218650 attached with 2 threads
[pid 218650] 07:51:44.659503 write(4, "\277.\332)\334p\226\4e\202\3748\315\341\306\227`X\326\253\23m\25#:\345g-D\233\344$"..., 4096 <unfinished ...>
[pid 218649] 07:51:44.659563 futex(0x7fe3f8000b60, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
Once we read from the pipe on another terminal, the process terminates:
$ cat /proc/218650/fd/4 1> /dev/null
...
[DEBUG/Process-1] feeder thread got sentinel -- exiting
[DEBUG/Process-1] ... queue thread joined
[INFO/Process-1] process exiting with exitcode 0
[DEBUG/MainProcess] running the remaining "atexit" finalizers

Add Task to Multiprocessing Pool of Parent

How can I add a new task to a multiprocessing pool that I initialized in a parent process? This following does not work:
from multiprocessing import Pool
def child_task(x):
# the child task spawns new tasks
results = p.map(grandchild_task, [x])
return results[0]
def grandchild_task(x):
return x
if __name__ == '__main__':
p = Pool(2)
print(p.map(child_task, [0]))
# Result: NameError: name 'p' is not defined
Motivation: I need to parallelize a program which consists of various child tasks, which themselves also have child tasks (i.e., grandchild tasks). Only parallelizing the child tasks OR the grandchild tasks does not utilize all my CPU cores.
In my use-case, I have various child tasks (maybe 1-50) and many grandchild tasks per child task (maybe 100-1000).
Alternatives: If this is not possible using Python's multiprocessing package, I am happy to switch to another library that supports this.
There is such a thing as a minimal reproducible example and then there is going beyond that to remove so much code as to end up with something that (1) is perhaps too oversimplified with the danger than an answer could miss the mark and (2) couldn't possibly run as shown (you need to enclose the code that creates the Pool and submits the task in a block that is controlled by an if __name__ == '__main__': statement.
But based on what you have shown, I don't believe a Pool is the solution for you; you should be creating Process instances as they are required. One way to get the results from the Processes is to store them in a shareable, managed dictionary whose key is, for example, the process id of the Process that has created the result.
To expand on your example, the child task is passed two arguments, x and y and needs to return as a result x**2 + 'y**2. The child task will spawn two instances of grandchild task, each one computing the square of its argument. The child task will then combine the return values from these processes using addition:
from multiprocessing import Process, Manager
import os
def child_task(results_dict, x, y):
# the child task spawns new tasks
p1 = Process(target=grandchild_task, args=(results_dict, x))
p1.start()
pid1 = p1.pid
p2 = Process(target=grandchild_task, args=(results_dict, y))
p2.start()
pid2 = p2.pid
p1.join()
p2.join()
pid = os.getpid()
results_dict[pid] = results_dict[pid1] + results_dict[pid2]
def grandchild_task(results_dict, n):
pid = os.getpid()
results_dict[pid] = n * n
def main():
manager = Manager()
results_dict = manager.dict()
p = Process(target=child_task, args=(results_dict, 2, 3))
p.start()
pid = p.pid
p.join()
# results will be stored with key p.pid:
print(results_dict[pid])
if __name__ == '__main__':
main()
Prints:
13
Update
If you really had a situation where, for example, child_task needed to process N identical calls varying only in its arguments but it had to spawn a sub-process or two, then use a Pool as before but additionally pass a managed dictionary to child_task to be used for spawning additional Processes (not attempting to use a Pool for this) and retrieving their results.
Update 2
The only way I could figure out for the sub-processes themselves to use pooling is to use the ProcessPoolExecutor class from concurrent.futures module. When I attempted to do the same thing with multiprocessing.Pool, I got an error because we had daemon processes trying to create their own processes. But even here the only way is for each process in the pool to have its own pool of processes. You only have a finite number of processors/cores on your computer, so unless there is a bit of I/O mixed in the processing, you can create all these pools but the processes will be waiting for a chance to run. So, it's not clear what performance gains will be realized. There is also the problem of shutting down all the pools created for the child_task sub-processes. Normally a ProcessPoolExecutor instance is created using a with block and when that block is terminated the pool that was created is cleaned up. But child_task is invoked repeatedly and clearly cannot use with block because we don't want constantly to be creating and destroying pools. What I have come here is a bit of a kludge: A third parameter is passed, either True or False, indicating whether child_task should instigate a shutdown of its pool. The default value for this parameter is False, we don't even bother passing it. After all the actual results have been retrieved and the child_task processes are now idle, we submit N new tasks with dummy values but with shutdown set to True. Note that the ProcessPoolExecutor function map works quite a bit differently than the same function in the Pool class (read the docs):
from concurrent.futures import ProcessPoolExecutor
import time
child_executor = None
def child_task(x, y, shutdown=False):
global child_executor
if child_executor is None:
child_executor = ProcessPoolExecutor(max_workers=1)
if shutdown:
if child_executor:
child_executor.shutdown(False)
child_executor = None
time.sleep(.2) # make sure another process in the pool gets the next task
return None
# the child task spawns new task(s)
future = child_executor.submit(grandchild_task, y)
# we can compute one of the results using the current process:
return grandchild_task(x) + future.result()
def grandchild_task(n):
return n * n
def main():
N_WORKERS = 2
with ProcessPoolExecutor(max_workers=N_WORKERS) as executor:
# first call is (1, 2), second call is (3, 4):
results = [result for result in executor.map(child_task, (1, 3), (2, 4))]
print(results)
# force a shutdown
# need N_WORKERS invocations:
[result for result in executor.map(child_task, (0,) * N_WORKERS, (0,) * N_WORKERS, (True,) * N_WORKERS)]
if __name__ == '__main__':
main()
Prints:
[5, 25]
Check this solution:
#!/usr/bin/python
# requires Python version 3.8 or higher
from multiprocessing import Queue, Process
import time
from random import randrange
import os
import psutil
# function to be run by each child process
def square(number):
sleep = randrange(5)
time.sleep(sleep)
print(f'Result is {number * number}, computed by pid {os.getpid()}...sleeping {sleep} secs')
# create a queue where all tasks will be placed
queue = Queue()
# indicate how many number of children you want the system to create to run the tasks
number_of_child_proceses = 5
# put all tasks in the queue above
for task in range(19):
queue.put(task)
# this the main entry/start of the program when you run
def main():
number_of_task = queue.qsize()
print(f'{"_" * 60}\nBatch: {number_of_task // number_of_child_proceses + 1} \n{"_" * 60}')
# don't create more number of children than the number of tasks. Also, in the last round, wait for all child process
# to complete so as to wrap up everything
if number_of_task <= number_of_child_proceses:
processes = [Process(target=square, args=(queue.get(),)) for _ in
range(number_of_task)]
for p in processes:
p.start()
p.join()
else:
processes = [Process(target=square, args=(queue.get(),)) for _ in range(number_of_child_proceses)]
for p in processes:
p.start()
# update count of remaining task
number_of_task = queue.qsize()
# run the program in a loop until no more task remains in the queue
while number_of_task:
current_process = psutil.Process()
children = current_process.children()
# if children process have completed assigned task but there is still more remaining tasks in the queue,
# assign them more tasks
if not len(children) and number_of_task:
print(f'\nAssigned tasks completed... reasigning the remaining {number_of_task} task(s) in the queue\n')
main()
# exit the loop if no more task in the queue to work on
print('\nAll tasks completed!!')
exit()
if __name__ == "__main__":
main()
I have looked around more, and found Ray, which addresses this exact use case using nested remote functions.

python multiprocessing .join() deadlock depends on worker function

I am using the multiprocessing python library to spawn 4 Process() objects to parallelize a cpu intensive task. The task (inspiration and code from this great article) is to compute the prime factors for every integer in a list.
main.py:
import random
import multiprocessing
import sys
num_inputs = 4000
num_procs = 4
proc_inputs = num_inputs/num_procs
input_list = [int(1000*random.random()) for i in xrange(num_inputs)]
output_queue = multiprocessing.Queue()
procs = []
for p_i in xrange(num_procs):
print "Process [%d]"%p_i
proc_list = input_list[proc_inputs * p_i:proc_inputs * (p_i + 1)]
print " - num inputs: [%d]"%len(proc_list)
# Using target=worker1 HANGS on join
p = multiprocessing.Process(target=worker1, args=(p_i, proc_list, output_queue))
# Using target=worker2 RETURNS with success
#p = multiprocessing.Process(target=worker2, args=(p_i, proc_list, output_queue))
procs.append(p)
p.start()
for p in jobs:
print "joining ", p, output_queue.qsize(), output_queue.full()
p.join()
print "joined ", p, output_queue.qsize(), output_queue.full()
print "Processing complete."
ret_vals = []
while output_queue.empty() == False:
ret_vals.append(output_queue.get())
print len(ret_vals)
print sys.getsizeof(ret_vals)
Observation:
If the target for each process is the function worker1, for an input list larger than 4000 elements the main thread gets stuck on .join(), waiting for the spawned processes to terminate and never returns.
If the target for each process is the function worker2, for the same input list the code works just fine and the main thread returns.
This is very confusing to me, as the only difference between worker1 and worker2 (see below) is that the former inserts individual lists in the Queue whereas the latter inserts a single list of lists for each process.
Why is there deadlock using worker1 and not using worker2 target?
Shouldn't both (or neither) go beyond the Multiprocessing Queue maxsize limit is 32767?
worker1 vs worker2:
def worker1(proc_num, proc_list, output_queue):
'''worker function which deadlocks'''
for num in proc_list:
output_queue.put(factorize_naive(num))
def worker2(proc_num, proc_list, output_queue):
'''worker function that works'''
workers_stuff = []
for num in proc_list:
workers_stuff.append(factorize_naive(num))
output_queue.put(workers_stuff)
There are a lot of similar questions on SO, but I believe the core of this questions is clearly distinct from all of them.
Related Links:
https://sopython.com/canon/82/programs-using-multiprocessing-hang-deadlock-and-never-complete/
python multiprocessing issues
python multiprocessing - process hangs on join for large queue
Process.join() and queue don't work with large numbers
Python 3 Multiprocessing queue deadlock when calling join before the queue is empty
Script using multiprocessing module does not terminate
Why does multiprocessing.Process.join() hang?
When to call .join() on a process?
What exactly is Python multiprocessing Module's .join() Method Doing?
The docs warn about this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
While a Queue appears to be unbounded, under the covers queued items are buffered in memory to avoid overloading inter-process pipes. A process cannot end normally before those memory buffers are flushed. Your worker1() puts a lot more items on the queue than your worker2(), and that's all there is to it. Note that the number of items that can queued before the implementation resorts to buffering in memory isn't defined: it can vary across OS and Python release.
As the docs suggest, the normal way to avoid this is to .get() all the items off the queue before you attempt to .join() the processes. As you've discovered, whether it's necessary to do so depends in an undefined way on how many items have been put on the queue by each worker process.

Multiprocessing has cutoff at 992 integers being joined as result

I am following this book http://doughellmann.com/pages/python-standard-library-by-example.html
Along with some online references. I have some algorithm setup for multiprocessing where i have a large array of dictionaries and do some calculation. I use multiprocessing to divide the indexes on which the calculations are done on the dictionary. To make the question more general, I replaced the algorithm with just some array of return values. From finding information online and other SO, I think it has to do with the join method.
The structure is like so,
Generate some fake data, call the manager function for multiprocessing, create a Queue, divide data over the number of index. Loop through the number of processes to use, send each process function the correct index range. Lastly join the processes and print out the results.
What I have figured out, is if the function used by the processes is trying to return a range(0,992), it works quickly, if the range(0,993), it hangs. I tried on two different computers with different specs.
The code is here:
import multiprocessing
def main():
data = []
for i in range(0,10):
data.append(i)
CalcManager(data,start=0,end=50)
def CalcManager(myData,start,end):
print 'in calc manager'
#Multi processing
#Set the number of processes to use.
nprocs = 3
#Initialize the multiprocessing queue so we can get the values returned to us
tasks = multiprocessing.JoinableQueue()
result_q = multiprocessing.Queue()
#Setup an empty array to store our processes
procs = []
#Divide up the data for the set number of processes
interval = (end-start)/nprocs
new_start = start
#Create all the processes while dividing the work appropriately
for i in range(nprocs):
print 'starting processes'
new_end = new_start + interval
#Make sure we dont go past the size of the data
if new_end > end:
new_end = end
#Generate a new process and pass it the arguments
data = myData[new_start:new_end]
#Create the processes and pass the data and the result queue
p = multiprocessing.Process(target=multiProcess,args=(data,new_start,new_end,result_q,i))
procs.append(p)
p.start()
#Increment our next start to the current end
new_start = new_end+1
print 'finished starting'
#Joint the process to wait for all data/process to be finished
for p in procs:
p.join()
#Print out the results
for i in range(nprocs):
result = result_q.get()
print result
#MultiProcess Handling
def multiProcess(data,start,end,result_q,proc_num):
print 'started process'
results = range(0,(992))
result_q.put(results)
return
if __name__== '__main__':
main()
Is there something about these numbers specifically or am I just missing something basic that has nothing to do with these numbers?
From my searches, it seems this is some memory issue with the join method, but the book does not really explain how to solve this using this setup. Is it possible to use this structure (i understand it mostly, so it would be nice if i can continue to use this) and also pass back large results. I know there are other methods to share data between processes, but thats not what I need, just return the values and join them to one array once completed.
I can't reproduce this on my machine, but it sounds like items in put into the queue haven't been flushed to the underlying pipe. This will cause a deadlock if you try to terminate the process, according to the docs:
As mentioned above, if a child process has put items on a queue (and
it has not used JoinableQueue.cancel_join_thread), then that process
will not terminate until all buffered items have been flushed to the
pipe. This means that if you try joining that process you may get a
deadlock unless you are sure that all items which have been put on the
queue have been consumed. Similarly, if the child process is
non-daemonic then the parent process may hang on exit when it tries to
join all its non-daemonic children.
If you're in this situation. your p.join() calls will hang forever, because there's still buffered data in the queue. You can avoid it by consuming from the queue before you join the processes:
#Print out the results
for i in range(nprocs):
result = result_q.get()
print result
#Joint the process to wait for all data/process to be finished
for p in procs:
p.join()
This doesn't affect the way the code works, each result_q.get() call will block until the result is placed on the queue, which has the same effect has calling join on all processes prior to calling get. The only difference is you avoid the deadlock.

When to call .join() on a process?

I am reading various tutorials on the multiprocessing module in Python, and am having trouble understanding why/when to call process.join(). For example, I stumbled across this example:
nums = range(100000)
nprocs = 4
def worker(nums, out_q):
""" The worker function, invoked in a process. 'nums' is a
list of numbers to factor. The results are placed in
a dictionary that's pushed to a queue.
"""
outdict = {}
for n in nums:
outdict[n] = factorize_naive(n)
out_q.put(outdict)
# Each process will get 'chunksize' nums and a queue to put his out
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []
for i in range(nprocs):
p = multiprocessing.Process(
target=worker,
args=(nums[chunksize * i:chunksize * (i + 1)],
out_q))
procs.append(p)
p.start()
# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
resultdict.update(out_q.get())
# Wait for all worker processes to finish
for p in procs:
p.join()
print resultdict
From what I understand, process.join() will block the calling process until the process whose join method was called has completed execution. I also believe that the child processes which have been started in the above code example complete execution upon completing the target function, that is, after they have pushed their results to the out_q. Lastly, I believe that out_q.get() blocks the calling process until there are results to be pulled. Thus, if you consider the code:
resultdict = {}
for i in range(nprocs):
resultdict.update(out_q.get())
# Wait for all worker processes to finish
for p in procs:
p.join()
the main process is blocked by the out_q.get() calls until every single worker process has finished pushing its results to the queue. Thus, by the time the main process exits the for loop, each child process should have completed execution, correct?
If that is the case, is there any reason for calling the p.join() methods at this point? Haven't all worker processes already finished, so how does that cause the main process to "wait for all worker processes to finish?" I ask mainly because I have seen this in multiple different examples, and I am curious if I have failed to understand something.
Try to run this:
import math
import time
from multiprocessing import Queue
import multiprocessing
def factorize_naive(n):
factors = []
for div in range(2, int(n**.5)+1):
while not n % div:
factors.append(div)
n //= div
if n != 1:
factors.append(n)
return factors
nums = range(100000)
nprocs = 4
def worker(nums, out_q):
""" The worker function, invoked in a process. 'nums' is a
list of numbers to factor. The results are placed in
a dictionary that's pushed to a queue.
"""
outdict = {}
for n in nums:
outdict[n] = factorize_naive(n)
out_q.put(outdict)
# Each process will get 'chunksize' nums and a queue to put his out
# dict into
out_q = Queue()
chunksize = int(math.ceil(len(nums) / float(nprocs)))
procs = []
for i in range(nprocs):
p = multiprocessing.Process(
target=worker,
args=(nums[chunksize * i:chunksize * (i + 1)],
out_q))
procs.append(p)
p.start()
# Collect all results into a single result dict. We know how many dicts
# with results to expect.
resultdict = {}
for i in range(nprocs):
resultdict.update(out_q.get())
time.sleep(5)
# Wait for all worker processes to finish
for p in procs:
p.join()
print resultdict
time.sleep(15)
And open the task-manager. You should be able to see that the 4 subprocesses go in zombie state for some seconds before being terminated by the OS(due to the join calls):
With more complex situations the child processes could stay in zombie state forever(like the situation you was asking about in an other question), and if you create enough child-processes you could fill the process table causing troubles to the OS(which may kill your main process to avoid failures).
At the point just before you call join, all workers have put their results into their queues, but they did not necessarily return, and their processes may not yet have terminated. They may or may not have done so, depending on timing.
Calling join makes sure that all processes are given the time to properly terminate.
I am not exactly sure of the implementation details, but join also seems to be necessary to reflect that a process has indeed terminated (after calling terminate on it for example). In the example here, if you don't call join after terminating a process, process.is_alive() returns True, even though the process was terminated with a process.terminate() call.

Categories

Resources