Python multiprocessing .join() deadlock depends on worker function

I am using the Python multiprocessing library to spawn 4 Process() objects to parallelize a CPU-intensive task. The task (inspiration and code from this great article) is to compute the prime factors of every integer in a list.
import random
import multiprocessing
import sys

num_inputs = 4000
num_procs = 4
proc_inputs = num_inputs / num_procs
input_list = [int(1000 * random.random()) for i in xrange(num_inputs)]
output_queue = multiprocessing.Queue()
procs = []
for p_i in xrange(num_procs):
    print "Process [%d]" % p_i
    proc_list = input_list[proc_inputs * p_i:proc_inputs * (p_i + 1)]
    print " - num inputs: [%d]" % len(proc_list)
    # Using target=worker1 HANGS on join
    p = multiprocessing.Process(target=worker1, args=(p_i, proc_list, output_queue))
    # Using target=worker2 RETURNS with success
    #p = multiprocessing.Process(target=worker2, args=(p_i, proc_list, output_queue))
    procs.append(p)
    p.start()

for p in procs:
    print "joining ", p, output_queue.qsize(), output_queue.full()
    p.join()
    print "joined ", p, output_queue.qsize(), output_queue.full()

print "Processing complete."
ret_vals = []
while output_queue.empty() == False:
    ret_vals.append(output_queue.get())
print len(ret_vals)
print sys.getsizeof(ret_vals)
If the target for each process is the function worker1, for an input list larger than 4000 elements the main thread gets stuck on .join(), waiting for the spawned processes to terminate and never returns.
If the target for each process is the function worker2, for the same input list the code works just fine and the main thread returns.
This is very confusing to me, as the only difference between worker1 and worker2 (see below) is that the former inserts individual lists in the Queue whereas the latter inserts a single list of lists for each process.
Why is there deadlock using worker1 and not using worker2 target?
Shouldn't both (or neither) exceed the multiprocessing Queue maxsize limit of 32767?
worker1 vs worker2:
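def worker1(proc_num, proc_list, output_queue):
    '''worker function which deadlocks'''
    for num in proc_list:
        # factorize_naive(n) is assumed to be the article's helper returning n's prime factors as a list
        output_queue.put(factorize_naive(num))

def worker2(proc_num, proc_list, output_queue):
    '''worker function that works'''
    workers_stuff = []
    for num in proc_list:
        workers_stuff.append(factorize_naive(num))
    output_queue.put(workers_stuff)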
There are a lot of similar questions on SO, but I believe the core of this question is clearly distinct from all of them.
Related Links:
python multiprocessing issues
python multiprocessing - process hangs on join for large queue
Process.join() and queue don't work with large numbers
Python 3 Multiprocessing queue deadlock when calling join before the queue is empty
Script using multiprocessing module does not terminate
Why does multiprocessing.Process.join() hang?
When to call .join() on a process?
What exactly is Python multiprocessing Module's .join() Method Doing?

The docs warn about this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
While a Queue appears to be unbounded, under the covers queued items are buffered in memory to avoid overloading inter-process pipes. A process cannot end normally before those memory buffers are flushed. Your worker1() puts a lot more items on the queue than your worker2(), and that's all there is to it. Note that the number of items that can be queued before the implementation resorts to buffering in memory isn't defined: it can vary across operating systems and Python releases.
As the docs suggest, the normal way to avoid this is to .get() all the items off the queue before you attempt to .join() the processes. As you've discovered, whether it's necessary to do so depends in an undefined way on how many items have been put on the queue by each worker process.
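A minimal Python 3 sketch of that fix, assuming the total number of items the workers will put is known in advance. Here `worker1` is a simplified stand-in (it puts a two-element list per input instead of real prime factors) and `run` is a hypothetical driver: the main process `get()`s every expected item first and only then calls `join()`, so the outcome no longer depends on how much each worker has buffered.

```python
import multiprocessing as mp


def worker1(proc_list, output_queue):
    """Puts one small list per input number, like the question's worker1."""
    for num in proc_list:
        # Placeholder result instead of real prime factors.
        output_queue.put([num, num * 2])


def run(num_inputs=20000, num_procs=4):
    inputs = list(range(num_inputs))
    chunk = num_inputs // num_procs
    output_queue = mp.Queue()
    procs = [
        mp.Process(target=worker1,
                   args=(inputs[i * chunk:(i + 1) * chunk], output_queue))
        for i in range(num_procs)
    ]
    for p in procs:
        p.start()
    # Drain first: the total number of items the workers will put is known.
    results = [output_queue.get() for _ in range(chunk * num_procs)]
    # Everything has been consumed, so each child's feeder thread can finish
    # flushing its buffer and the child can exit; join() now returns.
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(len(run()))
```

If the count is not known up front, a common alternative is to have each worker put a sentinel (for example `None`) when it finishes and keep reading until one sentinel per worker has arrived.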


In Python multiprocessing, when a child process writes data to a Queue and no one reads it, the child process does not exit. Why?

I have Python code in which the main process creates a child process. There is a shared queue between the two processes. The child process writes some data to this shared queue. The main process join()s on the child process.
If the data in the queue is not removed with get(), the child process does not terminate and the main process is blocked at join(). Why is that?
Here is the code I used:
from multiprocessing import Process, Queue
from time import *

def f(q):
    q.put([42, None, 'hello', [x for x in range(100000)]])
    print (q.qsize())
    #q.get()
    print (q.qsize())

q = Queue()
print (q.qsize())
p = Process(target=f, args=(q,))
p.start()
#print (q.get())
print('bef join')
p.join()
print('aft join')
At present the q.get() calls are commented out, so the output is:
bef join
and then the code blocks.
But if I uncomment either of the q.get() calls, the code runs to completion with the following output:
bef join
aft join
Well, if you look at the Queue documentation, it explicitly says:
Queue.join: Blocks until all items in the queue have been gotten and processed. So it seems logical to me that join() blocks your program if you don't empty the Queue.
To me, you need to learn about the philosophy of multiprocessing. You have several tasks that don't depend on each other, and your program is currently too slow for you. So you use multiprocessing!
But don't forget that there will (trust me) come a time when you need to wait until some parallel computations are all done, because you need all of their results for your next task. That's where, in your case, join() comes in. You are basically saying: I was doing things asynchronously, but now my next task needs to be synced with the different items I computed before, so let's wait here until they are all ready.
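One way to see what is going on in the question's code: the call that blocks is the child's `Process.join()`, and the child stays alive because the data it put on the multiprocessing Queue has not yet been flushed to the underlying pipe. The Python 3 sketch below keeps the question's `f` name but puts a larger payload, and `demo` is a hypothetical helper. It assumes the pickled payload (several megabytes here) is larger than the OS pipe buffer: it joins with a timeout, checks that the child is still running, then consumes the item and joins again.

```python
import multiprocessing as mp


def f(q):
    # Pickles to several megabytes, far more than a typical OS pipe buffer,
    # so the queue's background feeder thread cannot finish writing it
    # until another process reads from the queue.
    q.put(list(range(10**6)))


def demo():
    q = mp.Queue()
    p = mp.Process(target=f, args=(q,))
    p.start()
    p.join(timeout=1)            # stop waiting after 1 second instead of hanging
    still_alive = p.is_alive()   # typically True: f has returned, but the process cannot exit yet
    data = q.get()               # consuming the item lets the feeder thread finish
    p.join()                     # so this join() returns
    return still_alive, p.exitcode, len(data)


if __name__ == "__main__":
    print(demo())
```

Under that assumption the child should be reported as alive after the timed join and then exit with code 0 once the item has been read.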

python multiprocessing stuck (maybe reading csv)

I am trying to learn how to use multiprocessing and I am having a problem.
I am trying to run this code:
import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase
                        + string.ascii_uppercase
                        + string.digits)
                   for i in range(length))
    output.put(rand_str)

# Setup a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)
From here
The code itself runs properly, but when I replace rand_string with my function (which reads a bunch of CSV files into Pandas DataFrames), the code never ends.
The function is this:
import pandas as pd

def readMyCSV(clFile):
    aClTable = pd.read_csv(clFile)
    # I do some processing here, but at the end the
    # function returns a Pandas DataFrame
    return aClTable
Then I wrap the function so that it allows for a Queue in the arguments:
def readMyCSVParWrap(clFile, outputq):
    outputq.put(readMyCSV(clFile))
and I build the processes with:
processes = [mp.Process(target=readMyCSVParWrap, args=(singleFile,output)) for singleFile in allFiles[:5]]
If I do so, the code never stops running, and results are never printed.
If I put only the clFile string in the output queue, e.g.:
    outputq.put(clFile)
the results are printed properly (just a list of clFiles).
When I look at htop, I see 5 processes being spawned, but they do not use any CPU.
Lastly, the readMyCSV function works properly if I run it by itself (returns a Pandas DataFrame)
Is there anything I am doing wrong?
I am running this in a Jupyter notebook, maybe that is an issue?
It seems your join() calls on the processes are causing a deadlock. The processes can't terminate because they wait until the items on the queue are consumed, but in your code that happens only after the joining.
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the Queue.cancel_join_thread method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
The docs further suggest swapping the queue.get and join lines, or simply removing the join.
Also important:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)... protect the “entry point” of the program by using if __name__ == '__main__':. ibid
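A Python 3 sketch of the tutorial code with both suggestions applied: results are collected with `output.get()` before any `join()`, and everything that starts processes runs under `if __name__ == "__main__":`. The queue is created inside a hypothetical `run()` helper rather than at import time. With a large result per child (a DataFrame in place of the short string, say), this ordering is what lets the children exit.

```python
import multiprocessing as mp
import random
import string


def rand_string(length, output):
    """Generates a random string of digits and lower- and uppercase letters."""
    chars = string.ascii_lowercase + string.ascii_uppercase + string.digits
    output.put(''.join(random.choice(chars) for _ in range(length)))


def run(n_procs=4, length=5):
    output = mp.Queue()              # created here, not at import time
    processes = [mp.Process(target=rand_string, args=(length, output))
                 for _ in range(n_procs)]
    for p in processes:
        p.start()
    # Collect results *before* joining, so children with large results
    # can flush their queue buffers and exit.
    results = [output.get() for _ in processes]
    for p in processes:
        p.join()
    return results


# The guard keeps child processes (which re-import this module under the
# "spawn" start method) from starting processes of their own.
if __name__ == "__main__":
    print(run())
```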

join() method in multiprocessing with input and output Queue()

I have a SOMETIMES working piece of code:
def my_map2(func, iterable, processes=None):
    ''' FASTER THAN mp.Pool() - NOT '''
    import multiprocessing as mp # Load multiprocessing library
    if processes is None:
        processes = mp.cpu_count() # Set maximum number of cores?

    L = len(iterable)
    iter2 = zip(range(L), iterable) # (index, value) pairs, so results can be re-ordered
    IN = mp.Queue()
    for x in iter2:
        IN.put(x)
    OUT = mp.Queue() # = mp.JoinableQueue() Q3
    lock = mp.Lock() # Q2

    def target_fun(IN, OUT):
        while not IN.empty():
            inp = IN.get()
            out = (inp[0], func(inp[1]))
            with lock: # Q2
                OUT.put(out)

    proc = [mp.Process(target=target_fun, args=(IN, OUT,)) for x in range(processes)]
    for p in proc: p.start() # Run proc
    for p in proc: p.join() # Exit the completed proc Q1
    results = [OUT.get() for p in range(L)] # Get Results Back Q1
    results.sort() # Sort
    return( [r[1] for r in results] )

import time
def f(x):
    time.sleep(0.01) # hypothetical stand-in body: simulate some work
    return x * x

res = my_map2(f, range(100))
join() is called before collecting from the queue. Why does this code work at all? join() is invoked before the Queue is empty, so according to the documentation it should deadlock:
any Queue that a Process has put data on must be drained prior to joining the processes which have put data there: otherwise, you’ll get a deadlock.
However, if the 2 lines marked with 'Q1' are swapped, then we should run into an issue "getting from an empty queue", but both variants seem to SOMETIMES work...
If I remove the Lock() (i.e. edit the 2 lines marked Q2), it does deadlock. Why now?
Queues are thread and process safe. When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks. (talking about pipes and queues)
JoinableQueue: if I try using it, it does not work. I tried all:
If you use JoinableQueue then you must call JoinableQueue.task_done().
If a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
Question 3 is: So what is the actual difference between a normal Queue and a JoinableQueue? Does it mean that you can terminate a child that has put something on a normal Queue at any time? I thought only manager processes work like this...
Also, how do you correctly use both kinds of queue?
I have read the library reference, quite a few tutorials and Stack Overflow posts, but does anybody really understand how this works? (I'm talking about join(), the different Queue types, when a lock is needed, and an accurate understanding of the processes behind them.)
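A Python 3 sketch of one way to use both queue types together, assuming module-level functions (`square`, `worker` and `my_map` are illustrative names, not the question's code). Input goes on a `JoinableQueue` with one `None` sentinel per worker, so workers block on `get()` instead of racing on `empty()`; every `get()` is matched by a `task_done()`, which is what `in_q.join()` waits for. Results go on a plain `Queue` and are drained before the processes are joined, and no `Lock` is used because `put()` and `get()` are already process-safe.

```python
import multiprocessing as mp


def square(x):
    """Hypothetical stand-in for the question's f."""
    return x * x


def worker(func, in_q, out_q):
    while True:
        item = in_q.get()            # blocks instead of racing on in_q.empty()
        if item is None:             # sentinel: no more work for this worker
            in_q.task_done()
            break
        idx, value = item
        out_q.put((idx, func(value)))
        in_q.task_done()             # one task_done() per get(), sentinels included


def my_map(func, iterable, processes=None):
    processes = processes or mp.cpu_count()
    items = list(iterable)
    in_q = mp.JoinableQueue()        # counts unfinished tasks, so it supports join()
    out_q = mp.Queue()               # plain Queue: nothing needs to join() on results
    for pair in enumerate(items):
        in_q.put(pair)
    for _ in range(processes):
        in_q.put(None)               # one sentinel per worker
    procs = [mp.Process(target=worker, args=(func, in_q, out_q))
             for _ in range(processes)]
    for p in procs:
        p.start()
    in_q.join()                      # returns once every put() has a matching task_done()
    results = [out_q.get() for _ in items]   # drain the output before joining
    for p in procs:
        p.join()
    results.sort()                   # restore input order by index
    return [r for _, r in results]
```

Here `in_q.join()` is not strictly required, since draining `out_q` already waits for every result; it is included to show what `JoinableQueue` adds over a plain `Queue`. If `func` raises, that worker dies without calling `task_done()` and `my_map` would block forever, so real code would want to catch exceptions in the worker and send them back as results.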

Python's semaphore hangs for ever

I'm trying to do things concurrently in my program and to throttle the number of processes open at the same time (10).
from multiprocessing import Process
from threading import BoundedSemaphore

semaphore = BoundedSemaphore(10)

def f(x):
    pass  # ... do some work
    semaphore.release()
    print 'done'

for x in xrange(100000):
    semaphore.acquire()
    print 'new'
    p = Process(target=f, args=(x,))
    p.start()
The first 10 processes are launched and end correctly (I see 10 "new" and "done" on the console), and then nothing. I don't see another "new"; the program just hangs there (and Ctrl-C doesn't work either). What's wrong?
Your problem is the use of threading.BoundedSemaphore across process boundaries:
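import threading
import multiprocessing
import time
semaphore = threading.BoundedSemaphore(10)
def f(x):
    semaphore.acquire()  # decrements the child's own copy of the semaphore
    print('child sees %d' % semaphore._value)
p = multiprocessing.Process(target=f, args=(100,))
p.start()
p.join()
print('parent sees %d' % semaphore._value)  # still 10: the parent's copy was never touched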
When you create a new process, the child gets a copy of the parent process's memory. Thus the child is decrementing its own semaphore, and the semaphore in the parent is untouched. (Typically, processes are isolated from each other: it takes some extra work to communicate across processes; this is what multiprocessing is for.)
This is opposed to threads, where the two threads share the memory space, and are considered the same process.
multiprocessing.BoundedSemaphore is probably what you want. (If you replace threading.BoundedSemaphore with it, and replace semaphore._value with semaphore.get_value(), you'll see the output change.)
Your bounded semaphore is not shared properly between the various processes which are being spawned; you might want to switch to using multiprocessing.BoundedSemaphore. See the answers to this question for some more details.
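A minimal Python 3 sketch of that switch, assuming each child should free its slot when its work is done. The `multiprocessing.BoundedSemaphore` is passed to each child explicitly (which works under both the fork and spawn start methods); the parent `acquire()`s before starting a child, and the child `release()`s in a `finally` block. The squaring is placeholder work and `run` is a hypothetical driver.

```python
import multiprocessing as mp


def f(x, sem, out):
    """Child: do (placeholder) work, report it, then free a slot."""
    try:
        out.put(x * x)               # stand-in for the real work
    finally:
        sem.release()                # releases the *shared* semaphore the parent waits on


def run(n_tasks, max_running=10):
    sem = mp.BoundedSemaphore(max_running)
    out = mp.Queue()
    procs = []
    for x in range(n_tasks):
        sem.acquire()                # blocks while max_running children hold a slot
        p = mp.Process(target=f, args=(x, sem, out))
        p.start()
        procs.append(p)
    results = [out.get() for _ in range(n_tasks)]   # drain before joining
    for p in procs:
        p.join()
    return sorted(results)


if __name__ == "__main__":
    print(run(20, 3))
```

Because the parent only reads results after launching everything, each child's result must fit in the queue's pipe buffer for that child to exit promptly; for very many tasks, or large results, a `multiprocessing.Pool` with a fixed number of workers is usually the simpler way to cap concurrency.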

Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?

I would like to run a number of jobs using a pool of processes and apply a given timeout after which a job should be killed and replaced by another working on the next task.
I have tried to use the multiprocessing module, which offers a method to run a pool of workers asynchronously (e.g. using map_async), but there I can only set a "global" timeout after which all processes would be killed.
Is it possible to have an individual timeout after which only a single process that takes too long is killed and a new worker is added to the pool again instead (processing the next task and skipping the one that timed out)?
Here's a simple example to illustrate my problem:
def Check(n):
    import time
    if n % 2 == 0: # select some (arbitrary) subset of processes
        print "%d timeout" % n
        while 1:
            # loop forever to simulate some process getting stuck
            time.sleep(1)
    print "%d done" % n
    return 0

from multiprocessing import Pool
pool = Pool(processes=4)
result = pool.map_async(Check, range(10))
print result.get(timeout=1)
After the timeout all workers are killed and the program exits. I would like instead that it continues with the next subtask. Do I have to implement this behavior myself or are there existing solutions?
It is possible to kill the hanging workers and they are automatically replaced. So I came up with this code:
import multiprocessing

jobs = pool.map_async(Check, range(10))
while 1:
    try:
        print "Waiting for result"
        result = jobs.get(timeout=1)
        break # all clear
    except multiprocessing.TimeoutError:
        # kill all processes
        for c in multiprocessing.active_children():
            c.terminate()
print result
The problem now is that the loop never exits; even after all tasks have been processed, calling get yields a timeout exception.
The pebble Pool module has been built to solve these types of issue. It supports per-task timeouts, which lets you detect hung tasks and recover easily.
from pebble import ProcessPool
from concurrent.futures import TimeoutError

with ProcessPool() as pool:
    # `function` stands for whatever callable you want to run
    future = pool.schedule(function, args=[1,2], timeout=5)
    try:
        result = future.result()
    except TimeoutError as error:
        print "Function took longer than %d seconds" % error.args[1]
For your specific example:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

results = []

with ProcessPool(max_workers=4) as pool:
    future = pool.map(Check, range(10), timeout=5)
    iterator = future.result()

    # iterate over all results, if a computation timed out
    # print it and continue to the next result
    while True:
        try:
            result = next(iterator)
            results.append(result)
        except StopIteration:
            break
        except TimeoutError as error:
            print "function took longer than %d seconds" % error.args[1]

print results
Currently Python does not provide a native means to control the execution time of each individual task in the pool from outside the worker itself.
So the easy way is to use wait_procs in the psutil module and implement the tasks as subprocesses.
If nonstandard libraries are not desirable, then you have to implement your own pool on top of the subprocess module, with the work loop in the main process poll()-ing each worker and performing the required actions.
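The do-it-yourself approach can be sketched with the standard library alone. This Python 3 version swaps `subprocess` for `multiprocessing.Process` with one `Pipe` per task, and uses `multiprocessing.connection.wait()` rather than a polling loop; `check` (mirroring the question's `Check`) and `run_with_timeouts` are hypothetical names. Each task gets a fresh process and its own deadline; a task that overruns is terminated and its slot goes to the next task, so one stuck task does not take down the others.

```python
import multiprocessing as mp
import time
from multiprocessing.connection import wait


def check(n, conn):
    """Hypothetical task mirroring the question's Check: even n hang, odd n return 0."""
    if n % 2 == 0:
        time.sleep(60)               # simulate a stuck task
    conn.send(0)
    conn.close()


def run_with_timeouts(func, items, timeout, max_workers=4):
    """Run func(item, conn) in a fresh process per item, at most max_workers at once.

    Returns {item: result}; an item whose process exceeds `timeout` seconds is
    terminated and mapped to None.
    """
    pending = list(items)
    running = {}                     # read end of pipe -> (item, process, deadline)
    results = {}
    while pending or running:
        # Top up the set of running processes.
        while pending and len(running) < max_workers:
            item = pending.pop(0)
            reader, writer = mp.Pipe(duplex=False)
            proc = mp.Process(target=func, args=(item, writer))
            proc.start()
            writer.close()           # keep only the read end in the parent
            running[reader] = (item, proc, time.monotonic() + timeout)
        # Sleep until a result (or EOF) arrives or the nearest deadline passes.
        nearest = min(deadline for _, _, deadline in running.values())
        for reader in wait(list(running), max(0.0, nearest - time.monotonic())):
            item, proc, _ = running.pop(reader)
            try:
                results[item] = reader.recv()    # read before joining
            except EOFError:
                results[item] = None             # child exited without sending
            reader.close()
            proc.join()
        # Kill whatever is past its deadline.
        now = time.monotonic()
        for reader in [r for r, (_, _, d) in running.items() if d <= now]:
            item, proc, _ = running.pop(reader)
            proc.terminate()
            proc.join()
            reader.close()
            results[item] = None
    return results
```

For example, `run_with_timeouts(check, range(10), timeout=1)` should map the even numbers to `None` and the odd ones to `0`, provided each odd task starts and finishes within the timeout. Terminating a process that is in the middle of writing to a shared queue can corrupt that queue, which is one reason this sketch gives every task its own pipe.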
As for the updated problem, the pool becomes corrupted if you directly terminate one of the workers (this is a bug in the interpreter implementation, because such behavior should not be allowed): the worker is recreated, but the task is lost and the pool becomes unjoinable.
You have to terminate the whole pool and then recreate it for other tasks:
import multiprocessing
from multiprocessing import Pool

while True:
    pool = Pool(processes=4)
    jobs = pool.map_async(Check, range(10))
    print "Waiting for result"
    try:
        result = jobs.get(timeout=1)
        break # all clear
    except multiprocessing.TimeoutError:
        # kill all processes
        pool.terminate()
        pool.join()
print result
Pebble is an excellent and handy library that solves the issue. Pebble is designed for the asynchronous execution of Python functions, whereas PyExPool is designed for the asynchronous execution of modules and external executables, though both can be used interchangeably.
One more aspect: when third-party dependencies are not desirable, PyExPool can be a good choice, since it is a single-file, lightweight implementation of a multi-process execution pool with per-job and global timeouts, the ability to group jobs into tasks, and other features.
PyExPool can be embedded into your sources and customized; it has a permissive Apache 2.0 license and is production quality, being used in the core of a high-load scientific benchmarking framework.
Try a construction where each process is joined with a timeout on a separate thread. That way the main program never gets stuck, and any process that does get stuck is killed once its timeout expires. This technique combines the threading and multiprocessing modules.
Here is my way to keep a fixed number x of processes running at a time. It's a combination of the threading and multiprocessing modules. It may be unusual compared with the techniques the other members have explained above, BUT it may be worth considering. For the sake of explanation, I am taking the scenario of crawling 5 websites at a time.
So here it is:
# importing dependencies
from multiprocessing import Process
from threading import Thread
import threading

# Crawler function
def crawler(domain):
    # define crawler technique here; scrapeddata stands for its result
    output.write(scrapeddata + "\n")
Next is the threadController function. It controls the flow of threads into memory: it keeps starting threads to maintain the threadNum limit (i.e. 5), and it won't exit until all active threads (activeCount) have finished.
It maintains threadNum (5) startProcess threads at a time (each of these threads starts a Process from processList and joins it with a timeout of 60 seconds). After starting threadController, there are 2 threads that are not included in the limit of 5, i.e. the main thread and the threadController thread itself. That's why threading.activeCount() != 2 is used.
def threadController():
    print "Thread count before child thread starts is:-", threading.activeCount(), len(processList)
    # starting the first thread. This makes activeCount == 3
    Thread(target = startProcess).start()
    # loop while processList is not empty OR active threads have not finished
    while len(processList) != 0 or threading.activeCount() != 2:
        if (threading.activeCount() < (threadNum + 2) and # if the count of active threads is below the limit AND
            len(processList) != 0):                        # processList is not empty
            Thread(target = startProcess).start()          # start startProcess as a separate thread **
The startProcess function, run as a separate thread, starts Processes from processList. The purpose of running it (**) in its own thread is that it becomes the parent thread of its Process. So when it joins that Process with a timeout of 60 seconds, only the startProcess thread waits; threadController keeps running. This way, threadController works as required.
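def startProcess():
    pr = processList.pop(0)
    pr.start()
    pr.join(60.00) # join the process with a timeout of 60 seconds (as a float)
    if pr.is_alive(): # join() with a timeout only stops waiting; it does not stop the process
        pr.terminate() # kill the stuck process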
if __name__ == '__main__':
    # a file holding a list of domains
    domains = open("Domains.txt", "r").read().split("\n")
    output = open("test.txt", "a")
    processList = [] # process list
    threadNum = 5 # number of thread-initiated processes to run at one time

    # making the process list
    for r in range(0, len(domains), 1):
        domain = domains[r].strip()
        p = Process(target = crawler, args = (domain,))
        processList.append(p) # making a list of performer processes

    # starting threadController as a separate thread
    mt = Thread(target = threadController)
    mt.start()
    mt.join() # won't go further until the threadController thread finishes
    print "Done"
Besides keeping a fixed number of threads in memory, my aim was also to avoid stuck threads or processes lingering in memory, which I did using the timeout. My apologies for any typing mistakes.
I hope this construction helps someone.
Vikas Gautam

