join() method in multiprocessing with input and output Queue() - python

I have a SOMETIMES working piece of code:
def my_map2(func, iterable, processes=None):
    ''' FASTER THAN mp.Pool() - NOT '''
    import multiprocessing as mp # Load multiprocessing library
    mp.freeze_support()
    if (processes == None):
        processes = mp.cpu_count() # Set maximum number of cores?
    L = len(iterable)
    iter2 = zip(iterable,range(L))
    IN = mp.Queue()
    for x in iter2:
        IN.put(x)
    OUT = mp.Queue() # = mp.JoinableQueue() Q3
    lock = mp.Lock() # Q2
    def target_fun(IN, OUT):
        while not IN.empty():
            inp = IN.get()
            out = (inp[0],func(inp[1]))
            with lock: # Q2
                OUT.put(out)
    proc = [mp.Process(target=target_fun, args=(IN, OUT,)) for x in range(processes)]
    for p in proc: p.start() # Run proc
    for p in proc: p.join() # Exit the completed proc Q1
    results = [OUT.get() for p in range(L)] # Get Results Back Q1
    results.sort() # Sort
    return( [r[1] for r in results] )

import time
def f(x):
    time.sleep(0.1);
    return(x)
res = my_map2(f,range(100))
Questions:
join() called before collecting from the queue. Why does this code work at all? join() is invoked before the Queue is empty, so it should deadlock according to the documentation:
any Queue that a Process has put data on must be drained prior to joining the processes which have put data there: otherwise, you’ll get a deadlock.
However if 2 lines marked with 'Q1' are swapped then we should run into an issue "getting from an empty queue", but both variants seem to SOMETIMES work...
If I remove the Lock() (edit 2 lines with Q2) it does get a deadlock. Why now?
Queues are thread and process safe. When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks. (talking about pipes and queues)
JoinableQueue. If I try using it it does not work. I tried all:
If you use JoinableQueue then you must call JoinableQueue.task_done().
If a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
Question 3 is: So what is an actual difference between Normal and Joinable queue? Does it mean that you can terminate child that has put something on a normal Queue at any time? I thought only manager processes work like this...
Also, so how do you correctly use both Queues?
I have read library reference, quite a few tutorials and stack overflow posts, but does anybody really understand how this works? (Talking about join(), different Queue types, when is the lock needed and accurate understanding of processes behind)
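One possible reading of the documentation quotes above, as a minimal sketch and not a definitive answer: the workers block on get() and stop at a None marker instead of polling empty() (which the documentation describes as unreliable), no Lock is used because Queue.put() is already process-safe, the input queue is a JoinableQueue so that every get() is paired with a task_done(), and the output queue is drained before any process is joined. The names queue_map and _worker and the optional ctx argument are hypothetical, chosen only for this illustration.

```python
import multiprocessing as mp


def _worker(func, tasks, results):
    # Block on get() and stop at the None marker instead of polling
    # tasks.empty(), which the documentation describes as unreliable.
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            break
        index, value = job
        results.put((index, func(value)))  # no Lock: Queue.put is process-safe
        tasks.task_done()                  # exactly one task_done() per get()


def queue_map(func, iterable, processes=None, ctx=None):
    # ctx is a hypothetical knob for choosing the start method (fork, spawn, ...).
    ctx = ctx or mp.get_context()
    items = list(iterable)
    processes = processes or mp.cpu_count()
    tasks = ctx.JoinableQueue()            # a Queue plus task_done() and join()
    results = ctx.Queue()
    workers = [ctx.Process(target=_worker, args=(func, tasks, results))
               for _ in range(processes)]
    for w in workers:
        w.start()
    for job in enumerate(items):
        tasks.put(job)
    for _ in workers:
        tasks.put(None)                    # one stop marker per worker
    tasks.join()                           # returns once every put() was task_done()'d
    collected = [results.get() for _ in items]   # drain the output queue first ...
    for w in workers:
        w.join()                           # ... and only then join the processes
    collected.sort(key=lambda pair: pair[0])     # restore input order by index
    return [value for _, value in collected]
```

Called under an if __name__ == '__main__': guard, queue_map(abs, [-3, 1, -2], processes=2) returns [3, 1, 2]. With a start method other than fork, func has to be picklable, so it cannot be a nested function as in the code above.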

Related

Add Task to Multiprocessing Pool of Parent

How can I add a new task to a multiprocessing pool that I initialized in a parent process? This following does not work:
from multiprocessing import Pool

def child_task(x):
    # the child task spawns new tasks
    results = p.map(grandchild_task, [x])
    return results[0]

def grandchild_task(x):
    return x

if __name__ == '__main__':
    p = Pool(2)
    print(p.map(child_task, [0]))
    # Result: NameError: name 'p' is not defined
Motivation: I need to parallelize a program which consists of various child tasks, which themselves also have child tasks (i.e., grandchild tasks). Only parallelizing the child tasks OR the grandchild tasks does not utilize all my CPU cores.
In my use-case, I have various child tasks (maybe 1-50) and many grandchild tasks per child task (maybe 100-1000).
Alternatives: If this is not possible using Python's multiprocessing package, I am happy to switch to another library that supports this.
There is such a thing as a minimal reproducible example, and then there is going beyond that and removing so much code that you end up with something that (1) is perhaps so oversimplified that an answer could miss the mark and (2) couldn't possibly run as shown (you need to enclose the code that creates the Pool and submits the task in a block controlled by an if __name__ == '__main__': statement).
But based on what you have shown, I don't believe a Pool is the solution for you; you should be creating Process instances as they are required. One way to get the results from the Processes is to store them in a shareable, managed dictionary whose key is, for example, the process id of the Process that has created the result.
To expand on your example, the child task is passed two arguments, x and y, and needs to return x**2 + y**2 as its result. The child task will spawn two instances of the grandchild task, each one computing the square of its argument. The child task will then combine the return values from these processes using addition:
from multiprocessing import Process, Manager
import os

def child_task(results_dict, x, y):
    # the child task spawns new tasks
    p1 = Process(target=grandchild_task, args=(results_dict, x))
    p1.start()
    pid1 = p1.pid
    p2 = Process(target=grandchild_task, args=(results_dict, y))
    p2.start()
    pid2 = p2.pid
    p1.join()
    p2.join()
    pid = os.getpid()
    results_dict[pid] = results_dict[pid1] + results_dict[pid2]

def grandchild_task(results_dict, n):
    pid = os.getpid()
    results_dict[pid] = n * n

def main():
    manager = Manager()
    results_dict = manager.dict()
    p = Process(target=child_task, args=(results_dict, 2, 3))
    p.start()
    pid = p.pid
    p.join()
    # results will be stored with key p.pid:
    print(results_dict[pid])

if __name__ == '__main__':
    main()
Prints:
13
Update
If you really had a situation where, for example, child_task needed to process N identical calls varying only in its arguments but it had to spawn a sub-process or two, then use a Pool as before but additionally pass a managed dictionary to child_task to be used for spawning additional Processes (not attempting to use a Pool for this) and retrieving their results.
Update 2
The only way I could figure out for the sub-processes themselves to use pooling is to use the ProcessPoolExecutor class from the concurrent.futures module. When I attempted to do the same thing with multiprocessing.Pool, I got an error because we had daemon processes trying to create their own processes. But even here the only way is for each process in the pool to have its own pool of processes. You only have a finite number of processors/cores on your computer, so unless there is a bit of I/O mixed in with the processing, you can create all these pools but the processes will be waiting for a chance to run. So, it's not clear what performance gains will be realized. There is also the problem of shutting down all the pools created for the child_task sub-processes. Normally a ProcessPoolExecutor instance is created using a with block, and when that block terminates the pool that was created is cleaned up. But child_task is invoked repeatedly and clearly cannot use a with block, because we don't want to be constantly creating and destroying pools. What I have come up with here is a bit of a kludge: a third parameter is passed, either True or False, indicating whether child_task should instigate a shutdown of its pool. The default value for this parameter is False, so normally we don't even bother passing it. After all the actual results have been retrieved and the child_task processes are idle, we submit N new tasks with dummy values but with shutdown set to True. Note that the ProcessPoolExecutor function map works quite a bit differently from the same function in the Pool class (read the docs):
from concurrent.futures import ProcessPoolExecutor
import time

child_executor = None

def child_task(x, y, shutdown=False):
    global child_executor
    if child_executor is None:
        child_executor = ProcessPoolExecutor(max_workers=1)
    if shutdown:
        if child_executor:
            child_executor.shutdown(False)
            child_executor = None
            time.sleep(.2) # make sure another process in the pool gets the next task
        return None
    # the child task spawns new task(s)
    future = child_executor.submit(grandchild_task, y)
    # we can compute one of the results using the current process:
    return grandchild_task(x) + future.result()

def grandchild_task(n):
    return n * n

def main():
    N_WORKERS = 2
    with ProcessPoolExecutor(max_workers=N_WORKERS) as executor:
        # first call is (1, 2), second call is (3, 4):
        results = [result for result in executor.map(child_task, (1, 3), (2, 4))]
        print(results)
        # force a shutdown
        # need N_WORKERS invocations:
        [result for result in executor.map(child_task, (0,) * N_WORKERS, (0,) * N_WORKERS, (True,) * N_WORKERS)]

if __name__ == '__main__':
    main()
Prints:
[5, 25]
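The difference between the two map functions mentioned above can be seen in a small sketch. As an assumption made only to keep it cheap to run, it uses the thread-backed stand-ins ThreadPoolExecutor and multiprocessing.pool.ThreadPool, which expose the same map and starmap interfaces as ProcessPoolExecutor and Pool. The names add and compare_map_styles are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def add(x, y):
    return x + y


def compare_map_styles():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Executor.map takes one iterable per parameter, like the built-in map,
        # and returns a lazy iterator over the results.
        lazy_results = executor.map(add, (1, 3), (2, 4))
        executor_results = list(lazy_results)
    with ThreadPool(2) as pool:
        # Pool.map accepts a single iterable; starmap unpacks argument tuples.
        # Both block and return a complete list.
        pool_results = pool.starmap(add, [(1, 2), (3, 4)])
    return executor_results, pool_results
```

Both calls compute add(1, 2) and add(3, 4), so compare_map_styles() returns ([3, 7], [3, 7]); only the way the arguments are laid out and the type of the return value differ.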
Check this solution:
#!/usr/bin/python
# requires Python version 3.8 or higher
from multiprocessing import Queue, Process
import time
from random import randrange
import os
import psutil

# function to be run by each child process
def square(number):
    sleep = randrange(5)
    time.sleep(sleep)
    print(f'Result is {number * number}, computed by pid {os.getpid()}...sleeping {sleep} secs')

# create a queue where all tasks will be placed
queue = Queue()
# indicate how many number of children you want the system to create to run the tasks
number_of_child_proceses = 5
# put all tasks in the queue above
for task in range(19):
    queue.put(task)

# this the main entry/start of the program when you run
def main():
    number_of_task = queue.qsize()
    print(f'{"_" * 60}\nBatch: {number_of_task // number_of_child_proceses + 1} \n{"_" * 60}')
    # don't create more number of children than the number of tasks. Also, in the last round, wait for all child process
    # to complete so as to wrap up everything
    if number_of_task <= number_of_child_proceses:
        processes = [Process(target=square, args=(queue.get(),)) for _ in
                     range(number_of_task)]
        for p in processes:
            p.start()
            p.join()
    else:
        processes = [Process(target=square, args=(queue.get(),)) for _ in range(number_of_child_proceses)]
        for p in processes:
            p.start()
    # update count of remaining task
    number_of_task = queue.qsize()
    # run the program in a loop until no more task remains in the queue
    while number_of_task:
        current_process = psutil.Process()
        children = current_process.children()
        # if children process have completed assigned task but there is still more remaining tasks in the queue,
        # assign them more tasks
        if not len(children) and number_of_task:
            print(f'\nAssigned tasks completed... reasigning the remaining {number_of_task} task(s) in the queue\n')
            main()
    # exit the loop if no more task in the queue to work on
    print('\nAll tasks completed!!')
    exit()

if __name__ == "__main__":
    main()
I have looked around more, and found Ray, which addresses this exact use case using nested remote functions.

How to return values from Process- or Thread instances?

So I want to run a function which can either search for information on the web or directly from my own mysql database.
The first process will be time-consuming, the second relatively fast.
With this in mind I create a process which starts this compound search (find_compound_view). If the process finishes relatively fast it means it's present on the database so I can render the results immediately. Otherwise, I will render "drax_retrieving_data.html".
The stupid solution I came up with was to run the function twice, once to check if the process takes a long time, the other to actually get the return values of the function. This is pretty much because I don't know how to return the values of my find_compound_view function. I've tried googling but I can't seem to find how to return the values from the class Process specifically.
p = Process(target=find_compound_view, args=(form,))
p.start()
is_running = p.is_alive()
start_time=time.time()
while is_running:
    time.sleep(0.05)
    is_running = p.is_alive()
    if time.time() - start_time > 10 :
        print('Timer exceeded, DRAX is retrieving info!',time.time() - start_time)
        return render(request,'drax_internal_dbs/drax_retrieving_data.html')
compound = find_compound_view(form,use_email=False)
if compound:
    data=*****
    return render(request, 'drax_internal_dbs/result.html',data)
You will need a multiprocessing.Pipe or a multiprocessing.Queue to send the results back to your parent process. If you just do I/O, you should use a Thread instead of a Process, since it's more lightweight and most of the time will be spent waiting. I'm showing you how it's done for Process and Thread in general.
Process with Queue
The multiprocessing queue is built on top of a pipe, and access is synchronized with locks/semaphores. Queues are thread- and process-safe, meaning you can use one queue for multiple producer/consumer processes and even multiple threads in these processes. Adding the first item to the queue will also start a feeder thread in the calling process. The additional overhead of a multiprocessing.Queue makes using a pipe for single-producer/single-consumer scenarios preferable and more performant.
Here's how to send and retrieve a result with a multiprocessing.Queue:
from multiprocessing import Process, Queue

SENTINEL = 'SENTINEL'

def sim_busy(out_queue, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)

if __name__ == '__main__':
    out_queue = Queue()
    p = Process(target=sim_busy, args=(out_queue, 150e6)) # 150e6 == 150000000.0
    p.start()
    for result in iter(out_queue.get, SENTINEL): # sentinel breaks the loop
        print(result)
The queue is passed as an argument into the function, results are .put() on the queue, and the parent .get()s from the queue. .get() is a blocking call: execution does not resume until there is something to get (specifying a timeout parameter is possible). Note that the work sim_busy does here is CPU-intensive; that's when you would choose processes over threads.
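The timeout parameter is what the ten-second check in the question needs. A minimal sketch, assuming the child puts exactly one result on the queue; get_or_none is a hypothetical helper name, and using None to mean "nothing arrived" is an assumption that only holds if None is never a real result.

```python
import multiprocessing as mp
import queue  # multiprocessing.Queue.get() raises queue.Empty on timeout


def get_or_none(out_queue, timeout):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        return out_queue.get(timeout=timeout)
    except queue.Empty:
        return None
```

In a view like the one in the question, the parent could start the Process, call get_or_none(out_queue, 10), and render the "retrieving data" page when the call returns None, instead of running the search a second time.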
Process & Pipe
For one-to-one connections a pipe is enough. The setup is nearly identical, just the methods are named differently and a call to Pipe() returns two connection objects. In duplex mode, both objects are read-write ends, with duplex=False (simplex) the first connection object is the read-end of the pipe, the second is the write-end. In this basic scenario we just need a simplex-pipe:
from multiprocessing import Process, Pipe

SENTINEL = 'SENTINEL'

def sim_busy(write_conn, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    write_conn.send(result)
    # If all results are sent, send a sentinel-value to let the parent know
    # no more results will come.
    write_conn.send(SENTINEL)

if __name__ == '__main__':
    # duplex=False because we just need one-way communication in this case.
    read_conn, write_conn = Pipe(duplex=False)
    p = Process(target=sim_busy, args=(write_conn, 150e6)) # 150e6 == 150000000.0
    p.start()
    for result in iter(read_conn.recv, SENTINEL): # sentinel breaks the loop
        print(result)
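A connection object has no timeout argument on recv(), but poll() accepts one. A minimal sketch of a bounded wait on the read end of the pipe; recv_or_none is a hypothetical helper name, and returning None for "nothing arrived" is an assumption that only works if None is never sent as a result.

```python
from multiprocessing import Pipe


def recv_or_none(read_conn, timeout):
    """Return the next object from the pipe, or None after `timeout` seconds."""
    # poll() returns True as soon as data is available to read,
    # or False once `timeout` seconds have passed without any.
    if read_conn.poll(timeout):
        return read_conn.recv()
    return None
```

Calling recv_or_none(read_conn, 10) in the parent would wait at most ten seconds for the child's first result.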
Thread & Queue
For use with threading, you want to switch to queue.Queue. queue.Queue is built on top of a collections.deque, adding some locks to make it thread-safe. Unlike with multiprocessing's queue and pipe, objects put on a queue.Queue won't get pickled. Since threads share the same memory address space, serialization for memory copying is unnecessary; only pointers are transmitted.
from threading import Thread
from queue import Queue
import time

SENTINEL = 'SENTINEL'

def sim_io(out_queue, query):
    time.sleep(1)
    result = query + '_result'
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)

if __name__ == '__main__':
    out_queue = Queue()
    p = Thread(target=sim_io, args=(out_queue, 'my_query'))
    p.start()
    for result in iter(out_queue.get, SENTINEL): # sentinel-value breaks the loop
        print(result)
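The point about pickling can be checked directly. This sketch sends one object through each kind of queue inside a single process and compares what comes back; same_object_after_roundtrip is a hypothetical helper name used only for this illustration.

```python
import multiprocessing as mp
import queue


def same_object_after_roundtrip(q):
    """Put one dict on q, get it back, and report (identical, equal)."""
    payload = {"rows": [1, 2, 3]}
    q.put(payload)
    received = q.get()
    return received is payload, received == payload
```

With queue.Queue() the result is (True, True): the consumer receives the very same object. With mp.Queue() it is (False, True): the object was pickled into the pipe and rebuilt as an equal copy, which is why changes made to it in the consumer are not visible to the producer.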
Read here why for result in iter(out_queue.get, SENTINEL): should be preferred over a while True...break setup, where possible.
Read here why you should use if __name__ == '__main__': in all your scripts and especially in multiprocessing.
More about get()-usage here.
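For comparison, here are the two loop styles side by side, as a small sketch using queue.Queue. The helper names make_queue, drain_with_iter and drain_with_while are hypothetical.

```python
from queue import Queue

SENTINEL = 'SENTINEL'


def make_queue(*items):
    """Build a queue holding `items` followed by the sentinel."""
    q = Queue()
    for item in items:
        q.put(item)
    q.put(SENTINEL)
    return q


def drain_with_iter(q):
    # iter(callable, sentinel) keeps calling q.get() until it returns SENTINEL.
    return list(iter(q.get, SENTINEL))


def drain_with_while(q):
    # The equivalent explicit loop needs its own comparison and break.
    items = []
    while True:
        item = q.get()
        if item == SENTINEL:
            break
        items.append(item)
    return items
```

Both functions return ['a', 'b'] for make_queue('a', 'b'); the iter() form simply folds the comparison and the break into the loop header.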

Inverse of ProcessPoolExecutor in Python

This is related to my earlier problem which I'm still working on solving. Essentially I need the inverse design of ProcessPoolExecutor, where I have many querying processes and one worker which calculates and sends back results in batches.
Sending the work items is easy with one shared queue, but I still don't have a nice solution for sending all the results back to the right threads on the right processes.
I think it makes the most sense to have a separate multiprocessing.Pipe for each querying process. The worker process waits for an available item on any pipe, then dequeues and processes it, keeping track of which pipe it came from. When it's time to send data back, it feeds the results onto the correct pipe.
Here's a simple example:
#!/usr/bin/env python3
import multiprocessing as mp

def worker(pipes):
    quit = [False] * len(pipes)
    results = [''] * len(pipes)
    # Wait for all workers to send None before quitting
    while not all(quit):
        ready = mp.connection.wait(pipes)
        for pipe in ready:
            # Get index of query proc's pipe
            i = pipes.index(pipe)
            # Receive and "process"
            obj = pipe.recv()
            if obj is None:
                quit[i] = True
                continue
            result = str(obj)
            results[i] += result
            # Send back to query proc
            pipes[i].send(result)
    print(results)

def query(pipe):
    for i in 'do some work':
        pipe.send(i)
        assert pipe.recv() == i
    pipe.send(None) # Send sentinel

if __name__ == '__main__':
    nquery_procs = 8
    work_pipes, query_pipes = zip(*(mp.Pipe() for _ in range(nquery_procs)))
    query_procs = [mp.Process(target=query, args=(pipe,)) for pipe in query_pipes]
    for p in query_procs:
        p.start()
    worker(work_pipes)
    for p in query_procs:
        p.join()
Alternatively, you could give each querying process an ID number (which might just be its pipe's index), and any request must be a tuple which is (id_num, data). This just gets around the worker process doing pipes.index(pipe) on each loop, so I'm not sure how much it buys you.
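A rough sketch of that ID-based alternative, under assumptions that go beyond the text: requests travel over one shared queue as (id_num, data) tuples, each querying process keeps the read end of its own one-way reply pipe, and (id_num, None) is the "I am finished" marker. The upper-casing only stands in for real processing, and the function names are reused from the example above for readability.

```python
import multiprocessing as mp


def worker(request_queue, reply_conns):
    """Serve (id_num, data) requests until every querier has sent None."""
    remaining = len(reply_conns)
    while remaining:
        id_num, data = request_queue.get()
        if data is None:              # sentinel from querier number id_num
            remaining -= 1
            continue
        # "Process" the item, then reply on the pipe that belongs to id_num.
        reply_conns[id_num].send(str(data).upper())


def query(id_num, request_queue, reply_conn, items):
    """Send each item tagged with our ID and wait for its reply."""
    replies = []
    for item in items:
        request_queue.put((id_num, item))
        replies.append(reply_conn.recv())
    request_queue.put((id_num, None))  # tell the worker we are done
    return replies
```

The worker no longer needs connection.wait() or pipes.index(): it blocks on one queue and uses the ID as a list index. In a full program each query call would run in its own Process, with reply pipes created by mp.Pipe(duplex=False) and the write ends handed to the worker.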

python multiprocessing .join() deadlock depends on worker function

I am using the multiprocessing python library to spawn 4 Process() objects to parallelize a cpu intensive task. The task (inspiration and code from this great article) is to compute the prime factors for every integer in a list.
main.py:
import random
import multiprocessing
import sys

num_inputs = 4000
num_procs = 4
proc_inputs = num_inputs/num_procs
input_list = [int(1000*random.random()) for i in xrange(num_inputs)]
output_queue = multiprocessing.Queue()
procs = []
for p_i in xrange(num_procs):
    print "Process [%d]"%p_i
    proc_list = input_list[proc_inputs * p_i:proc_inputs * (p_i + 1)]
    print " - num inputs: [%d]"%len(proc_list)
    # Using target=worker1 HANGS on join
    p = multiprocessing.Process(target=worker1, args=(p_i, proc_list, output_queue))
    # Using target=worker2 RETURNS with success
    #p = multiprocessing.Process(target=worker2, args=(p_i, proc_list, output_queue))
    procs.append(p)
    p.start()
for p in procs:
    print "joining ", p, output_queue.qsize(), output_queue.full()
    p.join()
    print "joined ", p, output_queue.qsize(), output_queue.full()
print "Processing complete."
ret_vals = []
while output_queue.empty() == False:
    ret_vals.append(output_queue.get())
print len(ret_vals)
print sys.getsizeof(ret_vals)
Observation:
If the target for each process is the function worker1, for an input list larger than 4000 elements the main thread gets stuck on .join(), waiting for the spawned processes to terminate and never returns.
If the target for each process is the function worker2, for the same input list the code works just fine and the main thread returns.
This is very confusing to me, as the only difference between worker1 and worker2 (see below) is that the former inserts individual lists in the Queue whereas the latter inserts a single list of lists for each process.
Why is there deadlock using worker1 and not using worker2 target?
Shouldn't both (or neither) exceed the multiprocessing Queue maxsize limit of 32767?
worker1 vs worker2:
def worker1(proc_num, proc_list, output_queue):
    '''worker function which deadlocks'''
    for num in proc_list:
        output_queue.put(factorize_naive(num))

def worker2(proc_num, proc_list, output_queue):
    '''worker function that works'''
    workers_stuff = []
    for num in proc_list:
        workers_stuff.append(factorize_naive(num))
    output_queue.put(workers_stuff)
There are a lot of similar questions on SO, but I believe the core of this question is clearly distinct from all of them.
Related Links:
https://sopython.com/canon/82/programs-using-multiprocessing-hang-deadlock-and-never-complete/
python multiprocessing issues
python multiprocessing - process hangs on join for large queue
Process.join() and queue don't work with large numbers
Python 3 Multiprocessing queue deadlock when calling join before the queue is empty
Script using multiprocessing module does not terminate
Why does multiprocessing.Process.join() hang?
When to call .join() on a process?
What exactly is Python multiprocessing Module's .join() Method Doing?
The docs warn about this:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
While a Queue appears to be unbounded, under the covers queued items are buffered in memory to avoid overloading inter-process pipes. A process cannot end normally before those memory buffers are flushed. Your worker1() puts a lot more items on the queue than your worker2(), and that's all there is to it. Note that the number of items that can be queued before the implementation resorts to buffering in memory isn't defined: it can vary across OS and Python release.
As the docs suggest, the normal way to avoid this is to .get() all the items off the queue before you attempt to .join() the processes. As you've discovered, whether it's necessary to do so depends in an undefined way on how many items have been put on the queue by each worker process.
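A minimal Python 3 sketch of that ordering, assuming each worker announces that it is finished with a None marker so the parent knows when to stop reading. The names put_each and collect_then_join and the optional ctx argument are hypothetical, and squaring stands in for factorize_naive.

```python
import multiprocessing as mp


def put_each(numbers, output_queue):
    # Like worker1: one put() per input, then a None marker meaning "done".
    for n in numbers:
        output_queue.put(n * n)
    output_queue.put(None)


def collect_then_join(chunks, ctx=None):
    # ctx is a hypothetical knob for choosing the start method (fork, spawn, ...).
    ctx = ctx or mp.get_context()
    output_queue = ctx.Queue()
    procs = [ctx.Process(target=put_each, args=(chunk, output_queue))
             for chunk in chunks]
    for p in procs:
        p.start()
    results, finished = [], 0
    while finished < len(procs):      # read until every worker has said "done"
        item = output_queue.get()
        if item is None:
            finished += 1
        else:
            results.append(item)
    for p in procs:
        p.join()                      # nothing is left buffered in the children
    return sorted(results)
```

Because each worker's marker is put after all of its items, seeing one marker per worker means everything has been read, and the join() calls can no longer wait on unflushed data regardless of how many items were queued.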

Multiprocessing has cutoff at 992 integers being joined as result

I am following this book http://doughellmann.com/pages/python-standard-library-by-example.html
Along with some online references. I have an algorithm set up for multiprocessing where I have a large array of dictionaries and do some calculation. I use multiprocessing to divide the indexes on which the calculations are done on the dictionary. To make the question more general, I replaced the algorithm with just an array of return values. From information I found online and in other SO questions, I think it has to do with the join method.
The structure is like so,
Generate some fake data, call the manager function for multiprocessing, create a Queue, divide data over the number of index. Loop through the number of processes to use, send each process function the correct index range. Lastly join the processes and print out the results.
What I have figured out is that if the function used by the processes tries to return range(0,992), it works quickly; with range(0,993), it hangs. I tried on two different computers with different specs.
The code is here:
import multiprocessing

def main():
    data = []
    for i in range(0,10):
        data.append(i)
    CalcManager(data,start=0,end=50)

def CalcManager(myData,start,end):
    print 'in calc manager'
    #Multi processing
    #Set the number of processes to use.
    nprocs = 3
    #Initialize the multiprocessing queue so we can get the values returned to us
    tasks = multiprocessing.JoinableQueue()
    result_q = multiprocessing.Queue()
    #Setup an empty array to store our processes
    procs = []
    #Divide up the data for the set number of processes
    interval = (end-start)/nprocs
    new_start = start
    #Create all the processes while dividing the work appropriately
    for i in range(nprocs):
        print 'starting processes'
        new_end = new_start + interval
        #Make sure we dont go past the size of the data
        if new_end > end:
            new_end = end
        #Generate a new process and pass it the arguments
        data = myData[new_start:new_end]
        #Create the processes and pass the data and the result queue
        p = multiprocessing.Process(target=multiProcess,args=(data,new_start,new_end,result_q,i))
        procs.append(p)
        p.start()
        #Increment our next start to the current end
        new_start = new_end+1
    print 'finished starting'
    #Joint the process to wait for all data/process to be finished
    for p in procs:
        p.join()
    #Print out the results
    for i in range(nprocs):
        result = result_q.get()
        print result

#MultiProcess Handling
def multiProcess(data,start,end,result_q,proc_num):
    print 'started process'
    results = range(0,(992))
    result_q.put(results)
    return

if __name__== '__main__':
    main()
Is there something about these numbers specifically or am I just missing something basic that has nothing to do with these numbers?
From my searches, it seems this is some memory issue with the join method, but the book does not really explain how to solve this using this setup. Is it possible to use this structure (I understand it mostly, so it would be nice if I can continue to use it) and also pass back large results? I know there are other methods to share data between processes, but that's not what I need; I just want to return the values and join them into one array once completed.
I can't reproduce this on my machine, but it sounds like items put into the queue haven't been flushed to the underlying pipe. This will cause a deadlock if you try to join the process, according to the docs:
As mentioned above, if a child process has put items on a queue (and
it has not used JoinableQueue.cancel_join_thread), then that process
will not terminate until all buffered items have been flushed to the
pipe. This means that if you try joining that process you may get a
deadlock unless you are sure that all items which have been put on the
queue have been consumed. Similarly, if the child process is
non-daemonic then the parent process may hang on exit when it tries to
join all its non-daemonic children.
If you're in this situation, your p.join() calls will hang forever, because there's still buffered data in the queue. You can avoid it by consuming from the queue before you join the processes:
#Print out the results
for i in range(nprocs):
    result = result_q.get()
    print result
#Joint the process to wait for all data/process to be finished
for p in procs:
    p.join()
This doesn't affect the way the code works: each result_q.get() call will block until a result is placed on the queue, which has the same effect as calling join on all processes prior to calling get. The only difference is that you avoid the deadlock.
