Collecting results from different processes in Python

I am running several processes together. Each of the processes returns some results. How would I collect those results from those processes?
task_1 = Process(target=do_this_task,args=(para_1,para_2))
task_2 = Process(target=do_this_task,args=(para_1,para_2))
do_this_task returns some results. I would like to collect those results and save them in some variable.

Right now I would suggest you use the Python multiprocessing module's Pool, as it handles quite a bit for you. Could you elaborate on what you're doing and why you want to use what I assume to be multiprocessing.Process directly?
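For example, a minimal sketch with Pool (do_this_task here is just a placeholder for your real function):

from multiprocessing import Pool

def do_this_task(para_1, para_2):
    # placeholder for your real work
    return para_1 + para_2

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        # starmap unpacks each argument tuple into do_this_task
        results = pool.starmap(do_this_task, [(1, 2), (3, 4)])
    print(results)  # [3, 7]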
If you still want to use multiprocessing.Process directly, you should use a Queue to get the return values.
An example is given in the docs:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())  # prints "[42, None, 'hello']"
    p.join()

- Multiprocessing Docs
Processes usually run in the background to do something. If you do multiprocessing with them, you need to 'throw around' the data, since processes don't have shared memory like threads do - that's why you use the Queue; it does it for you. Another thing you can use is pipes, and conveniently the docs give an example for that as well :).
"
from multiprocessing import Process, Pipe
def f(conn):
conn.send([42, None, 'hello'])
conn.close()
if __name__ == '__main__':
parent_conn, child_conn = Pipe()
p = Process(target=f, args=(child_conn,))
p.start()
print parent_conn.recv() # prints "[42, None, 'hello']"
p.join()
"
-Multiprocessing Docs
What this does is manually use pipes to pass the finished results back to the 'parent process', in this case.
Also, sometimes I find cases in which multiprocessing cannot pickle things well, so I use this great answer (or my modified specialized variants of it) by mrule, which he posts here:
"
from multiprocessing import Process, Pipe
from itertools import izip
def spawn(f):
def fun(pipe,x):
pipe.send(f(x))
pipe.close()
return fun
def parmap(f,X):
pipe=[Pipe() for x in X]
proc=[Process(target=spawn(f),args=(c,x)) for x,(p,c) in izip(X,pipe)]
[p.start() for p in proc]
[p.join() for p in proc]
return [p.recv() for (p,c) in pipe]
if __name__ == '__main__':
print parmap(lambda x:x**x,range(1,5))
"
Be warned, however, that this takes manual control of the processes, so certain things (unexpected signals, for example) can leave 'dead' processes lying around - which is not a good thing. Still, it is a useful example of using pipes for multiprocessing :).
If those commands are not Python, e.g. you want to run ls, then you might be better served by subprocess. os.system isn't really a good choice anymore, as subprocess is now considered the easier-to-use and more flexible tool; a small discussion is presented here.
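For example, a minimal sketch of running an external command and capturing its output with subprocess (the ls call is just an illustration):

import subprocess

# run an external command and capture its output as text
result = subprocess.run(['ls', '-l'], capture_output=True, text=True)
print(result.returncode)
print(result.stdout)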

You can do something like this with multiprocessing:

from multiprocessing import Pool

mydict = {}
with Pool(processes=5) as pool:
    task_1 = pool.apply_async(do_this_task, args=(para_1, para_2))
    task_2 = pool.apply_async(do_this_task, args=(para_1, para_2))
    mydict.update({"task_1": task_1.get(), "task_2": task_2.get()})
print(mydict)
Or, if you would like to try multithreading with concurrent.futures, take a look at this answer.
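A minimal sketch of the same idea with concurrent.futures' ThreadPoolExecutor (do_this_task is again just a placeholder):

from concurrent.futures import ThreadPoolExecutor

def do_this_task(para_1, para_2):
    # placeholder for your real work
    return para_1 + para_2

with ThreadPoolExecutor(max_workers=2) as executor:
    future_1 = executor.submit(do_this_task, 1, 2)
    future_2 = executor.submit(do_this_task, 3, 4)
    results = {"task_1": future_1.result(), "task_2": future_2.result()}
print(results)  # {'task_1': 3, 'task_2': 7}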

If the processes are external scripts then try using the subprocess module. However, your code suggests you want to run functions in parallel. For this, try the multiprocessing module. Some code from this answer for specific details of using multiprocessing:
from multiprocessing.pool import ThreadPool

def foo(bar, baz):
    print('hello {0}'.format(bar))
    return 'foo' + baz

pool = ThreadPool(processes=1)
async_result = pool.apply_async(foo, ('world', 'foo'))  # tuple of args for foo
# do some other stuff in the other processes
return_val = async_result.get()  # get the return value from your function.

Related

Run function in parallel and grab outputs using Queue

I would like to run a function using different arguments. For each different argument, I would like to run the function in parallel and then get the output of each run. It seems that the multiprocessing module can help here. I am not sure about the right steps to make this work.
Do I start all the processes, then get all the results from the queue, and then join all the processes, in that order? Or do I get the results after I have joined? Or do I get the ith result after I have joined the ith process?
from numpy.random import uniform
from multiprocessing import Process, Queue

def function(x):
    return uniform(0.0, x)

if __name__ == "__main__":
    queue = Queue()
    processes = []
    x_values = [1.0, 10.0, 100.0]

    # Start all processes
    for x in x_values:
        process = Process(target=function, args=(x, queue, ))
        processes.append(process)
        process.start()

    # Grab results of the processes?
    outputs = [queue.get() for _ in range(len(x_values))]

    # Not even sure what this does but apparently it's needed
    for process in processes:
        process.join()
So let's make a simple example with a multiprocessing pool and a function that sleeps for 3 seconds and returns both the value passed to it (your parameter) and the result of the function, which is just doubling it.
IIRC there's some issue with stopping pools cleanly:
from multiprocessing import Pool
import time

def time_waster(val):
    try:
        time.sleep(3)
        return (val, val*2)  # return a tuple here, but you can use a dict as well with all your parameters
    except KeyboardInterrupt:
        raise

if __name__ == '__main__':
    x = list(range(5))  # values to pass to the function
    results = []
    try:
        with Pool(2) as p:  # I use 2 but you can use as many as you have cores
            results.append(p.map(time_waster, x))
    except KeyboardInterrupt:
        p.terminate()
    except Exception as e:
        p.terminate()
    finally:
        p.join()
    print(results)
As an extra service I added some KeyboardInterrupt handlers, as IIRC there are some issues interrupting pools: https://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool
proc.join() blocks until the process has ended. queue.get() blocks until there is something in the queue. Because your processes don't put anything into the queue (in this example), this code will never get beyond the queue.get() part... If your processes put something in the queue at the very end, then it doesn't matter whether you join() or get() first, because they happen at about the same time.
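A minimal sketch of a fixed version, assuming the worker is changed to put its result on the queue (tagging each result with x is just one way to match outputs to inputs):

from numpy.random import uniform
from multiprocessing import Process, Queue

def function(x, queue):
    # put the result on the queue instead of returning it
    queue.put((x, uniform(0.0, x)))

if __name__ == "__main__":
    queue = Queue()
    x_values = [1.0, 10.0, 100.0]
    processes = [Process(target=function, args=(x, queue)) for x in x_values]
    for process in processes:
        process.start()
    # drain the queue before joining
    outputs = [queue.get() for _ in x_values]
    for process in processes:
        process.join()
    print(outputs)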

join() method in multiprocessing with input and output Queue()

I have a SOMETIMES working piece of code:
def my_map2(func, iterable, processes=None):
    ''' FASTER THAN mp.Pool() - NOT '''
    import multiprocessing as mp  # Load multiprocessing library
    mp.freeze_support()
    if (processes == None):
        processes = mp.cpu_count()  # Set maximum number of cores?
    L = len(iterable)
    iter2 = zip(iterable, range(L))
    IN = mp.Queue()
    for x in iter2:
        IN.put(x)
    OUT = mp.Queue()  # = mp.JoinableQueue() Q3
    lock = mp.Lock()  # Q2
    def target_fun(IN, OUT):
        while not IN.empty():
            inp = IN.get()
            out = (inp[0], func(inp[1]))
            with lock:  # Q2
                OUT.put(out)
    proc = [mp.Process(target=target_fun, args=(IN, OUT,)) for x in range(processes)]
    for p in proc: p.start()  # Run proc
    for p in proc: p.join()  # Exit the completed proc Q1
    results = [OUT.get() for p in range(L)]  # Get Results Back Q1
    results.sort()  # Sort
    return([r[1] for r in results])

import time

def f(x):
    time.sleep(0.1)
    return(x)

res = my_map2(f, range(100))
Questions:
Question 1: join() is called before collecting from the queue. Why does this code work at all? join() is invoked before the Queue is empty, so it should deadlock according to the documentation:
any Queue that a Process has put data on must be drained prior to joining the processes which have put data there: otherwise, you’ll get a deadlock.
However, if the 2 lines marked with 'Q1' are swapped, then we should run into an issue with getting from an empty queue, but both variants seem to SOMETIMES work...
Question 2: If I remove the Lock() (edit the 2 lines marked with 'Q2'), it does deadlock. Why now?
Queues are thread and process safe. When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks. (talking about pipes and queues)
JoinableQueue: if I try using it, it does not work. I tried everything:
If you use JoinableQueue then you must call JoinableQueue.task_done().
If a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread()), then that process will not terminate until all buffered items have been flushed to the pipe.
Question 3: So what is the actual difference between a normal and a joinable Queue? Does it mean that you can terminate a child that has put something on a normal Queue at any time? I thought only manager processes worked like this...
Also, how do you correctly use both kinds of Queue?
I have read the library reference, quite a few tutorials and Stack Overflow posts, but does anybody really understand how this works? (Talking about join(), the different Queue types, when the lock is needed, and an accurate understanding of the processes behind it.)
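A minimal sketch of the drain-before-join pattern the docs describe, with workers reading from an input Queue and putting results on an output Queue (all names are illustrative):

import multiprocessing as mp

def worker(in_q, out_q):
    # read tasks until the sentinel None is seen
    for x in iter(in_q.get, None):
        out_q.put((x, x * x))

if __name__ == '__main__':
    in_q, out_q = mp.Queue(), mp.Queue()
    items = list(range(10))
    procs = [mp.Process(target=worker, args=(in_q, out_q)) for _ in range(2)]
    for p in procs:
        p.start()
    for x in items:
        in_q.put(x)
    for _ in procs:
        in_q.put(None)  # one sentinel per worker
    results = [out_q.get() for _ in items]  # drain the output queue first
    for p in procs:
        p.join()  # then join
    print(sorted(results))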

When do I use join method in multiprocessing module?

I'm learning about the multiprocessing module. I've found these examples in the documentation at python.org:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Here they use join to wait for the process to finish.
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
But they don't use it in this case. I also read this:
Remember also that non-daemonic processes will be joined automatically.
That explains the second example. So why should I use join in the first one? Must I do that because the Process is in a variable?
You should use join() when you want to wait for a subprocess to finish, e.g. if your main program wants to do something based on the results of the workers. You should also call join() if your main process is long-running and creates subprocesses frequently. Otherwise, the ones you didn't join will accumulate as "zombie processes".
In general, whenever the thread of execution of your main process reaches a point where waiting for the subprocesses doesn't hurt, just do so. It's a bit like closing a file -- it's not strictly necessary, since all files will be implicitly closed on exit, but it is good practice, since it saves resources.
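A minimal sketch of that pattern (worker is a hypothetical stand-in for your actual work):

import multiprocessing as mp
import time

def worker(n):
    time.sleep(n)  # hypothetical work

if __name__ == '__main__':
    procs = [mp.Process(target=worker, args=(1,)) for _ in range(4)]
    for p in procs:
        p.start()
    # ... the main process can keep doing other work here ...
    for p in procs:
        p.join()  # reap the children so they don't linger as zombies
    print([p.exitcode for p in procs])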

Returning values of functions with Python multiprocessing.Process

I'm trying to generate the checksum of two identical files (in two different directories) and am using multiprocessing.Process() to run the checksum of both files simultaneously instead of sequentially.
However, when I run the multiprocessing.Process() object on the checksum generating function I get this return value:
<Process(Process-1, stopped)>
<Process(Process-2, stopped)>
These should be a list of checksum strings.
The return statement from the generating function is:
return chksum_list
Pretty basic and the program works well when running sequentially.
How do I retrieve the return value of the function that is being processed through the multiprocessing.Process() object?
Thanks.
The docs are relatively good on this topic;
Pipes
You could communicate with the process objects via a pipe.
From the docs:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # prints "[42, None, 'hello']"
    p.join()
Pool & map
Alternatively you could use a Pool of processes:
from multiprocessing import Pool

pool = Pool(processes=4)
returnvals = pool.map(f, range(10))
where f is your function, which will act on each member of range(10).
Similarly, you can pass in any list containing the inputs to your processes;
returnvals = pool.map(f, [input_to_process_1, input_to_process_2])
In your specific case, input_to_process_1/2 could be paths to the files you're doing checksums on, while f is your checksum function.
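For instance, a minimal sketch for the checksum case, assuming an MD5-per-file helper (the function name and paths are illustrative):

import hashlib
from multiprocessing import Pool

def file_md5(path):
    # read the file in chunks and return its hex digest
    h = hashlib.md5()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

if __name__ == '__main__':
    paths = ['dir_a/file.bin', 'dir_b/file.bin']  # hypothetical paths
    with Pool(processes=2) as pool:
        checksums = pool.map(file_md5, paths)
    print(checksums)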

Is it possible to run a function in a subprocess without threading or writing a separate file/script?

import subprocess

def my_function(x):
    return x + 100

output = subprocess.Popen(my_function, 1)  # I would like to pass the function object and its arguments
print(output)
# desired output: 101
I have only found documentation on opening subprocesses using separate scripts. Does anyone know how to pass function objects or even an easy way to pass function code?
I think you're looking for something more like the multiprocessing module:
http://docs.python.org/library/multiprocessing.html#the-process-class
The subprocess module is for spawning processes and doing things with their input/output - not for running functions.
Here is a multiprocessing version of your code:
from multiprocessing import Process, Queue

# must be a global function
def my_function(q, x):
    q.put(x + 100)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=my_function, args=(queue, 1))
    p.start()
    p.join()  # this blocks until the process terminates
    result = queue.get()
    print(result)
You can use the standard Unix fork system call, exposed as os.fork(). fork() will create a new process with the same script running. In the new process, it will return 0, while in the old process it will return the process ID of the new process.
import os

child_pid = os.fork()
if child_pid == 0:
    print("New proc")
else:
    print("Old proc")
For a higher-level library that provides a portable abstraction for using multiple processes, there's the multiprocessing module. There's an article on IBM DeveloperWorks, Multiprocessing with Python, with a brief introduction to both techniques.
Brian McKenna's post above about multiprocessing is really helpful, but if you wanted to go down the threaded route (as opposed to process-based), this example will get you started:
import threading
import time

def blocker():
    while True:
        print("Oh, sorry, am I in the way?")
        time.sleep(1)

t = threading.Thread(name='child procs', target=blocker)
t.start()

# Prove that we passed through the blocking call
print("No, that's okay")
You can also pass daemon=True (or use the older setDaemon(True) method) to background the thread immediately.
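For example, the daemon variant of the snippet above (blocker is the same function):

# daemon threads are killed automatically when the main program exits
t = threading.Thread(name='child procs', target=blocker, daemon=True)
t.start()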
You can use concurrent.futures.ProcessPoolExecutor, which not only propagates the return value, but also any exceptions:
import concurrent.futures

# must be a global function
def my_function(x):
    if x < 0:
        raise ValueError
    return x + 100

with concurrent.futures.ProcessPoolExecutor() as executor:
    f = executor.submit(my_function, 1)
    ret = f.result()  # will rethrow any exceptions
