Parallelizing different functions at the same time in Python

I want to execute f1 and f2 at the same time, but the following code doesn't work!
from multiprocessing import Pool

def f1(x):
    return x*x

def f2(x):
    return x^2

if __name__ == '__main__':
    x1 = 10
    x2 = 20
    p = Pool(2)
    out = (p.map([f1, f2], [x1, x2]))
    y1 = out[0]
    y2 = out[1]

I believe you'd like to use threading.Thread and a shared queue in your code.
from queue import Queue
from threading import Thread
import time

def f1(q, x):
    # Sleep function added to compare execution times.
    time.sleep(5)
    # Instead of returning the result we put it in the shared queue.
    q.put(x * 2)

def f2(q, x):
    time.sleep(5)
    q.put(x ^ 2)

if __name__ == '__main__':
    x1 = 10
    x2 = 20
    result_queue = Queue()

    # We create two threads and pass the shared queue to both of them.
    t1 = Thread(target=f1, args=(result_queue, x1))
    t2 = Thread(target=f2, args=(result_queue, x2))

    # Starting threads...
    print("Start: %s" % time.ctime())
    t1.start()
    t2.start()

    # Waiting for threads to finish execution...
    t1.join()
    t2.join()
    print("End: %s" % time.ctime())

    # After threads are done, we can read results from the queue.
    while not result_queue.empty():
        result = result_queue.get()
        print(result)
The code above should print output similar to:
Start: Sat Jul 2 20:50:50 2016
End: Sat Jul 2 20:50:55 2016
20
22
As you can see, even though both functions wait 5 seconds to yield their results, they do it in parallel, so the overall execution time is 5 seconds.
If you care about which function put which result in your queue, I can see two solutions that will let you determine that. You can either create multiple queues or wrap your results in a tuple, as shown below.
def f1(q, x):
    time.sleep(5)
    # Tuple containing function information.
    q.put((f1, x * 2))
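The multiple-queue alternative is just as simple: give each function its own queue, so the queue a value came from tells you which function produced it. A minimal sketch, reusing the f1(q, x) / f2(q, x) definitions from the first listing above:
q1 = Queue()
q2 = Queue()

t1 = Thread(target=f1, args=(q1, x1))
t2 = Thread(target=f2, args=(q2, x2))
t1.start()
t2.start()
t1.join()
t2.join()

y1 = q1.get()  # definitely the result of f1
y2 = q2.get()  # definitely the result of f2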
And for further simplification (especially when you have many functions to deal with) you can decorate your functions (to avoid repeated code and to allow function calls without a queue):
def wrap_result(func):
    def wrapper(*args):
        # Assuming that the shared queue is always the last argument.
        q = args[-1]
        # We use it to store the results only if it was provided.
        if isinstance(q, Queue):
            function_result = func(*args[:-1])
            q.put((func, function_result))
        else:
            function_result = func(*args)
        return function_result
    return wrapper

@wrap_result
def f1(x):
    time.sleep(5)
    return x * 2
Note that my decorator was written in a rush and its implementation might need improvements (in case your functions accept kwargs, for instance). If you decide to use it, you'll have to pass your arguments in reverse order: t1 = threading.Thread(target=f1, args=(x1, result_queue)).
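With the decorator in place, the same f1 works both with and without a queue; a quick usage sketch, continuing from the imports above (each call still sleeps 5 seconds):
# Called directly, without a queue: behaves like a plain function.
print(f1(10))  # 20

# Called in a thread with the shared queue as the last argument.
result_queue = Queue()
t1 = Thread(target=f1, args=(10, result_queue))
t1.start()
t1.join()
print(result_queue.get())  # (<function f1 ...>, 20)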
A little friendly advice.
"Following code doesn't work" says nothing about the problem. Is it raising an exception? Is it giving unexpected results?
It's important to read error messages, and even more important to study their meaning. The code you have provided raises a TypeError with a pretty obvious message:
File ".../stack.py", line 16, in <module> out = (p.map([f1, f2], [x1, x2]))
TypeError: 'list' object is not callable
That means the first argument of Pool().map() has to be a callable object, a function for instance. Let's look at the docs for that method.
Apply func to each element in iterable, collecting the results in a
list that is returned.
It clearly doesn't allow a list of functions to be passed as its first argument.
Here you can read more about the Pool().map() method.
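For contrast, a quick sketch of what Pool().map() is meant for, one callable applied to every element of an iterable (using the f1 from your question):
from multiprocessing import Pool

def f1(x):
    return x * x

if __name__ == '__main__':
    with Pool(2) as p:
        # One function, many inputs.
        print(p.map(f1, [10, 20]))  # [100, 400]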

I want to execute f1 and f2 at the same time, but the following code doesn't work! ...
out=(p.map([f1, f2], [x1, x2]))
The minimal change to your code is to replace the p.map() call with:
r1 = p.apply_async(f1, [x1])
out2 = f2(x2)
out1 = r1.get()
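Put back into the structure of your script, that minimal change could look like this (a sketch, keeping f1, f2, and Pool(2) from your question):
from multiprocessing import Pool

def f1(x):
    return x*x

def f2(x):
    return x^2

if __name__ == '__main__':
    x1 = 10
    x2 = 20
    with Pool(2) as p:
        # f1 runs in a worker process while f2 runs in the current process.
        r1 = p.apply_async(f1, [x1])
        out2 = f2(x2)
        out1 = r1.get()
    print(out1, out2)  # 100 22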
Though if all you want is to run two function calls concurrently, you don't need a Pool() here; you could just start a Thread/Process manually and use a Pipe/Queue to get the result:
#!/usr/bin/env python
from multiprocessing import Process, Pipe

# f1, f2, x1, x2 as defined in the question.
def f1(x):
    return x * x

def f2(x):
    return x ^ 2

def another_process(f, args, conn):
    conn.send(f(*args))
    conn.close()

if __name__ == '__main__':
    x1, x2 = 10, 20
    parent_conn, child_conn = Pipe(duplex=False)
    p = Process(target=another_process, args=(f1, [x1], child_conn))
    p.start()
    out2 = f2(x2)
    out1 = parent_conn.recv()
    p.join()

Related

Running different Python functions in separate CPUs

Using multiprocessing.Pool I can split an input list for a single function to be processed in parallel across multiple CPUs, like this:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = pool.map(f, range(100))
    pool.close()
    pool.join()
However, this does not allow me to run different functions on different processors. If I want to do something like this, in parallel/simultaneously:
foo1(args1) --> Processor1
foo2(args2) --> Processor2
How can this be done?
Edit: After Darkonaut's remarks, I do not care about specifically assigning foo1 to processor number 1. It can be any processor, as chosen by the OS. I am just interested in running independent functions in different/parallel processes. So rather:
foo1(args1) --> process1
foo2(args2) --> process2
I usually find it easiest to use the concurrent.futures module for concurrency. You can achieve the same with multiprocessing, but concurrent.futures has (IMO) a much nicer interface.
Your example would then be:
from concurrent.futures import ProcessPoolExecutor

def foo1(x):
    return x * x

def foo2(x):
    return x * x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(2) as executor:
        # these return immediately and are executed in parallel, on separate processes
        future_1 = executor.submit(foo1, 1)
        future_2 = executor.submit(foo2, 2)

        # get results / re-raise exceptions that were thrown in workers
        result_1 = future_1.result()  # contains foo1(1)
        result_2 = future_2.result()  # contains foo2(2)
If you have many inputs, it is better to use executor.map with the chunksize argument instead:
from concurrent.futures import ProcessPoolExecutor

def foo1(x):
    return x * x

def foo2(x):
    return x * x * x

if __name__ == '__main__':
    with ProcessPoolExecutor(4) as executor:
        # these return immediately and are executed in parallel, on separate processes
        future_1 = executor.map(foo1, range(10000), chunksize=100)
        future_2 = executor.map(foo2, range(10000), chunksize=100)

        # executor.map returns an iterator which we have to consume to get the results
        result_1 = list(future_1)  # contains [foo1(x) for x in range(10000)]
        result_2 = list(future_2)  # contains [foo2(x) for x in range(10000)]
Note that the optimal values for chunksize, the number of processes, and whether process-based concurrency actually leads to increased performance depend on many factors:
The runtime of foo1 / foo2. If they are extremely cheap (as in this example), the communication overhead between processes might dominate the total runtime.
Spawning a process takes time, so the code inside with ProcessPoolExecutor needs to run long enough for this to amortize.
The actual number of physical processors in the machine you are running on.
Whether your application is IO bound or compute bound (for IO-bound work a thread pool is often sufficient; see the sketch after this list).
Whether the functions you use in foo are already parallelized (such as some np.linalg solvers, or scikit-learn estimators).
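For the IO-bound case, the same interface also works with ThreadPoolExecutor, which avoids process startup and pickling overhead. A minimal sketch, with a sleep standing in for a hypothetical network or disk wait:
from concurrent.futures import ThreadPoolExecutor
import time

def slow_fetch(x):
    # Hypothetical IO-bound task: the sleep represents waiting on a request.
    time.sleep(1)
    return x * x

if __name__ == '__main__':
    with ThreadPoolExecutor(4) as executor:
        # Finishes in roughly 1 second instead of 4: the threads overlap the waiting.
        results = list(executor.map(slow_fetch, range(4)))
    print(results)  # [0, 1, 4, 9]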

Function that multiprocesses another function

I'm performing analyses of time series from simulations. Basically, it's doing the same tasks for every time step. As there is a very high number of time steps, and as the analysis of each of them is independent, I wanted to create a function that can multiprocess another function. The latter will have arguments and return a result.
Using a shared dictionary and the concurrent.futures library, I managed to write this:
import concurrent.futures as Cfut
import multiprocessing as mlp

def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
    # function : function that is run in parallel
    # param_list : list of items
    # group_size : size of the groups
    # Nworkers : number of groups/items running at the same time
    # *args : fixed parameters passed to every call
    manager = mlp.Manager()
    dic = manager.dict()
    executor = Cfut.ProcessPoolExecutor(Nworkers)
    # grouper() splits param_list into chunks of group_size (not shown here).
    futures = [executor.submit(function, param, dic, *args)
               for param in grouper(param_list, group_size)]
    Cfut.wait(futures)
    return [dic[i] for i in sorted(dic.keys())]
Typically, I can use it like this:
def read_file(files, dictionnary):
    for file in files:
        i = int(file[4:9])
        #print(str(i))
        if 'bz2' in file:
            os.system('bunzip2 ' + file)
            file = file[:-4]
        dictionnary[i] = np.loadtxt(file)
        os.system('bzip2 ' + file)
Map = np.array(multiprocess_loop_grouped(read_file, list_alti, Group_size, N_thread))
or like this:
def autocorr(x):
    result = np.correlate(x, x, mode='full')
    return result[result.size//2:]

def find_lambda_finger(indexes, dic, Deviation):
    for i in indexes:
        #print(str(i))
        # Beach = Deviation[i,:] - np.mean(Deviation[i,:])
        dic[i] = Anls.find_first_max(autocorr(Deviation[i,:]), valmax=True)

args = [Deviation]
Temp = Rescal.multiprocess_loop_grouped(find_lambda_finger, range(Nalti), Group_size, N_thread, *args)
Basically, it works, but not well. Sometimes it crashes. Sometimes it actually launches a number of Python processes equal to Nworkers, and sometimes only 2 or 3 of them are running at a time even though I specified Nworkers = 15.
For example, a classic error I obtain is described in this topic I raised: Calling matplotlib AFTER multiprocessing sometimes results in error: main thread not in main loop
What is the most Pythonic way to achieve what I want? How can I improve my control over this function? How can I better control the number of running Python processes?
One of the basic concepts for Python multiprocessing is using queues. It works quite well when you have an input list that can be iterated over and which does not need to be altered by the sub-processes. It also gives you good control over all the processes, because you spawn the number you want, and you can run them idle or stop them.
It is also a lot easier to debug. Sharing data explicitly is usually an approach that is much more difficult to set up correctly.
Queues can hold almost any picklable object, so you can fill them with file path strings for reading files, plain numbers for doing calculations, or even images for drawing.
In your case a layout could look like this:
import multiprocessing as mp
import numpy as np
import itertools as it

def worker1(in_queue, out_queue):
    # blocks when nothing is available, stops when 'STOP' is seen
    for a in iter(in_queue.get, 'STOP'):
        # do something
        out_queue.put({a: result})  # return your result linked to the input

def worker2(in_queue, out_queue):
    for a in iter(in_queue.get, 'STOP'):
        # do something differently
        out_queue.put({a: result})  # return your result linked to the input

def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
    # your final result
    result = {}

    in_queue = mp.Queue()
    out_queue = mp.Queue()

    # fill your input
    for a in param_list:
        in_queue.put(a)
    # stop command at end of input
    for n in range(Nworkers):
        in_queue.put('STOP')

    # set up your worker processes doing the task as specified
    process = [mp.Process(target=function,
                          args=(in_queue, out_queue), daemon=True) for x in range(Nworkers)]

    # run processes
    for p in process:
        p.start()
    # wait for processes to finish
    for p in process:
        p.join()

    # collect your results from the calculations
    for a in param_list:
        result.update(out_queue.get())

    return result

temp = multiprocess_loop_grouped(worker1, param_list, group_size, Nworkers, *args)
map = multiprocess_loop_grouped(worker2, param_list, group_size, Nworkers, *args)
It can be made a bit more dynamic if you are afraid that your queues will run out of memory. Then you need to fill and empty the queues while the processes are running. See this example here.
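A rough sketch of that incremental pattern, assuming the workers follow the same protocol as worker1 above (iterate with iter(in_queue.get, 'STOP') and put one result dict per input): a bounded in_queue is fed lazily while results are drained as they appear.
import multiprocessing as mp

def feed_and_collect(function, param_iter, Nworkers, maxsize=100):
    # Bounded input queue: put() blocks when full, so memory use stays flat.
    in_queue = mp.Queue(maxsize=maxsize)
    out_queue = mp.Queue()
    workers = [mp.Process(target=function, args=(in_queue, out_queue), daemon=True)
               for _ in range(Nworkers)]
    for w in workers:
        w.start()
    result = {}
    n_sent = 0
    for a in param_iter:              # feed inputs lazily from any iterable
        in_queue.put(a)
        n_sent += 1
        while not out_queue.empty():  # drain results opportunistically while feeding
            result.update(out_queue.get())
    for _ in range(Nworkers):
        in_queue.put('STOP')
    while len(result) < n_sent:       # collect everything still outstanding
        result.update(out_queue.get())
    for w in workers:
        w.join()
    return result

# e.g. temp = feed_and_collect(worker1, param_list, Nworkers)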
Final words: it is not more Pythonic, as you requested, but it is easier for a newbie to understand ;-)

python multiprocessing with different functions

I can't find an appropriate method for the following task:
I have an input array [x1, x2, x3 ... xn] and a function f(xi, param) that I want to apply to every x from the array. There are eight possible values of this param, and the result of the function doesn't depend on param. So I need to run this function f for all possible values of param at the same time. It's absolutely not important which function works with which x; it's important that all eight functions f(xi, p1), f(xi, p2) ... f(xi, p8) work at the same time and that every x is processed only once.
Seems pretty straightforward. I don't know what you mean by "all the x will be processed only once", but that doesn't sound possible: the function still has to be called separately for each parameter (unless some black magician on SO can prove me wrong, which seems likely).
from multiprocessing import Pool

def f(x, param):
    ...do stuff...

if __name__ == '__main__':
    params = [1, 2, 3, 4, 5, 6, 7, 8]
    try:
        pool = Pool(8)
        # x here is one value from your input array (not defined in this snippet)
        pool.starmap(f, [(x, param) for param in params])
    except SomeError:
        ...do error stuff...
    finally:
        pool.close()
        pool.join()
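If the intent is instead that every x is processed exactly once, with the eight param values merely spread over the calls, one hedged option is to pair the inputs with a cycled parameter list:
from itertools import cycle
from multiprocessing import Pool

def f(x, param):
    return x * x  # placeholder body; the result does not depend on param

if __name__ == '__main__':
    xs = [10, 20, 30, 40]             # hypothetical input array
    params = [1, 2, 3, 4, 5, 6, 7, 8]
    with Pool(8) as pool:
        # every x appears exactly once; params are just distributed over the calls
        results = pool.starmap(f, zip(xs, cycle(params)))
    print(results)  # [100, 400, 900, 1600]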

Multiprocessing a loop inside a loop inside a function

I wrote some code to break up a for loop into multiple processes to speed up calculations.
import numpy as np
import formfactors
from subdivide_loop import subdivide_loop
import multiprocessing

def worker(start, end, triangleI, areaI, scene, kdtree, samples, output):
    form_factors = np.zeros(end-start)
    for j in range(start, end):
        triangleJ = np.array(scene[j][0:4])
        form_factors[start] = formfactors.uniform(triangleJ, triangleI, areaI, kdtree, samples)
    result = output.get(block=True)
    for j in range(start, end):
        result[j] = form_factors[j]
    output.put(result)

def calculate_formfactors(start, end, triangleI, areaI, scene, kdtree, samples, output, nb_processes,
                          max_interval_length):
    intervals = subdivide_loop(start, end, max_interval_length, nb_processes)
    print("start")
    jobs = []
    for k in range(nb_processes):
        p = multiprocessing.Process(target=worker,
                                    args=(intervals[k][0], intervals[k][1], triangleI, areaI, scene, kdtree,
                                          samples, output))
        jobs.append(p)
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    results = output.get()
    return results
I would like to be able to call calculate_formfactors() inside a function inside a loop, like this:
def outer_function():
    for i in range(1000):
        for j in range(i + 1, 1000, max_interval_length):
            form_factors = calculate_formfactors(args)
But running this gives an error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
Because of how the outer function works, breaking up outer_function() instead of calculate_formfactors() is not possible.
So, any advice on how to do this?
As the error suggests, make sure your outer_function() (or whatever initiates it) is called from within an __main__ guard, e.g.
if __name__ == "__main__":
    outer_function()
It doesn't have to be outer_function(), but you need to trace it all back to the first step that initiates the chain that ultimately leads to the multiprocessing.Process() call, and put that within the above block.
This is because on non-forking systems child processes are started by re-importing the main script, so creating new processes at the top level of that script would end up in infinite recursion of process spawning. You can read more about it in this answer. Because of that, you have to make sure your multiprocessing initialization code executes only once, which is where the __main__ guard comes in.
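A minimal, self-contained illustration of the same layout (hypothetical do_work/outer_function, not the question's code): everything that eventually starts a Process is reached only from below the guard.
import multiprocessing

def do_work(x, q):
    q.put(x * x)

def outer_function():
    q = multiprocessing.Queue()
    jobs = [multiprocessing.Process(target=do_work, args=(i, q)) for i in range(4)]
    for p in jobs:
        p.start()
    results = [q.get() for _ in jobs]  # drain before join to avoid blocking on a full pipe
    for p in jobs:
        p.join()
    return results

if __name__ == '__main__':
    # On spawn-based platforms the children re-import this module; because process
    # creation is only reached from inside this guard, the import does not recurse.
    print(outer_function())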

Multiprocessing 2 different functions python3

I have been struggling for a while with multiprocessing in Python. I would like to run 2 independent functions simultaneously, wait until both calculations are finished, and then continue with the output of both functions. Something like this:
# Function A:
def jobA(num):
    result = num * 2
    return result

# Function B:
def jobB(num):
    result = num ^ 3
    return result

# Parallel process function:
{resultA, resultB} = runInParallel(jobA(num), jobB(num))
I found other multiprocessing examples, but they either used only one function or didn't return an output. Does anyone know how to do this? Many thanks!
I'd recommend creating processes manually (rather than as part of a pool), and sending the return values to the main process through a multiprocessing.Queue. These queues can share almost any Python object in a safe and relatively efficient way.
Here's an example, using the jobs you've posted.
def jobA(num, q):
    q.put(num * 2)

def jobB(num, q):
    q.put(num ^ 3)

import multiprocessing as mp

q = mp.Queue()
jobs = (jobA, jobB)
args = ((10, q), (2, q))

for job, arg in zip(jobs, args):
    mp.Process(target=job, args=arg).start()

for i in range(len(jobs)):
    print('Result of job {} is: {}'.format(i, q.get()))
This prints out:
Result of job 0 is: 20
Result of job 1 is: 1
But you can of course do whatever further processing you'd like using these values.
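One caveat: q.get() returns values in completion order, so "job 0"/"job 1" in the printout reflect retrieval order, not necessarily jobA/jobB. If you need to know which value came from which function, a small variation is to put a labelled tuple on the queue; a sketch:
import multiprocessing as mp

def jobA(num, q):
    q.put(('jobA', num * 2))

def jobB(num, q):
    q.put(('jobB', num ^ 3))

if __name__ == '__main__':
    q = mp.Queue()
    procs = [mp.Process(target=jobA, args=(10, q)),
             mp.Process(target=jobB, args=(2, q))]
    for p in procs:
        p.start()
    results = dict(q.get() for _ in procs)  # e.g. {'jobA': 20, 'jobB': 1}
    for p in procs:
        p.join()
    print(results)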
