In Python I'm running a command of the form
reduce(func, bigArray[1:], bigArray[0])
and I'd like to add parallel processing to speed it up.
I am aware I can do this manually by splitting the array, running processes on the separate portions, and combining the result.
However, given the ubiquity of running reduce in parallel, I wanted to see if there's a native way, or a library, that will do this automatically.
I'm running this on a single machine with 6 cores.
For anyone stumbling across this, I ended up writing a helper to do it:
import multiprocessing
from functools import reduce

def parallelReduce(l, numCPUs, connection=None):
    # reduceFunc is the (module-level) binary function being reduced over
    if numCPUs == 1 or len(l) <= 100:
        returnVal = reduce(reduceFunc, l[1:], l[0])
        if connection is not None:
            connection.send(returnVal)
        return returnVal

    parent1, child1 = multiprocessing.Pipe()
    parent2, child2 = multiprocessing.Pipe()
    p1 = multiprocessing.Process(target=parallelReduce, args=(l[:len(l) // 2], numCPUs // 2, child1))
    p2 = multiprocessing.Process(target=parallelReduce, args=(l[len(l) // 2:], numCPUs // 2 + numCPUs % 2, child2))
    p1.start()
    p2.start()
    leftReturn, rightReturn = parent1.recv(), parent2.recv()
    p1.join()
    p2.join()
    returnVal = reduceFunc(leftReturn, rightReturn)
    if connection is not None:
        connection.send(returnVal)
    return returnVal
Note that you can get the number of CPUs with multiprocessing.cpu_count(). Using this function showed a substantial performance increase over the serial version.
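A minimal usage sketch (assuming the parallelReduce helper above sits at module level next to an associative reduceFunc, here plain addition, so that child processes can find both):

import multiprocessing

def reduceFunc(a, b):   # illustrative associative reduce function
    return a + b

if __name__ == "__main__":
    bigArray = list(range(1_000_000))
    result = parallelReduce(bigArray, multiprocessing.cpu_count())
    print(result)   # same value as reduce(reduceFunc, bigArray[1:], bigArray[0])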
If you're able to combine map and reduce (or want to concatenate the results instead of doing a more general reduce) you could use mr4mp:
https://github.com/lapets/mr4mp
The _reduce function inside the class appears to implement the parallelism via multiprocessing.Pool, pooling the usual reduce calls roughly as follows:
reduce(<Function used to reduce>, pool.map(partial(reduce, <function used to reduce>), <List of results to reduce>))
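The same pattern can be reproduced with the standard library alone; a minimal sketch (the chunks helper is illustrative and not part of mr4mp):

from functools import partial, reduce
from multiprocessing import Pool
from operator import add

def chunks(xs, n):
    # split xs into n roughly equal slices (illustrative helper)
    k = (len(xs) + n - 1) // n
    return [xs[i:i + k] for i in range(0, len(xs), k)]

if __name__ == "__main__":
    data = list(range(1_000_000))
    with Pool(6) as pool:
        partials = pool.map(partial(reduce, add), chunks(data, 6))   # reduce each chunk in parallel
    print(reduce(add, partials) == sum(data))   # combine the partial results; prints True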
I haven't tried it yet but it seems the syntax is:
mr4mp.pool().mapreduce(<Function to be mapped>,<Function used to reduce>, <List of entities to apply function on>)
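I haven't verified this either, but following that syntax a hypothetical example might look like this (square and merge are illustrative names, not part of the library):

import mr4mp

def square(n):           # function to be mapped
    return {n: n * n}

def merge(a, b):         # function used to reduce (combines two partial results)
    return {**a, **b}

if __name__ == "__main__":
    result = mr4mp.pool().mapreduce(square, merge, range(10))
    print(result)        # expected: {0: 0, 1: 1, ..., 9: 81}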
I am trying to run 1200 iterations of a function with different values using multiprocessing.
Is there a way I can set the priority and affinity of the processes within the function itself?
Here is an example of what I am doing:

with multiprocessing.Pool(processes=3) as pool:
    r = pool.map(func, (c for c in combinations))

I want each of the 3 processes to have high priority using psutil, and the cpu_affinity to be specified. While I can use psutil.Process().HIGH_PRIORITY_CLASS within func, how should I specify different affinities for the three processes?
I would use the initializer function in mp.Pool:
import multiprocessing as mp
from multiprocessing import Pool

# Same priority for each child:
def init_priority(prio_level):
    set_prio(prio_level)   # set the priority of the current (child) process

if __name__ == "__main__":
    with Pool(nprocs, init_priority, (prio_level,)) as p:
        p.map(...)

# Different priority for each child (this may not be very useful,
# because you cannot choose which child will accept each "task"):
def init_priority(q):
    prio_level = q.get()
    set_prio(prio_level)

if __name__ == "__main__":
    q = mp.Queue()
    for _ in range(nprocs):   # put one prio_level for each process
        q.put(prio_level)
    with Pool(nprocs, init_priority, (q,)) as p:
        p.map(...)
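Note that set_prio above is just a placeholder; a minimal sketch of what it could look like using psutil (the priority constants shown are Windows-specific, on Unix you would pass a niceness value instead, and the optional core list covers the affinity part of the question):

import psutil

def set_prio(prio_level, cpus=None):
    proc = psutil.Process()          # the current (child) process
    proc.nice(prio_level)            # set the scheduling priority, e.g. psutil.HIGH_PRIORITY_CLASS
    if cpus is not None:
        proc.cpu_affinity(cpus)      # pin this process to the given cores, e.g. [0, 1]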
If you need to have some high-priority child processes and some low-priority ones, and you need to be able to discern easily between them, I would skip mp.Pool and just use your own Process objects.
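A rough sketch of that last suggestion, reusing the hypothetical set_prio helper from above so that each hand-rolled child sets its own priority and core list before doing its work (do_work and the chunk arguments are placeholders for the real task):

import multiprocessing as mp
import psutil

def worker(prio_level, cpus, chunk):
    set_prio(prio_level, cpus)    # hypothetical helper sketched above
    do_work(chunk)                # placeholder for the real task

if __name__ == "__main__":
    configs = [
        (psutil.HIGH_PRIORITY_CLASS, [0], chunk0),      # placeholders for real arguments
        (psutil.NORMAL_PRIORITY_CLASS, [1], chunk1),
        (psutil.NORMAL_PRIORITY_CLASS, [2], chunk2),
    ]
    procs = [mp.Process(target=worker, args=cfg) for cfg in configs]
    for p in procs:
        p.start()
    for p in procs:
        p.join()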
Is there a way to run a function in parallel within an already parallelised function? I know that using multiprocessing.Pool() this is not possible, as a daemonic process cannot create child processes. I am fairly new to parallel computing and am struggling to find a workaround.
I currently have several thousand calculations that need to be run in parallel using some other commercially available quantum mechanical code I interface to. Each calculation has three subsequent calculations that need to be executed in parallel on normal termination of the parent calculation; if the parent calculation does not terminate normally, that is the end of the calculation for that point. I could always combine these three subsequent calculations into one big calculation and run it normally, although I would much prefer to run them separately in parallel.
Main currently looks like this: run() is the parent calculation that is first run in parallel for a series of points, and par_nacmes() is the function that I want to run in parallel for the three child calculations following normal termination of the parent.
def par_nacmes(nacme_input_data):
    nacme_dir, nacme_input, index = nacme_input_data  # Unpack info in tuple for the calculation
    axes_index = get_axis_index(nacme_input)
    [norm_term, nacme_outf] = util.run_calculation(molpro_keys, pwd, nacme_dir, nacme_input, index)  # Submit child calculation
    if norm_term:
        data.extract_nacme(nacme_outf, molpro_keys['nacme_regex'], index, axes_index)
    else:
        with open('output.log', 'w+') as f:
            f.write('NACME Crashed for GP%s - axis %s' % (index, axes_index))


def run(grid_point):
    index, geom = grid_point
    if inputs['code'] == 'molpro':
        [spe_dir, spe_input] = molpro.setup_spe(inputs, geom, pwd, index)
        [norm_term, spe_outf] = util.run_calculation(molpro_keys, pwd, spe_dir, spe_input, index)  # Run each parent calculation
        if norm_term:  # If parent calculation terminates normally - extract data and continue with subsequent calculations for each point
            data.extract_energies(spe_dir+spe_outf, inputs['spe'], molpro_keys['energy_regex'],
                                  molpro_keys['cas_prog'], index)
            if inputs['nacme'] == 'yes':
                [nacme_dir, nacmes_inputs] = molpro.setup_nacme(inputs, geom, spe_dir, index)
                nacmes_data = [(nacme_dir, nacme_inp, index) for nacme_inp in nacmes_inputs]  # List of three tuples - each with three elements. Each tuple describes a child calculation to be run in parallel
                nacme_pool = multiprocessing.Pool()
                nacme_pool.map(par_nacmes, [nacme_input for nacme_input in nacmes_data])  # Run each calculation in list of tuples in parallel
            if inputs['grad'] == 'yes':
                pass
        else:
            with open('output.log', 'w+') as f:
                f.write('SPE crashed for GP%s' % index)
    elif inputs['code'] == 'molcas':  # TO DO
        pass


if __name__ == "__main__":
    try:
        pwd = os.getcwd()  # parent dir
        f = open(inp_geom, 'r')
        ref_geom = np.genfromtxt(f, skip_header=2, usecols=(1, 2, 3), encoding=None)
        f.close()
        geom_list = coordinate_generator(ref_geom)  # Generate nuclear coordinates
        if inputs['code'] == 'molpro':
            couplings = molpro.coupled_states(inputs['states'][-1])
        elif inputs['code'] == 'molcas':
            pass
        data = setup.global_data(ref_geom, inputs['states'][-1], couplings, len(geom_list))
        run_pool = multiprocessing.Pool()
        run_pool.map(run, [(k, v) for k, v in enumerate(geom_list)])  # Run each parent calculation for each set of coordinates
    except StopIteration:
        print('Please ensure geometry file is correct.')
Any insight on how to run these child calculations in parallel for each point would be a great help. I have seen some people suggest using multithreading instead, or setting daemon to False, although I am unsure if this is the best way to do it.
Firstly, I don't know why you have to run par_nacmes in parallel, but if you have to, you could:
a) use threads to run them instead of processes, or
b) use multiprocessing.Process to run run(); however, that would involve a lot of overhead, so I personally wouldn't do it.
For (a), all you have to do is replace
nacme_pool = multiprocessing.Pool()
nacme_pool.map(par_nacmes, [nacme_input for nacme_input in nacmes_data])
in run()
with
from threading import Thread

threads = []
for nacme_input in nacmes_data:
    t = Thread(target=par_nacmes, args=(nacme_input,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
or, if you don't care whether the threads have finished or not:
for nacme_input in nacmes_data:
    t = Thread(target=par_nacmes, args=(nacme_input,))
    t.start()
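If you want to keep the map-style call from the original code, a thread-backed pool also sidesteps the daemonic-process restriction; a minimal sketch using multiprocessing.pool.ThreadPool (a thread-based counterpart of Pool with the same interface):

from multiprocessing.pool import ThreadPool

# Inside run(), instead of multiprocessing.Pool():
with ThreadPool(processes=3) as nacme_pool:       # three worker threads, one per child calculation
    nacme_pool.map(par_nacmes, nacmes_data)       # blocks until all three children have finished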
I'm performing analyses of time series from simulations. Basically, it's doing the same tasks for every time step. As there is a very high number of time steps, and as the analysis of each of them is independent, I wanted to create a function that can multiprocess another function. The latter will have arguments and return a result.
Using a shared dictionary and the concurrent.futures library, I managed to write this:
import multiprocessing as mlp
import concurrent.futures as Cfut

def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
    # function : function that is run in parallel
    # param_list : list of items
    # group_size : size of the groups
    # Nworkers : number of groups/items running at the same time
    # *args : fixed parameters passed through to every call
    # grouper() splits param_list into chunks of group_size (not shown here)
    manager = mlp.Manager()
    dic = manager.dict()
    executor = Cfut.ProcessPoolExecutor(Nworkers)
    futures = [executor.submit(function, param, dic, *args)
               for param in grouper(param_list, group_size)]
    Cfut.wait(futures)
    return [dic[i] for i in sorted(dic.keys())]
Typically, I can use it like this :
def read_file(files, dictionnary):
    for file in files:
        i = int(file[4:9])
        #print(str(i))
        if 'bz2' in file:
            os.system('bunzip2 ' + file)
            file = file[:-4]
        dictionnary[i] = np.loadtxt(file)
        os.system('bzip2 ' + file)

Map = np.array(multiprocess_loop_grouped(read_file, list_alti, Group_size, N_thread))
or like this :
def autocorr(x):
    result = np.correlate(x, x, mode='full')
    return result[result.size//2:]

def find_lambda_finger(indexes, dic, Deviation):
    for i in indexes:
        #print(str(i))
        # Beach = Deviation[i,:] - np.mean(Deviation[i,:])
        dic[i] = Anls.find_first_max(autocorr(Deviation[i,:]), valmax=True)

args = [Deviation]
Temp = Rescal.multiprocess_loop_grouped(find_lambda_finger, range(Nalti), Group_size, N_thread, *args)
Basically, it is working. But it is not working well: sometimes it crashes, sometimes it launches a number of Python processes equal to Nworkers, and sometimes only 2 or 3 of them are running at a time even though I specified Nworkers = 15.
For example, a classic error I obtain is described in the following topic I raised: Calling matplotlib AFTER multiprocessing sometimes results in error : main thread not in main loop
What is the most Pythonic way to achieve what I want? How can I improve the control of this function? How can I better control the number of running Python processes?
One of the basic concepts of Python multiprocessing is using queues. It works quite well when you have an input list that can be iterated and which does not need to be altered by the sub-processes. It also gives you good control over all the processes, because you spawn exactly the number you want, and you can run them idle or stop them.
It is also a lot easier to debug. Sharing data explicitly is usually an approach that is much more difficult to set up correctly.
Queues can hold almost anything, so you can fill them with file-path strings for reading files, numbers for doing calculations, or even images for drawing.
In your case a layout could look like this:
import multiprocessing as mp
import numpy as np
import itertools as it

def worker1(in_queue, out_queue):
    # blocks when nothing is available, stops when 'STOP' is seen
    for a in iter(in_queue.get, 'STOP'):
        # do something
        out_queue.put({a: result})  # return your result linked to the input

def worker2(in_queue, out_queue):
    for a in iter(in_queue.get, 'STOP'):
        # do something differently
        out_queue.put({a: result})  # return your result linked to the input

def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
    # your final result
    result = {}

    in_queue = mp.Queue()
    out_queue = mp.Queue()

    # fill your input
    for a in param_list:
        in_queue.put(a)
    # stop command at end of input
    for n in range(Nworkers):
        in_queue.put('STOP')

    # set up your worker processes doing the task as specified
    process = [mp.Process(target=function,
                          args=(in_queue, out_queue), daemon=True) for x in range(Nworkers)]

    # run processes
    for p in process:
        p.start()

    # wait for processes to finish
    for p in process:
        p.join()

    # collect your results from the calculations
    for a in param_list:
        result.update(out_queue.get())

    return result

temp = multiprocess_loop_grouped(worker1, param_list, group_size, Nworkers, *args)
map = multiprocess_loop_grouped(worker2, param_list, group_size, Nworkers, *args)
It can be made a bit more dynamic if you are afraid that your queues will run out of memory. Then you need to fill and empty the queues while the processes are running. See this example here.
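A minimal sketch of that idea (illustrative only, using the same worker layout as above): bound the input queue so the parent never holds more than a fixed number of pending inputs, and drain the results before joining.

import multiprocessing as mp

def multiprocess_loop_bounded(function, param_list, Nworkers, max_items=100):
    in_queue = mp.Queue(maxsize=max_items)    # bounded: put() blocks while the queue is full
    out_queue = mp.Queue()
    procs = [mp.Process(target=function, args=(in_queue, out_queue), daemon=True)
             for _ in range(Nworkers)]
    for p in procs:
        p.start()

    for a in param_list:              # feed while the workers are already consuming
        in_queue.put(a)               # blocks here instead of exhausting memory
    for _ in range(Nworkers):
        in_queue.put('STOP')

    result = {}
    for _ in param_list:              # drain all results before joining
        result.update(out_queue.get())
    for p in procs:
        p.join()
    return result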
Final words: it is not more Pythonic as you requested. But it is easier to understand for a newbie ;-)
I have been struggling for a while with multiprocessing in Python. I would like to run 2 independent functions simultaneously, wait until both calculations are finished and then continue with the output of both functions. Something like this:

# Function A:
def jobA(num):
    result = num * 2
    return result

# Function B:
def jobB(num):
    result = num ^ 3
    return result

# Parallel process function:
{resultA, resultB} = runInParallel(jobA(num), jobB(num))

I found other examples of multiprocessing, however they either used only one function or didn't return an output. Does anyone know how to do this? Many thanks!
I'd recommend creating processes manually (rather than as part of a pool), and sending the return values to the main process through a multiprocessing.Queue. These queues can share almost any Python object in a safe and relatively efficient way.
Here's an example, using the jobs you've posted.
import multiprocessing as mp

def jobA(num, q):
    q.put(num * 2)

def jobB(num, q):
    q.put(num ^ 3)

if __name__ == '__main__':
    q = mp.Queue()
    jobs = (jobA, jobB)
    args = ((10, q), (2, q))
    for job, arg in zip(jobs, args):
        mp.Process(target=job, args=arg).start()
    for i in range(len(jobs)):
        print('Result of job {} is: {}'.format(i, q.get()))
This prints out:
Result of job 0 is: 20
Result of job 1 is: 1
But you can of course do whatever further processing you'd like using these values.
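One caveat worth adding (not part of the original answer): q.get() returns results in whatever order the two jobs happen to finish, so the printed job index is not guaranteed to match the job that produced the value. If that matters, a small variation tags each result with its job name:

import multiprocessing as mp

def jobA(num, q):
    q.put(('jobA', num * 2))    # tag the result with the job that produced it

def jobB(num, q):
    q.put(('jobB', num ^ 3))

if __name__ == '__main__':
    q = mp.Queue()
    for job, num in ((jobA, 10), (jobB, 2)):
        mp.Process(target=job, args=(num, q)).start()
    results = dict(q.get() for _ in range(2))
    print(results)              # {'jobA': 20, 'jobB': 1} regardless of completion order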
I am using the cexprtk wrapper in Python to evaluate arithmetic expressions, as it offers very fast evaluation compared to the standard eval(). For a large list of expressions the initial overhead is cumbersome, as it has to compile all the terms, which can take a long time.
However, it offers a very nice feature whereby you only need to compile once and can then re-evaluate the expressions using different values for the variables later on, which is what I want to do.
I was wondering if it is possible to apply Python multiprocessing to this compilation step. I would break the large list of arithmetic expressions into sub-lists and feed them separately into functions which apply the cexprtk compilation to the different lists. These could then be run in parallel.
I attempted to do this, but the output is nan whatever I try. Here is a very simple example showing a working cexprtk code without multiprocessing:
import cexprtk

st = cexprtk.Symbol_Table({"W":1, "X":3, "Y":1, "Z":2}, add_constants=True)
L = ['W+X+Y+Z', 'Y^2*W+Z']
A = [cexprtk.Expression(x, st) for x in L]
print(A[0]())   ## This gives 7, which is correct
print(A[1]())   ## This gives 3, which is correct
Now here is the attempt at using multiprocessing with two lists and two queues:
from multiprocessing import Process, Queue
import cexprtk

st = cexprtk.Symbol_Table({"W":1, "X":3, "Y":1, "Z":2}, add_constants=True)

## Define two lists
L = ['W+X+Y+Z', 'Y^2*W+Z']
L2 = ['W^5+Z-Y', 'Y^7+20-X']

## Define functions and put results into queues (que)
def myfunc1(que):
    lst1 = [cexprtk.Expression(x, st) for x in L]
    que.put(lst1)

def myfunc2(que):
    lst2 = [cexprtk.Expression(x, st) for x in L2]
    que.put(lst2)

queue1 = Queue()
queue2 = Queue()

p1 = Process(target=myfunc1, args=(queue1,))
p2 = Process(target=myfunc2, args=(queue2,))

p1.start()
p2.start()

ans1 = queue1.get()
ans2 = queue2.get()

print(ans1[0]())  # Gives nan
print(ans2[0]())  # Gives nan
I feel as though this falls into the category of "embarrassingly parallel problems", as the lists are completely separate and no communication is needed between the processes. I have used this exact method of multiprocessing before with great success, but in this instance it is not giving an answer, and as there are no error messages I have no error feedback to work with.
If you use eval() instead, it works without issue, so I assume it is the cexprtk wrapper. Is there a way to achieve what I am after? Or is the Python -> C++ -> Python round trip too much for multiprocessing?