Python Multiprocessing: Fastest way to signal an event to all processes?

I'm doing a Monte Carlo simulation with multiple processes using Python's multiprocessing library. The processes repeatedly guess some object, and if it meets some condition it is added to a shared list. My calculation is finished once this list meets some condition.
My current code looks like this (pseudocode without unimportant details):
mgr = Manager()
ns = mgr.Namespace()
ns.mylist = []
ns.othersharedstuff = x
killsig = mgr.Event()
processes = [MyProcess(ns, killsig) for _ in range(8)]
for p in processes: p.start()
for p in processes: p.join()
get data from ns.mylist()

def MyProcess.run(self):
    localdata = y
    while not killsig.is_set():
        x = guessObject()
        if x.meetsCondition():
            add x to ns.mylist and put local data into ns()
            if ns.mylist meets condition:
                killsig.set()
    put local data into ns()
When I replace 'while not killsig.is_set():' with 'while True:', the speed of my simulation increases by about 25%! (Except it no longer terminates, of course.)
Is there a faster way than using signals? It is not important if the unsynchronized local data of each process is lost, so something involving process.terminate() would be fine too.

Since you've got the original process that has a list of all your subprocesses, why not use that to terminate the processes? I'm picturing something like this:
ns.othersharedstuff = x
killsig = mgr.Event()
processes = [MyProcess(ns, killsig) for _ in range(8)]
for p in processes: p.start()

while not killsig.is_set():
    time.sleep(0.01)  # 10 milliseconds

for p in processes: p.terminate()
get data from ns.mylist()
Then you can simply change the worker's loop to a plain 'while True:', since the workers no longer need to check the event themselves.
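Putting this together, a minimal runnable sketch might look like the following. The guess and stop condition (a random threshold and TARGET) are hypothetical stand-ins for the asker's guessObject()/meetsCondition() logic, and the unsynchronized list update is deliberately kept simple since the asker said lost local data is acceptable:

import os
import random
import time
from multiprocessing import Process, Manager

TARGET = 20  # hypothetical stop condition: collect 20 hits

class MyProcess(Process):
    def __init__(self, ns, killsig):
        super().__init__()
        self.ns = ns
        self.killsig = killsig

    def run(self):
        random.seed(os.getpid())                 # avoid identical streams after fork
        while True:                              # no per-iteration event check
            x = random.random()                  # stands in for guessObject()
            if x > 0.999:                        # stands in for meetsCondition()
                # rebinding the attribute pushes the update through the proxy;
                # this read-modify-write is not atomic, but losses are tolerated here
                self.ns.mylist = self.ns.mylist + [x]
                if len(self.ns.mylist) >= TARGET:
                    self.killsig.set()

if __name__ == '__main__':
    mgr = Manager()
    ns = mgr.Namespace()
    ns.mylist = []
    killsig = mgr.Event()

    processes = [MyProcess(ns, killsig) for _ in range(8)]
    for p in processes:
        p.start()

    while not killsig.is_set():                  # cheap polling in the parent only
        time.sleep(0.01)

    for p in processes:
        p.terminate()
    print(len(ns.mylist), 'results collected')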

Related

Queue (multiprocessing) where you can get each value twice

Is there any option to have a multiprocessing Queue where each value can be accessed twice?
My problem is that I have one "generator process" creating a constant flux of data, and I would like to access it in two different processes, each doing its own thing with the data.
A minimal example of the issue:
import multiprocessing as mp
import numpy as np


class Process1(mp.Process):
    def __init__(self, Data_Queue):
        mp.Process.__init__(self)
        self.Data_Queue = Data_Queue

    def run(self):
        while True:
            self.Data_Queue.get()
            # Do stuff with the data
            self.Data_Queue.task_done()


class Process2(mp.Process):
    def __init__(self, Data_Queue):
        mp.Process.__init__(self)
        self.Data_Queue = Data_Queue

    def run(self):
        while True:
            self.Data_Queue.get()
            # Do stuff with the data
            self.Data_Queue.task_done()


if __name__ == "__main__":
    data_Queue = mp.JoinableQueue()  # task_done() requires a JoinableQueue

    P1 = Process1(data_Queue)
    P1.start()

    P2 = Process2(data_Queue)
    P2.start()

    while True:  # Generate data
        data_Queue.put(np.random.rand(1000))
The idea is that both Process1 and Process2 should see all of the generated data. With this setup, each one only gets some random portion of it.
Thanks for the help!
Update 1: As pointed out in some of the comments and answers, this becomes a little more complicated for two reasons I did not include in the initial question:
The data is generated externally on a non-constant schedule (I may receive tons of data for a few seconds, then wait minutes for more to come).
As such, data may arrive faster than it can be processed, so it needs to be queued up while it waits for its turn to be processed.
One way to solve your problem is, first, to use multiprocessing.Array to share, say, a numpy array with your data between worker processes. Second, use a multiprocessing.Barrier to synchronize the main process and the workers when generating and processing data batches. Finally, give each worker process its own queue to signal it when the next data batch is ready for processing. Below is a complete working example, just to show the idea:
#!/usr/bin/env python3

import os
import time
import ctypes
import multiprocessing as mp

import numpy as np

WORKERS = 5
DATA_SIZE = 10
DATA_BATCHES = 10


def process_data(data, queue, barrier):
    proc = os.getpid()
    print(f'[WORKER: {proc}] Started')

    while True:
        data_batch = queue.get()

        if data_batch is None:
            break

        arr = np.frombuffer(data.get_obj())
        print(f'[WORKER: {proc}] Started processing data {arr}')
        time.sleep(np.random.randint(0, 2))
        print(f'[WORKER: {proc}] Finished processing data {arr}')

        barrier.wait()

    print(f'[WORKER: {proc}] Finished')


def generate_data_array(i):
    print(f'[DATA BATCH: {i}] Start generating data... ', end='')
    time.sleep(np.random.randint(0, 2))
    data = np.random.randint(0, 10, size=DATA_SIZE)
    print(f'Done! {data}')
    return data


if __name__ == '__main__':
    data = mp.Array(ctypes.c_double, DATA_SIZE)
    data_barrier = mp.Barrier(WORKERS + 1)
    workers = []

    # Start workers:
    for _ in range(WORKERS):
        data_queue = mp.Queue()
        p = mp.Process(target=process_data, args=(data, data_queue, data_barrier))
        p.start()
        workers.append((p, data_queue))

    # Generate data batches in the main process:
    for i in range(DATA_BATCHES):
        arr = generate_data_array(i + 1)
        data_arr = np.frombuffer(data.get_obj())
        np.copyto(data_arr, arr)

        for _, data_queue in workers:
            # Signal workers that the new data batch is ready:
            data_queue.put(True)

        data_barrier.wait()

    # Stop workers:
    for worker, data_queue in workers:
        data_queue.put(None)
        worker.join()
Here, you start with the definition of the shared data array data and the barrier data_barrier used for process synchronization. Then, in the loop, you instantiate a queue data_queue and create and start a worker process p, passing the shared data array, the queue instance, and the shared barrier instance data_barrier as its parameters. Once the workers have been started, you generate data batches in a loop, copy each generated numpy array into the shared data array, and signal the processes via their queues that the next data batch is ready for processing. Then you wait on the barrier until all the worker processes have finished their work before generating the next data batch. In the end, you send a None signal to all the processes to make them quit the infinite processing loop.

setting cpu affinity using multiprocessing in python windows

I am trying to run 1200 iterations of a function with different values using multiprocessing.
Is there a way I can set the priority and affinities of the processes within the function itself?
Here is an example of what I am doing:
with multiprocessing.Pool(processes=3) as pool:
    r = pool.map(func, (c for c in combinations))
I want each of the 3 processes to have high priority using psutil, and the cpu_affinity to be specified. While I can use psutil.Process().HIGH_PRIORITY_CLASS within func, how should I specify different affinities for the three processes?
I would use the initializer function in mp.Pool:
# same prio for each child
def init_priority(prio_level):
    set_prio(prio_level)

if __name__ == "__main__":
    with Pool(nprocs, init_priority, (prio_level,)) as p:
        p.map(...)

# Different prio for each child (this may not be very useful
# because you cannot choose which child will accept each "task"):
def init_priority(q):
    prio_level = q.get()
    set_prio(prio_level)

if __name__ == "__main__":
    q = mp.Queue()
    for _ in range(nprocs):  # put one prio_level for each process
        q.put(prio_level)
    with Pool(nprocs, init_priority, (q,)) as p:
        p.map(...)
If you need some high-priority child processes and some low-priority ones, and need to be able to discern easily between them, I would skip mp.Pool and just use your own Process objects.
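For completeness, here is one possible sketch of what set_prio() could look like using psutil. This is not part of the original answer: the priority constant is Windows-specific, and the core list is just an illustration.

import psutil

def set_prio(prio_level, cores=None):
    """Set priority (and optionally CPU affinity) of the calling worker process."""
    p = psutil.Process()           # the current (child) process
    p.nice(prio_level)             # e.g. psutil.HIGH_PRIORITY_CLASS on Windows
    if cores is not None:
        p.cpu_affinity(cores)      # e.g. [0] pins this worker to core 0

# Example initializer pinning every pool worker to cores 0-2 at the given priority:
def init_priority(prio_level):
    set_prio(prio_level, cores=[0, 1, 2])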

Function that multiprocesses another function

I'm performing analyses of time series from simulations. Basically, the same tasks are done for every time step. As there is a very high number of time steps, and as the analysis of each of them is independent, I wanted to create a function that can multiprocess another function. The latter will have arguments and return a result.
Using a shared dictionary and the concurrent.futures library, I managed to write this:
import multiprocessing as mlp
import concurrent.futures as Cfut


def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
    # function   : function that is run in parallel
    # param_list : list of items
    # group_size : size of the groups
    # Nworkers   : number of groups/items running at the same time
    # *args      : fixed parameters passed to every call
    manager = mlp.Manager()
    dic = manager.dict()

    executor = Cfut.ProcessPoolExecutor(Nworkers)
    futures = [executor.submit(function, param, dic, *args)
               for param in grouper(param_list, group_size)]
    Cfut.wait(futures)
    return [dic[i] for i in sorted(dic.keys())]
Typically, I can use it like this :
def read_file(files, dictionnary):
    for file in files:
        i = int(file[4:9])
        # print(str(i))
        if 'bz2' in file:
            os.system('bunzip2 ' + file)
            file = file[:-4]
        dictionnary[i] = np.loadtxt(file)
        os.system('bzip2 ' + file)

Map = np.array(multiprocess_loop_grouped(read_file, list_alti, Group_size, N_thread))
or like this :
def autocorr(x):
    result = np.correlate(x, x, mode='full')
    return result[result.size//2:]

def find_lambda_finger(indexes, dic, Deviation):
    for i in indexes:
        # print(str(i))
        # Beach = Deviation[i,:] - np.mean(Deviation[i,:])
        dic[i] = Anls.find_first_max(autocorr(Deviation[i,:]), valmax=True)

args = [Deviation]
Temp = Rescal.multiprocess_loop_grouped(find_lambda_finger, range(Nalti), Group_size, N_thread, *args)
Basically, it works, but not well. Sometimes it crashes. Sometimes it actually launches a number of Python processes equal to Nworkers, and sometimes only 2 or 3 of them are running at a time even though I specified Nworkers = 15.
For example, a classic error I obtain is described in the following topic I raised: Calling matplotlib AFTER multiprocessing sometimes results in error: main thread not in main loop
What is the most Pythonic way to achieve what I want? How can I improve my control over this function? How can I better control the number of running Python processes?
One of the basic concepts for Python multiprocessing is using queues. It works quite well when you have an input list that can be iterated over and which does not need to be altered by the sub-processes. It also gives you good control over all the processes, because you spawn the number you want, and you can run them idle or stop them.
It is also a lot easier to debug. Sharing data explicitly is usually an approach that is much more difficult to set up correctly.
Queues can hold almost anything: you can fill them with file path strings for reading files, numbers for doing calculations, or even images for drawing.
In your case, a layout could look like this:
import multiprocessing as mp
import numpy as np
import itertools as it


def worker1(in_queue, out_queue):
    # blocks when nothing is available, stops when 'STOP' is seen
    for a in iter(in_queue.get, 'STOP'):
        # do something
        out_queue.put({a: result})  # return your result linked to the input


def worker2(in_queue, out_queue):
    for a in iter(in_queue.get, 'STOP'):
        # do something differently
        out_queue.put({a: result})  # return your result linked to the input


def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
    # your final result
    result = {}

    in_queue = mp.Queue()
    out_queue = mp.Queue()

    # fill your input
    for a in param_list:
        in_queue.put(a)
    # stop command at end of input
    for n in range(Nworkers):
        in_queue.put('STOP')

    # set up your worker processes doing the task as specified
    processes = [mp.Process(target=function,
                            args=(in_queue, out_queue), daemon=True)
                 for x in range(Nworkers)]

    # run processes
    for p in processes:
        p.start()

    # collect your results from the calculations
    # (drain out_queue before joining, so no worker blocks on a full queue)
    for a in param_list:
        result.update(out_queue.get())

    # wait for processes to finish
    for p in processes:
        p.join()

    return result


temp = multiprocess_loop_grouped(worker1, param_list, group_size, Nworkers, *args)
map = multiprocess_loop_grouped(worker2, param_list, group_size, Nworkers, *args)
It can be made a bit more dynamic if you are afraid that your queues will run out of memory. Then you need to fill and empty the queues while the processes are running. See this example here.
Final words: it is not more Pythonic, as you requested, but it is easier to understand for a newbie ;-)
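As a concrete, self-contained illustration of this queue layout, here is a minimal sketch. The square_worker and the small parameter list are hypothetical stand-ins for the asker's analysis function and data:

import multiprocessing as mp


def square_worker(in_queue, out_queue):
    # blocks on get(); stops when the 'STOP' sentinel is seen
    for a in iter(in_queue.get, 'STOP'):
        out_queue.put({a: a * a})


if __name__ == '__main__':
    param_list = list(range(20))
    Nworkers = 4

    in_queue = mp.Queue()
    out_queue = mp.Queue()

    # fill the input queue, then add one sentinel per worker
    for a in param_list:
        in_queue.put(a)
    for _ in range(Nworkers):
        in_queue.put('STOP')

    workers = [mp.Process(target=square_worker, args=(in_queue, out_queue), daemon=True)
               for _ in range(Nworkers)]
    for p in workers:
        p.start()

    # drain the output queue before joining so no worker blocks on a full queue
    result = {}
    for _ in param_list:
        result.update(out_queue.get())
    for p in workers:
        p.join()

    print([result[i] for i in sorted(result)])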

Multiprocessing a loop inside a loop inside a function

I wrote some code to break up a for loop into multiple processes to speed up calculations.
import numpy as np
import formfactors
from subdivide_loop import subdivide_loop
import multiprocessing


def worker(start, end, triangleI, areaI, scene, kdtree, samples, output):
    form_factors = np.zeros(end - start)
    for j in range(start, end):
        triangleJ = np.array(scene[j][0:4])
        form_factors[j - start] = formfactors.uniform(triangleJ, triangleI, areaI, kdtree, samples)

    result = output.get(block=True)
    for j in range(start, end):
        result[j] = form_factors[j - start]
    output.put(result)


def calculate_formfactors(start, end, triangleI, areaI, scene, kdtree, samples, output, nb_processes,
                          max_interval_length):
    intervals = subdivide_loop(start, end, max_interval_length, nb_processes)
    print("start")
    jobs = []
    for k in range(nb_processes):
        p = multiprocessing.Process(target=worker,
                                    args=(intervals[k][0], intervals[k][1], triangleI, areaI,
                                          scene, kdtree, samples, output))
        jobs.append(p)
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    results = output.get()
    return results
I would like to be able to call calculate_formfactors() inside a function inside a loop, like this:
def outer_function():
    for i in range(1000):
        for j in range(i + 1, 1000, max_interval_length):
            form_factors = calculate_formfactors(args)
But running this gives an error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
Because of how the outer function works, breaking up outer_function() instead of calculate_formfactors() is not possible.
So, any advice on how to do this?
As the error suggests, make sure your outer_function() (or whatever initiates it) is called from within an __main__ guard, e.g.
if __name__ == "__main__":
    outer_function()
It doesn't have to be the outer_function() but you need to trace it all back to the first step that initializes the chain that ultimately leads to the call to multiprocessing.Process() and put it within the above block.
This is because on non-forking systems child processes are started by importing the main module again, so code at the top level of the script that creates new processes would end up in infinite recursion / process spawning. You can read more about it in this answer. Because of that, you have to make sure your multiprocessing initialization code executes only once, which is where the __main__ guard comes in.
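A minimal module skeleton of what this looks like in practice; the function names mirror the question, but the bodies are simplified placeholders:

import multiprocessing


def worker(start, end, output):
    # placeholder for the real per-interval computation
    output.put(list(range(start, end)))


def calculate_formfactors(start, end, output, nb_processes=2):
    jobs = [multiprocessing.Process(target=worker, args=(start, end, output))
            for _ in range(nb_processes)]
    for p in jobs:
        p.start()
    results = [output.get() for _ in jobs]  # drain before joining
    for p in jobs:
        p.join()
    return results


def outer_function():
    output = multiprocessing.Queue()
    for i in range(3):
        print(calculate_formfactors(i, i + 5, output))


# Everything that actually starts processes hangs off this guard,
# so re-importing the module in a spawned child does not re-run it.
if __name__ == "__main__":
    outer_function()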

Python Multiprocessing speed issue

I have a nested for loop of the form
while x<lat2[0]:
while y>lat3[1]:
if (is_inside_nepal([x,y])):
print("inside")
else:
print("not")
y = y - (1/150.0)
y = lat2[1]
x = x + (1/150.0)
#here lat2[0] represents a large number
This normally takes around 50 s to execute.
I have changed this loop to the following multiprocessing code:
def v1find_coordinates(q):
    while not q.empty():
        x1 = q.get()
        x2 = x1 + incfactor
        while x1 < x2:
            def func(x1):
                while y > lat3[1]:
                    if is_inside([x1, y]):
                        print x1, y, "inside"
                    else:
                        print x1, y, "not inside"
                    y = y - (1/150.0)
            func(x1)
            y = lat2[1]
            x1 = x1 + (1/150.0)

incfactor = 0.7
xvalues = drange(x, lat2[0], incfactor)
# this drange function returns a list with a decimal increment factor

cores = mp.cpu_count()
q = Queue()

for i in xvalues:
    q.put(i)

for i in range(0, cores):
    p = Process(target=v1find_coordinates, args=(q,))
    p.daemon = True
    p.start()
    processes.append(p)

for i in processes:
    print("now joining")
    i.join()
This multiprocessing code also takes around 50 s of execution time, which means there is no time difference between the two.
I have also tried using pools and managing the chunk size, and I have googled and searched through other Stack Overflow questions, but I can't find any satisfying answer.
The only answer I could find was that the time is taken up by process management, which makes both results the same. If this is the reason, how can I get multiprocessing to produce faster results?
Will implementing it in C from Python give faster results?
I am not expecting drastic results, but by common sense one can tell that running on 4 cores should be a lot faster than running on 1 core. Instead I am getting similar results. Any kind of help would be appreciated.
You seem to be using a thread Queue (from Queue import Queue). This does not work as expected: Process uses fork(), which clones the entire Queue into each worker process, so every worker ends up with its own private copy instead of a shared queue.
Use:
from multiprocessing import Queue
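A minimal sketch of the corrected setup follows, with a trivial stand-in for the real per-item work. The key point is that a multiprocessing.Queue is shared between the workers, so each x value is processed exactly once; instead of polling q.empty(), which is unreliable with multiple consumers, this sketch uses one None sentinel per worker to end the loops.

from multiprocessing import Process, Queue, cpu_count


def find_coordinates(q):
    # blocking get(); each worker stops when it sees the None sentinel
    for x in iter(q.get, None):
        # stand-in for the real inside/outside test over the y range
        print(x, x * x)


if __name__ == '__main__':
    q = Queue()
    for i in range(100):
        q.put(i / 150.0)

    n_workers = cpu_count()
    for _ in range(n_workers):
        q.put(None)  # one sentinel per worker

    processes = [Process(target=find_coordinates, args=(q,)) for _ in range(n_workers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()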
