I'm currently working on Windows in a Jupyter notebook and have been struggling to get multiprocessing to work. It does not run all my async calls in parallel; it runs them one at a time. I need to put the results into variables for future use. Where am I going wrong, and what am I not understanding?
import multiprocessing as mp
import cylib
Pool = mp.Pool(processes=4)
result1 = Pool.apply_async(cylib.f, [v]) # evaluate asynchronously
result2 = Pool.apply_async(cylib.f, [x]) # evaluate asynchronously
result3 = Pool.apply_async(cylib.f, [y]) # evaluate asynchronously
result4 = Pool.apply_async(cylib.f, [z]) # evaluate asynchronously
vr = result1.get(timeout=420)
xr = result2.get(timeout=420)
yr = result3.get(timeout=420)
zr = result4.get(timeout=420)
The tasks are executing in parallel.
However, this is fetching the results synchronously, i.e. "wait until result1 is ready, then wait until result2 is ready", and so on:
vr = result1.get(timeout=420)
xr = result2.get(timeout=420)
yr = result3.get(timeout=420)
zr = result4.get(timeout=420)
Consider the following example code, where each task is polled asynchronously
from time import sleep
import multiprocessing as mp
pool = mp.Pool(processes=4)
# Create tasks with longer wait first
tasks = {i: pool.apply_async(sleep, [t]) for i, t in enumerate(reversed(range(3)))}
done = set()
# Keep polling until all tasks complete
while len(done) < len(tasks):
    for i, t in tasks.items():
        # Skip completed tasks
        if i in done:
            continue
        result = None
        try:
            result = t.get(timeout=0)
        except mp.TimeoutError:
            pass
        else:
            print("Task #:{} complete".format(i))
            done.add(i)
You can replicate something like the above or use the callback argument on apply_async to perform some handling automatically as tasks complete.
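For instance, here is a minimal sketch of the callback approach, reusing sleep as a stand-in for real work (the on_done helper and the use of functools.partial are illustrative, not part of the original code):

from time import sleep
from functools import partial
import multiprocessing as mp

def on_done(task_id, result):
    # Called back in the parent process (in a result-handler thread) as each task finishes.
    # sleep() returns None, so `result` is None here; a real worker would return useful data.
    print("Task #{} complete".format(task_id))

if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    for i, t in enumerate(reversed(range(3))):
        pool.apply_async(sleep, [t], callback=partial(on_done, i))
    pool.close()
    pool.join()  # all callbacks have fired by the time join() returns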
I'm currently setting up an automated simulation pipeline for OpenFOAM (a CFD library) using the PyFoam library within Python to create a large database for machine learning purposes. The database will have around 500k distinct simulations. To run this pipeline on multiple machines, I'm using the multiprocessing.Pool.starmap_async(args) option, which will continually start a new simulation once the old simulation has completed.
However, since some of the simulations might (or will) crash, I want to generate a text file listing all cases which have crashed.
I've already found this thread, which implements multiprocessing.Manager.Queue() and adds a listener, but I failed to get it running with starmap_async(). For my testing I'm trying to print the case name for any simulation which has completed, but currently only one entry is written into the text file instead of all of them (the simulations all complete successfully).
So my question is: how can I write a message to a file for each simulation which has completed?
The current code layout looks roughly like this - only the important snippet has been included, as the remaining code can't be run without OpenFOAM and additional custom scripts which were created for the automation.
Any help is highly appreciated! :)
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Execution.ParallelExecution import LAMMachine
import numpy as np
import multiprocessing
import itertools
import psutil

# Defining global variables
manager = multiprocessing.Manager()
queue = manager.Queue()

def runCase(airfoil, angle, velocity):
    # define simulation name
    newCase = str(airfoil) + "_" + str(angle) + "_" + str(velocity)
    '''
    A lot of pre-processing commands to prepare the simulation,
    which have been removed from this snippet, such as generate geometry, create mesh etc...
    '''
    # run simulation
    machine = LAMMachine(nr=4)  # set number of cores for parallel execution
    simulation = BasicRunner(argv=[solver, "-case", case.name], silent=True, lam=machine, logname="solver")
    simulation.start()  # start simulation
    # check if simulation has completed
    if simulation.runOK():
        # write message into queue
        queue.put(newCase)
    if not simulation.runOK():
        print("Simulation did not run successfully")

def listener(queue):
    fname = 'errors.txt'
    msg = queue.get()
    while True:
        with open(fname, 'w') as f:
            if msg == 'complete':
                break
            f.write(str(msg) + '\n')

def main():
    # Create parameter list
    angles = np.arange(-5, 0, 1)
    machs = np.array([0.15])
    nacas = ['0012']
    paramlist = list(itertools.product(nacas, angles, np.round(machs, 9)))

    # create number of processes and keep 2 cores idle for other processes
    nCores = psutil.cpu_count(logical=False) - 2
    nProc = 4
    nProcs = int(nCores / nProc)

    with multiprocessing.Pool(processes=nProcs) as pool:
        pool.apply_async(listener, (queue,))           # start the listener
        pool.starmap_async(runCase, paramlist).get()   # run parallel simulations
        queue.put('complete')
        pool.close()
        pool.join()

if __name__ == '__main__':
    main()
First, when your with multiprocessing.Pool(processes=nProcs) as pool: block exits, there is an implicit call to pool.terminate(), which will kill all pool processes and with them any running or queued-up tasks. There is no point in calling queue.put('complete') since nobody is listening.
Second, your 'listener' task gets only a single message from the queue. If it is 'complete', it terminates immediately. If it is anything else, it just loops continuously, writing the same message to the output file. This cannot be right, can it? Did you forget an additional call to queue.get() in your loop?
Third, I do not quite follow your computation for nProcs. Why the division by 4? If you had 5 physical processors nProcs would be computed as 0. Do you mean something like:
nProcs = psutil.cpu_count(logical=False) // 4
if nProcs == 0:
    nProcs = 1
elif nProcs > 1:
    nProcs -= 1  # Leave a core free
Fourth, why do you need a separate "listener" task at all? Have your runCase task return whatever message is appropriate for the way it completed back to the main process. In the code below, multiprocessing.pool.Pool.imap is used so that the results can be processed in the main process as the tasks complete:
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Execution.ParallelExecution import LAMMachine
import numpy as np
import multiprocessing
import itertools
import psutil

def runCase(tpl):
    # Unpack tuple:
    airfoil, angle, velocity = tpl
    # define simulation name
    newCase = str(airfoil) + "_" + str(angle) + "_" + str(velocity)
    ...  # Code omitted for brevity
    # check if simulation has completed
    if simulation.runOK():
        return ''  # No error
    # Simulation did not run successfully:
    return f"Simulation {newCase} did not run successfully"

def main():
    # Create parameter list
    angles = np.arange(-5, 0, 1)
    machs = np.array([0.15])
    nacas = ['0012']
    # There is no reason to convert this into a list; it
    # can be lazily computed:
    paramlist = itertools.product(nacas, angles, np.round(machs, 9))

    # create number of processes and keep 1 core idle for the main process
    nCores = psutil.cpu_count(logical=False) - 1
    nProc = 4
    nProcs = max(int(nCores / nProc), 1)  # make sure we get at least one pool process

    with multiprocessing.Pool(processes=nProcs) as pool:
        with open('errors.txt', 'w') as f:
            # Process message results as soon as each task ends.
            # Use method imap_unordered if you do not care about the order
            # of the messages in the output.
            # imap passes each parameter set as a single tuple argument,
            # which runCase unpacks itself:
            for msg in pool.imap(runCase, paramlist):
                if msg != '':  # Error completion
                    print(msg)
                    print(msg, file=f)
        pool.close()
        pool.join()  # Not really necessary here

if __name__ == '__main__':
    main()
1. I have a function var. I want to know the best possible way to run the loop within this function quickly via multiprocessing/parallel processing, utilizing all the processors, cores, threads and RAM the system has.
import numpy as np
from pysheds.grid import Grid

xs = 82.1206, 72.4542, 65.0431, 83.8056, 35.6744
ys = 25.2111, 17.9458, 13.8844, 10.0833, 24.8306

a = r'/home/test/image1.tif'
b = r'/home/test/image2.tif'

def var(interest):
    variable_avg = []
    for (x, y) in zip(xs, ys):
        grid = Grid.from_raster(interest, data_name='map')
        grid.catchment(data='map', x=x, y=y, out_name='catch')
        variable = grid.view('catch', nodata=np.nan)
        variable = np.array(variable)
        variablemean = variable.mean()
        variable_avg.append(variablemean)
    return variable_avg
2. It would be great if I could run both the function var and the loop inside it in parallel for the given multiple parameters of the function,
e.g. var(a) and var(b) at the same time, since that would consume much less time than just parallelizing the loop alone.
Ignore 2 if it does not make sense.
TLDR:
You can use the multiprocessing library to run your var function in parallel. However, as written you likely don't make enough calls to var for multiprocessing to have a performance benefit because of its overhead. If all you need to do is run those two calls, running in serial is likely the fastest you'll get. However, if you need to make a lot of calls, multiprocessing can help you out.
We'll need to use a process pool to run this in parallel; threads won't work here because Python's global interpreter lock prevents true parallelism. The drawback of process pools is that processes are heavyweight to spin up. In the example of just running two calls to var, the time to create the pool overwhelms the time spent running var itself.
To illustrate this, let's use a process pool and asyncio to run calls to var in parallel and compare it to just running things sequentially. Note that to run this example I used an image from the Pysheds repository (https://github.com/mdbartos/pysheds/tree/master/data); if your image is much larger, the results below may not hold true.
import functools
import time
from concurrent.futures.process import ProcessPoolExecutor
import asyncio
a = 'diem.tif'
xs = 10, 20, 30, 40, 50
ys = 10, 20, 30, 40, 50
async def main():
    loop = asyncio.get_event_loop()
    pool_start = time.time()
    with ProcessPoolExecutor() as pool:
        task_one = loop.run_in_executor(pool, functools.partial(var, a))
        task_two = loop.run_in_executor(pool, functools.partial(var, a))
        results = await asyncio.gather(task_one, task_two)
    pool_end = time.time()
    print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()
    result_one = var(a)
    result_two = var(a)
    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')

if __name__ == "__main__":
    asyncio.run(main())
Running the above on my machine (a 2.4 GHz 8-Core Intel Core i9) I get the following output:
Process pool took 1.7581260204315186
Running in serial took 0.32335805892944336
In this example, a process pool is over five times slower! This is due to the overhead of creating and managing multiple processes. That said, if you need to call var more than just a few times, a process pool may make more sense. Let's adapt this to run var 100 times and compare the results:
async def main():
    loop = asyncio.get_event_loop()
    pool_start = time.time()
    tasks = []
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            tasks.append(loop.run_in_executor(pool, functools.partial(var, a)))
        results = await asyncio.gather(*tasks)
    pool_end = time.time()
    print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()
    for _ in range(100):
        result = var(a)
    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')
Running 100 times, I get the following output:
Process pool took 3.442288875579834
Running in serial took 13.769982099533081
In this case, running in a process pool is about 4x faster. You may also wish to try running each iteration of your loop concurrently. You can do this by creating a function that processes one x,y coordinate at a time and then running each point you want to examine in a process pool:
def process_poi(interest, x, y):
    grid = Grid.from_raster(interest, data_name='map')
    grid.catchment(data='map', x=x, y=y, out_name='catch')
    variable = grid.view('catch', nodata=np.nan)
    variable = np.array(variable)
    return variable.mean()

async def var_loop_async(interest, pool, loop):
    tasks = []
    for (x, y) in zip(xs, ys):
        function_call = functools.partial(process_poi, interest, x, y)
        tasks.append(loop.run_in_executor(pool, function_call))
    return await asyncio.gather(*tasks)

async def main():
    loop = asyncio.get_event_loop()
    pool_start = time.time()
    tasks = []
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            tasks.append(var_loop_async(a, pool, loop))
        results = await asyncio.gather(*tasks)
    pool_end = time.time()
    print(f'Process pool took {pool_end-pool_start}')
    serial_start = time.time()
In this case I get Process pool took 3.2950568199157715 - so not really any faster than our first version with one process per call of var. This is likely because the limiting factor at this point is how many cores we have available on our CPU; splitting our work into smaller increments does not add much value.
That said, if you have 1000 x and y coordinates you wish to examine across two images, this last approach may yield a performance gain.
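For instance, a rough sketch of that case, assuming the var_loop_async helper above and that b is the path to your second image (as in your question), would simply fan both images' per-point tasks into the same pool:

async def main():
    loop = asyncio.get_event_loop()
    with ProcessPoolExecutor() as pool:
        # one batch of per-point tasks per image, sharing a single pool
        results_a, results_b = await asyncio.gather(
            var_loop_async(a, pool, loop),
            var_loop_async(b, pool, loop),
        )
    print(results_a, results_b)

if __name__ == "__main__":
    asyncio.run(main())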
I think this is a reasonable and straightforward way of speeding up your code by merely parallelizing only the main loop. You can saturate your cores with this, so there is no need to also parallelize over the interest variable. I can't test the code, so I assume that your function is correct; I have just moved the body of the loop into a new function and parallelized it inside var().
from multiprocessing import Pool
# Grid and np (numpy) are assumed to be imported as in the question

def var(interest, xs, ys):
    grid = Grid.from_raster(interest, data_name='map')
    with Pool(4) as p:  # uses 4 cores, adjust this as you need
        variable_avg = p.starmap(loop, [(x, y, grid) for x, y in zip(xs, ys)])
    return variable_avg

def loop(x, y, grid):
    grid.catchment(data='map', x=x, y=y, out_name='catch')
    variable = grid.view('catch', nodata=np.nan)
    variable = np.array(variable)
    return variable.mean()
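Usage would then look roughly like this (a sketch, reusing a, b, xs and ys as defined in the question; the __main__ guard matters because the pool is created inside var):

if __name__ == '__main__':
    a_avg = var(a, xs, ys)
    b_avg = var(b, xs, ys)
    print(a_avg, b_avg)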
I am running a multiprocessing pool in a for loop over chunks of data. It runs fine for two iterations and hangs on the third. If I reduce the size of each chunk, it hangs later, perhaps on the fourth or fifth iteration. In the program where I discovered the problem I am running a more extensive function, but this works to reproduce the error.
Is there a proper way to terminate a pool after it is finished, so that I can start it again?
import pandas as pd
import numpy as np
from multiprocess import Pool
df = pd.read_csv('paths.csv')
def do_something(user):
    v = df[df['userId'] == user]
    return v

if __name__ == '__main__':
    users = df['userId'].unique()
    n_chunks = round(len(users)/40)
    subsets = [users[i:i+n_chunks] for i in range(0, len(users), n_chunks)]
    chunk_counter = 0
    for user_subset in subsets:
        chunk_counter += 1
        print(f'Beginning to process chunk {chunk_counter}...')
        with Pool() as pool:
            frames = pool.map(do_something, user_subset)
            pool.close()
            pool.terminate()
        print(f'Completed processing chunk {chunk_counter}.')
I was able to prevent the hanging with the code below:
with Pool(maxtasksperchild=1) as pool:
    frames = pool.map_async(do_something, user_subset).get()
    pool.terminate()
    pool.join()
I don't understand why using map_async would prevent the hanging. I will dive deeper if I have a chance and update if I understand the reason.
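For context, the loop that stopped hanging looks roughly like this (a sketch of the structure only, not the full program):

for user_subset in subsets:
    chunk_counter += 1
    print(f'Beginning to process chunk {chunk_counter}...')
    with Pool(maxtasksperchild=1) as pool:
        frames = pool.map_async(do_something, user_subset).get()
        pool.terminate()
        pool.join()
    print(f'Completed processing chunk {chunk_counter}.')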
I am trying to chain several tasks together in ipyparallel, like:
import ipyparallel
client = ipyparallel.Client()
view = client.load_balanced_view()
def task1(x):
    ## Do some work.
    return x * 2

def task2(x):
    ## Do some work.
    return x * 3

def task3(x):
    ## Do some work.
    return x * 4
results1 = view.map_async(task1, [1, 2, 3])
results2 = view.map_async(task2, results1.get())
results3 = view.map_async(task3, results2.get())
However, this code won't submit any task2 until all of task1 is done, so it is essentially blocking. My tasks can take very different amounts of time, and this is very inefficient. Is there an easy way that I can chain these steps efficiently so the engines can get the results from previous steps? Something like:
def task2(x):
    ## Do some work.
    return x.get() * 3  ## Get AsyncResult out.

def task3(x):
    ## Do some work.
    return x.get() * 4  ## Get AsyncResult out.

results1 = [view.apply_async(task1, x) for x in [1, 2, 3]]

results2 = []
for x in results1:
    view.set_flags(after=x.msg_ids)
    results2.append(view.apply_async(task2, x))

results3 = []
for x in results2:
    view.set_flags(after=x.msg_ids)
    results3.append(view.apply_async(task3, x))
Apparently, this will fail, as an AsyncResult is not picklable.
I was considering a few solutions:
Use view.map_async(ordered=False).
results1 = view.map_async(task1, [1, 2, 3], ordered=False)
for x in results1:
    results2.append(view.apply_async(task2, x.get()))
But this has to wait for all task1 to finish before any task3 can be submitted. It is still blocking.
Use asyncio.
@asyncio.coroutine
def submitter(x):
    result1 = yield from asyncio.wrap_future(view.apply_async(task1, x))
    result2 = yield from asyncio.wrap_future(view.apply_async(task2, result1))
    result3 = yield from asyncio.wrap_future(view.apply_async(task3, result2))
    return result3

@asyncio.coroutine
def submit_all(ls):
    jobs = [submitter(x) for x in ls]
    results = []
    for async_r in asyncio.as_completed(jobs):
        r = yield from async_r
        results.append(r)
    ## Do some work, like analysing results.
It is working, but the code soon becomes messy and unintuitive when more complicated tasks are introduced.
Thank you for your help.
Option 1: chain futures
IPython parallel isn't the best at doing this because the connection has to be done at the client level: you have to wait for results to complete and return to the client before submitting new tasks that depend on them. Essentially, your asyncio submit_all is the right way to do it for IPython parallel. You can get something a little more generic by writing a chain function that uses add_done_callback to submit the new task when the previous one completes:
from concurrent.futures import Future
from functools import partial

def chain_apply(view, func, future):
    """Chain a call to view.apply(func, future.result()) when future is ready.

    Returns a Future for the subsequent result.
    """
    f2 = Future()
    # when `future` is ready, submit a new task for func on its result
    def apply_func(f):
        if f.exception():
            f2.set_exception(f.exception())
            return
        print('submitting %s(%s)' % (func.__name__, f.result()))
        ar = view.apply_async(func, f.result())
        # when ar is done, pass through the result to f2
        ar.add_done_callback(lambda ar: f2.set_result(ar.get()))
    future.add_done_callback(apply_func)
    return f2

def chain_map(view, func, list_of_futures):
    """Chain a new callback on a list of futures."""
    return [chain_apply(view, func, f) for f in list_of_futures]

# use builtin map with apply, since we want one Future per item
results1 = map(partial(view.apply, task1), [1, 2, 3])
results2 = chain_map(view, task2, results1)
results3 = chain_map(view, task3, results2)

print("Waiting for results")
[r.result() for r in results3]
As with any example of add_done_callback, it can be written with coroutines, but I find the callbacks in this case to be fine. This should at least be a fairly generic utility that you can use to compose your pipeline.
Option 2: dask.distributed
Full disclosure: I'm the primary author of IPython Parallel, about to suggest that you use a different tool.
It is possible to pass results from one task to another via engine namespaces and DAG dependencies in IPython parallel, but honestly, if your workflow looks like this, you should consider using dask distributed, which is designed specifically for this kind of computation graph. If you are already comfortable and familiar with IPython parallel, getting started with dask should not be too much of a burden.
IPython 5.1 provides a handy command for turning your IPython parallel cluster into a dask distributed cluster:
import ipyparallel as ipp
client = ipp.Client()
executor = client.become_distributed(ncores=1)
And then the key relevant feature of dask is that you can submit futures as arguments to subsequent map calls, and the scheduler takes care of it when the results are ready, rather than having to do it explicitly in the client:
results1 = executor.map(task1, [1, 2, 3])
results2 = executor.map(task2, results1)
results3 = executor.map(task3, results2)
executor.gather(results3)
So basically, dask distributed works how you wish IPython parallel's load-balancing would work when you need to chain things like this.
This notebook illustrates both examples.
I am trying to run a script with multiple threads to decrease the time taken by the script to complete.
I need to know how to implement multithreading in a program like this.
Script example:
def getnetworkdata():
data = ["somesite.com/1", "somesite.com/2", "somesite.com/3", "somesite.com/4"]
for url in data:
r = requests.get(url)
someOtherArray.append(r.text)
The threads should run such that the results end up in the same order as the required tasks.
The output I am expecting:
someOtherArray = [1, 2, 3, 4]
I am using Python 2.x
import requests
from multiprocessing.dummy import Pool as ThreadPool

data = ["somesite.com/1", "somesite.com/2", "somesite.com/3", "somesite.com/4"]
# Make the Pool of workers
pool = ThreadPool(4)
# Open the urls in their own threads
# and return the results
results = pool.map(requests.get, data)
someOtherArray = map( lambda x: x.text, results )
#close the pool and wait for the work to finish
pool.close()
pool.join()
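One caveat: this snippet targets Python 2, where map() returns a list. If it is ever ported to Python 3, map() returns a lazy iterator, so the results would need to be materialised explicitly, for example:

someOtherArray = [r.text for r in results]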