How to efficiently chain ipyparallel tasks and pass intermediate results to engines?

How to efficiently chain ipyparallel tasks and pass intermediate results to engines? - python

I am trying to chain several tasks together in iPyParallel, like
import ipyparallel
client = ipyparallel.Client()
view = client.load_balanced_view()
def task1(x):
## Do some work.
return x * 2
def task2(x):
## Do some work.
return x * 3
def task3(x):
## Do some work.
return x * 4
results1 = view.map_async(task1, [1, 2, 3])
results2 = view.map_async(task2, results1.get())
results3 = view.map_async(task3, results2.get())
However, this code won't submit any task2 unless task1 is done and is essentially blocking. My tasks can take different time and it is very inefficient. Is there an easy way that I can chain these steps efficiently and engines can get the results from previous steps? Something like:
def task2(x):
## Do some work.
return x.get() * 3 ## Get AsyncResult out.
def task3(x):
## Do some work.
return x.get() * 4 ## Get AsyncResult out.
results1 = [view.apply_async(task1, x) for x in [1, 2, 3]]
results2 = []
for x in result1:
view.set_flags(after=x.msg_ids)
results2.append(view.apply_async(task2, x))
results3 = []
for x in result2:
view.set_flags(after=x.msg_ids)
results3.append(view.apply_async(task3, x))
Apparently, this will fail as AsyncResult is not pickable.
I was considering a few solutions:
Use view.map_async(ordered=False).
results1 = view.map_async(task1, [1, 2, 3], ordered=False)
for x in results1:
results2.append(view.apply_async(task2, x.get()))
But this has to wait for all task1 to finish before any task3 can be submitted. It is still blocking.
Use asyncio.
#asyncio.coroutine
def submitter(x):
result1 = yield from asyncio.wrap_future(view.apply_async(task1, x))
result2 = yield from asyncio.wrap_future(view.apply_async(task2, result1)
result3 = yield from asyncio.wrap_future(view.apply_async(task3, result2)
yield result3
#asyncio.coroutine
def submit_all(ls):
jobs = [submitter(x) for x in ls]
results = []
for async_r in asyncio.as_completed(jobs):
r = yield from async_r
results.append(r)
## Do some work, like analysing results.
It is working, but the code soon become messy and unintuitive when more complicated tasks are introduced.
Thank you for your help.

Option 1: chain futures
IPython parallel isn't the best at doing this because the connection has to be done at the client level. You do have to wait for the results to complete and return to the client before submitting the results. Essentially, your asyncio submit_all is the right way to do it for IPython parallel. You can get something a little more generic by writing a chain function that uses add_done_callback to submit the new task when the previous one completes:
from concurrent.futures import Future
from functools import partial
def chain_apply(view, func, future):
"""Chain a call to view.apply(func, future.result()) when future is ready.
Returns a Future for the subsequent result.
"""
f2 = Future()
# when f1 is ready, submit a new task for func on its result
def apply_func(f):
if f.exception():
f2.set_exception(f.exception())
return
print('submitting %s(%s)' % (func.__name__, f.result()))
ar = view.apply_async(func, f.result())
# when ar is done, pass through the result to f2
ar.add_done_callback(lambda ar: f2.set_result(ar.get()))
future.add_done_callback(apply_func)
return f2
def chain_map(view, func, list_of_futures):
"""Chain a new callback on a list of futures."""
return [ chain_apply(view, func, f) for f in list_of_futures ]
# use builtin map with apply, since we want one Future per item
results1 = map(partial(view.apply, task1), [1, 2, 3])
results2 = chain_map(view, task2, results1)
results3 = chain_map(view, task3, results2)
print("Waiting for results")
[ r.result() for r in results3 ]
As with any example of add_done_callback, it can be written with coroutines, but I find the callbacks in this case to be fine. This should at least be a fairly generic utility that you can use to compose your pipeline.
Option 2: dask.distributed
Full disclosure: I'm the primary author of IPython Parallel, about to suggest that you use a different tool.
It is possible to pass results from one task to another via engine namespaces and DAG dependencies in IPython parallel, but honestly, if your workflow looks like this, you should consider using dask distributed, which is designed specifically for this kind of computation graph. If you are already comfortable and familiar with IPython parallel, getting started with dask should not be too much of a burden.
IPython 5.1 provides a handy command for turning your IPython parallel cluster into a dask distributed cluster:
import ipyparallel as ipp
client = ipp.Client()
executor = client.become_distributed(ncores=1)
And then the key relevant feature of dask is that you can submit futures as arguments to subsequent map calls, and the scheduler takes care of it when the results are ready, rather than having to do it explicitly in the client:
results1 = executor.map(task1, [1, 2, 3])
results2 = executor.map(task2, results1)
results3 = executor.map(task3, results2)
executor.gather(results3)
So basically, dask distributed works how you wish IPython parallel's load-balancing would work when you need to chain things like this.
This notebook illustrates both examples.

Related

Multiprocesses will not run in Parallel on Windows on Jupyter Notebook

I'm currently working on Windows on jupyter notebook and have been struggling to get multiprocessing to work. It does not run all my async's in parallel it runs them singularly one at a time please provide some guidance where am I going wrong. I need to put the results into a variable for future use. What am I not understanding?
import multiprocessing as mp
import cylib
Pool = mp.Pool(processes=4)
result1 = Pool.apply_async(cylib.f, [v]) # evaluate asynchronously
result2 = Pool.apply_async(cylib.f, [x]) # evaluate asynchronously
result3 = Pool.apply_async(cylib.f, [y]) # evaluate asynchronously
result4 = Pool.apply_async(cylib.f, [z]) # evaluate asynchronously
vr = result1.get(timeout=420)
xr = result2.get(timeout=420)
yr = result3.get(timeout=420)
zr = result4.get(timeout=420)

The tasks are executing in parallel.
However, this is fetching the results synchronously i.e. "wait until result1 is ready, then wait until result2 is ready, .." and so on.
vr = result1.get(timeout=420)
xr = result2.get(timeout=420)
yr = result3.get(timeout=420)
zr = result4.get(timeout=420)
Consider the following example code, where each task is polled asynchronously
from time import sleep
import multiprocessing as mp
pool = mp.Pool(processes=4)
# Create tasks with longer wait first
tasks = {i: pool.apply_async(sleep, [t]) for i, t in enumerate(reversed(range(3)))}
done = set()
# Keep polling until all tasks complete
while len(done) < len(tasks):
for i, t in tasks.items():
# Skip completed tasks
if i in done:
continue
result = None
try:
result = t.get(timeout=0)
except mp.TimeoutError:
pass
else:
print("Task #:{} complete".format(i))
done.add(i)
You can replicate something like the above or use the callback argument on apply_async to perform some handling automatically as tasks complete.

Function that multiprocesses another function

I'm performing analyses of time-series of simulations. Basically, it's doing the same tasks for every time steps. As there is a very high number of time steps, and as the analyze of each of them is independant, I wanted to create a function that can multiprocess another function. The latter will have arguments, and return a result.
Using a shared dictionnary and the lib concurrent.futures, I managed to write this :
import concurrent.futures as Cfut
def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
# function : function that is running in parallel
# param_list : list of items
# group_size : size of the groups
# Nworkers : number of group/items running in the same time
# **param_fixed : passing parameters
manager = mlp.Manager()
dic = manager.dict()
executor = Cfut.ProcessPoolExecutor(Nworkers)
futures = [executor.submit(function, param, dic, *args)
for param in grouper(param_list, group_size)]
Cfut.wait(futures)
return [dic[i] for i in sorted(dic.keys())]
Typically, I can use it like this :
def read_file(files, dictionnary):
for file in files:
i = int(file[4:9])
#print(str(i))
if 'bz2' in file:
os.system('bunzip2 ' + file)
file = file[:-4]
dictionnary[i] = np.loadtxt(file)
os.system('bzip2 ' + file)
Map = np.array(multiprocess_loop_grouped(read_file, list_alti, Group_size, N_thread))
or like this :
def autocorr(x):
result = np.correlate(x, x, mode='full')
return result[result.size//2:]
def find_lambda_finger(indexes, dic, Deviation):
for i in indexes :
#print(str(i))
# Beach = Deviation[i,:] - np.mean(Deviation[i,:])
dic[i] = Anls.find_first_max(autocorr(Deviation[i,:]), valmax = True)
args = [Deviation]
Temp = Rescal.multiprocess_loop_grouped(find_lambda_finger, range(Nalti), Group_size, N_thread, *args)
Basically, it is working. But it is not working well. Sometimes it crashes. Sometimes it actually launches a number of python processes equal to Nworkers, and sometimes there is only 2 or 3 of them running at a time while I specified Nworkers = 15.
For example, a classic error I obtain is described in the following topic I raised : Calling matplotlib AFTER multiprocessing sometimes results in error : main thread not in main loop
What is the more Pythonic way to achieve what I want ? How can I improve the control this function ? How can I control more the number of running python process ?

One of the basic concepts for Python multi-processing is using queues. It works quite well when you have an input list that can be iterated and which does not need to be altered by the sub-processes. It also gives you a good control over all the processes, because you spawn the number you want, you can run them idle or stop them.
It is also a lot easier to debug. Sharing data explicitly is usually an approach that is much more difficult to setup correctly.
Queues can hold anything as they are iterables by definition. So you can fill them with filepath strings for reading files, non-iterable numbers for doing calculations or even images for drawing.
In your case a layout could look like that:
import multiprocessing as mp
import numpy as np
import itertools as it
def worker1(in_queue, out_queue):
#holds when nothing is available, stops when 'STOP' is seen
for a in iter(in_queue.get, 'STOP'):
#do something
out_queue.put({a: result}) #return your result linked to the input
def worker2(in_queue, out_queue):
for a in iter(in_queue.get, 'STOP'):
#do something differently
out_queue.put({a: result}) //return your result linked to the input
def multiprocess_loop_grouped(function, param_list, group_size, Nworkers, *args):
# your final result
result = {}
in_queue = mp.Queue()
out_queue = mp.Queue()
# fill your input
for a in param_list:
in_queue.put(a)
# stop command at end of input
for n in range(Nworkers):
in_queue.put('STOP')
# setup your worker process doing task as specified
process = [mp.Process(target=function,
args=(in_queue, out_queue), daemon=True) for x in range(Nworkers)]
# run processes
for p in process:
p.start()
# wait for processes to finish
for p in process:
p.join()
# collect your results from the calculations
for a in param_list:
result.update(out_queue.get())
return result
temp = multiprocess_loop_grouped(worker1, param_list, group_size, Nworkers, *args)
map = multiprocess_loop_grouped(worker2, param_list, group_size, Nworkers, *args)
It can be made a bit more dynamic when you are afraid that your queues will run out of memory. Than you need to fill and empty the queues while the processes are running. See this example here.
Final words: it is not more Pythonic as you requested. But it is easier to understand for a newbie ;-)

Multiprocessing 2 different functions python3

I am struggling for a while with Multiprocessing in Python. I would like to run 2 independent functions simultaneously, wait until both calculations are finished and then continue with the output of both functions. Something like this:
# Function A:
def jobA(num):
result=num*2
return result
# Fuction B:
def jobB(num):
result=num^3
return result
# Parallel process function:
{resultA,resultB}=runInParallel(jobA(num),jobB(num))
I found other examples of multiprocessing however they used only one function or didn't returned an output. Anyone knows how to do this? Many thanks!

I'd recommend creating processes manually (rather than as part of a pool), and sending the return values to the main process through a multiprocessing.Queue. These queues can share almost any Python object in a safe and relatively efficient way.
Here's an example, using the jobs you've posted.
def jobA(num, q):
q.put(num * 2)
def jobB(num, q):
q.put(num ^ 3)
import multiprocessing as mp
q = mp.Queue()
jobs = (jobA, jobB)
args = ((10, q), (2, q))
for job, arg in zip(jobs, args):
mp.Process(target=job, args=arg).start()
for i in range(len(jobs)):
print('Result of job {} is: {}'.format(i, q.get()))
This prints out:
Result of job 0 is: 20
Result of job 1 is: 1
But you can of course do whatever further processing you'd like using these values.

multiprocessing.Pool.map() not working as expected

I understand from simple examples that Pool.map is supposed to behave identically to the 'normal' python code below except in parallel:
def f(x):
# complicated processing
return x+1
y_serial = []
x = range(100)
for i in x: y_serial += [f(x)]
y_parallel = pool.map(f, x)
# y_serial == y_parallel!
However I have two bits of code that I believe should follow this example:
#Linear version
price_datas = []
for csv_file in loop_through_zips(data_directory):
price_datas += [process_bf_data_csv(csv_file)]
#Parallel version
p = Pool()
price_data_parallel = p.map(process_bf_data_csv, loop_through_zips(data_directory))
However the Parallel code doesn't work whereas the Linear code does. From what I can observe, the parallel version appears to be looping through the generator (it's printing out log lines from the generator function) but then not actually performing the "process_bf_data_csv" function. What am I doing wrong here?

.map tries to pull all values from your generator to form it into an iterable before actually starting the work.
Try waiting longer (till the generator runs out) or use multi threading and a queue instead.

Is there a simple process-based parallel map for python?

I'm looking for a simple process-based parallel map for python, that is, a function
parmap(function,[data])
that would run function on each element of [data] on a different process (well, on a different core, but AFAIK, the only way to run stuff on different cores in python is to start multiple interpreters), and return a list of results.
Does something like this exist? I would like something simple, so a simple module would be nice. Of course, if no such thing exists, I will settle for a big library :-/

I seems like what you need is the map method in multiprocessing.Pool():
map(func, iterable[, chunksize])
A parallel equivalent of the map() built-in function (it supports only
one iterable argument though). It blocks till the result is ready.
This method chops the iterable into a number of chunks which it submits to the
process pool as separate tasks. The (approximate) size of these chunks can be
specified by setting chunksize to a positive integ
For example, if you wanted to map this function:
def f(x):
return x**2
to range(10), you could do it using the built-in map() function:
map(f, range(10))
or using a multiprocessing.Pool() object's method map():
import multiprocessing
pool = multiprocessing.Pool()
print pool.map(f, range(10))

This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your map function with the #ray.remote decorator, and then invoke it with .remote. This will ensure that every instance of the remote function will executed in a different process.
import time
import ray
ray.init()
# Define the function you want to apply map on, as remote function.
#ray.remote
def f(x):
# Do some work...
time.sleep(1)
return x*x
# Define a helper parmap(f, list) function.
# This function executes a copy of f() on each element in "list".
# Each copy of f() runs in a different process.
# Note f.remote(x) returns a future of its result (i.e.,
# an identifier of the result) rather than the result itself.
def parmap(f, list):
return [f.remote(x) for x in list]
# Call parmap() on a list consisting of first 5 integers.
result_ids = parmap(f, range(1, 6))
# Get the results
results = ray.get(result_ids)
print(results)
This will print:
[1, 4, 9, 16, 25]
and it will finish in approximately len(list)/p (rounded up the nearest integer) where p is number of cores on your machine. Assuming a machine with 2 cores, our example will execute in 5/2 rounded up, i.e, in approximately 3 sec.
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.

Python3's Pool class has a map() method and that's all you need to parallelize map:
from multiprocessing import Pool
with Pool() as P:
xtransList = P.map(some_func, a_list)
Using with Pool() as P is similar to a process pool and will execute each item in the list in parallel. You can provide the number of cores:
with Pool(processes=4) as P:

For those who looking for Python equivalent of R's mclapply(), here is my implementation. It is an improvement of the following two examples:
"Parallelize Pandas map() or apply()", as mentioned by #Rafael
Valero.
How to apply map to functions with multiple arguments.
It can be apply to map functions with single or multiple arguments.
import numpy as np, pandas as pd
from scipy import sparse
import functools, multiprocessing
from multiprocessing import Pool
num_cores = multiprocessing.cpu_count()
def parallelize_dataframe(df, func, U=None, V=None):
#blockSize = 5000
num_partitions = 5 # int( np.ceil(df.shape[0]*(1.0/blockSize)) )
blocks = np.array_split(df, num_partitions)
pool = Pool(num_cores)
if V is not None and U is not None:
# apply func with multiple arguments to dataframe (i.e. involves multiple columns)
df = pd.concat(pool.map(functools.partial(func, U=U, V=V), blocks))
else:
# apply func with one argument to dataframe (i.e. involves single column)
df = pd.concat(pool.map(func, blocks))
pool.close()
pool.join()
return df
def square(x):
return x**2
def test_func(data):
print("Process working on: ", data.shape)
data["squareV"] = data["testV"].apply(square)
return data
def vecProd(row, U, V):
return np.sum( np.multiply(U[int(row["obsI"]),:], V[int(row["obsJ"]),:]) )
def mProd_func(data, U, V):
data["predV"] = data.apply( lambda row: vecProd(row, U, V), axis=1 )
return data
def generate_simulated_data():
N, D, nnz, K = [302, 184, 5000, 5]
I = np.random.choice(N, size=nnz, replace=True)
J = np.random.choice(D, size=nnz, replace=True)
vals = np.random.sample(nnz)
sparseY = sparse.csc_matrix((vals, (I, J)), shape=[N, D])
# Generate parameters U and V which could be used to reconstruct the matrix Y
U = np.random.sample(N*K).reshape([N,K])
V = np.random.sample(D*K).reshape([D,K])
return sparseY, U, V
def main():
Y, U, V = generate_simulated_data()
# find row, column indices and obvseved values for sparse matrix Y
(testI, testJ, testV) = sparse.find(Y)
colNames = ["obsI", "obsJ", "testV", "predV", "squareV"]
dtypes = {"obsI":int, "obsJ":int, "testV":float, "predV":float, "squareV": float}
obsValDF = pd.DataFrame(np.zeros((len(testV), len(colNames))), columns=colNames)
obsValDF["obsI"] = testI
obsValDF["obsJ"] = testJ
obsValDF["testV"] = testV
obsValDF = obsValDF.astype(dtype=dtypes)
print("Y.shape: {!s}, #obsVals: {}, obsValDF.shape: {!s}".format(Y.shape, len(testV), obsValDF.shape))
# calculate the square of testVals
obsValDF = parallelize_dataframe(obsValDF, test_func)
# reconstruct prediction of testVals using parameters U and V
obsValDF = parallelize_dataframe(obsValDF, mProd_func, U, V)
print("obsValDF.shape after reconstruction: {!s}".format(obsValDF.shape))
print("First 5 elements of obsValDF:\n", obsValDF.iloc[:5,:])
if __name__ == '__main__':
main()

I know this is an old post, but just in case, I wrote a tool to make this super, super easy called parmapper (I actually call it parmap in my use but the name was taken).
It handles a lot of the setup and deconstruction of processes and adds tons of features. In rough order of importance
Can take lambda and other unpickleable functions
Can apply starmap and other similar call methods to make it very easy to directly use.
Can split amongst both threads and/or processes
Includes features such as progress bars
It does incur a small cost but for most uses, that is negligible.
I hope you find it useful.
(Note: It, like map in Python 3+, returns an iterable so if you expect all results to pass through it immediately, use list())

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to efficiently chain ipyparallel tasks and pass intermediate results to engines? - python

Related

Multiprocesses will not run in Parallel on Windows on Jupyter Notebook

Function that multiprocesses another function

Multiprocessing 2 different functions python3

multiprocessing.Pool.map() not working as expected

Is there a simple process-based parallel map for python?

Categories

Resources