As a Python beginner, I'm trying to parallelize some sections of a function that serves as an input to an optimization routine. This function f returns the log-likelihood, the gradient and the hessian for a given vector b. In this function there are three independent loop functions: loop_1, loop_2, and loop_3.
What is the most efficient implementation? Parallelizing the three loop functions in three concurrent processes, or parallelizing one loop at a time? And how can this be implemented? When using the multiprocessing package I get a 'pickle' error, as my nested loop functions are not in the global namespace.
def f(b):
    # Do something computationally intensive on b

    def calc(i, j):
        return u, v, w

    def loop_1():
        for i in range(1, 1000):
            c, d, e = calc(i, 0)
            for j in range(1, 200):
                f, g, h = calc(i, j)
        return x, y, z

    def loop_2():
        # similar to loop_1

    def loop_3():
        # similar to loop_1

    # Aggregate results from the three loops
    return u, v, w
There are several ways to avoid the pickling error you receive.
An option could be to go asynchronous, if it makes sense to do so. Sometimes it makes things faster, sometimes slower.
In that case it would look something like the code below; I use it as a template when I forget things:
import asyncio

def f():
    async def factorial(n):
        f.p = 2
        await asyncio.sleep(0.2)
        return 1 if n < 2 else n * await factorial(n - 1)

    async def multiply(n, k):
        await asyncio.sleep(0.2)
        return sum(n for _ in range(k))

    async def power(n, k):
        await asyncio.sleep(0.2)
        return await multiply(n, await power(n, k - 1)) if k != 0 else 1

    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(power(2, 5)),
             asyncio.ensure_future(factorial(5))]
    f.p = 0
    ans = tuple(loop.run_until_complete(asyncio.gather(*tasks)))
    print(f.p)
    return ans

if __name__ == '__main__':
    print(f())
async and await are built-in keywords, like def, for, in and such, as of Python 3.5.
Another workaround for nested functions is to use threads instead.
from concurrent.futures import ThreadPoolExecutor
import time

def f():
    def factorial(n):
        f.p = 2
        time.sleep(0.2)
        return 1 if n < 2 else n * factorial(n - 1)

    def multiply(n, k):
        time.sleep(0.2)
        return sum(n for _ in range(k))

    def power(n, k):
        time.sleep(0.2)
        return multiply(n, power(n, k - 1)) if k != 0 else 1

    def calculate(func, args):
        return func(*args)

    def calculate_star(args):
        return calculate(*args)

    pool = ThreadPoolExecutor()
    tasks = [(power, (2, 5)), (factorial, (5,))]
    f.p = 0
    result = list(pool.map(calculate_star, tasks))
    print(f.p)
    return result

if __name__ == '__main__':
    print(f())
You should start your functions in a pool of processes.
import multiprocessing

pool = multiprocessing.Pool()
for i in range(3):
    if i == 0:
        pool.apply_async(loop_1)
    elif i == 1:
        pool.apply_async(loop_2)
    elif i == 2:
        pool.apply_async(loop_3)
pool.close()
pool.join()
If loop_1, loop_2 and loop_3 are the same function with the same operations, you can simply call loop_3 three times.
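A minimal sketch of this approach, with the loop functions moved to module level (which is also what avoids the pickling error) and with placeholder loop bodies, since the real calculations are not shown:

import multiprocessing

# Placeholder loop functions: in the real code these would live at module
# level (not nested inside f), which is what makes them picklable.
def loop_1(b):
    return sum(i * b for i in range(1000))

def loop_2(b):
    return sum(i + b for i in range(1000))

def loop_3(b):
    return max(i - b for i in range(1000))

def f(b):
    with multiprocessing.Pool(processes=3) as pool:
        # Submit the three independent loops concurrently and keep the
        # AsyncResult handles so the results can be aggregated afterwards.
        handles = [pool.apply_async(loop, (b,)) for loop in (loop_1, loop_2, loop_3)]
        results = [h.get() for h in handles]
    # Aggregate results from the three loops
    return tuple(results)

if __name__ == '__main__':
    print(f(2))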
My code is similar to the example below. jobs1 and jobs2 would be calls to different functions: one is camelot-py::read_pdf and the other is a call to a library that makes a (blocking) request.
from concurrent import futures
import time

n = 200
t0 = time.time()

def f(x):
    t = x
    while x < 100:
        x += 1
    print("f\t", t)
    return x

def h(jobs):
    all = []
    for job in futures.as_completed(jobs):
        f_res = job.result()
        all.append(f_res)
        print("h\t", f_res)
    return all

def g(x, p_executor: futures.ProcessPoolExecutor):
    # IO-bound task except for calling f/h
    time.sleep(1)
    jobs1 = [p_executor.submit(f, x) for x in [x, x + 1, x + 2]]
    jobs2 = [p_executor.submit(f, x) for x in [x, x - 1, x - 2]]
    h(jobs1)
    h(jobs2)
    return x

with futures.ProcessPoolExecutor(max_workers=4) as p_executor:
    with futures.ThreadPoolExecutor() as t_executor:
        jobs = [t_executor.submit(g, x, p_executor) for x in range(n)]
        all = set(range(n))
        for job in futures.as_completed(jobs):
            x = job.result()
            all.remove(x)
            print("main loop\t", x)

print("missing \t", all)
While this example runs without problems on my machine, my original code doesn't. The ProcessPoolExecutor is meant to run the function below, but in some cases it does not print "end", leaving my equivalent of the function h above (and g and the main loop) hanging.
def camelot_extraction(pickled_read_pdf, fixed_options, options):
    unpickled_read_pdf = pickle.loads(pickled_read_pdf)
    print("start")
    x = (options, unpickled_read_pdf(**fixed_options, **options))
    print("end")
    return x
I got rid of the ThreadPoolExecutor and am able to run the main loop sequentially with the expected results, but I don't understand how this can be happening in the alternative where I use both the Thread and Process Pool executors.
Any idea about what might be happening?
Thanks a lot!
I have a couple of functions whose execution does not depend on each other. What I am trying to do is execute them concurrently instead of sequentially (synchronously). I have added an event loop as well, but I am not able to figure out whether it is working correctly or not.
This is the implementation:
File 1:
import file2

def funcA():
    a, b = 1, 2
    file2.main(a, b)
File 2:
import asyncio

def main(a, b):
    asyncio.get_event_loop().run_until_complete(_main(a, b))

async def _main(a, b):
    out1 = await funcA(a, b)
    out2 = await funcB(a, b)
    out3 = await funcC(a, b)

async def funcA(a, b):
    result = 1  # some processing done here
    return result

async def funcB(a, b):
    result = 1  # some processing done here
    return result

async def funcC(a, b):
    result = 1  # some processing done here
    return result
I am not able to figure out whether these are running concurrently or not. If I add time.sleep(10) in any function, execution stops there. I don't want them to run in the background, as I need the output from those functions. Please help guys.
One way to do what you want would be to use asyncio.run() in main and then gather in the async version of main. To simulate long processing, use asyncio.sleep(). See the following code:
import asyncio

def main(a, b):
    res = asyncio.run(async_main(a, b))
    print(f"in main, result is {res}")

async def funcA(a, b):
    print('funcA - start')
    await asyncio.sleep(3)
    result = (a + b)  # some processing done here
    print('funcA - end')
    return result

async def funcB(a, b):
    print('funcB - start')
    await asyncio.sleep(3)
    result = (a + b) * 2  # some processing done here
    print('funcB - end')
    return result

async def funcC(a, b):
    print('funcC - start')
    await asyncio.sleep(3)
    result = (a + b) * 3  # some processing done here
    print('funcC - end')
    return result

async def async_main(a, b):
    print("in async_main")
    res = await asyncio.gather(funcA(a, b), funcB(a, b), funcC(a, b))
    print(f"in async_main, result is {res}")
    return res

if __name__ == "__main__":
    main(1, 2)
The result is:
in async_main
funcA - start
funcB - start
funcC - start
funcA - end
funcB - end
funcC - end
in async_main, result is [3, 6, 9]
in main, result is [3, 6, 9]
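Note that asyncio.sleep() here is only a stand-in: asyncio interleaves the coroutines while they are awaiting something, so a blocking call such as time.sleep(10) will still stall the whole event loop, which matches the behaviour described in the question. A minimal sketch of one way around that, pushing blocking calls onto the default thread pool with run_in_executor (blocking_work is a hypothetical stand-in for the real processing):

import asyncio
import time

def blocking_work(a, b):
    # Hypothetical stand-in for a blocking call (e.g. time.sleep or a blocking I/O request).
    time.sleep(3)
    return a + b

async def async_main(a, b):
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool, so the three
    # calls can overlap instead of stalling the event loop one by one.
    res = await asyncio.gather(
        loop.run_in_executor(None, blocking_work, a, b),
        loop.run_in_executor(None, blocking_work, a, b * 2),
        loop.run_in_executor(None, blocking_work, a, b * 3),
    )
    return res

if __name__ == "__main__":
    print(asyncio.run(async_main(1, 2)))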
Here is my prime factorization program. I added a callback function to pool.apply_async(findK, args=(N, begin, end)); a message "prime factorization is over" is printed when the factorization is done, and it works fine.
import math
import multiprocessing

def findK(N, begin, end):
    for k in range(begin, end):
        if N % k == 0:
            print(N, "=", k, "*", N / k)
            return True
    return False

def prompt(result):
    if result:
        print("prime factorization is over")

def mainFun(N, process_num):
    pool = multiprocessing.Pool(process_num)
    for i in range(process_num):
        if i == 0:
            begin = 2
        else:
            begin = int(math.sqrt(N) / process_num * i) + 1
        end = int(math.sqrt(N) / process_num * (i + 1))
        pool.apply_async(findK, args=(N, begin, end), callback=prompt)
    pool.close()
    pool.join()

if __name__ == "__main__":
    N = 684568031001583853
    process_num = 16
    mainFun(N, process_num)
Now I want to change the callback function in apply_async, turning prompt into a shutdown function that kills all the other processes.
def prompt(result):
    if result:
        pool.terminate()
The pool instance is not defined in prompt's scope and is not passed into prompt, so pool.terminate() can't work inside the prompt function.
How can I pass the multiprocessing.Pool instance to apply_async's callback function?
(I have made this work in class form: adding a class method and calling self.pool.terminate kills all the other processes. How can I do the same in function form?)
If pool is not set as a global variable, can it be passed into the callback function?
Passing extra arguments to the callback function is not supported. Yet you have plenty of elegant ways to work around that.
You can encapsulate your pool logic into an object:
class Executor:
    def __init__(self, process_num):
        self.pool = multiprocessing.Pool(process_num)

    def prompt(self, result):
        if result:
            print("prime factorization is over")
            self.pool.terminate()

    def schedule(self, function, args):
        self.pool.apply_async(function, args=args, callback=self.prompt)

    def wait(self):
        self.pool.close()
        self.pool.join()

def main(N, process_num):
    executor = Executor(process_num)
    for i in range(process_num):
        ...
        executor.schedule(findK, (N, begin, end))
    executor.wait()
Or you can use the concurrent.futures.Executor implementation which returns a Future object. You just append the pool to the Future object before setting the callback.
def prompt(future):
    if future.result():
        print("prime factorization is over")
        future.pool_executor.shutdown(wait=False)

def main(N, process_num):
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=process_num)
    for i in range(process_num):
        ...
        future = executor.submit(findK, N, begin, end)
        future.pool_executor = executor
        future.add_done_callback(prompt)
You can simply define a local close function as a callback:
import math
import multiprocessing

def findK(N, begin, end):
    for k in range(begin, end):
        if N % k == 0:
            print(N, "=", k, "*", N / k)
            return True
    return False

def mainFun(N, process_num):
    pool = multiprocessing.Pool(process_num)

    def close(result):
        if result:
            print("prime factorization is over")
            pool.terminate()

    for i in range(process_num):
        if i == 0:
            begin = 2
        else:
            begin = int(math.sqrt(N) / process_num * i) + 1
        end = int(math.sqrt(N) / process_num * (i + 1))
        pool.apply_async(findK, args=(N, begin, end), callback=close)
    pool.close()
    pool.join()

if __name__ == "__main__":
    N = 684568031001583853
    process_num = 16
    mainFun(N, process_num)
You can also use a partial function from functools:
import functools

def close_pool(pool, result):
    if result:
        pool.terminate()

def mainFun(N, process_num):
    pool = multiprocessing.Pool(process_num)
    close = functools.partial(close_pool, pool)
    ....
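A more complete sketch of this functools.partial variant, reusing the same findK and the same work-splitting loop as above (assumed unchanged):

import functools
import math
import multiprocessing

def findK(N, begin, end):
    for k in range(begin, end):
        if N % k == 0:
            print(N, "=", k, "*", N // k)
            return True
    return False

def close_pool(pool, result):
    # The pool is bound in via functools.partial, so at call time the
    # callback only receives the result produced by apply_async.
    if result:
        print("prime factorization is over")
        pool.terminate()

def mainFun(N, process_num):
    pool = multiprocessing.Pool(process_num)
    close = functools.partial(close_pool, pool)
    for i in range(process_num):
        begin = 2 if i == 0 else int(math.sqrt(N) / process_num * i) + 1
        end = int(math.sqrt(N) / process_num * (i + 1))
        pool.apply_async(findK, args=(N, begin, end), callback=close)
    pool.close()
    pool.join()

if __name__ == "__main__":
    mainFun(684568031001583853, 16)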
You need to have pool end up in prompt's environment. One possibility is to move pool into the global scope (though this isn't really best practice). This appears to work:
import math
import multiprocessing

pool = None

def findK(N, begin, end):
    for k in range(begin, end):
        if N % k == 0:
            print(N, "=", k, "*", N / k)
            return True
    return False

def prompt(result):
    if result:
        print("prime factorization is over")
        pool.terminate()

def mainFun(N, process_num):
    global pool
    pool = multiprocessing.Pool(process_num)
    for i in range(process_num):
        if i == 0:
            begin = 2
        else:
            begin = int(math.sqrt(N) / process_num * i) + 1
        end = int(math.sqrt(N) / process_num * (i + 1))
        pool.apply_async(findK, args=(N, begin, end), callback=prompt)
    pool.close()
    pool.join()

if __name__ == "__main__":
    N = 684568031001583853
    process_num = 16
    mainFun(N, process_num)
I'm testing a trivial function using list comprehension vs concurrent.futures:
class Test:
    @staticmethod
    def something(times=1):
        return sum([1 for i in range(times)])

    @staticmethod
    def simulate1(function, N):
        l = []
        for i in range(N):
            outcome = function()
            l.append(outcome)
        return sum(l) / N

    @staticmethod
    def simulate2(function, N):
        import concurrent.futures
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            l = [outcome for outcome in executor.map(lambda x: function(), range(N))]
        return sum(l) / N

    @staticmethod
    def simulate3(function, N):
        import concurrent.futures
        l = 0
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            futures = [executor.submit(function) for i in range(N)]
            for future in concurrent.futures.as_completed(futures):
                l += future.result()
        return l / N

def simulation():
    simulationRate = 100000
    import datetime

    s = datetime.datetime.now()
    o = Test.simulate1(lambda: Test.something(10), simulationRate)
    print((datetime.datetime.now() - s))

    s = datetime.datetime.now()
    o = Test.simulate2(lambda: Test.something(10), simulationRate)
    print((datetime.datetime.now() - s))

    s = datetime.datetime.now()
    o = Test.simulate3(lambda: Test.something(10), simulationRate)
    print((datetime.datetime.now() - s))

simulation()
Measuring the time, I get:
0:00:00.258000
0:00:10.348000
0:00:10.556000
I'm getting started with concurrency, so I don't understand what bottleneck prevents the threads from running faster.
If you change your task function to this, you will see the difference:
import time

def something(n):
    """Simulate doing some IO-based task."""
    time.sleep(0.001)
    return sum(1 for i in range(n))
On my mac pro, this gives:
0:00:13.774700
0:00:01.591226
0:00:01.489159
The concurrent.futures version is obviously faster this time.
The reason is that you were simulating a CPU-bound task; because of Python's GIL, concurrent.futures with threads only makes it slower.
concurrent.futures provides a high-level interface for asynchronously executing callables, and you were using it for the wrong kind of workload.
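For the original CPU-bound something, a process pool is what can actually help, since each worker process has its own interpreter and GIL. A minimal sketch, assuming something is defined at module level so it can be pickled; for work this tiny the inter-process overhead may still dominate, so treat the timing as illustrative:

import concurrent.futures
import datetime

def something(times=10):
    # CPU-bound work, defined at module level so ProcessPoolExecutor can pickle it.
    return sum(1 for i in range(times))

def simulate_processes(N):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # chunksize matters: submitting 100000 tiny tasks one by one would
        # drown the computation in inter-process communication overhead.
        results = executor.map(something, [10] * N, chunksize=1000)
        return sum(results) / N

if __name__ == "__main__":
    start = datetime.datetime.now()
    print(simulate_processes(100000))
    print(datetime.datetime.now() - start)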
In order to speed up a certain list-processing logic, I wrote a decorator that would 1) intercept the incoming function call, 2) take its input list and break it into multiple pieces, 3) pass those pieces to the original function on separate threads, and 4) combine the output and return it.
I thought it was a pretty neat idea, until I coded it and saw there was no change in speed! Even though I see multiple cores busy on htop, the multithreaded version is actually slower than the single-threaded version.
Does this have to do with the infamous CPython GIL?
Thanks!
from threading import Thread
import numpy as np
import time

# breaks a list into n list of lists
def split(a, n):
    k, m = len(a) / n, len(a) % n
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in xrange(n))

THREAD_NUM = 8

def parallel_compute(fn):
    class Worker(Thread):
        def __init__(self, *args):
            Thread.__init__(self)
            self.result = None
            self.args = args

        def run(self):
            self.result = fn(*self.args)

    def new_compute(*args, **kwargs):
        threads = [Worker(args[0], args[1], args[2], x) for x in split(args[3], THREAD_NUM)]
        for x in threads: x.start()
        for x in threads: x.join()
        final_res = []
        for x in threads: final_res.extend(x.result)
        return final_res

    return new_compute

# some function that does a lot of computation
def f(x): return np.abs(np.tan(np.cos(np.sqrt(x**2))))

class Foo:
    @parallel_compute
    def compute(self, bla, blah, input_list):
        return map(f, input_list)

inp = [i for i in range(40*1000*100)]
#inp = [1,2,3,4,5,6,7]

if __name__ == "__main__":
    o = Foo()
    start = time.time()
    res = o.compute(None, None, inp)
    end = time.time()
    print 'parallel', end - start
Single thread version
import time, fast_one, numpy as np

class SlowFoo:
    def compute(self, bla, blah, input_list):
        return map(fast_one.f, input_list)

if __name__ == "__main__":
    o = SlowFoo()
    start = time.time()
    res = np.array(o.compute(None, None, fast_one.inp))
    end = time.time()
    print 'single', end - start
And here is the multiprocessing version that gives "PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed".
import pathos.multiprocessing as mp
import numpy as np, dill
import time

def split(a, n):
    k, m = len(a) / n, len(a) % n
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in xrange(n))

def f(x): return np.abs(np.tan(np.cos(np.sqrt(x**2))))

def compute(input_list):
    return map(f, input_list)

D = 2; pool = mp.Pool(D)

def parallel_compute(fn):
    def new_compute(*args, **kwargs):
        inp = []
        for x in split(args[0], D): inp.append(x)
        outputs_async = pool.map_async(fn, inp)
        outputs = outputs_async.get()
        outputs = [y for x in outputs for y in x]
        return outputs
    return new_compute

compute = parallel_compute(compute)
inp = [i for i in range(40*1000)]

if __name__ == "__main__":
    start = time.time()
    res = compute(inp)
    end = time.time()
    print 'parallel', end - start
    print len(res)
Yes, when your threads are doing CPU-bound work implemented in Python (rather than in, say, C extensions, which can release the GIL before and after marshalling/demarshalling data from Python structures), the GIL is the problem here.
I'd suggest using a multiprocessing model, a Python implementation that doesn't have a GIL (IronPython, Jython, etc.), or a different language altogether (if you're doing performance-sensitive work, there's no end of languages nearly as fluid as Python but with considerably better runtime performance).
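A minimal multiprocessing sketch along those lines (Python 3), with the worker function at module level so the standard pickler can handle it, which is exactly what the nested-function version could not do:

import multiprocessing
import numpy as np

def f(x):
    # CPU-heavy element-wise computation, defined at module level so it pickles.
    return np.abs(np.tan(np.cos(np.sqrt(x ** 2))))

def compute_chunk(chunk):
    # Worker run in a separate process; each process has its own GIL.
    return [f(x) for x in chunk]

def parallel_compute(input_list, workers=4):
    chunks = np.array_split(input_list, workers)
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(compute_chunk, chunks)
    # Flatten the per-chunk results back into one list.
    return [y for chunk in results for y in chunk]

if __name__ == "__main__":
    print(len(parallel_compute(range(40 * 1000))))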
Alternatively, you can redesign the code and run all the parallel work in subprocesses.
You need worker threads that each start a subprocess for the calculation.
Those subprocesses can run truly in parallel.
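A rough sketch of that pattern: a ThreadPoolExecutor whose workers each launch a separate Python process. The worker.py script and its command-line interface here are hypothetical placeholders.

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_chunk(start, stop):
    # Each thread blocks on its own child process; the child processes
    # themselves run in parallel because they are separate interpreters.
    out = subprocess.run(
        [sys.executable, "worker.py", str(start), str(stop)],  # hypothetical worker script
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    ranges = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        results = list(pool.map(lambda r: run_chunk(*r), ranges))
    print(results)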