apply_async callback function not being called - python

I am a newbie to python,i am have function that calculate feature for my data and then return a list that should be processed and written in file.,..i am using Pool to do the calculation and then and use the callback function to write into file,however the callback function is not being call,i ve put some print statement in it but it is definetly not being called.
my code looks like this:
def write_arrow_format(results):
print("writer called")
results[1].to_csv("../data/model_data/feature-"+results[2],sep='\t',encoding='utf-8')
with open('../data/model_data/arow-'+results[2],'w') as f:
for dic in results[0]:
feature_list=[]
print(dic)
beginLine=True
for key,value in dic.items():
if(beginLine):
feature_list.append(str(value))
beginLine=False
else:
feature_list.append(str(key)+":"+str(value))
feature_line=" ".join(feature_list)
f.write(feature_line+"\n")
def generate_features(users,impressions,interactions,items,filename):
#some processing
return [result1,result2,filename]
if __name__=="__main__":
pool=mp.Pool(mp.cpu_count()-1)
for i in range(interval):
if i==interval:
pool.apply_async(generate_features,(users[begin:],impressions,interactions,items,str(i)),callback=write_arrow_format)
else:
pool.apply_async(generate_features,(users[begin:begin+interval],impressions,interactions,items,str(i)),callback=write_arrow_format)
begin=begin+interval
pool.close()
pool.join()

It's not obvious from your post what is contained in the list returned by generate_features. However, if any of result1, result2, or filename are not serializable, then for some reason the multiprocessing lib will not call the callback function and will fail to do so silently. I think this is because the multiprocessing lib attempts to pickle objects before passing them back and forth between child processes and the parent process. If anything you're returning isn't "pickleable" (i.e not serializable) then the callback doesn't get called.
I've encountered this bug myself, and it turned out to be an instance of a logger object that was giving me troubles. Here is some sample code to reproduce my issue:
import multiprocessing as mp
import logging
def bad_test_func(ii):
print('Calling bad function with arg %i'%ii)
name = "file_%i.log"%ii
logging.basicConfig(filename=name,level=logging.DEBUG)
if ii < 4:
log = logging.getLogger()
else:
log = "Test log %i"%ii
return log
def good_test_func(ii):
print('Calling good function with arg %i'%ii)
instance = ('hello', 'world', ii)
return instance
def pool_test(func):
def callback(item):
print('This is the callback')
print('I have been given the following item: ')
print(item)
num_processes = 3
pool = mp.Pool(processes = num_processes)
results = []
for i in range(5):
res = pool.apply_async(func, (i,), callback=callback)
results.append(res)
pool.close()
pool.join()
def main():
print('#'*30)
print('Calling pool test with bad function')
print('#'*30)
pool_test(bad_test_func)
print('#'*30)
print('Calling pool test with good function')
print('#'*30)
pool_test(good_test_func)
if __name__ == '__main__':
main()
Hopefully this helpful and points you in the right direction.

Related

Python Multiprocessing Pool as Decorator

I'm working on code where I frequently have to use python's multiprocessing Pool class. This results in a ton of code that looks like this:
import time
from multiprocessing import Pool
from functools import partial
def test_func(x):
time.sleep(1)
return x
def test_func_parallel(iterable, processes):
p = Pool(processes=processes)
output = p.map(test_func, iterable)
p.close()
return output
This can be made more general:
def parallel(func, iterable, **kwargs):
func = partial(func, **kwargs)
p = Pool(processes=6)
out = p.map(func, iterable)
p.close()
return out
This works, but adding a parallel wrapper to every other function complicates the code. What I'd really like is to get this working as a decorator. Something like this:
def parallel(num_processes):
def parallel_decorator(func, num_processes=num_processes):
def parallel_wrapper(iterable, **kwargs):
func = partial(func, **kwargs)
p = Pool(processes=num_processes)
output = p.map(func, iterable)
p.close()
return output
return parallel_wrapper
return parallel_decorator
Which could be used as follows:
#parallel(6)
def test_func(x):
time.sleep(1)
return x
This fails for pickle reasons
Can't pickle <function test1 at 0x117473268>: it's not the same object as __main__.test1
I've read a few posts on related issues, but they all implement a solution where the multiprocessing is executed outside the decorator. Does anyone know a way to make this work?
If you don't mind to not use the syntactic sugar for decorators (# symbol), something like this should work:
import functools
import time
from multiprocessing import Pool
def parallel(func=None, **options):
if func is None:
return functools.partial(parallel, **options)
def wrapper(iterable, **kwargs):
processes = options["processes"]
with Pool(processes) as pool:
result = pool.map(func, iterable)
return result
return wrapper
def test(i):
time.sleep(1)
print(f"{i}: {i * i}")
test_parallel = parallel(test, processes=6)
def main():
test_parallel(range(10))
if __name__ == "__main__":
main()
I have the same problem. It revolves around how Pool() objects are implemented. So, it is going to work fine with a normal wrapper but not with a Decorator. The workaround is to define your own Pool()-like implementation using Process().
This can be very tricky to optimize but if you are a Decorator enthusiast here is a (dirty) example:
# something to do
args = range(10)
def parallel(function):
""" An alternative implementation to
multiprocessing.Pool().map() using
multiprocessing.Process(). """
def interfacer(args):
""" The wrapper function. """
# required libraries
from multiprocessing import (Queue, Process)
from os import cpu_count
# process control
## maximum number of processes required
max_processes = len(args)
## maximum numer of processes running
max_threads = cpu_count() - 1
""" Since there is no Pool() around
we need to take care of the processes
ourselves. If there is nothing for a
processes to do, it is going to await
for an input, if there are too many of
them, the processor shall suffer. """
# communications
## things to do
inbasket = Queue()
## things done
outbasket = Queue()
""" I am thinking asynchronouly,
there is probably a better way of
doing this. """
# populate inputs
for each in args:
## put arguments into the basket
inbasket.put(each)
def doer():
""" Feeds the targeted/decorated
'function' with data from the baskets and
collets the results.
This blind function helps the
implementation to generalize over any
iterable. """
outbasket.put(function(inbasket.get()))
return(True)
def run(processes = max_threads):
""" Create a certain number of
Process()s and runs each one.
There is room for improvements here. """
# the process pool
factory = list()
# populate the process pool
for each in range(processes):
factory.append(Process(target = doer))
# execute in process pool
for each in factory:
each.start()
each.join()
each.close()
return(True)
""" Now we need to manage the processes,
and prevent them for overwhelm the CPU.
That is the tricky part that Pool() does
so well. """
while max_processes:
# as long as there is something to do
if (max_processes - max_threads) >= 0:
run(max_threads)
max_processes -= max_threads
else:
# play it safe
run(1)
max_processes -= 1
# undo the queue and give me back the list of 'dones'
return([outbasket.get() for each in range(outbasket.qsize())])
return(interfacer)
#parallel
def test(x):
return(x**2)
print(test(args))
Probably this code is inefficient, but gives an idea.

How do I have a forever loop running in the background [duplicate]

I researched first and couldn't find an answer to my question. I am trying to run multiple functions in parallel in Python.
I have something like this:
files.py
import common #common is a util class that handles all the IO stuff
dir1 = 'C:\folder1'
dir2 = 'C:\folder2'
filename = 'test.txt'
addFiles = [25, 5, 15, 35, 45, 25, 5, 15, 35, 45]
def func1():
c = common.Common()
for i in range(len(addFiles)):
c.createFiles(addFiles[i], filename, dir1)
c.getFiles(dir1)
time.sleep(10)
c.removeFiles(addFiles[i], dir1)
c.getFiles(dir1)
def func2():
c = common.Common()
for i in range(len(addFiles)):
c.createFiles(addFiles[i], filename, dir2)
c.getFiles(dir2)
time.sleep(10)
c.removeFiles(addFiles[i], dir2)
c.getFiles(dir2)
I want to call func1 and func2 and have them run at the same time. The functions do not interact with each other or on the same object. Right now I have to wait for func1 to finish before func2 to start. How do I do something like below:
process.py
from files import func1, func2
runBothFunc(func1(), func2())
I want to be able to create both directories pretty close to the same time because every min I am counting how many files are being created. If the directory isn't there it will throw off my timing.
You could use threading or multiprocessing.
Due to peculiarities of CPython, threading is unlikely to achieve true parallelism. For this reason, multiprocessing is generally a better bet.
Here is a complete example:
from multiprocessing import Process
def func1():
print 'func1: starting'
for i in xrange(10000000): pass
print 'func1: finishing'
def func2():
print 'func2: starting'
for i in xrange(10000000): pass
print 'func2: finishing'
if __name__ == '__main__':
p1 = Process(target=func1)
p1.start()
p2 = Process(target=func2)
p2.start()
p1.join()
p2.join()
The mechanics of starting/joining child processes can easily be encapsulated into a function along the lines of your runBothFunc:
def runInParallel(*fns):
proc = []
for fn in fns:
p = Process(target=fn)
p.start()
proc.append(p)
for p in proc:
p.join()
runInParallel(func1, func2)
If your functions are mainly doing I/O work (and less CPU work) and you have Python 3.2+, you can use a ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
def run_io_tasks_in_parallel(tasks):
with ThreadPoolExecutor() as executor:
running_tasks = [executor.submit(task) for task in tasks]
for running_task in running_tasks:
running_task.result()
run_io_tasks_in_parallel([
lambda: print('IO task 1 running!'),
lambda: print('IO task 2 running!'),
])
If your functions are mainly doing CPU work (and less I/O work) and you have Python 2.6+, you can use the multiprocessing module:
from multiprocessing import Process
def run_cpu_tasks_in_parallel(tasks):
running_tasks = [Process(target=task) for task in tasks]
for running_task in running_tasks:
running_task.start()
for running_task in running_tasks:
running_task.join()
run_cpu_tasks_in_parallel([
lambda: print('CPU task 1 running!'),
lambda: print('CPU task 2 running!'),
])
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your functions with the #ray.remote decorator, and then invoke them with .remote.
import ray
ray.init()
dir1 = 'C:\\folder1'
dir2 = 'C:\\folder2'
filename = 'test.txt'
addFiles = [25, 5, 15, 35, 45, 25, 5, 15, 35, 45]
# Define the functions.
# You need to pass every global variable used by the function as an argument.
# This is needed because each remote function runs in a different process,
# and thus it does not have access to the global variables defined in
# the current process.
#ray.remote
def func1(filename, addFiles, dir):
# func1() code here...
#ray.remote
def func2(filename, addFiles, dir):
# func2() code here...
# Start two tasks in the background and wait for them to finish.
ray.get([func1.remote(filename, addFiles, dir1), func2.remote(filename, addFiles, dir2)])
If you pass the same argument to both functions and the argument is large, a more efficient way to do this is using ray.put(). This avoids the large argument to be serialized twice and to create two memory copies of it:
largeData_id = ray.put(largeData)
ray.get([func1(largeData_id), func2(largeData_id)])
Important - If func1() and func2() return results, you need to rewrite the code as follows:
ret_id1 = func1.remote(filename, addFiles, dir1)
ret_id2 = func2.remote(filename, addFiles, dir2)
ret1, ret2 = ray.get([ret_id1, ret_id2])
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
Seems like you have a single function that you need to call on two different parameters. This can be elegantly done using a combination of concurrent.futures and map with Python 3.2+
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def sleep_secs(seconds):
time.sleep(seconds)
print(f'{seconds} has been processed')
secs_list = [2,4, 6, 8, 10, 12]
Now, if your operation is IO bound, then you can use the ThreadPoolExecutor as such:
with ThreadPoolExecutor() as executor:
results = executor.map(sleep_secs, secs_list)
Note how map is used here to map your function to the list of arguments.
Now, If your function is CPU bound, then you can use ProcessPoolExecutor
with ProcessPoolExecutor() as executor:
results = executor.map(sleep_secs, secs_list)
If you are not sure, you can simply try both and see which one gives you better results.
Finally, if you are looking to print out your results, you can simply do this:
with ThreadPoolExecutor() as executor:
results = executor.map(sleep_secs, secs_list)
for result in results:
print(result)
In 2021 the easiest way is to use asyncio:
import asyncio, time
async def say_after(delay, what):
await asyncio.sleep(delay)
print(what)
async def main():
task1 = asyncio.create_task(
say_after(4, 'hello'))
task2 = asyncio.create_task(
say_after(3, 'world'))
print(f"started at {time.strftime('%X')}")
# Wait until both tasks are completed (should take
# around 2 seconds.)
await task1
await task2
print(f"finished at {time.strftime('%X')}")
asyncio.run(main())
References:
[1] https://docs.python.org/3/library/asyncio-task.html
If you are a windows user and using python 3, then this post will help you to do parallel programming in python.when you run a usual multiprocessing library's pool programming, you will get an error regarding the main function in your program. This is because the fact that windows has no fork() functionality. The below post is giving a solution to the mentioned problem .
http://python.6.x6.nabble.com/Multiprocessing-Pool-woes-td5047050.html
Since I was using the python 3, I changed the program a little like this:
from types import FunctionType
import marshal
def _applicable(*args, **kwargs):
name = kwargs['__pw_name']
code = marshal.loads(kwargs['__pw_code'])
gbls = globals() #gbls = marshal.loads(kwargs['__pw_gbls'])
defs = marshal.loads(kwargs['__pw_defs'])
clsr = marshal.loads(kwargs['__pw_clsr'])
fdct = marshal.loads(kwargs['__pw_fdct'])
func = FunctionType(code, gbls, name, defs, clsr)
func.fdct = fdct
del kwargs['__pw_name']
del kwargs['__pw_code']
del kwargs['__pw_defs']
del kwargs['__pw_clsr']
del kwargs['__pw_fdct']
return func(*args, **kwargs)
def make_applicable(f, *args, **kwargs):
if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
kwargs['__pw_name'] = f.__name__ # edited
kwargs['__pw_code'] = marshal.dumps(f.__code__) # edited
kwargs['__pw_defs'] = marshal.dumps(f.__defaults__) # edited
kwargs['__pw_clsr'] = marshal.dumps(f.__closure__) # edited
kwargs['__pw_fdct'] = marshal.dumps(f.__dict__) # edited
return _applicable, args, kwargs
def _mappable(x):
x,name,code,defs,clsr,fdct = x
code = marshal.loads(code)
gbls = globals() #gbls = marshal.loads(gbls)
defs = marshal.loads(defs)
clsr = marshal.loads(clsr)
fdct = marshal.loads(fdct)
func = FunctionType(code, gbls, name, defs, clsr)
func.fdct = fdct
return func(x)
def make_mappable(f, iterable):
if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
name = f.__name__ # edited
code = marshal.dumps(f.__code__) # edited
defs = marshal.dumps(f.__defaults__) # edited
clsr = marshal.dumps(f.__closure__) # edited
fdct = marshal.dumps(f.__dict__) # edited
return _mappable, ((i,name,code,defs,clsr,fdct) for i in iterable)
After this function , the above problem code is also changed a little like this:
from multiprocessing import Pool
from poolable import make_applicable, make_mappable
def cube(x):
return x**3
if __name__ == "__main__":
pool = Pool(processes=2)
results = [pool.apply_async(*make_applicable(cube,x)) for x in range(1,7)]
print([result.get(timeout=10) for result in results])
And I got the output as :
[1, 8, 27, 64, 125, 216]
I am thinking that this post may be useful for some of the windows users.
There's no way to guarantee that two functions will execute in sync with each other which seems to be what you want to do.
The best you can do is to split up the function into several steps, then wait for both to finish at critical synchronization points using Process.join like #aix's answer mentions.
This is better than time.sleep(10) because you can't guarantee exact timings. With explicitly waiting, you're saying that the functions must be done executing that step before moving to the next, instead of assuming it will be done within 10ms which isn't guaranteed based on what else is going on on the machine.
(about How can I simultaneously run two (or more) functions in python?)
With asyncio, sync/async tasks could be run concurrently by:
import asyncio
import time
def function1():
# performing blocking tasks
while True:
print("function 1: blocking task ...")
time.sleep(1)
async def function2():
# perform non-blocking tasks
while True:
print("function 2: non-blocking task ...")
await asyncio.sleep(1)
async def main():
loop = asyncio.get_running_loop()
await asyncio.gather(
# https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor
loop.run_in_executor(None, function1),
function2(),
)
if __name__ == '__main__':
asyncio.run(main())

python - multiprocessing with queue

Here is my code below , I put string in queue , and hope dowork2 to do something work , and return char in shared_queue
but I always get nothing at while not shared_queue.empty()
please give me some point , thanks.
import time
import multiprocessing as mp
class Test(mp.Process):
def __init__(self, **kwargs):
mp.Process.__init__(self)
self.daemon = False
print('dosomething')
def run(self):
manager = mp.Manager()
queue = manager.Queue()
shared_queue = manager.Queue()
# shared_list = manager.list()
pool = mp.Pool()
results = []
results.append(pool.apply_async(self.dowork2,(queue,shared_queue)))
while True:
time.sleep(0.2)
t =time.time()
queue.put('abc')
queue.put('def')
l = ''
while not shared_queue.empty():
l = l + shared_queue.get()
print(l)
print( '%.4f' %(time.time()-t))
pool.close()
pool.join()
def dowork2(queue,shared_queue):
while True:
path = queue.get()
shared_queue.put(path[-1:])
if __name__ == '__main__':
t = Test()
t.start()
# t.join()
# t.run()
I managed to get it work by moving your dowork2 outside the class. If you declare dowork2 as a function before Test class and call it as
results.append(pool.apply_async(dowork2, (queue, shared_queue)))
it works as expected. I am not 100% sure but it probably goes wrong because your Test class is already subclassing Process. Now when your pool creates a subprocess and initialises the same class in the subprocess, something gets overridden somewhere.
Overall I wonder if Pool is really what you want to use here. Your worker seems to be in an infinite loop indicating you do not expect a return value from the worker, only the result in the return queue. If this is the case, you can remove Pool.
I also managed to get it work keeping your worker function within the class when I scrapped the Pool and replaced with another subprocess:
foo = mp.Process(group=None, target=self.dowork2, args=(queue, shared_queue))
foo.start()
# results.append(pool.apply_async(Test.dowork2, (queue, shared_queue)))
while True:
....
(you need to add self to your worker, though, or declare it as a static method:)
def dowork2(self, queue, shared_queue):

Can I pass a method to apply_async or map in python multiprocessing?

How can I get the following to work? The main point is that I want to run a method (and not a function) asynchronously.
from multiprocessing import Pool
class Async:
def __init__(self, pool):
self.pool = pool
self.run()
def run(self):
p.apply_async(self.f, (10, ))
def f(self, x):
print x*x
if __name__ == '__main__':
p = Pool(5)
a = Async(p)
p.close()
p.join()
This prints nothing.
The problem appears to be due to the fact that multiprocessing needs to pickle self.f while bound methods are not picklable. There is a discussion on how to solve the problem here.
The apply_async apparently creates an exception which is put inside the future returned. That's why nothing is printed. If a get is executed on the future, then the exception is raised.
Its definitely possible to thread class methods using a threadpool in python 2 - the following programme did what I would expect.
#!/usr/bin/env python
from multiprocessing.pool import ThreadPool
class TestAsync():
def __init__(self):
pool = ThreadPool(processes = 2)
async_completions = []
for a in range(2):
async_completions.append(pool.apply_async(self.print_int, ( a,)))
for completion in async_completions:
res = completion.get()
print("res = %d" % res)
def print_int(self, value):
print(value)
return (value*10)
a = TestAsync()

How to run functions in parallel?

I researched first and couldn't find an answer to my question. I am trying to run multiple functions in parallel in Python.
I have something like this:
files.py
import common #common is a util class that handles all the IO stuff
dir1 = 'C:\folder1'
dir2 = 'C:\folder2'
filename = 'test.txt'
addFiles = [25, 5, 15, 35, 45, 25, 5, 15, 35, 45]
def func1():
c = common.Common()
for i in range(len(addFiles)):
c.createFiles(addFiles[i], filename, dir1)
c.getFiles(dir1)
time.sleep(10)
c.removeFiles(addFiles[i], dir1)
c.getFiles(dir1)
def func2():
c = common.Common()
for i in range(len(addFiles)):
c.createFiles(addFiles[i], filename, dir2)
c.getFiles(dir2)
time.sleep(10)
c.removeFiles(addFiles[i], dir2)
c.getFiles(dir2)
I want to call func1 and func2 and have them run at the same time. The functions do not interact with each other or on the same object. Right now I have to wait for func1 to finish before func2 to start. How do I do something like below:
process.py
from files import func1, func2
runBothFunc(func1(), func2())
I want to be able to create both directories pretty close to the same time because every min I am counting how many files are being created. If the directory isn't there it will throw off my timing.
You could use threading or multiprocessing.
Due to peculiarities of CPython, threading is unlikely to achieve true parallelism. For this reason, multiprocessing is generally a better bet.
Here is a complete example:
from multiprocessing import Process
def func1():
print 'func1: starting'
for i in xrange(10000000): pass
print 'func1: finishing'
def func2():
print 'func2: starting'
for i in xrange(10000000): pass
print 'func2: finishing'
if __name__ == '__main__':
p1 = Process(target=func1)
p1.start()
p2 = Process(target=func2)
p2.start()
p1.join()
p2.join()
The mechanics of starting/joining child processes can easily be encapsulated into a function along the lines of your runBothFunc:
def runInParallel(*fns):
proc = []
for fn in fns:
p = Process(target=fn)
p.start()
proc.append(p)
for p in proc:
p.join()
runInParallel(func1, func2)
If your functions are mainly doing I/O work (and less CPU work) and you have Python 3.2+, you can use a ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
def run_io_tasks_in_parallel(tasks):
with ThreadPoolExecutor() as executor:
running_tasks = [executor.submit(task) for task in tasks]
for running_task in running_tasks:
running_task.result()
run_io_tasks_in_parallel([
lambda: print('IO task 1 running!'),
lambda: print('IO task 2 running!'),
])
If your functions are mainly doing CPU work (and less I/O work) and you have Python 2.6+, you can use the multiprocessing module:
from multiprocessing import Process
def run_cpu_tasks_in_parallel(tasks):
running_tasks = [Process(target=task) for task in tasks]
for running_task in running_tasks:
running_task.start()
for running_task in running_tasks:
running_task.join()
run_cpu_tasks_in_parallel([
lambda: print('CPU task 1 running!'),
lambda: print('CPU task 2 running!'),
])
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your functions with the #ray.remote decorator, and then invoke them with .remote.
import ray
ray.init()
dir1 = 'C:\\folder1'
dir2 = 'C:\\folder2'
filename = 'test.txt'
addFiles = [25, 5, 15, 35, 45, 25, 5, 15, 35, 45]
# Define the functions.
# You need to pass every global variable used by the function as an argument.
# This is needed because each remote function runs in a different process,
# and thus it does not have access to the global variables defined in
# the current process.
#ray.remote
def func1(filename, addFiles, dir):
# func1() code here...
#ray.remote
def func2(filename, addFiles, dir):
# func2() code here...
# Start two tasks in the background and wait for them to finish.
ray.get([func1.remote(filename, addFiles, dir1), func2.remote(filename, addFiles, dir2)])
If you pass the same argument to both functions and the argument is large, a more efficient way to do this is using ray.put(). This avoids the large argument to be serialized twice and to create two memory copies of it:
largeData_id = ray.put(largeData)
ray.get([func1(largeData_id), func2(largeData_id)])
Important - If func1() and func2() return results, you need to rewrite the code as follows:
ret_id1 = func1.remote(filename, addFiles, dir1)
ret_id2 = func2.remote(filename, addFiles, dir2)
ret1, ret2 = ray.get([ret_id1, ret_id2])
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
Seems like you have a single function that you need to call on two different parameters. This can be elegantly done using a combination of concurrent.futures and map with Python 3.2+
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def sleep_secs(seconds):
time.sleep(seconds)
print(f'{seconds} has been processed')
secs_list = [2,4, 6, 8, 10, 12]
Now, if your operation is IO bound, then you can use the ThreadPoolExecutor as such:
with ThreadPoolExecutor() as executor:
results = executor.map(sleep_secs, secs_list)
Note how map is used here to map your function to the list of arguments.
Now, If your function is CPU bound, then you can use ProcessPoolExecutor
with ProcessPoolExecutor() as executor:
results = executor.map(sleep_secs, secs_list)
If you are not sure, you can simply try both and see which one gives you better results.
Finally, if you are looking to print out your results, you can simply do this:
with ThreadPoolExecutor() as executor:
results = executor.map(sleep_secs, secs_list)
for result in results:
print(result)
In 2021 the easiest way is to use asyncio:
import asyncio, time
async def say_after(delay, what):
await asyncio.sleep(delay)
print(what)
async def main():
task1 = asyncio.create_task(
say_after(4, 'hello'))
task2 = asyncio.create_task(
say_after(3, 'world'))
print(f"started at {time.strftime('%X')}")
# Wait until both tasks are completed (should take
# around 2 seconds.)
await task1
await task2
print(f"finished at {time.strftime('%X')}")
asyncio.run(main())
References:
[1] https://docs.python.org/3/library/asyncio-task.html
If you are a windows user and using python 3, then this post will help you to do parallel programming in python.when you run a usual multiprocessing library's pool programming, you will get an error regarding the main function in your program. This is because the fact that windows has no fork() functionality. The below post is giving a solution to the mentioned problem .
http://python.6.x6.nabble.com/Multiprocessing-Pool-woes-td5047050.html
Since I was using the python 3, I changed the program a little like this:
from types import FunctionType
import marshal
def _applicable(*args, **kwargs):
name = kwargs['__pw_name']
code = marshal.loads(kwargs['__pw_code'])
gbls = globals() #gbls = marshal.loads(kwargs['__pw_gbls'])
defs = marshal.loads(kwargs['__pw_defs'])
clsr = marshal.loads(kwargs['__pw_clsr'])
fdct = marshal.loads(kwargs['__pw_fdct'])
func = FunctionType(code, gbls, name, defs, clsr)
func.fdct = fdct
del kwargs['__pw_name']
del kwargs['__pw_code']
del kwargs['__pw_defs']
del kwargs['__pw_clsr']
del kwargs['__pw_fdct']
return func(*args, **kwargs)
def make_applicable(f, *args, **kwargs):
if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
kwargs['__pw_name'] = f.__name__ # edited
kwargs['__pw_code'] = marshal.dumps(f.__code__) # edited
kwargs['__pw_defs'] = marshal.dumps(f.__defaults__) # edited
kwargs['__pw_clsr'] = marshal.dumps(f.__closure__) # edited
kwargs['__pw_fdct'] = marshal.dumps(f.__dict__) # edited
return _applicable, args, kwargs
def _mappable(x):
x,name,code,defs,clsr,fdct = x
code = marshal.loads(code)
gbls = globals() #gbls = marshal.loads(gbls)
defs = marshal.loads(defs)
clsr = marshal.loads(clsr)
fdct = marshal.loads(fdct)
func = FunctionType(code, gbls, name, defs, clsr)
func.fdct = fdct
return func(x)
def make_mappable(f, iterable):
if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
name = f.__name__ # edited
code = marshal.dumps(f.__code__) # edited
defs = marshal.dumps(f.__defaults__) # edited
clsr = marshal.dumps(f.__closure__) # edited
fdct = marshal.dumps(f.__dict__) # edited
return _mappable, ((i,name,code,defs,clsr,fdct) for i in iterable)
After this function , the above problem code is also changed a little like this:
from multiprocessing import Pool
from poolable import make_applicable, make_mappable
def cube(x):
return x**3
if __name__ == "__main__":
pool = Pool(processes=2)
results = [pool.apply_async(*make_applicable(cube,x)) for x in range(1,7)]
print([result.get(timeout=10) for result in results])
And I got the output as :
[1, 8, 27, 64, 125, 216]
I am thinking that this post may be useful for some of the windows users.
There's no way to guarantee that two functions will execute in sync with each other which seems to be what you want to do.
The best you can do is to split up the function into several steps, then wait for both to finish at critical synchronization points using Process.join like #aix's answer mentions.
This is better than time.sleep(10) because you can't guarantee exact timings. With explicitly waiting, you're saying that the functions must be done executing that step before moving to the next, instead of assuming it will be done within 10ms which isn't guaranteed based on what else is going on on the machine.
(about How can I simultaneously run two (or more) functions in python?)
With asyncio, sync/async tasks could be run concurrently by:
import asyncio
import time
def function1():
# performing blocking tasks
while True:
print("function 1: blocking task ...")
time.sleep(1)
async def function2():
# perform non-blocking tasks
while True:
print("function 2: non-blocking task ...")
await asyncio.sleep(1)
async def main():
loop = asyncio.get_running_loop()
await asyncio.gather(
# https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor
loop.run_in_executor(None, function1),
function2(),
)
if __name__ == '__main__':
asyncio.run(main())

Categories

Resources