Call 4 methods at once in Python 3 - python

I want to call four methods at once so that they run in parallel in Python. These methods make HTTP calls and do some basic operations, such as verifying the response. I want to call them together so that the total time is lower. Say each method takes ~20 min to run: I want all four methods to return their responses in about 20 min, not 20 * 4 = 80 min.
It is important to note that the four methods I'm trying to run in parallel are async functions. When I tried using ThreadPoolExecutor to run them in parallel, I didn't see much difference in the time taken.
Example code - edited from @tomerar's comment below
from concurrent.futures import ThreadPoolExecutor
async def foo_1():
    print("foo_1")
async def foo_2():
    print("foo_2")
async def foo_3():
    print("foo_3")
async def foo_4():
    print("foo_4")
with ThreadPoolExecutor() as executor:
    for foo in [await foo_1,await foo_2,await foo_3,await foo_4]:
        executor.submit(foo)
Looking for suggestions

You can use ThreadPoolExecutor from concurrent.futures:
from concurrent.futures import ThreadPoolExecutor
def foo_1():
    print("foo_1")
def foo_2():
    print("foo_2")
def foo_3():
    print("foo_3")
def foo_4():
    print("foo_4")
with ThreadPoolExecutor() as executor:
    for foo in [foo_1,foo_2,foo_3,foo_4]:
        executor.submit(foo)
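Since the four functions in the question are coroutines, a thread pool is not needed to overlap them: submitting a coroutine function to an executor only creates a coroutine object and never runs its body. A minimal sketch of the asyncio route, assuming each function spends its time awaiting I/O. Here asyncio.sleep stands in for the HTTP call, and the names check and main are illustrative, not from the question:

```python
import asyncio
import time

async def check(name: str, delay: float) -> str:
    # asyncio.sleep stands in for an awaitable HTTP request (assumption).
    await asyncio.sleep(delay)
    return f"{name} ok"

async def main() -> list:
    # gather() schedules all four coroutines on the same event loop and
    # waits for all of them; results come back in argument order.
    return await asyncio.gather(
        check("foo_1", 0.2),
        check("foo_2", 0.2),
        check("foo_3", 0.2),
        check("foo_4", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

If the functions call a blocking HTTP library instead of an async one, the awaits never yield and the calls still run one after another; in that case the blocking call would have to move to a thread, for example with asyncio.to_thread (Python 3.9+).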

You can use the multiprocessing module in Python. It's quite simple:
from multiprocessing import Pool
pool = Pool()
result1 = pool.apply_async(solve1, [A]) # evaluate "solve1(A)"
result2 = pool.apply_async(solve2, [B]) # evaluate "solve2(B)"
answer1 = result1.get(timeout=10)
answer2 = result2.get(timeout=10)
See the multiprocessing documentation for full details.

Related

Run multiple async loops in separate processes within a main async app

OK, so this is a bit convoluted, but I have an async class with a lot of async code.
I wish to parallelize a task inside that class: I want to spawn multiple processes to run a blocking task, and within each of these processes I want to create an asyncio loop to handle various subtasks.
So I sort of managed to do this with a ThreadPoolExecutor, but when I try to use a ProcessPoolExecutor I get a "Can't pickle local object" error.
This is a simplified version of my code that runs with ThreadPoolExecutor. How can this be parallelized with ProcessPoolExecutor?
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
class MyClass:
    def __init__(self) -> None:
        self.event_loop = None
        # self.pool_executor = ProcessPoolExecutor(max_workers=8)
        self.pool_executor = ThreadPoolExecutor(max_workers=8)
        self.words = ["one", "two", "three", "four", "five"]
        self.multiplier = int(2)
    async def subtask(self, letter: str):
        await asyncio.sleep(1)
        return letter * self.multiplier
    async def task_gatherer(self, subtasks: list):
        return await asyncio.gather(*subtasks)
    def blocking_task(self, word: str):
        time.sleep(1)
        subtasks = [self.subtask(letter) for letter in word]
        result = asyncio.run(self.task_gatherer(subtasks))
        return result
    async def master_method(self):
        self.event_loop = asyncio.get_running_loop()
        master_tasks = [
            self.event_loop.run_in_executor(
                self.pool_executor,
                self.blocking_task,
                word,
            )
            for word in self.words
        ]
        results = await asyncio.gather(*master_tasks)
        print(results)
if __name__ == "__main__":
    my_class = MyClass()
    asyncio.run(my_class.master_method())
This is a very good question. Both the problem and the solution are quite interesting.
The Problem
One difference between multithreading and multiprocessing is how memory is handled. Threads share a memory space. Processes do not (in general, see below).
Objects are passed to a ThreadPoolExecutor simply by reference. There is no need to create new objects.
But a ProcessPoolExecutor lives in a separate memory space. To pass objects to it, the implementation pickles the objects and unpickles them again on the other side. This detail is often important.
Look carefully at the arguments to blocking_task in the original question. I don't mean word - I mean the first argument: self. The one that's always there. We've seen it a million times and hardly even think about it. To execute the function blocking_task, a value is required for the argument named "self." To run this function in a ProcessPoolExecutor, "self" must get pickled and unpickled. Now look at some of the member objects of "self": there's an event loop and also the executor itself. Neither of which is pickleable. That's the problem.
There is no way we can run that function, as is, in another Process.
Admittedly, the traceback message "Can't pickle local object" leaves a lot to be desired. So does the documentation. But it actually makes total sense that the program works with a ThreadPool but not with a ProcessPool.
Note: There are mechanisms for sharing ctypes objects between Processes. However, as far as I'm aware, there is no way to share Python objects directly. That's why the pickle/unpickle mechanism is used.
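The pickling constraint can be checked directly, without starting any worker process. The sketch below is only an illustration: PlainTask, TaskWithExecutor and can_pickle are hypothetical names, and a ThreadPoolExecutor is used as the unpicklable member because it holds locks and queues.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

class PlainTask:
    """Holds only plain data, so it can cross a process boundary."""
    def __init__(self):
        self.multiplier = 2

    def work(self):
        return self.multiplier

class TaskWithExecutor:
    """Also holds an executor, which wraps locks and queues."""
    def __init__(self):
        self.multiplier = 2
        self.pool_executor = ThreadPoolExecutor(max_workers=1)

    def work(self):
        return self.multiplier

def can_pickle(obj) -> bool:
    # Mirrors what a process pool has to do with everything it is handed.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

plain = PlainTask()
holder = TaskWithExecutor()

plain_ok = can_pickle(plain)
holder_ok = can_pickle(holder)
# Pickling a bound method pickles its `self` as well.
plain_method_ok = can_pickle(plain.work)
holder_method_ok = can_pickle(holder.work)

holder.pool_executor.shutdown()
print(plain_ok, holder_ok, plain_method_ok, holder_method_ok)
```

The bound method fails for the same reason as the object itself, which is why moving the picklable state into its own class is enough to make the method usable from a process pool.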
The Solution
Refactor MyClass to separate the data from the multiprocessing framework. I created a second class, MyTask, which can be pickled and unpickled. I moved a few of the functions from MyClass into it. Nothing of importance has been modified from the original listing - just rearranged.
The script runs successfully with both ProcessPoolExecutor and ThreadPoolExecutor.
import asyncio
import time
# from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
# Refactored MyClass to break out MyTask
class MyTask:
    def __init__(self):
        self.multiplier = 2
    async def subtask(self, letter: str):
        await asyncio.sleep(1)
        return letter * self.multiplier
    async def task_gatherer(self, subtasks: list):
        return await asyncio.gather(*subtasks)
    def blocking_task(self, word: str):
        time.sleep(1)
        subtasks = [self.subtask(letter) for letter in word]
        result = asyncio.run(self.task_gatherer(subtasks))
        return result
class MyClass:
    def __init__(self):
        self.task = MyTask()
        self.event_loop: asyncio.AbstractEventLoop = None
        self.pool_executor = ProcessPoolExecutor(max_workers=8)
        # self.pool_executor = ThreadPoolExecutor(max_workers=8)
        self.words = ["one", "two", "three", "four", "five"]
    async def master_method(self):
        self.event_loop = asyncio.get_running_loop()
        master_tasks = [
            self.event_loop.run_in_executor(
                self.pool_executor,
                self.task.blocking_task,
                word,
            )
            for word in self.words
        ]
        results = await asyncio.gather(*master_tasks)
        print(results)
if __name__ == "__main__":
    my_class = MyClass()
    asyncio.run(my_class.master_method())

Is a change made to contextvars in a process pool not propagated to the main process running the asyncio loop

Below is the code snippet that I ran:
from concurrent.futures import ProcessPoolExecutor
import asyncio
import contextvars
ctx = contextvars.ContextVar('ctx', default=None)
pool = ProcessPoolExecutor(max_workers=2)
def task():
    print(f'inside pool process, ctx: {ctx.get()}')
    ctx.set('co co')
    return ctx.get()
async def execute():
    loop = asyncio.get_event_loop()
    ctx.set('yo yo')
    ctx_from_pool = await loop.run_in_executor(pool, task)
    ctx_from_async = ctx.get()
    print(f'ctx_from_async: {ctx_from_async}')
    print(f'ctx_from_pool: {ctx_from_pool}')
    ctx.set('bo bo')
def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.ensure_future(execute()))
    ctx_from_main = ctx.get()
    print(f'ctx_from_main: {ctx_from_main}')
main()
Output:
inside pool process, ctx: yo yo
ctx_from_async: yo yo
ctx_from_pool: co co
ctx_from_main: None
My understanding is that the change ctx.set('co co') made in the pool process is not propagated to the main process because, when the task is assigned, a copy of the variables is made using pickle, so the change is applied to a different copy of the variable from the one the main process accesses. However, I am not completely sure of this, as I don't have much experience with ProcessPoolExecutor.
Could someone shed some more light on this? Also, what can be done to manipulate contextvars seamlessly across the asyncio loop and the process pool executor?

Asyncio performance with synchronous code

I created the following test to check the performance of running synchronous code inside an async function.
The return_random function stands in for things like writing a log, dumping or loading JSON, validating input and output data, calling other functions, etc.
The count_sync and count_async variables are used to exclude the overhead of opening and closing the event loop; they only accumulate the time spent inside the function.
This part of the code just calls the synchronous function count times.
import timeit
from time import time
from random import random
count = 100
run_numbers = 100000
count_sync = 0
def return_random():
    return random()
def test():
    global count_sync
    start = time()
    for _ in range(count):
        return_random()
    count_sync += time() - start
    return
total_sync = timeit.timeit('test()', globals=globals(),
                           number=run_numbers)
The same code, but now return_random is an asynchronous function:
import asyncio
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
count_async = 0
async def return_random_async():
    return random()
async def test_async():
    global count_async
    start = time()
    for _ in range(count):
        await return_random_async()
    count_async += time() - start
    return
total_async = timeit.timeit('asyncio.run(test_async())', globals=globals(), number=run_numbers)
After running the code with different numbers of function calls and timeit runs, I got the following results:
RUNNING run_numbers: 1000. CALL FUNCTIONS count: 1000
total sync: 0.12023316
total Async: 0.48369559500000003
inside def sync 0.11995530128479004
inside def Async:0.24073457717895508
RUNNING run_numbers: 100000. CALL FUNCTIONS count: 100
total sync: 1.422697458
total Async: 25.452165134999998 (!!!)
inside def sync: 1.3965537548065186
inside def Async: 2.8397130966186523
In every run, the synchronous function was more than 2 times faster.
Does this mean that synchronous code is better run in non-async functions?
And is it preferable not to use a lot of async functions?
You need to use async functions only when you really need them. Examples: asynchronous HTTP libraries like aiohttp, asynchronous drivers like motor_asyncio for MongoDB, etc. In other cases it's better to run synchronous code in non-async functions, because async functions carry an overhead that you don't need.

How to use concurrent.futures in Python

I'm struggling to get multithreading working in Python. I have a function which I want to execute on 5 threads based on a parameter. I also need 2 parameters that are the same for every thread. This is what I have:
from concurrent.futures import ThreadPoolExecutor
def do_something_parallel(sameValue1, sameValue2, differentValue):
    print(str(sameValue1)) #same everytime
    print(str(sameValue2)) #same everytime
    print(str(differentValue)) #different
def main():
    differentValues = ["1000ms", "100ms", "10ms", "20ms", "50ms"]
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(do_something_parallel, sameValue1, sameValue2, differentValue) for differentValue in differentValues]
But I don't know what to do next.
If you don't care about the order, you can now do:
from concurrent.futures import as_completed
# The rest of your code here
for f in as_completed(futures):
    # Do what you want with f.result(), for example:
    print(f.result())
Otherwise, if you care about order, it might make sense to use ThreadPoolExecutor.map with functools.partial to fill in the arguments that are always the same:
from functools import partial
# The rest of your code...
with ThreadPoolExecutor(max_workers=5) as executor:
    results = executor.map(
        partial(do_something_parallel, sameValue1, sameValue2),
        differentValues
    )
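Putting both variants together, a self-contained sketch might look like the following. The shared values "a" and "b" and the returned string are placeholders chosen for illustration; the question does not say what the function computes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import partial

def do_something_parallel(same_value1, same_value2, different_value):
    # Placeholder work: combine the shared and per-task arguments.
    return f"{same_value1}-{same_value2}-{different_value}"

different_values = ["1000ms", "100ms", "10ms", "20ms", "50ms"]

with ThreadPoolExecutor(max_workers=5) as executor:
    # Completion order: results arrive as each task finishes.
    futures = [
        executor.submit(do_something_parallel, "a", "b", value)
        for value in different_values
    ]
    unordered = [future.result() for future in as_completed(futures)]

    # Input order: map() yields results in the order of different_values.
    ordered = list(
        executor.map(partial(do_something_parallel, "a", "b"), different_values)
    )

print(ordered)
```

Calling result() on a future re-raises any exception from the worker, so wrapping that call in try/except is the natural place for per-task error handling.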

Asynchronous method call in Python?

I was wondering if there's any library for asynchronous method calls in Python. It would be great if you could do something like
@async
def longComputation():
    <code>
token = longComputation()
token.registerCallback(callback_function)
# alternative, polling
while not token.finished():
    doSomethingElse()
    if token.finished():
        result = token.result()
Or to call a non-async routine asynchronously
def longComputation():
    <code>
token = asynccall(longComputation())
It would be great to have a more refined strategy as native in the language core. Was this considered?
Something like:
import threading
thr = threading.Thread(target=foo, args=(), kwargs={})
thr.start() # Will run "foo"
....
thr.is_alive() # Will return whether foo is running currently
....
thr.join() # Will wait till "foo" is done
See the documentation at https://docs.python.org/library/threading.html for more details.
You can use the multiprocessing module added in Python 2.6. You can use pools of processes and then get results asynchronously with:
apply_async(func[, args[, kwds[, callback]]])
E.g.:
from multiprocessing import Pool
def f(x):
    return x*x
def callback(result):
    print(result)
if __name__ == '__main__':
    pool = Pool(processes=1) # Start a worker process.
    # Evaluate "f(10)" asynchronously, calling callback when finished.
    # callback must be passed by keyword: the third positional argument is kwds.
    result = pool.apply_async(f, [10], callback=callback)
This is only one alternative. This module provides lots of facilities to achieve what you want. Also it will be really easy to make a decorator from this.
As of Python 3.5, you can use enhanced generators for async functions.
import asyncio
import datetime
Enhanced generator syntax (asyncio.coroutine was deprecated in Python 3.8 and removed in 3.11):
@asyncio.coroutine
def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        yield from asyncio.sleep(1)
loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
New async/await syntax:
async def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)
loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
It's not in the language core, but a very mature library that does what you want is Twisted. It introduces the Deferred object, which you can attach callbacks or error handlers ("errbacks") to. A Deferred is basically a "promise" that a function will have a result eventually.
You can implement a decorator to make your functions asynchronous, though that's a bit tricky. The multiprocessing module is full of little quirks and seemingly arbitrary restrictions – all the more reason to encapsulate it behind a friendly interface, though.
from inspect import getmodule
from multiprocessing import Pool
# Named run_async because "async" is a reserved keyword since Python 3.7.
def run_async(decorated):
    r'''Wraps a top-level function around an asynchronous dispatcher.
    When the decorated function is called, a task is submitted to a
    process pool, and a future object is returned, providing access to an
    eventual return value.
    The future object has a blocking get() method to access the task
    result: it will return immediately if the job is already done, or block
    until it completes.
    This decorator won't work on methods, due to limitations in Python's
    pickling machinery (in principle methods could be made pickleable, but
    good luck on that).
    '''
    # Keeps the original function visible from the module global namespace,
    # under a name consistent to its __name__ attribute. This is necessary for
    # the multiprocessing pickling machinery to work properly.
    module = getmodule(decorated)
    decorated.__name__ += '_original'
    # Python 3 pickles functions by __qualname__, so keep it in sync.
    decorated.__qualname__ = decorated.__name__
    setattr(module, decorated.__name__, decorated)
    def send(*args, **opts):
        return run_async.pool.apply_async(decorated, args, opts)
    return send
The code below illustrates usage of the decorator:
@run_async
def printsum(uid, values):
    summed = 0
    for value in values:
        summed += value
    print("Worker %i: sum value is %i" % (uid, summed))
    return (uid, summed)
if __name__ == '__main__':
    from random import sample
    # The process pool must be created inside __main__.
    run_async.pool = Pool(4)
    p = range(0, 1000)
    results = []
    for i in range(4):
        result = printsum(i, sample(p, 100))
        results.append(result)
    for result in results:
        print("Worker %i: sum value is %i" % result.get())
In a real-world case I would elaborate a bit more on the decorator, providing some way to turn it off for debugging (while keeping the future interface in place), or maybe a facility for dealing with exceptions; but I think this demonstrates the principle well enough.
Just:
import threading, time
def f():
    print("f started")
    time.sleep(3)
    print("f finished")
threading.Thread(target=f).start()
My solution is:
import threading
class TimeoutError(RuntimeError):
    pass
class AsyncCall(object):
    def __init__(self, fnc, callback = None):
        self.Callable = fnc
        self.Callback = callback
    def __call__(self, *args, **kwargs):
        self.Thread = threading.Thread(target = self.run, name = self.Callable.__name__, args = args, kwargs = kwargs)
        self.Thread.start()
        return self
    def wait(self, timeout = None):
        self.Thread.join(timeout)
        if self.Thread.is_alive():  # isAlive() was removed in Python 3.9
            raise TimeoutError()
        else:
            return self.Result
    def run(self, *args, **kwargs):
        self.Result = self.Callable(*args, **kwargs)
        if self.Callback:
            self.Callback(self.Result)
class AsyncMethod(object):
    def __init__(self, fnc, callback=None):
        self.Callable = fnc
        self.Callback = callback
    def __call__(self, *args, **kwargs):
        return AsyncCall(self.Callable, self.Callback)(*args, **kwargs)
def Async(fnc = None, callback = None):
    if fnc is None:
        def AddAsyncCallback(fnc):
            return AsyncMethod(fnc, callback)
        return AddAsyncCallback
    else:
        return AsyncMethod(fnc, callback)
And it works exactly as requested:
@Async
def fnc():
    pass
You could use eventlet. It lets you write what appears to be synchronous code, but have it operate asynchronously over the network.
Here's an example of a super minimal crawler:
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
        "https://wiki.secondlife.com/w/images/secondlife.jpg",
        "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]
import eventlet
# Python 3 location of the green urlopen (the Python 2 module was eventlet.green.urllib2).
from eventlet.green.urllib.request import urlopen
def fetch(url):
    return urlopen(url).read()
pool = eventlet.GreenPool()
for body in pool.imap(fetch, urls):
    print("got body", len(body))
Something like this works for me: you can call the function, and it will dispatch itself onto a new thread.
import time
from _thread import start_new_thread  # the module was named "thread" in Python 2
def dowork(asynchronous=True):
    if asynchronous:
        args = (False,)  # a one-element tuple needs the trailing comma
        start_new_thread(dowork, args) # Call itself on a new thread.
    else:
        while True:
            # do something...
            time.sleep(60) # sleep for a minute
    return
You can use concurrent.futures (added in Python 3.2).
import time
from concurrent.futures import ThreadPoolExecutor
def long_computation(duration):
    for x in range(0, duration):
        print(x)
        time.sleep(1)
    return duration * 2
print('Use polling')
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(long_computation, 5)
    while not future.done():
        print('waiting...')
        time.sleep(0.5)
    print(future.result())
print('Use callback')
executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_computation, 5)
future.add_done_callback(lambda f: print(f.result()))
print('waiting for callback')
executor.shutdown(False) # non-blocking
print('shutdown invoked')
Since Python 3.7, the newer way to run asyncio code is asyncio.run(), which replaces creating a loop, calling loop.run_until_complete() and closing the loop:
import asyncio
import datetime
async def display_date(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...", datetime.datetime.now())
        await asyncio.sleep(1)
        if loop.time() > end_time:
            print("Done.")
            break
asyncio.run(display_date(5))
Is there any reason not to use threads? You can use the threading.Thread class.
Instead of a finished() function, use is_alive(). The result() function could join() the thread and retrieve the result. And, if you can, override the run() and __init__ functions so that they call the function specified in the constructor and save the returned value on the instance of the class.
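A minimal sketch of that idea, assuming a Thread subclass is acceptable. The class name ResultThread and the sample function long_computation are illustrative, not from the answer:

```python
import threading

class ResultThread(threading.Thread):
    """Thread that stores the return value of the function it runs."""

    def __init__(self, func, *args, **kwargs):
        super().__init__()
        self._func = func
        self._func_args = args
        self._func_kwargs = kwargs
        self._result = None

    def run(self):
        # Runs in the new thread; keep the value for result().
        self._result = self._func(*self._func_args, **self._func_kwargs)

    def result(self, timeout=None):
        # join() blocks until run() has finished (or the timeout expires).
        self.join(timeout)
        if self.is_alive():
            raise TimeoutError("computation still running")
        return self._result

def long_computation(n):
    return sum(range(n))

token = ResultThread(long_computation, 1000)
token.start()
# is_alive() can be polled here while doing other work.
value = token.result()
print(value)
```

This sketch does not forward exceptions raised inside the function; concurrent.futures does that out of the box, which is one reason to prefer it over a hand-rolled class.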
The native Python way to make asynchronous calls in 2021 with Python 3.9, also suitable for the Jupyter / IPython kernel
Camabeh's answer is the way to go since Python 3.3.
async def display_date(loop):
    end_time = loop.time() + 5.0
    while True:
        print(datetime.datetime.now())
        if (loop.time() + 1.0) >= end_time:
            break
        await asyncio.sleep(1)
loop = asyncio.get_event_loop()
# Blocking call which returns when the display_date() coroutine is done
loop.run_until_complete(display_date(loop))
loop.close()
In Jupyter Notebook / JupyterLab, however, this throws an error:
RuntimeError: This event loop is already running
Because of IPython's own use of event loops, we need something called nested asynchronous loops, which is not yet implemented in Python. Luckily there is nest_asyncio to deal with the issue. All you need to do is:
!pip install nest_asyncio # use ! within Jupyter Notebook, else pip install in shell
import nest_asyncio
nest_asyncio.apply()
(Based on this thread)
Only when you call loop.close() does it throw another error, as it probably refers to IPython's main loop.
RuntimeError: Cannot close a running event loop
I'll update this answer as soon as someone has answered this GitHub issue.
You can use a process. If you want to run it forever, use a while loop in your function (as in networking code):
from multiprocessing import Process
def foo():
    while 1:
        pass # Do something
p = Process(target = foo)
p.start()
If you just want to run it one time, do it like this:
from multiprocessing import Process
def foo():
    pass # Do something
p = Process(target = foo)
p.start()
p.join()
