Background: I'm a very experienced Python programmer who is completely clueless about the new coroutines/async/await features. I can't write an async "hello world" to save my life.
My question is: I am given an arbitrary coroutine function f. I want to write a coroutine function g that will wrap f, i.e. I will give g to the user as if it was f, and the user will call it and be none the wiser, since g will be using f under the hood. Like when you decorate a normal Python function to add functionality.
The functionality that I want to add: Whenever the program flow goes into my coroutine, it acquires a context manager that I provide, and as soon as program flow goes out of the coroutine, it releases that context manager. Flow comes back in? Re-acquire the context manager. It goes back out? Re-release it. Until the coroutine is completely finished.
To demonstrate, here is the described functionality with plain generators:
def generator_wrapper(function, *args, **kwargs):
    gen = function(*args, **kwargs)
    method, incoming = gen.send, None
    while True:
        with self:  # 'self' is the context manager to hold while the generator runs
            outgoing = method(incoming)
        try:
            method, incoming = gen.send, (yield outgoing)
        except Exception as e:
            method, incoming = gen.throw, e
Is it possible to do it with coroutines?
Coroutines are built on iterators - the __await__ special method returns a regular iterator. This allows you to wrap the underlying iterator in yet another iterator. The trick is that you must unwrap the iterator of your target using its __await__, then re-wrap your own iterator using your own __await__.
The core functionality that works on instantiated coroutines looks like this:
class CoroWrapper:
    """Wrap ``target`` to have every send issued in a ``context``"""
    def __init__(self, target: 'Coroutine', context: 'ContextManager'):
        self.target = target
        self.context = context

    # wrap an iterator for use with 'await'
    def __await__(self):
        # unwrap the underlying iterator
        target_iter = self.target.__await__()
        # emulate 'yield from'
        iter_send, iter_throw = target_iter.send, target_iter.throw
        send, message = iter_send, None
        while True:
            # communicate with the target coroutine
            try:
                with self.context:
                    signal = send(message)
            except StopIteration as err:
                return err.value
            else:
                send = iter_send
            # communicate with the ambient event loop
            try:
                message = yield signal
            except BaseException as err:
                send, message = iter_throw, err
Note that this explicitly works on a Coroutine, not an Awaitable - Coroutine.__await__ implements the generator interface. In theory, an Awaitable does not necessarily provide __await__().send or __await__().throw.
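If you only have a plain Awaitable rather than a Coroutine, one workaround (my own sketch, not part of the original answer) is to coerce it into a real coroutine with a tiny async shim before handing it to CoroWrapper:

# Sketch: wrapping an arbitrary Awaitable in a native coroutine guarantees that
# the object passed to CoroWrapper has an __await__() supporting send/throw.
async def as_coroutine(awaitable):
    return await awaitable

# e.g. CoroWrapper(as_coroutine(some_awaitable), some_context)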
This is enough to pass messages in and out:
import asyncio

class PrintContext:
    def __enter__(self):
        print('enter')

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('exit via', exc_type)
        return False

async def main_coro():
    print(
        'wrapper returned',
        await CoroWrapper(test_coro(), PrintContext())
    )

async def test_coro(delay=0.5):
    await asyncio.sleep(delay)
    return 2

asyncio.run(main_coro())
# enter
# exit via None
# enter
# exit via <class 'StopIteration'>
# wrapper returned 2
You can delegate the wrapping part to a separate decorator. This also ensures that you have an actual coroutine, not a custom class - some async libraries require this.
from functools import wraps

def send_context(context: 'ContextManager'):
    """Wrap a coroutine to issue every send in a context"""
    def coro_wrapper(target: 'Callable[..., Coroutine]') -> 'Callable[..., Coroutine]':
        @wraps(target)
        async def context_coroutine(*args, **kwargs):
            return await CoroWrapper(target(*args, **kwargs), context)
        return context_coroutine
    return coro_wrapper
This allows you to directly decorate a coroutine function:
@send_context(PrintContext())
async def test_coro(delay=0.5):
    await asyncio.sleep(delay)
    return 2
print('async run returned:', asyncio.run(test_coro()))
# enter
# exit via None
# enter
# exit via <class 'StopIteration'>
# async run returned: 2
I'm trying to run a function on separate threads using asyncio and futures. I have a decorator which takes the long-running function and its arguments, runs it asynchronously, and outputs its value. Unfortunately the jobs do not seem to run asynchronously.
def multiprocess(self, function, executor=None, *args, **kwargs):
    async def run_task(function, *args, **kwargs):
        @functools.wraps(function)
        async def wrap(*args, **kwargs):
            while True:
                execution_runner = executor or self._DEFAULT_POOL_
                executed_job = execution_runner.submit(function, *args, **kwargs)
                print(
                    f"Pending {function.__name__}:",
                    execution_runner._work_queue.qsize(),
                    "jobs",
                )
                print(
                    f"Threads: {function.__name__}:", len(execution_runner._threads)
                )
                future = await asyncio.wrap_future(executed_job)
                return future
        return wrap
    return asyncio.run(run_task(function, *args, **kwargs))
To call the decorator I have two functions _async_task and task_function. _async_task contains a loop that runs task_function for each document that needs to be processed.
@staticmethod
def _async_task(documents):
    processed_docs = asyncio.run(task_function(documents))
    return processed_docs
task_function processes each document in documents as below,
@multiprocess
async def task_function(documents):
    processed_documents = None
    try:
        for doc in documents:
            processed_documents = process_document(doc)
            print(processed_documents)
    except Exception as err:
        print(err)
    return processed_documents
The clue that this doesn't work asynchronously is that the diagnostics I have for the multithreading decorator print the following.
Pending summarise_news: 0 jobs
Threads: summarise_news: 2
Since there are no pending jobs and the entire process takes as long as the synchronous run, it is running synchronously.
I had some issues setting up your code, but I think I've come up with an answer.
First of all, as @dano mentioned in his comment, asyncio.run blocks until the coroutine running is completed. Thus, you won't get any speedup from using this approach.
I used a slightly modified multiprocess decorator
def multiprocess(executor=None, *args, **kwargs):
    def run_task(function, *args, **kwargs):
        def wrap(*args, **kwargs):
            execution_runner = executor or DEFAULT_EXECUTOR
            executed_job = execution_runner.submit(function, *args, **kwargs)
            print(
                f"Pending {function.__name__}:",
                execution_runner._work_queue.qsize(),
                "jobs",
            )
            print(
                f"Threads: {function.__name__}:", len(execution_runner._threads)
            )
            future = asyncio.wrap_future(executed_job)
            return future
        return wrap
    return run_task
As you can see, there's no asyncio.run here, and both the decorator and inner wrapper are synchronous since asyncio.wrap_future does not need await.
The updated multiprocess decorator is now used with the process_document function. The reason is that you won't get any benefit from parallelizing a function that calls blocking functions in sequence; you have to convert the blocking function itself to be runnable in an executor instead.
NOTE that this dummy process_document is exactly like I described - fully blocking and synchronous.
@multiprocess()
def process_document(doc):
    print(f"Processing doc: {doc}...")
    time.sleep(2)
    print(f"Doc {doc} done.")
Now, to the last point. We already made process_document kind of asynchronous by converting it to be runnable in an executor, BUT it still matters HOW exactly you invoke it.
Consider the following examples:
for doc in documents:
    result = await process_document(doc)
results = await asyncio.gather(*[process_document(doc) for doc in documents])
In the former, we await the futures sequentially, waiting for one to finish before starting the next.
In the latter, they are executed in parallel, so it really depends on how exactly you invoke your coroutine execution.
Here's the full code snippet I used:
import asyncio
import concurrent.futures
import time

DEFAULT_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def multiprocess(executor=None, *args, **kwargs):
    def run_task(function, *args, **kwargs):
        def wrap(*args, **kwargs):
            execution_runner = executor or DEFAULT_EXECUTOR
            executed_job = execution_runner.submit(function, *args, **kwargs)
            print(
                f"Pending {function.__name__}:",
                execution_runner._work_queue.qsize(),
                "jobs",
            )
            print(
                f"Threads: {function.__name__}:", len(execution_runner._threads)
            )
            future = asyncio.wrap_future(executed_job)
            return future
        return wrap
    return run_task

@multiprocess()
def process_document(doc):
    print(f"Processing doc: {doc}...")
    time.sleep(2)
    print(f"Doc {doc} done.")

async def task_function_sequential(documents):
    start = time.time()
    for doc in documents:
        await process_document(doc)
    end = time.time()
    print(f"task_function_sequential took: {end-start}s")

async def task_function_parallel(documents):
    start = time.time()
    jobs = [process_document(doc) for doc in documents]
    await asyncio.gather(*jobs)
    end = time.time()
    print(f"task_function_parallel took: {end-start}s")

async def main():
    documents = [i for i in range(5)]
    await task_function_sequential(documents)
    await task_function_parallel(documents)

asyncio.run(main())
Notice that the task_function_parallel example still takes around 4 seconds instead of 2, because the thread pool is limited to 4 workers and there are 5 jobs, so the last job has to wait for a worker to become available.
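If you want to verify that interpretation, one tweak (my assumption, not something the answer above does) is to give the pool at least as many workers as there are documents; the parallel variant should then finish in roughly a single 2-second sleep:

import concurrent.futures

# With 5 workers, all 5 submitted jobs can run at the same time.
DEFAULT_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=5)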
I would like to read from multiple simultaneous HTTP streaming requests inside coroutines using httpx, and yield the data back to my non-async function running the event loop, rather than just returning the final data.
But if I make my async functions yield instead of return, I get complaints that asyncio.as_completed() and loop.run_until_complete() expect a coroutine or a Future, not an async generator.
So the only way I can get this to work at all is by collecting all the streamed data inside each coroutine and returning it once the request finishes, then collecting all the coroutine results and finally returning those to the non-async calling function.
Which means I have to keep everything in memory, and wait until the slowest request has completed before I get all my data, which defeats the whole point of streaming http requests.
Is there any way I can accomplish something like this? My current silly implementation looks like this:
import asyncio
import httpx

def collect_data(urls):
    """Non-async function wishing it was a non-async generator"""

    async def stream(async_client, url):
        data = []
        async with async_client.stream("GET", url=url) as ar:
            ar.raise_for_status()
            async for line in ar.aiter_lines():
                data.append(line)
                # would like to yield each line here
        return data

    async def execute_tasks(urls):
        all_data = []
        async with httpx.AsyncClient() as async_client:
            tasks = [stream(async_client, url) for url in urls]
            for coroutine in asyncio.as_completed(tasks):
                all_data += await coroutine
                # would like to iterate and yield each line here
        return all_data

    try:
        loop = asyncio.get_event_loop()
        data = loop.run_until_complete(execute_tasks(urls=urls))
        return data
        # would like to iterate and yield the data here as it becomes available
    finally:
        loop.close()
EDIT: I've tried some solutions using asyncio.Queue and trio memory channels as well, but since I can only read from those in an async scope, it doesn't get me any closer to a solution.
EDIT 2: The reason I want to use this from a non-asynchronous generator is that I want to use it from a Django app using a Django Rest Framework streaming API.
Normally you should just make collect_data async, and use async code throughout - that's how asyncio was designed to be used. But if that's for some reason not feasible, you can iterate an async iterator manually by applying some glue code:
def iter_over_async(ait, loop):
    ait = ait.__aiter__()
    async def get_next():
        try:
            obj = await ait.__anext__()
            return False, obj
        except StopAsyncIteration:
            return True, None
    while True:
        done, obj = loop.run_until_complete(get_next())
        if done:
            break
        yield obj
The way the above works is by providing an async closure that keeps retrieving the values from the async iterator using the __anext__ magic method and returning the objects as they arrive. This async closure is invoked with run_until_complete() in a loop inside an ordinary sync generator. (The closure actually returns a pair of done indicator and actual object in order to avoid propagating StopAsyncIteration through run_until_complete, which might be unsupported.)
With this in place, you can make your execute_tasks an async generator (async def with yield) and iterate over it using:
for chunk in iter_over_async(execute_tasks(urls), loop):
    ...
Just note that this approach is incompatible with asyncio.run, and might cause problems later down the line.
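For completeness, here is a minimal sketch of how the loop might be managed explicitly (my own example; handle_chunk is an illustrative name), since asyncio.run() cannot be used here:

import asyncio

loop = asyncio.new_event_loop()
try:
    # iter_over_async drives the async generator on this loop, one step at a time
    for chunk in iter_over_async(execute_tasks(urls), loop):
        handle_chunk(chunk)  # hypothetical consumer of each streamed line
finally:
    loop.close()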
Just wanted to update @user4815162342's solution to use asyncio.run_coroutine_threadsafe instead of loop.run_until_complete.
import asyncio
from typing import Any, AsyncGenerator

def _iter_over_async(loop: asyncio.AbstractEventLoop, async_generator: AsyncGenerator):
    ait = async_generator.__aiter__()

    async def get_next() -> tuple[bool, Any]:
        try:
            obj = await ait.__anext__()
            done = False
        except StopAsyncIteration:
            obj = None
            done = True
        return done, obj

    while True:
        done, obj = asyncio.run_coroutine_threadsafe(get_next(), loop).result()
        if done:
            break
        yield obj
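A usage sketch under my own assumptions (not from the answer above): run_coroutine_threadsafe requires the loop to already be running, typically in a background thread:

import asyncio
import threading

loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

# 'some_async_generator' stands in for whatever async generator you want to consume
for item in _iter_over_async(loop, some_async_generator()):
    print(item)

loop.call_soon_threadsafe(loop.stop)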
I'd also like to add that I have found tools like this quite helpful in the process of piecewise converting synchronous code to asyncio code.
There is a nice library that does this (and more!) called pypeln:
import pypeln as pl
import asyncio
from random import random

async def slow_add1(x):
    await asyncio.sleep(random())  # <= some slow computation
    return x + 1

async def slow_gt3(x):
    await asyncio.sleep(random())  # <= some slow computation
    return x > 3

data = range(10)  # [0, 1, 2, ..., 9]

stage = pl.task.map(slow_add1, data, workers=3, maxsize=4)
stage = pl.task.filter(slow_gt3, stage, workers=2)

data = list(stage)  # e.g. [5, 6, 9, 4, 8, 10, 7]
I have a tiny async event system like this:
from collections import defaultdict
from uuid import uuid4

class EventSystem:
    def __init__(self):
        self.handlers = defaultdict(dict)

    def register_handler(self, event, callback, register_id=None):
        register_id = register_id or uuid4()
        self.handlers[event][register_id] = callback
        return register_id

    def unregister_handler(self, event, register_id):
        del self.handlers[event][register_id]

    def clear_handlers(self, event):
        handler_register_ids = list(self.handlers[event].keys())
        for register_id in handler_register_ids:
            self.unregister_handler(event, register_id)

    async def fire_event(self, event, data):
        handlers = self.handlers[event]
        for register_id, callback in handlers.items():
            await callback(data)
        return len(handlers)
Which currently forces handlers to be async functions.
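For reference, a minimal usage sketch (my own, with illustrative names) of registering an async handler and firing an event:

import asyncio

async def on_greet(data):  # an async handler, as the current design requires
    print("hello,", data)

events = EventSystem()
events.register_handler("greet", on_greet)
asyncio.run(events.fire_event("greet", "world"))  # prints: hello, world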
I cannot decide which is more Pythonic: enforcing this policy and providing an async2sync wrapper for sync functions:
async def async2sync(func, *args, **kwargs):
    return func(*args, **kwargs)
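For what it's worth, a plain sync handler could then be registered through that wrapper roughly like this (a sketch; sync_handler is an illustrative name and events is assumed to be an EventSystem instance):

import functools

def sync_handler(data):
    print("sync handler got", data)

# partial() binds the sync function as the first argument of async2sync,
# producing a callable that returns an awaitable the event system can await.
events.register_handler("greet", functools.partial(async2sync, sync_handler))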
Or changing fire_event to check the handler's return value, using inspect.isawaitable:
async def fire_event(self, event, data):
    handlers = self.handlers[event]
    for register_id, callback in handlers.items():
        ret = callback(data)
        if inspect.isawaitable(ret):
            await ret
    return len(handlers)
I am not worried about long-running or blocking sync functions.
Since the wrapper in your first approach wraps sync functions into async, shouldn't it be called sync2async rather than async2sync?
If long-running or blocking sync functions are not a concern, both approaches are fine. Both have benefits and drawbacks. The first approach is a bit more minimalistic and easier to reason about. The second approach is a bit more clever (which can bite you when you least expect it), but also much more pleasant to use, because you can just write either kind of function for a handler and things will "just work". If the user of your API is someone other than yourself, they will probably appreciate it.
TL;DR Either is fine; I'd personally probably go with the second.
I have a case where I need to find out, from the coroutine, which task failed.
I'm using asyncio.wait, and when a task fails with an exception, I cannot tell which arguments caused the task to fail.
I tried to read the coroutine's cr_frame, but it seems that after the coroutine runs, cr_frame is None.
I tried other things too, like setting an attribute on the coroutine (coro.__mydata = data), but it seems that I cannot add attributes dynamically to a coroutine (maybe it's a limitation of Python, I don't know).
Here's some code
async def main():
    """
    Basically this function sends messages to an API.
    The class takes things like channelID, userID etc.
    Sometimes channelID, userID are wrong and the task returns an exception.
    """
    resp = await asyncio.wait(self._queue_send_task)
    for r in resp[0]:
        try:
            sent = r.result()
        except:
            # Exception because of wrong args, I need the args to act upon them
            coro = r.get_coro()
            coro.cr_frame  # Returns None; normally this would return a frame if called before the coro starts
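One workaround sketch (my own, not from the original post): since coroutine objects don't accept new attributes, keep the association yourself by mapping each task to the arguments it was created with:

import asyncio

async def send_message(channel_id, user_id):
    ...  # call the API; may raise if the arguments are wrong

async def main(jobs):
    # 'jobs' is assumed to be an iterable of (channel_id, user_id) pairs
    task_args = {
        asyncio.ensure_future(send_message(channel_id, user_id)): (channel_id, user_id)
        for channel_id, user_id in jobs
    }
    done, _ = await asyncio.wait(task_args)
    for task in done:
        try:
            task.result()
        except Exception as err:
            print("failed with args", task_args[task], "->", err)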
There is a tricky POST handler; sometimes it can take a lot of time (depending on the input values), sometimes not.
What I want is to write back whenever 1 second passes, deciding the response dynamically.
def post(self):
    def callback():
        self.write('too-late')
        self.finish()

    timeout_obj = IOLoop.current().add_timeout(
        dt.timedelta(seconds=1),
        callback,
    )

    # some asynchronous operations

    if not self.request.connection.stream.closed():
        self.write('here is your response')
        self.finish()
        IOLoop.current().remove_timeout(timeout_obj)
Turns out I can't do much from within callback.
Even raising an exception is suppressed by the inner context and won't be passed through the post method.
Any other ways to achieve the goal?
Thank you.
UPD 2020-05-15:
I found a similar question.
Thanks @ionut-ticus, using with_timeout() is much more convenient.
After some tries, I think I came really close to what I'm looking for:
def wait(fn):
    @gen.coroutine
    @wraps(fn)
    def wrap(*args):
        try:
            result = yield gen.with_timeout(
                dt.timedelta(seconds=20),
                IOLoop.current().run_in_executor(None, fn, *args),
            )
            raise gen.Return(result)
        except gen.TimeoutError:
            logging.error('### TOO LONG')
            raise gen.Return('Next time, bro')
    return wrap
@wait
def blocking_func(item):
    time.sleep(30)
    # this is not a Subprocess.
    # It is a file IO and DB
    return 'we are done here'
Still not sure: should the wait() decorator itself be wrapped in a coroutine?
Sometimes in a chain of calls to blocking_func() there can be another ThreadPoolExecutor. My concern: would this work without making mine global and passing it to Tornado's run_in_executor()?
Tornado: v5.1.1
An example of usage of tornado.gen.with_timeout. Keep in mind the task needs to be async or else the IOLoop will be blocked and won't be able to process the timeout:
@gen.coroutine
def async_task():
    # some async code
    pass

@gen.coroutine
def get(self):
    delta = datetime.timedelta(seconds=1)
    try:
        task = self.async_task()
        result = yield gen.with_timeout(delta, task)
        self.write("success")
    except gen.TimeoutError:
        self.write("timeout")
I'd advise to use https://github.com/aio-libs/async-timeout:
import asyncio
import async_timeout

async def post(self):
    try:
        async with async_timeout.timeout(1):
            # some asynchronous operations
            if not self.request.connection.stream.closed():
                self.write('here is your response')
                self.finish()
    except asyncio.TimeoutError:
        self.write('too-late')
        self.finish()