How to split a long coroutine without using await?

I have a coroutine that is getting too big and I'd like to split it for readability.
async def handle_message(self, message):
    message_type = message.get('type')
    if message_type == 'broadcast':
        ...
        for n in self._neighbors:
            await self.send_message(n, message)
    elif message_type == 'graph':
        ...
I'd like to extract the portion which handles broadcast messages into a private method like this:
async def handle_message(self, message):
    message_type = message.get('type')
    ...
    if message_type == 'broadcast':
        await self._handle_broadcast(message)
    elif message_type == 'graph':
        ...
The problem is that this changes the behavior of the code: the _handle_broadcast part is a coroutine, and its execution might be delayed because I call it with await.
What is the way to ensure that the coroutine runs immediately and isn't delayed?

In short: split the coroutine exactly like you started, by using await.
The problem is that this changes the behavior of the code: the _handle_broadcast part is a coroutine, and its execution might be delayed because I call it with await.
For better or worse, this premise is false. When given a coroutine, await immediately starts executing it without an intermediate delay. It is only if that coroutine calls something that causes it to suspend (such as asyncio.sleep or a network read that doesn't have data yet) that your coroutine gets suspended along with it - which is precisely what you would get had the code stayed inline.
In that sense await <some coroutine> works like the coroutine equivalent of a regular function call, allowing precisely the kind of non-semantics-changing refactoring that you need. This can be demonstrated with an example:
import asyncio

async def heartbeat():
    while True:
        print('tick')
        await asyncio.sleep(1)

async def noop():
    pass

async def coro():
    # despite "await", this blocks the event loop!
    while True:
        await noop()

loop = asyncio.get_event_loop()
loop.create_task(heartbeat())
loop.create_task(coro())
loop.run_forever()
The above code blocks the event loop, even though coro does nothing except await in a loop. So await is not a guarantee of yielding to the event loop; the coroutine has to yield by other means. (This behavior can also be a source of bugs.)
In the above case, one can get the event loop "un-stuck" by inserting an await asyncio.sleep(0). But that kind of thing should never be needed in production asyncio code, where the program should be structured so that each coroutine does comparatively little work, and then uses await to obtain more data.
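Concretely, the workaround would change coro like this (a minimal sketch of my own, not part of the original answer):

import asyncio

async def coro():
    while True:
        # asyncio.sleep(0) suspends exactly once, handing control back
        # to the event loop so heartbeat() can run between iterations
        await asyncio.sleep(0)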


Difference between await Coroutine and await Task

In FastAPI, I have an endpoint that calls the get_1 or get_2 coroutine function below.
get_1 uses await redis.get(key)
get_2 uses await asyncio.ensure_future(redis.get(key))
Is there any difference between the two functions in terms of functionality and performance?
# redis.py
import asyncio
import aioredis

async def get_1(key):
    redis = aioredis.from_url("redis://localhost")
    value = await redis.get(key)
    return value

async def get_2(key):
    redis = aioredis.from_url("redis://localhost")
    value = await asyncio.ensure_future(redis.get(key))
    return value
First of all, to understand what exactly await does and how a task differs from a future, I recommend starting with this topic and, of course, the official documentation.
As for your question, at first glance both expressions await coro() and await create_task(coro()) do the same thing: they start the coroutine, wait for it to complete, and return the result.
But there are a number of important differences:
await coro() leads to a direct call to the coroutine code, without returning the execution path to the event loop. This was explained in this topic.
await create_task(coro()) wraps the coroutine in a task, schedules its execution in the event loop, returns the execution path to the event loop, and only then waits for the result. In this case, other already scheduled tasks may execute before the target coroutine (scheduled as a task) runs.
Usually await is not used with create_task, so that the spawned task can run in parallel, but sometimes it is needed.
await coro() executes the target coroutine within the current context of variables, while await create_task(coro()) executes it within a copy of the current context (more details in this topic).
Based on the above, you most likely want await coro(), leaving the second expression for more specific cases.
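To make the scheduling difference concrete, here is a minimal sketch of my own (not part of the original answer) showing that a previously scheduled task runs in between only in the create_task case:

import asyncio

async def other():
    print('other task ran')

async def target():
    print('target ran')

async def main():
    asyncio.create_task(other())         # schedule a competing task
    await target()                       # direct call: runs immediately, loop never regains control
    await asyncio.create_task(target())  # via task: the loop runs 'other' first, then 'target'

asyncio.run(main())
# Output:
# target ran
# other task ran
# target ran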

Python asyncio cancel unawaited coroutines

Given a somewhat complex setup, which is used to generate a list of queries to be run semi-parallel (using a semaphore to avoid running too many queries at the same time and DDoSing the server).
I have an (in itself async) function that creates a number of queries:
async def run_query(self, url):
    async with self.semaphore:
        return await some_http_lib(url)

async def create_queries(self, base_url):
    # ...gathering logic is of course a bit more complex in the real setting
    urls = await some_http_lib(base_url).json()
    coros = [self.run_query(url) for url in urls]  # note: not executed just yet
    return coros

async def execute_queries(self):
    queries = await self.create_queries('/some/url')
    _logger.info(f'prepared {len(queries)} queries')
    results = []
    done = 0
    # note: in this simple example the queries would not actually be executed
    # asynchronously. in the real case I'm using asyncio.gather; this just
    # makes for a slightly more understandable example.
    for query in queries:
        # at this point, the request is actually triggered
        result = await query
        # ...some postprocessing
        if not result['success']:
            raise QueryException(result['message'])  # ...internal exception
        done += 1
        _logger.info(f'{done} of {len(queries)} queries done')
        results.append(result)
    return results
Now this works very nicely, executing exactly as I planned, and I can handle an exception in one of the queries by aborting the whole operation:
async def run():
    try:
        return await QueryRunner.execute_queries()
    except QueryException:
        _logger.error('something went horribly wrong')
        return None
The only problem is that the program terminates but leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such never awaited.
Is there any way to cancel these unawaited coroutines? Would it otherwise be possible to suppress this warning?
[Edit] A bit more context on how the queries are executed outside this simple example:
The queries are usually grouped together, so there are multiple calls to create_queries() with different parameters. All collected groups are then looped over, and the queries of each group are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are cancelled as well, which results in the warning being raised.
So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
you can call asyncio.create_task(c).cancel()
you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
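As a small illustration of the second option (my own sketch, not from the answer), closing a never-started coroutine object prevents the "was never awaited" warning:

import asyncio

async def job():
    await asyncio.sleep(1)

async def main():
    c = job()   # a coroutine object that will never be awaited
    c.close()   # close it explicitly; no RuntimeWarning is emitted

asyncio.run(main())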
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have, for example if you called asyncio.gather(*coros), one of them raised and you want to cancel the rest, you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
    results = await asyncio.gather(*tasks)
finally:
    # if there are unfinished tasks, that is because one of them
    # raised - cancel the rest
    for t in tasks:
        if not t.done():
            t.cancel()
Use

pending = asyncio.Task.all_tasks()  # Python < 3.7

or

pending = asyncio.all_tasks()  # Python >= 3.7

to get the set of pending tasks. You can wait for them with

await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)

or cancel them:

for task in pending:
    task.cancel()
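Put together as a minimal runnable sketch (my own illustration, assuming Python 3.7+ for asyncio.all_tasks, asyncio.current_task, and asyncio.run):

import asyncio

async def worker(i):
    await asyncio.sleep(10)

async def main():
    for i in range(3):
        asyncio.create_task(worker(i))
    await asyncio.sleep(0.1)  # let the workers start
    # collect every task except the current one and cancel them
    pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in pending:
        task.cancel()
    # give the cancelled tasks a chance to finish unwinding
    await asyncio.gather(*pending, return_exceptions=True)

asyncio.run(main())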

How can I prevent context switching when calling an async function?

If I use async functions, then all the functions above them in the call stack should also be async, and their calls should be preceded by the await keyword. This example emulates a modern program with several architectural layers:
import asyncio

async def func1():
    await asyncio.sleep(1)

async def func2():
    await func1()

async def func3():
    await func2()

async def func4():
    await func3()

async def func5():
    await func4()
When an execution thread meets await, it can switch to another coroutine, which requires resources for context switching. With a large number of competing coroutines and different levels of abstraction, this overhead may begin to limit the performance of the entire system. But in the presented example it makes sense to switch context in only one place, on the line:
await asyncio.sleep(1)
How can I ban context switching for certain asynchronous functions?
First of all, by default in your example the context wouldn't be switched. In other words, until a coroutine encounters something actually blocking (like a Future), it won't return control to the event loop, and execution proceeds directly into the inner coroutine.
I don't know an easier way to demonstrate this than by inheriting from the default event loop implementation:
import asyncio

class TestEventLoop(asyncio.SelectorEventLoop):
    def _run_once(self):
        print('control inside event loop')
        super()._run_once()

async def func1():
    await asyncio.sleep(1)

async def func2():
    print('before func1')
    await func1()
    print('after func1')

async def main():
    print('before func2')
    await func2()
    print('after func2')

loop = TestEventLoop()
asyncio.set_event_loop(loop)
try:
    loop.run_until_complete(main())
finally:
    loop.close()
In output you'll see:
control inside event loop
before func2
before func1
control inside event loop
control inside event loop
after func1
after func2
control inside event loop
func2 passed the execution flow directly to func1, avoiding the event loop's _run_once, which could have switched to another coroutine. Only when the blocking asyncio.sleep was reached did the event loop get control.
Note, though, that this is an implementation detail of the default event loop.
Second of all, and this is probably much more important, switching between coroutines is extremely cheap compared to the benefit we get from using asyncio to work with I/O.
It's also much cheaper than other async alternatives like switching between OS threads.
A situation where your code is slow because of many coroutines is highly unlikely, but even if it happens, you should probably take a look at more efficient event loop implementations like uvloop.
I would like to point out that if you ever run a number of coroutines large enough that the overhead of switching context becomes an issue, you can reduce concurrency using a Semaphore, as sketched below. I recently got a roughly 2x performance increase by reducing concurrency from 1000 to 50 for coroutines running HTTP requests.
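As an illustration of that technique (a minimal sketch of my own; the limit of 50 and the URLs are arbitrary, and the sleep stands in for a real HTTP request):

import asyncio

async def fetch(url, sem):
    # the semaphore caps how many fetches run concurrently
    async with sem:
        await asyncio.sleep(0.1)  # stand-in for an HTTP request
        return url

async def main():
    sem = asyncio.Semaphore(50)  # at most 50 coroutines inside fetch at once
    urls = [f"https://example.com/{i}" for i in range(1000)]
    results = await asyncio.gather(*(fetch(u, sem) for u in urls))
    print(len(results))

asyncio.run(main())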

Is there a way to manually switch on asyncio event loop

I want to use the event loop to monitor any data inserted into my asyncio.Queue (you can find its source code here: https://github.com/python/cpython/blob/3.6/Lib/asyncio/queues.py), but I ran into some problems. Here is the code:
import asyncio
import threading

async def recv(q):
    while True:
        msg = await q.get()
        print(msg)

async def checking_task():
    while True:
        await asyncio.sleep(0.1)

def loop_in_thread(loop, q):
    asyncio.set_event_loop(loop)
    asyncio.ensure_future(recv(q))
    asyncio.ensure_future(insert(q))
    # asyncio.ensure_future(checking_task())  # comment this out, and it will work as intended
    loop.run_forever()

async def insert(q):
    print('invoked')
    await q.put('hello')

q = asyncio.Queue()
loop = asyncio.get_event_loop()
t = threading.Thread(target=loop_in_thread, args=(loop, q))
t.start()
The program starts, and we can see the following result:
invoked
hello
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x000001E215DCFAC8>()]>>}
But now, if we manually add data into q by using q.put_nowait('test'), we get the following result:
q.put_nowait('test') # a non-async way to add data into queue
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future finished result=None>>}
As you can see, the future is already finished, yet we still haven't printed the newly added string 'test'. In other words, msg = await q.get() is still waiting even though the Future related to q.get() is done and there are no other tasks running. This confuses me, because the official documentation (https://docs.python.org/3/library/asyncio-task.html) says:
result = await future or result = yield from future – suspends the coroutine until the future is done, then returns the future’s result
It seems that even though the Future is done, we still need some sort of await in another async function to make the event loop keep processing tasks.
I found a workaround to this problem, which is adding a checking_task() coroutine to the event loop; then it works as intended.
But adding a checking_task() coroutine is very costly for the CPU, since it just runs a while loop. I am wondering if there is some manual way to trigger that await event without using an async function. For example, something magical like:
q.put_nowait('test')
loop.ok_you_can_start_running_other_pending_tasks()
Help will be greatly appreciated! Thanks.
So I ended up using
loop.call_soon_threadsafe(q.put_nowait, 'test')
and it works as intended. After figuring this out, I searched around, and it turned out this post (Scheduling an asyncio coroutine from another thread) deals with the same problem. #kfx's answer would also work:
loop.call_soon_threadsafe(loop.create_task, q.put('test'))
Notice that asyncio.Queue.put() is a coroutine, but asyncio.Queue.put_nowait() is a normal function.
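For context, here is a minimal runnable sketch of the working pattern (my own illustration, assuming Python 3.10+, where asyncio.Queue no longer binds to a loop at construction). The key point is that call_soon_threadsafe both schedules the callback on the loop's thread and wakes the loop, whereas a plain q.put_nowait() from another thread does not:

import asyncio
import threading
import time

q = asyncio.Queue()

async def recv():
    while True:
        msg = await q.get()
        print(msg)

def loop_in_thread(loop):
    asyncio.set_event_loop(loop)
    loop.create_task(recv())
    loop.run_forever()

loop = asyncio.new_event_loop()
threading.Thread(target=loop_in_thread, args=(loop,), daemon=True).start()

# schedule the enqueue on the loop's own thread; this also wakes the loop
loop.call_soon_threadsafe(q.put_nowait, 'test')
time.sleep(1)  # give the loop thread a moment to print before exiting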

multiple nonblocking tasks using asyncio and aiohttp

I am trying to perform several non-blocking tasks with asyncio and aiohttp, and I don't think the way I am doing it is efficient. I think it would be best to use await instead of yield from. Can anyone help?
def __init__(self):
    self.event_loop = asyncio.get_event_loop()

def run(self):
    tasks = [
        asyncio.ensure_future(self.subscribe()),
        asyncio.ensure_future(self.getServer()),
    ]
    self.event_loop.run_until_complete(asyncio.gather(*tasks))
    try:
        self.event_loop.run_forever()

@asyncio.coroutine
def getServer(self):
    server = yield from self.event_loop.create_server(handler, ip, port)
    return server

@asyncio.coroutine
def subscribe(self):
    while True:
        yield from asyncio.sleep(10)
        self.sendNotification(self.sub.receive())

def sendNotification(self, msg):
    # send message as a client
I have to listen to a server, subscribe to listen to broadcasts, and, depending on the broadcasted message, POST to a different server.
According to PEP 492:

await, similarly to yield from, suspends execution of the read_data coroutine until the db.fetch awaitable completes and returns the result data.

It uses the yield from implementation with an extra step of validating its argument. await only accepts an awaitable, which can be one of: […]

So I don't see an efficiency problem in your code, as they use the same implementation.
However, I do wonder why you return the server but never use it.
The main design mistake I see in your code is that you use both:
self.event_loop.run_until_complete(asyncio.gather(*tasks))
try:
    self.event_loop.run_forever()

From what I can see, you just need the run_forever().
Some extra tips:
In my implementations using asyncio I usually make sure that the loop is closed in case of error, or this can cause a massive leak depending on your app type.
try:
    loop.run_until_complete(asyncio.gather(*tasks))
finally:
    # close the loop no matter what, or you leak file descriptors
    loop.close()
I also use uvloop instead of the built-in event loop; according to benchmarks it's much more efficient.
import uvloop
...
loop = uvloop.new_event_loop()
asyncio.set_event_loop(loop)
await will not be more efficient than yield from. It may be more pythonic, but

async def foo():
    await some_future

and

@asyncio.coroutine
def foo():
    yield from some_future

are approximately the same. Certainly in terms of efficiency, they are very close. await is implemented using logic very similar to yield from. (There is an additional method call involved in await, but that is typically lost in the noise.)
In terms of efficiency, removing the explicit sleep-and-poll in your subscribe method seems like the primary target in this design. Rather than sleeping for a fixed period of time, it would be better to obtain a future that indicates when the receive call will succeed, and run subscribe's task only when receive has data, as sketched below.
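As a minimal illustration of that idea (my own sketch with hypothetical names; the original code's receive API is not shown), an asyncio.Queue lets subscribe suspend until a message actually arrives instead of polling every 10 seconds:

import asyncio

async def subscribe(queue):
    while True:
        # suspends here until a broadcast is pushed into the queue;
        # no fixed-interval polling is needed
        msg = await queue.get()
        send_notification(msg)

def send_notification(msg):
    print('posting:', msg)

async def main():
    queue = asyncio.Queue()
    asyncio.ensure_future(subscribe(queue))
    await queue.put('broadcast 1')  # whoever receives broadcasts pushes here
    await asyncio.sleep(0)          # let subscribe run once for the demo

asyncio.run(main())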
