Python asyncio cancel unawaited coroutines

I have a somewhat complex setup that generates a list of queries to be run semi-parallel (using a semaphore to limit how many queries run at the same time, so as not to DDoS the server).
I have an (in itself async) function that creates a number of queries:
async def run_query(self, url):
    async with self.semaphore:
        return await some_http_lib(url)

async def create_queries(self, base_url):
    # ...gathering logic is of course a bit more complex in the real setting
    urls = await some_http_lib(base_url).json()
    coros = [self.run_query(url) for url in urls]  # note: not executed just yet
    return coros
async def execute_queries(self):
    queries = await self.create_queries('/some/url')
    _logger.info(f'prepared {len(queries)} queries')
    results = []
    done = 0
    # note: of course, in this simple example these would not actually be
    # executed asynchronously. In the real case I'm using asyncio.gather; this
    # just makes for a slightly more understandable example.
    for query in queries:
        # at this point, the request is actually triggered
        result = await query
        # ...some postprocessing
        if not result['success']:
            raise QueryException(result['message'])  # ...internal exception
        done += 1
        _logger.info(f'{done} of {len(queries)} queries done')
        results.append(result)
    return results
Now this works very nicely, executing exactly as I planned, and I can handle an exception in one of the queries by aborting the whole operation:
async def run():
    try:
        return await QueryRunner.execute_queries()
    except QueryException:
        _logger.error('something went horribly wrong')
        return None
The only problem is that, although the program terminates, it leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such never awaited.
Is there any way to cancel these unawaited coroutines? Would it otherwise be possible to suppress this warning?
[Edit] A bit more context on how the queries are executed outside this simple example:
The queries are usually grouped together, so there are multiple calls to create_queries() with different parameters. All collected groups are then looped over and the queries are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are cancelled as well, which results in the warning being raised.

So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
- you can call asyncio.create_task(c).cancel()
- you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
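As a minimal sketch of both options (some_coro is a stand-in coroutine, not from the question):

import asyncio

async def some_coro():
    await asyncio.sleep(1)

async def main():
    c1 = some_coro()                  # a coroutine object; nothing runs yet
    asyncio.create_task(c1).cancel()  # option 1: wrap in a task, cancel the task
    c2 = some_coro()
    c2.close()                        # option 2: close the coroutine directly
    # neither c1 nor c2 triggers a "never awaited" RuntimeWarning now

asyncio.run(main())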
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have, for example if you called asyncio.gather(*coros), one of them raised and you want to cancel the rest, you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
    results = await asyncio.gather(*tasks)
finally:
    # if there are unfinished tasks, that is because one of them
    # raised - cancel the rest
    for t in tasks:
        if not t.done():
            t.cancel()

Use
pending = asyncio.Task.all_tasks()  # Python < 3.7
or
pending = asyncio.all_tasks()  # Python >= 3.7
to get the set of pending tasks. You can wait for them with
await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)
or cancel them:
for task in pending:
    task.cancel()
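Putting that together, a typical shutdown sketch might look like this (note that asyncio.all_tasks() must be called from within a running loop, and we exclude the current task so it doesn't cancel itself):

import asyncio

async def shutdown():
    # every not-yet-finished task except the one running this coroutine
    pending = asyncio.all_tasks() - {asyncio.current_task()}
    for task in pending:
        task.cancel()
    # let the cancelled tasks run their cleanup; return_exceptions=True
    # keeps the CancelledErrors from propagating out of gather
    await asyncio.gather(*pending, return_exceptions=True)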

Related

Difference between await Coroutine and await Task

On FastAPI, I have an endpoint that calls the get_1 or get_2 coroutine function below.
get_1 uses await redis.get(key)
get_2 uses await asyncio.ensure_future(redis.get(key))
Is there any difference between the two functions in terms of functionality and performance?
# redis.py
import asyncio
import aioredis

async def get_1(key):
    redis = aioredis.from_url("redis://localhost")
    value = await redis.get(key)
    return value

async def get_2(key):
    redis = aioredis.from_url("redis://localhost")
    value = await asyncio.ensure_future(redis.get(key))
    return value
First of all, to understand what exactly await does and how a task differs from a future, I recommend starting with this topic and, of course, the official documentation.
As for your question: at first glance, the two expressions await coro() and await create_task(coro()) do the same thing. They start the coroutine, wait for it to complete and return the result.
But there are a number of important differences:
- await coro() leads to a direct call into the coroutine code, without returning the execution path to the event loop. This issue was explained in this topic.
- await create_task(coro()) leads to wrapping the coroutine in a task, scheduling its execution in the event loop, returning the execution path to the event loop, and then waiting for the result. In this case, other already scheduled tasks can be executed before the target coroutine (scheduled as a task) runs. Usually await is not used with create_task, so that the spawned task can run in parallel, but sometimes it is needed; see the example in the next paragraph.
- await coro() executes the target coroutine within the current context of variables, while await create_task(coro()) executes it within a copy of the current context (more details in this topic).
Based on the above, most likely you want await coro(), leaving the second expression for more specific cases.
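As an illustration of the scheduling difference (my own sketch, not from the answer above):

import asyncio

async def other():
    print("other ran")

async def helper():
    print("helper ran")

async def main():
    t1 = asyncio.create_task(other())    # scheduled, but not started yet
    await helper()                       # direct call: "helper ran" prints first
    t2 = asyncio.create_task(other())
    await asyncio.create_task(helper())  # yields to the event loop first, so both
                                         # "other ran" lines print before "helper ran"
    await t1
    await t2

asyncio.run(main())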

Tasks created with create_task that are never awaited seem to break expectations of cancelation for child tasks

Imagine we're writing a program that allows a user to run an application (let's say it's a series of important operations against an API) continuously, and can run multiple applications concurrently. Requirements include:
- the user can control the number of concurrent applications (which may limit concurrent load against an API, which is often important)
- if the OS tries to close the Python program running this thing, it should gracefully terminate, allowing any in-progress applications to complete their run before closing
The question here is specifically about the task manager we've coded, so let's stub out some code that illustrates the problem:
import asyncio
import signal

async def work_chunk():
    """Simulates a chunk of work that can possibly fail"""
    await asyncio.sleep(1)

async def protected_work():
    """All steps of this function MUST complete, the caller should shield it from cancelation."""
    print("protected_work start")
    for i in range(3):
        await work_chunk()
        print(f"protected_work working... {i+1} out of 3 steps complete")
    print("protected_work done... ")

async def subtask():
    print("subtask: starting loop of protected work...")
    cancelled = False
    while not cancelled:
        protected_coro = asyncio.create_task(protected_work())
        try:
            await asyncio.shield(protected_coro)
        except asyncio.CancelledError:
            cancelled = True
            await protected_coro
    print("subtask: cancelation complete")

async def subtask_manager():
    """
    Manage a pool of subtask workers.
    (In the real world, the user can dynamically change the concurrency, but here we'll
    hard code it at 3.)
    """
    tasks = {}
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        await asyncio.sleep(5)

def shutdown(signal, main_task):
    """Cleanup tasks tied to the service's shutdown."""
    print(f"Received exit signal {signal.name}. Scheduling cancelation:")
    main_task.cancel()

async def main():
    print("main... start")
    coro = asyncio.ensure_future(subtask_manager())
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, coro))
    loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, coro))
    await coro
    print("main... done")

def run():
    asyncio.run(main())

run()
subtask_manager manages a pool of workers, periodically looking up what the present concurrency requirement is and updating the number of active workers appropriately (note that the code above cuts out most of that, and just hard codes a number, since it isn't important to the question).
subtask is the worker loop itself, which continuously runs protected_work() until someone cancels it.
But this code is broken. When you give it a SIGINT, the whole thing immediately crashes.
Before I explain further, let me point you at a critical bit of code:
1 protected_coro = asyncio.create_task(protected_work())
2 try:
3 await asyncio.shield(protected_coro)
4 except asyncio.CancelledError:
5 cancelled = True
6 await protected_coro # <-- This will raise CancelledError too!
After some debugging, we find that our try/except block isn't working: both line 3 AND line 6 raise CancelledError.
When we dig in further, we find that ALL await calls throw CancelledError after the subtask manager is canceled, not just the line noted above. (I.e., the second line of work_chunk(), await asyncio.sleep(1), and the fourth line of protected_work(), await work_chunk(), also raise CancelledError.)
What's going on here?
It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
Why?
Clearly, I don't understand how cancelation propagation works in Python. I've struggled to find documentation on how it works. Can someone describe to me how cancelation is propagated in a clear-minded way that explains the behavior found in the example above?
After looking at this problem for a long time, and experimenting with other code snippets (where cancelation propagation works as expected), I started to wonder whether the problem is that Python doesn't know the order of propagation here, in this case.
But why?
Well, subtask_manager creates tasks, but doesn't await them.
Could it be that Python doesn't assume that the coroutine that created a task (with create_task) owns that task? I think Python uses the await keyword exclusively to know in what order to propagate cancelation, and if, after traversing the whole tree of tasks, it finds tasks that still haven't been canceled, it just destroys them all.
Therefore, it's up to us to manage task cancelation propagation ourselves, in any place where we know we haven't awaited an async task. So we need to refactor subtask_manager to catch its own cancelation, and explicitly cancel and then await all its child tasks:
async def subtask_manager():
    """
    Manage a pool of subtask workers.
    (In the real world, the user can dynamically change the concurrency, but here we'll
    hard code it at 3.)
    """
    tasks = {}
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        try:
            await asyncio.sleep(5)
        except asyncio.CancelledError:
            print("cancelation detected, canceling children")
            for t in tasks.values():
                t.cancel()
            await asyncio.gather(*tasks.values())
            return
Now our code works as expected.
Note: I've answered my own question Q&A style, but I still feel unsatisfied with my textual answer about how cancelation propagation works. If anyone has a better explanation of how cancelation propagation works, I would love to read it.
What's going on here? It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
TL;DR Canceling everything is precisely what's happening, simply because the event loop is exiting.
To investigate this, I changed the invocation of add_signal_handler() to loop.call_later(.5, lambda: shutdown(signal.SIGINT, coro)). Python's Ctrl+C handling has odd corners, and I wanted to check whether the strange behavior is the result of that. But the bug was perfectly reproducible without signals, so it wasn't that.
And yet, asyncio cancellation really shouldn't work like your code shows. Canceling a task propagates to the future (or another task) it awaits, but shield is specifically implemented to circumvent that. It creates and returns a fresh future, and connects the result of the original (shielded) future to the new one in a way that cancel() doesn't know how to follow.
It took me some time to unearth what really happens, and that is:
- await coro at the end of main awaits the task that gets cancelled, so it gets a CancelledError as soon as shutdown cancels it;
- the exception causes main to exit and enter the cleanup sequence at the end of asyncio.run(). This cleanup sequence cancels all tasks, including the ones you've shielded.
You can test it by changing await coro at the end of main() to:
try:
await coro
finally:
print('main... done')
And you will see that "main... done" is printed prior to all the mysterious cancellations you've been witnessing.
So that clears up the mystery. To fix the issue, you should postpone exiting main until everything is done. For example, you can create the tasks dict in main, pass it to subtask_manager(), and then await those critical tasks when the main task gets cancelled:
async def subtask_manager(tasks):
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        try:
            await asyncio.sleep(5)
        except asyncio.CancelledError:
            for t in tasks.values():
                t.cancel()
            raise

# ... shutdown unchanged

async def main():
    print("main... start")
    tasks = {}
    main_task = asyncio.ensure_future(subtask_manager(tasks))
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, main_task))
    loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, main_task))
    try:
        await main_task
    except asyncio.CancelledError:
        await asyncio.gather(*tasks.values())
    finally:
        print("main... done")
Note that the main task must explicitly cancel its subtasks because that wouldn't happen automatically. Cancellation is propagated through a chain of awaits, and subtask_manager doesn't explicitly await its subtasks; it just spawns them and awaits something else, effectively shielding them.
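To see this effective shielding in isolation, here is a minimal sketch (my own illustration, not from the question): cancelling a parent task does not touch a child it spawned with create_task but never awaited:

import asyncio

async def child():
    await asyncio.sleep(0.2)
    print("child finished despite parent cancellation")

async def parent():
    asyncio.create_task(child())  # spawned, never awaited
    await asyncio.sleep(10)       # the cancellation lands here

async def main():
    p = asyncio.create_task(parent())
    await asyncio.sleep(0.1)
    p.cancel()                    # cancels parent only
    await asyncio.sleep(0.3)      # keep the loop alive; child still completes

asyncio.run(main())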

How to cancel all remaining tasks in gather if one fails?

In case one task of gather raises an exception, the others are still allowed to continue.
Well, that's not exactly what I need. I want to distinguish between errors that are fatal and need to cancel all remaining tasks, and errors that are not and instead should be logged while allowing other tasks to continue.
Here is my failed attempt to implement this:
from asyncio import gather, get_event_loop, sleep

class ErrorThatShouldCancelOtherTasks(Exception):
    pass

async def my_sleep(secs):
    await sleep(secs)
    if secs == 5:
        raise ErrorThatShouldCancelOtherTasks('5 is forbidden!')
    print(f'Slept for {secs}secs.')

async def main():
    try:
        sleepers = gather(*[my_sleep(secs) for secs in [2, 5, 7]])
        await sleepers
    except ErrorThatShouldCancelOtherTasks:
        print('Fatal error; cancelling')
        sleepers.cancel()
    finally:
        await sleep(5)

get_event_loop().run_until_complete(main())
(the finally await sleep here is to prevent the interpreter from closing immediately, which would on its own cancel all tasks)
Oddly, calling cancel on the gather does not actually cancel it!
PS C:\Users\m> .\AppData\Local\Programs\Python\Python368\python.exe .\wtf.py
Slept for 2secs.
Fatal error; cancelling
Slept for 7secs.
I am very surprised by this behavior since it seems to be contradictory to the documentation, which states:
asyncio.gather(*coros_or_futures, loop=None, return_exceptions=False)
Return a future aggregating results from the given coroutine objects or futures.
(...)
Cancellation: if the outer Future is cancelled, all children (that have not completed yet) are also cancelled. (...)
What am I missing here? How to cancel the remaining tasks?
The problem with your implementation is that it calls sleepers.cancel() after sleepers has already raised. Technically the future returned by gather() is in a completed state, so its cancellation must be a no-op.
To correct the code, you just need to cancel the children yourself instead of trusting gather's future to do it. Of course, coroutines are not themselves cancelable, so you need to convert them to tasks first (which gather would do anyway, so you're doing no extra work). For example:
async def main():
    tasks = [asyncio.ensure_future(my_sleep(secs))
             for secs in [2, 5, 7]]
    try:
        await asyncio.gather(*tasks)
    except ErrorThatShouldCancelOtherTasks:
        print('Fatal error; cancelling')
        for t in tasks:
            t.cancel()
    finally:
        await sleep(5)
I am very surprised by this behavior since it seems to be contradictory to the documentation [...]
The initial stumbling block with gather is that it doesn't really run tasks, it's just a helper to wait for them to finish. For this reason gather doesn't bother to cancel the remaining tasks if some of them fail with an exception: it just abandons the wait and propagates the exception, leaving the remaining tasks to proceed in the background. This was reported as a bug, but wasn't fixed, for backward compatibility and because the behavior is documented and has been unchanged from the beginning.
But here we have another wart: the documentation explicitly promises being able to cancel the returned future. Your code does exactly that and it doesn't work, without it being obvious why (at least it took me a while to figure it out, and it required reading the source). It turns out that the contract of Future actually prevents this from working: by the time you call cancel(), the future returned by gather has already completed, and cancelling a completed future is meaningless, just a no-op. (The reason is that a completed future has a well-defined result that could have been observed by outside code. Cancelling it would change its result, which is not allowed.)
In other words, the documentation is not wrong, because canceling would have worked if you had performed it prior to await sleepers completing. However, it's misleading, because it appears to allow canceling gather() in this important use case of one of its awaitables raising, but in reality doesn't.
Problems like this that pop up when using gather are the reason why many people eagerly await (no pun intended) trio-style nurseries in asyncio.
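For what it's worth, Python 3.11 added asyncio.TaskGroup, which provides exactly this nursery-style behavior: if one task raises, the remaining tasks in the group are cancelled automatically. A minimal sketch, reusing my_sleep from the question (note the 3.11+ except* syntax, since TaskGroup wraps failures in an ExceptionGroup):

import asyncio

async def main():
    try:
        async with asyncio.TaskGroup() as tg:
            tg.create_task(my_sleep(2))
            tg.create_task(my_sleep(5))  # raises; the group cancels the rest
            tg.create_task(my_sleep(7))
    except* ErrorThatShouldCancelOtherTasks:
        print('Fatal error; the other tasks were cancelled for us')

asyncio.run(main())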
You can create your own custom gather function that cancels all of its children when any exception occurs:
import asyncio

async def gather(*tasks, **kwargs):
    tasks = [task if isinstance(task, asyncio.Task) else asyncio.create_task(task)
             for task in tasks]
    try:
        return await asyncio.gather(*tasks, **kwargs)
    except BaseException:
        for task in tasks:
            task.cancel()
        raise

# If a() or b() raises an exception, both are immediately cancelled
a_result, b_result = await gather(a(), b())
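Note that catching BaseException rather than Exception is deliberate here: since Python 3.8, asyncio.CancelledError derives from BaseException, so this helper also cancels its children when the surrounding await of gather is itself cancelled.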
What you can do in Python 3.10 (and probably earlier versions) is use asyncio.wait. It takes an iterable of awaitables and a condition for when to return; once the condition is met, it returns two sets of tasks: completed ones and pending ones. You can have it return on the first exception and then cancel the pending tasks one by one:
async def my_task(x):
    try:
        ...
    except RecoverableError as e:
        ...

tasks = [asyncio.create_task(my_task(x)) for x in xs]
done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
for p in pending:
    p.cancel()
And you can wrap your tasks in try/except, re-raising the fatal exceptions and otherwise processing the non-fatal ones. It's not gather, but it looks like it does what you want.
https://docs.python.org/3/library/asyncio-task.html#id9
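A more complete sketch of that idea (RecoverableError and FatalError are illustrative placeholders, not real asyncio APIs):

import asyncio

class RecoverableError(Exception): ...  # placeholder exception type
class FatalError(Exception): ...        # placeholder exception type

async def work(x):
    await asyncio.sleep(x)  # stand-in for real work
    if x == 5:
        raise FatalError('5 is forbidden!')
    if x == 2:
        raise RecoverableError('2 is merely suspicious')

async def my_task(x):
    try:
        await work(x)
        print(f'task {x}: done')
    except RecoverableError as e:
        print(f'task {x}: logged and ignored: {e}')  # non-fatal: swallow and continue

async def main():
    tasks = [asyncio.create_task(my_task(x)) for x in (2, 5, 7)]
    # returns as soon as any task raises an unhandled (i.e. fatal) exception
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for p in pending:
        p.cancel()
    for d in done:
        if d.exception() is not None:  # only the fatal errors escape my_task
            raise d.exception()

asyncio.run(main())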

How to easily find a coroutine that has timed out?

Key problem: with asyncio.wait(aws, timeout=1, return_when=FIRST_COMPLETED), is there a simple way to check whether a returned task has timed out?
This is an extended question.
The scene is like this:
- the total number of coroutines is unknown
- the server only allows 10 connections
- the server will return a seemingly correct result (e.g. returning an incorrect page)
- the server sometimes does not return any data
- I want to retrieve as much of the data as possible
So in order to get data faster, I need to limit the number of coroutines, check the returned pages, and handle timeouts.
There are two simple methods at present:
1. Similar to threads: use a queue to build a coroutine pool with 10 infinite-loop coroutines. I don't really like it. In fact, this method works very fast.
2. I tried to use the high-level API of Python 3.7's asyncio to simplify the structure of the program, using while tasks + asyncio.wait + return_when.
Here I came across the problem of how to detect which coroutines have timed out.
I built a simple demo:
import asyncio

async def test(delaytime):
    print(f"begin {delaytime}")
    await asyncio.sleep(delaytime)
    print(f"finish {delaytime}")

async def main():
    # the number of tasks is unknown; range(10) is just a demo
    allts = list(range(10))
    ts = []
    while len(ts) < 5:
        arg = allts.pop()
        t = asyncio.create_task(test(arg))
        t.arg = arg
        ts.append(t)
    while ts:
        dones, pendings = await asyncio.wait(ts, timeout=2, return_when=asyncio.FIRST_COMPLETED)
        for t in dones:
            # if t.result() turns out to be an error, I can append to ts again
            print(t.arg, "is done")
            ts.remove(t)
            while len(ts) < 5:
                if len(allts):
                    arg = allts.pop()
                    t = asyncio.create_task(test(arg))
                    t.arg = arg
                    ts.append(t)
                else:
                    break
        # for t in pendings:
        #     # if I could check whether t timed out, I could append to ts again
        #     pass

if __name__ == "__main__":
    asyncio.run(main())
After debugging, I learned that with return_when=asyncio.FIRST_COMPLETED, the tasks returned by asyncio.wait are in pendings, except for the completed tasks.
However, I can't tell which task has timed out.
I thought about using wait_for, but wait_for has no return_when argument.
Is there a simple way to determine which task timed out, in order to re-add it to ts?
The issue is that the approach of using wait(return_when=FIRST_COMPLETED) is fundamentally incompatible with the use of timeout. Since different tasks start at different times, a single timeout argument obviously can't apply to all of them. If you want to use return_when=FIRST_COMPLETED, wrap each task in asyncio.wait_for:
t = asyncio.create_task(asyncio.wait_for(test(arg), 2))
Then, when the task is done, you can use t.exception() to test whether it has timed out, in which case t.exception() will return an asyncio.TimeoutError. This check should only be performed on the done tasks.
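Applied to the demo above, a runnable sketch of that check (using the question's trick of stashing the original argument on the task as t.arg, so a timed-out task could be re-queued):

import asyncio

async def test(delaytime):
    await asyncio.sleep(delaytime)
    return delaytime

async def main():
    ts = []
    for arg in (1, 3):
        # give every task its own 2-second budget
        t = asyncio.create_task(asyncio.wait_for(test(arg), 2))
        t.arg = arg
        ts.append(t)
    while ts:
        done, pending = await asyncio.wait(ts, return_when=asyncio.FIRST_COMPLETED)
        for t in done:
            ts.remove(t)
            if isinstance(t.exception(), asyncio.TimeoutError):
                print(t.arg, "timed out; it could be re-queued here")
            else:
                print(t.arg, "is done")

asyncio.run(main())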

Learning asyncio: "coroutine was never awaited" warning error

I am trying to learn to use asyncio in Python to optimize scripts.
My example raises a "coroutine was never awaited" warning; can you help me understand it and find out how to solve it?
import time
import datetime
import random
import asyncio
import aiohttp
import requests

def requete_bloquante(num):
    print(f'Get {num}')
    uid = requests.get("https://httpbin.org/uuid").json()['uuid']
    print(f"Res {num}: {uid}")

def faire_toutes_les_requetes():
    for x in range(10):
        requete_bloquante(x)

print("Bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")

async def requete_sans_bloquer(num, session):
    print(f'Get {num}')
    async with session.get("https://httpbin.org/uuid") as response:
        uid = (await response.json()['uuid'])
        print(f"Res {num}: {uid}")

async def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        loop.run_until_complete(asyncio.gather(*futures))
    loop.close()
    print("Fin de la boucle !")

print("Non bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes_sans_bloquer()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
The first classic part of the code runs correctly, but the second half only produces:
synchronicite.py:43: RuntimeWarning: coroutine 'faire_toutes_les_requetes_sans_bloquer' was never awaited
You made faire_toutes_les_requetes_sans_bloquer an awaitable function, a coroutine, by using async def.
When you call an awaitable function, you create a new coroutine object. The code inside the function won't run until you await the coroutine or run it as a task:
>>> async def foo():
...     print("Running the foo coroutine")
...
>>> foo()
<coroutine object foo at 0x10b186348>
>>> import asyncio
>>> asyncio.run(foo())
Running the foo coroutine
You want to keep that function synchronous, because you don't start the loop until inside that function:
def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    # ...
    loop.close()
    print("Fin de la boucle !")
However, you are also trying to use an aiohttp.ClientSession() object, and that's an asynchronous context manager; you are expected to use it with async with, not just with, and so it has to be run inside an awaitable task. If you use with instead of async with, a TypeError("Use async with instead") exception will be raised.
That all means you need to move the loop.run_until_complete() call out of your faire_toutes_les_requetes_sans_bloquer() function, so you can keep that as the main task to be run; you can then call and await on asyncio.gather() directly:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")
print("Non bloquant : ")
start = datetime.datetime.now()
asyncio.run(faire_toutes_les_requetes_sans_bloquer())
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
I used the new asyncio.run() function (Python 3.7 and up) to run the single main task. This creates a dedicated loop for that top-level coroutine and runs it until complete.
Next, you need to move the closing ) parenthesis on the await response.json() expression:
uid = (await response.json())['uuid']
You want to access the 'uuid' key on the result of the await, not on the coroutine that response.json() produces.
With those changes your code works, but the asyncio version finishes in sub-second time; you may want to print microseconds:
exec_time = (datetime.datetime.now() - start).total_seconds()
print(f"Pour faire 10 requêtes, ça prend {exec_time:.3f}s\n")
On my machine, the synchronous requests code runs in about 4-5 seconds, and the asyncio code completes in under 0.5 seconds.
Do not use a loop.run_until_complete call inside an async function. The purpose of that method is to run an async function inside a sync context. Anyway, here's how you should change the code:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")

loop = asyncio.get_event_loop()
loop.run_until_complete(faire_toutes_les_requetes_sans_bloquer())
Note that a bare faire_toutes_les_requetes_sans_bloquer() call creates a coroutine object that has to be either awaited via an explicit await (for which you have to be inside an async context) or passed to some event loop. When left alone, Python complains about that. In your original code you do neither.
Not sure if this was the issue for you, but for me the response from the coroutine was another coroutine, so my code started warning me (note: not actually crashing) that I had created coroutines that weren't being awaited. After I actually awaited them (although I didn't really use the response), the warning went away.
The main code I added was:
content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
inspired after I saw:
response: str = await content_from_url[0]
Full code:
"""
-- Notes from [1]
Threading and asyncio both run on a single processor and therefore only run one at a time [1]. It's cooperative concurrency.
Note: threads.py has a very good block with good defintions for io-bound, cpu-bound if you need to recall it.
Note: coroutine is an important definition to understand before proceeding. Definition provided at the end of this tutorial.
General idea for asyncio is that there is a general event loop that controls how and when each tasks gets run.
The event loop is aware of each task and knows what states they are in.
For simplicitly of exponsition assume there are only two states:
a) Ready state
b) Waiting state
a) indicates that a task has work to do and can be run - while b) indicates that a task is waiting for a response from an
external thing (e.g. io, printer, disk, network, coq, etc). This simplified event loop has two lists of tasks
(ready_to_run_lst, waiting_lst) and runs things from the ready to run list. Once a task runs it is in complete control
until it cooperatively hands back control to the event loop.
The way it works is that the task that was ran does what it needs to do (usually an io operation, or an interleaved op
or something like that) but crucially it gives control back to the event loop when the running task (with control) thinks is best.
(Note that this means the task might not have fully completed getting what is "fully needs".
This is probably useful when the user whats to implement the interleaving himself.)
Once the task cooperatively gives back control to the event loop it is placed by the event loop in either the
ready to run list or waiting list (depending how fast the io ran, etc). Then the event loop goes through the waiting
loop to see if anything waiting has "returned".
Once all the tasks have been sorted into the right list the event loop is able to choose what to run next (e.g. by
choosing the one that has been waiting to be ran the longest). This repeats until the event loop code you wrote is done.
The crucial point (and distinction with threads) that we want to emphasizes is that in asyncio, an operation is never
interrupted in the middle and every switching/interleaving is done deliberately by the programmer.
In a way you don't have to worry about making your code thread safe.
For more details see [2], [3].
Asyncio syntax:
i) await = this is where the code you wrote calls an expensive function (e.g. an io) and thus hands back control to the
event loop. Then the event loop will likely put it in the waiting loop and runs some other task. Likely eventually
the event loop comes back to this function and runs the remaining code given that we have the value from the io now.
await = the key word that does (mainly) two things 1) gives control back to the event loop to see if there is something
else to run if we called it on a real expensive io operation (e.g. calling network, printer, etc) 2) gives control to
the new coroutine (code that might give up control copperatively) that it is awaiting. If this is your own code with async
then it means it will go into this new async function (coroutine) you defined.
No real async benefits are being experienced until you call (await) a real io e.g. asyncio.sleep is the typical debug example.
todo: clarify, I think await doesn't actually give control back to the event loop but instead runs the "coroutine" this
await is pointing too. This means that if it's a real IO then it will actually give it back to the event loop
to do something else. In this case it is actually doing something "in parallel" in the async way.
Otherwise, it is your own python coroutine and thus gives it the control but "no true async parallelism" happens.
iii) async = approximately a flag that tells python the defined function might use await. This is not strictly true but
it gives you a simple model while your getting started. todo - clarify async.
async = defines a coroutine. This doesn't define a real io, it only defines a function that can give up and give the
execution power to other coroutines or the (asyncio) event loop.
todo - context manager with async
ii) awaiting = when you call something (e.g. a function) that usually requires waiting for the io response/return/value.
todo: though it seems it's also the python keyword to give control to a coroutine you wrote in python or give
control to the event loop assuming your awaiting an actual io call.
iv) async with = this creates a context manager from an object you would normally await - i.e. an object you would
wait to get the return value from an io. So usually we swap out (switch) from this object.
todo - e.g.
Note: - any function that calls await needs to be marked with async or you’ll get a syntax error otherwise.
- a task never gives up control without intentionally doing so e.g. never in the middle of an op.
Cons: - note how this also requires more thinking carefully (but feels less dangerous than threading due to no pre-emptive
switching) due to the concurrency. Another disadvantage is again the idisocyncracies of using this in python + learning
new syntax and details for it to actually work.
- understanding the semanics of new syntax + learning where to really put the syntax to avoid semantic errors.
- we needed a special asycio compatible lib for requests, since the normal requests is not designed to inform
the event loop that it's block (or done blocking)
- if one of the tasks doesn't cooperate properly then the whole code can be a mess and slow it down.
- not all libraries support the async IO paradigm in python (e.g. asyncio, trio, etc).
Pro: + despite learning where to put await and async might be annoying it forces your to think carefully about your code
which on itself can be an advantage (e.g. better, faster, less bugs due to thinking carefully)
+ often faster...? (skeptical)
1. https://realpython.com/python-concurrency/
2. https://realpython.com/async-io-python/
3. https://stackoverflow.com/a/51116910/6843734
todo - read [2] later (or [3] but thats not a tutorial and its more details so perhaps not a priority).
asynchronous = 1) dictionary def: not happening at the same time
e.g. happening indepedently 2) computing def: happening independently of the main program flow
couroutine = are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed.
So basically it's a routine/"function" that can give up control in "a controlled way" (i.e. not randomly like with threads).
Usually they are associated with a single process -- so it's concurrent but not parallel.
Interesting note: Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.
Likely we have an event loop in this document as an example. I guess yield and operators too are good examples!
Interesting contrast with subroutines: Subroutines are special cases of coroutines.[3] When subroutines are invoked, execution begins at the start,
and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations.
By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine;
from the coroutine's point of view, it is not exiting but calling another coroutine.
Coroutines are very similar to threads. However, coroutines are cooperatively multitasked, whereas threads are typically preemptively multitasked.
event loop = event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program.
Appendix:
For I/O-bound problems, there’s a general rule of thumb in the Python community:
“Use asyncio when you can, threading when you must.”
asyncio can provide the best speed up for this type of program, but sometimes you will require critical libraries that
have not been ported to take advantage of asyncio.
Remember that any task that doesn’t give up control to the event loop will block all of the other tasks
-- Notes from [2]
see asyncio_example2.py file.
The sync fil should have taken longer e.g. in one run the async file took:
Downloaded 160 sites in 0.4063692092895508 seconds
While the sync option took:
Downloaded 160 in 3.351937770843506 seconds
"""
import asyncio
from asyncio import Task
from asyncio.events import AbstractEventLoop
import aiohttp
from aiohttp import ClientResponse
from aiohttp.client import ClientSession
from typing import Coroutine
import time
async def download_site(session: ClientSession, url: str) -> str:
    async with session.get(url) as response:
        print(f"Read {response.content_length} from {url}")
        return response.text()

async def download_all_sites(sites: list[str]) -> list[str]:
    # async with = this creates a context manager from an object you would normally await - i.e. an object you would wait on to get the return value from an io. So usually we swap out (switch) from this object.
    async with aiohttp.ClientSession() as session:  # we will usually await session.FUNCS
        # create all the download coroutines/tasks to be later managed/run by the event loop
        tasks: list[Task] = []
        for url in sites:
            # creates a task from a coroutine. todo: basically it seems it creates a callable coroutine? (i.e. a function that is able to give up control cooperatively, or runs an external io and thus also gives back control cooperatively to the event loop). read more? https://stackoverflow.com/questions/36342899/asyncio-ensure-future-vs-baseeventloop-create-task-vs-simple-coroutine
            task: Task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        # runs tasks/coroutines in the event loop and aggregates the results. todo: does this halt until all coroutines have returned? I think so, due to the paradigm of how async code works.
        content_from_url: list[ClientResponse.text] = await asyncio.gather(*tasks, return_exceptions=True)
        assert isinstance(content_from_url[0], Coroutine)  # note: all responses are coroutines, because download_site returns response.text() without awaiting it
        print(f'result after aggregating/doing all coroutine tasks/jobs = {content_from_url=}')
        # this is needed since the response is in a coroutine object for some reason
        content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
        print(f'result after getting response from coroutines that hold the text = {content_from_url_as_str=}')
        return content_from_url_as_str
if __name__ == "__main__":
    # - args
    num_sites: int = 80
    sites: list[str] = ["https://www.jython.org", "http://olympus.realpython.org/dice"] * num_sites
    start_time: float = time.time()

    # - run the same 160 tasks but without the async paradigm; should be slower!
    # note: you can't actually do this here because your functions have async definitions.
    # to test the synchronous version see the synchronous.py file. Then compare the two run times.
    # await download_all_sites(sites)
    # download_all_sites(sites)

    # - Execute the coroutine coro and return the result.
    asyncio.run(download_all_sites(sites))

    # - run event loop manager and run all tasks with cooperative concurrency
    # asyncio.get_event_loop().run_until_complete(download_all_sites(sites))

    # makes explicit the creation of the event loop that manages the coroutines & external ios
    # event_loop: AbstractEventLoop = asyncio.get_event_loop()
    # asyncio.run(download_all_sites(sites))

    # making explicit the creation of the coroutine (with its args) that hasn't been run yet
    # event_loop: AbstractEventLoop = asyncio.get_event_loop()
    # download_all_sites_coroutine: Coroutine = download_all_sites(sites)
    # asyncio.run(download_all_sites_coroutine)

    # - print stats about the content download and duration
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")
    print('Success.\a')
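Incidentally, the double gather above is only needed because download_site returns response.text() without awaiting it. A drop-in variant of that function (same file, same imports) that awaits the text directly would let the caller skip the second gather:

async def download_site(session: ClientSession, url: str) -> str:
    async with session.get(url) as response:
        print(f"Read {response.content_length} from {url}")
        # awaiting here yields the page text itself rather than a coroutine,
        # so the second asyncio.gather in download_all_sites becomes unnecessary
        return await response.text()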
