I currently have a function that yields results from an external source; these may arrive over a long period of time.
async def run_command(data):
    async with httpx.AsyncClient() as client:
        url = f"http://hostname.com:8000/"
        async with client.stream("POST", url, json=data, timeout=None) as r:
            async for line in r.aiter_lines():
                yield (json.loads(line), 200)
This currently works fine for one data call, and I can process the result simply like so:
async for result in run_command(data):
    yield result
The result is passed back to the initial calling function which streams it to a GUI using Django channels.
async for result, status in self.call_command(dataSet):
    await self.channel_layer.group_send(
        self.group_name, {
            'type': "transmit",
            'message': result,
            'status': status,
        },
    )
However, I would now like to call the run_command function concurrently for multiple data sets. I'm trying to do it like so:
for future in asyncio.as_completed([run_command(data) for data in dataSet]):
    yield future
I get a TypeError: An asyncio.Future, a coroutine or an awaitable is required.
Replacing the last line with the two below still gives the same error.
result = await future
yield result
Changing the for loop into an async for loop doesn't work either, giving the error: TypeError: 'async for' requires an object with __aiter__ method, got generator.
So is it possible to yield results as they arrive from multiple futures?
If I understand your requirements correctly, this sounds like a textbook case for a queue producer-consumer setup.
Let me try and abstract this for you. You have some asynchronous generator that you create by supplying some input data. Once it's running, it yields results at irregular intervals.
Now you have multiple sets of data. You want to have multiple generators concurrently crunching away at those and you want to be able to process a result as soon as one is yielded by any of the generators.
The solution I propose is to write two coroutine functions - a queue_producer and a queue_consumer. Both receive the same asyncio.Queue instance as argument.
The producer also receives one single piece of input data. It sets up the aforementioned asynchronous generator and begins iterating through it. As soon as a new item is yielded to it, it puts it into the queue.
The consumer is actually itself an asynchronous generator. It receives a timeout argument in addition to the queue. It starts an infinite loop of awaiting the next item it can get from the queue. Once it gets an item, it yields that item. If the waiting takes longer than the specified timeout, it breaks out of the loop and ends.
To demonstrate this, I will use this very silly and simple asynchronous iterator implementation that simply iterates over characters in a string with random sleep times in between each letter. I'll call it funky_string_iter.
Here is a full working example:
from asyncio import TimeoutError, run
from asyncio.tasks import create_task, gather, sleep, wait_for
from asyncio.queues import Queue
from collections.abc import AsyncIterator
from random import random


async def funky_string_iter(data: str) -> AsyncIterator[str]:
    for char in data:
        await sleep(random())
        yield char


async def queue_producer(queue: Queue[str], data: str) -> None:
    async for result in funky_string_iter(data):
        await queue.put(result)


async def queue_consumer(queue: Queue[str], timeout: float) -> AsyncIterator[str]:
    while True:
        try:
            yield await wait_for(queue.get(), timeout)
        except TimeoutError:
            break


async def main() -> None:
    data_sets = ["abc", "xyz", "foo"]
    q: Queue[str] = Queue()
    # Launch the producer tasks:
    producers = [
        create_task(queue_producer(q, data))
        for data in data_sets
    ]
    # Iterate through the consumer until it times out:
    async for result in queue_consumer(q, timeout=3):
        print(result)
    await gather(*producers)  # Clean up the producer tasks


if __name__ == '__main__':
    run(main())
The output will obviously be more or less randomly ordered, but here is one example output I got:
a
f
o
b
x
y
z
o
c
This demonstrates how the characters are yielded by the queue_consumer as soon as they are available (in the queue), which in turn depends on which queue_producer task yields the next character from its string.
You can transfer this to your specific case by substituting that funky string iterator for your run_command. Naturally, adjust the rest as needed.
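For instance, the producer side might look roughly like the sketch below, assuming run_command keeps yielding (result, status) tuples as in your question. The name call_commands and the idle_timeout value are illustrative, not part of your code:
import asyncio

async def queue_producer(queue, data):
    # Drain one run_command stream into the shared queue.
    async for result, status in run_command(data):
        await queue.put((result, status))

async def call_commands(self, dataSet, idle_timeout=60):
    queue = asyncio.Queue()
    producers = [asyncio.create_task(queue_producer(queue, data)) for data in dataSet]
    # Yield items as soon as any producer puts one, until nothing
    # arrives for idle_timeout seconds.
    while True:
        try:
            yield await asyncio.wait_for(queue.get(), idle_timeout)
        except asyncio.TimeoutError:
            break
    await asyncio.gather(*producers)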
As for the error you got from the asyncio.as_completed iteration: as_completed expects awaitables (futures or coroutines), but run_command(data) returns an asynchronous generator, which is not awaitable, so that function is not the right fit here.
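If you did want to keep asyncio.as_completed, each async generator would first have to be wrapped in a coroutine that drains it completely, at the cost of only getting results per data set rather than per line. A rough sketch, with collect as a hypothetical helper:
async def collect(data):
    # A coroutine (awaitable), so as_completed accepts it; it drains one stream fully.
    return [item async for item in run_command(data)]

async def call_command(self, dataSet):
    for future in asyncio.as_completed([collect(data) for data in dataSet]):
        for item in await future:  # one whole data set's results at a time
            yield item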
Related
I have some async code that looks like this:
There's a third-party function that performs some operations on the string and returns a modified string; for the purpose of this question, it's something like non_async_func.
I have an async def async_func_single function that wraps around non_async_func and performs a single operation.
Then there's another async def async_func_batch function that wraps around async_func_single to perform the function for a batch of data.
The code kind of works, but I would like to know more about why/how. My questions are:
Is it necessary to create the async_func_single and have async_func_batch wrap around it?
Can I directly just feed in a batch of data in async_func_batch to call non_async_func?
I have a per_chunk function that feeds in the data in batches, is there any asyncio operations/functions that can avoid the use of pre-batching the data I want to send to async_func_batch?
import nest_asyncio
nest_asyncio.apply()

import asyncio
from itertools import zip_longest
from loremipsum import get_sentences


def per_chunk(iterable, n=1, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def non_async_func(text):
    return text[::-1]


async def async_func_single(text):
    # Perform some string operation.
    return non_async_func(text)


async def async_func_batch(batch):
    tasks = [async_func_single(text) for text in batch]
    return await asyncio.gather(*tasks)


# Create some random inputs
thousand_texts = get_sentences(1000)

# Loop through 20 sentences at a time.
for batch in per_chunk(thousand_texts, n=20):
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(async_func_batch(batch))
    for i, o in zip(thousand_texts, results):
        print(i, o)
Note that marking your functions as "async def", rather than "def" doesn't make them automatically asynchronous - you can have "async def" functions that are synchronous. The difference between asynchronous functions and synchronous ones is that asynchronous functions define places (using "await") where it waits on either another asynchronous function or waits on an asynchronous IO operation.
Also note that asyncio is not magic - it is basically a scheduler that schedules asynchronous functions to be run based on whether the function/operation that is being "awaited" has completed. And, as the scheduler and the asynchronous functions all run on a single thread, then at any given moment, only a single asynchronous function can be running.
So, going back to your code, the only thing your "async_func_single" function is doing is calling a synchronous function, therefore, despite being marked as "async def", it is still a synchronous function. And the same logic applies to the "async_func_batch" function - the "async_func_single" tasks passed to "asyncio.gather" are all synchronous, so "asyncio.gather" just runs each task synchronously (it offers no benefit over a simple for loop awaiting each task), so again "async_func_batch" is a synchronous function. Because you are just calling synchronous functions, asyncio is not offering any benefits to your program.
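You can see this for yourself with a small, self-contained sketch (the one-second time.sleep stands in for any blocking work): gathering three such "async" functions still takes about three seconds, because each body blocks the single event-loop thread.
import asyncio
import time

async def fake_async(n):
    time.sleep(1)  # blocking call - never yields to the event loop
    return n

async def main():
    start = time.perf_counter()
    await asyncio.gather(*(fake_async(i) for i in range(3)))
    print(f"took {time.perf_counter() - start:.1f}s")  # ~3.0s, not ~1.0s

asyncio.run(main())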
If you want multiple synchronous functions that all run at the same time, you don't use asynchronous functions. You need to run them in parallel processes/threads:
import os
import itertools
import concurrent.futures

from loremipsum import get_sentences

# Pool of worker processes, sized to the number of CPUs.
executor = concurrent.futures.ProcessPoolExecutor(max_workers=os.cpu_count())


def per_chunk(iterable, n=1):
    # Advance a single iterator so islice takes successive chunks.
    iterator = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(iterator, n))
        if chunk:
            yield chunk
        else:
            break


def non_async_func(text):
    return text[::-1]


def process_batches(batches):
    futures = [
        executor.submit(non_async_func, batch)
        for batch in batches
    ]
    concurrent.futures.wait(futures)


thousand_texts = get_sentences(1000)
process_batches(per_chunk(thousand_texts, n=20))
If you still want to use an asynchronous function to process the batches, then asyncio provides asynchronous wrappers around the concurrent futures:
import asyncio


async def process_batches(batches):
    event_loop = asyncio.get_running_loop()
    futures = [
        event_loop.run_in_executor(executor, non_async_func, batch)
        for batch in batches
    ]
    await asyncio.wait(futures)


thousand_texts = get_sentences(1000)
asyncio.run(process_batches(per_chunk(thousand_texts, n=20)))
but it gives no advantages unless you have other asynchronous functions that can be run while it is waiting.
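For illustration, here is a rough variant of that call where an unrelated coroutine (a made-up progress ticker) runs while the executor futures are pending - the kind of situation where the asynchronous wrapper starts to pay off:
import asyncio

async def progress_ticker(interval=1.0):
    # Hypothetical "other work" the event loop can do while the batches run.
    while True:
        print("still working...")
        await asyncio.sleep(interval)

async def main():
    ticker = asyncio.create_task(progress_ticker())
    # process_batches is the async version defined above
    await process_batches(per_chunk(thousand_texts, n=20))
    ticker.cancel()

asyncio.run(main())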
I have tried to answer your questions below.
The code kind of works but I would like to know more about why/how, my questions are
Is it necessary to create the async_func_single and have
async_func_batch wrap around it?
No, this is absolutely not necessary.
Can I directly just feed in a batch of data in async_func_batch to
call non_async_func?
You could do something like the example 1 below, where you feed all the data directly.
I have a per_chunk function that feeds in the data in batches, is
there any asyncio operations/functions that can avoid the use of
pre-batching the data I want to send to async_func_batch?
It's possible to use Asyncio Queues with a max size and then process data until the queue is empty and fill it up again. Check out example 2.
Example 1
import asyncio
from concurrent.futures import ThreadPoolExecutor

from loremipsum import get_sentences


def non_async_func(text):
    return text[::-1]


async def async_func_batch(batch):
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [loop.run_in_executor(executor, non_async_func, text) for text in batch]
        return await asyncio.gather(*futures)


# Create some random inputs
thousand_texts = get_sentences(1000)

loop = asyncio.get_event_loop()
results = loop.run_until_complete(async_func_batch(thousand_texts))
for i, o in zip(thousand_texts, results):
    print(i, o)
Example 2
Queues can be infinite in size. If you do not specify maxsize, it will queue up all elements before processing. If you remove maxsize, then you need to move join outside of the for-loop and remove if taskQueue.full():.
from loremipsum import get_sentences
import asyncio
async def async_func(text, taskQueue, resultsQueue):
await resultsQueue.put(text[::-1]) # add the result to the resultsQueue
taskQueue.task_done() # Tell the taskQueue that the task is finished
taskQueue.get_nowait() # Don't wait for it (unblocking)
async def main():
taskQueue = asyncio.Queue(maxsize=20)
resultsQueue = asyncio.Queue()
thousand_texts = get_sentences(1000)
results = []
for text in thousand_texts:
await taskQueue.put(asyncio.create_task(async_func(text, taskQueue, resultsQueue)))
if taskQueue.full(): # If maxsize is reached
await taskQueue.join() # Will block until finished
while not resultsQueue.empty():
results.append(await resultsQueue.get())
for i, o in zip(thousand_texts, results):
print(i, o)
if __name__ == "__main__":
asyncio.run(main())
I have a somewhat complex setup that is used to generate a list of queries to be run semi-parallel (using a semaphore to not run too many queries at the same time, so as not to DDoS the server).
I have an (in itself async) function that creates a number of queries:
async def run_query(self, url):
    async with self.semaphore:
        return await some_http_lib(url)

async def create_queries(self, base_url):
    # ...gathering logic is ofc a bit more complex in the real setting
    urls = await some_http_lib(base_url).json()
    coros = [self.run_query(url) for url in urls]  # note: not executed just yet
    return coros

async def execute_queries(self):
    queries = await self.create_queries('/some/url')
    _logger.info(f'prepared {len(queries)} queries')
    results = []
    done = 0
    # note: ofc, in this simple example these would not actually be asynchronously executed.
    # in the real case i'm using asyncio.gather, this just makes for a slightly more
    # understandable example.
    for query in queries:
        # at this point, the request is actually triggered
        result = await query
        # ...some postprocessing
        if not result['success']:
            raise QueryException(result['message'])  # ...internal exception
        done += 1
        _logger.info(f'{done} of {len(queries)} queries done')
        results.append(result)
    return results
Now this works very nicely, executing exactly as I planned, and I can handle an exception in one of the queries by aborting the whole operation.
async def run():
    try:
        return await QueryRunner.execute_queries()
    except QueryException:
        _logger.error('something went horribly wrong')
        return None
The only problem is that, while the program terminates, it leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such not awaited.
Is there any way to cancel these unawaited coroutines? Would it otherwise be possible to suppress this warning?
[Edit] a bit more context as of how the queries are executed outside this simple example:
The queries are usually grouped together, so there are multiple calls to create_queries() with different parameters. Then all collected groups are looped over and the queries are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are canceled as well, which results in the error being thrown.
So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
you can call asyncio.create_task(c).cancel()
you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
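A minimal sketch of both options, using a throwaway coroutine:
import asyncio

async def query():
    await asyncio.sleep(1)

async def main():
    c1, c2 = query(), query()

    # Option 1: wrap in a task, then cancel the task (documented asyncio API).
    task = asyncio.create_task(c1)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # Option 2: close the bare coroutine object directly (lower level).
    c2.close()

asyncio.run(main())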
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have, for example if you called asyncio.gather(*coros), one of them raised and you want to cancel the rest, you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
    results = await asyncio.gather(*tasks)
finally:
    # if there are unfinished tasks, that is because one of them
    # raised - cancel the rest
    for t in tasks:
        if not t.done():
            t.cancel()
Use
pending = asyncio.Task.all_tasks()  # Python < 3.7
or
pending = asyncio.all_tasks()  # Python >= 3.7
to get the list of pending tasks. You can wait for them with
await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)
or cancel them:
for task in pending:
    task.cancel()
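Put together, a cleanup helper might look like the sketch below (note that all_tasks() includes the task that calls it, so you usually want to exclude the current task before cancelling):
import asyncio

async def shutdown():
    pending = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in pending:
        task.cancel()
    # Wait for the cancellations to be processed.
    await asyncio.gather(*pending, return_exceptions=True)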
I am trying to build an async generator, but I couldn't find any resources or figure out how to do it.
I am still getting the same error
TypeError: 'async for' requires an object with __aiter__ method, got
coroutine
I read a PEP from 2016 about this that I couldn't really understand, and I am really confused.
Basically what I am trying to do is to schedule multiple coroutines and when one of them finishes I yield a value so I can process it immediately after the result comes without waiting for every result.
But I couldn't figure this out, so I decided to assume that coroutines will finish in the order they were created, and I still have a lot of problems.
I am looking for a solution that either yields coroutine results one after another or reacts to the first finished coroutine.
Thanks in advance for any tips, resources, examples, and solutions :)
async def get_from_few_pages(
    self,
    max_pages: int = 0,
):
    pages = [
        asyncio.ensure_future(
            asyncio.get_running_loop().create_task(self.extract_data_from_page(i))
        )
        for i in range(max_pages)
    ]
    for coroutine in pages:
        await asyncio.gather(coroutine)
        yield coroutine.result()
To create an async generator, you create an async def with a yield inside, much like your code does. In fact, your code looks like something that should actually work, although imperfectly, so I don't know why you're getting the error you quote.
However, there are issues with your code:
it will always yield the results in the order the coroutines were created, not in the order in which they complete - but they will run in parallel.
you don't need both ensure_future() and create_task(), create_task() is sufficient
you don't need asyncio.gather() to await a single thing; it's for when you have more than one thing to await in parallel
To get a generator that yields awaitables as they complete, you can use asyncio.wait(return_when=FIRST_COMPLETED), like this:
async def get_from_few_pages(self, max_pages):
    pending = [asyncio.create_task(self.extract_data_from_page(i))
               for i in range(max_pages)]
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for fut in done:
            yield fut.result()
I am trying to learn to use asyncio in Python to optimize scripts.
My example produces a "coroutine was never awaited" warning; can you help me understand it and find how to solve it?
import time
import datetime
import random
import asyncio
import aiohttp
import requests


def requete_bloquante(num):
    print(f'Get {num}')
    uid = requests.get("https://httpbin.org/uuid").json()['uuid']
    print(f"Res {num}: {uid}")


def faire_toutes_les_requetes():
    for x in range(10):
        requete_bloquante(x)


print("Bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")


async def requete_sans_bloquer(num, session):
    print(f'Get {num}')
    async with session.get("https://httpbin.org/uuid") as response:
        uid = (await response.json()['uuid'])
        print(f"Res {num}: {uid}")


async def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        loop.run_until_complete(asyncio.gather(*futures))
    loop.close()
    print("Fin de la boucle !")


print("Non bloquant : ")
start = datetime.datetime.now()
faire_toutes_les_requetes_sans_bloquer()
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
The first classic part of the code runs correctly, but the second half only produces:
synchronicite.py:43: RuntimeWarning: coroutine 'faire_toutes_les_requetes_sans_bloquer' was never awaited
You made faire_toutes_les_requetes_sans_bloquer an awaitable function, a coroutine, by using async def.
When you call an awaitable function, you create a new coroutine object. The code inside the function won't run until you then await on the function or run it as a task:
>>> async def foo():
... print("Running the foo coroutine")
...
>>> foo()
<coroutine object foo at 0x10b186348>
>>> import asyncio
>>> asyncio.run(foo())
Running the foo coroutine
You want to keep that function synchronous, because you don't start the loop until inside that function:
def faire_toutes_les_requetes_sans_bloquer():
    loop = asyncio.get_event_loop()
    # ...
    loop.close()
    print("Fin de la boucle !")
However, you are also trying to use an aiohttp.ClientSession() object, and that's an asynchronous context manager; you are expected to use it with async with, not just with, and so it has to be run inside an awaitable task. If you use with instead of async with, a TypeError("Use async with instead") exception will be raised.
That all means you need to move the loop.run_until_complete() call out of your faire_toutes_les_requetes_sans_bloquer() function, so you can keep that as the main task to be run; you can then call and await on asyncio.gather() directly:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")
print("Non bloquant : ")
start = datetime.datetime.now()
asyncio.run(faire_toutes_les_requetes_sans_bloquer())
exec_time = (datetime.datetime.now() - start).seconds
print(f"Pour faire 10 requêtes, ça prend {exec_time}s\n")
I used the new asyncio.run() function (Python 3.7 and up) to run the single main task. This creates a dedicated loop for that top-level coroutine and runs it until complete.
Next, you need to move the closing ) parenthesis on the await response.json() expression:
uid = (await response.json())['uuid']
You want to access the 'uuid' key on the result of the await, not the coroutine that response.json() produces.
With those changes your code works, but the asyncio version finishes in sub-second time; you may want to print microseconds:
exec_time = (datetime.datetime.now() - start).total_seconds()
print(f"Pour faire 10 requêtes, ça prend {exec_time:.3f}s\n")
On my machine, the synchronous requests code completes in about 4-5 seconds, and the asyncio code completes in under 0.5 seconds.
Do not use a loop.run_until_complete call inside an async function. The purpose of that method is to run an async function inside a sync context. Anyway, here's how you should change the code:
async def faire_toutes_les_requetes_sans_bloquer():
    async with aiohttp.ClientSession() as session:
        futures = [requete_sans_bloquer(x, session) for x in range(10)]
        await asyncio.gather(*futures)
    print("Fin de la boucle !")


loop = asyncio.get_event_loop()
loop.run_until_complete(faire_toutes_les_requetes_sans_bloquer())
Note that a bare faire_toutes_les_requetes_sans_bloquer() call creates a coroutine object that has to be either awaited via an explicit await (for that you have to be inside an async context) or passed to some event loop. When left alone, Python complains about that. In your original code you do neither.
Not sure if this was the issue for you, but for me the response from the coroutine was another coroutine, so my code started warning me (note: not actually crashing) that I had created coroutines that were never awaited. After I actually awaited them (although I didn't really use the response), the warning went away.
The main code I added was:
content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
inspired after I saw:
response: str = await content_from_url[0]
Full code:
"""
-- Notes from [1]
Threading and asyncio both run on a single processor and therefore only run one at a time [1]. It's cooperative concurrency.
Note: threads.py has a very good block with good definitions for io-bound, cpu-bound if you need to recall it.
Note: coroutine is an important definition to understand before proceeding. Definition provided at the end of this tutorial.
General idea for asyncio is that there is a general event loop that controls how and when each tasks gets run.
The event loop is aware of each task and knows what states they are in.
For simplicity of exposition assume there are only two states:
a) Ready state
b) Waiting state
a) indicates that a task has work to do and can be run - while b) indicates that a task is waiting for a response from an
external thing (e.g. io, printer, disk, network, coq, etc). This simplified event loop has two lists of tasks
(ready_to_run_lst, waiting_lst) and runs things from the ready to run list. Once a task runs it is in complete control
until it cooperatively hands back control to the event loop.
The way it works is that the task that was ran does what it needs to do (usually an io operation, or an interleaved op
or something like that) but crucially it gives control back to the event loop when the running task (with control) thinks is best.
(Note that this means the task might not have fully completed getting what it "fully needs".
This is probably useful when the user wants to implement the interleaving himself.)
Once the task cooperatively gives back control to the event loop it is placed by the event loop in either the
ready to run list or waiting list (depending how fast the io ran, etc). Then the event loop goes through the waiting
loop to see if anything waiting has "returned".
Once all the tasks have been sorted into the right list the event loop is able to choose what to run next (e.g. by
choosing the one that has been waiting to run the longest). This repeats until the event loop code you wrote is done.
The crucial point (and the distinction from threads) that we want to emphasize is that in asyncio, an operation is never
interrupted in the middle and every switching/interleaving is done deliberately by the programmer.
In a way you don't have to worry about making your code thread safe.
For more details see [2], [3].
Asyncio syntax:
i) await = this is where the code you wrote calls an expensive function (e.g. an io) and thus hands back control to the
event loop. Then the event loop will likely put it in the waiting loop and runs some other task. Likely eventually
the event loop comes back to this function and runs the remaining code given that we have the value from the io now.
await = the key word that does (mainly) two things 1) gives control back to the event loop to see if there is something
else to run if we called it on a real expensive io operation (e.g. calling network, printer, etc) 2) gives control to
the new coroutine (code that might give up control cooperatively) that it is awaiting. If this is your own code with async
then it means it will go into this new async function (coroutine) you defined.
No real async benefits are being experienced until you call (await) a real io e.g. asyncio.sleep is the typical debug example.
todo: clarify, I think await doesn't actually give control back to the event loop but instead runs the "coroutine" this
await is pointing too. This means that if it's a real IO then it will actually give it back to the event loop
to do something else. In this case it is actually doing something "in parallel" in the async way.
Otherwise, it is your own python coroutine and thus gives it the control but "no true async parallelism" happens.
iii) async = approximately a flag that tells python the defined function might use await. This is not strictly true but
it gives you a simple model while you're getting started. todo - clarify async.
async = defines a coroutine. This doesn't define a real io, it only defines a function that can give up and give the
execution power to other coroutines or the (asyncio) event loop.
todo - context manager with async
ii) awaiting = when you call something (e.g. a function) that usually requires waiting for the io response/return/value.
todo: though it seems it's also the python keyword to give control to a coroutine you wrote in python or give
control to the event loop assuming your awaiting an actual io call.
iv) async with = this creates a context manager from an object you would normally await - i.e. an object you would
wait to get the return value from an io. So usually we swap out (switch) from this object.
todo - e.g.
Note: - any function that calls await needs to be marked with async or you’ll get a syntax error otherwise.
- a task never gives up control without intentionally doing so e.g. never in the middle of an op.
Cons: - note how this also requires thinking more carefully (but feels less dangerous than threading due to no pre-emptive
switching) due to the concurrency. Another disadvantage is again the idiosyncrasies of using this in python + learning
new syntax and details for it to actually work.
- understanding the semantics of new syntax + learning where to really put the syntax to avoid semantic errors.
- we needed a special asyncio-compatible lib for requests, since the normal requests is not designed to inform
the event loop that it's blocked (or done blocking)
- if one of the tasks doesn't cooperate properly then the whole code can be a mess and slow it down.
- not all libraries support the async IO paradigm in python (e.g. asyncio, trio, etc).
Pro: + despite learning where to put await and async being annoying, it forces you to think carefully about your code
which on itself can be an advantage (e.g. better, faster, less bugs due to thinking carefully)
+ often faster...? (skeptical)
1. https://realpython.com/python-concurrency/
2. https://realpython.com/async-io-python/
3. https://stackoverflow.com/a/51116910/6843734
todo - read [2] later (or [3] but thats not a tutorial and its more details so perhaps not a priority).
asynchronous = 1) dictionary def: not happening at the same time
e.g. happening independently 2) computing def: happening independently of the main program flow
coroutine = computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed.
So basically it's a routine/"function" that can give up control in "a controlled way" (i.e. not randomly like with threads).
Usually they are associated with a single process -- so it's concurrent but not parallel.
Interesting note: Coroutines are well-suited for implementing familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists and pipes.
Likely we have an event loop in this document as an example. I guess yield and operators too are good examples!
Interesting contrast with subroutines: Subroutines are special cases of coroutines.[3] When subroutines are invoked, execution begins at the start,
and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations.
By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine;
from the coroutine's point of view, it is not exiting but calling another coroutine.
Coroutines are very similar to threads. However, coroutines are cooperatively multitasked, whereas threads are typically preemptively multitasked.
event loop = event loop is a programming construct or design pattern that waits for and dispatches events or messages in a program.
Appendix:
For I/O-bound problems, there’s a general rule of thumb in the Python community:
“Use asyncio when you can, threading when you must.”
asyncio can provide the best speed up for this type of program, but sometimes you will require critical libraries that
have not been ported to take advantage of asyncio.
Remember that any task that doesn’t give up control to the event loop will block all of the other tasks
-- Notes from [2]
see asyncio_example2.py file.
The sync file should have taken longer, e.g. in one run the async file took:
Downloaded 160 sites in 0.4063692092895508 seconds
While the sync option took:
Downloaded 160 in 3.351937770843506 seconds
"""
import asyncio
from asyncio import Task
from asyncio.events import AbstractEventLoop
import aiohttp
from aiohttp import ClientResponse
from aiohttp.client import ClientSession
from typing import Coroutine
import time
async def download_site(session: ClientSession, url: str) -> str:
    async with session.get(url) as response:
        print(f"Read {response.content_length} from {url}")
        return response.text()


async def download_all_sites(sites: list[str]) -> list[str]:
    # async with = this creates a context manager from an object you would normally await - i.e. an object you would wait to get the return value from an io. So usually we swap out (switch) from this object.
    async with aiohttp.ClientSession() as session:  # we will usually await session.<METHOD> calls
        # create all the download coroutines/tasks to be later managed/run by the event loop
        tasks: list[Task] = []
        for url in sites:
            # creates a task from a coroutine. todo: basically it seems it creates a callable coroutine? (i.e. a function that is able to give up control cooperatively or runs an external io and thus gives back control cooperatively to the event loop). read more? https://stackoverflow.com/questions/36342899/asyncio-ensure-future-vs-baseeventloop-create-task-vs-simple-coroutine
            task: Task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        # runs tasks/coroutines in the event loop and aggregates the results. todo: does this halt until all coroutines have returned? I think so due to the paradigm of how async code works.
        content_from_url: list[ClientResponse.text] = await asyncio.gather(*tasks, return_exceptions=True)
        assert isinstance(content_from_url[0], Coroutine)  # note: all responses are coroutines
        print(f'result after aggregating/doing all coroutine tasks/jobs = {content_from_url=}')
        # this is needed since the response is in a coroutine object for some reason
        content_from_url_as_str: list[str] = await asyncio.gather(*content_from_url, return_exceptions=True)
        print(f'result after getting response from coroutines that hold the text = {content_from_url_as_str=}')
        return content_from_url_as_str
if __name__ == "__main__":
# - args
num_sites: int = 80
sites: list[str] = ["https://www.jython.org", "http://olympus.realpython.org/dice"] * num_sites
start_time: float = time.time()
# - run the same 160 tasks but without async paradigm, should be slower!
# note: you can't actually do this here because you have the async definitions to your functions.
# to test the synchronous version see the synchronous.py file. Then compare the two run times.
# await download_all_sites(sites)
# download_all_sites(sites)
# - Execute the coroutine coro and return the result.
asyncio.run(download_all_sites(sites))
# - run event loop manager and run all tasks with cooperative concurrency
# asyncio.get_event_loop().run_until_complete(download_all_sites(sites))
# makes explicit the creation of the event loop that manages the coroutines & external ios
# event_loop: AbstractEventLoop = asyncio.get_event_loop()
# asyncio.run(download_all_sites(sites))
# making creating the coroutine that hasn't been ran yet with it's args explicit
# event_loop: AbstractEventLoop = asyncio.get_event_loop()
# download_all_sites_coroutine: Coroutine = download_all_sites(sites)
# asyncio.run(download_all_sites_coroutine)
# - print stats about the content download and duration
duration = time.time() - start_time
print(f"Downloaded {len(sites)} sites in {duration} seconds")
print('Success.\a')
I launch a bunch of requests using aiohttp. Is there a way to get the results one-by-one as soon as each request is complete?
Perhaps using something like async for? Or Python 3.6 async generators?
Currently I await asyncio.gather(*requests) and process them when all of them are completed.
asyncio has an as_completed function that probably does what you need. Note that it returns a regular iterator, not an async one.
Here's example of usage:
import asyncio


async def test(i):
    await asyncio.sleep(i)
    return i


async def main():
    fs = [
        test(1),
        test(2),
        test(3),
    ]
    for f in asyncio.as_completed(fs):
        i = await f  # Await for next result.
        print(i, 'done')


loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    loop.run_until_complete(main())
finally:
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()
Output:
1 done
2 done
3 done
The canonical way is to push results into an asyncio.Queue, as in the crawler example.
Also, it's wise to run a limited number of download tasks that get new jobs from an input queue instead of spawning a million new tasks.
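As a rough sketch of that worker-pool idea (the queue names, the worker count of 10, and the plain GET per URL are all just illustrative):
import asyncio
import aiohttp

async def worker(session, in_queue, out_queue):
    while True:
        url = await in_queue.get()
        try:
            async with session.get(url) as resp:
                await out_queue.put((url, await resp.text()))
        finally:
            in_queue.task_done()

async def main(urls, num_workers=10):
    in_queue, out_queue = asyncio.Queue(), asyncio.Queue()
    for url in urls:
        in_queue.put_nowait(url)
    async with aiohttp.ClientSession() as session:
        workers = [asyncio.create_task(worker(session, in_queue, out_queue)) for _ in range(num_workers)]
        await in_queue.join()  # all URLs processed
        for w in workers:
            w.cancel()
    while not out_queue.empty():
        print(out_queue.get_nowait())  # results, roughly in completion order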
As I understand from the docs, requests are Futures (or can easily be converted to a Future using asyncio.ensure_future).
A Future object has a method .add_done_callback.
So, you can add your callback for every request, and then do gather.
Docs for Future.add_done_callback
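A minimal sketch of that callback approach (on_done is just a name made up for this example):
import asyncio

def on_done(task):
    # Called by the event loop as soon as this particular task finishes.
    print("completed:", task.result())

async def main():
    tasks = [asyncio.ensure_future(asyncio.sleep(i, result=i)) for i in (3, 1, 2)]
    for t in tasks:
        t.add_done_callback(on_done)
    await asyncio.gather(*tasks)  # callbacks fire as each task completes

asyncio.run(main())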