I am trying to build an async generator, but I couldn't find any resources or figure out how to do it.
I am still getting the same error:
TypeError: 'async for' requires an object with __aiter__ method, got coroutine
I read the PEP from 2016, which I found hard to understand, and I am really confused.
Basically, what I am trying to do is schedule multiple coroutines and, when one of them finishes, yield its value, so I can process each result immediately as it comes in instead of waiting for all of them.
Since I couldn't figure that out, I decided to assume the coroutines will finish in the order they were created, but I still have a lot of problems.
I am looking for a solution that either yields the coroutines' results one after another or reacts to the first finished coroutine.
Thanks in advance for any tips, resources, examples, and solutions :)
async def get_from_few_pages(
    self,
    max_pages: int = 0,
):
    pages = [
        asyncio.ensure_future(
            asyncio.get_running_loop().create_task(self.extract_data_from_page(i))
        )
        for i in range(max_pages)
    ]
    for coroutine in pages:
        await asyncio.gather(coroutine)
        yield coroutine.result()
To create an async generator, you create an async def with a yield inside, much like your code does. In fact, your code looks like something that should actually work, although imperfectly, so I don't know why you're getting the error you quote.
However, there are issues with your code:
it will always yield the results in the order the coroutines were created, not in the order in which they complete - but they will run in parallel.
you don't need both ensure_future() and create_task(); create_task() alone is sufficient.
you don't need asyncio.gather() to await a single thing; it's for when you have more than one thing to await in parallel. (A corrected sketch of your generator follows below.)
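A minimal corrected version of the ordered-yield approach might look like this (a sketch, keeping your extract_data_from_page method):
async def get_from_few_pages(self, max_pages: int = 0):
    # create_task() alone is enough; the tasks start running right away
    tasks = [asyncio.create_task(self.extract_data_from_page(i))
             for i in range(max_pages)]
    for task in tasks:
        # await each task directly - no gather() needed for a single awaitable
        yield await task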
To get a generator that yields awaitables as they complete, you can use asyncio.wait(return_when=FIRST_COMPLETED), like this:
async def get_from_few_pages(self, max_pages):
    pending = [asyncio.create_task(self.extract_data_from_page(i))
               for i in range(max_pages)]
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for fut in done:
            yield fut.result()
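Note that fut.result() re-raises any exception that was raised inside the corresponding task, so failures surface through the generator as well.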
Related
I currently have a function, which yields results from an external source, which may arrive over a long time period.
async def run_command(data):
    async with httpx.AsyncClient() as client:
        url = f"http://hostname.com:8000/"
        async with client.stream("POST", url, json=data, timeout=None) as r:
            async for line in r.aiter_lines():
                yield (json.loads(line), 200)
This currently works fine for one data call, and I can process the result simply like so:
async for result in run_command(data):
    yield result
The result is passed back to the initial calling function which streams it to a GUI using Django channels.
async for result, status in self.call_command(dataSet):
    await self.channel_layer.group_send(
        self.group_name, {
            'type': "transmit",
            'message': result,
            'status': status,
        },
    )
However, I would now like to call the run_command function concurrently for multiple data sets. I'm trying to do it like so:
for future in asyncio.as_completed([run_command(data) for data in dataSet]):
    yield future
I get a TypeError: An asyncio.Future, a coroutine or an awaitable is required.
Replacing the last line with the two below still gives the same error.
result = await future
yield result
Changing the for loop into an async for loop doesn't work either, with the error: TypeError: 'async for' requires an object with __aiter__ method, got generator.
So is it possible to yield results as they arrive from multiple futures?
If I understand your requirements correctly, this sounds like a textbook case for a queue producer-consumer setup.
Let me try and abstract this for you. You have some asynchronous generator that you create by supplying some input data. Once it's running, it yields results at irregular intervals.
Now you have multiple sets of data. You want to have multiple generators concurrently crunching away at those and you want to be able to process a result as soon as one is yielded by any of the generators.
The solution I propose is to write two coroutine functions - a queue_producer and a queue_consumer. Both receive the same asyncio.Queue instance as argument.
The producer also receives one single piece of input data. It sets up the aforementioned asynchronous generator and begins iterating through it. As soon as a new item is yielded to it, it puts it into the queue.
The consumer is actually itself an asynchronous generator. It receives a timeout argument in addition to the queue. It starts an infinite loop of awaiting the next item it can get from the queue. Once it gets an item, it yields that item. If the waiting takes longer than the specified timeout, it breaks out of the loop and ends.
To demonstrate this, I will use this very silly and simple asynchronous iterator implementation that simply iterates over characters in a string with random sleep times in between each letter. I'll call it funky_string_iter.
Here is a full working example:
from asyncio import TimeoutError, run
from asyncio.tasks import create_task, gather, sleep, wait_for
from asyncio.queues import Queue
from collections.abc import AsyncIterator
from random import random


async def funky_string_iter(data: str) -> AsyncIterator[str]:
    for char in data:
        await sleep(random())
        yield char


async def queue_producer(queue: Queue[str], data: str) -> None:
    async for result in funky_string_iter(data):
        await queue.put(result)


async def queue_consumer(queue: Queue[str], timeout: float) -> AsyncIterator[str]:
    while True:
        try:
            yield await wait_for(queue.get(), timeout)
        except TimeoutError:
            break


async def main() -> None:
    data_sets = ["abc", "xyz", "foo"]
    q: Queue[str] = Queue()
    # Launch the producer tasks:
    producers = [
        create_task(queue_producer(q, data))
        for data in data_sets
    ]
    # Iterate through the consumer until it times out:
    async for result in queue_consumer(q, timeout=3):
        print(result)
    await gather(*producers)  # Clean up the producer tasks


if __name__ == '__main__':
    run(main())
The output will obviously be more or less randomly ordered, but here is one example output I got:
a
f
o
b
x
y
z
o
c
This demonstrates how the characters are yielded by the queue_consumer as soon as they are available (in the queue), which in turn depends on which queue_producer task yields the next character from its string.
You can transfer this to your specific case by substituting that funky string iterator for your run_command. Naturally, adjust the rest as needed.
As for the error you got from the asyncio.as_completed iteration: run_command is an async generator function, so run_command(data) returns an async generator object, not an awaitable, which is why as_completed rejects it. That function is not the right fit here, as it is not intended for asynchronous iterators.
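For contrast, here is a minimal sketch of what asyncio.as_completed is meant for, namely plain coroutines (the fetch function is made up for the example):
import asyncio

async def fetch(i):
    await asyncio.sleep(1 - i / 10)
    return i

async def main():
    # as_completed takes awaitables and yields them in completion order
    for fut in asyncio.as_completed([fetch(i) for i in range(3)]):
        print(await fut)  # prints 2, 1, 0 - fastest first

asyncio.run(main())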
On FastAPI, I have an endpoint that calls get_1 or get_2 coroutine function below.
get_1 uses await redis.get(key)
get_2 uses await asyncio.ensure_future(redis.get(key))
Is there any difference between the 2 functions in terms of functionality and performance?
# redis.py
import asyncio
import aioredis


async def get_1(key):
    redis = aioredis.from_url("redis://localhost")
    value = await redis.get(key)
    return value


async def get_2(key):
    redis = aioredis.from_url("redis://localhost")
    value = await asyncio.ensure_future(redis.get(key))
    return value
First of all, to understand what exactly await does and how a task differs from a future, I recommend starting with this topic and, of course, the official documentation.
As for your question: at first glance, both expressions, await coro() and await create_task(coro()), do the same thing. They start the coroutine, wait for it to complete, and return the result.
But there are a number of important differences:
The await coro() leads to a direct call into the coroutine's code, without returning the execution path to the event loop. This was explained in this topic.
The await create_task(coro()) wraps the coroutine in a task, schedules its execution in the event loop, returns the execution path to the event loop, and then waits for the result. In this case, other already scheduled tasks can be executed before the target coroutine (scheduled as a task) runs.
Usually, await is not used with create_task, so that the spawned task can run in parallel; but sometimes it is needed - see the sketch below.
The await coro() executes the target coroutine within the current context of variables, while await create_task(coro()) executes it within a copy of the current context (more details in this topic).
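A minimal sketch illustrating the scheduling difference (the background and target coroutines are made up for the demonstration):
import asyncio

async def background():
    print("background ran")

async def target():
    print("target ran")

async def main():
    asyncio.create_task(background())  # scheduled, but not started yet
    await target()                     # direct call: runs before the pending task

    asyncio.create_task(background())
    await asyncio.create_task(target())  # yields to the loop first, so both
                                         # pending background tasks run before target

asyncio.run(main())
# Output: target ran / background ran / background ran / target ran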
Based on the above, most likely you want await coro(), leaving the second expression for more specific cases.
So, given a somewhat complex setup that is used to generate a list of queries to be run semi-parallel (using a semaphore to not run too many queries at the same time, so as not to DDoS the server),
I have an (in itself async) function that creates a number of queries:
async def run_query(self, url):
    async with self.semaphore:
        return await some_http_lib(url)

async def create_queries(self, base_url):
    # ...gathering logic is of course a bit more complex in the real setting
    urls = await some_http_lib(base_url).json()
    coros = [self.run_query(url) for url in urls]  # note: not executed just yet
    return coros
async def execute_queries(self):
    queries = await self.create_queries('/some/url')
    _logger.info(f'prepared {len(queries)} queries')
    results = []
    done = 0
    # note: of course, in this simple example these calls would not actually be
    # executed asynchronously. In the real case I'm using asyncio.gather;
    # this just makes for a slightly more understandable example.
    for query in queries:
        # at this point, the request is actually triggered
        result = await query
        # ...some postprocessing
        if not result['success']:
            raise QueryException(result['message'])  # ...internal exception
        done += 1
        _logger.info(f'{done} of {len(queries)} queries done')
        results.append(result)
    return results
Now this works very nicely, executing exactly as I planned, and I can handle an exception in one of the queries by aborting the whole operation.
async def run():
    try:
        return await QueryRunner.execute_queries()
    except QueryException:
        _logger.error('something went horribly wrong')
        return None
The only problem is that the program terminates but leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such never awaited.
Is there any way to cancel these unawaited coroutines? Would it be otherwise possible to supress this warning?
[Edit] A bit more context on how the queries are executed outside this simple example:
The queries are usually grouped together, so there are multiple calls to create_queries() with different parameters. All collected groups are then looped over, and the queries are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are cancelled as well, which results in the warning being thrown.
So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
you can call asyncio.create_task(c).cancel()
you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
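A minimal sketch of both options (the query coroutine is made up; cancelling a task requires a running event loop):
import asyncio

async def query(i):
    await asyncio.sleep(1)
    return i

async def main():
    c1, c2 = query(1), query(2)
    # Option 1: wrap the coroutine in a task, then cancel the task
    asyncio.create_task(c1).cancel()
    # Option 2: close the bare coroutine object directly
    c2.close()

asyncio.run(main())  # exits without a "was never awaited" warning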
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have - for example, if you called asyncio.gather(*coros), one of them raised, and you want to cancel the rest - you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
    results = await asyncio.gather(*tasks)
finally:
    # if there are unfinished tasks, that is because one of them
    # raised - cancel the rest
    for t in tasks:
        if not t.done():
            t.cancel()
Use
pending = asyncio.tasks.all_tasks()  # Python < 3.7
or
pending = asyncio.all_tasks()  # Python >= 3.7
to get the list of pending tasks. You can wait for them with
await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)
or cancel them:
for task in pending:
    task.cancel()
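Note that cancellation only takes effect once the event loop regains control; a common follow-up (a sketch reusing the pending list from above) is to await the cancelled tasks so the cancellation can propagate:
for task in pending:
    task.cancel()
# return_exceptions=True keeps the resulting CancelledErrors from propagating here
await asyncio.gather(*pending, return_exceptions=True)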
Let's assume I'm new to asyncio. I'm using async/await to parallelize my current project, and I've found myself passing all of my coroutines to asyncio.ensure_future. Lots of stuff like this:
coroutine = my_async_fn(*args, **kwargs)
task = asyncio.ensure_future(coroutine)
What I'd really like is for a call to an async function to return an executing task instead of an idle coroutine. I created a decorator to accomplish what I'm trying to do.
def make_task(fn):
    def wrapper(*args, **kwargs):
        return asyncio.ensure_future(fn(*args, **kwargs))
    return wrapper

@make_task
async def my_async_func(*args, **kwargs):
    # usually making a request of some sort
    pass
Does asyncio have a built-in way of doing this that I haven't been able to find? Am I using asyncio wrong if I'm led to this problem to begin with?
asyncio had a @task decorator in very early pre-release versions, but we removed it.
The reason is that the decorator has no knowledge of what loop to use.
asyncio doesn't instantiate a loop on import; moreover, the test suite usually creates a new loop per test for the sake of test isolation.
Does asyncio have a built-in way of doing this I haven't been able to find?
No, asyncio doesn't have a decorator to cast coroutine functions into tasks.
Am I using asyncio wrong if I'm led to this problem to begin with?
It's hard to say without seeing what you're doing, but I think it may be true. While creating tasks is a common operation in asyncio programs, I doubt you have created this many coroutines that should always be tasks.
Awaiting a coroutine is a way to "call some function asynchronously" while blocking the current execution flow until it finishes:
await some()
# you'll reach this line *only* when some() is done
A task, on the other hand, is a way to "run a function in the background"; it won't block the current execution flow:
task = asyncio.ensure_future(some())
# you'll reach this line immediately
When we write asyncio programs, we usually need the first way, since we usually need the result of one operation before starting the next one:
text = await request(url)
links = parse_links(text) # we need to reach this line only when we got 'text'
Creating a task, on the other hand, usually means that the code that follows doesn't depend on the task's result. But again, that's not always the case.
Since ensure_future returns immediately, some people try to use it as a way to run coroutines concurrently:
# wrong way to run concurrently:
asyncio.ensure_future(request(url1))
asyncio.ensure_future(request(url2))
asyncio.ensure_future(request(url3))
Correct way to achieve this is to use asyncio.gather:
# correct way to run concurrently:
await asyncio.gather(
request(url1),
request(url2),
request(url3),
)
Maybe this is what you want?
Update:
I think using tasks in your case is a good idea. But I don't think you should use a decorator: the coroutine's functionality (making a request) is still a separate concern from the concrete detail of how it is used (namely, as a task). If controlling the synchronization of requests is separate from their main functionality, it also makes sense to move that synchronization into a separate function. I would do something like this:
import asyncio


async def request(i):
    print(f'{i} started')
    await asyncio.sleep(i)
    print(f'{i} finished')
    return i


async def when_ready(conditions, coro_to_start):
    await asyncio.gather(*conditions, return_exceptions=True)
    return await coro_to_start


async def main():
    t = asyncio.ensure_future
    t1 = t(request(1))
    t2 = t(request(2))
    t3 = t(request(3))
    t4 = t(when_ready([t1, t2], request(4)))
    t5 = t(when_ready([t2, t3], request(5)))
    await asyncio.gather(t1, t2, t3, t4, t5)


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(main())
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
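On Python 3.7+, the boilerplate at the bottom can be replaced by a single asyncio.run(main()), which creates the loop, runs main(), shuts down async generators, and closes the loop for you.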
I want to use the event loop to monitor any data inserted into my asyncio.Queue (you can find its source code here: https://github.com/python/cpython/blob/3.6/Lib/asyncio/queues.py), but I have run into some problems. Here is the code:
import asyncio
import threading


async def recv(q):
    while True:
        msg = await q.get()
        print(msg)


async def checking_task():
    while True:
        await asyncio.sleep(0.1)


def loop_in_thread(loop, q):
    asyncio.set_event_loop(loop)
    asyncio.ensure_future(recv(q))
    asyncio.ensure_future(insert(q))
    # asyncio.ensure_future(checking_task())  # uncomment this, and it will work as intended
    loop.run_forever()


async def insert(q):
    print('invoked')
    await q.put('hello')


q = asyncio.Queue()
loop = asyncio.get_event_loop()
t = threading.Thread(target=loop_in_thread, args=(loop, q))
t.start()
After the program starts, we can see the following result:
invoked
hello
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x000001E215DCFAC8>()]>>}
But now if we manually add data into q by using q.put_nowait('test'), we would get the following result:
q.put_nowait('test') # a non-async way to add data into queue
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future finished result=None>>}
As you can see, the future is already finished, yet we still haven't printed the newly added string 'test'. In other words, msg = await q.get() is still waiting even though the Future related to q.get() is done and there are no other tasks running. This confuses me, because the official documentation (https://docs.python.org/3/library/asyncio-task.html) says:
result = await future or result = yield from future – suspends the coroutine until the future is done, then returns the future’s result
It seems that even though the Future is done, we still need some sort of await in another async function to make the event loop keep processing tasks.
I found a workaround for this problem, which is adding a checking_task() coroutine to the event loop; then it works as intended.
But adding a checking_task() coroutine is very costly in CPU, since it just runs a while loop. I am wondering if there is some manual way to trigger that await event without using an async function. For example, something magical like:
q.put_nowait('test')
loop.ok_you_can_start_running_other_pending_tasks()
Helps will be greatly appreciated! Thanks.
So I ended up using
loop.call_soon_threadsafe(q.put_nowait, 'test')
and it works as intended. After figuring this out, I searched for some information about it. It turned out this post (Scheduling an asyncio coroutine from another thread) deals with the same problem, and @kfx's answer would also work, which is
loop.call_soon_threadsafe(loop.create_task, q.put('test'))
Notice that asyncio.Queue.put() is a coroutine, but asyncio.Queue.put_nowait() is a normal function.
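To make this concrete, here is a minimal self-contained sketch of the thread-safe pattern, condensed from the question's setup (note: asyncio.Queue only binds to its loop lazily on Python 3.10+, so on older versions the queue should be created inside the loop thread):
import asyncio
import threading
import time

async def recv(q):
    while True:
        print(await q.get())

def loop_in_thread(loop, q):
    asyncio.set_event_loop(loop)
    loop.create_task(recv(q))
    loop.run_forever()

loop = asyncio.new_event_loop()
q = asyncio.Queue()
threading.Thread(target=loop_in_thread, args=(loop, q), daemon=True).start()

# From the main thread: schedule the insertion on the loop's thread
# instead of calling q.put_nowait() directly
loop.call_soon_threadsafe(q.put_nowait, 'test')
time.sleep(0.5)  # give the loop thread a moment to process and print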