How to use `async for` in Python?

I mean, what do I get from using async for? Here is the code I wrote with async for; AIter(10) could be replaced with get_range().
But the code runs synchronously, not asynchronously.
import asyncio


async def get_range():
    for i in range(10):
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        yield i


class AIter:
    def __init__(self, N):
        self.i = 0
        self.N = N

    def __aiter__(self):
        return self

    async def __anext__(self):
        i = self.i
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        if i >= self.N:
            raise StopAsyncIteration
        self.i += 1
        return i


async def main():
    async for p in AIter(10):
        print(f"finally {p}")


if __name__ == "__main__":
    asyncio.run(main())
The result I expected:
start 1
start 2
start 3
...
end 1
end 2
...
finally 1
finally 2
...
However, the real result is:
start 0
end 0
finally 0
start 1
end 1
finally 1
start 2
end 2
I know I could get the expected result by using asyncio.gather or asyncio.wait.
But it is hard for me to understand what I gain by using async for here instead of a simple for.
What is the right way to use async for if I want to loop over several Future objects and use each one as soon as it is finished? For example:
async for f in feature_objects:
    data = await f
    with open("file", "w") as fi:
        fi.write(data)

But it is hard for me to understand what I gain by using async for here instead of a simple for.
The underlying misunderstanding is expecting async for to automatically parallelize the iteration. It doesn't do that; it simply allows sequential iteration over an async source. For example, you can use async for to iterate over lines coming from a TCP stream, messages from a websocket, or database records from an async DB driver.
None of the above would work with an ordinary for, at least not without blocking the event loop. This is because for calls __next__ as a blocking function and doesn't await its result. You cannot manually await elements obtained by for because for expects __next__ to signal the end of iteration by raising StopIteration. If __next__ is a coroutine, the StopIteration exception won't be visible before awaiting it. This is why async for was introduced, not just in Python, but also in other languages with async/await and generalized for.
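For illustration, here is a minimal, self-contained sketch (not from the question) of async for consuming an async source strictly sequentially; the async generator stands in for something like a socket or websocket feed:

import asyncio


async def incoming_messages():
    # Simulates an async source, e.g. lines from a TCP stream or a websocket.
    for i in range(3):
        await asyncio.sleep(1)  # waiting here suspends only this coroutine
        yield f"message {i}"


async def consumer():
    # async for awaits __anext__ at each step, so other tasks can run
    # on the event loop while we wait for the next item.
    async for msg in incoming_messages():
        print("received:", msg)


asyncio.run(consumer())

The items still arrive one at a time; what async for buys you is that waiting for each item does not block the event loop.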
If you want to run the loop iterations in parallel, you need to start them as parallel coroutines and use asyncio.as_completed or equivalent to retrieve their results as they come:
async def x(i):
    print(f"start {i}")
    await asyncio.sleep(1)
    print(f"end {i}")
    return i

# run x(0)..x(9) concurrently and process results as they arrive
# (this loop must itself be inside an async function)
for f in asyncio.as_completed([x(i) for i in range(10)]):
    result = await f
    # ... do something with the result ...
If you don't care about reacting to results immediately as they arrive, but you need them all, you can make it even simpler by using asyncio.gather:
# run x(0)..x(9) concurrently and process the results once all are done
results = await asyncio.gather(*[x(i) for i in range(10)])

(Adding on to the accepted answer - for Charlie's bounty.)
Assuming you want to consume each yielded value concurrently, a straightforward way would be:
import asyncio


async def process_all():
    tasks = []
    async for obj in my_async_generator:
        # Python 3.7+. Use asyncio.ensure_future for older versions.
        task = asyncio.create_task(process_obj(obj))
        tasks.append(task)
    await asyncio.gather(*tasks)


async def process_obj(obj):
    ...
Explanation:
Consider the following code, without create_task:
async def process_all():
    async for obj in my_async_generator:
        await process_obj(obj)
This is roughly equivalent to:
async def process_all():
    obj1 = await my_async_generator.__anext__()
    await process_obj(obj1)

    obj2 = await my_async_generator.__anext__()
    await process_obj(obj2)

    ...
Basically, the loop cannot move on to the next item because its body blocks on each iteration. The way to go is to delegate the processing of each iteration to a new asyncio task, which starts without blocking the loop. Then, gather waits for all of the tasks, which means it waits for every iteration to be processed.

Code based on the fantastic answer from matan129, just missing the async generator to make it runnable; once I have that (or if someone wants to contribute it) I will finalize this:
import time
import asyncio


async def process_all():
    """
    Example where the async for loop dispatches many things concurrently,
    without blocking on each individual iteration, but then blocks (waits)
    for all of the tasks to finish.

    ref:
        - https://stackoverflow.com/questions/56161595/how-to-use-async-for-in-python/72758067#72758067
    """
    tasks = []
    async for obj in my_async_generator:
        # Python 3.7+. Use asyncio.ensure_future for older versions.
        task = asyncio.create_task(process_obj(obj))  # dispatches the coroutine to run concurrently
        tasks.append(task)
    await asyncio.gather(*tasks)


async def process_obj(obj):
    await asyncio.sleep(5)  # expensive IO


if __name__ == '__main__':
    # - test asyncio
    s = time.perf_counter()
    asyncio.run(process_all())
    # - print stats
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")
    print('Success, done!\a')

Related

Running a blocking function (e.g. requests) concurrently but asynchronously with Python

There is a function that blocks the event loop (e.g. it makes an API request). I need to make a continuous stream of requests which run in parallel but not synchronously, so each next request is started before the previous one has finished.
So I found an answered question with the loop.run_in_executor() solution and used it at first:
import asyncio
import requests

# blocking_request_func() defined somewhere


async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, blocking_request_func, 'param')
    future2 = loop.run_in_executor(None, blocking_request_func, 'param')
    response1 = await future1
    response2 = await future2
    print(response1)
    print(response2)


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
This works well; the requests run in parallel. But there is a problem for my task: in this example we build the whole group of tasks/futures up front and only then run the group. I need something like this:
1. Send request_1 without awaiting its completion.
(after step 1, but not at the same moment step 1 starts:)
2. Send request_2 without awaiting its completion.
(after step 2, but not at the same moment step 2 starts:)
3. Send request_3 without awaiting its completion.
(request 1, or any other, returns its response)
(after step 3, but not at the same moment step 3 starts:)
4. Send request_4 without awaiting its completion.
(request 2, or any other, returns its response)
and so on...
I tried using asyncio.TaskGroup():
async def request_func():
    global result  # the list of request results, defined somewhere in the global area
    loop = asyncio.get_event_loop()
    result.append(await loop.run_in_executor(None, blocking_request_func, 'param'))
    await asyncio.sleep(0)  # adding or removing this line gives the same result


async def main():
    async with asyncio.TaskGroup() as tg:
        for i in range(0, 10):
            tg.create_task(request_func())
All these approaches gave the same result: first we define the group of tasks/futures and only then run the whole group. But is there a way to run all these requests concurrently yet "in a stream"?
I tried to make a visualization in case my explanation is not clear enough (two diagrams in the original post: "What I have for now" vs. "What I need").
================ Update with the answer ===================
The closest answer, though with some limitations:
import asyncio
import concurrent.futures
import random
import time


def blockme(n):
    x = random.random() * 2.0
    time.sleep(x)
    return n, x


def cb(fut):
    print("Result", fut.result())


async def main():
    # You need to control the number of threads
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
    loop = asyncio.get_event_loop()
    futs = []
    n = 0  # request counter (was undefined in the original snippet)
    # You need to control requests per second
    delay = 0.5
    while await asyncio.sleep(delay, result=True):
        fut = loop.run_in_executor(pool, blockme, n)
        fut.add_done_callback(cb)
        futs.append(fut)
        n += 1
        # You need to control the number of outstanding futures, e.g. like this:
        if len(futs) > 40:
            completed, pending = await asyncio.wait(
                futs,
                timeout=5,
                return_when=asyncio.FIRST_COMPLETED,
            )
            futs = list(pending)


asyncio.run(main())
I think this might be what you want. You don't have to await each request - the run_in_executor function returns a Future. Instead of awaiting that, you can attach a callback function:
import asyncio
import random
import time


def blockme(n):
    x = random.random() * 2.0
    time.sleep(x)
    return n, x


def cb(fut):
    print("Result", fut.result())


async def main():
    loop = asyncio.get_event_loop()
    futs = []
    for n in range(20):
        fut = loop.run_in_executor(None, blockme, n)
        fut.add_done_callback(cb)
        futs.append(fut)
    await asyncio.gather(*futs)
    # await asyncio.sleep(10)


asyncio.run(main())
All the requests are started at the beginning, but they don't all execute in parallel because the number of threads is limited by the ThreadPool. You can adjust the number of threads if you want.
Here I simulated a blocking call with time.sleep. I needed a way to prevent main() from ending before all the callbacks occurred, so I used gather for that purpose. You can also wait for some length of time, but gather is cleaner.
Apologies if I don't understand what you want. But I think you want to avoid using await for each call, and I tried to show one way you can do that.
This is taken directly from the Python documentation. The snippet from the asyncio documentation shows how you can run blocking code concurrently using asyncio; it uses asyncio.to_thread to run the blocking function in a separate thread.
You can find more here: https://docs.python.org/3/library/asyncio-task.html#running-in-threads
import asyncio
import time


def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")


async def main():
    print(f"started main at {time.strftime('%X')}")
    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))
    print(f"finished main at {time.strftime('%X')}")


asyncio.run(main())

Why does 'await' break from the local function when called from main()?

I am new to asynchronous programming, and while I understand most concepts, there is one relating to the inner workings of 'await' that I don't quite understand.
Consider the following:
import asyncio


async def foo():
    print('start fetching')
    await asyncio.sleep(2)
    print('done fetching')


async def main():
    task1 = asyncio.create_task(foo())


asyncio.run(main())
Output: start fetching
vs.
async def foo():
    print('start fetching')
    print('done fetching')


async def main():
    task1 = asyncio.create_task(foo())


asyncio.run(main())
Output: start fetching followed by done fetching
Perhaps it is my understanding of await, which I understand insofar as we can use it to pause (2 seconds in the case above), or to wait for functions to fully finish running before any further code is run.
But for the first example above, why does await cause 'done fetching' to not run?
asyncio.create_task schedules an awaitable on the event loop and returns immediately, so you are actually exiting the main function (and closing the event loop) before the task is able to finish.
You need to change main to either:
async def main():
    task1 = asyncio.create_task(foo())
    await task1
or
async def main():
    await foo()
Creating a task first (the former) is useful in many cases, but they all involve situations where the event loop will outlast the task, e.g. a long-running server; otherwise you should just await the coroutine directly, like the latter.
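For illustration, a minimal sketch (not part of the original answer) of the former pattern, where the task runs in the background while main does other work before finally awaiting it:

import asyncio


async def foo():
    print('start fetching')
    await asyncio.sleep(2)
    print('done fetching')


async def main():
    task1 = asyncio.create_task(foo())  # foo() is scheduled and starts running in the background
    print('doing other work in main')
    await asyncio.sleep(1)              # foo() makes progress during this await
    print('still working')
    await task1                         # now wait for foo() to finish


asyncio.run(main())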

async python - yield from raises AssertionError for correct parameters

My requirement is to run 2 functions at the same time and stop execution of one if the other one calculates and returns a result faster.
I have no knowledge of async programming or event loops. I read about Python 3.6, which led me to asyncio.wait().
My sample code:
import time
import asyncio as aio


async def f(x):
    time.sleep(1)  # to fake fast work
    print("f say: " + str(x*2))
    return x*2


async def s(x):
    time.sleep(3)  # to fake slow work
    print("s say: " + str(x*2))
    return x**2


x = 10
assert aio.iscoroutinefunction(f)
assert aio.iscoroutinefunction(s)

futures = {f(x), s(x)}


def executor():
    yield from aio.wait(futures, return_when=aio.FIRST_COMPLETED)


done, pending = executor()
But it doesn't work, for reasons I don't understand.
The particular assertion you are getting has to do with incorrect use of yield from. However, the problem runs deeper:
My requirement is to run 2 functions at the same time
This is not how asyncio works: nothing runs "at the same time". Instead, one runs async functions which execute until they reach what would normally be a blocking call. Instead of blocking, an async function then suspends its execution, allowing other coroutines to run. They must be driven by an event loop, which wakes them up once some IO event allows them to resume.
A more correct asyncio version of your code would look like this:
import asyncio


async def f(x):
    await asyncio.sleep(1)  # to fake fast work
    print("f say: " + str(x*2))
    return x*2


async def s(x):
    await asyncio.sleep(3)  # to fake slow work
    print("s say: " + str(x*2))
    return x**2


async def execute():
    futures = {f(10), s(10)}
    done, pending = await asyncio.wait(futures, return_when=asyncio.FIRST_COMPLETED)
    for fut in pending:
        fut.cancel()
    return done


loop = asyncio.get_event_loop()
done = loop.run_until_complete(execute())
print(done)
Note in particular that:
asyncio.sleep() is used instead of time.sleep(). This applies to every blocking call.
Coroutines such as asyncio.sleep and asyncio.wait must be awaited using the await keyword. This allows the coroutine to suspend itself upon encountering a blocking call.
Async code is executed through the event loop, whose entry point is typically run_until_complete or run_forever.
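As a side note (not part of the original answer), on Python 3.7+ the explicit loop management in the last lines can be replaced with asyncio.run, which creates the event loop, runs the coroutine, and closes the loop for you:

done = asyncio.run(execute())  # replaces get_event_loop() + run_until_complete()
print(done)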

How to make request without blocking (using asyncio)?

I would like to achieve the following using asyncio:
# Each iteration of this loop MUST last only 1 second
while True:
    # Make an async request
    sleep(1)
However, the only examples I've seen use some variation of
async def my_func():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, requests.get, 'http://www.google.com')


loop = asyncio.get_event_loop()
loop.run_until_complete(my_func())
But run_until_complete is blocking! Using run_until_complete in each iteration of my while loop would cause the loop to block.
I've spent the last couple of hours trying to figure out how to correctly run a non-blocking task (defined with async def) without success. I must be missing something obvious, because something as simple as this should surely be simple. How can I achieve what I have described?
run_until_complete runs the main event loop. It's not "blocking" so to speak; it just runs the event loop until the coroutine you passed as a parameter returns. It has to hang because otherwise the program would either stop or be blocked by the next instructions.
It's pretty hard to tell what you are trying to achieve, but this piece of code actually does something:
import asyncio
import requests


async def my_func():
    loop = asyncio.get_event_loop()
    while True:
        res = await loop.run_in_executor(None, requests.get, 'http://www.google.com')
        print(res)
        await asyncio.sleep(1)


loop = asyncio.get_event_loop()
loop.run_until_complete(my_func())
It will perform a GET request on the Google homepage every second, spawning a new thread to perform each request. You can convince yourself that it's actually non-blocking by running multiple requests virtually in parallel:
async def entrypoint():
    await asyncio.wait([
        get('https://www.google.com'),
        get('https://www.stackoverflow.com'),
    ])


async def get(url):
    loop = asyncio.get_event_loop()
    while True:
        res = await loop.run_in_executor(None, requests.get, url)
        print(url, res)
        await asyncio.sleep(1)


loop = asyncio.get_event_loop()
loop.run_until_complete(entrypoint())
Another thing to notice is that you're running requests in separate threads each time. It works, but it's sort of a hack. You should rather be using a real asynchronous HTTP client such as aiohttp.
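For reference, a minimal sketch of the same periodic GETs written with aiohttp instead of threads (not part of the original answer; assumes the aiohttp package is installed):

import asyncio
import aiohttp


async def get(session, url):
    while True:
        async with session.get(url) as resp:
            body = await resp.text()
            print(url, resp.status, len(body))
        await asyncio.sleep(1)


async def entrypoint():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            get(session, 'https://www.google.com'),
            get(session, 'https://www.stackoverflow.com'),
        )


asyncio.run(entrypoint())

Here the requests are awaited natively on the event loop, so no thread pool is involved.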
This is Python 3.10
asyncio is single-threaded execution; await yields the CPU to other functions until whatever is awaited is done.
import asyncio


async def my_func(t):
    print("Start my_func")
    await asyncio.sleep(t)  # The await yields the cpu while we wait
    print("Exit my_func")


async def main():
    # Schedules my_func on the event loop; we might want to save the returned
    # future to later check for completion.
    asyncio.ensure_future(my_func(10))
    print("Start main")
    await asyncio.sleep(1)  # The await yields the cpu, giving my_func a chance to start
    print("running other stuff")
    await asyncio.sleep(15)
    print("Exit main")


if __name__ == "__main__":
    asyncio.run(main())  # Starts the event loop

Create generator that yields coroutine results as the coroutines finish

Currently, I have an inefficient synchronous generator that makes many HTTP requests in sequence and yields the results. I'd like to use asyncio and aiohttp to parallelise the requests and thereby speed up this generator, but I want to keep it as an ordinary generator (not a PEP 525 async generator) so that the non-async code that calls it doesn't need to be modified. How can I create such a generator?
asyncio.as_completed() takes an iterable of coroutines or futures and returns an iterable of futures in the order that the input futures complete. Normally, you'd loop over its result and await the members from inside an async function...
import asyncio


async def first():
    await asyncio.sleep(5)
    return 'first'


async def second():
    await asyncio.sleep(1)
    return 'second'


async def third():
    await asyncio.sleep(3)
    return 'third'


async def main():
    for future in asyncio.as_completed([first(), second(), third()]):
        print(await future)


# Prints 'second', then 'third', then 'first'
asyncio.run(main())
... but for the purpose of this question, what we want is to be able to yield these results from an ordinary generator, so that normal synchronous code can consume them without ever knowing that async functions are being used under the hood. We can do that by calling loop.run_until_complete() on the futures yielded by our as_completed call...
import asyncio


async def first():
    await asyncio.sleep(5)
    return 'first'


async def second():
    await asyncio.sleep(1)
    return 'second'


async def third():
    await asyncio.sleep(3)
    return 'third'


def ordinary_generator():
    loop = asyncio.get_event_loop()
    for future in asyncio.as_completed([first(), second(), third()]):
        yield loop.run_until_complete(future)


# Prints 'second', then 'third', then 'first'
for element in ordinary_generator():
    print(element)
In this way, we've exposed our async code to non-async-land in a manner that doesn't require callers to define any functions as async, or to even know that ordinary_generator is using asyncio under the hood.
As an alternative implementation of ordinary_generator() that offers more flexibility in some circumstances, we can repeatedly call asyncio.wait() with the FIRST_COMPLETED flag instead of looping over as_completed():
import concurrent.futures


def ordinary_generator():
    loop = asyncio.get_event_loop()
    pending = [first(), second(), third()]
    while pending:
        done, pending = loop.run_until_complete(
            asyncio.wait(
                pending,
                return_when=concurrent.futures.FIRST_COMPLETED
            )
        )
        for job in done:
            yield job.result()
This approach, maintaining a list of pending jobs, has the advantage that we can adapt it to add jobs to the pending list on the fly. This is useful in use cases where our async jobs can add an unpredictable number of further jobs to the queue - like a web spider that follows all links on each page that it visits.
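As a rough sketch (not from the original answer) of what adding jobs on the fly could look like: each finished job may contribute follow-up coroutines before the next asyncio.wait call. The helper make_followup_jobs is hypothetical; a spider would return coroutines for newly discovered links:

import asyncio
import concurrent.futures


def spider_generator(seed_jobs, make_followup_jobs):
    loop = asyncio.get_event_loop()
    # Wrap coroutines in tasks up front; recent Python versions no longer
    # accept bare coroutines in asyncio.wait().
    pending = {loop.create_task(job) for job in seed_jobs}
    while pending:
        done, pending = loop.run_until_complete(
            asyncio.wait(pending, return_when=concurrent.futures.FIRST_COMPLETED)
        )
        for job in done:
            result = job.result()
            # Add follow-up work (e.g. newly found links) to the pending set.
            pending |= {loop.create_task(j) for j in make_followup_jobs(result)}
            yield result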
One caveat: the approaches above assume we're calling the synchronous code from the main thread, in which case get_event_loop is guaranteed to give us a loop and we've got no need to .close it. If we want ordinary_generator to be usable from a non-main thread, especially one that may have previously had an event loop created, then life gets harder, because we can't rely on get_event_loop (it raises a RuntimeError on any non-main thread that doesn't have an event loop yet). In that case the simplest thing I can think to do is to spin off a new thread to run our asyncio code, and communicate with it via a queue:
from queue import Queue
from threading import Thread


def ordinary_generator():
    sentinel = object()
    queue = Queue()

    def thread_entry_point():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        for future in asyncio.as_completed([first(), second(), third()]):
            try:
                queue.put(loop.run_until_complete(future))
            except Exception as e:
                queue.put((sentinel, e))
                break
        loop.close()
        queue.put(sentinel)

    Thread(target=thread_entry_point).start()

    while True:
        val = queue.get()
        if val is sentinel:
            return
        if isinstance(val, tuple) and len(val) == 2 and val[0] is sentinel:
            raise val[1]
        yield val
(Combining the use of run_until_complete from the penultimate example with the use of an extra thread in the final example is left as an exercise for any reader who needs to do so.)
Mark's answer is great but I'd like to contribute a different implementation that doesn't rely on the low level event loop methods.
The key difference is that rather than doing a yield, instead provide a callback that can be used to process the results:
import asyncio
import random


async def do_stuff():
    proc_time = round(random.random(), 2)
    print('START: ', proc_time)
    await asyncio.sleep(proc_time)
    return proc_time


def concurrent_stuff(awaitables, callback):
    # Must be async to wait
    async def _as_completed():
        for coro in asyncio.as_completed(awaitables):
            result = await coro
            callback(result)  # Send result to callback.

    # Perform the async calls inside a regular method
    asyncio.run(_as_completed())


def when_done(result):
    print('FINISHED: ', result)


def main():
    awaitables = [do_stuff() for _ in range(5)]
    concurrent_stuff(awaitables, when_done)


main()

# START: 0.56
# START: 0.98
# START: 0.39
# START: 0.23
# START: 0.94
# FINISHED: 0.23
# FINISHED: 0.39
# FINISHED: 0.56
# FINISHED: 0.94
# FINISHED: 0.98
