outer async context manager finalized before inner async generator - python

Given the following minimal example:
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def async_context():
    try:
        yield
    finally:
        await asyncio.sleep(1)
        print('finalize context')

async def async_gen():
    try:
        yield
    finally:
        await asyncio.sleep(2)
        # will never be called if the delay is larger than in async_context
        print('finalize gen')

async def main():
    async with async_context():
        async for _ in async_gen():
            break

if __name__ == "__main__":
    asyncio.run(main())
I'm breaking out of the iteration over the async generator, and I want its finally block to complete before my async context manager's finally block runs. In this example "finalize gen" is never printed because the program exits before that happens.
Note that I intentionally chose a delay of 2 in the generator's finally block so the context manager's finally block has a chance to run first. If I choose 1 for both delays, both messages are printed.
Is this some kind of race condition? I expected all finally blocks to complete before the program finishes.
How can I prevent the context manager's finally block from running before the generator's finally block has completed?
For context:
I use Playwright to control a Chromium browser. The outer context manager provides a page that it closes in its finally block.
I'm using Python 3.9.0.
Try this example: https://repl.it/#trixn86/AsyncGeneratorRaceCondition

The async context manager doesn't know anything about the asynchronous generator. Nothing in main knows about the asynchronous generator after you break, in fact. You've given yourself no way to wait for the generator's finalization.
If you want to wait for the generator to close, you need to handle closure explicitly:
async def main():
    async with async_context():
        gen = async_gen()
        try:
            async for _ in gen:
                break
        finally:
            await gen.aclose()
In Python 3.10, you'll be able to use contextlib.aclosing instead of the try/finally:
async def main():
    async with async_context():
        gen = async_gen()
        async with contextlib.aclosing(gen):
            async for _ in gen:
                break
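For completeness, here is a runnable sketch of the try/finally fix on Python 3.9, assuming the async_context and async_gen definitions from the question; with the explicit aclose(), "finalize gen" is printed before "finalize context":

import asyncio

async def main():
    async with async_context():
        gen = async_gen()
        try:
            async for _ in gen:
                break
        finally:
            await gen.aclose()  # finalize the generator before the context manager exits

if __name__ == "__main__":
    asyncio.run(main())  # prints 'finalize gen', then 'finalize context'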

Related

Why does 'await' break from the local function when called from main()?

I am new to asynchronous programming, and while I understand most concepts, there is one relating to the inner workings of 'await' that I don't quite understand.
Consider the following:
import asyncio

async def foo():
    print('start fetching')
    await asyncio.sleep(2)
    print('done fetching')

async def main():
    task1 = asyncio.create_task(foo())

asyncio.run(main())
Output: start fetching
vs.
async def foo():
    print('start fetching')
    print('done fetching')

async def main():
    task1 = asyncio.create_task(foo())

asyncio.run(main())
Output: start fetching followed by done fetching
Perhaps it is my understanding of await that is lacking. As far as I understand, we can use it to pause (for 2 seconds in the case above) or to wait for a function to fully finish running before any further code is run.
But in the first example above, why does await cause 'done fetching' not to run?
asyncio.create_task schedules an awaitable on the event loop and returns immediately, so you are actually exiting the main function (and closing the event loop) before the task is able to finish.
You need to change main to either:
async def main():
    task1 = asyncio.create_task(foo())
    await task1
or
async def main():
    await foo()
Creating a task first (the former) is useful in many cases, but they all involve situations where the event loop will outlast the task, e.g. a long-running server. Otherwise you should just await the coroutine directly, like the latter.
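For reference, a minimal runnable version of the corrected first example, reusing the foo() from the question:

import asyncio

async def foo():
    print('start fetching')
    await asyncio.sleep(2)
    print('done fetching')

async def main():
    task1 = asyncio.create_task(foo())  # schedules foo() on the event loop
    await task1                         # keeps main alive until the task finishes

asyncio.run(main())  # prints 'start fetching', then (after 2 seconds) 'done fetching'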

How to forcefully close an async generator?

Let's say I have an async generator like this:
async def event_publisher(connection, queue):
    while True:
        if not await connection.is_disconnected():
            event = await queue.get()
            yield event
        else:
            return
I consume it like this:
published_events = event_publisher(connection, queue)

async for event in published_events:
    # do event processing here
It works just fine; however, when the connection is disconnected and no new event is published, the async for just waits forever. So ideally I would like to close the generator forcefully, like this:
if connection.is_disconnected():
    await published_events.aclose()
But I get the following error:
RuntimeError: aclose(): asynchronous generator is already running
Is there a way to stop processing of an already running generator?
It seems to be related to this issue. Notably:
As shown in https://gist.github.com/1st1/d9860cbf6fe2e5d243e695809aea674c, it's an error to close a synchronous generator while it is being iterated.
...
In 3.8, calling "aclose()" can crash with a RuntimeError. It's no longer possible to reliably cancel a running asynchronous generator.
Well, since we can't close a running asynchronous generator, let's cancel the iteration that keeps it running instead:
import asyncio
from contextlib import suppress

async def cancel_gen(agen):
    task = asyncio.create_task(agen.__anext__())
    task.cancel()
    with suppress(asyncio.CancelledError):
        await task
    await agen.aclose()  # probably a good idea,
                         # but if you'll be getting errors, try to comment this line

...

if connection.is_disconnected():
    await cancel_gen(published_events)
I can't test whether it'll work, since you didn't provide a reproducible example.
You can use a timeout on the queue so that is_disconnected() is polled regularly when there is no item to get:
async def event_publisher(connection, queue):
    while True:
        if not await connection.is_disconnected():
            try:
                event = await asyncio.wait_for(queue.get(), timeout=10.0)
            except asyncio.TimeoutError:
                continue
            yield event
        else:
            return
Alternatively, it is possible to use Queue.get_nowait().
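For example, a minimal sketch of the get_nowait() variant; the one-second polling interval here is an assumption, not from the question:

async def event_publisher(connection, queue):
    while not await connection.is_disconnected():
        try:
            event = queue.get_nowait()  # raises asyncio.QueueEmpty if nothing is queued
        except asyncio.QueueEmpty:
            await asyncio.sleep(1)  # back off, then re-check the connection
            continue
        yield event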

How to use `async for` in Python?

I mean, what do I get from using async for? Here is the code I wrote with async for; AIter(10) could be replaced with get_range().
But the code runs synchronously, not asynchronously.
import asyncio

async def get_range():
    for i in range(10):
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        yield i

class AIter:
    def __init__(self, N):
        self.i = 0
        self.N = N

    def __aiter__(self):
        return self

    async def __anext__(self):
        i = self.i
        print(f"start {i}")
        await asyncio.sleep(1)
        print(f"end {i}")
        if i >= self.N:
            raise StopAsyncIteration
        self.i += 1
        return i

async def main():
    async for p in AIter(10):
        print(f"finally {p}")

if __name__ == "__main__":
    asyncio.run(main())
The result I expected is:
start 1
start 2
start 3
...
end 1
end 2
...
finally 1
finally 2
...
However, the real result is:
start 0
end 0
finally 0
start 1
end 1
finally 1
start 2
end 2
I know I could get the expected result by using asyncio.gather or asyncio.wait.
But it is hard for me to understand what I gain by using async for here instead of a simple for.
What is the right way to use async for if I want to loop over several Future objects and use each one as soon as it is finished? For example:
async for f in feature_objects:
    data = await f
    with open("file", "w") as fi:
        fi.write(data)
But it is hard for me to understand what I gain by using async for here instead of a simple for.
The underlying misunderstanding is expecting async for to automatically parallelize the iteration. It doesn't do that, it simply allows sequential iteration over an async source. For example, you can use async for to iterate over lines coming from a TCP stream, messages from a websocket, or database records from an async DB driver.
None of the above would work with an ordinary for, at least not without blocking the event loop. This is because for calls __next__ as a blocking function and doesn't await its result. You cannot manually await elements obtained by for because for expects __next__ to signal the end of iteration by raising StopIteration. If __next__ is a coroutine, the StopIteration exception won't be visible before awaiting it. This is why async for was introduced, not just in Python, but also in other languages with async/await and generalized for.
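To illustrate, here is a minimal sketch (with a hypothetical ticker() generator, not from the question) of what async for actually does: it awaits each item in turn, so the iteration itself stays sequential while the event loop remains free to run other tasks:

import asyncio

async def ticker(n, delay):
    # a hypothetical async source: yields n values, awaiting between them
    for i in range(n):
        await asyncio.sleep(delay)  # the event loop can run other tasks here
        yield i

async def main():
    async for value in ticker(3, 1.0):  # items arrive one at a time, in order
        print(value)

asyncio.run(main())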
If you want to run the loop iterations in parallel, you need to start them as parallel coroutines and use asyncio.as_completed or equivalent to retrieve their results as they come:
async def x(i):
    print(f"start {i}")
    await asyncio.sleep(1)
    print(f"end {i}")
    return i

# run x(0)..x(9) concurrently and process results as they arrive
for f in asyncio.as_completed([x(i) for i in range(10)]):
    result = await f
    # ... do something with the result ...
If you don't care about reacting to results immediately as they arrive, but you need them all, you can make it even simpler by using asyncio.gather:
# run x(0)..x(9) concurrently and process results when all are done
results = await asyncio.gather(*[x(i) for i in range(10)])
(Adding on to the accepted answer - for Charlie's bounty.)
Assuming you want to consume each yielded value concurrently, a straightforward way would be:
import asyncio

async def process_all():
    tasks = []
    async for obj in my_async_generator:
        # Python 3.7+. Use ensure_future for older versions.
        task = asyncio.create_task(process_obj(obj))
        tasks.append(task)
    await asyncio.gather(*tasks)

async def process_obj(obj):
    ...
Explanation:
Consider the following code, without create_task:
async def process_all():
    async for obj in my_async_generator:
        await process_obj(obj)
This is roughly equivalent to:
async def process_all():
    obj1 = await my_async_generator.__anext__()
    await process_obj(obj1)

    obj2 = await my_async_generator.__anext__()
    await process_obj(obj2)

    ...
Basically, the loop cannot continue because its body is blocking. The way to go is to delegate the processing of each iteration to a new asyncio task, which starts without blocking the loop. Then, gather waits for all of the tasks - which means waiting for every iteration to be processed.
Code based on the fantastic answer from @matan129, just missing the async generator to make it runnable; once I have that (or if someone wants to contribute it) I will finalize this:
import time
import asyncio

async def my_async_generator():
    # hypothetical placeholder generator so the example runs; not part of the original answer
    for i in range(3):
        await asyncio.sleep(1)
        yield i

async def process_all():
    """
    Example where the async for loop lets us process many things concurrently, without blocking on each
    individual iteration, while still blocking (waiting) for all tasks to finish at the end.

    ref:
        - https://stackoverflow.com/questions/56161595/how-to-use-async-for-in-python/72758067#72758067
    """
    tasks = []
    async for obj in my_async_generator():
        # Python 3.7+. Use ensure_future for older versions.
        task = asyncio.create_task(process_obj(obj))  # concurrently dispatches a coroutine to be executed.
        tasks.append(task)
    await asyncio.gather(*tasks)

async def process_obj(obj):
    await asyncio.sleep(5)  # expensive IO

if __name__ == '__main__':
    # - test asyncio
    s = time.perf_counter()
    asyncio.run(process_all())
    # - print stats
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")
    print('Success, done!\a')

Shutdown infinite async generator

Reproducible error
I tried to reproduce the error in an online REPL here. However, it is not exactly the same implementation (and hence behavior) as my real code (where I do async for response in position_stream(), instead of for position in count() in the REPL).
More details on my actual implementation
I define somewhere a coroutine like so:
async def position(self):
    request = telemetry_pb2.SubscribePositionRequest()
    position_stream = self._stub.SubscribePosition(request)
    try:
        async for response in position_stream:
            yield Position.translate_from_rpc(response)
    finally:
        position_stream.cancel()
where position_stream is infinite (or possibly very long lasting). I use it from an example code like this:
async def print_altitude():
    async for position in drone.telemetry.position():
        print(f"Altitude: {position.relative_altitude_m}")
and print_altitude() is run on the loop with:
asyncio.ensure_future(print_altitude())
asyncio.get_event_loop().run_forever()
That works well. Now, at some point, I'd like to close the stream from the caller. I thought I could just run asyncio.ensure_future(loop.shutdown_asyncgens()) and wait for my finally clause above to get called, but it doesn't happen.
Instead, I receive a warning on an unretrieved exception:
Task exception was never retrieved
future: <Task finished coro=<print_altitude() done, defined at [...]
Why is that, and how can I make it such that all my async generators actually get closed (and run their finally clause)?
First of all, if you stop a loop, none of your coroutines will have a chance to shut down properly. Calling close basically means irreversibly destroying the loop.
If you do not care what happens to those running tasks, you can simply cancel them all; this will stop asynchronous generators as well:
import asyncio
from contextlib import suppress

async def position_stream():
    while True:
        await asyncio.sleep(1)
        yield 0

async def print_position():
    async for position in position_stream():
        print(f'position: {position}')

async def cleanup_awaiter():
    await asyncio.sleep(3)
    print('cleanup!')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        asyncio.ensure_future(print_position())
        asyncio.ensure_future(print_position())
        loop.run_until_complete(cleanup_awaiter())

        # get all running tasks (asyncio.Task.all_tasks() was removed in Python 3.9):
        tasks = asyncio.gather(*asyncio.all_tasks(loop))
        # schedule throwing CancelledError into them:
        tasks.cancel()
        # allow them to process the exception and be cancelled:
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(tasks)
    finally:
        print('closing loop')
        loop.close()
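For what it's worth, a sketch of the same idea on current Python using asyncio.run(), which also finalizes any remaining async generators when it shuts down (reusing the names from the example above):

import asyncio
from contextlib import suppress

async def position_stream():
    while True:
        await asyncio.sleep(1)
        yield 0

async def print_position():
    async for position in position_stream():
        print(f'position: {position}')

async def main():
    task = asyncio.create_task(print_position())
    await asyncio.sleep(3)
    print('cleanup!')
    task.cancel()  # throws CancelledError into the task, which also finalizes the generator
    with suppress(asyncio.CancelledError):
        await task

asyncio.run(main())  # asyncio.run() shuts down remaining async generators on exit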

How to make request without blocking (using asyncio)?

I would like to achieve the following using asyncio:
# Each iteration of this loop MUST last only 1 second
while True:
    # Make an async request
    sleep(1)
However, the only examples I've seen use some variation of
async def my_func():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, requests.get, 'http://www.google.com')

loop = asyncio.get_event_loop()
loop.run_until_complete(my_func())
But run_until_complete is blocking! Using run_until_complete in each iteration of my while loop would cause the loop to block.
I've spent the last couple of hours trying to figure out how to correctly run a non-blocking task (defined with async def) without success. I must be missing something obvious, because something as simple as this should surely be simple. How can I achieve what I have described?
run_until_complete runs the main event loop. It's not "blocking" so to speak; it just runs the event loop until the coroutine you passed as a parameter returns. It has to hang because otherwise the program would either stop or be blocked by the next instructions.
It's pretty hard to tell what you are trying to achieve, but this piece of code actually does something:
async def my_func():
    loop = asyncio.get_event_loop()
    while True:
        res = await loop.run_in_executor(None, requests.get, 'http://www.google.com')
        print(res)
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
loop.run_until_complete(my_func())
It will perform a GET request on the Google homepage every second, spawning a new thread to perform each request. You can convince yourself that it's actually non-blocking by running multiple requests virtually in parallel:
async def entrypoint():
    await asyncio.wait([
        get('https://www.google.com'),
        get('https://www.stackoverflow.com'),
    ])

async def get(url):
    loop = asyncio.get_event_loop()
    while True:
        res = await loop.run_in_executor(None, requests.get, url)
        print(url, res)
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
loop.run_until_complete(entrypoint())
Another thing to notice is that you're running requests in separate threads each time. It works, but it's sort of a hack. You should instead use a real asynchronous HTTP client such as aiohttp.
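For instance, a minimal sketch of the same periodic fetch with aiohttp (assuming the library is installed):

import asyncio
import aiohttp

async def get(session, url):
    while True:
        async with session.get(url) as resp:  # non-blocking HTTP request
            body = await resp.text()
            print(url, resp.status, len(body))
        await asyncio.sleep(1)

async def entrypoint():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            get(session, 'https://www.google.com'),
            get(session, 'https://www.stackoverflow.com'),
        )

asyncio.run(entrypoint())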
This is Python 3.10
asyncio is single-threaded execution, using await to yield the CPU to other coroutines until whatever is awaited is done.
import asyncio

async def my_func(t):
    print("Start my_func")
    await asyncio.sleep(t)  # The await yields the CPU while we wait
    print("Exit my_func")

async def main():
    # Schedules my_func on the event loop; we might want to save the returned future to later check for completion.
    asyncio.ensure_future(my_func(10))
    print("Start main")
    await asyncio.sleep(1)  # The await yields the CPU, giving my_func a chance to start.
    print("running other stuff")
    await asyncio.sleep(15)
    print("Exit main")

if __name__ == "__main__":
    asyncio.run(main())  # Starts the event loop
