Is there a way to manually switch on asyncio event loop - python

I want to use the event loop to monitor any inserting data into my asyncio.Queue(you can find its source code here https://github.com/python/cpython/blob/3.6/Lib/asyncio/queues.py), but I run into some problems. Here is the following code:
import asyncio
import threading
async def recv(q):
while True:
msg = await q.get()
print(msg)
async def checking_task():
while True:
await asyncio.sleep(0.1)
def loop_in_thread(loop,q):
asyncio.set_event_loop(loop)
asyncio.ensure_future(recv(q))
asyncio.ensure_future(insert(q))
# asyncio.ensure_future(checking_task()) comment this out, and it will work as intended
loop.run_forever()
async def insert(q):
print('invoked')
await q.put('hello')
q = asyncio.Queue()
loop = asyncio.get_event_loop()
t = threading.Thread(target=loop_in_thread, args=(loop, q,))
t.start()
The program has started and we can see the following result
invoked
hello
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x000001E215DCFAC8>()]>>}
But now if we manually add data into q by using q.put_nowait('test'), we would get the following result:
q.put_nowait('test') # a non-async way to add data into queue
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future finished result=None>>}
As you can see, the future is already finished, yet we still haven't print out the newly added string 'test'. In other words, msg = await q.get() is still waiting even though the Future related to q.get() is done and there are no other tasks running. This confuses me because in the official documentation(https://docs.python.org/3/library/asyncio-task.html), it says
result = await future or result = yield from future – suspends the coroutine until the future is done, then returns the future’s result
It seemed that even though the Future is done, we still need some sort of await in other async function to make the event loop keep processing tasks.
I found a workaround to this problem, which is adding a checking_task(), and also add that coroutine into the event loop; then it will work as intended.
But adding a checking_task() coroutine is very costly for CPU since it just runs a while loop. I am wondering if there is some manual way for us to trigger that await event without using a async function. For example, something magical like
q.put_nowait('test')
loop.ok_you_can_start_running_other_pending_tasks()
Helps will be greatly appreciated! Thanks.

So I ended up with using
loop.call_soon_threadsafe(q.put_nowait, 'test')
and it will work as intended. After figure this out, I searched some information about . It turned out this post (Scheduling an asyncio coroutine from another thread) has the same problem. And #kfx's answer would also work, which is
loop.call_soon_threadsafe(loop.create_task, q.put('test'))
Notice asyncio.Queue.put() is a coroutine but asyncio.Queue.put_nowait() is a normal function.

Related

Python asyncio cancel unawaited coroutines

So given a bit of a complex setup, which is used to generate a list of queries to be run semi-parallel (using a semaphore to not run too many queries at the same time, to not DDoS the server).
i have an (in itself async) function that creates a number of queries:
async def run_query(self, url):
async with self.semaphore:
return await some_http_lib(url)
async def create_queries(self, base_url):
# ...gathering logic is ofc a bit more complex in the real setting
urls = await some_http_lib(base_url).json()
coros = [self.run_query(url) for url in urls] # note: not executed just yet
return coros
async def execute_queries(self):
queries = await self.create_queries('/seom/url')
_logger.info(f'prepared {len(queries)} queries')
results = []
done = 0
# note: ofc, in this simple example call these would not actually be asynchronously executed.
# in the real case i'm using asyncio.gather, this just makes for a slightly better
# understandable example.
for query in queries:
# at this point, the request is actually triggered
result = await query
# ...some postprocessing
if not result['success']:
raise QueryException(result['message']) # ...internal exception
done += 1
_logger.info(f'{done} of {len(queries)} queries done')
results.append(result)
return results
Now this works very nicely, executing exactly as i planned and i can handle an exception in one of the queries by aborting the whole operation.
async def run():
try:
return await QueryRunner.execute_queries()
except QueryException:
_logger.error('something went horribly wrong')
return None
The only problem is that the program is terminated, but leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such not awaited.
Is there any way to cancel these unawaited coroutines? Would it be otherwise possible to supress this warning?
[Edit] a bit more context as of how the queries are executed outside this simple example:
the queries are usually grouped together, so there is multiple calls to create_queries() with different parameters. then all collected groups are looped and the queries are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are canceled aswell, which results in the error being thrown.
So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
you can call asyncio.create_task(c).cancel()
you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have, for example if you called asyncio.gather(*coros), one of them raised and you want to cancel the rest, you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
results = await asyncio.gather(*tasks)
finally:
# if there are unfinished tasks, that is because one of them
# raised - cancel the rest
for t in tasks:
if not t.done():
t.cancel()
Use
pending = asyncio.tasks.all_tasks() # < 3.7
or
pending = asyncio.all_tasks() # >= 3.7 (not sure)
to get the list of pending tasks. You can wait for them with
await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)
or cancel them:
for task in pending:
task.cancel()

How to convert this function to async generator

I am trying to build an async generator, but I couldn't find any resources or figure out how to do it.
I am still getting the same error
TypeError: 'async for' requires an object with aiter method, got
coroutine
I read a not understandable for pep from 2016 and I am really confused.
Basically what I am trying to do is to schedule multiple coroutines and when one of them finishes I yield a value so I can process it immediately after the result comes without waiting for every result.
But I couldn't figure out this so I decided I will assume that coroutines will finish in the order they were created and I still have a lot of problems
I am looking for some solution for either yielding coroutines result in one after another or reacting to the first finished coroutine
Thanks in advance for any tips, resources, examples, and solutions :)
async def get_from_few_pages(
self,
max_pages: int = 0,
):
pages = [
asyncio.ensure_future(
asyncio.get_running_loop().create_task(self.extract_data_from_page(i))
)
for i in range(max_pages)
]
for coroutine in pages:
await asyncio.gather(coroutine)
yield coroutine.result()
To create an async generator, you create an async def with a yield inside, much like your code does. In fact, your code looks like something that should actually work, although imperfectly, so I don't know why you're getting the error you quote.
However, there are issues with your code:
it will always yield the coroutines in the order they are given, not in the order in which they complete - but they will run in parallel.
you don't need both ensure_future() and create_task(), create_task() is sufficient
you don't asyncio.gather() to await a single thing, it's for when you have more than one thing to await in parallel
To get a generator that yields awaitables as they complete, you can use asyncio.wait(return_when=FIRST_COMPLETED), like this:
async def get_from_few_pages(self, max_pages):
pending = [asyncio.create_task(self.extract_data_from_page(i))
for i in range(max_pages)]
while pending:
done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
for fut in done:
yield fut.result()

Is it possible to detect when all async tasks are suspended?

I'm trying to test an async code, but I'm having trouble because of the complex connection between some tasks.
The context I need this is some code which reads a file in parallel to it being written by another process. There's some logic in the code where reading a truncated record will make it back off and wait() on an asyncio.Condition to be later released by an inotify event. This code should let it recover by re-reading the record when a future write has been completed by another process. I specifically want to test that this recovery works.
So my plan would be:
write a partial file
run the event loop until it suspends on the condition
write the rest of the file
run the event loop to completion
I had thought this was the anser: Detect an idle asyncio event loop
However a trial test shows that it exits too soon:
import asyncio
import random
def test_ping_pong():
async def ping_pong(idx: int, oth_idx: int):
for i in range(random.randint(100, 1000)):
counters[idx] += 1
async with conditions[oth_idx]:
conditions[oth_idx].notify()
async with conditions[idx]:
await conditions[idx].wait()
async def detect_iowait():
loop = asyncio.get_event_loop()
rsock, wsock = socket.socketpair()
wsock.close()
try:
await loop.sock_recv(rsock, 1)
finally:
rsock.close()
conditions = [asyncio.Condition(), asyncio.Condition()]
counters = [0, 0]
loop = asyncio.get_event_loop()
loop.create_task(ping_pong(0, 1))
loop.create_task(ping_pong(1, 0))
loop.run_until_complete(loop.create_task(detect_iowait()))
assert counters[0] > 10
assert counters[1] > 10
After digging through the source code for python's event loops, I've found nothing exposed that can do this publicly.
It is however possible to use the _ready deque created by the BaseEventLoop. See here. This contains every task that is immediately ready to run. When a task is run it is popped from the _ready deque. When a suspended task is released by another task (eg by calling future.set_result()) the suspended task is immediately added back to the deque. This has existed since python 3.5.
One thing that you can do is repeatedly inject a callback to check how many items in _ready. When all other tasks are suspended, there will be nothing left in the dqueue at the moment the callback runs.
The callback will run at most once per iteration of the event loop:
async def wait_for_deadlock(empty_loop_threshold: int = 0):
def check_for_deadlock():
nonlocal empty_loop_count
# pylint: disable=protected-access
if loop._ready:
empty_loop_count = 0
loop.call_soon(check_for_deadlock)
elif empty_loop_count < empty_loop_threshold:
empty_loop_count += 1
loop.call_soon(check_for_deadlock)
else:
future.set_result(None)
empty_loop_count = 0
loop = asyncio.get_running_loop()
future = loop.create_future()
asyncio.get_running_loop().call_soon(check_for_deadlock)
await future
In the above code the empty_loop_threshold is not really necessary in most cases but exists for cases where tasks communicate with IO. For example if one task communicates to another through IO, there may be a moment where all tasks are suspended even through one has data ready to read. Setting empty_loop_threshold = 1 should get round this.
Using this is relatively simple. You can:
loop.run_until_complete(wait_for_deadlock())
Or as requested in my question:
def some_test():
async def async_test():
await wait_for_deadlock()
inject_something()
await wait_for_deadlock()
loop = loop.get_event_loop()
loop.create_task(task_to_test())
loop.run_until_complete(loop.create_task(async_test)
assert something

Tasks created with create_task that are never awaited, seem to break expectations of cancelation for child tasks

Imagine we're writing an application which allows a user to run an application (let's say it's a series of important operations against an API) continuously, and can run multiple applications concurrently. Requirements include:
the user can control the number of concurrent applications (which may limit concurrent load against an API, which is often important)
if the OS tries to close the Python program running this thing, it should gracefully terminate, allowing any in-progress applications to complete their run before closing
The question here is specifically about the task manager we've coded, so let's stub out some code that illustrates this problem:
import asyncio
import signal
async def work_chunk():
"""Simulates a chunk of work that can possibly fail"""
await asyncio.sleep(1)
async def protected_work():
"""All steps of this function MUST complete, the caller should shield it from cancelation."""
print("protected_work start")
for i in range(3):
await work_chunk()
print(f"protected_work working... {i+1} out of 3 steps complete")
print("protected_work done... ")
async def subtask():
print("subtask: starting loop of protected work...")
cancelled = False
while not cancelled:
protected_coro = asyncio.create_task(protected_work())
try:
await asyncio.shield(protected_coro)
except asyncio.CancelledError:
cancelled = True
await protected_coro
print("subtask: cancelation complete")
async def subtask_manager():
"""
Manage a pool of subtask workers.
(In the real world, the user can dynamically change the concurrency, but here we'll
hard code it at 3.)
"""
tasks = {}
while True:
for i in range(3):
task = tasks.get(i)
if not task or task.done():
tasks[i] = asyncio.create_task(subtask())
await asyncio.sleep(5)
def shutdown(signal, main_task):
"""Cleanup tasks tied to the service's shutdown."""
print(f"Received exit signal {signal.name}. Scheduling cancelation:")
main_task.cancel()
async def main():
print("main... start")
coro = asyncio.ensure_future(subtask_manager())
loop = asyncio.get_running_loop()
loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, coro))
loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, coro))
await coro
print("main... done")
def run():
asyncio.run(main())
run()
subtask_manager manages a pool of workers, periodically looking up what the present concurrency requirement is and updating the number of active workers appropriately (note that the code above cuts out most of that, and just hard codes a number, since it isn't important to the question).
subtask is the worker loop itself, which continuously runs protected_work() until someone cancels it.
But this code is broken. When you give it a SIGINT, the whole thing immediately crashes.
Before I explain further, let me point you at a critical bit of code:
1 protected_coro = asyncio.create_task(protected_work())
2 try:
3 await asyncio.shield(protected_coro)
4 except asyncio.CancelledError:
5 cancelled = True
6 await protected_coro # <-- This will raise CancelledError too!
After some debugging, we find that our try/except block isn't working. We find that both line 3 AND line 6 raise CancelledError.
When we dig in further, we find that ALL "await" calls throw CancelledError after the subtask manager is canceled, not just the line noted above. (i.e., the second line of work_chunk(), await asyncio.sleep(1), and the 4th line of protected_work(), await work_chunk(), also raise CancelledError.)
What's going on here?
It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
Why?
Clearly, I don't understand how cancelation propagation works in Python. I've struggled to find documentation on how it works. Can someone describe to me how cancelation is propagated in a clear-minded way that explains the behavior found in the example above?
After looking at this problem for a long time, and experimenting with other code snippets (where cancelation propagation works as expected), I started to wonder if the problem is Python doesn't know the order of propagation here, in this case.
But why?
Well, subtask_manager creates tasks, but doesn't await them.
Could it be that Python doesn't assume that the coroutine that created that task (with create_task) owns that task? I think Python uses the await keyword exclusively to know in what order to propagate cancelation, and if after traversing the whole tree of tasks it finds tasks that still haven't been canceled, it just destroys them all.
Therefore, it's up to us to manage Task cancelation propagation ourselves, in any place where we know we haven't awaited an async task. So, we need to refactor subtask_manager to catch its own cancelation, and explicitly cancel and then await all its child tasks:
async def subtask_manager():
"""
Manage a pool of subtask workers.
(In the real world, the user can dynamically change the concurrency, but here we'll
hard code it at 3.)
"""
tasks = {}
while True:
for i in range(3):
task = tasks.get(i)
if not task or task.done():
tasks[i] = asyncio.create_task(subtask())
try:
await asyncio.sleep(5)
except asyncio.CancelledError:
print("cancelation detected, canceling children")
[t.cancel() for t in tasks.values()]
await asyncio.gather(*[t for t in tasks.values()])
return
Now our code works as expected:
Note: I've answered my own question Q&A style, but I still feel unsatisfied with my textual answer about how cancelation propagation works. If anyone has a better explanation of how cancelation propagation works, I would love to read it.
What's going on here? It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
TL;DR Canceling everything is precisely what's happening, simply because the event loop is exiting.
To investigate this, I changed the invocation of add_signal_handler() to loop.call_later(.5, lambda: shutdown(signal.SIGINT, coro)). Python's Ctrl+C handling has odd corners, and I wanted to check whether the strange behavior is the result of that. But the bug was perfectly reproducible without signals, so it wasn't that.
And yet, asyncio cancellation really shouldn't work like your code shows. Canceling a task propagates to the future (or another task) it awaits, but shield is specifically implemented to circumvent that. It creates and returns a fresh future, and connects the result of the original (shielded) future to the new one in a way that cancel() doesn't know how to follow.
It took me some time to unearth what really happens, and that is:
await coro at the end of main awaits the task that gets cancelled, so it gets a CancelledError as soon as shutdown cancels it;
the exception causes main to exit and enters the cleanup sequence at the end of asyncio.run(). This cleanup sequence cancels all tasks, including the ones you've shielded.
You can test it by changing await coro at the end of main() to:
try:
await coro
finally:
print('main... done')
And you will see that "main... done" is printed prior to all the mysterious cancellations you've been witnessing.
So that clears the mystery and to fix the issue, you should postpone exiting main until everything is done. For example, you can create the tasks dict in main, pass it to subtask_manager(), and then await those critical tasks when the main task gets cancelled:
async def subtask_manager(tasks):
while True:
for i in range(3):
task = tasks.get(i)
if not task or task.done():
tasks[i] = asyncio.create_task(subtask())
try:
await asyncio.sleep(5)
except asyncio.CancelledError:
for t in tasks.values():
t.cancel()
raise
# ... shutdown unchanged
async def main():
print("main... start")
tasks = {}
main_task = asyncio.ensure_future(subtask_manager(tasks))
loop = asyncio.get_running_loop()
loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, main_task))
loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, main_task))
try:
await main_task
except asyncio.CancelledError:
await asyncio.gather(*tasks.values())
finally:
print("main... done")
Note that the main task must explicitly cancel its subtasks because that actually wouldn't happen automatically. Cancellation is propagated through a chain of awaits, and subtask_manager doesn't explicitly awaits its subtasks, it just spawns them and awaits something else, effectively shielding them.

When is the task at `create_task()` executed in asyncio?

In the following code:
import asyncio
async def task_func():
print('in task_func')
return 'the result'
async def main(loop):
print('creating task')
task = loop.create_task(task_func())
print('waiting for {!r}'.format(task))
await asyncio.sleep(2)
return_value = await task
print('task completed {!r}'.format(task))
print('return value: {!r}'.format(return_value))
event_loop = asyncio.new_event_loop()
try:
event_loop.run_until_complete(main(event_loop))
finally:
event_loop.close()
When I execute the code, the result is the following:
creating task
waiting for `<Task pending coro=<task_func() running at <ipython-input-29-797f29858344>:1>>`
in task_func
task completed `<Task finished coro=<task_func() done, defined at <ipython-input-29-797f29858344>:1> result='the result'>`
return value: 'the result'
But I don't understand when the code you set at loop.create_task(task_func()) is executed. Specifically, I assumed when you add a task to the event loop, it is executed soon, so I thought in task_func is printed before waiting for <Task....
Then I found it is always executed after the waiting for <Task..., so I added await asyncio.sleep(2), but only found that the in task_func is printed before the finish of 2 seconds.
I also added task_func_2() which is practically the same as task_func() and create its task below task = loop.create_task(task_func()) but do NOT add return_value_2 = await task2, so the await does not execute the task (otherwise the task_func_2() is never executed).
So now I got confuesed. When is the task is executed after it is added to the event loop in loop.create_task()?
Specifically, I assumed when you add a task to the event loop, it is executed soon, so I thought in task_func is printed before waiting for <Task....
"Executed soon" doesn't mean executed right away. Instead, you can think of it as "executed the first chance we get," we being the event loop. Since print immediately follows the call to create_task, at that point the event loop hasn't yet had a chance to run at all. To give event loop a chance to run, you must return to the event loop, either by returning from the current coroutine, or by awaiting something that blocks.
When you await a blocking coroutine such as asyncio.sleep(), the coroutine will temporarily suspend itself and relinquish control to the event loop. The event loop will look at what else there is to do before the sleep elapses and will find the tasks scheduled using create_task in its run queue. This is why task_func and task_func_2 are executed when the main coroutine awaits the sleep - but not before that, and regardless of whether you await them in particular or something else that blocks.
awaiting a coroutine such as task_func means requesting its result then and there, and being prepared to wait for it if the coroutine suspends. (Waiting on something that suspended automatically defers execution to the event loop, allowing other coroutines to make progress.) Although the implementation differs, an await is conceptually similar to joining a thread.

Categories

Resources