In the following code:
import asyncio

async def task_func():
    print('in task_func')
    return 'the result'

async def main(loop):
    print('creating task')
    task = loop.create_task(task_func())
    print('waiting for {!r}'.format(task))
    await asyncio.sleep(2)
    return_value = await task
    print('task completed {!r}'.format(task))
    print('return value: {!r}'.format(return_value))

event_loop = asyncio.new_event_loop()
try:
    event_loop.run_until_complete(main(event_loop))
finally:
    event_loop.close()
When I execute the code, the result is the following:
creating task
waiting for <Task pending coro=<task_func() running at <ipython-input-29-797f29858344>:1>>
in task_func
task completed <Task finished coro=<task_func() done, defined at <ipython-input-29-797f29858344>:1> result='the result'>
return value: 'the result'
But I don't understand when the code passed to loop.create_task(task_func()) is executed. Specifically, I assumed that when you add a task to the event loop it is executed soon, so I expected in task_func to be printed before waiting for <Task....
Then I found it is always executed after waiting for <Task..., so I added await asyncio.sleep(2), only to find that in task_func is printed before the 2 seconds have elapsed.
I also added a task_func_2(), practically identical to task_func(), and created its task right below task = loop.create_task(task_func()), but deliberately did NOT add return_value_2 = await task_2. It still ran, so the await is not what executes the task (otherwise task_func_2() would never be executed).
So now I am confused. When is a task executed after it is added to the event loop with loop.create_task()?
Specifically, I assumed when you add a task to the event loop, it is executed soon, so I thought in task_func is printed before waiting for <Task....
"Executed soon" doesn't mean executed right away. Instead, you can think of it as "executed the first chance we get," we being the event loop. Since print immediately follows the call to create_task, at that point the event loop hasn't yet had a chance to run at all. To give event loop a chance to run, you must return to the event loop, either by returning from the current coroutine, or by awaiting something that blocks.
When you await a blocking coroutine such as asyncio.sleep(), the coroutine will temporarily suspend itself and relinquish control to the event loop. The event loop will look at what else there is to do before the sleep elapses and will find the tasks scheduled using create_task in its run queue. This is why task_func and task_func_2 are executed when the main coroutine awaits the sleep - but not before that, and regardless of whether you await them in particular or something else that blocks.
awaiting a coroutine such as task_func means requesting its result then and there, and being prepared to wait for it if the coroutine suspends. (Waiting on something that suspended automatically defers execution to the event loop, allowing other coroutines to make progress.) Although the implementation differs, an await is conceptually similar to joining a thread.
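Here is a minimal sketch of that behavior (my own illustration, using the newer asyncio.run()/asyncio.create_task() API rather than the explicit loop from the question); even a zero-length sleep is enough to let the scheduled task run:

import asyncio

async def task_func():
    print('in task_func')
    return 'the result'

async def main():
    task = asyncio.create_task(task_func())  # scheduled, but not started yet
    print('task created')                    # printed before the task's first line
    await asyncio.sleep(0)                   # suspend; the loop runs pending tasks
    print('back in main')                    # 'in task_func' was printed in between
    return await task                        # already finished; returns immediately

asyncio.run(main())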
Related
Having read the documentation and watched a number of videos, I am testing asyncio as an alternative to threading.
The docs are here:
https://docs.python.org/3/library/asyncio.html
I have constructed the following code with the expectation that it would produce the following.
before the sleep
hello
world
But in fact it produces this (world comes before hello):
before the sleep
world
hello
Here is the code:
import asyncio
import time

def main():
    ''' main entry point for the program '''
    # create the event loop and add to the loop
    # or run directly.
    asyncio.run(main_async())
    return

async def main_async():
    ''' the main async function '''
    await foo()
    await bar()
    return

async def foo():
    print('before the sleep')
    await asyncio.sleep(2)
    # time.sleep(0)
    print('world')
    return

async def bar():
    print('hello')
    await asyncio.sleep(0)
    return

if __name__=='__main__':
    ''' This is executed when run from the command line '''
    main()
The main() function calls the async main_async() function, which in turn calls both the foo and bar async functions, and both of those run the await asyncio.sleep(x) command.
So my question is: why is the hello world coming in the wrong (unexpected) order, given that I was expecting world to be printed approximately 2 seconds after hello?
You awaited foo() immediately, so bar() was never scheduled until foo() had run to completion; main_async never proceeds past an await until that await has completed. If you want to schedule them both and let them interleave, replace:
await foo()
await bar()
with something like:
await asyncio.gather(foo(), bar())
which will convert both awaitables to tasks, scheduling both on the running asyncio event loop, then wait for both tasks to run to completion. With both scheduled at once, when one blocks on an await (and only await-based blocks, because only await yields control back to the event loop), the other will be allowed to run (and control can only return to the other task when the now running task awaits or finishes).
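For example, a minimal sketch of the corrected program (same foo and bar as in the question, with only main_async changed) would be:

import asyncio

async def foo():
    print('before the sleep')
    await asyncio.sleep(2)   # yields to the event loop; bar runs during the sleep
    print('world')

async def bar():
    print('hello')
    await asyncio.sleep(0)

async def main_async():
    # schedule foo and bar as tasks, let them interleave, and wait for both
    await asyncio.gather(foo(), bar())

asyncio.run(main_async())

This prints before the sleep, then hello, then world roughly 2 seconds later.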
Basically, you have to remember that asyncio is cooperative multitasking. If you're only executing one task, and that task performs an await, there is nothing else to schedule, so nothing else runs until that await completes. If you block by any means other than an await, you still hold the event loop, and nothing else will get a chance to run, even if it's ready to go. So to gain any benefit from asyncio you need to be careful to:
Ensure other tasks are launched in time to occupy the event loop while the original task(s) are blocking on await.
Ensure you only block via await, so you don't monopolize the event loop unnecessarily (see the sketch below).
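To see the second point in action, here is a small sketch (my own illustration): a task that blocks with time.sleep() holds the event loop, so the other task cannot run until it finishes, even though that task is ready to go.

import asyncio
import time

async def blocker():
    print('blocker: start')
    time.sleep(2)   # blocks the whole event loop; nothing else can run
    print('blocker: done')

async def other():
    print('other: I only run once the loop is free')

async def main():
    # both are scheduled, but 'other' prints only after the 2-second block
    await asyncio.gather(blocker(), other())

asyncio.run(main())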
In Python, there are three main types of awaitable objects: coroutines, Tasks, and Futures.
I can await a coroutine, and also a task.
Awaiting a coroutine
import asyncio

async def nested():
    return 42

async def main():
    print(await nested())  # will print "42".

asyncio.run(main())
Awaiting a task
import asyncio

async def nested():
    return 42

async def main():
    task = asyncio.create_task(nested())
    await task

asyncio.run(main())
What is the value of wrapping the coroutine in a task in the first place? It looks like they do the same thing.
When would I need to use a task vs a coroutine?
A coroutine is just a function that runs in the context of the current awaitable. It can yield execution to the event loop on behalf of its caller (the one who awaits it). Think of a function that is allowed to pause its thread. You can call one coroutine from another, but they still share the same thread.
A Task, on the other hand, immediately posts a separate job to the event loop. The task itself is a handle to that job. You may await a task, but it runs on its own just fine in "parallel": in a single-threaded context this means the task can run while other jobs are yielding (e.g. waiting for I/O). A task may complete even before you await it.
Example without tasks:
import asyncio

async def main():
    job_1 = asyncio.sleep(5)
    job_2 = asyncio.sleep(2)
    # will sleep for 5 seconds
    await job_1
    # will sleep for another 2 seconds
    await job_2

asyncio.run(main())  # ~7 seconds in total
Example with tasks:
import asyncio

async def main():
    job_1 = asyncio.sleep(5)
    job_2 = asyncio.create_task(asyncio.sleep(2))
    # will sleep for 5 seconds
    await job_1
    # by this time, job_2 is complete,
    # because the previous job has yielded at some point, allowing other jobs to run,
    # thus this await takes no time
    await job_2

asyncio.run(main())  # ~5 seconds in total
In this case there's no real difference: by awaiting the coroutine, it gets scheduled as part of the task it belongs to. However, that means it is driven by its parent.
By wrapping a coroutine in a task, it gets independently scheduled on the event loop, meaning it is not driven by the containing task anymore (it has its own lifecycle) and it can be interacted with more richly (e.g. cancelled or have callbacks added to it).
Think "function" versus "thread", really. A coroutine is just a function which can be suspended (if it awaits stuff), but it still only exists within the lexical and dynamic context of its caller. A task is freed from that context, it makes the wrapped coroutine live its own life in the same way a thread makes the wrapped function (target) live its own life.
Creating a Task schedules the passed coroutine to be run on an event loop. You can use the Task to cancel the underlying coroutine.
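To make that concrete, here is a sketch (my own, not from the answers above) of the extra control a Task handle gives you, namely cancellation and done-callbacks, which a bare coroutine does not offer:

import asyncio

async def nested():
    await asyncio.sleep(10)
    return 42

async def main():
    task = asyncio.create_task(nested())
    task.add_done_callback(lambda t: print('task done (or cancelled)'))
    await asyncio.sleep(0.1)  # let the task start running
    task.cancel()             # a bare coroutine offers no handle to cancel
    try:
        await task
    except asyncio.CancelledError:
        print('nested() was cancelled')

asyncio.run(main())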
Imagine we're writing a program which allows a user to run an application (let's say it's a series of important operations against an API) continuously, and can run multiple applications concurrently. Requirements include:
the user can control the number of concurrent applications (which may limit concurrent load against an API, which is often important)
if the OS tries to close the Python program running this thing, it should gracefully terminate, allowing any in-progress applications to complete their run before closing
The question here is specifically about the task manager we've coded, so let's stub out some code that illustrates this problem:
import asyncio
import signal

async def work_chunk():
    """Simulates a chunk of work that can possibly fail"""
    await asyncio.sleep(1)

async def protected_work():
    """All steps of this function MUST complete, the caller should shield it from cancelation."""
    print("protected_work start")
    for i in range(3):
        await work_chunk()
        print(f"protected_work working... {i+1} out of 3 steps complete")
    print("protected_work done... ")

async def subtask():
    print("subtask: starting loop of protected work...")
    cancelled = False
    while not cancelled:
        protected_coro = asyncio.create_task(protected_work())
        try:
            await asyncio.shield(protected_coro)
        except asyncio.CancelledError:
            cancelled = True
            await protected_coro
    print("subtask: cancelation complete")

async def subtask_manager():
    """
    Manage a pool of subtask workers.
    (In the real world, the user can dynamically change the concurrency, but here we'll
    hard code it at 3.)
    """
    tasks = {}
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        await asyncio.sleep(5)

def shutdown(signal, main_task):
    """Cleanup tasks tied to the service's shutdown."""
    print(f"Received exit signal {signal.name}. Scheduling cancelation:")
    main_task.cancel()

async def main():
    print("main... start")
    coro = asyncio.ensure_future(subtask_manager())
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, coro))
    loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, coro))
    await coro
    print("main... done")

def run():
    asyncio.run(main())

run()
subtask_manager manages a pool of workers, periodically looking up what the present concurrency requirement is and updating the number of active workers appropriately (note that the code above cuts out most of that, and just hard codes a number, since it isn't important to the question).
subtask is the worker loop itself, which continuously runs protected_work() until someone cancels it.
But this code is broken. When you give it a SIGINT, the whole thing immediately crashes.
Before I explain further, let me point you at a critical bit of code:
1 protected_coro = asyncio.create_task(protected_work())
2 try:
3     await asyncio.shield(protected_coro)
4 except asyncio.CancelledError:
5     cancelled = True
6     await protected_coro  # <-- This will raise CancelledError too!
After some debugging, we find that our try/except block isn't working. We find that both line 3 AND line 6 raise CancelledError.
When we dig in further, we find that ALL "await" calls throw CancelledError after the subtask manager is canceled, not just the line noted above. (i.e., the second line of work_chunk(), await asyncio.sleep(1), and the 4th line of protected_work(), await work_chunk(), also raise CancelledError.)
What's going on here?
It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
Why?
Clearly, I don't understand how cancelation propagation works in Python. I've struggled to find documentation on how it works. Can someone describe to me how cancelation is propagated in a clear-minded way that explains the behavior found in the example above?
After looking at this problem for a long time, and experimenting with other code snippets (where cancelation propagation works as expected), I started to wonder whether the problem is that Python doesn't know the order of propagation in this case.
But why?
Well, subtask_manager creates tasks, but doesn't await them.
Could it be that Python doesn't assume that the coroutine that created that task (with create_task) owns that task? I think Python uses the await keyword exclusively to know in what order to propagate cancelation, and if after traversing the whole tree of tasks it finds tasks that still haven't been canceled, it just destroys them all.
Therefore, it's up to us to manage Task cancelation propagation ourselves, in any place where we know we haven't awaited an async task. So, we need to refactor subtask_manager to catch its own cancelation, and explicitly cancel and then await all its child tasks:
async def subtask_manager():
    """
    Manage a pool of subtask workers.
    (In the real world, the user can dynamically change the concurrency, but here we'll
    hard code it at 3.)
    """
    tasks = {}
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        try:
            await asyncio.sleep(5)
        except asyncio.CancelledError:
            print("cancelation detected, canceling children")
            for t in tasks.values():
                t.cancel()
            await asyncio.gather(*tasks.values())
            return
Now our code works as expected.
Note: I've answered my own question Q&A style, but I still feel unsatisfied with my textual answer about how cancelation propagation works. If anyone has a better explanation of how cancelation propagation works, I would love to read it.
What's going on here? It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
TL;DR Canceling everything is precisely what's happening, simply because the event loop is exiting.
To investigate this, I changed the invocation of add_signal_handler() to loop.call_later(.5, lambda: shutdown(signal.SIGINT, coro)). Python's Ctrl+C handling has odd corners, and I wanted to check whether the strange behavior is the result of that. But the bug was perfectly reproducible without signals, so it wasn't that.
And yet, asyncio cancellation really shouldn't work like your code shows. Canceling a task propagates to the future (or another task) it awaits, but shield is specifically implemented to circumvent that. It creates and returns a fresh future, and connects the result of the original (shielded) future to the new one in a way that cancel() doesn't know how to follow.
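To see shield doing its job while the loop keeps running, consider this sketch (my own, condensed from the question's subtask pattern): the outer task is cancelled, but the shielded inner task completes and its result is still retrievable.

import asyncio

async def inner():
    await asyncio.sleep(0.2)
    return 'inner result'

async def outer():
    task = asyncio.create_task(inner())
    try:
        return await asyncio.shield(task)
    except asyncio.CancelledError:
        # outer itself was cancelled, but the shielded task keeps running
        return await task

async def main():
    t = asyncio.create_task(outer())
    await asyncio.sleep(0.05)  # let outer start and block on the shield
    t.cancel()                 # cancel outer; the shield absorbs it
    print(await t)             # prints 'inner result'

asyncio.run(main())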
It took me some time to unearth what really happens, and that is:
await coro at the end of main awaits the task that gets cancelled, so it gets a CancelledError as soon as shutdown cancels it;
the exception causes main to exit, which enters the cleanup sequence at the end of asyncio.run(). This cleanup sequence cancels all tasks, including the ones you've shielded.
You can test it by changing await coro at the end of main() to:
try:
    await coro
finally:
    print('main... done')
And you will see that "main... done" is printed prior to all the mysterious cancellations you've been witnessing.
So that clears up the mystery. To fix the issue, you should postpone exiting main until everything is done. For example, you can create the tasks dict in main, pass it to subtask_manager(), and then await those critical tasks when the main task gets cancelled:
async def subtask_manager(tasks):
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        try:
            await asyncio.sleep(5)
        except asyncio.CancelledError:
            for t in tasks.values():
                t.cancel()
            raise
# ... shutdown unchanged
async def main():
    print("main... start")
    tasks = {}
    main_task = asyncio.ensure_future(subtask_manager(tasks))
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, main_task))
    loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, main_task))
    try:
        await main_task
    except asyncio.CancelledError:
        await asyncio.gather(*tasks.values())
    finally:
        print("main... done")
Note that the main task must explicitly cancel its subtasks because that wouldn't happen automatically. Cancellation is propagated through a chain of awaits, and subtask_manager doesn't explicitly await its subtasks; it just spawns them and awaits something else, effectively shielding them.
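To make that propagation rule concrete, here is a minimal sketch (my own, not from the original thread): cancelling a task delivers CancelledError to the innermost await it is suspended in, and the exception travels back up the await chain; a coroutine merely spawned with create_task sits outside that chain and is untouched until the loop shuts down.

import asyncio

async def child():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        print('child: got CancelledError')
        raise  # re-raise so the cancellation keeps propagating up

async def parent():
    await child()  # parent is suspended inside child's sleep

async def main():
    t = asyncio.create_task(parent())
    await asyncio.sleep(0.1)  # let parent start and block inside child
    t.cancel()                # delivered to the sleep inside child
    try:
        await t
    except asyncio.CancelledError:
        print('parent task: got CancelledError')
    # had parent used create_task(child()) without awaiting it,
    # the cancellation would not have reached child here

asyncio.run(main())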
I want to use the event loop to monitor any data inserted into my asyncio.Queue (you can find its source code here: https://github.com/python/cpython/blob/3.6/Lib/asyncio/queues.py), but I ran into some problems. Here is the code:
import asyncio
import threading

async def recv(q):
    while True:
        msg = await q.get()
        print(msg)

async def checking_task():
    while True:
        await asyncio.sleep(0.1)

def loop_in_thread(loop, q):
    asyncio.set_event_loop(loop)
    asyncio.ensure_future(recv(q))
    asyncio.ensure_future(insert(q))
    # asyncio.ensure_future(checking_task())  # uncomment this and it will work as intended
    loop.run_forever()

async def insert(q):
    print('invoked')
    await q.put('hello')

q = asyncio.Queue()
loop = asyncio.get_event_loop()
t = threading.Thread(target=loop_in_thread, args=(loop, q,))
t.start()
The program starts and we see the following result:
invoked
hello
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x000001E215DCFAC8>()]>>}
But now if we manually add data to q using q.put_nowait('test'), we get the following result:
q.put_nowait('test') # a non-async way to add data into queue
-> print(asyncio.Task.all_tasks())
{<Task pending coro=<recv() running at C:/Users/costa/untitled3.py:39>
wait_for=<Future finished result=None>>}
As you can see, the future is already finished, yet we still haven't printed out the newly added string 'test'. In other words, msg = await q.get() is still waiting even though the Future related to q.get() is done and there are no other tasks running. This confuses me because in the official documentation (https://docs.python.org/3/library/asyncio-task.html), it says
result = await future or result = yield from future – suspends the coroutine until the future is done, then returns the future’s result
It seemed that even though the Future is done, we still need some sort of await in another async function to keep the event loop processing tasks.
I found a workaround for this problem, which is adding a checking_task() coroutine to the event loop; then it works as intended.
But adding a checking_task() coroutine is very costly for the CPU, since it just runs a while loop. I am wondering if there is some manual way to trigger that await event without using an async function. For example, something magical like
q.put_nowait('test')
loop.ok_you_can_start_running_other_pending_tasks()
Help would be greatly appreciated! Thanks.
So I ended up using
loop.call_soon_threadsafe(q.put_nowait, 'test')
and it works as intended. After figuring this out, I searched for related information. It turned out this post (Scheduling an asyncio coroutine from another thread) describes the same problem, and kfx's answer would also work, which is
loop.call_soon_threadsafe(loop.create_task, q.put('test'))
Notice that asyncio.Queue.put() is a coroutine but asyncio.Queue.put_nowait() is a normal function. The deeper point is that asyncio objects are not thread-safe: calling put_nowait() directly from another thread mutates the queue without waking up the thread running the event loop, whereas call_soon_threadsafe() both schedules the call in the loop's own thread and wakes the loop so it can resume the now-ready task.
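For completeness, here is a runnable sketch of the fix (my own condensation of the question's code, assuming Python 3.10+, where asyncio.Queue no longer binds an event loop at construction time; the time.sleep calls just keep the demo deterministic):

import asyncio
import threading
import time

async def recv(q):
    while True:
        print(await q.get())

def loop_in_thread(loop, q):
    asyncio.set_event_loop(loop)
    loop.create_task(recv(q))
    loop.run_forever()

q = asyncio.Queue()
loop = asyncio.new_event_loop()
threading.Thread(target=loop_in_thread, args=(loop, q), daemon=True).start()

time.sleep(0.5)                                  # give the loop time to start
loop.call_soon_threadsafe(q.put_nowait, 'test')  # wakes the loop in the other thread
time.sleep(0.5)                                  # let recv() print before the demo exits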
When I call yield from some_coroutine() from within a coroutine foo, is some_coroutine scheduled in the same event loop that foo is currently running in? An example:
import asyncio

@asyncio.coroutine  # yield from is only valid in generator-based coroutines, not async def
def foo():
    yield from asyncio.sleep(5)

loop = asyncio.get_event_loop()  # this could also be a custom event loop
loop.run_until_complete(foo())
In this example, in which event loop will sleep be scheduled? I'm especially interested in the case where loop is not the default event loop.
The documentation, under "Things a coroutine can do" says:
result = await coroutine or result = yield from coroutine – wait for another coroutine to produce a result (or raise an exception, which will be propagated). The coroutine expression must be a call to another coroutine.
It is not clear to me in which loop the coroutine will be scheduled.
Citing the docs of get_event_loop:
Get the event loop for the current context.
The implementation of the default loop (the event loop default policy, to be precise):
The default policy defines context as the current thread, and manages an event loop per thread that interacts with asyncio.
An event loop runs in a thread and executes all callbacks and tasks in the same thread (docs),
asyncio.get_event_loop returns the same loop for the same thread,
if you do not explicitly schedule on or interact with a different thread's loop, the default (*) loop will be used
In your example:
get_event_loop returns current thread's event loop,
foo is scheduled on that loop with run_until_complete,
any further async calls (awaits/yield from) are scheduled on the same loop
More info at Concurrency and multithreading.
(*) The event loop you called "default" is actually the loop of the current thread.
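A quick sketch to verify this (my own illustration, not from the answer): run a coroutine on a custom, non-default loop and check which loop the nested coroutine sees.

import asyncio

custom_loop = asyncio.new_event_loop()  # a custom, non-default loop

async def inner():
    # the nested coroutine runs on whatever loop is driving its caller
    assert asyncio.get_running_loop() is custom_loop
    await asyncio.sleep(0)

async def outer():
    await inner()  # scheduled on the same loop as outer

custom_loop.run_until_complete(outer())
custom_loop.close()
print('inner ran on the same custom loop as outer')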