Why does this asyncio.Task never finish cancelling?

If I run this on the python3 interpreter:
import asyncio

@asyncio.coroutine
def wait(n):
    asyncio.sleep(n)

loop = asyncio.get_event_loop()
fut = asyncio.async(wait(10))
fut.add_done_callback(lambda x: print('Done'))
asyncio.Task.all_tasks()
I get the following result:
{<Task pending coro=<coro() running at /usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/coroutines.py:139> cb=[<lambda>() at <ipython-input-5-c72c2da2ffa4>:1]>}
Now if I run fut.cancel() I get True returned. But typing fut returns a representation of the task stating it is cancelling:
<Task cancelling coro=<coro() running at /usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/coroutines.py:139> cb=[<lambda>() at <ipython-input-5-c72c2da2ffa4>:1]>
And the task never actually cancels (fut.cancelled() never returns True)
Why won't it cancel?

Calling task.cancel() only schedules the task to be cancelled on the next run of the event loop; it doesn't immediately cancel the task, or even guarantee that the task will actually be cancelled when the event loop runs its next iteration. This is all described in the documentation:
cancel()
Request that this task cancel itself.
This arranges for a CancelledError to be thrown into the wrapped
coroutine on the next cycle through the event loop. The coroutine then
has a chance to clean up or even deny the request using
try/except/finally.
Unlike Future.cancel(), this does not guarantee that the task will be
cancelled: the exception might be caught and acted upon, delaying
cancellation of the task or preventing cancellation completely. The
task may also return a value or raise a different exception.
Immediately after this method is called, cancelled() will not return
True (unless the task was already cancelled). A task will be marked as
cancelled when the wrapped coroutine terminates with a CancelledError
exception (even if cancel() was not called).
In your case, you're never actually starting the event loop, so the task never gets cancelled. You would need to call loop.run_until_complete(fut) (or loop.run_forever(), though that's not really the best choice for this particular case) for the task to actually end up getting cancelled.
Also, for what it's worth, it's usually easier to test asyncio code using actual scripts, rather than the interpreter, since it tends to get tedious to have to constantly rewrite coroutines and start/stop the event loop.

With asyncio, testing in the interpreter is tricky, because Python needs to keep the event loop constantly running to poll its tasks.
So a few pieces of advice to test asyncio are:
Write and run scripts instead of using the interactive interpreter
Add a loop.run_forever() at the end of the script so all tasks get executed.
An alternative is to run loop.run_until_complete(coro()) for each task you want to run.
Put yield from in front of asyncio.sleep(n) so the sleep actually runs. As written, the call just creates a generator object and does nothing.
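Putting those pieces of advice together, a minimal self-contained script might look like the following sketch (written with the async def / await syntax of Python 3.5+; on 3.4 the equivalent is @asyncio.coroutine with yield from):

```python
import asyncio

async def wait(n):
    await asyncio.sleep(n)  # actually suspends, unlike a bare asyncio.sleep(n)

loop = asyncio.new_event_loop()
fut = loop.create_task(wait(10))
fut.add_done_callback(lambda f: print('Done'))
fut.cancel()
try:
    # Running the loop is what lets the CancelledError be thrown into wait()
    loop.run_until_complete(fut)
except asyncio.CancelledError:
    pass
print(fut.cancelled())  # True
loop.close()
```

Because the loop actually runs here, the cancellation is delivered, the done callback fires, and fut.cancelled() finally returns True.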

Handling Timeouts with asyncio

Disclaimer: this is my first time experimenting with the asyncio module.
I'm using asyncio.wait in the following manner to try to support a timeout feature waiting for all results from a set of async tasks. This is part of a larger library so I'm omitting some irrelevant code.
Note that the library already supports submitting tasks and using timeouts with ThreadPoolExecutors and ProcessPoolExecutors, so I'm not really interested in suggestions to use those instead or questions about why I'm doing this with asyncio. On to the code...
import asyncio
from contextlib import suppress
...
class AsyncIOSubmit(Node):
    def get_results(self, futures, timeout=None):
        loop = asyncio.get_event_loop()
        finished, unfinished = loop.run_until_complete(
            asyncio.wait(futures, timeout=timeout)
        )
        if timeout and unfinished:
            # Code options in question would go here...see below.
            raise asyncio.TimeoutError
At first I was not worrying about cancelling pending tasks on timeout, but then I got the warning Task was destroyed but it is pending! on program exit or loop.close. After researching a bit I found multiple ways to cancel tasks and wait for them to actually be cancelled:
Option 1:
[task.cancel() for task in unfinished]
for task in unfinished:
    with suppress(asyncio.CancelledError):
        loop.run_until_complete(task)
Option 2:
[task.cancel() for task in unfinished]
loop.run_until_complete(asyncio.wait(unfinished))
Option 3:
# Not really an option for me, since I'm not in an `async` method
# and don't want to make get_results an async method.
[task.cancel() for task in unfinished]
for task in unfinished:
await task
Option 4:
Some sort of while loop like in this answer. Seems like my other options are better but including for completeness.
Options 1 and 2 both seem to work fine so far. Either option may be "right", but with asyncio evolving over the years the examples and suggestions around the net are either outdated or vary quite a bit. So my questions are...
Question 1
Are there any practical differences between Options 1 and 2? I know run_until_complete will run until the future has completed, so since Option 1 is looping in a specific order I suppose it could behave differently if earlier tasks take longer to actually complete. I tried looking at the asyncio source code to understand if asyncio.wait just effectively does the same thing with its tasks/futures under the hood, but it wasn't obvious.
Question 2
I assume if one of the tasks is in the middle of a long-running blocking operation it may not actually cancel immediately? Perhaps that just depends on if the underlying operation or library being used will raise the CancelledError right away or not? Maybe that should never happen with libraries designed for asyncio?
Since I'm trying to implement a timeout feature here I'm somewhat sensitive to this. If it's possible these things could take a long time to cancel I'd consider calling cancel and not waiting for it to actually happen, or setting a very short timeout to wait for the cancels to finish.
Question 3
Is it possible loop.run_until_complete (or really, the underlying call to asyncio.wait) returns values in unfinished for a reason other than a timeout? If so I'd obviously have to adjust my logic a bit, but from the docs it seems like that is not possible.
Are there any practical differences between Options 1 and 2?
No. Option 2 looks nicer and might be marginally more efficient, but their net effect is the same.
I know run_until_complete will run until the future has completed, so since Option 1 is looping in a specific order I suppose it could behave differently if earlier tasks take longer to actually complete.
It seems that way at first, but it's not actually the case because loop.run_until_complete runs all tasks submitted to the loop, not just the one passed as argument. It merely stops once the provided awaitable completes - that is what "run until complete" refers to. A loop calling run_until_complete over already scheduled tasks is like the following async code:
ts = [asyncio.create_task(asyncio.sleep(i)) for i in range(1, 11)]
# takes 10s, not 55s
for t in ts:
    await t
which is in turn semantically equivalent to the following threaded code:
ts = []
for i in range(1, 11):
    t = threading.Thread(target=time.sleep, args=(i,))
    t.start()
    ts.append(t)
# takes 10s, not 55s
for t in ts:
    t.join()
In other words, await t and run_until_complete(t) block until t has completed, but allow everything else - such as tasks previously scheduled using asyncio.create_task() - to run during that time as well. So the total run time will equal the run time of the longest task, not their sum. For example, if the first task happens to take a long time, all others will have finished in the meantime, and their awaits won't sleep at all.
All this only applies to awaiting tasks that have been previously scheduled. If you try to apply that to coroutines, it won't work:
# runs for 55s, as expected
for i in range(1, 11):
    await asyncio.sleep(i)

# also 55s - we didn't call create_task() so it's equivalent to the above
ts = [asyncio.sleep(i) for i in range(1, 11)]
for t in ts:
    await t

# also 55s
for i in range(1, 11):
    t = threading.Thread(target=time.sleep, args=(i,))
    t.start()
    t.join()
This is often a sticking point for asyncio beginners, who write code equivalent to that last asyncio example and expect it to run in parallel.
I tried looking at the asyncio source code to understand if asyncio.wait just effectively does the same thing with its tasks/futures under the hood, but it wasn't obvious.
asyncio.wait is just a convenience API that does two things:
converts the input arguments to something that implements Future. For coroutines that means that it submits them to the event loop, as if with create_task, which allows them to run independently. If you give it tasks to begin with, as you do, this step is skipped.
uses add_done_callback to be notified when the futures are done, at which point it resumes its caller.
So yes, it does the same things, but with a different implementation because it supports many more features.
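As a rough, hypothetical sketch (the real asyncio.wait also handles timeouts, return_when, and mixed inputs), the add_done_callback mechanism for already-created tasks might look like:

```python
import asyncio

async def simple_wait(tasks):
    # Toy sketch of asyncio.wait's core: attach a done callback to each task
    # and suspend on a future that fires once all of them have completed.
    loop = asyncio.get_running_loop()
    remaining = len(tasks)
    all_done = loop.create_future()

    def on_done(_task):
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            all_done.set_result(None)

    for t in tasks:
        t.add_done_callback(on_done)
    await all_done
    return set(tasks), set()   # (done, pending); pending is empty here

async def main():
    ts = [asyncio.create_task(asyncio.sleep(0.01, result=i)) for i in range(3)]
    done, _pending = await simple_wait(ts)
    return sorted(t.result() for t in done)

results = asyncio.run(main())
print(results)  # [0, 1, 2]
```

While simple_wait waits on all_done, the loop keeps driving every scheduled task, which is exactly why run_until_complete(asyncio.wait(...)) and the per-task loop behave the same.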
I assume if one of the tasks is in the middle of a long-running blocking operation it may not actually cancel immediately?
In asyncio there shouldn't be "blocking" operations, only those that suspend, and they should be cancelled immediately. The exception to this is blocking code tacked onto asyncio with run_in_executor, where the underlying operation won't cancel at all, but the asyncio coroutine will immediately get the exception.
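A small sketch of the run_in_executor case, with arbitrary timings: the awaiting coroutine sees CancelledError immediately, while the blocking function runs to completion in its worker thread regardless.

```python
import asyncio
import time

blocking_finished = []

def blocking():
    # Simulated blocking work; cancellation cannot interrupt the thread.
    time.sleep(0.3)
    blocking_finished.append(True)

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.run_in_executor(None, blocking)
    await asyncio.sleep(0.05)      # let the thread start
    fut.cancel()                   # the asyncio side is cancelled immediately...
    try:
        await fut
        got_cancelled = False
    except asyncio.CancelledError:
        got_cancelled = True
    await asyncio.sleep(0.4)       # ...but the thread still runs to the end
    return got_cancelled

got_cancelled = asyncio.run(main())
print(got_cancelled, blocking_finished)  # True [True]
```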
Perhaps that just depends on if the underlying operation or library being used will raise the CancelledError right away or not?
The library doesn't raise CancelledError, it receives it at the await point where it happened to suspend before cancellation occurred. For the library the effect of the cancellation is await ... interrupting its wait and immediately raising CancelledError. Unless caught, the exception will propagate through function and await calls all the way to the top-level coroutine, whose raising CancelledError marks the whole task as cancelled. Well-behaved asyncio code will do just that, possibly using finally to release OS-level resources they hold. When CancelledError is caught, the code can choose not to re-raise it, in which case cancellation is effectively ignored.
Is it possible loop.run_until_complete (or really, the underlying call to asyncio.wait) returns values in unfinished for a reason other than a timeout?
If you're using return_when=asyncio.ALL_COMPLETED (the default), that shouldn't be possible. With return_when=asyncio.FIRST_COMPLETED, on the other hand, unfinished tasks are returned whenever any task is still running, regardless of the timeout.
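For example, a quick sketch with FIRST_COMPLETED and arbitrary delays, where the pending set comes back non-empty even though no timeout was passed:

```python
import asyncio

async def main():
    tasks = [asyncio.create_task(asyncio.sleep(d)) for d in (0.01, 1.0)]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    for t in pending:              # clean up so no task is left dangling
        t.cancel()
    return len(done), len(pending)

counts = asyncio.run(main())
print(counts)  # (1, 1)
```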

When will/won't Python suspend execution of a coroutine?

When I run it on CPython 3.6, the following program prints hello world a single time and then spins forever.
As a side note, uncommenting the await asyncio.sleep(0) line causes it to print hello world every second, which is understandable.
import asyncio

async def do_nothing():
    # await asyncio.sleep(0)
    pass

async def hog_the_event_loop():
    while True:
        await do_nothing()

async def timer_print():
    while True:
        print("hello world")
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
loop.create_task(timer_print())
loop.create_task(hog_the_event_loop())
loop.run_forever()
This behavior (printing hello world a single time) makes sense to me, because hog_the_event_loop never blocks and therefore has no need to suspend execution. Can I rely on this behavior? When the line await do_nothing() runs, is it possible that rather than entering the do_nothing() coroutine, execution will actually suspend and resume timer_print(), causing the program to print hello world a second time?
Put more generally: When will python suspend execution of a coroutine and switch to another one? Is it potentially on any use of the await keyword? or is it only in cases where this results in an underlying select call (such as I/O, sleep timers, etc)?
Additional Clarification
I understand that if hog_the_event_loop looked like this, it would certainly never yield execution to another coroutine:
async def hog_the_event_loop():
    while True:
        pass
I'm trying to specifically get at the question of whether await do_nothing() is any different than the above.
This behavior (printing hello world a single time) makes sense to me, because hog_the_event_loop never blocks and therefore has no need to suspend execution. Can I rely on this behavior?
Yes. This behavior directly follows from how await is both specified and implemented by the language. Changing it to suspend without the awaited object having itself suspended would certainly be a breaking change, both for asyncio and for other Python libraries based on async/await.
Put more generally: When will python suspend execution of a coroutine and switch to another one? Is it potentially on any use of the await keyword?
From the caller's perspective, any await can potentially suspend, at the discretion of the awaited object, also known as an awaitable. So the final decision of whether a particular await will suspend is on the awaitable (or on awaitables it awaits itself if it's a coroutine, and so on). Awaiting an awaitable that doesn't choose to suspend is the same as running ordinary Python code - it won't pass control to the event loop.
The leaf awaitables that actually suspend are typically related to IO readiness or timeout events, but not always. For example, awaiting a queue item will suspend if the queue is empty, and awaiting run_in_executor will suspend until the function running in the other thread completes.
It is worth mentioning that asyncio.sleep is explicitly guaranteed to suspend execution and defer to the event loop, even when the specified delay is 0 (in which case it will immediately resume on the next event loop pass).
No, await do_nothing() will never suspend. await propagates suspension from an awaitable by suspending the surrounding coroutine in response; but when the awaitable is already ready, there is nothing to wait for, and execution simply continues past the await (in general with a return value). Another way of thinking about "nothing to wait for" is that the event loop literally has no object on which to base the timing of resuming from a notional suspension; even "resume as soon as other pending tasks suspend" is a schedule that would have to be expressed as some object (e.g. sleep(0)).
A coroutine that does nothing is always ready, just like one that sleeps is ready after the time elapses. Put differently, a coroutine that just sleeps N times suspends N times—even if N is 0.
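This is observable directly. In the following sketch, a callback scheduled on the loop runs only once asyncio.sleep(0) suspends; awaiting the ready do_nothing() coroutine never gives the loop a chance to run it:

```python
import asyncio

order = []

async def do_nothing():
    pass

async def main():
    loop = asyncio.get_running_loop()
    loop.call_soon(lambda: order.append('loop callback'))
    await do_nothing()            # ready: no suspension, callback not run yet
    order.append('after do_nothing')
    await asyncio.sleep(0)        # guaranteed suspension: loop runs the callback
    order.append('after sleep(0)')

asyncio.run(main())
print(order)  # ['after do_nothing', 'loop callback', 'after sleep(0)']
```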

How to abort a created coroutine in Python

How do I abort a created coroutine instance? My use case is a function taking in a coroutine and conditionally creating a tracked task for it (so that all of them could be gathered later).
Ignoring it (not awaiting it directly or indirectly) works, but gives a RuntimeWarning. I'd be OK with a solution that suppresses this warning.
Creating a task then immediately cancelling it starts its execution and the cancellation leads to other issues so it's not workable.
Checking the condition before the coroutines are created works, but damages maintainability since the same check for condition has to be in all the places before this function is called.
Use the .close() method:
async def foo():
    pass

x = foo()
x.close()
Program finishes fine, no warnings.
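Applied to the conditional-task scenario from the question, that might look like the following sketch (the submit helper and its names are hypothetical):

```python
import asyncio

tasks = []

def submit(coro, condition):
    # Hypothetical helper: track the coroutine as a task only when wanted;
    # otherwise close it, which avoids the "never awaited" RuntimeWarning.
    if condition:
        tasks.append(asyncio.ensure_future(coro))
    else:
        coro.close()

async def job(n):
    await asyncio.sleep(0)
    return n

async def main():
    submit(job(1), True)
    submit(job(2), False)   # discarded cleanly, never started
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)  # [1]
```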

asyncio: Why is awaiting a cancelled Future not showing CancelledError?

Given the following program:
import asyncio

async def coro():
    future = asyncio.Future()
    future.cancel()
    print(future)  # <Future cancelled>
    await future   # no output

loop = asyncio.get_event_loop()
loop.create_task(coro())
loop.run_forever()
Why is the CancelledError thrown by await future not shown? Explicitly wrapping await future in try/except shows that it occurs. Other unhandled errors are shown with Task exception was never retrieved, like this:
async def coro2():
    raise Exception('foo')

loop.create_task(coro2())
Why is this not the case for awaiting cancelled Futures?
Additional question: What happens internally if a coroutine awaits a cancelled Future? Does it wait forever? Do I have to do any "cleanup" work?
Why is the CancelledError thrown by await future not shown?
The exception is not shown because you're never actually retrieving the result of coro. If you retrieved it in any way, e.g. by calling result() method on the task or just by awaiting it, you would get the expected error at the point where you retrieve it. The easiest way to observe the resulting traceback is by changing run_forever() to run_until_complete(coro()).
What happens internally if a coroutine awaits a cancelled Future? Does it wait forever? Do I have to do any "cleanup" work?
It doesn't wait forever, it receives the CancelledError at the point of the await. You have discovered this already by adding a try/except around await future. The cleanup you need to do is the same as for any other exception - either nothing at all, or using with and finally to make sure that the resources you acquired get released in case of an exit.
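A minimal sketch of that pattern (names are illustrative):

```python
import asyncio

async def coro():
    future = asyncio.Future()
    future.cancel()
    try:
        await future                 # raises CancelledError right away
    except asyncio.CancelledError:
        return 'cancelled seen'
    finally:
        pass                         # release acquired resources here

result = asyncio.run(coro())
print(result)  # cancelled seen
```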
Other unhandled errors are shown with Task exception was never retrieved [...] Why is this not the case for awaiting cancelled Futures?
Because Future.cancel intentionally disables the logging of traceback. This is to avoid spewing the output with "exception was never retrieved" whenever a future is canceled. Since CancelledError is normally injected from the outside and can happen at (almost) any moment, very little value is derived from retrieving it.
If it sounds strange to show the traceback in case of one exception but not in case of another, be aware that the tracebacks of failed tasks are displayed on a best-effort basis to begin with. Tasks created with create_task and not awaited effectively run "in the background", much like a thread that is not join()ed. But unlike threads, coroutines have the concept of a "result", either an object returned from the coroutine or an exception raised by it. The return value of the coroutine is provided by the result of its task. When the coroutine exits with an exception, the result encapsulates the exception, to be automatically raised when the result is retrieved. This is why Python cannot immediately print the traceback like it does when a thread terminates due to an unhandled exception - it has to wait for someone to actually retrieve the result. It is only when a Future whose result holds an exception is about to get garbage-collected that Python can tell that the result would never be retrieved. It then displays the warning and the traceback to avoid the exception passing silently.
