How to abort created coroutine python - python

How do I abort a created coroutine instance? My use case is a function taking in a coroutine and conditionally creating a tracked task for it (so that all of them could be gathered later).
Ignoring it (not awaiting it directly or indirectly) works, but gives a RuntimeWarning. I'd be OK with a solution that suppresses this warning.
Creating a task then immediately cancelling it starts its execution and the cancellation leads to other issues so it's not workable.
Checking the condition before the coroutines are created works, but damages maintainability since the same check for condition has to be in all the places before this function is called.

Use the .close() method:
async def foo():
pass
x = foo()
x.close()
Program finishes fine, no warnings.

Related

Why does `await coro()` block but `await task` does not? [duplicate]

Disclaimer: this is my first time experimenting with the asyncio module.
I'm using asyncio.wait in the following manner to try to support a timeout feature waiting for all results from a set of async tasks. This is part of a larger library so I'm omitting some irrelevant code.
Note that the library already supports submitting tasks and using timeouts with ThreadPoolExecutors and ProcessPoolExecutors, so I'm not really interested in suggestions to use those instead or questions about why I'm doing this with asyncio. On to the code...
import asyncio
from contextlib import suppress
...
class AsyncIOSubmit(Node):
def get_results(self, futures, timeout=None):
loop = asyncio.get_event_loop()
finished, unfinished = loop.run_until_complete(
asyncio.wait(futures, timeout=timeout)
)
if timeout and unfinished:
# Code options in question would go here...see below.
raise asyncio.TimeoutError
At first I was not worrying about cancelling pending tasks on timeout, but then I got the warning Task was destroyed but it is pending! on program exit or loop.close. After researching a bit I found multiple ways to cancel tasks and wait for them to actually be cancelled:
Option 1:
[task.cancel() for task in unfinished]
for task in unfinished:
with suppress(asyncio.CancelledError):
loop.run_until_complete(task)
Option 2:
[task.cancel() for task in unfinished]
loop.run_until_complete(asyncio.wait(unfinished))
Option 3:
# Not really an option for me, since I'm not in an `async` method
# and don't want to make get_results an async method.
[task.cancel() for task in unfinished]
for task in unfinished:
await task
Option 4:
Some sort of while loop like in this answer. Seems like my other options are better but including for completeness.
Options 1 and 2 both seem to work fine so far. Either option may be "right", but with asyncio evolving over the years the examples and suggestions around the net are either outdated or vary quite a bit. So my questions are...
Question 1
Are there any practical differences between Options 1 and 2? I know run_until_complete will run until the future has completed, so since Option 1 is looping in a specific order I suppose it could behave differently if earlier tasks take longer to actually complete. I tried looking at the asyncio source code to understand if asyncio.wait just effectively does the same thing with its tasks/futures under the hood, but it wasn't obvious.
Question 2
I assume if one of the tasks is in the middle of a long-running blocking operation it may not actually cancel immediately? Perhaps that just depends on if the underlying operation or library being used will raise the CancelledError right away or not? Maybe that should never happen with libraries designed for asyncio?
Since I'm trying to implement a timeout feature here I'm somewhat sensitive to this. If it's possible these things could take a long time to cancel I'd consider calling cancel and not waiting for it to actually happen, or setting a very short timeout to wait for the cancels to finish.
Question 3
Is it possible loop.run_until_complete (or really, the underlying call to async.wait) returns values in unfinished for a reason other than a timeout? If so I'd obviously have to adjust my logic a bit, but from the docs it seems like that is not possible.
Are there any practical differences between Options 1 and 2?
No. Option 2 looks nicer and might be marginally more efficient, but their net effect is the same.
I know run_until_complete will run until the future has completed, so since Option 1 is looping in a specific order I suppose it could behave differently if earlier tasks take longer to actually complete.
It seems that way at first, but it's not actually the case because loop.run_until_complete runs all tasks submitted to the loop, not just the one passed as argument. It merely stops once the provided awaitable completes - that is what "run until complete" refers to. A loop calling run_until_complete over already scheduled tasks is like the following async code:
ts = [asyncio.create_task(asyncio.sleep(i)) for i in range(1, 11)]
# takes 10s, not 55s
for t in ts:
await t
which is in turn semantically equivalent to the following threaded code:
ts = []
for i in range(1, 11):
t = threading.Thread(target=time.sleep, args=(i,))
t.start()
ts.append(t)
# takes 10s, not 55s
for t in ts:
t.join()
In other words, await t and run_until_complete(t) block until t has completed, but allow everything else - such as tasks previously scheduled using asyncio.create_task() to run during that time as well. So the total run time will equal the run time of the longest task, not of their sum. For example, if the first task happens to take a long time, all others will have finished in the meantime, and their awaits won't sleep at all.
All this only applies to awaiting tasks that have been previously scheduled. If you try to apply that to coroutines, it won't work:
# runs for 55s, as expected
for i in range(1, 11):
await asyncio.sleep(i)
# also 55s - we didn't call create_task() so it's equivalent to the above
ts = [asyncio.sleep(i) for i in range(1, 11)]
for t in ts:
await t
# also 55s
for i in range(1, 11):
t = threading.Thread(target=time.sleep, args=(i,))
t.start()
t.join()
This is often a sticking point for asyncio beginners, who write code equivalent to that last asyncio example and expect it to run in parallel.
I tried looking at the asyncio source code to understand if asyncio.wait just effectively does the same thing with its tasks/futures under the hood, but it wasn't obvious.
asyncio.wait is just a convenience API that does two things:
converts the input arguments to something that implements Future. For coroutines that means that it submits them to the event loop, as if with create_task, which allows them to run independently. If you give it tasks to begin with, as you do, this step is skipped.
uses add_done_callback to be notified when the futures are done, at which point it resumes its caller.
So yes, it does the same things, but with a different implementation because it supports many more features.
I assume if one of the tasks is in the middle of a long-running blocking operation it may not actually cancel immediately?
In asyncio there shouldn't be "blocking" operations, only those that suspend, and they should be cancelled immediately. The exception to this is blocking code tacked onto asyncio with run_in_executor, where the underlying operation won't cancel at all, but the asyncio coroutine will immediately get the exception.
Perhaps that just depends on if the underlying operation or library being used will raise the CancelledError right away or not?
The library doesn't raise CancelledError, it receives it at the await point where it happened to suspend before cancellation occurred. For the library the effect of the cancellation is await ... interrupting its wait and immediately raising CancelledError. Unless caught, the exception will propagate through function and await calls all the way to the top-level coroutine, whose raising CancelledError marks the whole task as cancelled. Well-behaved asyncio code will do just that, possibly using finally to release OS-level resources they hold. When CancelledError is caught, the code can choose not to re-raise it, in which case cancellation is effectively ignored.
Is it possible loop.run_until_complete (or really, the underlying call to async.wait) returns values in unfinished for a reason other than a timeout?
If you're using return_when=asyncio.ALL_COMPLETE (the default), that shouldn't be possible. It is quite possible with return_when=FIRST_COMPLETED, then it is obviously possible independently of timeout.

Handling Timeouts with asyncio

Disclaimer: this is my first time experimenting with the asyncio module.
I'm using asyncio.wait in the following manner to try to support a timeout feature waiting for all results from a set of async tasks. This is part of a larger library so I'm omitting some irrelevant code.
Note that the library already supports submitting tasks and using timeouts with ThreadPoolExecutors and ProcessPoolExecutors, so I'm not really interested in suggestions to use those instead or questions about why I'm doing this with asyncio. On to the code...
import asyncio
from contextlib import suppress
...
class AsyncIOSubmit(Node):
def get_results(self, futures, timeout=None):
loop = asyncio.get_event_loop()
finished, unfinished = loop.run_until_complete(
asyncio.wait(futures, timeout=timeout)
)
if timeout and unfinished:
# Code options in question would go here...see below.
raise asyncio.TimeoutError
At first I was not worrying about cancelling pending tasks on timeout, but then I got the warning Task was destroyed but it is pending! on program exit or loop.close. After researching a bit I found multiple ways to cancel tasks and wait for them to actually be cancelled:
Option 1:
[task.cancel() for task in unfinished]
for task in unfinished:
with suppress(asyncio.CancelledError):
loop.run_until_complete(task)
Option 2:
[task.cancel() for task in unfinished]
loop.run_until_complete(asyncio.wait(unfinished))
Option 3:
# Not really an option for me, since I'm not in an `async` method
# and don't want to make get_results an async method.
[task.cancel() for task in unfinished]
for task in unfinished:
await task
Option 4:
Some sort of while loop like in this answer. Seems like my other options are better but including for completeness.
Options 1 and 2 both seem to work fine so far. Either option may be "right", but with asyncio evolving over the years the examples and suggestions around the net are either outdated or vary quite a bit. So my questions are...
Question 1
Are there any practical differences between Options 1 and 2? I know run_until_complete will run until the future has completed, so since Option 1 is looping in a specific order I suppose it could behave differently if earlier tasks take longer to actually complete. I tried looking at the asyncio source code to understand if asyncio.wait just effectively does the same thing with its tasks/futures under the hood, but it wasn't obvious.
Question 2
I assume if one of the tasks is in the middle of a long-running blocking operation it may not actually cancel immediately? Perhaps that just depends on if the underlying operation or library being used will raise the CancelledError right away or not? Maybe that should never happen with libraries designed for asyncio?
Since I'm trying to implement a timeout feature here I'm somewhat sensitive to this. If it's possible these things could take a long time to cancel I'd consider calling cancel and not waiting for it to actually happen, or setting a very short timeout to wait for the cancels to finish.
Question 3
Is it possible loop.run_until_complete (or really, the underlying call to async.wait) returns values in unfinished for a reason other than a timeout? If so I'd obviously have to adjust my logic a bit, but from the docs it seems like that is not possible.
Are there any practical differences between Options 1 and 2?
No. Option 2 looks nicer and might be marginally more efficient, but their net effect is the same.
I know run_until_complete will run until the future has completed, so since Option 1 is looping in a specific order I suppose it could behave differently if earlier tasks take longer to actually complete.
It seems that way at first, but it's not actually the case because loop.run_until_complete runs all tasks submitted to the loop, not just the one passed as argument. It merely stops once the provided awaitable completes - that is what "run until complete" refers to. A loop calling run_until_complete over already scheduled tasks is like the following async code:
ts = [asyncio.create_task(asyncio.sleep(i)) for i in range(1, 11)]
# takes 10s, not 55s
for t in ts:
await t
which is in turn semantically equivalent to the following threaded code:
ts = []
for i in range(1, 11):
t = threading.Thread(target=time.sleep, args=(i,))
t.start()
ts.append(t)
# takes 10s, not 55s
for t in ts:
t.join()
In other words, await t and run_until_complete(t) block until t has completed, but allow everything else - such as tasks previously scheduled using asyncio.create_task() to run during that time as well. So the total run time will equal the run time of the longest task, not of their sum. For example, if the first task happens to take a long time, all others will have finished in the meantime, and their awaits won't sleep at all.
All this only applies to awaiting tasks that have been previously scheduled. If you try to apply that to coroutines, it won't work:
# runs for 55s, as expected
for i in range(1, 11):
await asyncio.sleep(i)
# also 55s - we didn't call create_task() so it's equivalent to the above
ts = [asyncio.sleep(i) for i in range(1, 11)]
for t in ts:
await t
# also 55s
for i in range(1, 11):
t = threading.Thread(target=time.sleep, args=(i,))
t.start()
t.join()
This is often a sticking point for asyncio beginners, who write code equivalent to that last asyncio example and expect it to run in parallel.
I tried looking at the asyncio source code to understand if asyncio.wait just effectively does the same thing with its tasks/futures under the hood, but it wasn't obvious.
asyncio.wait is just a convenience API that does two things:
converts the input arguments to something that implements Future. For coroutines that means that it submits them to the event loop, as if with create_task, which allows them to run independently. If you give it tasks to begin with, as you do, this step is skipped.
uses add_done_callback to be notified when the futures are done, at which point it resumes its caller.
So yes, it does the same things, but with a different implementation because it supports many more features.
I assume if one of the tasks is in the middle of a long-running blocking operation it may not actually cancel immediately?
In asyncio there shouldn't be "blocking" operations, only those that suspend, and they should be cancelled immediately. The exception to this is blocking code tacked onto asyncio with run_in_executor, where the underlying operation won't cancel at all, but the asyncio coroutine will immediately get the exception.
Perhaps that just depends on if the underlying operation or library being used will raise the CancelledError right away or not?
The library doesn't raise CancelledError, it receives it at the await point where it happened to suspend before cancellation occurred. For the library the effect of the cancellation is await ... interrupting its wait and immediately raising CancelledError. Unless caught, the exception will propagate through function and await calls all the way to the top-level coroutine, whose raising CancelledError marks the whole task as cancelled. Well-behaved asyncio code will do just that, possibly using finally to release OS-level resources they hold. When CancelledError is caught, the code can choose not to re-raise it, in which case cancellation is effectively ignored.
Is it possible loop.run_until_complete (or really, the underlying call to async.wait) returns values in unfinished for a reason other than a timeout?
If you're using return_when=asyncio.ALL_COMPLETE (the default), that shouldn't be possible. It is quite possible with return_when=FIRST_COMPLETED, then it is obviously possible independently of timeout.

Why is it that only asynchronous functions can yield in asynchronous code?

In the article "I'm not feeling the async pressure" Armin Ronacher makes the following observation:
In threaded code any function can yield. In async code only async functions can. This means for instance that the writer.write method cannot block.
This observation is made with reference to the following code sample:
from asyncio import start_server, run
async def on_client_connected(reader, writer):
while True:
data = await reader.readline()
if not data:
break
writer.write(data)
async def server():
srv = await start_server(on_client_connected, '127.0.0.1', 8888)
async with srv:
await srv.serve_forever()
run(server())
I do not understand this comment. Specifically:
How come synchronous functions cannot yield when inside of asynchronous functions?
What does yield have to do with blocking execution? Why is it that a function that cannot yield, cannot block?
Going line-by-line:
In threaded code any function can yield.
Programs running on a machine are organized in terms of processes. Each process may have one or more threads. Threads, like processes, are scheduled by (and interruptible by) the operating system. The word "yield" in this context means "letting other code run". When work is split between multiple threads, functions "yield" easily: the operating system suspends the code running in one thread, runs some code in a different thread, suspends that, comes back, and works some more on the first thread, and so on. By switching between threads in this way, concurrency is achieved.
In this execution model, whether the code being suspended is synchronous or asynchronous does not matter. The code within the thread is being run line-by-line, so the fundamental assumption of a synchronous function---that no changes occurred in between running one line of code and the next---is not violated.
In async code only async functions can.
"Async code" in this context means a single-threaded application that does the same work as the multi-threaded application, except that it achieves concurrency by using asynchronous functions within a thread, instead of splitting the work between different threads. In this execution model, your interpreter, not the operating system, is responsible for switching between functions as needed to achieve concurrency.
In this execution model, it is unsafe for work to be suspended in the middle of a synchronous function that's located inside of an asynchronous function. Doing so would mean running some other code in the middle of running your synchronous function, breaking the "line-by-line" assumption made by the synchronous function.
As a result, the interpreter will wait only suspend the execution of an asynchronous function in between synchronous sub-functions, never within one. This is what is meant by the statement that synchronous functions in async code cannot yield: once a synchronous function starts running, it must complete.
This means for instance that the writer.write method cannot block.
The writer.write method is synchronous, and hence, when run in an async program, uninterruptible. If this method were to block, it would block not just the asynchronous function it is running inside of, but the entire program. That would be bad. writer.write avoids blocking the program by writing to a write buffer instead and returning immediately.
Strictly speaking, writer.write can block, it's just inadvisable to do so.
If you need to block inside of an async function, the proper way to do so is to await another async function. This is what e.g. await writer.drain() does. This will block asynchronously: while this specific function remains blocked, it will correctly yield to other functions that can run.
“Yield” here refers to cooperative multitasking (albeit within a process rather than among them). In the context of the async/await style of Python programming, asynchronous functions are defined in terms of Python’s pre-existing generator support: if a function blocks (typically for I/O), all its callers that are performing awaits suspend (with an invisible yield/yield from that is indeed of the generator variety). The actual call for any generator is to its next method; that function actually returns.
Every caller, up to some sort of driver that most programmers never write, must participate for this approach to work: any function that did not suspend would suddenly have the responsibility of the driver of deciding what to do next while waiting on the function it called to complete. This “infectious” aspect of asynchronicity has been called a “color”; it can be problematic, as for example when people forget to await a coroutine call that looks correct because it looks like any other call. (The async/await syntax exists to minimize the disruption of the program’s structure from the concurrency by implicitly converting functions into state machines, but this ambiguity remains.) It can also be a good thing: an asynchronous function can be interrupted exactly when it awaits, so it’s straightforward to reason about the consistency of data structures.
A synchronous function therefore cannot yield simply as a matter of definition. The import of the restriction is rather that a function called with a normal (synchronous) call cannot yield: its caller is not prepared to handle such an interaction. (What will happen if it does anyway is of course the same “forgotten await”.) This also affects refactoring: a function cannot be changed to be asynchronous without changing all its clients (and making them asynchronous as well if they are not already). (This is similar to how all I/O works in Haskell, since it affects the type of any function that performs any.)
Note that yield is allowed in its role as a normal generator used with an ordinary for even in an asynchronous function, but that’s just the general fact that the caller must expect the same protocol as the callee: if an enhanced generator (an “old-style” coroutine) is used with for, it just gets None from every (yield), and if an async function is used with for, it produces awaitables that probably break when they are sent None.
The distinction with threading, or with so-called stackful coroutines or fibers, is that no special resumption support is needed from the caller because the actual function call simply doesn’t return until the thread/fiber is resumed. (In the thread case, the kernel also chooses when to resume it.) In that sense, these approaches are easier to use, but with fibers the ability to “sneak” a pause into any function is partially compromised by the need to specify arguments to that function to tell it about the userspace scheduler with which to register itself (unless you’re willing to use global variables for that…). Threads, on the other hand, have even higher overhead than fibers, which matters when great numbers of them are running.

Unit-testing a periodic coroutine with mock time

I'm using Tornado as a coroutine engine for a periodic process, where the repeating coroutine calls ioloop.call_later() on itself at the end of each execution. I'm now trying to drive this with unit tests (using Tornado's gen.test) where I'm mocking the ioloop's time with a local variable t:
DUT.ioloop.time = mock.Mock(side_effect= lambda: t)
(DUT <==> Device Under Test)
Then in the test, I manually increment t, and yield gen.moment to kick the ioloop. The idea is to trigger the repeating coroutine after various intervals so I can verify its behaviour.
But the coroutine doesn't always trigger - or perhaps it yields back to the testing code before completing execution, causing failures.
I think should be using stop() and wait() to synchronise the test code, but I can't see concretely how to use them in this situation. And how does this whole testing strategy work if the DUT runs in its own ioloop?
In general, using yield gen.moment to trigger specific events is dicey; there are no guarantees about how many "moments" you must wait, or in what order the triggered events occur. It's better to make sure that the function being tested has some effect that can be asynchronously waited for (if it doesn't have such an effect naturally, you can use a tornado.locks.Condition).
There are also subtleties to patching IOLoop.time. I think it will work with the default Tornado IOLoops (where it is possible without the use of mock: pass a time_func argument when constructing the loop), but it won't have the desired effect with e.g. AsyncIOLoop.
I don't think you want to use AsyncTestCase.stop and .wait, but it's not clear how your test is set up.

Why does this asyncio.Task never finish cancelling?

If I run this on the python3 interpreter:
import asyncio
#asyncio.coroutine
def wait(n):
asyncio.sleep(n)
loop = asyncio.get_event_loop()
fut = asyncio.async(wait(10))
fut.add_done_callback(lambda x: print('Done'))
asyncio.Task.all_tasks()
I get the following result:
{<Task pending coro=<coro() running at /usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/coroutines.py:139> cb=[<lambda>() at <ipython-input-5-c72c2da2ffa4>:1]>}
Now if I run fut.cancel() I get True returned. But typing fut returns a representation of the task stating it is cancelling:
<Task cancelling coro=<coro() running at /usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/asyncio/coroutines.py:139> cb=[<lambda>() at <ipython-input-5-c72c2da2ffa4>:1]>
And the task never actually cancels (fut.cancelled() never returns True)
Why won't it cancel?
Calling task.cancel() only schedules the task to be cancelled on the next run of the event loop; it doesn't immediately cancel the task, or even guarantee that the task will be actually be cancelled when the event loop runs its next iteration. This is all described in the documentation:
cancel()
Request that this task cancel itself.
This arranges for a CancelledError to be thrown into the wrapped
coroutine on the next cycle through the event loop. The coroutine then
has a chance to clean up or even deny the request using
try/except/finally.
Unlike Future.cancel(), this does not guarantee that the task will be
cancelled: the exception might be caught and acted upon, delaying
cancellation of the task or preventing cancellation completely. The
task may also return a value or raise a different exception.
Immediately after this method is called, cancelled() will not return
True (unless the task was already cancelled). A task will be marked as
cancelled when the wrapped coroutine terminates with a CancelledError
exception (even if cancel() was not called).
In your case, you're never actually starting the event loop, so the task never gets cancelled. You would need to call loop.run_until_complete(fut) (or loop.run_forever(), though that's not really the best choice for this particular case) for the task to actually end up getting cancelled.
Also, for what it's worth, it's usually easier to test asyncio code using actual scripts, rather than the interpreter, since it tends to get tedious to have to constantly rewrite coroutines and start/stop the event loop.
With asyncio testing in the interpreter is tricky, because python needs to keep the event loop constantly polling its tasks.
So a few pieces of advice to test asyncio are:
Write and run scripts instead of using the interactive interpreter
Add a loop.run_forever() at the end of the script so all tasks get executed.
An alternative is to run loop.run_until_complete(coro()) for each task you want to run.
Have yield from in front of asyncio.sleep(n) so it can actually be run. The current code returns a generator and does nothing.

Categories

Resources