How to cancel all remaining tasks in gather if one fails?

In case one task of gather raises an exception, the others are still allowed to continue.
Well, that's not exactly what I need. I want to distinguish between errors that are fatal and need to cancel all remaining tasks, and errors that are not and instead should be logged while allowing other tasks to continue.
Here is my failed attempt to implement this:
from asyncio import gather, get_event_loop, sleep

class ErrorThatShouldCancelOtherTasks(Exception):
    pass

async def my_sleep(secs):
    await sleep(secs)
    if secs == 5:
        raise ErrorThatShouldCancelOtherTasks('5 is forbidden!')
    print(f'Slept for {secs}secs.')

async def main():
    try:
        sleepers = gather(*[my_sleep(secs) for secs in [2, 5, 7]])
        await sleepers
    except ErrorThatShouldCancelOtherTasks:
        print('Fatal error; cancelling')
        sleepers.cancel()
    finally:
        await sleep(5)

get_event_loop().run_until_complete(main())
(the finally await sleep here is to prevent the interpreter from closing immediately, which would on its own cancel all tasks)
Oddly, calling cancel on the gather does not actually cancel it!
PS C:\Users\m> .\AppData\Local\Programs\Python\Python368\python.exe .\wtf.py
Slept for 2secs.
Fatal error; cancelling
Slept for 7secs.
I am very surprised by this behavior since it seems to be contradictory to the documentation, which states:
asyncio.gather(*coros_or_futures, loop=None, return_exceptions=False)
Return a future aggregating results from the given coroutine objects or futures.
(...)
Cancellation: if the outer Future is cancelled, all children (that have not completed yet) are also cancelled. (...)
What am I missing here? How to cancel the remaining tasks?

The problem with your implementation is that it calls sleepers.cancel() after sleepers has already raised. Technically, the future returned by gather() is in a completed state by then, so cancelling it is a no-op.
To correct the code, you just need to cancel the children yourself instead of trusting gather's future to do it. Of course, coroutines are not themselves cancelable, so you need to convert them to tasks first (which gather would do anyway, so you're doing no extra work). For example:
import asyncio

async def main():
    # Wrap the coroutines in tasks up front so each one can be cancelled individually.
    tasks = [asyncio.ensure_future(my_sleep(secs))
             for secs in [2, 5, 7]]
    try:
        await asyncio.gather(*tasks)
    except ErrorThatShouldCancelOtherTasks:
        print('Fatal error; cancelling')
        for t in tasks:
            t.cancel()
    finally:
        await asyncio.sleep(5)
I am very surprised by this behavior since it seems to be contradictory to the documentation[...]
The initial stumbling block with gather is that it doesn't really run tasks; it's just a helper that waits for them to finish. For this reason gather doesn't bother to cancel the remaining tasks if one of them fails with an exception. It just abandons the wait and propagates the exception, leaving the remaining tasks to proceed in the background. This was reported as a bug, but wasn't fixed for backward compatibility and because the behavior is documented and unchanged from the beginning.
But here we have another wart: the documentation explicitly promises being able to cancel the returned future. Your code does exactly that and it doesn't work, without it being obvious why (at least it took me a while to figure it out, and required reading the source). It turns out that the contract of Future actually prevents this from working. By the time you call cancel(), the future returned by gather has already completed, and cancelling a completed future is meaningless; it is just a no-op. (The reason is that a completed future has a well-defined result that could have been observed by outside code. Cancelling it would change its result, which is not allowed.)
In other words, the documentation is not wrong, because cancelling would have worked if you had performed it before await sleepers completed. However, it's misleading, because it appears to allow cancelling gather() in this important use case of one of its awaitables raising, but in reality it doesn't.
Problems like this that pop up when using gather are the reason why many people eagerly await (no pun intended) trio-style nurseries in asyncio.
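To see the no-op for yourself, here is a minimal sketch. The coroutine names fail and slow are made up for illustration, and it assumes Python 3.7+ for asyncio.run. It awaits a gather() that fails, then tries to cancel the already-completed gather future:

```python
import asyncio

async def fail():
    await asyncio.sleep(0.01)
    raise RuntimeError("boom")

async def slow():
    await asyncio.sleep(0.1)
    return "finished"

async def demo():
    slow_task = asyncio.ensure_future(slow())
    group = asyncio.gather(fail(), slow_task)
    try:
        await group
    except RuntimeError:
        pass
    # The gather future already holds the exception, so it is "done".
    was_done = group.done()
    # cancel() returns False when the future could not be cancelled.
    cancel_accepted = group.cancel()
    # The sibling task was never touched and still runs to completion.
    result = await slow_task
    return was_done, cancel_accepted, result

outcome = asyncio.run(demo())
print(outcome)
```

cancel() reports False because there is nothing left to cancel, and slow_task finishes normally in the background.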

You can create your own custom gather function.
This one cancels all its children when any exception occurs:
import asyncio

async def gather(*tasks, **kwargs):
    tasks = [task if isinstance(task, asyncio.Task) else asyncio.create_task(task)
             for task in tasks]
    try:
        return await asyncio.gather(*tasks, **kwargs)
    except BaseException:
        # Cancel every child, then re-raise the original exception
        for task in tasks:
            task.cancel()
        raise

# If a() or b() raises an exception, both are immediately cancelled
a_result, b_result = await gather(a(), b())

What you can do with Python 3.10 (and, probably, earlier versions) is use asyncio.wait. It takes an iterable of awaitables and a condition as to when to return, and when the condition is met, it returns two sets of tasks: completed ones and pending ones. You can have it return on the first exception and then cancel the pending tasks one by one:
async def my_task(x):
    try:
        ...
    except RecoverableError as e:
        ...

tasks = [asyncio.create_task(my_task(x)) for x in xs]
done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
for p in pending:
    p.cancel()
And you can wrap your tasks in try-except re-raising the fatal exceptions and processing not-fatal ones otherwise. It's not gather, but it looks like it does what you want.
https://docs.python.org/3/library/asyncio-task.html#id9
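For a version you can actually run, here is a small sketch of the same idea. The exception classes, the DELAYS table and the task names are invented for the example; the only asyncio calls used are create_task, wait and gather. Recoverable errors are handled inside the task, fatal ones escape and trigger the cancellation of whatever is still pending:

```python
import asyncio

class RecoverableError(Exception):
    pass

class FatalError(Exception):
    pass

# Hypothetical workload: "b" fails recoverably, "c" fails fatally,
# "d" and "e" are still sleeping when "c" fails.
DELAYS = {"a": 0.01, "b": 0.01, "c": 0.05, "d": 1.0, "e": 1.0}

async def my_task(name):
    try:
        await asyncio.sleep(DELAYS[name])
        if name == "b":
            raise RecoverableError(name)
        if name == "c":
            raise FatalError(name)
        return name
    except RecoverableError as e:
        # Non-fatal: handle it here so the other tasks keep going.
        return f"recovered from {e}"

async def main():
    tasks = {name: asyncio.create_task(my_task(name)) for name in DELAYS}
    done, pending = await asyncio.wait(
        list(tasks.values()), return_when=asyncio.FIRST_EXCEPTION)
    for t in pending:
        t.cancel()
    # Give the cancelled tasks a chance to unwind before inspecting them.
    await asyncio.gather(*pending, return_exceptions=True)
    summary = {}
    for name, t in tasks.items():
        if t.cancelled():
            summary[name] = "cancelled"
        elif t.exception() is not None:
            summary[name] = type(t.exception()).__name__
        else:
            summary[name] = t.result()
    return summary

summary = asyncio.run(main())
print(summary)
```

Note that wait() never raises the task's exception itself; you have to look at each task in done to find out what happened.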

Related

Proper way of managing recursive Python async tasks

My understanding of async tasks is that they can be created and the returned task object itself can just be discarded because the task will automatically go on the loop, and then asyncio.all_tasks() can be called later to join them. However, hitting all_tasks() once won't account for any tasks created from tasks, and, as a result, an exception will get raised but not propagated. Even after the task that dispatches its own task is executed, asyncio.all_tasks() still doesn't seem to see it.
So, do we actually need proper task accounting and to make sure to gather/run on all created tasks? It seems like this is partially managed, but the important stuff is not.
Example code:
import asyncio

async def func2():
    print("Second function")
    raise Exception("Inner exception")

async def func1(loop):
    print("First function")
    loop.create_task(func2())

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
awaitable = func1(loop)
loop.create_task(awaitable)
reported_tasks = asyncio.all_tasks(loop=loop)
awaitable = asyncio.gather(*reported_tasks)
loop.run_until_complete(awaitable)
I'm creating the loop myself because my scenario is a non-main thread (so the loop won't be implicitly created) and I don't want there to be some nuance due to this that confuses things due to being left unspoken.
func1() and func2() execute, but I'll merely just get warned about not materializing the exception in func2() unless I manually collect all of the tasks, enumerate them, and check their exception flags.
On the other hand, if I start adding all of the tasks to a list, do I grab that list, get the length, shift that number of items off the front of the list, wait on those to be done, check them for exceptions, and loop until I determine that all tasks have completed? This will potentially include some long-running tasks as well as some failed ones, so maybe I need to figure in a timeout to make sure those exceptions get processed soon after being raised? Is there a more elegant solution?
I looked, but the documentation seems mostly about the API rather than the use-cases/patterns and all of the searches I ran still left these questions unanswered.
The documentation does note that you should retain the created tasks and implies that this is for the reason of reference-counting, but there's no reference to the topics above:
Important
Save a reference to the result of this function, to avoid a
task disappearing mid execution.

The problem with your snippet seems to be only that you call all_tasks before the body of func1 is run, and therefore, before the task that will run func2 is ever created.
I typed some stuff on the interactive interpreter, without all the boilerplate you added there, and all_tasks seems to be "seeing" the grand-daughter task, with no problems, and the exception is also raised in the awaiting code:
In [1]: import asyncio
In [2]: async def func2():
   ...:     print("second function")
   ...:     1 / 0 # raises
   ...:
In [3]: async def func1():
   ...:     print("first function")
   ...:     asyncio.create_task(func2())
   ...:
In [4]: async def main():
   ...:     await func1()
   ...:     z = asyncio.all_tasks()
   ...:     print(z)
   ...:
In [5]: asyncio.run(main())
first function
{<Task pending name='Task-1148' coro=<main() running at <ipython-input-4-489680e1b4c1>:4> cb=[_run_until_complete_cb() at /home/gwidion/.pyenv/versions/3.10-dev/lib/python3.10/asyncio/base_events.py:184]>, <Task pending name='Task-1149' coro=<func2() running at <ipython-input-2-555de5f64f41>:1>>}
second function
Task exception was never retrieved
future: <Task finished name='Task-1149' coro=<func2() done, defined at <ipython-input-2-555de5f64f41>:1> exception=ZeroDivisionError('division by zero')>
Traceback (most recent call last):
File "<ipython-input-2-555de5f64f41>", line 3, in func2
1 / 0 # raises
ZeroDivisionError: division by zero
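If you would rather do the accounting yourself than rely on all_tasks(), one possible pattern is sketched below. The names background, spawn and drain are made up for this example and are not asyncio APIs. Every task is registered when it is created, and the drain loop keeps gathering until no new tasks have appeared, so tasks spawned by other tasks are picked up on the next pass:

```python
import asyncio

# Hypothetical registry of every task spawned through spawn().
background = set()

def spawn(coro):
    """Create a task and remember it so its exception can be collected later."""
    task = asyncio.create_task(coro)
    background.add(task)
    return task

async def func2():
    raise Exception("Inner exception")

async def func1():
    # A task that spawns another task and does not wait for it.
    spawn(func2())

async def drain():
    """Wait for all registered tasks, including ones spawned while waiting."""
    errors = []
    while background:
        batch = list(background)
        background.difference_update(batch)
        results = await asyncio.gather(*batch, return_exceptions=True)
        errors.extend(r for r in results if isinstance(r, BaseException))
    return errors

async def main():
    spawn(func1())
    return await drain()

errors = asyncio.run(main())
print(errors)
```

Because gather() is called with return_exceptions=True, the exception from func2 is retrieved and returned instead of only showing up as a "Task exception was never retrieved" warning.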

Python asyncio cancel unawaited coroutines

So given a bit of a complex setup, which is used to generate a list of queries to be run semi-parallel (using a semaphore to not run too many queries at the same time, to not DDoS the server).
i have an (in itself async) function that creates a number of queries:
async def run_query(self, url):
    async with self.semaphore:
        return await some_http_lib(url)

async def create_queries(self, base_url):
    # ...gathering logic is ofc a bit more complex in the real setting
    urls = await some_http_lib(base_url).json()
    coros = [self.run_query(url) for url in urls] # note: not executed just yet
    return coros

async def execute_queries(self):
    queries = await self.create_queries('/some/url')
    _logger.info(f'prepared {len(queries)} queries')
    results = []
    done = 0
    # note: ofc, in this simple example call these would not actually be asynchronously executed.
    # in the real case i'm using asyncio.gather, this just makes for a slightly better
    # understandable example.
    for query in queries:
        # at this point, the request is actually triggered
        result = await query
        # ...some postprocessing
        if not result['success']:
            raise QueryException(result['message']) # ...internal exception
        done += 1
        _logger.info(f'{done} of {len(queries)} queries done')
        results.append(result)
    return results
Now this works very nicely, executing exactly as i planned and i can handle an exception in one of the queries by aborting the whole operation.
async def run():
    try:
        return await QueryRunner.execute_queries()
    except QueryException:
        _logger.error('something went horribly wrong')
        return None
The only problem is that the program is terminated, but leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such not awaited.
Is there any way to cancel these unawaited coroutines? Would it otherwise be possible to suppress this warning?
[Edit] a bit more context on how the queries are executed outside this simple example:
the queries are usually grouped together, so there are multiple calls to create_queries() with different parameters. Then all collected groups are looped over and the queries are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are cancelled as well, which results in the warning above.

So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
you can call asyncio.create_task(c).cancel()
you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have, for example if you called asyncio.gather(*coros), one of them raised and you want to cancel the rest, you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
    results = await asyncio.gather(*tasks)
finally:
    # if there are unfinished tasks, that is because one of them
    # raised - cancel the rest
    for t in tasks:
        if not t.done():
            t.cancel()
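For the first case (coroutine objects that were never turned into tasks), a minimal sketch of the close() option could look like this. QueryException, run_query and execute_queries mirror the names in the question, but the bodies are stand-ins invented for the example:

```python
import asyncio

class QueryException(Exception):
    pass

async def run_query(n):
    # Stand-in for the real HTTP request; query 1 "fails".
    await asyncio.sleep(0)
    return {"success": n != 1, "id": n}

async def execute_queries():
    queries = [run_query(n) for n in range(4)]  # nothing runs yet
    results = []
    try:
        for query in queries:
            result = await query
            if not result["success"]:
                raise QueryException(f"query {result['id']} failed")
            results.append(result["id"])
    except QueryException:
        pass  # abort the remaining queries
    finally:
        # close() is harmless on coroutines that already finished and
        # marks the never-started ones as closed, which avoids the
        # "coroutine ... was never awaited" warning.
        for query in queries:
            query.close()
    return results

results = asyncio.run(execute_queries())
print(results)
```

Only query 0 completes; queries 2 and 3 are closed without ever running.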

Use
pending = asyncio.Task.all_tasks() # Python < 3.7
or
pending = asyncio.all_tasks() # Python >= 3.7
to get the set of tasks that have not finished yet. You can wait for them with
await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)
or cancel them:
for task in pending:
    task.cancel()

Tasks created with create_task that are never awaited, seem to break expectations of cancelation for child tasks

Imagine we're writing an application which allows a user to run an application (let's say it's a series of important operations against an API) continuously, and can run multiple applications concurrently. Requirements include:
the user can control the number of concurrent applications (which may limit concurrent load against an API, which is often important)
if the OS tries to close the Python program running this thing, it should gracefully terminate, allowing any in-progress applications to complete their run before closing
The question here is specifically about the task manager we've coded, so let's stub out some code that illustrates this problem:
import asyncio
import signal

async def work_chunk():
    """Simulates a chunk of work that can possibly fail"""
    await asyncio.sleep(1)

async def protected_work():
    """All steps of this function MUST complete, the caller should shield it from cancelation."""
    print("protected_work start")
    for i in range(3):
        await work_chunk()
        print(f"protected_work working... {i+1} out of 3 steps complete")
    print("protected_work done... ")

async def subtask():
    print("subtask: starting loop of protected work...")
    cancelled = False
    while not cancelled:
        protected_coro = asyncio.create_task(protected_work())
        try:
            await asyncio.shield(protected_coro)
        except asyncio.CancelledError:
            cancelled = True
            await protected_coro
    print("subtask: cancelation complete")

async def subtask_manager():
    """
    Manage a pool of subtask workers.
    (In the real world, the user can dynamically change the concurrency, but here we'll
    hard code it at 3.)
    """
    tasks = {}
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        await asyncio.sleep(5)

def shutdown(signal, main_task):
    """Cleanup tasks tied to the service's shutdown."""
    print(f"Received exit signal {signal.name}. Scheduling cancelation:")
    main_task.cancel()

async def main():
    print("main... start")
    coro = asyncio.ensure_future(subtask_manager())
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, coro))
    loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, coro))
    await coro
    print("main... done")

def run():
    asyncio.run(main())

run()
subtask_manager manages a pool of workers, periodically looking up what the present concurrency requirement is and updating the number of active workers appropriately (note that the code above cuts out most of that, and just hard codes a number, since it isn't important to the question).
subtask is the worker loop itself, which continuously runs protected_work() until someone cancels it.
But this code is broken. When you give it a SIGINT, the whole thing immediately crashes.
Before I explain further, let me point you at a critical bit of code:
1 protected_coro = asyncio.create_task(protected_work())
2 try:
3 await asyncio.shield(protected_coro)
4 except asyncio.CancelledError:
5 cancelled = True
6 await protected_coro # <-- This will raise CancelledError too!
After some debugging, we find that our try/except block isn't working. We find that both line 3 AND line 6 raise CancelledError.
When we dig in further, we find that ALL "await" calls throw CancelledError after the subtask manager is canceled, not just the line noted above. (i.e., the second line of work_chunk(), await asyncio.sleep(1), and the 4th line of protected_work(), await work_chunk(), also raise CancelledError.)
What's going on here?
It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
Why?
Clearly, I don't understand how cancelation propagation works in Python. I've struggled to find documentation on how it works. Can someone describe to me how cancelation is propagated in a clear-minded way that explains the behavior found in the example above?

After looking at this problem for a long time, and experimenting with other code snippets (where cancelation propagation works as expected), I started to wonder if the problem is Python doesn't know the order of propagation here, in this case.
But why?
Well, subtask_manager creates tasks, but doesn't await them.
Could it be that Python doesn't assume that the coroutine that created that task (with create_task) owns that task? I think Python uses the await keyword exclusively to know in what order to propagate cancelation, and if after traversing the whole tree of tasks it finds tasks that still haven't been canceled, it just destroys them all.
Therefore, it's up to us to manage Task cancelation propagation ourselves, in any place where we know we haven't awaited an async task. So, we need to refactor subtask_manager to catch its own cancelation, and explicitly cancel and then await all its child tasks:
async def subtask_manager():
    """
    Manage a pool of subtask workers.
    (In the real world, the user can dynamically change the concurrency, but here we'll
    hard code it at 3.)
    """
    tasks = {}
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        try:
            await asyncio.sleep(5)
        except asyncio.CancelledError:
            print("cancelation detected, canceling children")
            [t.cancel() for t in tasks.values()]
            await asyncio.gather(*[t for t in tasks.values()])
            return
Now our code works as expected:
Note: I've answered my own question Q&A style, but I still feel unsatisfied with my textual answer about how cancelation propagation works. If anyone has a better explanation of how cancelation propagation works, I would love to read it.

What's going on here? It would seem that Python, for some reason, isn't propagating cancelation as you would expect, and just throws up its hands and says "I'm canceling everything now".
TL;DR Canceling everything is precisely what's happening, simply because the event loop is exiting.
To investigate this, I changed the invocation of add_signal_handler() to loop.call_later(.5, lambda: shutdown(signal.SIGINT, coro)). Python's Ctrl+C handling has odd corners, and I wanted to check whether the strange behavior is the result of that. But the bug was perfectly reproducible without signals, so it wasn't that.
And yet, asyncio cancellation really shouldn't work like your code shows. Canceling a task propagates to the future (or another task) it awaits, but shield is specifically implemented to circumvent that. It creates and returns a fresh future, and connects the result of the original (shielded) future to the new one in a way that cancel() doesn't know how to follow.
It took me some time to unearth what really happens, and that is:
await coro at the end of main awaits the task that gets cancelled, so it gets a CancelledError as soon as shutdown cancels it;
the exception causes main to exit and enters the cleanup sequence at the end of asyncio.run(). This cleanup sequence cancels all tasks, including the ones you've shielded.
You can test it by changing await coro at the end of main() to:
try:
    await coro
finally:
    print('main... done')
And you will see that "main... done" is printed prior to all the mysterious cancellations you've been witnessing.
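The cleanup behavior of asyncio.run() can be reproduced in isolation. In this small sketch (the events list and the background coroutine are made up for the demonstration), main() returns while a task is still sleeping, and asyncio.run() cancels that task on the way out:

```python
import asyncio

events = []

async def background():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("background cancelled")
        raise

async def main():
    task = asyncio.create_task(background())  # spawned, never awaited
    await asyncio.sleep(0.01)                 # let the task start sleeping
    events.append("main done")

# When main() returns, asyncio.run() cancels every task that is still pending.
asyncio.run(main())
print(events)
```

The order of the two entries matches what the question observed: main finishes first, and only then do the cancellations arrive.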
So that clears the mystery and to fix the issue, you should postpone exiting main until everything is done. For example, you can create the tasks dict in main, pass it to subtask_manager(), and then await those critical tasks when the main task gets cancelled:
async def subtask_manager(tasks):
    while True:
        for i in range(3):
            task = tasks.get(i)
            if not task or task.done():
                tasks[i] = asyncio.create_task(subtask())
        try:
            await asyncio.sleep(5)
        except asyncio.CancelledError:
            for t in tasks.values():
                t.cancel()
            raise

# ... shutdown unchanged

async def main():
    print("main... start")
    tasks = {}
    main_task = asyncio.ensure_future(subtask_manager(tasks))
    loop = asyncio.get_running_loop()
    loop.add_signal_handler(signal.SIGINT, lambda: shutdown(signal.SIGINT, main_task))
    loop.add_signal_handler(signal.SIGTERM, lambda: shutdown(signal.SIGTERM, main_task))
    try:
        await main_task
    except asyncio.CancelledError:
        await asyncio.gather(*tasks.values())
    finally:
        print("main... done")
Note that the main task must explicitly cancel its subtasks because that wouldn't happen automatically. Cancellation is propagated through a chain of awaits, and subtask_manager doesn't explicitly await its subtasks; it just spawns them and awaits something else, effectively shielding them.
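To convince yourself that shield() does work as long as the event loop stays alive, here is a stripped-down sketch. The names critical and worker and the log list are invented for the example. The worker is cancelled while it waits on a shielded task, the inner task still finishes, and only then does the cancellation reach the caller:

```python
import asyncio

async def critical(log):
    await asyncio.sleep(0.1)
    log.append("critical finished")

async def worker(log):
    inner = asyncio.create_task(critical(log))
    try:
        await asyncio.shield(inner)
    except asyncio.CancelledError:
        log.append("worker cancelled")
        # shield() kept the cancellation away from `inner`, so we can
        # still wait for it to complete before giving up.
        await inner
        raise

async def main():
    log = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0.01)  # let the worker reach the shield
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("main saw cancellation")
    return log

log = asyncio.run(main())
print(log)
```

The important difference from the question's code is that main() here does not return until the worker has finished unwinding, so asyncio.run() has nothing left to cancel.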

When is the right time to call loop.close()?

I have been experimenting with asyncio for a little while and read the PEPs; a few tutorials; and even the O'Reilly book.
I think I got the hang of it, but I'm still puzzled by the behavior of loop.close() which I can't quite figure out when it is "safe" to invoke.
Distilled to its simplest, my use case is a bunch of blocking "old school" calls, which I wrap in the run_in_executor() and an outer coroutine; if any of those calls goes wrong, I want to stop progress, cancel the ones still outstanding, print a sensible log and then (hopefully, cleanly) get out of the way.
Say, something like this:
import asyncio
import time

def blocking(num):
    time.sleep(num)
    if num == 2:
        raise ValueError("don't like 2")
    return num

async def my_coro(loop, num):
    try:
        result = await loop.run_in_executor(None, blocking, num)
        print(f"Coro {num} done")
        return result
    except asyncio.CancelledError:
        # Do some cleanup here.
        print(f"man, I was canceled: {num}")

def main():
    loop = asyncio.get_event_loop()
    tasks = []
    for num in range(5):
        tasks.append(loop.create_task(my_coro(loop, num)))
    try:
        # No point in waiting; if any of the tasks go wrong, I
        # just want to abandon everything. The ALL_DONE is not
        # a good solution here.
        future = asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
        done, pending = loop.run_until_complete(future)
        if pending:
            print(f"Still {len(pending)} tasks pending")
            # I tried putting a stop() - with/without a run_forever()
            # after the for - same exception raised.
            # loop.stop()
            for future in pending:
                future.cancel()
        for task in done:
            res = task.result()
            print("Task returned", res)
    except ValueError as error:
        print("Outer except --", error)
    finally:
        # I also tried placing the run_forever() here,
        # before the stop() - no dice.
        loop.stop()
        if pending:
            print("Waiting for pending futures to finish...")
            loop.run_forever()
        loop.close()
I tried several variants of the stop() and run_forever() calls, the "run_forever first, then stop" seems to be the one to use according to the pydoc and, without the call to close() yields a satisfying:
Coro 0 done
Coro 1 done
Still 2 tasks pending
Task returned 1
Task returned 0
Outer except -- don't like 2
Waiting for pending futures to finish...
man, I was canceled: 4
man, I was canceled: 3
Process finished with exit code 0
However, when the call to close() is added (as shown above) I get two exceptions:
exception calling callback for <Future at 0x104f21438 state=finished returned int>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 324, in _invoke_callbacks
callback(self)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/futures.py", line 414, in _call_set_state
dest_loop.call_soon_threadsafe(_set_state, destination, source)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 620, in call_soon_threadsafe
self._check_closed()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/base_events.py", line 357, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
which is at best annoying, but to me, totally puzzling: and, to make matters worse, I've been unable to figure out what would be The Right Way of handling such a situation.
Thus, two questions:
what am I missing? how should I modify the code above in a way that with the call to close() included does not raise?
what actually happens if I don't call close() - in this trivial case, I presume it's largely redundant; but what might the consequences be in a "real" production code?
For my own personal satisfaction, also:
why does it raise at all? what more does the loop want from the coros/tasks: they either exited; raised; or were canceled: isn't this enough to keep it happy?
Many thanks in advance for any suggestions you may have!

Distilled to its simplest, my use case is a bunch of blocking "old school" calls, which I wrap in the run_in_executor() and an outer coroutine; if any of those calls goes wrong, I want to stop progress, cancel the ones still outstanding
This can't work as envisioned because run_in_executor submits the function to a thread pool, and OS threads can't be cancelled in Python (or in other languages that expose them). Canceling the future returned by run_in_executor will attempt to cancel the underlying concurrent.futures.Future, but that will only have effect if the blocking function is not yet running, e.g. because the thread pool is busy. Once it starts to execute, it cannot be safely cancelled. Support for safe and reliable cancellation is one of the benefits of using asyncio compared to threads.
If you are dealing with synchronous code, be it a legacy blocking call or longer-running CPU-bound code, you should run it with run_in_executor and incorporate a way to interrupt it. For example, the code could occasionally check a stop_requested flag and exit if that is true, perhaps by raising an exception. Then you can "cancel" those tasks by setting the appropriate flag or flags.
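As a rough sketch of that idea, the blocking function below polls a threading.Event between steps. The stop_requested parameter and the step structure are assumptions made for this example; real code would check the flag wherever it is safe to stop:

```python
import asyncio
import threading
import time

def blocking(stop_requested, steps):
    """Pretend to do `steps` units of work, checking the flag between units."""
    for finished in range(steps):
        if stop_requested.is_set():
            return f"aborted after {finished} steps"
        time.sleep(0.01)
    return "completed"

async def main():
    loop = asyncio.get_running_loop()
    stop_requested = threading.Event()
    future = loop.run_in_executor(None, blocking, stop_requested, 1000)
    await asyncio.sleep(0.05)
    # "Cancel" the work by setting the flag instead of cancelling the future.
    stop_requested.set()
    # The worker notices the flag and returns promptly, so awaiting is cheap.
    return await future

result = asyncio.run(main())
print(result)
```

Because the future is awaited rather than cancelled, the worker thread has reported back before the loop shuts down.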
how should I modify the code above in a way that with the call to close() included does not raise?
As far as I can tell, there is currently no way to do so without modifications to blocking and the top-level code. run_in_executor will insist on informing the event loop of the result, and this fails when the event loop is closed. It doesn't help that the asyncio future is cancelled, because the cancellation check is performed in the event loop thread, and the error occurs before that, when call_soon_threadsafe is called by the worker thread. (It might be possible to move the check to the worker thread, but it should be carefully analyzed whether it leads to a race condition between the call to cancel() and the actual check.)
why does it raise at all? what more does the loop want from the coros/tasks: they either exited; raised; or were canceled: isn't this enough to keep it happy?
It wants the blocking functions passed to run_in_executor (literally called blocking in the question) that have already been started to finish running before the event loop is closed. You cancelled the asyncio future, but the underlying concurrent future still wants to "phone home", finding the loop closed.
It is not obvious whether this is a bug in asyncio, or if you are simply not supposed to close an event loop until you somehow ensure that all work submitted to run_in_executor is done. Doing so requires the following changes:
Don't attempt to cancel the pending futures. Canceling them looks correct superficially, but it prevents you from being able to wait() for those futures, as asyncio will consider them complete.
Instead, send an application-specific event to your background tasks informing them that they need to abort.
Call loop.run_until_complete(asyncio.wait(pending)) before loop.close().
With these modifications (except for the application-specific event - I simply let the sleep()s finish their course), the exception did not appear.
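A condensed sketch of the modified flow is shown below. It shortens the sleeps, drops the prints, and uses asyncio.new_event_loop() instead of get_event_loop(); those are my simplifications, not part of the original code. The key point is that the pending tasks are waited for, not cancelled, before the loop is closed:

```python
import asyncio
import time

def blocking(num):
    time.sleep(0.02 * num)
    if num == 2:
        raise ValueError("don't like 2")
    return num

async def my_coro(loop, num):
    return await loop.run_in_executor(None, blocking, num)

def main():
    loop = asyncio.new_event_loop()
    try:
        tasks = [loop.create_task(my_coro(loop, num)) for num in range(5)]
        done, pending = loop.run_until_complete(
            asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION))
        if pending:
            # Do not cancel: let the worker threads report back first.
            loop.run_until_complete(asyncio.wait(pending))
        results = sorted(t.result() for t in tasks if t.exception() is None)
        errors = [str(t.exception()) for t in tasks if t.exception() is not None]
        return results, errors
    finally:
        # Every executor job has finished, so nothing will call into a closed loop.
        loop.close()

results, errors = main()
print(results, errors)
```

In real code the second wait would be paired with the application-specific abort signal mentioned above, so that it does not have to sit through the full duration of the remaining calls.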
what actually happens if I don't call close() - in this trivial case, I presume it's largely redundant; but what might the consequences be in a "real" production code?
Since a typical event loop runs as long as the application, there should be no issue in not calling close() at the very end of the program. The operating system will clean up the resources on program exit anyway.
Calling loop.close() is important for event loops that have a clear lifetime. For example, a library might create a fresh event loop for a specific task, run it in a dedicated thread, and dispose of it. Failing to close such a loop could leak its internal resources (such as the pipe it uses for inter-thread wakeup) and cause the program to fail. Another example is test suites, which often start a new event loop for each unit test to ensure separation of test environments.
EDIT: I filed a bug for this issue.
EDIT 2: The bug was fixed by devs.

Until the upstream issue is fixed, another way to work around the problem is by replacing the use of run_in_executor with a custom version without the flaw. While rolling one's own run_in_executor sounds like a bad idea at first, it is in fact only a small glue between a concurrent.futures and an asyncio future.
A simple version of run_in_executor can be cleanly implemented using the public API of those two classes:
import asyncio
import functools

def run_in_executor(executor, fn, *args):
    """Submit FN to EXECUTOR and return an asyncio future."""
    loop = asyncio.get_event_loop()
    if args:
        fn = functools.partial(fn, *args)
    work_future = executor.submit(fn)
    aio_future = loop.create_future()
    aio_cancelled = False

    def work_done(_f):
        # Runs in the worker thread; skip the loop entirely if already cancelled
        if not aio_cancelled:
            loop.call_soon_threadsafe(set_result)

    def check_cancel(_f):
        nonlocal aio_cancelled
        if aio_future.cancelled():
            work_future.cancel()
            aio_cancelled = True

    def set_result():
        if work_future.cancelled():
            aio_future.cancel()
        elif work_future.exception() is not None:
            aio_future.set_exception(work_future.exception())
        else:
            aio_future.set_result(work_future.result())

    work_future.add_done_callback(work_done)
    aio_future.add_done_callback(check_cancel)
    return aio_future
When loop.run_in_executor(blocking) is replaced with run_in_executor(executor, blocking), executor being a ThreadPoolExecutor created in main(), the code works without other modifications.
Of course, in this variant the synchronous functions will continue running in the other thread to completion despite being canceled -- but that is unavoidable without modifying them to support explicit interruption.

Python asyncio future add_done_callback then cancel the future

I have a problem understanding how asyncio works.
If I create a future with future = asyncio.Future(), then add add_done_callback(done_callback), and after that cancel the future
with future.cancel(), isn't done_callback supposed to be fired?
I tried to use loop.run_forever() but I end up with an infinite loop.
I have a small example code:
_future_set = asyncio.Future()

def done_callback(f):
    if f.exception():
        _future_set.set_exception(f.exception)
    elif f.cancelled():
        _future_set.cancel()
    else:
        _future_set.set_result(None)

async def check_stats(future):
    while not future.cancelled():
        print("not done")
        continue
    loop.stop()

def set(future):
    if not _future_set.done():
        future.add_done_callback(done_callback)

loop = asyncio.new_event_loop()
future = loop.create_future()
asyncio.ensure_future(check_stats(future), loop=loop)
set(future)
future.cancel()
loop.run_forever() # infinite
print(_future_set.cancelled())
I know something is missing and maybe this is not the intended behavior, but I would be happy for a little help here.
I am using Python 3.6.
**update
After set is called and done_callback is bound to the future with add_done_callback,
when I cancel the future and its state changes to cancelled and done, I expect that _future_set will be cancelled too,
and that print(_future_set.cancelled()) will be True.

From the docstring (on my unix system) help(loop.run_forever):
run_forever() method of asyncio.unix_events._UnixSelectorEventLoop instance
Run until stop() is called.
When you call loop.run_forever() the program will not progress beyond that line until stop() is called on the ioloop instance, and there's nothing in your code doing so.
loop.run_forever() is essentially doing:
def run_forever(self):
    while not self.stopped:
        self.check_for_things_to_do()
Without knowing a little more about what you're trying to achieve, it's hard to help you further. However, it seems that you're expecting loop.run_forever() to return while the rest of your Python code carries on, and this is not the case. The IOLoop will keep looping, checking file events and firing callbacks on futures, and will only return to the point where it was called once it is told to stop looping.
Ah, I realise now what you're expecting to happen. You need to register the futures with the ioloop, either by doing future = loop.create_future() or future = asyncio.Future(loop=loop). The former is the preferred method for creating futures. N.B. the code will still run forever at the loop.run_forever() call unless it is stopped, so your print statement will still never be reached.
Further addendum: If you actually run the code you have in your question, there is an exception being raised at f.exception(), which is as per the docs:
exception()
Return the exception that was set on this future.
The exception (or None if no exception was set) is returned only if the future is done. If the future has been cancelled, raises CancelledError. If the future isn’t done yet, raises InvalidStateError.
This means that the invocation of done_callback() is being stopped at the first if f.exception(). So if you switch done_callback() around to check for cancellation first (and actually call f.exception() when passing the exception on):
def done_callback(f):
    if f.cancelled():
        _future_set.cancel()
    elif f.exception():
        _future_set.set_exception(f.exception())
    else:
        _future_set.set_result(None)
Then you get the expected output.
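Here is a self-contained sketch of the reordered callback that runs on current Python versions. The helper name mirror and the use of asyncio.run are my own choices for the example, not part of the question's code. It also shows that calling exception() on a cancelled future raises CancelledError, which is why the order of the checks matters:

```python
import asyncio

def mirror(source, target):
    """Copy the outcome of `source` into `target`, checking cancellation first."""
    if source.cancelled():
        target.cancel()
    elif source.exception() is not None:
        target.set_exception(source.exception())
    else:
        target.set_result(source.result())

async def main():
    loop = asyncio.get_running_loop()
    source = loop.create_future()
    target = loop.create_future()
    source.add_done_callback(lambda f: mirror(f, target))
    source.cancel()
    # Done callbacks are scheduled on the loop, so yield once to let them run.
    await asyncio.sleep(0)
    try:
        source.exception()
        raised = False
    except asyncio.CancelledError:
        raised = True
    return target.cancelled(), raised

outcome = asyncio.run(main())
print(outcome)
```

Both futures are created with loop.create_future(), so they belong to the loop that actually runs the callbacks.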
