Question
Using Python's asyncio module, how do I select the first result from multiple coroutines?
Example
I might want to implement a timeout on waiting on a queue:
result = yield from select(asyncio.sleep(1),
                           queue.get())
Analogous Operations
This would be similar to Go's select or Clojure's core.async.alt!. It is something like the converse of asyncio.gather (gather is like all; select would be like any).
A simple solution is to use asyncio.wait with its FIRST_COMPLETED parameter:
import asyncio

async def something_to_wait():
    await asyncio.sleep(1)
    return "something_to_wait"

async def something_else_to_wait():
    await asyncio.sleep(2)
    return "something_else_to_wait"

async def wait_first():
    done, pending = await asyncio.wait(
        [something_to_wait(), something_else_to_wait()],
        return_when=asyncio.FIRST_COMPLETED)
    print("done", done)
    print("pending", pending)

asyncio.get_event_loop().run_until_complete(wait_first())
gives:
done {<Task finished coro=<something_to_wait() done, defined at stack.py:3> result='something_to_wait'>}
pending {<Task pending coro=<something_else_to_wait() running at stack.py:8> wait_for=<Future pending cb=[Task._wakeup()]>>}
Task was destroyed but it is pending!
task: <Task pending coro=<something_else_to_wait() running at stack.py:8> wait_for=<Future pending cb=[Task._wakeup()]>>
You can implement this using both asyncio.wait and asyncio.as_completed:
import asyncio

async def ok():
    await asyncio.sleep(1)
    return 5

async def select1(*futures, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    return (await next(asyncio.as_completed(futures)))

async def select2(*futures, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    done, running = await asyncio.wait(futures,
                                       return_when=asyncio.FIRST_COMPLETED)
    result = done.pop()
    return result.result()

async def example():
    queue = asyncio.Queue()
    result = await select1(ok(), queue.get())
    print('got {}'.format(result))
    result = await select2(queue.get(), ok())
    print('got {}'.format(result))

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(example())
Output:
got 5
got 5
Task was destroyed but it is pending!
task: <Task pending coro=<get() done, defined at /usr/lib/python3.4/asyncio/queues.py:170> wait_for=<Future pending cb=[Task._wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.4/asyncio/tasks.py:463]>
Task was destroyed but it is pending!
task: <Task pending coro=<get() done, defined at /usr/lib/python3.4/asyncio/queues.py:170> wait_for=<Future pending cb=[Task._wakeup()]>>
Both implementations return the value awaited from the first completed Future, but you can easily tweak them to return the Future itself instead. Note that because the other Future passed to each select implementation is never awaited, a warning is raised when the process exits.
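For instance, a sketch of a future-returning variant of select2 (only the return value changes):

async def select2_future(*futures):
    # Like select2, but return the finished future itself so the caller
    # can tell which input won and call .result() on it
    done, running = await asyncio.wait(futures,
                                       return_when=asyncio.FIRST_COMPLETED)
    return done.pop()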
In the case of wanting to apply a timeout to a task, there is a standard library function that does exactly this: asyncio.wait_for(). Your example can be written like this:
try:
    result = await asyncio.wait_for(queue.get(), timeout=1)
except asyncio.TimeoutError:
    # This block will execute if queue.get() takes more than 1s.
    result = ...
But this only works for the specific case of a timeout. The other two answers here generalize to any arbitrary set of tasks, but neither of those answers shows how to clean up the tasks which don't finish first. This is what causes the "Task was destroyed but it is pending" messages in the output. In practice, you should do something with those pending tasks. Based on your example, I'll assume that you don't care about the other tasks' results. Here's an example of a wait_first() function that returns the value of the first completed task and cancels the remaining tasks.
import asyncio, random

async def foo(x):
    r = random.random()
    print('foo({:d}) sleeping for {:0.3f}'.format(x, r))
    await asyncio.sleep(r)
    print('foo({:d}) done'.format(x))
    return x

async def wait_first(*futures):
    ''' Return the result of the first future to finish. Cancel the remaining
    futures. '''
    done, pending = await asyncio.wait(futures,
                                       return_when=asyncio.FIRST_COMPLETED)
    gather = asyncio.gather(*pending)
    gather.cancel()
    try:
        await gather
    except asyncio.CancelledError:
        pass
    return done.pop().result()

async def main():
    result = await wait_first(foo(1), foo(2))
    print('the result is {}'.format(result))

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Running this example:
# export PYTHONASYNCIODEBUG=1
# python3 test.py
foo(1) sleeping for 0.381
foo(2) sleeping for 0.279
foo(2) done
the result is 2
# python3 test.py
foo(1) sleeping for 0.048
foo(2) sleeping for 0.515
foo(1) done
the result is 1
# python3 test.py
foo(1) sleeping for 0.396
foo(2) sleeping for 0.188
foo(2) done
the result is 2
There are no error messages about pending tasks, because each pending task has been cleaned up correctly.
In practice, you probably want wait_first() to return the future, not the future's result, otherwise it will be really confusing trying to figure out which future finished. But in the example here, I returned the future's result since it looks a little cleaner.
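For example, assuming wait_first() is changed to end with return done.pop() instead of return done.pop().result(), a sketch of the call site:

async def main():
    # Wrap the coroutines in tasks up front so the winner can be identified
    # by identity; asyncio.wait hands back the same task objects it was given
    t1 = asyncio.ensure_future(foo(1))
    t2 = asyncio.ensure_future(foo(2))
    winner = await wait_first(t1, t2)
    which = 1 if winner is t1 else 2
    print('foo({}) finished first with {}'.format(which, winner.result()))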
Here's a more robust solution, based on the earlier examples, that deals with the following:
- Uses recursion to return the first non-null result (the earlier examples return the first result whether it is null or not)
- Returns the first non-null result even if another task raises an exception
- In the event that only null results are returned and an exception is raised, the last exception is raised
- Deals with multiple tasks completing at the same time; in practice this would be very rare, but it can pop up in unit tests where fake async tasks complete immediately
Note this example requires Python 3.8 because it uses the walrus (assignment expression) operator.
import asyncio

async def wait_first(*tasks):
    """Return the result of the first async task to complete with a non-null result"""
    # Get first completed task(s)
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)

    # Tasks MAY complete at the same time, e.g. in unit tests :)
    # Coalesce the first result if present
    for task in done:
        exception = task.exception()
        if exception is None and (result := task.result()):
            break
    else:
        result = None

    # Gather remaining tasks without raising exceptions
    gather = asyncio.gather(*pending, return_exceptions=True)

    # Cancel remaining tasks if result is non-null, otherwise await next pending tasks
    if result:
        gather.cancel()
    elif pending:
        result = await wait_first(*pending)

    # Await remaining tasks to ensure they are cancelled
    try:
        await gather
    except asyncio.CancelledError:
        pass

    # Return result or raise last exception if no result was returned
    if exception and result is None:
        raise exception
    else:
        return result
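A brief usage sketch (the maybe() coroutine here is made up for illustration):

async def maybe(value, delay):
    # Return a possibly-null result after a delay
    await asyncio.sleep(delay)
    return value

async def demo():
    # The null (None) result arrives first and is skipped;
    # the later non-null result wins
    result = await wait_first(
        asyncio.ensure_future(maybe(None, 0.1)),
        asyncio.ensure_future(maybe('real', 0.2)))
    print(result)  # real

asyncio.run(demo())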
Related
My Source Code:
import asyncio

async def mycoro(number):
    print(f'Starting {number}')
    await asyncio.sleep(1)
    print(f'Finishing {number}')
    return str(number)

c = mycoro(3)
task = asyncio.create_task(c)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
loop.close()
Error:
RuntimeError: no running event loop
sys:1: RuntimeWarning: coroutine 'mycoro' was never awaited
I was following a tutorial; according to the error, my coroutine was never awaited, even though I do await it, and the same code clearly works in the video I was watching.
Simply run the coroutine directly without creating a task for it:
import asyncio

async def mycoro(number):
    print(f'Starting {number}')
    await asyncio.sleep(1)
    print(f'Finishing {number}')
    return str(number)

c = mycoro(3)
loop = asyncio.get_event_loop()
loop.run_until_complete(c)
loop.close()
The purpose of asyncio.create_task is to create an additional task from inside a running task. Because it schedules the new task directly on the running loop, it must be called inside a running event loop, hence the error when it is used outside one.
Use loop.create_task(c) if a task must be created from outside a task.
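For illustration, a minimal sketch of that approach (same mycoro as above):

c = mycoro(3)
loop = asyncio.get_event_loop()
task = loop.create_task(c)  # schedule on this loop before it starts running
loop.run_until_complete(task)
loop.close()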
In more recent versions of asyncio, use asyncio.run to avoid having to handle the event loop explicitly:
c = mycoro(3)
asyncio.run(c)
In general, use asyncio.create_task only to increase concurrency. Avoid using it when another task would block immediately.
# bad task usage: concurrency stays the same due to blocking
async def bad_task():
    task = asyncio.create_task(mycoro(0))
    await task

# no task usage: concurrency stays the same due to stacking
async def no_task():
    await mycoro(0)

# good task usage: concurrency is increased via multiple tasks
async def good_task():
    tasks = [asyncio.create_task(mycoro(i)) for i in range(3)]
    print('Starting done, sleeping now...')
    await asyncio.sleep(1.5)
    await asyncio.gather(*tasks)  # ensure subtasks finished
Alternatively, change the line that creates the task to:
task = asyncio.Task(c)
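For context, the full script with that change might look like this (a sketch; on older Python versions, constructing asyncio.Task directly attaches it to the loop returned by get_event_loop, so it works outside a running loop):

import asyncio

async def mycoro(number):
    print(f'Starting {number}')
    await asyncio.sleep(1)
    print(f'Finishing {number}')
    return str(number)

loop = asyncio.get_event_loop()
task = asyncio.Task(mycoro(3))  # attaches to the current (not yet running) loop
loop.run_until_complete(task)
loop.close()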
As I understand it, when I call create_task(), the task is put at the end of the event loop's queue.
My use case is the following: I have several tasks made from the same coroutine, and I want to cancel all of them when some failure condition occurs. This is the pattern:
async def coro(params):
    # long running task...
    if failed_condition:
        await cancel_all()  # should cancel all tasks made of coro

async def cancel_all():
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks)  # wait for cancel completion
    print("All tasks cancelled")

tasks = []

async def main():
    tasks = [loop.create_task(coro(params)) for x in range(5)]
    asyncio.gather(*tasks)
The problem is that cancel_all() is itself awaited by one of the tasks, so it ends up cancelling itself.
How can I solve this?
I could use loop.create_task(cancel_all()) instead, but I want cancel_all() to run as soon as possible.
cancel_all() could exclude the current task:
async def cancel_all():
    to_cancel = set(tasks)
    to_cancel.discard(asyncio.current_task())
    for task in to_cancel:
        task.cancel()
    await asyncio.gather(*to_cancel)
You can use asyncio.wait with the FIRST_EXCEPTION parameter.
import asyncio
import random

class UnexpectedCondition(Exception):
    pass

async def coro(condition):
    # long running task...
    await asyncio.sleep(random.random() * 10)
    if condition:
        raise UnexpectedCondition("Failed")
    return "Result"

async def main(f):
    tasks = [coro(f(x)) for x in range(5)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for p in pending:
        # unfinished tasks will be cancelled
        print("Cancel")
        p.cancel()
    for d in done:
        try:
            # get result from finished tasks
            print(d.result())
        except UnexpectedCondition as e:
            # handle tasks with unexpected conditions
            print(e)

asyncio.run(main(lambda x: x % 2 == 0))
I'm toying around with asyncio and can't achieve what I'm after. Here's my code.
import random
import asyncio

async def my_long_operation(s: str):
    print("Starting operation on", s)
    await asyncio.sleep(3)
    print("End", s)

def random_string(length: int = 10):
    alphabet = \
        [chr(x) for x in range(ord('a'), ord('z') + 1)]
    result = ""
    for i in range(length):
        result += random.choice(alphabet)
    return result

def get_strings(n=10):
    for i in range(n):
        s = random_string()
        yield s

async def several(loop):
    tasks = list()
    for s in get_strings():
        task = asyncio.create_task(my_long_operation(s))
        asyncio.ensure_future(task, loop=loop)
    print("OK")

loop = asyncio.get_event_loop()
loop.run_until_complete(several(loop))
# loop.run_forever()
loop.close()
Now my problem is the following.
I'd like to run all of my my_long_operations concurrently, waiting for all of them to finish. The thing is that when I run the code, I get the following error:
Task was destroyed but it is pending!
task: <Task pending coro=<my_long_operation() done, defined at test.py:4> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f75274f7168>()]>>
Which seems to make sense, since my script ended right after starting these operations (without waiting for them to complete).
So we arrive at my actual question: how am I supposed to do this? How can I start these tasks and wait for them to complete before terminating the script, without using run_forever (since run_forever never returns, the script would never exit)?
Thanks!
ensure_future is basically the "put it in the event loop and then don't bother me with it" method. What you want instead is to await the completion of all your async functions. For that, use asyncio.wait if you're not particularly interested in the results, or asyncio.gather if you want the results:
tasks = map(my_long_operation, get_strings())
await asyncio.wait(list(tasks), loop=loop)
print('OK')
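If you do want the results, a gather-based variant of several() might look like this (a sketch; my_long_operation in the question returns nothing, so here the gathered list would just contain Nones):

async def several():
    # gather runs all coroutines concurrently and returns their
    # results in input order
    results = await asyncio.gather(
        *(my_long_operation(s) for s in get_strings()))
    print('OK', results)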
Reproducible error
I tried to reproduce the error in an online REPL here. However, it is not exactly the same implementation (and hence behavior) as my real code (where I do async for response in position_stream(), instead of for position in count() in the REPL).
More details on my actual implementation
I define somewhere a coroutine like so:
async def position(self):
    request = telemetry_pb2.SubscribePositionRequest()
    position_stream = self._stub.SubscribePosition(request)
    try:
        async for response in position_stream:
            yield Position.translate_from_rpc(response)
    finally:
        position_stream.cancel()
where position_stream is infinite (or possibly very long lasting). I use it from an example code like this:
async def print_altitude():
    async for position in drone.telemetry.position():
        print(f"Altitude: {position.relative_altitude_m}")
and print_altitude() is run on the loop with:
asyncio.ensure_future(print_altitude())
asyncio.get_event_loop().run_forever()
That works well. Now, at some point, I'd like to close the stream from the caller. I thought that I could just run asyncio.ensure_future(loop.shutdown_asyncgens()) and wait for my finally clause above to get called, but it doesn't happen.
Instead, I receive a warning on an unretrieved exception:
Task exception was never retrieved
future: <Task finished coro=<print_altitude() done, defined at [...]
Why is that, and how can I make it such that all my async generators actually get closed (and run their finally clause)?
First of all, if you stop a loop, none of your coroutines will have a chance to shut down properly. Calling close basically means irreversibly destroying the loop.
If you do not care what happens to those running tasks, you can simply cancel them all, this will stop asynchronous generators as well:
import asyncio
from contextlib import suppress

async def position_stream():
    while True:
        await asyncio.sleep(1)
        yield 0

async def print_position():
    async for position in position_stream():
        print(f'position: {position}')

async def cleanup_awaiter():
    await asyncio.sleep(3)
    print('cleanup!')

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    try:
        asyncio.ensure_future(print_position())
        asyncio.ensure_future(print_position())
        loop.run_until_complete(cleanup_awaiter())

        # get all running tasks:
        tasks = asyncio.gather(*asyncio.Task.all_tasks())
        # schedule throwing CancelledError into them:
        tasks.cancel()
        # allow them to process the exception and be cancelled:
        with suppress(asyncio.CancelledError):
            loop.run_until_complete(tasks)
    finally:
        print('closing loop')
        loop.close()
Currently, I have an inefficient synchronous generator that makes many HTTP requests in sequence and yields the results. I'd like to use asyncio and aiohttp to parallelise the requests and thereby speed up this generator, but I want to keep it as an ordinary generator (not a PEP 525 async generator) so that the non-async code that calls it doesn't need to be modified. How can I create such a generator?
asyncio.as_completed() takes an iterable of coroutines or futures and returns an iterable of futures in the order that the input futures complete. Normally, you'd loop over its result and await the members from inside an async function...
import asyncio

async def first():
    await asyncio.sleep(5)
    return 'first'

async def second():
    await asyncio.sleep(1)
    return 'second'

async def third():
    await asyncio.sleep(3)
    return 'third'

async def main():
    for future in asyncio.as_completed([first(), second(), third()]):
        print(await future)
    # Prints 'second', then 'third', then 'first'

asyncio.run(main())
... but for the purpose of this question, what we want is to be able to yield these results from an ordinary generator, so that normal synchronous code can consume them without ever knowing that async functions are being used under the hood. We can do that by calling loop.run_until_complete() on the futures yielded by our as_completed call...
import asyncio

async def first():
    await asyncio.sleep(5)
    return 'first'

async def second():
    await asyncio.sleep(1)
    return 'second'

async def third():
    await asyncio.sleep(3)
    return 'third'

def ordinary_generator():
    loop = asyncio.get_event_loop()
    for future in asyncio.as_completed([first(), second(), third()]):
        yield loop.run_until_complete(future)

# Prints 'second', then 'third', then 'first'
for element in ordinary_generator():
    print(element)
In this way, we've exposed our async code to non-async-land in a manner that doesn't require callers to define any functions as async, or to even know that ordinary_generator is using asyncio under the hood.
As an alternative implementation of ordinary_generator() that offers more flexibility in some circumstances, we can repeatedly call asyncio.wait() with the FIRST_COMPLETED flag instead of looping over as_completed():
import concurrent.futures

def ordinary_generator():
    loop = asyncio.get_event_loop()
    pending = [first(), second(), third()]
    while pending:
        done, pending = loop.run_until_complete(
            asyncio.wait(
                pending,
                return_when=concurrent.futures.FIRST_COMPLETED
            )
        )
        for job in done:
            yield job.result()
This approach, maintaining a list of pending jobs, has the advantage that we can adapt it to add jobs to the pending list on the fly. This is useful in use cases where our async jobs can add an unpredictable number of further jobs to the queue - like a web spider that follows all links on each page that it visits.
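As an illustrative sketch of that spider idea (fetch_page is a hypothetical coroutine returning a (result, follow_up_urls) pair; the surrounding pattern is the same as above):

def spider_generator(seed_urls):
    loop = asyncio.get_event_loop()
    # hypothetical fetch_page(url) -> (result, follow_up_urls)
    pending = {asyncio.ensure_future(fetch_page(u)) for u in seed_urls}
    while pending:
        done, pending = loop.run_until_complete(
            asyncio.wait(pending,
                         return_when=concurrent.futures.FIRST_COMPLETED))
        for job in done:
            result, new_urls = job.result()
            # add newly discovered jobs to the pending set on the fly
            pending |= {asyncio.ensure_future(fetch_page(u)) for u in new_urls}
            yield result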
One caveat: the approaches above assume we're calling the synchronous code from the main thread, in which case get_event_loop is guaranteed to give us a loop and we've got no need to .close it. If we want ordinary_generator to be usable from a non-main thread, especially one that may have previously had an event loop created, then life gets harder, because we can't rely on get_event_loop (it raises a RuntimeError on any non-main thread that doesn't have an event loop yet). In that case the simplest thing I can think to do is to spin off a new thread to run our asyncio code, and communicate with it via a queue:
from queue import Queue
from threading import Thread

def ordinary_generator():
    sentinel = object()
    queue = Queue()

    def thread_entry_point():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        for future in asyncio.as_completed([first(), second(), third()]):
            try:
                queue.put(loop.run_until_complete(future))
            except Exception as e:
                queue.put((sentinel, e))
                break
        loop.close()
        queue.put(sentinel)

    Thread(target=thread_entry_point).start()

    while True:
        val = queue.get()
        if val is sentinel:
            return
        if isinstance(val, tuple) and len(val) == 2 and val[0] is sentinel:
            raise val[1]
        yield val
(Combining the use of run_until_complete from the penultimate example with the use of an extra thread in the final example is left as an exercise for any reader who needs to do so.)
Mark's answer is great, but I'd like to contribute a different implementation that doesn't rely on the low-level event loop methods.
The key difference is that rather than doing a yield, instead provide a callback that can be used to process the results:
import asyncio
import random

async def do_stuff():
    proc_time = round(random.random(), 2)
    print('START: ', proc_time)
    await asyncio.sleep(proc_time)
    return proc_time

def concurrent_stuff(awaitables, callback):
    # Must be async to wait
    async def _as_completed():
        for coro in asyncio.as_completed(awaitables):
            result = await coro
            callback(result)  # Send result to callback.

    # Perform the async calls inside a regular method
    asyncio.run(_as_completed())

def when_done(result):
    print('FINISHED: ', result)

def main():
    awaitables = [do_stuff() for _ in range(5)]
    concurrent_stuff(awaitables, when_done)

main()

# START: 0.56
# START: 0.98
# START: 0.39
# START: 0.23
# START: 0.94
# FINISHED: 0.23
# FINISHED: 0.39
# FINISHED: 0.56
# FINISHED: 0.94
# FINISHED: 0.98