As I understand it, when I call create_task(), the task is scheduled at the end of the event loop's queue.
My use case is the following: I have several tasks created from the same coroutine, and I want to cancel all of them when some failure condition occurs. This is the pattern:
async def coro(params):
    # long running task...
    if failed_condition:
        await cancel_all()  # should cancel all tasks made of coro

async def cancel_all():
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks)  # wait for cancel completion
    print("All tasks cancelled")

tasks = []

async def main():
    tasks = [loop.create_task(coro(params)) for x in range(5)]
    asyncio.gather(*tasks)
The problem is that since cancel_all() is awaited by one of the tasks, that task ends up cancelling itself.
How can I solve this?
I could use loop.create_task(cancel_all()) instead, but I want cancel_all() to run as soon as possible.
cancel_all() could exclude the current task:
async def cancel_all():
    to_cancel = set(tasks)
    to_cancel.discard(asyncio.current_task())
    for task in to_cancel:
        task.cancel()
    await asyncio.gather(*to_cancel)
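For illustration, here is a minimal runnable sketch of the whole pattern with that fix in place. The failure condition is simulated (n == 2 is just an assumption for the example), and return_exceptions=True is added so the gather does not propagate the cancelled tasks' CancelledError into the coroutine that triggered the cancellation:

import asyncio

tasks = []

async def cancel_all():
    to_cancel = set(tasks)
    to_cancel.discard(asyncio.current_task())
    for task in to_cancel:
        task.cancel()
    # collect the CancelledError of the cancelled tasks instead of
    # letting it propagate into this coroutine
    await asyncio.gather(*to_cancel, return_exceptions=True)
    print("All other tasks cancelled")

async def coro(n):
    await asyncio.sleep(n)   # long running work
    if n == 2:               # simulated failure condition
        await cancel_all()

async def main():
    tasks.extend(asyncio.create_task(coro(n)) for n in range(5))
    await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())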
You can use asyncio.wait with the FIRST_EXCEPTION parameter.
import asyncio
import random

class UnexpectedCondition(Exception):
    pass

async def coro(condition):
    # long running task...
    await asyncio.sleep(random.random() * 10)
    if condition:
        raise UnexpectedCondition("Failed")
    return "Result"

async def main(f):
    # asyncio.wait needs Tasks; bare coroutines are rejected since Python 3.11
    tasks = [asyncio.create_task(coro(f(x))) for x in range(5)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for p in pending:
        # unfinished tasks will be cancelled
        print("Cancel")
        p.cancel()
    for d in done:
        try:
            # get result from finished tasks
            print(d.result())
        except UnexpectedCondition as e:
            # handle tasks with unexpected conditions
            print(e)

asyncio.run(main(lambda x: x % 2 == 0))
For example, this code:
async def f1(num):
    while True:
        print(num)
        await asyncio.sleep(2)

class ExampleClass:
    def __init__(self):
        self.tasks = []

    async def main(self):
        for i in range(10):
            self.tasks.append(asyncio.create_task(f1(i)))
        await asyncio.gather(*self.tasks)

    def add_new_task(self, task):
        self.tasks.append(task)
Then somewhere outside I call
ExampleClass.add_new_task(task)
What I need is to add new tasks and execute them asynchronously with the existing ones.
Maybe I should use some other construct to implement what I want?
What is important is that my tasks probably need to run forever (polling forever).
The asyncio.gather function is not really convenient for this. However, you can take a look at asyncio.TaskGroup, added in Python 3.11, which makes it easier to add new tasks dynamically.
import asyncio

async def main():
    async with asyncio.TaskGroup() as group:
        # Create some tasks
        tasks = [
            group.create_task(asyncio.sleep(1.0))
            for _ in range(10)
        ]
        # Wait for some tasks to finish
        done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # Add more tasks (/!\ `tasks` is now a set)
        tasks.add(group.create_task(asyncio.sleep(2.0)))
        # Wait for the rest to complete
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
So I came to a possible solution for my case based on Louis's idea.
The approximate idea is to use a fake asynchronous infinite loop in which all my tasks are executed and new tasks are added within an asyncio.TaskGroup:
async def fake_generator():
    while True:
        yield None

fake_gen = fake_generator()

async with asyncio.TaskGroup() as tg:
    async for _ in fake_gen:
        await add_new_data()
        for element in self.data:
            if not self.data[element]['in_use']:
                tg.create_task(job(element))
                self.data[element]['in_use'] = True
        await asyncio.sleep(60 * 60)
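For reference, a plain while True loop works the same way as the fake generator here. Below is a minimal self-contained sketch of the idea; discover_new_elements and job are placeholder names introduced only for illustration, and the loop runs forever, matching the polling-forever requirement:

import asyncio
import itertools

counter = itertools.count()

async def discover_new_elements():
    # placeholder: pretend we find one new element per polling cycle
    return [f"element-{next(counter)}"]

async def job(element):
    # placeholder long-running job for one element
    await asyncio.sleep(3)
    print(f"{element} done")

async def main():
    async with asyncio.TaskGroup() as tg:
        while True:  # polling loop inside the group
            for element in await discover_new_elements():
                tg.create_task(job(element))
            await asyncio.sleep(1)  # poll interval

asyncio.run(main())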
I'm trying to build an application with Python's asyncio module that is scalable and also uses other asyncio-based modules. The idea is to easily add tasks as the application grows, using synchronization primitives for resources shared between the tasks, yet I'm confused about which design best fits the intent.
import asyncio

async def task1():
    while True:
        # Code from task1
        await asyncio.sleep(1)

async def task2():
    while True:
        # Code from task2
        await asyncio.sleep(1)

async def task3():
    while True:
        # Code from task3
        await asyncio.sleep(1)

async def main():
    await asyncio.gather(
        task1(),
        task2(),
        task3()
    )

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
In the above approach each task is a while loop, yielding at the end of each iteration. The alternative is:
import asyncio

async def task1():
    # Code from task1
    await asyncio.sleep(1)

async def task2():
    # Code from task2
    await asyncio.sleep(1)

async def task3():
    # Code from task3
    await asyncio.sleep(1)

async def main():
    while True:
        await asyncio.gather(
            task1(),
            task2(),
            task3()
        )

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
In this last approach, the gather call is the one inside the while loop.
I also think that perhaps asyncio wouldn't even be necessary, since the tasks run cooperatively anyway, and that I would get the same result by awaiting each coroutine in series, saving the use of mutexes.
Is there anything asynchronous happening besides asyncio.sleep(1)? If not, then why bother using asynchronicity?
In the first piece of code, your three tasks are running independently of each other; you might handle 3 items in task1, 4 items in task2, and none in task3.
In the second, you specifically handle one thing in task1, one thing in task2, one thing in task3, and then start over again. If the tasks run at different speeds, this seems inefficient.
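A small sketch of that difference, with made-up per-item timings (0.5 s for a fast task, 2 s for a slow one); in the lockstep version the fast task is held back to the pace of the slow one each round:

import asyncio
import time

async def worker(name, delay, start):
    # one "item" of work for this task
    await asyncio.sleep(delay)
    print(f"{time.perf_counter() - start:4.1f}s  {name} finished an item")

async def independent_loops():
    # first design: each task loops at its own pace
    start = time.perf_counter()
    async def run(name, delay):
        for _ in range(3):
            await worker(name, delay, start)
    await asyncio.gather(run("fast", 0.5), run("slow", 2.0))

async def lockstep_loops():
    # second design: every round waits for the slowest task
    start = time.perf_counter()
    for _ in range(3):
        await asyncio.gather(worker("fast", 0.5, start), worker("slow", 2.0, start))

print("independent:")   # fast items finish at ~0.5s, 1.0s, 1.5s
asyncio.run(independent_loops())
print("lockstep:")      # fast items are held back to ~0.5s, 2.5s, 4.5s
asyncio.run(lockstep_loops())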
I'm watching a video(a) on YouTube about asyncio and, at one point, code like the following is presented for efficiently handling multiple HTTP requests:
# Need an event loop for doing this.
loop = asyncio.get_event_loop()

# Task creation section.
tasks = []
for n in range(1, 50):
    tasks.append(loop.create_task(get_html(f"https://example.com/things?id={n}")))

# Task processing section.
for task in tasks:
    html = await task
    thing = get_thing_from_html(html)
    print(f"Thing found: {thing}", flush=True)
I realise that this is efficient in the sense that everything runs concurrently but what concerns me is a case like:
the first task taking a full minute; but
all the others finishing in under three seconds.
Because the task processing section awaits completion of the tasks in the order in which they entered the list, it appears to me that none will be reported as complete until the first one completes.
At that point, the others that finished long ago will also be reported. Is my understanding correct?
If so, what is the normal way to handle that scenario, so that you're getting completion notification for each task the instant that task finishes?
(a) From Michael Kennedy of "Talk Python To Me" podcast fame. The video is Demystifying Python's Async and Await Keywords if you're interested. I have no affiliation with the site other than enjoying the podcast, so heartily recommend it.
If you just need to do something after each task, you can create another async function that does it, and run those in parallel:
async def wrapped_get_html(url):
    html = await get_html(url)
    thing = get_thing_from_html(html)
    print(f"Thing found: {thing}")

async def main():
    # shorthand for creating tasks and awaiting them all
    await asyncio.gather(*[
        wrapped_get_html(f"https://example.com/things?id={n}")
        for n in range(50)])

asyncio.run(main())
If for some reason you need your main loop to be notified, you can do that with as_completed:
async def main():
    for next_done in asyncio.as_completed([
            get_html(f"https://example.com/things?id={n}")
            for n in range(50)]):
        html = await next_done
        thing = get_thing_from_html(html)
        print(f"Thing found: {thing}")

asyncio.run(main())
You can make the tasks run concurrently with the code example below. I use asyncio.gather to run the tasks concurrently, and I also demonstrate the poison-pill and daemon-task techniques.
Please follow the comments in the code and feel free to ask questions if you have any.
import asyncio
from random import randint

WORKERS_NUMBER = 5
URL_NUM = 20

async def producer(task_q: asyncio.Queue) -> None:
    """Produce tasks and send them to workers"""
    print("Producer-Task Started")
    # imagine that it is a list of urls
    for i in range(URL_NUM):
        await task_q.put(i)
    # send poison pill to workers
    for i in range(WORKERS_NUMBER):
        await task_q.put(None)
    print("Producer-Task Finished")

async def results_shower(result_q: asyncio.Queue) -> None:
    """Receive results from worker tasks and show them"""
    while True:
        res = await result_q.get()
        print(res)
        result_q.task_done()  # confirm that the item has been processed

async def worker(
    name: str,
    task_q: asyncio.Queue,
    result_q: asyncio.Queue,
) -> None:
    """Get tasks from task_q, do some work and send results to result_q"""
    print(f"Worker {name} Started")
    while True:
        task = await task_q.get()
        # if worker received poison pill - break
        if task is None:
            break
        await asyncio.sleep(randint(1, 10))
        result = task ** 2
        await result_q.put(result)
    print(f"Worker {name} Finished")

async def amain():
    """Wrapper around all async ops in the app"""
    _task_q = asyncio.Queue(maxsize=5)     # just some random maxsize
    _results_q = asyncio.Queue(maxsize=5)  # just some random maxsize
    # we run results_shower as a "daemon task" that is never awaited;
    # when the asyncio loop has nothing else to do, it stops
    # without waiting for the "daemon task"
    asyncio.create_task(results_shower(_results_q))
    # gather runs the tasks concurrently and waits until all of them finish
    await asyncio.gather(
        producer(_task_q),
        *[worker(f"W-{i}", _task_q, _results_q) for i in range(WORKERS_NUMBER)]
    )
    # _results_q.join() keeps the loop running until results_shower has printed
    # every result: the queue keeps an internal counter, incremented by put()
    # and decremented by task_done(); join() returns when it reaches 0.
    await _results_q.join()
    print("All work is finished!")

if __name__ == '__main__':
    asyncio.run(amain())
My Source Code:
import asyncio

async def mycoro(number):
    print(f'Starting {number}')
    await asyncio.sleep(1)
    print(f'Finishing {number}')
    return str(number)

c = mycoro(3)
task = asyncio.create_task(c)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
loop.close()
Error:
RuntimeError: no running event loop
sys:1: RuntimeWarning: coroutine 'mycoro' was never awaited
I was following a tutorial; the warning says the coroutine was never awaited even though I believe I did await it, and the same code clearly works in the video I was watching.
Simply run the coroutine directly without creating a task for it:
import asyncio

async def mycoro(number):
    print(f'Starting {number}')
    await asyncio.sleep(1)
    print(f'Finishing {number}')
    return str(number)

c = mycoro(3)
loop = asyncio.get_event_loop()
loop.run_until_complete(c)
loop.close()
The purpose of asyncio.create_task is to create an additional task from inside a running task. Since it immediately schedules the new task on the running event loop, it must be used inside a running event loop – hence the error when it is used outside one.
Use loop.create_task(c) if a task must be created from outside a task.
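For instance, a minimal sketch of that variant using an explicit event loop (the older-style API; newer code would prefer asyncio.run):

import asyncio

async def mycoro(number):
    print(f'Starting {number}')
    await asyncio.sleep(1)
    print(f'Finishing {number}')
    return str(number)

loop = asyncio.new_event_loop()
task = loop.create_task(mycoro(3))  # schedule the task on an explicit loop
loop.run_until_complete(task)       # the task actually runs here
loop.close()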
In more recent versions of asyncio, use asyncio.run to avoid handling the event loop explicitly:
c = mycoro(3)
asyncio.run(c)
In general, use asyncio.create_task only to increase concurrency. Avoid using it when another task would block immediately.
# bad task usage: concurrency stays the same due to blocking
async def bad_task():
    task = asyncio.create_task(mycoro(0))
    await task

# no task usage: concurrency stays the same due to stacking
async def no_task():
    await mycoro(0)

# good task usage: concurrency is increased via multiple tasks
async def good_task():
    tasks = [asyncio.create_task(mycoro(i)) for i in range(3)]
    print('Starting done, sleeping now...')
    await asyncio.sleep(1.5)
    await asyncio.gather(*tasks)  # ensure subtasks finished
Change the line

task = asyncio.create_task(c)

to:

task = asyncio.Task(c)
Question
Using Python's asyncio module, how do I select the first result from multiple coroutines?
Example
I might want to implement a timeout on waiting on a queue:
result = yield from select(asyncio.sleep(1),
                           queue.get())
Analogous Operations
This would be similar to Go's select or Clojure's core.async.alt!. It is something like the converse of asyncio.gather (gather is like all, select would be like any.)
A simple solution, using asyncio.wait and its FIRST_COMPLETED parameter:
import asyncio

async def something_to_wait():
    await asyncio.sleep(1)
    return "something_to_wait"

async def something_else_to_wait():
    await asyncio.sleep(2)
    return "something_else_to_wait"

async def wait_first():
    done, pending = await asyncio.wait(
        [something_to_wait(), something_else_to_wait()],
        return_when=asyncio.FIRST_COMPLETED)
    print("done", done)
    print("pending", pending)

asyncio.get_event_loop().run_until_complete(wait_first())
gives:
done {<Task finished coro=<something_to_wait() done, defined at stack.py:3> result='something_to_wait'>}
pending {<Task pending coro=<something_else_to_wait() running at stack.py:8> wait_for=<Future pending cb=[Task._wakeup()]>>}
Task was destroyed but it is pending!
task: <Task pending coro=<something_else_to_wait() running at stack.py:8> wait_for=<Future pending cb=[Task._wakeup()]>>
You can implement this using both asyncio.wait and asyncio.as_completed:
import asyncio

async def ok():
    await asyncio.sleep(1)
    return 5

async def select1(*futures, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    return (await next(asyncio.as_completed(futures)))

async def select2(*futures, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    done, running = await asyncio.wait(futures,
                                       return_when=asyncio.FIRST_COMPLETED)
    result = done.pop()
    return result.result()

async def example():
    queue = asyncio.Queue()
    result = await select1(ok(), queue.get())
    print('got {}'.format(result))
    result = await select2(queue.get(), ok())
    print('got {}'.format(result))

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(example())
Output:
got 5
got 5
Task was destroyed but it is pending!
task: <Task pending coro=<get() done, defined at /usr/lib/python3.4/asyncio/queues.py:170> wait_for=<Future pending cb=[Task._wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.4/asyncio/tasks.py:463]>
Task was destroyed but it is pending!
task: <Task pending coro=<get() done, defined at /usr/lib/python3.4/asyncio/queues.py:170> wait_for=<Future pending cb=[Task._wakeup()]>>
Both implementations return the value of the first completed Future, but you can easily tweak them to return the Future itself instead. Note that because the other Future passed to each select implementation is never awaited, a warning gets raised when the process exits.
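For example, a hedged sketch of such a tweak (select_future is a name introduced here for illustration): wrapping the awaitables ourselves before asyncio.wait means the object returned is one of the tasks we created, so the caller can tell which input finished and read its result.

import asyncio

async def select_future(*awaitables):
    # wrap the awaitables ourselves so the returned object is one of
    # the tasks we created and the caller can tell which one finished
    tasks = [asyncio.ensure_future(a) for a in awaitables]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    return done.pop()  # a finished Task; call .result() on it

async def example():
    queue = asyncio.Queue()
    finished = await select_future(asyncio.sleep(1, result="timed out"), queue.get())
    print(finished.result())  # prints "timed out"

asyncio.run(example())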
In the case of wanting to apply a timeout to a task, there is a standard library function that does exactly this: asyncio.wait_for(). Your example can be written like this:
try:
    result = await asyncio.wait_for(queue.get(), timeout=1)
except asyncio.TimeoutError:
    # This block will execute if queue.get() takes more than 1s.
    result = ...
But this only works for the specific case of a timeout. The other two answers here generalize to any arbitrary set of tasks, but neither of those answers shows how to clean up the tasks which don't finish first. This is what causes the "Task was destroyed but it is pending" messages in the output. In practice, you should do something with those pending tasks. Based on your example, I'll assume that you don't care about the other tasks' results. Here's an example of a wait_first() function that returns the value of the first completed task and cancels the remaining tasks.
import asyncio, random

async def foo(x):
    r = random.random()
    print('foo({:d}) sleeping for {:0.3f}'.format(x, r))
    await asyncio.sleep(r)
    print('foo({:d}) done'.format(x))
    return x

async def wait_first(*futures):
    ''' Return the result of the first future to finish. Cancel the remaining
    futures. '''
    done, pending = await asyncio.wait(futures,
                                       return_when=asyncio.FIRST_COMPLETED)
    gather = asyncio.gather(*pending)
    gather.cancel()
    try:
        await gather
    except asyncio.CancelledError:
        pass
    return done.pop().result()

async def main():
    result = await wait_first(foo(1), foo(2))
    print('the result is {}'.format(result))

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
Running this example:
# export PYTHONASYNCIODEBUG=1
# python3 test.py
foo(1) sleeping for 0.381
foo(2) sleeping for 0.279
foo(2) done
the result is 2
# python3 test.py
foo(1) sleeping for 0.048
foo(2) sleeping for 0.515
foo(1) done
the result is 1
# python3 test.py
foo(1) sleeping for 0.396
foo(2) sleeping for 0.188
foo(2) done
the result is 2
There are no error messages about pending tasks, because each pending task has been cleaned up correctly.
In practice, you probably want wait_first() to return the future, not the future's result, otherwise it will be really confusing trying to figure out which future finished. But in the example here, I returned the future's result since it looks a little cleaner.
Here's a more robust solution based upon earlier examples that deals with the following:
Uses recursion to return the first non-null result (the earlier examples return the first result regardless of whether it is null or non-null)
Returns the first non-null result even if another task raises an exception
In the event no non-null result is returned and an exception is raised, the last exception is raised
Deals with multiple tasks completing at the same time - in practice this would be very rare but it can pop up in unit tests where fake async tasks complete immediately.
Note this example requires Python 3.8 due to its use of the assignment expression (walrus operator).
import asyncio

async def wait_first(*tasks):
    """Return the result of the first async task to complete with a non-null result"""
    # Get first completed task(s)
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # Tasks MAY complete at the same time, e.g. in unit tests :)
    # Coalesce the first result if present
    for task in done:
        exception = task.exception()
        if exception is None and (result := task.result()):
            break
    else:
        result = None
    # Gather remaining tasks without raising exceptions
    gather = asyncio.gather(*pending, return_exceptions=True)
    # Cancel remaining tasks if result is non-null, otherwise await next pending tasks
    if result:
        gather.cancel()
    elif pending:
        result = await wait_first(*pending)
    # Await remaining tasks to ensure they are cancelled
    try:
        await gather
    except asyncio.CancelledError:
        pass
    # Return result or raise last exception if no result was returned
    if exception and result is None:
        raise exception
    else:
        return result
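A brief usage sketch of the wait_first above, with placeholder coroutines and timings invented for illustration; the first task finishes earlier but returns None, so wait_first keeps waiting and returns the later non-null result:

async def returns_none_quickly():
    await asyncio.sleep(0.1)
    return None              # null result: wait_first keeps waiting

async def returns_value_later():
    await asyncio.sleep(0.5)
    return "useful result"   # first non-null result wins

async def main():
    tasks = [
        asyncio.create_task(returns_none_quickly()),
        asyncio.create_task(returns_value_later()),
    ]
    print(await wait_first(*tasks))  # -> "useful result"

asyncio.run(main())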