I have an API endpoint that takes a long time to complete, so I want to split the work into many smaller jobs, run them in parallel, and wait for the results before sending the response.
My code snippet:
@app.post("/data")
async def get_tsp_events():
    queries = [foo, bar, foo, bar]
    tasks = [asyncio.create_task(do_something(query)) for query in queries]
    events = await asyncio.gather(*tasks)
    return events

async def do_something(query):
    log("Start time")
    # This takes a lot of time
    events = [event for event in range(10000000000)]
    log("End time")
    return events
As far as I can see, all the tasks run in sequence, just like normal code (as if I weren't using asyncio.create_task() and asyncio.gather() at all).
I'm new to Python and my questions are:
Am I wrong? Where?
Is there any solution that can help me?
Thank you all
In fact, async and await don't introduce real parallelism. They help when you perform genuinely asynchronous operations underneath, e.g. with aiohttp or aiofile. Nevertheless, they only bring concurrency for IO-bound tasks.
If you want real parallelism for CPU-bound tasks, use multiprocessing.
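For example, here is a minimal sketch of that idea, assuming a FastAPI app (as the @app.post decorator in the question suggests) and a ProcessPoolExecutor so the CPU-bound work actually runs on several cores; the names and the workload are illustrative:

import asyncio
from concurrent.futures import ProcessPoolExecutor

from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor()  # defaults to one worker process per CPU core

def do_something(query):
    # plain (non-async) function; the CPU-bound work runs inside a worker process
    return sum(i for i in range(10_000_000))

@app.post("/data")
async def get_tsp_events():
    queries = ["foo", "bar", "foo", "bar"]
    loop = asyncio.get_running_loop()
    # each call is dispatched to its own process, so the jobs run in parallel
    tasks = [loop.run_in_executor(pool, do_something, query) for query in queries]
    return await asyncio.gather(*tasks)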
Update: The entire premise of this question just demonstrated my lack of understanding of the concept; there are insightful answers below, but the question in its entirety is "just wrong".
I am trying to teach myself the Python async execution model. The following example program downloads five different web pages asynchronously:
#!/usr/bin/env python3
import requests
import asyncio
async def download(url):
response = requests.get(url)
print(f"Have downloaded {url}")
async def async_main():
for url in ["https://www.aftenposten.no",
"https://www.vg.no",
"https://lwn.net",
"https://www.dagbladet.no",
"https://www.nrk.no"]:
await download(url)
loop = asyncio.get_event_loop()
loop.run_until_complete(async_main())
~it has roughly the expected speedup and works as expected - all good!~ However, I am struggling to understand what happens on the await download(url) line. My layman understanding is that the following process takes place:
The download(url) function is called - "in the background".
The event loop pauses the current coroutine instance and starts the next.
However - for this to work the download(url) call must be in "some execution context", i.e. my guess is that the async implementation is threaded internally? I.e. after some initial fencing the async implementation will invoke the download(url) in a separate execution context - i.e. thread? This is in some contrast to the documentation which states that the async concurrency model does not involve multiple threads/processes?
Grateful for a clarification.
Update: the speedup has been questioned both in the comments and in the answers. I have now redone the timing a bit more carefully and see that I was wrong - I probably saw the result I wanted to see. More careful timing indicates that the serial version is slightly faster. Sorry about the confusion.
it has roughly the expected speedup and works as expected
That doesn't really make sense? There is no speedup in your program; it is completely sequential. If anything there's a small slowdown, because it needs to set up an async event loop for nothing.
The download(url) function is called - "in the background".
No, the download(url) function is called in the foreground, but rather than actually running the function body right then and there, the call creates a coroutine. await then "passes" that coroutine upwards until it reaches the event loop, which can run it.
The event loop pauses the current coroutine instance and starts the next.
Coroutines are cooperative, so it's the exact opposite: the event loop runs a coroutine until that coroutine decides to stop.
At that point, if the coroutine yields an awaitable, the event loop registers the awaitable internally in order to know when it is ready to progress, and runs (resumes) the next task.
However - for this to work the download(url) call must be in "some execution context", i.e. my guess is that the async implementation is threaded internally? I.e. after some initial fencing the async implementation will invoke the download(url) in a separate execution context - i.e. thread?
Your program doesn't actually work asynchronously at all, because requests has no async support (hence the call not being await-ed); it is completely blocking.
But an async-aware library would not normally use threading internally; instead it would use non-blocking IO primitives.
There are limited cases where the OS does not support or provide non-blocking IO for an IO task (network address resolution - gethostbyname / getaddrinfo - is probably the most common one), in which case the runtime may maintain a pool of helper threads for that purpose, but that should not be the baseline assumption. (Python's asyncio does in fact do this: the default event loop resolves addresses by running getaddrinfo in a thread pool executor.)
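For illustration, here is a minimal sketch of what a genuinely concurrent version of the download loop could look like, assuming the aiohttp library is available; the await points inside the session calls are what let the event loop interleave the downloads:

import asyncio
import aiohttp

async def download(session, url):
    async with session.get(url) as response:
        await response.read()   # suspension point: other downloads can progress here
        print(f"Have downloaded {url}")

async def async_main():
    urls = ["https://www.aftenposten.no",
            "https://www.vg.no",
            "https://lwn.net",
            "https://www.dagbladet.no",
            "https://www.nrk.no"]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(download(session, url) for url in urls))

asyncio.run(async_main())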
This is in some contrast to the documentation which states that the async concurrency model does not involve multiple threads/processes?
No, the documentation is broadly correct.
The documentation is correct. Awaiting passes control to download(url); it does not signal it to run somewhere else. Think of it more as a subroutine: while it runs, nothing else runs, until download(url) relinquishes control or completes.
There is no threading involved in asyncio. Fundamentally, asyncio coroutines are just fancy Python generators, wrapped in an event loop to handle scheduling.
Consider the following code:
import string
import time
def task1():
for x in string.digits:
yield x
time.sleep(0.5)
def task2():
for x in string.ascii_lowercase:
yield x
time.sleep(0.5)
def loop():
tasks = [task1(), task2()]
completed = set()
while tasks:
for t in tasks:
try:
print(f"task {t.__name__} says:", next(t))
except StopIteration:
completed.add(t)
tasks = [t for t in tasks if t not in completed]
if __name__ == "__main__":
loop()
Here, I have defined two generators (task1 and task2). In loop(), I "start" both tasks; because they use yield, calling the function returns an iterator, rather than executing the function code.
Now they are both running concurrently, though not in parallel -- much like asyncio coroutines. Each function can run as long as it wants until it calls yield, at which point control returns to the loop() function, which gets to decide which task executes next.
Running the above code produces output that looks like:
task task1 says: 0
task task2 says: a
task task1 says: 1
task task2 says: b
task task1 says: 2
task task2 says: c
task task1 says: 3
task task2 says: d
task task1 says: 4
task task2 says: e
task task1 says: 5
task task2 says: f
task task1 says: 6
task task2 says: g
task task1 says: 7
task task2 says: h
task task1 says: 8
task task2 says: i
task task1 says: 9
task task2 says: j
task task2 says: k
...
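For comparison, here is a rough asyncio equivalent of the two generator tasks above, in which await asyncio.sleep() plays the role that yield plays in the hand-rolled loop (handing control back to the scheduler):

import asyncio
import string

async def task1():
    for x in string.digits:
        print("task1 says:", x)
        await asyncio.sleep(0.5)   # suspension point: control returns to the event loop

async def task2():
    for x in string.ascii_lowercase:
        print("task2 says:", x)
        await asyncio.sleep(0.5)

async def main():
    await asyncio.gather(task1(), task2())

asyncio.run(main())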
The article "From yield to async/await" seems to be a really great overview of the topic.
So, given a somewhat complex setup that is used to generate a list of queries to be run semi-parallel (using a semaphore to limit how many queries run at the same time, so as not to DDoS the server),
I have an (itself async) function that creates a number of queries:
async def run_query(self, url):
async with self.semaphore:
return await some_http_lib(url)
async def create_queries(self, base_url):
# ...gathering logic is ofc a bit more complex in the real setting
urls = await some_http_lib(base_url).json()
coros = [self.run_query(url) for url in urls] # note: not executed just yet
return coros
async def execute_queries(self):
queries = await self.create_queries('/seom/url')
_logger.info(f'prepared {len(queries)} queries')
results = []
done = 0
    # note: of course, in this simple example these calls would not actually be
    # executed asynchronously. in the real case i'm using asyncio.gather; this
    # just makes for a slightly more understandable example.
for query in queries:
# at this point, the request is actually triggered
result = await query
# ...some postprocessing
if not result['success']:
raise QueryException(result['message']) # ...internal exception
done += 1
_logger.info(f'{done} of {len(queries)} queries done')
results.append(result)
return results
Now this works very nicely, executing exactly as I planned, and I can handle an exception in one of the queries by aborting the whole operation.
async def run():
try:
return await QueryRunner.execute_queries()
except QueryException:
_logger.error('something went horribly wrong')
return None
The only problem is that the program terminates, but it leaves me with the usual RuntimeWarning: coroutine QueryRunner.run_query was never awaited, because the queries later in the queue are (rightfully) not executed and as such never awaited.
Is there any way to cancel these unawaited coroutines? Would it otherwise be possible to suppress this warning?
[Edit] A bit more context on how the queries are executed outside this simple example:
The queries are usually grouped together, so there are multiple calls to create_queries() with different parameters. All collected groups are then looped over and the queries of each group are executed using asyncio.gather(group). This awaits all the queries of one group, but if one fails, the other groups are cancelled as well, which results in the warning described above.
So you are asking how to cancel a coroutine that has not yet been either awaited or passed to gather. There are two options:
you can call asyncio.create_task(c).cancel()
you can directly call c.close() on the coroutine object
The first option is a bit more heavyweight (it creates a task only to immediately cancel it), but it uses the documented asyncio functionality. The second option is more lightweight, but also more low-level.
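A minimal sketch of both options on throwaway coroutine objects (the coroutine here is just a stand-in for an unstarted run_query call):

import asyncio

async def run_query(url):
    await asyncio.sleep(1)
    return url

async def main():
    # option 1: wrap the coroutine in a task, then cancel the task
    task = asyncio.create_task(run_query('/some/url'))
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    # option 2: close the never-awaited coroutine object directly
    coro = run_query('/other/url')
    coro.close()   # no "was never awaited" warning will be emitted

asyncio.run(main())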
The above applies to coroutine objects that have never been converted to tasks (by passing them to gather or wait, for example). If they have been, for example if you called asyncio.gather(*coros), one of them raised, and you want to cancel the rest, you should change the code to first convert them to tasks using asyncio.create_task(), then call gather, and use finally to cancel the unfinished ones:
tasks = list(map(asyncio.create_task, coros))
try:
results = await asyncio.gather(*tasks)
finally:
# if there are unfinished tasks, that is because one of them
# raised - cancel the rest
for t in tasks:
if not t.done():
t.cancel()
Use
pending = asyncio.Task.all_tasks()  # Python < 3.7
or
pending = asyncio.all_tasks()  # Python >= 3.7
to get the list of pending tasks. You can wait for them with
await asyncio.wait(pending, return_when=asyncio.ALL_COMPLETED)
or cancel them:
for task in pending:
task.cancel()
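Putting those pieces together, here is a small sketch of a cancel-and-drain helper (assuming it is called from inside a running coroutine, so the current task is excluded before cancelling):

import asyncio

async def cancel_pending():
    pending = asyncio.all_tasks() - {asyncio.current_task()}
    for task in pending:
        task.cancel()
    # give the cancelled tasks a chance to actually finish cancelling;
    # return_exceptions=True keeps gather from re-raising their CancelledError
    await asyncio.gather(*pending, return_exceptions=True)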
I have written an async function which collects multiple pieces of text data and does the processing in a batch. After that, it returns the output, like this:
import asyncio

class ModelRunner:
    '''
    The model runner combines all the input coming to it into a batch of 10
    items or 1 second, whichever duration is less. After combining, it does
    the processing and returns the output.
    '''
    # ... implementation omitted ...

loop = asyncio.get_event_loop()
model_obj = ModelRunner(loop)
loop.create_task(model_obj.model_runner())

async def process_text(text):
    out_ = await model_obj.process_input(text)
    return out_
To get the output, I am running the following code:
task1 = asyncio.ensure_future(process_text(text1))
task2 = asyncio.ensure_future(process_text(text2))
task3 = asyncio.ensure_future(process_text(text3))
task4 = asyncio.ensure_future(process_text(text4))
async_tasks = [task1, task2, task3, task4]
out1, out2 ,out3 ,out4 = loop.run_until_complete(asyncio.gather(*async_tasks))
Here, out1, out2, out3, and out4 are the output after processing the text data.
Here, I do not want to combine the tasks like [task1, task2, task3, task4] and then call loop.run_until_complete to get the output. Instead, I am looking for a function like this:
out1 = func(text1)
out2 = func(text2)
etc..
But they should work in a non-blocking way, like asyncio.ensure_future. How can I do that? Thanks in advance.
Two obvious options:
If you already have multiple threads, why bother with asyncio at all? Just make process_text a regular blocking function and call it from those threads.
Conversely, if you're using asyncio, why use multiple threads at all? Make your top-level tasks async and run them all in one thread.
If you really must use multiple threads and async functions:
Have a single thread running your asyncio loop alongside the worker threads you already mentioned, and use loop.call_soon_threadsafe from the worker threads to schedule the async functions on the loop's thread. If you want to get the result back to a worker thread, you can use a queue.Queue to send the result (or results) back (see the sketch after this list).
This final option is the worst one possible and almost certainly not what you want, but I mention it for completeness: start a separate asyncio event loop from each thread that needs it and use those to run your async functions in the worker threads directly.
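For the third option, here is a minimal sketch that uses asyncio.run_coroutine_threadsafe, a close relative of call_soon_threadsafe that already returns a concurrent.futures.Future, so no explicit queue.Queue is needed; the names are illustrative and process_text merely stands in for the question's coroutine:

import asyncio
import threading

async def process_text(text):
    await asyncio.sleep(0.1)   # stand-in for the real async batching/processing
    return text.upper()

# one dedicated thread runs the event loop forever
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

def func(text):
    # callable from any worker thread; returns a concurrent.futures.Future
    # immediately, much like asyncio.ensure_future returns a Task
    return asyncio.run_coroutine_threadsafe(process_text(text), loop)

out1 = func("text1")
out2 = func("text2")
print(out1.result(), out2.result())   # block only when the results are actually needed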
Let's assume I'm new to asyncio. I'm using async/await to parallelize my current project, and I've found myself passing all of my coroutines to asyncio.ensure_future. Lots of stuff like this:
coroutine = my_async_fn(*args, **kwargs)
task = asyncio.ensure_future(coroutine)
What I'd really like is for a call to an async function to return an executing task instead of an idle coroutine. I created a decorator to accomplish what I'm trying to do.
def make_task(fn):
def wrapper(*args, **kwargs):
return asyncio.ensure_future(fn(*args, **kwargs))
return wrapper
@make_task
async def my_async_func(*args, **kwargs):
# usually making a request of some sort
pass
Does asyncio have a built-in way of doing this I haven't been able to find? Am I using asyncio wrong if I'm led to this problem to begin with?
asyncio had a @task decorator in very early pre-release versions, but we removed it.
The reason is that the decorator has no knowledge of what loop to use.
asyncio doesn't instantiate a loop on import; moreover, the test suite usually creates a new loop per test for the sake of test isolation.
Does asyncio have a built-in way of doing this I haven't been able to find?
No, asyncio doesn't have a decorator to cast coroutine functions into tasks.
Am I using asyncio wrong if I'm led to this problem to begin with?
It's hard to say without seeing what you're doing, but I think it may well be true. While creating tasks is a usual operation in asyncio programs, I doubt you have created so many coroutines that they should always be tasks.
Awaiting a coroutine is a way to "call some function asynchronously" while still blocking the current execution flow until it has finished:
await some()
# you'll reach this line *only* when some() done
A task, on the other hand, is a way to "run a function in the background"; it won't block the current execution flow:
task = asyncio.ensure_future(some())
# you'll reach this line immediately
When we write asyncio programs we usually need the first way, since we usually need the result of one operation before starting the next one:
text = await request(url)
links = parse_links(text) # we need to reach this line only when we got 'text'
Creating a task, on the other hand, usually means that the code that follows doesn't depend on the task's result. But again, that isn't always the case.
Since ensure_future returns immediately, some people try to use it as a way to run coroutines concurrently:
# wrong way to run concurrently:
asyncio.ensure_future(request(url1))
asyncio.ensure_future(request(url2))
asyncio.ensure_future(request(url3))
Correct way to achieve this is to use asyncio.gather:
# correct way to run concurrently:
await asyncio.gather(
request(url1),
request(url2),
request(url3),
)
Maybe this is what you want?
Upd:
I think using tasks in your case is a good idea. But I don't think you should use a decorator: the coroutine's functionality (making the request) is still a separate concern from the concrete detail of how it is used (as a task). If controlling the synchronization of requests is separate from their main functionality, it also makes sense to move that synchronization into a separate function. I would do something like this:
import asyncio
async def request(i):
print(f'{i} started')
await asyncio.sleep(i)
print(f'{i} finished')
return i
async def when_ready(conditions, coro_to_start):
await asyncio.gather(*conditions, return_exceptions=True)
return await coro_to_start
async def main():
t = asyncio.ensure_future
t1 = t(request(1))
t2 = t(request(2))
t3 = t(request(3))
t4 = t(when_ready([t1, t2], request(4)))
t5 = t(when_ready([t2, t3], request(5)))
await asyncio.gather(t1, t2, t3, t4, t5)
if __name__ == '__main__':
loop = asyncio.get_event_loop()
try:
loop.run_until_complete(main())
finally:
loop.run_until_complete(loop.shutdown_asyncgens())
loop.close()
I've written a library of objects, many of which make HTTP / IO calls. I've been looking at moving over to asyncio due to the mounting overheads, but I don't want to rewrite the underlying code.
I've been hoping to wrap asyncio around my code in order to perform functions asynchronously without replacing all of my deep / low-level code with await / yield.
I began by attempting the following:
async def my_function1(some_object, some_params):
#Lots of existing code which uses existing objects
#No await statements
return output_data
async def my_function2():
#Does more stuff
while True:
loop = asyncio.get_event_loop()
    tasks = my_function1(some_object, some_params), my_function2()
output_data = loop.run_until_complete(asyncio.gather(*tasks))
print(output_data)
I quickly realised that while this code runs, nothing actually happens asynchronously; the functions complete synchronously. I'm very new to asynchronous programming, but I think this is because neither of my functions uses the keyword await or yield, so these functions are not coroutines and never yield, and thus never provide an opportunity to switch to a different coroutine. Please correct me if I am wrong.
My question is: is it possible to wrap complex functions (which deep within make HTTP / IO calls) in an asyncio await keyword, e.g.
async def my_function():
print("Welcome to my function")
data = await bigSlowFunction()
UPDATE - Following Karlson's Answer
Following, and thanks to, Karlson's accepted answer, I used the following code which works nicely:
from concurrent.futures import ThreadPoolExecutor
import time
#Some vars
a_var_1 = 0
a_var_2 = 10
pool = ThreadPoolExecutor(3)
future = pool.submit(my_big_function, object, a_var_1, a_var_2)
while not future.done():
print("Waiting for future...")
time.sleep(0.01)
print("Future done")
print(future.result())
This works really nicely, and the future.done() / sleep loop gives you an idea of how many CPU cycles you get to use by going async.
The short answer is, you can't have the benefits of asyncio without explicitly marking the points in your code where control may be passed back to the event loop. This is done by turning your IO heavy functions into coroutines, just like you assumed.
Without changing existing code you might achieve your goal with greenlets (have a look at eventlet or gevent).
Another possibility would be to wrap calls to your already-written functions in Python's Future machinery: submit them to a ThreadPoolExecutor and await the result from the event loop (e.g. via run_in_executor). Be aware that this comes with all the caveats of multi-threaded programming, though.
Something along the lines of
import asyncio
from concurrent.futures import ThreadPoolExecutor
from thinair import big_slow_function

executor = ThreadPoolExecutor(max_workers=5)

async def big_slow_coroutine():
    loop = asyncio.get_event_loop()
    # run the blocking call in the thread pool and await its result
    return await loop.run_in_executor(executor, big_slow_function)
As of Python 3.9 you can wrap a blocking (non-async) function in a coroutine to make it awaitable using asyncio.to_thread(). The example given in the official documentation is:
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
print(f"start blocking_io at {time.strftime('%X')}")
# Note that time.sleep() can be replaced with any blocking
# IO-bound operation, such as file operations.
time.sleep(1)
print(f"blocking_io complete at {time.strftime('%X')}")
async def main():
print(f"started main at {time.strftime('%X')}")
await asyncio.gather(
asyncio.to_thread(blocking_io),
asyncio.sleep(1))
print(f"finished main at {time.strftime('%X')}")
asyncio.run(main())
# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
This seems like a more joined-up approach than using concurrent.futures to make a coroutine, but I haven't tested it extensively.
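asyncio.to_thread also forwards positional and keyword arguments to the wrapped function and resolves to its return value, so existing blocking calls can often be reused as-is. A small sketch (fetch and the URLs are illustrative, and requests is assumed to be installed):

import asyncio
import requests

def fetch(url, timeout=10):
    # ordinary blocking call, unchanged
    return requests.get(url, timeout=timeout).status_code

async def main():
    # each blocking call runs in its own worker thread, so they overlap in time
    codes = await asyncio.gather(
        asyncio.to_thread(fetch, "https://lwn.net"),
        asyncio.to_thread(fetch, "https://www.python.org", timeout=5),
    )
    print(codes)

asyncio.run(main())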