I want to write a library that mixes synchronous and asynchronous work, like:
def do_things():
# 1) do sync things
# 2) launch a bunch of slow async tasks and block until they are all complete or an exception is thrown
# 3) more sync work
# ...
I started implementing this using asyncio as an excuse to learn the learn the library, but as I learn more it seems like this may be the wrong approach. My problem is that there doesn't seem to be a clean way to do 2, because it depends on the context of the caller. For example:
I can't use asyncio.run(), because the caller could already have a running event loop and you can only have one loop per thread.
Marking do_things as async is too heavy because it shouldn't require the caller to be async. Plus, if do_things was async, calling synchronous code (1 & 3) from an async function seems to be bad practice.
asyncio.get_event_loop() also seems wrong, because it may create a new loop, which if left running would prevent the caller from creating their own loop after calling do_things (though arguably they shouldn't do that). And based on the documentation of loop.close, it looks like starting/stopping multiple loops in a single thread won't work.
Basically it seems like if I want to use asyncio at all, I am forced to use it for the entire lifetime of the program, and therefore all libraries like this one have to be written as either 100% synchronous or 100% asynchronous. But the behavior I want is: Use the current event loop if one is running, otherwise create a temporary one just for the scope of 2, and don't break client code in doing so. Does something like this exist, or is asyncio the wrong choice?
I can't use asyncio.run(), because the caller could already have a running event loop and you can only have one loop per thread.
If the caller has a running event loop, you shouldn't run blocking code in the first place because it will block the caller's loop!
With that in mind, your best option is to indeed make do_things async and call sync code using run_in_executor which is designed precisely for that use case:
async def do_things():
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, sync_stuff)
await async_func()
await loop.run_in_executor(None, more_sync_stuff)
This version of do_things is usable from async code as await do_things() and from sync code as asyncio.run(do_things()).
Having said that... if you know that the sync code will run very briefly, or you are for some reason willing to block the caller's event loop, you can work around the limitation by starting an event loop in a separate thread:
def run_async(aw):
result = None
async def run_and_store_result():
nonlocal result
result = await aw
t = threading.Thread(target=asyncio.run, args=(run_and_store_result(),))
t.start()
t.join()
return result
do_things can then look like this:
async def do_things():
sync_stuff()
run_async(async_func())
more_sync_stuff()
It will be callable from both sync and async code, but the cost will be that:
it will create a brand new event loop each and every time. (Though you can cache the event loop and never exit it.)
when called from async code, it will block the caller's event loop, thus effectively breaking its asyncio usage, even if most time is actually spent inside its own async code.
Related
I'm not very experienced in Python asyncio, although synchronous Python is going well.
I have a function, which creates a task list, and another function which is to be called with tasks in this list:
import asyncio
async def createTasks(some_dict):
coroutines = []
# some_dict can have an arbitrary number of items
for item in some_dict:
coroutines.append(executeOneTask(item))
tasks = await asyncio.gather(*coroutines, return_exceptions=True)
return tasks
async def executeOneTask(item):
# execute a task asynchronously
return
Here's the part where you are free to correct me if I'm wrong.
Now, my understanding of asyncio is that I need an event loop to execute an asynchronous function, which means that to asyncio.gather I need to await it that means this needs to happen inside an async function. OK, so I need an event loop to create my list of asynchronous tasks that I actually want to execute asynchronously.
If my understanding of event loops is correct, I cannot easily add tasks inside an event loop to that same event loop. Let's assume that I have an asynchronous main() function which is supposed to first retrieve a list of asynchronous tasks using createTasks() and then create an amount (equal to list length) of asynchronous tasks to be run by utilizing executeOneTask().
How should I approach the construction of such a main() function? Do I need multiple event loops? Can I create a task list some other way, which enables easier code?
Side note: The approach I have set up here might be a very difficult or upside-down way to solve the problem. What I aim to do is to create a list of asynchronous tasks and then run those tasks asynchronously. Feel free to not follow the code structure above if a smart solution requires that.
Thanks!
You should only use one event loop in the entire application. Start the main function by asyncio.run(main()) and asyncio creates a loop for you. With Python 3.8 you rarely need to access or use the loop directly but with older versions you may obtain it by asyncio.get_event_loop() if using loop methods or some functions that require loop argument.
Do note that IPython (used in Spyder and Jupyter) also runs its own loop, so in those you can directly call and await without calling asyncio.run.
If you only wish to do async programming but don't specifically need to work with asyncio, I would recommend checking out https://trio.readthedocs.io/ which basically does the same things but is much, much easier to use (correctly).
I am trying to understand asyncio module and spend about one hour with run_coroutine_threadsafe function, I even came to the working example, it works as expected, but works with several limitations.
First of all I do not understand how should I properly call asyncio loop in main (any other) thread, in the example I call it with run_until_complete and give it a coroutine to make it busy with something until another thread will not give it a coroutine. What are other options I have?
What are situations when I have to mix asyncio and threading (in Python) in real life? Since as far as I understand asyncio is supposed to take place of threading in Python (due to GIL for not IO ops), if I am wrong, do not be angry and share your suggestions.
Python version is 3.7/3.8
import asyncio
import threading
import time
async def coro_func():
return await asyncio.sleep(3, 42)
def another_thread(_loop):
coro = coro_func() # is local thread coroutine which we would like to run in another thread
# _loop is a loop which was created in another thread
future = asyncio.run_coroutine_threadsafe(coro, _loop)
print(f"{threading.current_thread().name}: {future.result()}")
time.sleep(15)
print(f"{threading.current_thread().name} is Finished")
if __name__ == '__main__':
loop = asyncio.get_event_loop()
main_th_cor = asyncio.sleep(10)
# main_th_cor is used to make loop busy with something until another_thread will not send coroutine to it
print("START MAIN")
x = threading.Thread(target=another_thread, args=(loop, ), name="Some_Thread")
x.start()
time.sleep(1)
loop.run_until_complete(main_th_cor)
print("FINISH MAIN")
First of all I do not understand how should I properly call asyncio loop in main (any other) thread, in the example I call it with run_until_complete and give it a coroutine to make it busy with something until another thread will not give it a coroutine. What are other options I have?
This is a good use case for loop.run_forever(). The loop will run and serve the coroutines you submit using run_coroutine_threadsafe. (You can even submit such coroutines from multiple threads in parallel; you never need to instantiate more than one event loop.)
You can stop the loop from a different thread by calling loop.call_soon_threadsafe(loop.stop).
What are situations when I have to mix asyncio and threading (in Python) in real life?
Ideally there should be none. But in the real world, they do crop up; for example:
When you are introducing asyncio into an existing large program that uses threads and blocking calls and cannot be converted to asyncio all at once. run_coroutine_threadsafe allows regular blocking code to make use of asyncio.
When you are dealing with older "async" APIs which use threads under the hood and call the user-supplied APIs from other threads. There are many examples, such as Python's own multiprocessing.
When you need to call blocking functions that have no async equivalent from asyncio - e.g. CPU-bound functions, legacy database drivers, things like that. This is not a use case for run_coroutine_threadsafe, here you'd use run_in_executor, but it is another example of mixing threads and asyncio.
In the Python docs, it states:
Application developers should typically use the high-level asyncio
functions, such as asyncio.run(), and should rarely need to reference
the loop object or call its methods.
This section is intended mostly for authors of lower-level code, libraries, and frameworks, who need finer control over the event loop behavior.
When using both async and a threadpoolexecutor, as show in the example code (from the docs):
import asyncio
import concurrent.futures
def blocking_io():
# File operations (such as logging) can block the
# event loop: run them in a thread pool.
with open('/dev/urandom', 'rb') as f:
return f.read(100)
async def main():
loop = asyncio.get_running_loop()
# 2. Run in a custom thread pool:
with concurrent.futures.ThreadPoolExecutor() as pool:
result = await loop.run_in_executor(
pool, blocking_io)
print('custom thread pool', result)
asyncio.run(main())
Do I need to call loop.close(), or would the asyncio.run() close the loop for me?
Is using both asyncio and threadpoolexecutor together, one of those situations where you need finer control over the event loop? Can using both, asyncio and threadpoolexecutor together be done without referencing the loop?
For question #1, The coroutines and tasks documentation linked in the Event Loop documentation you reference indicates that asyncio.run closes the loop:
asyncio.run(coro, *, debug=False)
Execute the coroutine coro and return the result.
This function runs the passed coroutine, taking care of managing the
asyncio event loop and finalizing asynchronous generators.
This function cannot be called when another asyncio event loop is
running in the same thread.
If debug is True, the event loop will be run in debug mode.
This function always creates a new event loop and closes it at the
end. It should be used as a main entry point for asyncio programs, and
should ideally only be called once.
For #2, that use of get_running_loop with ThreadExecutor is a way to run blocking code without blocking the OS thread. In Developing with asyncio, they indicate:
The loop.run_in_executor() method can be used with a concurrent.futures.ThreadPoolExecutor to execute blocking code in a different OS thread without blocking the OS thread that the event loop runs in.
It is an exception to the general warning they give in the Event documentation around calling the lower-level methods. So the answer to question #2, the answer is yes, it is one of those situations where you need a small amount of finer control in order to execute this recipe that handles blocking code in async scenarios.
I am writing a Python program which schedules a number of asynchronous, I/O-bound items to occur, many of which will also be scheduling other, similar work items. The work items themselves are completely independent of one another and they do not require each others' results to be complete, nor do I need to gather any results from them for any sort of local output (beyond logging, which takes place as part of the work items themselves).
I was originally using a pattern like this:
async def some_task(foo):
pending = []
for x in foo:
# ... do some work ...
if some_condition:
pending.append(some_task(bar))
if pending:
await asyncio.wait(pending)
However, I was running into trouble with some of the nested asyncio.wait(pending) calls sometimes hanging forever, even though the individual things being awaited were always completing (according to the debug output that was produced when I used KeyboardInterrupt to list out the state of the un-gathered results, which showed all of the futures as being in the done state). When I asked others for help they said I should be using asyncio.create_task instead, but I am not finding any useful information about how to do this nor have I been able to get clarification from the people who suggested this.
So, how can I satisfy this use case?
Python asyncio.Queue may help to tie your program processing to program completion. It has a join() method which will block until all items in the queue have been received and processed.
Another benefit that I like is that the worker becomes more explicit as it pulls from a queue processes, potentially adds more items, and then ACKS, but this is just personal preference.
async def worker(q):
while True:
item = await queue.get()
# process item potentially requeue more work
if some_condition:
await q.put('something new')
queue.task_done()
async def run():
queue = asyncio.Queue()
worker = asyncio.ensure_future(worker(queue))
await queue.join()
worker.cancel()
loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()
The example above was adapted from asyncio producer_consumer example and modified since your worker both consumes and produces:
https://asyncio.readthedocs.io/en/latest/producer_consumer.html
I'm not super sure how to fix your specific example but I would def look at the primitives that asyncio offers to help the event loop hook into your program state, notably join and using a Queue.
A common pattern with asyncio, like the one shown here, is to add a collection of coroutines to a list, and then asyncio.gather them.
For instance:
async def some_task(i):
# Do something asynchronously with i
tasks = [some_task(i) for i in range(100)]
loop.run_until_complete(asyncio.gather(**tasks))
Here, the execution order of this code is such that none of the tasks are running while we build up the list. We add task 1 to the list, then task 2, etc. and then we add the tasks 1-100 to the event loop.
However, I want task creation itself to be part of the event loop. I want task 1 to be scheduled immediately as it's created, and then when task is waiting for something on another thread, return to task creation and create task 2 and add it to the event loop.
I believe this would give me better concurrency from my async code. Is this possible?
For example, my first thought would be to put task creation into a coroutine and schedule tasks as they are created:
async def some_task(i):
# Do something asynchronously with i
async def generate_tasks(loop):
tasks = []
for i in range(100):
task = loop.create_task(some_task(i))
tasks.append(loop)
await asyncio.gather(**tasks)
loop.run_until_complete(generate_tasks())
However, because my generate_tasks never uses await, execution is never passed back to the event loop, so the entirety of generate_tasks will run before some_task() is run at all.
But then, if I await each task as they are created, it will wait for each task to complete before moving on to the next task, giving me no concurrency at all!
async def generate_tasks(loop):
tasks = []
for i in range(100):
await some_task(i)
loop.run_until_complete(generate_tasks())
However, because my generate_tasks never uses await, execution is never passed back to the event loop
You can use await asyncio.sleep(0) to force yielding to the event loop inside for. But that is unlikely to make a difference, creating a task/coroutine pair is really efficient.
Before optimizing this, measure (with something as simple as time.time if need be) how much time it takes to execute the [some_task(i) for i in range(100)] list comprehension. Then consider whether dispersing that time (possibly making it take longer to finish due to increased scheduling overhead) will make any difference for your application. The results might surprise you.