Async Python: blocking/non-blocking for loop - python

This is an interesting situation I have come across. I wrote some code a while back that was synchronous but then I switched to async. The following code resembles the async code.
async def some_coroutine(args, obj):
genarg1, genarg2 = args
for (param1, param2) in some_generator(genarg1, genarg2):
await asyncio.sleep(0) # do this to prevent blocking and give control to event loop
# do work
for i in enumerate(param2):
await asyncio.sleep(0) # do this to prevent blocking and give control to event loop
obj.somefunc(i, param1)
pass
I want to refactor the above such that I can make it compatible with some of the non-async code. I used to have it where the for loops can be called in their own functions except this would block the eventloop. I don't want them to take over the event loop without giving control back to the event loop from time to time. I'd like to refactor to something like this but can't figure out how to avoid the blocking aspect of it:
async def some_coroutine(args, obj):
genarg1, genarg2 = args
somefunc(genarg1, genarg2, obj)
def somefunc(genarg1, genarg2, obj):
for (param1, param2) in some_generator(genarg1, genarg2):
# do work
for i in enumerate(param2):
obj.somefunc(i, param1)
pass
Clearly, the first code block, the protocol attempted to not block the event loop because the code was in one routine and had await asyncio.sleep(0). But now the refactored code breaks apart the for loop and is blocking and I'm not able to place await asyncio.sleep(0) in somefunc. I'd like to refactor the code this way so I could call it from other functions that don't use eventloops (e.g., test cases) but when an eventloop is used, I'd prefer it to be versatile enough to not block it.
Is this possible or am I just thinking about it wrong (i.e., refactor the code differently)?

Related

How to call async function from sync funcion and get result, while a loop is already running

I have a asyncio running loop, and from the coroutine I'm calling a sync function, is there any way we can call and get result from an async function in a sync function
tried below code, it is not working
want to print output of hel() in i() without changing i() to async function
is it possible, if yes how?
import asyncio
async def hel():
return 4
def i():
loop = asyncio.get_running_loop()
x = asyncio.run_coroutine_threadsafe(hel(), loop) ## need to change
y = x.result() ## this lines
print(y)
async def h():
i()
asyncio.run(h())
This is one of the most commonly asked type of question here. The tools to do this are in the standard library and require only a few lines of setup code. However, the result is not 100% robust and needs to be used with care. This is probably why it's not already a high-level function.
The basic problem with running an async function from a sync function is that async functions contain await expressions. Await expressions pause the execution of the current task and allow the event loop to run other tasks. Therefore async functions (coroutines) have special properties that allow them to yield control and resume again where they left off. Sync functions cannot do this. So when your sync function calls an async function and that function encounters an await expression, what is supposed to happen? The sync function has no ability to yield and resume.
A simple solution is to run the async function in another thread, with its own event loop. The calling thread blocks until the result is available. The async function behaves like a normal function, returning a value. The downside is that the async function now runs in another thread, which can cause all the well-known problems that come with threaded programming. For many cases this may not be an issue.
This can be set up as follows. This is a complete script that can be imported anywhere in an application. The test code that runs in the if __name__ == "__main__" block is almost the same as the code in the original question.
The thread is lazily initialized so it doesn't get created until it's used. It's a daemon thread so it will not keep your program from exiting.
The solution doesn't care if there is a running event loop in the main thread.
import asyncio
import threading
_loop = asyncio.new_event_loop()
_thr = threading.Thread(target=_loop.run_forever, name="Async Runner",
daemon=True)
# This will block the calling thread until the coroutine is finished.
# Any exception that occurs in the coroutine is raised in the caller
def run_async(coro): # coro is a couroutine, see example
if not _thr.is_alive():
_thr.start()
future = asyncio.run_coroutine_threadsafe(coro, _loop)
return future.result()
if __name__ == "__main__":
async def hel():
await asyncio.sleep(0.1)
print("Running in thread", threading.current_thread())
return 4
def i():
y = run_async(hel())
print("Answer", y, threading.current_thread())
async def h():
i()
asyncio.run(h())
Output:
Running in thread <Thread(Async Runner, started daemon 28816)>
Answer 4 <_MainThread(MainThread, started 22100)>
In order to call an async function from a sync method, you need to use asyncio.run, however this should be the single entry point of an async program so asyncio makes sure that you don't do this more than once per program, so you can't do that.
That being said, this project https://github.com/erdewit/nest_asyncio patches the asyncio event loop to do that, so after using it you should be able to just call asyncio.run in your sync function.

Wait for async function to complete

My question is more or less like this, which is really an X-Y problem leading back to this. This is, however, not a duplicate, because my use case is slightly different and the linked threads don't answer my question.
I am porting a set of synchronous programs from Java to Python. These programs interact with an asynchronous library. In Java, I could block and wait for this library's asynchronous functions to return a value and then do things with that value.
Here's a code sample to illustrate the problem.
def do_work_sync_1(arg1, arg2, arg3):
# won't even run because await has to be called from an async function
value = await do_work_async(arg1, arg2, arg3)
def do_work_sync_2(arg1, arg2, arg3):
# throws "Loop already running" error because the async library referenced in do_work_async is already using my event loop
event_loop = asyncio.get_event_loop()
event_loop.run_until_complete(do_work_async(arg1, arg2, arg3))
def do_work_sync_3(arg1, arg2, arg3):
# throws "got Future attached to a different loop" because the do_work_async refers back to the asynchronous library, which is stubbornly attached to my main loop
thread_pool = ThreadPoolExecutor()
future = thread_pool.submit(asyncio.run, do_work_async(arg1, arg2, arg3)
result = future.result()
def do_work_sync_4(arg1, arg2, arg3):
# just hangs forever
event_loop = asyncio.get_event_loop()
future = asyncio.run_coroutine_threadsafe(do_work_async(arg1, arg2, arg3), event_loop)
return_value = future.result()
async def do_work_async(arg1, arg2, arg3):
value_1 = await async_lib.do_something(arg1)
value_2 = await async_lib.do_something_else(arg2, arg3)
return value_1 + value_2
Python appears to be trying very hard to keep me from blocking anything, anywhere. await can only be used from async def functions, which in their turn must be awaited. There doesn't seem to be a built-in way to keep async def/await from spreading through my code like a virus.
Tasks and Futures don't have any built-in blocking or wait_until_complete mechanisms unless I want to loop on Task.done(), which seems really bad.
I tried asyncio.get_event_loop().run_until_complete(), but that produces an error: This event loop is already running. Apparently I'm not supposed to do that for anything except main().
The second linked question above suggests using a separate thread and wrapping the async function in that. I tested this with a few simple functions and it seems to work as a general concept. The problem here is that my asynchronous library keeps a reference to the main thread's event loop and throws an error when I try to refer to it from the new thread: got Future <Future pending> attached to a different loop.
I considered moving all references to the asynchronous library into a separate thread, but I realized that I still can't block in the new thread, and I'd have to create a third thread for blocking calls, which would bring me back to the Future attached to a different loop error.
I'm pretty much out of ideas here. Is there a way to block and wait for an async function to return, or am I really being forced to convert my entire program to async/await? (If it's the latter, an explanation would be nice. I don't get it.)
It took me some time, but finally I've found the actual question 😇
Is there a way to block and wait for an async function to return, or am I really being forced to convert my entire program to async/await?
There is a high-level function asyncio.run(). It does three things:
create new event loop
run your async function in that event loop
wait for any unfinished tasks and close the loop
Its source code is here: https://github.com/python/cpython/blob/3221a63c69268a9362802371a616f49d522a5c4f/Lib/asyncio/runners.py#L8 You see it uses loop.run_until_complete(main) under the hood.
If you are writing completely asynchronous code, you are supposed to call asyncio.run() somewhere at the end of your main() function, I guess. But that doesn't have to be the case. You can run it wherever you want, as many times you want. Caveats:
in given thread, at one time, there can be only one running event loop
do not run it from async def function, because, obviously, you have already one event loop running, so you can just call that function using await instead
Example:
import asyncio
async def something_async():
print('something_async start')
await asyncio.sleep(1)
print('something_async done')
for i in range(3):
asyncio.run(something_async())
You can have multiple threads with their own event loop:
import asyncio
import threading
async def something_async():
print('something_async start in thread:', threading.current_thread())
await asyncio.sleep(1)
print('something_async done in thread:', threading.current_thread())
def main():
t1 = threading.Thread(target=asyncio.run, args=(something_async(), ))
t2 = threading.Thread(target=asyncio.run, args=(something_async(), ))
t1.start()
t2.start()
t1.join()
t2.join()
if __name__ == '__main__':
main()
If you encounter this error: Future attached to a different loop That may mean two tings:
you are using resources tied to another event loop than you are running right now
you have created some resource before starting an event loop - it uses a "default event loop" in that case - but when you run asyncio.run(), you start a different loop. I've encountered this before: asyncio.Semaphore RuntimeError: Task got Future attached to a different loop
You need to use Python version at least 3.5.3 - explanation here.

How to iterate over a large list without blocking event loop

I have a python script with a running asyncio event loop, I want to know how to iterate over a large list without blocking the event loop. Thus keeping the loop running.
I've tried making a custom class with __aiter__ and __anext__ which did not work, I've also tried making an async function that yields the result but it still blocks.
Currently:
for index, item in enumerate(list_with_thousands_of_items):
# do something
The custom class I've tried:
class Aiter:
def __init__(self, iterable):
self.iter_ = iter(iterable)
async def __aiter__(self):
return self
async def __anext__(self):
try:
object = next(self.iter_)
except StopIteration:
raise StopAsyncIteration
return object
But that always results in
TypeError: 'async for' received an object from __aiter__ that does not implement __anext__: coroutine
The async function I made which works but still blocks the event loop is:
async def async_enumerate(iterable, start:int=0):
for idx, i in enumerate(iterable, start):
yield idx, i
As #deceze pointed out, you can use await asyncio.sleep(0) to explicitly pass control to the event loop. There are problems with this approach, though.
Presumably the list is quite large, which is why you needed special measures to unblock the event loop. But if the list is so large, forcing each loop iteration to yield to the event loop will slow it down considerably. Of course, you can alleviate that by adding a counter and only awaiting when i%10 == 0 or when i%100 == 0, etc. But then you have to make arbitrary decisions (guess) regarding how often to give up control. If you yield too often, you're slowing down your function. If you yield too seldom, you're making the event loop unresponsive.
This can be avoided by using run_in_executor, as suggested by RafaëlDera. run_in_executor accepts a blocking function and offloads its execution to a thread pool. It immediately returns a future that can be awaited in asyncio and whose result, once available, will be the return value of the blocking function. (If the blocking function raises, the exception will be propagated instead.) Such await will suspend the coroutine until the function returns or raises in its thread, allowing the event loop to remain fully functional in the meantime. Since the blocking function and the event loop run in separate threads, the function doesn't need to do anything to allow the event work to run - they operate independently. Even the GIL is not a problem here because GIL ensures that the control is passed between threads.
With run_in_executor your code could look like this:
def process_the_list():
for index, item in enumerate(list_with_thousands_of_items):
# do something
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, process_the_list)
asyncio is cooperative multitasking. The cooperative part comes from the fact that your function must yield execution back to the event loop to allow other things to run. Unless you await something (or end your function), you're hogging the event loop.
You can simply await some noop event, probably the most suitable is await asyncio.sleep(0). This ensures your task will resume as soon as possible, but allow other tasks to be scheduled as well.

How can I prevent context switching when calling an async function?

If I use async functions, then all the functions above the stack should also be async, and their call should be preceded by the await keyword. This example emulates modern programs with several architectural layers of the application:
async def func1():
await asyncio.sleep(1)
async def func2():
await func1()
async def func3():
await func2()
async def func4():
await func3()
async def func5():
await func4()
When an execution thread meet 'await', it can switch to another coroutine, which requires resources for context switching. With a large number of competing corutes and different levels of abstraction, these overheads may begin to limit the performance of the entire system. But in the presented example it makes sense to switch the context only in one case, on line:
await asyncio.sleep(1)
How can I ban context switching for certain asynchronous functions?
First of all, by default in your example context wouldn't be switched. In other words, until coroutine faces something actually blocking (like Future) it won't return control to event loop and resume its way directly to an inner coroutine.
I don't know easier way to demonstrate this than inheriting default event loop implementation:
import asyncio
class TestEventLoop(asyncio.SelectorEventLoop):
def _run_once(self):
print('control inside event loop')
super()._run_once()
async def func1():
await asyncio.sleep(1)
async def func2():
print('before func1')
await func1()
print('after func1')
async def main():
print('before func2')
await func2()
print('after func2')
loop = TestEventLoop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(main())
finally:
loop.close()
In output you'll see:
control inside event loop
before func2
before func1
control inside event loop
control inside event loop
after func1
after func2
control inside event loop
func2 passed execution flow directly to func1 avoiding event loop's _run_once that could switch to another coroutine. Only when blocking asyncio.sleep was faced, event loop got control.
Although it's a detail of implementation of default event loop.
Second of all, and it's probably much more important, switching between coroutines is extremely cheap comparing to benefit we get from using asyncio to work with I/O.
It's also much cheaper than other async alternatives like switching between OS threads.
Situation when your code is slow because of many coroutines is highly unlikely, but even if it happened you should probably to take a look at more efficient event loop implementations like uvloop.
I would like to point out that if you ever run a sufficiently large number of coroutines that the overhead of switching context becomes an issue, you can ensure reduced concurrency using a Semaphore. I recently received a ~2x performance increase by reducing concurrency from 1000 to 50 for coroutines running HTTP requests.

Python asyncio ensure_future decorator

Let's assume I'm new to asyncio. I'm using async/await to parallelize my current project, and I've found myself passing all of my coroutines to asyncio.ensure_future. Lots of stuff like this:
coroutine = my_async_fn(*args, **kwargs)
task = asyncio.ensure_future(coroutine)
What I'd really like is for a call to an async function to return an executing task instead of an idle coroutine. I created a decorator to accomplish what I'm trying to do.
def make_task(fn):
def wrapper(*args, **kwargs):
return asyncio.ensure_future(fn(*args, **kwargs))
return wrapper
#make_task
async def my_async_func(*args, **kwargs):
# usually making a request of some sort
pass
Does asyncio have a built-in way of doing this I haven't been able to find? Am I using asyncio wrong if I'm lead to this problem to begin with?
asyncio had #task decorator in very early pre-released versions but we removed it.
The reason is that decorator has no knowledge what loop to use.
asyncio don't instantiate a loop on import, moreover test suite usually creates a new loop per test for sake of test isolation.
Does asyncio have a built-in way of doing this I haven't been able to
find?
No, asyncio doesn't have decorator to cast coroutine-functions into tasks.
Am I using asyncio wrong if I'm lead to this problem to begin with?
It's hard to say without seeing what you're doing, but I think it may happen to be true. While creating tasks is usual operation in asyncio programs I doubt you created this much coroutines that should be tasks always.
Awaiting for coroutine - is a way to "call some function asynchronously", but blocking current execution flow until it finished:
await some()
# you'll reach this line *only* when some() done
Task on the other hand - is a way to "run function in background", it won't block current execution flow:
task = asyncio.ensure_future(some())
# you'll reach this line immediately
When we write asyncio programs we usually need first way since we usually need result of some operation before starting next one:
text = await request(url)
links = parse_links(text) # we need to reach this line only when we got 'text'
Creating task on the other hand usually means that following further code doesn't depend of task's result. But again it doesn't happening always.
Since ensure_future returns immediately some people try to use it as a way to run some coroutines concurently:
# wrong way to run concurrently:
asyncio.ensure_future(request(url1))
asyncio.ensure_future(request(url2))
asyncio.ensure_future(request(url3))
Correct way to achieve this is to use asyncio.gather:
# correct way to run concurrently:
await asyncio.gather(
request(url1),
request(url2),
request(url3),
)
May be this is what you want?
Upd:
I think using tasks in your case is a good idea. But I don't think you should use decorator: coroutine functionality (to make request) still is a separate part from it's concrete usage detail (it will be used as task). If requests synchronization controlling is separate from their's main functionalities it's also make sense to move synchronization into separate function. I would do something like this:
import asyncio
async def request(i):
print(f'{i} started')
await asyncio.sleep(i)
print(f'{i} finished')
return i
async def when_ready(conditions, coro_to_start):
await asyncio.gather(*conditions, return_exceptions=True)
return await coro_to_start
async def main():
t = asyncio.ensure_future
t1 = t(request(1))
t2 = t(request(2))
t3 = t(request(3))
t4 = t(when_ready([t1, t2], request(4)))
t5 = t(when_ready([t2, t3], request(5)))
await asyncio.gather(t1, t2, t3, t4, t5)
if __name__ == '__main__':
loop = asyncio.get_event_loop()
try:
loop.run_until_complete(main())
finally:
loop.run_until_complete(loop.shutdown_asyncgens())
loop.close()

Categories

Resources