Converting small functions to coroutines

Converting small functions to coroutines - python

I feel like there is a gap in my understanding of async IO: is there a benefit to wrapping small functions into coroutines, within the scope of larger coroutines? Is there a benefit to this in signaling the event loop correctly? Does the extent of this benefit depend on whether the wrapped function is IO or CPU-bound?
Example: I have a coroutine, download(), which:
Downloads JSON-serialized bytes from an HTTP endpoint via aiohttp.
Compresses those bytes via bz2.compress() - which is not in itself awaitable
Writes the compressed bytes to S3 via aioboto3
So parts 1 & 3 use predefined coroutines from those libraries; part 2 does not, by default.
Dumbed-down example:
import bz2
import io
import aiohttp
import aioboto3
async def download(endpoint, bucket_name, key):
async with aiohttp.ClientSession() as session:
async with session.request("GET", endpoint, raise_for_status=True) as resp:
raw = await resp.read() # payload (bytes)
# Yikes - isn't it bad to throw a synchronous call into the middle
# of a coroutine?
comp = bz2.compress(raw)
async with (
aioboto3.session.Session()
.resource('s3')
.Bucket(bucket_name)
) as bucket:
await bucket.upload_fileobj(io.BytesIO(comp), key)
As hinted by the comment above, my understanding has always been that throwing a synchronous function like bz2.compress() into a coroutine can mess with it. (Even if bz2.compress() is probably more IO-bound than CPU-bound.)
So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
return bz2.compress(*args, **kwargs)
(And now comp = await compress(raw) within download().)
Wa-la, this is now an awaitable coroutine, because a sole return is valid in a native coroutine. Is there a case to be made for using this?
Per this answer, I've heard justification for randomly throwing in asyncio.sleep(0) in a similar manner - just to single back up to the event loop that the calling coroutine wants a break. Is this right?

So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
return bz2.compress(*args, **kwargs)
There is no benefit to it whatsoever. Contrary to expectations, adding an await doesn't guarantee that the control will be passed to the event loop - that will happen only if the awaited coroutine actually suspends. Since compress doesn't await anything, it will never suspend, so it's a coroutine in name only.
Note that adding await asyncio.sleep(0) in coroutines does not solve the problem; see this answer for a more detailed discussion. If you need to run a blocking function, use run_in_executor:
async def compress(*args, **kwargs):
loop = asyncio.get_event_loop()
return await loop.run_in_executor(None, lambda: bz2.compress(*args, **kwargs))

Coroutines allow you to run something concurrently, not in parallel. They allow for a single-threaded cooperative multitasking. This makes sense in two cases:
You need to produce results in lockstep, like two generators would.
You want something useful be done while another coroutine is waiting for I/O.
Things like http requests or disk I/O would allow other coroutines to run while they are waiting for completion of an operation.
bz2.compress() is synchronous and, I suppose, does not release GIL but does release GIL while it is running. This means that no meaningful work can be done while it's running. That is, other coroutines would not run during its invocation, though other threads would.
If you anticipate a large amount of data to compress, so large that the overhead of running a coroutine is small in comparison, you can use bz2.BZ2Compressor and feed it with data in reasonably small blocks (like 128KB), write the result to a stream (S3 supports streaming, or you can use StringIO), and await asyncio.sleep(0) between compressing blocks to yield control.
This will allow other coroutines to also run concurrently with your compression coroutine. Possibly async S3 upload will be occurring in parallel at the socket level, too, while your coroutine would be inactive.
BTW making your compressor explicitly an async generator can be a simpler way to express the same idea.

Related

Does FastAPI websocket example deadlock the process?

The code in the docs has a while True: block and I am curious if something like that would deadlock the process. If I get two requests, would the second one just not go through? why or why not?
Source: https://fastapi.tiangolo.com/advanced/websockets/
#app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await websocket.accept()
while True:
data = await websocket.receive_text()
await websocket.send_text(f"Message text was: {data}")

The short answer (based on the code sample given in your question) is yes, subsequent requests would go through (regardless of the while True loop).
The long answer is depends on what kind of operations you would like to perform inside the async function or while True loop (i.e., I/O-bound or CPU-bound operations?). When a function is called with await and concerns an I/O-bound operation (i.e., waiting for data from the client to be sent through the network, waiting for contents of a file in the disk to be read, etc.), the event loop (main thread) can then continue on and service other coroutines (requests) while waiting for that operation to finish (i.e., in your case, waiting for a message to be either received or sent). If, however, the await concerns a CPU-bound operation, or a CPU-bound operation is performed regardless of calling it using await (i.e, audio or image processing, machine learning, etc.), the event loop will get blocked; that means, no further requests would go through, until that CPU-bound operation is completed.
Please refer to this answer for more details.

Python Asyncio/Trio for Asynchronous Computing/Fetching

I am looking for a way to efficiently fetch a chunk of values from disk, and then perform computation/calculations on the chunk. My thought was a for loop that would run the disk fetching task first, then run the computation on the fetched data. I want to have my program fetch the next batch as it is running the computation so I don't have to wait for another data fetch every time a computation completes. I expect the computation will take longer than the fetching of the data from disk, and likely cannot be done truly in parallel due to a single computation task already pinning the cpu usage at near 100%.
I have provided some code below in python using trio (but could alternatively be used with asyncio to the same effect) to illustrate my best attempt at performing this operation with async programming:
import trio
import numpy as np
from datetime import datetime as dt
import time
testiters=10
dim = 6000
def generateMat(arrlen):
for _ in range(30):
retval= np.random.rand(arrlen, arrlen)
# print("matrix generated")
return retval
def computeOpertion(matrix):
return np.linalg.inv(matrix)
def runSync():
for _ in range(testiters):
mat=generateMat(dim)
result=computeOpertion(mat)
return result
async def matGenerator_Async(count):
for _ in range(count):
yield generateMat(dim)
async def computeOpertion_Async(matrix):
return computeOpertion(matrix)
async def runAsync():
async with trio.open_nursery() as nursery:
async for value in matGenerator_Async(testiters):
nursery.start_soon(computeOpertion_Async,value)
#await computeOpertion_Async(value)
print("Sync:")
start=dt.now()
runSync()
print(dt.now()-start)
print("Async:")
start=dt.now()
trio.run(runAsync)
print(dt.now()-start)
This code will simulate getting data from disk by generating 30 random matrices, which uses a small amount of cpu. It will then perform matrix inversion on the generated matrix, which uses 100% cpu (with openblas/mkl configuration in numpy). I compare the time taken to run the tasks by timing the synchronous and asynchronous operations.
From what I can tell, both jobs take exactly the same amount of time to finish, meaning the async operation did not speed up the execution. Observing the behavior of each computation, the sequential operation runs the fetch and computation in order and the async operation runs all the fetches first, then all the computations afterwards.
Is there a way to use asynchronously fetch and compute? Perhaps with futures or something like gather()? Asyncio has these functions, and trio has them in a seperate package trio_future. I am also open to solutions via other methods (threads and multiprocessing).
I believe that there likely exists a solution with multiprocessing that can make the disk reading operation run in a separate process. However, inter-process communication and blocking then becomes a hassle, as I would need some sort of semaphore to control how many blocks could be generated at a time due to memory constraints, and multiprocessing tends to be quite heavy and slow.
EDIT
Thank you VPfB for your answer. I am not able to sleep(0) in the operation, but I think even if I did, it would necessarily block the computation in favor of performing disk operations. I think this may be a hard limitation of python threading and asyncio, that it can only execute 1 thread at a time. Running two different processes simultaneously is impossible if both require anything but waiting for some external resource to respond from your CPU.
Perhaps there is a way with an executor for a multiprocessing pool. I have added the following code below:
import asyncio
import concurrent.futures
async def asynciorunAsync():
loop = asyncio.get_running_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
async for value in matGenerator_Async(testiters):
result = await loop.run_in_executor(pool, computeOpertion,value)
print("Async with PoolExecutor:")
start=dt.now()
asyncio.run(asynciorunAsync())
print(dt.now()-start)
Although timing this, it still takes the same amount of time as the synchronous example. I think I will have to go with a more involved solution as it seems that async and await are too crude of a tool to properly do this type of task switching.

I don't work with trio, my answer it asyncio based.
Under these circumstances the only way to improve the asyncio performance I see is to break the computation into smaller pieces and insert await sleep(0) between them. This would allow the data fetching task to run.
Asyncio uses cooperative scheduling. A synchronous CPU bound routine does not cooperate, it blocks everything else while it is running.
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks
to run. This can be used by long-running functions to avoid blocking
the event loop for the full duration of the function call.
(quoted from: asyncio.sleep)
If that is not possible, try to run the computation in an executor. This adds some multi-threading capabilities to otherwise pure asyncio code.

The point of async I/O is to make it easy to write programs where there is lots of network I/O but very little actual computation (or disk I/O). That applies to any async library (Trio or asyncio) or even different languages (e.g. ASIO in C++). So your program is ideally unsuited to async I/O! You will need to use multiple threads (or processes). Although, in fairness, async I/O including Trio can be useful for coordinating work on threads, and that might work well in your case.
As VPfB's answer says, if you're using asyncio then you can use executors, specifically a ThreadPoolExecutor passed to loop.run_in_executor(). For Trio, the equivalent would be trio.to_thread.run_sync() (see also Threads (if you must) in the Trio docs), which is even easier to use. In both cases, you can await the result, so the function is running in a separate thread while the main Trio thread can continue running your async code. Your code would end up looking something like this:
async def matGenerator_Async(count):
for _ in range(count):
yield await trio.to_thread.run_sync(generateMat, dim)
async def my_trio_main()
async with trio.open_nursery() as nursery:
async for matrix in matGenerator_Async(testiters):
nursery.start_soon(trio.to_thread.run_sync, computeOperation, matrix)
trio.run(my_trio_main)
There's no need for the computation functions (generateMat and computeOperation) to be async. In fact, it's problematic if they are because you could no longer run them in a separate thread. In general, only make a function async if it needs to await something or use async with or async for.
You can see from the above example how to pass data to the functions running in the other thread: just pass them as parameters to trio.to_thread.run_sync(), and they will be passed along as parameters to the function. Getting the result back from generateMat() is also straightforward - the return value of the function called in the other thread is returned from await trio.to_thread.run_sync(). Getting the result of computeOperation() is trickier, because it's called in the nursery, so its return value is thrown away. You'll need to pass a mutable parameter to it (like a dict) and stash the result in there. But be careful about thread safety; the easiest way to do that is to pass a new object to each coroutine, and only inspect them all after the nursery has finished.
A few final footnotes that you can probably ignore:
Just to be clear, yield await in the code above isn't some sort of special syntax. It's just await foo(), which returns a value once foo() has finished, followed by yield of that value.
You can change the number of threads Trio uses for calls to to_thread.run_sync() by passing a CapacityLimiter object, or by finding the default one and setting the count on that. It looks like the default is currently 40, so you might want to turn that down a bit, but it's probably not too important.
There is a common myth that Python doesn't support threads, or at least can't do computation in multiple threads simultaneously, because it has a single global lock (the global interpreter lock, or GIL). That would mean that you need to use multiple processes, rather than threads, for your program to really compute thing in parallel. It's true there is a GIL in Python, but so long as you're doing your computation using something like numpy, which you are, then it doesn't stop multithreading from working effectively.
Trio actually has great support for async file I/O. But I don't think it would be helpful in your case.

To supplement my other answer (which uses Trio like you asked), here's how to do it use it just using threads without any async library. The easiest way to do this with Future objects and a ThreadPoolExecutor.
futures = []
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
for matrix in matGenerator(testiters):
futures.append(executor.submit(computeOperation, matrix))
results = [f.result() for f in futures]
The code is actually pretty similar to the async code, but if anything it's simpler. If you don't need to do network I/O, you're better off with this method.

I think the main issue with using multiprocessing and not seeing any improvement is the 100% utilization of the CPU. It essentially leaves you with an async-like behavior where resources are occasionally being freed up and used for the I/O process. You could set a limit to the number of workers for your ProcessPoolExecutor and that might allow the I/O the room it needs to be more at the ready.
Disclaimer: I'm still new to multiprocessing and threading.

Python asyncio: Enter into a temporary async context?

I want to write a library that mixes synchronous and asynchronous work, like:
def do_things():
# 1) do sync things
# 2) launch a bunch of slow async tasks and block until they are all complete or an exception is thrown
# 3) more sync work
# ...
I started implementing this using asyncio as an excuse to learn the learn the library, but as I learn more it seems like this may be the wrong approach. My problem is that there doesn't seem to be a clean way to do 2, because it depends on the context of the caller. For example:
I can't use asyncio.run(), because the caller could already have a running event loop and you can only have one loop per thread.
Marking do_things as async is too heavy because it shouldn't require the caller to be async. Plus, if do_things was async, calling synchronous code (1 & 3) from an async function seems to be bad practice.
asyncio.get_event_loop() also seems wrong, because it may create a new loop, which if left running would prevent the caller from creating their own loop after calling do_things (though arguably they shouldn't do that). And based on the documentation of loop.close, it looks like starting/stopping multiple loops in a single thread won't work.
Basically it seems like if I want to use asyncio at all, I am forced to use it for the entire lifetime of the program, and therefore all libraries like this one have to be written as either 100% synchronous or 100% asynchronous. But the behavior I want is: Use the current event loop if one is running, otherwise create a temporary one just for the scope of 2, and don't break client code in doing so. Does something like this exist, or is asyncio the wrong choice?

I can't use asyncio.run(), because the caller could already have a running event loop and you can only have one loop per thread.
If the caller has a running event loop, you shouldn't run blocking code in the first place because it will block the caller's loop!
With that in mind, your best option is to indeed make do_things async and call sync code using run_in_executor which is designed precisely for that use case:
async def do_things():
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, sync_stuff)
await async_func()
await loop.run_in_executor(None, more_sync_stuff)
This version of do_things is usable from async code as await do_things() and from sync code as asyncio.run(do_things()).
Having said that... if you know that the sync code will run very briefly, or you are for some reason willing to block the caller's event loop, you can work around the limitation by starting an event loop in a separate thread:
def run_async(aw):
result = None
async def run_and_store_result():
nonlocal result
result = await aw
t = threading.Thread(target=asyncio.run, args=(run_and_store_result(),))
t.start()
t.join()
return result
do_things can then look like this:
async def do_things():
sync_stuff()
run_async(async_func())
more_sync_stuff()
It will be callable from both sync and async code, but the cost will be that:
it will create a brand new event loop each and every time. (Though you can cache the event loop and never exit it.)
when called from async code, it will block the caller's event loop, thus effectively breaking its asyncio usage, even if most time is actually spent inside its own async code.

Why is it that only asynchronous functions can yield in asynchronous code?

In the article "I'm not feeling the async pressure" Armin Ronacher makes the following observation:
In threaded code any function can yield. In async code only async functions can. This means for instance that the writer.write method cannot block.
This observation is made with reference to the following code sample:
from asyncio import start_server, run
async def on_client_connected(reader, writer):
while True:
data = await reader.readline()
if not data:
break
writer.write(data)
async def server():
srv = await start_server(on_client_connected, '127.0.0.1', 8888)
async with srv:
await srv.serve_forever()
run(server())
I do not understand this comment. Specifically:
How come synchronous functions cannot yield when inside of asynchronous functions?
What does yield have to do with blocking execution? Why is it that a function that cannot yield, cannot block?

Going line-by-line:
In threaded code any function can yield.
Programs running on a machine are organized in terms of processes. Each process may have one or more threads. Threads, like processes, are scheduled by (and interruptible by) the operating system. The word "yield" in this context means "letting other code run". When work is split between multiple threads, functions "yield" easily: the operating system suspends the code running in one thread, runs some code in a different thread, suspends that, comes back, and works some more on the first thread, and so on. By switching between threads in this way, concurrency is achieved.
In this execution model, whether the code being suspended is synchronous or asynchronous does not matter. The code within the thread is being run line-by-line, so the fundamental assumption of a synchronous function---that no changes occurred in between running one line of code and the next---is not violated.
In async code only async functions can.
"Async code" in this context means a single-threaded application that does the same work as the multi-threaded application, except that it achieves concurrency by using asynchronous functions within a thread, instead of splitting the work between different threads. In this execution model, your interpreter, not the operating system, is responsible for switching between functions as needed to achieve concurrency.
In this execution model, it is unsafe for work to be suspended in the middle of a synchronous function that's located inside of an asynchronous function. Doing so would mean running some other code in the middle of running your synchronous function, breaking the "line-by-line" assumption made by the synchronous function.
As a result, the interpreter will wait only suspend the execution of an asynchronous function in between synchronous sub-functions, never within one. This is what is meant by the statement that synchronous functions in async code cannot yield: once a synchronous function starts running, it must complete.
This means for instance that the writer.write method cannot block.
The writer.write method is synchronous, and hence, when run in an async program, uninterruptible. If this method were to block, it would block not just the asynchronous function it is running inside of, but the entire program. That would be bad. writer.write avoids blocking the program by writing to a write buffer instead and returning immediately.
Strictly speaking, writer.write can block, it's just inadvisable to do so.
If you need to block inside of an async function, the proper way to do so is to await another async function. This is what e.g. await writer.drain() does. This will block asynchronously: while this specific function remains blocked, it will correctly yield to other functions that can run.

“Yield” here refers to cooperative multitasking (albeit within a process rather than among them). In the context of the async/await style of Python programming, asynchronous functions are defined in terms of Python’s pre-existing generator support: if a function blocks (typically for I/O), all its callers that are performing awaits suspend (with an invisible yield/yield from that is indeed of the generator variety). The actual call for any generator is to its next method; that function actually returns.
Every caller, up to some sort of driver that most programmers never write, must participate for this approach to work: any function that did not suspend would suddenly have the responsibility of the driver of deciding what to do next while waiting on the function it called to complete. This “infectious” aspect of asynchronicity has been called a “color”; it can be problematic, as for example when people forget to await a coroutine call that looks correct because it looks like any other call. (The async/await syntax exists to minimize the disruption of the program’s structure from the concurrency by implicitly converting functions into state machines, but this ambiguity remains.) It can also be a good thing: an asynchronous function can be interrupted exactly when it awaits, so it’s straightforward to reason about the consistency of data structures.
A synchronous function therefore cannot yield simply as a matter of definition. The import of the restriction is rather that a function called with a normal (synchronous) call cannot yield: its caller is not prepared to handle such an interaction. (What will happen if it does anyway is of course the same “forgotten await”.) This also affects refactoring: a function cannot be changed to be asynchronous without changing all its clients (and making them asynchronous as well if they are not already). (This is similar to how all I/O works in Haskell, since it affects the type of any function that performs any.)
Note that yield is allowed in its role as a normal generator used with an ordinary for even in an asynchronous function, but that’s just the general fact that the caller must expect the same protocol as the callee: if an enhanced generator (an “old-style” coroutine) is used with for, it just gets None from every (yield), and if an async function is used with for, it produces awaitables that probably break when they are sent None.
The distinction with threading, or with so-called stackful coroutines or fibers, is that no special resumption support is needed from the caller because the actual function call simply doesn’t return until the thread/fiber is resumed. (In the thread case, the kernel also chooses when to resume it.) In that sense, these approaches are easier to use, but with fibers the ability to “sneak” a pause into any function is partially compromised by the need to specify arguments to that function to tell it about the userspace scheduler with which to register itself (unless you’re willing to use global variables for that…). Threads, on the other hand, have even higher overhead than fibers, which matters when great numbers of them are running.

Understanding asyncio: Asynchronous vs synchronous callbacks

There's one very particular thing about asyncio that I can't seem to understand. And that is the difference between asynchronous and synchronous callbacks. Let me introduce some examples.
Asyncio TCP example:
class EchoServer(asyncio.Protocol):
def connection_made(self, transport):
print('connection made')
def data_received(self, data):
print('data received: ', data.decode())
def eof_received(self):
pass
def connection_lost(self, exc):
print('connection lost:', exc)
Aiohttp example:
async def simple(request):
return Response(text="Simple answer")
async def init(loop):
app = Application(loop=loop)
app.router.add_get('/simple', simple)
return app
loop = asyncio.get_event_loop()
app = loop.run_until_complete(init(loop))
run_app(app, loop=loop)
Those two examples are very similar in functionality but they seem to be both doing it in different way. In the first example, if you want to be notified on some action, you specify a synchronous function (EchoServer.connection_made). However, in the second example, if you want to be notified on some action, you have to define an asynchronous callback function (simple).
I would like to ask what is the difference between these two types of callback. I understand the difference between regular functions and asynchronous functions, but I cannot wrap my head around the callback difference. For example, if I would want to write an asynchrnonous API like aiohttp is, and I would have a function that would do something and call a callback after that, how would I decide if I should demand an asynchronous function to be passed as an argument or just a regular synchronous one?

In aiohttp example you could do asynchronous calls from simple web-handler: access to database, make http requests etc.
In Protocol.data_received() you should call only regular synchronous methods.
UPD
Protocol callbacks are supposed to be synchronous by design.
They are very low-level bridge between sync and async.
You may call async code from them but it requires really tricky coding.
User level asyncio API for sockets etc. are streams: https://docs.python.org/3/library/asyncio-stream.html
When you introduce your own callback system must likely you need asynchronous callback unless you are %100 sure why the callback will never want to call async code.
Regular functions (def) and coroutines (async def) have a different signatures. It's hard to change a required signature, especially if your code has published as a library and you cannot control all users of your callback.
P.S.
The same is true for any public API methods.
The hardest lesson I've learned during development of my libraries is: .close() method should be a coroutine even initially it calls sync functions only, e.g. socket.close().
But later you perhaps will want to add a graceful shutdown which requires a waiting for current activity finishing and so on.
If your users had call your api as obj.close() now they should use await obj.close() but it's backward incompatible change!

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Converting small functions to coroutines - python

Related

Does FastAPI websocket example deadlock the process?

Python Asyncio/Trio for Asynchronous Computing/Fetching

Python asyncio: Enter into a temporary async context?

Why is it that only asynchronous functions can yield in asynchronous code?

Understanding asyncio: Asynchronous vs synchronous callbacks

Categories

Resources