Understanding asyncio: Asynchronous vs synchronous callbacks - python

There's one very particular thing about asyncio that I can't seem to understand. And that is the difference between asynchronous and synchronous callbacks. Let me introduce some examples.
Asyncio TCP example:
import asyncio

class EchoServer(asyncio.Protocol):
    def connection_made(self, transport):
        print('connection made')

    def data_received(self, data):
        print('data received: ', data.decode())

    def eof_received(self):
        pass

    def connection_lost(self, exc):
        print('connection lost:', exc)
Aiohttp example:
import asyncio
from aiohttp.web import Application, Response, run_app

async def simple(request):
    return Response(text="Simple answer")

async def init(loop):
    app = Application(loop=loop)
    app.router.add_get('/simple', simple)
    return app

loop = asyncio.get_event_loop()
app = loop.run_until_complete(init(loop))
run_app(app, loop=loop)
Those two examples are very similar in functionality, but they seem to go about it in different ways. In the first example, if you want to be notified of some action, you specify a synchronous function (EchoServer.connection_made). However, in the second example, if you want to be notified of some action, you have to define an asynchronous callback function (simple).
I would like to ask what the difference is between these two types of callback. I understand the difference between regular functions and asynchronous functions, but I cannot wrap my head around the callback difference. For example, if I wanted to write an asynchronous API the way aiohttp does, and I had a function that does something and then calls a callback, how would I decide whether to demand an asynchronous function as the argument or just a regular synchronous one?

In the aiohttp example you can make asynchronous calls from the simple web handler: access a database, make HTTP requests, etc.
In Protocol.data_received() you should call only regular synchronous methods.
UPD
Protocol callbacks are supposed to be synchronous by design.
They are a very low-level bridge between sync and async.
You may call async code from them, but it requires really tricky coding.
The user-level asyncio API for sockets etc. is streams: https://docs.python.org/3/library/asyncio-stream.html
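As a minimal sketch (not tied to the question's code), an echo server written against the streams API could look like this; note the connection handler is itself a coroutine, so it is free to await other async code:
import asyncio

async def handle_echo(reader, writer):
    # Unlike Protocol.data_received(), this handler is a coroutine,
    # so it may await other async code (database queries, HTTP calls, ...)
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()

loop = asyncio.get_event_loop()
server = loop.run_until_complete(
    asyncio.start_server(handle_echo, '127.0.0.1', 8888))
loop.run_forever()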
When you introduce your own callback system, most likely you need an asynchronous callback, unless you are 100% sure the callback will never want to call async code.
Regular functions (def) and coroutines (async def) have different signatures. It's hard to change a required signature, especially if your code has been published as a library and you cannot control all users of your callback.
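To make that concrete, here is a minimal, self-contained sketch (all names invented for illustration) of an API that demands an asynchronous callback up front:
import asyncio

async def run_with_callback(work, callback):
    result = await work()
    # Demanding an async callback keeps the signature stable: the callback
    # may itself await things (database writes, HTTP requests, ...) later
    # without forcing a backward-incompatible change.
    await callback(result)

async def do_work():
    await asyncio.sleep(0.1)   # stand-in for real async work
    return 42

async def on_done(result):
    await asyncio.sleep(0)     # the callback is free to await
    print('got', result)

loop = asyncio.get_event_loop()
loop.run_until_complete(run_with_callback(do_work, on_done))
Because callback must be awaitable, on_done can later grow real async work without run_with_callback having to change at all.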
P.S.
The same is true for any public API method.
The hardest lesson I've learned while developing my libraries is: a .close() method should be a coroutine even if initially it only calls synchronous functions, e.g. socket.close().
Later you will probably want to add a graceful shutdown, which requires waiting for current activity to finish, and so on.
If your users have been calling your API as obj.close(), they would now have to use await obj.close(), and that is a backward-incompatible change!
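For instance, a .close() designed that way might start out as nothing more than this (a sketch with invented names):
class Connection:
    async def close(self):
        # Today this only wraps a synchronous call...
        self._sock.close()
        # ...but since it is already a coroutine, a graceful shutdown
        # (e.g. awaiting pending writes) can be added later without
        # breaking callers, who already write: await conn.close()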

Related

Concurrent requests to an async method

I have a simple FastAPI server:
@app.get("/")
async def root():
    time.sleep(0.5)  # any I/O-bound task
    return {"message": "home"}
I'm sending 10 concurrent requests to the server, which would take a little over 0.5 seconds if it were a def method instead of async def:
# sending concurrent requests
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(send_request, [...])
I understand that I should be using some asynchronous method to sleep on the server, but why does my CONCURRENT code behave sequentially, taking 5+ seconds, when I'm sending concurrent requests?
Technically there's some latency in every line, so does that mean I need to make every line asynchronous?
Definition:
asyncio is a library to write concurrent code using the async/await syntax.
asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web servers, database connection libraries, distributed task queues, etc.
asyncio is often a perfect fit for IO-bound and high-level structured network code.
You see the benefit of async with IO-bound work: hitting a database, calling an API, and so on.
In your example, time.sleep isn't IO-bound from the event loop's point of view, because it blocks the loop; use await asyncio.sleep(0.5) to simulate an IO-bound task.
As I said, if you want the benefit of async you should look at your IO-bound work; for CPU-bound work such as running an algorithm, encryption, or any other processing that depends on the CPU, async functions won't help you.
Simply put, whenever our system is waiting for an answer from an external service, instead of sitting idle it switches to another task and picks up the answer from the external system when it arrives.
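For example (a sketch of the fix just described, assuming a standard FastAPI app object), the endpoint starts handling requests concurrently once the blocking sleep is replaced with the awaitable one:
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    # Non-blocking sleep: the event loop is free to serve other
    # requests while this coroutine is suspended.
    await asyncio.sleep(0.5)
    return {"message": "home"}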

Running an asynchronous function in parallel with every other function

Here's what I want to do.
I have multiple asynchronous functions and a separate asynchronous function, let's say main. I want to call this main function alongside every other function I call.
I'm using this structure in a Telegram bot, where functions are called upon certain commands. But I want to run main on any incoming message, including messages with commands as mentioned above, where another function is also called. In that case I want to run both (first the command-specific function, then the main function).
I believe this can be done using threading.RLock() as someone suggested, but I can't figure out how.
What's the best approach for this?
You could use aiotelegram in combination with asyncio's create_task().
While threads can also do the job, they don't seem to be as good a fit as asynchronous execution.
You can choose any Telegram framework that provides an async context, like Bot.run() does in aiotelegram, or you can even implement your own API client; just make sure you run in an asynchronous (ASGI) context.
The main idea then is to call asyncio.create_task() to fire up the main() function in parallel with the rest of the function that handles the Telegram Bot command.
Here's an example (note I've used my_main() instead of main()):
import asyncio
from aiotg import Bot, Chat

bot = Bot(api_token="...")

async def other_command():
    # Replace this with the actual logic
    await asyncio.sleep(2)

async def my_main():
    # Replace this with the actual parallel task
    await asyncio.sleep(5)

@bot.command(r"/echo_command (.+)")
async def echo(chat: Chat, match):
    task = asyncio.create_task(my_main())
    return chat.reply(match.group(1))

@bot.command(r"/other_command (.+)")
async def other(chat: Chat, match):
    task = asyncio.create_task(my_main())
    result = await other_command()
    return chat.reply(result)

bot.run()
It is important to know that with this approach the tasks are never awaited nor checked for completion, so exceptions or failed executions can be difficult to track, as can any result from main() that needs to be kept.
A simple solution is to declare a global dict where you store the task(s), so you can get back to them later (e.g. from a specific Telegram command, or from within certain existing Telegram commands).
Whatever logic you use to keep track of the tasks, you can check whether they're completed, and get their results, if any, with Task.done() and Task.result(). See the official docs for further details on managing Tasks.
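As a rough sketch of that idea, reusing bot, Chat and my_main() from the example above (chat.id is just an illustrative key; any stable identifier works):
import asyncio

# Hypothetical registry for fire-and-forget tasks, so they can be
# inspected or awaited later instead of disappearing silently.
background_tasks = {}

@bot.command(r"/echo_command (.+)")
async def echo(chat: Chat, match):
    task = asyncio.create_task(my_main())
    background_tasks[chat.id] = task      # keep a reference to the task
    return chat.reply(match.group(1))

@bot.command(r"/status")
async def status(chat: Chat, match):
    task = background_tasks.get(chat.id)
    if task is None:
        return chat.reply("no background task yet")
    if task.done():
        return chat.reply(f"main finished: {task.result()}")
    return chat.reply("main still running")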

Python: Call asynchronous code from synchronous method when there is already an event loop running

I am working with FastAPI and uvloop to serve a REST API in an efficient way.
I have a lot of asynchronous code that makes calls to remote resources such as a database, a storage service, etc.; those functions look like this:
async def _get_remote_resource(key: str) -> Resource:
    # do some async work
    return resource
I'm implementing an interface to an existing Abstract Base Class where I need to use the asynchronous function from above in a synchronous method. I have done something like:
class Resource:
    def __str__(self):
        loop = asyncio.get_event_loop()
        resource = loop.run_until_complete(_get_remote_resource(self.key))
        return f"{resource.pk}"
Great! Now I create an endpoint in FastAPI to make this work accessible:
@app.get("")
async def get(key):
    return str(Resource(key))
The problem is that FastAPI already has an event loop running (using uvloop), and the asynchronous code then fails because the loop is already running.
Is there any way I can call the asynchronous method from the synchronous method in the class? Or do I have to rethink the structure of the code?
The runtime error is designed precisely to prevent what you are trying to do. run_until_complete is a blocking call, and using it inside an async def will halt the outer event loop.
The straightforward fix is to expose the needed functionality through an actual async method, e.g.:
class Resource:
    def name(self):
        loop = asyncio.get_event_loop()
        return loop.run_until_complete(self.name_async())

    async def name_async(self):
        resource = await _get_remote_resource(self.key)
        return f"{resource.pk}"
Then in fastapi you'd access the API in the native way:
@app.get("")
async def get(key):
    return await Resource(key).name_async()
You could also define __str__(self) to return self.name(), but that's best avoided because something as basic as str() should be callable from within asyncio as well (due to use in logging, debugging, etc.).
I would like to complement @user4815162342's answer.
FastAPI is an asynchronous framework. I would suggest sticking to a few principles:
Do not execute IO operations in synchronous functions in a blocking way. Prepare the resource asynchronously and pass the ready data into the synchronous function (this principle can be called an asynchronous dependency for synchronous code).
If you still need to perform a blocking IO operation in synchronous code, do it in a separate thread and wait for the result asynchronously by means of asyncio (a def endpoint, run_in_executor with a ThreadPoolExecutor, or a def background task), as sketched below.
If you need to do a blocking CPU-bound operation, delegate its execution to a separate process (the simplest way is run_in_executor with a ProcessPoolExecutor, or any task queue).
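Here is a minimal sketch of the second principle; blocking_io() is a made-up stand-in for whatever blocking call you need:
import asyncio
import time

def blocking_io(key):
    # Stand-in for a blocking call (legacy driver, file read, ...)
    time.sleep(0.5)
    return f"resource-{key}"

async def get_resource(key):
    loop = asyncio.get_event_loop()
    # Run the blocking function in the default thread pool and await the
    # result, so the event loop keeps serving other requests meanwhile.
    return await loop.run_in_executor(None, blocking_io, key)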

Converting small functions to coroutines

I feel like there is a gap in my understanding of async IO: is there a benefit to wrapping small functions into coroutines, within the scope of larger coroutines? Is there a benefit to this in signaling the event loop correctly? Does the extent of this benefit depend on whether the wrapped function is IO or CPU-bound?
Example: I have a coroutine, download(), which:
Downloads JSON-serialized bytes from an HTTP endpoint via aiohttp.
Compresses those bytes via bz2.compress() - which is not in itself awaitable
Writes the compressed bytes to S3 via aioboto3
So parts 1 & 3 use predefined coroutines from those libraries; part 2 does not, by default.
Dumbed-down example:
import bz2
import io
import aiohttp
import aioboto3
async def download(endpoint, bucket_name, key):
async with aiohttp.ClientSession() as session:
async with session.request("GET", endpoint, raise_for_status=True) as resp:
raw = await resp.read() # payload (bytes)
# Yikes - isn't it bad to throw a synchronous call into the middle
# of a coroutine?
comp = bz2.compress(raw)
async with (
aioboto3.session.Session()
.resource('s3')
.Bucket(bucket_name)
) as bucket:
await bucket.upload_fileobj(io.BytesIO(comp), key)
As hinted by the comment above, my understanding has always been that throwing a synchronous function like bz2.compress() into a coroutine can mess with it. (Even if bz2.compress() is probably more IO-bound than CPU-bound.)
So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
    return bz2.compress(*args, **kwargs)
(And now comp = await compress(raw) within download().)
Voilà, this is now an awaitable coroutine, because a sole return is valid in a native coroutine. Is there a case to be made for using this?
Per this answer, I've heard justification for randomly throwing in asyncio.sleep(0) in a similar manner - just to signal back to the event loop that the calling coroutine wants a break. Is this right?
So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
    return bz2.compress(*args, **kwargs)
There is no benefit to it whatsoever. Contrary to expectations, adding an await doesn't guarantee that the control will be passed to the event loop - that will happen only if the awaited coroutine actually suspends. Since compress doesn't await anything, it will never suspend, so it's a coroutine in name only.
Note that adding await asyncio.sleep(0) in coroutines does not solve the problem; see this answer for a more detailed discussion. If you need to run a blocking function, use run_in_executor:
async def compress(*args, **kwargs):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, lambda: bz2.compress(*args, **kwargs))
Coroutines allow you to run something concurrently, not in parallel. They allow for single-threaded cooperative multitasking. This makes sense in two cases:
You need to produce results in lockstep, like two generators would.
You want something useful be done while another coroutine is waiting for I/O.
Things like http requests or disk I/O would allow other coroutines to run while they are waiting for completion of an operation.
bz2.compress() is synchronous and, as far as I know, does release the GIL while it is running. That means other coroutines would not run during its invocation, though other threads would.
If you anticipate a large amount of data to compress, so large that the overhead of running a coroutine is small in comparison, you can use bz2.BZ2Compressor and feed it data in reasonably small blocks (like 128 KB), write the result to a stream (S3 supports streaming, or you can use io.BytesIO), and await asyncio.sleep(0) between blocks to yield control.
This will allow other coroutines to run concurrently with your compression coroutine. Possibly the async S3 upload will also be happening in parallel at the socket level while your coroutine is inactive.
BTW, making your compressor explicitly an async generator can be a simpler way to express the same idea, as sketched below.
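A rough sketch of that async-generator idea (the block size and names are illustrative):
import asyncio
import bz2

async def compress_chunks(data: bytes, block_size: int = 128 * 1024):
    # Incremental compression as an async generator: yield to the event
    # loop after each block so other coroutines (e.g. an S3 upload) can
    # make progress while compression is in flight.
    compressor = bz2.BZ2Compressor()
    for start in range(0, len(data), block_size):
        compressed = compressor.compress(data[start:start + block_size])
        if compressed:
            yield compressed
        await asyncio.sleep(0)   # cooperative yield point
    yield compressor.flush()
Inside download() each yielded block could then be fed to the upload side instead of compressing everything in a single blocking call.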

In trio, how can I have a background task that lives as long as my object does?

I'm writing a class that will spawn tasks during its lifetime. Since I'm using Trio, I can't spawn tasks without a nursery. My first thought was to have a self._nursery in my class that I can spawn tasks into. But it seems that nursery objects can only be used in a context manager, so they are always closed in the same scope where they were created. I don't want to pass in a nursery from outside because it's an implementation detail, but I do want my objects to be able to spawn tasks that last as long as the object does (e.g. a heartbeat task).
How can I write such a class, which has long-lived background tasks, using Trio?
Excellent question!
One of Trio's weirdest, most controversial decisions is that it takes the position that the existence of a background task is not an implementation detail, and should be exposed as part of your API. On balance I think this is the right decision, but it's definitely a bit experimental and has some trade-offs.
Why does Trio do this? In my experience, other systems make it seem like you can abstract away the presence of a background task or thread, but in reality it leaks in all kinds of ways: they end up breaking control-C handling, or they cause problems when you're trying to exit the program cleanly, or they leak when you try to cancel the main operation, or you have sequencing problems because the function you called completed but the work it promised to do is still going on in the background, or the background task crashes with an unexpected exception and then the exception gets lost and all kinds of weird problems ensue... so while it might make your API feel a little messier in the short term, in the long term everything goes easier if you make this explicit.
Also, keep in mind that everyone else writing and using Trio libraries has the same issue, so your API is not going to feel too weird :-).
I don't know what you're trying to do exactly. Maybe it's something like a websocket connection, where you want to constantly be reading from the socket to respond to heartbeat ("ping") requests. One pattern would be to do something like:
@asynccontextmanager
async def open_websocket(url):
    ws = WebSocket()
    await ws._connect(url)
    try:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(ws._heartbeat_task)
            yield ws
            # Cancel the heartbeat task, since we're about to close the connection
            nursery.cancel_scope.cancel()
    finally:
        await ws.aclose()
And then your users can use it like:
async with open_websocket("https://...") as ws:
    await ws.send("hello")
    ...
If you want to get fancier, another option would be to provide one version where your users pass in their own nursery, for experts:
class WebSocket(trio.abc.AsyncResource):
    def __init__(self, nursery, url):
        self._nursery = nursery
        self.url = url

    async def connect(self):
        # set up the connection
        ...
        # start the heartbeat task
        self._nursery.start_soon(self._heartbeat_task)

    async def aclose(self):
        # you'll need some way to shut down the heartbeat task here
        ...
and then also provide a convenience API, for those who just want one connection and don't want to mess with nurseries:
@asynccontextmanager
async def open_websocket(url):
    async with trio.open_nursery() as nursery:
        async with WebSocket(nursery, url) as ws:
            await ws.connect()
            yield ws
The main advantage of the pass-in-a-nursery approach is that if your users want to open lots of websocket connections, an arbitrary number of websocket connections, then they can open one nursery once at the top of their websocket management code, and then have lots of websockets inside it.
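A sketch of that many-connections pattern, reusing the expert-mode WebSocket class above (watch_feeds is an invented name):
import trio

async def watch_feeds(urls):
    # One nursery owns every connection's heartbeat task, so any number
    # of websockets can share it; closing the nursery cancels them all.
    async with trio.open_nursery() as nursery:
        sockets = []
        for url in urls:
            ws = WebSocket(nursery, url)
            await ws.connect()
            sockets.append(ws)
        ...  # use the sockets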
You're probably wondering, though: where do you find this @asynccontextmanager? Well, it's included in the stdlib in 3.7, but that's not even out yet, so depending on when you're reading this you might not be using it yet. Until then, the async_generator package gives you @asynccontextmanager all the way back to 3.5.
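A small import shim along these lines keeps the same decorator name on both old and new Pythons (a sketch, assuming the async_generator backport is installed where needed):
try:
    from contextlib import asynccontextmanager   # Python 3.7+
except ImportError:
    from async_generator import asynccontextmanager  # backport for 3.5/3.6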
