I have a simple FastAPI server:
@app.get("/")
async def root():
    time.sleep(0.5)  # any I/O-bound task
    return {"message": "home"}
I'm sending 10 concurrent requests to the server, which would take a little over 0.5 seconds if it were a def method instead of async def:
# sending concurrent requests
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(send_request, [...])
I understand that I should be using an asynchronous sleep on the server, but why does my CONCURRENT code behave sequentially, taking 5+ seconds, when I'm sending concurrent requests?
Technically there's some latency in every line, so does that mean I need to make every line asynchronous?
Definition:
asyncio is a library to write concurrent code using the async/await syntax.
asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc.
asyncio is often a perfect fit for IO-bound and high-level structured network code.
You see the benefit of async in IO-bound work: hitting a database, calling an API, and so on.
In your example, time.sleep blocks the event loop rather than yielding to it; use await asyncio.sleep(0.5) to simulate an IO-bound wait.
As I said, to benefit from async you should be looking at IO-bound work; for CPU-bound work (solving algorithms, encryption, or anything else that depends on the CPU), async functions won't help you.
Simply put, whenever our system is waiting for an answer from an external service, instead of our system sitting idle, it goes to another task to get the answer from the external system.
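A minimal pure-asyncio sketch (no FastAPI needed, names are illustrative) shows why an awaited sleep lets the 10 "requests" overlap instead of running sequentially:

```python
import asyncio
import time

async def handler():
    await asyncio.sleep(0.5)  # non-blocking: the event loop runs other handlers meanwhile
    return {"message": "home"}

async def main():
    start = time.perf_counter()
    # 10 "concurrent requests": all the sleeps overlap on one event loop
    results = await asyncio.gather(*(handler() for _ in range(10)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# elapsed is a little over 0.5 s; with time.sleep it would be ~5 s
```

Swap the `await asyncio.sleep(0.5)` for `time.sleep(0.5)` and the total jumps to roughly 5 seconds, which is exactly the behavior in the question.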
I am working with FastAPI and uvloop to serve a REST API in an efficient way.
I have a lot of asynchronous code that makes calls to remote resources such as a database, a storage, etc. Those functions look like this:
async def _get_remote_resource(key: str) -> Resource:
    # do some async work
    return resource
I'm implementing an interface to an existing Abstract Base Class where I need to use the asynchronous function from above in a synchronous method. I have done something like:
class Resource:
    def __str__(self):
        resource = asyncio.get_event_loop().run_until_complete(_get_remote_resource(self.key))
        return f"{resource.pk}"
Great! Now I write an endpoint in FastAPI to make this work accessible:
@app.get("")
async def get(key):
    return str(Resource(key))
The problem is that FastAPI already has an event loop running (uvloop), and the asynchronous code fails because the loop is already running.
Is there any way I can call the asynchronous method from the synchronous method in the class? Or do I have to rethink the structure of the code?
The runtime error is designed precisely to prevent what you are trying to do. run_until_complete is a blocking call, and using it inside an async def would block the outer event loop.
The straightforward fix is to expose the needed functionality through an actual async method, e.g.:
class Resource:
    def name(self):
        return loop.run_until_complete(self.name_async())

    async def name_async(self):
        resource = await _get_remote_resource(self.key)
        return f"{resource.pk}"
Then in FastAPI you'd access the API in the native way:
@app.get("")
async def get(key):
    return await Resource(key).name_async()
You could also define __str__(self) to return self.name(), but that's best avoided because something as basic as str() should be callable from within asyncio as well (due to use in logging, debugging, etc.).
I would like to complement the @user4815162342 answer.
FastAPI is an asynchronous framework. I would suggest sticking to a few principles:
Do not execute blocking IO operations in synchronous functions. Prepare such resources asynchronously and pass the ready data to the synchronous function (this principle can be called an asynchronous dependency for synchronous code).
If you still need to perform a blocking IO operation in synchronous code, do it in a separate thread and wait for the result asynchronously by means of asyncio (a def endpoint, run_in_executor with ThreadPoolExecutor, or a def background task).
If you need to do a blocking CPU-bound operation, delegate its execution to a separate process (the simplest way is run_in_executor with ProcessPoolExecutor, or any task queue).
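A sketch of the second principle with run_in_executor (function names are illustrative; for the third principle, a ProcessPoolExecutor slots into the same call for CPU-bound work):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(path):
    # stand-in for a blocking IO call (e.g. a synchronous driver)
    time.sleep(0.05)
    return f"read {path}"

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor() as pool:
        # the event loop stays free while the worker thread blocks;
        # for CPU-bound work, pass a ProcessPoolExecutor here instead
        data = await loop.run_in_executor(pool, blocking_io, "config.txt")
    return data

data = asyncio.run(main())
```

While the thread sleeps, the loop can keep serving other requests; the await resumes the coroutine once the thread finishes.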
Is there a way to run all messages that arrive on the same websocket sequentially, in a blocking way, without blocking messages arriving on different websockets?
So let's assume someone is using ThreadPoolExecutor with 8 threads (to utilize all available cores), together with the yield statement and the @gen.coroutine decorator. Every time the server runs executor.submit, the task goes to some thread arbitrarily. I'd like to enforce that for a given WebSocket only one thread will handle the tasks, to ensure things run sequentially.
I'm writing a class that will spawn tasks during its lifetime. Since I'm using Trio, I can't spawn tasks without a nursery. My first thought was to have a self._nursery in my class that I can spawn tasks into. But it seems that nursery objects can only be used in a context manager, so they are always closed in the same scope where they were created. I don't want to pass in a nursery from outside because it's an implementation detail, but I do want my objects to be able to spawn tasks that last as long as the object does (e.g. a heartbeat task).
How can I write such a class, which has long-lived background tasks, using Trio?
Excellent question!
One of Trio's weirdest, most controversial decisions is that it takes the position that the existence of a background task is not an implementation detail, and should be exposed as part of your API. On balance I think this is the right decision, but it's definitely a bit experimental and has some trade-offs.
Why does Trio do this? In my experience, other systems make it seem like you can abstract away the presence of a background task or thread, but in reality it leaks in all kinds of ways: they end up breaking control-C handling, or they cause problems when you're trying to exit the program cleanly, or they leak when you try to cancel the main operation, or you have sequencing problems because the function you called completed but the work it promised to do is still going on in the background, or the background task crashes with an unexpected exception and then the exception gets lost and all kinds of weird problems ensue... so while it might make your API feel a little messier in the short term, in the long term everything goes easier if you make this explicit.
Also, keep in mind that everyone else writing and using Trio libraries has the same issue, so your API is not going to feel too weird :-).
I don't know what you're trying to do exactly. Maybe it's something like a websocket connection, where you want to constantly be reading from the socket to respond to heartbeat ("ping") requests. One pattern would be to do something like:
@asynccontextmanager
async def open_websocket(url):
    ws = WebSocket()
    await ws._connect(url)
    try:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(ws._heartbeat_task)
            yield ws
            # Cancel the heartbeat task, since we're about to close the connection
            nursery.cancel_scope.cancel()
    finally:
        await ws.aclose()
And then your users can use it like:
async with open_websocket("https://...") as ws:
    await ws.send("hello")
    ...
If you want to get fancier, another option would be to provide one version where your users pass in their own nursery, for experts:
class WebSocket(trio.abc.AsyncResource):
    def __init__(self, nursery, url):
        self._nursery = nursery
        self.url = url

    async def connect(self):
        # set up the connection
        ...
        # start the heartbeat task
        self._nursery.start_soon(self._heartbeat_task)

    async def aclose(self):
        # you'll need some way to shut down the heartbeat task here
        ...
and then also provide a convenience API, for those who just want one connection and don't want to mess with nurseries:
@asynccontextmanager
async def open_websocket(url):
    async with trio.open_nursery() as nursery:
        async with WebSocket(nursery, url) as ws:
            await ws.connect()
            yield ws
The main advantage of the pass-in-a-nursery approach is that if your users want to open lots of websocket connections, an arbitrary number of websocket connections, then they can open one nursery once at the top of their websocket management code, and then have lots of websockets inside it.
You're probably wondering, though: where do you find this #asynccontextmanager? Well, it's included in the stdlib in 3.7, but that's not even out yet, so depending on when you're reading this you might not be using it yet. Until then, the async_generator package gives you #asynccontextmanager all the way back to 3.5.
I have the following scenario:
There is one thread that manages a long-polling HTTP connection (non-stop) to an API. When a new message arrives, it must be processed within the special process() method.
I want to design it so that incoming messages are processed concurrently, with one more important point: at the end of each processing, an answer should be passed to the outgoing queue, which lives in a separate thread. From there the answers are sent via HTTP.
Let's say there can be 30-50 messages per second, and the process() method will take from 1 up to 10 seconds.
The question is: what library or framework can I use to implement this architecture?
As far as I have researched, Python Tornado have good benchmarks, but here I do not need a web framework, just a tool that can provide a concurrent running of message processors.
Your message rate is pretty low, so you may freely use "standard" tools like RabbitMQ/Redis with Celery, or asyncio.
RabbitMQ/Redis with Celery are great tools to implement queues and manage your tasks and processes.
asyncio is faster than Tornado, but that doesn't matter for your task. What matters more is that asyncio gives you all the benefits of the modern async/await coroutine style.
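A sketch of the asyncio option (all names are illustrative): an incoming queue feeds a pool of coroutine workers, and each answer lands on an outgoing queue, matching the architecture in the question:

```python
import asyncio

async def process(msg):
    await asyncio.sleep(0.01)  # stand-in for the 1-10 s processing
    return f"answer to {msg}"

async def worker(incoming, outgoing):
    while True:
        msg = await incoming.get()
        outgoing.put_nowait(await process(msg))
        incoming.task_done()

async def main(messages, n_workers=4):
    incoming, outgoing = asyncio.Queue(), asyncio.Queue()
    workers = [asyncio.create_task(worker(incoming, outgoing)) for _ in range(n_workers)]
    for m in messages:
        incoming.put_nowait(m)
    await incoming.join()  # wait until every message has been processed
    for w in workers:
        w.cancel()
    return [outgoing.get_nowait() for _ in range(outgoing.qsize())]

answers = asyncio.run(main([f"msg{i}" for i in range(8)]))
```

In a real service, a separate consumer coroutine (or thread) would drain the outgoing queue and send the answers back over HTTP.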
I think everyone knows what to do with long-running tasks in Django: use Celery and relax. But what if I want the benefits of websockets with aiohttp (or Tornado)?
Let's say I have a very CPU-bound task which can take from a couple of seconds up to multiple (5-10) minutes. It looks like a pretty good idea to handle this task in the websocket loop and notify the user about the progress. No AJAX requests, very fast responses for short tasks.
async def websocket_handler(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)

    async for msg in ws:
        if msg.tp == aiohttp.MsgType.text:
            answer_to_the_ultimate_question_of_life_the_universe_and_everything = \
                long_running_task(msg.data, NotificationHelper(ws))
            ws.send_str(json.dumps({
                'action': 'got-answer',
                'data': answer_to_the_ultimate_question_of_life_the_universe_and_everything,
            }))
    return ws
But on the other hand, a CPU-bound task served this way blocks the entire thread, as I understand it. If I have 10 workers and 11 clients who want to use the application, the 11th client won't be served until the 1st client's task is done.
Maybe, I should run tasks which look big in celery and tasks which look small in the main loop?
So, my question: is there any good design pattern for serving long-running tasks with async server?
Thanks!
Just run your long-running CPU-bound task by loop.run_in_executor() and send progress notifications by loop.call_soon_threadsafe().
If your job is not CPU- but IO-bound (sending emails, for example), you may create a new task with a loop.create_task() call. It looks like spawning a new thread.
If you cannot use fire-and-forget approach you need to use persistent message broker like RabbitMQ (there is https://github.com/benjamin-hodgson/asynqp library for communicating with Rabbit in asyncio way).
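A sketch of the first suggestion (names are illustrative, not aiohttp API): the CPU-bound job runs in an executor thread while progress notifications are marshalled back to the event loop with call_soon_threadsafe:

```python
import asyncio

progress = []  # stands in for "send a progress frame over the websocket"

def long_running_task(data, notify):
    # runs in a worker thread; simulate staged CPU-bound work
    for pct in (25, 50, 75, 100):
        notify(pct)
    return data.upper()

async def handle(data):
    loop = asyncio.get_running_loop()

    def notify(pct):
        # safe to call from the worker thread: the append executes on the loop
        loop.call_soon_threadsafe(progress.append, pct)

    # None = default ThreadPoolExecutor; the loop keeps serving other clients
    return await loop.run_in_executor(None, long_running_task, data, notify)

result = asyncio.run(handle("hello"))
```

In the websocket handler above, notify would call ws.send_str with a progress payload instead of appending to a list.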