Flask==2.0 (+uwsgi processes=5, threads=20)
Python: 3.9
I have a route that accepts a message and redirects it to other APIs based on the type of message. Let's say there are 4 types of messages with 4 matching downstream APIs. The request is made using the requests library, and the downstream API returns a response that my route needs to return to the client (so I need to wait for the response).
The issue is that sometimes one of the downstream APIs has a problem and exhibits high latency. This has an undesired result: it makes my application slow and impacts the other message types.
Is there a way to make these requests calls non-blocking so the slow downstream API responses don't slow down the whole app?
I read the Flask async guide, and from my understanding you only get real benefit if you're making multiple requests calls within a single route, which isn't the case for me; it's always a single request.
Async requests to your downstream API could provide some relief in your case, I believe:
Async is beneficial when performing concurrent IO-bound tasks, but will probably not improve CPU-bound tasks.
Consider also using a queueing mechanism (such as background tasks in Flask, or something more fully fledged such as RabbitMQ).
Final words: also consider defining a timeout for your requests.
https://flask.palletsprojects.com/en/2.1.x/async-await/
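For illustration, here's a minimal sketch of such an async view, assuming Flask is installed with the async extra (flask[async]) and using httpx as the client; the message-type mapping and URLs are placeholders, and the explicit timeout keeps one slow downstream API from tying a worker up indefinitely:

import httpx
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder mapping of message type -> downstream API
DOWNSTREAM = {
    "type_a": "https://downstream-a.example/handle",
    "type_b": "https://downstream-b.example/handle",
}

@app.route("/message", methods=["POST"])
async def forward_message():
    payload = request.get_json()
    url = DOWNSTREAM[payload["type"]]
    # the timeout fails fast instead of waiting forever on a slow downstream API
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.post(url, json=payload)
    return jsonify(resp.json()), resp.status_code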
Related
My Flask application will receive a request, do some processing, and then make a request to a slow external endpoint that takes 5 seconds to respond. It looks like running Gunicorn with Gevent will allow it to handle many of these slow requests at the same time. How can I modify the example below so that the view is non-blocking?
from flask import Flask
import requests

app = Flask(__name__)

@app.route('/do', methods=['POST'])
def do():
    # blocks this worker until the slow external endpoint responds
    result = requests.get('slow api')
    return result.content
gunicorn server:app -k gevent -w 4
If you're deploying your Flask application with gunicorn, it is already non-blocking. If a client is waiting on a response from one of your views, another client can make a request to the same view without a problem. There will be multiple workers to process multiple requests concurrently. No need to change your code for this to work. This also goes for pretty much every Flask deployment option.
First, a bit of background: a blocking socket is the default kind of socket; once you start reading, your app or thread does not regain control until data is actually read or you are disconnected. This is how python-requests operates by default. There is a spin-off called grequests which provides non-blocking reads.
The major mechanical difference is that send, recv, connect and accept can return without having done anything. You have (of course) a number of choices. You can check return code and error codes and generally drive yourself crazy. If you don’t believe me, try it sometime
Source: https://docs.python.org/2/howto/sockets.html
It also goes on to say:
There’s no question that the fastest sockets code uses non-blocking sockets and select to multiplex them. You can put together something that will saturate a LAN connection without putting any strain on the CPU. The trouble is that an app written this way can’t do much of anything else - it needs to be ready to shuffle bytes around at all times.
Assuming that your app is actually supposed to do something more than that, threading is the optimal solution
But do you want to add a whole lot of complexity to your view by having it spawn its own threads? Particularly when gunicorn has async workers?
The asynchronous workers available are based on Greenlets (via Eventlet and Gevent). Greenlets are an implementation of cooperative multi-threading for Python. In general, an application should be able to make use of these worker classes with no changes.
and
Some examples of behavior requiring asynchronous workers: Applications
making long blocking calls (Ie, external web services)
So to cut a long story short, don't change anything! Just let it be. If you are making any changes at all, let it be to introduce caching. Consider using CacheControl, an extension recommended by the python-requests developers.
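For example, a minimal sketch of wrapping a requests session with CacheControl (the downstream URL here is just a placeholder):

import requests
from cachecontrol import CacheControl

# wrap a normal requests session so cacheable responses are reused
session = CacheControl(requests.Session())
resp = session.get("https://slow-api.example/resource")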
You can use grequests. It allows other greenlets to run while the request is made. It is compatible with the requests library and returns a requests.Response object. The usage is as follows:
import grequests  # import early: grequests applies gevent monkey-patching on import
from flask import Flask

app = Flask(__name__)

@app.route('/do', methods=['POST'])
def do():
    # map() sends the request(s) and yields to other greenlets while waiting
    result = grequests.map([grequests.get('slow api')])
    return result[0].content
Edit: I've added a test and saw that the time didn't improve with grequests since gunicorn's gevent worker already performs monkey-patching when it is initialized: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L65
In one of the views in my django application, I need to perform a relatively lengthy network IO operation. The problem is other requests must wait for this request to be completed even though they have nothing to do with it.
I did some research and stumbled upon Celery, but as I understand it, it is used to perform background tasks independent of the request (so I cannot use the result of the task in the response to the request).
Is there a way to process views asynchronously in django so while the network request is pending other requests can be processed?
Edit: What I forgot to mention is that my application is a web service using Django REST Framework. So the result of a view is a JSON response, not a page that I can later modify using AJAX.
The usual solution here is to offload the task to celery, and return a "please wait" response in your view. If you want, you can then use an Ajax call to periodically hit a view that will report whether the response is ready, and redirect when it is.
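A rough sketch of that pattern with Celery and plain Django views follows; the task body, URL wiring, and JSON shapes are illustrative assumptions rather than a drop-in implementation:

from celery import shared_task
from celery.result import AsyncResult
from django.http import JsonResponse

@shared_task
def slow_network_call(payload):
    ...  # the lengthy network IO runs here, on a worker, not in the view
    return {"outcome": "done"}

def start(request):
    # dispatch the task and answer immediately with a "please wait" response
    task = slow_network_call.delay(request.GET.dict())
    return JsonResponse({"task_id": task.id, "status": "pending"})

def status(request, task_id):
    # polled periodically (e.g. by Ajax) until the task reports completion
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({"status": "done", "result": result.get()})
    return JsonResponse({"status": "pending"})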
You want to maintain that HTTP connection for an extended period of time but still allow other requests to be managed, right? There's no simple solution to this problem. Also, any solution will be a level away from Django as it depends on how you process requests.
I don't know what you're currently using, so I can only tell you how I handled this in the past... I was using uwsgi to provide the WSGI interface between my Python application and nginx. In uwsgi I used the asynchronous functions to suspend my long-running connection when it was waiting on IO. The methods let you ask uwsgi to suspend things until there is something to read or write, and to service other connections in the meantime.
The above-mentioned async calls use "green threads". They're much lighter weight than regular threads, and you have control over when you move from thread to thread.
I am not saying that it is a good solution for your scenario[1], but the simple answer is to use the following pattern:
async_result = some_task.delay(arg1)  # dispatch the task to a Celery worker and return immediately
result = async_result.get()           # block until the worker finishes, then fetch its result
Check the documentation for the get method. And instead of using the delay method you can use anything that returns an AsyncResult (like the apply_async method).
[1] Why may it be a bad idea? Having an ongoing connection waiting a long time is bad for Django (it is not designed for long-lived connections), may conflict with the proxy configuration (if there is a reverse proxy somewhere), and may be treated as a timeout by the browser. So... it seems a Bad Idea[TM] to use this pattern for a Django REST Framework view.
I have a REST API and now I want to create a web site that will use this API as its only and primary data source. The system is distributed: the REST API is on one group of machines and the site will be on the other(s).
I'm expecting quite a lot of load, so I'd like to make requests as efficient as possible.
Do I need some async HTTP requests library or any HTTP client library will work?
The API is built using Flask; the web site will also be built with Flask, using Jinja as the template engine.
You could use gevent with Flask to get asynchronous I/O from normally synchronous libraries. See this question for an example of someone getting help with doing that.
You could also run Flask behind gunicorn, which has support for spawning multiple workers (threads, processes, or greenlets) for handling concurrent requests. If you were to take that approach, Flask would remain completely synchronous, and gunicorn would handle creating multiple Flask instances to handle concurrent requests.
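As a rough sketch of the first option, assuming requests as the HTTP client and a placeholder API URL (the monkey-patching must run before the other imports):

from gevent import monkey
monkey.patch_all()  # make blocking stdlib I/O cooperative

import requests
from flask import Flask

app = Flask(__name__)

@app.route("/data")
def data():
    # while this call waits on the network, other greenlets can run
    resp = requests.get("https://rest-api.example/resource", timeout=10)
    return resp.content

If you instead run the app under gunicorn's gevent worker (gunicorn -k gevent app:app), the worker applies this monkey-patching itself on startup, so the explicit patch_all() call becomes unnecessary.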
Start simple and use whatever seems easiest for you. Consider optimizing later on, and only if needed.
Async libraries become helpful once you have thousands of requests a second. Much sooner, you are likely to hit performance problems related to the database (if you use one), which async magic will not resolve.
Actually, your API is on a separate machine. Even if you make your client calls asynchronous, it will not have any impact on the server; by making your calls asynchronous, the thread in your client simply does not wait for the response. Your server reacts the same whether the call is sync or async.
And if you want to make your calls async, please check http://stackandqueue.com/?p=57 . It uses unirest to make both GET and POST calls asynchronously.
We're creating a web service (RESTful API) in Django. Our API will wrap both our own internal data as well as some other APIs that our web services layer will be accessing.
One of the APIs we're using has some long-running calls that don't return an HTTP response for on the order of a minute. The API has a separate API call to get status of the current operation, but that means that the user has to initiate the long-running operation, then have a separate process poll for status. We don't want our API to work that way, we want the initial request to just return a response that says that it's in progress.
So what we want to do is when we get a long-running request, we kick off an HTTP request of our own to the API asynchronously, then return a response. Then every time we get a status poll we just pass that through and respond with the response we got. When we get the callback that the operation is complete, then the next time we get a status poll we'll just respond that the operation is complete and return the data. This means that we'll need handlers for incoming status requests to check the list of in-progress long-running requests to respond with the status.
Does this seem like a reasonable way to approach this? Which Python libraries should we be looking at to make this sort of thing easier? We're not sure whether to go with something low-level like eventlet or twisted, or something a little heavier-weight like celery. Celery seems to be the normal recommendation for this sort of thing, but I'm not 100% sure what its place would be.
Thanks,
Spencer
I faced the same situation a couple of months ago. You have probably already solved your problem by now, but for other people facing the same situation I'll post what I did at that time.
Basically I used the http://www.celeryproject.org/ library, dispatching the long-running operation asynchronously and returning a successful HTTP response containing the Celery job id. The async operation registered its status and job id in a SQLite database (which was enough for what I was doing), and a client would query (using REST) for the status of the job.
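A compact sketch of that approach; here JobStatus is a hypothetical Django model standing in for the SQLite table, and the upstream URL is a placeholder:

import requests
from celery import shared_task
from myapp.models import JobStatus  # hypothetical model backing the status table

@shared_task(bind=True)
def call_upstream(self, payload):
    # record that this job id is running, so status polls can be answered
    JobStatus.objects.update_or_create(job_id=self.request.id,
                                       defaults={"state": "running"})
    data = requests.post("https://slow-upstream.example/op",
                         json=payload, timeout=120).json()
    JobStatus.objects.update_or_create(job_id=self.request.id,
                                       defaults={"state": "done", "result": data})
    return data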
When I get a GET request from a user, I send them the response and then spend maybe a second logging stuff about that request. Is there a way to close the connection when I have the response ready, but continue doing that logging part, so that the user wouldn't have to wait for it to complete?
From the Google App Engine docs for the Response object:
App Engine does not support sending data to the user's browser before exiting the handler. Some web servers use this technique to "stream" data to the user's browser over a period of time in response to a single request. App Engine does not support this streaming technique.
So there's no easy way. If you have a bundle of data that you can pass to a longer-running "process and log" method, try using the deferred library. Note that this will require bundling your data up and sending it to the task queue to do your processing and logging, so:
you may not save much time, and
the results may not look much like you'd want - for example, you'd be logging from a different request, so you might need to radically alter the logging.
Still, you could try.
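A minimal sketch of the deferred approach, with a made-up log_request payload and a webapp2 handler assumed:

import logging
import webapp2
from google.appengine.ext import deferred

def log_request(payload):
    # runs later, in a separate task-queue request
    logging.info("request details: %r", payload)

class MainHandler(webapp2.RequestHandler):
    def get(self):
        payload = {"path": self.request.path,
                   "ua": self.request.headers.get("User-Agent")}
        deferred.defer(log_request, payload)  # enqueue; the response returns right away
        self.response.write("done")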
You have two options:
Use the Task Queue API. Enqueueing a task should be fast, so long as you have less than 10k of data (which is the limit on a Task Queue payload).
Use the 'sneaky' trick described by Rafe in this video to do processing after the response completes.
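For the first option, a minimal sketch of enqueueing the work (the worker URL and params are placeholders):

from google.appengine.api import taskqueue

# in the user-facing handler: enqueue the slow logging work and return immediately
taskqueue.add(url="/tasks/log", params={"path": "/some/page"})
# a handler mapped to /tasks/log then performs the actual processing and logging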