Handling time consuming requests in Flask-UWSGI app

Handling time consuming requests in Flask-UWSGI app - python

Am running an app with Flask , UWSGI and Nginx. My UWSGI is set to spawn out 4 parallel processes to handle multiple requests at the same time. Now I have one request that takes lot of time and that changes important data concerning the application. So, when one UWSGI process is processing that request and say all others are also busy, the fifth request would have to wait. The problem here is I cannot change this request to run in an offline mode as it changes important data and the user cannot simply remain unknown about it. What is the best way to handle this situation ?

As an option you can do the following:
Separate the heavy logic from the function which is being called
upon #route and move it into a separate place (a file, another
function, etc)
Introduce Celery to run that pieces of heavy logic
(it will be processed in a separate thread from the #route-decorated functions).
A quick way of doing this is using Redis as a message broker.
Schedule the time-consuming functions from your #route-decorated
functions in Celery (it is possible to pass parameters as well)
This way the HTTP requests won't be blocked for the complete function execution time.

Related

Is there any point of using connection pools with a non multi threaded web server?

I have built a webserver written in python using the flask framework and psycopg2 and I have some questions about concurrent processing as it relates to dbs and the server itself. I am using gunicorn to start my app with
web:gunicorn app:app.
From my understanding a webserver such as this processes requests one at a time. So, if someone makes a get or post request to the server the server must finish responding to that get or post request before it can then move on to another request. If this is the case, then why would I need to make more than one connection cursor object? For example, if someone were making a post request that requires me to update something in the db, then my server can't process requests until I return out of that post end point anyway so that one connection object isn't bottle necking anything is it?
Ultimately, I am trying to allow my server to process a large number of requests simultaneously. In order to do so, I think I would first have to make multiple instances of my server, and THEN the connection pool comes into play right? I think in order to make multiple instances of my server (apologies if any terminologies are being used incorrectly here), I would do one of these things:
one way would be to: I would need to use multiple threads and if the machine my application is deployed on in the cloud has multiple cpu cores, then it can do this(?). However, I have read that python does not support "True multithreading" meaning a multi threaded program is not actually running concurrently, it's just switching back and forth between those threads really quickly, so would this really be any different than my set up currently?
the second way: use multiple gunicorn workers, or use multiple dynos. I think this route is the solution here, but I don't understand the theory on how to set this up at all. If I spawn additional gunicorn workers, what is happening behind the scenes? Would this still all run on my heroku application instance? Does the amount of cores I have access to on heroku affect this in anyway? Also, regardless of which way I pick, what would I be looking to change in the app.py code or would the change solely be inside the procfile?
Assuming I manage to set up multithreading or gunicorn workers, how would this then affect the connection pool set up/what should I do in regards to the connection pool? If anyone familiar with this can help provide some theory or explanations or some resources, I would greatly appreciate it. Thanks all!

From my experience with python here's what I've learned...
If you are using multiple threads or async then you need to use a pool or an async connection
If you have multiple processes and your code is strictly synchronous with no threads then a pool is not necessary. You can reuse a single connection for each process since they are not shared between each other.
Threads dont speed up execution speed in python usually since python will only ever run one thread at a time. Though they can help speed if threads need to block.
For web servers the true bottle neck is IO usually, meaning connecting to db or read file or w.e. Multiple process and making those process async gives the greatest performance. Starlette is a async version of Flask... kinda and is usually much faster when setup properly and using async libraries

Number and type of Gunicorn workers and file handling in Flask app

I am new to python and have just started using flask and building a web application that might need to handle 5 or 6 users at the same time. My flask app will trigger some shell scripts (based on user input) which might take 1 or 2 minutes to process and also it will mail the user about the stauts of their request. It will also write logs in a particular .log file for each day. Now I was reading about it and found out that I dont need to worry about handling multiple request as the GIL will not allow to run more than one instance of my application, even if I have 10 instances running at the same time. Now as mentioned in the Gunicorn doc sync workers will be able to handle only one request, I need to use Async workers but will it affect my application? Like writing to the log file and executing the shell scripts. Do I need to implement locking on the logfile and will it affect the shell script execution?

GIL. It seems you don't need to care about GIL. GIL is about multithreading limitations, in your case, workers are independently running instances of your application. And even in the case of multithreading, GIL will be released when you will call your shell script, so that wouldn't be a problem either.
AsyncWorker. As I understood your design, will not help you, because you need actions (shell and writing log) sequentially, one by one.
Locking. If by each user request you'll run a separate shell script process and write to a separate log file then you don't need locking.
The core thing is that your application will be run by Gunicorn multiple times (workers count), simultaneously and independently. The count of workers depends on the user count you want to process and the type of your load, but values like 2 * cpu_cores_count is almost always fit good. I would recommend sync worker type.
Also, I wrote a blog post about production-ready gunicorn configuration. You may be interested in.

Make a non-blocking request with requests when running Flask with Gunicorn and Gevent

My Flask application will receive a request, do some processing, and then make a request to a slow external endpoint that takes 5 seconds to respond. It looks like running Gunicorn with Gevent will allow it to handle many of these slow requests at the same time. How can I modify the example below so that the view is non-blocking?
import requests
#app.route('/do', methods = ['POST'])
def do():
result = requests.get('slow api')
return result.content
gunicorn server:app -k gevent -w 4

If you're deploying your Flask application with gunicorn, it is already non-blocking. If a client is waiting on a response from one of your views, another client can make a request to the same view without a problem. There will be multiple workers to process multiple requests concurrently. No need to change your code for this to work. This also goes for pretty much every Flask deployment option.

First a bit of background, A blocking socket is the default kind of socket, once you start reading your app or thread does not regain control until data is actually read, or you are disconnected. This is how python-requests, operates by default. There is a spin off called grequests which provides non blocking reads.
The major mechanical difference is that send, recv, connect and accept
can return without having done anything. You have (of course) a number
of choices. You can check return code and error codes and generally
drive yourself crazy. If you don’t believe me, try it sometime
Source: https://docs.python.org/2/howto/sockets.html
It also goes on to say:
There’s no question that the fastest sockets code uses non-blocking
sockets and select to multiplex them. You can put together something
that will saturate a LAN connection without putting any strain on the
CPU. The trouble is that an app written this way can’t do much of
anything else - it needs to be ready to shuffle bytes around at all
times.
Assuming that your app is actually supposed to do something more than
that, threading is the optimal solution
But do you want to add a whole lot of complexity to your view by having it spawn it's own threads. Particularly when gunicorn as async workers?
The asynchronous workers available are based on Greenlets (via
Eventlet and Gevent). Greenlets are an implementation of cooperative
multi-threading for Python. In general, an application should be able
to make use of these worker classes with no changes.
and
Some examples of behavior requiring asynchronous workers: Applications
making long blocking calls (Ie, external web services)
So to cut a long story short, don't change anything! Just let it be. If you are making any changes at all, let it be to introduce caching. Consider using Cache-control an extension recommended by python-requests developers.

You can use grequests. It allows other greenlets to run while the request is made. It is compatible with the requests library and returns a requests.Response object. The usage is as follows:
import grequests
#app.route('/do', methods = ['POST'])
def do():
result = grequests.map([grequests.get('slow api')])
return result[0].content
Edit: I've added a test and saw that the time didn't improve with grequests since gunicorn's gevent worker already performs monkey-patching when it is initialized: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L65

Do I need celery when I am using gevent?

I am working on a django web app that has functions (say for e.g. sync_files()) that take a long time to return. When I use gevent, my app does not block when sync_file() runs and other clients can connect and interact with the webapp just fine.
My goal is to have the webapp responsive to other clients and not block. I do not expect a zillion users to connect to my webapp (perhaps max 20 connections), and I do not want to set this up to become the next twitter. My app is running on a vps, so I need something light weight.
So in my case listed above, is it redundant to use celery when I am using gevent? Is there a specific advantage to using celery? I prefer not to use celery since it is yet another service that will be running on my machine.
edit: found out that celery can run the worker pool on gevent. I think I am a litle more unsure about the relationship between gevent & celery.

In short you do need a celery.
Even if you use gevent and have concurrency, the problem becomes request timeout. Lets say your task takes 10 minutes to run however the typical request timeout is about up to a minute. So what will happen if you trigger the task directly within a view is that the server will start processing it however after a minute a client (browser) will probably disconnect the connection since it will think the server is offline. As a result, your data can become corrupt since you cannot be guaranteed what will happen when connection will close. Celery solves this because it will trigger a background process which will process the task independent of the view. So the user will get the view response right away and at the same time the server will start processing the task. That is a correct pattern to handle any scenarios which require lots of processing.

Django, sleep() pauses all processes, but only if no GET parameter?

Using Django (hosted by Webfaction), I have the following code
import time
def my_function(request):
time.sleep(10)
return HttpResponse("Done")
This is executed via Django when I go to my url, www.mysite.com
I enter the url twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I enter some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2 then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() is somehow global?? Maybe entering a parameter makes them run as different processes instead of the same???
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's up and how to fix it?
Edit: Webfaction runs this using Apache.

As Gjordis pointed out, sleep will pause the current thread. I have looked at Webfaction and it looks like their are using WSGI for running the serving instance of Django. This means, every time a request comes in, Apache will look at how many worker processes (that are processes that each run a instance of Django) are currently running. If there are none/to view it will spawn additonally workers and hand the requests to them.
Here is what I think is happening in you situation:
first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees it is requesting the same resource and sends it to the same worker as for request A. I guess the assumption here is that a worker that recently processes a request for a specific resource it is more likely that the worker has some information cached/preprocessed/whatever so it can handle this request faster
this results in a 20 second block since there is only one worker that waits 2 times 10 seconds
This behavior makes complete sense 99% of the time so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding GET parameter) Apache will assume that this is a different resource and will start another worker (since the first one is already "busy" (Apache can not know that you are not doing any hard work). Since there are now two worker, both waiting 10 seconds the total time goes down to 10 seconds.
Additionally I assume that something is **wrong** with your design. There are almost no cases which I can think of where it would be sensible to not respond to a HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping 10 seconds is the most counterproductive thing you can do. I would recommend the you create a new question and state what you actual goal is that you are trying to achieve. I'm pretty sure there is a more sensible solution to this!

Assuming you run your Django-server just with run() , by default this makes a single threaded server. If you use sleep on a single threaded process, the whole application freezes for that sleep time.

It may simply be that your browser is queuing the second request to be performed only after the first one completes. If you are opening your URLs in the same browser, try using the two different ones (e.g. Firefox and Chrome), or try performing requests from the command line using wget or curl instead.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.