Django, Apache, WSGIDaemonProcess, and threads

Django, Apache, WSGIDaemonProcess, and threads - python

I'm trying to tune the performance of a Django app that's served through Apache.
When doing load testing with Locust, I'm finding that latency is very high when the server is under heavy load. Right now I'm using the mpm_event_module using ThreadsPerChild=25 since that seems to be a common thing I've found in example configs. Similarly, inside my site config I have
WSGIDaemonProcess ... processes=8 threads=15 python-path=...
since this also matched some example.
I was wondering whether the GIL in Python applies to the webserving when configured this way. If so, does that mean I need to change my config to use more processes and fewer threads to get more parallelism? When I tried doing this, my load testing reported many more failures when loading static files, so I'm not sure if it makes sense.

Related

How to calculate max requests per second of a Django app?

I am about the deploy a Django app, and then it struck me that I couldn't find a way to anticipate how many requests per second my application can handle.
Is there a way of calculating how many requests per second can a Django application handle, without resorting to things like doing a test deployment and use an external tool such as locust?
I know there are several factors involved (such as number of database queries, etc.), but perhaps there is a convenient way of calculating, even estimating, how many visitors can a single Django app instance handle.
EDIT: Removed the mention to Gunicorn, since it only adds confusion to what I truly wanted to know.

Is there a way of calculating how many requests per second can a
Django application handle, without resorting to things like doing a
test deployment and use an external tool such as locust?
No and Yes. As mackarone pointed out, I don't think there's anyway you avoid measuring it. Consider the case where you did a local benchmark on your local dev server talking to a local DB instance, in order to generate a baseline for estimation. The issue with this is that the hardware, network (distance between services) all make a huge difference. So any numbers you generated locally would be relatively worthless for capacity planning.
In my experiences, local testing is great for relative changes. Consider the case where you wanted to see the performance impact of sql query planninng. Establishing a local baseline, making the change, than observing the effect locally is useful to gauge relative speedup.
How to generate these numbers?
I would recommend deploying the app to the hardware, and network you plan on testing on. This deploy should use your production configuration and component topology (ie if you're going to run gunicorn, make sure gunicorn is running instead of NGINX, or if you're going to have a proxy in front of gunicorn, make sure that is setup. I would run a single instance of your application using your production config.
Once this is running, I would launch a load test against the single instance using any of the popular load testing tools:
Apache Benchmark
Siege
Vegeta
K6
etc
You can launch these load tests from your single machine and ramp up traffic until response times are no longer acceptable in order to get a feel for the # of concurrent connections, and throughput your application can accommodate.
Now you have some idea of what a single instance of your service is able to handle. Up until your db (or other shared resources) are saturated these numbers can be used to project how many instances of your service are necessary to handle some amount of traffic!

According to the Gunicorn documentation
How Many Workers?
DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.
Obviously, your particular hardware and application are going to affect the optimal number of workers. Our recommendation is to start with the above guess and tune using TTIN and TTOU signals while the application is under load.
Always remember, there is such a thing as too many workers. After a point your worker processes will start thrashing system resources decreasing the throughput of the entire system.
The best thing is tune it using some load testing tool as locust as you mentioned.
Emphasis mine

You have to install (loadtest) first, it is a npm package,
I was learning redis and at that time I found this, you can use it, it worked for me,
For More check this tutorial: https://realpython.com/caching-in-django-with-redis/#start-by-measuring-performance
npm install -g loadtest
loadtest -n 100 -k http://localhost:8000/myUrl/

nginx complex procedural rewrite rule -- plugin? separate server?

I need to use nginx to serve static files (many, many files, most small, but also a mix of big video files). Configuring nginx as a static file server is, of course, pretty easy except...
... the processing we need to take the "input" path (where the user thinks the file is) and convert it to the real location is too complex to be expressed using nginx's rewrite engines. In general, the logic that does the computation is currently expressed in a Python library. (Just accept it's too complex to be an nginx rewrite rule.)
What are my options for making nginx be able to apply my custom rewrite rules to arrive at the proper path for serving the files? I've thought of these 2 options:
Write some C code that I add to nginx that calls out to Python and does my computation. (For this answer, it doesn't even matter that it's Python.
I'm just injecting C code into nginx that will do an arbitrary computation.) I have, though, no clue as to where to even start looking for where to make this modification to allow nginx to do this.
Is it possible that nginx can query a separate backend process (basically a very simple tornado server) and say "hey, given path X what path Y should I serve)?
While not ideal to go cross process, I figure if the two processes are on the same machine, the latency should be low.
Just write a tornado server which does the computation and then does a redirect to nginx.
This is do-able, but then requires our client code to handle redirects, and (a) this feels slower and (b) I don't feel like messing with the clients to make them handle redirects, since now they have to go roundtrip twice to the server.
??

The Lua nginx module is probably the answer you are looking for. It can be configured to call out to an external process to decide how to handle a request, similar to what you're describing in your question, or even to figure out how to handle a request entirely on its own (possibly by querying a database).

Reload single module in cherrypy?

Is it possible to use the python reload command (or similar) on a single module in a standalone cherrypy web app? I have a CherryPy based web application that often is under continual usage. From time to time I'll make an "important" change that only affects one module. I would like to be able to reload just that module immediately, without affecting the rest of the web application. A full restart is, admittedly, fast, however there are still several seconds of downtime that I would prefer to avoid if possible.

Reloading modules is very, very hard to do in a sane way. It leads to the potential of stale objects in your code with impossible-to-interrogate state and subtle bugs. It's not something you want to do.
What real web applications tend to do is to have a server that stays alive in front of their application, such as Apache with mod_proxy, to serve as a reverse proxy. You start your new app server, change your reverse proxy's routing, and only then kill the old app server.
No downtime. No insane, undebuggable code.

Apache + mod_wsgi / Lighttpd + wsgi - am I going to see differences in performance?

I'm a newbie to developing with Python and I'm piecing together the information I need to make intelligent choices in two other open questions. (This isn't a duplicate.)
I'm not developing using a framework but building a web app from scratch using the gevent library. As far as front-end web servers go, it seems I have three choices: nginx, apache, and lighttpd.
From all accounts that I've read, nginx's mod_wsgi isn't suitable.
That leaves two choices - lighttpd and Apache. Under heavy load, am I going to see major differences in performance and memory consumption characteristics? I'm under the impression Apache tends to be memory hungry even when not using prefork, but I don't know how suitable lighttp is for Python apps.
Are there any caveats or benefits to using lighttpd over apache? I really want to hear all the information you can possibly bore me with!

Apache...
Apache is by far the most widely used web server out there. Which is a good thing. There is so much more information on how to do stuff with it, and when something goes wrong there are a lot of people who know how to fix it. But, it is also the slowest out of the box; requring a lot of tweaking and a beefier server than Lighttpd. In your case, it will be a lot easier to get off the ground using Apache and Python. There are countless AMP packages out there, and many guides on how to setup python and make your application work. Just a quick google search will get you on your way. Under heavy load, Lighttpd will outshine Apache, but Apache is like a train. It just keeps chugging along.
Pros
Wide User Base
Universal support
A lot of plugins
Cons
Slow out of the box
Requires performance tweaking
Memory whore (No way you could get it working on a 64MB VPS)
Lighttpd...
Lighttpd is the new kid on the block. It is fast, powerful, and kicks ass performance wise (not to mention use like no memory). Out of the box, Lighttpd wipes the floor with Apache. But, not as many people know Lighttpd, so getting it to work is harder. Yes, it is the second most used webserver, but it does not have as much community support behind it. If you look here, on stackoverflow, there is this dude who keeps asking about how to get his Python app working but nobody has helped him. Under heavy load, if configured correctly, Lighttpd will out preform Apache (I did some tests a while back, and you might see a 200-300% performance increase in requests per second).
Pros
Fast out of the box
Uses very little memory
Cons
Not as much support as Apache
Sometimes just does not work
Nginx
If you were running a static website, then you would use nginx. you are correct in saying nginx's mod_wsgi isn't suitable.
Conclusion
Benefits? There are both web servers; designed to be able to replace one another. If both web servers are tuned correctly and you have ample hardware, then there is no real benefit of using one over another. You should try and see which web server meets your need, but asking me; I would say go with Lighttpd. It is, in my opinion, easier to configure and just works.
Also, You should look at Cherokee Web Server. Mad easy to set up and, the performance aint half bad. And you should ask this on Server Fault as well.

That you have mentioned gevent is important. Does that mean you are specifically trying to implement a long polling application? If you are and that functionality is the bulk of the application, then you will need to put your gevent server behind a front end web server that is implemented using async techniques rather that processes/threading model. Lighttd is an async server and fits that bill whereas Apache isn't. So use of Apache isn't good as front end proxy for long polling application. If that is the criteria though, would actually suggest you use nginx rather than Lighttpd.
Now if you are not doing long polling or anything else that needs high concurrency for long running requests, then you aren't necessarily going to gain too much by using gevent, especially if intention is to use a WSGI layer on top. For WSGI applications, ultimately the performance difference between different servers is minimal because your application is unlikely to be a hello world program that the benchmarks all use. The real bottlenecks are not the server but your application code, database, external callouts, lack of caching etc etc. In light of that, you should just use whatever WSGI hosting mechanism you find easier to use initially and when you properly work out what the hosting requirements are for your application, based on having an actual real application to test, then you can switch to something more appropriate if necessary.
In summary, you are just wasting your time trying to prematurely optimize by trying to find what may be the theoretically best server when in practice your application is what you should be concentrating on initially. After that, you also should be looking at application monitoring tools, because without monitoring tools how are you even going to determine if one hosting solution is better than another.

A good multithreaded python webserver?

I am looking for a python webserver which is multithreaded instead of being multi-process (as in case of mod_python for apache). I want it to be multithreaded because I want to have an in memory object cache that will be used by various http threads. My webserver does a lot of expensive stuff and computes some large arrays which needs to be cached in memory for future use to avoid recomputing. This is not possible in a multi-process web server environment. Storing this information in memcache is also not a good idea as the arrays are large and storing them in memcache would lead to deserialization of data coming from memcache apart from the additional overhead of IPC.
I implemented a simple webserver using BaseHttpServer, it gives good performance but it gets stuck after a few hours time. I need some more matured webserver. Is it possible to configure apache to use mod_python under a thread model so that I can do some object caching?

CherryPy. Features, as listed from the website:
A fast, HTTP/1.1-compliant, WSGI thread-pooled webserver. Typically, CherryPy itself takes only 1-2ms per page!
Support for any other WSGI-enabled webserver or adapter, including Apache, IIS, lighttpd, mod_python, FastCGI, SCGI, and mod_wsgi
Easy to run multiple HTTP servers (e.g. on multiple ports) at once
A powerful configuration system for developers and deployers alike
A flexible plugin system
Built-in tools for caching, encoding, sessions, authorization, static content, and many more
A native mod_python adapter
A complete test suite
Swappable and customizable...everything.
Built-in profiling, coverage, and testing support.

Consider reconsidering your design. Maintaining that much state in your webserver is probably a bad idea. Multi-process is a much better way to go for stability.
Is there another way to share state between separate processes? What about a service? Database? Index?
It seems unlikely that maintaining a huge array of data in memory and relying on a single multi-threaded process to serve all your requests is the best design or architecture for your app.

Twisted can serve as such a web server. While not multithreaded itself, there is a (not yet released) multithreaded WSGI container present in the current trunk. You can check out the SVN repository and then run:
twistd web --wsgi=your.wsgi.application

Its hard to give a definitive answer without knowing what kind of site you are working on and what kind of load you are expecting. Sub second performance may be a serious requirement or it may not. If you really need to save that last millisecond then you absolutely need to keep your arrays in memory. However as others have suggested it is more than likely that you don't and could get by with something else. Your usage pattern of the data in the array may affect what kinds of choices you make. You probably don't need access to the entire set of data from the array all at once so you could break your data up into smaller chunks and put those chunks in the cache instead of the one big lump. Depending on how often your array data needs to get updated you might make a choice between memcached, local db (berkley, sqlite, small mysql installation, etc) or a remote db. I'd say memcached for fairly frequent updates. A local db for something in the frequency of hourly and remote for the frequency of daily. One thing to consider also is what happens after a cache miss. If 50 clients all of a sudden get a cache miss and all of them at the same time decide to start regenerating those expensive arrays your box(es) will quickly be reduced to 8086's. So you have to take in to consideration how you will handle that. Many articles out there cover how to recover from cache misses. Hope this is helpful.

Not multithreaded, but twisted might serve your needs.

You could instead use a distributed cache that is accessible from each process, memcached being the example that springs to mind.

web.py has made me happy in the past. Consider checking it out.
But it does sound like an architectural redesign might be the proper, though more expensive, solution.

Perhaps you have a problem with your implementation in Python using BaseHttpServer. There's no reason for it to "get stuck", and implementing a simple threaded server using BaseHttpServer and threading shouldn't be difficult.
Also, see http://pymotw.com/2/BaseHTTPServer/index.html#module-BaseHTTPServer about implementing a simple multi-threaded server with HTTPServer and ThreadingMixIn

I use CherryPy both personally and professionally, and I'm extremely happy with it. I even do the kinds of thing you're describing, such as having global object caches, running other threads in the background, etc. And it integrates well with Apache; simply run CherryPy as a standalone server bound to localhost, then use Apache's mod_proxy and mod_rewrite to have Apache transparently forward your requests to CherryPy.
The CherryPy website is http://cherrypy.org/

I actually had the same issue recently. Namely: we wrote a simple server using BaseHTTPServer and found that the fact that it's not multi-threaded was a big drawback.
My solution was to port the server to Pylons (http://pylonshq.com/). The port was fairly easy and one benefit was it's very easy to create a GUI using Pylons so I was able to throw a status page on top of what's basically a daemon process.
I would summarize Pylons this way:
it's similar to Ruby on Rails in that it aims to be very easy to deploy web apps
it's default templating language, Mako, is very nice to work with
it uses a system of routing urls that's very convenient
for us performance is not an issue, so I can't guarantee that Pylons would perform adequately for your needs
you can use it with Apache & Lighthttpd, though I've not tried this
We also run an app with Twisted and are happy with it. Twisted has good performance, but I find Twisted's single-threaded/defer-to-thread programming model fairly complicated. It has lots of advantages, but would not be my choice for a simple app.
Good luck.

Just to point out something different from the usual suspects...
Some years ago while I was using Zope 2.x I read about Medusa as it was the web server used for the platform. They advertised it to work well under heavy load and it can provide you with the functionality you asked.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.