All righty, so I want to explain a small Django issue that I am having trouble getting around.
The Problem
I have a small website, just a couple of pages that display a list of database records. The website is an internal render farm monitor for my company which will have perhaps a dozen or two active connections at any time. No more than 50.
The problem is that I have three update services that cause a real performance hit when turned on.
The update services are Python scripts that each:
Use urllib2 to make an HTTP request to a URL.
Wait for the response
Print a success message with time stamps to a log.
Wait 10 seconds, and start again.
The URLs they send requests to cause my Django website to poll an external service and read new data into our Django database. The URLs look like this:
http://webgrid/updateJobs/ (takes about 5 - 15 seconds per update )
http://webgrid/updateTasks/ (takes about 25 - 45 seconds per update )
http://webgrid/updateHosts/ (takes about 5 - 15 seconds per update )
When these update services are turned on (especially updateTasks), it can take well over 10 seconds for http://webgrid/ to even start loading for normal users.
The Setup
Django 1.8, deployed with Gunicorn v18.
The main gunicorn service is run with these arguments (Split into a list for easier reading).
<PATH_TO_PYTHON>
<PATH_TO_GUNICORN>
-b localhost:80001
-u farmer
-t 600
-g <COMPANY_NAME>
--max-requests 10000
-n bb_webgrid
-w 17
-p /var/run/gunicorn_bb_webgrid.pid
-D
--log-file /xfs/GridEngine/bbgrid_log/bb_webgrid.log
bb_webgrid.wsgi:application
Apache config for this site:
<VirtualHost *:80>
ServerName webgrid.<INTERAL_COMPANY_URL>
ServerAlias webgrid
SetEnv force-proxy-request-1.0 1
DocumentRoot /xfs/GridEngine/bb_webgrid/www
CustomLog logs/webgrid_access.log combined
ErrorLog logs/webgrid_error.log
#LogLevel warn
<Directory "/xfs/GridEngine/bb_webgrid/www">
AllowOverride All
</Directory>
WSGIDaemonProcess webgrid processes=17 threads=17
WSGIProcessGroup webgrid
</VirtualHost>
This kind of thing shouldn't be done online: by hitting a URL which directs to a view, you are unnecessarily tying up your web server, which stops it from doing its real job, which is to respond to user requests.
Instead, do this out-of-band. A really quick and easy way to do this is to write a Django management command; that way you can easily call model methods from a command-line script. Now you can simply point your cron job, or whatever it is, at these commands, rather than calling a separate Python script which calls a URL on your site.
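As a rough sketch of the out-of-band version of the update loop: instead of requesting /updateJobs/ and friends over HTTP, the loop calls the update code directly in-process. The updater callables are assumptions (hypothetical stand-ins for whatever your views currently call); in a Django project you would put this loop, or a single pass of it for cron, inside a custom management command's handle() method.

```python
import logging
import time

log = logging.getLogger("webgrid.updates")


def run_updates(updaters, interval=10.0, iterations=None, sleep=time.sleep):
    """Call each updater in turn, log how long it took, then pause.

    updaters   -- callables doing the work the /updateJobs/, /updateTasks/ and
                  /updateHosts/ views currently do (hypothetical placeholders).
    iterations -- None loops forever; 1 gives a single pass suitable for cron.
    sleep      -- injectable so the loop can be tested without waiting.
    Returns the number of completed passes.
    """
    passes = 0
    while iterations is None or passes < iterations:
        for updater in updaters:
            start = time.time()
            try:
                updater()
            except Exception:
                # Log and keep going so one failing update doesn't stop the others.
                log.exception("%s failed", updater.__name__)
            else:
                log.info("%s finished in %.1fs", updater.__name__, time.time() - start)
        passes += 1
        if iterations is None or passes < iterations:
            sleep(interval)
    return passes
```

For example, run_updates([update_jobs, update_tasks, update_hosts]), where those three names are hypothetical functions wrapping your own update code. Because the work no longer goes through Gunicorn, the web workers stay free for user requests.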
An alternative is to use Celery; it's a really good system for doing long-running asynchronous tasks. It even has its own scheduling system, so you could replace your cron jobs completely.
Related
I know it's not recommended to run a Bottle or Flask app on production with python myapp.py --port=80 because it's a development server only.
I think it's not recommended as well to run it with python myapp.py --port=5000 and link it to Apache with: RewriteEngine On, RewriteRule /(.*) http://localhost:5000/$1 [P,L] (or am I wrong?), because WSGI is preferred.
So I'm currently setting up Python app <-> mod_wsgi <-> Apache (without gunicorn or other tool to keep things simple).
Question: when using WSGI, I know it's Apache and mod_wsgi that will automatically start/stop enough processes running myapp.py as requests come in, but:
how can I manually stop these processes?
more generally, is there a way to monitor them / know how many processes started by mod_wsgi are currently still running? (one reason, among others, is to check if the processes terminate after a request or if they stay running)
Example:
I made some changes in myapp.py, and I want to restart all processes running it, that have been launched by mod_wsgi (Note: I know that mod_wsgi can watch changes on the source code, and relaunch, but this only works on changes made on the .wsgi file, not on the .py file. I already read that touch myapp.wsgi can be a solution for that, but more generally I'd like to be able to stop and restart manually)
I want to temporarily stop the whole application myapp.py (all instances of it)
I don't want to use service apache2 stop for that because I also run other websites with Apache, not just this one (I have a few VirtualHosts). For the same reason (I run other websites with Apache, and some client might be downloading a 1 GB file at the same time), I don't want to do service apache2 restart that would have an effect on all websites using Apache.
I'm looking for a cleaner way than kill pid or SIGTERM, etc. (because I read it's not recommended to use signals in this case).
Note: I already read How to do graceful application shutdown from mod_wsgi, it helped, but here it's complementary questions, not a duplicate.
My current Python Bottle + Apache + mod_wsgi setup:
Installation:
apt-get install libapache2-mod-wsgi
a2enmod wsgi # might be done automatically by previous line, but just to be sure
Apache config (source: Bottle doc; a simpler config can be found here):
<VirtualHost *:80>
ServerName example.com
WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5
WSGIScriptAlias / /home/www/wsgi_test/app.wsgi
<Directory />
Require all granted
</Directory>
</VirtualHost>
There should be up to 5 processes, is that right? As stated before in the question, how to know how many are running, how to stop them?
/home/www/wsgi_test/app.wsgi (source: Bottle doc)
import os
from bottle import route, template, default_app
os.chdir(os.path.dirname(__file__))
@route('/hello/<name>')
def index(name):
    return template('<b>Hello {{name}}</b>!', name=name)
application = default_app()
Taken partially from this question, add display-name to WSGIDaemonProcess so you can grab them using a command like:
ps aux | grep modwsgi
Add this to your configuration:
Define GROUPNAME modwsgi
WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5 display-name=${GROUPNAME}
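Piping ps through grep also matches the grep command itself, which inflates the count. A small helper sketch that lists the PIDs whose process name starts with the display-name, assuming a Linux (procps) ps that accepts -eo pid,args and a platform that actually shows the rewritten name:

```python
import subprocess


def parse_ps(ps_output, display_name):
    """Return PIDs from `ps -eo pid,args` output whose command starts with display_name.

    Matching on the start of the command (rather than anywhere, as grep does)
    avoids counting the `grep modwsgi` process itself.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # first line is the PID/COMMAND header
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].startswith(display_name):
            pids.append(int(fields[0]))
    return pids


def find_daemon_pids(display_name):
    """Run ps (Linux procps syntax assumed) and return matching daemon PIDs."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return parse_ps(result.stdout, display_name)
```

With display-name set to modwsgi, len(find_daemon_pids("modwsgi")) tells you how many daemon processes are currently running; with display-name=%{GROUP} you would match on "(wsgi:yourapp)" instead. capture_output needs Python 3.7 or later.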
Update
There are a couple of reasons why ps would not give you the DaemonProcess display-name.
As shown in the docs:
display-name=value Defines a different name to show for the daemon
process when using the ps command to list processes. If the value is
%{GROUP} then the name will be (wsgi:group) where group is replaced
with the name of the daemon process group.
Note that only as many characters of the supplied value can be
displayed as were originally taken up by argv0 of the executing
process. Anything in excess of this will be truncated.
This feature may not work as described on all platforms. Typically it
also requires a ps program with BSD heritage. Thus on some versions of
Solaris UNIX the /usr/bin/ps program doesn’t work, but /usr/ucb/ps
does. Other programs which can display this value include htop.
You could:
Set a display-name of smaller length:
WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5 display-name=wsws
And try to find them by:
ps aux | grep wsws
Or set it to %{GROUP} and filter using the name of the daemon process group (wsgi:group).
The way which processes are managed with mod_wsgi for each mode is described in:
http://modwsgi.readthedocs.io/en/develop/user-guides/processes-and-threading.html
For embedded mode, where your WSGI application is run inside of the Apache child worker processes, Apache manages when processes are created and destroyed based on the Apache MPM settings. Because of how Apache manages the processes, they can be shutdown at any time if there is insufficient request throughput, or more processes could be created if request throughput increases. When running, the same process will handle many requests over time until it gets shutdown. In other words, Apache dynamically manages the number of processes.
Because of this dynamic process management, it is a bad idea to use embedded mode of mod_wsgi unless you know how to tune Apache properly and many other things as well. In short, never use embedded mode unless you have a good amount of experience with Apache and running Python applications with it. You can watch a video about why you wouldn't want to run in embedded mode at:
https://www.youtube.com/watch?v=k6Erh7oHvns
There is also the blog post:
http://blog.dscpl.com.au/2012/10/why-are-you-using-embedded-mode-of.html
So use daemon mode and verify that your configuration is correct and you are in fact using daemon mode by using the check in:
http://modwsgi.readthedocs.io/en/develop/user-guides/checking-your-installation.html#embedded-or-daemon-mode
For daemon mode, the WSGI application runs in a separate set of managed processes. These are created at startup and will run until Apache is restarted, or until reloading of the process is triggered for various reasons, including:
The daemon process is sent a direct signal to shutdown by a user.
The code of the application sends itself a signal.
The WSGI script file is modified, which will trigger a shutdown so the WSGI application can be reloaded.
A defined request timeout occurs due to stuck or long running request.
A defined maximum number of requests has occurred.
A defined inactivity timeout expires.
A defined timer for periodic process restart expires.
A startup timeout is defined and the WSGI application failed to load in that time.
In these cases, when the process is shutdown, it is replaced.
More details about the various timeout options and how the processes respond to signals can be found in:
http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIDaemonProcess.html
More details about source code reloading and touching of the WSGI script file can be found in:
http://modwsgi.readthedocs.io/en/develop/user-guides/reloading-source-code.html
One item which is documented is how you can incorporate code which will look for any changes to Python code files used by your application. When a change occurs to any of the files, the process will be restarted by sending itself a signal. This should only be used for development and never in production.
If you are using mod_wsgi-express in development, which is preferable to hand configuring Apache yourself, you can use the --reload-on-changes option.
If you send a SIGTERM signal to the daemon process, there is a set shutdown sequence where it will wait a few seconds for current requests to finish. If the requests don't finish, the process will be shut down anyway. That period of time is dictated by the shutdown timeout. You shouldn't play with that value.
If you send a SIGUSR1 signal to the daemon process, by default it acts just like sending a SIGTERM signal. If, however, you specify the graceful timeout for shutdown, you can extend how long it will wait for current requests to finish. New requests will still be accepted during that period. That graceful timeout also applies in other cases, such as the maximum number of requests being reached or the timer for periodic restart being triggered. If you need the timeout when using SIGUSR1 to be different from those cases, define the eviction timeout instead.
To identify the daemon processes to be sent the signal, use the display-name option of WSGIDaemonProcess. Then use ps to identify the processes, or possibly use killall if it uses the modified process name on your platform. Send the daemon processes the SIGUSR1 signal if you want a more graceful shutdown, and SIGTERM if you want them to restart straight away.
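Once you have the PIDs from ps, sending the signals from Python is a small loop around os.kill. This is only a sketch: it assumes it runs as root or as the daemon processes' user, and the graceful/immediate choice simply maps to SIGUSR1/SIGTERM as described above. The command-line interface at the bottom is hypothetical.

```python
import os
import signal
import sys


def signal_daemons(pids, graceful=True, kill=os.kill):
    """Send SIGUSR1 (graceful) or SIGTERM (immediate) to each mod_wsgi daemon PID.

    `kill` is injectable so the function can be exercised without signalling
    real processes. Returns the signal that was sent.
    """
    sig = signal.SIGUSR1 if graceful else signal.SIGTERM
    for pid in pids:
        try:
            kill(pid, sig)
        except ProcessLookupError:
            # The process already exited (e.g. it was being restarted anyway).
            pass
    return sig


if __name__ == "__main__":
    # Hypothetical CLI: python signal_wsgi.py [--now] PID [PID ...]
    args = sys.argv[1:]
    immediate = "--now" in args
    pids = [int(a) for a in args if a != "--now"]
    signal_daemons(pids, graceful=not immediate)
```

Each daemon process that exits is replaced by mod_wsgi, so this restarts only the one daemon process group and leaves the other VirtualHosts untouched.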
If you want to track how long a daemon process has been running, you can use:
import mod_wsgi
metrics = mod_wsgi.process_metrics()
The metrics value will include output like the following for the process the call is made in:
{'active_requests': 1,
'cpu_system_time': 0.009999999776482582,
'cpu_user_time': 0.05000000074505806,
'current_time': 1525047105.710778,
'memory_max_rss': 11767808,
'memory_rss': 11767808,
'pid': 4774,
'request_busy_time': 0.001851,
'request_count': 2,
'request_threads': 2,
'restart_time': 1525047096.31548,
'running_time': 9,
'threads': [{'request_count': 2, 'thread_id': 1},
{'request_count': 1, 'thread_id': 2}]}
If you just want to know how many processes/threads are used for the current daemon process group you can use:
mod_wsgi.process_group
mod_wsgi.application_group
mod_wsgi.maximum_processes
mod_wsgi.threads_per_process
to get details about the process group. The number of processes is fixed at this time for daemon mode, and the name maximum_processes is just to be consistent with the name used in embedded mode.
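If you want to see these values from outside the process, one option is a tiny diagnostic WSGI app that returns them as JSON; each request shows the process that happened to handle it. This is a sketch: mounting it at a private URL is up to you (an assumption, not something mod_wsgi sets up), and process_metrics() requires a mod_wsgi version recent enough to provide it, so the code falls back to just the PID when the function is missing or when run outside mod_wsgi.

```python
import json
import os

try:
    import mod_wsgi  # only provides these attributes inside a mod_wsgi process
except ImportError:
    mod_wsgi = None


def collect_info():
    """Gather per-process details, degrading gracefully outside mod_wsgi."""
    info = {"pid": os.getpid()}
    if mod_wsgi is None:
        return info
    for name in ("process_group", "application_group",
                 "maximum_processes", "threads_per_process"):
        info[name] = getattr(mod_wsgi, name, None)
    if hasattr(mod_wsgi, "process_metrics"):
        info["metrics"] = mod_wsgi.process_metrics()
    return info


def application(environ, start_response):
    body = json.dumps(collect_info(), sort_keys=True, default=str).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```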
If you need to run code on process shutdown, you should NOT try and define your own signal handlers. Do that and mod_wsgi will actually ignore them as they will interfere with normal operation of Apache and mod_wsgi. Instead, if you need to run code on process shutdown, use atexit.register(). Alternatively, you can subscribe to special events generated by mod_wsgi and trigger something off the process shutdown event.
Edit: a simpler WSGI config is given in my question "Python WSGI handler directly in Apache .htaccess, not in VirtualHost".
Based on Evhz's answer, I made a simple test to check that the processes are still running:
Apache config:
<VirtualHost *:80>
ServerName example.com
<Directory />
AllowOverride All
Require all granted
</Directory>
WSGIScriptAlias / /home/www/wsgi_test/app.wsgi
WSGIDaemonProcess yourapp user=www-data group=www-data processes=5 threads=5 display-name=testwsgi
</VirtualHost>
app.wsgi file:
import os, time
from bottle import route, template, default_app
os.chdir(os.path.dirname(__file__))
@route('/hello/<name>')
def index(name):
    global i
    i += 1
    return template('<b>Hello {{name}}</b>! request={{i}}, pid={{pid}}',
                    name=name, i=i, pid=os.getpid())
i = 0
time.sleep(3)  # wait 3 seconds to make the client notice we launch a new process!
application = default_app()
Now access http://www.example.com/hello/you many times:
The initial time.sleep(3) lets the client browser see exactly when a new process is started, and the request counter i shows how many requests have been served by each process.
The PIDs will correspond to those present in ps aux | grep testwsgi:
Also, the time.sleep(3) will happen at most 5 times (at the startup of each of the 5 processes); then the processes should run forever, until we restart/stop the server or modify the app.wsgi file (modifying it triggers a restart of the 5 processes, and you can see new PIDs).
[I'll check that by letting my test run now, and access http://www.example.com/hello/you in 2 days to see if it's still a previously-launched process or a new one!]
Edit: the next day, the same processes were still up and running. Now, two days after, when reloading the same URL, I noticed new processes were created... (Is there a time after which a process with no request dies?)
I'm running multiple Django sites on the same Apache instance under mod_wsgi. Currently my apache.conf files contain the following directives (no WSGIApplicationGroup specified):
WSGIDaemonProcess mysite \
display-name=mysite \
threads=50 \
maximum-requests=10000 \
umask=0002 \
home=/srv/www/mysite \
python-path=/srv/www:/srv/src:/srv/venv/prod/lib/python2.7/site-packages \
python-eggs=/srv/.python-eggs
WSGIProcessGroup mysite
WSGIScriptAlias / /srv/www/mysite/wsgi.py
I touch /srv/www/mysite/wsgi.py whenever I need to reload the site, and it causes a noticeable freeze in all clients.
After reading https://groups.google.com/forum/#!topic/modwsgi/QJkt5UWYpss it sounds like I can get rid of the "reload pause", by specifying process/application groups in the WSGIScriptAlias directive:
WSGIDaemonProcess mysite \
display-name=mysite \
threads=50 \
maximum-requests=10000 \
umask=0002 \
home=/srv/www/mysite \
python-path=/srv/www:/srv/src:/srv/venv/prod/lib/python2.7/site-packages \
python-eggs=/srv/.python-eggs
WSGIScriptAlias / /srv/www/mysite/wsgi.py \
process-group=mysite \
application-group=mysite
IIUC, I need to provide both process-group= and application-group= for the preloading to happen.
All the docs I've found so far uses application-group=%{GLOBAL}, but that seems wrong for my use case, where each virtual host should run code based on the individual site's settings.py file (correct?).
Should I use the predefined %{RESOURCE} variable instead of mysite?
Can I share the same application-group between the http and https versions of the same site? (I know I can't do that with the process group).
Each virtual host Django site should use a separate daemon process group, so application-group of %{GLOBAL} is fine as it is forcing the use of the main interpreter context within the respective process groups. It is not shared across process groups.
Do note that preloading isn't going to necessarily help too much if you are doing restarts when the site is under heavy load as things will still need to wait for the process to start and load the application.
Having threads=50 looks to be quite excessive. What throughput do you get, and what is your average response time? Best performance is achieved by using 3-5 threads per process and using multiple processes. Using multiple processes obviously means using more memory, though, as there will be multiple copies of your application.
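Throughput and response time matter because, by Little's law, the average number of requests in flight is throughput multiplied by average response time, and processes × threads only needs to cover that, plus headroom for bursts. A small sketch of the arithmetic; the figures in the usage note are made-up examples, not measurements, and the default of 5 threads is just the upper end of the range above.

```python
import math


def average_concurrency(requests_per_second, avg_response_seconds):
    """Little's law: average number of requests being handled at once."""
    return requests_per_second * avg_response_seconds


def processes_for(requests_per_second, avg_response_seconds, threads_per_process=5):
    """Daemon processes needed so processes * threads covers average concurrency.

    This sizes for the average only; leave headroom for bursts.
    """
    needed = average_concurrency(requests_per_second, avg_response_seconds)
    return max(1, math.ceil(needed / threads_per_process))
```

For example, 40 requests/second at 0.25 s each averages 10 concurrent requests, so processes_for(40, 0.25) gives 2 processes of 5 threads; at that load most of 50 threads in one process would sit idle.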
Finally, yes, it is recommended, unless you have a good reason otherwise, to have both the HTTP and HTTPS versions of the site delegated to the same daemon process group. Specify the WSGIDaemonProcess in the first VirtualHost for that ServerName as seen by Apache. In the second VirtualHost of the 80/443 pair, don't have a WSGIDaemonProcess; instead refer to the named process group defined in the other VirtualHost context. This reaching across is allowed where the ServerName is the same.
I have an Apache web server on which I have set up a Flask website using mod_wsgi. I am having a couple of issues which may or may not be related.
With every call to a certain page (which runs a function performing heavy computation that takes over 2 seconds), memory usage increases by about 20 megabytes. My server starts out with about 350 megabytes consumed by everything on the machine; the server has a total of 3,620 megabytes as shown in htop. After I reload this page many times, the total memory used by the server eventually tops out around 2,400 megabytes and stops increasing as much. Once it gets to this level, I haven't been able to make it consume enough memory to go into swap, even after hundreds of page reloads. Is this by design of Flask, Apache, or Python? If there were some kind of caching mechanism, I wouldn't expect memory to keep accumulating when the same URL is called every time. If I restart Apache, the memory is released.
Sometimes calls to this page result in called functions erroring out, even though they are all read only calls (not writing any data to the disk) and the query string is the same for every page.
I have another page (calling another function which does much less computation), when called concurrently with other pages running on the web server, randomly errors out or the result (an image) comes back unexpectedly.
Could issues 2 and 3 be related to issue 1? Could issues 2 and 3 be due to bad programming somehow or bad memory in the machine? I am able to reproduce the randomness by loading the same URL in about 40 firefox tabs and then choosing the "reload all tabs" option.
What more information should be provided to get a better answer?
I have tried placing
import gc
gc.collect()
into my code.
I do have
WSGIDaemonProcess website user=www-data group=www-data processes=2 threads=2 home=/web/website
WSGIScriptAlias / /web/website/website.wsgi
<Directory /web/website>
WSGIProcessGroup website
WSGIScriptReloading On
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
</Directory>
in my /etc/apache2/sites-available/default file. It doesn't seem like the memory should grow that much if there are only a total of 4 threads being created, should there?
UPDATE
If I set processes=1 threads=4, then the seemingly random issues occur all the time when two requests are placed at once. Once I set processes=4 threads=1, the seemingly random issues don't happen. The rise in memory is still occurring, though, and will now actually rise all the way to the max RAM of the system and start swapping.
UPDATE
Although I haven't gotten this runaway RAM consumption issue resolved, I didn't have problems for several months with my current application. Apparently it wasn't too popular, and after several days or so, apache may have been clearing out the RAM automatically or something.
Now, I've made another application, which is fairly unrelated to the previous one. The previous application was generating about 1 megapixel images using matplotlib. My new application is generating 20 megapixel images and 1 megapixel images using matplotlib. The problem is monumentally larger now when 20 megapixel images are generated with the new application. After the entire swap space is filled up, something seems to get killed, and things work at a decent speed for a while while there is some RAM and swap space available, but is much slower to run when the RAM is consumed. Here are the processes running. I don't think that there are any extra zombie processes running.
$ ps -ef|grep apache
root 3753 1 0 03:45 ? 00:00:02 /usr/sbin/apache2 -k start
www-data 3756 3753 0 03:45 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 3759 3753 0 03:45 ? 00:02:06 /usr/sbin/apache2 -k start
www-data 3762 3753 0 03:45 ? 00:00:01 /usr/sbin/apache2 -k start
www-data 3763 3753 0 03:45 ? 00:00:01 /usr/sbin/apache2 -k start
test 4644 4591 0 12:27 pts/1 00:00:00 tail -f /var/log/apache2/access.log
www-data 4894 3753 0 21:34 ? 00:00:37 /usr/sbin/apache2 -k start
www-data 4917 3753 2 22:33 ? 00:00:36 /usr/sbin/apache2 -k start
www-data 4980 3753 1 22:46 ? 00:00:12 /usr/sbin/apache2 -k start
I am a little confused though when I look at htop because it shows a lot more processes than top or ps.
UPDATE
I have figured out that the memory leak is due to matplotlib (or the way I am using it), and not flask or apache, so the problems 2 and 3 I originally posted are indeed a separate issue from problem 1. Below is a basic function that I made to eliminate/reproduce the problem, interactively in ipython.
def BigComputation():
    import io
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    # A larger figure size causes more RAM to be used when savefig is run.
    # This function also uses some RAM that is never released automatically
    # if plt.close('all') is never run, but it is a small amount,
    # so it is hard to tell unless BigComputation is run thousands of times.
    TheFigure = plt.figure(figsize=(250, 8))
    # io.BytesIO replaces the Python 2-only cStringIO.StringIO buffer.
    file_output = io.BytesIO()
    # Causes lots of RAM to be used, and never released automatically
    TheFigure.savefig(file_output)
    # Releases all the RAM that is never released automatically
    plt.close('all')
    return None
The trick to getting rid of the RAM leak is to run
plt.close('all')
within BigComputation(), otherwise, BigComputation() will just keep accumulating RAM every time the function is called. I don't know if I am just using matplotlib inappropriately or have bad coding technique, but I really would think that once BigComputation() returns, it should release all the memory except any global objects or the objects it returned. It seems to me like matplotlib must be creating some global variables in an inappropriate way, because I have no idea what they are named.
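To check whether a change such as adding plt.close('all') really stops the growth, one rough approach is to call the function repeatedly and compare the process's peak resident set size before and after. A sketch using the standard resource module (Unix only; ru_maxrss is reported in kilobytes on Linux but in bytes on macOS):

```python
import resource


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def peak_rss_growth(fn, repeat=100):
    """Call fn `repeat` times and return how much the peak RSS rose.

    A leak shows up as growth that keeps increasing with `repeat`; a
    function that frees its memory levels off after the first few calls.
    """
    before = peak_rss()
    for _ in range(repeat):
        fn()
    return peak_rss() - before
```

For example, compare peak_rss_growth(BigComputation, 50) with and without the plt.close('all') line, running each in a fresh interpreter, since the peak never goes back down.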
I guess where my question stands now is why do I need plt.close('all')? I also need to try the suggestions of Graham Dumpleton in order to further diagnose my apache configuration to see why I need to set threads=1 in apache to get the random errors to go away.
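On the threads=1 question: pyplot keeps global state (the current figure and every figure it has created until they are closed), and the matplotlib FAQ states that Matplotlib is not thread-safe and leaves it to you to serialize access with locks. Two request threads plotting at once in one process could therefore interfere, which would be consistent with the random errors seen with processes=1 threads=4; processes=N threads=1 sidesteps that. If you want to keep several threads per process, one option is a process-wide lock around all plotting. A minimal sketch, where render_fn stands for your own plotting function (e.g. BigComputation):

```python
import threading

# One lock per process: only one thread at a time may run plotting code,
# because pyplot's global figure state is shared by all threads.
_plot_lock = threading.Lock()


def render_serialized(render_fn, *args, **kwargs):
    """Call render_fn while holding the process-wide plotting lock."""
    with _plot_lock:
        return render_fn(*args, **kwargs)
```

This trades concurrency for safety, since plot requests within one process queue up behind each other, and it does nothing for the memory growth, which still needs plt.close().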
Obviously a programming issue, but made worse by running a multiprocess configuration. Read:
http://blog.dscpl.com.au/2012/10/why-are-you-using-embedded-mode-of.html
and also perhaps watch:
http://lanyrd.com/2012/pycon/spcdg/
http://lanyrd.com/2013/pycon/scdyzk/
They explain the need to be careful about how you set up Apache.
UPDATE 1
Based on the configuration you added, you are missing:
WSGIProcessGroup website
Your code will not even be running in the daemon process group. So you are at the mercy of whatever MPM you are using and how many processes it is running.
UPDATE 2
Your Directory block is wrong. It is not referring to the directory. Should be:
<Directory /web>
WSGIProcessGroup website
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
</Directory>
The WSGIScriptReloading directive is not needed as that is the default.
UPDATE 3
Since you are not providing your exact configuration, we can't know for sure that what you are giving is what you are actually running. So, to absolutely confirm that you are using daemon mode, and thus at most 2 processes, do the tests at:
http://code.google.com/p/modwsgi/wiki/CheckingYourInstallation#Embedded_Or_Daemon_Mode
http://code.google.com/p/modwsgi/wiki/CheckingYourInstallation#Sub_Interpreter_Being_Used
You want to get 'website' and '', meaning daemon mode and the main interpreter.
That way we know we are actually talking only about the memory usage of the two daemon processes.
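The checks on those pages boil down to a WSGI app that reports two keys mod_wsgi puts into the request environ. A sketch of such a check, to mount temporarily in place of the real application; it uses environ.get so it also runs outside mod_wsgi, where both values come back as None.

```python
def application(environ, start_response):
    """Report which mod_wsgi process group and interpreter handled the request.

    An empty process group means embedded mode; an empty application group
    means the main interpreter (WSGIApplicationGroup %{GLOBAL}).
    """
    lines = [
        "mod_wsgi.process_group = %r" % environ.get("mod_wsgi.process_group"),
        "mod_wsgi.application_group = %r" % environ.get("mod_wsgi.application_group"),
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```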
I'm using mod_wsgi to serve a Django site through Apache. I also have some Python code that runs as a background process (daemon?). It keeps polling a server and inserts data into one of the Django models. This works fine, but can I make this code part of my Django application and still have it run constantly in the background? It doesn't need to be a process per se, but a part of the Django site that is active constantly. If so, could you point me to an example or some documentation that would help me accomplish this?
Thanks.
You could either set up a cron job that runs some function you have defined, or, the more advanced and probably recommended method, integrate Celery into your project (which is quite easy, actually).
You could create a background thread from the WSGI script when it is first being imported.
import threading
import time
def do_stuff():
    while True:
        time.sleep(60)
        # ... do periodic job here
_thread = threading.Thread(target=do_stuff)
_thread.daemon = True
_thread.start()
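A variant of the same idea, sketched with the job function left as a parameter: it replaces the bare sleep with threading.Event.wait, so the loop can be told to stop, and registers that stop with atexit so it is requested on interpreter shutdown without any custom signal handlers. The single-process caveat that follows still applies.

```python
import atexit
import logging
import threading

log = logging.getLogger(__name__)


def start_periodic(job, interval):
    """Run job() every `interval` seconds in a daemon thread until stopped.

    Returns (stop_event, thread); setting stop_event ends the loop promptly
    because Event.wait() returns True as soon as the event is set.
    """
    stop = threading.Event()

    def loop():
        while not stop.wait(interval):
            try:
                job()
            except Exception:
                # Keep the background loop alive if one run fails.
                log.exception("periodic job failed")

    thread = threading.Thread(target=loop, name="periodic-job", daemon=True)
    thread.start()
    atexit.register(stop.set)  # ask the loop to finish on interpreter shutdown
    return stop, thread
```

In the WSGI script this would be something like _stop, _thread = start_periodic(poll_server, 60), where poll_server is a hypothetical name for your polling function.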
For this to work though you would have to be using only one daemon process otherwise each process would be doing the same thing which you probably do not want.
If you are using multiple processes in the daemon process group, an alternative is to create a special daemon process group whose only purpose is to run this background thread. In other words, the process doesn't actually receive any requests.
You can do this by having:
WSGIDaemonProcess django-jobs processes=1 threads=1
WSGIImportScript /usr/local/django/mysite/apache/django.wsgi \
process-group=django-jobs application-group=%{GLOBAL}
The WSGIImportScript directive says to load that script and run it on startup in the context of the process group 'django-jobs'.
To save having multiple scripts, I have pointed it at what would be your original WSGI script file used for WSGIScriptAlias. We don't want the background thread started when that same script is loaded into the normal request-handling process group, though, so we do:
import threading
import mod_wsgi
if mod_wsgi.process_group == 'django-jobs':
    # do_stuff() is the periodic function defined earlier in this WSGI script.
    _thread = threading.Thread(target=do_stuff)
    _thread.daemon = True
    _thread.start()
Here it looks at the name of the daemon process group and only runs when started up within the special daemon process group set up with single process just for this.
Overall, you are just using Apache as a big glorified process manager, albeit one which is already known to be robust. It is a bit of overkill, as this process will consume additional memory on top of the processes accepting and handling requests, but depending on the complexity of what you are doing it can still be useful.
One cute aspect of doing this is that since it is still a full Django application in there, you could map specific URLs to just this process and so provide a remote API to manage or monitor the background task and what it is doing.
WSGIDaemonProcess django-jobs processes=1 threads=1
WSGIImportScript /usr/local/django/mysite/apache/django.wsgi \
process-group=django-jobs application-group=%{GLOBAL}
WSGIDaemonProcess django-site processes=4 threads=5
WSGIScriptAlias / /usr/local/django/mysite/apache/django.wsgi
WSGIProcessGroup django-site
WSGIApplicationGroup %{GLOBAL}
<Location /admin>
WSGIProcessGroup django-jobs
</Location>
Here, all URLs except for stuff under /admin run in 'django-site', with /admin in 'django-jobs'.
Anyway, that addresses the specific question of doing it within the Apache mod_wsgi daemon process as requested.
As pointed out, the alternative is to have a command-line script which sets up and loads Django, does the work, and is executed from a cron job. A command-line script means only occasional, transient memory usage, but the startup cost for each job is higher, as everything needs to be loaded each time.
I previously used a cron job, but I'm telling you, you will switch to Celery after a while.
Celery is the way to go. Plus, you can hand long-running work off to asynchronous tasks, which speeds up the request/response time.
When I update the code on my website I (naturally) restart my apache instance so that the changes will take effect.
Unfortunately the first page served by each apache instance is quite slow while it loads everything into RAM for the first time (5-7 sec for this particular site).
Subsequent requests only take 0.5 - 1.5 seconds so I would like to eliminate this effect for my users.
Is there a better way to get everything loaded into RAM than to do a wget x times (where x is the number of Apache instances defined by ServerLimit in my httpd.conf)?
Writing a restart script that restarts apache and runs wget 5 times seems kind of hacky to me.
Thanks!
The default for Apache/mod_wsgi is to only load application code on the first request to a process which requires that application. So, the first step is to configure mod_wsgi to preload your code when the process starts, and not only on the first request. This can be done in mod_wsgi 2.X using the WSGIImportScript directive.
Presuming daemon mode, which is better option anyway, this means you would have something like:
# Define process group.
WSGIDaemonProcess django display-name=%{GROUP}
# Mount application.
WSGIScriptAlias / /usr/local/django/mysite/apache/django.wsgi
# Ensure application preloaded on process start. Must specify the
# process group and application group (Python interpreter) to use.
WSGIImportScript /usr/local/django/mysite/apache/django.wsgi \
process-group=django application-group=%{GLOBAL}
<Directory /usr/local/django/mysite/apache>
# Ensure application runs in same process group and application
# group as was preloaded into on process start.
WSGIProcessGroup django
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
</Directory>
When you have made a code change, instead of touching the WSGI script file, which is only checked on the next request, send a SIGINT signal to the processes in the daemon process group.
With the 'display-name' option to WSGIDaemonProcess you can identify which processes by using BSD style 'ps' program. With 'display-name' set to '%{GROUP}', the 'ps' output should show '(wsgi:django)' as process name. Identify the process ID and do:
kill -SIGINT pid
Swap 'pid' with actual process ID. If more than one process in daemon process group, send signal to all of them.
Not sure if 'killall' can be used to do this in one step. I had problem with doing it on MacOS X.
In mod_wsgi 3.X the configuration can be simpler and can use instead:
# Define process group.
WSGIDaemonProcess django display-name=%{GROUP}
# Mount application and designate which process group and
# application group (Python interpreter) to run it in. As
# process group and application group named, this will have
# side effect of preloading application on process start.
WSGIScriptAlias / /usr/local/django/mysite/apache/django.wsgi \
process-group=django application-group=%{GLOBAL}
<Directory /usr/local/django/mysite/apache>
Order deny,allow
Allow from all
</Directory>
That is, there is no need for a separate WSGIImportScript directive, as you can specify the process group and application group as arguments to WSGIScriptAlias instead, with the side effect that it will preload the application.
How are you running Django (mod_python vs mod_wsgi)?
If you're running mod_wsgi (in daemon mode), restarting Apache isn't necessary to reload your application. All you need to do is update the mtime of your wsgi script (which is done easily with touch).
mod_wsgi's documentation has a pretty thorough explanation of the process:
ReloadingSourceCode