I'm using mod_wsgi-express to serve a simple Flask app on a CentOS 6 server. The app is currently used pretty infrequently.
I'm having an issue where every few days, the app will crash, either serving up 500 errors or not responding at all. This doesn't seem related to load or use.
Looking through the error log, I'm seeing occasional "[notice] caught SIGTERM, shutting down" messages, after which it appears mod_wsgi-express restarts. I don't know why this is happening.
I'm also getting this message:
[error] (2)No such file or directory: mod_wsgi (pid=24550, process='localhost:8080', application=''): Call to fopen() failed for '/tmp/mod_wsgi-localhost:8080:14699/handler.wsgi'.
I've tried running mod_wsgi-express with both gdb and pdb debuggers enabled and haven't seen any bugs related to the actual execution of the Python script.
I also got a
[info] [client 137.78.237.31] (104)Connection reset by peer: core_output_filter: writing data to the network
which kicked off hundreds of
[notice] child pid 15952 exit signal Segmentation fault (11)
over the course of a couple of days.
These messages all seem to happen randomly and independently of each other. The only consistent thing is the SIGTERM signal happening every so often.
I'm confused as to what's going on. The implementation is pretty simple - it's a small tool that validates metadata in an uploaded file.
For the 500 errors, use the --server-root option to specify a directory other than the default under /tmp for the working directory.
The default should only be used for short runs; use --server-root to specify a more permanent home if running for longer periods.
This is necessary as some operating systems can be set up to remove files from /tmp when they haven't been modified for some period. I have been contemplating adding a warning in the startup output specifically to suggest use of --server-root if the default is being relied upon.
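For example, something along these lines (the directory and script name are just placeholders):
$ mkdir -p /var/lib/wsgi-app
$ mod_wsgi-express start-server app.wsgi --port 8080 --server-root /var/lib/wsgi-app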
For the segmentation fault issue, please come over to the mod_wsgi mailing list to discuss it further. StackOverflow is not a discussion forum, and that issue will likely need more investigation and discussion to understand what your application is doing.
UPDATE 1
Discussion continuing at:
https://groups.google.com/forum/#!topic/modwsgi/NmyOtp-9Pmg
Related
I'm coming from the PHP/Apache world, where running an application is super easy. Whenever a PHP application crashes, the Apache process running that request will stop, but the server will still be running happily and responding to other clients. Is there a way to have a Python application work in a similar way? How would I set up a WSGI server like Tornado or CherryPy so it works similarly? Also, how would I run several applications with different domains from one server?
What you are after would possibly happen anyway for WSGI servers. This is because any Python exception only affects the current request, and the framework or WSGI server will catch the exception, log it, and translate it into an HTTP 500 status page. The application will still be in memory and will continue to handle future requests.
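As an illustration, this is roughly what a framework or WSGI server does for you internally. A minimal sketch of that exception containment (not any particular server's actual code):
import sys
import traceback

def error_containing_middleware(app):
    # Wraps a WSGI app so an unhandled exception becomes a 500
    # response while the process (and its loaded state) lives on.
    def wrapper(environ, start_response):
        try:
            return app(environ, start_response)
        except Exception:
            traceback.print_exc()  # log the error details
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')],
                           sys.exc_info())
            return [b'Internal Server Error']
    return wrapper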
What it comes down to is what exactly you mean by 'crashes the Apache process'.
It would be rare for your code to crash in the sense of causing the whole process to exit with a core dump. So you may be confusing terminology by equating an application-level language error with a full process crash.
Even if you did find a way to crash a process, Apache/mod_wsgi handles that okay and the process will be replaced. The Gunicorn WSGI server will also do that. CherryPy will not, unless you have a process manager running which monitors it and restarts it. Tornado in its single-process mode has the same problem. Using Tornado as the worker in Gunicorn is one way around that; plus, I believe Tornado itself may now include a process manager for running multiple processes, which allows it to restart processes if they die.
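For example, running Tornado as the worker under Gunicorn is a one-liner (the module and application names here are placeholders):
$ gunicorn -k tornado -w 4 myapp:app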
Do note that if the application bug which caused the Python exception is bad enough to corrupt state within the process, subsequent requests may have issues. This is the one difference from PHP. With PHP, after any request, successful or not, the application is effectively thrown away and doesn't persist, so buggy code cannot affect subsequent requests. In Python, because the process with loaded code and retained state is kept between requests, you could technically get things into a state where you would have to restart the process to fix it. I don't know of any WSGI server, though, that has a mechanism to automatically restart a process if one request returned an error response.
If you're in a UNIX-like environment, you can run mod_wsgi under Apache in daemon mode. This means there will be a separate process for the Python code, and even if it crashes the server will continue running normally (and hopefully the WSGI process will restart itself). A WSGI application can run under multiple processes and multiple threads per process.
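A minimal sketch of daemon mode in the Apache configuration might look like this (the process group name and paths are placeholders):
# Run the app in 2 dedicated daemon processes, 15 threads each
WSGIDaemonProcess myapp processes=2 threads=15
WSGIProcessGroup myapp
WSGIScriptAlias / /path/to/app.wsgi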
As for running multiple domains on the same server, check Name-Based Virtual Hosts.
A few days ago I found out that my webapp, written on top of the Tornado framework, doesn't stop or restart via upstart. Upstart just hangs and doesn't do anything.
I investigated the issue and found that upstart receives the wrong PID, so it can only start my webapp daemon once and can't do anything else with it.
strace shows that my daemon makes 4 (!) clone() calls instead of 2.
A week ago everything was fine and the webapp was fully and correctly managed by upstart.
OS is Ubuntu 10.04.03 LTS (as it was weeks ago).
Do you have any ideas how to fix it?
PS: I know about the "expect fork|daemon" directive; it changes nothing ;)
Sorry for my silence.
Investigation of the issue ended with the discovery that the uuid Python library adds 2 forks to my daemon. I got rid of this lib and the Tornado daemon now works properly.
An alternative answer was supervisord, which can run as a daemon any console tool that can't daemonize by itself.
There are two often-used solutions.
The first one is to let your application honestly report its PID. If you can force your application to write its actual PID into the pidfile, then you can get the PID from there.
The second one is a little more complicated. You can add a specific environment variable to the script invocation. This environment variable will stay with all the forks (as long as the forks don't clear their environment), and then you can find all of your processes by parsing the /proc/*/environ files.
There may be an easier way to find processes by their environment, but I'm not sure.
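A rough sketch of that second approach (Linux-only; the MYAPP_TAG variable name is made up for illustration):
import glob

MARKER = b'MYAPP_TAG=tornado-instance-1'

def find_tagged_pids():
    # /proc/<pid>/environ holds each process's environment as
    # NUL-separated KEY=VALUE pairs.
    pids = []
    for path in glob.glob('/proc/[0-9]*/environ'):
        try:
            with open(path, 'rb') as f:
                if MARKER in f.read().split(b'\x00'):
                    pids.append(int(path.split('/')[2]))
        except (IOError, OSError):  # process exited or access denied
            pass
    return pids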
I wrote a simple HTTP server in Python to manage a database hosted on a server via a web UI. It is perfectly functional and works as intended. However, it has one huge problem: it won't stay up. It will work for an hour or so, but if left unused for a long period, I have to re-initialize it when I return to use it. Right now the method I use to make it serve is:
from BaseHTTPServer import HTTPServer  # Python 2 standard library

def main():
    global db
    db = DB("localhost")  # DB and MyHandler are defined elsewhere in the script
    server = HTTPServer(('', 8080), MyHandler)
    print 'started httpserver...'
    server.serve_forever()

if __name__ == '__main__':
    main()
I run this in the background on a Linux server, using a command like sudo python webserver.py & to detach it, but as I mentioned, after a while it quits. Any advice is appreciated, because as it stands I don't see why it shuts down.
You can write a UNIX daemon in Python using the python-daemon package, or a Windows service using pywin32.
Unfortunately, I know of no "portable" solution for writing daemon/service processes (in Python or otherwise).
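With python-daemon, turning a script like yours into a proper UNIX daemon can be as small as this sketch (assuming a main() like the one in the question):
import daemon

# DaemonContext handles the double fork, detaching from the
# terminal, and redirecting the standard streams.
with daemon.DaemonContext():
    main()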
Here's one piece of advice, framed as a driving analogy. You certainly want to drive safely (figure out why your program is failing and fix it). In the (rare?) case of a crash, some monitoring infrastructure, like monit, can be helpful for restarting crashed processes. But you wouldn't want to use it to paper over a crash, just as you wouldn't want to deploy your air bag every time you stopped the car.
Well, first step is to figure out why it's crashing. There's two likely possibilities:
The serve_forever call is throwing an exception.
The python process is crashing/being terminated.
In the former case, you can make it live forever by wrapping the call in a loop with a try-except. It's probably a good idea to log the error details.
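For example, a rough sketch of the wrap-and-retry idea, assuming the server object from your main():
import logging
import time

while True:
    try:
        server.serve_forever()
    except Exception:
        # Log the full traceback, then try again after a short pause.
        logging.exception("serve_forever() died; restarting in 5 seconds")
        time.sleep(5)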
The latter case is a bit trickier, because it could be caused by a variety of things. Does it happen if you run the script in the foreground? If not, maybe there's some kind of maintenance service running that is terminating your script?
Not really a complete answer, but perhaps enough to help you diagnose the problem.
Have you tried running it from inside a screen session?
$ screen -L sudo python webserver.py
As an alternative to screen, there is nohup, which will ensure the process carries on running after you've logged out.
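For example (same invocation as above):
$ sudo nohup python webserver.py &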
It's worth checking the logs to see why it's being killed or quitting, as the cause may not be the operating system but an internal fault.
I am not able to log apache2 crashes to a CoreDumpDirectory on Ubuntu 10.10. I am using Django 1.2.3 and apache2 with mod_wsgi. I followed the steps listed in response to this question, but to no avail. I added CoreDumpDirectory /var/cache/apache2/ at the end of the apache2.conf file, and then, after executing ulimit -c unlimited, restarted the Apache server. Then I replicated the condition that causes the Apache error log to show "child pid 27288 exit signal Segmentation fault (11)", but there is no sign of Apache logging that crash to the CoreDumpDirectory, and there is nothing in /var/cache/apache2.
I also ran into this problem of mod_wsgi children not dumping core. Long story short: You need to edit /etc/sysctl.conf and set fs.suid_dumpable=2.
Long story:
Linux prevents dumping core for processes that started as root and then dropped privileges. (This is a security feature, so SUID executables don't leak their memory to the user). Setting suid_dumpable=2 means that core files will be owned by root, so there's no direct security problem there either.
Why does this affect mod_wsgi? Apparently mod_wsgi's child processes are forked off from Apache's main process, and Apache usually starts up as root, since it needs to bind privileged port numbers like 80, and then drops privileges.
(Original bug report: https://code.google.com/p/modwsgi/issues/detail?id=247)
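To apply the setting without a reboot, something like this should work:
$ sudo sysctl -w fs.suid_dumpable=2
$ echo 'fs.suid_dumpable=2' | sudo tee -a /etc/sysctl.conf
The second line just persists the setting across reboots.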
I was able to solve this problem. The issue was with the PyLucene environment being initialized at run time: I was executing the initVM() call every time a request came in, and that was causing the segmentation fault. This link directed that I should do it in the .wsgi file instead, and after I did that there were no more segmentation faults.
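In other words, the JVM initialization moves to module level in the .wsgi file, so it runs once when mod_wsgi first loads the script rather than on every request. Roughly like this (the application import is a placeholder for however your Django app is exposed):
import lucene

# Initialize the PyLucene JVM exactly once, at import time.
lucene.initVM()

from myproject.wsgi import application  # placeholder for the real app object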
I'm running Django on Linux using fcgi and Lighttpd. Every now and again (about once a day) the server just dies. I'm using the latest stable release of Django, Python and Lighttpd.
The only thing I can think of is that my program is opening a lot of files and executing a lot of external processes, but I'm fairly sure that side of things is watertight.
Looking at the error and access logs, there's nothing exceptional happening (i.e. load isn't above normal). On those occasions where I have had exceptions from Python, these have shown up in the error.log, but when this crash happens I get nothing.
Is there any way of finding out why the process died? Short of putting logging statements on every single line? Obviously I can't reproduce this so I don't know exactly where to look.
Edit
It's the django process that's dying. I'm running the server with manage.py runfcgi daemonize=true method=threaded host=127.0.0.1 port=12345
You could edit manage.py to redirect stderr to a file, assuming runfcgi doesn't do that itself:
import sys

# Guard against manage.py being run with no arguments at all.
if len(sys.argv) > 1 and sys.argv[1] == "runfcgi":
    sys.stderr = open("/path/to/my/django-error.log", "a")
Is this on your own server? (Do you own the box?) I've had that problem on shared hosting, where the host was just killing long-running processes. Do you know if your fcgi is receiving a SIGTERM?
I've had the same problems. Not only do they die without warning or reason, they leak like crazy too, with threads left stuck without a master process. We solved this by having a cron job run every 5 minutes that checks whether the port is still responding and restarts the process if not.
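A rough sketch of such a crontab entry (the port and restart script are placeholders):
*/5 * * * * nc -z 127.0.0.1 12345 || /path/to/restart-django-fcgi.sh
Here nc -z just checks that something is still accepting connections on the port.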
By the way, we've now given up on fcgi and are (slowly) migrating over to uwsgi.