Few days ago I found out that my webapp wrote ontop of the tornadoweb framework doesn't stop or restart via upstart. Upstart just hangs and doesn't do anything.
I investigated the issue and found that upstart recieves wrong PID, so it can only run once my webapp daemon and can't do anything else.
Strace shows that my daemon makes 4 (!) clone() calls instead of 2.
Week ago anything was good and webapp was fully and correctly managed by the upstart.
OS is Ubuntu 10.04.03 LTS (as it was weeks ago).
Do you have any ideas how to fix it?
PS: I know about "expect fork|daemon" directive, it changes nothing ;)
Sorry my silence, please.
Investigation of the issue ended with the knowledge about uuid python library which adds 2 forks to my daemon. I get rid of this lib and tornado daemon works now properly.
Alternative answer was supervisord which can run any console tools as a daemon which can't daemonize by itself.
There are two often used solutions
The first one is to let your application honestly report its pid. If you could force your application to write the actual pid into the pidfile then you could get its pid from there.
The second one is a little more complicated. You may add specific environment variable for the script invocation. This environment variable will stay with all the forks if forks don't clear environment and than you can find all of your processes by parsing /proc/*/environ files.
There should be easier solution for finding processes by their environment but I'm not sure.
Related
I am facing the following problem and I am not sure if my approach is anywhere near 'right'.
I've built a Django application that handles students' assignments for a programming subject at university. The original version of this application (https://github.com/elcoya/seal) used a chroot'd daemon to get the code, delivered by the students, place a bash script along-side that code and execute de bash, which could contain any kind of opeartions, like building and testing the students' code. So far... so good. However, running this daemon was a bit of a headache. Since it ran within a jail, the binded /proc, within that jail, became obsolete every time the server was restarted (it was restarted from time to time :( ) or some error occur in the daemon, the process died or was killed, and therefor, stop doing it's job of "correcting" the students' deliveries.
To prevent this errors from happening, and have a more trust worthy automatic correction service, I would like to install a 'django-kronos' task (which runs from the crontab in the server) to do the same job. This would be great, but that would mean that from my Django stack code, I would need to move into the chroot to run the mentioned bash script.
SO suggests this post, but it is from 2012, and it kind of advises against what I am trying to do. Am I missing something here? Is os.chroot(/path/to/jail) the way to go?
You could run your user scripts inside a Docker container. Docker gives you all the benefit of of a jail and much more. For instance, it can restart a container for you if it the host running it were to be rebooted: https://docs.docker.com/engine/admin/start-containers-automatically/
I have written a module in Python and want it to run continuously once started and need to stop it when I need to update other modules. I will likely be using monit to restart it, if module has crashed or is otherwise not running.
I was going through different techniques like Daemon, Upstart and many others.
Which is the best way to go so that I use that approach through out my all new modules to keep running them forever?
From your mention of Upstart I will assume that this question is for a service being run on an Ubuntu server.
On an Ubuntu server an upstart job is really the simplest and most convenient option for creating an always on service that starts up at the right time and can be stopped or reloaded with familiar commands.
To create an upstart service you need to add a single file to /etc/init. Called <service-name>.conf. An example script looks like this:
description "My chat server"
author "your#email-address.com"
start on runlevel [2345]
stop on runlevel [!2345]
env AN_ENVIRONMENTAL_VARIABLE=i-want-to-set
respawn
exec /srv/applications/chat.py
This means that everytime the machine is started it will start the chat.py program. If it dies for whatever reason it will restart it. You don't have to worry about double forking or otherwise daemonizing your code. That's handled for you by upstart.
If you want to stop or start your process you can do so with
service chat start
service chat stop
The name chat is automatically found from the name of the .conf file inside /etc/init
I'm only covering the basics of upstart here. There are lots of other features to make it even more useful. All available by running man upstart.
This method is much more convenient, than writing your own daemonization code. A 4-8 line config file for a built in Ubuntu component is much less error prone than making your code safely double fork and then having another process monitor it to make sure it doesn't go away.
Monit is a bit of a red herring. If you want downtime alerts you will need to run a monitoring program on a separate server anyway. Rely on upstart to keep the process always running on a server. Then have a different service that makes sure the server is actually running. Downtime happens for many different reasons. A process running on the same server will tell you precisely nothing if the server itself goes down. You need a separate machine (or a third party provider like pingdom) to alert you about that condition.
You could check out supervisor. What it is capable of is starting a process at system startup, and then keeping it alive until shutdown.
The simplest configuration file would be:
[program:my_script]
command = /home/foo/bar/venv/bin/python /home/foo/bar/scripts/my_script.py
environment = MY_ENV_VAR=FOO, MY_OTHER_ENV_VAR=BAR
autostart = True
autorestart = True
Then you could link it to /etc/supervisord/conf.d, run sudo supervisorctl to enter management console of supervisor, type in reread so that supervisor notices new config entry and update to display new programs on the status list.
To start/restart/stop a program you could execute sudo supervisorctl start/restart/stop my_script.
I used old-style initscript with start-stop-daemon utility.Look at skel in /etc/init.d
I wrote a simple HTTP server in python to manage a database hosted on a server via a web UI. It is perfectly functional and works as intended. However it has one huge problem, it won't stay put. It will work for an hour or so, but if left unused for long periods of time when returning to use it I have to re-initialize it every time. Right now the method I use to make it serve is:
def main():
global db
db = DB("localhost")
server = HTTPServer(('', 8080), MyHandler)
print 'started httpserver...'
server.serve_forever()
if __name__ == '__main__':
main()
I run this in the background on a linux server so I would run a command like sudo python webserver.py & to detach it, but as I mentioned previously after a while it quits. Any advice is appreciated cause as it stands I don't see why it shuts down.
You can write a UNIX daemon in Python using the python-daemon package, or a Windows service using the pywin32.
Unfortunately, I know of no "portable" solution to writing daemon / service processes (in Python, or otherwise).
Here's one piece of advice in a story about driving. You certainly want to drive safely (figure out why your program is failing and fix it). In the (rare?) case of a crash, some monitoring infrastructure, like monit, can be helpful to restart crashed processes. You probably wouldn't want to use it to paper over a crash just like you wouldn't want to deploy your air bag every time you stopped the car.
Well, first step is to figure out why it's crashing. There's two likely possibilities:
The serve_forever call is throwing an exception.
The python process is crashing/being terminated.
In the former case, you can make it live forever by wrapping it in a loop, with a try-except. Probably a good idea to log the error details.
The latter case is a bit trickier, because it could be caused by a variety of things. Does it happen if you run the script in the foreground? If not, maybe there's some kind of maintenance service running that is terminating your script?
Not really a complete answer, but perhaps enough to help you diagnose the problem.
Have you tried running it from inside a screen session?
$ screen -L sudo python webserver.py
As an alternative to screen there is NoHup which will ensure the process carries on running after your logged out.
Its worth checking the logs to see why its killed/quitting as well as it may not be related to the operating system but an internal fault.
I have hacked together an rc script for celeryd on FreeBSD, but I can't help but think that there must be a better way. celeryd does not daemonize itself, and it seems to have a hard time responding to sigterm as well, so it might be complicated to get to work.
Is this a problem that someone else has solved before?
There's an experimental init.d script here:
https://github.com/ask/celery/tree/master/contrib/generic-init.d/
I don't know if it has been tested on FreeBSD, but it should definitely be made
to work there.
What do you mean celeryd isn't responding to TERM? This is the recommended signal
to use for a clean shutdown as it will finish any currently running tasks.
(there's no time out, so it doesn't help if you have a task in deadlock, for that you may
use the --time-limit argument)
Here's the /etc/default/celeryd file I use (it's for a Django project, for others just replace manage.py celeryd with celeryd):
http://pastie.org/1216111
celerybeat/celeryevcam is using the scripts from contrib/debian/init.d, there are no generic versions of these yet.
I'm running Django on Linux using fcgi and Lighttpd. Every now and again (about once a day) the server just dies. I'm using the latest stable release of Django, Python and Lighttpd.
The only thing I can think of is that my program is opening a lot of files and executing a lot of external processes, but I'm fairly sure that side of things is watertight.
Looking at the error and access logs, there's nothing exceptional happening (i.e. load isn't above normal). On those occasions where I have had exceptions from Python, these have shown up in the error.log, but when this crash happens I get nothing.
Is there any way of finding out why the process died? Short of putting logging statements on every single line? Obviously I can't reproduce this so I don't know exactly where to look.
Edit
It's the django process that's dying. I'm running the server with manage.py runfcgi daemonize=true method=threaded host=127.0.0.1 port=12345
You could edit manage.py to redirect stderr to a file, assuming runfcgi doesn't do that itself:
import sys
if sys.argv[1] == "runfcgi":
sys.stderr = open("/path/to/my/django-error.log", "a")
Is this on your server? (do you own the box?). I've had that problem on shared hosting, and the host was just killing long processes. Do you know if your fcgi is receiving a SIGTERM?
Have had the same problems. Not only do they die without warning or reason they leak like crazy too with threads being stuck without a master process. We solved this problem by having a cronjob run every 5 minutes that checks if the port number is up and running and if not restart.
By the way, we've now (slowly migrating) given up on fcgi and moved over to uwsgi.