What is the recommended way to start long running (bash) scripts on several remote servers via fabric, so that you can later re-attach to the process for checking the status of the process, eventually sigterm it and get the exit code?
EDIT (10-Nov-2012):
In the mean-time I found a question going into the same direction: HOW TO use fabric use with dtach,screen,is there some example
It seems that the preferred way would be to use screen or tmux.
http://www.fabfile.org/faq.html#why-can-t-i-run-programs-in-the-background-with-it-makes-fabric-hang
Related
Updated post:
I have a python web application running on a port. It is used to monitor some other processes and one of its features is to allow users to restart his own processes. The restart is done through invoking a bash script, which will proceed to restart those processes and run them in the background.
The problem is, whenever I kill off the python web application after I have used it to restart any user's processes, those processes will take take over the port used by the python web application in a round-robin fashion, so I am unable to restart the python web application due to the port being bounded. As a result, I must kill off the processes involved in the restart until nothing occupies the port the python web application uses.
Everything is ok except for those processes occupying the port. That is really undesirable.
Processes that may be restarted:
redis-server
newrelic-admin run-program (which spawns another web application)
a python worker process
UPDATE (6 June 2013): I have managed to solve this problem. Look at my answer below.
Original Post:
I have a python web application running on a port. This python program has a function that calls a bash script. The bash script spawns a few background processes, then exits.
The problem is, whenever I kill the python program, the background processes spawned by the bash script will take over and occupy that same port.
Specifically the subprocesses are:
a redis server (with daemonize = true in the configuration file)
newrelic-admin run-program (spawns a web application)
a python worker process
Update 2: I've tried running these with nohup. Only the python worker process doesnt attempt to take over the port after I kill the python web application. The redis server and newrelic-admin still do.
I observed this problem when I was using subprocess.call in the python program to run the bash script. I've tried a double fork method in the python program before running the bash script, but it results in the same problem.
How can I prevent any processes spawned from the bash script from taking over the port?
Thank you.
Update: My intention is that, those processes spawned by the bash script should continue running if the python application is killed off. Currently, they do continue running after I kill off the python application. The problem is, when I kill off the python application, the processes spawned by the bash script start to take over the port in a round-robin fashion.
Update 3: Based on the output I see from 'pstree' and 'ps -axf', processes 1 and 2 (the redis server and the web app spawned by newrelic-admin run-program) are not child processes of the python web application. This makes it even weirder that they take over the port that the python web application occupies when I kill it... Anyone knows why?
Just some background on the methods I've tried to solve my above problem, before I go on to the answer proper:
subprocess.call
subprocess.Popen
execve
the double fork method along with one of the above (http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/)
By the way, none of the above worked for me. Whenever I killed off the web application that executes the bash script (which in turns spawns some background processes we shall denote as Q now), the processes in Q will in a round-robin fashion, take over the port occupied by the web application, so I had to kill them one by one before I could restart my web application.
After many days of living with this problem and moving on to other parts of my project, I thought of some random Stack Overflow posts and other articles on the Internet and from my own experience, recalled my experience of ssh'ing into a remote and starting a detached screen session, then logging out, and logging in again some time later to discover the screen session still alive.
So I thought, hey, what the heck, nothing works so far, so I might as well try using screen to see if it can solve my problem. And to my great surprise and joy it does! So I am posting this solution hopefully to help those who are facing the same issue.
In the bash script, I simply started the processes using a named screen process. For instance, for the redis application, I might start it like this:
screen -dmS redisScreenName redis-server redis.conf
So those processes will keep running on those detached screen sessions they were started with. In this case, I did not daemonize the redis process.
To kill the screen process, I used:
screen -S redisScreenName -X quit
However, this does not kill the redis-server. So I had to kill it separately.
Now, in the python web application, I can just use subprocess.call to execute the bash script, which will spawn detached screen sessions (using 'screen -dmS') which run the processes I want to spawn. And when I kill off the python web application, none of the spawned processes take over its port. Everything works smoothly.
I'm developing a web application using python and memcached.
First I want to start memcached from my python program, and if I'm right subprocess.Popen(["memcached"]); will do it. What really bothers me is how to check if memcached is already running in the background.
I want something like
if check_memcached():
start_memcached();
The check_memcached() function should return true if memcached is not already running. Anybody got any idea how to do this?
The best way to check may depend on your platform. Here's some methods you could try:
Try to connect to the memcached server. If a connection fails, assume that the server is not running.
If you expect memcached to be running locally, look for a PID file. On Debian this is usually located at /var/run/memcached.pid and contains the process id of the running memcached daemon. To be extra sure you could read the PID contained in the file and verify that a process with this PID is still running.
Use the subprocess module to run a command which can directly check for a locally running daemon. eg. Run the command pidof memcached using subprocess and check the return code.
I've been building a performance test suite to exercise a server. Right now I run this by hand but I want to automate it. On the target server I run a python script that logs server metrics and halts when I hit enter. On the tester machine I run a bash script that iterates over JMeter tests, setting timestamps and naming logs and executing the tests.
I want to tie these together so that the bash script drives the whole process, but I am not sure how best to do this. I can start my python script via ssh, but how to halt it when a test is done? If I can do it in ssh then I don't need to mess with existing configuration and that is a big win. The python script is quite simple and I don't mind rewriting it if that helps.
The easiest solution is probably to make the Python script respond to signals. Of course, you can just SIGKILL the script if it doesn't require any cleanup, but having the script actually handle a shutdown request seems cleaner. SIGHUP might be a popular choice. Docs here.
You can send a signal with the kill command so there is no problem sending the signal through ssh, provided you know the pid of the script. The usual solution to this problem is to put the pid in a file in /var/run when you start the script up. (If you've got a Debian/Ubuntu system, you'll probably find that you have the start-stop-daemon utility, which will do a lot of the grunt work here.)
Another approach, which is a bit more code-intensive, is to create a fifo (named pipe) in some known location, and use it basically like you are currently using stdin: the server waits for input from the pipe, and when it gets something it recognizes as a command, it executes the command ("quit", for example). That might be overkill for your purpose, but it has the advantage of being a more articulated communications channel than a single hammer-hit.
I need to use fabfile to remotely start some program in remote boxes from time to time, and get the results. Since the program takes a long while to finish, I wish to make it run in background and so I dont need to wait. So I tried os.fork() to make it work. The problem is that when I ssh to the remote box, and run the program with os.fork() there, the program can work in background fine, but when I tried to use fabfile's run, sudo to start the program remotely, os.fork() cannot work, the program just die silently. So I switched to Python-daemon to daemonalize the program. For a great while, it worked perfectly. But now when I started to make my program to read some Python shelve dicts, python-daemon cannot work any longer. Seems like if you use python-daemon, the shelve dicts cannot be loaded correctly, which I dont know why. Anyone has an idea besides os.fork() and Python-daemon, what else can I try to solve my problem?
If I understand your question right, I think you're making this far too complicated. os.fork() is for multiprocessing, not for running a program in the background.
Let's say for the sake of discussion that you wanted to run program.sh and collect what it sends to standard output. To do this with fabric, create locally:
fabfile.py:
from fabric.api import run
def runmyprogram():
run('./program.sh > output 2> /dev/null < /dev/null &')
Then, locally, run:
fab -H remotebox runmyprogram
The program will execute remotely, but fabric will not wait for it to finish. You'll need to harvest the output files later, perhaps using scp. The "&" makes this run in the background on the remote machine, and output redirection is necessary to avoid a hung fabric session.
If you don't need to use fabric, there are easier ways of doing this. You can ssh individually and run
nohup ./program.sh > output &
then come back later to check output.
If this is something that you'll do on a regular basis, this might be the better option, since you can just set up a cron job to run every so often, and then collect the output whenever you want.
If you'd rather not harvest the output files later, you can use:
fabfile.py:
from fabric.api import run
def runmyprogram():
run('./program.sh')
Then, on your local machine:
fab -H remotebox runmyprogram > output &
The jobs will run remotely, and put all their output back into the local output file. This runs in the background on your local machine, so you can do other things. However, if the connection between your local and remote machines might be interrupted, it's better to use the first approach so the output is always safely stored on the remote machines.
For those who came across this post in the future. Python-daemon can still work. It is just that be sure to load the shelve dicts within the same process. So previously the shelve dicts is loaded in parent process, when python-daemon spawns a child process, the dict handler is not passed correctly. When we fix this, everything works again.
Thanks for those suggesting valuable comments on this thread!
I have a python script and am wondering is there any way that I can ensure that the script run's continuously on a remote computer? Like for example, if the script crashes for whatever reason, is there a way to start it up automatically instead of having to remote desktop. Are there any other factors I have to be aware of? The script will be running on a window's machine.
Many ways - In the case of windows, even a simple looping batch file would probably do - just have it start the script in a loop (whenever it crashes it would return to the shell and be restarted).
Maybe you can use XMLRPC to call functions and pass data. Some time ago I did something like that you ask by using the SimpleXMLRPCServer and xmlrpc.client. You have examples of simple configurations in the docs.
Depends on what you mean by "crash". If it's just exceptions and stuff, you can catch everything and restart your process within itself. If it's more, then one possibility though is to run it as a daemon spawned from a separate python process that acts as a supervisor. I'd recommend supervisord but that's UNIX only. You can clone a subset of the functionality though.