Starting and stopping celery processes in upstart with a python wrapper script - python

So we have an application that has celery workers. We start those workers using an upstart file /etc/init/fact-celery.conf that looks like the following:
description "FaCT Celery Worker."
start on runlevel [2345]
stop on runlevel [06]
respawn
respawn limit 10 5
setuid fact
setgid fact
script
    [ -r /etc/default/fact ] && . /etc/default/fact
    if [ "$START_CELERY" != "yes" ]; then
        echo "Service disabled in '/etc/default/fact'. Not starting."
        exit 1
    fi
    ARGUMENTS=""
    if [ "$BEAT_SERVICE" = "yes" ]; then
        ARGUMENTS="--beat"
    fi
    /usr/bin/fact celery worker --loglevel=INFO --events --schedule=/var/lib/fact/celerybeat-schedule --queues=$CELERY_QUEUES $ARGUMENTS
end script
It calls a python wrapper script that looks like the following:
#!/bin/bash
WHOAMI=$(whoami)
PYTHONPATH=/usr/share/fact
PYTHON_BIN=/opt/fact-virtual-environment/bin/python
DJANGO_SETTINGS_MODULE=fact.settings.staging
if [ ${WHOAMI} != "fact" ];
then
    sudo -u fact $0 $*;
else
    # Python needs access to the CWD, but we need to deal with apparmor restrictions
    pushd $PYTHONPATH &> /dev/null
    PYTHONPATH=${PYTHONPATH} DJANGO_SETTINGS_MODULE=${DJANGO_SETTINGS_MODULE} ${PYTHON_BIN} -m fact.managecommand $*;
    popd &> /dev/null
fi
The trouble with this setup is that when we stop the service, we get leftover fact-celery workers that don't die. For some reason upstart can't track the forked processes. I've read in some similar posts that upstart can't track more than two forks.
I've tried using expect fork but then upstart just hangs whenever I try to start or stop the service.
Other posts I've found on this say to call the python process directly instead of using the wrapper script, but we've already built apparmor profiles around these scripts and there are other things in our workflow that are pretty dependent on them.
Is there any way, with the current wrapper scripts, to handle killing off all the celery workers on a service stop?

There is some discussion about this in the Workers Guide, but basically the usual process is to send a TERM signal to the worker, which will cause it to wait for all the currently running tasks to finish before exiting cleanly.
Alternatively, you can send the KILL signal if you want it to stop immediately, with potential data loss, but then, as you said, celery isn't able to intercept the signal and clean up the children. The only recourse mentioned is to clean up the children manually, like this:
$ ps auxww | grep 'celery worker' | awk '{print $2}' | xargs kill -9
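As a rough sketch of the TERM-first approach described above (this is not taken from the Workers Guide): the helper below sends TERM to a PID, polls until the process exits, and only falls back to KILL once a grace period runs out. The name stop_gracefully and the 10-second default are made up for illustration. It assumes you already know the main worker's PID, and it signals only that one process, so children orphaned by a KILL would still need the manual cleanup shown above.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Returns 0 if the process exited after TERM, 1 if KILL had to be used.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}

    # Ask the process to shut down cleanly; if the signal cannot be
    # delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 delivers no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Still running after the grace period: force it, accepting possible data loss.
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

For example, stop_gracefully "$WORKER_PID" 30 would give the worker 30 seconds to finish its running tasks, where WORKER_PID is whatever variable holds the main worker's PID.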

Related

Run pkill command in bash script. Script dead [duplicate]

I'm writing a stop routine for a start-up service script:
do_stop()
{
rm -f $PIDFILE
pkill -f $DAEMON || return 1
return 0
}
The problem is that pkill (same with killall) also matches the process representing the script itself and it basically terminates itself. How to fix that?
You can explicitly filter out the current PID from the results:
kill $(pgrep -f $DAEMON | grep -v ^$$\$)
To correctly use the -f flag, be sure to supply the whole path to the daemon rather than just a substring. That will prevent you from killing the script (and eliminate the need for the above grep) and also from killing all other system processes that happen to share the daemon's name.
pkill -f accepts a full blown regex. So rather than pkill -f $DAEMON you should use:
pkill -f "^$DAEMON"
This makes sure a process is killed only if its command line starts with the given daemon name.
A better solution is to save the PID (process ID) of the process in a file when you start it. To stop the process, just read the file to get the ID of the process to kill.
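A minimal sketch of that PID-file idea, assuming the start routine is also under your control. The do_start function and the default /tmp/progd.pid path are hypothetical; only $PIDFILE and do_stop come from the question.

```shell
# Hypothetical default location; a real service would normally set PIDFILE itself.
PIDFILE=${PIDFILE:-/tmp/progd.pid}

# do_start COMMAND [ARGS...]: run the daemon in the background and record its PID.
do_start() {
    "$@" >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

# do_stop: signal exactly the recorded process, so no pattern matching is needed.
do_stop() {
    [ -r "$PIDFILE" ] || return 1               # nothing recorded, nothing to stop
    pid=$(cat "$PIDFILE")
    rm -f "$PIDFILE"
    kill -TERM "$pid" 2>/dev/null || return 1   # recorded process was already gone
    return 0
}
```

Because do_stop never searches by name, it cannot match the init script itself. The usual caveat is a stale file: if the daemon died earlier and the PID was reused, the recorded number may belong to another process.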
Judging by your question, you're not hard over on using pgrep and pkill, so here are some other options commonly used.
1) Use killproc from /etc/init.d/functions or /lib/lsb/init-functions (whichever is appropriate for your distribution and version of Linux). If you're writing a service script, you may already be including this file if you used one of the other services as an example.
Usage: killproc [-p pidfile] [ -d delay] {program} [-signal]
The main advantage to using this is that it sends SIGTERM, waits to see if the process terminates and sends SIGKILL only if necessary.
2) You can also use the secret sauce of killproc, which is to find the process IDs to kill using pidof, which has a -o option for excluding a particular process. The argument for -o could be $$, the current process ID, or %PPID, which is a special variable that pidof interprets as the script calling pidof. Finally, if the daemon is a script, you'll need the -x option so you're killing the script by its name rather than killing bash or python.
for pid in $(pidof -o %PPID -x progd); do
kill -TERM $pid
done
You can see an example of this in the article Bash: How to check if your script is already running

How to tag a python command run in a bash shell

Is it possible to tag a python program run from the command line?
Context: Said command will be run with nohup in the background, and will be killed and restarted at midnight via cron. My intention is to pipe ps into egrep for said tag, grab the pid, and kill -9 before restarting.
minimal, complete, and verifiable example
Start a python web server:
$ nohup python -m http.server 8888 &
Add a tag to the command. Note that -tag is just my imagination at work.. this is what I want:
$ nohup python -m http.server 8888 & -tag "ced72ca0-cd19-11ea-87d0-0242ac130003"
grep for tag:
$ ps aux | egrep "ced72ca0-cd19-11ea-87d0-0242ac130003"
...grab the pid from this, and kill -9
Because you say you want to kill the processes through isolated cron jobs at midnight, I guess that the $!-based solutions in the linked questions (like How to get the process ID to kill a nohup process?) are not an option for you.
To identify your HTTP server processes, your idea is to 'tag' them with a unique ID so the cron jobs can find them.
What you could do in your specific case is to make use of the fact that the listening TCP sockets are unique on your given machine, and retrieve the associated pid through netstat.
A bash script along the lines of:
#!/bin/bash
port=${1:-"8888"}
IP=${2:-"0.0.0.0"}
pid=`netstat -antp 2>/dev/null | grep -E "^(\S+\s+){3}$IP:$port\s+\S+\s+LISTEN" | sed -E 's/^(\S+\s+){6}([0-9]+).*$/\2/'`
[[ -n "$pid" ]] && kill -TERM $pid
... that you parameterize with IP and port through your cronjob.
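You can put the code in a file named ced72ca0-cd19-11ea-87d0-0242ac130003,
#!/bin/bash
python -m http.server
make it executable
chmod +x ced72ca0-cd19-11ea-87d0-0242ac130003
run it (with ./ unless the file's directory is on your PATH)
nohup ./ced72ca0-cd19-11ea-87d0-0242ac130003 &
and then you can kill it. Plain pkill matches only the first 15 characters of the process name, so a name this long needs -f to match against the full command line
pkill -f ced72ca0-cd19-11ea-87d0-0242ac130003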
or even using only beginning of filename
pkill ced
EDIT:
Because the new script ignores its arguments, you can run it with any argument(s), e.g. some tag or word
nohup ced72ca0-cd19-11ea-87d0-0242ac130003 hello_world &
and then you can kill it using -f
pkill -f hello_world
or even using part of word
pkill -f hello
pkill -f world
This way you can even use normal name for script and add tag
nohup my_script ced72ca0-cd19-11ea-87d0-0242ac130003 &
and kill with -f
pkill -f ced72ca0-cd19-11ea-87d0-0242ac130003
or using only part of word
pkill -f ced
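The same "extra argument that nobody reads" trick can be sketched without creating a script file per tag. This is a variation, not what the answer above does: run_tagged and TAGGED_PID are hypothetical names, and the tag is passed as the $0 of an sh -c wrapper, which is an assumption about how you want to launch things.

```shell
# run_tagged TAG COMMAND [ARGS...]   (hypothetical helper, not a standard tool)
#
# Starts COMMAND under a small sh wrapper. TAG becomes the wrapper's $0, which
# the wrapper never uses, so it only serves as a searchable marker in the
# wrapper's command line. The wrapper forwards TERM to COMMAND, so killing the
# wrapper also stops the real program. The wrapper's PID is left in TAGGED_PID.
run_tagged() {
    tag=$1
    shift
    nohup sh -c '
        "$@" &
        child=$!
        trap "kill -TERM $child 2>/dev/null" TERM
        wait "$child"
    ' "$tag" "$@" >/dev/null 2>&1 &
    TAGGED_PID=$!
}

# Intended use, with the tag from the question:
#   run_tagged ced72ca0-cd19-11ea-87d0-0242ac130003 python -m http.server 8888
#   pkill -f ced72ca0-cd19-11ea-87d0-0242ac130003
```

Keep in mind that pkill -f matches every process whose command line contains the tag, so the tag should be unique enough not to appear in anything else that is running.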

Starting/stopping a background Python process without nohup + ps aux grep + kill

I usually use:
nohup python -u myscript.py &> ./mylog.log & # or should I use nohup 2>&1 ? I never remember
to start a background Python process that I'd like to continue running even if I log out, and:
ps aux |grep python
# check for the relevant PID
kill <relevantPID>
It works but it's annoying to do all these steps.
I've read some methods in which you need to save the PID in some file, but that's even more hassle.
Is there a clean method to easily start / stop a Python script? like:
startpy myscript.py # will automatically continue running in
# background even if I log out
# two days later, even if I logged out / logged in again the meantime
stoppy myscript.py
Or could this long part nohup python -u myscript.py &> ./mylog.log & be written in the shebang of the script, such that I could start the script easily with ./myscript.py instead of writing the long nohup line?
Note : I'm looking for a one or two line solution, I don't want to have to write a dedicated systemd service for this operation.
As far as I know, there are just two (or maybe three or maybe four?) solutions to the problem of running background scripts on remote systems.
1) nohup
nohup python -u myscript.py > ./mylog.log 2>&1 &
1 bis) disown
Same as above, but slightly different because it actually removes the program from the shell's job list, preventing SIGHUP from being sent to it.
2) screen (or tmux as suggested by neared)
Here you will find a starting point for screen.
See this post for a great explanation of how background processes work. Another related post.
3) Bash
Another solution is to write two bash functions that do the job:
mynohup () {
    [[ "$1" = "" ]] && echo "usage: mynohup python_script" && return 0
    nohup python -u "$1" > "${1%.*}.log" 2>&1 < /dev/null &
}
mykill() {
    ps -ef | grep "$1" | grep -v grep | awk '{print $2}' | xargs kill
    echo "process $1 killed"
}
Just put the above functions in your ~/.bashrc or ~/.bash_profile and use them as normal bash commands.
Now you can do exactly what you asked:
mynohup myscript.py # will automatically continue running in
# background even if I log out
# two days later, even if I logged out / logged in again the meantime
mykill myscript.py
4) Daemon
This daemon module is very useful:
python myscript.py start
python myscript.py stop
Do you mean log in and out remotely (e.g. via SSH)? If so, a simple solution is to install tmux (terminal multiplexer). It creates a server for terminals that run underneath it as clients. You open up tmux with tmux, type in your command, press Ctrl+B and then D to 'detach' from tmux, and then type exit at the main terminal to log out. When you log back in, tmux and the processes running in it will still be running.

Best way to manage docker containers with supervisord

I have to set up "dockerized" environments (integration, qa and production) on the same server (client's requirement). Each environment will be composed as follows:
rabbitmq
celery
flower
python 3 based application called "A" (specific branch per
environment)
Over them, jenkins will handle the deployment based on CI.
Using a set of containers per environment sounds like the best approach.
But now I need a process manager to run and supervise all of them:
3 rabbit containers,
3 celery/flower containers,
3 "A" containers,
1 jenkins container.
Supervisord seems to be the best choice, but during my tests, I'm not able to "properly" restart a container. Here is a snippet of the supervisord.conf:
[program:docker-rabbit]
command=/usr/bin/docker run -p 5672:5672 -p 15672:15672 tutum/rabbitmq
startsecs=20
autorestart=unexpected
exitcodes=0,1
stopsignal=KILL
So I wonder what is the best way to separate each environment and be able to manage and supervise each service (a container).
[EDIT: my solution, inspired by Thomas's response]
Each container is run by a .sh script that looks like this:
rabbit-integration.sh
#!/bin/bash
#set -x
SERVICE="rabbitmq"
SH_S="/path/to_shs"
export MY_ENV="integration"
. $SH_S/env_.sh
. $SH_S/utils.sh
SERVICE_ENV=$SERVICE-$MY_ENV
ID_FILE=/tmp/$SERVICE_ENV.name # pid file
trap stop SIGHUP SIGINT SIGTERM # trap signal for calling the stop function
run_rabbitmq
$SH_S/env_.sh looks like:
# set env variable
...
case $MONARCH_ENV in
$INTEGRATION)
AMQP_PORT="5672"
AMQP_IP="172.17.42.1"
...
;;
$PREPRODUCTION)
AMQP_PORT="5673"
AMQP_IP="172.17.42.1"
...
;;
$PRODUCTION)
AMQP_PORT="5674"
REDIS_IP="172.17.42.1"
...
esac
$SH_S/utils.sh looks like:
#!/bin/bash
function random_name(){
echo "$SERVICE_ENV-$(cat /proc/sys/kernel/random/uuid)"
}
function stop (){
echo "stopping docker container..."
/usr/bin/docker stop `cat $ID_FILE`
}
function run_rabbitmq (){
# do no daemonize and use stdout
NAME="$(random_name)"
echo $NAME > $ID_FILE
/usr/bin/docker run -i --name "$NAME" -p $AMQP_IP:$AMQP_PORT:5672 -p $AMQP_ADMIN_PORT:15672 -e RABBITMQ_PASS="$AMQP_PASSWORD" myimage-rabbitmq &
PID=$!
wait $PID
}
Finally, myconfig.integration.conf looks like:
[program:rabbit-integration]
command=/path/sh_s/rabbit-integration.sh
startsecs=20
priority=90
autorestart=unexpected
exitcodes=0,1
stopsignal=TERM
If I want to reuse the same container, the startup function looks like:
function _run_my_container () {
    NAME="my_container"
    # Try to start the existing container first.
    /usr/bin/docker start -i $NAME &
    PID=$!
    wait $PID
    rc=$?
    if [[ $rc != 0 ]]; then
        # Starting failed (for example, the container does not exist yet), so create it.
        _create_my_container
    fi
}
where
function _create_my_container (){
    /usr/bin/docker run -p{} -v{} --name "$NAME" myimage &
    PID=$!
    wait $PID
}
Supervisor requires that the processes it manages do not daemonize, as per its documentation:
Programs meant to be run under supervisor should not daemonize
themselves. Instead, they should run in the foreground. They should
not detach from the terminal from which they are started.
This is largely incompatible with Docker, where the containers are subprocesses of the Docker process itself (and hence are not subprocesses of Supervisor).
To be able to use Docker with Supervisor, you could write an equivalent of the pidproxy program that works with Docker.
But the two tools aren't really architected to work together, so you should consider changing one or the other:
Consider replacing Supervisor with Docker Compose (which is designed to work with Docker)
Consider replacing Docker with Rocket (which doesn't have a "master" process)
You need to make sure you use stopsignal=INT in your supervisor config, then exec docker run normally.
[program:foo]
stopsignal=INT
command=docker run --rm whatever
At least this seems to work for me with docker version 1.9.1.
If you run docker from inside a shell script, it is very important that you have exec in front of the docker run command, so that docker run replaces the shell process and thus receives the SIGINT directly from supervisord.
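To illustrate why exec matters here, the sketch below uses two throwaway helpers (show_exec_pid and show_fork_pid, both hypothetical) instead of docker, so it runs anywhere. Each prints the PID of a wrapper shell and then the PID of the command the wrapper launches. With exec the two PIDs are the same, which is the situation you want when supervisord signals the PID it started.

```shell
# With exec: the wrapper shell is replaced by the command, so the PID that the
# supervisor started is the PID of the command itself. Prints the same PID twice.
show_exec_pid() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Without exec: the command runs as a child with its own PID, and a signal
# sent to the wrapper's PID does not reach it directly. Prints two different PIDs.
show_fork_pid() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; true'
}
```

In a real wrapper script the last line would be the exec docker run ... command described above, in place of the inner sh -c.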
You can have Docker just not detach and then things work fine. We manage our Docker containers in this way through supervisor. Docker compose is great, but if you're already using Supervisor to manage non-docker things as well, it's nice to keep using it to have all your management in one place. We'll wrap our docker run in a bash script like the following and have supervisor track that, and everything works fine:
#!/bin/bash
# Stop the container if one with this name is still running.
TO_STOP=$(docker ps | grep "$SERVICE_NAME" | awk '{ print $1 }')
if [ -n "$TO_STOP" ]; then
    docker stop "$SERVICE_NAME"
fi
# Remove any leftover (stopped) container with this name.
TO_REMOVE=$(docker ps -a | grep "$SERVICE_NAME" | awk '{ print $1 }')
if [ -n "$TO_REMOVE" ]; then
    docker rm "$SERVICE_NAME"
fi

docker run -a stdout -a stderr --name="$SERVICE_NAME" \
    --rm $DOCKER_IMAGE:$DOCKER_TAG
I found that executing docker run via supervisor actually works just fine, with a few precautions. The main thing one needs to avoid is allowing supervisord to send a SIGKILL to the docker run process, which will kill off that process but not the container itself.
For the most part, this can be handled by following the instructions in Why Your Dockerized Application Isn’t Receiving Signals. In short, one needs to:
Use the CMD ["/path/to/myapp"] form (same for ENTRYPOINT) instead of the shell form (CMD /path/to/myapp).
Pass --init to docker run.
If using an ENTRYPOINT, ensure its last line calls exec, so as to avoid spawning a new process.
If the above still isn't working, add a STOPSIGNAL to your Dockerfile.
Additionally, you'll want to make sure that your stopwaitsecs setting in supervisor is greater than the time your process might take to shut down gracefully when it receives a SIGTERM (e.g., graceful_timeout if using gunicorn).
Here's a sample config to run a gunicorn container:
[program:gunicorn]
command=/usr/bin/docker run --init --rm -i -p 8000:8000 gunicorn
redirect_stderr=true
stopwaitsecs=31

Bash: running background job and getting pid

$1 &
echo $!
Is there a different way to launch a command in the background and return the pid immediately?
So when I launch bash run.sh "python worker.py" it will give me the pid of the launched job.
I am using paramiko, a python library which doesn't work with python worker.py &, so I want to create a bash script which will do this for me on the remote server.
Since you're using bash, you can just get the list of background processes from jobs, and instruct it to return the PID via the -l flag. To quote man bash:
jobs [-lnprs] [ jobspec ... ]
jobs -x command [ args ... ]
The first form lists the active jobs. The options have the
following meanings:
-l List process IDs in addition to the normal information.
So in your case, something like
jobs -l | grep 'worker.py' | awk '{print $2}' would probably give you what you want.
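For the run.sh itself, here is a minimal sketch that prints the PID right away. The helper name run_bg is hypothetical. It also assumes that the reason a plain & fails over the remote session is that the job keeps the session's input and output streams open, so it detaches all three. That is a guess about the paramiko setup, not something confirmed in the question.

```shell
# run_bg COMMAND [ARGS...]   (hypothetical helper)
# Starts COMMAND in the background, detached from the caller's stdin, stdout
# and stderr, and prints the PID of the background job.
run_bg() {
    nohup "$@" </dev/null >/dev/null 2>&1 &
    echo $!
}
```

Note that this takes the command as separate words (run_bg python worker.py) rather than as one quoted string like the $1 & version in the question. In a run.sh, the last line could be run_bg "$@".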
