I have a docker container which listens on RabbitMQ and processes the messages it receives. I have a code pipeline which kicks off rebuilding the image and updating the tasks whenever there is a code commit.
My problem is that the container gets killed abruptly while it is still processing a message. Is there any way to keep the container from being killed until the current message is finished, and only then let it stop so that a new container is created automatically? I am fine with the current container processing the message with the old code. My container runs Python code inside.
ECS by default sends a SIGTERM:
StopTask
Stops a running task.
When StopTask is called on a task, the equivalent of docker stop is
issued to the containers running in the task. This results in a
SIGTERM and a default 30-second timeout, after which SIGKILL is sent
and the containers are forcibly stopped. If the container handles the
SIGTERM gracefully and exits within 30 seconds from receiving it, no
SIGKILL is sent.
Note
The default 30-second timeout can be configured on the Amazon ECS
container agent with the ECS_CONTAINER_STOP_TIMEOUT variable. For more
information, see Amazon ECS Container Agent Configuration in the
Amazon Elastic Container Service Developer Guide.
Knowing this, you can add a simple check in your app to catch the SIGTERM, and react appropriately.
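For example, a minimal sketch in Python (the helper names are hypothetical, not from your code): trap SIGTERM, finish the message currently being processed, then exit cleanly within the stop timeout so no SIGKILL follows.
import signal
import sys

shutdown_requested = False

def handle_sigterm(signum, frame):
    # Only record the request; the main loop decides when it is safe to stop.
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutdown_requested:
    message = get_next_message()   # hypothetical: fetch one message from RabbitMQ
    process_message(message)       # hypothetical: your existing processing code

sys.exit(0)  # exit cleanly before the timeout, so SIGKILL is never sent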
Two ways to achieve a graceful stop in Docker:
Using docker stop
When issuing a docker stop command to a container, Docker fires a SIGTERM signal to the process inside the container, and waits for 10 seconds before cleaning up the container.
You can specify a timeout other than the default 10 seconds:
docker stop --time 30 <CONTAINER>
You need to ensure that the process handles the SIGTERM signal properly. Otherwise it will be rudely killed by a SIGKILL signal.
Using docker kill
By default the docker kill command sends a SIGKILL signal to the process. But you can specify another signal to be used:
docker kill --signal SIGQUIT <CONTAINER>
Also ensure that the process handles the specified signal properly. Unlike docker stop, the docker kill command has no timeout behavior.
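If you do tell Docker to send a different signal (for example with docker kill --signal, or a STOPSIGNAL instruction in the Dockerfile), register a handler for that signal as well. A minimal sketch, reusing the hypothetical handler from the earlier example:
import signal

signal.signal(signal.SIGQUIT, handle_sigterm)  # same handler, different stop signal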
Related
I am deploying a python code on a Linux VM from an Azure release pipeline. When I try to start the script from a bash task, the task keeps on running. If I try to invoke the script start command using nohup, the Azure agent kills the process when finishing the task.
The STDIO streams did not close within 10 seconds of the exit event from process '/bin/bash'. This may indicate a child process inherited the STDIO streams and has not yet exited.
Task contents:
python3 app/main.py
Questions:
1. How do I run a script in the background on a Linux VM from an Azure release pipeline?
2. Is there a specific way to set this up as a process on the Linux VM from the pipeline?
I'm running a celery worker as a systemd daemon which serves a lot of long-running agents.
When I restart the worker, all the agents hang and stop running new tasks while waiting for pending ones.
Restarting the agents is not an acceptable solution for me.
I'd also like to avoid using task timeouts.
Is there a way to restart the worker gracefully to not impact already running agents?
All the agents are python scripts.
Using Python 3.6.1. I am emulating launching an airflow webserver from the command line as a subprocess using subprocess.Popen.
After doing some things, I later move to kill (or terminate) it.
import subprocess

webserver_process = subprocess.Popen(["airflow", "webserver"])
webserver_process.kill()
My understanding is that this will send a SIGKILL to the webserver, whose underlying gunicorn should shutdown immediately.
However, when I navigate to http://localhost:8080 I see that the webserver is still running. Similarly when I then run sudo netstat -nlp|grep 8080 (I am using UNIX, and airflow webserver launches on port 8080), I discover:
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN
It's only when I kill the process manually using sudo fuser -k 8080/tcp that it finally dies.
What's going on here?
The python process returned by the airflow webserver command actually calls subprocess.Popen to start gunicorn in a subprocess.
You can test this by checking webserver_process.pid; you'll notice that it's a different PID from the gunicorn master process's PID.
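One common way around this (a sketch, not part of the original answer; it assumes a POSIX system) is to start the webserver in its own process group and signal the whole group, so the gunicorn children are terminated along with the parent:
import os
import signal
import subprocess

webserver_process = subprocess.Popen(
    ["airflow", "webserver"],
    start_new_session=True,  # make the child the leader of a new process group
)

# ... later ...
os.killpg(os.getpgid(webserver_process.pid), signal.SIGTERM)  # signal parent and children
webserver_process.wait()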
I have python script that runs on a windows machine and when the machine receives updates and shuts down for a reboot I would like to listen for that signal and gracefully handle cleanup in my script.
When a process is killed via normal means, my understanding is that the process is sent the SIGTERM signal. This signal can be caught and processed.
However, it seems that when the system performs an automatic reboot due to Windows updates, there may not be a SIGTERM signal sent? Is SIGKILL sent instead? I understand that SIGKILL cannot be caught and the process is pretty much dead; this I can live with, it just changes my strategy on how I will recover.
I understand the process by which I can listen for signals in Python; what I am specifically asking for is clarification on what signal is sent to a process when Windows automatically reboots after a Windows update.
I have a celery setup running fine using rabbitmq as the broker. I also have CELERY_SEND_TASK_ERROR_EMAILS=True in my settings. I receive emails if an exception is thrown while executing tasks, which is fine.
My question is: is there a way, with either celery or rabbitmq, to receive an error notification, either from celery if the broker connection cannot be established, or from rabbitmq itself if the running rabbitmq-server dies?
I think the right tool for this job is a process control system like supervisord, which launches/watches processes and can trigger events when those processes die or restart. More specifically, using the plugin superlance, you can send an email when a process dies.
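As a rough sketch (the program name, project name and email address are placeholders, not from your setup), the supervisord configuration could look like this, with superlance's crashmail event listener mailing you whenever the supervised process exits unexpectedly:
[program:celeryworker]
command=celery -A proj worker --loglevel=info
autostart=true
autorestart=true

[eventlistener:crashmail]
command=crashmail -p celeryworker -m ops@example.com
events=PROCESS_STATE_EXITED

crashmail only reports unexpected exits, so a clean shutdown does not trigger an email; a dying rabbitmq-server could be caught the same way by putting the broker process itself under supervisord.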