Gunicorn graceful stopping with docker-compose - python

I find that when I use docker-compose to shut down my gunicorn (19.7.1) Python application, it always takes 10 seconds. That is the default maximum time docker-compose waits before forcefully killing the process (adjustable with the -t / --timeout parameter), so I assume gunicorn isn't being shut down gracefully. I can reproduce this with:
docker-compose.yml:
version: "3"
services:
  test:
    build: ./
    ports:
      - 8000:8000
Dockerfile:
FROM python
RUN pip install gunicorn
COPY test.py .
EXPOSE 8000
CMD gunicorn -b :8000 test:app
test.py:
def app(_, start_response):
    """Simplest possible application object"""
    data = b'Hello, World!\n'
    status = '200 OK'
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(data)))
    ]
    start_response(status, response_headers)
    return iter([data])
Then running the app with:
docker-compose up -d
and gracefully stopping it with:
docker-compose stop
version:
docker-compose version 1.12.0, build b31ff33
I would prefer to let gunicorn stop gracefully, and I think it should be able to, given the signal handlers in base.py.
All of the above is also true for updating images using docker-compose up -d twice, the second time with a new image to replace the old one.
Am I misunderstanding / misusing something? What signal does docker-compose send to stop processes? Shouldn't gunicorn be using it? Should I be able to restart my application faster than 10s?

TL;DR
Add exec after CMD in your Dockerfile: CMD exec gunicorn -b :8000 test:app.
Details
I had the same issue, when I ran docker exec my_running_gunicorn ps aux, I saw something like:
gunicorn 1 0.0 0.0 4336 732 ? Ss 10:38 0:00 /bin/sh -c gunicorn -c gunicorn.conf.py vision:app
gunicorn 5 0.1 1.1 91600 22636 ? S 10:38 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
gunicorn 8 0.2 2.5 186328 52540 ? S 10:38 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
PID 1 is the /bin/sh wrapper, not the gunicorn master, so the master never received the SIGTERM.
With the exec in the Dockerfile, I now have
gunicorn 1 32.0 1.1 91472 22624 ? Ss 10:43 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
gunicorn 7 45.0 1.9 131664 39116 ? R 10:43 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
and it works.
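As an aside (not from the original answer): Docker's JSON exec form of CMD achieves the same thing by skipping the /bin/sh wrapper entirely, so gunicorn runs directly as PID 1 and receives the SIGTERM that docker-compose stop sends. A sketch of the question's Dockerfile in that form:
# Exec-form CMD: no shell wrapper, gunicorn becomes PID 1.
FROM python
RUN pip install gunicorn
COPY test.py .
EXPOSE 8000
CMD ["gunicorn", "-b", ":8000", "test:app"]
Note that the exec form performs no shell variable expansion, so stick with CMD exec ... if the command needs environment variables.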

Related

Using nice command on gunicorn service workers

I have a service:
[Unit]
Description=tweetsift
After=network.target
[Service]
User=root
Group=root
WorkingDirectory=/var/www/html
ExecStart=sudo /usr/bin/nice -n -20 sudo -u root sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
Restart=on-failure
[Install]
WantedBy=multi-user.target
When I run sudo systemctl status tweet, I see that /usr/bin/nice is applied to the main PID. However, it is not applied to the workers.
tweetsift.service - tweet
Loaded: loaded (/etc/systemd/system/tweet.service; enabled; preset: enabled)
Active: active (running) since Mon 2023-01-09 04:36:08 UTC; 5min ago
Main PID: 3124 (sudo)
Tasks: 12 (limit: 4661)
Memory: 702.8M
CPU: 7.580s
CGroup: /system.slice/tweet.service
├─3124 sudo /usr/bin/nice -n -20 sudo -u root sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3125 sudo -u root sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3126 sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3127 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3128 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3129 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3130 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
└─3131 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
I am running a machine learning script that is very CPU-hungry. Running it directly as nice python3 tweet.py works, and the process is not killed.
However, when I call the API endpoint I built, the service spins up a worker which then gets killed for OOM (Out of Memory).
I am using Ubuntu 20.04 & Apache2
Any ideas? I was able to get nice applied to the main PID by adding a line to /etc/sudoers allowing sudo to use it.
But I still can't get nice applied to the worker PIDs that gunicorn spawns when an API call hits the Flask app.
I am using gunicorn (version 20.1.0)
Thanks!
I've tried everything at this point. I want nice applied to the gunicorn workers when my Flask app receives an API call, without the process being killed for OOM.
I'm using a premium Intel droplet on DigitalOcean with 4 GB of RAM and an 80 GB disk.
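No answer was recorded in this thread, but one approach worth sketching (my suggestion, not from the thread): systemd has a native Nice= directive, and since forked children inherit their parent's nice value, it covers the gunicorn workers as well, with no sudo/nice chain. A minimal sketch, assuming gunicorn lives at /usr/local/bin/gunicorn:
[Unit]
Description=tweetsift
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/var/www/html
# Nice= applies to ExecStart; gunicorn workers inherit it on fork.
Nice=-20
ExecStart=/usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
Bear in mind that niceness only affects CPU scheduling; it will not prevent OOM kills, which are a memory problem (look at worker count, swap, or the droplet size for that).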

"[CRITICAL] WORKER TIMEOUT" in logs when running "Hello Cloud Run with Python" from GCP Setup Docs

Following the tutorial here, I have the following two files:
app.py
from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['GET'])
def hello():
    """Return a friendly HTTP greeting."""
    who = request.args.get('who', 'World')
    return f'Hello {who}!\n'

if __name__ == '__main__':
    # Used when running locally only. When deploying to Cloud Run,
    # a webserver process such as Gunicorn will serve the app.
    app.run(host='localhost', port=8080, debug=True)
Dockerfile
# Use an official lightweight Python image.
# https://hub.docker.com/_/python
FROM python:3.7-slim
# Install production dependencies.
RUN pip install Flask gunicorn
# Copy local code to the container image.
WORKDIR /app
COPY . .
# Service must listen to $PORT environment variable.
# This default value facilitates local development.
ENV PORT 8080
# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind 0.0.0.0:$PORT --workers 1 --threads 8 app:app
I then build and run them using Cloud Build and Cloud Run:
PROJECT_ID=$(gcloud config get-value project)
DOCKER_IMG="gcr.io/$PROJECT_ID/helloworld-python"
gcloud builds submit --tag $DOCKER_IMG
gcloud run deploy --image $DOCKER_IMG --platform managed
The code appears to run fine, and I am able to access the app on the given URL. However the logs seem to indicate a critical error, and the workers keep restarting. Here is the log file from Cloud Run after starting up the app and making a few requests in my web browser:
2020-03-05T03:37:39.392Z Cloud Run CreateService helloworld-python ...
2020-03-05T03:38:03.285477Z [2020-03-05 03:38:03 +0000] [1] [INFO] Starting gunicorn 20.0.4
2020-03-05T03:38:03.287294Z [2020-03-05 03:38:03 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2020-03-05T03:38:03.287362Z [2020-03-05 03:38:03 +0000] [1] [INFO] Using worker: threads
2020-03-05T03:38:03.318392Z [2020-03-05 03:38:03 +0000] [4] [INFO] Booting worker with pid: 4
2020-03-05T03:38:15.057898Z [2020-03-05 03:38:15 +0000] [1] [INFO] Starting gunicorn 20.0.4
2020-03-05T03:38:15.059571Z [2020-03-05 03:38:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2020-03-05T03:38:15.059609Z [2020-03-05 03:38:15 +0000] [1] [INFO] Using worker: threads
2020-03-05T03:38:15.099443Z [2020-03-05 03:38:15 +0000] [4] [INFO] Booting worker with pid: 4
2020-03-05T03:38:16.320286Z GET 200 297 B 2.9 s Safari 13 https://helloworld-python-xhd7w5igiq-ue.a.run.app/
2020-03-05T03:38:16.489044Z GET 404 508 B 6 ms Safari 13 https://helloworld-python-xhd7w5igiq-ue.a.run.app/favicon.ico
2020-03-05T03:38:21.575528Z GET 200 288 B 6 ms Safari 13 https://helloworld-python-xhd7w5igiq-ue.a.run.app/
2020-03-05T03:38:27.000761Z GET 200 285 B 5 ms Safari 13 https://helloworld-python-xhd7w5igiq-ue.a.run.app/?who=me
2020-03-05T03:38:27.347258Z GET 404 508 B 13 ms Safari 13 https://helloworld-python-xhd7w5igiq-ue.a.run.app/favicon.ico
2020-03-05T03:38:34.802266Z [2020-03-05 03:38:34 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:4)
2020-03-05T03:38:35.302340Z [2020-03-05 03:38:35 +0000] [4] [INFO] Worker exiting (pid: 4)
2020-03-05T03:38:48.803505Z [2020-03-05 03:38:48 +0000] [5] [INFO] Booting worker with pid: 5
2020-03-05T03:39:10.202062Z [2020-03-05 03:39:09 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:5)
2020-03-05T03:39:10.702339Z [2020-03-05 03:39:10 +0000] [5] [INFO] Worker exiting (pid: 5)
2020-03-05T03:39:18.801194Z [2020-03-05 03:39:18 +0000] [6] [INFO] Booting worker with pid: 6
Note the worker timeouts and reboots at the end of the logs. The fact that it's a CRITICAL error makes me think it shouldn't be happening. Is this expected behavior? Is this a side effect of the Cloud Run machinery starting and stopping my service as requests come and go?
Cloud Run has scaled down one of your instances, and the gunicorn arbiter is considering it stalled.
You should add --timeout 0 to your gunicorn invocation to disable the worker timeout entirely; it's unnecessary for Cloud Run.
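Applied to the Dockerfile from the question, that suggestion would look like:
CMD exec gunicorn --bind 0.0.0.0:$PORT --workers 1 --threads 8 --timeout 0 app:app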
I was facing the error [11229] [CRITICAL] WORKER TIMEOUT (pid:11232) on Heroku.
I changed my Procfile to this:
web: gunicorn --workers=3 app:app --timeout 200 --log-file -
and it fixed my problem by increasing the --timeout.
Here's a working example of a Flask app on Cloud Run. My guess is that the last line of your Dockerfile or the last part of your Python file is causing this behavior.
main.py
# main.py
# gcloud beta run services replace service.yaml
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    msg = "Hello World"
    return msg
Dockerfile (the apt-get part is not needed)
# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.7
# Install manually all the missing libraries
RUN apt-get update
RUN apt-get install -y gconf-service libasound2 libatk1.0-0 libcairo2 libcups2 libfontconfig1 libgdk-pixbuf2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libxss1 fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils
# Install Python dependencies.
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . .
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app
then build using:
gcloud builds submit --tag gcr.io/[PROJECT]/[MY_SERVICE]
and deploy:
gcloud beta run deploy [MY_SERVICE] --image gcr.io/[PROJECT]/[MY_SERVICE] --region europe-west1 --platform managed
UPDATE
I've checked the logs you provided again.
Getting this kind of warning/error right after a new deployment is normal: your old instances are no longer handling any requests and sit idle until they are completely shut down.
Gunicorn also has a default timeout of 30 seconds, which matches the interval between each "Booting worker" entry and the error that follows it.
For those arriving here with the same problem but with Django (it probably works the same way) under gunicorn, supervisor, and nginx: check the configuration in your gunicorn_start file, or wherever you keep your gunicorn parameters. Mine looks like this; the timeout is added on the last line:
NAME="myapp" # Name of the application
DJANGODIR=/webapps/myapp # Django project directory
SOCKFILE=/webapps/myapp/run/gunicorn.sock # we will communicte using this unix socket
USER=root # the user to run as
GROUP=root # the group to run as
NUM_WORKERS=3 # how many worker processes should Gunicorn spawn
DJANGO_SETTINGS_MODULE=myapp.settings # which settings file should Django use
DJANGO_WSGI_MODULE=myapp.wsgi # WSGI module name
echo "Starting $NAME as `whoami`"
# Activate the virtual environment
cd $DJANGODIR
source ../bin/activate
export DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE
export PYTHONPATH=$DJANGODIR:$PYTHONPATH
# Create the run directory if it doesn't exist
RUNDIR=$(dirname $SOCKFILE)
test -d $RUNDIR || mkdir -p $RUNDIR
# Start your Django Unicorn
# Programs meant to be run under supervisor should not daemonize themselves (do not use --daemon)
exec ../bin/gunicorn ${DJANGO_WSGI_MODULE}:application \
  --name $NAME \
  --workers $NUM_WORKERS \
  --user=$USER --group=$GROUP \
  --bind=unix:$SOCKFILE \
  --log-level=debug \
  --log-file=- \
  --timeout 120  # this

How many parallel requests can Python Flask process using Gunicorn?

Can you please check if I am missing anything in the config?
I am running a Python Flask app using Gunicorn.
Our current flow of events is:
JMeter triggers 8-10 jobs in parallel and sends them to the AWS load balancer.
Each request then goes through an Nginx proxy and is forwarded to the Gunicorn/Flask app running on an EC2 instance.
I am using the following configurations to enable multi-processing on Gunicorn/Flask, but they have no effect: the jobs are executed serially, not in parallel.
Please help me understand what I need to change in order to have all of these jobs execute in parallel.
Here is the list of commands I have tried, none of which has worked.
These are the sync commands I tried:
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --threads 2
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --threads 2 --max-requests-jitter 4
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --max-requests 4
These are the async commands I tried:
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --worker-class tornado
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --worker-class gevent
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --worker-class gthread
gunicorn app1:application -b localhost:8000 --timeout 90000 -w 17 --worker-class eventlet
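No answer survives in this thread, but as a point of reference (my sketch, not from the thread): with sync workers, gunicorn handles one request per worker, so -w 17 allows at most 17 concurrent requests; with --threads 2 and the gthread worker that becomes 17 x 2 = 34. If requests still run serially, the bottleneck is usually upstream (Nginx, the load balancer) or inside the app itself. A minimal config following the (2 x cores) + 1 sizing rule from the gunicorn docs:
# gunicorn.conf.py -- a sizing sketch; the values are illustrative, not tuned.
import multiprocessing

bind = "localhost:8000"
# The gunicorn docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "gthread"  # threaded worker: concurrency = workers x threads
threads = 2
timeout = 90
Run it with gunicorn -c gunicorn.conf.py app1:application.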

uWSGI runs Django project from command line but not from Emperor uwsgi.service file

I am running a virtualenv with Python 3.6 on Ubuntu 16.04 for my Django project, using uWSGI and NGINX.
I have uWSGI installed globally and also in the virtualenv.
I can run my project from the command line using uWSGI within the env with
/home/user/Env/myproject/bin/uwsgi --http :8080 --home /home/user/Env/myproject --chdir /home/user/myproject/src/myproject -w myproject.wsgi
and go to my domain and it loads fine.
However, I am actually running uWSGI in "Emperor mode", and when I set up the service file (along with NGINX), the domain displays an internal server error.
The uWSGI logs trace to --- no python application found ---
I was having this problem when running
uwsgi --http :8080 --home /home/user/Env/myproject --chdir /home/user/myproject/src/myproject -w myproject.wsgi
because it was using the globally installed uwsgi instead of the virtualenv one.
I changed my ExecStart to the virtualenv uwsgi path, but no luck.
I can't figure out what I'm doing wrong. A path error? A syntax error?
My /etc/systemd/system/uwsgi.service file:
[Unit]
Description=uWSGI Emperor service
[Service]
ExecStartPre=/bin/bash -c 'mkdir -p /run/uwsgi; chown user:www-data /run/uwsgi'
ExecStart=/home/user/Env/myproject/bin/uwsgi --emperor /etc/uwsgi/sites
Restart=always
KillSignal=SIGQUIT
Type=notify
NotifyAccess=all
[Install]
WantedBy=multi-user.target
Okay, a bit silly, but it seems I just had to run sudo systemctl stop uwsgi and then sudo systemctl start uwsgi, and it works now.
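A likely explanation (my note, not from the thread): systemd caches unit files, so a changed ExecStart is not applied until the daemon is reloaded. The usual sequence after editing a unit file is:
sudo systemctl daemon-reload
sudo systemctl restart uwsgi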

Airflow with systemd: `airflow.pid` vs `airflow-monitor.pid`

My systemd unit file is working (below).
However, the airflow-monitor.pid file transiently becomes read-only, which sometimes prevents airflow from starting. Our workaround is to delete airflow-monitor.pid when this happens. This is not the same file as airflow.pid.
It looks like airflow.pid belongs to gunicorn, while airflow-monitor.pid belongs to a Python process running airflow webserver.
systemd unit file:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
# by default we just set $AIRFLOW_HOME to its default dir: $HOME/airflow , so lets skip this for now
EnvironmentFile=/home/airflow/airflow/airflow.systemd.environment
#WorkingDirectory=/home/airflow/airflow-venv
#Environment=PATH="/home/airflow/airflow-venv/bin:$PATH"
PIDFile=/home/airflow/airflow/airflow.pid
User=airflow
Group=airflow
Type=simple
# this was originally the file webserver.pid but did not run
#ExecStart=/bin/bash -c 'source /home/airflow/airflow-venv/bin/activate ; /home/airflow/airflow-venv/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon'
#ExecStart=/home/airflow/airflow-venv/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon
ExecStart=/usr/local/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Here is the output of the pid files:
airflow@airflow:~$ cat airflow/airflow.pid
8397
airflow@airflow:~$ cat airflow/airflow-monitor.pid
8377
airflow@airflow:~$ ps faux | grep 8377
airflow 26004 0.0 0.0 14224 976 pts/0 S+ 18:05 0:00 | \_ grep --color=auto 8377
airflow 8377 0.4 1.0 399676 83804 ? Ss Aug23 6:14 /usr/bin/python /usr/local/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon
airflow@airflow:~$ ps faux | grep 8397
airflow 26028 0.0 0.0 14224 940 pts/0 R+ 18:05 0:00 | \_ grep --color=auto 8397
airflow 8397 0.0 0.6 186652 55496 ? S Aug23 0:32 gunicorn: master [airflow-webserver]
Not quite sure why airflow-monitor.pid is becoming read-only, but you can avoid this pid file entirely by not running the webserver with --daemon; I don't think daemonizing is necessary under systemd.
Relevant block of code: https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L754-L765
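Applied to the unit file above, that suggestion would look something like this (a sketch, not from the thread; with Type=simple, systemd tracks the foreground process itself, so the PIDFile= line can likely be dropped as well):
ExecStart=/usr/local/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid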
