Airflow with systemd: `airflow.pid` vs `airflow-monitor.pid` - python

My systemd unit file is working (below).
However the airflow-monitor.pid file is transiently becoming read-only, which sometimes prevents airflow from starting. Our workaround is to delete airflow-monitor.pid if this happens. This is not the same file as airflow.pid.
It looks like airflow.pid belongs to the gunicorn master, while airflow-monitor.pid belongs to the Python process running airflow webserver.
systemd unit file:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
# by default we just set $AIRFLOW_HOME to its default dir: $HOME/airflow , so lets skip this for now
EnvironmentFile=/home/airflow/airflow/airflow.systemd.environment
#WorkingDirectory=/home/airflow/airflow-venv
#Environment=PATH="/home/airflow/airflow-venv/bin:$PATH"
PIDFile=/home/airflow/airflow/airflow.pid
User=airflow
Group=airflow
Type=simple
# this was originally the file webserver.pid but did not run
#ExecStart=/bin/bash -c 'source /home/airflow/airflow-venv/bin/activate ; /home/airflow/airflow-venv/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon'
#ExecStart=/home/airflow/airflow-venv/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon
ExecStart=/usr/local/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Here is the output of the pid files:
airflow@airflow:~$ cat airflow/airflow.pid
8397
airflow@airflow:~$ cat airflow/airflow-monitor.pid
8377
airflow@airflow:~$ ps faux | grep 8377
airflow 26004 0.0 0.0 14224 976 pts/0 S+ 18:05 0:00 | \_ grep --color=auto 8377
airflow 8377 0.4 1.0 399676 83804 ? Ss Aug23 6:14 /usr/bin/python /usr/local/bin/airflow webserver -p 8080 --pid /home/airflow/airflow/airflow.pid --daemon
airflow@airflow:~$ ps faux | grep 8397
airflow 26028 0.0 0.0 14224 940 pts/0 R+ 18:05 0:00 | \_ grep --color=auto 8397
airflow 8397 0.0 0.6 186652 55496 ? S Aug23 0:32 gunicorn: master [airflow-webserver]

Not quite sure why airflow-monitor.pid is becoming read-only, but you can avoid this pid file entirely by not running the webserver with --daemon. I don't think it's necessary with systemd.
Relevant block of code: https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L754-L765
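If you keep --daemon for now, the "delete the pid file" workaround from the question could be scripted so that the file is only removed when no live process owns it. This is a minimal sketch, not part of Airflow; the helper names are made up for illustration, and it assumes a POSIX system where os.kill(pid, 0) can be used to probe a process.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_pidfile(path) -> bool:
    """Delete the pid file if no live process owns it. Return True if deleted."""
    pidfile = Path(path)
    try:
        pid = int(pidfile.read_text().strip())
    except FileNotFoundError:
        return False  # nothing to clean up
    except ValueError:
        pid = None  # unreadable content, treat the file as stale
    if pid is not None and pid > 0 and pid_is_running(pid):
        return False  # a live process still owns the file, leave it alone
    pidfile.unlink()
    return True
```

Something like remove_stale_pidfile("/home/airflow/airflow/airflow-monitor.pid") could then run before the webserver starts. Deleting a file depends on the directory's permissions rather than the file's own mode, which is probably why removing a read-only pid file works at all.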

Related

Using nice command on gunicorn service workers

I have a service:
[Unit]
Description=tweetsift
After=network.target
[Service]
User=root
Group=root
WorkingDirectory=/var/www/html
ExecStart=sudo /usr/bin/nice -n -20 sudo -u root sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
Restart=on-failure
[Install]
WantedBy=multi-user.target
When I run sudo systemctl status tweet I see that /usr/bin/nice is applied to the main PID. However, it is not applied to the workers.
tweetsift.service - tweet
Loaded: loaded (/etc/systemd/system/tweet.service; enabled; preset: enabled)
Active: active (running) since Mon 2023-01-09 04:36:08 UTC; 5min ago
Main PID: 3124 (sudo)
Tasks: 12 (limit: 4661)
Memory: 702.8M
CPU: 7.580s
CGroup: /system.slice/tweet.service
├─3124 sudo /usr/bin/nice -n -20 sudo -u root sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3125 sudo -u root sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3126 sudo gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3127 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3128 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3129 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
├─3130 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
└─3131 /usr/bin/python3 /usr/local/bin/gunicorn -w 4 -b 0.0.0.0:5000 endpoint:app
I am running a machine learning script that sucks down the CPU. I tried using nice python3 tweet.py and it works and doesn't kill the process.
However, when I call the API endpoint I built, the service starts a worker, which then gets killed for OOM (out of memory).
I am using Ubuntu 20.04 & Apache2
Any ideas? I was able to get nice running on the main PID by updating the /etc/sudoers/ and adding a line to allow sudo to use it.
But I still can't get the script to run as a service using nice for the workers PIDs too when they start up upon an API call to the flask app I've got running.
I am using gunicorn (version 20.1.0)
Thanks!
I've tried everything at this point. I want nice to be applied to gunicorn workers when my flask app has an api call sent to it without getting killed for OOM.
I'm using a 4GB Intel 80GB Disk premium intel droplet on DigitalOcean.
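One way to set the niceness of each worker directly, without wrapping the command in nice and sudo, might be gunicorn's post_fork server hook in a config file. This is a sketch under assumptions: the file name gunicorn.conf.py and the increment of 10 are illustrative, and renice_current_process is a made-up helper. Note that niceness only affects CPU scheduling; it does not reduce memory use, so on its own it would not prevent an OOM kill.

```python
# gunicorn.conf.py (hypothetical name; load with: gunicorn -c gunicorn.conf.py endpoint:app)
import os

# Assumed value. Positive increments lower the priority and need no privileges;
# negative increments (higher priority) require root or CAP_SYS_NICE.
NICE_INCREMENT = 10


def renice_current_process(increment: int) -> int:
    """Add `increment` to this process's niceness and return the new value."""
    return os.nice(increment)


def post_fork(server, worker):
    """gunicorn server hook: called inside each worker right after it is forked."""
    new_nice = renice_current_process(NICE_INCREMENT)
    server.log.info("Worker %s now has niceness %s", worker.pid, new_nice)
```

Because the hook runs in the worker process itself, the master keeps whatever priority the service was started with.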

After I exit from the remote server(CentOS-7), the `python3 manage.py runserver` will not work

I use python3 manage.py runserver to run the APIs of my Django REST Framework project on my remote server (CentOS 7).
But after I log out of the remote server, the APIs stop responding.
If I log in again, the APIs still do not work, although the runserver process is still listed:
[root@www ~]# ps aux | grep runserver
lll 26439 0.0 0.5 275884 41704 ? S 07:29 0:00 python3 manage.py runserver
lll 26443 3.1 1.0 380044 83264 ? S 07:29 10:22 /home/lll/repo/Qit/venv_dist/bin/python3 manage.py runserver
root 32575 0.0 0.0 112680 972 pts/1 S+ 12:56 0:00 grep --color=auto runserver
My question: while I am logged in to the remote server and start runserver, the APIs work, but once I log out they can no longer be reached.
My wsgi.py:
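import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "Qn.settings")
import django
print("django.setup()")
django.setup()
from socketio import Middleware
from qn_admin_website_chat.views import sio
django_app = get_wsgi_application()
# Wrap the Django WSGI app with the Socket.IO middleware
application = Middleware(sio, django_app)
import eventlet
import eventlet.wsgi
# Starts a blocking eventlet server on port 8000 as soon as this module is imported
eventlet.wsgi.server(eventlet.listen(('', 8000)), application)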
Some friends said eventlet.wsgi.server can be used as a deployment server, so I deployed my project this way.
It is better to run uWSGI as a systemd service:
# /etc/systemd/system/app.service
[Service]
Type=simple
ExecStart=/blabla/virtualenv/bin/uwsgi --ini /path/to/config.ini
[Install]
WantedBy=multi-user.target
% sudo systemctl enable --now app.service

How can I configure celery to run on startup of nginx?

I have celery running locally by just running celery -A proj -l info (although I don't even know if I should be using this command in production), and I want to get celery running on my production web server every time nginx starts. The init system is systemd
Create a service file, for example celery.service:
[Unit]
Description=celery service
After=network.target
[Service]
PIDFile=/run/celery/pid
User=celery
Group=celery
# RuntimeDirectory takes a name relative to /run; this creates /run/celery for the PID file
RuntimeDirectory=celery
WorkingDirectory=/path/to/project
ExecStart=celery -A proj -l info
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-abort
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Move the file to /etc/systemd/system/ and run sudo systemctl enable celery.service; the next time you restart the server, systemd will start celery on boot.

Gunicorn graceful stopping with docker-compose

I find that when I use docker-compose to shut down my gunicorn (19.7.1) python application, it always takes 10s to shut down. This is the default maximum time docker-compose waits before forcefully killing the process (adjusted with the -t / --timeout parameter). I assume this means that gunicorn isn't being gracefully shut down. I can reproduce this with:
docker-compose.yml:
version: "3"
services:
  test:
    build: ./
    ports:
      - 8000:8000
Dockerfile:
FROM python
RUN pip install gunicorn
COPY test.py .
EXPOSE 8000
CMD gunicorn -b :8000 test:app
test.py
def app(_, start_response):
    """Simplest possible application object"""
    data = b'Hello, World!\n'
    status = '200 OK'
    response_headers = [
        ('Content-type', 'text/plain'),
        ('Content-Length', str(len(data)))
    ]
    start_response(status, response_headers)
    return iter([data])
Then running the app with:
docker-compose up -d
and gracefully stopping it with:
docker-compose stop
version:
docker-compose version 1.12.0, build b31ff33
I would prefer to allow gunicorn to stop gracefully. I think it should be able to based on the signal handlers in base.py.
All of the above is also true for updating images using docker-compose up -d twice, the second time with a new image to replace the old one.
Am I misunderstanding / misusing something? What signal does docker-compose send to stop processes? Shouldn't gunicorn be using it? Should I be able to restart my application faster than 10s?
TL;DR
Add exec after CMD in your Dockerfile: CMD exec gunicorn -b :8000 test:app.
Details
I had the same issue, when I ran docker exec my_running_gunicorn ps aux, I saw something like:
gunicorn 1 0.0 0.0 4336 732 ? Ss 10:38 0:00 /bin/sh -c gunicorn -c gunicorn.conf.py vision:app
gunicorn 5 0.1 1.1 91600 22636 ? S 10:38 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
gunicorn 8 0.2 2.5 186328 52540 ? S 10:38 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
PID 1 is the /bin/sh wrapper, not the gunicorn master, so gunicorn never received the SIGTERM signal.
With the exec in the Dockerfile, I now have
gunicorn 1 32.0 1.1 91472 22624 ? Ss 10:43 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
gunicorn 7 45.0 1.9 131664 39116 ? R 10:43 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -c gunicorn.conf.py vision:app
and it works.

How to debug a Python script that segfaults when run as a systemd service?

This is driving me nuts.
A Flask app works fine if I run uWSGI from the CLI myself:
uwsgi --emperor /etc/uwsgi/emperor.ini
but when trying to start it as a service with systemd, there is a segfault and the resulting coredump says almost nothing:
sudo systemctl start emperor.uwsgi
coredump:
[New LWP 7639]
Core was generated by `/usr/bin/uwsgi --ini website.ini'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000123c in ?? ()
So, I have no idea how to get more detailed information. This is not a script I'm running with python app.py, but one being served by uWSGI.
I'm clueless and would appreciate any advice.
Thanks.
Edit I - systemd init script:
[Unit]
Description=uWSGI Emperor
After=syslog.target
[Service]
ExecStart=/usr/bin/uwsgi --ini /etc/uwsgi/emperor.ini
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -INT $MAINPID
Restart=always
Type=notify
StandardError=syslog
NotifyAccess=all
KillSignal=SIGQUIT
[Install]
WantedBy=multi-user.target
If I run /usr/bin/uwsgi --ini /etc/uwsgi/emperor.ini manually, it works fine.
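One standard-library option that may give more detail is faulthandler, which writes the Python traceback of every thread when the interpreter receives SIGSEGV. A minimal sketch, assuming the crash happens while Python code is running inside the uWSGI worker; the function name and log path are hypothetical, and the call would go near the top of the Flask app module.

```python
import faulthandler

# Module-level reference so the log file stays open for the life of the process.
_crash_log = None


def enable_crash_traceback(path="/tmp/flask-app-crash.log"):
    """Write Python tracebacks to `path` on SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL."""
    global _crash_log
    _crash_log = open(path, "a")
    faulthandler.enable(file=_crash_log, all_threads=True)
    return _crash_log
```

Writing to a dedicated file avoids depending on where stderr ends up under systemd. Setting the environment variable PYTHONFAULTHANDLER=1 for the service is another way to turn the handler on, in which case the traceback goes to stderr.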
