Python app scheduler doesn't run the same on systemd - python

I have a Python service that uses a background scheduler to run different tasks. It keeps our database in sync with other APIs using HTTP GETs and POSTs.
We had this running on Heroku without trouble. We recently moved our production box to OVH. When I run it using
python runtasks.py
everything works fine. When I run it in the background using
python runtasks.py &
or using this systemd unit file
[Unit]
...
[Service]
Restart=always
ExecStart=/usr/bin/python /path/to/python/file/runtasks.py
[Install]
WantedBy=multi-user.target
The process starts without errors, but it does not keep working for more than about an hour.
I used journalctl to get the process logs:
Aug 10 13:10:23 ns504338.ip-192-99-1.net bash[23763]:
WARNING:apscheduler.scheduler:Execution of job "Foobar.run
(trigger: interval[0:00:20], next run at: 2015-08-10 13:10:23 EDT)"
skipped: maxim
Aug 10 13:10:43 ns504338.ip-192-99-1.net bash[23763]:
WARNING:apscheduler.scheduler:Execution of job "Foobar.run
(trigger: interval[0:00:20], next run at: 2015-08-10 13:10:43 EDT)"
skipped: maxim
Aug 10 13:11:03 ns504338.ip-192-99-1.net bash[23763]:
WARNING:apscheduler.scheduler:Execution of job "Foobar.run
(trigger: interval[0:00:20], next run at: 2015-08-10 13:11:03 EDT)"
skipped: maxim
...
When the service is running correctly these warnings appear far less frequently, and they are interleaved with other log messages.
My current assumption is a missing environment variable or a relative path. I'm still investigating; any help is much appreciated!
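For context, a skipped-execution warning like this usually means a run came due while the previous one was still counted as in progress, so both the environment and the job limits are worth making explicit. A minimal sketch of what that could look like, assuming APScheduler 3.x; sync_with_api and the numbers are placeholders, not the real runtasks.py:
import logging
import os
import time
from apscheduler.schedulers.background import BackgroundScheduler

logging.basicConfig(level=logging.INFO)

def sync_with_api():
    # placeholder for the real task that does the HTTP GETs and POSTs
    pass

if __name__ == '__main__':
    # Log what the process actually sees, to compare the systemd environment
    # against an interactive shell.
    logging.info("cwd=%s PATH=%s", os.getcwd(), os.environ.get("PATH"))

    scheduler = BackgroundScheduler()
    # Make the limits explicit instead of relying on defaults: max_instances
    # caps overlapping runs, misfire_grace_time controls how late a run may
    # start before it is skipped.
    scheduler.add_job(sync_with_api, 'interval', seconds=20,
                      max_instances=1, misfire_grace_time=10)
    scheduler.start()
    try:
        while True:
            time.sleep(1)
    except (KeyboardInterrupt, SystemExit):
        scheduler.shutdown()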

Related

Systemd stuck in ExecStartPre step

I've made a very simple Python program which extracts the latest news published on a website and sends it to me via Telegram. The program works perfectly when I launch the command below in the console:
/usr/bin/python3.9 /home/dietpi/news/news.py
However, when I try to automate it with systemd (so it automatically restarts if there is any bug), I noticed the service stays blocked in the ExecStartPre step forever:
[Unit]
Description=News Service
Wants=network.target
After=network.target
[Service]
ExecStartPre=/bin/sleep 10
ExecStart=/usr/bin/python3.9 /home/dietpi/news/news.py
Restart=always
[Install]
WantedBy=multi-user.target
I put the ExecStartPre command there to let the Pi set up the network properly before launching the program (I noticed a failure occurs otherwise, because the program starts too quickly and generates an error).
When I reboot the Pi, here is what I see in the status of the services (using the command systemctl --type=service):
UNIT LOAD ACTIVE SUB JOB DESCRIPTION
news.service loaded activating start-pre start News Service
When I look at this service in more detail, here is what I have (using sudo systemctl status news.service):
● news.service - News Service
Loaded: loaded (/etc/systemd/system/news.service; enabled; vendor preset: enabled)
Active: activating (start-pre) since Fri 2022-02-04 17:03:58 GMT; 2s ago
Cntrl PID: 552 (sleep)
Tasks: 1 (limit: 4915)
CPU: 4ms
CGroup: /system.slice/news.service
└─552 /bin/sleep 10
Feb 04 17:03:58 DietPi systemd[1]: Starting News Service...
If I run this command several times, I see the "activating" timer climb up to 10s and then start again from 0s, which shows I am stuck in the ExecStartPre step :(
If you have any idea how to solve this issue, it would be much appreciated :)
Try creating your own sleep script:
sleep.py:
import time
import sys

if __name__ == '__main__':
    time.sleep(10)
    sys.exit()
In your systemd unit:
ExecStartPre=/usr/bin/python3.9 /home/dietpi/news/sleep.py
Personally, I prefer to use supervisord to launch my Python scripts as services:
/etc/supervisor/conf.d/news.conf:
[program:news]
command = /usr/bin/python3.9 news.py
directory = /home/dietpi/news/
user = dietpi
autostart = true
autorestart = true
stdout_logfile = /var/log/supervisor/news.log
redirect_stderr = true
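To pick the program up and check on it, something like the following should work (assuming a system-wide supervisor installation that reads /etc/supervisor/conf.d/):
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl status news
tail -f /var/log/supervisor/news.log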

Trying to get supervisor to create a worker for python-rq

I am trying to get supervisor to spawn a worker following this pattern using python-RQ, much like what is mentioned in this stackoverflow question. I can start workers manually from the terminal as follows:
$ venv/bin/rq worker
14:35:27 Worker rq:worker:fd403822518a4f21802fa0dc417e526a: started, version 1.2.2
14:35:27 *** Listening on default...
It works great. I can confirm the worker exists in another terminal:
$ venv/bin/rq info
0 queues, 0 jobs total
fd403822518a4f21802fa0dc417e526a (b'' 41735): idle default
1 workers, 0 queues
Now to start a worker using supervisor.... Here is my supervisord.conf file, located in the same directory.
[supervisord]
;[program:worker]
command=venv/bin/rq worker
process_name=%(program_name)s-%(process_num)s
numprocs=1
directory=.
stopsignal=TERM
autostart=false
autorestart=false
I can start supervisor as follows:
$ venv/bin/supervisord -n
2020-03-05 14:36:45,079 INFO supervisord started with pid 41757
However, checking for a new worker, I see it's not there.
$ venv/bin/rq info
0 queues, 0 jobs total
0 workers, 0 queues
I have tried a multitude of other ways to get this worker to start, such as...
... within the virtual environment:
$ source venv/bin/activate
(venv) $ rq worker
*** Listening on default...
... using a shell file
#!/bin/bash
source /venv/bin/activate
rq worker low
$ ./start.sh
*** Listening on default...
... using a python script
$ venv/bin/python3 worker.py
*** Listening on default...
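(worker.py here is presumably something along these lines; a minimal sketch assuming a local Redis and the default queue:)
from redis import Redis
from rq import Queue, Worker

# connect to the local Redis instance and work the "default" queue
redis_conn = Redis()
worker = Worker([Queue('default', connection=redis_conn)], connection=redis_conn)
worker.work()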
When started manually they all work fine. Changing command= in supervisord.conf doesn't seem to make a difference; there is no worker to be found. What am I missing? Why won't supervisor start a worker? I am running this on macOS and my file structure is as follows:
.
|--__pycache__
|--supervisord.conf
|--supervisord.log
|--supervisord.pid
|--main.py
|--task.py
|--venv
   |--bin
      |--rq
      |--supervisord
      |--...etc
   |--include
   |--lib
   |--pyenv.cfg
Thanks in advance.
I had two problems with supervisord.conf which were preventing the worker from starting. The corrected config file is as follows:
[supervisord]
[program:worker]
command=venv/bin/rqworker
process_name=%(program_name)s-%(process_num)s
numprocs=1
directory=.
stopsignal=TERM
autostart=true
autorestart=false
First, the line [program:worker] was in fact commented out. I must have taken this line from the commented-out sample file without realizing. However, removing the comment still didn't start the worker: I also had to set autostart=true, as starting supervisor does not automatically start a program.
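With the corrected file, the same checks as before should confirm the worker comes up (assuming supervisord.conf sits in the current directory):
$ venv/bin/supervisord -n -c supervisord.conf
$ venv/bin/rq info    # in another terminal; should now report 1 worker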

Why does systemd reap child process when running script with Fabric but not ssh?

I am running a Python script over SSH on an Ubuntu 18.04.2 server.
When I use ssh to log in to the server and run the script, and then terminate the ssh session, the Python script also terminates as expected (I'm not using nohup, &, etc.). However, when I run the same script using Fabric and then terminate the local Fabric process, the Python process on the server gets reaped by systemd. This is what the systemd status looks like:
● session-219.scope - Session 219 of user root
Loaded: loaded (/run/systemd/transient/session-219.scope; transient)
Transient: yes
Active: active (abandoned) since Fri 2019-12-27 00:56:07 PST; 2min 55s ago
Tasks: 1
CGroup: /user.slice/user-0.slice/session-219.scope
└─6872 /root/peacock/bin/python3 -m src.main
Dec 27 00:56:07 master systemd[1]: Started Session 219 of user root.
Dec 27 00:57:52 master sshd[6783]: pam_unix(sshd:session): session closed for user root
Is there a way to prevent systemd from reaping the child process, similar to the behavior of ssh? And why does it only get reaped when using Fabric but not ssh directly?
More details:
The Python script is a simple flask app. The gist of it is:
from flask import Flask

flask_app = Flask('app')

@flask_app.route('/')
def index():
    # ....
    pass

if __name__ == '__main__':
    flask_app.run(host='0.0.0.0')
The Fabric script is roughly as follows:
import fabric

server_conn = fabric.Connection('1.2.3.4')
with server_conn.cd('/root/peacock'):
    server_conn.run('/root/peacock/bin/python3 -m src.main')
If you need to run a process as a daemon on the remote box, I would suggest that you make it a systemd unit. This way you can control it with standard commands and access its logs like any other service on the system.
Your config could look like (/etc/systemd/system/peacock.service):
[Unit]
Description=Peacock systemd service.
[Service]
Type=simple
ExecStart=/root/peacock/bin/python3 -m src.main
[Install]
WantedBy=multi-user.target
Remember to sudo chmod 644 /etc/systemd/system/peacock.service. Then your fabric script would look like:
server_conn = fabric.Connection('1.2.3.4')
with server_conn.cd('/root/peacock'):
    server_conn.run('systemctl start peacock.service')
Later you can check the status of this service. You will also be able to access its logs with journalctl -u peacock.
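For completeness, the first-time setup on the server would be roughly the following (a sketch; these commands could also be sent through server_conn.run):
systemctl daemon-reload
systemctl enable peacock.service
systemctl start peacock.service
journalctl -u peacock -f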

Why does supervisor make the celery worker change from running to starting all the time?

background
The system is CentOS 7, which comes with Python 2.x, and has 1 GB of memory and a single core.
I installed Python 3.x, which I can run as python3.
The django-celery project runs in a Python 3.x virtualenv, and I have it working with nginx, uWSGI and MariaDB. At least, I think so, since no errors happened.
I am trying to use supervisor to control the django-celery worker, like below:
command=env/bin/python project/manage.py celeryd -l INFO -n worker_%(process_num)s
numprocs=4
process_name=projects_worker_%(process_num)s
stdout_logfile=logfile.log
stderr_logfile=logfile_err.log
I also set up celery events and celery beat; that part works fine, with no errors. The error comes from the worker part.
When I keep numprocs greater than 1, it runs at first: when I do supervisorctl status, everything is running.
But when I run the same command to check the status a few more times, some processes' status changes to starting.
So I tried more times and found that the workers' status keeps changing from running to starting and then from starting back to running, without ever settling.
When I check the supervisor logfile at tmp/supervisor.log, it shows lines like:
exit status 1; not expected
entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
'project_worker_0' with pid 2284
Maybe that shows why the worker changes status all the time.
What's more, when I change numprocs to 1, the worker still fails. The worker's log shows me:
stale pidfile exists. Removing it.
But I did not point the worker at any pidfile path. I only found the events and beat pidfiles under /, and no worker pidfile. I also tried find / -name '*.pid' to look for a pidfile named something like worker or celeryd, but none exists.
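(As an aside, the worker's pidfile location can be made explicit on the command line, which makes this easier to debug; a sketch, assuming celeryd accepts the usual --pidfile option here:)
command=env/bin/python project/manage.py celeryd -l INFO -n worker_%(process_num)s --pidfile=/tmp/worker_%(process_num)s.pid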
question
Firstly, I want to deploy the project, so is there any other way to deploy django-celery with the celery part running inside a virtualenv?
Secondly, if anyone can tell me why this phenomenon happens, I would rather keep using supervisor to deploy the celery part. Can anyone help me with it?
PS
Any of your thoughts may be helpful to me, best wishes!
Finally, I solved this problem last night.
about the reason
I had the project running successfully on a Windows 10 system, but did not re-check it after moving the project to CentOS 7. The command env/bin/python project/manage.py celeryd could not run successfully, so supervisor kept starting a process that failed soon after.
Why did the command fail? I had pip-installed all the packages it needs, but it showed the error below:
Running a worker with superuser privileges when the worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).
User information: uid=0 euid=0 gid=0 egid=0
I searched some blog posts about this error and got the answer:
export C_FORCE_ROOT='true'  # in the CentOS environment
actions to take (after hitting an error like this)
Add export C_FORCE_ROOT='true' to the CentOS environment file and source it.
Check the command env/bin/python project/manage.py celeryd and confirm it now runs successfully.
Restart supervisord. Attention: not supervisorctl reload, which only reloads the .conf file, not the environment. Kill the supervisord -c xx.conf process instead (ps aux | grep supervisord and kill -9 process_number, but be careful) and start it again.
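(An alternative to exporting the variable system-wide is supervisor's own environment= key, which sets it only for the worker processes; a sketch, with the section name assumed:)
; in the worker's [program:...] section of the supervisor config
command=env/bin/python project/manage.py celeryd -l INFO -n worker_%(process_num)s
environment=C_FORCE_ROOT="true"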
related link
A blog post (in Chinese) about the error when celeryd fails to run.

Adding a Python daemon to systemd

The essence:
I have created a daemon to manage some tasks on a remote platform.
It is written in Python and accepts start, stop and restart arguments.
While trying to add it to systemd (so it would start on system startup, be stopped on shutdown, etc.) I encountered a problem:
systemd seems to see the daemon running, but I am not sure it actually works, because restarting it or requesting its status returns an error:
[user@centos ~]# systemctl restart mydaemon
Failed to restart mydaemon.service: Unit mydaemon.service failed to load: No such file or directory.
[user@centos ~]# systemctl status mydaemon
● mydaemon.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
The specifics:
The code itself follows the well-known example by Sander Marechal with very few changes. By itself it works without any problems, and properly reacts to all accepted arguments. The pid is saved in /tmp/my-daemon.pid.
The systemd service file is in the user daemons directory: /usr/lib/systemd/user/mydaemon.service, and the code is as follows:
[Unit]
Description=The user daemon
[Service]
Type=forking
ExecStart=/usr/bin/python /home/frcr/mydaemon_v01.py start
ExecStop=/usr/bin/python /home/frcr/mydaemon_v01.py stop
RestartSec=5
TimeoutSec=60
RuntimeMaxSec=infinity
Restart=always
PIDFile=/tmp/my-daemon.pid
[Install]
WantedBy=multi-user.target
systemctl reports it as active, but only if given the PID:
[user@centos ~]# systemctl status 9177
● session-481.scope - Session 481 of user user
Loaded: loaded
Drop-In: /run/systemd/system/session-481.scope.d
└─50-After-systemd-logind\x2eservice.conf, 50-After-systemd-user-sessions\x2eservice.conf, 50-Description.conf, 50-SendSIGHUP.conf, 50-Slice.conf
Active: active (running) since Tue 2016-05-17 06:24:51 EDT; 1h 43min ago
CGroup: /user.slice/user-0.slice/session-481.scope
├─8815 sshd: root@pts/0
├─8817 -bash
├─9177 python /home/user/mydaemon_v01.py start
└─9357 systemctl status 9177
I have seen a similar question here on Stack Overflow, but it doesn't seem to have the solution to my problem.
I assume I am missing something very obvious due to my sheer lack of experience with systemd, and I'd be extremely grateful if somebody could point it out or show me the right direction. Thanks in advance, and please forgive my mad English skillz.
Enabling the daemon with a full path name worked around the issue, but there is a better solution.
The issue was that the service file lived in a user directory but was being started as a system service. However, /usr/lib was not the right place to add new service files anyway: that directory is for files shipped as part of operating-system packages. The correct place to add a new system service is /etc/systemd/system (see the systemd documentation on unit file load paths).
You still want to enable the service to make sure it gets loaded at boot time.
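Concretely, that would look something like this (a sketch, run as root on the box):
cp /usr/lib/systemd/user/mydaemon.service /etc/systemd/system/mydaemon.service
systemctl daemon-reload
systemctl enable mydaemon.service
systemctl start mydaemon.service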
After some additional googling I found a solution: I had actually forgotten to enable the daemon with systemctl:
[root@centos ~]# systemctl enable /usr/lib/systemd/user/mydaemon.service
Created symlink from /etc/systemd/system/multi-user.target.wants/mydaemon.service to /usr/lib/systemd/user/mydaemon.service.
Created symlink from /etc/systemd/system/mydaemon.service to /usr/lib/systemd/user/mydaemon.service.
It is also worth mentioning that the absolute path is required.
The only thing left is to reload the systemd configuration:
[root@centos ~]# systemctl daemon-reload
After that the service is added, and
[root@centos ~]# systemctl start mydaemon
[root@centos ~]# systemctl restart mydaemon
[root@centos ~]# systemctl stop mydaemon
all work perfectly.
