I am using systemd on Raspbian to run a Python script script.py. The my.service file looks like this:
[Unit]
Description=My Python Script
Requires=other.service
[Service]
Restart=always
ExecStart=/home/script.py
ExecStop=/home/script.py
[Install]
WantedBy=multi-user.target
When the Requires= dependency other.service stops, I want my.service to stop immediately and also terminate the Python process running script.py.
However, when I try this out by stopping other.service and then monitoring the state of my.service using systemctl, it seems to take a good while for my.service to actually enter a 'failed' (stopped) state. It seems that pointing ExecStop at the script is not enough to terminate my.service itself, and the underlying script.py, promptly.
Just to be extra clear: I want the script to terminate pretty much immediately, in a way that is analogous to Ctrl+C. Basic Python clean-up is OK, but I don't want systemd to wait for a 'graceful' shutdown timeout, or something like that.
Questions:
Is my interpretation of the delay correct, or is it just systemctl that is slow to update its status overview?
What is the recommended way to stop the service and terminate the script? Should I include some sort of SIGINT catching in the Python script? If so, how? Or is there something that can be done in my.service to expedite the stopping of the service and the killing of the script?
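(For what it's worth, catching a termination signal in Python can be as small as the following sketch; the clean-up body is a placeholder for whatever your script actually needs:)
import signal
import sys

def handle_signal(signum, frame):
    # do whatever minimal clean-up is needed, then exit promptly
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_signal)  # systemd's default stop signal
signal.signal(signal.SIGINT, handle_signal)   # what Ctrl+C sends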
I think you should look into TimeoutStopSec and the parameter that supplies its default value, DefaultTimeoutStopSec. On the provided links there is some more info about WatchdogSec and other options that you might find useful. It looks like DefaultTimeoutStopSec's default is 90 seconds, which might be the delay you are experiencing..?
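(For example — assuming the delay really is the stop timeout — something like this in the [Service] section would cap it, and KillSignal= from systemd.kill would make the stop signal the same one Ctrl+C sends:)
[Service]
TimeoutStopSec=5
KillSignal=SIGINT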
Under the [Unit] section options you could use Requisite=other.service. This is similar to Requires=; however, if the units listed here are not started already, they will not be started and the transaction will fail immediately.
For triggering script execution again, under the [Unit] section you can use OnFailure=, which is a space-separated list of one or more units that are activated when this unit enters the "failed" state.
Also using BindsTo= option configures requirement dependencies, very similar in style to Requires=, however in addition to this behavior, it also declares that this unit is stopped when any of the units listed suddenly disappears. Units can suddenly, unexpectedly disappear if a service terminates on its own choice, a device is unplugged or a mount point unmounted without involvement of systemd.
I think in your case BindsTo= is the option to use, since it causes the current unit to stop when the associated unit terminates.
From the systemd.unit man page.
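(Applied to your unit, that could look like the sketch below; note that the systemd.unit man page suggests combining BindsTo= with After= on the same unit to make the stop behaviour stronger:)
[Unit]
Description=My Python Script
BindsTo=other.service
After=other.service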
Background
I'm struggling to find an example of a WDT (watchdog timer) used the way I want to use it. I'm wondering if I'm misunderstanding its use.
My Python writing is purely a hobby; honestly, classes intimidate me.
In short, my program reads a number of sensors connected to a Raspberry Pi and writes the data to a cloud-hosted object database.
I have an intermittent error; while I try to figure it out, I want to implement a hardware-based watchdog timer.
This is what I'd like to implement so that, at the very least, I continue to collect and store data.
I've read about the built-in watchdog timer the Raspberry Pi has here: https://diode.io/raspberry%20pi/running-forever-with-the-raspberry-pi-hardware-watchdog-20202/
The problem: I want the Raspberry Pi to reboot if my program hangs, but when that happens the OS is still fine, so the solution in the link above is not effective.
What I'd like to implement:
set the built-in watchdog timer to reboot the Raspberry Pi after 200 seconds without the timer being restarted (patted?). I think the instructions for this are in the link above.
within my Python script, after I iterate through each sensor, restart (or pat?) the watchdog timer, so that if 200 seconds elapse between pattings (meaning my program hangs) the RPi reboots.
Is this possible?
Can someone help me with some simple code? I was hoping to keep this simple and avoid classes and/or threads...
Thank you in advance.
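(For what it's worth, the patting loop described above might look like this minimal sketch — assuming the hardware watchdog has been enabled as in the linked guide, and with sensors and read_and_store standing in for your real objects and per-sensor code:)
# sketch: pat /dev/watchdog once per full pass over the sensors
wdt = open("/dev/watchdog", "w")

while True:
    for sensor in sensors:        # 'sensors' is your existing sensor list
        read_and_store(sensor)    # placeholder for your real work
    wdt.write("\n")               # any write resets the hardware countdown
    wdt.flush()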
The WDT is probably not the right solution for the problem you are describing. Based on your description, it sounds like what you have is a program that is intended to run periodically (either on a fixed schedule or in response to some event), and that program currently has a bug that causes it to hang intermittently and never complete its task or terminate.
The first and best way to solve that, I'm sure you can guess, is to fix the bug. But your thinking is not unreasonable either, and doing what you describe is very common. There are many valid approaches that will do what you want without the complexity of trying to deal with a hardware timer. One of the easiest is probably to just wrap the program in a shell script and use the timeout command to limit how long it is allowed to execute before it is terminated.
Assuming the script is located at /home/user/my_script.py, and you want to run it every 10 minutes, allowing it 2 minutes before it is killed, this would work:
create a wrapper shell script:
#!/usr/bin/env bash
# run the script, limited to 120 seconds of wall-clock time
timeout 120s python /home/user/my_script.py
status=$?
if [ "${status}" -eq 124 ]; then
    # timeout(1) exits with 124 when it had to kill the command
    msg="$(date) - script exceeded timeout and was killed"
elif [ "${status}" -ne 0 ]; then
    msg="$(date) - script failed with exit status ${status}"
else
    msg="$(date) - script ran successfully"
fi
echo "${msg}" >> /home/user/my_script.log
put the script in a file, say at /home/user/wrapper_script.sh, and run chmod 755 /home/user/wrapper_script.sh to make it executable.
schedule the script to run every 10 minutes using cron. At a shell, use crontab -e to edit the user's crontab file and add a new line like this:
*/10 * * * * /home/user/wrapper_script.sh
now, every 10 minutes the wrapper will start automatically, and it will kick off the Python script. It will give the script 2 minutes to stop normally, after which it will hit the timeout and terminate it.
Note: depending on how your Python program is written, you might have to pass some other options to the timeout command to specify which signal it should use to stop the program. If it is a very basic Python script, it should be fine with the default.
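(For example, with the GNU coreutils timeout you could send a different signal and follow up with SIGKILL if the program ignores it — the 10-second grace period here is arbitrary:)
timeout --signal=INT --kill-after=10s 120s python /home/user/my_script.py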
Edit: based on the comments, you might be able to just change the command you're using to this:
xterm -T "HMS" -geometry 100x70+10+35 -hold -e sudo timeout 120s /usr/bin/python3 /home/pi/h$
Doing that won't actually schedule the script to run at any fixed interval, so it assumes you already have something in place to handle that. All this will do is make sure that the script is restricted to 120 seconds of run time before it is killed.
Is there a good way to automatically restart an instance if it reaches the end of a startup script?
I have a Python script that I want to run continuously on Compute Engine; it checks the Pub/Sub feed from a GAE instance that's running a cron job. I haven't figured out a good way to catch every possible error, and there are many edge cases that are hard to test (e.g. the instance running out of memory). It would be better if I could just restart the instance every time the script finishes (because it should never finish). The auto-restart option won't work because the instance doesn't shut down; it just stops running the script.
A simple shutdown -r now may be enough.
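(In a startup script that could look like the following sketch; worker.py is a stand-in for your actual script:)
#!/bin/bash
python3 /opt/app/worker.py   # should never return
shutdown -r now              # if it ever does, reboot the instance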
Or if you prefer gcloud:
gcloud compute instances reset $(hostname)
Mind that reset is a hard reset, without a proper OS shutdown.
You might also want to check the documentation on resetting or restarting an instance before performing either operation.
My Python script needs to be killed every hour, and afterwards I need to restart it. I need to do this because sometimes (I create screenshots) a browser window hangs because of a user login popup or something. Anyway, I created two files, 'reload.py' and 'screenshot.py'. I run reload.py from a cronjob.
I thought something like this would work:
# kill process if still running
try:
    os.system("killall -9 screenshotTaker")
except:
    print 'nothing to kill'

# reload or start process
os.execl("/path/to/script/screenshots.py", "screenshotTaker")
The problem is (and this matches what I read as well) that the second argument of execl (the given process name) doesn't seem to work. How can I set a process name so that the killall does its work?
Thanks in advance!
The first argument to os.execl is the path to the executable. The remaining arguments are passed to that executable as if they were typed on the command line.
If you want "screenshotTaker" to become the name of the process, it is "screenshots.py"'s responsibility to make that happen. Do you do something special to that effect in that script?
BTW, a more common approach is to keep track (in /var/run/, usually) of the PID of the running program, and kill it by PID. This can be done with Python (using os.kill — see the sketch after the man excerpt below). At the system level, some distributions have helpers for that exact purpose. For example, on Debian there is start-stop-daemon. Here is an excerpt of the man page:
start-stop-daemon(8)          dpkg utilities          start-stop-daemon(8)

NAME
       start-stop-daemon - start and stop system daemon programs

SYNOPSIS
       start-stop-daemon [options] command

DESCRIPTION
       start-stop-daemon is used to control the creation and termination of
       system-level processes. Using one of the matching options,
       start-stop-daemon can be configured to find existing instances of a
       running process.
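(A minimal sketch of that PID-file approach in Python; the file location is hypothetical, and a real version would want better error handling:)
import os
import signal

PIDFILE = "/var/run/screenshottaker.pid"  # hypothetical location

# kill the previous instance, if its PID file exists
try:
    with open(PIDFILE) as f:
        os.kill(int(f.read().strip()), signal.SIGTERM)
except (IOError, OSError, ValueError):
    pass  # no previous instance (or a stale/garbled PID file)

# record our own PID so the next run can find us
with open(PIDFILE, "w") as f:
    f.write(str(os.getpid()))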
I need to create a daemon in Python. I searched and found a good piece of code. The daemon should be started automatically after the system boots, and it should be restarted if it was unexpectedly closed. I went through the chapter about daemons in Advanced Programming in the UNIX Environment and have two questions.
To run the script automatically after boot, I need to put my daemon script in /etc/init.d. Is that correct?
What should I do to respawn the daemon? According to the book I need to add a respawn entry to /etc/inittab, but I don't have /etc/inittab on my system. Should I create it myself?
I suggest you look into upstart if you're on Ubuntu. It's way better than inittab, but does involve a bit of a learning curve, to be honest.
Edit (by Blair): here is an adapted example of an upstart script I wrote for one of my own programs recently. A basic upstart script like this is fairly readable/understandable, though (like many such things) they can get complicated when you start doing fancy stuff.
description "mydaemon - my cool daemon"
# Start and stop conditions. Runlevels 2-5 are the
# multi-user (i.e, networked) levels. This means
# start the daemon when the system is booted into
# one of these runlevels and stop when it is moved
# out of them (e.g., when shut down).
start on runlevel [2345]
stop on runlevel [!2345]
# Allow the service to respawn automatically, but if
# crashes happen too often (10 times in 5 seconds)
# there's a real problem and we should stop trying.
respawn
respawn limit 10 5
# The program is going to daemonise (double-fork), and
# upstart needs to know this so it can track the change
# in PID.
expect daemon
# Set the mode the process should create files in.
umask 022
# Make sure the log folder exists.
pre-start script
mkdir -p -m0755 /var/log/mydaemon
end script
# Command to run it.
exec /usr/bin/python /path/to/mydaemon.py --logfile /var/log/mydaemon/mydaemon.log
To create a daemon, use double fork() as shown in the code you found.
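(The classic double fork looks roughly like this sketch; error handling is omitted for brevity:)
import os
import sys

def daemonize():
    if os.fork() > 0:
        sys.exit(0)          # first parent exits; the child carries on
    os.setsid()              # become session leader, drop the controlling tty
    if os.fork() > 0:
        sys.exit(0)          # second fork: we can never reacquire a tty
    os.chdir("/")            # don't pin whatever directory we started in
    os.umask(0o022)
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):     # detach stdin/stdout/stderr
        os.dup2(devnull, fd)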
Then you need to write an init script for your daemon and copy it into /etc/init.d/.
http://www.novell.com/coolsolutions/feature/15380.html
There are many ways to specify how the daemon will be auto-started, e.g., chkconfig.
http://linuxcommand.org/man_pages/chkconfig8.html
Or you can manually create the symlinks for certain runlevels.
Finally you need to restart the service when it unexpectedly exits. You may include a respawn entry for the service in /etc/inittab.
http://linux.about.com/od/commands/l/blcmdl5_inittab.htm
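(On systems that do have a SysV /etc/inittab, a respawn entry follows the id:runlevels:action:process format — something like:)
myd:2345:respawn:/usr/bin/python /path/to/mydaemon.py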
I am working on a Django web application.
A function 'xyz' (it updates a variable) needs to be called every 2 minutes.
I want one HTTP request to start the daemon and keep calling xyz (every 2 minutes) until I send another HTTP request to stop it.
Appreciate your ideas.
Thanks
Vishal Rana
There are a number of ways to achieve this. Assuming the necessary server resources, I would write a Python script that calls the function xyz, lives "outside" your Django directory (although it imports the necessary stuff), and only does its work if /var/run/django-stuff/my-daemon.run exists. Get cron to run this script every two minutes.
Then, for your Django functions, your start function creates the above-mentioned file if it doesn't already exist, and the stop function deletes it (see the sketch below).
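(A minimal sketch of both halves — the module paths and names are hypothetical:)
# views.py (Django side)
import os
from django.http import HttpResponse

FLAG = "/var/run/django-stuff/my-daemon.run"

def start(request):
    open(FLAG, "a").close()        # create the flag if missing
    return HttpResponse("started")

def stop(request):
    try:
        os.remove(FLAG)
    except OSError:
        pass                       # already stopped
    return HttpResponse("stopped")

and the cron side, scheduled as */2 * * * * in the crontab:
# runner.py (run by cron every two minutes)
import os
from myproject.tasks import xyz    # import your real function here

if os.path.exists("/var/run/django-stuff/my-daemon.run"):
    xyz()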
As I say, there are other ways to achieve this. You could have a Python script sitting in a loop, waiting approximately 2 minutes, etc. In either case, you're up against the fact that two Python scripts running in two different invocations of CPython (no idea if this is the case with mod_wsgi) cannot communicate with each other through common variables (which won't work), so IPC between Python scripts is not simple; you need to use some sort of formal IPC (like semaphores, files, etc.).
Probably a little hacky, but you could try this:
Set up a crontab entry that runs a script every two minutes. This script will check for some sort of flag (file existence, contents of a file, etc.) on the disk to decide whether to run a given Python module. The problem with this is that it could take up to 1:59 to run the function for the first time after it is started.
I think if you started a daemon in the view function, it would keep the httpd worker process alive, as well as the connection, unless you figure out how to send a connection close without terminating the Django view function. This could be very bad if you want to be able to do this in parallel for different users. Also, to kill the function this way, you would have to somehow know which python and/or httpd process to kill later so you don't kill all of them.
The real way to do it would be to code an actual daemon in whatever language and just make a system call to "/etc/init.d/daemon_name start" and "... stop" in the Django views (see the sketch below). For this, you need to make sure your web server user has permission to control the daemon.
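(That system call might look like this sketch; daemon_name is a placeholder:)
import subprocess
from django.http import HttpResponse

def start_daemon(request):
    # requires the web server user to be allowed to run this command
    subprocess.call(["/etc/init.d/daemon_name", "start"])
    return HttpResponse("daemon started")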
If the easy solutions (loop in a script, crontab signaled by a temp file) are too fragile for your intended usage, you could use Twisted facilities for process handling and scheduling and networking. Your Django app (using a Twisted client) would simply communicate via TCP (locally) with the Twisted server.