This post describes how to keep a child process alive in a BASH script:
How do I write a bash script to restart a process if it dies?
This worked great for calling another BASH script.
However, I tried executing something similar where the child process is a Python script, daemon.py, which forks a child process that runs in the background:
#!/bin/bash
PYTHON=/usr/bin/python2.6
function myprocess {
    $PYTHON daemon.py start
}
until myprocess; do
    NOW=$(date +"%b-%d-%y")  # take the timestamp at crash time, not at script start
    echo "$NOW Prog crashed. Restarting..." >> error.txt
    sleep 1
done
Now the behaviour is completely different. It seems the Python script is no longer a child of the Bash script but seems to have 'taken over' the Bash script's PID, so there is no longer a Bash wrapper around the called script... why?
A daemon process double-forks; that is the key point of daemonizing itself. So the PID that the parent process sees is of no value: that process goes away very soon after the child process starts.
Therefore, a daemon process should write its PID to a file in a "well-known location" where, by convention, the parent process knows to read it. With this (traditional) approach, a parent that wants to act as a restarting watchdog can simply read the daemon's PID from the well-known location, periodically check whether the daemon is still alive, and restart it when needed.
It takes some care in execution, of course: a "stale" PID will stay in the well-known-location file for a while, and the parent must take that into account. There are possible variants, too: the daemon could emit a "heartbeat" (via UDP broadcast or the like) so that the parent can detect not just dead daemons, but also ones that are "stuck forever", e.g. due to a deadlock, since they stop sending their heartbeat. But that's the general idea.
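For concreteness, a minimal sketch of such a watchdog loop, assuming the daemon writes its PID to an illustrative path daemon.pid and is restarted with an illustrative command (os.kill with signal 0 delivers nothing; it only probes whether the process exists):
import os
import subprocess
import time

PIDFILE = "daemon.pid"  # illustrative "well-known location"

def daemon_alive():
    try:
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
        return True
    except (IOError, ValueError, OSError):
        return False     # missing or stale pidfile, or no such process

while True:
    if not daemon_alive():
        subprocess.call(["python", "daemon.py", "start"])  # illustrative restart command
    time.sleep(5)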
You should look at PEP 3143 (Python Enhancement Proposal) here. In it, Ben Finney proposes including a daemon library in the Python standard library. He goes over LOTS of very good information about daemons, and it is a pretty easy read. The reference implementation is here.
The behavior is completely different because here your daemon.py is launched in the background, as a daemon.
In the other question you pointed to, the process being watched is not a daemon; it does not go into the background. The launcher simply waits forever for the child process to stop.
There are several ways to overcome this. The classical one is the approach @Alex explains: use a PID file in a conventional place.
Another way could be to build the watchdog inside your running daemon and daemonize the watchdog... this would simulate a well-behaved process that does not crash at random (something that shouldn't happen anyway)...
Make use of https://github.com/ut0mt8/simple-ha:
simple-ha
Tired of keepalived, corosync, pacemaker, heartbeat or whatever? Here is a simple daemon which ensures a heartbeat between two hosts. One is active and the other is backup, launching a script when changing state. Simple implementation, KISS. Production ready (at least it works for me :)
Life will be so easy!
index.js
const childProcess = require("child_process");
childProcess.spawn("python", ["main.py"]);
main.py
import time
while True:
    time.sleep(1)
When running the NodeJS process with node index.js, it runs forever since the Python child process it spawns runs forever.
When the NodeJS process is ended by x-ing out of the Command Prompt, the Python process ends as well, which is desired, but how can you run some cleanup code in the Python code before it exits?
Previous attempts
Looked in the documentation for child_process.spawn for how this termination is communicated from parent to child, perhaps by a signal. Didn't find it.
In Python, used signal.signal(signal.SIGTERM, handler) (and likewise signal.SIGINT). Didn't get the handler to run (though ending the NodeJS process with Ctrl-C instead of closing the window did get the SIGINT handler to run, even though I'm not explicitly forwarding input from the NodeJS process to the Python child process). A sketch of this attempt appears below.
Lastly, though this reproducible example is valid and much simpler, my real-life use case involves Electron; in case that introduces a complication, or a solution, I figured I'd mention it.
Windows 10, NodeJS 12.7.0, Python 3.8.3
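For reference, a minimal sketch of the signal-handler attempt described under "Previous attempts" (the handler body is illustrative; real cleanup code would go in its place):
import signal
import sys
import time

def handler(signum, frame):
    # Illustrative cleanup hook; replace with the actual cleanup code.
    print("cleaning up before exit, got signal", signum)
    sys.exit(0)

signal.signal(signal.SIGTERM, handler)
signal.signal(signal.SIGINT, handler)

while True:
    time.sleep(1)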
I have a Python script which attempts to communicate with a Python daemon. When the original script is invoked, it checks whether the daemon exists. If the daemon exists, the original script writes to a named pipe to communicate with it. If the daemon does not exist, the original script attempts to create a daemon using DaemonContext and then writes to the named pipe.
Pseudo-code of the original script:
from daemon import DaemonContext
if daemon_exists():
    pass
else:
    with DaemonContext():
        create_daemon()

communicate_with_daemon()
The problem is that when the daemon is created, the parent process is killed (i.e. communicate_with_daemon will never be executed). This prevents the original script from creating a daemon and communicating with it.
According to this answer, this problem is a limitation of the python-daemon library. How would I get around this?
Thanks.
You're describing not a limitation, but a definition of how a daemon process works.
[…] the parent process is killed (i.e. communicate_with_daemon will never be executed).
Yes, that's right; the daemon process detaches from what started it. That's what makes the process a daemon.
However, this statement is not true:
This prevents the original script from creating a daemon and communicating with it.
There are numerous other ways to communicate between processes. The general name for this is Inter-Process Communication. The solutions are many, and which you choose depends on the constraints of your application.
For example, you could open a socket at a known path and keep that file open; you could open a network port and communicate through the loopback interface; you could arrange a "drop box" for communication in a file on the local filesystem, in a database, or elsewhere; etc.
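As one illustration (not the only option), a minimal Unix-domain-socket sketch; the path /tmp/mydaemon.sock and the message format are assumptions:
import os
import socket

SOCK_PATH = "/tmp/mydaemon.sock"  # assumed "well-known" path

def serve():
    # Daemon side: listen on a Unix-domain socket at the known path.
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCK_PATH)
    server.listen(1)
    while True:
        conn, _ = server.accept()
        data = conn.recv(1024)
        conn.sendall(b"got: " + data)
        conn.close()

def send(message):
    # Client side: the original script connects and exchanges one message.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(SOCK_PATH)
    client.sendall(message)
    reply = client.recv(1024)
    client.close()
    return reply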
I see a lot of examples of how to use multiprocessing, but they all talk about spawning workers and controlling them while the main process is alive. My question is how to control background workers in the following way:
Start 5 workers from the command line:
manager.py --start 5
After that, I want to be able to list and stop workers on demand from the command line:
manager.py --start 1 #will add 1 more worker
manager.py --list
manager.py --stop 2
manager.py --sendmessagetoall "hello"
manager.py --stopall
The important point is that manager.py should exit after every run. What I don't understand is how to get a list of already-running workers from a newly created manager.py process and communicate with them.
Edit: Bilkokuya suggested having (1) a manager process that manages a list of workers and also listens for incoming commands, and (2) a small command-line tool that sends messages to that manager process... That actually sounds like a good solution. But still, the question remains the same: how do I communicate with another process from a newly created command-line program (process 2)? All the examples I see (of Queue, for example) work only when both processes are running all the time.
The most portable solution I can suggest (although this will still involve further research for you), is to have a long-running process that manages the "background worker" processes. This shouldn't ever be killed off, as it handles the logic for piping messages to each sub process.
Manager.py can then implement logic to communicate with that long-running process (whether that's via pipes, sockets, HTTP or any other method you like). So manager.py effectively just passes a message on to the 'server' process: "hey, please stop all the child processes" or "please send a message to process 10", etc.
There is a lot of work involved in this, and a lot to research. But the main thing you'll want to look up is how to handle IPC (Inter-Process Communication). This will allow your Manager.py script to interact with an existing/long-running process that can better manage each background worker.
The alternative is to rely fully on your operating system's process management APIs. But I'd suggest from experience that this is a much more error prone and troublesome solution.
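To make the shape concrete, a minimal sketch using Python's multiprocessing.connection; the address, authkey, and command format are all assumptions:
from multiprocessing.connection import Listener, Client

ADDRESS = ("localhost", 6000)  # assumed address of the long-running server
AUTHKEY = b"change-me"         # assumed shared secret

def run_server():
    # Long-running process: owns the worker list and handles commands.
    workers = []  # bookkeeping for worker processes would live here
    with Listener(ADDRESS, authkey=AUTHKEY) as listener:
        while True:
            with listener.accept() as conn:
                command = conn.recv()  # e.g. ("start", 1) or ("stopall",)
                conn.send("ack: %r" % (command,))

def send_command(command):
    # manager.py side: connect, send one command, print the reply, exit.
    with Client(ADDRESS, authkey=AUTHKEY) as conn:
        conn.send(command)
        print(conn.recv())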
I was looking around on GitHub, when I stumbled across this method called daemonize() in a reverse shell example. source
What I don't quite understand is what it does in this context. Wouldn't running the script from the command line as python example.py & achieve the same thing?
The daemonize method source:
import os
import sys

def daemonize():
    pid = os.fork()
    if pid > 0:
        sys.exit(0)  # Exit first parent
    pid = os.fork()
    if pid > 0:
        sys.exit(0)  # Exit second parent
A background process - running python2.7 <file>.py with the & operator - is not the same thing as a true daemon process.
A true daemon process:
Runs in the background. This also happens if you use &, and is where the similarity ends.
Is not in the same process group as the terminal, so when the terminal closes, the daemon does not die with it. This does not happen with & - the process stays the same; it is simply moved to the background.
Properly closes all inherited file descriptors (including input, output, etc.) so that nothing ties it back to the parent. Again, this does not happen with & - it will still write to the terminal.
Ideally, should be killable only by SIGKILL, not SIGHUP. Running with & leaves your process killable by SIGHUP.
All of this, however, is pedantry. Few tasks really require you to go to the extreme that these properties require - a background task spawned in a new terminal using screen can usually do the same job, though less efficiently, and you may as well call that a daemon in that it is a long-running background task. The only real difference between that and a true daemon is that the latter simply tries to avoid all avenues of potential death.
The code you saw simply forks the current process. Essentially, it clones the current process, lets the parent exit, and 'acts in the background' by simply being a separate process that does not block the current execution - a bit of an ugly hack, if you ask me, but it works.
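For comparison, a sketch of a more complete POSIX daemonization; the os.setsid() call between the two forks is what detaches the process from its controlling terminal:
import os
import sys

def daemonize_fully():
    if os.fork() > 0:
        sys.exit(0)   # first parent exits
    os.setsid()       # become session leader, detached from the terminal
    if os.fork() > 0:
        sys.exit(0)   # second parent exits; the child can never reacquire a tty
    os.chdir("/")     # do not pin the current working directory's mount
    # Reopen the standard streams on /dev/null so nothing ties us to the terminal.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)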
Have a look at Orphan Processes and Daemon Process. A process without a parent becomes a child of init (pid 1).
When it comes time to shut down a group of processes, say all the children of a bash instance, the OS will send SIGHUP to the children of that bash. An orphan - forced, as in this case, or created by some accident - won't get that treatment and will stay around longer.
I have two scripts, "autorun.py" and "main.py". I added "autorun.py" as a service to the autorun in my Linux system. It works perfectly!
Now my question is: when I launch "main.py" from my autorun script, and "main.py" runs forever, "autorun.py" never terminates either! So when I do
sudo service autorun-test start
the command also never finishes!
How can I run "main.py" and then exit? And to finish it up, how can I then stop "main.py" when "autorun.py" is launched with the parameter "stop"? (This is how all other services work, I think.)
EDIT:
Solution:
import os
import sys
import daemon

if sys.argv[1] == "start":
    print "Starting..."
    with daemon.DaemonContext(working_directory="/home/pi/python"):
        execfile("main.py")
else:
    pid = int(open("/home/pi/python/main.pid").read())
    try:
        os.kill(pid, 9)
        print "Stopped!"
    except OSError:
        print "No process with PID " + str(pid)
First, if you're trying to create a system daemon, you almost certainly want to follow PEP 3143, and you almost certainly want to use the daemon module to do that for you.
When I want to launch "main.py" from my autorun script, and "main.py" will run forever, "autorun.py" never terminates as well!
You didn't say how you're running it. If you're doing anything that launches main.py as a child and waits (or, worse, tries to import/execfile/etc. in the same process), you can't do that. Either autorun.py has to launch and detach main.py (or do so indirectly via some external tool), or main.py has to daemonize when launched.
how can I then stop "main.py" when "autorun.py" is launched with the parameter "stop" ?
You need some form of inter-process communication (IPC), and some way for autorun to find the right IPC channel to use.
If you're building a network server, the right answer might be to connect to it as a client. But otherwise, the simplest thing to do is kill the process with a signal.
If you're using the daemon module, it can easily map signals to callbacks. Or, if you don't need any cleanup, just use SIGTERM, which by default will abruptly terminate. If neither of those applies, you will have to set up a custom signal handler (and within that handler do something useful—e.g., set a flag that your main code checks periodically).
How do you know what process to send the signal to? The standard way to do this is to have main.py record its PID in a pidfile at startup. You read that pidfile, and signal whatever process is specified there. (If you get an error because there is no process with that PID, that just means the daemon already quit for some reason—possibly because of an unhandled exception, or even a segfault. You may want to log that, but treat the "stop" as successful otherwise.) Again, if you're using daemon, it does the pidfile stuff for you; if not, you have to do it yourself.
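A sketch of that stop logic, with an assumed pidfile path:
import os
import signal

PIDFILE = "/var/run/mydaemon.pid"  # assumed location written by main.py at startup

def stop_daemon():
    with open(PIDFILE) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError:
        # No process with that PID: the daemon already quit; treat "stop" as successful.
        pass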
You may want to take a look at the service scripts for daemons that came with your computer. They're probably all written in bash rather than Python, but it's not that hard to figure out what they're doing. Or… just use one of them as a skeleton, in which case you don't really need any bash knowledge; it's just search-and-replace on the name.
If your distro has LSB-style init functions, you can use something like this example. That one does a whole lot more than you need to, but it's a good example of all of the details. Or do it all from scratch with something like this example. This one is doing the pidfile management and the backgrounding from the service script (turning a non-daemon program into a daemon), which you don't need if you're using daemon properly, and it's using SIGHUP instead of SIGTERM. You can google yourself for other examples of init.d service scripts.
But again, if you're just trying to do this for your own system, the best thing to do is look inside the /etc/init.d on your distro. There will be dozens of examples there, and 90% of them will be exactly the same except for the name of the daemon.