I have a Jython script that I run as a daemon. It starts up, logs into a server and then goes into a loop that checks for things to process, processes them, then sleeps for 5 seconds.
I have a cron job that checks every 5 minutes to make sure that the process is running and starts it again if not.
I have another cron job that restarts the process once a day no matter what. We do this because the daemon's connection to the server sometimes gets screwed up, and there is no way to tell when this happens.
The problem I have with this "solution" is the 2nd cron job that kills the process and starts another one. It's okay if the daemon gets killed while it is sleeping, but bad things might happen if it is in the middle of processing things when it is killed.
What is the proper way to stop a daemon process... instead of just killing it?
Is there a standard practice for this in general, in Python, or in Java?
In the future I may move to pure Python instead of Jython.
Thanks
You can send a SIGTERM first, before sending SIGKILL, when terminating the process, and have the Jython script catch the signal.
For example, send a SIGTERM, which can be received and processed by your script; if nothing happens within a specified time period, send SIGKILL and force-kill the process.
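The sending side might look roughly like this (a minimal sketch; the 10-second grace period is an arbitrary choice, not something your setup requires):

import os
import signal
import time

def terminate(pid, grace_period=10):
    os.kill(pid, signal.SIGTERM)  # ask the daemon to shut down cleanly
    deadline = time.time() + grace_period
    while time.time() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only probes; raises OSError once the process is gone
        except OSError:
            return  # the process exited on its own
        time.sleep(0.5)
    os.kill(pid, signal.SIGKILL)  # still alive after the grace period; force-kill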
For more information on handling the events, please see the signal module documentation.
Also, here is an example that may be handy (it uses an atexit hook):
#!/usr/bin/env python
from signal import signal, SIGTERM
from sys import exit
import atexit

def cleanup():
    print "Cleanup"

if __name__ == "__main__":
    from time import sleep
    atexit.register(cleanup)
    # Normal exit when killed, so that the atexit handler runs
    signal(SIGTERM, lambda signum, stack_frame: exit(1))
    sleep(10)
Taken from here.
The normal Linux way to do this would be to send a signal to your long-running process that's hanging. You can handle this with Python's built-in signal library.
http://docs.python.org/library/signal.html
So you can send a SIGHUP to your first app from your second app, and handle it in the first based on whether you're in a state where it's OK to restart.
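A minimal sketch of that pattern (the work function below is a stand-in for your real processing step): the handler only records that a restart was requested, and the main loop acts on it at a safe point, letting your watchdog cron job start a fresh instance.

import signal
import time

restart_requested = False

def on_sighup(signum, frame):
    # Only record the request; never restart mid-task.
    global restart_requested
    restart_requested = True

def do_pending_work():
    pass  # placeholder for the real processing step

signal.signal(signal.SIGHUP, on_sighup)

while True:
    do_pending_work()
    if restart_requested:
        break  # safe point: nothing in flight, exit and let cron restart us
    time.sleep(5)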
Related
I run an application via gunicorn with a single worker.
So I have a main process (let's say, with pid=12). I can also create another process from the main one using multiprocessing (pid=20, whatever).
I also want to provide the possibility to interrupt the child process (pid=20). To achieve this, the main process calls os.kill(20, signal.SIGTERM). A custom handler for the SIGTERM signal is also provided.
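Roughly, the setup looks like this (a simplified sketch of what I described; the worker body is a placeholder):

import os
import signal
import time
from multiprocessing import Process

def on_sigterm(signum, frame):
    # custom handler: clean up, then exit only this process
    raise SystemExit(0)

def child_work():
    signal.signal(signal.SIGTERM, on_sigterm)
    while True:
        time.sleep(1)  # placeholder for the child's real work

if __name__ == '__main__':
    p = Process(target=child_work)  # the child (e.g. pid=20)
    p.start()
    time.sleep(2)
    os.kill(p.pid, signal.SIGTERM)  # from the main process (e.g. pid=12)
    p.join()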
The problem is that if the application catches the signal, the process (pid=20) stops as I expected, but then the main process stops too, and the worker restarts.
The docs (https://docs.gunicorn.org/en/latest/signals.html#master-process) are pretty sparse, or maybe I missed something.
I've been playing around for a while and realized that I can stop the child process via SIGUSR1 or SIGUSR2 without the worker restarting (provided, of course, that my custom handler is registered for those signals).
But I'm not sure whether that is the correct way to stop a process, or whether I can somehow do the same with SIGTERM.
Thanks.
I am writing a Python script which uses multiprocessing, multithreading and ZeroMQ for interprocess communication. It all works fine until the program finishes: at that point the child processes terminate properly (sigwait is intercepted and the child procs terminate, which I have confirmed with the ps command), but the main process often does not shut down. Occasionally it does, but most of the time it does not. I have confirmed that all remaining threads of the main process are daemonic and that the last line of the script is executed properly (it is a logging.info call). I am using fork for forking processes and can see that a ForkProcess still runs in addition to the main process.
What is the best way to debug this, considering that the script has actually finished? Maybe add a pdb or breakpoint() call right at the end?
Thanks in advance.
Here is the output; after the last line the script usually does not terminate:
INFO root::remaining active child processes: [<ForkProcess name='SyncManager-1' pid=6362 parent=6361 started>]
INFO root::non-daemonic threads which are still running, preventing orderly shutdown: [].
INFO root::======== PID: 6361 main() end: shut down completed.=========
EDIT:
I refactored the code and noticed that it now misbehaves very rarely. I am 99.9% certain that it is due to an open ZeroMQ REQ/REP 'socket' at the time of shutdown. The refactoring made sure that these sockets are held open only for a very short time, but it is not predictable which sockets are open at shutdown, so occasionally it still hangs.
I will write a simple test harness with two processes communicating via REQ/REP sockets, then shut down the child process followed by the main process. I expect the same result, i.e., the interpreter not shutting down. Let's see, I'll keep you posted.
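Something along these lines (a rough sketch of the planned harness, assuming pyzmq; the port and messages are arbitrary):

import multiprocessing
import zmq

def child():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind('tcp://127.0.0.1:5555')
    sock.recv()
    sock.send(b'pong')
    # exit deliberately without closing the socket or terminating the context

if __name__ == '__main__':
    p = multiprocessing.Process(target=child)
    p.start()
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect('tcp://127.0.0.1:5555')
    sock.send(b'ping')
    print(sock.recv())
    p.join()
    # main also ends with its socket left open; does the interpreter hang here?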
I think you could try viztracer. The good thing about viztracer is that it can display all the processes on the same timeline. Maybe you can catch what's stopping your main process/forked process from shutting down. If it's a deadlock, it should be noticeable. However, without the code, I really can't tell whether it would help for sure.
Issues
I currently have a simple Python multithreaded server program which will run forever without manual interruption. I want to be able to terminate it gracefully at some point. Once it is terminated, I want the server to output some stats.
Solutions I have tried
Terminate the program with kill. The issue is that the server cannot output the stats, because of the hard termination.
Create a control thread in the program which listens for key input, and if a key is pressed, terminate the program and get the stats. The issue with this approach is that I need to do every step manually: e.g., SSH to the device, start the program, and press a key at some point.
Question
Is there a way I can run some bash or other program to stop the server gracefully, with the stats output?
Have you tried using signal.signal() to register a handler for, e.g., SIGTERM? There you could implement the part of the code that writes out the statistics and then terminates the program.
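A minimal sketch of that idea (the stats dictionary stands in for your server's real counters):

import signal
import sys

stats = {'requests': 0}  # stand-in for the server's real counters

def on_sigterm(signum, frame):
    print('requests served: %d' % stats['requests'])
    sys.exit(0)  # exits only after the stats have been written

signal.signal(signal.SIGTERM, on_sigterm)

Then kill <pid> from bash (or from another script or cron job) triggers the handler instead of hard-killing the server.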
The standard approach is to either
- make threads sufficiently short-lived
- at the stop signal, stop spawning new ones and .join() the active ones
or
- make threads periodically (e.g. after serving each request) check some shared stop flag and quit when it's set (see the sketch below)
- at the stop signal, set the stop flag, then .join() the threads
Some threads can be .setDaemon(True), but only if they can be safely killed off (there's no exception or anything raised in the thread, it's just stopped where it is).
If a thread is in a blocking call, it may be possible to unblock it by shutting down the facility that it is waiting on (close the socket or the stream).
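A sketch of the stop-flag approach, using a shared threading.Event (the per-request function is a placeholder):

import threading
import time

stop = threading.Event()  # the shared stop flag

def serve_one_request():
    time.sleep(0.1)  # placeholder for one bounded unit of work

def worker():
    while not stop.is_set():
        serve_one_request()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

# at the stop signal:
stop.set()
for t in threads:
    t.join()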
This is done in Python 2.7.12.
serialHelper is a class module around Python serial, and this code does work nicely:
#!/usr/bin/env python
import threading
from time import sleep
import serialHelper

sh = serialHelper.SerialHelper()

def serialGetter():
    h = 0
    while True:
        h = h + 1
        s_resp = sh.getResponse()
        print ('response ' + s_resp)
        sleep(3)

if __name__ == '__main__':
    try:
        # background thread that reads from the serial port
        t = threading.Thread(target=sh.serialReader)
        t.setDaemon(True)
        t.start()
        serialGetter()
        # the variant below is the one that dies:
        #tSR = threading.Thread(target=serialGetter)
        #tSR.setDaemon(True)
        #tSR.start()
    except Exception as e:
        print (e)
However, the attempt to run serialGetter as a thread, as in the commented-out lines, just dies.
Is there any reason why that function cannot run as a thread?
Quoting from the Python documentation:
The entire Python program exits when no alive non-daemon threads are left.
So if you setDaemon(True) every new thread and then exit the main thread (by falling off the end of the script), the whole program will exit immediately. This kills all of the threads. Either don't use setDaemon(True), or don't exit the main thread without first calling join() on all of the threads you want to wait for.
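In other words (a minimal sketch of the join() option):

import threading
import time

def work():
    time.sleep(1)  # placeholder for the thread's real job

t = threading.Thread(target=work)
t.setDaemon(True)  # daemonic: it would be killed the moment the main thread exits
t.start()
t.join()  # so wait explicitly; now the work is guaranteed to finish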
Stepping back for a moment, it may help to think about the intended use case of a daemon thread. In Unix, a daemon is a process that runs in the background and (typically) serves requests or performs operations, either on behalf of remote clients over the network or local processes. The same basic idea applies to daemon threads:
1. You launch the daemon thread with some kind of work queue.
2. When you need some work done on the thread, you hand it a work object.
3. When you want the result of that work, you use an event or a future to wait for it to complete.
4. After requesting some work, you always eventually wait for it to complete, or perhaps cancel it (if your worker protocol supports cancellation).
5. You don't have to clean up the daemon thread at program termination. It just quietly goes away when there are no other threads left.
The problem is step (4). If you forget about some work object, and exit the app without waiting for it to complete, the work may get interrupted. Daemon threads don't gracefully shut down, so you could leave the outside world in an inconsistent state (e.g. an incomplete database transaction, a file that never got closed, etc.). It's often better to use a regular thread, and replace step (5) with an explicit "Finish up your work and shut down" work object that the main thread hands to the worker thread before exiting. The worker thread then recognizes this object, stops waiting on the work queue, and terminates itself once it's no longer doing anything else. This is slightly more up-front work, but is much safer in the event that a work object is inadvertently abandoned.
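A sketch of that "finish up and shut down" pattern, using a queue with a sentinel object (the names and the work body are illustrative):

import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

SHUTDOWN = object()  # the explicit "finish up and shut down" work object
work_queue = queue.Queue()

def handle(item):
    print('processing %r' % (item,))  # placeholder for real work

def worker():
    while True:
        item = work_queue.get()
        if item is SHUTDOWN:
            break  # stop waiting on the queue and terminate
        handle(item)

t = threading.Thread(target=worker)  # a regular, non-daemon thread
t.start()

work_queue.put('some work')
work_queue.put(SHUTDOWN)  # the main thread hands over the shutdown object before exiting
t.join()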
Because of all of the above, I recommend not using daemon threads unless you have a strong reason for them.
I have a simple example:
from twisted.internet import utils, reactor

def test():
    utils.getProcessOutput(executable="/bin/sleep", args=["10000"])

reactor.callWhenRunning(test)
reactor.run()
When I send the TERM signal to the program, sleep keeps running; when I press Ctrl-C on the keyboard, sleep stops. (Is Ctrl-C not equivalent to the TERM signal?) Why? How can I kill sleep after sending TERM to this program?
Ctrl-C sends SIGINT to the entire foreground process group. That means it gets sent to your Twisted program and to the sleep child process.
If you want to kill the sleep process whenever the Python process is going to exit, then you may want a before shutdown trigger:
def killSleep():
    # Do it, somehow
    pass

reactor.addSystemEventTrigger('before', 'shutdown', killSleep)
As your example code is written, killSleep is difficult to implement. getProcessOutput doesn't give you something that easily allows the child to be killed (for example, you don't know its pid). If you use reactor.spawnProcess and a custom ProcessProtocol, this problem is solved though - the ProcessProtocol will be connected to a process transport which has a signalProcess method which you can use to send a SIGTERM (or whatever you like) to the child process.
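For example, a rough sketch of that approach (assuming the default reactor; ProcessExitedAlready is raised if the child is already gone):

from twisted.internet import reactor
from twisted.internet.error import ProcessExitedAlready
from twisted.internet.protocol import ProcessProtocol

class SleepProtocol(ProcessProtocol):
    pass  # override outReceived etc. here if you need the child's output

proto = SleepProtocol()
reactor.spawnProcess(proto, "/bin/sleep", ["sleep", "10000"])

def killSleep():
    try:
        proto.transport.signalProcess("TERM")
    except ProcessExitedAlready:
        pass  # child already exited; nothing to do

reactor.addSystemEventTrigger('before', 'shutdown', killSleep)
reactor.run()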
You could also ignore SIGINT at this point and then manually deliver it to the whole process group:
import os, signal

def killGroup():
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    os.kill(-os.getpgid(os.getpid()), signal.SIGINT)

reactor.addSystemEventTrigger('before', 'shutdown', killGroup)
SIGINT is ignored because the Twisted process is already shutting down, and another signal won't do any good (it would probably confuse it, or at least lead to spurious errors being reported). Sending a signal to -os.getpgid(os.getpid()) is how you send it to your entire process group.