When is a Python program not responding to interrupts?

I have a Python 3 daemon process running on Linux. It is a normal single-threaded process that runs in the background, calls select.select() in its main loop, and then handles I/O. Sometimes (approximately once or twice a month) it stops responding. When that happens, I'd like to debug the problem.
I have tried pyrasite, but was not successful, because the daemon's stdin/stdout are redirected to /dev/null, and pyrasite uses that stdin/stdout rather than the console it was started from.
So I have added a SIGUSR1 signal handler which logs the stack trace. Works fine normally.
Today I got a freeze. ps shows the daemon is in the "S" (interruptible sleep) state, so a busy loop is ruled out.
The server does not respond to SIGUSR1 or to SIGINT (used for shutdown).
I'd like to have at least some hint what is going on there.
Under what conditions is a sleeping Python3 Linux process not handling interrupts that it is supposed to handle?
UPDATE:
I could reproduce the issue finally. After adding a lot of debug messages, I have found a race condition that I'm going to fix soon.
When the daemon is not responding, it is sleeping in os.read(p), where p is the read end of a new pipe (see os.pipe) that nobody writes to.
However, all my attempts to write a simple demonstration program failed. When I tried to read from an empty pipe, the program blocked as expected, but could be interrupted (killed from another terminal with SIGINT) as usual. The mystery remains unsolved.
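Something like this minimal test (reconstructed from the description above, not the daemon's actual code) blocks on an empty pipe but stays interruptible:
import os

rfd, wfd = os.pipe()
print("blocking on an empty pipe, pid:", os.getpid())
os.read(rfd, 4096)  # blocks forever, yet "kill -INT <pid>" interrupts it as usual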
UPDATE2:
Finally some code! I have deliberately chosen low-level system calls.
import os
import time
import signal
import sys

def sighandler(*unused):
    print("got signal", file=sys.stderr)

print("==========")  # separates individual runs when the script is run repeatedly
signal.signal(signal.SIGUSR1, sighandler)
pid = os.getpid()
rfd, wfd = os.pipe()
if os.fork():
    # parent: keep only the read end and block reading the pipe
    os.close(wfd)
    print("parent: read() start")
    os.read(rfd, 4096)
    print("parent: read() stop")
else:
    # child: keep only the write end, signal the parent, exit after a delay
    os.close(rfd)
    os.kill(pid, signal.SIGUSR1)
    print("child: wait start")
    time.sleep(3)
    print("child: wait end")
If you run this many times, you'll get this:
parent: read() start
got signal
child: wait start
child: wait end
parent: read() stop
which is fine, but sometimes you'll see this:
parent: read() start
child: wait start
child: wait end
got signal
parent: read() stop
What is happening here:
1. The parent starts a read on the pipe.
2. The child sends a signal to the parent. The parent must have received this signal, but it seems to be "somehow postponed".
3. The child waits.
4. The child exits; the pipe is closed automatically.
5. The parent's read operation ends with an EOF.
6. The signal is handled now.
Now, due to a bug in my program, the signal was received in step 2, but the EOF was not delivered, so the read did not finish and step 6 (signal handling) was never reached.
That is all information I am able to provide.

Related

python function not running as thread

This is done in Python 2.7.12.
serialHelper is a class module around Python serial, and this code works nicely:
#!/usr/bin/env python
import threading
from time import sleep
import serialHelper

sh = serialHelper.SerialHelper()

def serialGetter():
    h = 0
    while True:
        h = h + 1
        s_resp = sh.getResponse()
        print('response ' + s_resp)
        sleep(3)

if __name__ == '__main__':
    try:
        t = threading.Thread(target=sh.serialReader)
        t.setDaemon(True)
        t.start()
        serialGetter()
        #tSR = threading.Thread(target=serialGetter)
        #tSR.setDaemon(True)
        #tSR.start()
    except Exception as e:
        print(e)
However, the attempt to run serialGetter as a thread (as in the commented-out lines) just dies.
Any reason why that function cannot run as a thread?
Quoting from the Python documentation:
The entire Python program exits when no alive non-daemon threads are left.
So if you setDaemon(True) every new thread and then exit the main thread (by falling off the end of the script), the whole program will exit immediately. This kills all of the threads. Either don't use setDaemon(True), or don't exit the main thread without first calling join() on all of the threads you want to wait for.
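For instance, a minimal sketch of the join() option (worker is an illustrative placeholder):
import threading

def worker():
    pass  # long-running work goes here

t = threading.Thread(target=worker)
t.start()
# ... the main thread does its own work ...
t.join()  # wait for the worker before the main thread is allowed to exit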
Stepping back for a moment, it may help to think about the intended use case of a daemon thread. In Unix, a daemon is a process that runs in the background and (typically) serves requests or performs operations, either on behalf of remote clients over the network or local processes. The same basic idea applies to daemon threads:
1. You launch the daemon thread with some kind of work queue.
2. When you need some work done on the thread, you hand it a work object.
3. When you want the result of that work, you use an event or a future to wait for it to complete.
4. After requesting some work, you always eventually wait for it to complete, or perhaps cancel it (if your worker protocol supports cancellation).
5. You don't have to clean up the daemon thread at program termination. It just quietly goes away when there are no other threads left.
The problem is step (4). If you forget about some work object, and exit the app without waiting for it to complete, the work may get interrupted. Daemon threads don't gracefully shut down, so you could leave the outside world in an inconsistent state (e.g. an incomplete database transaction, a file that never got closed, etc.). It's often better to use a regular thread, and replace step (5) with an explicit "Finish up your work and shut down" work object that the main thread hands to the worker thread before exiting. The worker thread then recognizes this object, stops waiting on the work queue, and terminates itself once it's no longer doing anything else. This is slightly more up-front work, but is much safer in the event that a work object is inadvertently abandoned.
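A minimal sketch of that pattern (Python 3 shown; the STOP sentinel and work_queue names are illustrative, and in Python 2.7 the module is named Queue):
import queue
import threading

STOP = object()  # sentinel work object meaning "finish up and shut down"
work_queue = queue.Queue()

def worker():
    while True:
        item = work_queue.get()
        if item is STOP:
            break  # recognized the shutdown request; stop waiting on the queue
        print("processing", item)

t = threading.Thread(target=worker)  # a regular, non-daemon thread
t.start()
work_queue.put("job-1")
work_queue.put(STOP)  # hand the worker the shutdown object
t.join()  # wait for it to finish before exiting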
Because of all of the above, I recommend not using daemon threads unless you have a strong reason for them.

How are different signals handled in python

I am using Python 2.7 on Ubuntu. I am curious how different signals are handled in a Python program during its execution. Is there any priority-based selection? For example, if two different signals are generated at the same time, which one is served first? My program below waits for the user to press Ctrl-C; when that happens, it displays "Process can't be killed with ctrl-c key!". Alongside this, it generates a SIGALRM signal every second and prints a "Got an alarm" message every second.
#!/usr/bin/env python
import signal
import time

def ctrlc_catcher(signum, frm):
    print "Process can't be killed with ctrl-c!"

def alarm_catcher(signum, frame):
    print "Got an alarm"

signal.signal(signal.SIGINT, ctrlc_catcher)
signal.signal(signal.SIGALRM, alarm_catcher)

while True:
    signal.alarm(1)
    time.sleep(1)
    pass
Now when I execute the program it produces the following output indefinitely:
Got an alarm
Got an alarm
Got an alarm
Got an alarm
If I hit Ctrl-C once during execution, the output is interrupted as shown below:
Got an alarm
Got an alarm
Got an alarm
Got an alarm
Process can't be killed with ctrl-c
Got an alarm
Everything is working as programmed and as expected.
My question is: if I press Ctrl-C continuously, why is the output as given below?
Process can't be killed with ctrl-c
Process can't be killed with ctrl-c
Process can't be killed with ctrl-c
Why doesn't the output from the alarm also show up here, given that the alarm is triggered every second?
Is the alarm signal (signal.SIGALRM) being ignored because of signal.SIGINT? Or does continuously pressing Ctrl-C pause something?
Thanks
The behavior you see is due to the interaction of two factors:
(1) When you call signal.alarm, you clear any previous alarms; after the call, only the most recently requested alarm is scheduled.
(2) A caught signal terminates time.sleep and causes the sleep to be cut short; it does not resume after the signal handler returns.
Now, when you send SIGINT to your process, it usually arrives during the sleep, which it interrupts. So after your handler ctrlc_catcher returns, the while loop immediately continues to the next iteration, scheduling a new alarm one second from that point and clearing any old alarms. In other words, if SIGINT arrives during an iteration of the loop, that iteration will almost never end up sleeping for a full second, so the next iteration of the loop will execute and clear the already-scheduled alarm before it has a chance to be delivered.
It follows that if you press Ctrl-C more often than once per second, you won't see "Got an alarm" at all.
If you want to guarantee that an alarm is delivered once per second despite any interrupts, you'll have to do some extra work to decide, on each loop iteration, whether you should schedule an alarm.
Perhaps something like this?
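For instance (a sketch of that idea, not code from the original post): re-arm the alarm from inside the alarm handler itself. The main loop then never calls signal.alarm, so an interrupted iteration can no longer clear a pending alarm:
import signal
import time

def ctrlc_catcher(signum, frame):
    print("Process can't be killed with ctrl-c!")

def alarm_catcher(signum, frame):
    print("Got an alarm")
    signal.alarm(1)  # re-arm here; the main loop never touches the alarm

signal.signal(signal.SIGINT, ctrlc_catcher)
signal.signal(signal.SIGALRM, alarm_catcher)

signal.alarm(1)  # schedule the first alarm once, up front
while True:
    time.sleep(1)

With this arrangement "Got an alarm" keeps appearing roughly once per second, no matter how often Ctrl-C is pressed.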

Pausing Python subprocesses from keyboard input without killing the subprocess

I'm working on a project to produce a shell in Python, and one important feature is the ability to pause and background a running subprocess. However the only methods I've found of pausing the subprocess appear to kill it instantly, so I can't resume it later.
Our group has tried excepting KeyboardInterrupt:
try:
    process = subprocess.Popen(processName)
    process.communicate()
except KeyboardInterrupt:
    print "control character pressed"
and also using signals:
def signal_handler(signal, frame):
    print 'control character pressed'

signal.signal(signal.SIGINT, signal_handler)
process.communicate()
Another issue is that both of these only work when Ctrl-C is pressed; nothing else has any effect (I imagine this is why the subprocesses are being killed).
The reason your process is dying is that you are allowing the Ctrl+C to reach the subprocess. If you pass the parameter preexec_fn=os.setpgrp as part of the Popen call, then the child is placed in a different process group from the parent.
Ctrl+C sends a SIGINT to the complete process group, but since the child is in a different process group, it doesn't receive the SIGINT and thus doesn't die.
After that, the send_signal() function can be used to send a SIGSTOP to the child process whenever you want to pause it, and a SIGCONT to resume it.
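A minimal sketch of that approach (the sleep 30 child is just a stand-in for a real command):
import os
import signal
import subprocess

# setpgrp puts the child in its own process group, so a Ctrl+C in the
# shell does not reach it.
process = subprocess.Popen(["sleep", "30"], preexec_fn=os.setpgrp)

process.send_signal(signal.SIGSTOP)  # pause the child
# ... later ...
process.send_signal(signal.SIGCONT)  # resume it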

proper way to stop a daemon process

I have a Jython script that I run as a daemon. It starts up, logs into a server and then goes into a loop that checks for things to process, processes them, then sleeps for 5 seconds.
I have a cron job that checks every 5 minutes to make sure that the process is running and starts it again if not.
I have another cron job that restarts the process once a day no matter what. We do this because the daemon's connection to the server sometimes gets screwed up, and there is no way to tell when this happens.
The problem I have with this "solution" is the second cron job, which kills the process and starts another one. It's okay if the process gets killed while it is sleeping, but bad things might happen if the daemon is killed in the middle of processing things.
What is the proper way to stop a daemon process... instead of just killing it?
Is there a standard practice for this in general, in Python, or in Java?
In the future I may move to pure Python instead of Jython.
Thanks
You can send SIGTERM first, before resorting to SIGKILL, and have the Jython script catch the signal.
For example, send SIGTERM, which your script can receive and handle; if nothing happens within a specified time period, send SIGKILL to force-kill the process.
For more information on handling the events, please see the signal module documentation.
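As an illustration, a rough sketch of that escalation from the supervising side (the stop helper and the grace period are made up for this example):
import os
import signal
import time

def stop(pid, grace_seconds=10):
    os.kill(pid, signal.SIGTERM)  # polite request: let the daemon clean up
    for _ in range(grace_seconds):
        time.sleep(1)
        try:
            os.kill(pid, 0)  # signal 0 only checks whether the pid still exists
        except OSError:
            return  # the daemon exited within the grace period
    os.kill(pid, signal.SIGKILL)  # last resort: force kill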
Also, example that may be handy (uses atexit hook):
#!/usr/bin/env python
from signal import signal, SIGTERM
from sys import exit
import atexit

def cleanup():
    print "Cleanup"

if __name__ == "__main__":
    from time import sleep
    atexit.register(cleanup)
    # Normal exit when killed
    signal(SIGTERM, lambda signum, stack_frame: exit(1))
    sleep(10)
Taken from here.
The normal Linux type way to do this would be to send a signal to your long-running process that's hanging. You can handle this with Python's built in signal library.
http://docs.python.org/library/signal.html
So, you can send a SIGHUP to your first app from your second app, and handle it in the first based on whether you're in a state where it's OK to restart.
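As a sketch of that idea (the flag name is illustrative): the handler only records the request, and the main loop finishes the current work item before exiting:
import signal
import time

shutdown_requested = False

def request_shutdown(signum, frame):
    global shutdown_requested
    shutdown_requested = True  # defer the actual shutdown to the main loop

signal.signal(signal.SIGHUP, request_shutdown)

while not shutdown_requested:
    # ... check for things to process and process them ...
    time.sleep(5)
print("shutting down cleanly")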

Twisted program and TERM signal

I have a simple example:
from twisted.internet import utils, reactor

def test():
    utils.getProcessOutput(executable="/bin/sleep", args=["10000"])

reactor.callWhenRunning(test)
reactor.run()
When I send the TERM signal to the program, "sleep" keeps running; when I press Ctrl-C on the keyboard, "sleep" stops. (Isn't Ctrl-C equivalent to the TERM signal?) Why? And how do I kill "sleep" after sending TERM to this program?
Ctrl-C sends SIGINT to the entire foreground process group. That means it gets sent to your Twisted program and to the sleep child process.
If you want to kill the sleep process whenever the Python process is going to exit, then you may want a before shutdown trigger:
def killSleep():
    # Do it, somehow
    pass

reactor.addSystemEventTrigger('before', 'shutdown', killSleep)
As your example code is written, killSleep is difficult to implement. getProcessOutput doesn't give you something that easily allows the child to be killed (for example, you don't know its pid). If you use reactor.spawnProcess and a custom ProcessProtocol, this problem is solved though - the ProcessProtocol will be connected to a process transport which has a signalProcess method which you can use to send a SIGTERM (or whatever you like) to the child process.
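A rough sketch of that approach (SleepProtocol is illustrative; a real protocol would collect output and handle process exit):
from twisted.internet import error, protocol, reactor

class SleepProtocol(protocol.ProcessProtocol):
    pass

proto = SleepProtocol()
reactor.spawnProcess(proto, "/bin/sleep", args=["sleep", "10000"])

def killSleep():
    try:
        proto.transport.signalProcess("TERM")
    except error.ProcessExitedAlready:
        pass  # the child already exited; nothing to do

reactor.addSystemEventTrigger('before', 'shutdown', killSleep)
reactor.run()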
You could also ignore SIGINT at this point and then manually deliver it to the whole process group:
import os, signal

def killGroup():
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    os.kill(-os.getpgid(os.getpid()), signal.SIGINT)

reactor.addSystemEventTrigger('before', 'shutdown', killGroup)
Ignore SIGINT because the Twisted process is already shutting down and another signal won't do any good (and will probably confuse it or at least lead to spurious errors being reported). Sending a signal to -os.getpgid(os.getpid()) is how to send it to your entire process group.
