Problem: When I execute a Python script from the command line, it catches and handles SIGTERM signals as expected. However, if the script is called from a bash script, and the bash script then sends the signal to the Python script, it does not handle the SIGTERM signal as expected.
The python script in question is extremely simple: it waits for a SIGTERM and then waits for a few seconds before exiting.
#!/usr/bin/env python3
import sys
import signal
import time

# signal handler (the first parameter is named signum so that it does not
# shadow the signal module)
def sigterm_handler(signum, frame):
    time.sleep(5)
    print("dying")
    sys.exit()

# register the signal handler
signal.signal(signal.SIGTERM, sigterm_handler)

while True:
    time.sleep(1)
If this is called directly and the signal is then sent from the command line,
i.e.
> ./sigterm_tester.py &
> kill -15 <PID>
the signal handling performs normally (it waits 5 seconds, posts "dying" to stdout, and exits)
However, if it is instead called from a bash script, it no longer seems to catch the SIGTERM and instead exits immediately.
This simple bash script executes the python script and then kills its child (the python script). However, the termination occurs immediately instead of after a 5 second delay, and there is no printing of "dying" to stdout (or to a file when I attempted stdout redirection).
#!/bin/bash
./sigterm_tester.py &
child=$(pgrep -P $$)
kill -15 $child
while true;
do
    sleep 1
done
Some additional information: I have also tested this with sh as well as bash and the same behavior occurs. Additionally I have tested this and gotten the same behavior in a MacOS environment as well as a Linux environment. I also tested it with both python2 and python3.
My question is why is the behavior different seemingly dependent on how the program is called, and is there a way to ensure that the python program appropriately handles signals even when called from a bash script?
Summing up @Matt Walck's comments: in the bash script you were killing the Python process right after invoking it, so it might not have had enough time to register its SIGTERM handler. Adding a sleep command between the spawn and the kill command backs the theory up.
#!/bin/bash
./sigterm_tester.py &
child=$(pgrep -P $$)
#DEBUGONLY
sleep 2
kill -15 $child
while true;
do
    sleep 1
done
Related
I have a process which for certain reasons, I must call with the following (please don't judge...)
process = subprocess.Popen("some_command &", shell=True, executable='/bin/bash')
some_command is supposed to terminate by itself when some external conditions are met.
How can I check when some_command has terminated?
process.poll()
always returns 0
A simple script to demonstrate my situation:
import subprocess
process = subprocess.Popen("sleep 5 &", shell=True, executable='/bin/bash')
while True:
    print(process.poll())
some_command & tells bash to run some_command in the background. This means that your shell launches some_command, then promptly exits, severing the tie between the running some_command and your Python process (some_command's parent process no longer exists after all). poll() is accurately reporting that bash itself finished running, exiting with status 0; it has no idea what may or may not be happening with some_command; that's bash's problem (and bash didn't care either).
If you want to be able to poll to check if some_command is still running, don't background it via bash shell metacharacters; without &, bash will continue running until some_command finishes, so whether bash itself is still running gives you an indirect indication of whether some_command has finished. It's still in the background (in the sense that it runs in parallel with your Python code; the Python process won't stall waiting on it or anything unless you explicitly wait or communicate with process):
process = subprocess.Popen("some_command", shell=True, executable='/bin/bash')
Of course, unless some_command is some bash builtin, bash is just getting in the way here; as noted subprocess.Popen always runs stuff in the background unless you explicitly ask for it to wait, so you didn't need bash's help to background anything:
process = subprocess.Popen(["some_command"])
would get similar behavior, and actually let you examine the return code from some_command directly, with no intermediary bash process involved.
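As a small illustration of that last form, here is a sketch that polls a directly launched command until it finishes. The command sleep 1 is only a stand-in for some_command:

```python
import subprocess
import time

# "sleep 1" stands in for some_command; no shell is involved.
process = subprocess.Popen(["sleep", "1"])

polls = 0
# poll() returns None while the command is still running.
while process.poll() is None:
    polls += 1
    time.sleep(0.2)

# Once poll() returns a value, it is the command's own exit status.
exit_status = process.returncode
print("finished with status", exit_status)
```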
When running Python from a Linux shell (same behavior observed in both bash and ksh), and generating a SIGINT with a Ctl-C keypress, I have discovered behavior that I am unable to understand, and which has frustrated me considerably.
When I press Ctl-C, the Python process appropriately terminates, but the shell continues to the next command on the line, as shown in the following console capture:
$ python -c "import time; time.sleep(100)"; echo END
^CTraceback (most recent call last):
  File "<string>", line 1, in <module>
KeyboardInterrupt
END
In contrast, I had expected, and would like, that the shell processes the signal in such a way that execution does not continue to the next command on the line, as I see when I call the sleep function from a bash subshell instead of from Python.
For example, I would expect the above capture to appear more similar to the following:
$ bash -c "sleep 100"; echo END
^C
Python 2 and 3 are installed on my system, and while the above capture was generated running Python 2, both behave the same way.
My best explanation is that when I press Ctl-C while the Python process is running, the signal somehow goes directly to the Python process, whereas normally it is handled by the calling shell, then propagated to the subprocess. However, I have no idea why or how Python is causing this difference.
The examples above are trivial tests but the behavior is also observed in real-world uses. Installing custom signal handlers does not resolve the issue.
After considerable digging I found a few loosely related questions on Stack Overflow that eventually led me to an article describing the proper handling of SIGINT. (The most relevant section is How to be a proper program.)
From this information, I was able to solve the problem. Without it, I would never have come close.
The solution is best illustrated by beginning with a Bash script that cannot be terminated by a keyboard interrupt, but which does hide the ugly stack trace from Python's KeyboardInterrupt exception.
A basic example might appear as follows:
#!/usr/bin/env bash
echo "Press Ctrl-C to stop... No sorry it won't work."
while true
do
    python -c '
import time, signal
signal.signal(signal.SIGINT, signal.SIG_IGN)
time.sleep(100)
'
done
For the outer script to process the interrupt, the following change is required:
echo "Press Ctrl-C to stop..."
while true
do
    python -c '
import time, signal, os
signal.signal(signal.SIGINT, signal.SIG_DFL)
time.sleep(100)
'
done
However, the solution makes it impossible to use a custom handler (for example, to perform cleanup). If doing so is required, then a more sophisticated approach is needed.
The required change is illustrated as follows:
#!/usr/bin/env bash
echo "Press [CTRL+C] to stop ..."
while true
do
    python -c '
import time, sys, signal, os
def handle_int(signum, frame):
    # Cleanup code here
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)
signal.signal(signal.SIGINT, handle_int)
time.sleep(100)
'
done
The reason appears to be that unless the inner process terminates through executing the default SIGINT handler provided by the system, the parent bash process does not realize that the child has terminated because of a keyboard interrupt, and does not itself terminate.
I have not fully understood all the ancillary issues quite yet, such as whether the parent process is not receiving the SIGINT from the system, or is receiving a signal, but ignoring it. I also have no idea what the default handler does or how the parent detects that it was called. If I am able to learn more, I will offer an update.
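One way to see what a parent can observe is to compare the exit status of two children: one that restores the default action and re-sends SIGINT to itself, and one that catches SIGINT and exits normally. The sketch below is only an illustration with made-up child programs. It relies on the documented subprocess behavior that, on POSIX, a child killed by signal N is reported with return code -N:

```python
import signal
import subprocess
import sys

# Child that restores the default action and re-sends SIGINT to itself,
# as in the handle_int example above.
RERAISE = """
import os, signal, time
def handle_int(signum, frame):
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)
signal.signal(signal.SIGINT, handle_int)
os.kill(os.getpid(), signal.SIGINT)
time.sleep(1)  # gives older interpreters a chance to run the handler
"""

# Child that catches SIGINT and exits through a normal exit call.
PLAIN_EXIT = """
import os, signal, sys, time
signal.signal(signal.SIGINT, lambda signum, frame: sys.exit(1))
os.kill(os.getpid(), signal.SIGINT)
time.sleep(1)
"""

def run(source):
    # Returns the child's exit status as seen by its parent.
    return subprocess.run([sys.executable, "-c", source]).returncode

reraise_status = run(RERAISE)
exit_status = run(PLAIN_EXIT)
print(reraise_status, exit_status)
```

The first child is reported as terminated by SIGINT, while the second looks like an ordinary failing program. A shell sees the same distinction in the wait status of its child, which is consistent with the behavior described above.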
I must advance the question of whether the current behavior of Python should be considered a design flaw in Python. I have seen various manifestations of this issue over the years when calling Python from a shell script, but have not had the luxury of investigation until now. I have not found a single article through a web search, however, on the topic. If the issue does represent a flaw, it surprised me to observe that not many developers are affected.
The behavior of any program that gets a CTRL+C is up to that program. Usually the behavior is to exit, but some programs might just abort some internal procedure instead of stopping the whole program. It's even possible (though it may be considered bad manners) for a program to ignore the keystroke completely.
The behavior of the program is defined by the signal handlers it has set up. The operating system defines a default action for each signal (for instance, terminating the process on SIGTERM and SIGINT), but a program can install its own handlers that will run instead. Python itself installs a SIGINT handler at startup that raises KeyboardInterrupt, which is where the traceback comes from. Not all signals allow arbitrary responses. For instance, a handler for SIGSEGV (a seg-fault) cannot usefully resume normal execution, so the program effectively has to exit, though it can be configured to make a core dump or not. SIGKILL can't be handled at all (the OS kernel takes care of it).
To customize signal handlers in Python, you'll want to use the signal module from the standard library. You can call signal.signal to set your own signal handler function for any of the signals defined by your system's C library. Typing CTRL+C is going to send SIGINT on any UNIX-based system, so that's probably what you'll want to handle if you want your own behavior.
Try something like this:
import signal
import sys
import time

def interrupt_handler(sig, frame):
    sys.exit(1)

signal.signal(signal.SIGINT, interrupt_handler)
time.sleep(100)
If you run this script and interrupt it with CTRL+C, it should exit silently, just like your bash script does.
You could explicitly handle it on the bash side in a script file like this:
if python -c "import time; time.sleep(100)"; then
    echo END
fi
or, more aggressively,
python -c "import time; time.sleep(100)"
[[ $? -ne 0 ]] && exit
echo END
$? is the return status code of the previous command, where a status code of 0 means it exited fine and anything else means an error. So, we use the short-circuit nature of && to succinctly exit if the previous command fails.
(See https://unix.stackexchange.com/questions/186826/parent-script-continues-when-child-exits-with-non-zero-exit-code for more info on that)
Note: this will exit the bash script for any kind of python failure, not just ctrl+c, e.g. IndexError, AssertionError, etc
Using python3/linux/bash:
gnr@localhost: cat my_script
#!/usr/bin/python3
import time, pexpect
p = pexpect.spawn('sleep 123')
p.sendintr()
time.sleep(1000)
This works fine when run as is (i.e. my_script starts a sleep 123 child process and then sends it a SIGINT which kills sleep 123). However, when I background my_script as a grandchild process, it no longer is able to kill the sleep 123 command:
gnr@localhost: (my_script &> /dev/null &)
Anyone know what's going on here/how to change my_script or pexpect to be able to still send SIGINT to its child process?
I'm thinking this has something to do with the backgrounding causing there to be no controlling terminal, and maybe I need to create a new pty?
Update: Never figured out how to create a pty (though ssh'ing into localhost with a -t option worked) - ended up doing an os.fork() to background a child process rather than the (my_script &> /dev/null &) which works because (I'm guessing) the controlling terminal is not immediately closed.
Are you sure the process isn't being killed? I would expect it to show <defunct> in the process list as the process that spawned is now sitting in a sleep and proper cleanup can't complete until sleep finishes. <defunct> processes have been killed, just their parents haven't done the cleanup.
If you can somehow modify your code so that the parent actually goes through the normal processing and shuts down the child (spawn) then it should work. Although clumsy this might work:
import time, pexpect, os
newpid = os.fork()
if newpid == 0:
    # Child
    p = pexpect.spawn('sleep 123')
    p.sendintr()
else:
    # parent
    time.sleep(1000)
In this case we fork our own child, which handles the spawn and does the kill. Since our child isn't blocking on its own sleep, it exits gracefully, which includes properly cleaning up the process it killed. In the meantime, the main (parent) process is waiting on a sleep.
After your comment it occurred to me that although I was placing my script in the background at the bash prompt, I wasn't doing it the same as yours.
I was using
(expecttest.py > /dev/null 2>&1 &)
This redirects stdout and stderr to /dev/null and puts the process in the background.
If I take your original code and, rather than doing a sendintr, do a terminate instead, it works with your invocation from the command shell. It seems that sleep 123 doesn't respond to what pexpect is doing in that case.
I wrote a signal handler that can restart the script itself, triggered by:
kill -10 $PID
I register the handler at the beginning of the script:
signal.signal(signal.SIGUSR1, restart_handler)
My script mainly does the following:
download some source code.
unzip it and do other setup.
call os.system("bash -c 'make -j16 > log.txt'")
While it is downloading the source code, I can use kill -10 to restart it,
and the handler runs quickly, as I expected.
However, once it starts make -j16, the same kill command
takes a very long time to reach the signal handler.
(It looks like the signal is not handled immediately,
but if I use kill -9 $PID the script is killed immediately.)
How can I make my custom signal handler react as quickly as kill -9?
The picture below is the pstree output during make -j16:
https://www.dropbox.com/s/rbfzn0p0f2p55xx/make.png
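No answer to this question is included here. One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that Python runs its signal handlers only in the main thread between bytecode instructions, and os.system() blocks inside a C library call until the command finishes, so the handler is deferred until make returns. Under that assumption, a possible workaround is to launch the build with subprocess.Popen and wait on it from Python, since that wait can be interrupted to run the handler. In the sketch below, sleep 3 stands in for the build, and a helper shell simulates the external kill:

```python
import os
import signal
import subprocess
import time

handled_at = []

def restart_handler(signum, frame):
    # Placeholder: the real handler would restart the script here.
    handled_at.append(time.monotonic())

signal.signal(signal.SIGUSR1, restart_handler)

start = time.monotonic()

# Stand-in for the long build step (the original runs make -j16).
build = subprocess.Popen(["sleep", "3"])

# Simulates an external "kill -USR1 <pid>" arriving one second into the build.
sender = subprocess.Popen(["sh", "-c", "sleep 1; kill -USR1 %d" % os.getpid()])

# The wait is interrupted by the signal, the handler runs, then the wait resumes.
build.wait()
sender.wait()

delay = handled_at[0] - start
print("handler ran %.1f s after start" % delay)
```

A real restart handler would probably also need to terminate the running build (for example with build.terminate()) before restarting, which this sketch leaves out.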
When using mpirun, is it possible to catch signals (for example, the SIGINT generated by ^C) in the code being run?
For example, I'm running a parallelized python code. I can except KeyboardInterrupt to catch those errors when running python blah.py by itself, but I can't when doing mpirun -np 1 python blah.py.
Does anyone have a suggestion? Even finding how to catch signals in a C or C++ compiled program would be a helpful start.
If I send a signal to the spawned Python processes, they can handle the signals properly; however, signals sent to the parent orterun process (i.e. from exceeding wall time on a cluster, or pressing control-C in a terminal) will kill everything immediately.
I think it is really implementation dependent.
In SLURM, I tried using sbatch --signal USR1@30 to send SIGUSR1 (whose signal number is 30, 10, or 16, depending on the platform) to the program launched by srun commands, and the process received SIGUSR1 = 10.
For platform MPI of IBM, according to https://www.ibm.com/support/knowledgecenter/en/SSF4ZA_9.1.4/pmpi_guide/signal_propagation.html
SIGINT, SIGUSR1, and SIGUSR2 will be propagated to the processes.
In MPICH, SIGUSR1 is used by the process manager for internal notification of abnormal failures.
ref: http://lists.mpich.org/pipermail/discuss/2014-October/003242.html
Open MPI, on the other hand, will forward SIGUSR1 and SIGUSR2 from mpiexec to the other processes.
ref: http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect14
For IntelMPI, according to https://software.intel.com/en-us/mpi-developer-reference-linux-hydra-environment-variables
I_MPI_JOB_SIGNAL_PROPAGATION and I_MPI_JOB_TIMEOUT_SIGNAL can be set to send signals.
Another thing worth noting: many Python scripts invoke other libraries or code through Cython, and if SIGUSR1 is caught by such a sub-process, something unwanted might happen.
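Whichever signals a given launcher forwards, the Python side can keep its handler small and let the main loop decide when to stop. The following is a minimal sketch, assuming the launcher delivers SIGINT, SIGUSR1, or SIGUSR2 to the Python process. The names run and fake_step are hypothetical, and the signal is simulated by the process sending it to itself:

```python
import os
import signal

stop_requested = False

def request_stop(signum, frame):
    # Only record the request; the main loop decides when it is safe to stop.
    global stop_requested
    stop_requested = True

for sig in (signal.SIGINT, signal.SIGUSR1, signal.SIGUSR2):
    signal.signal(sig, request_stop)

def run(max_steps, do_step):
    """Call do_step(i) until max_steps is reached or a stop was requested."""
    steps = 0
    while steps < max_steps and not stop_requested:
        do_step(steps)
        steps += 1
    return steps

def fake_step(i):
    # Hypothetical unit of work; at step 2 it simulates a forwarded SIGUSR1.
    if i == 2:
        os.kill(os.getpid(), signal.SIGUSR1)

steps_done = run(10, fake_step)
print("stopped after", steps_done, "steps")
```

Because the handler only sets a flag, any cleanup (closing files, writing a checkpoint) can happen in ordinary code after run returns.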
If you use mpirun --nw, then mpirun itself should terminate as soon as it's started the subprocesses, instead of waiting for their termination; if that's acceptable then I believe your processes would be able to catch their own signals.
The signal module supports setting signal handlers using signal.signal:
Set the handler for signal signalnum to the function handler. handler can be a callable Python object taking two arguments (see below), or one of the special values signal.SIG_IGN or signal.SIG_DFL. The previous signal handler will be returned ...
import signal

def ignore(sig, stack):
    print("I'm ignoring signal %d" % (sig, ))

signal.signal(signal.SIGINT, ignore)

while True: pass
If you send a SIGINT to a Python interpreter running this script (via kill -INT <pid>), it will print a message and simply continue to run.