I wrote a signal handler that restarts the script itself when it receives:
kill -10 $PID
I register the handler at the beginning of the script:
signal.signal(signal.SIGUSR1, restart_handler)
My script mainly does the following:
download some source code,
unzip it and do other setup,
run os.system("bash -c 'make -j16 > log.txt'").
While it is downloading the source code, I use kill -10 to restart it,
and the handler runs quickly, as I expected.
However, once it starts make -j16, the same kill command
takes a very long time to reach the signal handler.
(It looks like the signal is not handled immediately,
but kill -9 $PID kills the process immediately.)
How can I make my custom signal handler act as quickly as kill -9?
The picture below is the pstree output during make -j16:
https://www.dropbox.com/s/rbfzn0p0f2p55xx/make.png
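A likely explanation for the delay: Python-level signal handlers only run when the interpreter regains control in the main thread, and os.system() blocks inside C code until make has finished. A minimal sketch of one workaround, assuming it is acceptable to start the build with subprocess.Popen and poll it from Python (the function name run_with_prompt_signals and the choice to stop the build with SIGTERM are illustrative, not from the original script):

```python
import os
import signal
import subprocess
import time


def run_with_prompt_signals(cmd, handler_signal=signal.SIGUSR1):
    """Run cmd, but react promptly if handler_signal arrives meanwhile.

    Returns (interrupted, returncode). Must be called from the main thread,
    because signal.signal() only works there.
    """
    received = []

    def handler(signum, frame):
        # Keep the handler tiny: just record that the signal arrived.
        received.append(signum)

    previous = signal.signal(handler_signal, handler)
    # start_new_session=True puts the child (and e.g. make's own children)
    # in a separate process group that can be signalled as a whole.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        # Polling from Python gives the interpreter regular chances
        # to run the handler, unlike a blocking os.system() call.
        while proc.poll() is None and not received:
            time.sleep(0.1)
        if received and proc.poll() is None:
            os.killpg(proc.pid, signal.SIGTERM)  # stop the whole build
        proc.wait()
    finally:
        signal.signal(handler_signal, previous)
    return bool(received), proc.returncode
```

After the function returns with interrupted set to True, the script can perform its restart logic from ordinary code rather than from inside the handler.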
Related
Problem: When a Python script is executed from the command line, it catches and handles SIGTERM as expected. However, if the script is started by a bash script, and the bash script then sends the signal to the Python script, it does not handle SIGTERM as expected.
The python script in question is extremely simple: it waits for a SIGTERM and then waits for a few seconds before exiting.
#!/usr/bin/env python3
import sys
import signal
import time

# signal handler
def sigterm_handler(signum, frame):
    time.sleep(5)
    print("dying")
    sys.exit()

# register the signal handler
signal.signal(signal.SIGTERM, sigterm_handler)

while True:
    time.sleep(1)
If this is called directly and the signal is then sent from the command line, i.e.
> ./sigterm_tester.py &
> kill -15 <PID>
the signal handling works normally (it waits 5 seconds, prints "dying" to stdout, and exits).
However, if it is instead called from a bash script, it no longer seems to catch the SIGTERM and instead exits immediately.
This simple bash script executes the python script and then kills its child (the python script). However, the termination occurs immediately instead of after a 5 second delay, and there is no printing of "dying" to stdout (or to a file when I attempted stdout redirection).
#!/bin/bash
./sigterm_tester.py &
child=$(pgrep -P $$)
kill -15 $child
while true; do
    sleep 1
done
Some additional information: I have also tested this with sh as well as bash and the same behavior occurs. Additionally I have tested this and gotten the same behavior in a MacOS environment as well as a Linux environment. I also tested it with both python2 and python3.
My question is: why does the behavior differ depending on how the program is called, and is there a way to ensure that the Python program handles signals appropriately even when called from a bash script?
Summing up @Matt Walck's comments: in the bash script you were killing the Python process right after invoking it, so it might not have had enough time to register the SIGTERM handler. Adding a sleep between the spawn and the kill backs the theory up.
#!/bin/bash
./sigterm_tester.py &
child=$(pgrep -P $$)
#DEBUGONLY
sleep 2
kill -15 $child
while true; do
    sleep 1
done
Using python3/linux/bash:
gnr@localhost: cat my_script
#!/usr/bin/python3
import time, pexpect
p = pexpect.spawn('sleep 123')
p.sendintr()
time.sleep(1000)
This works fine when run as is (i.e. my_script starts a sleep 123 child process and then sends it a SIGINT which kills sleep 123). However, when I background my_script as a grandchild process, it no longer is able to kill the sleep 123 command:
gnr@localhost: (my_script &> /dev/null &)
Does anyone know what's going on here, or how to change my_script or pexpect so that it can still send SIGINT to its child process?
I'm thinking this has something to do with the backgrounding leaving the script without a controlling terminal, and maybe I need to create a new pty?
Update: Never figured out how to create a pty (though ssh'ing into localhost with a -t option worked) - ended up doing an os.fork() to background a child process rather than the (my_script &> /dev/null &) which works because (I'm guessing) the controlling terminal is not immediately closed.
Are you sure the process isn't being killed? I would expect it to show <defunct> in the process list as the process that spawned is now sitting in a sleep and proper cleanup can't complete until sleep finishes. <defunct> processes have been killed, just their parents haven't done the cleanup.
If you can somehow modify your code so that the parent actually goes through the normal processing and shuts down the child (spawn) then it should work. Although clumsy this might work:
import time, pexpect, os

newpid = os.fork()
if newpid == 0:
    # Child: spawn the process and send it the interrupt
    p = pexpect.spawn('sleep 123')
    p.sendintr()
else:
    # Parent
    time.sleep(1000)
In this case we fork our own child, which handles the spawn and does the kill. Since our child isn't blocking on its own sleep, it exits gracefully, which includes properly cleaning up the process it killed. In the meantime the main (parent) process is waiting on a sleep.
After your comment it occurred to me that although I was placing my script in the background at the bash prompt, I wasn't doing it the same as yours.
I was using
(expecttest.py > /dev/null 2>&1 &)
This redirects stdout and stderr to /dev/null and puts the process in the background.
If I take your original code and, instead of calling sendintr, call terminate, it works with your invocation from the command shell. It seems that sleep 123 doesn't respond to what pexpect is doing in that case.
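One possible explanation, which the thread does not confirm: a shell that backgrounds a command without job control may start it with SIGINT set to "ignored", and an ignored disposition is inherited across fork and exec. A process in that state would shrug off an interrupt but still die on SIGTERM. The sketch below reproduces that difference with the standard library only; the helper name sigint_stops_child is hypothetical, and it uses subprocess rather than pexpect:

```python
import signal
import subprocess


def sigint_stops_child(child_ignores_sigint):
    """Start `sleep 30`, send it SIGINT, and report whether it exited."""

    def set_disposition():
        # Runs in the child between fork and exec. An ignored signal
        # stays ignored in the program that is exec'd afterwards.
        action = signal.SIG_IGN if child_ignores_sigint else signal.SIG_DFL
        signal.signal(signal.SIGINT, action)

    proc = subprocess.Popen(["sleep", "30"], preexec_fn=set_disposition)
    proc.send_signal(signal.SIGINT)
    try:
        proc.wait(timeout=1)
        return True  # the interrupt terminated the child
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM is not ignored, so this works
        proc.wait()
        return False
```

If this is what happens in the backgrounded case, explicitly resetting SIGINT to signal.SIG_DFL before spawning the child would be one thing to try.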
I have a python script that constantly runs (it has an infinite loop), but I want it to be able to still accept input while running. It will run in the background and then at any time I want to be able to type
scriptname stop
and stop it (or something like that). That way it can call a shutdown method to save information and quit.
Currently it runs in the foreground in the terminal, and can't be stopped by a keyboard interrupt, so the only way to kill it is to close the terminal or kill python.
How can I do something like this?
Use supervisord. It exists to manage processes, and provides a command interface to start and stop them.
When supervisor stops a process, it sends SIGTERM (or any other signal you choose). So, to shut down cleanly, you need to handle that signal.
See this question on how to handle SIGTERM: Python - Trap all signals
Processes can still listen on their own pipes for input, and send output that way.
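A minimal sketch of a long-running loop that shuts down cleanly on SIGTERM, assuming the save-and-quit logic lives in a shutdown method (the Service class and its attribute names are illustrative, not from the question):

```python
import signal
import time


class Service:
    """Loop until SIGTERM arrives, then run the shutdown step."""

    def __init__(self):
        self.stop_requested = False
        self.saved = False

    def request_stop(self, signum, frame):
        # Only set a flag here; the real work happens in the main loop.
        self.stop_requested = True

    def shutdown(self):
        # Placeholder for saving information before quitting.
        self.saved = True

    def run(self, poll_interval=0.1):
        # Must be called from the main thread for signal.signal() to work.
        signal.signal(signal.SIGTERM, self.request_stop)
        while not self.stop_requested:
            time.sleep(poll_interval)  # stand-in for one unit of work
        self.shutdown()
```

Setting a flag and leaving the loop normally keeps the shutdown code out of the signal handler, where it could otherwise interrupt the work at an arbitrary point.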
If you are on Windows, this is straightforward.
Just rename your file from script.py to script.pyw and use it normally.
Your script will run in the background.
To close the script:
Go to Task Manager, click on the Processes tab, look for python, and choose End Task.
If you need more information, I am ready to provide it.
I am not sure about Linux or Ubuntu.
Thanks.
Well, I have a SIGUSR1 handler in a script. When I send SIGUSR1 to my script from outside, my handler does its work, but the signal also spreads to the child that I create via Popen. How can I prevent this?
The rsync manual page says that exit code 20 means:
Received SIGUSR1 or SIGINT
So if you are killing it with kill (not kill -15 which you say you sometimes use) then it would die with this exit code too.
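On the question of a signal also reaching a child started with Popen: a signal sent to a single PID is not forwarded automatically, so the child most likely receives it because the signal was sent to the whole process group. Under that assumption, one sketch of a fix is to start the child in its own session, and therefore its own process group (the helper name spawn_detached is hypothetical):

```python
import os
import subprocess


def spawn_detached(cmd):
    """Start cmd in a new session, outside the parent's process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # so signals aimed at the parent's process group no longer reach it.
    return subprocess.Popen(cmd, start_new_session=True)
```

The parent can still signal the child directly through the returned Popen object, for example with terminate() or send_signal().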
When using mpirun, is it possible to catch signals (for example, the SIGINT generated by ^C) in the code being run?
For example, I'm running a parallelized python code. I can except KeyboardInterrupt to catch those errors when running python blah.py by itself, but I can't when doing mpirun -np 1 python blah.py.
Does anyone have a suggestion? Even finding how to catch signals in a C or C++ compiled program would be a helpful start.
If I send a signal to the spawned Python processes, they can handle the signals properly; however, signals sent to the parent orterun process (i.e. from exceeding wall time on a cluster, or pressing control-C in a terminal) will kill everything immediately.
I think it is really implementation dependent.
In SLURM, I tried sbatch --signal USR1@30 to send SIGUSR1 (whose signal number is 30, 10, or 16, depending on the platform) to the program launched by srun commands, and the process received SIGUSR1 = 10.
For platform MPI of IBM, according to https://www.ibm.com/support/knowledgecenter/en/SSF4ZA_9.1.4/pmpi_guide/signal_propagation.html
SIGINT, SIGUSR1, and SIGUSR2 are passed through to the processes.
In MPICH, SIGUSR1 is used by the process manager for internal notification of abnormal failures.
ref: http://lists.mpich.org/pipermail/discuss/2014-October/003242.html
Open MPI, on the other hand, will forward SIGUSR1 and SIGUSR2 from mpiexec to the other processes.
ref: http://www.open-mpi.org/doc/v1.6/man1/mpirun.1.php#sect14
For IntelMPI, according to https://software.intel.com/en-us/mpi-developer-reference-linux-hydra-environment-variables
I_MPI_JOB_SIGNAL_PROPAGATION and I_MPI_JOB_TIMEOUT_SIGNAL can be set to send signal.
Another thing worth noting: many Python scripts invoke other libraries or code through Cython, and if SIGUSR1 is caught by a sub-process, something unwanted might happen.
If you use mpirun --nw, then mpirun itself should terminate as soon as it's started the subprocesses, instead of waiting for their termination; if that's acceptable then I believe your processes would be able to catch their own signals.
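Whichever signal the launcher ends up delivering, the Python side can turn it into an exception so that ordinary try/except cleanup works. The sketch below assumes the ranks receive SIGTERM before being killed, which depends on the MPI implementation and should be checked against its documentation; the names Terminated and run_job are hypothetical:

```python
import signal


class Terminated(Exception):
    """Raised in the main thread when the chosen signal arrives."""


def raise_terminated(signum, frame):
    raise Terminated(signum)


def run_job(work, signum=signal.SIGTERM):
    """Run work(), converting signum into a catchable exception."""
    previous = signal.signal(signum, raise_terminated)
    try:
        work()
        return "finished"
    except Terminated:
        # Place checkpointing or other cleanup here.
        return "terminated"
    finally:
        signal.signal(signum, previous)
```

The same pattern applies to SIGUSR1 or SIGUSR2 on launchers that forward those signals; pass the signal number as the second argument.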
The signal module supports setting signal handlers using signal.signal:
Set the handler for signal signalnum to the function handler. handler can be a callable Python object taking two arguments (see below), or one of the special values signal.SIG_IGN or signal.SIG_DFL. The previous signal handler will be returned ...
import signal

def ignore(sig, stack):
    print("I'm ignoring signal %d" % (sig,))

signal.signal(signal.SIGINT, ignore)

while True:
    pass
If you send a SIGINT to a Python interpreter running this script (via kill -INT <pid>), it will print a message and simply continue to run.
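The quoted documentation also mentions signal.SIG_IGN and the returned previous handler. A small sketch that combines the two to ignore a signal temporarily and then restore whatever was installed before (the context manager name ignoring is hypothetical):

```python
import contextlib
import signal


@contextlib.contextmanager
def ignoring(signum):
    """Ignore signum inside the with-block, then restore the old handler."""
    previous = signal.signal(signum, signal.SIG_IGN)
    try:
        yield previous
    finally:
        # signal.signal() returns None if the old handler was not
        # installed from Python; in that case there is nothing to restore.
        if previous is not None:
            signal.signal(signum, previous)
```

Used as `with ignoring(signal.SIGINT): ...`, this protects a short critical section from Ctrl-C without permanently changing the handler.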