I have a simple Perl script that calls a Python script to deploy a server in the cloud.
I capture the exit status of the deployment inside Perl so I can take further action after the setup succeeds or fails.
It's like:
$cmdret = system("python script.py ARG1 ARG2");
Here the Python script runs for 3 to 7 hours.
The problem is that, irrespective of the success or failure status, the Perl script randomly receives a SIGHUP at this step, even though the process is running in the background, and that breaks the steps that follow.
Does anyone know whether there is some time limit on waiting for the return status from system() that leads to the hangup signal being sent?
Inside the Python script script.py, pexpect is used to execute scripts remotely:
doSsh(User,Passwd,Name,'cd '+OutputDir+';python host-bringup.py setup')
doSsh(User,Passwd,Name,'cd '+OpsHome+'/ops/hlevel;python dshost.py start')
....
And doSsh is a pexpect subroutine:
import sys
import pexpect

def doSsh(user, password, host, command):
    try:
        child = pexpect.spawn("ssh -o ServerAliveInterval=100 -n %s@%s '%s'" % (user, host, command),
                              logfile=sys.stdout, timeout=None)
        i = child.expect(['password:', r'\(yes\/no\)', r'.*password for paasusr: ', r'.*[$#] ', pexpect.EOF])
        if i == 0:
            child.sendline(password)
        elif i == 1:
            child.sendline("yes")
            child.expect("password:")
            child.sendline(password)
        data = child.read()
        print data
        child.close()
        return True
    except Exception as error:
        print error
        return False
This first doSsh execution takes ~6 hours, and the session is killed after a few hours of execution with the message: Signal HUP caught; exiting, but python host-bringup.py setup keeps running on the remote host.
So on the local system the next doSsh never runs, and the remaining steps in the Perl script never continue.
SIGHUP is sent when the terminal disconnects. When you want to create a process that's not tied to the terminal, you daemonize it.
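For reference, here is a minimal double-fork daemonization sketch in Python (the language of the long-running child in the question). This is an outline of the classic recipe, not the only way to do it:

import os
import sys

def daemonize():
    # First fork: the parent returns to the shell, the child carries on.
    if os.fork() > 0:
        sys.exit(0)
    os.setsid()  # new session: no controlling terminal, so no terminal-driven SIGHUP
    # Second fork: the session leader exits so the daemon can never
    # reacquire a controlling terminal.
    if os.fork() > 0:
        sys.exit(0)
    # Detach stdio from the terminal.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)

The os.setsid() call is the key step: a process with no controlling terminal never receives a terminal-driven SIGHUP.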
Note that nohup doesn't daemonize.
$ nohup perl -e'system "ps", "-o", "pid,ppid,sid,cmd"'
nohup: ignoring input and appending output to `nohup.out'
$ cat nohup.out
PID PPID SID CMD
21300 21299 21300 -bash
21504 21300 21300 perl -esystem "ps", "-o", "pid,ppid,sid,cmd"
21505 21504 21300 ps -o pid,ppid,sid,cmd
As you can see,
perl's PPID is that of the program that launched it.
perl's SID is that of the program that launched it.
Since the session hasn't changed, the terminal will send SIGHUP to perl when it disconnects, as normal.
That said, nohup changes how perl handles SIGHUP by causing it to be ignored.
$ perl -e'system "kill", "-HUP", "$$"; print "SIGHUP was ignored\n"'
Hangup
$ echo $?
129
$ nohup perl -e'system "kill", "-HUP", "$$"; print "SIGHUP was ignored\n"'
nohup: ignoring input and appending output to `nohup.out'
$ echo $?
0
$ tail -n 1 nohup.out
SIGHUP was ignored
If perl is killed by the signal, it's because something changed how perl handles SIGHUP.
So, either daemonize the process, or have perl ignore SIGHUP (e.g. by using nohup). But if you use nohup, don't re-enable the default SIGHUP behaviour!
If your goal is to make your perl program ignore the HUP signal, you likely just need to set the HUP entry of the $SIG global signal handler hash:
$SIG{ 'HUP' } = 'IGNORE';
For gory details, see
perldoc perlipc
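If you also want the long-running Python child itself to shrug off hangups, the equivalent in Python is just as short; a minimal sketch, assuming it goes at the top of script.py:

import signal

# Ignore terminal hangups so a dropped session doesn't kill the deployment.
signal.signal(signal.SIGHUP, signal.SIG_IGN)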
Related
I have a python script that opens multiple concurrent pseudo-tty ssh sessions to a server. My problem is that the output is garbled:
import subprocess

for i in range(0, 3):
    subprocess.Popen(
        "ssh -tt -q myserver 'echo 11; echo 22; echo 33; echo 44;'",
        shell=True
    )
Output:
11
22
33
44
11
22
33
44
11
22
33
44
The output varies. Sometimes it works, but most of the time I get those weird indentations. In reality I want to launch remote Python processes (a locust load generation slave), but I've simplified it to just use echo.
Things I've tried:
universal_newlines=True, bufsize=1 (doesn't help)
removing -tt (fixes the output, but has the undesired side effect of remote processes not dying right away if python/ssh is terminated)
piping to cat -e to get hidden characters (for debugging):
11^M$
22^M$
33^M$
44^M$
11$
22$
33$
44$
11$
22$
33$
44$
I'm not sure if this is even a Python issue or just an SSH issue. My guess is that I need to use some sort of line buffering, but I don't know how :-/
I'm on macOS Mojave, and I've tried both iTerm2 and Terminal, if that matters.
Edit: I'm not sure it is related, but the problem appears to occur more frequently if I ensure Python keeps running until the ssh session has terminated (by adding time.sleep(10) at the end of the script).
Edit 2: I tried @FLemaitre's solution (not using -tt and killing explicitly), and it works in the simple case, but not when spawning locust:
proc = subprocess.Popen(
    "ssh servername 'locust --slave --master-port 7777 --no-web -f locustfile.py & read; kill $!'",
    shell=True,
    stdin=subprocess.PIPE,
)
time.sleep(10)
proc.kill()
proc.wait()
On the remote host a bash -c locust --slave ... process is started. It dies when ssh is killed, but locust itself (a child of the above process) does not :-/
I systematically reproduce the issue with the following script:
import subprocess
import time

if __name__ == "__main__":
    for i in range(0, 10):
        proc = subprocess.Popen(
            "ssh -tt -q localhost 'echo 11; echo 22; echo 33; '",
            shell=True
        )
        time.sleep(4)
And I think the issue is not related to Python. These multiple ssh sessions with pseudo-TTYs seem to conflict with each other. Eventually, the terminal used to run this script ends up broken as well (even though the script wasn't sourced):
>cat test2.py
import subprocess
import time
import atexit
... etc ...
I checked the documentation, and this -t option seems to do much more than what you are actually trying to achieve. When I remove the second t and the -q option, I sometimes (not often) get a cryptic error message stating that something went wrong (but I no longer manage to reproduce it). I checked with Google but without much success. Still, I'm convinced that this option is overkill, and I would rather focus on the undying processes. This issue is well known:
Starting a process over ssh using bash and then killing it on sigint
The second answer is your -tt option, but the best answer suits your example very well and is superior (with -tt you solve ssh's propagation of the termination, but you do not tackle the same issue between Python and its subprocess). For example:
import subprocess
import time

if __name__ == "__main__":
    for i in range(0, 10):
        proc = subprocess.Popen(
            "ssh localhost 'sleep 90 & read ; kill $!'",
            shell=True,
            stdin=subprocess.PIPE
        )
        time.sleep(40)
With this solution, stdin is shared by all actors (Python, the Python subprocess, the ssh process, the sleep process), and its closure at any point in the chain is detected by the final business process, triggering a graceful shutdown.
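That also gives you a deliberate shutdown lever; a small sketch of what the caller can do, continuing with the proc from the example above:

# Closing our end of the pipe sends EOF down the chain: the remote `read`
# returns, `kill $!` fires, and the background job is cleaned up.
proc.stdin.close()
proc.wait()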
Edit with locust:
I gave it a quick try, and the issue was that a plain kill is ignored by the slave (this looks like an issue on the locust side). It seems to work with kill -9:
import subprocess
import time

if __name__ == "__main__":
    for i in range(0, 2):
        proc = subprocess.Popen(
            "ssh localhost 'python -m locust --slave --no-web -f ~devsup/users/flemaitre/tmp/locust_config.py & read ; kill -9 $!'",
            shell=True,
            stdin=subprocess.PIPE
        )
        time.sleep(40)
I'm trying to trigger a bash script that runs for hours on my remote server using the npm module simple-ssh. But I don't want to wait for the bash script to complete. I want to continue to the next exec in the chain and end the ssh session. As my bash script runs in the background, I expect it to complete gracefully even if I end my ssh session.
The problem is that I'm unable to achieve this: control never leaves the first block in the chain until the bash script has completed. Please suggest a way to achieve this.
server.js
const sh_script = 'sh /ws/jobs/test.sh'
var SSH = require('simple-ssh');

var ssh = new SSH({
    host: 'HOST IP',
    user: 'username',
    pass: 'password'
});

ssh
    .exec(sh_script, {
        out: function(stdout) {
            console.log(stdout);
        },
    })
    .exec('echo "exiting shell"', {
        out: function() {
            ssh.end();
            console.log("exiting");
        }
    })
    .start();
test.sh (This shell script takes hours to complete)
function start_ui_validation {
    ......
}

echo "Before starting subshell"
(
    start_ui_validation "params"
) &
echo "Finished"
The command you are trying to execute, sh /ws/jobs/test.sh, runs in the foreground; that's why control does not return for your subsequent .exec()s. You need to put the process in the background for control to return to you. This is easily achieved with nohup. Try changing your sh_script to something like this:
sh_script = 'nohup sh /ws/jobs/test.sh >/dev/null 2>&1 &';
nohup starts your process detached from your ssh session. >/dev/null 2>&1 redirects stdout and stderr from the process to /dev/null, and the & at the end puts your process in the background. After this, control should return to your code, and the next exec should continue.
There are other ways of achieving this without nohup (in case you do not have nohup on your remote system). You could do sh_script = 'sh /ws/jobs/test.sh >/dev/null 2>&1 &';, which starts the script in the background, and then add another .exec('disown', {out: function() {}}), which disconnects the last process from your ssh session. Use this only if nohup is not available for some reason.
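For comparison, the same idea driven from Python rather than simple-ssh; a hedged sketch where HOST_IP stands in for the host from the question and the script path is taken from it:

import subprocess

# ssh returns as soon as the remote command exits; because the remote job is
# backgrounded with its stdio detached, that happens immediately while
# test.sh keeps running on the server.
subprocess.check_call(
    ["ssh", "HOST_IP", "nohup sh /ws/jobs/test.sh >/dev/null 2>&1 &"]
)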
Why
import subprocess
p = subprocess.Popen(["/bin/bash", "-c", "timeout -s KILL 1 sleep 5 2>/dev/null"])
p.wait()
print(p.returncode)
returns
[stderr:] /bin/bash: line 1: 963663 Killed timeout -s KILL 1 sleep 5 2> /dev/null
[stdout:] 137
when
import subprocess
p = subprocess.Popen(["/bin/bash", "-c", "timeout -s KILL 1 sleep 5"])
p.wait()
print(p.returncode)
returns
[stdout:] -9
If you change bash to dash, you get 137 in both cases. I know that -9 is the KILL code and that 137 is 128 + 9, but it seems weird for such similar code to get different return codes.
This happens on Python 2.7.12 and Python 3.4.3.
It looks like Popen.wait() does not call Popen._handle_exitstatus (https://github.com/python/cpython/blob/3.4/Lib/subprocess.py#L1468) when using /bin/bash, but I could not figure out why.
This is due to how bash executes timeout, depending on whether redirections/pipes or any other bash features are involved:
With redirection
python starts bash
bash starts timeout, monitors the process and does pipe handling.
timeout transfers itself into a new process group and starts sleep
After one second, timeout sends SIGKILL into its process group
As the process group died, bash returns from waiting for timeout, sees the SIGKILL and prints the message pasted above to stderr. It then sets its own exit status to 128+9 (a behaviour simulated by timeout).
Without redirection
python starts bash.
bash sees that it has nothing to do on its own and calls execve() to effectively replace itself with timeout.
timeout acts as above, the whole process group dies with SIGKILL.
python gets an exit status of 9 and does some mangling to turn this into -9 (SIGKILL).
In other words, without redirections/pipes/etc., bash withdraws itself from the call chain. Your second example looks like subprocess.Popen() is executing bash, yet effectively it is not: bash is no longer there when timeout does its deed, which is why you don't get any message and see an unmangled exit status.
If you want consistent behaviour, use timeout --foreground; you'll get an exit status of 124 in both cases.
I don't know about dash; yet suppose it does not do any execve() trickery to effectively replace itself with the only program it's executing. Therefore you always see the mangled exit status of 128+9 in dash.
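To see both behaviours side by side, here is a minimal sketch that simply combines the two snippets from the question:

import subprocess

# With the redirection, bash stays in the middle and reports 128 + 9 = 137;
# without it, bash execve()s timeout, so Python sees the raw SIGKILL as -9.
for cmd in ("timeout -s KILL 1 sleep 5",
            "timeout -s KILL 1 sleep 5 2>/dev/null"):
    p = subprocess.Popen(["/bin/bash", "-c", cmd])
    p.wait()
    print("%s -> %d" % (cmd, p.returncode))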
Update: zsh shows the same behaviour, but it drops out even for simple redirections such as timeout -s KILL 1 sleep 5 >/tmp/foo and the like, giving you an exit status of -9. However, timeout -s KILL 1 sleep 5 && echo $? gives status 137 in zsh as well.
I have a problem with the way signals are propagated within a process group. Here is my situation and an explanation of the problem:
I have an application that is launched by a shell script (with a su). This shell script is itself launched by a Python application using subprocess.Popen.
I call os.setpgrp as a preexec_fn and have verified, using ps, that the bash script, the su command, and the final application all have the same pgid.
Now when I send the signal USR1 to the bash script (the leader of the process group), sometimes the application sees the signal and sometimes not. I can't figure out why I get this random behaviour (the signal is seen by the app about 50% of the time).
Here is the example code I am testing against:
Python launcher:
#!/usr/bin/env python
import os
import subprocess

p = subprocess.Popen(["path/to/bash/script"], stdout=…, stderr=…, preexec_fn=os.setpgrp)
# loop to write stdout and stderr of the subprocesses to a file
# note that I use fcntl.fcntl(p.stdXXX.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
p.wait()
Bash script:
#!/bin/bash
set -e
set -u
cd /usr/local/share/gios/exchange-manager
CONF=/etc/exchange-manager.conf
[ -f $CONF ] && . $CONF
su exchange-manager -p -c "ruby /path/to/ruby/app"
Ruby application:
#!/usr/bin/env ruby
Signal.trap("USR1") do
puts "Received SIGUSR1"
exit
end
while true do
sleep 1
end
So I try to send the signal to the bash wrapper (from a terminal or from the Python application); sometimes the Ruby application will see the signal and sometimes not. I don't think it's a logging issue, as I have tried replacing the puts with a method that writes directly to a different file.
Do you guys have any idea what could be the root cause of my problem and how to fix it ?
Your signal handler is doing too much. If you exit from within the signal handler, you cannot be sure that your buffers are properly flushed; in other words, you may not be exiting your program gracefully. Also be careful of new signals being received while the program is already inside a signal handler.
Try to modify your Ruby source to exit the program from the main loop as soon as an "exit" flag is set, and don't exit from the signal handler itself.
Your Ruby application becomes:
#!/usr/bin/env ruby

$done = false

Signal.trap("USR1") do
  $done = true
end

until $done do
  sleep 1
end

puts "** graceful exit"
Which should be much safer.
For real programs, you may consider using a Mutex to protect your flag variable.
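For what it's worth, the same do-little-in-the-handler pattern in Python looks like this; a sketch for comparison, not part of the original setup:

import signal
import time

done = False

def handle_usr1(signum, frame):
    # Just record the request; the main loop does the actual exiting.
    global done
    done = True

signal.signal(signal.SIGUSR1, handle_usr1)

while not done:
    time.sleep(1)
print("** graceful exit")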
I want to launch a background Python job from a bash script and then gracefully kill it with SIGINT. This works fine from the shell, but I can't seem to get it to work in a script.
loop.py:
#! /usr/bin/env python

if __name__ == "__main__":
    try:
        print 'starting loop'
        while True:
            pass
    except KeyboardInterrupt:
        print 'quitting loop'
From the shell I can interrupt it:
$ python loop.py &
[1] 15420
starting loop
$ kill -SIGINT 15420
quitting loop
[1]+ Done python loop.py
kill.sh:
#! /bin/bash
python loop.py &
PID=$!
echo "sending SIGINT to process $PID"
kill -SIGINT $PID
But from a script I can't:
$ ./kill.sh
starting loop
sending SIGINT to process 15452
$ ps ax | grep loop.py | grep -v grep
15452 pts/3 R 0:08 python loop.py
And, if it's been launched from a script I can no longer kill it from the shell:
$ kill -SIGINT 15452
$ ps ax | grep loop.py | grep -v grep
15452 pts/3 R 0:34 python loop.py
I'm assuming I'm missing some fine point of bash job control.
You're not registering a signal handler. Try the version below; it seems to work fairly reliably. I think the rare exception is when the signal arrives before Python has registered the script's handler. Note that KeyboardInterrupt is only supposed to be raised "when the user hits the interrupt key". I think the fact that it works at all for an explicit SIGINT (e.g. via kill) is an accident of implementation.
import signal

def quit_gracefully(*args):
    print 'quitting loop'
    exit(0)

if __name__ == "__main__":
    signal.signal(signal.SIGINT, quit_gracefully)
    try:
        print 'starting loop'
        while True:
            pass
    except KeyboardInterrupt:
        quit_gracefully()
In addition to @matthew-flaschen's answer, you can use exec in the bash script to effectively replace the shell with the process being opened:
#!/bin/bash
exec python loop.py &
PID=$!
sleep 5 # waiting for the python process to come up
echo "sending SIGINT to process $PID"
kill -SIGINT $PID
I agree with Matthew Flaschen; the problem is with Python, which apparently doesn't register the KeyboardInterrupt handler for SIGINT when it's not launched from an interactive shell.
Of course, nothing prevents you from registering your signal handler like this:
import signal

def signal_handler(signum, frame):
    raise KeyboardInterrupt, "Signal handler"

signal.signal(signal.SIGINT, signal_handler)
When you run a command in the background with &, SIGINT will be ignored.
Here's the relevant section of man bash:
Non-builtin commands run by bash have signal handlers set to the values inherited by the shell from
its parent. When job control is not in effect, asynchronous commands ignore SIGINT and SIGQUIT in
addition to these inherited handlers. Commands run as a result of command substitution ignore the
keyboard-generated job control signals SIGTTIN, SIGTTOU, and SIGTSTP.
I think you need to set the signal handler explicitly, as Matthew commented.
The script kill.sh also has a problem: since loop.py is sent to the background, there's no guarantee that kill runs only after python loop.py has started and installed its handler.
#! /bin/bash
python loop.py &
PID=$!
#
# NEED TO WAIT ON EXISTENCE OF python loop.py PROCESS HERE.
#
echo "sending SIGINT to process $PID"
kill -SIGINT $PID
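One way to close that race, sketched here in Python rather than bash (assuming loop.py prints its 'starting loop' line only after the handler is installed, as in the answer above):

import signal
import subprocess

# -u keeps the child's stdout unbuffered so the startup line arrives promptly.
proc = subprocess.Popen(["python", "-u", "loop.py"], stdout=subprocess.PIPE)
proc.stdout.readline()            # blocks until "starting loop" is printed
proc.send_signal(signal.SIGINT)   # the handler is guaranteed to be in place now
proc.wait()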
I tried @Steen's approach, but alas, it apparently does not hold on Mac.
Another solution, pretty much the same as the above but a little more general, is to just re-install the default handler if SIGINT is being ignored:
import signal

def _ensure_sigint_handler():
    # On Mac, even using `exec <cmd>` in `bash` still yields an ignored SIGINT.
    sig = signal.getsignal(signal.SIGINT)
    if sig == signal.SIG_IGN:
        signal.signal(signal.SIGINT, signal.default_int_handler)

# ...
_ensure_sigint_handler()