I have a parent shell script that calls a Python script. To be notified when the Python script fails, I added a trap to the shell script. But somehow the Python script is getting killed or stopped without the trap function ever running.
Please help me understand the scenarios in which a script can behave this way.
Shell Script (Parent Process):
parent.sh
set -e
on_exit(){
    if [ "$?" -eq 0 ]
    then
        echo "Success"
    else
        echo "Failure"
    fi
}
trap on_exit EXIT
py_script=$(python child.py)
Python Script (Child Process): child.py
import sys
import time

def func():
    isDone = "false"
    while isDone == "false":
        print("Waiting")
        try:
            # GET request which sets isDone = "true" on a specific value
            pass  # placeholder for the elided request
        except Exception as e:
            print("Something went wrong")
            sys.exit(1)
        time.sleep(10)
    print("Completed")
Python script never prints "Something went wrong".
Is it possible that Linux is killing the process in background if it runs for around 12 hours?
EDIT:
I investigated further and found that the Python process was still running in the background without doing anything. When I killed it manually, the notification fired.
But the question remains: in which scenario can a process stop executing further lines without being killed? I am not aware of any such pending state.
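One scenario worth ruling out, assuming the GET request is made without a timeout: a blocking socket read can wait forever if the remote end stops responding without closing the connection. The process is then alive but asleep inside the request, so no exception is raised and neither the except block nor the shell trap ever runs. Below is a minimal sketch with a per-request timeout and an overall deadline. The names fetch_status and wait_until_done, and all default values, are hypothetical and not taken from the original script.

```python
import time
import urllib.request


def fetch_status(url, timeout=30):
    """Return the response body as text; raises on network errors or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def wait_until_done(fetch, done_value="true", interval=10,
                    max_wait=12 * 3600, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns done_value or max_wait seconds elapse.

    Returns True on completion and False on deadline, so the caller can
    exit non-zero and let the shell trap report the failure.
    """
    deadline = clock() + max_wait
    while clock() < deadline:
        if fetch().strip() == done_value:
            return True
        sleep(interval)
    return False
```

It could be called as wait_until_done(lambda: fetch_status(status_url)), where status_url is whatever endpoint the script already polls, followed by sys.exit(1) when the result is False.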
Related
I have started a Python script with nohup. The script contains a finally block that prints any items remaining in the queue to a log file if an exception or keyboard interrupt occurs.
It works fine as long as I'm in the same shell from which I started the script. For example, the steps below work as expected.
# Run the script in the background with nohup
nohup script_name.py > script_name.log 2>&1 &
# List the processes currently running in the background
jobs
# Bring back the process to foreground
fg 1
# Send Keyboard interrupt to stop the process by allowing to execute finally code block
Press CTRL+C
The script prints all the pending items in the queue to the "script_name.log" file.
The problem is that once I exit the shell from which I started the nohup process, I can no longer bring the process back to the foreground to send a keyboard interrupt.
At the same time, if I use the kill PROCESS_ID command, the finally block does not get to execute, so I lose the pending items in the queue.
Is there any way to terminate the nohup process, or send it a keyboard interrupt, while still allowing the finally block to run?
Thank you.
UPDATED: High level python code:
It is actually a Python project with a lot of directories and other scripts, so let me give only the finally block part of the code here.
try:
    cc = crawler.MultiThreadedWebCrawler(max_workers)
    cc.run_web_crawler()
finally:
    cc.info(script_start_time)

# Inside another script, the definition of "info" method is,
def info(self, script_start_time):
    print('\n', self.crawl_queue.qsize(), ' URLs in crawl_queue are:\n')
    while self.crawl_queue.qsize() > 0:
        print(self.crawl_queue.qsize(), ' ', self.crawl_queue.get(), '\n')
    print("Script execution started at ", script_start_time)
    print("Script execution ended at ", datetime.now().strftime('%Y-%m-%d__%H_%M_%S'))
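A plain kill PROCESS_ID sends SIGTERM, and Python's default reaction to SIGTERM is to terminate immediately without raising an exception, so finally blocks are skipped. Two options usually work. One is to send SIGINT instead with kill -INT PROCESS_ID, which raises KeyboardInterrupt just like CTRL+C, provided SIGINT was not set to ignored when the process was started. The other is to convert SIGTERM into an exception inside the script. Here is a minimal sketch of the second option. The names raise_on_sigterm and run_with_cleanup are hypothetical, and work and cleanup stand in for cc.run_web_crawler() and cc.info(...).

```python
import os
import signal
import time


def raise_on_sigterm(signum, frame):
    # Turn SIGTERM into an ordinary Python exception so try/finally blocks run.
    raise SystemExit(128 + signum)


def run_with_cleanup(work, cleanup):
    """Run work(); cleanup() runs on normal exit, on exceptions, and on SIGTERM."""
    # Signal handlers can only be installed from the main thread.
    signal.signal(signal.SIGTERM, raise_on_sigterm)
    try:
        work()
    finally:
        cleanup()
```

The exception is raised in the main thread only, so in a multi-threaded crawler the worker threads may still need their own way to stop.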
Use the value inside a txt file as a kill switch.
Have your program open a txt file every time it has finished its 'jobs'. Check whether the value read is a 1 or a 0. If it's 0, continue. If it's 1, exit the script and output the results.
import sys

with open("/root/KillOrStay.txt") as f:
    KillOrStay = f.readline().strip()
if int(KillOrStay) == 1:
    # Output your results here
    sys.exit()
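A slightly more defensive version of the same kill-switch check, as a sketch only: the function name should_stop is hypothetical, and treating a missing or malformed file as "keep running" is an assumption you may want to reverse.

```python
def should_stop(path):
    """Return True if the first line of the file at path is the integer 1."""
    try:
        with open(path) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        # Assumption: a missing file means "keep running".
        return False
    try:
        return int(first_line) == 1
    except ValueError:
        # Assumption: empty or non-numeric content also means "keep running".
        return False
```

Calling it once per loop iteration keeps the check cheap, and writing 1 into the file from any other shell then stops the script at the next iteration.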
I have a shell script calling Python inside it.
#! /bin/bash
shopt -s extglob
echo "====test===="
~/.conda/envs/my_env/bin/python <<'EOF'
import sys
import os
try:
    print("inside python")
    x = 2/0
except Exception as e:
    print("Exception: %s" % e)
    sys.exit(2)
print("at the end of python")
EOF
echo "end of script"
If I execute this, the line below still gets printed:
"end of script"
I want to exit the shell script from the exception block of the Python script, so that the shell script does not continue past EOF.
Is there a way to create and kill a subprocess in the except block above, that will kill the entire shell script?
Can I spawn a dummy subprocess and kill it inside the exception block there by killing the entire shell script?
Any examples would be helpful.
Thanks in advance.
The whole EOF ... EOF block is executed by the Python interpreter, so exiting from it doesn't affect the bash script. To stop the bash script from continuing, capture the exit status after Python finishes and check it, e.g.:
#!/bin/bash
~/.conda/envs/my_env/bin/python <<'EOF'
import sys
sys.exit(0x01) # use any exit code from 0-0xFF range, comment out for a clean exit
print("End of the Python script that will not execute without commenting out the above.")
EOF
exit_status=$? # store the exit status for later use
# now let's check the exit status and see if python returned a non-zero exit status
if [ "$exit_status" -ne 0 ]; then
    echo "Python exited with a non-zero exit status, abort!"
    exit "$exit_status" # exit the bash script with the same status
fi
# continue as usual...
echo "All is good, end of script"
From the shell script you have 2 options:
set -e: all errors quit the script
check python subcommand return code, abort if non-zero
(maybe more details here: Aborting a shell script if any command returns a non-zero value?)
Now, if you don't want to change the handling from your shell script, you could get the parent process of the python script and kill it:
except Exception as e:
    import os,signal,sys
    print("Exception: %s" % e)
    os.kill(os.getppid(),signal.SIGTERM)
    sys.exit(2)
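On Windows this behaves differently: os.kill is only available from Python 2.7/3.2 onwards, and any signal other than CTRL_C_EVENT or CTRL_BREAK_EVENT terminates the target unconditionally. Alternatively, you can invoke taskkill: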
subprocess.call(["taskkill","/F","/PID",str(os.getppid())])
That said, killing the parent process is bad practice. If you control the code of the parent process, you should handle the exit gracefully there instead.
One way to kill the entire script is to save the shell's PID and have Python run a kill command on that PID when the exception happens. With os imported, and the PID exported so that the Python process can read it from its environment, it would be something along the lines of:
# In the shell script
export PID=$$
...
# In Python, when the exception happens
os.system('kill -9 ' + os.environ['PID'])
Is there a way to have Python print a statement when a script finishes successfully?
Example code would be something like:
if 'code variable' == 0:
    print("Script ran successfully")
else:
    print("There was an error")
How could I pass the value of the exit code to a variable (e.g. 'code variable')?
I feel like this would be a nice thing to include in a script for other users.
Thanks.
You can do this from the shell -- e.g. in Bash:
python python_code.py && echo "script exited successfully" || echo "there was an error."
You can't have a program write something like this for itself, because it doesn't know its exit code until it has exited, at which point it is no longer running to report the error :-).
There are other things you can do to proxy this behavior from within the process itself:
try:
    main()
except SystemExit as ext:
    if ext.code:
        print("Error")
    else:
        print("Success")
    raise SystemExit(ext.code)
else:
    print("Success")
However, this doesn't help if somebody uses os._exit -- and we're only catching sys.exit here, no other exceptions that could be causing a non-zero exit status.
If the script simply executes from top to bottom, just put a print at the end of it. If there's an error, Python stops the script and your print won't be executed. The case is different when you use try to manage exceptions.
Alternatively, write a separate script that runs python script.py inside a try, and have its except write the exception to a file or wherever you'd like to store or show it.
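The wrapper idea can be sketched with subprocess, which hands the child's exit code back to the caller. The function name run_and_report is hypothetical, and the messages simply mirror the ones in the question.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, print a success or failure line, and return its exit code."""
    result = subprocess.run(cmd)
    if result.returncode == 0:
        print("Script ran successfully")
    else:
        print("There was an error (exit code %d)" % result.returncode)
    return result.returncode
```

For example, run_and_report([sys.executable, "python_code.py"]) runs the script with the same interpreter and reports once it has exited.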
I have a simple Perl script that calls a Python script to deploy a server in the cloud.
I capture the exit status of the deployment in Perl to decide what to do after a successful or failed setup.
It's like:
$cmdret = system("python script.py ARG1 ARG2");
The Python script runs for 3 to 7 hours.
The problem is that, irrespective of the success or failure return status, the system call randomly receives a SIGHUP at this step, even when the process is running in the background, and the later steps never run.
Does anyone know whether there is a time limit on waiting for the return status from system, after which a hangup signal is sent?
Inside the Python script script.py, pexpect is used to execute scripts remotely:
doSsh(User,Passwd,Name,'cd '+OutputDir+';python host-bringup.py setup')
doSsh(User,Passwd,Name,'cd '+OpsHome+'/ops/hlevel;python dshost.py start')
....
And doSsh is a pexpect subroutine:
def doSsh(user,password,host,command):
    try:
        child = pexpect.spawn("ssh -o ServerAliveInterval=100 -n %s@%s '%s'" % (user,host,command),logfile=sys.stdout,timeout=None)
        i = child.expect(['password:', r'\(yes\/no\)',r'.*password for paasusr: ',r'.*[$#] ',pexpect.EOF])
        if i == 0:
            child.sendline(password)
        elif i == 1:
            child.sendline("yes")
            child.expect("password:")
            child.sendline(password)
        data = child.read()
        print(data)
        child.close()
        return True
    except Exception as error:
        print(error)
        return False
The first doSsh execution takes ~6 hours, and the session is killed after a few hours of execution with the message "Signal HUP caught; exiting", but the command python host-bringup.py setup keeps running on the remote host.
So on the local system the next doSsh never runs, and the remaining steps in the Perl script never continue either.
SIGHUP is sent when the terminal disconnects. When you want to create a process that's not tied to the terminal, you daemonize it.
Note that nohup doesn't daemonize.
$ nohup perl -e'system "ps", "-o", "pid,ppid,sid,cmd"'
nohup: ignoring input and appending output to `nohup.out'
$ cat nohup.out
PID PPID SID CMD
21300 21299 21300 -bash
21504 21300 21300 perl -esystem "ps", "-o", "pid,ppid,sid,cmd"
21505 21504 21300 ps -o pid,ppid,sid,cmd
As you can see,
perl's PPID is that of the program that launched it.
perl's SID is that of the program that launched it.
Since the session hasn't changed, the terminal will send SIGHUP to perl when it disconnects as normal.
That said, nohup changes how perl handles SIGHUP by causing it to be ignored.
$ perl -e'system "kill", "-HUP", "$$"; print "SIGHUP was ignored\n"'
Hangup
$ echo $?
129
$ nohup perl -e'system "kill", "-HUP", "$$"; print "SIGHUP was ignored\n"'
nohup: ignoring input and appending output to `nohup.out'
$ echo $?
0
$ tail -n 1 nohup.out
SIGHUP was ignored
If perl is killed by the signal, it's because something changed how perl handles SIGHUP.
So, either daemonize the process, or have perl ignore SIGHUP (e.g. by using nohup). But if you use nohup, don't re-enable the default SIGHUP behaviour!
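Since the long-running job here is a Python script, one partial step towards daemonizing is to start the child in its own session, so that it no longer shares the SID shown in the ps output above. This is a sketch only, not a full daemonization (there is no double fork or working-directory change), and run_in_new_session is a hypothetical name.

```python
import subprocess
import sys


def run_in_new_session(cmd, stdout=None):
    """Start cmd as the leader of a new session, with stdin detached."""
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=stdout,
        start_new_session=True,  # the child calls setsid() before exec
    )
```

A child started this way has no controlling terminal, so a hangup on the original terminal should not be delivered to it. Redirect its output to a file if the terminal may go away.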
If your goal is to make your perl program ignore the HUP signal, you likely just need to set the HUP entry of the $SIG global signal handler hash:
$SIG{ 'HUP' } = 'IGNORE';
for gory details, see
perldoc perlipc
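The same idea on the Python side, in case it is script.py rather than the Perl wrapper that should survive the hangup. This is a sketch under that assumption, and ignore_sighup is a hypothetical name.

```python
import os
import signal


def ignore_sighup():
    """Ignore SIGHUP in this process.

    An ignored disposition is inherited by child processes across exec,
    unless a child explicitly installs its own handling.
    """
    signal.signal(signal.SIGHUP, signal.SIG_IGN)
```

Calling ignore_sighup() near the top of the script, from the main thread, is roughly what nohup arranges from the outside.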
I have a problem with the way signals are propagated within a process group. Here is my situation and an explanation of the problem:
I have an application that is launched by a shell script (with a su). This shell script is itself launched by a Python application using subprocess.Popen.
I call os.setpgrp as a preexec function and have verified using ps that the bash script, the su command and the final application all have the same pgid.
Now when I send signal USR1 to the bash script (the leader of the process group), sometimes the application sees this signal and sometimes not. I can't figure out why I get this random behavior (the signal is seen by the app about 50% of the time).
Here is the example code I am testing against:
Python launcher :
#!/usr/bin/env python
import os
import subprocess

p = subprocess.Popen( ["path/to/bash/script"], stdout=…, stderr=…, preexec_fn=os.setpgrp )
# loop to write stdout and stderr of the subprocesses to a file
# note that I use fcntl.fcntl(p.stdXXX.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
p.wait()
Bash script :
#!/bin/bash
set -e
set -u
cd /usr/local/share/gios/exchange-manager
CONF=/etc/exchange-manager.conf
[ -f $CONF ] && . $CONF
su exchange-manager -p -c "ruby /path/to/ruby/app"
Ruby application :
#!/usr/bin/env ruby
Signal.trap("USR1") do
  puts "Received SIGUSR1"
  exit
end
while true do
  sleep 1
end
So when I send the signal to the bash wrapper (from a terminal or from the Python application), the Ruby application sometimes sees the signal and sometimes does not. I don't think it's a logging issue, as I have tried replacing the puts with a method that writes directly to a different file.
Do you have any idea what the root cause of my problem could be and how to fix it?
Your signal handler is doing too much. If you exit from within the signal handler, you cannot be sure that your buffers are properly flushed; in other words, you may not be exiting your program gracefully. Also be careful about new signals arriving while the program is already inside a signal handler.
Try modifying your Ruby source to exit from the main loop as soon as an "exit" flag is set, and don't exit from the signal handler itself.
Your Ruby application becomes:
#!/usr/bin/env ruby
$done = false
Signal.trap("USR1") do
  $done = true
end
until $done do
  sleep 1
end
puts "** graceful exit"
Which should be much safer.
For real programs, you may consider using a Mutex to protect your flag variable.