Python hangs with stale mount. Suggestions? - python

I ran into a problem today where a mount went stale and this caused my entire python application to hang indefinitely.
What happened is the mount went stale, and then I called os.path.exists(path) on that path. The call hangs indefinitely.
I really need to prevent this. My only idea is to put the os.path.exists call on a background thread and abandon the thread (obviously not preferred) if it's still alive after a certain number of seconds. Ideally I would avoid this by making a call to check if the mount is stale first.
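Roughly what I have in mind, as a rough, untested sketch (the function name and timeout value are arbitrary; the stuck thread can only be left behind as a daemon thread, not actually killed):

import os
import threading

def exists_with_timeout(path, timeout=5.0):
    # Run os.path.exists on a daemon thread so a stale mount can't hang the caller.
    result = []

    def worker():
        result.append(os.path.exists(path))

    t = threading.Thread(target=worker)
    t.daemon = True        # a stuck thread won't keep the interpreter alive
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None        # still blocked -> treat the mount as stale
    return result[0]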
Any ideas? FYI, all calls that try to access this path hang, including os.path.ismount(path). While this is a rare event, I can't have my entire system freezing on users :/.

You can use the Python subprocess module and the shell "timeout" command to detect the hung mount:
import subprocess

call = subprocess.Popen(["timeout 10 ls /your_mount_dir/ > /dev/null 2>&1; echo $?"],
                        stdout=subprocess.PIPE, shell=True)
output = call.communicate()
result = output[0].strip()
if result != '0':
    # Mount is hung (timeout exits with 124 when the command times out).
    pass

Similar to myheartsgoon's answer, but safer and simpler.
import subprocess

try:
    subprocess.check_call(["timeout", "4", "ls", "/mnt/your_nas"])
except subprocess.CalledProcessError:
    # The mount is hanging (timeout exits with a non-zero status).
    pass
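If you are on Python 3.3 or newer, a variation of the same idea that I have not tested against a stale mount: subprocess has a timeout of its own, so the external timeout binary isn't strictly needed (the function name and the 4-second value are just placeholders):

import subprocess

def mount_is_responding(path, seconds=4):
    # Returns False if listing the path fails or takes longer than `seconds`.
    try:
        subprocess.check_call(["ls", path], timeout=seconds,
                              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False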

Stop a bash script in python [duplicate]

I am currently trying to write (in Python 2.7.3) a kind of wrapper for GDB, which will allow me to dynamically switch from scripted input to interactive communication with GDB.
So far I use
self.process = subprocess.Popen(["gdb vuln"], stdin = subprocess.PIPE, shell = True)
to start gdb within my script. (vuln is the binary I want to examine)
Since a key feature of gdb is to pause the execution of the attached process on receiving SIGINT (Ctrl+C) and allow the user to inspect registers and memory, I need some way to pass a SIGINT signal to it.
Neither
self.process.send_signal(signal.SIGINT)
nor
os.kill(self.process.pid, signal.SIGINT)
or
os.killpg(self.process.pid, signal.SIGINT)
work for me.
When I use one of these functions there is no response. I suppose this problem arises from the use of shell=True. However, at this point I am really out of ideas.
Even my old friend Google couldn't really help me out this time, so maybe you can help me. Thanks in advance.
Cheers, Mike
Here is what worked for me:
import signal
import subprocess
try:
    p = subprocess.Popen(...)
    p.wait()
except KeyboardInterrupt:
    p.send_signal(signal.SIGINT)
    p.wait()
I looked deeper into the problem and found some interesting things. Maybe these findings will help someone in the future.
When calling gdb vuln using subprocess.Popen() it does in fact create three processes, where the pid returned is the one of sh (5180):
ps -a
5180 pts/0 00:00:00 sh
5181 pts/0 00:00:00 gdb
5183 pts/0 00:00:00 vuln
Consequently sending a SIGINT to the process will in fact send SIGINT to sh.
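As a side note, a sketch I have not tested in this exact setup: passing the command as a list without shell=True avoids the intermediate /bin/sh entirely, so Popen.pid is gdb's own pid ("vuln" is the binary from the question):

import subprocess

# No shell=True, so there is no /bin/sh in between and process.pid refers to gdb itself.
process = subprocess.Popen(["gdb", "vuln"], stdin=subprocess.PIPE)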
Besides, I continued looking for an answer and stumbled upon this post
https://bugzilla.kernel.org/show_bug.cgi?id=9039
To keep it short, what is mentioned there is the following:
When pressing Ctrl+C while using gdb normally, SIGINT is in fact sent to the examined program (in this case vuln); ptrace then intercepts it and passes it to gdb.
What this means is that if I use self.process.send_signal(signal.SIGINT), it will never reach gdb this way.
Temporary Workaround:
I managed to work around this problem by simply calling subprocess.Popen() as follows:
subprocess.Popen("killall -s INT " + self.binary, shell = True)
This is nothing more than a first workaround. When multiple applications with the same name are running, it might do some serious damage. Besides, it somehow fails if shell=True is not set.
If someone has a better fix (e.g. how to get the pid of the process started by gdb), please let me know.
Cheers, Mike
EDIT:
Thanks to Mark for pointing out that I should look at the ppid of the process.
I managed to narrow down the process to which SIGINT is sent using the following approach:
out = subprocess.check_output(['ps', '-Aefj'])
for line in out.splitlines():
    if self.binary in line:
        l = line.split(" ")
        while "" in l:
            l.remove("")
        # Get sid and pgid of the child process (/bin/sh)
        sid = os.getsid(self.process.pid)
        pgid = os.getpgid(self.process.pid)
        # Only true for the target process
        if l[4] == str(sid) and l[3] != str(pgid):
            # l[1] is the PID column of the ps -Aefj output
            os.kill(int(l[1]), signal.SIGINT)
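For completeness, a sketch of an alternative I have not hardened: instead of parsing the full ps output, walk the process tree with pgrep -P (self.process.pid is still the /bin/sh wrapper, its child is gdb, and gdb's child is the debugged binary):

import os
import signal
import subprocess

def child_pids(pid):
    # pgrep -P lists the direct children of the given pid; it exits 1 if there are none.
    try:
        out = subprocess.check_output(["pgrep", "-P", str(pid)])
    except subprocess.CalledProcessError:
        return []
    return [int(p) for p in out.split()]

for gdb_pid in child_pids(self.process.pid):      # /bin/sh -> gdb
    for target_pid in child_pids(gdb_pid):        # gdb -> debugged binary
        os.kill(target_pid, signal.SIGINT)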
I have done something like the following in the past and if I recollect correctly it seemed to work for me:
import os
import subprocess

def detach_processGroup():
    # Runs in the child just before exec: give it its own process group
    # so terminal-generated signals (e.g. Ctrl+C) are not forwarded to it.
    os.setpgrp()

subprocess.Popen(command,
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE,
                 preexec_fn=detach_processGroup)

How to guarantee file removing after script stopped working?

I have a script that is run by crontab every hour and interacts with an API (database sync). Usually it takes an hour or so, and on the next run I check whether this process is still in memory or not:
#!/usr/bin/env python
import os
import sys

pid = str(os.getpid())
pidfile = "/tmp/mydaemon.pid"

if os.path.isfile(pidfile):
    print "%s already exists, exiting" % pidfile
    sys.exit()
file(pidfile, 'w').write(pid)
try:
    # Do some actual work here
    pass
finally:
    os.unlink(pidfile)
BUT after some time the script stopped working. When I look at "ps aux | grep python", I don't see this script as a process, but I do see the pid file still in place.
And when I run the script manually, I see information printed iteratively on the screen, but after some time I see the word "Terminated"; the script has exited and the file is still in place.
How can I guarantee 100% that the file is removed after the script stops working?
Thanks!
It looks like your script is terminated unexpectedly, most probably due to excessive memory usage. It's not guaranteed that finally will be executed on unexpected program termination. So, first of all, I suggest you find the cause of the unexpected termination and fix it.
Actually there is no 100% way to guarantee that the file will be removed. However, there are a few workarounds for handling dangling pid files.
Place your pid files on the /var/run volume, so they will be removed on unexpected system restart.
Check whether the process with that pid is still running on each script execution:
import os
import sys

def is_alive(pid):
    try:
        os.kill(pid, 0)  # signal 0 does nothing, but raises OSError if the pid doesn't exist
        return True
    except OSError:
        return False

# and add this to your code:
if os.path.isfile(pidfile):
    with open(pidfile) as f:
        if is_alive(int(f.read())):
            sys.exit()
Again, the provided code is not 100% safe because of possible pid collisions. You can make the verification of the running process more sophisticated by parsing the ps command output: find the line with the desired pid value and check whether it looks similar to your crontab entry.
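A rough sketch of that check (the function name and the "database_sync" marker are placeholders for whatever appears in your crontab entry):

import subprocess

def looks_like_my_job(pid, marker="database_sync"):
    # Ask ps for the command line of that pid; a reused pid will show an unrelated command.
    try:
        cmdline = subprocess.check_output(["ps", "-p", str(pid), "-o", "args="])
    except subprocess.CalledProcessError:
        return False  # no such process
    return marker in cmdline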
Normally you could use the atexit module functionality, but in your case (unexpected termination) it also may not work.
Maybe using mkstemp (specifying the required program suffix/prefix) within a with statement may work: it will create a unique pidfile in /tmp and clear it when the with block completes or terminates.
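A sketch of that idea, using tempfile.NamedTemporaryFile rather than raw mkstemp since it is a context manager and cleans up by itself (the prefix/suffix values are placeholders, and the file still survives a hard kill):

import os
import tempfile

with tempfile.NamedTemporaryFile(prefix="mydaemon_", suffix=".pid", dir="/tmp") as f:
    f.write(str(os.getpid()))
    f.flush()
    # do the actual sync work here; the file is removed when the block exits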

python execute subprocess without waiting for subprocess to terminate

Is it possible to start a subprocess without waiting for it to terminate?
I have a Windows program that I need to execute from a Python script, and I want to leave it running in the background without waiting for the subprocess to quit, since the program expects input to terminate itself (press q to quit).
I have tried many ways but none of them worked.
What I basically want to achieve is the following:
args = [os.path.join(path, 'myProgram.exe'), '/run']
p = subprocess.Popen(args)
and do some other stuff here with the myProgram.exe still running.
The myProgram.exe can also be installed as a service. When I tried this approach by
subprocess.call('net start myService', shell=True)
the service always fails to start. It fails with system error code 1067, which means the process terminated unexpectedly.
NOTE: I'm using python 2.7
Thanks for the advice.
Edit:
I have found a workaround which I don't understand...
As a workaround I've created a myProgram.bat file which starts myProgram.exe.
BUT there's a catch: if I do only
start myProgram.exe
it behaves exactly the same as when calling it via subprocess - it terminates. However, if I first do
timeout 0
start myProgram.exe
(i.e. wait 0 seconds before the start), the program starts normally.
You could try:
args = ['start', os.path.join(path, 'myProgram.exe'), '/run']
See start /? for help (or http://www.computerhope.com/starthlp.htm).
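One caveat, as a sketch under my assumptions (path is a placeholder for the install directory from the question): start is a cmd.exe built-in, not an executable, so it only works through the shell; with a quoted path, the first quoted argument is treated as the window title, hence the empty "":

import os
import subprocess

path = r"C:\path\to"  # placeholder
subprocess.Popen('start "" "%s" /run' % os.path.join(path, "myProgram.exe"), shell=True)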
If the program expects a q to quit, maybe send it one?
args = [os.path.join(path, 'myProgram.exe'), '/run']
p = subprocess.Popen(args, stdin=subprocess.PIPE)
...
p.stdin.write("q\n")

IOError Input/Output Error When Printing

I have inherited some code which is periodically (randomly) failing due to an Input/Output error being raised during a call to print. I am trying to determine the cause of the exception being raised (or at least, better understand it) and how to handle it correctly.
When executing the following line of Python (in a 2.6.6 interpreter, running on CentOS 5.5):
print >> sys.stderr, 'Unable to do something: %s' % command
The exception is raised (traceback omitted):
IOError: [Errno 5] Input/output error
For context, this is generally what the larger function is trying to do at the time:
from subprocess import Popen, PIPE
import sys
def run_commands(commands):
    for command in commands:
        try:
            out, err = Popen(command, shell=True, stdout=PIPE, stderr=PIPE).communicate()
            print >> sys.stdout, out
            if err:
                raise Exception('ERROR -- an error occurred when executing this command: %s --- err: %s' % (command, err))
        except:
            print >> sys.stderr, 'Unable to do something: %s' % command

run_commands(["ls", "echo foo"])
The >> syntax is not particularly familiar to me - it's not something I use often - and I understand that it is perhaps the least preferred way of writing to stderr. However, I don't believe the alternatives would fix the underlying problem.
From the documentation I have read, IOError 5 is often misused, and somewhat loosely defined, with different operating systems using it to cover different problems. The best I can see in my case is that the python process is no longer attached to the terminal/pty.
As best I can tell nothing is disconnecting the process from the stdout/stderr streams - the terminal is still open for example, and everything 'appears' to be fine. Could it be caused by the child process terminating in an unclean fashion? What else might be a cause of this problem - or what other steps could I introduce to debug it further?
In terms of handling the exception, I can obviously catch it, but I'm assuming this means I won't be able to print to stdout/stderr for the remainder of execution? Can I reattach to these streams somehow - perhaps by resetting sys.stdout to sys.__stdout__ etc? In this case not being able to write to stdout/stderr is not considered fatal, but if it is an indication of something starting to go wrong I'd rather bail early.
I guess ultimately I'm at a bit of a loss as to where to start debugging this one...
I think it has to do with the terminal the process is attached to. I got this error when I ran a python process in the background and then closed the terminal in which I started it:
$ myprogram.py
Ctrl-Z
$ bg
$ exit
The problem was that I started a non-daemonized process on a remote server and logged out (closing the terminal session). A solution was to start a screen/tmux session on the remote server and start the process within that session. Then detaching the session and logging out keeps a terminal associated with the process. This works at least in the *nix world.
I had a very similar problem. I had a program that was launching several other programs using the subprocess module. Those subprocesses would then print output to the terminal. What I found was that when I closed the main program, it did not terminate the subprocesses automatically (as I had assumed); rather they kept running. So if I terminated the main program and then the terminal it had been launched from*, the subprocesses no longer had a terminal attached to their stdout and would throw an IOError. Hope this helps you.
*NB: it must be done in this order. If you just kill the terminal, (for some reason) that kills both the main program and the subprocesses.
I just got this error because the disk holding the directory I was writing files to ran out of space. Not sure if this is at all applicable to your situation.
I'm new here, so please forgive if I slip up a bit when it comes to the code detail.
Recently I was able to figure out what causes the I/O error of the print statement when the terminal associated with the running python script is closed.
It is because the string to be printed to stdout/stderr is too long. In this case, the "out" string is the culprit.
To fix this problem (without having to keep the terminal open while running the python script), simply read the "out" string line by line, and print line by line, until we reach the end of the "out" string. Something like:
for ln in out.splitlines():
    print ln  # print one line at a time instead of one huge string
The same problem occurs if you print an entire list of strings to the screen. Simply print the list one item at a time.
Hope that helps!
The problem is that you've closed the stdout pipe which python is attempting to write to when print() is called.
This can be caused by running a script in the background using & and then closing the terminal session (i.e. closing stdout).
$ python myscript.py &
$ exit
One solution is to set stdout to a file when running in the background
Example
$ python myscript.py > /var/log/myscript.log 2>&1 &
$ exit
No errors on print()
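A sketch of the same idea done inside the script (the log file path is a placeholder), so print never touches a terminal that may go away:

import sys

log = open("/var/log/myscript.log", "a", 1)  # line-buffered append
sys.stdout = log
sys.stderr = log
print "started"  # goes to the log file, not the (possibly closed) terminal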
It can also happen when your shell crashes while print is trying to write data to it.
In my case, I just restarted the service and the issue disappeared. I don't know why.
My issue was the same OSError Input/output error, with Odoo.
After I restarted the service, it disappeared.

What is happening to my process?

I'm executing an SSH process like so:
checkIn()
sshproc = subprocess.Popen([command], shell=True)
exit = os.waitpid(sshproc.pid, 0)[1]
checkOut()
It's important that the process performs the checkIn() and checkOut() actions before and after these lines of code. I have a test case in which I exit the SSH session by closing the terminal window manually. Sure enough, my program doesn't operate correctly and checkOut() is never called in this case. Can someone give me a pointer on what to look into to fix this bug?
Let me know if any other information would be helpful.
Thanks!
The Python process would normally execute in the same window as the ssh subprocess, and therefore be terminated just as abruptly when you close that window -- before getting a chance to execute checkOut. To try and ensure that a function gets called at program exit (though for sufficiently-abrupt terminations, depending on your OS, there may be no guarantees), try Python standard library module atexit.
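A sketch of that suggestion, reusing the names from the question (checkIn, checkOut and command are assumed to exist in your code):

import atexit
import os
import subprocess

atexit.register(checkOut)   # runs on normal interpreter exit, not on SIGKILL

checkIn()
sshproc = subprocess.Popen([command], shell=True)
exit_status = os.waitpid(sshproc.pid, 0)[1]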
Perhaps all you need is a try ... finally block?
try:
    checkIn()
    sshproc = subprocess.Popen([command], shell=True)
    exit = os.waitpid(sshproc.pid, 0)[1]
finally:
    checkOut()
Unless the system crashes or the process receives SIGKILL (or the like), checkOut() should be called.
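One caveat worth noting, as a sketch not tested against this exact setup: closing the terminal window typically delivers SIGHUP, which by default kills the process without running the finally block; converting SIGHUP/SIGTERM into a normal SystemExit lets the finally block (and any atexit handlers) run:

import signal
import sys

def _graceful_exit(signum, frame):
    # Raise SystemExit so finally blocks and atexit handlers still execute.
    sys.exit(1)

signal.signal(signal.SIGHUP, _graceful_exit)
signal.signal(signal.SIGTERM, _graceful_exit)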
