I have a thread which handles commands sent to a device. It opens a subprocess, sends the command to the qmicli application (https://sigquit.wordpress.com/2012/08/20/an-introduction-to-libqmi/), gets a reply, and the reply is dealt with.
This generally works fine for days or weeks of running. However, I noticed that sometimes the thread would just stop doing anything when I make a subprocess.Popen call (the next lines of code do not run). The simplified code looks like this:
try:
    self.qmi_process = subprocess.Popen(cmd,
                                        stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT)
    # Log value of self.qmi_process happens here
    if self.qmi_process:
        out = self.qmi_process.communicate()
    else:
        return "ERROR: no qmi_process"
    self.qmi_process = None
    ret = ''.join(str(e) for e in out if e)
except:
    return "ERROR: Caught unhandled exception"
I started logging the value returned by the subprocess.Popen call to see whether the communicate() call was blocking or whether it was failing earlier, when the subprocess was created. It turns out that for some reason subprocess.Popen fails and the value of self.qmi_process is never logged, yet my exception handler is not called either. Any idea how that could happen?
subprocess.Popen does not return.
I have multiple threads calling Popen; I've read this can cause deadlock in Python 2.7?
I'm trying to write some basic tests for a piece of code that normally accepts input endlessly through stdin until given a specific exit command.
I want to check if the program crashes on being given some input string (after some amount of time to account for processing), but I can't seem to figure out how to send the data without getting stuck waiting for output that I don't care about.
My current code looks like this (using cat as an example of the program):
import subprocess
import time

inputdata = "some test input\n"  # placeholder for the string being tested

myproc = subprocess.Popen(['cat'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
myproc.communicate(input=inputdata.encode("utf-8"))
time.sleep(0.1)
if myproc.poll() is not None:
    print("not running")
else:
    print("still running")
How can I modify this to allow the program to proceed to the polling instead of hanging after the communicate() call?
You are using the wrong tool here: communicate() waits for the end of the program. You should instead simply feed the standard input of the subprocess:
myproc = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
myproc.stdin.write(inputdata.encode("utf-8"))
myproc.stdin.flush()  # make sure the data actually reaches the child
time.sleep(0.1)
if myproc.poll() is not None:
    print("not running")
else:
    print("still running")
But beware: you cannot be sure that the output pipes will contain anything before the end of the subprocess...
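If you do need to peek at the output before the process exits, one option (my addition, assuming Python 3 on Unix) is to check whether the pipe is readable before reading:

import selectors

sel = selectors.DefaultSelector()
sel.register(myproc.stdout, selectors.EVENT_READ)
if sel.select(timeout=0.1):           # true if the pipe has data (or EOF)
    print(myproc.stdout.read1(4096))  # read what's available without blocking
sel.close()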
You could set a timeout in Popen.communicate(input=None, timeout=None). After the timeout expires the process is still running, and I think (though you would have to test it) you can still send it input with another communicate() call.
From the docs:
If the process does not terminate after timeout seconds, a TimeoutExpired exception will be raised. Catching this exception and retrying communication will not lose any output.
The child process is not killed if the timeout expires, so in order to cleanup properly a well-behaved application should kill the child process and finish communication:
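A minimal sketch of that pattern (Python 3; cat and the 0.1-second timeout stand in for the question's program and processing delay):

import subprocess

proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
try:
    # If the program exits within the timeout, communicate() returns normally.
    out, err = proc.communicate(input=b"some test input\n", timeout=0.1)
    print("not running")
except subprocess.TimeoutExpired:
    print("still running")
    proc.kill()  # clean up the child, as the docs recommend
    out, err = proc.communicate()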
I think I understand what you want here. If you know an input that will crash your program, you can use subprocess.Popen.communicate(); it will still block, but it returns a tuple of the output and the error output, if any.
Then you can note the error and catch it in a try/except statement.
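For example, a minimal sketch (./myprog and the crashing input string are hypothetical):

import subprocess

proc = subprocess.Popen(['./myprog'], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate(input=b"input that may crash the program\n")
if proc.returncode != 0:
    print("crashed with code %d: %s" % (proc.returncode, err.decode()))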
This was really helpful when I was working with subprocesses:
https://docs.python.org/3/library/asyncio-subprocess.html
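For instance, a minimal sketch based on that page (Python 3.7+; the cat command is just a placeholder):

import asyncio

async def run(cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    out, err = await proc.communicate()
    return proc.returncode, out, err

rc, out, err = asyncio.run(run(["cat", "/etc/hostname"]))
print(rc, out)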
I am working with the cwiid library, which is a library written in C, but used in python. The library allows me to use a Wiimote to control some motors on a robot. The code is running as a daemon on an embedded device without a monitor, keyboard, or mouse.
When I try to initialize the object:
import time
import cwiid

while True:
    try:
        wm = cwiid.Wiimote()
    except RuntimeError:
        # RuntimeError exception thrown if no Wiimote is trying to connect
        # Wait a second
        time.sleep(1)
        # Try again
        continue
99% of the time everything works, but once in a while the library gets into some sort of weird state where the call to cwiid.Wiimote() results in the library writing "Socket connect error (control channel)" to stderr and Python throwing an exception. When this happens, every subsequent call to cwiid.Wiimote() results in the same thing being written to stderr and the same exception being thrown, until I reboot the device.
What I want to do is detect this problem and have Python reboot the device automatically.
The type of exception the cwiid library throws when it's in this weird state is also RuntimeError, which is indistinguishable from a connection-timeout exception (which is very common), so I can't differentiate it that way. What I want to do instead is read stderr right after calling cwiid.Wiimote() to see if the message "Socket connect error (control channel)" appears, and if so, reboot.
So far I can redirect stderr to prevent the message from showing up, using some os.dup() and os.dup2() calls, but that doesn't appear to help me read stderr.
Most of the examples online deal with reading stderr when you're running something with subprocess, which doesn't apply in this case.
How can I go about reading stderr to detect the message being written to it?
I think what I'm looking for is something like:
while True:
    try:
        r, w = os.pipe()
        os.dup2(sys.stderr.fileno(), r)
        wm = cwiid.Wiimote()
    except RuntimeError:
        # RuntimeError exception thrown if no Wiimote is trying to connect
        if 'Socket connect error (control channel)' in os.read(r, 100):
            pass  # Reboot
        # Wait a second
        time.sleep(1)
        # Try again
        continue
This doesn't seem to work the way I think it should though.
As an alternative to fighting with stderr, how about the following which retries several times in quick succession (which should handle connection errors) before giving up:
while True:
    for i in range(50):  # try 50 times
        try:
            wm = cwiid.Wiimote()
            break  # break out of "for" and re-loop in "while"
        except RuntimeError:
            time.sleep(1)
    else:
        raise RuntimeError("permanent Wiimote failure... reboot!")
Under the hood, subprocess uses anonymous pipes in addition to dups to redirect subprocess output. To get a process to read its own stderr, you need to do this manually. It involves getting an anonymous pipe, redirecting the standard error to the pipe's input, running the stderr-writing action in question, reading the output from the other end of the pipe, and cleaning everything back up. It's all pretty fiddly, but I think I got it right in the code below.
The following wrapper for your cwiid.Wiimote call will return a tuple consisting of the result returned by the function call (None in case of RuntimeError) and any stderr output generated. See the tests function for an example of how it's supposed to work under various conditions. I took a stab at adapting your example loop, but I don't quite understand what's supposed to happen when the cwiid.Wiimote call succeeds; in your example code, you just immediately re-loop.
Edit: Oops! Fixed a bug in example_loop() where Wiimote was called instead of passed as an argument.
import time
import os
import fcntl

def capture_runtime_stderr(action):
    """Handle runtime errors and capture stderr"""
    (r, w) = os.pipe()
    fcntl.fcntl(w, fcntl.F_SETFL, os.O_NONBLOCK)
    saved_stderr = os.dup(2)
    os.dup2(w, 2)
    try:
        result = action()
    except RuntimeError:
        result = None
    finally:
        os.close(w)
        os.dup2(saved_stderr, 2)
        os.close(saved_stderr)  # don't leak the saved descriptor
    with os.fdopen(r) as o:
        output = o.read()
    return (result, output)
## some tests

def return_value():
    return 5

def return_value_with_stderr():
    os.system("echo >&2 some output")
    return 10

def runtime_error():
    os.system("echo >&2 runtime error occurred")
    raise RuntimeError()

def tests():
    print(capture_runtime_stderr(return_value))
    print(capture_runtime_stderr(return_value_with_stderr))
    print(capture_runtime_stderr(runtime_error))
    os.system("echo >&2 never fear, stderr is back to normal")
## possible code for your loop

def example_loop():
    while True:
        (wm, output) = capture_runtime_stderr(cwiid.Wiimote)
        if wm is None:
            if "Socket connect error" in output:
                raise RuntimeError("library borked, time to reboot")
            time.sleep(1)
            continue
        ## do something with wm??
I am trying to execute a command as follows, but it is stuck in the try block below until the timeout kicks in. The Python script executes fine by itself independently. Can anyone suggest why this is and how to debug it?
cmd = "python complete.py"
proc = subprocess.Popen(cmd.split(' '),stdout=subprocess.PIPE )
print "Executing %s"%cmd
try:
print "In try" **//Stuck here**
proc.wait(timeout=time_out)
except TimeoutExpired as e:
print e
proc.kill()
with proc.stdout as stdout:
for line in stdout:
print line,
proc.stdout isn't available to be read after the process exits. Instead, you need to read it while the process is running. communicate() will do that for you, but since you're not using it, you get to do it yourself.
Right now, your process is almost certainly hanging trying to write to its stdout -- which it can't do, because the other end of the pipe isn't being read from.
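One way to apply that here is to let communicate() do the reading while it waits; a Python 3 sketch (time_out is the question's variable):

import subprocess

proc = subprocess.Popen(["python", "complete.py"], stdout=subprocess.PIPE)
try:
    out, _ = proc.communicate(timeout=time_out)  # drains stdout while waiting
except subprocess.TimeoutExpired:
    proc.kill()
    out, _ = proc.communicate()  # collect whatever was written before the kill
for line in out.splitlines():
    print(line)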
See also Using module 'subprocess' with timeout.
I'm writing a python script that launches programs in the background and then monitors to see if they encounter an error. I am using the subprocess module to start the process and keep a list of running programs.
processes.append((subprocess.Popen(command, stdin=subprocess.PIPE,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE), command))
I have found that when I try to monitor the programs by calling communicate on the subprocess object, the main program waits for the program to finish. I have tried to use poll(), but that doesn't give me access to the error code that caused the crash and I would like to address the issue and retry opening the process.
runningProcesses is a list of tuples containing the subprocess object and the command associated with it.
def monitorPrograms(runningProcesses):
    for program in runningProcesses:
        temp = program[0].communicate()
        if program[0].returncode:
            if program[0].returncode == 1:
                print "Program exited successfully."
            else:
                print "Whoops, something went wrong. Program %s crashed." % program[0].pid
When I tried to get the return code without using communicate, the crash of the program didn't register.
Do I have to use threads to run the communication in parallel or is there a simpler way that I am missing?
No need to use threads to monitor multiple processes, especially if you don't use their output (use DEVNULL instead of PIPE to hide the output); see Python threading multiple bash subprocesses?
Your main issue is incorrect Popen.poll() usage. If it returns None, it means that the process is still running; you should call it until you get a non-None value. Here's a code example similar to your case that prints the statuses of ping processes.
If you do want to get the subprocess' stdout/stderr as a string, then you could use threads or asyncio.
If you are on Unix and you control all the code that may spawn subprocesses, then you could avoid polling and handle SIGCHLD yourself. The asyncio stdlib library may handle SIGCHLD. You could also implement it manually, though it might be complicated.
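A minimal sketch of that poll-until-non-None loop across several processes (the sleep commands are placeholders for your real ones):

import subprocess
import time

processes = [(subprocess.Popen(cmd, stdout=subprocess.DEVNULL), cmd)
             for cmd in (["sleep", "1"], ["sleep", "3"])]

while processes:
    for proc, cmd in processes[:]:
        rc = proc.poll()
        if rc is None:
            continue  # still running; check again on the next pass
        processes.remove((proc, cmd))
        print("%s exited with code %s" % (" ".join(cmd), rc))
    time.sleep(0.5)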
Based on my research, the best way to do this is with threads. Here's an article that I referenced when creating my own package to solve this problem.
The basic method is to spin off threads that constantly collect log output (and finally the exit status) of the subprocess call.
Here's an example of my own "receiver" which listens for logs:
class Receiver(threading.Thread):
    def __init__(self, stream, stream_type=None, callback=None):
        super(Receiver, self).__init__()
        self.stream = stream
        self.stream_type = stream_type
        self.callback = callback
        self.complete = False
        self.text = ''

    def run(self):
        for line in iter(self.stream.readline, ''):
            line = line.rstrip()
            if self.callback:
                line = self.callback(line, msg_type=self.stream_type)
            self.text += line + "\n"
        self.complete = True
And now the code that spins the receiver off:
def _execute(self, command):
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               shell=True, preexec_fn=os.setsid)
    out = Receiver(process.stdout, stream_type='out', callback=self.handle_log)
    err = Receiver(process.stderr, stream_type='err', callback=self.handle_log)
    out.start()
    err.start()
    try:
        self.wait_for_complete(out)
    except CommandTimeout:
        os.killpg(process.pid, signal.SIGTERM)
        raise
    else:
        status = process.poll()
        output = CommandOutput(status=status, stdout=out.text, stderr=err.text)
        return output
    finally:
        out.join(timeout=1)
        err.join(timeout=1)
CommandOutput is simply a named tuple that makes it easy to reference the data I care about.
You'll notice I have a method 'wait_for_complete' which waits for the receiver to set complete = True. Once complete, the execute method calls process.poll() to get the exit code. We now have all stdout/stderr and the status code of the process.
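The wait_for_complete helper isn't shown above; a minimal sketch of what it might look like (the timeout value and polling interval are assumptions on my part):

def wait_for_complete(self, receiver, timeout=60):
    # Poll the receiver's `complete` flag until the stream hits EOF,
    # raising the CommandTimeout used above if that takes too long.
    deadline = time.time() + timeout
    while not receiver.complete:
        if time.time() > deadline:
            raise CommandTimeout("no EOF after %s seconds" % timeout)
        time.sleep(0.1)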
I've been using subprocess.check_output() for some time to capture output from subprocesses, but ran into some performance problems under certain circumstances. I'm running this on a RHEL6 machine.
The calling Python environment is linux-compiled and 64-bit. The subprocess I'm executing is a shell script which eventually fires off a Windows python.exe process via Wine (why this foolishness is required is another story). As input to the shell script, I'm piping in a small bit of Python code that gets passed off to python.exe.
While the system is under moderate/heavy load (40 to 70% CPU utilization), I've noticed that using subprocess.check_output(cmd, shell=True) can result in a significant delay (up to ~45 seconds) after the subprocess has finished execution before the check_output command returns. Looking at output from ps -efH during this time shows the called subprocess as sh <defunct>, until it finally returns with a normal zero exit status.
Conversely, using subprocess.call(cmd, shell=True) to run the same command under the same moderate/heavy load will cause the subprocess to return immediately with no delay, all output printed to STDOUT/STDERR (rather than returned from the function call).
Why is there such a significant delay only when check_output() is redirecting the STDOUT/STDERR output into its return value, and not when the call() simply prints it back to the parent's STDOUT/STDERR?
Reading the docs, both subprocess.call and subprocess.check_output are convenience wrappers around subprocess.Popen. One minor difference is that check_output will raise a Python error if the subprocess returns a non-zero exit status. The greater difference is emphasized in the bit about check_output (my emphasis):
The full function signature is largely the same as that of the Popen constructor, except that stdout is not permitted as it is used internally. All other supplied arguments are passed directly through to the Popen constructor.
So how is stdout "used internally"? Let's compare call and check_output:
call
def call(*popenargs, **kwargs):
    return Popen(*popenargs, **kwargs).wait()
check_output
def check_output(*popenargs, **kwargs):
    if 'stdout' in kwargs:
        raise ValueError('stdout argument not allowed, it will be overridden.')
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
    output, unused_err = process.communicate()
    retcode = process.poll()
    if retcode:
        cmd = kwargs.get("args")
        if cmd is None:
            cmd = popenargs[0]
        raise CalledProcessError(retcode, cmd, output=output)
    return output
communicate
Now we have to look at Popen.communicate as well. Doing this, we notice that for one pipe, communicate does several things which simply take more time than returning Popen().wait(), as call does.
For one thing, communicate processes stdout=PIPE whether you set shell=True or not. Clearly, call does not. It just lets your shell spout whatever... making it a security risk, as Python describes here.
Secondly, in the case of check_output(cmd, shell=True) (just one pipe)... whatever your subprocess sends to stdout is processed by a thread in the _communicate method. And Popen must join that thread (wait on it) before additionally waiting on the subprocess itself to terminate!
Plus, more trivially, it processes stdout as a list which must then be joined into a string.
In short, even with minimal arguments, check_output spends a lot more time in Python processes than call does.
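If you want to measure the overhead yourself, here's a rough sketch (the command and iteration count are arbitrary):

import subprocess
import timeit

print(timeit.timeit(lambda: subprocess.call("true", shell=True), number=100))
print(timeit.timeit(lambda: subprocess.check_output("true", shell=True), number=100))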
Let's look at the code. check_output waits using the following _internal_poll:
def _internal_poll(self, _deadstate=None, _waitpid=os.waitpid,
                   _WNOHANG=os.WNOHANG, _os_error=os.error, _ECHILD=errno.ECHILD):
    """Check if child process has terminated.  Returns returncode
    attribute.

    This method is called by __del__, so it cannot reference anything
    outside of the local scope (nor can any methods it calls).
    """
    if self.returncode is None:
        try:
            pid, sts = _waitpid(self.pid, _WNOHANG)
            if pid == self.pid:
                self._handle_exitstatus(sts)
        except _os_error as e:
            if _deadstate is not None:
                self.returncode = _deadstate
            if e.errno == _ECHILD:
                # This happens if SIGCLD is set to be ignored or
                # waiting for child processes has otherwise been
                # disabled for our process. This child is dead, we
                # can't get the status.
                # http://bugs.python.org/issue15756
                self.returncode = 0
    return self.returncode
The .call waits using the following code:
def wait(self):
    """Wait for child process to terminate.  Returns returncode
    attribute."""
    while self.returncode is None:
        try:
            pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
        except OSError as e:
            if e.errno != errno.ECHILD:
                raise
            # This happens if SIGCLD is set to be ignored or waiting
            # for child processes has otherwise been disabled for our
            # process. This child is dead, we can't get the status.
            pid = self.pid
            sts = 0
        # Check the pid and loop as waitpid has been known to return
        # 0 even without WNOHANG in odd situations. issue14396.
        if pid == self.pid:
            self._handle_exitstatus(sts)
    return self.returncode
Notice the bug related to _internal_poll. It is described at http://bugs.python.org/issue15756, and it's pretty much exactly the issue you are running into.
Edit: The other potential difference between .call and .check_output is that .check_output actually cares about stdin and stdout and will try to perform IO against both pipes. If you are running into a process that gets itself into a zombie state, it is possible that a read against a pipe in a defunct state is causing the hang you are experiencing.
In most cases zombie states get cleaned up pretty quickly, but they will not if, for instance, they are interrupted while in a system call (like read or write). Of course the read/write system call should itself be interrupted as soon as the IO can no longer be performed, but it is possible that you are hitting some sort of race condition where things are getting killed in a bad order.
The only way I can think of to determine which is the cause in this case is for you to either add debugging code to the subprocess module file or invoke the Python debugger and get a backtrace when you run into the condition you are experiencing.
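For the debugger route, one option (my suggestion, not from the original answer) is the faulthandler module from Python 3.3+, which can dump every thread's stack on a signal:

import faulthandler
import signal

# After this, `kill -USR1 <pid>` makes the process write every
# thread's traceback to stderr without stopping it.
faulthandler.register(signal.SIGUSR1)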