I want to capture stdout from a long-ish running process started via subprocess.Popen(...), so I'm using stdout=PIPE as an argument.
However, because it's a long-running process, I also want to send the output to the console (as if I hadn't piped it) to give the user of the script an idea that it's still working.
Is this at all possible?
Cheers.
The buffering your long-running sub-process is probably performing will make your console output jerky and very bad UX. I suggest you consider instead using pexpect (or, on Windows, wexpect) to defeat such buffering and get smooth, regular output from the sub-process. For example (on just about any unix-y system, after installing pexpect):
>>> import sys
>>> import pexpect
>>> child = pexpect.spawn('/bin/bash -c "echo ba; sleep 1; echo bu"', logfile=sys.stdout); x = child.expect(pexpect.EOF); child.close()
ba
bu
>>> child.before
'ba\r\nbu\r\n'
The ba and bu will come with the proper timing (about a second between them). Note the output is not subject to normal terminal processing, so the carriage returns are left in there -- you'll need to post-process the string yourself (just a simple .replace!-) if you need \n as end-of-line markers (the lack of processing is important just in case the sub-process is writing binary data to its stdout -- this ensures all the data's left intact!-).
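For instance, the post-processing mentioned above is a one-liner:

text = child.before.replace('\r\n', '\n')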
S. Lott's comment points to Getting realtime output using subprocess and Real-time intercepting of stdout from another process in Python
I'm curious that Alex's answer here is different from his answer to question 1085071.
My simple little experiments with the answers in the two other referenced questions have given good results...
I went and looked at wexpect as per Alex's answer above, but I have to say that reading the comments in the code did not leave me with a very good feeling about using it.
I guess the meta-question here is: when will pexpect/wexpect become one of the Included Batteries?
Can you simply print it as you read it from the pipe?
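For example (a rough sketch, subject to the buffering caveats discussed elsewhere in this thread; some_command is a placeholder for the real program):

import subprocess

proc = subprocess.Popen(['some_command'], stdout=subprocess.PIPE,
                        universal_newlines=True)
captured = []
for line in proc.stdout:
    print(line, end='')     # echo progress to the console...
    captured.append(line)   # ...while keeping a copy for the script
proc.wait()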
Inspired by the pty.openpty() suggestion somewhere above; tested on Python 2.6 and Linux. Publishing it since it took a while to get this working properly, without buffering...
import os
import pty
import subprocess
import sys

def call_and_peek_output(cmd, shell=False):
    master, slave = pty.openpty()
    p = subprocess.Popen(cmd, shell=shell, stdin=None, stdout=slave, close_fds=True)
    os.close(slave)
    line = ""
    while True:
        try:
            ch = os.read(master, 1)
        except OSError:
            # We get this exception when the spawned process closes all
            # references to the pty descriptor we passed it for stdout
            # (typically when it and its children exit)
            break
        line += ch
        sys.stdout.write(ch)
        if ch == '\n':
            yield line
            line = ""
    if line:
        yield line
    ret = p.wait()
    if ret:
        raise subprocess.CalledProcessError(ret, cmd)

for l in call_and_peek_output("ls /", shell=True):
    pass
Alternatively, you can pipe your process into tee and capture only one of the streams.
Something along the lines of sh -c 'process interesting stuff' | tee /dev/stderr.
Of course, this only works on Unix-like systems.
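From Python, that could look something like this (a sketch, needing Python 2.7+ for check_output; some_command is a placeholder):

import subprocess

# tee duplicates the child's stdout to /dev/stderr (still the console)
# while Python captures the piped copy.
output = subprocess.check_output("some_command | tee /dev/stderr", shell=True)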
Here is my demo code. It contains two scripts.
The first is main.py; it calls print_line.py with the subprocess module.
The second is print_line.py; it prints something to stdout.
main.py
import subprocess

p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process

if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()
Previously, print_line.py was written with the threading and Queue modules and everything was fine. But now, after changing to the multiprocessing module, main.py cannot get any output from print_line.py. I tried using Popen.communicate() to get the output, and setting preexec_fn=os.setsid in Popen(). Neither of them works.
So, here are my questions:
Why can't subprocess get the output when print_line.py uses multiprocessing? Why is it OK with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed to my console. Why? How does this happen?
Is there any chance of getting the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason lies somewhere in the deep, murky waters of buffered I/O. It seems that the output of a subprocess of a subprocess can get lost if not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
import sys  # needed for the explicit flush

def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to use the -u flag to Python in your main.py:

p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory why this happens, it definitely has something to do with unflushed stdout being lost if subprocess terminates, but I am not the expert in this.
It's not a multiprocessing issue, but a subprocess issue -- or more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that, by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device", as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.[1]
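You can see the distinction by running the same check with stdout redirected to a pipe (a quick demonstration of the point, not from the original answer):

import subprocess, sys

# The child's stdout is a pipe here, not a terminal, so it reports False.
code = "import sys; print(sys.stdout.isatty())"
print(subprocess.check_output([sys.executable, '-c', code],
                              universal_newlines=True))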
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
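For illustration, all three behaviours can be requested explicitly through open()'s buffering argument:

# buffering=0: unbuffered (binary mode only in Python 3)
f_unbuffered = open('out.bin', 'wb', 0)
# buffering=1: line buffered (text mode)
f_line = open('out.txt', 'w', 1)
# buffering>1: fully buffered with a buffer of that many bytes
f_full = open('out2.txt', 'w', 65536)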
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
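For example, on GNU/Linux systems with coreutils, stdbuf can request line buffering for a child that uses C stdio (a sketch; some_command is a placeholder):

import subprocess

# Ask the child's stdio to line-buffer its stdout despite the pipe.
p = subprocess.Popen(['stdbuf', '-oL', 'some_command'],
                     stdout=subprocess.PIPE, universal_newlines=True)
for line in p.stdout:
    print(line, end='')
p.wait()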
[1] There are hooks that fire up very early, where you can fuss with this sort of thing. They are tricky to work with, though.
I am working on some scripts (in the company I work in) that are loaded/unloaded into hypervisors to fire a piece of code when an event occurs. The only way to actually unload a script is to hit Ctrl-C. I am writing a function in Python that automates the process.
As soon as it sees the string "done" in the output of the program, it should kill the vprobe.
I am using subprocess.Popen to execute the command:
lineList = buff.readlines()
cmd = "vprobe /vprobe/myhello.emt"
p = subprocess.Popen(args=cmd, shell=True, stdout=buff, universal_newlines=True, preexec_fn=os.setsid)
while not re.search("done", lineList[-1]):
    print "waiting"
os.kill(p.pid, signal.CTRL_C_EVENT)
As you can see, I am writing the output to the buff file descriptor, which is opened in read+write mode. I check the last line; if it has 'done', I kill the process. Unfortunately, CTRL_C_EVENT is only valid on Windows.
What can I do for Linux?
I think you can just send the Linux equivalent, signal.SIGINT (the interrupt signal).
(Edit: I used to have something here discouraging the use of this strategy for controlling subprocesses, but on more careful reading it sounds like you've already decided you need control-C in this specific case... So, SIGINT should do it.)
On Linux, a Ctrl-C keyboard interrupt can be sent programmatically to a process using the Popen.send_signal(signal.SIGINT) method. For example:
import subprocess
import signal
..
process = subprocess.Popen(..)
..
process.send_signal(signal.SIGINT)
..
Don't use Popen.communicate() for blocking commands.
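A complete toy run might look like this (a sketch; sleep 30 is only a stand-in for a real long-running command):

import signal
import subprocess
import time

process = subprocess.Popen(['sleep', '30'])
time.sleep(1)                       # let the child get going
process.send_signal(signal.SIGINT)  # the programmatic Ctrl-C
process.wait()                      # reap it; sleep exits on SIGINT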
Maybe I misunderstand something, but the way you do it, it is difficult to get the desired result.
Whatever buff is, you query it first, then use it in the context of Popen(), and then you hope that, by magic, lineList fills itself up.
What you probably want is something like
logfile = open("mylogfile", "a")
p = subprocess.Popen(['vprobe', '/vprobe/myhello.emt'], stdout=subprocess.PIPE, buff, universal_newlines=True, preexec_fn=os.setsid)
for line in p.stdout:
logfile.write(line)
if re.search("done", line):
break
print "waiting"
os.kill(p.pid, signal.CTRL_C_EVENT)
This gives you a pipe end fed by your vprobe script, which you can read line by line and act upon appropriately when the expected output appears.
I am trying to read from both stdout and stderr from a Popen and print them out. The command I am running with Popen is the following
#!/bin/bash
i=10
while (( i > 0 )); do
    sleep 1s
    echo heyo-$i
    i="$((i-1))"
done
echo 'to error' >&2
When I run this in the shell, I get one line of output, then a one-second pause, then the next line, and so on. However, I am unable to recreate this using Python. I am starting two threads, one each to read from stdout and stderr, that put the lines read into a Queue, and another thread that takes items from this queue and prints them out. But with this, I see that all the output gets printed at once, after the subprocess ends. I want the lines to be printed as and when they are echoed.
Here's my python code:
import subprocess as sp
from queue import Queue, Empty  # Queue module on Python 2
from threading import Thread

# The `randoms` script is on the $PATH
proc = sp.Popen(['randoms'], stdout=sp.PIPE, stderr=sp.PIPE, bufsize=0)

q = Queue()

def stream_watcher(stream, name=None):
    """Take lines from the stream and put them in the q"""
    for line in stream:
        q.put((name, line))
    if not stream.closed:
        stream.close()

Thread(target=stream_watcher, args=(proc.stdout, 'out')).start()
Thread(target=stream_watcher, args=(proc.stderr, 'err')).start()

def displayer():
    """Take lines from the q and add them to the display"""
    while True:
        try:
            name, line = q.get(True, 1)
        except Empty:
            if proc.poll() is not None:
                break
        else:
            # Print the line, stripping the trailing newline character
            print(name.upper(), '->', line[:-1])
            q.task_done()
    print('-*- FINISHED -*-')

Thread(target=displayer).start()
Any ideas? What am I missing here?
Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is to a terminal. When the output is to a pipe, it is pipe-buffered, not line-buffered, and so the data is not transferred to the kernel and thence to the other end of the pipe until the pipe buffer fills.
Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.
However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because now the whole line is flushed when the newly launched /bin/echo process terminates each time. This is hardly an efficient use of system resources, but may be an efficient use of your own.
IIRC, setting shell=True on Popen should do it.
I have an app that reads in stuff from stdin and returns, after a newline, results to stdout.
A simple (stupid) example:
$ app
Expand[(x+1)^2]<CR>
x^2 + 2*x + 1
100 - 4<CR>
96
Opening and closing the app requires a lot of initialization and clean-up (its an interface to a Computer Algebra System), so I want to keep this to a minimum.
I want to open a pipe in Python to this process, write strings to its stdin and read the results from its stdout. Popen.communicate() doesn't work for this, as it closes the file handle, requiring the pipe to be reopened.
I've tried something along the lines of this related question:
Communicate multiple times with a process without breaking the pipe?, but I'm not sure how to wait for the output. It is also difficult to know a priori how long the app will take to process the input at hand, so I don't want to make any assumptions. I guess most of my confusion comes from this question: Non-blocking read on a subprocess.PIPE in python, where it is stated that mixing high- and low-level functions is not a good idea.
EDIT:
Sorry that I didn't give any code before; I got interrupted. This is what I've tried so far and it seems to work; I'm just worried that something could go wrong unnoticed:
from subprocess import Popen, PIPE

pipe = Popen(["MathPipe"], stdin=PIPE, stdout=PIPE)

expressions = ["Expand[(x+1)^2]", "Integrate[Sin[x], {x,0,2*Pi}]"] # ...

for expr in expressions:
    pipe.stdin.write(expr)
    while True:
        line = pipe.stdout.readline()
        if line != '':
            print line
        # output of MathPipe is always terminated by ';'
        if ";" in line:
            break
Potential problems with this?
Using subprocess, you can't do this reliably. You might want to look at using the pexpect library. That won't work on Windows - if you're on Windows, try winpexpect.
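With pexpect, the interaction could look roughly like this (a sketch, reusing the MathPipe name and the ';' terminator convention from the question):

import pexpect

child = pexpect.spawn('MathPipe')
child.sendline('Expand[(x+1)^2]')
child.expect(';')       # block until the terminator arrives
print(child.before)     # everything the app printed before the ';'
child.close()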
Also, if you're trying to do mathematical stuff in Python, check out SAGE. They do a lot of work on interfacing with other open-source maths software, so there's a chance they've already done what you're trying to do.
Perhaps you could pass stdin=subprocess.PIPE as an argument to subprocess.Popen. This will make the process' stdin available as a general file-like object:
import sys, subprocess

proc = subprocess.Popen(["mathematica <args>"], stdin=subprocess.PIPE,
                        stdout=sys.stdout, shell=True)

proc.stdin.write("Expand[ (x-1)^2 ]")  # Write whatever to the process
proc.stdin.flush()                     # Ensure nothing is left in the buffer
proc.terminate()                       # Kill the process
This directs the subprocess' output directly to your python process' stdout. If you need to read the output and do some editing first, that is possible as well. Check out http://docs.python.org/library/subprocess.html#popen-objects.
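For instance, a variation that captures the output for post-processing instead of passing it straight through (still a sketch, with the same placeholder command):

import subprocess

proc = subprocess.Popen(["mathematica <args>"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, shell=True,
                        universal_newlines=True)
proc.stdin.write("Expand[ (x-1)^2 ]\n")
proc.stdin.flush()
line = proc.stdout.readline()  # blocks until the app replies with a line
print(line.upper(), end='')    # edit/transform the text before displaying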
I'm running multiple commands which may take some time, in parallel, on a Linux machine running Python 2.6.
So, I used the subprocess.Popen class and the communicate() method to parallelize execution of multiple command groups and capture the output at once after execution.
def run_commands(commands, print_lock):
    # this part runs in parallel.
    outputs = []
    for command in commands:
        proc = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True)
        output, unused_err = proc.communicate()  # buffers the output
        retcode = proc.poll()  # ensures subprocess termination
        outputs.append(output)
    with print_lock:  # print them at once (synchronized)
        for output in outputs:
            for line in output.splitlines():
                print(line)
Elsewhere, it's called like this:
processes = []
print_lock = Lock()
for ...:
    commands = ... # a group of commands is generated, which takes some time.
    processes.append(Thread(target=run_commands, args=(commands, print_lock)))
    processes[-1].start()
for p in processes: p.join()
print('done.')
The expected result is that each output of a group of commands is displayed at once while execution of them is done in parallel.
But from the second output group onwards (of course, which thread becomes the second changes due to scheduling indeterminism), the output begins to print without newlines, each line is indented by as many spaces as the number of characters printed on the previous line, and input echo is turned off -- the terminal state is "garbled" or "crashed". (If I issue the reset shell command, it returns to normal.)
At first, I tried to find the reason in the handling of '\r', but that was not it. As you can see in my code, I handle it properly using splitlines(), and I confirmed that by applying the repr() function to the output.
I think the reason is the concurrent use of pipes in Popen and communicate() for stdout/stderr. I tried the check_output shortcut method in Python 2.7, but with no success. Of course, the problem described above does not occur if I serialize all command executions and prints.
Is there any better way to handle Popen and communicate() in parallel?
A final result, inspired by the comment from J.F. Sebastian:
http://bitbucket.org/daybreaker/kaist-cs443/src/247f9ecf3cee/tools/manage.py
It seems to be a Python bug.
I am not sure it is clear what run_commands actually needs to be doing, but it seems to simply poll a subprocess, ignore the return code and continue in the loop. When you get to the part where you are printing output, how could you know the sub-processes have completed?
In your example code I noticed your use of:
for line in output.splitlines():
to partially address the issue of '\r'; use of
for line in output.splitlines(True):
would have been helpful.