Popen does not give output immediately when available - python

I am trying to read from both stdout and stderr from a Popen and print them out. The command I am running with Popen is the following
#!/bin/bash
i=10
while (( i > 0 )); do
sleep 1s
echo heyo-$i
i="$((i-1))"
done
echo 'to error' >&2
When I run this in the shell, I get one line of output and then a second break and then one line again, etc. However, I am unable to recreate this using python. I am starting two threads, one each to read from stdout and stderr, put the lines read into a Queue and another thread that takes items from this queue and prints them out. But with this, I see that all the output gets printed out at once, after the subprocess ends. I want the lines to be printed as and when they are echo'ed.
Here's my python code:
# The `randoms` is in the $PATH
proc = sp.Popen(['randoms'], stdout=sp.PIPE, stderr=sp.PIPE, bufsize=0)
q = Queue()
def stream_watcher(stream, name=None):
"""Take lines from the stream and put them in the q"""
for line in stream:
q.put((name, line))
if not stream.closed:
stream.close()
Thread(target=stream_watcher, args=(proc.stdout, 'out')).start()
Thread(target=stream_watcher, args=(proc.stderr, 'err')).start()
def displayer():
"""Take lines from the q and add them to the display"""
while True:
try:
name, line = q.get(True, 1)
except Empty:
if proc.poll() is not None:
break
else:
# Print line with the trailing newline character
print(name.upper(), '->', line[:-1])
q.task_done()
print('-*- FINISHED -*-')
Thread(target=displayer).start()
Any ideas? What am I missing here?

Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is to a terminal. When the output is to a pipe, it is pipe-buffered, not line-buffered, and so the data is not transferred to the kernel and thence to the other end of the pipe until the pipe buffer fills.
Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.
However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because now the whole line is flushed when the newly launched /bin/echo process terminates each time. This is hardly an efficient use of system resources, but may be an efficient use of your own.

IIRC, setting shell=True on Popen should do it.

Related

How to get live output with subprocess in Python

I am trying to run a python file that prints something, waits 2 seconds, and then prints again. I want to catch these outputs live from my python script to then process them. I tried different things but nothing worked.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
while True:
output = process.stdout.readline()
if process.poll() is not None and output == '':
break
if output:
print(output.strip())
I'm at this point but it doesn't work. It waits until the code finishes and then prints all the outputs.
I just need to run a python file and get live outputs from it, if you have other ideas for doing it, without using the print function let me know, just know that I have to run the file separately. I just thought of the easiest way possible but, from what I'm seeing it can't be done.
There are three layers of buffering here, and you need to limit all three of them to guarantee you get live data:
Use the stdbuf command (on Linux) to wrap the subprocess execution (e.g. run ['stdbuf', '-oL'] + cmd instead of just cmd), or (if you have the ability to do so) alter the program itself to either explicitly change the buffering on stdout (e.g. using setvbuf for C/C++ code to switch stdout globally to line-buffered mode, rather than the default block buffering it uses when outputting to a non-tty) or to insert flush statements after critical output (e.g. fflush(stdout); for C/C++, fileobj.flush() for Python, etc.) the buffering of the program to line-oriented mode (or add fflushs); without that, everything is stuck in user-mode buffers of the sub-process.
Add bufsize=0 to the Popen arguments (probably not needed since you don't send anything to stdin, but harmless) so it unbuffers all piped handles. If the Popen is in text=True mode, switch to bufsize=1 (which is line-buffered, rather than unbuffered).
Add flush=True to the print arguments (if you're connected to a terminal, the line-buffering will flush it for you, so it's only if stdout is piped to a file that this will matter), or explicitly call sys.stdout.flush().
Between the three of these, you should be able to guarantee no data is stuck waiting in user-mode buffers; if at least one line has been output by the sub-process, it will reach you immediately, and any output triggered by it will also appear immediately. Item #1 is the hardest in most cases (when you can't use stdbuf, or the process reconfigures its own buffering internally and undoes the effect of stdbuf, and you can't modify the process executable to fix it); you have complete control over #2 and #3, but #1 may be outside your control.
This is the code I use for that same purpose:
def run_command(command, **kwargs):
"""Run a command while printing the live output"""
process = subprocess.Popen(
command,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
**kwargs,
)
while True: # Could be more pythonic with := in Python3.8+
line = process.stdout.readline()
if not line and process.poll() is not None:
break
print(line.decode(), end='')
An example of usage would be:
run_command(['git', 'status'], cwd=Path(__file__).parent.absolute())

How to get output from python2 subprocess which run a script using multiprocessing?

Here is my demo code. It contains two scripts.
The first is main.py, it will call print_line.py with subprocess module.
The second is print_line.py, it prints something to the stdout.
main.py
import subprocess
p = subprocess.Popen('python2 print_line.py',
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
close_fds=True,
shell=True,
universal_newlines=True)
while True:
line = p.stdout.readline()
if line:
print(line)
else:
break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process
if __name__ == '__main__':
task_q = JoinableQueue()
def do_task():
while True:
task = task_q.get()
pid = current_process().pid
print 'pid: {}, task: {}'.format(pid, task)
task_q.task_done()
for _ in range(10):
p = Process(target=do_task)
p.daemon = True
p.start()
for i in range(100):
task_q.put(i)
task_q.join()
Before, print_line.py is written with threading and Queue module, everything is fine. But now, after changing to multiprocessing module, the main.py cannot get any output from print_line. I tried to use Popen.communicate() to get the output or set preexec_fn=os.setsid inPopen(). Neither of them work.
So, here is my question:
Why subprocess cannot get the output with multiprocessing? why it is ok with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed in my console. Why? How does this happen?
Is there any chance to get the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason being somewhere in the deep, murky waters of buffered IO. It seems that the output of a subprocess of a subprocess can get lost if not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
def do_task():
while True:
task = task_q.get()
pid = current_process().pid
print 'pid: {}, task: {}'.format(pid, task)
sys.stdout.flush()
task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to use -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
close_fds=True,
shell=True,
universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory why this happens, it definitely has something to do with unflushed stdout being lost if subprocess terminates, but I am not the expert in this.
It's not a multiprocessing issue, but it is a subprocess issue—or more precisely, it has to to with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device" as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.1
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
1There are hooks that fire up very early, where you can fuss with this sort of thing. They are tricky to work though.

Read stdout from subprocess until there is nothing left

I would like to run several commands in the same shell. After some research I found that I could keep a shell open using the return process from Popen. I can then write and read to stdin and stdout. I tried implementing it as such:
process = Popen(['/bin/sh'], stdin=PIPE, stdout=PIPE)
process.stdin.write('ls -al\n')
out = ' '
while not out == '':
out = process.stdout.readline().rstrip('\n')
print out
Not only is my solution ugly, it doesn't work. out is never empty because it hands on the readline(). How can I successfully end the while loop when there is nothing left to read?
Use iter to read data in real time:
for line in iter(process.stdout.readline,""):
print line
If you just want to write to stdin and get the output you can use communicate to make the process end:
process = Popen(['/bin/sh'], stdin=PIPE, stdout=PIPE)
out,err =process.communicate('ls -al\n')
Or simply get the output use check_output:
from subprocess import check_output
out = check_output(["ls", "-al"])
The command you're running in a subprocess is sh, so the output you're reading is sh's output. Since you didn't indicate to the shell it should quit, it is still alive, thus its stdout is still open.
You can perhaps write exit to its stdin to make it quit, but be aware that in any case, you get to read things you don't need from its stdout, e.g. the prompt.
Bottom line, this approach is flawed to start with...

Displaying output of shell commands with shared environments

Is there any way to display the output of a shell command in Python, as the command runs?
I have the following code to send commands to a specific shell (in this case, /bin/tcsh):
import subprocess
import select
cmd = subprocess.Popen(['/bin/tcsh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
poll = select.poll()
poll.register(cmd.stdout.fileno(),select.POLLIN)
# The list "commands" holds a list of shell commands
for command in commands:
cmd.stdin.write(command)
# Must include this to ensure data is passed to child process
cmd.stdin.flush()
ready = poll.poll()
if ready:
result = cmd.stdout.readline()
print result
Also, I got the code above from this thread, but I am not sure I understand how the polling mechanism works.
What exactly is registered above?
Why do I need the variable ready if I don't pass any timeout to poll.poll()?
Yes, it is entirely possible to display the output of a shell comamand as the command runs. There are two requirements:
1) The command must flush its output.
Many programs buffer their output differently according to whether the output is connected to a terminal, a pipe, or a file. If they are connected to a pipe, they might write their output in much bigger chunks much less often. For each program that you execute, consult its documentation. Some versions of /bin/cat', for example, have the -u switch.
2) You must read it piecemeal, and not all at once.
Your program must be structured to one piece at a time from the output stream. This means that you ought not do these, which each read the entire stream at one go:
cmd.stdout.read()
for i in cmd.stdout:
list(cmd.stdout.readline())
But instead, you could do one of these:
while not_dead_yet:
line = cmd.stdout.readline()
for line in iter(cmd.stdout.readline, b''):
pass
Now, for your three specific questions:
Is there any way to display the output of a shell command in Python, as the command runs?
Yes, but only if the command you are running outputs as it runs and doesn't save it up for the end.
What exactly is registered above?
The file descriptor which, when read, makes available the output of the subprocess.
Why do I need the variable ready if I don't pass any timeout to poll.poll()?
You don't. You also don't need the poll(). It is possible, if your commands list is fairly large, that might need to poll() both the stdin and stdout streams to avoid a deadlock. But if your commands list is fairly modest (less than 5Kbytes), then you will be OK just writing them at the beginning.
Here is one possible solution:
#! /usr/bin/python
import subprocess
import select
# Critical: all of this must fit inside ONE pipe() buffer
commands = ['echo Start\n', 'date\n', 'sleep 10\n', 'date\n', 'exit\n']
cmd = subprocess.Popen(['/bin/tcsh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# The list "commands" holds a list of shell commands
for command in commands:
cmd.stdin.write(command)
# Must include this to ensure data is passed to child process
cmd.stdin.flush()
for line in iter(cmd.stdout.readline, b''):
print line

Python - capture Popen stdout AND display on console?

I want to capture stdout from a long-ish running process started via subprocess.Popen(...) so I'm using stdout=PIPE as an arg.
However, because it's a long running process I also want to send the output to the console (as if I hadn't piped it) to give the user of the script an idea that it's still working.
Is this at all possible?
Cheers.
The buffering your long-running sub-process is probably performing will make your console output jerky and very bad UX. I suggest you consider instead using pexpect (or, on Windows, wexpect) to defeat such buffering and get smooth, regular output from the sub-process. For example (on just about any unix-y system, after installing pexpect):
>>> import pexpect
>>> child = pexpect.spawn('/bin/bash -c "echo ba; sleep 1; echo bu"', logfile=sys.stdout); x=child.expect(pexpect.EOF); child.close()
ba
bu
>>> child.before
'ba\r\nbu\r\n'
The ba and bu will come with the proper timing (about a second between them). Note the output is not subject to normal terminal processing, so the carriage returns are left in there -- you'll need to post-process the string yourself (just a simple .replace!-) if you need \n as end-of-line markers (the lack of processing is important just in case the sub-process is writing binary data to its stdout -- this ensures all the data's left intact!-).
S. Lott's comment points to Getting realtime output using subprocess and Real-time intercepting of stdout from another process in Python
I'm curious that Alex's answer here is different from his answer 1085071.
My simple little experiments with the answers in the two other referenced questions has given good results...
I went and looked at wexpect as per Alex's answer above, but I have to say reading the comments in the code I was not left a very good feeling about using it.
I guess the meta-question here is when will pexpect/wexpect be one of the Included Batteries?
Can you simply print it as you read it from the pipe?
Inspired by pty.openpty() suggestion somewhere above, tested on python2.6, linux. Publishing since it took a while to make this working properly, w/o buffering...
def call_and_peek_output(cmd, shell=False):
import pty, subprocess
master, slave = pty.openpty()
p = subprocess.Popen(cmd, shell=shell, stdin=None, stdout=slave, close_fds=True)
os.close(slave)
line = ""
while True:
try:
ch = os.read(master, 1)
except OSError:
# We get this exception when the spawn process closes all references to the
# pty descriptor which we passed him to use for stdout
# (typically when it and its childs exit)
break
line += ch
sys.stdout.write(ch)
if ch == '\n':
yield line
line = ""
if line:
yield line
ret = p.wait()
if ret:
raise subprocess.CalledProcessError(ret, cmd)
for l in call_and_peek_output("ls /", shell=True):
pass
Alternatively, you can pipe your process into tee and capture only one of the streams.
Something along the lines of sh -c 'process interesting stuff' | tee /dev/stderr.
Of course, this only works on Unix-like systems.

Categories

Resources