How to get live output with subprocess in Python

I am trying to run a Python file that prints something, waits 2 seconds, and then prints again. I want to catch this output live from my script so I can process it. I have tried different things, but nothing has worked.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline()
    if process.poll() is not None and output == '':
        break
    if output:
        print(output.strip())
This is where I am at the moment, but it doesn't work: it waits until the script finishes and then prints all the output at once.
I just need to run a Python file and get live output from it. If you have other ideas for doing this without using the print function, let me know; just know that I have to run the file separately. I went for what seemed like the easiest way, but from what I'm seeing it can't be done this way.

There are three layers of buffering here, and you need to limit all three of them to guarantee you get live data:
1) Use the stdbuf command (on Linux) to wrap the subprocess execution (e.g. run ['stdbuf', '-oL'] + cmd instead of just cmd), or, if you can modify the program itself, either explicitly change the buffering of stdout (e.g. using setvbuf in C/C++ code to switch stdout globally to line-buffered mode, rather than the default block buffering it uses when writing to a non-tty) or insert flush calls after critical output (e.g. fflush(stdout); for C/C++, fileobj.flush() for Python, etc.). Without that, everything is stuck in the user-mode buffers of the sub-process.
2) Add bufsize=0 to the Popen arguments (probably not needed since you don't send anything to stdin, but harmless) so it unbuffers all piped handles. If the Popen is in text=True mode, switch to bufsize=1 (which is line-buffered, rather than unbuffered).
3) Add flush=True to the print arguments (if you're connected to a terminal the line buffering will flush for you, so it only matters when stdout is piped to a file), or explicitly call sys.stdout.flush().
Between the three of these, you should be able to guarantee that no data is stuck waiting in user-mode buffers; as soon as the sub-process has output at least one line, it will reach you immediately, and any output triggered by it will also appear immediately. Item #1 is the hardest in most cases (when you can't use stdbuf, or the process reconfigures its own buffering internally and undoes the effect of stdbuf, and you can't modify the executable to fix it); you have complete control over #2 and #3, but #1 may be outside your control.
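Putting those three layers together, a minimal sketch might look like the following (assuming Linux with stdbuf installed and Python 3.7+ for text=True; the child command here is just a placeholder):
import subprocess

# Sketch only: 'child_script.py' is a placeholder for whatever you actually run.
cmd = ['python3', 'child_script.py']

proc = subprocess.Popen(
    ['stdbuf', '-oL'] + cmd,   # item 1: ask the child for line-buffered stdout
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,                 # item 2: line-buffered on our side, since we're in text mode
)

for line in proc.stdout:       # yields lines as they arrive
    print(line, end='', flush=True)   # item 3: flush our own output immediately

proc.wait()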

This is the code I use for that same purpose:
import subprocess
from pathlib import Path  # used in the usage example below

def run_command(command, **kwargs):
    """Run a command while printing the live output"""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        **kwargs,
    )
    while True:  # Could be more pythonic with := in Python 3.8+
        line = process.stdout.readline()
        if not line and process.poll() is not None:
            break
        print(line.decode(), end='')
An example of usage would be:
run_command(['git', 'status'], cwd=Path(__file__).parent.absolute())

Related

How do I properly loop through subprocess.stdout

I'm creating a program where I need to use a PowerShell session, and I found out how to have a persistent session using the code below. However, I want to loop through the new lines of the PowerShell output when a command has been run. The for loop below is the only way I've found to do so, but it expects an EOF and doesn't get one, so it just lingers and the program never exits. How can I get the number of new lines in stdout so I can properly loop through them?
from subprocess import Popen, PIPE

process = Popen(["powershell"], stdin=PIPE, stdout=PIPE)

def ps(command):
    command = bytes("{}\n".format(command), encoding='utf-8')
    process.stdin.write(command)
    process.stdin.flush()
    process.stdout.readline()
    return process.stdout.readline().decode("utf-8")

ps("echo hello world")

for line in process.stdout:
    print(line.strip().decode("utf-8"))

process.stdin.close()
process.wait()
You need the PowerShell process to know when to exit. Typically, the solution is not just to flush, but to close the stdin for the child process; when it's done with its work and finds EOF on its input, it should exit on its own. Just change:
process.stdin.flush()
to:
process.stdin.close()
which implies a flush and also ensures the child process knows input is done. If that doesn't work on its own, you might explicitly add a quit or exit command (whatever PowerShell uses to terminate the session manually) after the command you're actually running.
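In other words, a minimal sketch of that flow (send everything first, then close stdin, then read until EOF) could look like:
from subprocess import Popen, PIPE

# Send the command(s), close stdin so PowerShell sees EOF and exits,
# and let the for loop end naturally when the child closes its stdout.
process = Popen(["powershell"], stdin=PIPE, stdout=PIPE)
process.stdin.write(b"echo hello world\n")
process.stdin.close()          # flushes and signals EOF to the child

for line in process.stdout:    # terminates once the child exits
    print(line.strip().decode("utf-8"))

process.wait()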
If you must run multiple commands in a single subprocess, and each command must be fully consumed before the next one is sent, there are terrible heuristic solutions available, e.g. sending three commands at once, where the second simply echoes a sentinel string and the third explicitly flushes stdout (to ensure block buffering doesn't cause you to deadlock waiting for a sentinel that is stuck in the subprocess's internal buffers); your loop can then terminate once it sees the sentinel. Without a sentinel it's worse, because you basically can't tell when the command is done; you just have to use the select/selectors module to poll the process's stdout with a timeout, reading lines whenever data is available and assuming the process is done if no new output arrives within the expected timeout window.
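As a rough illustration of the sentinel idea (the SENTINEL string and the run_ps helper are names made up for this sketch, and depending on how PowerShell echoes piped input you may also need to filter out echoed prompts or commands):
from subprocess import Popen, PIPE

SENTINEL = '__COMMAND_DONE__'   # arbitrary marker string, made up for this sketch

process = Popen(["powershell", "-NoLogo"], stdin=PIPE, stdout=PIPE)

def run_ps(command):
    """Send a command plus a sentinel echo and an explicit flush, then read until the sentinel."""
    payload = "{}\nWrite-Output '{}'\n[Console]::Out.Flush()\n".format(command, SENTINEL)
    process.stdin.write(payload.encode('utf-8'))
    process.stdin.flush()
    lines = []
    while True:
        raw = process.stdout.readline()
        if not raw:          # EOF: the child exited unexpectedly
            break
        line = raw.decode('utf-8', errors='replace').rstrip('\r\n')
        if line == SENTINEL:
            break
        # Echoed commands/prompts may also show up here and need filtering.
        lines.append(line)
    return lines

print(run_ps("echo hello world"))
process.stdin.close()
process.wait()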

How to get output from a python2 subprocess which runs a script using multiprocessing?

Here is my demo code. It contains two scripts.
The first is main.py; it calls print_line.py with the subprocess module.
The second is print_line.py; it prints something to stdout.
main.py
import subprocess

p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process

if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()
Previously, print_line.py was written with the threading and Queue modules, and everything was fine. But now, after changing to the multiprocessing module, main.py cannot get any output from print_line. I tried using Popen.communicate() to get the output and setting preexec_fn=os.setsid in Popen(). Neither of them works.
So, here is my question:
Why can't subprocess get the output when print_line uses multiprocessing? Why is it OK with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed in my console. Why? How does this happen?
Is there any chance to get the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason lies somewhere in the deep, murky waters of buffered IO. It seems that the output of a subprocess of a subprocess can get lost if it is not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
import sys

def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to use the -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory of why this happens, it definitely has something to do with unflushed stdout being lost when the subprocess terminates, but I am not an expert on this.
It's not a multiprocessing issue, but it is a subprocess issue—or more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether written in Python or not, is line buffered if the output device is a "terminal device" as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.[1]
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
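As a quick illustration of that buffering argument in Python 3 (the file names here are just examples):
# buffering=1 asks for line buffering (text mode only): each '\n' triggers a flush.
line_buffered = open('example.log', 'w', buffering=1)
line_buffered.write('flushed as soon as the newline is written\n')

# buffering=0 asks for a completely unbuffered stream (binary mode only).
unbuffered = open('example.bin', 'wb', buffering=0)
unbuffered.write(b'handed to the OS immediately\n')

line_buffered.close()
unbuffered.close()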
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
[1] There are hooks that fire very early on, where you can fuss with this sort of thing, but they are tricky to work with.
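For a Python child specifically, two common ways to arrange this from the parent side are the -u flag and the PYTHONUNBUFFERED environment variable; a sketch (the child script name below is just a placeholder):
import os
import subprocess

# Either pass -u on the child's command line or set PYTHONUNBUFFERED in its
# environment; both force the child interpreter's stdio to be unbuffered.
env = dict(os.environ, PYTHONUNBUFFERED='1')

p = subprocess.Popen(['python', '-u', 'child_script.py'],  # 'child_script.py' is hypothetical
                     stdout=subprocess.PIPE,
                     universal_newlines=True,
                     env=env)

for line in p.stdout:
    print(line, end='')
p.wait()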

Displaying output of shell commands with shared environments

Is there any way to display the output of a shell command in Python, as the command runs?
I have the following code to send commands to a specific shell (in this case, /bin/tcsh):
import subprocess
import select

cmd = subprocess.Popen(['/bin/tcsh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

poll = select.poll()
poll.register(cmd.stdout.fileno(), select.POLLIN)

# The list "commands" holds a list of shell commands
for command in commands:
    cmd.stdin.write(command)
    # Must include this to ensure data is passed to child process
    cmd.stdin.flush()
    ready = poll.poll()
    if ready:
        result = cmd.stdout.readline()
        print result
Also, I got the code above from this thread, but I am not sure I understand how the polling mechanism works.
What exactly is registered above?
Why do I need the variable ready if I don't pass any timeout to poll.poll()?
Yes, it is entirely possible to display the output of a shell command as the command runs. There are two requirements:
1) The command must flush its output.
Many programs buffer their output differently according to whether the output is connected to a terminal, a pipe, or a file. If they are connected to a pipe, they might write their output in much bigger chunks much less often. For each program that you execute, consult its documentation. Some versions of /bin/cat, for example, have the -u switch.
2) You must read it piecemeal, and not all at once.
Your program must be structured to read one piece at a time from the output stream. This means that you ought not do any of these, which each read the entire stream (or buffer it heavily) in one go:
cmd.stdout.read()       # reads everything until EOF
for i in cmd.stdout:    # in Python 2, iteration uses a read-ahead buffer, so lines arrive late
    ...
list(cmd.stdout)        # collects every line, so also waits for EOF
But instead, you could do one of these:
while not_dead_yet:
    line = cmd.stdout.readline()

for line in iter(cmd.stdout.readline, b''):
    pass
Now, for your three specific questions:
Is there any way to display the output of a shell command in Python, as the command runs?
Yes, but only if the command you are running outputs as it runs and doesn't save it up for the end.
What exactly is registered above?
The file descriptor which, when read, makes available the output of the subprocess.
Why do I need the variable ready if I don't pass any timeout to poll.poll()?
You don't. You also don't need the poll(). It is possible, if your commands list were fairly large, that you might need to poll() both the stdin and stdout streams to avoid a deadlock. But if your commands list is fairly modest (less than 5 kilobytes), then you will be OK just writing the commands at the beginning.
Here is one possible solution:
#!/usr/bin/python
import subprocess

# Critical: all of this must fit inside ONE pipe() buffer
commands = ['echo Start\n', 'date\n', 'sleep 10\n', 'date\n', 'exit\n']

cmd = subprocess.Popen(['/bin/tcsh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# The list "commands" holds a list of shell commands
for command in commands:
    cmd.stdin.write(command)
    # Must include this to ensure data is passed to child process
    cmd.stdin.flush()

for line in iter(cmd.stdout.readline, b''):
    print line

Popen does not give output immediately when available

I am trying to read from both stdout and stderr from a Popen and print them out. The command I am running with Popen is the following
#!/bin/bash
i=10
while (( i > 0 )); do
    sleep 1s
    echo heyo-$i
    i="$((i-1))"
done
echo 'to error' >&2
When I run this in the shell, I get one line of output, then a one-second pause, then another line, and so on. However, I am unable to recreate this using Python. I am starting two threads, one each to read from stdout and stderr, which put the lines read into a Queue, and another thread that takes items from this queue and prints them out. But with this, I see that all the output gets printed out at once, after the subprocess ends. I want the lines to be printed as and when they are echoed.
Here's my python code:
import subprocess as sp
from queue import Queue, Empty
from threading import Thread

# The `randoms` script is in the $PATH
proc = sp.Popen(['randoms'], stdout=sp.PIPE, stderr=sp.PIPE, bufsize=0)

q = Queue()

def stream_watcher(stream, name=None):
    """Take lines from the stream and put them in the q"""
    for line in stream:
        q.put((name, line))
    if not stream.closed:
        stream.close()

Thread(target=stream_watcher, args=(proc.stdout, 'out')).start()
Thread(target=stream_watcher, args=(proc.stderr, 'err')).start()

def displayer():
    """Take lines from the q and add them to the display"""
    while True:
        try:
            name, line = q.get(True, 1)
        except Empty:
            if proc.poll() is not None:
                break
        else:
            # Strip the trailing newline character before printing
            print(name.upper(), '->', line[:-1])
            q.task_done()
    print('-*- FINISHED -*-')

Thread(target=displayer).start()
Any ideas? What am I missing here?
Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is going to a terminal. When the output is going to a pipe, it is block-buffered rather than line-buffered, so the data is not transferred to the kernel, and thence to the other end of the pipe, until the stdio buffer fills.
Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.
However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because the whole line is now flushed each time the newly launched /bin/echo process terminates. This is hardly an efficient use of system resources, but may be an efficient use of your own.
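If you would rather keep the shell script unmodified, the pseudo-tty route mentioned above can be sketched roughly like this (Unix-only; it assumes the `randoms` script from the question is on the PATH):
import os
import subprocess

# Give the child a pseudo-terminal so its stdio thinks it is talking to a
# terminal and therefore line-buffers its output instead of block-buffering.
master_fd, slave_fd = os.openpty()
proc = subprocess.Popen(['randoms'], stdout=slave_fd, stderr=slave_fd)
os.close(slave_fd)   # only the child should keep the slave end open

try:
    while True:
        chunk = os.read(master_fd, 1024)   # returns as soon as the child writes
        if not chunk:
            break
        print(chunk.decode(), end='', flush=True)
except OSError:
    # On Linux, reading the master after the child has exited raises EIO.
    pass
finally:
    os.close(master_fd)
    proc.wait()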
IIRC, setting shell=True on Popen should do it.

Better multithreaded use of Python subprocess.Popen & communicate()?

I'm running multiple commands which may take some time, in parallel, on a Linux machine running Python 2.6.
So, I used the subprocess.Popen class and the process.communicate() method to parallelize execution of multiple command groups and capture all the output at once after execution.
def run_commands(commands, print_lock):
    # this part runs in parallel.
    outputs = []
    for command in commands:
        proc = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, close_fds=True)
        output, unused_err = proc.communicate()  # buffers the output
        retcode = proc.poll()  # ensures subprocess termination
        outputs.append(output)
    with print_lock:  # print them at once (synchronized)
        for output in outputs:
            for line in output.splitlines():
                print(line)
Somewhere else it's called like this:
processes = []
print_lock = Lock()
for ...:
    commands = ...  # a group of commands is generated, which takes some time.
    processes.append(Thread(target=run_commands, args=(commands, print_lock)))
    processes[-1].start()
for p in processes:
    p.join()
print('done.')
The expected result is that the output of each group of commands is displayed at once, while the groups themselves execute in parallel.
But starting from the second output group (of course, which thread comes second changes due to scheduling indeterminism), it begins to print without newlines, indenting each line by as many spaces as the number of characters printed on the previous line, and input echo is turned off, so the terminal state is "garbled" or "crashed". (If I issue the reset shell command, it returns to normal.)
At first I tried to find the cause in the handling of '\r', but that was not it. As you can see in my code, I handle it using splitlines(), and I confirmed that by applying the repr() function to the output.
I think the reason is the concurrent use of pipes in Popen and communicate() for stdout/stderr. I tried the check_output shortcut method in Python 2.7, but had no success. Of course, the problem described above does not occur if I serialize all the command executions and prints.
Is there any better way to handle Popen and communicate() in parallel?
A final result inspired by the comment from J.F.Sebastian.
http://bitbucket.org/daybreaker/kaist-cs443/src/247f9ecf3cee/tools/manage.py
It seems to be a Python bug.
I am not sure it is clear what run_commands actually needs to be doing, but it seems to simply poll a subprocess, ignore the return code, and continue in the loop. When you get to the part where you are printing output, how do you know the sub-processes have completed?
In your example code I noticed your use of:
for line in output.splitlines():
to partially address the issue of '\r'; using
for line in output.splitlines(True):
would have been helpful.
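For reference, the difference is that passing True keeps the line endings:
>>> 'a\r\nb\r\n'.splitlines()
['a', 'b']
>>> 'a\r\nb\r\n'.splitlines(True)
['a\r\n', 'b\r\n']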
