I'd like to use Python as an external process and communicate with it by standard input/standard output. Specifically, I want to
for line in sys.stdin:
    result = compute_something(line)
    print result
    sys.stdout.flush()
The output flush is to get the result back right away, without buffering. I want to do the same with the input--- to process each line right away, without buffering. However, the above code does not respond to each line individually; it waits until a large amount of data is accumulated in the standard input and then processes everything at once.
This is true even if the calling program flushes its standard output with every line. It's also true even if I'm running the above directly on a console. The buffer is in Python, not the calling program.
Moreover, I found that control-D on the console makes Python flush its standard input buffer. (And then I can continue to send more input afterward!) However, that's not useful to me because the calling program can't send the equivalent of control-D at the end of each line.
One more thing: for line in sys.stdin.xreadlines() appears to be equivalent to for line in sys.stdin: they both buffer.
So my question is, how can I write a Python script that does not buffer its input, so that it processes each line of input right away?
(I solved the problem before posting the question, but I think I should share it anyway--- others might encounter this problem and I'm still interested in any comments on why this is happening and what we should all know about how to control Python's or Linux's implicit buffering.)
Here's one way to avoid input buffering:
while True:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' at EOF
        break
    result = compute_something(line)
    print result
    sys.stdout.flush()
Apparently, .readline() avoids the input buffer while direct iteration and .xreadlines() do not.
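For completeness, the same readline-based loop can be written a little more compactly in Python 3; this is just a sketch, with compute_something standing in for the real processing:

import sys

# readline() returns '' only at EOF, so iter() turns it into a line-at-a-time loop
for line in iter(sys.stdin.readline, ''):
    result = compute_something(line)
    print(result, flush=True)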
I am trying to run a python file that prints something, waits 2 seconds, and then prints again. I want to catch these outputs live from my python script to then process them. I tried different things but nothing worked.
process = subprocess.Popen(cmd, stdout=subprocess.PIPE)
while True:
    output = process.stdout.readline()
    if process.poll() is not None and output == '':
        break
    if output:
        print(output.strip())
I'm at this point but it doesn't work. It waits until the code finishes and then prints all the outputs.
I just need to run a Python file and get live output from it. If you have other ideas for doing it without using the print function, let me know; just know that I have to run the file separately. I just thought of the easiest way possible, but from what I'm seeing it can't be done.
There are three layers of buffering here, and you need to limit all three of them to guarantee you get live data:
1. Use the stdbuf command (on Linux) to wrap the subprocess execution (e.g. run ['stdbuf', '-oL'] + cmd instead of just cmd), or (if you have the ability to do so) alter the program itself to either explicitly change the buffering on stdout (e.g. using setvbuf for C/C++ code to switch stdout globally to line-buffered mode, rather than the default block buffering it uses when outputting to a non-tty) or insert flush statements after critical output (e.g. fflush(stdout); for C/C++, fileobj.flush() for Python, etc.). Without that, everything is stuck in user-mode buffers of the sub-process.
2. Add bufsize=0 to the Popen arguments (probably not needed since you don't send anything to stdin, but harmless) so it unbuffers all piped handles. If the Popen is in text=True mode, switch to bufsize=1 (which is line-buffered, rather than unbuffered).
3. Add flush=True to the print arguments (if you're connected to a terminal, the line-buffering will flush it for you, so it only matters when stdout is piped to a file), or explicitly call sys.stdout.flush().
Between the three of these, you should be able to guarantee no data is stuck waiting in user-mode buffers; if at least one line has been output by the sub-process, it will reach you immediately, and any output triggered by it will also appear immediately. Item #1 is the hardest in most cases (when you can't use stdbuf, or the process reconfigures its own buffering internally and undoes the effect of stdbuf, and you can't modify the process executable to fix it); you have complete control over #2 and #3, but #1 may be outside your control.
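Put together, a minimal sketch of the reader side might look like this (assuming Linux with the coreutils stdbuf command available, and cmd being the command list from the question):

import subprocess

process = subprocess.Popen(['stdbuf', '-oL'] + cmd,    # 1. child line-buffers its stdout
                           stdout=subprocess.PIPE,
                           bufsize=1, text=True)        # 2. line-buffered text pipe on our side
while True:
    line = process.stdout.readline()
    if not line and process.poll() is not None:
        break
    print(line, end='', flush=True)                     # 3. push each echoed line out immediately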
This is the code I use for that same purpose:
import subprocess

def run_command(command, **kwargs):
    """Run a command while printing the live output"""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        **kwargs,
    )
    while True:  # Could be more pythonic with := in Python 3.8+
        line = process.stdout.readline()
        if not line and process.poll() is not None:
            break
        print(line.decode(), end='')
An example of usage would be:
from pathlib import Path

run_command(['git', 'status'], cwd=Path(__file__).parent.absolute())
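If the child program still block-buffers its stdout when it is piped (the first layer described in the previous answer), the same wrappers can be applied to the command passed in; the program names here are just placeholders:

run_command(['stdbuf', '-oL', './simulation'])     # force line buffering for an arbitrary program (Linux)
run_command(['python3', '-u', 'live_script.py'])   # for a Python child, -u disables its output buffering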
I have a python script that performs a simulation. It takes a fairly long, varying time to run through each iteration, so I print a . after each loop as a way to monitor how fast it runs and how far it went through the for statement as the script runs. So the code has this general structure:
for step in steps:
    run_simulation(step)
    # Python 3.x version:
    print('.', end='')
    # for Python 2.x:
    # print '.',
However, when I run the code, the dots do not appear one by one. Instead, all the dots are printed at once when the loop finishes, which makes the whole effort pointless. How can I print the dots inline as the code runs?
This problem can also occur when iterating over data fed from another process and trying to print results, for example to echo input from an Electron app. See Python not printing output.
The issue
By default, output from a Python program is buffered to improve performance. The terminal is a separate program from your code, and it is more efficient to store up text and communicate it all at once, rather than separately asking the terminal program to display each symbol.
Since terminal programs are usually meant to be used interactively, with input and output progressing a line at a time (for example, the user is expected to hit Enter to indicate the end of a single input item), the default is to buffer the output a line at a time.
So, if no newline is printed, the print function (in 3.x; print statement in 2.x) will simply add text to the buffer, and nothing is displayed.
Outputting in other ways
Every now and then, someone will try to output from a Python program by using the standard output stream directly:
import sys
sys.stdout.write('test')
This will have the same problem: if the output does not end with a newline, it will sit in the buffer until it is flushed.
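A minimal way to make this particular example appear immediately is to flush the stream right after writing:

import sys

sys.stdout.write('test')
sys.stdout.flush()   # push the text out even though no newline was written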
Fixing the issue
For a single print
We can explicitly flush the output after printing.
In 3.x, the print function has a flush keyword argument, which allows for solving the problem directly:
import time

for _ in range(10):
    print('.', end=' ', flush=True)
    time.sleep(.2)  # or other time-consuming work
In 2.x, the print statement does not offer this functionality. Instead, flush the stream explicitly, using its .flush method. The standard output stream (where text goes when printed, by default) is made available by the sys standard library module, and is named stdout. Thus, the code will look like:
import sys
import time

for _ in range(10):
    print '.',
    sys.stdout.flush()
    time.sleep(.2)  # or other time-consuming work
For multiple prints
Rather than flushing after every print (or deciding which ones need flushing afterwards), it is possible to disable the output line buffering completely. There are many ways to do this, so please refer to the linked question.
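To give a flavor of those options (a sketch; the script name is hypothetical, and reconfigure() requires Python 3.7+):

# Launch the interpreter with unbuffered standard streams:
#   python -u myscript.py
# or set the environment variable before launching:
#   PYTHONUNBUFFERED=1 python myscript.py
# From inside the script, on Python 3.7+, the text layer can be switched to line buffering:
import sys
sys.stdout.reconfigure(line_buffering=True)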
I have a Fortran code that takes a file as input and writes the output into stdout. To avoid read/write cycles, I'd like to run the code inside python and convert the output to a numpy array. I can do this using the following function:
def run_fortran(infile):
    import subprocess
    import numpy as np
    cmd = ['./output.e', infile]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)
Now I take the array, modify it and write it into a file. The new file is again passed into run_fortran(infile). Can I avoid this step and somehow use the output of run_fortran instead of passing a filename?
I tried two cases with no success:
(1) converting to string:
arr = run_fortran('input.txt')
new_arr = str(arr).replace("[","").replace("]","")
run_fortran(new_arr)
This returns an empty array.
(2) converting to a file type object using StringIO:
from cStringIO import StringIO
run_fortran(StringIO(new_arr))
This returns an error: TypeError: execv() arg 2 must contain only strings which makes sense.
In Fortran, the read(*,*), read(5,*) or read*, statements will read from standard input, and if that input has the right format, it will work. If you are dealing with formatted data, that is, anything that is human readable, and not a binary file, then you probably need a read loop like:
do line = 1, numlines
    read(*,*) array(line,:)
enddo
No open or close statements needed. So if what you were writing to a file is passed directly, you should be able to remove those statements and change the file unit to 5 or *.
Now there are more efficient ways to do this kind of communication, but any solution is a good solution if it suits your purpose.
If your Fortran program (it's that './output.e' AFAICS) can read from stdin, not only from a regular file, you can do without temporary files by passing it stdin=subprocess.PIPE (other valid values are "an existing file descriptor (a positive integer) [or] an existing file object"). In UNIX, there's always /dev/stdin to put on the command line btw and in Windows, there's con.
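As a hypothetical sketch of that suggestion (assuming a Linux-like system and that ./output.e does a formatted read of its input), the round trip could look like this, with the previous result fed back through the pipe instead of a temporary file:

import subprocess
import numpy as np

def run_fortran_on_array(arr):
    # /dev/stdin keeps the program's "filename argument" interface, but the
    # "file" it opens is the pipe we fill via communicate()
    p = subprocess.Popen(['./output.e', '/dev/stdin'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = p.communicate((' '.join(map(str, arr)) + '\n').encode())
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)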
Still, if the program can only work in "sessions" due to the nature of processing (i.e. cannot run continuously and be fed new data as it's available), you'll have to invoke it repeatedly.
Note that you have to handle different pipes in different threads to avoid deadlocks. So, either use communicate() (but then the program cannot run continuously), or spawn a reader thread for stdout/stderr manually (that's what communicate() does). You don't need a thread for stdin, but the output-reading code has to be running by the time you start writing to stdin, otherwise the external program may block once the pipe buffer fills up while it is writing.
Here's sample code for running the program continuously:
import subprocess
import threading

def _outputreaderthread(stream, container, data):
    # Since there's no process termination as an end mark here,
    # you need to read exactly the right number of bytes to avoid getting blocked.
    # Use whatever technique is appropriate for your data (e.g. readline()'s).
    output = stream.read(calculate_output_size(data))
    container.append(output)

p = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
while True:
    # The only more elegant way to return a value from a thread is with a Thread subclass,
    # see http://stackoverflow.com/questions/6893968/how-to-get-the-return-value-from-a-thread-in-python
    output_container = []
    ot = threading.Thread(target=_outputreaderthread, args=(p.stdout, output_container, data))
    ot.start()
    p.stdin.write(data)
    p.stdin.flush()  # make sure the request actually reaches the child; Popen's stdin is buffered by default in Python 3
    ot.join()
    output = output_container[0]
    data = process_output(output)
    if no_more_processing_needed(data):
        break
p.stdin.close()
if p.wait() != 0:
    raise subprocess.CalledProcessError(p.returncode, argv)
Probably an easy question, but I am basically trying to replicate the behavior (very sparsely and simplistically) of the cat command but in python. I want the user to be able to enter all the text they wish, then when they enter an EOF (by pressing ctrl+d or cmd+d) the program should print out everything they enter.
import sys

for line in sys.stdin:
    print line
If I enter the input lines and follow the last line with a return character, and then press cmd+d:
never gonna give you up
never gonna let you down
never gonna run around
and desert you
then the output is
never gonna give you up
never gonna let you down
never gonna run around
and desert you
However, if I press cmd+d while I am still on the last line "and desert you" then the output is:
never gonna give you up
never gonna let you down
never gonna run around
How can I modify the program such that if the user presses the EOF on the last line, then it should still be included as part of the output?
The behavior of Control-D is actually a function of the TTY driver of the operating system, not of Python. The TTY driver normally sends data to programs a full line at a time. You'll note that 'cat' and most other programs behave the same way.
To do what you're asking is non-trivial. It usually requires putting the TTY into raw mode, reading and processing data (you would get the Control-D character itself), and then taking it out of raw mode before exiting (you don't want to exit in raw mode!)
Procedures for doing that are O/S dependent, but you might be able to make use of tty.setraw() and tty.setcbreak().
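A hedged sketch of that approach (assuming a Unix-like system; this is illustrative, not a polished solution): switch the terminal to cbreak mode, read a character at a time, treat the literal EOT byte as the end marker, and always restore the old settings:

import sys
import termios
import tty

fd = sys.stdin.fileno()
saved = termios.tcgetattr(fd)              # remember the current terminal settings
try:
    tty.setcbreak(fd)                      # deliver characters immediately (echo and line editing are off)
    chars = []
    while True:
        ch = sys.stdin.read(1)
        if not ch or ch == '\x04':         # with the line discipline off, Ctrl-D arrives as a literal EOT byte
            break
        chars.append(ch)
finally:
    termios.tcsetattr(fd, termios.TCSADRAIN, saved)   # never leave the terminal in cbreak mode
print(''.join(chars))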
Okay, that was weird, but it turns out there is an explanation. Ctrl-D is "end of transmission" (EOT), not EOF. What it means is environment specific, but to a *nix terminal, it means to flush the input buffer to the program immediately.
If there are characters in the buffer, python's stdin gets them but since there is no \n, the stdin iterator buffers them and doesn't emit anything yet.
If there are no characters in the buffer, python's stdin gets an empty buffer and python follows the rule that an empty buffer means EOF and it should stop iteration.
If you type some characters and then 2 ctrl-d's in a row, you'll get the buffered characters and iteration will end. And, to keep things as complicated as possible, if you start another for line in sys.stdin it will happily continue taking input.
EDIT
This script will give you any pending characters immediately when you hit Ctrl-D with data in the input buffer, but it won't know to exit unless you hit a second Ctrl-D.
import sys

while True:
    c = sys.stdin.read(1)
    if not c:
        break
    sys.stdout.write(c)
    sys.stdout.flush()
I think this is a python 2.7 problem. I tried your code on Python 3.5 and it works just fine.
Hope this helps!