I have the following C application
#include <stdio.h>

int main(void)
{
    printf("hello world\n");
    /* Go into an infinite loop here. */
    while(1);
    return 0;
}
And I have the following python code.
import subprocess
import time
import pprint

def run():
    command = ["./myapplication"]
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        while process.poll() is None:
            # HELP: This call blocks...
            for i in process.stdout.readline():
                print(i)
    finally:
        if process.poll() is None:
            process.kill()

if __name__ == "__main__":
    run()
When I run the Python code, the call to stdout.readline (or even stdout.read) blocks.
If I run the application using subprocess.call(program) then I can see "hello world" in stdout.
How can I read input from stdout with the example I have provided?
Note: I would not want to modify my C program. I have tried this on both Python 2.7.17 and Python 3.7.5 under Ubuntu 19.10 and I get the same behaviour. Adding bufsize=0 did not help me.
The easiest way is to flush the buffer in the C program:
...
printf("hello world\n");
fflush(stdout);
while(1);
...
If you don't want to change the C program, you can manipulate libc's buffering behaviour from outside by using stdbuf to launch your program (Linux). The syntax is stdbuf -o0 yourapplication for unbuffered output and stdbuf -oL yourapplication for line buffering. Therefore, in your Python code use
...
command = ["/usr/bin/stdbuf","-oL","pathtomyapplication"]
process = subprocess.Popen(command, stdout=subprocess.PIPE)
...
or
...
command = ["/usr/bin/stdbuf","-o0","pathtomyapplication"]
process = subprocess.Popen(command, stdout=subprocess.PIPE)
...
Applications built with the C standard I/O library (i.e. built with #include <stdio.h>) buffer their input and output (see here for why). The stdio library can tell, using isatty(), that it is writing to a pipe rather than a TTY, so it chooses block buffering instead of line buffering. Data is flushed when the buffer is full, but "hello world\n" comes nowhere near filling the buffer, so it is never flushed.
One way around this is shown in Timo Hartmann's answer, using the stdbuf utility. It uses an LD_PRELOAD trick to swap in its own libstdbuf.so. In many cases that is a fine solution, but LD_PRELOAD is a bit of a hack and does not work in some cases, so it may not be a general solution.
Maybe you want to do this directly in Python, and there are stdlib options to help here: you can connect a pseudo-tty (docs py2, docs py3) to the program's stdout instead of a pipe. The program myapplication will then enable line buffering, meaning that any newline character flushes the buffer.
from __future__ import print_function
from subprocess import Popen, PIPE
import errno
import os
import pty
import sys

mfd, sfd = pty.openpty()
proc = Popen(["/tmp/myapplication"], stdout=sfd)
os.close(sfd)  # don't wait for input
while True:
    try:
        output = os.read(mfd, 1000)
    except OSError as e:
        if e.errno != errno.EIO:
            raise
        break  # EIO here means the child closed its side of the pty, i.e. it exited
    else:
        print(output)
Note that we are reading bytes from the output now, so we can not necessarily decode them right away!
See Processing the output of a subprocess with Python in realtime for a blog post cleaning up this idea. There are also existing third-party libraries to do this stuff, see ptyprocess.
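For example, here is a minimal sketch of the same idea using the third-party ptyprocess package (assumed installed via pip; the executable path is the same placeholder used above). The child sees a TTY, so its stdio picks line buffering on its own:

from ptyprocess import PtyProcess

proc = PtyProcess.spawn(["/tmp/myapplication"])
try:
    while True:
        data = proc.read(1000)   # bytes from the pty master, like os.read() above
        print(data)
except EOFError:
    pass                         # the child closed its side of the pty
finally:
    if proc.isalive():
        proc.terminate(force=True)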
Related
I am writing a microservice in Haskell and it seems that we'll need to call into a Python library. I know how to create and configure a process to do that from Haskell, but my Python is rusty. Here's the logic I am trying to implement:
The Haskell application initializes by creating a persistent subprocess (lifetime of the subprocess = lifetime of the parent process) running a minimized application serving the Python library.
The Haskell application receives a network request and sends over stdin exactly 1 chunk of data (i.e. bytestring or text) to the Python subprocess; it waits for -- blocking -- exactly 1 chunk of data to be received from the subprocess' stdout, collects the result and returns it as a response.
I've looked around and the closest solutions I was able to find were:
Running a Python program from Go and
Persistent python subprocess
Both handle only the part I know how to handle (i.e. calling into a Python subprocess) while not dealing with the details of the Python code run from the subprocess -- hence this question.
The obvious alternative would be to simply create, run and stop a subprocess whenever the Haskell application needs it, but the overhead is unpleasant.
I've tried something whose minimized version looks like:
-- From the Haskell parent process
{-# LANGUAGE OverloadedStrings #-}

import System.IO
import System.Process.Typed

configProc :: ProcessConfig Handle Handle ()
configProc =
  setStdin createPipe $
  setStdout createPipe $
  setStderr closed $
  setWorkingDir "/working/directory" $
  shell "python3 my_program.py"

startPyProc :: IO (Process Handle Handle ())
startPyProc = do
  p <- startProcess configProc
  hSetBuffering (getStdin p) NoBuffering
  hSetBuffering (getStdout p) NoBuffering
  pure p

main :: IO ()
main = do
  p <- startPyProc
  let stdin  = getStdin p
      stdout = getStdout p
  hSetBuffering stdin NoBuffering
  hSetBuffering stdout NoBuffering
  -- hGetLine won't get anything before I call hClose,
  -- making it impossible to stream over both stdin and stdout
  hPutStrLn stdin "foo" >> hClose stdin >> hGetLine stdout >>= print
# From the Python child process
import sys

if __name__ == '__main__':
    for line in sys.stdin:
        # do some work and finally...
        print(result)
One issue with this code is that I have not been able to send to stdin and receive from stdout without first closing the stdin handle, which makes the implementation unable to do what I want (send 1 chunk to stdin, block, read the result from stdout, rinse and repeat). Another potential issue is that the Python code might not be adequate at all for the specification I am trying to meet.
Got it fixed by simply replacing print(...) with print(..., flush=True). It appears that Python's stdin/stdout default to block buffering when connected to a pipe, which made my call to hGetLine block since it never received a complete line.
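For reference, a minimal sketch of the fixed child (the actual work is a placeholder; the important part is flush=True):

import sys

if __name__ == '__main__':
    for line in sys.stdin:
        result = line.strip().upper()   # placeholder for the real work
        # flush=True pushes the reply through the pipe immediately,
        # so the Haskell side's hGetLine returns without closing stdin.
        print(result, flush=True)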
Here is my demo code. It contains two scripts.
The first is main.py, it will call print_line.py with subprocess module.
The second is print_line.py, it prints something to the stdout.
main.py
import subprocess

p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process

if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()
Before, print_line.py was written with the threading and Queue modules and everything was fine. But now, after changing to the multiprocessing module, main.py cannot get any output from print_line. I tried using Popen.communicate() to get the output and setting preexec_fn=os.setsid in Popen(). Neither of them works.
So, here is my question:
Why can subprocess not get the output with multiprocessing? Why is it OK with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed in my console. Why? How does this happen?
Is there any chance to get the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason lies somewhere in the deep, murky waters of buffered IO. It seems that the output of a subprocess of a subprocess can get lost if it is not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
import sys  # needed at the top of print_line.py for sys.stdout.flush()

def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to pass the -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory why this happens, it definitely has something to do with unflushed stdout being lost if subprocess terminates, but I am not the expert in this.
It's not a multiprocessing issue, but a subprocess issue; more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device" as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.1
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
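For instance, here is a small sketch of the buffering argument to open() under Python 3 (the file names are only illustrative):

# buffering=1 requests line buffering (text mode only),
# buffering=0 requests an unbuffered stream (binary mode only),
# a larger value requests a fully buffered stream of roughly that size.
line_buffered  = open('/tmp/example.log', 'w', buffering=1)
unbuffered     = open('/tmp/example.bin', 'wb', buffering=0)
fully_buffered = open('/tmp/example.dat', 'wb', buffering=1 << 16)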
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
1There are hooks that fire up very early, where you can fuss with this sort of thing. They are tricky to work with, though.
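As a sketch of the point above about setting up a child's environment: PYTHONUNBUFFERED is a standard interpreter setting that only works for Python children; for other programs, the stdbuf trick from the other answers plays the same role. The command line below is the one from the question.

import os
import subprocess
import sys

# Run the child with an unbuffered Python interpreter so its output
# arrives through the pipe as soon as it is printed.
env = dict(os.environ, PYTHONUNBUFFERED='1')
p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     universal_newlines=True,
                     shell=True,
                     env=env)
for line in p.stdout:
    sys.stdout.write(line)
p.wait()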
I am using subprocess.run() for some automated testing. Mostly to automate doing:
dummy.exe < file.txt > foo.txt
diff file.txt foo.txt
If you execute the above redirection in a shell, the two files are always identical. But whenever file.txt is too long, the below Python code does not return the correct result.
This is the Python code:
import subprocess
import sys

def main(argv):
    exe_path = r'dummy.exe'
    file_path = r'file.txt'

    with open(file_path, 'r') as test_file:
        stdin = test_file.read().strip()

    p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE, universal_newlines=True)
    out = p.stdout.strip()
    err = p.stderr

    if stdin == out:
        print('OK')
    else:
        print('failed: ' + out)

if __name__ == "__main__":
    main(sys.argv[1:])
Here is the C++ code in dummy.cc:
#include <iostream>

int main()
{
    int size, count, a, b;
    std::cin >> size;
    std::cin >> count;
    std::cout << size << " " << count << std::endl;
    for (int i = 0; i < count; ++i)
    {
        std::cin >> a >> b;
        std::cout << a << " " << b << std::endl;
    }
}
file.txt can be anything like this:
1 100000
0 417
0 842
0 919
...
The second integer on the first line is the number of lines following, hence here file.txt will be 100,001 lines long.
Question: Am I misusing subprocess.run() ?
Edit
My exact Python code, after the comments (newlines, rb) were taken into account:
import subprocess
import sys
import os
def main(argv):
    base_dir = os.path.dirname(__file__)
    exe_path = os.path.join(base_dir, 'dummy.exe')
    file_path = os.path.join(base_dir, 'infile.txt')
    out_path = os.path.join(base_dir, 'outfile.txt')

    with open(file_path, 'rb') as test_file:
        stdin = test_file.read().strip()

    p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE)
    out = p.stdout.strip()

    if stdin == out:
        print('OK')
    else:
        with open(out_path, "wb") as text_file:
            text_file.write(out)

if __name__ == "__main__":
    main(sys.argv[1:])
Here is the first diff: [screenshot omitted]
Here is the input file: https://drive.google.com/open?id=0B--mU_EsNUGTR3VKaktvQVNtLTQ
To reproduce, the shell command:
subprocess.run("dummy.exe < file.txt > foo.txt", shell=True, check=True)
without the shell in Python:
with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.run(["dummy.exe"], stdin=input_file, stdout=output_file, check=True)
It works with arbitrary large files.
You could use subprocess.check_call() in this case (available since Python 2), instead of subprocess.run(), which is available only in Python 3.5+.
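For example, a sketch of the same redirection using check_call (it raises CalledProcessError on a non-zero exit status):

import subprocess

with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.check_call(["dummy.exe"], stdin=input_file, stdout=output_file)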
Works very well, thanks. But then why was the original failing? Pipe buffer size, as in Kevin's answer?
It has nothing to do with OS pipe buffers. The warning from the subprocess docs that @Kevin J. Chase cites is unrelated to subprocess.run(). You should care about OS pipe buffers only if you use process = Popen() and manually read()/write() via multiple pipe streams (process.stdin/.stdout/.stderr).
It turns out that the observed behavior is due to a Windows bug in the Universal CRT. Here's the same issue reproduced without Python: Why would redirection work where piping fails?
As said in the bug description, to work around it:
"use a binary pipe and do text mode CRLF => LF translation manually on the reader side" or use ReadFile() directly instead of std::cin
or wait for Windows 10 update this summer (where the bug should be fixed)
or use a different C++ compiler e.g., there is no issue if you use g++ on Windows
The bug affects only text pipes, i.e., the code that uses < and > redirection should be fine (stdin=input_file, stdout=output_file should still work; if not, it is some other bug).
I'll start with a disclaimer: I don't have Python 3.5 (so I can't use the run function), and I wasn't able to reproduce your problem on Windows (Python 3.4.4) or Linux (3.1.6). That said...
Problems with subprocess.PIPE and Family
The subprocess.run docs say that it's just a front-end for the old subprocess.Popen-and-communicate() technique. The subprocess.Popen.communicate docs warn that:
The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
This sure sounds like your problem. Unfortunately, the docs don't say how much data is "large", nor what will happen after "too much" data is read. Just "don't do that, then".
The docs for subprocess.call go into a little more detail (emphasis mine)...
Do not use stdout=PIPE or stderr=PIPE with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer as the pipes are not being read from.
...as do the docs for subprocess.Popen.wait:
This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
That sure sounds like Popen.communicate is the solution to this problem, but communicate's own docs say "do not use this method if the data size is large" --- exactly the situation where the wait docs tell you to use communicate. (Maybe it "avoid(s) that" by silently dropping data on the floor?)
Frustratingly, I don't see any way to use a subprocess.PIPE safely, unless you're sure you can read from it faster than your child process writes to it.
On that note...
Alternative: tempfile.TemporaryFile
You're holding all your data in memory... twice, in fact. That can't be efficient, especially if it's already in a file.
If you're allowed to use a temporary file, you can compare the two files very easily, one line at a time. This avoids all the subprocess.PIPE mess, and it's much faster, because it only uses a little bit of RAM at a time. (The IO from your subprocess might be faster, too, depending on how your operating system handles output redirection.)
Again, I can't test run, so here's a slightly older Popen-and-communicate solution (minus main and the rest of your setup):
import io
import subprocess
import tempfile

def are_text_files_equal(file0, file1):
    '''
    Both files must be opened in "update" mode ('+' character), so
    they can be rewound to their beginnings. Both files will be read
    until just past the first differing line, or to the end of the
    files if no differences were encountered.
    '''
    file0.seek(io.SEEK_SET)
    file1.seek(io.SEEK_SET)
    for line0, line1 in zip(file0, file1):
        if line0 != line1:
            return False
    # Both files were identical to this point. See if either file
    # has more data.
    next0 = next(file0, '')
    next1 = next(file1, '')
    if next0 or next1:
        return False
    return True

def compare_subprocess_output(exe_path, input_path):
    with tempfile.TemporaryFile(mode='w+t', encoding='utf8') as temp_file:
        with open(input_path, 'r+t') as input_file:
            p = subprocess.Popen(
                [exe_path],
                stdin=input_file,
                stdout=temp_file,  # No more PIPE.
                stderr=subprocess.PIPE,  # <sigh>
                universal_newlines=True,
            )
            err = p.communicate()[1]  # No need to store output.
            # Compare input and output files... This must be inside
            # the `with` block, or the TemporaryFile will close before
            # we can use it.
            if are_text_files_equal(temp_file, input_file):
                print('OK')
            else:
                print('Failed: ' + str(err))
    return
Unfortunately, since I can't reproduce your problem, even with a million-line input, I can't tell if this works. If nothing else, it ought to give you wrong answers faster.
Variant: Regular File
If you want to keep the output of your test run in foo.txt (from your command-line example), then you would direct your subprocess' output to a normal file instead of a TemporaryFile. This is the solution recommended in J.F. Sebastian's answer.
I can't tell from your question if you wanted foo.txt, or if it was just a side-effect of the two step test-then-diff --- your command-line example saves test output to a file, while your Python script doesn't. Saving the output would be handy if you ever want to investigate a test failure, but it requires coming up with a unique filename for each test you run, so they don't overwrite each other's output.
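A sketch of that variant, under the assumption that a uniquely named regular file per test is acceptable (tempfile.mkstemp is one way to get such a name; the prefix is arbitrary):

import os
import subprocess
import tempfile

def run_one_test(exe_path, input_path):
    # Keep the output in a uniquely named regular file so a failed test
    # can be investigated later; mkstemp never reuses an existing name.
    fd, out_path = tempfile.mkstemp(prefix='test-output-', suffix='.txt')
    with open(input_path, 'rt') as input_file, os.fdopen(fd, 'wt') as out_file:
        subprocess.check_call([exe_path], stdin=input_file, stdout=out_file)
    return out_path   # diff this against input_path afterwards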
Ok so I'm trying to run a C program from a python script. Currently I'm using a test C program:
#include <stdio.h>
#include <unistd.h>  /* for sleep() */

int main() {
    while (1) {
        printf("2000\n");
        sleep(1);
    }
    return 0;
}
To simulate the program that I will be using, which takes readings from a sensor constantly.
Then I'm trying to read the output (in this case "2000") from the C program with subprocess in python:
#!/usr/bin/python
import subprocess

process = subprocess.Popen("./main", stdout=subprocess.PIPE)
while True:
    for line in iter(process.stdout.readline, ''):
        print line,
but this is not working. From using print statements, it runs the .Popen line then waits at for line in iter(process.stdout.readline, ''):, until I press Ctrl-C.
Why is this? This is exactly what most examples that I've seen have as their code, and yet it does not read the file.
Is there a way of making it run only when there is something to be read?
It is a block buffering issue.
What follows is a version of my answer to the question Python: read streaming input from subprocess.communicate(), extended for your case.
Fix stdout buffer in C program directly
stdio-based programs, as a rule, are line buffered if they are running interactively in a terminal and block buffered when their stdout is redirected to a pipe. In the latter case, you won't see new lines until the buffer overflows or is flushed.
To avoid calling fflush() after each printf() call, you could force line buffered output by calling in a C program at the very beginning:
setvbuf(stdout, (char *) NULL, _IOLBF, 0); /* make line buffered stdout */
As soon as a newline is printed the buffer is flushed in this case.
Or fix it without modifying the source of C program
There is the stdbuf utility, which allows you to change the buffering type without modifying the source code, e.g.:
from subprocess import Popen, PIPE

process = Popen(["stdbuf", "-oL", "./main"], stdout=PIPE, bufsize=1)
for line in iter(process.stdout.readline, b''):
    print line,
process.communicate()  # close process' stream, wait for it to exit
There are also other utilities available, see Turn off buffering in pipe.
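One such utility is unbuffer from the expect package (assuming it is installed); a sketch that mirrors the stdbuf example above:

from subprocess import Popen, PIPE

# unbuffer runs ./main under a pseudo-terminal, so its stdio stays
# line buffered even though we read the output through a pipe.
process = Popen(["unbuffer", "./main"], stdout=PIPE, bufsize=1)
for line in iter(process.stdout.readline, b''):
    print line,
process.communicate()  # close process' stream, wait for it to exit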
Or use pseudo-TTY
To trick the subprocess into thinking that it is running interactively, you could use the pexpect module or one of its analogs; for code examples that use the pexpect and pty modules, see Python subprocess readlines() hangs. Here's a variation on the pty example provided there (it should work on Linux):
#!/usr/bin/env python
import os
import pty
import sys
from select import select
from subprocess import Popen, STDOUT
master_fd, slave_fd = pty.openpty()  # provide tty to enable line buffering
process = Popen("./main", stdin=slave_fd, stdout=slave_fd, stderr=STDOUT,
                bufsize=0, close_fds=True)
timeout = .1  # ugly but otherwise `select` blocks on process' exit
# code is similar to _copy() from pty.py
with os.fdopen(master_fd, 'r+b', 0) as master:
    input_fds = [master, sys.stdin]
    while True:
        fds = select(input_fds, [], [], timeout)[0]
        if master in fds:  # subprocess' output is ready
            data = os.read(master_fd, 512)  # <-- doesn't block, may return less
            if not data:  # EOF
                input_fds.remove(master)
            else:
                os.write(sys.stdout.fileno(), data)  # copy to our stdout
        if sys.stdin in fds:  # got user input
            data = os.read(sys.stdin.fileno(), 512)
            if not data:
                input_fds.remove(sys.stdin)
            else:
                master.write(data)  # copy it to subprocess' stdin
        if not fds:  # timeout in select()
            if process.poll() is not None:  # subprocess ended
                # and no output is buffered <-- timeout + dead subprocess
                assert not select([master], [], [], 0)[0]  # race is possible
                os.close(slave_fd)  # subprocess doesn't need it anymore
                break

rc = process.wait()
print("subprocess exited with status %d" % rc)
Or use pty via pexpect
pexpect wraps pty handling into higher level interface:
#!/usr/bin/env python
import pexpect
child = pexpect.spawn("./main")
for line in child:
    print line,
child.close()
Q: Why not just use a pipe (popen())? explains why pseudo-TTY is useful.
Your program isn't hung, it just runs very slowly. Your program is using buffered output; the "2000\n" data is not being written to stdout immediately, but will eventually make it. In your case, it might take BUFSIZ/strlen("2000\n") seconds (probably 1638 seconds) to complete.
After this line:
printf("2000\n");
add
fflush(stdout);
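For reference, here is the arithmetic behind the 1638-second estimate above, assuming a typical BUFSIZ of 8192 bytes (the actual value is platform dependent):

BUFSIZ = 8192                # a common default; the real value is in <stdio.h>
line = "2000\n"              # 5 bytes, printed once per second
print(BUFSIZ // len(line))   # -> 1638 seconds before the buffer fills and flushes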
See readline docs.
Your code:
process.stdout.readline
Is waiting for EOF or a newline.
I cannot tell what you are ultimately trying to do, but adding a newline to your printf, e.g., printf("2000\n");, should at least get you started.
I want to capture stdout from a long-ish running process started via subprocess.Popen(...) so I'm using stdout=PIPE as an arg.
However, because it's a long running process I also want to send the output to the console (as if I hadn't piped it) to give the user of the script an idea that it's still working.
Is this at all possible?
Cheers.
The buffering your long-running sub-process is probably performing will make your console output jerky and very bad UX. I suggest you consider instead using pexpect (or, on Windows, wexpect) to defeat such buffering and get smooth, regular output from the sub-process. For example (on just about any unix-y system, after installing pexpect):
>>> import pexpect
>>> child = pexpect.spawn('/bin/bash -c "echo ba; sleep 1; echo bu"', logfile=sys.stdout); x=child.expect(pexpect.EOF); child.close()
ba
bu
>>> child.before
'ba\r\nbu\r\n'
The ba and bu will come with the proper timing (about a second between them). Note the output is not subject to normal terminal processing, so the carriage returns are left in there -- you'll need to post-process the string yourself (just a simple .replace!-) if you need \n as end-of-line markers (the lack of processing is important just in case the sub-process is writing binary data to its stdout -- this ensures all the data's left intact!-).
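That post-processing is a single call, for example (under Python 3, pexpect returns bytes unless an encoding is given, so you would use b'\r\n' and b'\n' instead):

# Normalize the pty's CRLF line endings to plain newlines.
text = child.before.replace('\r\n', '\n')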
S. Lott's comment points to Getting realtime output using subprocess and Real-time intercepting of stdout from another process in Python
I'm curious that Alex's answer here is different from his answer 1085071.
My simple little experiments with the answers in the two other referenced questions have given good results...
I went and looked at wexpect as per Alex's answer above, but I have to say that reading the comments in the code did not leave me with a very good feeling about using it.
I guess the meta-question here is when will pexpect/wexpect be one of the Included Batteries?
Can you simply print it as you read it from the pipe?
Inspired by the pty.openpty() suggestion somewhere above, tested on Python 2.6, Linux. Publishing it since it took a while to make this work properly, without buffering...
import os
import sys

def call_and_peek_output(cmd, shell=False):
    import pty, subprocess
    master, slave = pty.openpty()
    p = subprocess.Popen(cmd, shell=shell, stdin=None, stdout=slave, close_fds=True)
    os.close(slave)
    line = ""
    while True:
        try:
            ch = os.read(master, 1)
        except OSError:
            # We get this exception when the spawned process closes all references
            # to the pty descriptor that we passed it to use for stdout
            # (typically when it and its children exit)
            break
        line += ch
        sys.stdout.write(ch)
        if ch == '\n':
            yield line
            line = ""
    if line:
        yield line
    ret = p.wait()
    if ret:
        raise subprocess.CalledProcessError(ret, cmd)

for l in call_and_peek_output("ls /", shell=True):
    pass
Alternatively, you can pipe your process into tee and capture only one of the streams.
Something along the lines of sh -c 'process interesting stuff' | tee /dev/stderr.
Of course, this only works on Unix-like systems.
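A minimal sketch of that tee approach from Python (Unix only; the command string is illustrative):

import subprocess

# tee copies the child's output to /dev/stderr (so the user sees it on the
# console) while the same bytes still flow through the pipe for capture.
p = subprocess.Popen("./main | tee /dev/stderr",
                     shell=True,
                     stdout=subprocess.PIPE,
                     universal_newlines=True)
captured = []
for line in p.stdout:
    captured.append(line)   # the console already saw this line via tee
p.wait()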