Keeping a pipe to a process open - python

I have an app that reads in stuff from stdin and returns, after a newline, results to stdout
A simple (stupid) example:
$ app
Expand[(x+1)^2]<CR>
x^2 + 2*x + 1
100 - 4<CR>
96
Opening and closing the app requires a lot of initialization and clean-up (it's an interface to a Computer Algebra System), so I want to keep this to a minimum.
I want to open a pipe in Python to this process, write strings to its stdin and read out the results from stdout. Popen.communicate() doesn't work for this, as it closes the file handles, so the pipe would have to be reopened for every expression.
I've tried something along the lines of this related question:
Communicate multiple times with a process without breaking the pipe? but I'm not sure how to wait for the output. It is also difficult to know a priori how long the app will take to finish processing the input at hand, so I don't want to make any assumptions. I guess most of my confusion comes from this question: Non-blocking read on a subprocess.PIPE in python where it is stated that mixing high and low level functions is not a good idea.
EDIT:
Sorry that I didn't give any code before, got interrupted. This is what I've tried so far and it seems to work, I'm just worried that something goes wrong unnoticed:
from subprocess import Popen, PIPE
pipe = Popen(["MathPipe"], stdin=PIPE, stdout=PIPE)
expressions = ["Expand[(x+1)^2]", "Integrate[Sin[x], {x,0,2*Pi}]"] # ...
for expr in expressions:
    pipe.stdin.write(expr)
    while True:
        line = pipe.stdout.readline()
        if line != '':
            print line
        # output of MathPipe is always terminated by ';'
        if ";" in line:
            break
Potential problems with this?

Using subprocess, you can't do this reliably. You might want to look at using the pexpect library. That won't work on Windows - if you're on Windows, try winpexpect.
Also, if you're trying to do mathematical stuff in Python, check out SAGE. They do a lot of work on interfacing with other open-source maths software, so there's a chance they've already done what you're trying to.

Perhaps you could pass stdin=subprocess.PIPE as an argument to subprocess.Popen. This will make the process' stdin available as a general file-like object:
import sys, subprocess
proc = subprocess.Popen(["mathematica <args>"], stdin=subprocess.PIPE,
                        stdout=sys.stdout, shell=True)
proc.stdin.write("Expand[ (x-1)^2 ]") # Write whatever to the process
proc.stdin.flush() # Ensure nothing is left in the buffer
proc.terminate() # Kill the process
This directs the subprocess' output directly to your python process' stdout. If you need to read the output and do some editing first, that is possible as well. Check out http://docs.python.org/library/subprocess.html#popen-objects.
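For the narrow case described in the question (one line in, output that ends with a known terminator), a plain pipe can be kept open as long as every write is followed by a newline and a flush, and the read loop also stops at EOF. The following is only a minimal sketch in Python 3, not a general solution: the `CHILD` script is a hypothetical stand-in for MathPipe that answers each line with text ending in ';', and `ask` is a helper name invented here.

```python
import subprocess
import sys

# Hypothetical stand-in for MathPipe: answers every input line with the
# upper-cased line followed by ';', flushing after each answer.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper() + ';', flush=True)\n"
)


def ask(proc, expr):
    """Send one expression and collect output lines up to the ';' terminator."""
    proc.stdin.write(expr + "\n")   # the child only reacts to complete lines
    proc.stdin.flush()              # make sure the line really leaves our buffer
    lines = []
    while True:
        line = proc.stdout.readline()
        if line == "":              # EOF: the child exited or closed stdout
            raise EOFError("child closed its stdout")
        lines.append(line.rstrip("\n"))
        if ";" in line:
            return lines


proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,
)
answers = [ask(proc, expr) for expr in ["expand", "integrate"]]
proc.stdin.close()                  # closing stdin lets the child finish
proc.wait()
proc.stdout.close()
```

This still hangs if the child buffers its own output when writing to a pipe, which is the situation where pexpect or a pty is the safer choice.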

Related

How to get output from python2 subprocess which run a script using multiprocessing?

Here is my demo code. It contains two scripts.
The first is main.py, it will call print_line.py with subprocess module.
The second is print_line.py, it prints something to the stdout.
main.py
import subprocess
p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process
if __name__ == '__main__':
    task_q = JoinableQueue()
    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()
    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()
    for i in range(100):
        task_q.put(i)
    task_q.join()
Before, print_line.py was written with the threading and Queue modules, and everything was fine. But now, after changing to the multiprocessing module, main.py cannot get any output from print_line.py. I tried using Popen.communicate() to get the output, and setting preexec_fn=os.setsid in Popen(). Neither works.
So, here is my question:
Why can't subprocess get the output when multiprocessing is used? Why does it work with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed in my console. Why? How does this happen?
Is there any way to get the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason being somewhere in the deep, murky waters of buffered IO. It seems that the output of a subprocess of a subprocess can get lost if not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
# requires "import sys" at the top of print_line.py
def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to use -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory why this happens, it definitely has something to do with unflushed stdout being lost if subprocess terminates, but I am not the expert in this.
It's not a multiprocessing issue, but it is a subprocess issue, or more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device" as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.1
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
1 There are hooks that fire very early, where you can fuss with this sort of thing. They are tricky to work with, though.
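The terminal check described above can be observed from the parent side. This is a small sketch in Python 3 (the thread itself uses Python 2); `PROBE` is just an illustrative one-liner that makes the child report what it sees on its own stdout.

```python
import subprocess
import sys

# One-line child program: report whether its stdout is a terminal.
PROBE = "import sys; print(sys.stdout.isatty())"

# The child's stdout is a pipe here, so the child does not see a terminal
# and will pick full buffering rather than line buffering by default.
piped = subprocess.run(
    [sys.executable, "-c", PROBE],
    stdout=subprocess.PIPE,
    universal_newlines=True,
).stdout.strip()

print("child sees a terminal on stdout:", piped)
```

Running the same probe without stdout=subprocess.PIPE from an interactive console would typically report True instead, which matches the observation that the output appears once the PIPE arguments are commented out.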

how to send tab-key to python subprocess's stdin

Background: I have a Python subprocess that connects to a shell-like application, which uses the readline library to handle input, and that app has a TAB-complete routine for command input, just like bash. The child process is spawned, like so:
def get_cli_subprocess_handle():
    return subprocess.Popen(
        '/bin/myshell',
        shell=False,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
Everything works great, except tab-complete. Whenever my Python program passes the tab character, '\t' to the subprocess, I get 5 spaces in the STDIN, instead of triggering the readline library's tab-complete routine. :(
Question: What can I send to the subprocess's STDIN to trigger the child's tab-complete function? Maybe asked another way: How do I send the TAB key as opposed to the TAB character, if that is even possible?
Related but Unanswered and Derailed:
trigger tab completion for python batch process built around readline
The shell like application is probably differentiating between a terminal being connected to stdin and a pipe being connected to it. Many Unix utilities do just that to optimise their buffering (line vs. block) and shell-like utilities are likely to disable command completion facilities on batch input (i.e. PIPE) to avoid unexpected results. Command completion is really an interactive feature which requires a terminal input.
Check out the pty module and try using a master/slave pair as the pipe for your subprocess.
There really is no such thing as sending a tab key to a pipe. A pipe can only accept strings of bits, and if the tab character isn't doing it, there may not be a solution.
There is a project that does something similar called pexpect. Just looking at its interact() code, I'm not seeing anything obvious that makes it work and yours not. Given that, the most likely explanation is that pexpect actually does some work to make itself look like a pseudo-terminal. Perhaps you could incorporate its code for that?
Based on isedev's answer, I modified my code as follows:
import os, pty, subprocess
def get_cli_subprocess_handle():
    masterPTY, slaveTTY = pty.openpty()
    return masterPTY, slaveTTY, subprocess.Popen(
        '/bin/myshell',
        shell=False,
        stdin=slaveTTY,
        stdout=slaveTTY,
        stderr=slaveTTY,
    )
Using this returned tuple, I was able to perform select.select([masterPTY],[],[]) and os.read(masterPTY, 1024) as needed, and I wrote to the master-pty with a function that is very similar to a private method in the pty module source:
def write_all(masterPTY, data):
    """Successively write all of data into a file-descriptor."""
    while data:
        chars_written = os.write(masterPTY, data)
        data = data[chars_written:]
    return data
Thanks to all for the good solutions. Hope this example helps someone else. :)
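To check that a child started this way really believes it is attached to a terminal, something like the following can be used. It is a sketch in Python 3 for Unix-like systems only; `run_on_pty` is a name made up for this example, and the child is a throwaway Python one-liner rather than /bin/myshell.

```python
import os
import pty
import subprocess
import sys


def run_on_pty(argv):
    """Run argv with stdin/stdout/stderr on a pty and return everything it printed."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave,
                            close_fds=True)
    os.close(slave)                 # the parent keeps only the master end
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:             # Linux reports EIO once the child side is closed
            break
        if not data:                # other systems may signal EOF with b""
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    return b"".join(chunks)


output = run_on_pty(
    [sys.executable, "-c",
     "import sys; print(sys.stdin.isatty(), sys.stdout.isatty())"]
)
print(output)
```

Because the terminal layer translates newlines, the captured bytes normally end in \r\n rather than \n.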

subprocess.Popen.stdout - reading stdout in real-time (again)

Again, the same question.
The reason is - I still can't make it work after reading the following:
Real-time intercepting of stdout from another process in Python
Intercepting stdout of a subprocess while it is running
How do I get 'real-time' information back from a subprocess.Popen in python (2.5)
catching stdout in realtime from subprocess
My case is that I have a console app written in C; let's take for example this code in a loop:
tmp = 0.0;
printf("\ninput>>");
scanf_s("%f",&tmp);
printf ("\ninput was: %f",tmp);
It continuously reads some input and writes some output.
My python code to interact with it is the following:
p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
p.stdin.write('12345\n')
for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()
So far, whenever I read from p.stdout it always waits until the process is terminated and then outputs an empty string. I've tried lots of stuff - but still the same result.
I tried Python 2.6 and 3.1, but the version doesn't matter - I just need to make it work somewhere.
Trying to write to and read from pipes to a sub-process is tricky because of the default buffering going on in both directions. It's extremely easy to get a deadlock where one or the other process (parent or child) is reading from an empty buffer, writing into a full buffer or doing a blocking read on a buffer that's awaiting data before the system libraries flush it.
For more modest amounts of data the Popen.communicate() method might be sufficient. However, for data that exceeds its buffering you'd probably get stalled processes (similar to what you're already seeing?)
You might want to look for details on using the fcntl module and making one or the other (or both) of your file descriptors non-blocking. In that case, of course, you'll have to wrap all reads and/or writes to those file descriptors in the appropriate exception handling to handle the "EWOULDBLOCK" events. (I don't remember the exact Python exception that's raised for these).
A completely different approach would be for your parent to use the select module and os.fork() ... and for the child process to execve() the target program after directly handling any file dup()ing. (Basically you'd be re-implementing parts of Popen(), but with different parent file descriptor (PIPE) handling.)
Incidentally, .communicate, at least in Python's 2.5 and 2.6 standard libraries, will only handle about 64K of remote data (on Linux and FreeBSD). This number may vary based on various factors (possibly including the build options used to compile your Python interpreter, or the version of libc being linked to it). It is NOT simply limited by available memory (despite J.F. Sebastian's assertion to the contrary) but is limited to a much smaller value.
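A lighter variant of the select idea keeps Popen and only uses select to find out whether the child's stdout has data before reading from it. The sketch below is Python 3 and assumes a Unix-like system (select does not work on pipes on Windows); `read_available` and the `CHILD` script are hypothetical names used only for illustration.

```python
import os
import select
import subprocess
import sys

# Hypothetical child: prints a greeting, then blocks waiting for one input line.
CHILD = "import sys\nprint('ready', flush=True)\nsys.stdin.readline()\n"


def read_available(proc, timeout):
    """Return what the child has written so far, or b'' after `timeout` seconds."""
    readable, _, _ = select.select([proc.stdout], [], [], timeout)
    if not readable:
        return b""
    # bufsize=0 below means there is no Python-level buffer hiding data from select
    return os.read(proc.stdout.fileno(), 4096)


proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    bufsize=0,
)
first = read_available(proc, timeout=10)     # the greeting
second = read_available(proc, timeout=0.2)   # nothing pending, so no blocking read
proc.stdin.write(b"\n")                      # let the child finish
proc.stdin.close()
proc.wait()
proc.stdout.close()
```

This only avoids blocking in the parent. It cannot make a child flush output that is still sitting in the child's own buffer.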
Push reading from the pipe into a separate thread that signals when a chunk of output is available:
How can I read all availably data from subprocess.Popen.stdout (non blocking)?
The bufsize=256 argument prevents 12345\n from being sent to the child process in a chunk smaller than 256 bytes, as it will be when omitting bufsize or inserting p.stdin.flush() after p.stdin.write(). Default behaviour is line-buffering.
In either case you should at least see one empty line before blocking as emitted by the first printf(\n...) in your example.
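The reader-thread idea can be sketched as follows in Python 3. The `pump` function and the `CHILD` script are illustrative names, not taken from the linked answer; the child is a small Python program that flushes after every line, which the C program in the question does not do on its own.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical child that answers each input line immediately and flushes.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('input was: ' + line.strip(), flush=True)\n"
)


def pump(stream, q):
    """Move lines from the pipe into a queue; None marks end of output."""
    for line in iter(stream.readline, ""):
        q.put(line)
    q.put(None)


proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
lines = queue.Queue()
threading.Thread(target=pump, args=(proc.stdout, lines), daemon=True).start()

proc.stdin.write("12345\n")
proc.stdin.flush()
first = lines.get(timeout=10)   # raises queue.Empty instead of hanging forever

proc.stdin.close()              # EOF for the child, which then exits
last = lines.get(timeout=10)    # the None sentinel from pump()
proc.wait()
```

The main thread never blocks on the pipe itself, so it can apply a timeout or do other work while waiting.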
Your particular example doesn't require "real-time" interaction. The following works:
from subprocess import Popen, PIPE
p = Popen(["./a.out"], stdin=PIPE, stdout=PIPE)
output = p.communicate(b"12345")[0] # send input/read all output
print output,
where a.out is your example C program.
In general, for a dialog-based interaction with a subprocess you could use pexpect module (or its analogs on Windows):
import pexpect
child = pexpect.spawn("./a.out")
child.expect("input>>")
child.sendline("12345.67890") # send a number
child.expect(r"\d+\.\d+") # expect the number at the end
print float(child.after) # assert that we can parse it
child.close()
I had the same problem, and "proc.communicate()" does not solve it because it waits for the process to terminate.
So here is what is working for me, on Windows with Python 3.5.1 :
import subprocess as sp
import time

# cmd is the command line of the program to run
myProcess = sp.Popen(cmd, creationflags=sp.CREATE_NEW_PROCESS_GROUP, stdout=sp.PIPE, stderr=sp.STDOUT)
i = 0
while i < 40:
    i += 1
    time.sleep(.5)
    out = myProcess.stdout.readline().decode("utf-8").rstrip()
I guess creationflags and other arguments are not mandatory (but I don't have time to test), so this would be the minimal syntax :
myProcess = sp.Popen(cmd, stdout=sp.PIPE)
for i in range(40):
    time.sleep(.5)
    out = myProcess.stdout.readline()

Python - capture Popen stdout AND display on console?

I want to capture stdout from a long-ish running process started via subprocess.Popen(...) so I'm using stdout=PIPE as an arg.
However, because it's a long running process I also want to send the output to the console (as if I hadn't piped it) to give the user of the script an idea that it's still working.
Is this at all possible?
Cheers.
The buffering your long-running sub-process is probably performing will make your console output jerky and very bad UX. I suggest you consider instead using pexpect (or, on Windows, wexpect) to defeat such buffering and get smooth, regular output from the sub-process. For example (on just about any unix-y system, after installing pexpect):
>>> import pexpect
>>> child = pexpect.spawn('/bin/bash -c "echo ba; sleep 1; echo bu"', logfile=sys.stdout); x=child.expect(pexpect.EOF); child.close()
ba
bu
>>> child.before
'ba\r\nbu\r\n'
The ba and bu will come with the proper timing (about a second between them). Note the output is not subject to normal terminal processing, so the carriage returns are left in there -- you'll need to post-process the string yourself (just a simple .replace!-) if you need \n as end-of-line markers (the lack of processing is important just in case the sub-process is writing binary data to its stdout -- this ensures all the data's left intact!-).
S. Lott's comment points to Getting realtime output using subprocess and Real-time intercepting of stdout from another process in Python
I'm curious that Alex's answer here is different from his answer 1085071.
My simple little experiments with the answers in the two other referenced questions have given good results...
I went and looked at wexpect as per Alex's answer above, but I have to say that reading the comments in the code did not leave me with a very good feeling about using it.
I guess the meta-question here is when will pexpect/wexpect be one of the Included Batteries?
Can you simply print it as you read it from the pipe?
Inspired by pty.openpty() suggestion somewhere above, tested on python2.6, linux. Publishing since it took a while to make this working properly, w/o buffering...
import os, sys

def call_and_peek_output(cmd, shell=False):
    import pty, subprocess
    master, slave = pty.openpty()
    p = subprocess.Popen(cmd, shell=shell, stdin=None, stdout=slave, close_fds=True)
    os.close(slave)
    line = ""
    while True:
        try:
            ch = os.read(master, 1)
        except OSError:
            # We get this exception when the spawned process closes all references to the
            # pty descriptor which we passed it to use for stdout
            # (typically when it and its children exit)
            break
        line += ch
        sys.stdout.write(ch)
        if ch == '\n':
            yield line
            line = ""
    if line:
        yield line
    ret = p.wait()
    if ret:
        raise subprocess.CalledProcessError(ret, cmd)

for l in call_and_peek_output("ls /", shell=True):
    pass
Alternatively, you can pipe your process into tee and capture only one of the streams.
Something along the lines of sh -c 'process interesting stuff' | tee /dev/stderr.
Of course, this only works on Unix-like systems.
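The "print it as you read it" suggestion can be written as a small helper. This is a sketch in Python 3 with invented names (`run_and_tee`, `sink`); it does nothing about buffering inside the child, so the demo child is started with -u to keep its output unbuffered.

```python
import subprocess
import sys


def run_and_tee(argv, sink):
    """Run argv, echo each output line to `sink` as it arrives, and return it all."""
    captured = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        for line in proc.stdout:
            sink.write(line)        # show progress to the user
            sink.flush()
            captured.append(line)   # and keep a copy for later processing
    return proc.returncode, "".join(captured)


returncode, text = run_and_tee(
    [sys.executable, "-u", "-c", "print('ba'); print('bu')"],
    sys.stdout,
)
```

Passing the sink as a parameter makes it easy to swap sys.stdout for a log file or an in-memory buffer.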

Python Popen, closing streams and multiple processes

I have some data that I would like to gzip, uuencode and then print to standard out. What I basically have is:
compressor = Popen("gzip", stdin = subprocess.PIPE, stdout = subprocess.PIPE)
encoder = Popen(["uuencode", "dummy"], stdin = compressor.stdout)
The way I feed data to the compressor is through compressor.stdin.write(stuff).
What I really need to do is to send an EOF to the compressor, and I have no idea how to do it.
At some point, I tried compressor.stdin.close() but that doesn't work -- it works well when the compressor writes to a file directly, but in the case above, the process doesn't terminate and stalls on compressor.wait().
Suggestions? In this case, gzip is an example and I really need to do something with piping the output of one process to another.
Note: The data I need to compress won't fit in memory, so communicate isn't really a good option here. Also, if I just run
compressor.communicate("Testing")
after the 2 lines above, it still hangs with the error
File "/usr/lib/python2.4/subprocess.py", line 1041, in communicate
rlist, wlist, xlist = select.select(read_set, write_set, [])
I suspect the issue is with the order in which you open the pipes. UUEncode is funny in that it will whine when you launch it if there's no incoming pipe set up in just the right way (try launching the darn thing on its own in a Popen call, with just PIPE as the stdin and stdout, to see the explosion).
Try this:
encoder = Popen(["uuencode", "dummy"], stdin=PIPE, stdout=PIPE)
compressor = Popen("gzip", stdin=PIPE, stdout=encoder.stdin)
compressor.communicate("UUencode me please")
encoded_text = encoder.communicate()[0]
print encoded_text
begin 644 dummy
F'XL(`%]^L$D``PL-3<U+SD])5<A-52C(24TL3#4`;2O+"!(`````
`
end
You are right, btw... there is no way to send a generic EOF down a pipe. After all, each program really defines its own EOF. The way to do it is to close the pipe, as you were trying to do.
EDIT: I should be clearer about uuencode. As a shell program, its default behaviour is to expect console input. If you run it without a "live" incoming pipe, it will block waiting for console input. By opening the encoder second, before you had sent material down the compressor pipe, the encoder was blocking waiting for you to start typing. Jerub was right in that there was something blocking.
This is not the sort of thing you should be doing directly in Python; there are eccentricities regarding how things work that make it a much better idea to do this with a shell. If you can just use subprocess.Popen("foo | bar", shell=True), then all the better.
What might be happening is that gzip has not been able to output all of its input yet, and the process will not exit until its stdout writes have finished.
You can look at what system call a process is blocking on if you use strace. Use ps auxwf to discover which process is the gzip process, then use strace -p $pidnum to see what system call it is performing. Note that stdin is FD 0 and stdout is FD 1, you will probably see it reading or writing on those file descriptors.
If you just want to compress and don't need the file wrappers, consider using the zlib module:
import zlib
compressed = zlib.compress("text")
Any reason why the shell=True and Unix pipes suggestions won't work?
from subprocess import *
pipes = Popen("gzip | uuencode dummy", stdin=PIPE, stdout=PIPE, shell=True)
for i in range(1, 100):
    pipes.stdin.write("some data")
pipes.stdin.close()
print pipes.stdout.read()
seems to work
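The same pipeline can also be built without a shell by chaining two Popen objects and closing the parent's copy of the intermediate pipe, so that closing the first process's stdin is the only "EOF" needed. The sketch below is Python 3; `COMPRESS` and `ENCODE` are hypothetical Python one-liners standing in for gzip and uuencode so that the example does not depend on those tools being installed.

```python
import base64
import subprocess
import sys
import zlib

# Hypothetical stand-ins for "gzip" and "uuencode dummy".
COMPRESS = ("import sys, zlib; "
            "sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))")
ENCODE = ("import sys, base64; "
          "sys.stdout.buffer.write(base64.b64encode(sys.stdin.buffer.read()))")

compressor = subprocess.Popen([sys.executable, "-c", COMPRESS],
                              stdin=subprocess.PIPE, stdout=subprocess.PIPE)
encoder = subprocess.Popen([sys.executable, "-c", ENCODE],
                           stdin=compressor.stdout, stdout=subprocess.PIPE)
compressor.stdout.close()       # parent's copy only; the encoder keeps its own

for _ in range(100):            # feed the data in pieces, not all at once
    compressor.stdin.write(b"some data")
compressor.stdin.close()        # this is the EOF the compressor waits for

encoded = encoder.stdout.read()
encoder.stdout.close()
compressor.wait()
encoder.wait()
```

With very large outputs the last stage's stdout should go to a file or be drained by a separate thread while writing. Otherwise the pipes can fill up and both sides end up waiting on each other.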
