Read a stream and pass it to a subprocess - Python

I'm writing a Python script that reads a stream from stdin and passes it to a subprocess for further processing. The problem is that Python hangs after it has processed the input stream.
For example, this toy program sorter.py should read from stdin and pass the stream to a subprocess for sorting via Unix sort:
cat dat.txt | ./sorter.py
Here's sorter.py:
#!/usr/bin/env python
import subprocess
import sys
p = subprocess.Popen('sort -', stdin=subprocess.PIPE, shell=True)
for line in sys.stdin:
    p.stdin.write(line)
sys.exit()
The stream from cat is correctly sorted, but the program hangs, i.e. sys.exit() is never reached.
I've read quite a few variations on this theme but I can't get it right. Any idea what is missing?
Thank you!
Dario

My guess: sys.exit() is reached, but sort continues to run. You should close the p.stdin pipe to signal EOF to sort:
#!/usr/bin/env python2
import subprocess
import sys
p = subprocess.Popen('sort', stdin=subprocess.PIPE, bufsize=-1)
with p.stdin:
    for line in sys.stdin:
        # use line here
        p.stdin.write(line)
if p.wait() != 0:
    raise subprocess.CalledProcessError(p.returncode, 'sort')
Example:
$ < dat.txt ./sorter.py
If you don't need to modify the stdin stream then you don't need to use PIPE here:
#!/usr/bin/env python
import subprocess
subprocess.check_call('sort')

You probably have a problem with buffering: the OS doesn't send the data the moment it arrives on stdin. Check this out:
https://www.turnkeylinux.org/blog/unix-buffering
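If latency rather than the hang is the concern, flushing after each write pushes every line to the child as soon as it arrives. A minimal sketch building on sorter.py above (untested; sort is just a stand-in for any downstream filter):
#!/usr/bin/env python2
import subprocess
import sys

p = subprocess.Popen(['sort'], stdin=subprocess.PIPE)
for line in sys.stdin:
    p.stdin.write(line)
    p.stdin.flush()  # hand each line to the child immediately
p.stdin.close()      # signal EOF so sort can emit its sorted output
sys.exit(p.wait())   # propagate the child's exit status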

Related

Read from stdin AND forward it to a subprocess in Python

I'm writing a wrapper script for a program that optionally accepts input from STDIN. My script needs to process each line of the file, but it also needs to forward STDIN to the program it is wrapping. In minimalist form, this looks something like this:
import subprocess
import sys
for line in sys.stdin:
    # Do something with each line
    pass

subprocess.call(['cat'])
Note that I'm not actually trying to wrap cat, it just serves as an example to demonstrate whether or not STDIN is being forwarded properly.
With the example above, if I comment out the for-loop, it works properly. But if I run it with the for-loop, nothing gets forwarded because I've already read to the end of STDIN. I can't seek(0) to the start of the file because you can't seek on streams.
One possible solution is to read the entire file into memory:
import subprocess
import sys
lines = sys.stdin.readlines()
for line in lines:
    # Do something with each line
    pass

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE)
p.communicate(''.join(lines))
which works, but isn't very memory efficient. Can anyone think of a better solution? Perhaps a way to split or copy the stream?
Additional Constraints:
The subprocess can only be called once. So I can't read a line at a time, process it, and forward it to the subprocess.
The solution must work in Python 2.6
Does this work for you?
#!/usr/bin/env python2
import subprocess
import sys
p = subprocess.Popen(['cat'], stdin=subprocess.PIPE)

line = sys.stdin.readline()
####################
# Insert work here #
####################
line = line.upper()
####################

p.communicate(line)
Example:
$ echo "hello world" | ./wrapper.py
HELLO WORLD
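If more than one line has to be processed, a variation that streams each line to the single subprocess as it is handled might look like this (a minimal sketch; cat and .upper() are just stand-ins for the wrapped program and the per-line work):
#!/usr/bin/env python2
import subprocess
import sys

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE)
for line in sys.stdin:
    processed = line.upper()  # do something with each line
    p.stdin.write(processed)  # forward it to the one subprocess
p.stdin.close()               # signal EOF to the child
p.wait()
This still satisfies the constraint that the subprocess is started only once; only the writes to its stdin happen per line.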

Python - Can we use tempfile with subprocess to get non-buffering live output in a Python app

I am trying to run a Python file from a Python Windows application. For that I have used subprocess. To get live streaming output on the app console, I have tried the statements below.
With PIPE
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT, shell=True)
for line in iter(p.stdout.readline, ''):
    print line
(or)
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT)
while True:
    out = process.stdout.read(1)
    if out == '' and process.poll() is not None:
        break
    if out != '':
        sys.stdout.write(out)
        sys.stdout.flush()
I tried many other methods besides the code above, and I get the same results each time:
1. The Python Windows app takes a very long time to run.
2. Then the app window goes into the "not responding" state for a long time.
3. Then the whole output is printed to the console at once.
I know the output is being buffered in the Python app; that is why I am not getting live output.
I have posted so many queries about this and still have no solution.
I just found tempfile and tried it, but I am not sure it will give live streaming output.
Shall I try it this way?
import tempfile
import subprocess
w = tempfile.NamedTemporaryFile()
p = subprocess.Popen(cmd, shell=True, stdout=w,
                     stderr=subprocess.STDOUT, bufsize=0)
with open(w.name, 'r') as r:
    for line in r:
        print line
w.close()
Or is there any other good solution for non-blocking, unbuffered live output in a Windows app?
Any help would be appreciated.
Note: 1. The Python file I want to run has many print statements (i.e. a lot of output). 2. Windows Server 2012, Python 2.7.
I understand your frustration. It looks like you've almost come to the answer yourself.
I'm building on the answer from this SO post, but that answer doesn't use TemporaryFile; I also used the tail-follow method from here, which I have found offers the fastest output to the terminal with very large volumes of output. This eliminates extraneous calls to print.
Side note: if you have other asynchronous work to do, you can wrap the code below the imports in a function, use the gevent package, and import sleep from gevent and Popen, STDOUT from gevent.subprocess (see the sketch after the code below). That is what I'm doing, and it may help you avoid leftover slowdown (the only reason I mention it).
import sys
from tempfile import TemporaryFile
from time import sleep
from subprocess import Popen, STDOUT

# the temp file will be automatically cleaned up using the context manager
with TemporaryFile() as output:
    sub = Popen(cmd, stdout=output, stderr=STDOUT, shell=True)
    # sub.poll() returns None until the subprocess ends;
    # it will then return the exit code, hopefully 0 ;)
    while sub.poll() is None:
        where = output.tell()
        lines = output.read()
        if not lines:
            # adjust the sleep interval to your needs
            sleep(0.1)
            # make sure we are pointing to the last place we read
            output.seek(where)
        else:
            sys.__stdout__.write(lines)
            sys.__stdout__.flush()
    # a last write is needed after the subprocess ends
    sys.__stdout__.write(output.read())
    sys.__stdout__.flush()
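For the gevent variant mentioned in the side note, one possible shape is sketched below (a sketch only, not tested code; it assumes gevent is installed and that cmd is defined as in the question). The loop body is unchanged; sleep() now yields to other greenlets instead of blocking:
import sys
from tempfile import TemporaryFile
import gevent
from gevent import sleep
from gevent.subprocess import Popen, STDOUT

def stream_output(cmd):
    with TemporaryFile() as output:
        sub = Popen(cmd, stdout=output, stderr=STDOUT, shell=True)
        while sub.poll() is None:
            where = output.tell()
            lines = output.read()
            if not lines:
                sleep(0.1)         # yields to other greenlets while waiting
                output.seek(where)
            else:
                sys.__stdout__.write(lines)
                sys.__stdout__.flush()
        sys.__stdout__.write(output.read())
        sys.__stdout__.flush()

job = gevent.spawn(stream_output, cmd)
# ... other asynchronous work can run here ...
job.join()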

Python Popen with stdout and stdin piped: read freezes [duplicate]

Ok so I'm trying to run a C program from a python script. Currently I'm using a test C program:
#include <stdio.h>
#include <unistd.h> /* for sleep() */

int main() {
    while (1) {
        printf("2000\n");
        sleep(1);
    }
    return 0;
}
To simulate the program that I will be using, which takes readings from a sensor constantly.
Then I'm trying to read the output (in this case "2000") from the C program with subprocess in python:
#!/usr/bin/python
import subprocess

process = subprocess.Popen("./main", stdout=subprocess.PIPE)
while True:
    for line in iter(process.stdout.readline, ''):
        print line,
but this is not working. From using print statements, I can see that it runs the .Popen line and then waits at for line in iter(process.stdout.readline, ''): until I press Ctrl-C.
Why is this? It is exactly what most examples I've seen have as their code, and yet it does not read the output.
Is there a way of making it run only when there is something to be read?
It is a block-buffering issue.
What follows is a version of my answer to the question Python: read streaming input from subprocess.communicate(), extended for your case.
Fix stdout buffer in C program directly
stdio-based programs are, as a rule, line buffered when they run interactively in a terminal and block buffered when their stdout is redirected to a pipe. In the latter case, you won't see new lines until the buffer overflows or is flushed.
To avoid calling fflush() after each printf() call, you could force line-buffered output by calling this at the very beginning of the C program:
setvbuf(stdout, (char *) NULL, _IOLBF, 0); /* make line buffered stdout */
As soon as a newline is printed the buffer is flushed in this case.
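As an aside, if the child happens to be a Python script rather than a C program, the analogous fix is to start it with the -u flag (or with PYTHONUNBUFFERED=1 in its environment), which disables its output buffering much like setvbuf() above. A sketch, where child.py is a hypothetical script:
import sys
from subprocess import Popen, PIPE

# -u makes the hypothetical child.py write its stdout unbuffered
process = Popen([sys.executable, '-u', 'child.py'], stdout=PIPE)
for line in iter(process.stdout.readline, b''):
    print line,
process.stdout.close()
process.wait()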
Or fix it without modifying the source of the C program
There is stdbuf utility that allows you to change buffering type without modifying the source code e.g.:
from subprocess import Popen, PIPE
process = Popen(["stdbuf", "-oL", "./main"], stdout=PIPE, bufsize=1)
for line in iter(process.stdout.readline, b''):
print line,
process.communicate() # close process' stream, wait for it to exit
There are also other utilities available, see Turn off buffering in pipe.
Or use pseudo-TTY
To trick the subprocess into thinking that it is running interactively, you could use pexpect module or its analogs, for code examples that use pexpect and pty modules, see Python subprocess readlines() hangs. Here's a variation on the pty example provided there (it should work on Linux):
#!/usr/bin/env python
import os
import pty
import sys
from select import select
from subprocess import Popen, STDOUT

master_fd, slave_fd = pty.openpty()  # provide tty to enable line buffering
process = Popen("./main", stdin=slave_fd, stdout=slave_fd, stderr=STDOUT,
                bufsize=0, close_fds=True)
timeout = .1  # ugly but otherwise `select` blocks on process' exit
# code is similar to _copy() from pty.py
with os.fdopen(master_fd, 'r+b', 0) as master:
    input_fds = [master, sys.stdin]
    while True:
        fds = select(input_fds, [], [], timeout)[0]
        if master in fds:  # subprocess' output is ready
            data = os.read(master_fd, 512)  # <-- doesn't block, may return less
            if not data:  # EOF
                input_fds.remove(master)
            else:
                os.write(sys.stdout.fileno(), data)  # copy to our stdout
        if sys.stdin in fds:  # got user input
            data = os.read(sys.stdin.fileno(), 512)
            if not data:
                input_fds.remove(sys.stdin)
            else:
                master.write(data)  # copy it to subprocess' stdin
        if not fds:  # timeout in select()
            if process.poll() is not None:  # subprocess ended
                # and no output is buffered <-- timeout + dead subprocess
                assert not select([master], [], [], 0)[0]  # race is possible
                os.close(slave_fd)  # subprocess doesn't need it anymore
                break
rc = process.wait()
print("subprocess exited with status %d" % rc)
Or use pty via pexpect
pexpect wraps pty handling into a higher-level interface:
#!/usr/bin/env python
import pexpect

child = pexpect.spawn("./main")
for line in child:
    print line,
child.close()
The pexpect FAQ entry Q: Why not just use a pipe (popen())? explains why a pseudo-TTY is useful.
Your program isn't hung; it just runs very slowly. Your program is using buffered output; the "2000\n" data is not being written to stdout immediately, but will eventually make it. In your case, it might take BUFSIZ/strlen("2000\n") seconds to complete (with a typical BUFSIZ of 8192 bytes and 5 bytes per line, about 1638 seconds).
After this line:
printf("2000\n");
add
fflush(stdout);
See the readline docs.
Your code:
process.stdout.readline
is waiting for EOF or a newline.
I cannot tell what you are ultimately trying to do, but adding a newline to your printf, e.g., printf("2000\n");, should at least get you started.

Iterating over standard in blocks until EOF is read

I have two scripts which are connected by a Unix pipe. The first script writes strings to standard out, and these are consumed by the second script.
Consider the following
# producer.py
import sys
import time
for x in range(10):
    sys.stdout.write("thing number %d\n" % x)
    sys.stdout.flush()
    time.sleep(1)
and
# consumer.py
import sys
for line in sys.stdin:
    print line
Now, when I run: python producer.py | python consumer.py, I expect to see a new line of output each second. Instead, I wait 10 seconds, and I suddenly see all of the output at once.
Why can't I iterate over stdin one-item-at-a-time? Why do I have to wait until the producer gives me an EOF before the loop-body starts executing?
Note that I can get the correct behavior if I change consumer.py to:
# consumer.py
import sys
def stream_stdin():
    line = sys.stdin.readline()
    while line:
        yield line
        line = sys.stdin.readline()

for line in stream_stdin():
    print line
I'm wondering why I have to explicitly build a generator to stream the items of stdin. Why doesn't this implicitly happen?
According to the python -h help message:
-u     Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.

Parsing pexpect output

I'm trying to parse, in real time, the output of a program that is block-buffered, which means its output is not available until the process ends. What I need is just to parse it line by line, filtering and managing the data as it comes, since the program could run for hours.
I've tried to capture the output with subprocess.Popen(), but, as you may guess, Popen can't manage this kind of behavior; it keeps buffering until the end of the process.
from subprocess import Popen, PIPE
p = Popen("my noisy stuff ", shell=True, stdout=PIPE, stderr=PIPE)
for line in p.stdout.readlines():
#parsing text and getting data
So I found pexpect, which prints the output in real time because it treats stdout as a file. I could even do a dirty trick, printing a file and parsing it outside the function, but that is too dirty, even for me ;)
import pexpect
import sys
pexpect.run("my noisy stuff", logfile=sys.stdout)
But I guess there should be a more Pythonic way to do this, managing the stdout like subprocess.Popen does. How can I do this?
EDIT:
Running J.F.'s proposal. This is a deliberately wrong audit; it takes about 25 secs to stop.
from subprocess import Popen, PIPE
command = "bully mon0 -e ESSID -c 8 -b aa:bb:cc:dd:ee:00 -v 2"
p = Popen(command, shell=True, stdout=PIPE, stderr=PIPE)
for line in iter(p.stdout.readline, b''):
print "inside loop"
print line
print "outside loop"
p.stdout.close()
p.wait()
#$ sudo python SCRIPT.py
### <= 25 secs later......
# inside loop
# [!] Bully v1.0-21 - WPS vulnerability assessment utility
# inside loop
# [!] Using 'ee:cc:bb:aa:bb:ee' for the source MAC address
# inside loop
# [X] Unable to get a beacon from the AP, possible causes are
# inside loop
# [.] an invalid --bssid or -essid was provided,
# inside loop
# [.] the access point isn't on channel '8',
# inside loop
# [.] you aren't close enough to the access point.
# outside loop
Using this method instead:
EDIT: Due to large delays and timeouts in the output, I had to fix the child and add some hacks, so the final code looks like this:
import pexpect
child = pexpect.spawn(command)
child.maxsize = 1   # turns off buffering
child.timeout = 50  # default is 30, insufficient for me; crashes were due to this param
for line in child:
    print line,
child.close()
This gives back the same output but prints the lines in real time. So... SOLVED. Thanks @J.F. Sebastian!
.readlines() reads all lines. No wonder you don't see any output until the subprocess ends. You could use .readline() instead to read line by line as soon as the subprocess flushes its stdout buffer:
from subprocess import Popen, PIPE
p = Popen("my noisy stuff", stdout=PIPE, bufsize=1)
for line in iter(p.stdout.readline, b''):
# process line
..
p.stdout.close()
p.wait()
If you already have pexpect then you could use it to work around the block-buffering issue:
import pexpect
child = pexpect.spawn("my noisy stuff", timeout=None)
for line in child:
# process line
..
child.close()
See also the stdbuf- and pty-based solutions from the question I've linked in the comments.
