I'd like to have the Python (2.6, sorry!) equivalent of this shell pipe:
$ longrunningprocess | sometextfilter | gzip -c
That is, I have to call a binary longrunningprocess, filter its output through sometextfilter, and get the result gzipped.
I know how to use subprocess pipes, but I need the output of the pipe chunkwise (probably using yield) and not all at once. E.g. the approach shown in
https://security.openstack.org/guidelines/dg_avoid-shell-true.html
only works for getting all output at once.
Note that both longrunningprocess and sometextfilter are external programs that cannot be replaced with Python functions.
Thanks in advance for any hint or example!
Again I thought it was difficult, while Python is (supposed to be) easy. Simply chaining the subprocesses just works, it seems:
import subprocess

def get_lines():
    lrp = subprocess.Popen(["longrunningprocess"],
                           stdout=subprocess.PIPE,
                           close_fds=True)
    stf = subprocess.Popen(["sometextfilter"],
                           stdin=lrp.stdout,
                           stdout=subprocess.PIPE,
                           bufsize=1,
                           close_fds=True)
    lrp.stdout.close()  # allow longrunningprocess to receive SIGPIPE if sometextfilter exits
    for l in iter(stf.stdout.readline, ''):
        yield l
    stf.stdout.close()
    stf.wait()
    lrp.wait()
[Changes by J.F. Sebastian applied. Thanks!]
Then I can use Python's gzip module for the compression.
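For completeness, a minimal sketch of that compression step (assuming the get_lines() generator above; GzipFile only gained with-statement support in Python 2.7, hence the try/finally):

import gzip

gz = gzip.open('out.gz', 'wb')
try:
    for line in get_lines():
        gz.write(line)
finally:
    gz.close()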
The shell syntax is optimized for one-liners; use it:
#!/usr/bin/env python2
import sys
from subprocess import Popen, PIPE

LINE_BUFFERED = 1
ON_POSIX = 'posix' in sys.builtin_module_names

p = Popen('longrunningprocess | sometextfilter', shell=True,
          stdout=PIPE, bufsize=LINE_BUFFERED, close_fds=ON_POSIX)
with p.stdout:
    for line in iter(p.stdout.readline, ''):
        print line,  # do something with the line
p.wait()
See also:
How do I use subprocess.Popen to connect multiple processes by pipes?
Python: read streaming input from subprocess.communicate()
If you want to emulate the pipeline manually:
#!/usr/bin/env python2
import sys
from subprocess import Popen, PIPE

LINE_BUFFERED = 1
ON_POSIX = 'posix' in sys.builtin_module_names

sometextfilter = Popen('sometextfilter', stdin=PIPE, stdout=PIPE,
                       bufsize=LINE_BUFFERED, close_fds=ON_POSIX)
longrunningprocess = Popen('longrunningprocess', stdout=sometextfilter.stdin,
                           close_fds=ON_POSIX)
with sometextfilter.stdin, sometextfilter.stdout as pipe:
    for line in iter(pipe.readline, ''):
        print line,  # do something with the line
sometextfilter.wait()
longrunningprocess.wait()
Related
I have a Python script that searches for logs; it continuously outputs the logs it finds, and I want to use a Linux pipe to filter the desired output, for example:
$python logsearch.py | grep timeout
The problem is that filters like grep (or sort and wc) are blocked until logsearch.py finishes, while logsearch.py produces output continuously.
sample logsearch.py:
p = subprocess.Popen("ping google.com", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
for line in p.stdout:
print(line)
UPDATE:
Figured it out: just change the stdout in subprocess to sys.stdout, and Python will handle the piping for you.

p = subprocess.Popen("ping -c 5 google.com", shell=True, stdout=sys.stdout)
Thanks for all of your help!
And why use grep? Why not do all the filtering in Python?
from subprocess import Popen, PIPE

p = Popen(['ping', 'google.com'], shell=False, stdin=PIPE, stdout=PIPE)
for line in p.stdout:
    if 'timeout' in line.split():
        # Process the error
        print("Timeout error!!")
    else:
        print(line)
UPDATE:
I changed the Popen line as recommended by @triplee. Pros and cons are discussed in Actual meaning of 'shell=True' in subprocess.
I have a Python script that needs to interact with the user via the command line, while logging whatever is output.
I currently have this:
# lots of code
popen = subprocess.Popen(
    args,
    shell=True,
    stdin=sys.stdin,
    stdout=sys.stdout,
    stderr=sys.stdout,
    executable='/bin/bash')
popen.communicate()
# more code
This executes a shell command (e.g. adduser newuser02) just as if it were typed into a terminal, including interactive behavior. This is good.
Now, I want to log, from within the Python script, everything that appears on the screen. But I can't seem to make that part work.
I've tried various ways of using subprocess.PIPE, but this usually messes up the interactivity, like not outputting prompt strings.
I've also tried various ways to directly change the behavior of sys.stdout, but as subprocess writes to sys.stdout.fileno() directly, this was all to no avail.
Popen might not be very suitable for interactive programs due to buffering issues and due to the fact that some programs write/read directly to/from a terminal, e.g., to retrieve a password. See Q: Why not just use a pipe (popen())?.
If you want to emulate the script utility then you could use pty.spawn(), see the code example in Duplicating terminal output from a Python subprocess or in log syntax errors and uncaught exceptions for a python subprocess and print them to the terminal:
#!/usr/bin/env python
import os
import pty
import sys

with open('log', 'ab') as file:
    def read(fd):
        data = os.read(fd, 1024)
        file.write(data)
        file.flush()
        return data

    pty.spawn([sys.executable, "test.py"], read)
Or you could use pexpect for more flexibility:
import sys
import pexpect  # $ pip install pexpect

with open('log', 'ab') as fout:
    p = pexpect.spawn("python test.py")
    p.logfile = fout  # or .logfile_read
    p.interact()
If your child process doesn't buffer its output (or it doesn't interfere with the interactivity) and it prints its output to its stdout or stderr then you could try subprocess:
#!/usr/bin/env python
import sys
from subprocess import Popen, PIPE, STDOUT

with open('log', 'ab') as file:
    p = Popen([sys.executable, '-u', 'test.py'],
              stdout=PIPE, stderr=STDOUT,
              close_fds=True,
              bufsize=0)
    for c in iter(lambda: p.stdout.read(1), ''):
        for f in [sys.stdout, file]:
            f.write(c)
            f.flush()
    p.stdout.close()
    rc = p.wait()
To read both stdout/stderr separately, you could use teed_call() from Python subprocess get children's output to file and terminal?
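For illustration, a minimal sketch of the thread-per-stream idea behind it (not the linked teed_call() itself; 'some_command' and the log file names are placeholders):

from subprocess import Popen, PIPE
from threading import Thread

def tee(infile, *files):
    # Copy infile to every file in `files`, chunk by chunk.
    for chunk in iter(lambda: infile.read(1024), b''):
        for f in files:
            f.write(chunk)
            f.flush()
    infile.close()

p = Popen(['some_command'], stdout=PIPE, stderr=PIPE, bufsize=0)
out_log, err_log = open('out.log', 'ab'), open('err.log', 'ab')
threads = [Thread(target=tee, args=(p.stdout, out_log)),
           Thread(target=tee, args=(p.stderr, err_log))]
for t in threads:
    t.start()
for t in threads:
    t.join()
p.wait()
out_log.close()
err_log.close()

One thread per pipe avoids one stream's OS buffer filling up while you are blocked reading the other.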
This should work:

import subprocess

with open('file.txt', 'w') as f:
    subprocess.call(['echo', 'hello', 'world'], stdout=f)
In shell script, we have the following command:
/script1.pl < input_file | /script2.pl > output_file
I would like to replicate the above pipeline in Python using the subprocess module. input_file is a large file that I can't read all at once. As such, I would like to pass each line (an input_string) into the pipe stream and get back a string variable (output_string), until the whole file has been streamed through.
The following is a first attempt:
import subprocess

process = subprocess.Popen(["/script1.pl | /script2.pl"], stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE, shell=True)
process.stdin.write(input_string)
output_string = process.communicate()[0]
However, using process.communicate()[0] closes the stream. I would like to keep the stream open for future streams. I have tried using process.stdout.readline(), instead, but the program hangs.
To emulate the /script1.pl < input_file | /script2.pl > output_file shell command using the subprocess module in Python:
#!/usr/bin/env python
from subprocess import check_call

with open('input_file', 'rb') as input_file:
    with open('output_file', 'wb') as output_file:
        check_call("/script1.pl | /script2.pl", shell=True,
                   stdin=input_file, stdout=output_file)
You could write it without shell=True (though I don't see a reason to here), based on the 17.1.4.2. Replacing shell pipeline example from the docs:
#!/usr/bin/env python
from subprocess import Popen, PIPE

with open('input_file', 'rb') as input_file:
    script1 = Popen("/script1.pl", stdin=input_file, stdout=PIPE)
with open("output_file", "wb") as output_file:
    script2 = Popen("/script2.pl", stdin=script1.stdout, stdout=output_file)
script1.stdout.close()  # allow script1 to receive SIGPIPE if script2 exits
script2.wait()
script1.wait()
You could also use plumbum module to get shell-like syntax in Python:
#!/usr/bin/env python
from plumbum import local
script1, script2 = local["/script1.pl"], local["/script2.pl"]
(script1 < "input_file" | script2 > "output_file")()
See also How do I use subprocess.Popen to connect multiple processes by pipes?
If you want to read/write line by line then the answer depends on the concrete scripts that you want to run. In general it is easy to deadlock sending/receiving input/output if you are not careful e.g., due to buffering issues.
If input doesn't depend on output in your case then a reliable cross-platform approach is to use a separate thread for each stream:
#!/usr/bin/env python
from subprocess import Popen, PIPE
from threading import Thread

def pump_input(pipe):
    try:
        for i in xrange(1000000000):  # generate large input
            print >>pipe, i
    finally:
        pipe.close()

p = Popen("/script1.pl | /script2.pl", shell=True, stdin=PIPE, stdout=PIPE,
          bufsize=1)
Thread(target=pump_input, args=[p.stdin]).start()
try:  # read output line by line as soon as the child flushes its stdout buffer
    for line in iter(p.stdout.readline, b''):
        print line.strip()[::-1]  # print reversed lines
finally:
    p.stdout.close()
    p.wait()
I have the following script:
#!/usr/bin/python
while True:
    x = raw_input()
    print x[::-1]
I am calling it from IPython:
In [5]: p = Popen('./script.py', stdin=PIPE)
In [6]: p.stdin.write('abc\n')
cba
and it works fine.
However, when I do this:
In [7]: p = Popen('./script.py', stdin=PIPE, stdout=PIPE)
In [8]: p.stdin.write('abc\n')
In [9]: p.stdout.read()
the interpreter hangs. What am I doing wrong? I would like to be able to both write and read from another process multiple times, to pass some tasks to this process. What do I need to do differently?
EDIT 1
If I use communicate, I get this:
In [7]: p = Popen('./script.py', stdin=PIPE, stdout=PIPE)
In [8]: p.communicate('abc\n')
Traceback (most recent call last):
File "./script.py", line 4, in <module>
x = raw_input()
EOFError: EOF when reading a line
Out[8]: ('cba\n', None)
EDIT 2
I tried flushing:
#!/usr/bin/python
import sys

while True:
    x = raw_input()
    print x[::-1]
    sys.stdout.flush()
and here:
In [5]: from subprocess import PIPE, Popen
In [6]: p = Popen('./script.py', stdin=PIPE, stdout=PIPE)
In [7]: p.stdin.write('abc')
In [8]: p.stdin.flush()
In [9]: p.stdout.read()
but it hangs again.
I believe there are two problems at work here:
1) Your parent script calls p.stdout.read(), which will read all data until end-of-file. However, your child script runs in an infinite loop, so end-of-file will never happen. You probably want p.stdout.readline() instead.
2) In interactive mode, most programs buffer only one line at a time. When run from another program, they buffer much more. Buffering improves efficiency in many cases, but it causes problems when two programs need to communicate interactively.
After p.stdin.write('abc\n') add:
p.stdin.flush()
In your subprocess script, after print x[::-1] add the following within the loop:
sys.stdout.flush()
(and import sys at the top)
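Putting both changes together, a minimal round-trip sketch (assuming the flushing script.py above):

from subprocess import Popen, PIPE

p = Popen('./script.py', stdin=PIPE, stdout=PIPE)
p.stdin.write('abc\n')
p.stdin.flush()             # push the line through the OS pipe buffer
print p.stdout.readline(),  # read one line; read() would block waiting for EOF
p.stdin.close()
p.wait()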
The subprocess method check_output can be useful for this:
output = subprocess.check_output('./script.py')
And output will be the stdout from the process. If you need stderr, too:
output = subprocess.check_output('./script.py', stderr=subprocess.STDOUT)
Because you avoid managing pipes directly, it may circumvent your issue.
If you'd like to pass several lines to script.py then you need to read/write simultaneously:
#!/usr/bin/env python
import sys
from subprocess import PIPE, Popen
from threading import Thread

def print_output(out, ntrim=80):
    for line in out:
        print len(line)
        if len(line) > ntrim:  # truncate long output
            line = line[:ntrim-2] + '..'
        print line.rstrip()

if __name__ == "__main__":
    p = Popen(['python', 'script.py'], stdin=PIPE, stdout=PIPE)
    Thread(target=print_output, args=(p.stdout,)).start()
    for s in ['abc', 'def', 'ab'*10**7, 'ghi']:
        print >>p.stdin, s
    p.stdin.close()
    sys.exit(p.wait())  # NOTE: read http://docs.python.org/library/subprocess.html#subprocess.Popen.wait
Output:
4
cba
4
fed
20000001
bababababababababababababababababababababababababababababababababababababababa..
4
ihg
Where script.py is:
#!/usr/bin/env python
"""Print reverse lines."""
while True:
    try:
        x = raw_input()
    except EOFError:
        break  # no more input
    else:
        print x[::-1]
Or
#!/usr/bin/env python
"""Print reverse lines."""
import sys

for line in sys.stdin:
    print line.rstrip()[::-1]
Or
#!/usr/bin/env python
"""Print reverse lines."""
import fileinput

for line in fileinput.input():  # accept files specified as command line arguments
    print line.rstrip()[::-1]
You're probably tripping over Python's output buffering. Here's what python --help has to say about it.
-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
see man page for details on internal buffering relating to '-u'
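For example, a minimal sketch (my assumption, reusing the script.py from the question):

from subprocess import Popen, PIPE

# -u makes the child's stdout unbuffered, so no explicit
# sys.stdout.flush() is needed inside script.py.
p = Popen(['python', '-u', 'script.py'], stdin=PIPE, stdout=PIPE)
p.stdin.write('abc\n')
p.stdin.flush()
print p.stdout.readline(),  # 'cba'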
When you are through writing to p.stdin, close it: p.stdin.close()
Use communicate() instead of .stdout.read().
Example:
from subprocess import Popen, PIPE
p = Popen('./script.py', stdin=PIPE, stdout=PIPE, stderr=PIPE)
input = 'abc\n'
stdout, stderr = p.communicate(input)
This recommendation comes from the Popen objects section in the subprocess documentation:
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read
to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the
child process.
I want to subprocess.Popen() rsync.exe in Windows, and print the stdout in Python.
My code works, but it doesn't catch the progress until a file transfer is done! I want to print the progress for each file in real time.
Using Python 3.1 now since I heard it should be better at handling IO.
import subprocess, time, os, sys

cmd = "rsync.exe -vaz -P source/ dest/"
p, line = True, 'start'

p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=64,
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()
Some rules of thumb for subprocess:
Never use shell=True. It needlessly invokes an extra shell process to call your program.
When calling processes, arguments are passed around as lists. sys.argv in Python is a list, and so is argv in C. So you pass a list to Popen to call subprocesses, not a string.
Don't redirect stderr to a PIPE when you're not reading it.
Don't redirect stdin when you're not writing to it.
Example:
import subprocess, time, os, sys

cmd = ["rsync.exe", "-vaz", "-P", "source/", "dest/"]

p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

for line in iter(p.stdout.readline, b''):
    print(">>> " + line.rstrip().decode())  # decode the bytes for printing on Python 3
That said, it is probable that rsync buffers its output when it detects that it is connected to a pipe instead of a terminal. This is the default behavior: when connected to a pipe, programs must explicitly flush stdout for real-time results, otherwise the standard C library will buffer.
To test for that, try running this instead:
cmd = [sys.executable, 'test_out.py']
and create a test_out.py file with the contents:
import sys
import time

print("Hello")
sys.stdout.flush()
time.sleep(10)
print("World")
Executing that subprocess should give you "Hello" and wait 10 seconds before giving "World". If that happens with the python code above and not with rsync, that means rsync itself is buffering output, so you are out of luck.
A solution would be to connect directly to a pty, using something like pexpect.
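For illustration, a minimal sketch of that pty approach (my assumption of how it could look; POSIX only, so it won't help the rsync.exe setup above, but it shows the idea; the encoding parameter assumes pexpect 4):

import pexpect  # $ pip install pexpect

# Under a pty, rsync believes it is writing to a terminal and flushes
# its progress output line by line instead of block-buffering it.
child = pexpect.spawn('rsync -vaz -P source/ dest/', encoding='utf-8')
while True:
    line = child.readline()
    if not line:  # EOF: rsync has exited
        break
    print(">>> " + line.rstrip())
child.close()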
I know this is an old topic, but there is a solution now: call rsync with the option --outbuf=L. Example:

import subprocess

cmd = ['rsync', '-arzv', '--backup', '--outbuf=L', 'source/', 'dest']
p = subprocess.Popen(cmd,
                     stdout=subprocess.PIPE)
for line in iter(p.stdout.readline, b''):
    print '>>> {}'.format(line.rstrip())
Depending on the use case, you might also want to disable the buffering in the subprocess itself.
If the subprocess will be a Python process, you could do this before the call:
os.environ["PYTHONUNBUFFERED"] = "1"
Or alternatively pass this in the env argument to Popen.
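For example, a minimal sketch of the env variant (child.py is a placeholder name):

import os
import subprocess

# Copy the current environment and set PYTHONUNBUFFERED for the child only.
env = dict(os.environ, PYTHONUNBUFFERED='1')
p = subprocess.Popen(['python', 'child.py'], stdout=subprocess.PIPE, env=env)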
Otherwise, if you are on Linux/Unix, you can use the stdbuf tool, e.g.:
cmd = ["stdbuf", "-oL"] + cmd
See also here about stdbuf or other options.
On Linux, I had the same problem of getting rid of the buffering. I finally used stdbuf -o0 (or unbuffer from expect) to get rid of the PIPE buffering.
proc = Popen(['stdbuf', '-o0'] + cmd, stdout=PIPE, stderr=PIPE)
stdout = proc.stdout
I could then use select.select on stdout.
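For illustration, a minimal sketch of that select loop (assuming proc and stdout from above; POSIX only, and the 0.1 s timeout is an arbitrary choice):

import os
import select
import sys

while proc.poll() is None:
    ready, _, _ = select.select([stdout], [], [], 0.1)
    if ready:
        # read whatever is available without blocking on a full line
        data = os.read(stdout.fileno(), 4096)
        if data:
            sys.stdout.write(data)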
See also https://unix.stackexchange.com/questions/25372/
for line in p.stdout:
    ...
always blocks until the next line-feed.
For "real-time" behaviour you have to do something like this:
while True:
    inchar = p.stdout.read(1)
    if inchar:  # neither empty string nor None
        print(str(inchar), end='')  # or end=None to flush immediately
    else:
        print('')  # flush for implicit line-buffering
        break
The while-loop is left when the child process closes its stdout or exits.
read()/read(-1) would block until the child process closed its stdout or exited.
Your problem is:
for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()

The iterator itself has extra buffering. Try this instead:
while True:
    line = p.stdout.readline()
    if not line:
        break
    print line
You cannot get stdout to print unbuffered to a pipe (unless you can rewrite the program that prints to stdout), so here is my solution:
Redirect stdout to stderr, which is not buffered. '<cmd> 1>&2' should do it. Open the process as follows: myproc = subprocess.Popen('<cmd> 1>&2', shell=True, stderr=subprocess.PIPE) (shell=True is needed so the shell interprets the redirection).
You cannot distinguish stdout from stderr, but you get all output immediately.
Hope this helps anyone tackling this problem.
To avoid buffering of output you might want to try pexpect:

import pexpect

child = pexpect.spawn(launchcmd, args, timeout=None)
while True:
    try:
        child.expect('\n')
        print(child.before)
    except pexpect.EOF:
        break
PS: I know this question is pretty old, but I'm still providing the solution that worked for me.
PPS: I got this answer from another question.
p = subprocess.Popen(command,
                     bufsize=0,
                     universal_newlines=True)
I am writing a GUI for rsync in Python and had the same problems. This problem troubled me for several days until I found this in the docs:
If universal_newlines is True, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may be terminated by any of '\n', the Unix end-of-line convention, '\r', the old Macintosh convention or '\r\n', the Windows convention. All of these external representations are seen as '\n' by the Python program.
It seems that rsync outputs '\r' while a transfer is in progress.
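For illustration, a minimal sketch of reading those progress updates line by line (the rsync arguments are placeholders; with universal newlines each '\r' update arrives as its own line):

import subprocess

command = ['rsync', '-vaz', '-P', 'source/', 'dest/']  # placeholder invocation
p = subprocess.Popen(command, bufsize=0, stdout=subprocess.PIPE,
                     universal_newlines=True)
for line in iter(p.stdout.readline, ''):
    print(line.rstrip())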
If you run something like this in a thread and save the ffmpeg_time value in an attribute so you can access it, it works very nicely (the same holds if you use threading in Tkinter):
import re
import subprocess

input = 'path/input_file.mp4'
output = 'path/output_file.mp4'
command = "ffmpeg -y -v quiet -stats -i \"" + str(input) + "\" -metadata title=\"#alaa_sanatisharif\" -preset ultrafast -vcodec copy -r 50 -vsync 1 -async 1 \"" + output + "\""
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                           universal_newlines=True, shell=True)
for line in process.stdout:
    reg = re.search('\d\d:\d\d:\d\d', line)
    ffmpeg_time = reg.group(0) if reg else ''
    print(ffmpeg_time)
Change the stdout from the rsync process to be unbuffered.
p = subprocess.Popen(cmd,
                     shell=True,
                     bufsize=0,  # 0=unbuffered, 1=line-buffered, else buffer-size
                     stdin=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)
I've noticed that there is no mention of using a temporary file as an intermediate. The following gets around the buffering issues by outputting to a temporary file, and allows you to parse the data coming from rsync without connecting to a pty. I tested the following on a Linux box; the output of rsync tends to differ across platforms, so the regular expressions to parse the output may vary:
import subprocess, time, tempfile, re

# mkstemp() returns an OS-level file descriptor plus the file name
pipe_output, file_name = tempfile.mkstemp()
cmd = ["rsync", "-vaz", "-P", "/src/", "/dest"]
p = subprocess.Popen(cmd, stdout=pipe_output,
                     stderr=subprocess.STDOUT)
while p.poll() is None:
    # p.poll() returns None while the program is still running
    # sleep for 1 second
    time.sleep(1)
    last_line = open(file_name).readlines()
    # it's possible that it hasn't output yet, so continue
    if len(last_line) == 0:
        continue
    last_line = last_line[-1]
    # Matching to "[bytes downloaded] number% [speed] number:number:number"
    match_it = re.match(".* ([0-9]*)%.* ([0-9]*:[0-9]*:[0-9]*).*", last_line)
    if not match_it:
        continue
    # in this case, the percentage is stored in match_it.group(1),
    # time in match_it.group(2). We could do something with it here...
In Python 3, here's a solution that takes a command from the command line and delivers nicely decoded strings in real time as they are received.
Receiver (receiver.py):
import subprocess
import sys

cmd = sys.argv[1:]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for line in p.stdout:
    print("received: {}".format(line.rstrip().decode("utf-8")))
A simple example program that generates real-time output (dummy_out.py):
import time
import sys

for i in range(5):
    print("hello {}".format(i))
    sys.stdout.flush()
    time.sleep(1)
Output:
$ python receiver.py python dummy_out.py
received: hello 0
received: hello 1
received: hello 2
received: hello 3
received: hello 4