I am using subprocess.run() for some automated testing, mostly to automate:
dummy.exe < file.txt > foo.txt
diff file.txt foo.txt
If you execute the above redirection in a shell, the two files are always identical. But whenever file.txt is too long, the Python code below does not return the correct result.
This is the Python code:
import subprocess
import sys

def main(argv):
    exe_path = r'dummy.exe'
    file_path = r'file.txt'
    with open(file_path, 'r') as test_file:
        stdin = test_file.read().strip()
    p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE, universal_newlines=True)
    out = p.stdout.strip()
    err = p.stderr
    if stdin == out:
        print('OK')
    else:
        print('failed: ' + out)

if __name__ == "__main__":
    main(sys.argv[1:])
Here is the C++ code in dummy.cc:
#include <iostream>

int main()
{
    int size, count, a, b;
    std::cin >> size;
    std::cin >> count;
    std::cout << size << " " << count << std::endl;
    for (int i = 0; i < count; ++i)
    {
        std::cin >> a >> b;
        std::cout << a << " " << b << std::endl;
    }
}
file.txt can be anything like this:
1 100000
0 417
0 842
0 919
...
The second integer on the first line is the number of lines that follow, so here file.txt will be 100,001 lines long.
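For reference, a small script to generate such a file for testing (my own sketch; the pair values are arbitrary):

import random

# Generate a file in the format described above: a header line "size count",
# then `count` lines of integer pairs.
count = 100000
with open('file.txt', 'w') as f:
    f.write('1 {}\n'.format(count))
    for _ in range(count):
        f.write('0 {}\n'.format(random.randint(0, 1000)))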
Question: Am I misusing subprocess.run()?
Edit
Here is my exact Python code after the comments (newlines, rb) were taken into account:
import subprocess
import sys
import os

def main(argv):
    base_dir = os.path.dirname(__file__)
    exe_path = os.path.join(base_dir, 'dummy.exe')
    file_path = os.path.join(base_dir, 'infile.txt')
    out_path = os.path.join(base_dir, 'outfile.txt')
    with open(file_path, 'rb') as test_file:
        stdin = test_file.read().strip()
    p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE)
    out = p.stdout.strip()
    if stdin == out:
        print('OK')
    else:
        with open(out_path, "wb") as text_file:
            text_file.write(out)

if __name__ == "__main__":
    main(sys.argv[1:])
Here is the first diff (image omitted).
Here is the input file: https://drive.google.com/open?id=0B--mU_EsNUGTR3VKaktvQVNtLTQ
To reproduce, the shell command:
subprocess.run("dummy.exe < file.txt > foo.txt", shell=True, check=True)
and the same without the shell, in pure Python:
with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.run(["dummy.exe"], stdin=input_file, stdout=output_file, check=True)
It works with arbitrary large files.
You could use subprocess.check_call() in this case (available since Python 2.5) instead of subprocess.run(), which is available only in Python 3.5+.
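For instance, a Python 2 compatible sketch of the same redirection (same file names as above):

import subprocess

# Redirect via real file handles instead of PIPE; this works for arbitrarily
# large files because no output is held in memory.
with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.check_call(["dummy.exe"], stdin=input_file, stdout=output_file)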
Works very well, thanks. But then why was the original failing? Pipe buffer size, as in Kevin's answer?
It has nothing to do with OS pipe buffers. The warning from the subprocess docs that @Kevin J. Chase cites is unrelated to subprocess.run(). You should care about OS pipe buffers only if you use process = Popen() and manually read()/write() via multiple pipe streams (process.stdin/.stdout/.stderr).
It turns out that the observed behavior is due to a Windows bug in the Universal CRT. Here's the same issue reproduced without Python: Why would redirection work where piping fails?
As said in the bug description, to work around it:
"use a binary pipe and do text mode CRLF => LF translation manually on the reader side" or use ReadFile() directly instead of std::cin
or wait for the Windows 10 update this summer (where the bug should be fixed)
or use a different C++ compiler, e.g., there is no issue if you use g++ on Windows
The bug affects only text pipes, i.e., code that uses < and > redirection should be fine (stdin=input_file, stdout=output_file should still work, or it is some other bug).
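For illustration, a minimal sketch of the first workaround; input_bytes stands in for your test input and is my assumption, not code from the question:

import subprocess

# Binary pipe (no universal_newlines), so the buggy text-mode translation
# in the CRT is never involved; normalize line endings ourselves instead.
p = subprocess.run(["dummy.exe"], input=input_bytes, stdout=subprocess.PIPE)
out = p.stdout.replace(b'\r\n', b'\n')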
I'll start with a disclaimer: I don't have Python 3.5 (so I can't use the run function), and I wasn't able to reproduce your problem on Windows (Python 3.4.4) or Linux (3.1.6). That said...
Problems with subprocess.PIPE and Family
The subprocess.run docs say that it's just a front-end for the old subprocess.Popen-and-communicate() technique. The subprocess.Popen.communicate docs warn that:
The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
This sure sounds like your problem. Unfortunately, the docs don't say how much data is "large", nor what will happen after "too much" data is read. Just "don't do that, then".
The docs for subprocess.call go into a little more detail (emphasis mine)...
Do not use stdout=PIPE or stderr=PIPE with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer as the pipes are not being read from.
...as do the docs for subprocess.Popen.wait:
This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
That sure sounds like Popen.communicate is the solution to this problem, but communicate's own docs say "do not use this method if the data size is large" --- exactly the situation where the wait docs tell you to use communicate. (Maybe it "avoid(s) that" by silently dropping data on the floor?)
Frustratingly, I don't see any way to use subprocess.PIPE safely, unless you're sure you can read from it faster than your child process writes to it.
On that note...
Alternative: tempfile.TemporaryFile
You're holding all your data in memory... twice, in fact. That can't be efficient, especially if it's already in a file.
If you're allowed to use a temporary file, you can compare the two files very easily, one line at a time. This avoids all the subprocess.PIPE mess, and it's much faster, because it only uses a little bit of RAM at a time. (The IO from your subprocess might be faster, too, depending on how your operating system handles output redirection.)
Again, I can't test run, so here's a slightly older Popen-and-communicate solution (minus main and the rest of your setup):
import io
import subprocess
import tempfile

def are_text_files_equal(file0, file1):
    '''
    Both files must be opened in "update" mode ('+' character), so
    they can be rewound to their beginnings. Both files will be read
    until just past the first differing line, or to the end of the
    files if no differences were encountered.
    '''
    file0.seek(0, io.SEEK_SET)
    file1.seek(0, io.SEEK_SET)
    for line0, line1 in zip(file0, file1):
        if line0 != line1:
            return False
    # Both files were identical to this point. See if either file
    # has more data.
    next0 = next(file0, '')
    next1 = next(file1, '')
    if next0 or next1:
        return False
    return True

def compare_subprocess_output(exe_path, input_path):
    with tempfile.TemporaryFile(mode='w+t', encoding='utf8') as temp_file:
        with open(input_path, 'r+t') as input_file:
            p = subprocess.Popen(
                [exe_path],
                stdin=input_file,
                stdout=temp_file,  # No more PIPE.
                stderr=subprocess.PIPE,  # <sigh>
                universal_newlines=True,
            )
            err = p.communicate()[1]  # No need to store output.
            # Compare input and output files... This must be inside
            # the `with` block, or the TemporaryFile will close before
            # we can use it.
            if are_text_files_equal(temp_file, input_file):
                print('OK')
            else:
                print('Failed: ' + str(err))
    return
Unfortunately, since I can't reproduce your problem, even with a million-line input, I can't tell if this works. If nothing else, it ought to give you wrong answers faster.
Variant: Regular File
If you want to keep the output of your test run in foo.txt (from your command-line example), then you would direct your subprocess' output to a normal file instead of a TemporaryFile. This is the solution recommended in J.F. Sebastian's answer.
I can't tell from your question if you wanted foo.txt, or if it was just a side effect of the two-step test-then-diff --- your command-line example saves test output to a file, while your Python script doesn't. Saving the output would be handy if you ever want to investigate a test failure, but it requires coming up with a unique filename for each test you run, so they don't overwrite each other's output.
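For example, a hypothetical helper (name and format are mine, not from the question) that derives a unique output path per test:

import os
import time

def unique_output_path(base_dir, test_name):
    # e.g. dummy_20160513-101502.txt; the timestamp keeps successive runs
    # from overwriting each other's output.
    stamp = time.strftime('%Y%m%d-%H%M%S')
    return os.path.join(base_dir, '{0}_{1}.txt'.format(test_name, stamp))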
Related
I'm working with a piece of scientific software called Chimera. For some of the code downstream of this question, it requires that I use Python 2.7.
I want to call a process, give that process some input, read its output, give it more input based on that, etc.
I've used Popen to open the process, process.stdin.write to pass standard input, but then I've gotten stuck trying to get output while the process is still running. process.communicate() stops the process, process.stdout.readline() seems to keep me in an infinite loop.
Here's a simplified example of what I'd like to do:
Let's say I have a bash script called exampleInput.sh.
#!/bin/bash
# exampleInput.sh

# Read a number from the input
read -p 'Enter a number: ' num

# Multiply the number by 5
ans1=$( expr $num \* 5 )

# Give the user the multiplied number
echo $ans1

# Ask the user whether they want to keep going
read -p 'Based on the previous output, would you like to continue? ' doContinue

if [ $doContinue == "yes" ]
then
    echo "Okay, moving on..."
    # [...] more code here [...]
else
    exit 0
fi
Interacting with this through the command line, I'd run the script, type in "5" and then, if it returned "25", I'd type "yes" and, if not, I would type "no".
I want to run a python script where I pass exampleInput.sh "5" and, if it gives me "25" back, then I pass "yes"
So far, this is as close as I can get:
#!/home/user/miniconda3/bin/python2
# talk_with_example_input.py
import subprocess

process = subprocess.Popen(["./exampleInput.sh"],
                           stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE)
process.stdin.write("5")
answer = process.communicate()[0]
if answer == "25":
    process.stdin.write("yes")
    ## I'd like to print the STDOUT here, but the process is already terminated
But that fails of course, because after `process.communicate()`, my process isn't running anymore.
(Just in case/FYI): Actual problem
Chimera is usually a gui-based application to examine protein structure. If you run chimera --nogui, it'll open up a prompt and take input.
I often need to know what chimera outputs before I run my next command. For example, I will often try to generate a protein surface and, if Chimera can't generate a surface, it doesn't break--it just says so through STDOUT. So, in my python script, while I'm looping through many proteins to analyze, I need to check STDOUT to know whether to continue analysis on that protein.
In other use cases, I'll run lots of commands through Chimera to clean up a protein first, and then I'll want to run lots of separate commands to get different pieces of data, and use that data to decide whether to run other commands. I could get the data, close the subprocess, and then run another process, but that would require re-running all of those cleaning up commands each time.
Anyways, those are some of the real-world reasons why I want to be able to push STDIN to a subprocess, read the STDOUT, and still be able to push more STDIN.
Thanks for your time!
You don't need to use process.communicate() in your example.
Simply read and write using process.stdin.write and process.stdout.read. Also make sure to send a newline, otherwise read won't return. And when you read the output, you also have to handle the newlines coming from echo.
Note: process.stdout.read will block until EOF.
# talk_with_example_input.py
import subprocess

process = subprocess.Popen(["./exampleInput.sh"],
                           stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE)
process.stdin.write("5\n")
stdout = process.stdout.readline()
print(stdout)
if stdout == "25\n":
    process.stdin.write("yes\n")
    print(process.stdout.readline())
$ python2 test.py
25
Okay, moving on...
Update
When communicating with a program in that way, you have to pay special attention to what the application is actually writing. It is best to analyze the output in a hex editor:
$ chimera --nogui 2>&1 | hexdump -C
Please note that readline [1] only reads up to the next newline (\n). In your case you have to call readline at least four times to get that first block of output.
If you just want to read everything up until the subprocess stops printing, you have to read byte by byte and implement a timeout. Sadly, neither read nor readline provides such a timeout mechanism. This is probably because the underlying read syscall [2] (Linux) does not provide one either.
On Linux we can write a single-threaded read_with_timeout() using poll / select. For an example see [3].
from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    while True:
        ret = e.poll(timeout__s)
        if not ret or ret[0][1] != EPOLLIN:
            break
        buf.append(fd.read(1))
    return ''.join(buf)
In case you need a reliable way to read non-blocking under both Windows and Linux, this answer might be helpful.
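For reference, here's a sketch of such a portable approach, in the spirit of the linked answer: a background thread pumps the pipe into a queue, and the reader polls the queue with a timeout (names and the text-mode assumption are mine):

import threading
try:
    from queue import Queue, Empty  # Python 3
except ImportError:
    from Queue import Queue, Empty  # Python 2

def start_pump(stream):
    # The thread blocks on read(1); the main thread polls the queue instead,
    # so it can time out without blocking on the pipe itself.
    # Use b'' as the sentinel below for binary-mode pipes.
    q = Queue()
    def pump():
        for ch in iter(lambda: stream.read(1), ''):
            q.put(ch)
    t = threading.Thread(target=pump)
    t.daemon = True
    t.start()
    return q

def read_with_timeout(q, timeout_s):
    buf = []
    while True:
        try:
            buf.append(q.get(timeout=timeout_s))
        except Empty:
            break
    return ''.join(buf)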
[1] from the python 2 docs:
readline(limit=-1)
Read and return one line from the stream. If limit is specified, at most limit bytes will be read.
The line terminator is always b'\n' for binary files; for text files, the newline argument to open() can be used to select the line terminator(s) recognized.
[2] from man 2 read:
#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
[3] example
$ tree
.
├── prog.py
└── prog.sh
prog.sh
#!/usr/bin/env bash
for i in $(seq 3); do
    echo "${RANDOM}"
    sleep 1
done
sleep 3
echo "${RANDOM}"
prog.py
# talk_with_example_input.py
import subprocess
from select import epoll, EPOLLIN

def read_with_timeout(fd, timeout__s):
    """Reads from fd until there is no new data for at least timeout__s seconds.

    This only works on linux > 2.5.44.
    """
    buf = []
    e = epoll()
    e.register(fd, EPOLLIN)
    while True:
        ret = e.poll(timeout__s)
        if not ret or ret[0][1] != EPOLLIN:
            break
        buf.append(fd.read(1))
    return ''.join(buf)

process = subprocess.Popen(
    ["./prog.sh"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE
)

print(read_with_timeout(process.stdout, 1.5))
print('-----')
print(read_with_timeout(process.stdout, 3))
$ python2 prog.py
6194
14508
11293
-----
10506
I have the following C application
#include <stdio.h>

int main(void)
{
    printf("hello world\n");
    /* Go into an infinite loop here. */
    while (1);
    return 0;
}
And I have the following python code.
import subprocess
import time
import pprint

def run():
    command = ["./myapplication"]
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    try:
        while process.poll() is None:
            # HELP: This call blocks...
            for i in process.stdout.readline():
                print(i)
    finally:
        if process.poll() is None:
            process.kill()

if __name__ == "__main__":
    run()
When I run the python code, the stdout.readline or even stdout.read blocks.
If I run the application using subprocess.call(program) then I can see "hello world" in stdout.
How can I read input from stdout with the example I have provided?
Note: I would not want to modify my C program. I have tried this on both Python 2.7.17 and Python 3.7.5 under Ubuntu 19.10 and I get the same behaviour. Adding bufsize=0 did not help me.
The easiest way is to flush buffers in the C program
...
printf("hello world\n");
fflush(stdout);
while(1);
...
If you don't want to change the C program, you can manipulate the libc buffering behavior from the outside by using stdbuf to call your program (Linux). The syntax is stdbuf -o0 yourapplication for no buffering and stdbuf -oL yourapplication for line buffering. Therefore, in your Python code, use
...
command = ["/usr/bin/stdbuf","-oL","pathtomyapplication"]
process = subprocess.Popen(command, stdout=subprocess.PIPE)
...
or
...
command = ["/usr/bin/stdbuf","-o0","pathtomyapplication"]
process = subprocess.Popen(command, stdout=subprocess.PIPE)
...
Applications built using the C Standard IO Library (built with #include <stdio.h>) buffer input and output (see here for why). The stdio library can tell, via the same mechanism as isatty, that it is writing to a pipe rather than a TTY, so it chooses block buffering instead of line buffering. Data is flushed when the buffer is full, but "hello world\n" does not fill the buffer, so it is never flushed.
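Incidentally, you can observe the check that drives this decision from Python itself (a trivial demonstration, not part of the fix):

import sys

# True when attached to a terminal, False when piped or redirected --
# the same condition stdio uses to pick block buffering over line buffering.
print(sys.stdout.isatty())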
One way around this is shown in Timo Hartmann's answer, using the stdbuf utility. This uses an LD_PRELOAD trick to swap in its own libstdbuf.so. In many cases that is a fine solution, but LD_PRELOAD is kind of a hack and does not work in some cases, so it may not be a general solution.
Maybe you want to do this directly in Python, and there are stdlib options to help here: you can use a pseudo-tty (docs py2, docs py3) connected to stdout instead of a pipe. Seeing a terminal on the other end, myapplication should enable line buffering, meaning that any newline character flushes the buffer.
from __future__ import print_function
from subprocess import Popen, PIPE
import errno
import os
import pty
import sys

mfd, sfd = pty.openpty()

proc = Popen(["/tmp/myapplication"], stdout=sfd)
os.close(sfd)  # don't wait for input

while True:
    try:
        output = os.read(mfd, 1000)
    except OSError as e:
        if e.errno != errno.EIO:
            raise
        break  # EIO means the slave side was closed, i.e. EOF
    else:
        if not output:  # some platforms signal EOF with an empty read instead
            break
        print(output)
Note that we are reading bytes from the output now, so we cannot necessarily decode them right away!
See Processing the output of a subprocess with Python in realtime for a blog post cleaning up this idea. There are also existing third-party libraries to do this stuff, see ptyprocess.
I'm running memcached with the following bash command pattern:
memcached -vv 2>&1 | tee memkeywatch2010098.log 2>&1 | ~/bin/memtracer.py | tee memkeywatchCounts20100908.log
to try and track down unmatched gets to sets for keys platform wide.
The memtracer script is below and works as desired, with one minor issue. Watching the intermediate log file size, memtracer.py doesn't start getting input until memkeywatchYMD.log is about 15-18K in size. Is there a better way to read in stdin, or perhaps a way to cut the buffer size down to under 1K for faster response times?
#!/usr/bin/python
import sys
from collections import defaultdict

if __name__ == "__main__":
    keys = defaultdict(int)
    GET = 1
    SET = 2
    CLIENT = 1
    SERVER = 2
    #if <
    for line in sys.stdin:
        key = None
        components = line.strip().split(" ")
        #newConn = components[0][1:3]
        direction = CLIENT if components[0].startswith("<") else SERVER
        #if lastConn != newConn:
        #    lastConn = newConn
        if direction == CLIENT:
            command = SET if components[1] == "set" else GET
            key = components[2]
            if command == SET:
                keys[key] -= 1
        elif direction == SERVER:
            command = components[1]
            if command == "sending":
                key = components[3]
                keys[key] += 1
        if key is not None:
            print "%s:%s" % (key, keys[key])
You can completely remove buffering from stdin/stdout by using python's -u flag:
-u : unbuffered binary stdout and stderr (also PYTHONUNBUFFERED=x)
see man page for details on internal buffering relating to '-u'
and the man page clarifies:
-u Force stdin, stdout and stderr to be totally unbuffered. On
systems where it matters, also put stdin, stdout and stderr in
binary mode. Note that there is internal buffering in xread-
lines(), readlines() and file-object iterators ("for line in
sys.stdin") which is not influenced by this option. To work
around this, you will want to use "sys.stdin.readline()" inside
a "while 1:" loop.
Beyond this, altering the buffering for an existing file is not supported, but you can make a new file object with the same underlying file descriptor as an existing one, and possibly different buffering, using os.fdopen. I.e.,
import os
import sys
newin = os.fdopen(sys.stdin.fileno(), 'r', 100)
should bind newin to the name of a file object that reads the same FD as standard input, but buffered by only about 100 bytes at a time (and you could continue with sys.stdin = newin to use the new file object as standard input from there onwards). I say "should" because this area used to have a number of bugs and issues on some platforms (it's pretty hard functionality to provide cross-platform with full generality) -- I'm not sure what its state is now, but I'd definitely recommend thorough testing on all platforms of interest to ensure that everything goes smoothly. (-u, removing buffering entirely, should work with fewer problems across all platforms, if that might meet your requirements).
You can simply use sys.stdin.readline() instead of sys.stdin.__iter__():
import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break  # EOF
    sys.stdout.write('> ' + line.upper())
This gives me line-buffered reads using Python 2.7.4 and Python 3.3.1 on Ubuntu 13.04.
Since sys.stdin.__iter__ still buffers, you can get an iterator that behaves mostly identically (it stops at EOF, whereas a bare readline loop has to check for that itself) by using the two-argument form of iter to make an iterator out of sys.stdin.readline:
import sys

for line in iter(sys.stdin.readline, ''):
    sys.stdout.write('> ' + line.upper())
Or provide None as the sentinel (but note that then you need to handle the EOF condition yourself).
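For instance (a sketch): with None as the sentinel, readline()'s empty string at EOF never matches it, so you must detect EOF yourself:

import sys

for line in iter(sys.stdin.readline, None):
    if not line:  # readline() returns '' at EOF; break or we loop forever
        break
    sys.stdout.write('> ' + line.upper())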
This worked for me in Python 3.4.3:
import os
import sys
unbuffered_stdin = os.fdopen(sys.stdin.fileno(), 'rb', buffering=0)
The documentation for fdopen() says it is just an alias for open().
open() has an optional buffering parameter:
buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer.
In other words:
Fully unbuffered stdin requires binary mode and passing zero as the buffer size.
Line-buffering requires text mode.
Any other buffer size seems to work in both binary and text modes (according to the documentation).
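Illustrating both rules with the same trick as above (a sketch; note that line buffering chiefly affects writes):

import os
import sys

# Rule 1: fully unbuffered requires binary mode and buffering=0.
unbuffered_stdin = os.fdopen(sys.stdin.fileno(), 'rb', buffering=0)

# Rule 2: line buffering requires text mode and buffering=1 (it mainly
# matters on the write side, e.g. for stdout).
line_buffered_stdout = os.fdopen(sys.stdout.fileno(), 'w', buffering=1)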
It may be that your troubles are not with Python but with the buffering that the Linux shell injects when chaining commands with pipes. When this is the problem, the input is not buffered by line, but by 4K block.
To stop this buffering, precede the command chain with the unbuffer command from the expect package, such as:
unbuffer memcached -vv 2>&1 | unbuffer -p tee memkeywatch2010098.log 2>&1 | unbuffer -p ~/bin/memtracer.py | tee memkeywatchCounts20100908.log
The unbuffer command needs the -p option when used in the middle of a pipeline.
The only way I could do it with Python 2.7 was:
tty.setcbreak(sys.stdin.fileno())
from Python nonblocking console input. This completely disables the buffering and also suppresses the echo.
EDIT: Regarding Alex's answer, the first proposition (invoking Python with -u) is not possible in my case (see shebang limitation).
The second proposition (duplicating the fd with a smaller buffer: os.fdopen(sys.stdin.fileno(), 'r', 100)) does not work when I use a buffer of 0 or 1, because it is for interactive input and I need every character pressed to be processed immediately.
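Here is a slightly fuller sketch of the cbreak approach, saving and restoring the terminal settings so the shell is left in a sane state (assumes stdin is a real TTY):

import sys
import termios
import tty

fd = sys.stdin.fileno()
old_settings = termios.tcgetattr(fd)
try:
    tty.setcbreak(fd)       # disables line buffering and echo for input
    ch = sys.stdin.read(1)  # returns as soon as a single key is pressed
finally:
    termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)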
I'm just learning Python but have about 16 years of experience with Perl and PHP.
I'm trying to get the output of ngrep and write it to a log file using Python while also tailing the log file. I've seen some examples online, but some seem old and outdated, and others use shell=True, which is discouraged.
In Perl I just use something similar to the following:
#!/usr/bin/perl
open(NGFH, "ngrep -iW byline $filter |");
while ($line = <NGFH>) {
    open(LOG, ">> /path/to/file.log");
    # highlighting, filtering, other sub routine calls
    print LOG $line;
}
I've gotten tail to work, but ngrep doesn't. I'd like to be able to run this indefinitely and write the stream from ngrep to the log file after filtering. I couldn't get the output from ngrep to show in stdout, so that's as far as I've gotten. I was expecting to be able to see the data file tail as the log file was updated and to see the output from ngrep. For now I was just using bash to run the following:
echo "." >> /path/to/ngrep.log
Thanks!
Here's what I got so far...
Updated
This seems to work now. I wouldn't know how to improve on it though.
import subprocess
import select
import re

log = open('/path/to/ngrep.log', 'a+', 0)
print log.name

n = subprocess.Popen(['ngrep', '-iW', 'byline'],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
p = select.poll()
p.register(n.stdout)

f = subprocess.Popen(['tail', '-F', '-n', '0', '/path/to/tailme.log'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p2 = select.poll()
p2.register(f.stdout)

def srtrepl(match):
    if match.group(0) == 'x.x.x.x':
        pass  # do something
    if match.group(0) == 'x.x.y.y':
        pass  # do something else
    return '\033[92m' + match.group(0) + '\033[0m'

# Compile once, outside the loop; dots escaped to match literal dots.
s = re.compile(r'(8\.8\.(4\.4|8\.8)|192\.168\.[0-9]{1,3}\.[0-9]{1,3})')

while True:
    if p.poll(1):
        line = n.stdout.readline()
        print s.sub(srtrepl, line)
        log.write(line)  # log the same line we just printed
    if p2.poll(1):
        print f.stdout.readline().rstrip('\n')
To emulate your Perl code in Python:
#!/usr/bin/env python3
from subprocess import Popen, PIPE
with Popen("ngrep -iW byline".split() + [filter_], stdout=PIPE) as process, \
     open('/path/to/file.log', 'ab') as log_file:
    for line in process.stdout:  # read b'\n'-separated lines
        # highlighting, filtering, other function calls
        log_file.write(line)
It starts the ngrep process, passing the filter_ variable, and appends the output to the log file while allowing you to modify it in Python. See Python: read streaming input from subprocess.communicate(). There could be buffering issues: check whether ngrep supports a --line-buffered option like grep does. And if you want to tail file.log, pass buffering=1 to open() to enable line buffering (only usable in text mode), or call log_file.flush() after log_file.write(line).
You could emulate ngrep in pure Python too.
If you want to read output from several processes concurrently (ngrep and tail in your case), then you need to be able to read the pipes without blocking, e.g., using threads or asyncio.
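For instance, a minimal thread-per-pipe sketch for your two processes (the paths and the bare print handler are placeholders of mine):

import threading
from subprocess import Popen, PIPE

def pump(stream, handle):
    # One reader thread per pipe, so a quiet ngrep can't stall the tail.
    for line in iter(stream.readline, b''):
        handle(line)
    stream.close()

ngrep = Popen(['ngrep', '-iW', 'byline'], stdout=PIPE)
tail = Popen(['tail', '-F', '-n', '0', '/path/to/tailme.log'], stdout=PIPE)

threads = [threading.Thread(target=pump, args=(p.stdout, print))
           for p in (ngrep, tail)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()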
Is there a way in Python to do the equivalent of the UNIX command line tee? I'm doing a typical fork/exec pattern, and I'd like the stdout from the child to appear in both a log file and on the stdout of the parent simultaneously without requiring any buffering.
In this python code for instance, the stdout of the child ends up in the log file, but not in the stdout of the parent.
pid = os.fork()
logFile = open(path, "w")
if pid == 0:
    os.dup2(logFile.fileno(), 1)
    os.execv(cmd, args)  # note: os.execv also needs the argument list
edit: I do not wish to use the subprocess module. I'm doing some complicated stuff with the child process that requires me call fork manually.
Here is a working solution that does not use the subprocess module. You could, however, still use subprocess for the tee process while keeping the exec* function suite for your custom subprocess (just use stdin=subprocess.PIPE and then duplicate the descriptor to your stdout).
import os, time, sys

pr, pw = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(pw)
    os.dup2(pr, sys.stdin.fileno())
    os.close(pr)
    os.execv('/usr/bin/tee', ['tee', 'log.txt'])
else:
    os.close(pr)
    os.dup2(pw, sys.stdout.fileno())
    os.close(pw)
    pid2 = os.fork()
    if pid2 == 0:
        # Replace with your custom process call
        os.execv('/usr/bin/yes', ['yes'])
    else:
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            pass
Note that the tee command, internally, does the same thing as Ben suggested in his answer: reading input and looping over the output file descriptors while writing to them. It may be more efficient because of the optimized implementation and because it's written in C, but you have the overhead of the extra pipes (I don't know for sure which solution is more efficient, but in my opinion reassigning a custom file-like object to stdout is the more elegant solution).
Some more resources:
How do I duplicate sys.stdout to a log file in python?
http://www.shallowsky.com/blog/programming/python-tee.html
In the following, SOMEPATH is the path to the child executable, in a format suitable for subprocess.Popen (see its docs).
import sys, subprocess

f = open('logfile.txt', 'w')
proc = subprocess.Popen(SOMEPATH, stdout=subprocess.PIPE)
while True:
    out = proc.stdout.read(1)
    if out == '' and proc.poll() is not None:
        break
    if out != '':
        # CR workaround since chars are read one by one, and Windows interprets
        # both CR and LF as end of lines. Linux only has LF
        if out != '\r':
            f.write(out)
        sys.stdout.write(out)
        sys.stdout.flush()
Would an approach like this do what you want?
import sys

class Log(object):
    def __init__(self, filename, mode, buffering):
        self.filename = filename
        self.mode = mode
        self.handle = open(filename, mode, buffering)

    def write(self, thing):
        self.handle.write(thing)
        sys.stdout.write(thing)
You'd probably need to implement more of the file interface for this to be really useful (and I've left out properly defaulting mode and buffering, if you want it). You could then do all your writes in the child process to an instance of Log. Or, if you wanted to be really magic, and you're sure you implement enough of the file interface that things won't fall over and die, you could potentially assign sys.stdout to be an instance of this class. Then I think any means of writing to stdout, including print, will go via the log class.
Edit to add: Obviously if you assign to sys.stdout you will have to do something else in the write method to echo the output to stdout!! I think you could use sys.__stdout__ for that.
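A sketch of that last variant (my naming; flush() included because print may call it):

import sys

class TeeStdout(object):
    # Hypothetical variant of the Log class above that replaces sys.stdout
    # and echoes through the saved sys.__stdout__.
    def __init__(self, filename):
        self.handle = open(filename, 'w')

    def write(self, thing):
        self.handle.write(thing)
        sys.__stdout__.write(thing)

    def flush(self):
        self.handle.flush()
        sys.__stdout__.flush()

sys.stdout = TeeStdout('child.log')
print('goes to both child.log and the real stdout')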
Oh, you. I had a decent answer all prettied-up before I saw the last line of your example: execv(). Well, poop. The original idea was replacing each child process' stdout with an instance of this blog post's tee class, and split the stream into the original stdout, and the log file:
http://www.shallowsky.com/blog/programming/python-tee.html
But, since you're using execv(), the child process' tee instance would just get clobbered, so that won't work.
Unfortunately for you, there is no "out of the box" solution to your problem that I can find. The closest thing would be to spawn the actual tee program in a subprocess; if you wanted to be more cross-platform, you could fork a simple Python substitute.
First thing to know when coding a tee substitute: tee really is a simple program. In all the true C implementations I've seen, it's not much more complicated than this:
while ((character = read()) != EOF) {
    /* Write to all of the output streams in here, then write to stdout. */
}
Unfortunately, you can't just join two streams together. That would be really useful (so that the input of one stream would automatically be forwarded out of another), but we've no such luxury without coding it ourselves. So, Eli and I are going to have very similar answers. The difference is that, in my answer, the Python 'tee' is going to run in a separate process, via a pipe; that way, the parent thread is still useful!
(Remember: copy the blog post's tee class, too.)
import os, sys

# Open it for writing in binary mode.
logFile = open("bar", "wb")

# Verbose names, but I wanted to get the point across.
# These are file descriptors, i.e. integers.
parentSideOfPipe, childSideOfPipe = os.pipe()

# 'Tee' subprocess.
pid = os.fork()
if pid == 0:
    while True:
        char = os.read(parentSideOfPipe, 1)
        logFile.write(char)
        os.write(1, char)

# Actual command
pid = os.fork()
if pid == 0:
    os.dup2(childSideOfPipe, 1)
    os.execv(cmd, args)  # as in your example; os.execv needs argv too
I'm sorry if that's not what you wanted, but it's the best solution I can find.
Good luck with the rest of your project!
The first obvious answer is to fork an actual tee process, but that is probably not ideal.
The tee code (from coreutils) merely reads each line and writes it to each output file in turn (effectively buffering one line at a time).
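For comparison, a minimal pure-Python rendition of that loop (a sketch; the real tee reads blocks rather than lines):

import sys

# Copy stdin to stdout and to every file named on the command line,
# flushing after each line to avoid adding buffering of our own.
files = [open(name, 'w') for name in sys.argv[1:]]
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
    for f in files:
        f.write(line)
        f.flush()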