I'm writing a wrapper class for use with a workflow manager. I would like to log output from an application (child process executed via subprocess.Popen) in a certain way:
stdout of the child should go to a log file and to stdout of the parent,
stderr of the child should go to a different logfile, but also to stdout of the parent.
I.e. all output from the child should end up merged on stdout (like with subprocess.Popen(..., stderr=subprocess.STDOUT)), so I can reserve stderr for log messages from the wrapper itself. On the other hand, the child's streams should go to different files to allow separate validation.
I've tried using a "Tee" helper class to tie two streams (stdout and the log file) together, so that Tee.write writes to both streams. However, this cannot be passed to Popen because "subprocess" uses OS-level functions for writing (see here: http://bugs.python.org/issue1631).
The problem with my current solution (code snippet below, adapted mostly from here) is that output on stdout may not appear in the right order.
How can I overcome this? Or should I use an altogether different approach?
(If I stick with the code below, how do I choose a value for the number of bytes in os.read?)
import subprocess, select, sys, os

call = ... # set this
process = subprocess.Popen(call, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
logs = {process.stdout: open("out.log", "w"), process.stderr: open("err.log", "w")}
done = {process.stdout: False, process.stderr: False}
while (process.poll() is None) or (not all(done.values())):
    ready = select.select([process.stdout, process.stderr], [], [])[0]
    for stream in ready:
        data = os.read(stream.fileno(), 1)
        if data:
            sys.stdout.write(data)
            logs[stream].write(data)
        else:
            done[stream] = True
logs[process.stdout].close()
logs[process.stderr].close()
By the way, this solution using "fcntl" did not work for me. And I couldn't quite figure out how to adapt this other solution to my case yet, so I haven't tried it.
If you set shell=True, you can pass a command string to subprocess that includes pipes, redirections, and the tee command.
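For example, with bash's process substitution (the assumptions here: bash and the tee utility are available, and some_command is a hypothetical stand-in for the real call), each stream can be duplicated to its own log file while both copies land on the parent's stdout. Note that ordering between the two streams is still not guaranteed. A minimal sketch:

import subprocess

cmd = "some_command"  # hypothetical child command
process = subprocess.Popen(
    "%s > >(tee out.log) 2> >(tee err.log)" % cmd,
    shell=True,
    executable="/bin/bash",  # process substitution requires bash
)
process.wait()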
What I'd like to do is to, in Python, programmatically send a few initial commands via stdin to a process, and then pass input to the user to let them control the program afterward. The Python program should simply wait until the subprocess exits due to user input. In essence, what I want to do is something along the lines of:
import subprocess
import sys

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE)
# Send initial commands.
p.stdin.write(b"three\ninitial\ncommands\n")
p.stdin.flush()
# Give over control to the user.
# ...Although stdin can't simply be reassigned
# in post like this, it seems.
p.stdin = sys.stdin
# Wait for the subprocess to finish.
p.wait()
How can I pass stdin back to the user (not using raw_input, since I need the user's input to come into effect every keypress and not just after pressing enter)?
Unfortunately, there is no standard way to splice your own stdin to some other process's stdin for the duration of that process. Once you have chosen to write to that process's stdin at all, the only option is to read from your own stdin and forward the data yourself.
That is, you can do this:
proc = subprocess.Popen(...) # no stdin=
and the process will inherit your stdin; or you can do this:
proc = subprocess.Popen(..., stdin=subprocess.PIPE, ...)
and then you supply the stdin to that process. But once you have chosen to supply any of its stdin, you supply all of its stdin, even if that means you have to read your own stdin.
Linux offers a splice system call (documented at man7.org and linux.die.net; see also Wikipedia and the question "linux pipe data from file descriptor into a fifo"), but your best bet is probably a background thread to copy the data, as sketched below.
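A minimal sketch of that background-thread approach, under these assumptions: a daemon thread copies the parent's stdin to the child's stdin byte by byte once the initial commands have been sent, and pump_stdin is a name invented here. For true per-keypress forwarding you would additionally have to put the terminal into cbreak/raw mode (e.g. with tty.setcbreak); otherwise the tty delivers input a line at a time.

import os
import subprocess
import sys
import threading

def pump_stdin(proc):
    # Copy our stdin to the child's stdin until EOF.
    while True:
        data = os.read(sys.stdin.fileno(), 1)  # one byte at a time
        if not data:
            break
        proc.stdin.write(data)
        proc.stdin.flush()
    proc.stdin.close()

p = subprocess.Popen(['cat'], stdin=subprocess.PIPE)
p.stdin.write(b"three\ninitial\ncommands\n")
p.stdin.flush()

t = threading.Thread(target=pump_stdin, args=(p,))
t.daemon = True
t.start()
p.wait()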
I was searching for this same thing, and at least in my case the pexpect library takes care of it:
https://pexpect.readthedocs.io/en/stable/
import pexpect

p = pexpect.spawn("ssh myhost")
p.sendline("some_line")
p.interact()
As its name suggests, you can automate a lot of the interaction before handing it over to the user.
Note that in your case you may want an output filter; see:
Using expect() and interact() simultaneously in pexpect
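A minimal sketch of what that can look like, using interact()'s output_filter parameter (the filter shown just passes data through unchanged):

import pexpect

def output_filter(data):
    # Inspect or transform each chunk of child output here
    # before it reaches the user's terminal.
    return data

p = pexpect.spawn("ssh myhost")
p.sendline("some_line")
p.interact(output_filter=output_filter)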
Here is my demo code. It contains two scripts.
The first is main.py; it calls print_line.py with the subprocess module.
The second is print_line.py; it prints something to stdout.
main.py
import subprocess

p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process

if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()
Previously, print_line.py was written with the threading and Queue modules, and everything was fine. But after changing to the multiprocessing module, main.py cannot get any output from print_line.py. I tried using Popen.communicate() to get the output and setting preexec_fn=os.setsid in Popen(). Neither of them works.
So, here is my question:
Why can't subprocess get the output when print_line.py uses multiprocessing? Why is it OK with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed to my console. Why? How does this happen?
Is there any chance of getting the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason lies somewhere in the deep, murky waters of buffered I/O. It seems that the output of a subprocess of a subprocess can get lost if not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
import sys  # needed for sys.stdout.flush()

def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to pass the -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
The -u flag forces stdin and stdout to be completely unbuffered in print_line.py, and the children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory of why this happens, it definitely has something to do with unflushed stdout being lost when a subprocess terminates, but I am not an expert in this.
It's not a multiprocessing issue, but it is a subprocess issue. Or, more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device", as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.1
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
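For illustration, here is the Python 3 keyword form of that argument (Python 2's open() takes it as the third positional argument):

f_unbuffered = open('raw.bin', 'wb', buffering=0)   # unbuffered; binary mode only
f_line = open('lines.txt', 'w', buffering=1)        # line buffered
f_full = open('bulk.txt', 'w', buffering=1048576)   # fully buffered, 1 MB buffer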
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
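A hedged sketch of that environment idea: if the child is itself a Python program, exporting PYTHONUNBUFFERED in its environment has the same effect as running it with -u (child.py is a hypothetical stand-in for the real script):

import os
import subprocess

env = dict(os.environ, PYTHONUNBUFFERED='1')  # force unbuffered streams in the child
p = subprocess.Popen('python2 child.py', shell=True,
                     stdout=subprocess.PIPE,
                     universal_newlines=True,
                     env=env)
for line in p.stdout:
    print(line.rstrip())
p.wait()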
1There are hooks that fire very early during startup, where you can fuss with this sort of thing, but they are tricky to work with.
As per the subject line, is there a standard Python 2.7 library call that I can use like this:
(returncode, stdout, stderr) = invoke_subprocess(args, stdin)
The catch is that I want all three forms of output: the returncode and the entire (and separate) content from both stdout and stderr. I expected to find something like this in the subprocess module, but the closest I can see is subprocess.check_output, which gives you two out of three (the returncode and stdout).
There are other questions on Stack Overflow that suggest piping stderr to stdout, but I need to keep them separate because I'm using Python to drive testing of some non-Python applications, where I'm trying to distinguish which output shows up on which stream. For example, I want to verify that human-readable error messages show up on stderr, but that only machine-parseable output data shows up on stdout.
I think what I want is a simple four-liner...
def method(args, input):
    from subprocess import Popen, PIPE
    p = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    out, err = p.communicate(input)
    return (p.poll(), out, err)
...so I'm a bit surprised that there isn't already a method like this, given the number of other methods in the subprocess module that are mostly similar to it. Does it exist and I'm missing it, or is it present in some other standard Python 2.7 module? Or is there a subtle reason why a method like this is a bad idea?
It doesn't exist because that's what subprocess.Popen is designed to do. The other functions exist mostly to support drop-in replacement of os.system and common simplistic automation cases. Trying to cover every use case with a one-liner would result in an even more bloated set of functions.
Also note that you can use p.returncode rather than p.poll(). It's guaranteed to have been set by the time p.communicate(...) returns.
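For reference, outside the Python 2.7 constraint of the question: Python 3.5 later added subprocess.run(), which returns exactly this triple in one call:

from subprocess import run, PIPE

result = run(["cat"], input=b"hello\n", stdout=PIPE, stderr=PIPE)
print(result.returncode, result.stdout, result.stderr)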
You can use os.system(cmd), which returns the return code of the command (but captures neither stdout nor stderr).
Is there any way to display the output of a shell command in Python, as the command runs?
I have the following code to send commands to a specific shell (in this case, /bin/tcsh):
import subprocess
import select

cmd = subprocess.Popen(['/bin/tcsh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
poll = select.poll()
poll.register(cmd.stdout.fileno(), select.POLLIN)

# The list "commands" holds a list of shell commands
for command in commands:
    cmd.stdin.write(command)
    # Must include this to ensure data is passed to child process
    cmd.stdin.flush()
    ready = poll.poll()
    if ready:
        result = cmd.stdout.readline()
        print result
Also, I got the code above from this thread, but I am not sure I understand how the polling mechanism works.
What exactly is registered above?
Why do I need the variable ready if I don't pass any timeout to poll.poll()?
Yes, it is entirely possible to display the output of a shell command as the command runs. There are two requirements:
1) The command must flush its output.
Many programs buffer their output differently according to whether the output is connected to a terminal, a pipe, or a file. If they are connected to a pipe, they might write their output in much bigger chunks much less often. For each program that you execute, consult its documentation. Some versions of /bin/cat, for example, have the -u switch.
2) You must read it piecemeal, and not all at once.
Your program must be structured to read one piece at a time from the output stream. This means that you ought not do any of these, each of which reads the entire stream in one go:
cmd.stdout.read()
for i in cmd.stdout:
cmd.stdout.readlines()
But instead, you could do one of these:
while not_dead_yet:
    line = cmd.stdout.readline()

for line in iter(cmd.stdout.readline, b''):
    pass
Now, for your three specific questions:
Is there any way to display the output of a shell command in Python, as the command runs?
Yes, but only if the command you are running outputs as it runs and doesn't save it up for the end.
What exactly is registered above?
The file descriptor which, when read, makes available the output of the subprocess.
Why do I need the variable ready if I don't pass any timeout to poll.poll()?
You don't. You also don't need the poll(). If your commands list were fairly large, you might need to poll() both the stdin and stdout streams to avoid a deadlock. But if your commands list is fairly modest (less than 5 Kbytes), then you will be OK just writing it all at the beginning.
Here is one possible solution:
#! /usr/bin/python
import subprocess
import select

# Critical: all of this must fit inside ONE pipe() buffer
commands = ['echo Start\n', 'date\n', 'sleep 10\n', 'date\n', 'exit\n']

cmd = subprocess.Popen(['/bin/tcsh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# The list "commands" holds a list of shell commands
for command in commands:
    cmd.stdin.write(command)
    # Must include this to ensure data is passed to child process
    cmd.stdin.flush()

for line in iter(cmd.stdout.readline, b''):
    print line
Again, the same question.
The reason is that I still can't make it work after reading the following:
Real-time intercepting of stdout from another process in Python
Intercepting stdout of a subprocess while it is running
How do I get 'real-time' information back from a subprocess.Popen in python (2.5)
catching stdout in realtime from subprocess
My case is that I have a console app written in C; let's take, for example, this code in a loop:
tmp = 0.0;
printf("\ninput>>");
scanf_s("%f",&tmp);
printf ("\ninput was: %f",tmp);
It continuously reads some input and writes some output.
My Python code to interact with it is the following:

import subprocess

p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
p.stdin.write('12345\n')
for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()
So far, whenever I read from p.stdout, it always waits until the process has terminated and then outputs an empty string. I've tried lots of stuff, but still the same result.
I tried Python 2.6 and 3.1, but the version doesn't matter - I just need to make it work somewhere.
Trying to write to and read from pipes to a sub-process is tricky because of the default buffering going on in both directions. It's extremely easy to get a deadlock where one or the other process (parent or child) is reading from an empty buffer, writing into a full buffer or doing a blocking read on a buffer that's awaiting data before the system libraries flush it.
For more modest amounts of data the Popen.communicate() method might be sufficient. However, for data that exceeds its buffering you'd probably get stalled processes (similar to what you're already seeing?)
You might want to look for details on using the fcntl module and making one or the other (or both) of your file descriptors non-blocking. In that case, of course, you'll have to wrap all reads and/or writes to those file descriptors in the appropriate exception handling to handle the "EWOULDBLOCK" events. (I don't remember the exact Python exception that's raised for these).
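A minimal sketch of that fcntl approach (assumptions: a POSIX system, and ./a.out standing in for the real program; the exception raised on an empty non-blocking read is OSError with errno EAGAIN, a.k.a. EWOULDBLOCK):

import errno
import fcntl
import os
import subprocess

p = subprocess.Popen(['./a.out'], stdout=subprocess.PIPE, stdin=subprocess.PIPE)

# Switch the read end of the pipe to non-blocking mode.
fd = p.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    data = os.read(fd, 4096)
except OSError as e:
    if e.errno != errno.EAGAIN:
        raise
    data = b''  # nothing available right now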
A completely different approach would be for your parent to use the select module and os.fork() ... and for the child process to execve() the target program after directly handling any file dup()ing. (Basically you'd be re-implementing parts of Popen(), but with different parent file descriptor (PIPE) handling.)
Incidentally, .communicate, at least in Python's 2.5 and 2.6 standard libraries, will only handle about 64K of remote data (on Linux and FreeBSD). This number may vary based on various factors (possibly including the build options used to compile your Python interpreter, or the version of libc being linked to it). It is NOT simply limited by available memory (despite J.F. Sebastian's assertion to the contrary) but is limited to a much smaller value.
Push reading from the pipe into a separate thread that signals when a chunk of output is available; see:
How can I read all available data from subprocess.Popen.stdout (non blocking)?
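A minimal sketch of that thread-plus-queue pattern (enqueue_output is a name invented here, and ./a.out stands in for the real program):

import subprocess
import threading
try:
    import queue           # Python 3
except ImportError:
    import Queue as queue  # Python 2

def enqueue_output(stream, q):
    # Read lines from the pipe and hand them to the main thread.
    for line in iter(stream.readline, b''):
        q.put(line)
    stream.close()

p = subprocess.Popen(['./a.out'], stdout=subprocess.PIPE,
                     stdin=subprocess.PIPE)
q = queue.Queue()
t = threading.Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True
t.start()

try:
    line = q.get(timeout=0.1)  # non-blocking-ish read
except queue.Empty:
    pass                       # no output yet
else:
    print(line)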
The bufsize=256 argument prevents 12345\n from being sent to the child process in a chunk smaller than 256 bytes, as it will be when omitting bufsize or inserting p.stdin.flush() after p.stdin.write(). Default behaviour is line-buffering.
In either case you should at least see one empty line before blocking, as emitted by the first printf("\n...") in your example.
Your particular example doesn't require "real-time" interaction. The following works:
from subprocess import Popen, PIPE
p = Popen(["./a.out"], stdin=PIPE, stdout=PIPE)
output = p.communicate(b"12345")[0] # send input/read all output
print output,
where a.out is your example C program.
In general, for a dialog-based interaction with a subprocess you could use the pexpect module (or its analogs on Windows):
import pexpect
child = pexpect.spawn("./a.out")
child.expect("input>>")
child.sendline("12345.67890") # send a number
child.expect(r"\d+\.\d+") # expect the number at the end
print float(child.after) # assert that we can parse it
child.close()
I had the same problem, and proc.communicate() does not solve it, because it waits for the process to terminate.
So here is what is working for me, on Windows with Python 3.5.1:
import subprocess as sp
import time

# cmd is the command to run (assumed defined elsewhere)
myProcess = sp.Popen(cmd, creationflags=sp.CREATE_NEW_PROCESS_GROUP, stdout=sp.PIPE, stderr=sp.STDOUT)
i = 0
while i < 40:
    i += 1
    time.sleep(.5)
    out = myProcess.stdout.readline().decode("utf-8").rstrip()
I guess creationflags and the other arguments are not mandatory (but I don't have time to test), so this would be the minimal syntax:
myProcess = sp.Popen(cmd, stdout=sp.PIPE)
for i in range(40):
    time.sleep(.5)
    out = myProcess.stdout.readline()