Python subprocess Log and Display in Shell Issues

I have a Python script where I'm running an external archive command with subprocess.Popen(). I then pipe its stdout to both sys.stdout and a log file (see code below), because I need to print and log the output. The external command reports progress like "Writing Frame 1 of 1,000", which I would like in my log.
So far, I can have it display/write in large blocks by passing both stdout=subprocess.PIPE and stderr=subprocess.PIPE, but then the user thinks the script isn't working. Or, if I pass only stdout=subprocess.PIPE, the progress messages ("Writing Frame...") aren't in the log file.
Any thoughts?
My script looks something like this:
import subprocess
import sys

archive_log = open('archive.log', 'w')
archive_log.write('Archive Begin\n')
# archive command
process_archive = subprocess.Popen(["external_command", "-v", "-d"],
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
for line in process_archive.stdout:
    sys.stdout.write(line)
    archive_log.write(line)
archive_log.write('Archive End\n')
archive_log.close()

It sounds like you're just trying to merge the subprocess's stdout and stderr into a single pipe. To do that, as the docs explain, you just pass stderr=subprocess.STDOUT.
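For example, a minimal sketch of the merged approach applied to the question's loop (external_command is the asker's placeholder):

import subprocess
import sys

# stderr=subprocess.STDOUT folds the progress messages into stdout,
# so a single loop sees every line in the order it was produced.
process_archive = subprocess.Popen(["external_command", "-v", "-d"],
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.STDOUT,
                                   universal_newlines=True)
with open('archive.log', 'w') as archive_log:
    for line in process_archive.stdout:
        sys.stdout.write(line)
        archive_log.write(line)
process_archive.wait()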
If, on the other hand, you want to read from both pipes independently, without blocking on either one of them, then you need some explicit asynchronicity.
One way to do this is to just create two threads, one of them blocking on proc.stdout, the other on proc.stderr, then just have the main thread join both threads. (You probably want a lock inside the for body in each thread; that's the only way to make sure that lines are written atomically and in the same order on stdout and in the file.)
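A minimal sketch of the two-thread version, reusing process_archive and archive_log from the question (the lock keeps each line's pair of writes atomic):

import sys
import threading

def pump(pipe, log_file, lock):
    # Each thread blocks on one pipe only, so neither buffer can fill unread.
    for line in pipe:
        with lock:  # write to the screen and the log as one atomic step
            sys.stdout.write(line)
            log_file.write(line)

lock = threading.Lock()
threads = [threading.Thread(target=pump, args=(process_archive.stdout, archive_log, lock)),
           threading.Thread(target=pump, args=(process_archive.stderr, archive_log, lock))]
for t in threads:
    t.start()
for t in threads:
    t.join()
process_archive.wait()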
Alternatively, many reactor-type async I/O libraries, including the stdlib's own asyncio (if you're on 3.4+) and major third-party libs like Twisted can be used to multiplex multiple subprocess pipes.
Finally, at least if you're on Unix, if you understand all the details, you may be able to do it with just select or selectors. (If this doesn't make you say, "Aha, I know how to do it, I just have a question about one of the details", ignore this idea and use one of the other two.)
It's clear that you really do need stderr here. From your question:
Or, if I pass only stdout=subprocess.PIPE, the progress messages ("Writing Frame...") aren't in the log file.
That means the subprocess is writing those messages to stderr, not stdout. So when you don't capture stderr, it just passes through to the terminal, rather than being captured and written to both the terminal and the log by your code.
And it's clear that you really do need them either merged or handled asynchronously:
I can have it display/write in large blocks by passing both stdout=subprocess.PIPE and stderr=subprocess.PIPE, but then the user thinks the script isn't working.
The reason the user thinks the script isn't working is that, although you haven't shown us the code that does this, clearly you're looping on stdout and then on stderr. This means the progress messages won't show up until stdout is done, so the user will think the script isn't working.

Is there a reason you aren't using check_call and the syslog module to do this?
You might also want to use with like this:
with open('archive.log', 'w') as archive:
    # do stuff
You gain the benefit of the file being closed automatically.
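Applied to the code from the question, the skeleton would be (a sketch; the middle is the same loop shown earlier):

with open('archive.log', 'w') as archive_log:
    archive_log.write('Archive Begin\n')
    # ... run the subprocess and copy its output, as above ...
    archive_log.write('Archive End\n')
# archive_log is closed here even if the loop raised an exception.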

Related

Send and receive data multiple times to subprocess (Python)

Issue
I am communicating with a terminal application (xfoil) and I want to isolate the stdout corresponding to each stdin.
This question is also more general, as I wish to know why I can't open an application with subprocess and then use its stdin and stdout successively (or rather, how I could do it).
What I can do now
As of now, I can send instructions to Xfoil using xfoil.communicate, which retrieves the entire stdout.
import subprocess
xfoil = subprocess.Popen('path_to_xfoil.exe', stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
[output, _] = xfoil.communicate(input=instructions)
What I want to achieve
Instead of having to deal with the entire stdout, I wish to isolate each set of instructions (stdin) and results (stdout).
Something along the lines of:
output1 = process.communicate(input=instructions1)
output2 = process.communicate(input=instructions2)
output3 = process.communicate(input=instructions3)
...
I need the process to stay open (which is not the case with communicate).
What I have attempted
Communicate multiple times with a process without breaking the pipe? is probably the way to go; however, it does not explain clearly how to read the output, and the following piece of code simply freezes, probably because I have no idea when read should stop.
xfoil.stdin.write(instructions1)
xfoil.stdout.read() # never passes this line
xfoil.stdin.write(instructions2)
xfoil.stdout.read()
Non-blocking read on a subprocess.PIPE in python seemed a good path as well, however it only takes care of output.
Or perhaps I need to use the os module as in ipc - communicate multiple times with a subprocess in Python ?
Thank you for your help
PS: I read a tiny bit about fcntl but I need the code to work on both Linux and Windows.
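For reference, xfoil.stdout.read() blocks until end-of-file, and EOF never arrives while the process is still running; that is why the snippet above freezes. If the program can be made to emit a recognizable marker line after each set of results, you can read line by line until the marker instead. A minimal sketch, where the sentinel value is purely hypothetical:

import subprocess

xfoil = subprocess.Popen('path_to_xfoil.exe', stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)

def send(instructions, sentinel='DONE\n'):
    # sentinel is hypothetical: a line the instructions cause to be printed
    xfoil.stdin.write(instructions)
    xfoil.stdin.flush()
    lines = []
    for line in iter(xfoil.stdout.readline, ''):
        if line == sentinel:  # stop at the marker instead of waiting for EOF
            break
        lines.append(line)
    return ''.join(lines)

output1 = send(instructions1)  # instructions1/2 as in the question
output2 = send(instructions2)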

Use python subprocess module like a command line simulator

I am writing a test framework in Python for a command line application. The application will create directories, call other shell scripts in the current directory, and write its output to stdout.
I am trying to treat the {Python-subprocess, command line} combo as equivalent to {Selenium, Browser}: the first component drives the second and checks whether the output is expected. I am facing the following problems:
The Popen construct takes a command and returns only after that command is completed. What I want is a live handle to the process, so I can run further commands and verifications and finally close the shell once done.
I am okay with writing some infrastructure code for achieving this, since we have a lot of command line applications that need testing like this.
Here is a sample code that I am running
p = subprocess.Popen("/bin/bash", cwd=test_dir)
p.communicate(input="hostname")  # I expect the hostname to be printed out
p.communicate(input="time")      # I expect the current time to be printed out
but the process hangs, or maybe I am doing something wrong. Also, how do I "grab" the output of that subprocess so I can assert that something exists?
subprocess.Popen allows you to continue execution after starting a process. Popen objects expose wait(), poll() and many other methods for communicating with a child process while it is running. Isn't that what you need?
See Popen constructor and Popen objects description for details.
Here is a small example that runs a shell (/bin/sh) on Unix systems and executes a command:
from subprocess import Popen, PIPE
p = Popen(['/bin/sh'], stdout=PIPE, stderr=PIPE, stdin=PIPE)
sout, serr = p.communicate('ls\n')
print 'OUT:'
print sout
print 'ERR:'
print serr
UPD: communicate() waits for process termination. If you do not need that, you may use the appropriate pipes directly, though that usually gives you rather ugly code.
UPD2: You updated the question. Yes, you cannot call communicate twice for a single process. You may either give all commands you need to execute in a single call to communicate and check the whole output, or work with pipes (Popen.stdin, Popen.stdout, Popen.stderr). If possible, I strongly recommend the first solution (using communicate).
Otherwise you will have to write a command to stdin and wait some time for the desired output. What you need is a non-blocking read, to avoid hanging when there is nothing to read. Here is a recipe for emulating non-blocking reads on pipes using threads. The code is ugly and strangely complicated for such a trivial purpose, but that's how it's done.
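The core of that recipe looks roughly like this (a sketch, using the /bin/bash example from the question):

from subprocess import Popen, PIPE
from threading import Thread
try:
    from Queue import Queue, Empty   # Python 2
except ImportError:
    from queue import Queue, Empty   # Python 3

def enqueue_output(pipe, queue):
    # Background thread: blocks on the pipe so the main thread never has to.
    for line in iter(pipe.readline, ''):
        queue.put(line)
    pipe.close()

p = Popen(['/bin/bash'], stdin=PIPE, stdout=PIPE, universal_newlines=True)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True   # the thread dies with the program
t.start()

p.stdin.write('hostname\n')
p.stdin.flush()
try:
    line = q.get(timeout=2)   # non-blocking from the caller's point of view
except Empty:
    line = None               # no output arrived within the timeout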
Another option could be passing p.stdout.fileno() to a select.select() call, but that won't work on Windows (on Windows, select operates only on objects originating from WinSock). You may consider it if you are not on Windows.
Instead of using plain subprocess, you might find the Python sh library very useful:
http://amoffat.github.com/sh/
Here is an example of how to build an asynchronous interaction loop with sh:
http://amoffat.github.com/sh/tutorials/2-interacting_with_processes.html
Another (old) library for solving this problem is pexpect:
http://www.noah.org/wiki/pexpect

Piping output in Python: multiple channels for data and messages?

I'm doing some python programming, and I would like to "pipe" the output of one program to another. That's easily doable using sys.stdin and sys.stdout. However, I would also like to be able to print info and warning messages to the terminal. Is there any (simple) way to have multiple channels, with messages printed to the terminal, but data being sent to another program?
You can use stderr for terminal output and stdout for the pipe.
You might find subprocess.Popen useful. It spawns another program as a separate process and allows you to define the file descriptors for stdin, stdout and stderr as you wish.
This is pretty much what the logging module is built for. You can log messages to loggers you can create and name arbitrarily from anywhere in your code, then by attaching handlers to the loggers, push the data to whatever consumer or consumers you want.
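A minimal sketch of that arrangement, with log messages going to stderr so stdout stays clean for data:

import logging
import sys

# The handler pushes log records to the terminal via stderr,
# leaving stdout free for data destined for the next program in the pipe.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger('mytool')

log.info('starting up')         # appears on the terminal
sys.stdout.write('data row\n')  # flows down the pipe
log.warning('odd input seen')   # appears on the terminal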
If that doesn't work for you, moooeeeep's subprocess suggestion is probably the next best avenue to explore. That is kind of the civilized way of accomplishing the same thing as overwriting sys.stdout, which is something that only barbarians and the infirm should ever do.
As the others said, use stdout for data and stderr for user interaction. Here's how to do it in recent python versions (that is, 3.x and 2.6 or newer):
#!/usr/bin/env python
import sys
print("data", file=sys.stdout)
print("more data") # sys.stdout is the default
print("user interaction", file=sys.stderr)
The result:
desktop➜ ~ ./tmp.py
data
more data
user interaction
desktop➜ ~ ./tmp.py > /dev/null
user interaction
desktop➜ ~ ./tmp.py 2> /dev/null
data
more data
Honestly, when I want to do this quickly I don't mess with the code, I just use tee. It's a *nix utility that does exactly what you're talking about: splitting a pipe for display and for piping onward. You can further restrict what is displayed with grep. It's great for debugging something that uses pipes. If this is part of your production system, though, I probably wouldn't pass info through pipes unless you have to; if you do, log your errors/warnings and tail -f your log.
I know it's not really a Python answer, but it gets the job done.

Executing multiple commands using Popen.stdin

I'd like to execute multiple commands in a standalone application launched from a Python script, using pipes. The only way I could reliably pass the commands to the program's stdin was Popen.communicate, but it closes the program after the command gets executed. If I use Popen.stdin.write, then the command executes only one time out of five or so; it does not work reliably. What am I doing wrong?
To elaborate a bit :
I have an application that listens to stdin for commands and executes them line by line.
I'd like to be able to run the application and pass various commands to it, based on the user's interaction with a GUI.
This is a simple test example:
import os, string
from subprocess import Popen, PIPE
command = "anApplication"
process = Popen(command, shell=False, stderr=None, stdin=PIPE)
process.stdin.write("doSomething1\n")
process.stdin.flush()
process.stdin.write("doSomething2\n")
process.stdin.flush()
I'd expect to see the result of both commands, but I don't get any response. (If I execute one of the stdin.write lines multiple times, it occasionally works.)
And if I execute:
process.communicate("doSomething1")
it works perfectly but the application terminates.
If I understand your problem correctly, you want to interact (i.e. send commands and read the responses) with a console application.
If so, you may want to check an Expect-like library, like pexpect for Python: http://pexpect.sourceforge.net
It will make your life easier, because it will take care of synchronization, the problem that ddaa also describes. See also:
http://www.noah.org/wiki/Pexpect#Q:_Why_not_just_use_a_pipe_.28popen.28.29.29.3F
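A minimal pexpect sketch (the prompt pattern is hypothetical; substitute whatever your application actually prints between commands):

import pexpect

child = pexpect.spawn('anApplication')  # the question's placeholder command
child.sendline('doSomething1')
child.expect('> ')       # hypothetical prompt printed after each command
print(child.before)      # everything the command printed before the prompt
child.sendline('doSomething2')
child.expect('> ')
print(child.before)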
The real issue here is whether the application is buffering its output and, if it is, whether there's anything you can do to stop it. Presumably, when the user generates a command and clicks a button on your GUI, you want to see the output from that command before you require the user to enter the next.
Unfortunately, there's nothing you can do on the client side of subprocess.Popen to ensure that, once you have passed the application a command, the application flushes all of its output to the final destination. You can call flush() on your own pipes all you like, but if the application doesn't flush its output, and you can't make it do so, then you are doomed to looking for workarounds.
Your code in the question should work as is. If it doesn't, then either your actual code is different (e.g., you might use stdout=PIPE, which may change the child's buffering behavior), or it might indicate a bug in the child application itself, such as the read-ahead bug in Python 2, i.e., your input is sent correctly by the parent process but it is stuck in the child's internal input buffer.
The following works on my Ubuntu machine:
#!/usr/bin/env python
import time
from subprocess import Popen, PIPE

LINE_BUFFERED = 1
# NOTE: the first argument is a list
p = Popen(['cat'], bufsize=LINE_BUFFERED, stdin=PIPE,
          universal_newlines=True)
with p.stdin:
    for cmd in ["doSomething1\n", "doSomethingElse\n"]:
        time.sleep(1)  # a delay to see that the commands appear one by one
        p.stdin.write(cmd)
        p.stdin.flush()  # use explicit flush() to work around
                         # buffering bugs on some Python versions
rc = p.wait()
It sounds like your application is buffering its input from the pipe, which means it won't see all of the commands you send until you close the pipe.
So the approach I would suggest is just to do this:
process.stdin.write("command1\n")
process.stdin.write("command2\n")
process.stdin.write("command3\n")
process.stdin.close()
It doesn't sound like your Python program is reading output from the application, so it shouldn't matter if you send the commands all at once like that.

python subprocess module: looping over stdout of child process

I have some commands which I am running using the subprocess module. I then want to loop over the lines of the output. The documentation says not to use data_stream.stdout.read, which I am not doing directly, but I may be doing something that calls it. I am looping over the output like this:
for line in data_stream.stdout:
    # do stuff here
    ...
Can this cause deadlocks the way reading from data_stream.stdout directly can, or are Popen objects set up for this kind of looping, so that it effectively does what communicate does but handles all the calls for you?
You have to worry about deadlocks if you're communicating with your subprocess, i.e. if you're writing to stdin as well as reading from stdout. Because these pipes are buffered, doing this kind of two-way communication is very much a no-no:
data_stream = Popen(mycmd, stdin=PIPE, stdout=PIPE)
data_stream.stdin.write("do something\n")
for line in data_stream.stdout:
    ...  # BAD!
However, if you've not set up stdin (or stderr) when constructing data_stream, you should be fine.
data_stream = Popen(mycmd, stdout=PIPE)
for line in data_stream.stdout:
    ...  # Fine
If you need two-way communication, use communicate.
The two answers above have caught the gist of the issue pretty well: don't mix writing something to the subprocess, reading something from it, writing again, etc.; the pipe's buffering means you're at risk of a deadlock. If you can, write everything you need to write to the subprocess FIRST, close that pipe, and only THEN read everything the subprocess has to say; communicate is nice for the purpose, IF the amount of data is not too large to fit in memory (if it is, you can still achieve the same effect "manually").
If you need finer-grain interaction, look instead at pexpect or, if you're on Windows, wexpect.
SilentGhost's and chrispy's answers are OK if you have a small to moderate amount of output from your subprocess. Sometimes, though, there may be a lot of output, too much to comfortably buffer in memory. In that case, the thing to do is start the process, and spawn a couple of threads: one to read child.stdout and one to read child.stderr, where child is the subprocess. You then need to wait() for the subprocess to terminate.
This is actually how communicate() works; the advantage of using your own threads is that you can process the output from the subprocess as it is generated. For example, in my project python-gnupg I use this technique to read status output from the GnuPG executable as it is generated, rather than waiting for all of it by calling communicate(). You are welcome to inspect the source of this project - the relevant stuff is in the module gnupg.py.
data_stream.stdout is a standard output handle; you shouldn't be looping over it. communicate() returns a tuple of (stdoutdata, stderrdata); you should be using stdoutdata to do your stuff.
