Piping output in Python: multiple channels for data and messages? - python

I'm doing some python programming, and I would like to "pipe" the output of one program to another. That's easily doable using sys.stdin and sys.stdout. However, I would also like to be able to print info and warning messages to the terminal. Is there any (simple) way to have multiple channels, with messages printed to the terminal, but data being sent to another program?

You can use stderr for terminal output and stdout for the pipe.
You might find subprocess.Popen useful. It spawns another program as a separate process and allows you to define the file descriptors for stdin, stdout and stderr as you wish.
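A minimal sketch of that split, assuming a hypothetical downstream program called consumer_command: messages go to stderr (the terminal), data goes into the other program's stdin.
import subprocess
import sys

# consumer_command is a placeholder for whatever program should receive the data.
consumer = subprocess.Popen(["consumer_command"], stdin=subprocess.PIPE)

sys.stderr.write("info: sending data now\n")   # terminal message
consumer.stdin.write(b"the actual data\n")     # data for the other program
consumer.stdin.close()
consumer.wait()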

This is pretty much what the logging module is built for. You can log messages to loggers that you create and name arbitrarily from anywhere in your code, and then, by attaching handlers to those loggers, push the messages to whatever consumer or consumers you want.
If that doesn't work for you, moooeeeep's subprocess suggestion is probably the next best avenue to explore. That is kind of the civilized way of accomplishing the same thing as overwriting sys.stdout, which is something that only barbarians and the infirm should ever do.
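For instance, a minimal setup along those lines might look like this (basicConfig used for brevity instead of wiring handlers by hand):
import logging
import sys

# Route all log messages to stderr so they reach the terminal,
# leaving stdout free for data that gets piped downstream.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(levelname)s: %(message)s")
log = logging.getLogger("myscript")   # arbitrary logger name

log.info("starting up")          # terminal (stderr)
sys.stdout.write("the data\n")   # pipe (stdout)
log.warning("nearly done")       # terminal (stderr)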

As the others said, use stdout for data and stderr for user interaction. Here's how to do it in recent Python versions, that is, 3.x, or 2.6+ with from __future__ import print_function:
#!/usr/bin/env python
import sys
print("data", file=sys.stdout)
print("more data")  # sys.stdout is the default
print("user interaction", file=sys.stderr)
The result:
desktop➜ ~ ./tmp.py
data
more data
user interaction
desktop➜ ~ ./tmp.py > /dev/null
user interaction
desktop➜ ~ ./tmp.py 2> /dev/null
data
more data

Honestly, when I want to do this quickly I don't mess with the code, I just use tee. It's a *nix utility that does exactly what you're talking about: splitting the pipe for display and piping. You can further restrict what you display with grep. It's great for debugging something that uses pipes. If this is part of your production system, though, I probably wouldn't pass info through pipes unless you have to. If you do, log your errors/warnings and tail -f your log.
I know it's not really a Python answer, but it gets the job done.
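(If you ever do want a quick pure-Python stand-in for tee, a rough sketch of such a pass-through filter could look like this; it copies each line to stderr for display while the real data continues down the pipe on stdout.)
#!/usr/bin/env python
# Rough tee-like filter: data passes through on stdout, a copy goes to stderr.
import sys

for line in sys.stdin:
    sys.stdout.write(line)   # the downstream program reads this
    sys.stderr.write(line)   # this shows up on the terminal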

Related

Merge stdout and stderr with Python popen AND detect if stderr is empty

I want to run a subprocess with popen that can send messages to both stdout and stderr, but that subprocess will continue along its merry way despite writing to stderr.
I want to stream stdout and stderr together so I get the output in the exact order it occurred (or was flushed I guess technically). Then, I want to log that full result set. BUT I also want to independently know if stderr is empty. If it is not, I'm going to throw an exception.
It is clear to me how I can get them separately, or merged together, but how can I perhaps do both?
First off I solved my actual use case in large part by digging further into the process producing the streams. I was using the MySQL client, and found the right combination of command line switches would solve the real goals. I'll leave that out since it's entirely specific to that situation.
Before I came upon that, however, I did find a very useful direction to explore: the use of the program "tee" and/or a Python source equivalent. I did not try to implement it, but it seems like a method one could use to simultaneously collect the output from both stdout and stderr, while sending just stdout to another file descriptor. If the two outputs are not the same, that would tell you stderr was used.
Check out:
https://www.computerhope.com/unix/utee.htm
And:
Python subprocess get children's output to file and terminal?
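A rough, untested sketch of that direction in pure Python: read both pipes on threads, keep a merged transcript for the log, and check afterwards whether stderr produced anything. Note that the command line is a placeholder and the interleaving is only approximate, not the exact order the process wrote in.
import subprocess
import threading

# some_command is a placeholder for the real command line.
proc = subprocess.Popen(["some_command"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

merged = []      # interleaved transcript (ordering is approximate)
err_lines = []   # stderr only, so we can tell whether it was used at all
lock = threading.Lock()

def drain(pipe, extra=None):
    for line in pipe:
        with lock:
            merged.append(line)
            if extra is not None:
                extra.append(line)

t_out = threading.Thread(target=drain, args=(proc.stdout,))
t_err = threading.Thread(target=drain, args=(proc.stderr, err_lines))
t_out.start()
t_err.start()
t_out.join()
t_err.join()
proc.wait()

full_log = b"".join(merged)
if err_lines:
    raise RuntimeError("subprocess wrote to stderr")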

Python subprocess Log and Display in Shell Issues

I have a python script where I'm running an external archive command with subprocess.Popen(). Then I'm piping stdout to a sys write and a log file (see code below), because I need to print and log the output. The external command outputs progress like "Writing Frame 1 of 1,000", which I would like in my log.
So far I can either have it display/write in large blocks by including stdout=subprocess.PIPE, stderr=subprocess.PIPE, but then the user thinks the script isn't working; or, if I only pass stdout=subprocess.PIPE, the progress lines ("Writing Frame...") aren't in the log file.
Any thoughts?
My script looks something like this:
import subprocess
import sys

archive_log = open('archive.log', 'w')
archive_log.write('Archive Begin')
# Archive command
process_archive = subprocess.Popen(["external_command", "-v", "-d"],
                                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in process_archive.stdout:
    sys.stdout.write(line)
    archive_log.write(line)
archive_log.write('Archive End')
archive_log.close()
It sounds like you're just trying to merge the subprocess's stdout and stderr into a single pipe. To do that, as the docs explain, you just pass stderr=subprocess.STDOUT.
If, on the other hand, you want to read from both pipes independently, without blocking on either one of them, then you need some explicit asynchronicity.
One way to do this is to just create two threads, one of them blocking on proc.stdout, the other on proc.stderr, then just have the main thread join both threads. (You probably want a lock inside the for body in each thread; that's the only way to make sure that lines are written atomically and in the same order on stdout and in the file.)
Alternatively, many reactor-type async I/O libraries, including the stdlib's own asyncio (if you're on 3.4+) and major third-party libs like Twisted can be used to multiplex multiple subprocess pipes.
Finally, at least if you're on Unix, if you understand all the details, you may be able to do it with just select or selectors. (If this doesn't make you say, "Aha, I know how to do it, I just have a question about one of the details", ignore this idea and use one of the other two.)
It's clear that you really do need stderr here. From your question:
Or, if I only pass stdout=subprocess.PIPE, the progress lines ("Writing Frame...") aren't in the log file.
That means the subprocess is writing those messages to stderr, not stdout. So when you don't capture stderr, it just passes through to the terminal, rather than being captured and written to both the terminal and the log by your code.
And it's clear that you really do need them either merged or handled asynchronously:
I can either have it display/write in large blocks by including "stdout=subprocess.PIPE, stderr=subprocess.PIPE", but then the user thinks the script isn't working.
The reason the user thinks the script isn't working is that, although you haven't shown us the code that does this, clearly you're looping on stdout and then on stderr. This means the progress messages won't show up until stdout is done, so the user will think the script isn't working.
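For completeness, a rough sketch of the merged-pipe version (external_command is just the placeholder from your snippet), reading line by line so progress shows up immediately:
import subprocess
import sys

with open('archive.log', 'w') as archive_log:
    archive_log.write('Archive Begin\n')
    proc = subprocess.Popen(["external_command", "-v", "-d"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,   # merge stderr into stdout
                            universal_newlines=True)    # text mode, line by line
    for line in proc.stdout:
        sys.stdout.write(line)     # show progress to the user immediately
        archive_log.write(line)    # and keep it in the log
    proc.wait()
    archive_log.write('Archive End\n')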
Is there a reason you aren't using check_call and the syslog module to do this?
You might also want to use with like this:
with open('archive.log', 'w') as archive:
    # do stuff
You gain the benefit of the file being closed automatically.

Preserving bash redirection in a python subprocess

To begin with, I am only allowed to use python 2.4.4
I need to write a process controller in Python which launches various subprocesses and monitors how they affect the environment. Each of these subprocesses is itself a Python script.
When executed from the unix shell, the command lines look something like this:
python myscript arg1 arg2 arg3 >output.log 2>err.log &
I am not interested in the input or the output; Python does not need to process them. The Python program only needs to know:
1) The pid of each process
2) Whether each process is running.
And the processes run continuously.
I have tried reading in the output and just sending it out to a file again, but then I run into issues with readline not being asynchronous, for which there are several answers, many of them very complex.
How can I formulate a Python subprocess call that preserves the bash redirection operations?
Thanks
If I understand your question correctly, it sounds like what you are looking for here is to be able to launch a list of scripts with the output redirected to files. In that case, launch each of your tasks something like this:
import subprocess
task = subprocess.Popen(['python', 'myscript', 'arg1', 'arg2', 'arg3'],
                        stdout=open('output.log', 'w'), stderr=open('err.log', 'w'))
Doing this means that the subprocess's stdout and stderr are redirected to files that the monitoring process opened, but the monitoring process does not have to be involved in copying data around. You can also redirect the subprocess stdins as well, if needed.
Note that you'll likely want to handle error cases and such, which aren't handled in this example.
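The two things the controller needs to know are then already available on the Popen object:
pid = task.pid                   # 1) the pid of each process
running = task.poll() is None    # 2) True while the process is still running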
You can use existing file descriptors as the stdout/stderr arguments to subprocess.Popen. This should be equivalent to running with redirection from bash. That redirection is implemented with dup2(2) after fork, so the output never touches your program. You can probably also pass an open('/dev/null') file object.
Alternatively, you can redirect the stdout/stderr of your controller program and pass None as stdout/stderr. Children will then print to your controller's stdout/stderr without passing through Python itself. This works because the children inherit the stdout/stderr descriptors of the controller, which were redirected by bash at launch time.
The subprocess module is good.
You can also do this on *nix with os.fork() and a periodic, non-blocking os.waitpid() with os.WNOHANG, roughly as sketched below.
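A rough sketch of that os-level route (redirection in the child done with os.dup2 before exec, mirroring what bash does):
import os

pid = os.fork()
if pid == 0:
    # Child: reproduce ">output.log 2>err.log" before exec'ing the script.
    out = os.open('output.log', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    err = os.open('err.log', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    os.dup2(out, 1)   # fd 1 = stdout
    os.dup2(err, 2)   # fd 2 = stderr
    os.execvp('python', ['python', 'myscript', 'arg1', 'arg2', 'arg3'])
else:
    # Parent: non-blocking check; waitpid returns (0, 0) while the child runs.
    finished, status = os.waitpid(pid, os.WNOHANG)
    still_running = (finished == 0)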

python subprocess module: looping over stdout of child process

I have some commands which I am running using the subprocess module. I then want to loop over the lines of the output. The documentation says not to use data_stream.stdout.read (which I am not), but I may be doing something that calls it. I am looping over the output like this:
for line in data_stream.stdout:
    # do stuff here
    ...
Can this cause deadlocks the way data_stream.stdout.read() can, or is Popen set up for this kind of looping, so that it uses the communicate() machinery but handles all the calls to it for you?
You have to worry about deadlocks if you're communicating with your subprocess, i.e. if you're writing to stdin as well as reading from stdout. Because these pipes may be buffered, doing this kind of two-way communication is very much a no-no:
data_stream = Popen(mycmd, stdin=PIPE, stdout=PIPE)
data_stream.stdin.write("do something\n")
for line in data_stream.stdout:
    ...  # BAD!
However, if you've not set up stdin (or stderr) when constructing data_stream, you should be fine.
data_stream = Popen(mycmd, stdout=PIPE)
for line in data_stream.stdout:
    ...  # Fine
If you need two-way communication, use communicate.
The two answers have caught the gist of the issue pretty well: don't mix writing something to the subprocess, reading something from it, writing again, etc -- the pipe's buffering means you're at risk of a deadlock. If you can, write everything you need to write to the subprocess FIRST, close that pipe, and only THEN read everything the subprocess has to say; communicate is nice for the purpose, IF the amount of data is not too large to fit in memory (if it is, you can still achieve the same effect "manually").
If you need finer-grain interaction, look instead at pexpect or, if you're on Windows, wexpect.
SilentGhost's and chrispy's answers are OK if you have a small to moderate amount of output from your subprocess. Sometimes, though, there may be a lot of output - too much to comfortably buffer in memory. In that case, the thing to do is start the process and spawn a couple of threads - one to read child.stdout and one to read child.stderr, where child is the subprocess. You then need to wait() for the subprocess to terminate.
This is actually how communicate() works; the advantage of using your own threads is that you can process the output from the subprocess as it is generated. For example, in my project python-gnupg I use this technique to read status output from the GnuPG executable as it is generated, rather than waiting for all of it by calling communicate(). You are welcome to inspect the source of this project - the relevant stuff is in the module gnupg.py.
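A bare-bones sketch of that thread-per-pipe pattern (mycmd and handle_line are placeholders for your command and whatever per-line processing you need):
import subprocess
import threading

def handle_line(line):
    pass   # placeholder: process each line as soon as it arrives

def reader(pipe):
    for line in pipe:
        handle_line(line)
    pipe.close()

child = subprocess.Popen(["mycmd"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
threads = [threading.Thread(target=reader, args=(child.stdout,)),
           threading.Thread(target=reader, args=(child.stderr,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
child.wait()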
data_stream.stdout is a standard output handle; you shouldn't be looping over it. communicate() returns a tuple of (stdoutdata, stderrdata); that stdoutdata is what you should be using to do your stuff.

Creating a new terminal/shell window to simply display text

I want to pipe [edit: real-time text] the output of several subprocesses (sometimes chained, sometimes parallel) to a single terminal/tty window that is not the active python shell (be it an IDE, command-line, or a running script using tkinter). IPython is not an option. I need something that comes with the standard install. Prefer OS-agnostic solution, but needs to work on XP/Vista.
I'll post what I've tried already if you want it, but it’s embarrassing.
A good solution in Unix would be named pipes. I know you asked about Windows, but there might be a similar approach in Windows, or this might be helpful for someone else.
on terminal 1:
mkfifo /tmp/display_data
myapp >> /tmp/display_data
on terminal 2 (bash):
tail -f /tmp/display_data
Edit: changed terminal 2 command to use "tail -f" instead of infinite loop.
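If the producer side happens to be your own Python code, writing into that fifo is just writing to a file (sketch; /tmp/display_data matches the mkfifo step above):
import os

fifo_path = '/tmp/display_data'
if not os.path.exists(fifo_path):
    os.mkfifo(fifo_path)   # same effect as the mkfifo shell command

# Note: opening a fifo for writing blocks until a reader (e.g. tail -f) attaches.
with open(fifo_path, 'w') as fifo:
    fifo.write('hello from the producer\n')
    fifo.flush()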
You say "pipe" so I assume you're dealing with text output from the subprocesses. A simple solution may be to just write output to files?
e.g. in the subprocess:
Redirect output to %TEMP%\output.txt
On exit, copy output.txt to a directory your main process is watching.
In the main process:
Every second, examine directory for new files.
When files found, process and remove them.
You could encode the subprocess name in the output filename so you know how to process it.
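A rough sketch of the watcher loop on the main-process side (the directory path and file pattern are placeholders):
import glob
import os
import sys
import time

watch_dir = '/path/to/drop_dir'   # placeholder: directory the main process watches

while True:
    for path in glob.glob(os.path.join(watch_dir, '*.txt')):
        with open(path) as f:
            sys.stdout.write(f.read())   # display (or otherwise process) the output
        os.remove(path)                  # then remove the handled file
    time.sleep(1)                        # poll roughly once a second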
You could make a producer-consumer system, where lines are inserted over a socket (nothing fancy here).
The consumer would be a multithreaded socket server listening for connections and putting all lines into a Queue. In a separate thread it would get items from the queue and print them on the console. The program can be run from the cmd console or from the Eclipse console as an external tool without much trouble.
From your point of view, it should be realtime. As a bonus, you can place producers and consumers on separate boxes. Producers can even form a network.
Some examples of socket programming with Python can be found here. Look here for a TCP echo server example and here for a TCP "hello world" socket client.
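A stripped-down sketch of that consumer (Python 3 module names; the port is arbitrary): handler threads feed a Queue, and a separate thread prints whatever arrives.
import queue
import socketserver
import threading

lines = queue.Queue()

class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        for line in self.rfile:                    # one producer connection
            lines.put(line.decode(errors='replace'))

def printer():
    while True:
        print(lines.get(), end='')                 # consume and display

threading.Thread(target=printer, daemon=True).start()
server = socketserver.ThreadingTCPServer(('127.0.0.1', 5000), LineHandler)
server.serve_forever()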
There is also an extension for Windows that enables the use of named pipes.
On Linux (possibly Cygwin?) you could just tail -f a named fifo.
Good luck!
