Issue
I am communicating with a terminal application (xfoil) and I want to isolate the stdout corresponding to each stdin.
This question is also more general: I wish to know why I can't open an application with subprocess and then use its stdin and stdout successively (or rather, how I could do it).
What I can do now
As of now, I can send instructions to Xfoil using xfoil.communicate, which retrieves the entire stdout at once.
import subprocess
xfoil = subprocess.Popen('path_to_xfoil.exe', stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, _ = xfoil.communicate(input=instructions)
What I want to achieve
Instead of having to deal with the entire stdout, I wish to isolate each set of instructions (stdin) and results (stdout).
Something along the lines of:
output1 = process.communicate(input=instructions1)
output2 = process.communicate(input=instructions2)
output3 = process.communicate(input=instructions3)
...
I need the process to stay open (which is not the case with communicate).
What I have attempted
Communicate multiple times with a process without breaking the pipe? is probably the way to go; however, it does not explain clearly how to read the output, and the following piece of code simply freezes, probably because I have no idea when read should stop.
xfoil.stdin.write(instructions1)
xfoil.stdout.read() # never passes this line
xfoil.stdin.write(instructions2)
xfoil.stdout.read()
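What I presumably need is to read the output line by line and stop at something recognizable, along these lines (a sketch only; is_end_of_output is a hypothetical test I would still have to write for Xfoil's output):
xfoil.stdin.write(instructions1)
xfoil.stdin.flush()
output1 = []
for line in xfoil.stdout:          # read line by line instead of read()
    if is_end_of_output(line):     # hypothetical: detect where this command's output ends
        break
    output1.append(line)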
Non-blocking read on a subprocess.PIPE in python seemed a good path as well; however, it only takes care of the output side.
Or perhaps I need to use the os module as in ipc - communicate multiple times with a subprocess in Python ?
Thank you for your help
PS: I read a tiny bit about fcntl but I need the code to work on both Linux and Windows.
Related
I want to call an external process from python. The process I'm calling reads an input string and gives tokenized result, and waits for another input (binary is MeCab tokenizer if that helps).
I need to tokenize thousands of lines of string by calling this process.
The problem is that Popen.communicate() works, but it waits for the process to die before giving out the stdout result. I don't want to keep closing and opening new subprocesses thousands of times. (And I don't want to send the whole text at once; it may easily grow to tens of thousands of long lines in the future.)
from subprocess import PIPE, Popen
with Popen("mecab -O wakati".split(), stdin=PIPE,
stdout=PIPE, stderr=PIPE, close_fds=False,
universal_newlines=True, bufsize=1) as proc:
output, errors = proc.communicate("foobarbaz")
print(output)
I've tried reading proc.stdout.read() instead of using communicate, but it is blocked by stdin and doesn't return any results before proc.stdin.close() is called. Which, again, means I need to create a new process every time.
I've tried to implement queues and threads from a similar question as below, but it either doesn't return anything, so it's stuck in the while True loop, or, when I force the stdin buffer to fill by repeatedly sending strings, it outputs all the results at once.
from subprocess import PIPE, Popen
from threading import Thread
from queue import Queue, Empty
def enqueue_output(out, queue):
    # note: with universal_newlines=True, readline() returns str, so the
    # sentinel arguably ought to be '' rather than b''
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

p = Popen('mecab -O wakati'.split(), stdout=PIPE, stdin=PIPE,
          universal_newlines=True, bufsize=1, close_fds=False)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True
t.start()
p.stdin.write("foobarbaz")
while True:
    try:
        line = q.get_nowait()
    except Empty:
        pass
    else:
        print(line)
        break
Also looked at the Pexpect route, but its Windows port doesn't support some important modules (the pty-based ones), so I couldn't apply that either.
I know there are a lot of similar answers, and I've tried most of them. But nothing I've tried seems to work on Windows.
EDIT: some info on the binary I'm using via the command line. It runs and tokenizes the sentences I give it, until I'm done and forcibly close the program.
(...waits_for_input -> input_received -> output -> waits_for_input...)
Thanks.
If mecab uses C FILE streams with default buffering, then piped stdout has a 4 KiB buffer. The idea here is that a program can efficiently use small, arbitrary-sized reads and writes to the buffers, and the underlying standard I/O implementation handles automatically filling and flushing the much-larger buffers. This minimizes the number of required system calls and maximizes throughput. Obviously you don't want this behavior for interactive console or terminal I/O or writing to stderr. In these cases the C runtime uses line-buffering or no buffering.
A program can override this behavior, and some do have command-line options to set the buffer size. For example, Python has the "-u" (unbuffered) option and PYTHONUNBUFFERED environment variable. If mecab doesn't have a similar option, then there isn't a generic workaround on Windows. The C runtime situation is too complicated. A Windows process can link statically or dynamically to one or several CRTs. The situation on Linux is different since a Linux process generally loads a single system CRT (e.g. GNU libc.so.6) into the global symbol table, which allows an LD_PRELOAD library to configure the C FILE streams. Linux stdbuf uses this trick, e.g. stdbuf -o0 mecab -O wakati.
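On Linux, that trick can be used straight from Python (a sketch; assumes stdbuf and mecab are on PATH):
from subprocess import Popen, PIPE

# stdbuf -o0 uses LD_PRELOAD to make the child's stdout unbuffered (Linux-only)
p = Popen(['stdbuf', '-o0', 'mecab', '-O', 'wakati'],
          stdin=PIPE, stdout=PIPE, universal_newlines=True)
p.stdin.write('foobarbaz\n')
p.stdin.flush()
print(p.stdout.readline())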
One option to experiment with is to call CreateConsoleScreenBuffer and get a file descriptor for the handle from msvcrt.open_osfhandle. Then pass this as stdout instead of using a pipe. The child process will see this as a TTY and use line buffering instead of full buffering. However managing this is non-trivial. It would involve reading (i.e. ReadConsoleOutputCharacter) a sliding buffer (call GetConsoleScreenBufferInfo to track the cursor position) that's actively written to by another process. This kind of interaction isn't something that I've ever needed or even experimented with. But I have used a console screen buffer non-interactively, i.e. reading the buffer after the child has exited. This allows reading up to 9,999 lines of output from programs that write directly to the console instead of stdout, e.g. programs that call WriteConsole or open "CON" or "CONOUT$".
Here is a workaround for Windows. This should also be adaptable to other operating systems.
Download a console emulator like ConEmu (https://conemu.github.io/)
Start it instead of mecab as your subprocess.
p = Popen(['conemu'], stdout=PIPE, stdin=PIPE,
          universal_newlines=True, bufsize=1, close_fds=False)
Then send the following as the first input:
mecab -O wakati & exit
You are letting the emulator handle the output-buffering issues for you, the way it normally does when you interact with it manually.
I am still looking into this, but it already looks promising...
The only problem is that ConEmu is a GUI application, so if there is no other way to hook into its input and output, one might have to tweak and rebuild it from source (it's open source). I haven't found any other way, but this should work.
I have asked a question about running it in some sort of console mode here, so you can check that thread as well. The author, Maximus, is on SO...
The code
while True:
    try:
        line = q.get_nowait()
    except Empty:
        pass
    else:
        print(line)
        break
is essentially the same as
print(q.get())
except less efficient because it burns CPU time while waiting. The explicit loop won't make data from the subprocess arrive sooner; it arrives when it arrives.
For dealing with uncooperative binaries I have a few suggestions, from best to worst:
Find a Python library and use that instead. It appears that there's an official Python binding in the MeCab source tree and I see some prebuilt packages on PyPI. You can also look for a DLL build that you can call with ctypes or another Python FFI. If that doesn't work...
Find a binary that flushes after each line of output. The most recent Win32 build I found online, v0.98, does flush after each line. Failing that...
Build your own binary that flushes after each line. It should be easy enough to find the main loop and insert a flush call in it. But MeCab seems to explicitly flush already, and git blame says that the flush statement was last changed in 2011, so I'm surprised you ever had this problem and I suspect that there may have just been a bug in your Python code. Failing that...
Process the output asynchronously. If your concern is that you want to deal with the output in parallel with the tokenization for performance reasons, you can mostly do that, after the first 4K. Just do the processing in a second thread instead of stuffing the lines in a queue (see the sketch after this list). If you can't do that...
This is a terrible hack but it may work in some cases: intersperse your inputs with dummy inputs that produce at least 4K of output. For example, you could output 2047 blank lines after every real input line (2047 CRLFs plus the CRLF from the real output = 4K), or a single line of b'A' * 4092 + b'\r\n', whichever is faster.
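A minimal sketch of option 4 above (handle_line and sentences are placeholders, not from the thread; assumes a binary that flushes after each line):
from subprocess import Popen, PIPE
from threading import Thread

def consume(stream):
    # process tokenized lines as they arrive, in parallel with feeding input
    for line in stream:
        handle_line(line)        # hypothetical per-line processing

p = Popen('mecab -O wakati'.split(), stdin=PIPE, stdout=PIPE,
          universal_newlines=True, bufsize=1)
t = Thread(target=consume, args=(p.stdout,))
t.start()
for sentence in sentences:       # hypothetical iterable of input lines
    p.stdin.write(sentence + '\n')
p.stdin.close()                  # EOF lets the reader thread finish
t.join()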
Not on this list at all is an approach suggested by the two previous answers: directing the output to a Win32 console and scraping the console. This is a terrible idea because scraping gets you cooked output as a rectangular array of characters. The scraper has no way to know whether two lines were originally one overlong line that wrapped. If it guesses wrong, your outputs will get out of sync with your inputs. It's impossible to work around output buffering in this way if you care at all about the integrity of the output.
I guess the answer, if not the solution, can be found here
https://github.com/ikriv/ConsoleProxy/blob/master/src/Tools/Exec/readme.md
I guess, because I had a similar problem which I worked around, and I could not try this route because the tool is not available for Windows 2003, which is the OS I had to use (in a VM, for a legacy application).
I'd like to know if I guessed right.
I have a python script where I'm running an external archive command with subprocess.Popen(). Then I'm piping stdout to a sys write and a log file (see code below), because I need to print and log the output. The external command outputs progress like "Writing Frame 1 of 1,000", which I would like in my log.
So far I can either have it display/write in large blocks by including "stdout=subprocess.PIPE, stderr=subprocess.PIPE", but then the user thinks the script isn't working. Or, if I just have "stdout=subprocess.PIPE", the progress messages ("Writing Frame...") aren't in the log file.
Any thoughts?
My script looks something like this:
import subprocess
import sys

archive_log = open('archive.log', 'w')
archive_log.write('Archive Begin\n')
# archive command
process_archive = subprocess.Popen(["external_command", "-v", "-d"],
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE,
                                   universal_newlines=True)
for line in process_archive.stdout:
    sys.stdout.write(line)
    archive_log.write(line)
archive_log.write('Archive End\n')
archive_log.close()
It sounds like you're just trying to merge the subprocess's stdout and stderr into a single pipe. To do that, as the docs explain, you just pass stderr=subprocess.STDOUT.
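To illustrate, a sketch of the question's loop with the streams merged (only the stderr argument changes; assumes the question's imports and archive_log):
process_archive = subprocess.Popen(["external_command", "-v", "-d"],
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.STDOUT,  # merge stderr into stdout
                                   universal_newlines=True)
for line in process_archive.stdout:
    sys.stdout.write(line)
    archive_log.write(line)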
If, on the other hand, you want to read from both pipes independently, without blocking on either one of them, then you need some explicit asynchronicity.
One way to do this is to just create two threads, one of them blocking on proc.stdout, the other on proc.stderr, then just have the main thread join both threads. (You probably want a lock inside the for body in each thread; that's the only way to make sure that lines are written atomically and in the same order on stdout and in the file.)
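A minimal sketch of that two-thread approach (proc, archive_log, and the imports are assumed from the question):
import threading

lock = threading.Lock()

def pump(pipe, screen):
    # forward each line to the console and the log file atomically
    for line in pipe:
        with lock:
            screen.write(line)
            archive_log.write(line)

t_out = threading.Thread(target=pump, args=(proc.stdout, sys.stdout))
t_err = threading.Thread(target=pump, args=(proc.stderr, sys.stderr))
t_out.start()
t_err.start()
t_out.join()
t_err.join()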
Alternatively, many reactor-type async I/O libraries, including the stdlib's own asyncio (if you're on 3.4+) and major third-party libs like Twisted can be used to multiplex multiple subprocess pipes.
Finally, at least if you're on Unix, if you understand all the details, you may be able to do it with just select or selectors. (If this doesn't make you say, "Aha, I know how to do it, I just have a question about one of the details", ignore this idea and use one of the other two.)
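For the curious, a rough Unix-only sketch with selectors (an assumption-laden sketch, not a drop-in solution; it assumes line-oriented output and the same proc, archive_log, and sys as above):
import selectors

sel = selectors.DefaultSelector()
sel.register(proc.stdout, selectors.EVENT_READ, sys.stdout)
sel.register(proc.stderr, selectors.EVENT_READ, sys.stderr)
while sel.get_map():                      # loop until both pipes reach EOF
    for key, _ in sel.select():
        line = key.fileobj.readline()
        if not line:                      # EOF on this pipe
            sel.unregister(key.fileobj)
            continue
        key.data.write(line)              # key.data is the matching screen stream
        archive_log.write(line)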
It's clear that you really do need stderr here. From your question:
Or, if I just have "stdout=subprocess.PIPE", the progress messages ("Writing Frame...") aren't in the log file.
That means the subprocess is writing those messages to stderr, not stdout. So when you don't capture stderr, it just passes through to the terminal, rather than being captured and written to both the terminal and the log by your code.
And it's clear that you really do need them either merged or handled asynchronously:
I can either have it display/write in large blocks by including "stdout=subprocess.PIPE, stderr=subprocess.PIPE", but then the user thinks the script isn't working.
The reason the user thinks the script isn't working is that, although you haven't shown us the code that does this, clearly you're looping on stdout and then on stderr. This means the progress messages won't show up until stdout is done, so the user will think the script isn't working.
Is there a reason you aren't using check_call and the syslog module to do this?
You might also want to use with like this:
with open('archive.log', 'w') as archive_log:
    # ... do stuff with archive_log here ...
You gain the benefit of the file being closed automatically.
I am writing a test framework in Python for a command line application. The application will create directories, call other shell scripts in the current directory and will output on the Stdout.
I am trying to treat the {Python-SubProcess, CommandLine} combo as equivalent to {Selenium, Browser}. The first component plays something on the second and checks whether the output is as expected. I am facing the following problem:
The Popen construct takes a command and returns after that command is completed. What I want is a live handle to the process so I can run further commands + verifications and finally close the shell once done.
I am okay with writing some infrastructure code for achieving this, since we have a lot of command line applications that need testing like this.
Here is a sample code that I am running
p = subprocess.Popen("/bin/bash", cwd = test_dir)
p.communicate(input = "hostname") --> I expect the hostname to be printed out
p.communicate(input = "time") --> I expect current time to be printed out
but the process hangs, or maybe I am doing something wrong. Also, how do I "grab" the output of that subprocess so I can assert that something exists?
subprocess.Popen allows you to continue execution after starting a process. Popen objects expose wait(), poll() and many other methods for communicating with a child process while it is running. Isn't that what you need?
See Popen constructor and Popen objects description for details.
Here is a small example that runs a shell on Unix systems and executes a command:
from subprocess import Popen, PIPE

p = Popen(['/bin/sh'], stdout=PIPE, stderr=PIPE, stdin=PIPE,
          universal_newlines=True)  # text mode, so str can be passed
sout, serr = p.communicate('ls\n')
print('OUT:')
print(sout)
print('ERR:')
print(serr)
UPD: communicate() waits for process termination. If you do not need that, you may use the appropriate pipes directly, though that usually gives you rather ugly code.
UPD2: You updated the question. Yes, you cannot call communicate twice for a single process. You may either give all commands you need to execute in a single call to communicate and check the whole output, or work with pipes (Popen.stdin, Popen.stdout, Popen.stderr). If possible, I strongly recommend the first solution (using communicate).
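For example, the first solution could look like this (a sketch; the shell and the commands are placeholders):
from subprocess import Popen, PIPE

p = Popen(['/bin/sh'], stdin=PIPE, stdout=PIPE, stderr=PIPE,
          universal_newlines=True)
# send every command up front; communicate() closes stdin,
# then reads all output until the shell exits
sout, serr = p.communicate('hostname\ndate\n')
print(sout)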
Otherwise you will have to put a command into the input and wait some time for the desired output. What you need is a non-blocking read, to avoid hanging when there is nothing to read. Here is a recipe for emulating a non-blocking mode on pipes using threads. The code is ugly and strangely complicated for such a trivial purpose, but that's how it's done.
Another option could be using p.stdout.fileno() for a select.select() call, but that won't work on Windows (on Windows, select operates only on objects originating from WinSock). You may consider it if you are not on Windows.
Instead of using plain subprocess you might find the Python sh library very useful:
http://amoffat.github.com/sh/
Here is an example of how to build an asynchronous interaction loop with sh:
http://amoffat.github.com/sh/tutorials/2-interacting_with_processes.html
Another (old) library for solving this problem is pexpect:
http://www.noah.org/wiki/pexpect
I'd like to execute multiple commands in a standalone application launched from a python script, using pipes. The only way I could reliably pass the commands to the stdin of the program was using Popen.communicate, but it closes the program after the command gets executed. If I use Popen.stdin.write, then the command executes only 1 time out of 5 or so; it does not work reliably. What am I doing wrong?
To elaborate a bit :
I have an application that listens to stdin for commands and executes them line by line.
I'd like to be able to run the application and pass various commands to it, based on the user's interaction with a GUI.
This is a simple test example:
from subprocess import Popen, PIPE

command = "anApplication"
process = Popen(command, shell=False, stderr=None, stdin=PIPE)
process.stdin.write("doSomething1\n")
process.stdin.flush()
process.stdin.write("doSomething2\n")
process.stdin.flush()
I'd expect to see the result of both commands but I don't get any response. (If I execute one of the stdin.write lines multiple times, it occasionally works.)
And if I execute:
process.communicate("doSomething1")
it works perfectly but the application terminates.
If I understand your problem correctly, you want to interact (i.e. send commands and read the responses) with a console application.
If so, you may want to check an Expect-like library, like pexpect for Python: http://pexpect.sourceforge.net
It will make your life easier, because it will take care of synchronization, the problem that ddaa also describes. See also:
http://www.noah.org/wiki/Pexpect#Q:_Why_not_just_use_a_pipe_.28popen.28.29.29.3F
The real issue here is whether the application is buffering its output, and if it is whether there's anything you can do to stop it. Presumably when the user generates a command and clicks a button on your GUI you want to see the output from that command before you require the user to enter the next.
Unfortunately there's nothing you can do on the client side of subprocess.Popen to ensure that, once you have passed the application a command, the application flushes all of its output to the final destination. You can call flush() all you like on your end, but if the application doesn't do the same, and you can't make it, then you are doomed to looking for workarounds.
Your code in the question should work as is. If it doesn't, then either your actual code is different (e.g., you might use stdout=PIPE, which may change the child's buffering behavior), or it might indicate a bug in the child application itself, such as the read-ahead bug in Python 2, i.e., your input is sent correctly by the parent process but is stuck in the child's internal input buffer.
The following works on my Ubuntu machine:
#!/usr/bin/env python
import time
from subprocess import Popen, PIPE

LINE_BUFFERED = 1

# NOTE: the first argument is a list
p = Popen(['cat'], bufsize=LINE_BUFFERED, stdin=PIPE,
          universal_newlines=True)
with p.stdin:
    for cmd in ["doSomething1\n", "doSomethingElse\n"]:
        time.sleep(1)  # a delay to see that the commands appear one by one
        p.stdin.write(cmd)
        p.stdin.flush()  # explicit flush() to work around buffering bugs
                         # on some Python versions
rc = p.wait()
It sounds like your application is treating input from a pipe in a strange way, which means it won't see all of the commands you send until you close the pipe.
So the approach I would suggest is just to do this:
process.stdin.write("command1\n")
process.stdin.write("command2\n")
process.stdin.write("command3\n")
process.stdin.close()
It doesn't sound like your Python program is reading output from the application, so it shouldn't matter if you send the commands all at once like that.
I'm writing a GUI for using the oracle exp/imp commands and starting sql-scripts through sqlplus. The subprocess class makes it easy to launch the commands, but I need some additional functionality. I want to get rid of the command prompt when using my wxPython GUI, but I still need a way to show the output of the exp/imp commands.
I already tried these two methods:
command = "exp userid=user/pwd#nsn file=dump.dmp"
process = subprocess.Popen(command, stdout=subprocess.PIPE)
output = process.communicate()[0]
process = subprocess.Popen(command, stdout=subprocess.PIPE)
process.wait()
output = process.stdout.read()
Through one of these methods (I forgot which one) I really got the output of exp/imp, but only after the command finished, which is quite worthless to me, as I need frequent updates during these potentially long-running operations. And sqlplus made even more problems, as sqlplus usually wants some input when an error occurs. When this happens, Python waits for the process to finish but the user can't see the prompt, so you don't know how long to wait or what to do...
What I'd like to have is a wrapper that outputs everything I can see on the standard commandline. I want to log this to a file and show it inside a wxPython control.
I also tried the code from this page: http://code.activestate.com/recipes/440554/
but this can't read the output either.
The OutputWrapper from this answer doesn't work either: How can I capture all exceptions from a wxPython application?
Any help would be appreciated!
EDIT:
The subprocesses don't seem to flush their output. I already tried it with .readline().
My tool has to run on Windows and Unix, so pexpect is not a solution unless there is a Windows version. And using cx_Oracle would be extreme overkill, as I would have to rebuild the whole functionality of exp, imp and sqlplus.
The solution is to use a list for your command
command = ["exp", "userid=user/pwd#nsn", "file=dump.dmp"]
process = subprocess.Popen(command, stdout=subprocess.PIPE)
then you read process.stdout on a line-by-line basis:
line = process.stdout.readline()
That way you can update the GUI without waiting, provided the subprocess you are running (exp) flushes its output. It is possible that the output is buffered; then you won't see anything until the output buffer is full. If that is the case, then you are probably out of luck.
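A minimal sketch of that loop, assuming the output really is flushed line by line (update_gui is a hypothetical callback into your wxPython control, and exp.log is a placeholder path):
import subprocess

command = ["exp", "userid=user/pwd#nsn", "file=dump.dmp"]
process = subprocess.Popen(command, stdout=subprocess.PIPE,
                           universal_newlines=True)
with open('exp.log', 'w') as log:
    for line in process.stdout:  # yields each line as soon as the child flushes it
        log.write(line)
        update_gui(line)         # hypothetical: post the line to the wx control
process.wait()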
If you're on Linux, check out pexpect. It does exactly what you want.
If you need to work on Windows, maybe you should bite the bullet and use Python bindings to Oracle, such as cx_Oracle, instead of running CL stuff via subprocess.
Are these solutions able to capture stderr as well? I see you have the stdout= option above. How do you make sure to get stderr too? Another question: is there a way to use import logging / import logging.handlers to capture the command's stdout/stderr? It would be interesting to be able to use the logger with its built-in formatters, rotators, etc.
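A sketch of one way to do both at once: merge stderr into stdout and feed each line to a standard logger (the logger setup here is an assumption, not from the thread):
import logging
import subprocess

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(message)s')
log = logging.getLogger('exp')

process = subprocess.Popen(["exp", "userid=user/pwd#nsn", "file=dump.dmp"],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.STDOUT,  # capture stderr as well
                           universal_newlines=True)
for line in process.stdout:
    log.info(line.rstrip())      # one log record per line of output
process.wait()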
Try this:
import subprocess
command = "ping google.com"
process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE,
                           universal_newlines=True)
for line in process.stdout:  # the loop ends when the child closes its stdout
    print(line, end='')