Blocking writing to stdout - python

I'm writing a Python script that will use subprocesses. The main idea is to have one parent script that runs specialised child scripts, which e.g. run other programs or do some work of their own. There are pipes between the parent script and the subprocesses. I use them to check whether a subprocess is still responding, by sending some characters on a regular basis and checking the response. The problem is that when the subprocess prints anything to the screen (i.e. writes to stdout or stderr), the pipes are broken and everything crashes. So my main question is whether it is possible to block writing to std* in the subprocess, so that only legitimate responses written to the pipe are possible. I have already tried Stop a function from writing to stdout, but without any success.
Other ideas for communication between the parent and the subprocesses are also welcome (except file-based pipes). However, subprocesses must be used.

I strongly believe that you do not just have to accept "that when the subprocess prints anything on screen (i.e. writes to stdout or stderr), the pipes are broken and everything crashes". You can solve this problem. Then you do not need to "block" the subprocesses from writing to standard streams.
Make proper use of all the power of the subprocess module. First of all, connect a subprocess.PIPE to each of the standard streams of a subprocess:
import subprocess

p = subprocess.Popen(
    [executable, arg1, arg2],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
Run the subprocess and interact with it through those pipes:
stdout, stderr = p.communicate(input=b"command")
If communicate() is not flexible enough (if you need to monitor several subprocesses at the same time and/or if the stdin data to a certain subprocess depends on its output in response to a previous command) you can directly interact with the p.stdout, p.stderr, p.stdin attributes. In this case, you will likely have to build your own monitoring loop and make use of p.poll() and/or p.returncode. Controlling the subprocesses can also be realized via p.send_signal().
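For illustration, a minimal sketch of such a monitoring loop might look like this (the ./worker program and the ping/pong protocol are made up for the example; a real loop would also have to drain stderr so it cannot fill up):

import subprocess

p = subprocess.Popen(
    ["./worker"],                # hypothetical child program
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)

while p.poll() is None:          # None means the child is still running
    p.stdin.write(b"ping\n")     # send the liveness probe
    p.stdin.flush()              # without this the probe may sit in a buffer
    reply = p.stdout.readline()  # blocks until the child answers (or exits)
    if reply.strip() != b"pong":
        p.terminate()            # child stopped responding properly
        break

p.wait()
print("child exited with", p.returncode)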

You can pass a function to subprocess.Popen that is executed prior to executing the requested program:
import os

def close_std():
    os.close(0)  # stdin
    os.close(1)  # stdout
    os.close(2)  # stderr

p = subprocess.Popen(cmd, preexec_fn=close_std)
Note the use of low-level os.close; closing sys.std* will only have effect in the forked Python process. Also, be aware that if your underlying programs are Python scripts, they may die due to an exception when they try to write to closed file descriptors.
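If closing the descriptors outright is too brutal for your children, a gentler variant (available since Python 3.3) is to redirect their output to subprocess.DEVNULL, so writes succeed but go nowhere:

import subprocess

# Discard the child's output instead of closing its descriptors;
# writes in the child then succeed silently rather than raising.
p = subprocess.Popen(cmd,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL)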

Related

Python subprocess: Print to stdin, read stdout until newline, repeat

I am looking to interface with an interactive command line application using Python 3.5. The idea is that I start the process at the beginning of the Python script and leave it open. In a loop, I print a file path, followed by a line return, to stdin, wait for a quarter second or so as it processes, and read from stdout until it reaches a newline.
This is quite similar to the communicate feature of subprocess, but I am waiting for a line return instead of waiting for the process to terminate. Anyone aware of a relatively simple way to do this?
Edit: it would be preferable to use the standard library to do this, rather than third-party libraries such as pexpect, if possible.
You can use subprocess.Popen for this.
Something like this:
proc = subprocess.Popen(['my-command'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
Now proc.stdin and proc.stdout are your ends of pipes that send data to the subprocess stdin and read from the subprocess stdout.
Since you're only interested in reading newline-terminated lines, you can probably get around most of the problems caused by buffering. Buffering is one of the big gotchas when using subprocess to communicate with interactive processes: if the subprocess doesn't flush a newline-terminated line, you might never see any data on proc.stdout, and vice versa when writing to proc.stdin, the child might not see your input if you don't end it with a newline. You can turn buffering off, but that's not so simple, and not platform-independent.
Another problem you might have to solve is that you can't determine whether the subprocess is waiting for input or has sent you output except by writing to and reading from the pipes. So you might need to start a second thread, so that you can wait for output on proc.stdout and write to proc.stdin at the same time without running into a deadlock where both processes are blocking on pipe I/O (or, if you're on a Unix that supports select with pipe file handles, use select to determine which pipes are ready to be written to or read from). A sketch of the simple single-threaded case follows.
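(A minimal sketch; my-command is hypothetical, and it assumes the child really answers every input line with exactly one output line.)

import subprocess

proc = subprocess.Popen(
    ['my-command'],                  # hypothetical interactive program
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,         # talk in text, line by line
    bufsize=1)                       # line-buffered on our side

for path in ['a.txt', 'b.txt']:
    proc.stdin.write(path + '\n')
    proc.stdin.flush()               # push the line through our buffer
    result = proc.stdout.readline()  # blocks until the child sends a line
    print('processed:', result.strip())

proc.stdin.close()
proc.wait()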
This sounds like a job for an event loop. The subprocess module starts to show its strain under complex tasks.
I've done this with Twisted, by subclassing the following:
twisted.internet.endpoints.ProcessEndpoint
twisted.protocols.basic.LineOnlyReceiver
Most documentation for Twisted uses sockets as endpoints, but it's not hard to adjust the code for processes.
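As a rough, untested sketch of the shape this takes (the /bin/cat executable is arbitrary; check the Twisted documentation for the exact endpoint API before relying on this):

from twisted.internet import reactor
from twisted.internet.endpoints import ProcessEndpoint, connectProtocol
from twisted.protocols.basic import LineOnlyReceiver

class ChildTalker(LineOnlyReceiver):
    delimiter = b'\n'            # child processes end lines with \n, not \r\n

    def connectionMade(self):
        self.sendLine(b'hello')  # goes to the child's stdin

    def lineReceived(self, line):
        print('child said:', line)
        reactor.stop()

endpoint = ProcessEndpoint(reactor, b'/bin/cat', (b'cat',))
connectProtocol(endpoint, ChildTalker())
reactor.run()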

What is the difference if I don't use stdout=subprocess.PIPE in subprocess.Popen()?

I recently noticed that Python's subprocess.Popen() has an argument:
stdout=None (default)
I also saw people using stdout=subprocess.PIPE.
What is the difference? Which one should I use?
Another question: why does the wait() function sometimes fail to wait until the process is really done? I used:
a = sp.Popen(....,shell=True)
a.wait()
a2 = sp.Popen(...,shell=True)
a2.wait()
sometimes the a2 command is executed before the command a is done.
stdout=None means that the stdout handle of the process is inherited directly from the parent; in simpler terms, the output gets printed to the console (the same applies to stderr).
Then you have the option stderr=STDOUT, which redirects stderr to stdout, so that the output of stdout and stderr is forwarded to the same file handle.
If you set stdout=PIPE, Python will redirect the data from the process to a new file handle, which can be accessed through p.stdout (p being a Popen object). You would use this to capture the output of the process; likewise, stdin=PIPE lets you send data (continuously, if need be) to the process's stdin.
But mostly you want to use p.communicate, which allows you to send data to the process once (if you need to) and returns the complete stderr and stdout once the process has completed!
One more interesting fact: you can pass any file object to stdin/stderr/stdout, e.g. a file opened with open (the object has to provide a fileno() method).
As for your wait problem: this should not happen! As a workaround you could use p.poll() to check whether the process has exited. What is the return value of the wait call?
Furthermore, you should avoid shell=True, especially if you pass user input as the first argument; this could be used by a malicious user to exploit your program! It also launches a shell process, which means additional overhead. Of course there is the 1% of cases where you actually need shell=True; I can't judge this from your minimalistic example.
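For example, instead of interpolating user input into a shell string, pass the arguments as a list (a small sketch; the command and filename are made up):

import subprocess

filename = 'data.txt'  # imagine this came from user input

# Risky: the value is interpolated into a shell command line.
# subprocess.Popen('grep pattern ' + filename, shell=True)

# Safer: pass the arguments as a list; no shell is involved.
p = subprocess.Popen(['grep', 'pattern', filename])
p.wait()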
stdout=None means that the subprocess prints to whatever place your script prints.
stdout=PIPE means that the subprocess's stdout is redirected to a pipe that you should read, e.g. using process.communicate() to read everything at once, or using the process.stdout object to read via file/iterator interfaces.
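A short sketch of the difference, using echo on a Unix system:

import subprocess

# stdout=None: the child's output goes straight to our console.
subprocess.Popen(['echo', 'hello']).wait()

# stdout=PIPE: the output is captured and handed back to us instead.
p = subprocess.Popen(['echo', 'hello'], stdout=subprocess.PIPE)
out, _ = p.communicate()
print('captured:', out)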

Use python subprocess module like a command line simulator

I am writing a test framework in Python for a command line application. The application will create directories, call other shell scripts in the current directory, and write output to stdout.
I am trying to treat the {Python-SubProcess, CommandLine} combo as equivalent to {Selenium, Browser}: the first component plays something on the second and checks that the output is as expected. I am facing the following problem:
The Popen constructor takes a command and returns after that command has completed. What I want is a live handle to the process, so I can run further commands and verifications and finally close the shell once done.
I am okay with writing some infrastructure code for achieving this, since we have a lot of command line applications that need testing like this.
Here is a sample code that I am running
p = subprocess.Popen("/bin/bash", cwd = test_dir)
p.communicate(input = "hostname") --> I expect the hostname to be printed out
p.communicate(input = "time") --> I expect current time to be printed out
but the process hangs or may be I am doing something wrong. Also how do I "grab" the output of that sub process so I can assert that something exists?
subprocess.Popen allows you to continue execution after starting a process. A Popen object exposes wait(), poll() and many other methods for communicating with a child process while it is running. Isn't that what you need?
See Popen constructor and Popen objects description for details.
Here is a small example that runs Bash on Unix systems and executes a command:
from subprocess import Popen, PIPE

p = Popen(['/bin/sh'], stdout=PIPE, stderr=PIPE, stdin=PIPE,
          universal_newlines=True)
sout, serr = p.communicate('ls\n')
print('OUT:')
print(sout)
print('ERR:')
print(serr)
UPD: communicate() waits for process termination. If you do not need that, you may use the appropriate pipes directly, though that usually gives you rather ugly code.
UPD2: You updated the question. Yes, you cannot call communicate twice for a single process. You may either give all commands you need to execute in a single call to communicate and check the whole output, or work with pipes (Popen.stdin, Popen.stdout, Popen.stderr). If possible, I strongly recommend the first solution (using communicate).
Otherwise you will have to put a command on its input and wait some time for the desired output. What you need is a non-blocking read, to avoid hanging when there is nothing to read. Here is a recipe for emulating non-blocking reads on pipes using threads. The code is ugly and strangely complicated for such a trivial purpose, but that's how it's done.
Another option could be using p.stdout.fileno() in a select.select() call, but that won't work on Windows (on Windows, select operates only on objects originating from WinSock). You may consider it if you are not on Windows.
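The gist of the thread-based recipe is roughly the following (a sketch for Python 3; the bash child and the hostname command are just examples):

import queue
import subprocess
import threading

def enqueue_output(pipe, q):
    # Runs in a background thread: it may block on the pipe,
    # so the main thread never has to.
    for line in iter(pipe.readline, b''):
        q.put(line)
    pipe.close()

p = subprocess.Popen(['/bin/bash'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
q = queue.Queue()
threading.Thread(target=enqueue_output, args=(p.stdout, q),
                 daemon=True).start()

p.stdin.write(b'hostname\n')
p.stdin.flush()
try:
    line = q.get(timeout=1.0)  # non-blocking from the caller's point of view
    print('got:', line)
except queue.Empty:
    print('no output yet')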
Instead of using plain subprocess you might find the Python sh library very useful:
http://amoffat.github.com/sh/
Here is an example of how to build an asynchronous interaction loop with sh:
http://amoffat.github.com/sh/tutorials/2-interacting_with_processes.html
Another (old) library for solving this problem is pexpect:
http://www.noah.org/wiki/pexpect
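With pexpect, the interaction from the question might look roughly like this (a sketch; the prompt pattern depends on your shell configuration):

import pexpect

child = pexpect.spawn('/bin/bash')
child.expect(r'\$')        # wait for the shell prompt
child.sendline('hostname')
child.expect(r'\$')        # wait for the next prompt
print(child.before)        # everything printed before the matched prompt
child.sendline('exit')
child.close()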

Repeatedly write to STDIN and read STDOUT of a Subprocess without closing it

I am trying to employ a Subprocess in Python for keeping an external script open in a Server-like fashion. The external script first loads a model. Once this is done, it accepts requests via STDIN and returns processed strings to STDOUT.
So far, I've tried
tokenizer = subprocess.Popen([tokenizer_path, '-l', lang_prefix], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
However, I cannot use
tokenizer.stdin.write(input_string + '\n')
out = tokenizer.stdout.readline()
in order to repeatedly process input_strings by means of the subprocess – out will just be empty, no matter if I use stdout.read() or stdout.readline(). However, it works when I close the stdin with tokenizer.stdin.close() before reading STDOUT, but this closes the subprocess, which is not what I want as I would have to reload the whole external script again before sending another request.
Is there any way to use a subprocess in a server-like fashion in python without closing and re-opening it?
Thanks to this answer, I found out that a slave handle must be used in order to communicate properly with the subprocess:
import os
import pty
import subprocess

master, slave = pty.openpty()
tokenizer = subprocess.Popen(script, shell=True,
                             stdin=subprocess.PIPE, stdout=slave)
stdin_handle = tokenizer.stdin
stdout_handle = os.fdopen(master)
Now I can communicate with the subprocess without closing it, via
stdin_handle.write(input)
stdout_handle.readline()  # gets the processed input
Your external script probably buffers its output, so you can only read it in the parent when the child's buffer is flushed (which the child must do itself). One way to make it flush its buffers is to close its input, because the script then terminates properly and flushes its buffers in the process.
If you have control over the external program (i.e. if you can patch it), insert a flush call after the output is produced.
Otherwise, programs can sometimes be made not to buffer their output by attaching them to a pseudo-TTY (many programs, including those using the C standard library's stdio, assume that no buffering is wanted when their output is going to a TTY). But this is a bit tricky.
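If the external script happens to be a Python program, there is a simpler lever: run it with the -u flag (or with PYTHONUNBUFFERED=1 in its environment), which disables its output buffering entirely. A sketch, with a made-up script name:

import subprocess
import sys

# -u turns off the child's stdout/stderr buffering, so each line
# becomes visible to the parent as soon as it is written.
tokenizer = subprocess.Popen(
    [sys.executable, '-u', 'tokenizer_server.py'],  # hypothetical script
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE)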

question about pexpect in python

I tried both pexpect and subprocess.Popen from Python to call an external long-running background process (this process uses sockets to communicate with external applications), with the following details.
Option 1:
subprocess.Popen(launchcmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
This works fine. I do not need to do anything else. However, because I have to get the output immediately, I chose pexpect to avoid the pipe buffering problem.
Option 2:
obj = pexpect.spawn(launchcmd, timeout=None)
After launching the external process, I use a separate thread to call readline on the output of the launched process obj, and everything is OK.
Option 3:
obj = pexpect.spawn(launchcmd, timeout=None)
After launching the external process, I did nothing further, i.e. just left it there. Using the ps -e command I can find the launched process, but it seems blocked and cannot communicate over its sockets with other applications.
OK. To be more specific, I put some sample code to formulate my question.
import subprocess
import pexpect
import os

t = 1
while True:
    if t == 1:
        background_process = "./XXX.out"
        launchcmd = [background_process]
        # --- option 3 --------
        p = pexpect.spawn(launchcmd, timeout=None)  # process launched, problem with sockets
        # --- option 1 --------
        p = subprocess.Popen(launchcmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)  # process launched, everything fine
        t = 0
Could anyone tell me what's wrong with the third option? And if it is due to the fact that I did not use a separate thread to consume the output, why does the first option work with subprocess.Popen? I suspect there is something wrong with how pexpect launches a process that uses sockets, but I am not sure, especially considering that option 2 works well.
I think that you are making this too complicated.
Yes, it is a good idea to use a pty instead of a pipe to communicate with the background process, because most applications recognize tty/pty devices and switch to unbuffered (or at least line-buffered) output.
But why pexpect? Just use Python's pty module. First call openpty to get some file handles, and then use Popen to spawn the process. Example code can be found in the following question (the answer with the green checkmark): Python Run a daemon sub-process & read stdout
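A condensed sketch of that approach, assuming a Unix system and reusing the made-up ./XXX.out program from the question:

import os
import pty
import subprocess

master, slave = pty.openpty()          # pseudo-terminal pair
p = subprocess.Popen(['./XXX.out'],
                     stdout=slave,     # child thinks it writes to a terminal
                     stderr=slave)
os.close(slave)                        # only the child needs the slave end

reader = os.fdopen(master, 'rb')
print(reader.readline())               # output arrives without block buffering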
