Subprocess communicate: order matters? - python

So I'm trying to effectively create a "branch" in a pipe from subprocess. The idea is to load a file with Popen into a pipe's stdout. Then, I can send that stdout to two (or more) stdins. This works, more or less. The problem comes when a process needs to see an EOF. As far as I can tell, that happens when you call communicate(None) on a subprocess. However, it also seems to depend on the order in which I spawned the processes I'm trying to send data to.
#!/usr/bin/env python
from subprocess import *
import shutil
import os
import shlex
inSub=Popen(shlex.split('cat in.txt'),stdout=PIPE)
print inSub.poll()
queue=[]
for i in range(0,3):
    temp = Popen(['cat'], stdin=PIPE)
    queue = queue + [temp]

while True:
    # print 'hi'
    buf = os.read(inSub.stdout.fileno(), 10000)
    if buf == '': break
    for proc in queue:
        proc.stdin.write(buf)
queue[1].communicate()
print queue[1].poll()
As long as I use queue[1], things hang at the communicate() line. But if I use queue[2], things don't hang. What's going on? It shouldn't depend on the order the subprocesses were created, should it?
(The in.txt file can really be anything, it doesn't matter.)

I can't see any reason why it would be different for any one of the processes. In any case, closing the stdin pipes will cause Python to send the EOF, ending the processes:
...
while True:
    # print 'hi'
    buf = os.read(inSub.stdout.fileno(), 10000)
    if buf == '': break
    for proc in queue:
        proc.stdin.write(buf)

for proc in queue:
    proc.stdin.close()

queue[1].communicate()
...

Related

python - force input() to read newline character from a different thread and forward execution

I want to figure out a way to programmatically avoid the builtin input() method stopping and waiting for user input.
Here is a snippet showing what I'm trying to do:
import sys
from threading import Thread
def kill_input():
    sys.stdout.write('\n')
    sys.stdout.flush()  # just to make sure the output is really written to stdout and not buffered
t = Thread(target=kill_input)
t.start()
foo = input('Press some key')
print('input() method has been bypassed')
Expected behavior: the script executes and terminates without waiting for the enter key to be pressed.
Instead, what happens is that the program stops and waits for the user to enter some input.
My thinking was that input() would read the newline character ('\n') printed to stdout by the other thread and then terminate, letting the final print statement execute. That thread should simulate a user pressing the enter key. I do not understand what's going on behind the scenes.
Maybe one other possible way is to close the stdin file descriptor from the non-main thread and catch the exception on the main one.
def kill_input():
    sys.stdin.close()
If possible, I would like to avoid this option and instead understand what's going on behind this logic, and find a way to force the main thread to read some mock characters from stdin.
Edit - using subprocess module
Based on these related posts I've had a look at the subprocess module. I thought this was a case where the Popen class would come in handy, so I modified my script to use pipes:
import sys
from subprocess import Popen, PIPE
from threading import Thread

def kill_input():
    proc = Popen(['python3', '-c', 'pass'], stdin=PIPE)
    proc.stdin.write('some text just to force parent proc to read'.encode())
    proc.stdin.flush()
    proc.stdin.close()
t = Thread(target=kill_input)
t.start()
sys.stdin.read()
print('input() method has been bypassed')
From my understanding, that should create a process with Popen (the command python3 -c 'pass' acts as a placeholder) whose stdin is a Unix pipe opened with the parent process.
What I'm expecting is that anything written to the child process's stdin goes straight to the stdin of the parent, to be read by the sys.stdin.read(). So the program shouldn't stop to wait for any user input and should terminate instantly. Unfortunately, that doesn't happen and the script still waits for me to press enter. I cannot really find a workaround for this.
[Python version: 3.8.5]
In your first piece of code, you were writing to sys.stdout, which by default won't affect the contents of sys.stdin. Also, by default, you can't directly write to sys.stdin, but you can change it to a different file. To do this, you can use os.pipe(), which returns a tuple of a file descriptor for reading from the new pipe and a file descriptor for writing to the pipe.
We can then use os.fdopen on these file descriptors, and assign sys.stdin to the read end of the pipe, while in another thread we write to the other end of the pipe.
import sys
import os
from threading import Thread
fake_stdin_read_fd, fake_stdin_write_fd = os.pipe()
fake_stdin_read = os.fdopen(fake_stdin_read_fd, 'r')
fake_stdin_write = os.fdopen(fake_stdin_write_fd, 'w')
sys.stdin = fake_stdin_read
def kill_input():
    fake_stdin_write.write('hello\n')
    fake_stdin_write.flush()
thread = Thread(target=kill_input)
thread.start()
input()
print('input() method has been bypassed!')

How to stream messages in a pipe from one process to another?

I have 2 python (2.7) processes.
The parent process needs to send rows of text to a child process, and the child process should process them as they come in (not wait for the parent process to finish).
I have this code which doesn't work:
# Sender
import subprocess
process = subprocess.Popen(['python', 'child.py'], bufsize=1, stdin=subprocess.PIPE)
try:
    while True:
        process.stdin.write(msg + '\n')  # 'msg' is a changing string
        # process.stdin.flush() <-- commented out since it doesn't help
except KeyboardInterrupt:
    process.stdin.close()
    process.wait()
And the child process:
# Receiver
import sys
for line in sys.stdin:
    print line.strip()
The problem is that the child process waits until the parent process exits before it prints out the messages.
What I'm trying to achieve is a child process that processes the messages as soon as they are written to the pipe.
Try adding a process.stdin.flush() after your process.stdin.write(). That way you actually send the string to the other process. What you're suffering from here is the stream buffering everything you write; it does this to be more efficient when actually sending the data to the other process. flush() forces the buffered data to be sent regardless of how full the buffer is.
I tried your code as such:
# Sender
import subprocess
process = subprocess.Popen(['python', 'child.py'], bufsize=1, stdin=subprocess.PIPE)
msg = "This is my message"
try:
    while True:
        process.stdin.write(msg + '\n')  # 'msg' is a changing string
        process.stdin.flush()  # This code works well for me regardless of the presence of this line
except KeyboardInterrupt:
    process.stdin.close()
    process.wait()
# Receiver
import sys
for line in sys.stdin:
    print line.strip()
With "works well" here i mean that i get "This is my message" printed as fast as the computer can perform. I'm trying this in Python 2.7.12 for the record.
The story of how buffering works for sys.stdin and sys.stdout has made me cry more than once. A similar problem is discussed in Setting smaller buffer size for sys.stdin?.
As to your specific problem, I suggest you change your child to use sys.stdin.readline() instead of iterating over sys.stdin. The former somewhat "buffers less" :)
while True:
    line = sys.stdin.readline()
    if not line: break
    print(line.strip())
In the parent, you'll likely either need to set bufsize=0 in your call to Popen (making your pipe completely unbuffered), or you'll need the process.stdin.flush() line, as Patrik suggests. I'd opt for the latter.
Tested on Python 2.7.14 on Windows 10 64bit.
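If you go the bufsize=0 route instead, a minimal sketch of the sender might look like this (assuming Python 2 and the same child.py receiver as above; every write() then reaches the child immediately, with no explicit flush()):
import subprocess

# bufsize=0 makes the pipe to the child completely unbuffered,
# so each write() is handed to the child right away.
process = subprocess.Popen(['python', 'child.py'], bufsize=0,
                           stdin=subprocess.PIPE)
try:
    while True:
        process.stdin.write('a changing message\n')
except KeyboardInterrupt:
    process.stdin.close()
    process.wait()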

How to get output from a python2 subprocess which runs a script using multiprocessing?

Here is my demo code. It contains two scripts.
The first is main.py; it calls print_line.py with the subprocess module.
The second is print_line.py; it prints something to stdout.
main.py
import subprocess
p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process
if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()
Previously, print_line.py was written with the threading and Queue modules, and everything was fine. But now, after changing to the multiprocessing module, main.py cannot get any output from print_line. I tried using Popen.communicate() to get the output, and setting preexec_fn=os.setsid in Popen(). Neither of them works.
So, here is my question:
Why can't subprocess get the output when print_line.py uses multiprocessing? Why is it OK with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed in my console. Why? How does this happen?
Is there any way to get the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason lies somewhere in the deep, murky waters of buffered IO. It seems that the output of a subprocess of a subprocess can get lost if not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()  # requires `import sys` at the top of print_line.py
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to pass the -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory of why this happens, it definitely has something to do with unflushed stdout being lost if the subprocess terminates, but I am not an expert in this.
It's not a multiprocessing issue, but it is a subprocess issue—or more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device" as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.[1]
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
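For illustration, this is roughly how the buffering argument to the built-in open() behaves (Python 3 signature; the file names here are just examples):
# Line-buffered text stream: the buffer is flushed every time a '\n' is written.
line_buffered = open('example.log', 'w', buffering=1)

# Unbuffered streams are only allowed in binary mode in Python 3.
unbuffered = open('example.bin', 'wb', buffering=0)

# The default (buffering=-1) picks line buffering for terminal devices
# and full buffering otherwise.
default = open('example.txt', 'w')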
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
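For example, one way to make a Python child run unbuffered without touching its code is to set PYTHONUNBUFFERED in the environment you pass to Popen. A sketch, reusing the print_line.py name from the question:
import os
import subprocess

# PYTHONUNBUFFERED=1 has roughly the same effect as running the child with -u:
# its stdout/stderr are not block-buffered even when connected to a pipe.
env = dict(os.environ, PYTHONUNBUFFERED='1')

p = subprocess.Popen(['python2', 'print_line.py'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     env=env,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if not line:
        break
    print(line.rstrip())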
[1] There are hooks that fire up very early, where you can fuss with this sort of thing. They are tricky to work with, though.

How to read the first line of a subprocess without buffers filling up in Python

From Python in Linux, I want to start a sub-process, wait until it prints one line on its standard out, then continue with the rest of my Python script. If I do:
from subprocess import *
proc = Popen(my_process, stdout=PIPE)
proc.stdout.readline()
# Now continue with the rest of my script
Will my process eventually block if it writes a lot to its stdout, because the pipe fills up?
Ideally, I'd like the rest of the output to go to the standard output of my script. Is there a way to change the stdout of the subprocess from PIPE to my standard output after it starts?
I'm guessing I'll have to spawn a separate thread just to read from my process's stdout and print to my own, but I'd like to avoid that if there's a simpler solution.
Stop the process?
proc.terminate()
After the readline
The readline method should not block even if the line is particularly large; it pulls data directly out of the pipe buffer and into userspace. If the data remained in the pipe buffer, there's a good chance it would block the spawned process, but I'm pretty sure Python must take the data out of the pipe buffer before it can examine it for the end of line.
Or you could just read characters off the pipe directly, this would prevent any possible buffer issues:
from subprocess import *
proc = Popen(my_process, stdout=PIPE)
c = ' '
while c != '\n':
    c = proc.stdout.read(1)
# Now complete the rest of the program....

How to print stdout before writing stdin using subprocess module in Python

I am writing a script in which an external system command may sometimes require user input. I am not able to handle that properly.
The example below shows this problem using the "cp" command. ("cp" is only used to demonstrate the problem; I am actually calling a different exe which may similarly prompt for a user response in some scenarios.) In this example there are two files present on disk, and when the user tries to copy file1 to file2, a confirmation message comes up.
proc = subprocess.Popen("cp -i a.txt b.txt", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,)
stdout_val, stderr_val = proc.communicate()
print stdout_val
b.txt?
proc.communicate("y")
Now in this example, if I only read stdout/stderr and print it, and later try to write "y" or "n" based on the user's input, I get an error that the channel is closed.
Can someone please help me achieve this behavior in Python, such that I can print stdout first, then take user input and write to stdin later on?
I found another solution (threading) from Non-blocking read on a subprocess.PIPE in python, not sure whether it would help. It does appear to print the question from the cp command; I have modified the code but am not sure how to write to stdin in the threading code.
import sys
from subprocess import PIPE, Popen
from threading import Thread
try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

p = Popen(['cp', '-i', 'a.txt', 'b.txt'], stdin=PIPE, stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.start()

try:
    line = q.get_nowait()
except Empty:
    print('no output yet')
else:
    pass
Popen.communicate will run the subprocess to completion, so you can't call it more than once. You could use the stdin and stdout attributes directly, although that's risky as you could deadlock if the process uses block buffering or the buffers fill up:
stdout_val = proc.stdout.readline()
print stdout_val
proc.stdin.write('y\n')
As there is a risk of deadlock and because this may not work if the process uses block buffering, you would do well to consider using the pexpect package instead.
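If you do go the pexpect route, a minimal sketch might look like the following; note that the 'overwrite' pattern is an assumption about cp's prompt text and would need to be adjusted for whatever your actual command prints:
import pexpect

child = pexpect.spawn('cp -i a.txt b.txt')
# 'overwrite' is an assumed fragment of cp's confirmation prompt.
child.expect('overwrite')
print child.before + child.after           # show what the command printed so far
answer = raw_input('Your answer (y/n): ')  # get the user's response
child.sendline(answer)                     # send it to the command
child.expect(pexpect.EOF)                  # wait for the command to finish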
I don't have a technical answer to this question. More of just a solution. It has something to do with the way the process waits for the input, and once you communicate with the process, a None input is enough to close the process.
For your cp example, what you can do is check the return code immediately with proc.poll(). If the return value is None, you might assume it is trying to wait for input and can ask your user a question. You can then pass the response to the process via proc.communicate(response). It will then pass the value and proceed with the process.
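A rough sketch of that idea, using the same cp -i call as in the question (Python 2; the prompt text shown to the user is made up for illustration):
import subprocess

proc = subprocess.Popen("cp -i a.txt b.txt", shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)

if proc.poll() is None:
    # No return code yet, so assume the command is waiting for a response.
    response = raw_input("cp is asking for confirmation (y/n): ")
    stdout_val, _ = proc.communicate(response + '\n')
else:
    stdout_val, _ = proc.communicate()

print stdout_val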
Maybe someone else can chime in with a more technical reason why an initial communicate with a None value closes the process.
