Can someone explain this Python code for me? - python

This code creates a pty (pseudo-terminal) in Python. I have commented the parts that I do not understand:
import os, select

pid, master_fd = os.forkpty()  # I guess this function returns the next available pid and fd
args = ['/bin/bash']
if pid == 0:  # I have no idea what this if statement does, however I have noticed that it gets executed twice
    os.execlp('/bin/bash', *args)

while 1:
    r, w, e = select.select([master_fd, 0], [], [])
    for i in r:
        if i == master_fd:
            data = os.read(master_fd, 1024)
            """Why can't I do something like
            f = open('/dev/pts/' + master_fd, 'r')
            data = f.read()"""
            os.write(1, data)  # What does 1 mean???
        elif i == 0:
            data = os.read(0, 1024)
            while data != '':
                n = os.write(master_fd, data)
                data = data[n:]

In Unix-like operating systems, the way to start a new process is a fork, accomplished with fork() or one of its several cousins. It duplicates the calling process, leaving you with two copies of the same program running.
The only difference is the return value from fork(). The parent process gets the PID of the child, and the child gets 0. What usually happens is that you have an if statement like the one that you're asking about.
If the returned PID is 0 then you're "in the child". In this case the child is supposed to be a shell, so bash is executed.
Else, you're "in the parent". In this case the parent makes sure that the child's open file descriptors (stdin, stdout, stderr and any open files) do what they're supposed to.
If you ever take an OS class or just try to write your own shell you'll be following this pattern a lot.
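To make that pattern concrete, here is a minimal, self-contained sketch of fork/exec/wait (the /bin/echo command is just a stand-in for whatever program you actually want to run):

import os

pid = os.fork()
if pid == 0:
    # Child: replace this process image with a new program.
    os.execlp('/bin/echo', 'echo', 'hello from the child')
else:
    # Parent: wait for the child and report how it exited.
    _, status = os.waitpid(pid, 0)
    print('child %d exited with status %d' % (pid, os.WEXITSTATUS(status)))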
As for your other question, what does the 1 mean in os.write(1, data)?
The file descriptors are integer offsets into an array inside the kernel:
0 is stdin
1 is stdout
2 is stderr
i.e. that line just writes to stdout.
When you want to set up pipes or redirections then you just change the meaning of those three file descriptors (look up dup2()).
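For example, here is a small sketch of redirecting a child's stdout to a file with dup2() before exec (the file name is made up for illustration):

import os

fd = os.open('output.txt', os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
pid = os.fork()
if pid == 0:
    os.dup2(fd, 1)                    # fd 1 (stdout) now points at the file
    os.close(fd)
    os.execlp('/bin/ls', 'ls', '-l')  # ls writes into output.txt instead of the terminal
else:
    os.close(fd)
    os.waitpid(pid, 0)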

Related

How to get output from a python2 subprocess which runs a script using multiprocessing?

Here is my demo code. It contains two scripts.
The first is main.py; it calls print_line.py using the subprocess module.
The second is print_line.py; it prints something to stdout.
main.py
import subprocess

p = subprocess.Popen('python2 print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)

while True:
    line = p.stdout.readline()
    if line:
        print(line)
    else:
        break
print_line.py
from multiprocessing import Process, JoinableQueue, current_process

if __name__ == '__main__':
    task_q = JoinableQueue()

    def do_task():
        while True:
            task = task_q.get()
            pid = current_process().pid
            print 'pid: {}, task: {}'.format(pid, task)
            task_q.task_done()

    for _ in range(10):
        p = Process(target=do_task)
        p.daemon = True
        p.start()

    for i in range(100):
        task_q.put(i)

    task_q.join()
Previously, print_line.py was written with the threading and Queue modules and everything was fine. But now, after changing to the multiprocessing module, main.py cannot get any output from print_line. I tried using Popen.communicate() to get the output and setting preexec_fn=os.setsid in Popen(). Neither of them works.
So, here is my question:
Why can't subprocess get the output with multiprocessing? Why is it OK with threading?
If I comment out stdout=subprocess.PIPE and stderr=subprocess.PIPE, the output is printed in my console. Why? How does this happen?
Is there any chance to get the output from print_line.py?
Curious.
In theory this should work as it is, but it does not. The reason being somewhere in the deep, murky waters of buffered IO. It seems that the output of a subprocess of a subprocess can get lost if not flushed.
You have two workarounds:
One is to use flush() in your print_line.py:
import sys  # needed for sys.stdout.flush()

def do_task():
    while True:
        task = task_q.get()
        pid = current_process().pid
        print 'pid: {}, task: {}'.format(pid, task)
        sys.stdout.flush()
        task_q.task_done()
This will fix the issue as you will flush your stdout as soon as you have written something to it.
Another option is to pass the -u flag to Python in your main.py:
p = subprocess.Popen('python2 -u print_line.py',
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE,
                     close_fds=True,
                     shell=True,
                     universal_newlines=True)
-u will force stdin and stdout to be completely unbuffered in print_line.py, and children of print_line.py will then inherit this behaviour.
These are workarounds to the problem. If you are interested in the theory of why this happens, it definitely has something to do with unflushed stdout being lost when the subprocess terminates, but I am not an expert on this.
It's not a multiprocessing issue, but it is a subprocess issue, or more precisely, it has to do with standard I/O and buffering, as in Hannu's answer. The trick is that by default, the output of any process, whether in Python or not, is line buffered if the output device is a "terminal device", as determined by os.isatty(stream.fileno()):
>>> import sys
>>> sys.stdout.fileno()
1
>>> import os
>>> os.isatty(1)
True
There is a shortcut available to you once the stream is open:
>>> sys.stdout.isatty()
True
but the os.isatty() operation is the more fundamental one. That is, internally, Python inspects the file descriptor first using os.isatty(fd), then chooses the stream's buffering based on the result (and/or arguments and/or the function used to open the stream). The sys.stdout stream is opened early on during Python's startup, before you generally have much control.1
When you call open or codecs.open or otherwise do your own operation to open a file, you can specify the buffering via one of the optional arguments. The default for open is the system default, which is line buffering if isatty(), otherwise fully buffered. Curiously, the default for codecs.open is line buffered.
A line buffered stream gets an automatic flush() applied when you write a newline to it.
An unbuffered stream writes each byte to its output immediately. This is very inefficient in general. A fully buffered stream writes its output when the buffer gets sufficiently full—the definition of "sufficient" here tends to be pretty variable, anything from 1024 (1k) to 1048576 (1 MB)—or when explicitly directed.
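As a small illustration of the buffering argument to open() (Python 3 semantics; the file names are arbitrary):

# buffering=1: line buffered (text mode only); flushed on every newline.
line_buffered = open('lines.log', 'w', buffering=1)

# buffering=0: unbuffered (binary mode only); every write goes straight out.
unbuffered = open('raw.log', 'wb', buffering=0)

# Default: line buffered if the stream is a tty, otherwise fully buffered
# with an implementation-chosen buffer size.
default = open('bulk.log', 'w')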
When you run something as a process, it's the process itself that decides how to do any buffering. Your own Python code, reading from the process, cannot control it. But if you know something—or a lot—about the processes that you will run, you can set up their environment so that they run line-buffered, or even unbuffered. (Or, as in your case, since you write that code, you can write it to do what you want.)
1There are hooks that fire up very early, where you can fuss with this sort of thing. They are tricky to work with, though.
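For a Python child specifically, one way to "set up their environment" as described above is the PYTHONUNBUFFERED environment variable, which has the same effect as running the child with -u. A sketch along the lines of the question's main.py, reading line by line in the parent:

import os, subprocess, sys

env = dict(os.environ, PYTHONUNBUFFERED='1')   # same effect as python -u
proc = subprocess.Popen(['python2', 'print_line.py'],
                        stdout=subprocess.PIPE,
                        universal_newlines=True,
                        env=env)
while True:
    line = proc.stdout.readline()
    if not line:
        break
    sys.stdout.write(line)
proc.wait()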

How can I write to a child process stdin obtained with fork() in python?

Well, I need to write to the stdin of a child process obtained with fork(). I also need to keep the file descriptor(?) of stdin in the parent process for repeated writes to the child. I use os.pipe() to get the descriptors, so please keep it this way.
pid = fork()
if pid == 0:
    os.write(sys.stdin.fileno(), "sample")  # <-- isn't this the child's stdin?
    os.execv(..)
    .
    .
The child process is a bash script, something like:
#!/bin/bash
/usr/bin/mplayer -slave "$1" <&0
Apparently, I want to control mplayer with Python using its slave mode, which receives commands from its stdin.
fork() is essential because of the structure of the program, so please no alternatives using communicate, etc.
What you're trying to do is impossible, because it doesn't make sense. The child has the same stdin as the parent, not a new one that you can write to. POSIX guarantees that after fork:
The child process shall have its own copy of the parent's file descriptors. Each of the child's file descriptors shall refer to the same open file description with the corresponding file descriptor of the parent.
Meanwhile, you're trying to write to the child's stdin from within the child. That makes even less sense. What do you expect writing to your own stdin to do?
Of course the child can write to its own stdout, which will be the same as the parent's stdout. But I suspect that's not what you want. What you want is for the parent to write to the child's stdin.
If so, you have to create a pipe before forking, then replace the child's stdin with the read side of that pipe (usually by using dup2), then write to the write side of that pipe. The CPython subprocess implementation is great example code for how to do that without relying on higher-level functions (even though it's in C, rather than in Python).
Something like this:
import os, sys

pr, pw = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(pw)
    sys.stdin.close()
    os.dup2(pr, 0)
    os.execv(...)
else:
    os.close(pr)
    os.write(pw, "sample")

Popen does not give output immediately when available

I am trying to read from both stdout and stderr from a Popen and print them out. The command I am running with Popen is the following
#!/bin/bash
i=10
while (( i > 0 )); do
    sleep 1s
    echo heyo-$i
    i="$((i-1))"
done
echo 'to error' >&2
When I run this in the shell, I get one line of output, then a one-second pause, then another line, and so on. However, I am unable to recreate this using Python. I am starting two threads, one each to read from stdout and stderr, that put the lines read into a Queue, and another thread that takes items from this queue and prints them out. But with this, I see that all the output gets printed out at once, after the subprocess ends. I want the lines to be printed as and when they are echoed.
Here's my python code:
import subprocess as sp
from threading import Thread
from queue import Queue, Empty  # Python 2: from Queue import Queue, Empty

# The `randoms` script is in the $PATH
proc = sp.Popen(['randoms'], stdout=sp.PIPE, stderr=sp.PIPE, bufsize=0)

q = Queue()

def stream_watcher(stream, name=None):
    """Take lines from the stream and put them in the q"""
    for line in stream:
        q.put((name, line))
    if not stream.closed:
        stream.close()

Thread(target=stream_watcher, args=(proc.stdout, 'out')).start()
Thread(target=stream_watcher, args=(proc.stderr, 'err')).start()

def displayer():
    """Take lines from the q and add them to the display"""
    while True:
        try:
            name, line = q.get(True, 1)
        except Empty:
            if proc.poll() is not None:
                break
        else:
            # Print the line without its trailing newline character
            print(name.upper(), '->', line[:-1])
            q.task_done()
    print('-*- FINISHED -*-')

Thread(target=displayer).start()
Any ideas? What am I missing here?
Only stderr is unbuffered, not stdout. What you want cannot be done using the shell built-ins alone. The buffering behavior is defined in the stdio(3) C library, which applies line buffering only when the output is to a terminal. When the output is to a pipe, it is fully (block) buffered, not line-buffered, and so the data is not transferred to the kernel, and thence to the other end of the pipe, until the stdio buffer fills.
Moreover, the shell has no access to libc’s buffer-controlling functions, such as setbuf(3) and friends. The only possible solution within the shell is to launch your co-process on a pseudo-tty, and pty management is a complex topic. It is much easier to rewrite the equivalent shell script in a language that does grant access to low-level buffering features for output streams than to arrange to run something over a pty.
However, if you call /bin/echo instead of the shell built-in echo, you may find it more to your liking. This works because now the whole line is flushed when the newly launched /bin/echo process terminates each time. This is hardly an efficient use of system resources, but may be an efficient use of your own.
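If you do decide to go the pty route despite the complexity, a rough sketch using Python's pty module might look like this (the randoms script name is taken from the question; error handling is minimal):

import os, pty, subprocess

master, slave = pty.openpty()              # the child sees a terminal on its stdout/stderr
proc = subprocess.Popen(['randoms'], stdout=slave, stderr=slave, close_fds=True)
os.close(slave)                            # the parent only needs the master side

try:
    while True:
        data = os.read(master, 1024)       # arrives line by line as it is produced
        if not data:
            break
        os.write(1, data)
except OSError:                            # typically EIO once the child closes the pty
    pass
finally:
    os.close(master)
    proc.wait()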
IIRC, setting shell=True on Popen should do it.

Python `tee` stdout of child process

Is there a way in Python to do the equivalent of the UNIX command line tee? I'm doing a typical fork/exec pattern, and I'd like the stdout from the child to appear in both a log file and on the stdout of the parent simultaneously without requiring any buffering.
In this python code for instance, the stdout of the child ends up in the log file, but not in the stdout of the parent.
pid = os.fork()
logFile = open(path, "w")
if pid == 0:
    os.dup2(logFile.fileno(), 1)
    os.execv(cmd)
edit: I do not wish to use the subprocess module. I'm doing some complicated stuff with the child process that requires me to call fork manually.
Here is a working solution that does not use the subprocess module. You could, however, use subprocess for the tee process while still using the exec* family of functions for your custom child (just use stdin=subprocess.PIPE and then duplicate the descriptor onto your stdout).
import os, time, sys

pr, pw = os.pipe()

pid = os.fork()
if pid == 0:
    os.close(pw)
    os.dup2(pr, sys.stdin.fileno())
    os.close(pr)
    os.execv('/usr/bin/tee', ['tee', 'log.txt'])
else:
    os.close(pr)
    os.dup2(pw, sys.stdout.fileno())
    os.close(pw)
    pid2 = os.fork()
    if pid2 == 0:
        # Replace with your custom process call
        os.execv('/usr/bin/yes', ['yes'])
    else:
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            pass
Note that the tee command, internally, does the same thing as Ben suggested in his answer: reading input and looping over output file descriptors while writing to them. It may be more efficient because of the optimized implementation and because it's written in C, but you have the overhead of the different pipes (don't know for sure which solution is more efficient, but in my opinion, reassigning a custom file-like object to stdout is a more elegant solution).
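If you prefer the subprocess variant mentioned at the start of this answer, a hedged sketch might look like this (/bin/ls is just a stand-in for the real child command):

import os, sys, subprocess

# Spawn tee with a pipe on its stdin; it writes to both the terminal and log.txt.
tee = subprocess.Popen(['tee', 'log.txt'], stdin=subprocess.PIPE)

# Point our stdout at tee's stdin, then fork/exec the real command as before.
os.dup2(tee.stdin.fileno(), sys.stdout.fileno())

pid = os.fork()
if pid == 0:
    os.execv('/bin/ls', ['ls', '-l'])   # stand-in for your custom process
else:
    os.waitpid(pid, 0)
    tee.stdin.close()
    tee.wait()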
Some more resources:
How do I duplicate sys.stdout to a log file in python?
http://www.shallowsky.com/blog/programming/python-tee.html
In the following, SOMEPATH is the path to the child executable, in a format suitable for subprocess.Popen (see its docs).
import sys, subprocess

f = open('logfile.txt', 'w')
proc = subprocess.Popen(SOMEPATH, stdout=subprocess.PIPE)

while True:
    out = proc.stdout.read(1)
    if out == '' and proc.poll() is not None:
        break
    if out != '':
        # CR workaround since chars are read one by one, and Windows interprets
        # both CR and LF as end of lines. Linux only has LF
        if out != '\r':
            f.write(out)
        sys.stdout.write(out)
        sys.stdout.flush()
Would an approach like this do what you want?
import sys
class Log(object):
    def __init__(self, filename, mode, buffering):
        self.filename = filename
        self.mode = mode
        self.handle = open(filename, mode, buffering)

    def write(self, thing):
        self.handle.write(thing)
        sys.stdout.write(thing)
You'd probably need to implement more of the file interface for this to be really useful (and I've left out properly defaulting mode and buffering, if you want it). You could then do all your writes in the child process to an instance of Log. Or, if you wanted to be really magic, and you're sure you implement enough of the file interface that things won't fall over and die, you could potentially assign sys.stdout to be an instance of this class. Then I think any means of writing to stdout, including print, will go via the log class.
Edit to add: Obviously if you assign to sys.stdout you will have to do something else in the write method to echo the output to stdout!! I think you could use sys.__stdout__ for that.
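For instance, a sketch of the "really magic" variant of the class above: the write method echoes to sys.__stdout__ (the real stdout) so that replacing sys.stdout does not feed back into itself. The file name here is arbitrary:

import sys

class TeeLog(object):
    def __init__(self, filename, mode='w', buffering=1):
        self.handle = open(filename, mode, buffering)

    def write(self, thing):
        self.handle.write(thing)
        sys.__stdout__.write(thing)    # the original stdout, not the replacement

    def flush(self):
        self.handle.flush()
        sys.__stdout__.flush()

sys.stdout = TeeLog('child.log')
print('this line goes to both child.log and the real stdout')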
Oh, you. I had a decent answer all prettied-up before I saw the last line of your example: execv(). Well, poop. The original idea was replacing each child process' stdout with an instance of this blog post's tee class, and split the stream into the original stdout, and the log file:
http://www.shallowsky.com/blog/programming/python-tee.html
But, since you're using execv(), the child process' tee instance would just get clobbered, so that won't work.
Unfortunately for you, there is no "out of the box" solution to your problem that I can find. The closest thing would be to spawn the actual tee program in a subprocess; if you wanted to be more cross-platform, you could fork a simple Python substitute.
First thing to know when coding a tee substitute: tee really is a simple program. In all the true C implementations I've seen, it's not much more complicated than this:
while((character = read()) != EOF) {
    /* Write to all of the output streams in here, then write to stdout. */
}
Unfortunately, you can't just join two streams together. That would be really useful (so that the input of one stream would automatically be forwarded out of another), but we've no such luxury without coding it ourselves. So, Eli and I are going to have very similar answers. The difference is that, in my answer, the Python 'tee' is going to run in a separate process, via a pipe; that way, the parent thread is still useful!
(Remember: copy the blog post's tee class, too.)
import os, sys

# Open it for writing in binary mode.
logFile = open("bar", "bw")

# Verbose names, but I wanted to get the point across.
# These are file descriptors, i.e. integers.
parentSideOfPipe, childSideOfPipe = os.pipe()

# 'Tee' subprocess.
pid = os.fork()
if pid == 0:
    while True:
        char = os.read(parentSideOfPipe, 1)
        logFile.write(char)
        os.write(1, char)

# Actual command
pid = os.fork()
if pid == 0:
    os.dup2(childSideOfPipe, 1)
    os.execv(cmd)
I'm sorry if that's not what you wanted, but it's the best solution I can find.
Good luck with the rest of your project!
The first obvious answer is to fork an actual tee process but that is probably not ideal.
The tee code (from coreutils) merely reads each line and writes to each file in turn (effectively buffering).
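A minimal Python rendition of that loop might look like this (the chunk size and log file name are arbitrary):

import os

def tee(in_fd, out_fds, chunk=4096):
    """Copy everything from in_fd to every fd in out_fds until EOF."""
    while True:
        data = os.read(in_fd, chunk)
        if not data:                   # a zero-length read signals EOF
            break
        for fd in out_fds:
            os.write(fd, data)

# Example: copy our own stdin to stdout and to a log file.
log_fd = os.open('copy.log', os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
tee(0, [1, log_fd])
os.close(log_fd)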

Shed some light on working with pipes and subprocesses in Python?

I'm wrestling with the concepts behind subprocesses and pipes, and working with them in a Python context. If anybody could shed some light on these questions it would really help me out.
Say I have a pipeline set up as follows
createText.py | processText.py | cat
processText.py is receiving data through stdin, but how is this implemented? How does it know that no more data will be coming and that it should exit? My guess is that it could look for an EOF and terminate based on that, but what if createText.py never sends one? Would that be considered an error on createText.py's part?
Say parent.py starts a child subprocess (child.py) and calls wait() to wait for the child to complete. If parent is capturing child's stdout and stderr as pipes, is it still safe to read from them after child has terminated? Or are the pipes (and data in them) destroyed when one end terminates?
The general problem that I want to solve is to write a python script that calls rsync several times with the Popen class. I want my program to wait until rsync has completed, then I want to check the return status to see if it exited correctly. If it didn't, I want to read the child's stderr to see what the error was. Here is what I have so far
import subprocess
from subprocess import Popen

# makes the rsync call. Will block until the child
# process is finished. Returns the exit code for rsync
def performRsync(src, dest):
    print "Pushing " + src + " to " + dest
    child = Popen(['rsync', '-av', src, dest], shell=False,
                  stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    child.wait()
    ## check for success or failure
    ## 0 is a successful exit code here
    if not child.returncode:
        return True
    else:  # ballz
        stout, sterr = child.communicate()
        print "ERR pushing " + src + ". " + sterr
        return False
Update: I also came across this problem. Consider these two simple files:
# createText.py
import time

for x in range(1000):
    print "creating line " + str(x)
    time.sleep(1)

# processText.py
import sys

while True:
    line = sys.stdin.readline()
    if not line:
        break
    print "I modified " + line
Why does processText.py in this case not start printing as it gets data from stdin? Does a pipe collect some amount of buffered data before it passes it along?
This assumes a UNIXish/POSIXish environment.
EOF in a pipeline is signaled by no more data to read, that is, read() returns a length of 0. This normally occurs when the left-hand process exits and closes its stdout. Since you can't read from a pipe whose other end is closed the read in processText indicates EOF.
If createText were to not exit, and thus never close its output, it would be a non-ending program, which in a pipeline is a Bad Thing. Even outside a pipeline, a program that never ends is usually incorrect (odd cases like yes(1) excepted).
You can read from a pipe as long as you don't get EOF or an IOError(errno.EPIPE) indication which would also indicate there is nothing left to read.
I've not tested your code, does it do something unexpected?
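To make the EOF behaviour concrete, here is a small sketch of the reading side of a pipe; the zero-length read is the only signal you get:

import os

def read_until_eof(fd):
    """Read from fd until the write end is closed (read() returns b'')."""
    chunks = []
    while True:
        data = os.read(fd, 4096)
        if not data:                  # zero-length read == EOF
            break
        chunks.append(data)
    return b''.join(chunks)

r, w = os.pipe()
os.write(w, b'hello from the writer\n')
os.close(w)                           # closing the write end produces the EOF
print(read_until_eof(r))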
