Interaction of Python with PyPy via subprocess

I'm writing a pygtk application in Python 2.7.5 that requires some heavy mathematical calculations, so I need to do these calculations in an external pypy interpreter (which doesn't support gtk) for efficiency and plot the results in the main program as they are produced.
Since the output of the calculations is potentially infinite and I want to show it as it is produced, I cannot use subprocess.Popen.communicate(input).
I am able to do non-blocking reads of the output (via fcntl), but I am not able to effectively send the input (or something else is going wrong that I don't see). For example, the following code:
import subprocess
# start pypy subprocess
pypy = subprocess.Popen(['pypy', '-u'], bufsize=0, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# send input to pypy
pypy.stdin.write('import sys\nprint "hello"\nsys.stdout.flush()\n')
pypy.stdin.flush()
# read output from pypy
pypy.stdout.flush()
print pypy.stdout.readline()
This code gets stuck on the last line. What is weird to me is that if I substitute 'pypy' with 'cat' it works, and if I substitute the input/output lines with
print pypy.communicate(input='import sys\nprint "hello"\nsys.stdout.flush()\n')[0]
it also works (but that does not fit what I want to do). I thought it was a buffering problem, but I tried several ways of avoiding it (including writing to stderr and so on) with no luck. I also tried sending pypy the command to print in a while True loop, also with no luck (which makes me think it is not a problem with output buffering but maybe with input buffering).
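For reference, this is how I set up the non-blocking reads mentioned above; a minimal sketch, assuming the pypy Popen object from the code above (the 4096 chunk size is arbitrary):

import fcntl
import os

# switch the child's stdout to non-blocking mode
fd = pypy.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    chunk = os.read(fd, 4096)  # returns immediately
except OSError:
    chunk = ''                 # nothing available yet (EAGAIN)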

Related

Force a 3rd-party program to flush its output when called through subprocess

I am using a 3rd-party python module which is normally called through terminal commands. When called through terminal commands it has a verbose option which prints to terminal in real time.
I then have another python program which calls the 3rd-party program through subprocess. Unfortunately, when called through subprocess the terminal output no longer flushes, and is only returned on completion (the process takes many hours so I would like real-time progress).
I can see the source code of the 3rd-party module, and it does not flush its printing (e.g. print('example', flush=True)). Is there a way to force flushing from my module without editing the 3rd-party source code? Furthermore, can I send this output to a log file (again in real time)?
Thanks for any help.
The issue is most likely that many programs work differently when run interactively in a terminal than when run as part of a pipeline (i.e. called using subprocess). It has very little to do with Python itself, and more with the Unix/Linux architecture.
As you have noted, it is possible to force a program to flush stdout even when run in a pipeline, but it requires changes to the source code, by manually adding stdout.flush() calls.
Another way to print to screen is to "trick" the program into thinking it is working with an interactive terminal, using a so-called pseudo-terminal. There is a supporting module for this in the Python standard library, namely pty. Using that, you will not explicitly call subprocess.run (or Popen or ...). Instead you have to use the pty.spawn call:
import os
import pty

def prout(fd):
    # read the child's output from the pseudo-terminal master until EOF
    data = os.read(fd, 1024)
    while data:
        print(data.decode(), end="")
        data = os.read(fd, 1024)

pty.spawn("./callee.py", prout)
As can be seen, this requires a special function for handling stdout. Above, I just print it to the terminal, but of course it is possible to do other things with the text as well (such as logging or parsing it).
Another way to trick the program is to use an external program called unbuffer. unbuffer takes your command as its arguments and makes the program think (as with the pty call) that it is called from a terminal. This is arguably simpler, provided that unbuffer is installed or you are allowed to install it on your system (it is part of the expect package). All you have to do then is change your subprocess call to
p=subprocess.Popen(["unbuffer", "./callee.py"], stdout=subprocess.PIPE)
and then of course handle the output as usual, e.g. with some code like
for line in p.stdout:
    print(line.decode(), end="")
# collect anything that is left and wait for the process to exit
print(p.communicate()[0].decode(), end="")
or similar. But this last part I think you have already covered, as you seem to be doing something with the output.
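To cover the log-file part of the question: once the lines arrive in real time, writing them to a file as well is straightforward. A minimal sketch, assuming the unbuffer-based Popen call above (the progress.log filename is made up):

with open("progress.log", "ab", 0) as log:  # unbuffered binary log
    for line in p.stdout:
        print(line.decode(), end="")        # live terminal output
        log.write(line)                     # live log file
p.wait()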

What is the point of "stderr" in Python?

I'm a programming newbie and I'm trying to understand how stdin, stdout, and stderr work. As I understand it, stdout and stderr are two different places where we can direct output from programs. I guess I don't understand what's the point of having a second "stream" of output just for errors with stderr? Why not have errors on the regular stdout? What does having errors on stderr allow me to do (basically why is stderr useful)?
There are two "points" to supporting distinct stdout and stderr streams:
When you are writing applications that can be chained together (e.g. using pipelines) you don't want the "normal" output to get mixed up with errors, warnings, debug info and other "chit chat". Mixing them in the same stream would make life difficult for the next program in the chain / pipeline.
Example:
$ cat some-file | grep not
$ echo $?
If the cat command did not write its error messages to stderr, then the grep command would see a "file not found" message if "some-file" did not exist. It would then (incorrectly) match on the "not", and set the return code for the pipeline incorrectly. Constructing pipelines that coped with this sort of thing would be hellishly difficult.
Separate stdout and stderr streams have been supported in (at least) UNIX and UNIX-like systems since ... umm ... the 1970s, and they are part of the POSIX standard. If a new programming language's runtime libraries did not support this, it would be considered crippled; i.e. unsuitable for writing production-quality applications.
(In the history of programming languages, Python is still relatively new.)
However, nobody is forcing you to write your applications to use stderr for its intended purpose. (Well ... maybe your future co-workers will :-) )
In UNIX (and Linux, and other POSIX-compatible systems) programs are often combined with pipes, so that one program takes the output of another as its input. If normal output and error information were mixed, every program would need to know how to treat diagnostic info from its pipe data producer differently from normal data. In practice, that is impossible due to the large number of program combinations.
By writing error information to stderr, each program makes it possible for the user to get this info without needing to filter it out of the data stream intended to be read by the next program in the pipe.
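A minimal sketch of what this looks like from Python 3 (the messages are made up):

import sys

print("42")                                      # normal output: stdout
print("warning: input is odd", file=sys.stderr)  # diagnostics: stderr

Run as python script.py > results.txt, the file receives only the 42; the warning still reaches the terminal, and a downstream program in a pipeline never sees it.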

Advantages of subprocess over os.system

I have recently come across a few posts on Stack Overflow saying that subprocess is much better than os.system, but I am having difficulty finding the exact advantages.
Some examples of things I have run into:
https://docs.python.org/3/library/os.html#os.system
"The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function."
No idea in what ways it is more powerful though, I know it is easier in many ways to use subprocess but is it actually more powerful in some way?
Another example is:
https://stackoverflow.com/a/89243/3339122
The advantage of subprocess vs system is that it is more flexible (you can get the stdout, stderr, the "real" status code, better error handling, etc...).
That post has 2600+ votes. Again, I could not find any elaboration on what was meant by better error handling or a real status code.
Top comment on that post is:
Can't see why you'd use os.system even for quick/dirty/one-time. subprocess seems so much better.
Again, I understand it makes some things slightly easier, but I can hardly understand why, for example:
subprocess.call("netsh interface set interface \"Wi-Fi\" enable", shell=True)
is any better than
os.system("netsh interface set interface \"Wi-Fi\" enabled")
Can anyone explain some reasons it is so much better?
First of all, you are cutting out the middleman; subprocess.call by default avoids spawning a shell that examines your command, and directly spawns the requested process. This is important because, besides the efficiency side of the matter, you don't have much control over the default shell behavior, and it actually typically works against you regarding escaping.
In particular, do not do this:
subprocess.call('netsh interface set interface "Wi-Fi" enable')
since
If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.
Instead, you'll do:
subprocess.call(["netsh", "interface", "set", "interface", "Wi-Fi", "enable"])
Notice that here all the escaping nightmares are gone. subprocess handles escaping (if the OS wants arguments as a single string - such as Windows) or passes the separated arguments straight to the relevant syscall (execvp on UNIX).
Compare this with having to handle the escaping yourself, especially in a cross-platform way (cmd doesn't escape in the same way as POSIX sh), and especially with the shell in the middle messing with your stuff (trust me, you don't want to know what an unholy mess it is to provide 100% safe escaping for your command when calling cmd /k).
Also, when using subprocess without the shell in the middle, you are sure you are getting correct return codes. If there's a failure launching the process, you get a Python exception; if you get a return code, it's actually the return code of the launched program. With os.system you have no way to know whether the return code you get comes from the launched command (which is generally the case if the shell manages to launch it) or is some error from the shell (if it didn't manage to launch it).
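A small sketch of that difference on a POSIX system (no-such-program is, of course, made up):

import os
import subprocess

try:
    rc = subprocess.call(["no-such-program"])
except OSError:
    print("launch failed, reported as a Python exception")

# with os.system, the shell's "command not found" (exit status 127)
# is folded into the return value instead of being raised:
status = os.system("no-such-program")  # 32512 on Linux, i.e. 127 << 8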
Besides arguments splitting/escaping and return code, you have way better control over the launched process. Even with subprocess.call (which is the most basic utility function over subprocess functionalities) you can redirect stdin, stdout and stderr, possibly communicating with the launched process. check_call is similar and it avoids the risk of ignoring a failure exit code. check_output covers the common use case of check_call + capturing all the program output into a string variable.
Once you get past call & friends (which block, just as os.system does), there are way more powerful facilities; in particular, the Popen object allows you to work with the launched process asynchronously. You can start it, possibly talk with it through the redirected streams, check from time to time whether it is still running while doing other stuff, wait for it to complete, send signals to it and kill it - all stuff that is well beyond the mere synchronous "start process with default stdin/stdout/stderr through the shell and wait for it to finish" that os.system provides.
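For instance, a minimal sketch of that asynchronous style (sleep is just a stand-in for a long-running child):

import subprocess
import time

p = subprocess.Popen(["sleep", "5"])
while p.poll() is None:   # None means the child is still running
    print("doing other work while the child runs...")
    time.sleep(1)
print("child exited with code", p.returncode)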
So, to sum it up, with subprocess:
even at the most basic level (call & friends), you:
avoid escaping problems by passing a Python list of arguments;
avoid the shell messing with your command line;
either you have an exception or the true exit code of the process you launched; no confusion about program/shell exit code;
have the possibility to capture stdout and in general redirect the standard streams;
when you use Popen:
you aren't restricted to a synchronous interface, but can actually do other stuff while the subprocess runs;
you can control the subprocess (check if it is running, communicate with it, kill it).
Given that subprocess does way more than os.system can do - and in a safer, more flexible (if you need it) way - there's just no reason to use system instead.
There are many reasons, but the main reason is mentioned directly in the docstring:
>>> os.system.__doc__
'Execute the command in a subshell.'
For almost all cases where you need a subprocess, it is undesirable to spawn a subshell. It is unnecessary and wasteful; it adds an extra layer of complexity and introduces several new vulnerabilities and failure modes. Using the subprocess module cuts out the middleman.

Python: why is this print automatically flushing to screen without flush()?

I was reading about sys.stdout.flush() in Python, and I found this example a lot:
import sys, time
for i in range(10):
    print i,
    #sys.stdout.flush()
    time.sleep(1)
It is often said that it makes a difference with/without sys.stdout.flush().
However, when I called this script from the command prompt, it made no difference in my case: both versions printed the numbers to the screen in real time.
I used Python 2.7.5 on Windows.
Why is that happening?
P.S. In another example, which printed the output through subprocess.PIPE instead of to the screen directly, I did observe a difference in buffering.
What am I missing?
Using flush will generally guarantee that flushing is done, but assuming the reverse relationship is a logical fallacy, akin to:
Dogs are animals.
This is an animal.
Therefore this is a dog.
In other words, not using flush does not guarantee that flushing will not happen.
Interestingly enough, using Python 2.7.8 under Cygwin on Windows 8.1, I see the opposite behaviour: everything is batched up until the end. It may be different with Windows-native Python, and it may also be different from within IDLE.
See stdio buffering. In brief:
Default Buffering modes:
stdin is always buffered
stderr is always unbuffered
if stdout is a terminal, buffering is automatically set to line buffered; otherwise it is set to block buffered
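You can check which case applies from Python itself; a minimal sketch:

import sys

# True when stdout is a terminal (line buffered),
# False when it is redirected to a pipe or file (block buffered)
print(sys.stdout.isatty())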
For me, the example you gave prints:
In cmd:
  all the numbers upon exit in Cygwin's python
  one by one in Win32 python
In mintty:
  all upon exit for both
  one by one for both with the -u option
  sys.stdout.isatty() returns False!
So, it looks like msvcrt's stdout is unbuffered when it points to a terminal. A test with a simple C program shows the same behaviour.

Python: intercommunication between shells

A Python program is running in a shell; how can I communicate with it from another Python shell?
Let's say we have started a very long simulation which is going well in one shell, and we realise we need to capture the current values, say, in a numpy array.
How can we pause the simulation, capture the desired values and resume it, using another Python shell?
Here is a guide to working with subprocesses and capturing their output: http://pymotw.com/2/subprocess/
For instance, to capture output you can use check_output:
import subprocess
output = subprocess.check_output(['ls', '-1'])
print 'Have %d bytes in output' % len(output)
print output
Your best bet is to use IPython. It runs code in one or more kernels, which separate processes can then communicate with. All of the values in a running kernel can be shared, allowing you to do exactly what you're looking to do. Sites like Wakari offer free IPython Notebook instances so you can experiment.
