When should I use subprocess.Popen instead of os.popen?

Both seem to execute a subprocess and create a pipe for its input/output; subprocess is just newer.
My question is: is there anything subprocess.Popen can do that os.popen cannot, such that we needed the new subprocess module?
Why did Python create a new module instead of enhancing os.popen?

Short answer: Never use os.popen, always use subprocess!
As you can see from the Python 2.7 os.popen docs:
Deprecated since version 2.6: This function is obsolete. Use the subprocess module. Check especially the Replacing Older Functions with the subprocess Module section.
There were various limitations and problems with the old os.popen family of functions. And as the docs mention, the pre-2.6 versions weren't even reliable on Windows.
The motivation behind subprocess is explained in PEP 324 -- subprocess - New process module:
Motivation
Starting new processes is a common task in any programming language, and very common in a high-level language like Python. Good support for this task is needed, because:
- Inappropriate functions for starting processes could mean a security risk: If the program is started through the shell, and the arguments contain shell meta characters, the result can be disastrous. [1]
- It makes Python an even better replacement language for over-complicated shell scripts.
- Currently, Python has a large number of different functions for process creation. This makes it hard for developers to choose.
The subprocess module provides the following enhancements over previous functions:
- One "unified" module provides all functionality from previous functions.
- Cross-process exceptions: Exceptions happening in the child before the new process has started to execute are re-raised in the parent. This means that it's easy to handle exec() failures, for example. With popen2, for example, it's impossible to detect if the execution failed.
- A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
- No implicit call of /bin/sh. This means that there is no need for escaping dangerous shell meta characters.
- All combinations of file descriptor redirection is possible. For example, the "python-dialog" [2] needs to spawn a process and redirect stderr, but not stdout. This is not possible with current functions, without using temporary files.
- With the subprocess module, it's possible to control if all open file descriptors should be closed before the new program is executed.
- Support for connecting several subprocesses (shell "pipe").
- Universal newline support.
- A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop. This means that many Python applications contain race conditions. A communicate() method in the standard library solves this problem.
Please see the PEP link for the rationale and further details.
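To make a couple of those enhancements concrete, here is a minimal sketch (the command and the filename are purely illustrative): arguments are passed as a list, so no shell is involved and no metacharacter escaping is needed, and stderr can be captured while stdout is left untouched.
import subprocess
# The filename below is treated as one literal argument; no shell ever
# sees it, so the embedded "; rm -rf /" is harmless.
proc = subprocess.Popen(["ls", "-l", "some file; rm -rf /"],
                        stderr=subprocess.PIPE)  # capture stderr, leave stdout alone
_, err = proc.communicate()  # deadlock-free reading of the captured stream
print(err.decode())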
Aside from the safety & reliability issues, IMHO, the old os.popen family was cumbersome and confusing. It was almost impossible to use correctly without closely referring to the docs while you were coding. In comparison, subprocess is a godsend, although it's still wise to refer to the docs while using it. ;)
Occasionally, one sees people recommending os.popen rather than subprocess.Popen in Python 2.7 (e.g. Python subprocess vs os.popen overhead) because it's faster. Sure, it's faster, but only because it skips various things that are vital to guarantee it works safely!
FWIW, os.popen itself still exists in Python 3; however, it's now safely implemented via subprocess.Popen, so you might as well just use subprocess.Popen directly yourself. The other members of the os.popen family no longer exist in Python 3. The os.spawn family of functions still exists in Python 3, but the docs recommend the more powerful facilities provided by the subprocess module instead.
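As a minimal sketch of the replacement the docs suggest (the command here is just an example):
import subprocess
# Old style: output = os.popen("uname -a").read()
# New style: no shell involved, and failures raise an exception (Python 2.7+):
output = subprocess.check_output(["uname", "-a"]).decode()
print(output)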

Related

Is there a difference between subprocess.call() and subprocess.Popen.communicate()?

On Python 2.7, I used subprocess.call(my_cmd) when running a shell command.
However, I needed to check outputs of those commands, so I replaced them with subprocess.Popen.communicate().
import subprocess
import logging
command_line_process = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
process_stdout, process_stderr = command_line_process.communicate()
logging.info(process_stdout)
logging.info(process_stderr)  # note: this is None, since stderr was merged into stdout
Is there a difference between the two methods other than that the latter lets me capture the output?
I wonder whether it'd be okay to replace all subprocess.call() with subprocess.Popen.communicate().
As the documentation already tells you, you want to avoid Popen whenever you can.
The subprocess.check_output() function in Python 2.7 lets you retrieve the output from the subprocess, but otherwise works basically like check_call() (which in turn differs from call only in that it will raise an exception if the subprocess fails, which is something you usually want and need).
The case for Popen is that it enables you to build new high-level functions like these if you need to (as does subprocess.run() in Python 3.5+, which is rather more versatile and basically subsumes the functionality of all three of the above). They all use Popen under the hood, but it is finicky and requires you to take care of several details related to managing the subprocess object if you use it directly; those higher-level functions already do those things for you.
A common case where you do need Popen is when you want the subprocess to run in parallel with your main Python script. There is no simple library function which does this for you.
If you check inside the source code, which you have on your system, you'll see that subprocess.call is implemented by calling subprocess.Popen.wait. Virtually everything in that module is actually implemented in terms of subprocess.Popen.
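To illustrate the division of labour, here is a small sketch (the commands are just examples, and do_other_work is a hypothetical placeholder):
import subprocess
subprocess.check_call(["true"])                # raises CalledProcessError on failure
out = subprocess.check_output(["echo", "hi"])  # additionally captures stdout
# Popen is for when the child should run in parallel with your script:
proc = subprocess.Popen(["sleep", "5"])
do_other_work()  # hypothetical: the main script keeps running meanwhile
proc.wait()      # reap the child when you are done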

Send `exec()` output to another stream without redirecting stdout

I'm writing a Sublime Text plugin that provides multiple Python shells accessible via UNIX sockets. Each shell should act as a standalone REPL running on its own thread. (It is undesirable for these shells to have their own processes; sharing a single process is an absolute requirement.)
The builtin exec() function prints output to stdout if the code was compiled with mode='single' and is an expression that does not evaluate to None. I need to send this output over the socket instead.
I know that this can be done by patching stdout. However, this would not work here because multiple consoles may be running in multiple threads (plus the built-in console).
My ideas are as follows:
- Try to compile() the input with mode='eval', eval() it, and print the result (if not None). If it won't compile, try mode='exec' instead of mode='single'.
- For each console's thread, keep the output stream in thread-local storage. Patch sys.stdout with an object that checks for these streams before calling the "regular" stdout.
- Somehow provide a patched sys to each console.
These don't seem like great ideas. Is there a better one?
If you're dead set on having a single process, then depending on how willing you are to dive into obscure C-level features of the CPython implementation, you might try looking into subinterpreters. Those are, as far as I know, the highest level of isolation CPython provides in a single process, and they allow things like separate sys.stdout objects for separate subinterpreters.
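If a full subinterpreter setup is too heavy, the thread-local-stream idea from the question could look roughly like this sketch (all names are illustrative):
import sys
import threading
class ThreadLocalStdout:
    # Route writes to a per-thread stream if one is registered,
    # falling back to the real stdout otherwise.
    def __init__(self, fallback):
        self._fallback = fallback
        self._local = threading.local()
    def register(self, stream):
        self._local.stream = stream  # e.g. a socket's makefile() wrapper
    def write(self, data):
        stream = getattr(self._local, "stream", None) or self._fallback
        stream.write(data)
    def flush(self):
        stream = getattr(self._local, "stream", None) or self._fallback
        stream.flush()
sys.stdout = ThreadLocalStdout(sys.stdout)  # patch once, at plugin load time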

Python & C/C++ multithreading: run several threads executing Python in the background of C

I have a really specific need:
I want to create a Python console with a Qt widget, and to be able to have several independent interpreters.
Let me try to explain what my problems are and what I have tried, ordered from the approach I'd most like to get working to the one I could fall back on.
The first point is that all functions in the Python C API (PyRun_[...], PyEval_[...], ...) need the GIL locked, which forbids any concurrent interpretation of code from C (or I'd be really glad to be wrong!!! :D).
Therefore, I tried another approach than the "usual way": I made a loop in Python that calls read() on my special file and evals the result. This function (implemented as a built extension) blocks until there is data to read. (Actually, it's currently a busy while loop in C code rather than a pthread-based condition.)
Then, with PyRun_SimpleString(), I launch my loop in another thread. This is where the problem is: my read function, in addition to blocking the current thread (which is totally normal), blocks the whole interpreter, and PyRun_SimpleString() doesn't return...
Finally, I have one last idea, which risks being relatively slow: a dedicated thread in C++ which runs the interpreter, with everything managed from Python for input/output. This could be a loop which creates jobs whenever a console needs to execute a command. It doesn't seem very hard to do, but let me ask you first: is there a way to make the above possibilities work, is there another way I didn't think about, or is my last idea the best?
One alternative is to just reuse code from IPython and its Qt console. This assumes that by independent interpreters you mean they won't share memory. IPython runs the Python interpreter in multiple processes and communicates with them over TCP or Unix domain sockets with the help of ZeroMQ.
Also, from your question I'm not sure if you're aware of the common blocking I/O idiom in Python C extensions:
Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS
This releases the GIL so that other threads can execute Python code while your function is blocking. See Python/C API Reference Manual: Thread State and the Global Interpreter Lock.
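As a toy sketch of that multi-process design (this is not what IPython's kernels actually do in full; the socket path and the eval-based loop are purely illustrative):
import zmq  # pyzmq, the messaging library IPython builds on
context = zmq.Context()
sock = context.socket(zmq.REP)
sock.bind("ipc:///tmp/interp0")  # one Unix domain socket per interpreter process
while True:
    code = sock.recv_string()      # source text sent by the Qt console
    try:
        result = repr(eval(code))  # toy evaluation; a real kernel does much more
    except Exception as exc:
        result = "error: %s" % exc
    sock.send_string(result)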
If your main requirement is to have several interpreters independent from each other, you'd probably be better off using fork() and exec() than multithreading.
This way each of the interpreters would live in its own address space without disturbing the others.

C++ to Python communication. Multiple IO streams?

A Python program opens a new process running the C++ program and reads the process's stdout.
No problem so far.
But is it possible to have multiple streams like this for communication? I can get two if I misuse stderr too, but not more. An easy way to hack this would be to use temporary files. Is there something more elegant that does not need a detour through the filesystem?
PS: *nix specific solutions are welcome too
On Unix systems, the usual way to open a subprocess is with fork(), which leaves any open file descriptors (small integers representing open files or sockets) available in both the child and the parent, followed by exec(), which also allows the new executable to use the file descriptors that were open in the old process. This functionality is preserved in the subprocess.Popen() call (adjustable with the close_fds argument). Thus, what you probably want to do is use os.pipe() to create pairs of file descriptors to communicate over, then use Popen() to launch the other process, passing the fd numbers returned by the previous pipe() calls as arguments so it knows which fds to use.
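A minimal sketch of that approach (Python 3.2+ for pass_fds; "./cpp_program" is a hypothetical child that writes to the fd it is given in argv):
import os
import subprocess
read_fd, write_fd = os.pipe()  # extra channel alongside the usual stdout pipe
proc = subprocess.Popen(["./cpp_program", str(write_fd)],
                        stdout=subprocess.PIPE,
                        pass_fds=(write_fd,))  # keep the write end open in the child
os.close(write_fd)  # the parent only reads; close its copy of the write end
with os.fdopen(read_fd) as extra_stream:
    for line in extra_stream:  # second channel, independent of stdout
        print("extra:", line.rstrip())
print("stdout:", proc.stdout.read().decode())
proc.wait()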
It sounds like what you want is to use sockets for communication. Both languages let you open raw sockets, but you might want to check out the ZeroMQ project as well, which has some additional advantages for message passing. Check out their hello world examples in C++ and Python.
Assuming a Windows machine, you could try using the clipboard to exchange information between the Python processes and C++.
Assign some unique process ID followed by your information and write it to the clipboard on the Python side, then just parse the string on the C++ side.
It's akin to using temporary files, but all done in memory. The drawback is that you cannot use the clipboard for any other application.
Hope it helps.
With traditional, synchronous programming and the standard Python library, what you're asking is difficult to accomplish. If, instead, you consider using an asynchronous programming model and the Twisted library, it's a piece of cake. The Using Processes HOWTO describes how to easily communicate with as many processes as you like. Admittedly, there's a bit of a learning curve to Twisted but it's well worth the effort.

Script from stdin use case

Taking e.g. Python as a good example of a modern scripting language, it has the option of reading a program (as opposed to input data for the program) from stdin. The REPL is the obvious use case where stdin is a terminal, but it's also designed to handle the scenario where it's not a terminal.
What use cases are there for reading the program itself from noninteractive stdin?
(The reason I ask is that I'm working on a scripting language myself, and wondering whether this is an important feature to provide, and if so, what the specifics need to look like.)
If you want to execute code generated by some tool, it could be useful to be able to pipe the generated code into your interpreter/compiler.
Simply support it ;) Checking if stdin is a tty or not is not hard anyway.
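A minimal sketch of that check (run_repl is a hypothetical placeholder for your interactive loop):
import sys
if sys.stdin.isatty():
    run_repl()  # interactive: start the read-eval-print loop
else:
    source = sys.stdin.read()  # non-interactive: stdin is the program itself
    exec(compile(source, "<stdin>", "exec"))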
