On Python 2.7, I used subprocess.call(my_cmd) to run shell commands.
However, I needed to check the output of those commands, so I replaced the calls with subprocess.Popen and communicate():
import logging
import subprocess

# stderr is redirected into stdout here, so process_stderr comes back as None
command_line_process = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
process_stdout, process_stderr = command_line_process.communicate()
logging.info(process_stdout)
logging.info(process_stderr)
Is there any difference between the two approaches other than that the latter lets me capture the output?
I wonder whether it would be okay to replace every subprocess.call() with subprocess.Popen plus communicate().
As the documentation already tells you, you want to avoid Popen whenever you can.
The subprocess.check_output() function in Python 2.7 lets you retrieve the output of the subprocess, but otherwise works basically like check_call() (which in turn differs from call() only in that it raises an exception if the subprocess fails, which is something you usually want and need).
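For example (the ls command here is just an illustration):

import subprocess

# check_call: runs the command and raises CalledProcessError on a non-zero exit
subprocess.check_call(['ls', '-l'])

# check_output: same error handling, but also returns the command's stdout
output = subprocess.check_output(['ls', '-l'])
print(output)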
The case for Popen is that it enables you to build new high-level functions like these if you need to (another such function is subprocess.run() in Python 3.5+, which is rather more versatile and basically subsumes the functionality of all three of the above). They all use Popen under the hood, but Popen is finicky and requires you to take care of several details related to managing the subprocess object if you use it directly; those higher-level functions already do that for you.
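A rough sketch of the Python 3.5+ equivalent (stdout=subprocess.PIPE rather than the newer capture_output keyword keeps it compatible with 3.5 and 3.6):

import subprocess

# run() returns a CompletedProcess; check=True raises on a non-zero exit,
# stdout=subprocess.PIPE captures the command's output
result = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE, check=True)
print(result.returncode)
print(result.stdout)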
A common case where you do need Popen is when you want the subprocess to run in parallel with your main Python script. There is no simple library function which does this for you.
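A minimal sketch of that case (sleep 5 stands in for any long-running command):

import subprocess

proc = subprocess.Popen(['sleep', '5'])   # starts the child and returns immediately
# ... the main script keeps doing other work while the child runs ...
proc.wait()                               # reap the child and get its exit status when done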
If you check inside the source code, which you have on your system, you'll see that subprocess.call is implemented by calling subprocess.Popen.wait. Virtually everything in that module is actually implemented in terms of subprocess.Popen.
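In the Python 2.7 source it boils down to this (docstring omitted):

# inside subprocess.py, where Popen is in scope
def call(*popenargs, **kwargs):
    return Popen(*popenargs, **kwargs).wait()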
I stumbled across shlex.quote(). I have read explanations of what shlex.quote() does, but I am wondering when to use it and when not. As I understand it, I should use it when I run a command through a shell from Python, for example with os.system(), subprocess.call(), or even pexpect. Should I also use it when I check for a file with os.path.isfile(), and are there other situations where it is the proper tool?
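To make the question concrete, here is a sketch of the distinction I am asking about (the filename is a made-up hostile example):

import os
import shlex
import subprocess

filename = "my file; rm -rf ~"            # hypothetical hostile input

# Quoting seems necessary when the string is interpolated into a shell command line:
subprocess.call("ls -l %s" % shlex.quote(filename), shell=True)

# It seems unnecessary when no shell is involved and the string is passed as-is:
os.path.isfile(filename)
subprocess.call(["ls", "-l", filename])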
I'm writing a Sublime Text plugin that provides multiple Python shells accessible via UNIX sockets. Each shell should act as a standalone REPL running on its own thread. (It is undesirable for these shells to have their own processes; sharing a single process is an absolute requirement.)
The builtin exec() function prints output to stdout if the code was compiled with mode='single' and is an expression that does not evaluate to None. I need to send this output over the socket instead.
I know that this can be done by patching stdout. However, this would not work here because multiple consoles may be running in multiple threads (plus the built-in console).
My ideas are as follows:
Try to compile() the input with mode='eval', eval() it, and print the result (if it is not None). If it won't compile as an expression, fall back to mode='exec' instead of mode='single'.
For each console's thread, keep the output stream in thread-local storage. Patch sys.stdout with an object that checks for these streams before calling "regular" stdout.
Somehow provide a patched sys to each console.
These don't seem like great ideas. Is there a better one?
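To illustrate idea 2, here is a rough sketch of what I have in mind (ThreadAwareStdout and console_stream are my own names, not an existing API):

import sys
import threading

_local = threading.local()

class ThreadAwareStdout(object):
    def __init__(self, real_stdout):
        self.real_stdout = real_stdout
    def write(self, text):
        stream = getattr(_local, 'console_stream', None)
        if stream is not None:
            stream.write(text)            # a console thread: send to its socket stream
        else:
            self.real_stdout.write(text)  # anything else: fall back to the real stdout
    def flush(self):
        stream = getattr(_local, 'console_stream', None)
        (stream if stream is not None else self.real_stdout).flush()

sys.stdout = ThreadAwareStdout(sys.stdout)
# each console thread would set _local.console_stream = its socket's file object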
If you're dead set on having a single process, then depending on how willing you are to dive into obscure C-level features of the CPython implementation, you might try looking into subinterpreters. Those are, as far as I know, the highest level of isolation CPython provides in a single process, and they allow things like separate sys.stdout objects for separate subinterpreters.
Some Linux commands provide a single option that is equivalent to a given group of options, for convenience. For example, rsync has an option -a which is equivalent to -rlptgoD. For a Python script, is it possible to implement this behaviour using argparse? Or should I just pass the -a option to my code and handle it there?
rsync has been around long enough that it (or many implementations of it) probably uses getopt for parsing the command line, if it doesn't do its own parsing. Python has a version of getopt. Neither the C version nor the Python one has a mechanism for replacing a -a option with -rlptgoD. Any such replacement is performed after parsing.
The primary purpose of a parser is to decode what the user wants. Acting on that information is the responsibility of your code.
I can imagine writing a custom Action class that would set multiple attributes at once (see the sketch below). But it wouldn't save any coding work; it would look a lot like an equivalent function that is used after parsing.
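For illustration, a sketch of such an Action (the option names are made up, standing in for rsync's -rlptgoD):

import argparse

class ArchiveAction(argparse.Action):
    """Set several destination attributes when the combined flag is seen."""
    def __init__(self, option_strings, dest, **kwargs):
        super(ArchiveAction, self).__init__(option_strings, dest, nargs=0, **kwargs)
    def __call__(self, parser, namespace, values, option_string=None):
        for name in ('recursive', 'links', 'perms', 'times'):
            setattr(namespace, name, True)

parser = argparse.ArgumentParser()
parser.add_argument('-r', dest='recursive', action='store_true')
parser.add_argument('-l', dest='links', action='store_true')
parser.add_argument('-p', dest='perms', action='store_true')
parser.add_argument('-t', dest='times', action='store_true')
parser.add_argument('-a', action=ArchiveAction, help='equivalent to -rlpt')

args = parser.parse_args(['-a'])
print(args.recursive, args.links, args.perms, args.times)   # True True True True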
It seems that both execute a subprocess and create a pipe for input/output; the difference is just that subprocess is newer.
My question is: is there anything subprocess.Popen can do that os.popen cannot, so that we needed the new subprocess module?
Why didn't the Python language simply enhance os.popen instead of creating a new module?
Short answer: Never use os.popen, always use subprocess!
As you can see from the Python 2.7 os.popen docs:
Deprecated since version 2.6: This function is obsolete. Use the subprocess module. Check especially the Replacing Older Functions with the subprocess Module section.
There were various limitations and problems with the old os.popen family of functions. And as the docs mention, the pre 2.6 versions weren't even reliable on Windows.
The motivation behind subprocess is explained in PEP 324 -- subprocess - New process module:
Motivation
Starting new processes is a common task in any programming language, and very common in a high-level language like Python. Good support for this task is needed, because:
Inappropriate functions for starting processes could mean a security risk: If the program is started through the shell, and the arguments contain shell meta characters, the result can be disastrous. [1]
It makes Python an even better replacement language for over-complicated shell scripts.
Currently, Python has a large number of different functions for process creation. This makes it hard for developers to choose.
The subprocess module provides the following enhancements over previous functions:
One "unified" module provides all functionality from previous functions.
Cross-process exceptions: Exceptions happening in the child before the new process has started to execute are re-raised in the parent. This means that it's easy to handle exec() failures, for example. With popen2, for example, it's impossible to detect if the execution failed.
A hook for executing custom code between fork and exec. This can be used for, for example, changing uid.
No implicit call of /bin/sh. This means that there is no need for escaping dangerous shell meta characters.
All combinations of file descriptor redirection is possible. For example, the "python-dialog" [2] needs to spawn a process and redirect stderr, but not stdout. This is not possible with current functions, without using temporary files.
With the subprocess module, it's possible to control if all open file descriptors should be closed before the new program is executed.
Support for connecting several subprocesses (shell "pipe").
Universal newline support.
A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop. This means that many Python applications contain race conditions. A communicate() method in the standard library solves this problem.
Please see the PEP link for the Rationale, and further details.
Aside from the safety & reliability issues, IMHO, the old os.popen family was cumbersome and confusing. It was almost impossible to use correctly without closely referring to the docs while you were coding. In comparison, subprocess is a godsend, although it's still wise to refer to the docs while using it. ;)
Occasionally one sees people recommending the use of os.popen rather than subprocess.Popen in Python 2.7, e.g. in Python subprocess vs os.popen overhead, because it's faster. Sure, it's faster, but that's because it doesn't do various things that are vital to guarantee it works safely!
FWIW, os.popen itself still exists in Python 3; however, it is now safely implemented via subprocess.Popen, so you might as well just use subprocess.Popen directly yourself. The other members of the os.popen family no longer exist in Python 3. The os.spawn family of functions still exists in Python 3, but the docs recommend that the more powerful facilities provided by the subprocess module be used instead.
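The replacement pattern from that docs section looks roughly like this (ls -l is just a stand-in command):

import subprocess

# old, shell-based:   output = os.popen('ls -l').read()
# new: no shell, no quoting pitfalls, and a failure raises an exception
output = subprocess.check_output(['ls', '-l'])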
I have a situation where there is a program written in C++. It is a kind of server which you need to start first. Then, from another console, you can call the program with command-line arguments and it does its work. It also provides RPC- and REST-based access, so you can write an RPC or REST library to interface with the server.
So my question is: since the program can be managed using mere command-line arguments, isn't it better to use Python's subprocess module and build a library (wrapper) around it? Or is there any problem with this method?
Consider another case. Say I wanted to build a GUI around a Linux utility like grep that lets the user test regular expressions (like the ones we have on websites). Isn't it easier to communicate with grep using subprocess?
Thanks.
I think I'd prefer to use the RPC or REST interface, because the results you obtain from them are usually in a format that is easy to parse, since those interfaces were designed for machine interaction. A command-line interface, on the other hand, is designed for human interaction, which means its output is easy to read for the human eye but not necessarily easy to parse for another program.
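To illustrate the difference, here is a rough sketch of the subprocess-wrapper approach from the question (the grep function and its parsing are mine): it works, but the wrapper itself has to parse grep's human-oriented text output.

import subprocess

def grep(pattern, path):
    # grep exits with 1 when nothing matches, so check_output would raise;
    # use Popen/communicate and parse the "lineno:text" lines ourselves
    proc = subprocess.Popen(['grep', '-n', pattern, path],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    matches = []
    for line in out.splitlines():
        lineno, _, text = line.partition(b':')
        matches.append((int(lineno), text))
    return matches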