I prefer shell=False to avoid the various problems with shell=True. But sometimes I see code like this: it returns True, yet it doesn't seem to have printed anything, and I don't know what actually happened.
subprocess.run(['echo', 'hi'], shell=True, check=True).returncode == 0
By contrast,
subprocess.run(['echo', 'hi'], shell=False, check=True).returncode == 0
actually prints to stdout.
What happens when I pass a list as arguments and shell=True?
From the documentation:
On POSIX with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself.
So your command is equivalent to sh -c "echo" "hi", which simply executes echo without arguments.
A more useful example would be to use this mechanism to pass arbitrary data safely to a shell snippet:
file1='my file.txt'
file2='Author&Title - [FOO] **proper**.mp3'
subprocess.run(
    ['for f; do printf "Input: %s\n" "$f"; done', '_',
     file1, file2],
    shell=True, check=True)
This prints each value from the shell without having to worry about escaping shell metacharacters. (The extra '_' becomes $0.)
Related
I am using subprocess.Popen as opposed to os.fork(), and trying the "shell=True" construct. However, the arguments to be passed to the child process are getting deleted. Any idea why, and what could be the fix?
The files attached are "/tmp/a.py" and "/tmp/a.pl". If I run a.py without any argument, I get the expected results. With an argument, an error message.
#!/opt/local/bin/python3.6
import sys, subprocess

class child_proc:
    def __init__(self, useShell):
        print("useShell:", useShell)
        acmd = ["/tmp/a.pl", "Hello", "World"]
        if useShell:
            proc = subprocess.Popen(acmd, stdout=subprocess.PIPE, shell=True)
        else:
            proc = subprocess.Popen(acmd, stdout=subprocess.PIPE)
        while True:
            line = proc.stdout.readline()
            if line:
                try:
                    aline = line.decode().rstrip()
                    print(aline)
                except UnicodeDecodeError as e:
                    print("Error decoding child output:", line, e)
                    break
            else:
                break

child_proc(len(sys.argv) > 1)
The script it is calling -
#!/opt/local/bin/perl -w
die "[a.pl] Missing arguments\n" if $#ARGV < 0;
print "[a.pl] @ARGV\n";
exit 0;
This is on macOS 10.13.1. Thank you for your insight.
From the docs:
On Unix with shell=True, the shell defaults to /bin/sh. If args is a
string, the string specifies the command to execute through the shell. ... If
args is a sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell itself.
So if shell=True, you're supposed to pass a string instead of a list of arguments; the string will be interpreted by the shell. If you pass a sequence as you're doing now, it treats the first element as the entire command and the rest as arguments to the shell itself. So if you need shell=True, pass a string:
proc = subprocess.Popen("/tmp/a.pl Hello World", stdout=subprocess.PIPE, shell=True)
And be careful not to pass any user input to the shell when you do it this way. As the docs warn, untrusted input to Popen with a shell=True argument is a major security hazard.
When you use a list as the command container with shell=True, the first element is treated as the command to run, and the rest are passed as arguments to the shell itself.
So, if you want to use shell=True to run the command in a shell, make it a string:
acmd = "/tmp/a.pl Hello World"
Note: Be careful about running something directly in the shell in unescaped form.
Or, better, drop shell=True and let subprocess do the fork()-exec() itself, as there is nothing shell-specific in your code.
For completeness, if you insist on keeping your current structure, you should do:
if useShell:
    proc = subprocess.Popen("/tmp/a.pl Hello World", stdout=subprocess.PIPE, shell=True)
else:
    proc = subprocess.Popen(["/tmp/a.pl", "Hello", "World"], stdout=subprocess.PIPE)
Or you can set acmd at the start based on useShell's value; that gets simpler still, because you can also drop the if...else construct by passing shell=useShell to subprocess.Popen:
acmd = "/tmp/a.pl Hello World" if useShell else ["/tmp/a.pl", "Hello", "World"]
proc = subprocess.Popen(acmd, stdout=subprocess.PIPE, shell=useShell)
As a side note, you should use snake_case for variable/function names, and CamelCase for class names.
I am using Python's subprocess module to call some Linux command line functions. The documentation explains the shell=True argument as
If shell is True, the specified command will be executed through the shell
There are two examples, which seem the same to me from a descriptive viewpoint (i.e. both of them call some command-line command), but one of them uses shell=True and the other does not
>>> subprocess.call(["ls", "-l"])
0
>>> subprocess.call("exit 1", shell=True)
1
My question is:
What does running the command with shell=False do, in contrast to shell=True?
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell. In other words, how can it possibly not execute the argument through the shell?
It would also be helpful to get some examples of:
Things that can be done with shell=True that can't be done with
shell=False and why they can't be done.
Vice versa (although it seems that there are no such examples)
Things for which it does not matter whether shell=True or False and why it doesn't matter
UNIX programs start each other with the following three calls, or derivatives/equivalents thereto:
fork() - Create a new copy of yourself.
exec() - Replace yourself with a different program (do this if you're the copy!).
wait() - Wait for another process to finish (optional, if not running in background).
Thus, with shell=False, you do just that (as Python-syntax pseudocode below -- exclude the wait() if not a blocking invocation such as subprocess.call()):
pid = fork()
if pid == 0:   # we're the child process, not the parent
    execlp("ls", "ls", "-l", NULL)
else:
    retval = wait(pid)   # we're the parent; wait for the child to exit & get its exit status
whereas with shell=True, you do this:
pid = fork()
if pid == 0:
    execlp("sh", "sh", "-c", "ls -l", NULL)
else:
    retval = wait(pid)
Note that with shell=False, the command we executed was ls, whereas with shell=True, the command we executed was sh.
That is to say:
subprocess.Popen(foo, shell=True)
is exactly the same as:
subprocess.Popen(
    ["sh", "-c"] + ([foo] if isinstance(foo, str) else foo),  # basestring on Python 2
    shell=False)
That is to say, you execute a copy of /bin/sh, and direct that copy of /bin/sh to parse the string into an argument list and execute ls -l itself.
So, why would you use shell=True?
You're invoking a shell builtin.
For instance, the exit command is actually part of the shell itself, rather than an external command. That said, this is a fairly small set of commands, and it's rare for them to be useful in the context of a shell instance that only exists for the duration of a single subprocess.call() invocation.
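As a quick illustration (assuming a POSIX /bin/sh), `type` is itself a builtin, so it can only be reached through a shell:

```python
import subprocess

# "type" is a shell builtin, not an executable on disk, so it needs a shell:
result = subprocess.run("type cd", shell=True,
                        capture_output=True, text=True)
print(result.stdout.strip())       # e.g. "cd is a shell builtin"

# Without shell=True there is usually no "type" binary to find:
# subprocess.run(["type", "cd"])   # typically raises FileNotFoundError
```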
You have some code with shell constructs (i.e., redirections) that would be difficult to emulate without it.
If, for instance, your command is cat one two >three, the syntax >three is a redirection: It's not an argument to cat, but an instruction to the shell to set stdout=open('three', 'w') when running the command ['cat', 'one', 'two']. If you don't want to deal with redirections and pipelines yourself, you need a shell to do it.
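A sketch of performing that exact redirection without a shell (the file names `one`, `two`, `three` come from the example above; the snippet creates the inputs itself so it runs standalone):

```python
import subprocess

# Create the two input files so the snippet is self-contained.
for name, text in [("one", "first\n"), ("two", "second\n")]:
    with open(name, "w") as f:
        f.write(text)

# Shell version would be: subprocess.run("cat one two >three", shell=True)
# Shell-free equivalent: open the output file ourselves and hand it to the child.
with open("three", "w") as out:
    subprocess.run(["cat", "one", "two"], stdout=out, check=True)

print(open("three").read())   # first\nsecond\n
```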
A slightly trickier case is cat foo bar | baz. To do that without a shell, you need to start both sides of the pipeline yourself: p1 = Popen(['cat', 'foo', 'bar'], stdout=PIPE), p2=Popen(['baz'], stdin=p1.stdout).
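A complete sketch of that two-Popen pipeline (with `printf` standing in for `cat foo bar` and `wc -l` standing in for `baz`, so it runs without any extra files):

```python
from subprocess import Popen, PIPE

# Left side of the pipe: prints three lines (stand-in for `cat foo bar`).
p1 = Popen(["printf", "a\\nb\\nc\\n"], stdout=PIPE)
# Right side: counts lines on its stdin (stand-in for `baz`).
p2 = Popen(["wc", "-l"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()               # so p1 sees SIGPIPE if p2 exits early
output, _ = p2.communicate()
print(output.decode().strip())  # 3
```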
You don't give a damn about security bugs.
...okay, that's a little bit too strong, but not by much. Using shell=True is dangerous. You can't do this: Popen('cat -- %s' % (filename,), shell=True) without a shell injection vulnerability: If your code were ever invoked with a filename containing $(rm -rf ~), you'd have a very bad day. On the other hand, ['cat', '--', filename] is safe with all possible filenames: The filename is purely data, not parsed as source code by a shell or anything else.
It is possible to write safe scripts in shell, but you need to be careful about it. Consider the following:
filenames = ['file1', 'file2'] # these can be user-provided
subprocess.Popen(['cat -- "$@" | baz', '_'] + filenames, shell=True)
That code is safe (well -- as safe as letting a user read any file they want ever is), because it's passing your filenames out-of-band from your script code -- but it's safe only because the string being passed to the shell is fixed and hardcoded, and the parameterized content is external variables (the filenames list). And even then, it's "safe" only to a point -- a bug like Shellshock that triggers on shell initialization would impact it as much as anything else.
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell.
No, subprocess is perfectly capable of starting a program directly (via an operating-system call); it does not need a shell.
Things that can be done with shell=True that can't be done with shell=False
You can use shell=False for any command that simply runs some executable optionally with some specified arguments.
You must use shell=True if your command uses shell features: pipelines (|), redirections, or compound statements joined with ;, &&, ||, and so on.
Thus, one can use shell=False for a command like grep string file. But a command like grep string file | xargs something will, because of the |, require shell=True.
Because the shell has powerful features that Python programmers do not always find intuitive, it is considered better practice to use shell=False unless you really, truly need a shell feature. As an example, pipelines are not strictly necessary, because they can also be built using subprocess's PIPE feature.
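For small amounts of data, one way to sketch the grep ... | ... idea without a shell is to chain subprocess.run() calls, feeding each stage's output to the next via input= (using `wc -l` as a stand-in for the xargs step; note this buffers everything in memory rather than streaming like a real pipe):

```python
import subprocess

# Stage 1: grep lines containing "o" (input supplied directly, no file needed).
stage1 = subprocess.run(["grep", "o"], input=b"foo\nbar\nbop\n",
                        stdout=subprocess.PIPE)
# Stage 2: count the surviving lines (stand-in for the second command).
stage2 = subprocess.run(["wc", "-l"], input=stage1.stdout,
                        stdout=subprocess.PIPE)
print(stage2.stdout.decode().strip())   # 2
```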
When using subprocess.Popen, we have to write
with subprocess.Popen(['ls', '-l', '-a'], stdout=subprocess.PIPE) as proc:
    print(proc.stdout.read())
instead of
with subprocess.Popen(['ls', '-l -a'], stdout=subprocess.PIPE) as proc:
    print(proc.stdout.read())
Why? What will ls get in the second case? Thank you.
When your operating system starts an executable, it does this via a call something very much like this:
execv('/usr/bin/ls', ['ls', '-l', '-a'])
Note that the arguments are already split out into individual words before ls is started; if you're running your program with a shell, then the shell is responsible for doing that splitting; if you're running it via a programming language that lets you control the execv call's arguments directly, then you're deciding how to split the array up yourself.
When ls runs, it's passed those arguments in an array, argv. Witness the usual way a main function is declared in C:
int main(int argc, char *argv[]) {
    ...
}
It's getting an array of arguments, in a variable conventionally named argv, already broken up into individual words.
The parser for ls, then, can expect that when it's run it will be handed an array that looks like this:
argc = 3 # three arguments, including our own name
argv = ['ls', '-l', '-a'] # first argument is our name, others follow
...so the command-line parser built into ls doesn't need to break up spaces inside of its arguments -- spaces have already been removed, and syntactic quotes honored and stripped, before the ls command is ever started.
Now, when you run ['ls', '-l -a'], you're explicitly specifying an argc of 2, not 3, and a single argument that includes a single string -l -a. To get that behavior from a shell, you'd need to use quoting or escaping:
ls "-l -a"
ls '-l -a'
ls -l\ -a
...and you'll find that ls fails in exactly the same way when invoked from a shell with any of those usages as it does here.
In the second case -l -a as a single string will be the first argument to ls, which it won't know what to do with, or at least won't do what you want. In the first case -l is the first argument and -a is the second argument.
If you want to build a string that has the complete command you can use the shell=True flag to Popen, but then your command would be "ls -l -a" not ['ls', '-l -a']
With Popen each argument in the list is an argument passed to the command being executed, it's not a string passed to the shell to be interpreted, unless you ask for it to be passed to the shell to be interpreted.
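This can be seen directly with a tiny argv-echoing child process (run through the same Python interpreter, so the sketch works anywhere):

```python
import subprocess
import sys

# A child that just prints the arguments it received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

subprocess.run(child + ["-l", "-a"])   # prints ['-l', '-a']  (two arguments)
subprocess.run(child + ["-l -a"])      # prints ['-l -a']     (one argument)
```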
If you want to use a string representation of the command to execute, the shlex module may be useful.
shlex.split(s[, comments[, posix]])
Split the string s using shell-like syntax. If comments is False (the default), the parsing of comments in the given string will be disabled (setting the commenters attribute of the shlex instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the posix argument is false.
assert shlex.split("ls -a -l") == ['ls', '-a', '-l']
subprocess.Popen(shlex.split("ls -a -l"))
It also covers more complex cases like escaping chars or quotes usage:
assert shlex.split("cat 'file with space.txt'") == ['cat', 'file with space.txt']
assert shlex.split(r"cat file\ with\ space.txt") == ['cat', 'file with space.txt']
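Going the other direction, shlex can also build a safely quoted command string from a list: shlex.quote() per argument, or shlex.join() on Python 3.8+:

```python
import shlex

args = ["cat", "file with space.txt"]
cmd = shlex.join(args)           # Python 3.8+; quotes arguments as needed
print(cmd)                       # cat 'file with space.txt'
assert shlex.split(cmd) == args  # round-trips back to the same list
```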
I'm simply trying to pass a variable to my shell script, but it isn't being handed off. I've been following examples from the Python docs, but it's not working. What am I missing?
subprocess.Popen(['./script.sh' + variable] , shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
You shouldn't be using shell=True here at all, unless you want any actual shell syntax in your variable (like >file.log) to be executed.
subprocess.Popen(['./script.sh', variable],
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
If you really want shell=True, you have a few options to do so securely. The first is to use pipes.quote() (or, in Python 3, shlex.quote()) to prevent shell escapes:
subprocess.Popen('./script.sh ' + pipes.quote(variable), shell=True,
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
The second is to pass the name as a subsequent argument (note the empty string, which becomes $0 in the generated shell):
subprocess.Popen(['./script.sh "$1"', '', variable], shell=True,
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Remember, Bobby Tables isn't just for SQL -- his younger sister
Susan $(rm -rf /) is out there too.
You're combining two different ways of doing things. And, on top of that, you're doing it wrong; but just fixing the "doing it wrong" part isn't the answer.
You can put your two arguments in a list, and then launch it without the shell, like ['./script.sh', variable]. This is usually better. Using the shell means you have to deal with quoting, and with accidental or malicious injection, and can interfere with your input and output, and adds a performance cost. So, if you don't need it, don't use it.
Or you can put your two arguments in a string, and then launch it with the shell, like './script.sh ' + variable.
But you can't put your two arguments in a string, and then put that string in a list. In some cases, it will happen to work, but that's not something you can rely on.
In some cases, you can use a list with the shell,* or a string without the shell,** but generally you shouldn't do that unless you know what you're doing, and in any case, you still shouldn't be using a list of one string unless there's a specific reason you need to.***
If you want to use a list of arguments, do this:
subprocess.Popen(['./script.sh', variable], shell=False, …)
Notice that this is a list of two strings, not a list of one joined-up string, and that shell=False.
If you want to use a shell command line, don't put the command line in a list, don't skip the space between the arguments, and quote any non-static arguments, like this:
subprocess.Popen('./script.sh ' + shlex.quote(variable), shell=True, …)
* Using a list with the shell on Windows is never useful; they just get combined up in some unspecified way. But on Unix, subprocess will effectively prepend '/bin/sh' and '-c' to your list, and use that as the arg list for /bin/sh, which can be simpler than trying to quote shell arguments, and at least arguably more concise than explicitly calling /bin/sh with shell=False.
** Using a string without the shell on Unix is never useful; that just tries to find a program whose name is the whole string, which is going to fail (unless you're really unlucky). But on Windows, it can be useful; subprocess tries to combine your arguments into a string to be passed to CreateProcess in such a way that MSVCRT will parse them back to the same list of arguments on the other side, and in some edge cases it's necessary to create that string yourself.
*** Basically, you want to spawn ['/bin/sh', '-c', <command line>] exactly.
Add space after ./script.sh:
subprocess.Popen(['./script.sh ' + variable] , shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
In the documentation for Popen I read:
class subprocess.Popen(args, bufsize=0, ...)
args should be a sequence of program arguments or else a single
string. [...] Unless otherwise stated, it is recommended to pass args
as a sequence.
Why is it recommended to use a sequence for args? What are the cases when I must use a single string?
On Unix a single string argument to Popen only works if you're not passing arguments to the program. Otherwise you would need shell=True. That's because the string is interpreted as the name of the program to execute.
Using a sequence also tends to be more secure. If you get program arguments from the user, you must fully sanitize them before appending to the command string. Otherwise the user will be able to pass arbitrary commands for execution. Consider the following example:
>>> def ping(host):
... cmd = "ping -c 1 {}".format(host)
... Popen(cmd, shell=True)
...
>>> ping(input())
8.8.8.8; cat /etc/passwd
Using a sequence for args helps to avoid such vulnerabilities.
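To make the contrast concrete, here is a sketch of the sequence-based version, using a stand-in child process instead of the real ping binary so it runs without network access; the malicious suffix arrives as inert data:

```python
import subprocess
import sys

def safe_ping(host):
    # The whole host string is one argv entry; ';' and friends are never
    # parsed by a shell. (Stand-in child instead of the real ping binary.)
    child = [sys.executable, "-c", "import sys; print('pinging', sys.argv[1])"]
    return subprocess.run(child + [host], capture_output=True, text=True)

print(safe_ping("8.8.8.8; cat /etc/passwd").stdout.strip())
# pinging 8.8.8.8; cat /etc/passwd   -- no second command ever ran
```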