Python Subprocess: Unable to Escape Quotes

I know similar questions have been asked before, but they all seem to have been resolved by reworking how arguments are passed (i.e. using a list, etc).
However, I have a problem here in that I don't have that option. There is a particular command line program (I am using a Bash shell) to which I must pass a quoted string. It cannot be unquoted, it cannot have a replicated argument, it just has to be either single or double quoted.
command -flag 'foo foo1'
I cannot use command -flag foo foo1, nor can I use command -flag foo -flag foo1. I believe this is an oversight in how the command was programmed to receive input, but I have no control over it.
I am passing arguments as follows:
self.commands = [
self.path,
'-flag1', quoted_argument,
'-flag2', 'test',
...etc...
]
process = subprocess.Popen(self.commands, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
results = process.communicate(input)
Where quoted_argument is something like 'foo foo1 foo2'.
I have tried escaping the single quote ("\'foo foo1 foo2\'"), but I get no output.
I know this is considered bad practice because it is ambiguous to interpret, but I don't have another option. Any ideas?

The shell breaks command strings into lists. The quotes tell the shell to put multiple words into a single list item. Since you are building the list yourself, you add the words as a single item without the quotes.
These two Popen commands are equivalent
Popen("command -flag 'foo foo1'", shell=True)
Popen(["command", "-flag", "foo foo1"])
EDIT
This answer deals with escaping characters in the shell. If you don't use the shell, you don't add any quotes or escapes; just put in the string itself. There are other issues with skipping the shell, such as piping commands, running background jobs, and using shell variables. These can all be done in Python instead of the shell.
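To see what the program actually receives, here is a minimal sketch. It assumes a stand-in for the real command: a child Python process (the hypothetical CHILD snippet below) that just prints its arguments, one per line.

```python
import subprocess
import sys

# Hypothetical stand-in for "command": a child Python process that prints
# each argument it received on its own line.
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"


def child_args(*args):
    """Run the child with the given argument list and return what it saw."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# One list item becomes one argument, spaces included, with no quotes added.
print(child_args("-flag", "foo foo1"))
# Literal quote characters inside the string are passed through to the program.
print(child_args("-flag", "'foo foo1'"))
```

The second call shows why escaping the quotes did not help: without a shell, nothing strips them, so the program sees the quote characters as part of the value.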

A mental model of processes and shells that has helped me a lot over the years:
Processes in your operating system receive an array of strings representing the arguments. In Python, this array can be accessed from sys.argv. In C, this is the argv array passed to the main function. And so on.
When you open a terminal, you are running a shell inside that terminal, for example bash or zsh. What happens if you run a command like this one?
$ /usr/bin/touch one two
What happens is that the shell interprets the command that you wrote and splits it by whitespace to create the array ["/usr/bin/touch", "one", "two"]. It then launches a new process using that list of arguments, in this case creating two files named one and two.
What if you wanted one file named one two with a space? You can't pass the shell a list of arguments as you might want to; you can only pass it a string. Shells like Bash and Zsh use single quotes to work around this:
$ /usr/bin/touch 'one two'
The shell will create a new process with the arguments ["/usr/bin/touch", "one two"], which in this case creates a file named one two.
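Python's shlex.split follows POSIX shell quoting rules closely enough to illustrate this splitting step. A small sketch (it only parses the strings, it does not run touch):

```python
import shlex

# Unquoted: the shell would split on whitespace into three items.
unquoted = shlex.split("/usr/bin/touch one two")
# Quoted: the single quotes group the two words into one item and then vanish.
quoted = shlex.split("/usr/bin/touch 'one two'")

print(unquoted)
print(quoted)
```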
Shells have special features like piping. With a shell, you can do something like this:
$ /usr/bin/echo 'This is an example' | /usr/bin/tr a-z A-Z
THIS IS AN EXAMPLE
In this case, the shell interprets the | character specially. It creates a process with the arguments ["/usr/bin/echo", "This is an example"] and another process with the arguments ["/usr/bin/tr", "a-z", "A-Z"], and pipes the output of the former to the input of the latter.
How this applies to subprocess in Python
Now, in Python, you can use subprocess with shell=False (which is the default) or with shell=True. If you use the default behaviour, shell=False, then subprocess expects you to pass it a list of arguments. You cannot use special shell features like piping. On the plus side, you don't have to worry about escaping special characters for the shell:
import subprocess
# create a file named "one two"
subprocess.call(["/usr/bin/touch", "one two"])
If you do want to use shell features, you can do something like:
subprocess.call(
"/usr/bin/echo 'This is an example' | /usr/bin/tr a-z A-Z",
shell=True,
)
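For comparison, the same pipeline can be wired up without a shell by connecting two Popen objects. This is a sketch, assuming echo and tr can be found on PATH; the function name shout is made up for illustration.

```python
import subprocess


def shout(text):
    """Rough equivalent of `echo TEXT | tr a-z A-Z` without involving a shell."""
    echo = subprocess.Popen(["echo", text], stdout=subprocess.PIPE)
    tr = subprocess.Popen(
        ["tr", "a-z", "A-Z"],
        stdin=echo.stdout,       # tr reads what echo writes
        stdout=subprocess.PIPE,
        text=True,
    )
    echo.stdout.close()          # lets echo receive SIGPIPE if tr exits early
    output, _ = tr.communicate()
    echo.wait()
    return output


print(shout("This is an example"))
```

Because text is passed as a list item, it needs no quoting even if it contains spaces or shell metacharacters.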
If you are using variables with no particular guarantees, remember to escape the command:
import shlex
import subprocess
subprocess.call(
"/usr/bin/echo " + shlex.quote(variable) + " | /usr/bin/tr a-z A-Z",
shell=True,
)
(Note that shlex.quote is only designed for UNIX shells, and not for DOS on Windows.)
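To make the effect of shlex.quote visible, this sketch only builds the command string and prints it rather than running it. The helper name build_pipeline and the sample values are assumptions for illustration.

```python
import shlex


def build_pipeline(variable):
    """Build the shell command string from the example above."""
    return "/usr/bin/echo " + shlex.quote(variable) + " | /usr/bin/tr a-z A-Z"


# A value with a space is wrapped in single quotes.
print(build_pipeline("one two"))
# Shell metacharacters end up inside the quotes, so the shell treats them as text.
print(build_pipeline("x; ls"))
# Quoting and then splitting gives back the original value.
print(shlex.split(shlex.quote("it's a test")))
```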

Related

Bash script will not run with subprocess in Python

For some reason, no matter how many variations I've tried, I can't seem to execute a bash script I've written. The command works 100% fine in Terminal, but when I try calling it with a subprocess, it returns nothing.
from os import listdir
import subprocess
computer_name = 'homedirectoryname'
moviefolder = '/Users/{}/Documents/Programming/Voicer/Movies'.format(computer_name)
string = 'The lion king'
for i in listdir(moviefolder):
    title = i.split('.')
    formatted_title = title[0].replace(' ', '\ ')
    if string.lower() == title[0].lower():
        command = 'vlc {}/{}.{}'.format(moviefolder, formatted_title, title[1])
        subprocess.call(["/usr/local/bin",'-i','-c', command], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, shell=True)
    else:
        continue
The bash executable file looks like this:
#/bin/bash
func() {
open -a /Applications/VLC.app/Contents/MacOS/VLC $1
}
Where have I gone wrong?

You should call open directly:
import os
import subprocess
computer_name = 'homedirectoryname'
moviefolder = '/Users/{}/Documents/Programming/Voicer/Movies'.format(computer_name)
string = 'The lion king'
for filename in os.listdir(moviefolder):
    title = filename.split('.')
    if string.lower() == title[0].lower():
        subprocess.call(['open', '-a', '/Applications/VLC.app/Contents/MacOS/VLC', os.path.join(moviefolder, filename)])
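As a possible refinement, the title match could use os.path.splitext so that file names containing extra dots (for example "Mr. Nobody.mkv") still compare correctly. This is only a sketch; the helper name find_movie is made up, and the resulting path would be passed as a single list item exactly as above.

```python
import os


def find_movie(folder, wanted_title):
    """Return the path of the first file whose name (minus extension) matches."""
    for filename in os.listdir(folder):
        # splitext only strips the final extension, so dots in the title survive.
        title, _extension = os.path.splitext(filename)
        if title.lower() == wanted_title.lower():
            return os.path.join(folder, filename)
    return None
```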

Since you are using shell=True, the command must be a string:
On Unix with shell=True, the shell defaults to /bin/sh. If args is a
string, the string specifies the command to execute through the shell.
This means that the string must be formatted exactly as it would be
when typed at the shell prompt. This includes, for example, quoting or
backslash escaping filenames with spaces in them. If args is a
sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell
itself. (docs)
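The last sentence of that quote can be checked with a small experiment. This sketch is POSIX only and assumes /bin/sh is available: the extra list items show up as the shell's $0 and $1, not as arguments to echo.

```python
import subprocess

# With shell=True on POSIX, a list is run as: /bin/sh -c <first item> <rest...>
# so "a" and "b" become the shell's $0 and $1.
result = subprocess.run(
    ["echo first=$0 second=$1", "a", "b"],
    shell=True,
    capture_output=True,
    text=True,
)
print(result.stdout)
```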

Like you even mentioned in a comment, you get /usr/local/bin: is a directory when you properly capture the error from the shell (and take out the erroneous shell=True; or correspondingly refactor the command line to be suitable for this usage, i.e. pass a string instead of a list).
Just to spell this out, you are attempting to run the command /usr/local/bin with some options; but of course, it's not a valid command; so this fails.
The actual script you seem to want to run will declare a function and then exit, which results in the function's definition being lost again, because the subprocess which ran the shell in which this function declaration was executed has now terminated and released all its resources back to the system.
Perhaps you should take more than just a few steps back and explain what you actually want to accomplish; but really, that should be a new, separate question.
Assuming you are actually trying to run vlc, and guessing some other things, too, perhaps you actually want
subprocess.call(['vlc', '{}/{}.{}'.format(moviefolder, formatted_title, title[1])],
stdout=subprocess.PIPE, stderr=subprocess.PIPE)
If your PATH is correct, you should not need to specify /usr/local/bin/ explicitly (and if your PATH is wrong, correct it in the code before, instead of hardcoding a directory for the executable you want to call).

/usr/local/bin is a directory. You can't run a directory as if it were a command.
Anyhow, there's no point to having /usr/local/bin anywhere in your command at all. Leave out the shell=True, and explicitly call vlc:
subprocess.call([
'vlc',
'{}/{}.{}'.format(moviefolder, formatted_title, title[1])
])

When shell=True is used in subprocess.call and the args argument is a sequence, the first element of the sequence needs to be the command, and the rest are treated as arguments to the shell itself.
So, this should do:
subprocess.call(["/usr/local/bin/{}".format(command), '-i','-c'], shell=True, ...)
Otherwise, you can make the command a string.
Example:
In [20]: subprocess.call(["cat spamegg", "-i", "-c"], shell=True)
foobar

problems using python subprocess/sh as a bash wrapper

I'm trying to execute the following line of code:
subprocess.call(["java", "-cp", "/home/me/somepath/file.jar", ..., "-someflag somevalue"])
The code fails and the jar I'm trying to run gives me usage information. But if I expand out the string and paste it into the terminal, it works (I know I'm expanding the string out correctly because the sh module spits it back to me when it errors out). So this is an issue with how either subprocess or sh operates.
Here's an example of how you're supposed to use it:
subprocess.call(["ls", "-l"])
Here's the description:
subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
args is required for all calls and should be a string, or a sequence
of program arguments. Providing a sequence of arguments is generally
preferred, as it allows the module to take care of any required
escaping and quoting of arguments (e.g. to permit spaces in file
names). If passing a single string, either shell must be True (see
below) or else the string must simply name the program to be executed
without specifying any arguments.
http://docs.python.org/2/library/subprocess.html
It's not clear to me if I should be breaking up the strings in the list with flags and values in separate places.

subprocess.call(["java", "-cp", "/home/me/somepath/file.jar", ..., "-someflag", "somevalue"])
Your original code corresponds to
java -cp /home/me/somepath/file.jar ... "-someflag somevalue"
in the shell.

toggle the shell flag to true
ie,
subprocess.check_call(["java", "-cp", cp_arg, ..., "-someflag somevalue"], shell=True)
also, a tip, you can use the split() function to split up a string command:
subprocess.check_call("java -cp blah blah".split(), shell=True)
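One caveat about split(): it breaks on every space, including spaces inside quotes. shlex.split follows shell quoting rules instead. The sketch below uses a hypothetical jar path containing a space to show the difference; it only parses the string and does not run java.

```python
import shlex

# Hypothetical command line; the path with a space is made up for illustration.
command_line = "java -cp '/home/me/some path/file.jar' -someflag somevalue"

naive = command_line.split()        # splits inside the quoted path
parsed = shlex.split(command_line)  # keeps the quoted path as one item

print(naive)
print(parsed)
```

Note that parsed keeps "-someflag" and "somevalue" as separate items, which matches what the terminal would pass to the program.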

Python Subprocess Grep

I am trying to use the grep command in a python script using the subprocess module.
Here's what I have:
userid = 'foo12'
p = subprocess.Popen(['grep', "%s *.log"%userid], stdout=subprocess.PIPE)
And it returns nothing.
I am not entirely sure what I am doing wrong, so can someone please explain? The only approach that currently works is adding shell=True, which produces the correct output, but as the help pages point out, it is unsafe. I need help making this work so that my script isn't unsafe.

I think you're running up against two problems:
This call:
p = subprocess.Popen(['grep', "%s *.log"%userid]...
will not work as expected without shell=True because the list of arguments is passed directly to os.execvp, which requires each item to be a single string representing one argument. You've squished two separate arguments together into a single string (in other words, grep is interpreting "foo12 *.log" as the pattern to search, and not pattern+file list).
You can fix this by saying:
p = subprocess.Popen(['grep', userid, '*.log']...)
The second issue is that, again without shell=True, execvp doesn't know what you mean by *.log and passes it directly along to grep, without going through the shell's wildcard expansion mechanism. If you don't want to use shell=True, you can instead do something like:
import glob
args = ['grep', userid]
args.extend(glob.glob('*.log'))
p = subprocess.Popen(args, ...)
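Put together, a self-contained version of this approach might look like the sketch below. It assumes grep is on PATH; the function name grep_logs and the directory parameter are made up for illustration.

```python
import glob
import os
import subprocess


def grep_logs(pattern, directory="."):
    """Run grep for pattern over every *.log file in directory, without a shell."""
    # Python expands the wildcard here, since no shell is involved.
    files = sorted(glob.glob(os.path.join(directory, "*.log")))
    if not files:
        return ""  # with no file arguments, grep would read from stdin
    result = subprocess.run(
        ["grep", "-e", pattern, "--", *files],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the pattern after -e and the files after -- keeps a pattern or file name that starts with a dash from being read as an option.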

Here are two tested pieces of code to model from:
>>> print subprocess.check_output(['grep', 'python', 'api_talk.txt'])
Discuss python API patterns
Limitations of python
Introspection in python
>>> print subprocess.check_output('grep python *.txt', shell=True)
Use the latter if you want the shell to do wildcard expansion for you. When shell is True, be sure to put the whole command in a single string rather than a list of separate fields.

I am assuming you want to grep for 'foo12' in all files that end with '.log'. To get this to work with just subprocess, you will need to change your code to the following:
userid = 'foo12'
p = subprocess.Popen('grep %s *.log' % userid, stdout=subprocess.PIPE, shell=True)
shell=True is necessary for the wildcard expansion, and when that option is set you need to provide a string command instead of a list.
Also, when you provide a list of arguments, make sure that each argument is a separate entry in the list. Your initial code would have been equivalent to the following:
grep 'foo12 *.log'

Why does subprocess.Popen() with shell=True work differently on Linux vs Windows?

When using subprocess.Popen(args, shell=True) to run "gcc --version" (just as an example), on Windows we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc (GCC) 3.4.5 (mingw-vista special r3) ...
So it's nicely printing out the version as I expect. But on Linux we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc: no input files
Because gcc hasn't received the --version option.
The docs don't specify exactly what should happen to the args under Windows, but they do say, on Unix, "If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional shell arguments." IMHO the Windows way is better, because it allows you to treat Popen(arglist) calls the same as Popen(arglist, shell=True) ones.
Why the difference between Windows and Linux here?

Actually on Windows, it does use cmd.exe when shell=True - it prepends cmd.exe /c (it actually looks up the COMSPEC environment variable but defaults to cmd.exe if not present) to the shell arguments. (On Windows 95/98 it uses the intermediate w9xpopen program to actually launch the command).
So the strange implementation is actually the UNIX one, which does the following (where each space separates a different argument):
/bin/sh -c gcc --version
It looks like the correct implementation (at least on Linux) would be:
/bin/sh -c "gcc --version" gcc --version
Since this would set the command string from the quoted parameters, and pass the other parameters successfully.
From the sh man page section for -c:
Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
This patch seems to fairly simply do the trick:
--- subprocess.py.orig 2009-04-19 04:43:42.000000000 +0200
+++ subprocess.py 2009-08-10 13:08:48.000000000 +0200
@@ -990,7 +990,7 @@
args = list(args)
if shell:
- args = ["/bin/sh", "-c"] + args
+ args = ["/bin/sh", "-c"] + [" ".join(args)] + args
if executable is None:
executable = args[0]

From the subprocess.py source:
On UNIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
On Windows: the Popen class uses CreateProcess() to execute the child
program, which operates on strings. If args is a sequence, it will be
converted to a string using the list2cmdline method. Please note that
not all MS Windows applications interpret the command line the same
way: The list2cmdline is designed for applications using the same
rules as the MS C runtime.
That doesn't answer why, just clarifies that you are seeing the expected behavior.
The "why" is probably that on UNIX-like systems, command arguments are actually passed through to applications (using the exec* family of calls) as an array of strings. In other words, the calling process decides what goes into EACH command line argument. Whereas when you tell it to use a shell, the calling process actually only gets the chance to pass a single command line argument to the shell to execute: The entire command line that you want executed, executable name and arguments, as a single string.
But on Windows, the entire command line (according to the above documentation) is passed as a single string to the child process. If you look at the CreateProcess API documentation, you will notice that it expects all of the command line arguments to be concatenated together into a big string (hence the call to list2cmdline).
Plus there is the fact that on UNIX-like systems there actually is a shell that can do useful things, so I suspect that the other reason for the difference is that on Windows, shell=True does nothing, which is why it is working the way you are seeing. The only way to make the two systems act identically would be for it to simply drop all of the command line arguments when shell=True on Windows.
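The list-to-string conversion mentioned above can be inspected on any platform, because list2cmdline is a plain function in the subprocess module. A small sketch:

```python
import subprocess

# Arguments containing spaces are wrapped in double quotes, following the
# MS C runtime rules; arguments without spaces are left as they are.
with_spaces = subprocess.list2cmdline(["cp", "My File", "New Location"])
plain = subprocess.list2cmdline(["gcc", "--version"])

print(with_spaces)
print(plain)
```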

The reason for the UNIX behaviour of shell=True is to do with quoting. When we write a shell command, it will be split at spaces, so we have to quote some arguments:
cp "My File" "New Location"
This leads to problems when our arguments contain quotes, which requires escaping:
grep -r "\"hello\"" .
Sometimes we can get awful situations where \ must be escaped too!
Of course, the real problem is that we're trying to use one string to specify multiple strings. When calling system commands, most programming languages avoid this by allowing us to send multiple strings in the first place, hence:
Popen(['cp', 'My File', 'New Location'])
Popen(['grep', '-r', '"hello"'])
Sometimes it can be nice to run "raw" shell commands; for example, if we're copy-pasting something from a shell script or a Web site, and we don't want to convert all of the horrible escaping manually. That's why the shell=True option exists:
Popen(['cp "My File" "New Location"'], shell=True)
Popen([r'grep -r "\"hello\"" .'], shell=True)
I'm not familiar with Windows so I don't know how or why it behaves differently.
