Python subprocess with complex arguments

I'm looking for the safest and most convenient way to call a shell command from Python 3. Here is a PS-to-PDF conversion:
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="${pdf_file}" "${ps_file}"
I use subprocess, shlex and avoid shell=True.
But I find the resulting command inconsistent:
cmd = ['gs', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pdfwrite', '-sOutputFile={0}'.format(pdf_filename), ps_filename]
What am I missing? The subprocess.call() syntax looks so clean with space-separated arguments, but looks like a mess everywhere else.
What's the difference when calling subprocess.call(cmd) (at python level, ie. escaping, injection protection, quoting, etc.) between:
cmd = ['do', '--something', arg]
cmd = ['do', '--something {0}'.format(arg)]
If there is none, is this also a good way to do it?
cmd = ['gs', '-dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile={0} {1}'.format(pdf_filename, ps_filename)]
Another example of inconsistency:
hg merge -r 3 would be cmd = ['hg', 'merge', '-r', revision_id]
hg merge --rev=3 would be cmd = ['hg', 'merge', '--rev={0}'.format(revision_id)]
despite the fact that these are two ways to send the same arguments.

The difference is that the command may have a --something option which accepts an argument, but it doesn't have a --something foo option -- which is what you would be telling it. When you run a command in your shell, like wc -l myfile.txt, your shell splits up that commandline where it finds spaces -- so the command that gets run is ['wc', '-l', 'myfile.txt'].
The subprocess module does not perform such splitting. You have to do it yourself (unless you use the 'shell' option, but that's generally less secure, so avoid it if you can).
Some anti-examples...
Try to run a command named "wc -l myfile.txt". Of course, there is no "wc -l myfile.txt" command installed, only a "wc" command, so this will fail:
['wc -l myfile.txt']
Try to run a command "wc" with an option "-l myfile.txt". There is an "-l" option, but no "-l myfile.txt" option. This will fail:
['wc', '-l myfile.txt']
and a correct example:
['wc', '-l', 'myfile.txt']
This calls wc with the -l option (print only the line count) and myfile.txt as the only filename.
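To see what the child process actually receives in each case, a small sketch can spawn a child that echoes its own argument list. The helper name child_argv is hypothetical, and the child is the current Python interpreter rather than wc, so the example should run on any platform:

```python
import json
import subprocess
import sys

# Child program: print the arguments it received as a JSON list.
ECHO_ARGS = "import json, sys; print(json.dumps(sys.argv[1:]))"

def child_argv(args):
    """Run a child with the given arguments and return what it saw (hypothetical helper)."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGS] + list(args),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

# One string containing a space stays ONE argument; nothing splits it.
joined = child_argv(["-l myfile.txt"])
print(joined)

# Separate list items arrive as separate arguments.
separate = child_argv(["-l", "myfile.txt"])
print(separate)
```

The first call shows why ['wc', '-l myfile.txt'] fails: the program is handed a single option string that it does not recognise.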
Something you may have found confusing is fragments like this:
'-sOutputFile={0}'
This is an 'inline' style of giving the argument of an option. If this is supported, the help for the program usually says so explicitly. Python does not split these -- the program receiving them does.
There are three main styles of 'inline' arguments. I'll use grep options to demo the first two:
--context=3
-C3
(the above two lines are equivalent)
The third type is only found in imagemagick and a few other programs that tend to have reams of commandline arguments, such as gs:
-sOutputFile=foo
This is just a minor variation on the GNU standard --long-option=VALUE form shown above.
The GNU libc manual's "argument syntax" section gives a full explanation of these option passing conventions.
Regarding escaping: no escaping is done, and none is normally needed. The string values are passed to the command exactly as you specify them. Likewise, no quoting is done, nor is it needed, since the list already marks where each argument begins and ends.
Regarding injection: shell injection is not possible unless you use the 'shell' option. Don't use the 'shell' option :).
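The point about injection can be illustrated with a minimal sketch. The hostile file name below is made up for the demonstration, and the child is the current Python interpreter standing in for a real command:

```python
import shlex
import subprocess
import sys

# A made-up "filename" that would run a second command if a shell parsed it.
hostile = "report.ps; echo INJECTED"

# List form, no shell: the child receives the string verbatim as one argument.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile]
seen = subprocess.run(child, capture_output=True, text=True, check=True).stdout
print(seen)

# If a shell really is required, shlex.quote() makes a value safe to embed
# in a POSIX shell command line.
quoted = shlex.quote(hostile)
print(quoted)
```

The semicolon never reaches a shell in the list form, so it is just another character in the argument.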

The difference between the two forms you asked about is easy to check:
arg = 'foo'
cmd = ['do', '--something', arg]
print(cmd)
cmd = ['do', '--something {0}'.format(arg)]
print(cmd)
Output:
['do', '--something', 'foo']
['do', '--something foo']
As you can see they are not the same.
In order to call your subprocess correctly, you should do this:
cmd = ['gs', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pdfwrite', '-sOutputFile={0}'.format(pdf_filename), ps_filename]
subprocess.Popen(cmd, ...)
OR:
cmd = 'gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile={0} {1}'.format(pdf_filename, ps_filename)
subprocess.Popen(cmd, shell=True, ...)
The difference between using a list of arguments or a string:
When you use a list of arguments without shell=True, the first item is the executable and the remaining items are passed to it as its arguments, exactly as written.
And when you send a string with shell=True, you let the shell parse the string and build the argument list itself.
So ['do', '--something', 'foo'] is 3 arguments, while ['do', '--something foo'] is only 2 arguments.
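As a sketch of the list form for the gs call from the question, the argument list can be built in one place. The helper name build_gs_command and the sample file names are hypothetical:

```python
def build_gs_command(ps_filename, pdf_filename):
    """Return the argument list for the gs call from the question (hypothetical helper)."""
    return [
        "gs",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        # Option and value form ONE argument; gs itself splits at the '='.
        "-sOutputFile={0}".format(pdf_filename),
        ps_filename,
    ]

# File names with spaces need no quoting or escaping in the list form.
cmd = build_gs_command("my input.ps", "my output.pdf")
print(len(cmd))
print(cmd[4])
```

The result could then be passed to subprocess.call(cmd) without shell=True.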

Related

Bash script will not run with subprocess in Python

For some reason, no matter how many variations I've tried, I can't seem to execute a bash script I've written. The command works 100% fine in Terminal, but when I try calling it with a subprocess, it returns nothing.
from os import listdir
import subprocess
computer_name = 'homedirectoryname'
moviefolder = '/Users/{}/Documents/Programming/Voicer/Movies'.format(computer_name)
string = 'The lion king'
for i in listdir(moviefolder):
    title = i.split('.')
    formatted_title = title[0].replace(' ', '\ ')
    if string.lower() == title[0].lower():
        command = 'vlc {}/{}.{}'.format(moviefolder, formatted_title, title[1])
        subprocess.call(["/usr/local/bin",'-i','-c', command], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, shell=True)
    else:
        continue
The bash executable file looks like this:
#/bin/bash
func() {
    open -a /Applications/VLC.app/Contents/MacOS/VLC $1
}
Where have I gone wrong?

You should call open directly:
import os
import subprocess
computer_name = 'homedirectoryname'
moviefolder = '/Users/{}/Documents/Programming/Voicer/Movies'.format(computer_name)
string = 'The lion king'
for filename in os.listdir(moviefolder):
    title = filename.split('.')
    if string.lower() == title[0].lower():
        subprocess.call(['open', '-a', '/Applications/VLC.app/Contents/MacOS/VLC', os.path.join(moviefolder, filename)])
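A slightly more defensive variant of the same idea, sketched under the assumption that titles may themselves contain dots, uses os.path.splitext instead of split('.') and separates building the command from running it. The helper name open_command and the sample folder are hypothetical:

```python
import os

def open_command(folder, filenames, wanted_title):
    """Build the 'open' argument list for the first file whose name
    (without extension) matches wanted_title, ignoring case.
    Returns None when nothing matches. Hypothetical helper."""
    for filename in filenames:
        stem, _extension = os.path.splitext(filename)
        if stem.lower() == wanted_title.lower():
            return ["open", "-a", "/Applications/VLC.app/Contents/MacOS/VLC",
                    os.path.join(folder, filename)]
    return None

# Sample data; in the real script the names would come from os.listdir(folder).
cmd = open_command("/Users/me/Movies", ["Up.mkv", "The Lion King.mp4"], "The lion king")
print(cmd)
```

On macOS, a non-None result could be passed straight to subprocess.call(cmd); the space in the file name needs no escaping because no shell is involved.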

Since you are using shell=True, the command must be a string:
On Unix with shell=True, the shell defaults to /bin/sh. If args is a
string, the string specifies the command to execute through the shell.
This means that the string must be formatted exactly as it would be
when typed at the shell prompt. This includes, for example, quoting or
backslash escaping filenames with spaces in them. If args is a
sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell
itself. (docs)

Like you even mentioned in a comment, you get /usr/local/bin: is a directory when you properly capture the error from the shell (and take out the erroneous shell=True; or correspondingly refactor the command line to be suitable for this usage, i.e. pass a string instead of a list).
Just to spell this out, you are attempting to run the command /usr/local/bin with some options; but of course, it's not a valid command; so this fails.
The actual script you seem to want to run will declare a function and then exit, which results in the function's definition being lost again, because the subprocess which ran the shell in which this function declaration was executed has now terminated and released all its resources back to the system.
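The effect described here can be reproduced with a short sketch. It assumes a POSIX sh is available on PATH, and the function body is a stand-in for the open command in the question:

```python
import subprocess

# Defining a function in a child shell prints nothing and leaves nothing behind
# once that shell exits.
define_only = subprocess.run(
    ["sh", "-c", "func() { echo called; }"],
    capture_output=True, text=True,
)
print(repr(define_only.stdout))

# The function only does something if it is also called in the SAME shell.
define_and_call = subprocess.run(
    ["sh", "-c", "func() { echo called; }; func"],
    capture_output=True, text=True,
)
print(repr(define_and_call.stdout))
```

This is why running a script that only declares func() appears to return nothing.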
Perhaps you should take more than just a few steps back and explain what you actually want to accomplish; but really, that should be a new, separate question.
Assuming you are actually trying to run vlc, and guessing some other things, too, perhaps you actually want
# title[0], not formatted_title: without a shell, spaces need no backslash escaping
subprocess.call(['vlc', '{}/{}.{}'.format(moviefolder, title[0], title[1])],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
If your PATH is correct, you should not need to specify /usr/local/bin/ explicitly (and if your PATH is wrong, correct it in the code before, instead of hardcoding a directory for the executable you want to call).

/usr/local/bin is a directory. You can't run a directory as if it were a command.
Anyhow, there's no point to having /usr/local/bin anywhere in your command at all. Leave out the shell=True, and explicitly call vlc:
subprocess.call([
    'vlc',
    # title[0], not formatted_title: without a shell, spaces need no backslash escaping
    '{}/{}.{}'.format(moviefolder, title[0], title[1])
])

When shell=True is used in subprocess.call and the command is given as a sequence, the first element of the sequence needs to be the command, and the rest are treated as arguments to the shell itself.
So, this should do:
subprocess.call(["/usr/local/bin/{}".format(command), '-i','-c'], shell=True, ...)
Otherwise, you can make the command a string.
Example:
In [20]: subprocess.call(["cat spamegg", "-i", "-c"], shell=True)
foobar

Python 2 subprocess arguments error under Mac

I'm trying to do a Mac version of a program that runs fine under Windows, using python 2.7. Under Mac (OS X El Capitan running in VirtualBox), it fails because the arguments I pass to the shell are not recognized properly.
Original code:
for item in source_files:
    # core process
    output = sub.Popen(["mhl", "verify", "-vv", "-f", item, ">", text_report],
                       shell=True,
                       stdout=sub.PIPE,
                       stderr=sub.PIPE)
    stdout_value, stderr_value = output.communicate()
Under Mac only the 'mhl' argument is recognized so I tried this:
sub.Popen(['mhl verify -vv -f', item, '>', text_report]
Now the command works but the item (a .mhl file) is not recognized so I tried this:
sub.Popen(['mhl verify -vv -f', '/Users/simon/Documents/Documents.mhl', '>', text_report]
and this:
sub.Popen(['mhl verify -vv -f', r'/Users/simon/Documents/Documents.mhl', '>', text_report]
Same results, it tells me a mhl file should follow the '-f' argument. If I add the item directly to the first argument it works fine:
sub.Popen(['mhl verify -vv -f /Users/simon/Documents/Documents.mhl', '>', text_report]
What am I missing here?

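With shell=True and a list, only the first item, 'mhl verify -vv -f', is handed to the shell as the command to run; the remaining items become arguments to the shell itself and never reach mhl, which is why it complains that no file follows -f. (Without shell=True, the OS would instead look for an executable literally named 'mhl verify -vv -f', and there is no such executable.)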
With shell=True you'd want to pass in everything as one string, not as separate arguments:
sub.Popen('mhl verify -vv -f {} > {}'.format(item, text_report),
shell=True, stdout=sub.PIPE, stderr=sub.PIPE)
Note that there is little point in directing stdout to a pipe here, since all stdout output from the mhl command is being redirected to a file.
If you wanted to capture the output of the mhl command directly in Python, I'd not use a shell intermediary here; run without shell=True and then just use subprocess.check_output() to retrieve the output:
output = sub.check_output(['mhl', 'verify', '-vv', '-f', item])
Note that now the program name and the arguments must be passed in ready-split into separate strings.
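If the goal is still to end up with a report file, the > redirection does not require a shell either: an open file object can be passed as stdout. The following is a minimal sketch; run_to_file is a hypothetical helper, and the child is the current Python interpreter standing in for mhl:

```python
import os
import subprocess
import sys
import tempfile

def run_to_file(args, report_path):
    """Run args without a shell, writing the child's stdout to report_path.
    Returns the child's exit code. Hypothetical helper."""
    with open(report_path, "w") as report:
        return subprocess.call(args, stdout=report)

# Stand-in for: ['mhl', 'verify', '-vv', '-f', item]
report_path = os.path.join(tempfile.mkdtemp(), "report.txt")
exit_code = run_to_file([sys.executable, "-c", "print('verified')"], report_path)
print(exit_code)
```

Because the arguments stay a list, file names containing spaces work on both Windows and Mac without extra quoting.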

Getting console output of a Perl script through Python

There are a variety of posts and resources explaining how to use Python to get output of an outside call. I am familiar with using these--I've used Python to get output of jars and exec several times, when it was not realistic or economical to re-implement the functionality of that jar/exec inside Python itself.
I am trying to call a Perl script via Python's subprocess module, but I have had no success with this particular Perl script. I carefully followed the answers here, Call Perl script from Python, but had no results.
I was able to get the output of this test Perl script from this question/answer: How to call a Perl script from Python, piping input to it?
#!/usr/bin/perl
use strict;
use warnings;
my $name = shift;
print "Hello $name!\n";
Using this block of Python code:
import subprocess
var = "world"
args_test = ['perl', 'perl/test.prl', var]
pipe = subprocess.Popen(args_test, stdout=subprocess.PIPE)
out, err = pipe.communicate()
print out, err
However, if I swap out the arguments and the Perl script with the one I need output from, I get no output at all.
args = ['perl', 'perl/my-script.prl', '-a', 'perl/file-a.txt',
'-t', 'perl/file-t.txt', 'input.txt']
which runs correctly when entered on the command line, e.g.
>perl perl/my-script.prl -a perl/file-a.txt -t perl/file-t.txt input.txt
but this produces no output when called via subprocess:
pipe = subprocess.Popen(args, stdout=subprocess.PIPE)
out, err = pipe.communicate()
print out, err
I've done another sanity check as well. This correctly outputs the help message of Perl as a string:
import subprocess
pipe = subprocess.Popen(['perl', '-h'], stdout=subprocess.PIPE)
out, err = pipe.communicate()
print out, err
As shown here:
>>> ================================ RESTART ================================
>>>
Usage: perl [switches] [--] [programfile] [arguments]
-0[octal] specify record separator (\0, if no argument)
-a autosplit mode with -n or -p (splits $_ into @F)
-C[number/list] enables the listed Unicode features
-c check syntax only (runs BEGIN and CHECK blocks)
-d[:debugger] run program under debugger
-D[number/list] set debugging flags (argument is a bit mask or alphabets)
-e program one line of program (several -e's allowed, omit programfile)
-f don't do $sitelib/sitecustomize.pl at startup
-F/pattern/ split() pattern for -a switch (//'s are optional)
-i[extension] edit <> files in place (makes backup if extension supplied)
-Idirectory specify @INC/#include directory (several -I's allowed)
-l[octal] enable line ending processing, specifies line terminator
-[mM][-]module execute "use/no module..." before executing program
-n assume "while (<>) { ... }" loop around program
-p assume loop like -n but print line also, like sed
-P run program through C preprocessor before compilation
-s enable rudimentary parsing for switches after programfile
-S look for programfile using PATH environment variable
-t enable tainting warnings
-T enable tainting checks
-u dump core after parsing program
-U allow unsafe operations
-v print version, subversion (includes VERY IMPORTANT perl info)
-V[:variable] print configuration summary (or a single Config.pm variable)
-w enable many useful warnings (RECOMMENDED)
-W enable all warnings
-x[directory] strip off text before #!perl line and perhaps cd to directory
-X disable all warnings
None
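The trailing None above is err: stderr was never redirected to a pipe, so communicate() has nothing to return for it. One way to narrow the problem down, sketched here under the assumption that the script reports its trouble on stderr or through its exit status, is to capture both. The helper name run_and_report is hypothetical, and the failing child is a stand-in for the Perl script:

```python
import subprocess
import sys

def run_and_report(args):
    """Run args and return (exit code, stdout text, stderr text). Hypothetical helper."""
    pipe = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # without this, err is always None
        universal_newlines=True,  # decode the output to text
    )
    out, err = pipe.communicate()
    return pipe.returncode, out, err

# Stand-in child that prints nothing on stdout but explains itself on stderr.
failing_child = [sys.executable, "-c",
                 "import sys; sys.stderr.write('cannot open file\\n'); sys.exit(2)"]
code, out, err = run_and_report(failing_child)
print(code, repr(out), repr(err))
```

A non-zero exit code or a message in err would suggest, for example, that relative paths such as perl/file-a.txt resolve differently when the script is started from Python.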

Running cmd in python

At the moment I have this as my code. The first line seems to work well, but the second gives errors.
os.chdir('C://Users/Alex/Dropbox/code stuff/test')
subprocess.call('ffmpeg -i test%d0.png output.avi')
Also, when I try to run it like this, a cmd window flickers for about a second and then nothing happens:
os.system('ffmpeg -i test%d0.png output.avi')

For the later generations looking for the answer, this worked. (You have to separate the command by the spaces.)
import os
import subprocess
os.chdir('C://Users/Alex/')
subprocess.call(['ffmpeg', '-i', 'picture%d0.png', 'output.avi'])
subprocess.call(['ffmpeg', '-i', 'output.avi', '-t', '5', 'out.gif'])
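As an alternative to os.chdir, which changes the directory for the whole Python process, the cwd parameter sets the working directory for the child only. This is a minimal sketch in which a temporary folder stands in for the folder holding the PNG files and the current Python interpreter stands in for ffmpeg:

```python
import os
import subprocess
import sys
import tempfile

work_dir = tempfile.mkdtemp()  # stand-in for the folder holding the PNG files
parent_dir_before = os.getcwd()

# cwd= changes the directory for the child only; the parent stays where it was.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getcwd())"],
    cwd=work_dir, capture_output=True, text=True, check=True,
)
child_dir = result.stdout.strip()
print(child_dir)
```

With ffmpeg, the same call shape would be subprocess.run(['ffmpeg', '-i', 'picture%d0.png', 'output.avi'], cwd=work_dir).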

It is better to call subprocess.call in another way.
The preferred way is:
subprocess.call(['ffmpeg', '-i', 'test%d0.png', 'output.avi'])
Alternatively:
subprocess.call('ffmpeg -i test%d0.png output.avi', shell=True)
You can find the reasons for this in the manual. I quote:
args is required for all calls and should be a string, or a sequence
of program arguments. Providing a sequence of arguments is generally
preferred, as it allows the module to take care of any required
escaping and quoting of arguments (e.g. to permit spaces in file
names). If passing a single string, either shell must be True (see
below) or else the string must simply name the program to be executed
without specifying any arguments.

I know this question is old, but now there is an excellent wrapper for ffmpeg in Python :
ffmpeg-python. You will find it at https://github.com/kkroening/ffmpeg-python
With it, the command could be achieved this way:
import ffmpeg
(
    ffmpeg
    .input('test*.png', pattern_type='glob')
    .output('output.avi')
    .run()
)

I also use subprocess but in another way.
As @Roland Smith claimed, the preferred method is (which isn't really comfortable):
subprocess.call(['ffmpeg', '-i', 'test%d0.png', 'output.avi'])
The second method can be more comfortable to use but can have some problems:
subprocess.call('ffmpeg -i test%d0.png output.avi', shell=True)
Besides the fact that setting the shell parameter to True is discouraged, I had problems when the output folder contained round brackets, for example "temp(5)/output.avi".
A more robust way is the following:
import subprocess
import shlex
cmd = shlex.split('ffmpeg -i test%d0.png output.avi')
subprocess.call(cmd)
See the shlex documentation for more details.
In particular, for shlex.split:
Split the string s using shell-like syntax.
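A short sketch of what shlex.split returns for the cases mentioned above; the file names are examples only:

```python
import shlex

# Parentheses are ordinary characters to shlex.split, so a folder such as
# temp(5) needs no special handling and no shell.
plain = shlex.split("ffmpeg -i test%d0.png temp(5)/output.avi")
print(plain)

# Quotes group words into one argument, as they would at a shell prompt.
quoted = shlex.split('ffmpeg -i "my clip.avi" out.gif')
print(quoted)
```

Either list can be passed directly to subprocess.call without shell=True.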

What version of Python are you using? getstatusoutput() is deprecated since version 2.6. In Python 3 you can use subprocess for the same effect.
subprocess.getoutput('cd /Users/Alex/code/pics/')

Why does subprocess.Popen() with shell=True work differently on Linux vs Windows?

When using subprocess.Popen(args, shell=True) to run "gcc --version" (just as an example), on Windows we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc (GCC) 3.4.5 (mingw-vista special r3) ...
So it's nicely printing out the version as I expect. But on Linux we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc: no input files
Because gcc hasn't received the --version option.
The docs don't specify exactly what should happen to the args under Windows, but it does say, on Unix, "If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional shell arguments." IMHO the Windows way is better, because it allows you to treat Popen(arglist) calls the same as Popen(arglist, shell=True) ones.
Why the difference between Windows and Linux here?

Actually on Windows, it does use cmd.exe when shell=True - it prepends cmd.exe /c (it actually looks up the COMSPEC environment variable but defaults to cmd.exe if not present) to the shell arguments. (On Windows 95/98 it uses the intermediate w9xpopen program to actually launch the command).
So the strange implementation is actually the UNIX one, which does the following (where each space separates a different argument):
/bin/sh -c gcc --version
It looks like the correct implementation (at least on Linux) would be:
/bin/sh -c "gcc --version" gcc --version
Since this would set the command string from the quoted parameters, and pass the other parameters successfully.
From the sh man page section for -c:
Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
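The behaviour quoted from the man page can be observed from Python. This sketch assumes a POSIX system where shell=True uses /bin/sh; the words alpha and beta are arbitrary placeholders:

```python
import subprocess

# With shell=True and a list, only the first item is the command string.
# The remaining items become $0, $1, ... of the shell itself.
result = subprocess.run(
    ['echo "$0 $1"', "alpha", "beta"],
    shell=True, capture_output=True, text=True,
)
print(result.stdout)
```

In the gcc example, '--version' ends up in $0 of the shell in the same way, so gcc itself is started with no arguments at all.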
This patch seems to fairly simply do the trick:
--- subprocess.py.orig 2009-04-19 04:43:42.000000000 +0200
+++ subprocess.py 2009-08-10 13:08:48.000000000 +0200
@@ -990,7 +990,7 @@
args = list(args)
if shell:
- args = ["/bin/sh", "-c"] + args
+ args = ["/bin/sh", "-c"] + [" ".join(args)] + args
if executable is None:
executable = args[0]

From the subprocess.py source:
On UNIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
On Windows: the Popen class uses CreateProcess() to execute the child
program, which operates on strings. If args is a sequence, it will be
converted to a string using the list2cmdline method. Please note that
not all MS Windows applications interpret the command line the same
way: The list2cmdline is designed for applications using the same
rules as the MS C runtime.
That doesn't answer why, just clarifies that you are seeing the expected behavior.
The "why" is probably that on UNIX-like systems, command arguments are actually passed through to applications (using the exec* family of calls) as an array of strings. In other words, the calling process decides what goes into EACH command line argument. Whereas when you tell it to use a shell, the calling process actually only gets the chance to pass a single command line argument to the shell to execute: The entire command line that you want executed, executable name and arguments, as a single string.
But on Windows, the entire command line (according to the above documentation) is passed as a single string to the child process. If you look at the CreateProcess API documentation, you will notice that it expects all of the command line arguments to be concatenated together into a big string (hence the call to list2cmdline).
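The conversion mentioned in the quoted source can be tried on any platform, since subprocess.list2cmdline is a plain string function. A minimal sketch:

```python
import subprocess

# list2cmdline joins a list into one string using the MS C runtime quoting rules.
with_spaces = subprocess.list2cmdline(["cp", "My File", "New Location"])
print(with_spaces)

no_spaces = subprocess.list2cmdline(["gcc", "--version"])
print(no_spaces)
```

Arguments containing spaces are wrapped in double quotes; the others are joined as they are.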
Plus there is the fact that on UNIX-like systems there actually is a shell that can do useful things, so I suspect that the other reason for the difference is that on Windows, shell=True changes very little: it only prefixes the command line with the command interpreter (cmd.exe /c), so the arguments still reach the program, which is why it is working the way you are seeing. The only way to make the two systems act identically would be for it to simply drop all of the command line arguments when shell=True on Windows.

The reason for the UNIX behaviour of shell=True is to do with quoting. When we write a shell command, it will be split at spaces, so we have to quote some arguments:
cp "My File" "New Location"
This leads to problems when our arguments contain quotes, which requires escaping:
grep -r "\"hello\"" .
Sometimes we can get awful situations where \ must be escaped too!
Of course, the real problem is that we're trying to use one string to specify multiple strings. When calling system commands, most programming languages avoid this by allowing us to send multiple strings in the first place, hence:
Popen(['cp', 'My File', 'New Location'])
Popen(['grep', '-r', '"hello"'])
Sometimes it can be nice to run "raw" shell commands; for example, if we're copy-pasting something from a shell script or a Web site, and we don't want to convert all of the horrible escaping manually. That's why the shell=True option exists:
Popen('cp "My File" "New Location"', shell=True)
Popen(r'grep -r "\"hello\"" .', shell=True)
I'm not familiar with Windows so I don't know how or why it behaves differently.
