Getting console output of a Perl script through Python

There are a variety of posts and resources explaining how to use Python to get the output of an external call. I am familiar with using these--I've used Python several times to get the output of jars and executables, when it was not realistic or economical to re-implement the functionality of that jar/executable inside Python itself.
I am trying to call a Perl script via Python's subprocess module, but I have had no success with this particular Perl script. I carefully followed the answers here, Call Perl script from Python, but had no results.
I was able to get the output of this test Perl script from this question/answer: How to call a Perl script from Python, piping input to it?
#!/usr/bin/perl
use strict;
use warnings;
my $name = shift;
print "Hello $name!\n";
Using this block of Python code:
import subprocess
var = "world"
args_test = ['perl', 'perl/test.prl', var]
pipe = subprocess.Popen(args_test, stdout=subprocess.PIPE)
out, err = pipe.communicate()
print out, err
However, if I swap out the arguments and the Perl script with the one I need output from, I get no output at all.
args = ['perl', 'perl/my-script.prl', '-a', 'perl/file-a.txt',
        '-t', 'perl/file-t.txt', 'input.txt']
which runs correctly when entered on the command line, e.g.
>perl perl/my-script.prl -a perl/file-a.txt -t perl/file-t.txt input.txt
but this produces no output when called via subprocess:
pipe = subprocess.Popen(args, stdout=subprocess.PIPE)
out, err = pipe.communicate()
print out, err
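(A minimal diagnostic sketch, not part of the original code: capturing stderr and the return code as well usually shows why a script that works on the command line prints nothing under subprocess, for example when relative paths like perl/my-script.prl resolve against a different working directory.)
import subprocess

# Diagnostic sketch, assuming the same args list as above: also capture stderr
# and the exit status so the Perl script's own error messages are visible.
pipe = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = pipe.communicate()
print(out)
print(err)
print(pipe.returncode)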
I've done another sanity check as well. This correctly outputs the help message of Perl as a string:
import subprocess
pipe = subprocess.Popen(['perl', '-h'], stdout=subprocess.PIPE)
out, err = pipe.communicate()
print out, err
As shown here:
>>> ================================ RESTART ================================
>>>
Usage: perl [switches] [--] [programfile] [arguments]
-0[octal] specify record separator (\0, if no argument)
-a autosplit mode with -n or -p (splits $_ into @F)
-C[number/list] enables the listed Unicode features
-c check syntax only (runs BEGIN and CHECK blocks)
-d[:debugger] run program under debugger
-D[number/list] set debugging flags (argument is a bit mask or alphabets)
-e program one line of program (several -e's allowed, omit programfile)
-f don't do $sitelib/sitecustomize.pl at startup
-F/pattern/ split() pattern for -a switch (//'s are optional)
-i[extension] edit <> files in place (makes backup if extension supplied)
-Idirectory specify @INC/#include directory (several -I's allowed)
-l[octal] enable line ending processing, specifies line terminator
-[mM][-]module execute "use/no module..." before executing program
-n assume "while (<>) { ... }" loop around program
-p assume loop like -n but print line also, like sed
-P run program through C preprocessor before compilation
-s enable rudimentary parsing for switches after programfile
-S look for programfile using PATH environment variable
-t enable tainting warnings
-T enable tainting checks
-u dump core after parsing program
-U allow unsafe operations
-v print version, subversion (includes VERY IMPORTANT perl info)
-V[:variable] print configuration summary (or a single Config.pm variable)
-w enable many useful warnings (RECOMMENDED)
-W enable all warnings
-x[directory] strip off text before #!perl line and perhaps cd to directory
-X disable all warnings
None

Related

Issue using subprocess to run a PDAL bash command from Python [duplicate]

This question already has answers here:
File not found error when launching a subprocess containing piped commands
(6 answers)
Closed 2 years ago.
Issue:
I cannot run a pdal bash command from Python using subprocess.
Here is the code
based on Running Bash commands in Python:
import os, subprocess
input = '/path/to/file.ply'
output = '/path/to/statfile.json'
if not os.path.isfile(output):
    open(output, 'a').close()
bashcmd = ("pdal info --boundary "
           + input
           + " > "
           + output
           )
print("Bash command is:\n{}\n".format(bashcmd))
process = subprocess.Popen(bashcommand.split(),
                           stdout=subprocess.PIPE,
                           shell=True)
output, error = process.communicate()
print("Output:\n{}\n".format(output))
print("Error:\n{}\n".format(error))
Which gives me this output in the Python console:
Bash command is:
pdal info --boundary /path/to/file.ply > /path/to/statfile.json
Output:
Usage:
pdal <options>
pdal <command> <command options>
--command The PDAL command
--debug Sets the output level to 3 (option deprecated)
--verbose, -v Sets the output level (0-8)
--drivers List available drivers
--help, -h Display help text
--list-commands List available commands
--version Show program version
--options Show options for specified driver (or 'all')
--log Log filename (accepts stderr, stdout, stdlog, devnull as
special cases)
--logtiming Turn on timing for log messages
The following commands are available:
- delta
- diff
- fauxplugin
- ground
- hausdorff
- info
- merge
- pcl
- pipeline
- random
- smooth
- sort
- split
- tindex
- translate
See http://pdal.io/apps/ for more detail
Error:
None
It looks as if it had stopped reading the arguments of the command right after the call to 'pdal' itself, which prints this help message.
If I copy the output of the first print and paste it in a bash terminal, it works properly, giving me the output file with the desired metadata. But from Python no output file is created.
Question:
I wonder why (e.g. is there anything wrong with the redirection or the fact that the computation itself may take ~20sec normally?), and how to execute this command from Python?
This doesn't provide a clear enough answer to the present issue.
There are multiple errors here.
You are using an undefined variable bashcommand instead of the one you defined above, bashcmd.
You are mixing output to a Python file handle with shell redirection.
You are not capturing the stderr of the process. (I will vaguely assume you do not need the standard error anyway.)
You should not split() the command if you run it with shell=True.
More broadly, you should probably avoid shell=True and let Python take care of the redirection for you by connecting the output to the file you open; and in modern times, you really should not use subprocess.Popen() if you can use subprocess.run() or subprocess.check_call() or friends.
import subprocess
input = '/path/to/file.ply'
output = '/path/to/statfile.json'
with open(output, 'a') as handle:
    bashcmd = ['pdal', 'info', '--boundary', input]
    #print("Bash command is:\n{}\n".format(bashcmd))
    result = subprocess.run(bashcmd, stdout=handle, stderr=subprocess.PIPE)
    # No can do because output goes straight to the file now
    ##print("Output:\n{}\n".format(output))
    #print("Error:\n{}\n".format(result.stderr))

Command gives different behaviour in terminal or in program (Python and C++)

I am trying to execute the following command programmatically:
~$ obabel -:"ccco" -O /home/user/output.png
obabel is a chemistry library, and basically if the string in the "" is complete nonsense chemically, it won't be able to generate the PNG file, and if it is a legitimate chemical structure the PNG file will be generated. This works in the terminal.
However, if I call the same command with Python, PNG files are generated for complete nonsense input strings which don't generate a PNG when the command is executed in the terminal.
I'm using subprocess like this:
import subprocess as sub

cmd = 'obabel -:"ccco" -O /home/user/output.png'
proc = sub.Popen([cmd], shell=True, stderr=sub.PIPE)
res = proc.communicate()
I have also tried this:
os.system(cmd)
And tried Python2 and Python3. This happens when running scripts from the terminal or iPython.
I have also tried using C++ and running the cmd like this:
std::string cmd = "obabel -:\"ccco\" -O /home/user/output.png";
system(cmd.c_str());
By default Popen expects a list of string arguments, but if you pass shell=True, you can supply the command as a simple string (it will be executed in a shell). Currently you are passing in a list with one string that contains the entirety of the command; instead, you can use either of these:
proc = subprocess.Popen('obabel -:"ccco" -O output.png', shell=True, stderr=subprocess.PIPE)
proc = subprocess.Popen(['obabel', '-:ccco', '-O', 'output.png'], stderr=subprocess.PIPE)
Escaping the SMILES string with quotes seems to be done to protect it from the shell, and you don't need the quotes when passing the arguments directly (otherwise the " characters become part of the string and produce invalid syntax).
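To make the quoting point concrete, here is a small illustration (shlex.split applies the same tokenization rules a POSIX shell would, so it shows what obabel actually receives in each case):
import shlex

cmd = 'obabel -:"ccco" -O /home/user/output.png'

# What a shell hands to obabel: the quotes are stripped during tokenization,
# so the SMILES argument arrives as '-:ccco'.
print(shlex.split(cmd))
# ['obabel', '-:ccco', '-O', '/home/user/output.png']

# If the quoted form were placed in the argument list directly, the '"'
# characters would reach obabel literally and corrupt the SMILES string:
# ['obabel', '-:"ccco"', '-O', '/home/user/output.png']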

Bash: Tokenize string using shell rules without eval'ing it?

I'm writing a wrapper script. The original program's arguments are in a separate file, args. The script needs to split contents of args using shell parameter rules and then run the program. A partial solution (set + eval) was offered in Splitting a string to tokens according to shell parameter rules without eval:
#!/usr/bin/env bash
STDOUT="$1"
STDERR="$2"
( set -f ; eval "set -- $(cat args)"; exec run_in_container "$@" >"$STDOUT" 2>"$STDERR" )
but in my case args is user-generated. One can easily imagine
args: echo "Hello, 'world'! $(rm -rf /)" (not cool, but harmless: commands are run in, e.g., a Docker container)
args: bash -c "$JAVA_HOME/<...> > <...> && <...>" (harmful: $JAVA_HOME was intended to be container's value of environment variable JAVA_HOME, but actually will be substituted earlier, when eval'ing the command in the wrapper script's subshell.)
I tried Python, and this works:
#!/usr/bin/env python
import shlex, subprocess, sys

with open('args', 'r') as argsfile:
    args = argsfile.read()
with open(sys.argv[1], 'w') as outfile, open(sys.argv[2], 'w') as errfile:
    exit(subprocess.call(["run_in_container"] + shlex.split(args),
                         stdout=outfile, stderr=errfile))
Is there a way to do shlex in bash: tokenize the string using shell parameter rules, but don't substitute any variables' values, don't execute $(...) etc.?

Handling specific Python error within Bash call?

I am using line_profiler, which allows you to drop @profile decorators anywhere in a Python codebase and get line-by-line output.
However, if you try to execute Python code that contains one such @profile decorator without loading the line_profiler module, the code will fail with a NameError, since that decorator is defined and injected by this external library.
I'd like a bash command that attempts to run my python script with vanilla python. Then, if and only if the error consists of NameError, I want to give it a second try. This is what I have got so far:
python -u $file || python -m kernprof -l -v --outfile=/dev/null $file
The problem is of course that if my python code has ANY errors at all, be it ValueError or IndentationError or anything, it tries the profiler. I want to run the profiler ONLY if the string NameError: name 'profile' is not defined is found within stderr.
Wouldn't it be better to monkey-patch profile when line_profiler is not present?
Something like
try:
    import line_profiler
except ImportError:
    import warnings
    warnings.warn("Profile disabled")

    def profile(fn):
        def wrapper(*args, **kw):
            return fn(*args, **kw)
        return wrapper
This way your code runs in either case without complicating matters.
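For illustration, a hypothetical usage sketch (assuming the try/except fallback above sits at the top of the script; slow_function is a made-up example):
# The decorator name resolves either way, so the same script can be run with
# plain `python script.py` or profiled with `kernprof -l -v script.py`
# without any edits.
@profile
def slow_function(n):
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    print(slow_function(10000))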
Here's a usable Bash solution that preserves stdout and stderr as separate streams (with the caveat that stderr appears after stdout) and only checks stderr for the error message (which probably is overkill though).
It goes the easy route and simply saves the stderr output to a file. It also handles script names that contain spaces (by properly quoting variable expansions where needed) and/or start with - (by passing -- before the filename to switch off flag processing) as it's an OCD pet peeve of mine.
On success or if there is an error that is not the expected error, the stderr of the first python command is shown. Otherwise (for the expected error), it is hidden.
Usage is $ ./check <script>.
#!/bin/bash
if [[ $# -ne 1 ]]; then
    echo "Expected one argument: the script" >&2
    exit 1
fi
script=$1
if [[ ! -e $script ]]; then
    echo "'$script' does not exist or is not a regular file" >&2
    exit 1
fi
if ! python -- "$script" 2>saved_stderr &&
        grep -q "NameError: name 'profile' is not defined" saved_stderr; then
    # Try again with the kernprof module.
    python -m kernprof -l -v --outfile=/dev/null -- "$script"
else
    # Either success or an unexpected error. Show stderr.
    cat saved_stderr >&2
fi
rm saved_stderr
To check if the return status of a command is zero (i.e., success), it suffices to do
if <cmd>; then <if successful>; fi
! negates the exit status, so if ! <cmd> ... can be used to check for failure. ! only applies to the python command above, not all of python ... && grep ....
>&2 redirects stdout to stderr. (It's the same as 1>&2 but saves a single character, which is a bit silly, but I included for illustrative purposes as it's a common idiom.)
Creating a simple Python wrapper would seem a lot more straightforward, because inside Python, you have access to the things which go wrong.
Assuming your $file uses the common __name__ == '__main__' idiom, something like this:
if __name__ == '__main__':
    main()
you can create a wrapper something like
import yourfile

try:
    yourfile.main()
except NameError:
    import kernprof
    # hack hack, quickly constructed from looking at main() in kernprof.py
    prof = kernprof.ContextualProfile()
    execfile_ = execfile
    ns = locals()
    try:
        prof.runctx('execfile_(%r, globals())' % (yourfile.__file__,), ns, ns)
    finally:
        prof.print_stats()

Python subprocess with complex arguments

I'm looking for the safest and most convenient way to call a shell command from python(3). Here a ps to pdf conversion:
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="${pdf_file}" "${ps_file}"
I use subprocess, shlex and avoid shell=True.
But I find the resulting command inconsistent:
cmd = ['gs', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pdfwrite', '-sOutputFile={0}'.format(pdf_filename), ps_filename]
What am I missing?! The subprocess.call() syntax looks so clean with space-separated arguments, yet looks like such a mess everywhere else.
What's the difference when calling subprocess.call(cmd) (at the Python level, i.e. escaping, injection protection, quoting, etc.) between:
cmd = ['do', '--something', arg]
cmd = ['do', '--something {0}'.format(arg)]
If there is none, is this also a good way to do it?
cmd = ['gs', '-dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile={0} {1}'.format(pdf_filename, ps_filename)]
Another example of inconsistency:
hg merge -r 3 would be cmd = ['hg', 'merge', '-r', revision_id]
hg merge --rev=3 would be cmd = ['hg', 'merge', '--rev={0}'.format(revision_id)]
despite the fact that these are two ways of sending the same arguments.
The difference is that the command may have a --something option which accepts an argument, but it doesn't have a --something foo option -- which is what you would be telling it. When you run a command in your shell, like wc -l myfile.txt, your shell splits up that commandline where it finds spaces -- so the command that gets run is ['wc', '-l', 'myfile.txt'].
The subprocess module does not perform such splitting. You have to do it yourself (unless you use the 'shell' option, but that's generally less secure, so avoid it if you can).
Some anti-examples...
Try to run a command named "wc -l myfile.txt". Of course, there is no "wc -l myfile.txt" command installed, only a "wc" command, so this will fail:
['wc -l myfile.txt']
Try to run a command "wc" with an option "-l myfile.txt". There is an "-l" option, but no "-l myfile.txt" option. This will fail:
['wc', '-l myfile.txt']
and a correct example:
['wc', '-l', 'myfile.txt']
This calls wc with the -l option (print only the line count) and myfile.txt as the only filename.
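As noted above, the splitting is up to you when you start from a single string; the standard library's shlex.split applies shell quoting rules without running a shell. A small sketch:
import shlex

# shlex.split tokenizes a command string using shell quoting rules,
# producing exactly the kind of list subprocess expects.
print(shlex.split('wc -l myfile.txt'))
# ['wc', '-l', 'myfile.txt']

# Quotes protect embedded spaces and are stripped, just as a shell would do:
print(shlex.split('gs -dBATCH -sOutputFile="my output.pdf" input.ps'))
# ['gs', '-dBATCH', '-sOutputFile=my output.pdf', 'input.ps']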
Something you may have found confusing is fragments like this:
'-sOutputFile={0}'
This is an 'inline' style of giving the argument of an option. If this is supported, the help for the program usually says so explicitly. Python does not split these -- the program receiving them does.
There are three main styles of 'inline' arguments. I'll use grep options to demo the first two:
--context=3
-C3
(the above two lines are equivalent)
The third type is only found in ImageMagick and a few other programs that tend to have reams of command-line arguments, such as gs:
-sOutputFile=foo
This is just a minor variation on the GNU standard --long-option=VALUE form shown above.
The GNU libc manual's "argument syntax" section gives a full explanation of these option passing conventions.
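For example, the two equivalent grep spellings above become these argument lists (a sketch assuming grep is installed and pattern/file.txt are placeholders; Python passes each element through untouched and grep itself splits option name from value):
import subprocess

# Long GNU form: '--option=value' is one argument.
subprocess.call(['grep', '--context=3', 'pattern', 'file.txt'])

# Short form: the value is fused onto the option letter, still one argument.
subprocess.call(['grep', '-C3', 'pattern', 'file.txt'])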
In regards to escaping: No escaping is done, and no escaping is normally needed. The string values are passed exactly as you specify to the command. Naturally, no quoting is done nor is it needed, since you already took care of that in your Python code.
In regards to injection: this is not possible unless you use the 'shell' option. Don't use the 'shell' option :).
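To illustrate the injection point with a sketch (the file name is a deliberately hostile, hypothetical value):
import subprocess

filename = "myfile.txt; rm -rf ~"  # hostile, hypothetical input

# Without a shell the whole string is one argument: wc just reports that no
# file with that odd name exists. Nothing else is executed.
subprocess.call(['wc', '-l', filename])

# With shell=True the shell would interpret the ';' and run the second
# command, which is exactly the injection risk. Don't do this:
# subprocess.call('wc -l ' + filename, shell=True)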
The difference between the two forms you asked about is easy to check:
arg = 'foo'
cmd = ['do', '--something', arg]
print cmd
cmd = ['do', '--something {0}'.format(arg)]
print cmd
>>>
['do', '--something', 'foo']
['do', '--something foo']
As you can see they are not the same.
In order to call your subprocess correctly, you should do this:
cmd = ['gs', '-dBATCH', '-dNOPAUSE', '-sDEVICE=pdfwrite', '-sOutputFile={0}'.format(pdf_filename), ps_filename]
subprocess.Popen(cmd, ...)
OR:
cmd = 'gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile={0} {1}'.format(pdf_filename, ps_filename)
subprocess.Popen(cmd, shell=True, ...)
The difference between using a list of arguments and a string:
When you use a list of arguments, each element is passed as-is as a separate argument to the executable (no shell parsing is involved unless you ask for it).
When you send a string with shell=True, you let the shell parse the string and build its own argument list...
So ['do', '--something', 'foo'] is 3 arguments, while ['do', '--something foo'] is only 2 arguments.
