Taming shlex.split() behaviour

There are other questions on SO that get close to answering mine, but I have a very specific use case that I have trouble solving. Consider this:
import shlex
from asyncio import create_subprocess_exec, run
async def main():
    command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
    # split the command string into an argument list before unpacking it
    proc = await create_subprocess_exec(*shlex.split(command))
    await proc.wait()
run(main())
This causes trouble, because program.exe is called with these arguments:
['C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
That is, the double backslash is no longer there, as shlex.split() removes it. Of course, I could instead (as other answers suggest) do this:
proc = await create_subprocess_exec(*shlex.split(command, posix=False))
But then program.exe is effectively called with these arguments:
['"C:\\some folder"', '-o"\\\\server\\share\\some folder"', '"a \\"', 'quote\\""']
That's also no good, because now the double quotes have become part of the content of the first parameter, where they don't belong, even though the second parameter is now fine. The third parameter has become a complete mess.
Neither replacing backslashes with forward slashes nor removing quotes with regular expressions works, for similar reasons.
Is there some way to get shlex.split() to leave double backslashes before server names alone? Or just at all? Why does it remove them in the first place?
Note that, by themselves, these are perfectly valid commands (on Windows and Linux respectively, anyway):
program.exe "C:\some folder" -o"\\server\share\some folder"
echo "hello \"world\""
And even if I did detect the OS and used posix=True/False accordingly, I'd still be stuck with the double quotes included in the second argument, which they shouldn't be.
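To make the difference between the two modes concrete, here is a minimal sketch that runs the same command string through both. It only uses shlex.split() from the question; the variable names are mine, and no expected output is hard-coded for the non-POSIX case.

```python
import shlex

command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'

# posix=True is the default: quotes are removed and "\\" collapses to "\".
posix_tokens = shlex.split(command)

# posix=False: backslashes are left alone, but quote characters stay in the tokens.
windows_tokens = shlex.split(command, posix=False)

print(posix_tokens)
print(windows_tokens)
```

Comparing the two printed lists side by side shows that neither mode yields the argument list a Windows program would expect for this input.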

For now, I ended up with this (arguably a bit of a hack):
from os import name as os_name
from shlex import split
def arg_split(args, platform=os_name):
    r"""
    Like calling shlex.split, but sets `posix=` according to platform
    and unquotes previously quoted arguments on Windows
    :param args: a command line string consisting of a command with arguments,
        e.g. r'dir "C:\Program Files"'
    :param platform: a value like os.name would return, e.g. 'nt'
    :return: a list of arguments like shlex.split(args) would have returned
    """
    if platform != 'nt':
        return split(args)
    # strip the surrounding quotes that non-POSIX mode leaves in place;
    # the length check guards against one-character tokens
    return [a[1:-1].replace('""', '"') if len(a) > 1 and a[0] == a[-1] == '"' else a
            for a in split(args, posix=False)]
Using this instead of shlex.split() gets me what I need, while not breaking UNC paths. However, I'm sure there are some edge cases where escaped double quotes aren't handled correctly, but it has worked for all my test cases and seems to be working for all practical cases so far. Use at your own risk.
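A short usage sketch may help here. It restates the function compactly so that it runs on its own, and the two sample command strings are made up for illustration.

```python
from shlex import split


def arg_split(args, platform):
    # Compact restatement of the function above, repeated so this sketch is self-contained.
    if platform != 'nt':
        return split(args)
    return [t[1:-1].replace('""', '"') if len(t) > 1 and t[0] == t[-1] == '"' else t
            for t in split(args, posix=False)]


# Hypothetical sample commands, one per platform.
windows_args = arg_split(r'dir "\\server\share\some folder"', platform='nt')
posix_args = arg_split('ls "/tmp/some folder"', platform='posix')

print(windows_args)
print(posix_args)
```

In the 'nt' case the UNC prefix keeps both backslashes and the surrounding quotes are gone, which is the behaviour the answer is after.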
@balmy made the excellent observation that most people should probably just use:
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_shell(command)
Instead of:
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_exec(*shlex.split(command))
However, note that this means:
it's not easy to check or replace individual arguments
you have the problem that always comes with using create_subprocess_shell: if part of your command is based on external input, someone can inject code. In the words of the documentation (https://docs.python.org/3/library/asyncio-subprocess.html):
It is the application’s responsibility to ensure that all
whitespace and special characters are quoted appropriately to avoid
shell injection vulnerabilities. The shlex.quote() function can be
used to properly escape whitespace and special shell characters in
strings that are going to be used to construct shell commands.
And that's still a problem, as quote() also doesn't work correctly for Windows (by design).
I'll leave the question open for a bit, in case someone wishes to point out why the above is a really bad idea, or if someone has a better one.

As far as I can tell, the shlex module is the wrong tool if you are dealing with the Windows shell.
The first paragraph of the docs says (my italics):
The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell.
Admittedly, that talks about just one class, not the entire module. Later, the docs for the quote function say (boldface in the original, this time):
Warning The shlex module is only designed for Unix shells.
To be honest, I'm not sure what the non-Posix mode is supposed to be compatible with. It could be, but this is just me guessing, that the original versions of shlex parsed a syntax of its own which was not quite compatible with anything else, and then Posix mode got added to actually be compatible with Posix shells. This mailing list thread, including this mail from ESR seems to support this.

For the -o parameter, put the leading " at the start of it, not in the middle, and double the backslashes.
Then use posix=True:
import shlex
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
print( "Original command Posix=True", shlex.split(command, posix=True) )
command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'
print( "Updated command Posix=True", shlex.split(command, posix=True) )
result:
Original command Posix=True ['program.exe', 'C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
Updated command Posix=True ['program.exe', 'C:\\some folder', '-o\\\\server\\share\\some folder', 'a "quote"']
The backslashes are still double in the result, but that's standard Python representation of a \ in a string.
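The point about representation can be checked directly. This small sketch reuses the updated command from above and prints the -o argument twice: once as Python displays it inside a list, and once as the actual characters.

```python
import shlex

command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'
args = shlex.split(command, posix=True)

print(repr(args[2]))  # repr() shows every backslash doubled
print(args[2])        # the characters the program would actually receive
```

The second line printed starts with -o followed by exactly two backslashes, so the UNC path survives intact.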

Related

Bash script will not run with subprocess in Python

For some reason, no matter how many variations I've tried, I can't seem to execute a bash script I've written. The command works 100% fine in Terminal, but when I try calling it with a subprocess, it returns nothing.
from os import listdir
import subprocess
computer_name = 'homedirectoryname'
moviefolder = '/Users/{}/Documents/Programming/Voicer/Movies'.format(computer_name)
string = 'The lion king'
for i in listdir(moviefolder):
    title = i.split('.')
    formatted_title = title[0].replace(' ', '\ ')
    if string.lower() == title[0].lower():
        command = 'vlc {}/{}.{}'.format(moviefolder, formatted_title, title[1])
        subprocess.call(["/usr/local/bin",'-i','-c', command], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, shell=True)
    else:
        continue
The bash executable file looks like this:
#/bin/bash
func() {
open -a /Applications/VLC.app/Contents/MacOS/VLC $1
}
Where have I gone wrong?

You should call open directly:
import os
import subprocess
computer_name = 'homedirectoryname'
moviefolder = '/Users/{}/Documents/Programming/Voicer/Movies'.format(computer_name)
string = 'The lion king'
for filename in os.listdir(moviefolder):
    title = filename.split('.')
    if string.lower() == title[0].lower():
        subprocess.call(['open', '-a', '/Applications/VLC.app/Contents/MacOS/VLC', os.path.join(moviefolder, filename)])

Since you are using shell=True, the command must be a string:
On Unix with shell=True, the shell defaults to /bin/sh. If args is a
string, the string specifies the command to execute through the shell.
This means that the string must be formatted exactly as it would be
when typed at the shell prompt. This includes, for example, quoting or
backslash escaping filenames with spaces in them. If args is a
sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell
itself. (docs)

Like you even mentioned in a comment, you get /usr/local/bin: is a directory when you properly capture the error from the shell (and take out the erroneous shell=True; or correspondingly refactor the command line to be suitable for this usage, i.e. pass a string instead of a list).
Just to spell this out, you are attempting to run the command /usr/local/bin with some options; but of course, it's not a valid command; so this fails.
The actual script you seem to want to run will declare a function and then exit, which results in the function's definition being lost again, because the subprocess which ran the shell in which this function declaration was executed has now terminated and released all its resources back to the system.
Perhaps you should take more than just a few steps back and explain what you actually want to accomplish; but really, that should be a new, separate question.
Assuming you are actually trying to run vlc, and guessing some other things, too, perhaps you actually want
# no shell is involved here, so pass the unescaped title[0] rather than formatted_title
subprocess.call(['vlc', '{}/{}.{}'.format(moviefolder, title[0], title[1])],
                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
If your PATH is correct, you should not need to specify /usr/local/bin/ explicitly (and if your PATH is wrong, correct it in the code before, instead of hardcoding a directory for the executable you want to call).

/usr/local/bin is a directory. You can't run a directory as if it were a command.
Anyhow, there's no point to having /usr/local/bin anywhere in your command at all. Leave out the shell=True, and explicitly call vlc:
subprocess.call([
    'vlc',
    # no shell here, so use the unescaped name (title[0]), not formatted_title
    '{}/{}.{}'.format(moviefolder, title[0], title[1])
])

When shell=True is used in subprocess.call and the command argument is a sequence, the first element of the sequence needs to be the command, and the rest are treated as arguments to the shell itself.
So, this should do:
subprocess.call(["/usr/local/bin/{}".format(command), '-i','-c'], shell=True, ...)
Otherwise, you can make the command a string.
Example:
In [20]: subprocess.call(["cat spamegg", "-i", "-c"], shell=True)
foobar
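To see what "arguments to the shell itself" means in practice, here is a small sketch. It assumes a POSIX system, where shell=True runs /bin/sh -c with the first item as the command text and the remaining items as the shell's positional parameters ($0, $1, ...).

```python
import subprocess

# Assumption: POSIX system, so this becomes: /bin/sh -c 'echo "$0 and $1"' first second
result = subprocess.run(
    ['echo "$0 and $1"', "first", "second"],
    shell=True,
    capture_output=True,
    text=True,
)
print(result.stdout)
```

The extra list items never reach echo as ordinary arguments; they are only visible through the shell's own $0 and $1.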

Prevent expansion of wildcards in non-quoted python script argument when running in UNIX environment

I have a python script that I'd like to supply with an argument (usually) containing wildcards, referring to a series of files that I'd like to do stuff with. Example here:
#!/usr/bin/env python
import argparse
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-i', action="store", dest="i")
results = parser.parse_args()
print 'argument i is: ', results.i
list_of_matched_files = glob.glob(results.i)
In this case, everything works great if the user adds quotes to the passed argument like so:
./test_script.py -i "foo*.txt"
...but often times the users forget to add quotes to the argument and are stumped when the list only contains the first match because UNIX already expanded the list and argparse only then gets the first list element.
Is there a way (within the script) to prevent UNIX from expanding the list before passing it to python? Or maybe even just to test if the argument doesn't contain quotes and then warn the user?

No. Wildcards are expanded by the shell (Bash, zsh, csh, fish, whatever) before the script even runs, and the script can't do anything about them. Testing whether the argument contains quotes also won't work, as the shell similarly strips the quotes from "foo*.txt" before passing the argument to the script, so all Python sees is foo*.txt.

It's not UNIX that is doing the expansion, it is the shell.
Bash has an option set -o noglob (or -f) which turns off globbing (filename expansion), but that is non-standard.
If you give an end-user access to the command-line then they really should know about quoting. For example, the commonly used find command has a -name parameter which can take glob constructs but they have to be quoted in a similar manner. Your program is no different to any other.
If users can't handle that then maybe you should give them a different interface. You could go to the extreme of writing a GUI or a web/HTML front-end, but that's probably over the top.
Or why not prompt for the filename pattern? You could, for example, use a -p option to indicate prompting, e.g:
import argparse
import glob
parser = argparse.ArgumentParser()
parser.add_argument('-i', action="store", dest="i")
parser.add_argument('-p', action="store_true", default=False)
results = parser.parse_args()
if results.p:
    pattern = raw_input("Enter filename pattern: ")
else:
    pattern = results.i
list_of_matched_files = glob.glob(pattern)
print list_of_matched_files
(I have assumed Python 2 because of your print statement)
Here the input is not read by the shell but by python, which will not expand glob constructs unless you ask it to.

You can disable the expansion using set -f from the command line. (re-enable with set +f).
As jwodder correctly says though, this happens before the script is run, so the only way I can think of to do this is to wrap it with a shell script that disables expansion temporarily, runs the python script, and re-enables expansion. Preventing UNIX from expanding the list before passing it to python is not possible.

Here is an example for the Bash shell that shows what @Tom Wyllie is talking about:
alias sea='set -f; search_function'
search_function() { perl /home/scripts/search.pl "$@" ; set +f; }
This defines an alias called "sea" that:
Turns off expansion ("set -f")
Runs the search_function function, which calls a perl script
Turns expansion back on ("set +f")
The problem with this is that if a user stops execution with ^C or some such then the expansion may not be turned back on leaving the user puzzling why "ls *" is not working. So I'm not necessarily advocating using this. :).

This worked for me:
files = sys.argv[1:]
Even though only one string is on the command line, the shell expands the wildcards and fills sys.argv[] with the list.
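The same idea can be combined with argparse so that both a quoted pattern and a shell-expanded list are accepted. This is only a sketch: collect_files is a hypothetical helper name, and falling back to the literal item when nothing matches is my own design choice.

```python
import argparse
import glob


def collect_files(argv):
    parser = argparse.ArgumentParser()
    # nargs="+" accepts one quoted pattern as well as a list the shell already expanded.
    parser.add_argument("-i", dest="patterns", nargs="+", required=True)
    args = parser.parse_args(argv)

    files = []
    for item in args.patterns:
        matches = sorted(glob.glob(item))
        # Keep the item itself when it matches nothing, so typos stay visible.
        files.extend(matches if matches else [item])
    return files


# Simulates what the script sees after the shell expanded foo*.txt on its behalf.
print(collect_files(["-i", "foo1.txt", "foo2.txt"]))
```

With this approach the user no longer has to remember the quotes: an unquoted pattern arrives as several file names, a quoted one is expanded by glob inside the script.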

Python Subprocess: Unable to Escape Quotes

I know similar questions have been asked before, but they all seem to have been resolved by reworking how arguments are passed (i.e. using a list, etc).
However, I have a problem here in that I don't have that option. There is a particular command line program (I am using a Bash shell) to which I must pass a quoted string. It cannot be unquoted, it cannot have a replicated argument, it just has to be either single or double quoted.
command -flag 'foo foo1'
I cannot use command -flag foo foo1, nor can I use command -flag foo -flag foo1. I believe this is an oversight in how the command was programmed to receive input, but I have no control over it.
I am passing arguments as follows:
self.commands = [
self.path,
'-flag1', quoted_argument,
'-flag2', 'test',
...etc...
]
process = subprocess.Popen(self.commands, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
results = process.communicate(input)
Where quoted_argument is something like 'foo foo1 foo2'.
I have tried escaping the single quote ("\'foo foo1 foo2\'"), but I get no output.
I know this is considered bad practice because it is ambiguous to interpret, but I don't have another option. Any ideas?

The shell breaks command strings into lists. The quotes tell the shell to put multiple words into a single list item. Since you are building the list yourself, you add the words as a single item without the quotes.
These two Popen commands are equivalent
Popen("command -flag 'foo foo1'", shell=True)
Popen(["command", "-flag", "foo foo1"])
EDIT
This answer deals with escaping characters in the shell. If you don't use the shell, you don't add any quotes or escapes, just put in the string itself. There are other issues with skipping the shell, like piping commands, running background jobs, using shell variables and etc. These all can be done in python instead of the shell.
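A quick way to convince yourself of this is to launch a child that prints the arguments it received. In the sketch below a one-line Python child stands in for the real command, which is an assumption made purely for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for "command": a child that prints its own argument list.
child = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

result = subprocess.run(
    child + ["-flag", "foo foo1"],  # no quote characters inside the list item
    capture_output=True,
    text=True,
)
print(result.stdout)
```

The child reports two arguments, and the second one contains the space, exactly as if a shell had been given -flag 'foo foo1'.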

A mental model of processes and shells that has helped me a lot through the years:
Processes in your operating system receive an array of strings representing the arguments. In Python, this array can be accessed from sys.argv. In C, this is the argv array passed to the main function. And so on.
When you open a terminal, you are running a shell inside that terminal, for example bash or zsh. What happens if you run a command like this one?
$ /usr/bin/touch one two
What happens is that the shell interprets the command that you wrote and splits it by whitespace to create the array ["/usr/bin/touch", "one", "two"]. It then launches a new process using that list of arguments, in this case creating two files named one and two.
What if you wanted one file named one two with a space? You can't pass the shell a list of arguments as you might want to do, you can only pass it a string. Shells like Bash and Zsh use single quotes to work around this:
$ /usr/bin/touch 'one two'
The shell will create a new process with the arguments ["/usr/bin/touch", "one two"], which in this case creates a file named one two.
Shells have special features like piping. With a shell, you can do something like this:
$ /usr/bin/echo 'This is an example' | /usr/bin/tr a-z A-Z
THIS IS AN EXAMPLE
In this case, the shell interprets the | character differently. It creates a process with the arguments ["/usr/bin/echo", "This is an example"] and another process with the arguments ["/usr/bin/tr", "a-z", "A-Z"], and will pipe the output of the former to the input of the latter.
How this applies to subprocess in Python
Now, in Python, you can use subprocess with shell=False (which is the default) or with shell=True. If you use the default behaviour, shell=False, then subprocess expects you to pass it a list of arguments. You cannot use special shell features like shell piping. On the plus side, you don't have to worry about escaping special characters for the shell:
import subprocess
# create a file named "one two"
subprocess.call(["/usr/bin/touch", "one two"])
If you do want to use shell features, you can do something like:
subprocess.call(
"/usr/bin/echo 'This is an example' | /usr/bin/tr a-z A-Z",
shell=True,
)
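If you would rather avoid the shell, the same pipeline can be wired up by hand by connecting two processes in Python. In this sketch two small Python children stand in for echo and tr (my substitution, so the example does not depend on those binaries being installed).

```python
import subprocess
import sys

# Stand-in for: /usr/bin/echo 'This is an example'
producer = subprocess.Popen(
    [sys.executable, "-c", "print('This is an example')"],
    stdout=subprocess.PIPE,
)
# Stand-in for: /usr/bin/tr a-z A-Z
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # the consumer now holds the only read end of the pipe
output, _ = consumer.communicate()
producer.wait()
print(output)
```

It is more verbose than the shell version, but no string is ever parsed by a shell, so there is nothing to quote or escape.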
If you are using variables with no particular guarantees, remember to escape the command:
import shlex
import subprocess
subprocess.call(
"/usr/bin/echo " + shlex.quote(variable) + " | /usr/bin/tr a-z A-Z",
shell=True,
)
(Note that shlex.quote is only designed for UNIX shells, and not for DOS on Windows.)
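To see what shlex.quote does to an awkward value, the sketch below builds the command string without running it and then parses the quoted part back with POSIX rules. The sample value is a made-up piece of hostile input.

```python
import shlex

variable = "it's; rm -rf ~"  # hypothetical untrusted input
command = "/usr/bin/echo " + shlex.quote(variable) + " | /usr/bin/tr a-z A-Z"
print(command)

# Parsing the quoted text back yields the original value as a single word,
# so the ';' can no longer terminate the echo command.
round_trip = shlex.split("echo " + shlex.quote(variable))
print(round_trip)
```

The round trip through shlex.split is only a sanity check under POSIX quoting rules; it says nothing about how cmd.exe on Windows would treat the same string.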

Full command line as it was typed

I want to get the full command line as it was typed.
This:
" ".join(sys.argv[:])
doesn't work here (deletes double quotes). Also I prefer not to rejoin something that was parsed and split.
Any ideas?

You're too late. By the time that the typed command gets to Python your shell has already worked its magic. For example, quotes get consumed (as you've noticed), variables get interpolated, etc.

In a Unix environment, this is not generally possible; the best you can hope for is the command line as passed to your process, because the shell (essentially any shell) may munge the typed command line in several ways before handing it to the OS for execution.

*nix
Look at the initial stack layout (Linux on i386) that provides access to command line and environment of a program: the process sees only separate arguments.
You can't get the command line as it was typed in the general case. On Unix, the shell parses the command line into separate arguments and eventually calls a function such as execv(path, argv), which invokes the corresponding syscall. sys.argv is derived from the argv parameter passed to execve(). You could get something equivalent using " ".join(map(shlex.quote, sys.argv)), though you shouldn't need to: for example, if you want to restart the script with slightly different command-line parameters, sys.argv is enough in many cases; see Is it possible to set the python -O (optimize) flag within a script?
There are some creative (non-practical) solutions:
attach to the shell using gdb and interrogate it (most shells are capable of repeating the same command twice); you should be able to get almost the same command as it was typed. Alternatively, read its history file directly if it is updated before your process exits
use screen, script utilities to get the terminal session
use a keylogger, to get what was typed.
Windows
On Windows the native CreateProcess() interface is a string but python.exe still receives arguments as a list. subprocess.list2cmdline(sys.argv) might help to reverse the process. list2cmdline is designed for applications using the same rules as the MS C runtime, and python.exe is one of them. list2cmdline doesn't return the command line as it was typed, but it returns a functional equivalent in this case.
On Python 2, you might need GetCommandLineW(), to get Unicode characters from the command line that can't be represented in Windows ANSI codepage (such as cp1252).
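As an illustration of list2cmdline, the sketch below feeds it a made-up argument list containing a UNC path and an embedded quote. Note that, as far as I know, list2cmdline is a helper inside the subprocess module rather than a documented public function, so treat its use here as an assumption.

```python
import subprocess

# Hypothetical argument list, as a Windows program would receive it.
args = ["program.exe", r"C:\some folder", r"-o\\server\share\some folder", 'a "quote"']

cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

The resulting string quotes the arguments that contain spaces and backslash-escapes the embedded double quotes, following the MS C runtime conventions mentioned above.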

As mentioned, this probably cannot be done, at least not reliably. In a few cases, you might be able to find a history file for the shell (e.g. - "bash", but not "tcsh") and get the user's typing from that. I don't know how much, if any, control you have over the user's environment.

On Linux there is /proc/<pid>/cmdline, which is in the format of argv[]: the arguments are separated by 0x00 bytes, and since you don't get argc, you only know how many strings there are once the file runs out of data.
You can be sure that that commandline is already munged too since all escaping/variable filling is done and parameters are nicely packaged (no extra spaces between parameters, etc.).
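Reading that file from Python could look like the sketch below. It assumes a Linux-style /proc filesystem and returns None elsewhere; read_own_cmdline is a hypothetical helper name.

```python
from pathlib import Path


def read_own_cmdline():
    """Return this process's arguments from /proc, or None where /proc is unavailable."""
    path = Path("/proc/self/cmdline")
    if not path.exists():
        return None
    raw = path.read_bytes()
    # Each argument is NUL-terminated, so drop the empty piece after the final NUL.
    return [part.decode(errors="replace") for part in raw.split(b"\0")[:-1]]


print(read_own_cmdline())
```

As the answer says, what comes back is the already-processed argument list, not the text that was typed.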

You can use psutil that provides a cross platform solution:
import psutil
import os
my_process = psutil.Process( os.getpid() )
print( my_process.cmdline() )
If that's not what you're after you can go further and get the command line of the parent program(s):
my_parent_process = psutil.Process( my_process.ppid() )
print( my_parent_process.cmdline() )
The arguments will still be split into their components, but unlike sys.argv they won't have been modified by the interpreter.

If you're on Linux, I'd suggest monkeying with the ~/.bash_history file or the shell history command, although I believe the command must finish executing before it's added to the shell history.
I started playing with:
import popen2
x,y = popen2.popen4("tail ~/.bash_history")
print x.readlines()
But I'm getting weird behavior where the shell doesn't seem to be completely flushing to the .bash_history file.

Here's how you can do it from within the Python program to get back the full command string. Since the command-line arguments are already handled once before it's sent into sys.argv, this is how you can reconstruct that string.
import sys
commandstring = ''
for arg in sys.argv:
    if ' ' in arg:
        # re-quote arguments that contain a space
        commandstring += '"{}" '.format(arg)
    else:
        commandstring += "{} ".format(arg)
print(commandstring)
Example:
Invoking like this from the terminal,
./saferm.py sdkf lsadkf -r sdf -f sdf -fs -s "flksjfksdkfj sdfsdaflkasdf"
will give the same string in commandstring:
./saferm.py sdkf lsadkf -r sdf -f sdf -fs -s "flksjfksdkfj sdfsdaflkasdf"

I am just 10.5 years late to the party, but here is how I handled exactly the same issue as the OP, under Linux (as others have said, in Windows that info may be possible to retrieve from the system).
First, note that I use the argparse module to parse passed parameters. Also, parameters then are assumed to be passed either as --parname=2, --parname="text", -p2 or -p"text".
import sys
call = ""
for arg in sys.argv:
    if arg[:2] == "--":  # case 1: longer parameter with value assignment
        before = arg[:arg.find("=")+1]
        after = arg[arg.find("=")+1:]
        parAssignment = True
    elif arg[0] == "-":  # case 2: shorter parameter with value assignment
        before = arg[:2]
        after = arg[2:]
        parAssignment = True
    else:  # case 3: parameter with no value assignment
        parAssignment = False
    if parAssignment:
        try:  # check if assigned value is "numeric"
            complex(after)  # works for int, long, float and complex
            call += arg + " "
        except ValueError:
            call += before + '"' + after + '" '
    else:
        call += arg + " "
It may not fully cover all corner cases, but it has served me well (it can even detect that a number like 1e-06 does not need quotes).
In the above, for checking whether value passed to a parameter is "numeric", I steal from this pretty clever answer.

I needed to replay a complex command line with multi-line arguments and values that look like options but which are not.
Combining an answer from 2009 and various comments, here is a modern python 3 version that works quite well on unix.
import sys
import shlex
print(sys.executable, " ".join(map(shlex.quote, sys.argv)))
Let's test:
$ cat << EOT > test.py
import sys
import shlex
print(sys.executable, " ".join(map(shlex.quote, sys.argv)))
EOT
then:
$ python test.py --foo 1 --bar " aha " --tar 'multi \
line arg' --nar '--prefix1 --prefix2'
prints:
/usr/bin/python test.py --foo 1 --bar ' aha ' --tar 'multi \
line arg' --nar '--prefix1 --prefix2'
Note that it got '--prefix1 --prefix2' quoted correctly and the multi-line argument too!
The only difference is the full python path.
That was all I needed.
Thank you for the ideas to make this work.
Update: here is a more advanced version of the same that replays desired env vars and also wraps the long output nicely with bash line breaks so that the output can be immediately pasted in forums and not needing to manually deal with breaking up long lines to avoid horizontal scrolling.
import os
import shlex
import sys
def get_orig_cmd(max_width=80, full_python_path=False):
    """
    Return the original command line string that can be replayed
    nicely and wrapped for 80 char width
    Args:
    - max_width: the width to wrap for. defaults to 80
    - full_python_path: whether to replicate the full path
      or just the last part (i.e. `python`). default to `False`
    """
    cmd = []
    # deal with critical env vars
    env_keys = ["CUDA_VISIBLE_DEVICES"]
    for key in env_keys:
        val = os.environ.get(key, None)
        if val is not None:
            cmd.append(f"{key}={val}")
    # python executable (not always needed if the script is executable)
    python = sys.executable if full_python_path else sys.executable.split("/")[-1]
    cmd.append(python)
    # now the normal args
    cmd += list(map(shlex.quote, sys.argv))
    # split up into up to MAX_WIDTH lines with shell multi-line escapes
    lines = []
    current_line = ""
    while len(cmd) > 0:
        current_line += f"{cmd.pop(0)} "
        if len(cmd) == 0 or len(current_line) + len(cmd[0]) + 1 > max_width - 1:
            lines.append(current_line)
            current_line = ""
    return "\\\n".join(lines)
print(get_orig_cmd())
Here is an example that this function produced:
CUDA_VISIBLE_DEVICES=0 python ./scripts/benchmark/trainer-benchmark.py \
--base-cmd \
' examples/pytorch/translation/run_translation.py --model_name_or_path t5-small \
--output_dir output_dir --do_train --label_smoothing 0.1 --logging_strategy no \
--save_strategy no --per_device_train_batch_size 32 --max_source_length 512 \
--max_target_length 512 --num_train_epochs 1 --overwrite_output_dir \
--source_lang en --target_lang ro --dataset_name wmt16 --dataset_config "ro-en" \
--source_prefix "translate English to Romanian: " --warmup_steps 50 \
--max_train_samples 2001 --dataloader_num_workers 2 ' \
--target-metric-key train_samples_per_second --repeat-times 1 --variations \
'|--fp16|--bf16' '|--tf32' --report-metric-keys 'train_loss train_samples' \
--table-format console --repeat-times 2 --base-variation ''
Note, that it's super complex as one argument has multiple arguments as its value and it is multiline too.
Also note that this particular version doesn't rewrap single arguments - if any are longer than the requested width they remain unwrapped (by design).
