g++ in python - macro containing various characters can not be passed

In my django python project I used to invoke my C++ script via g++:
os.system('g++ -std=c++0x mutualcepepe.cpp -D "threshold = ' + str(thresh) + '" -o mutualout')
"thresh" was a simple float variable. It worked, but the idea of the whole project has changed a bit, and now I want to pass a string containing different "types" of characters.
I will show my problem on the example and in this case my macro "djangoname" (not a "threshold" anymore) is ">gi|111>gi|222>gi|333>gi|444".
Invocation:
os.system('g++ -std=c++0x mutualcepepe.cpp -D "djangoname = ' + str(filename2only) + '" -o mutualout')
Errors I get in the terminal:
mutualcepepe.cpp: In function ‘int main(int, char**)’:
<command-line>:0:14: error: expected primary-expression before ‘>’ token
mutualcepepe.cpp:99:30: note: in expansion of macro ‘djangoname’
string filename = to_string(djangoname);
^
<command-line>:0:15: error: ‘gi’ was not declared in this scope
mutualcepepe.cpp:99:30: note: in expansion of macro ‘djangoname’
string filename = to_string(djangoname);
I think the point is that when the g++ compiler reads what the macro contains, it somehow splits it when it reaches a special character, or when a letter follows a number, because it then treats the content as an integer rather than as string data. So my question is: is it possible to pass g++ a macro (or any kind of "string variable") containing "different types" of characters, in a way that lets the compiler run without problems?
I considered translating the "inconvenient" characters into other ones and converting them back in the C++ program, but I can't be sure what my macro will contain; that depends on the users of my web app.
To be honest, I have an idea for avoiding this, but it is rather silly: it involves needlessly creating new files and reading from them, which takes time.
Maybe I'm wrong and the problem has a different nature. I hope you will be able to help me or give helpful advice.

You have to make the macro explicitly a string, like e.g.
os.system('g++ ... -Ddjangoname="' + ... + '" ...')
Note the placement of the double-quotes.

This problem has nothing to do with the shell or the compiler invocation, although I honestly think you would be well advised to use a different way to invoke the compiler from python, such as the subprocess module.
Somewhere in your C++ program you have:
string filename = to_string(djangoname);
You are using the -D option to effectively insert
#define djangoname >gi|111>gi|222>gi|333>gi|444
at the beginning of your program. That means that your declaration of filename will be:
string filename = to_string(>gi|111>gi|222>gi|333>gi|444);
which makes no sense at all, neither to C++ nor to me. Hence the error message, which is saying that you cannot start an expression with the > operator.
I don't think that is what you meant, but I have no idea what you really want to do.

os.system('g++ ...') doesn't directly start the g++ process. It actually starts whatever is configured as the default shell (e.g. /bin/sh), and the command line is then interpreted by the shell.
To avoid this unnecessary hoop and its complications, you can directly execute g++ with Python's subprocess.Popen and its communicate() method. This allows you to pass command line arguments as an array.
For example :
import sys, subprocess
# The inner double quotes make the macro expand to a C++ string literal.
filename2only = '">gi|111>gi|222>gi|333>gi|444"'
args = [
    'g++',
    '-std=c++0x', 'mutualcepepe.cpp',
    '-Ddjangoname=' + str(filename2only),
    '-omutualout'
]
# universal_newlines=True returns the output as text, so it can be written to sys.stdout on Python 3 as well.
p = subprocess.Popen(args=args, bufsize=-1, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
stdout, stderr = p.communicate()
sys.stdout.write(stdout)
sys.stderr.write(stderr)
if p.returncode != 0:
    sys.stderr.write('failed with exit code: ' + str(p.returncode))
This passes a -Ddjangoname=">gi|111>gi|222>gi|333>gi|444" option to the compiler, which is equivalent to #define djangoname ">gi|111>gi|222>gi|333>gi|444".
Note that str(filename2only) isn't strictly necessary in this case, but it has the nice property of supporting any type of value (string, int or float).
You could, e.g. do :
filename2only = 12.3 to pass -Ddjangoname=12.3, which is equivalent to #define djangoname 12.3
filename2only = 'a b c' to pass -Ddjangoname=a b c, which is equivalent to #define djangoname a b c
filename2only = '"a b c"' to pass -Ddjangoname="a b c", which is equivalent to #define djangoname "a b c"
filename2only = '"a \\"b\\" c"' to pass -Ddjangoname="a \"b\" c", which is equivalent to #define djangoname "a \"b\" c", i.e. djangoname is then the string literal a "b" c, which contains double quotes!
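If the macro value comes from users, the quoting can be generated rather than typed by hand. The following is only a rough sketch, not a complete solution: the helper names c_string_literal and build_compile_args are made up for illustration, and only backslashes and double quotes are escaped (newlines and other control characters are not handled). It builds the argument list without running the compiler.

```python
def c_string_literal(text):
    """Wrap text in double quotes, escaping backslashes and double quotes.

    Hypothetical helper: it does not handle newlines or other control characters.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_compile_args(macro_name, value,
                       source="mutualcepepe.cpp", output="mutualout"):
    """Return an argument list suitable for subprocess, with no shell involved."""
    return [
        "g++",
        "-std=c++0x",
        source,
        "-D{}={}".format(macro_name, c_string_literal(value)),
        "-o",
        output,
    ]


args = build_compile_args("djangoname", ">gi|111>gi|222>gi|333>gi|444")
print(args[3])
```

Because the list goes straight to the compiler, characters such as > and | never reach a shell and need no extra shell quoting.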

Related

powershell: execute python code passed as arguments [duplicate]

In pwsh call the following:
Write-Host '{"drop_attr": "name"}'
Result ok:
{"drop_attr": "name"}
Now do the same via pwsh:
pwsh -Command Write-Host '{"drop_attr": "name"}'
Result is missing the quotation marks and curly braces?
drop_attr: name

Update:
PowerShell 7.3.0 mostly fixed the problem, with selective exceptions on Windows, and it seems that in some version after 7.3.1 the fix will require opt-in - see this answer for details.
For cross-version, cross-edition code, the Native module discussed at the bottom may still be of interest.
Unfortunately, PowerShell's handling of passing arguments with embedded " chars. to external programs - which includes PowerShell's own CLI (pwsh) - is fundamentally broken (and always has been), up to at least PowerShell 7.2.x:
You need to manually \-escape " instances embedded in your arguments in order for them to be correctly passed through to external programs (which happens to be PowerShell in this case as well):
# Note: The embedded '' sequences are the normal and expected
# way to escape ' chars. inside a PowerShell '...' string.
# What is *unexpected* is the need to escape " as \"
# even though " can normally be used *as-is* inside a '...' string.
pwsh -Command ' ''{\"drop_attr\": \"name\"}'' '
Note that I'm assuming your intent is to pass a JSON string, hence the inner '' ... '' quoting (escaped single quotes), which ensures that pwsh ultimately sees a single-quoted string ('...'). (No need for an explicit output command; PowerShell implicitly prints command and expression output).
Another way to demonstrate this on Windows is via the standard choice.exe utility, repurposed to simply print its /m (message) argument (followed by verbatim [Y,N]?Y):
# This *should* preserve the ", but doesn't as of v7.2
PS> choice /d Y /t 0 /m '{"drop_attr": "name"}'
{drop_attr: name} [Y,N]?Y # !! " were REMOVED
# Only the extra \-escaping preserves the "
PS> choice /d Y /t 0 /m '{\"drop_attr\": \"name\"}'
{"drop_attr": "name"} [Y,N]?Y # OK
Note that from inside PowerShell, you can avoid the need for \-escaping, if you call pwsh with a script block ({ ... }) - but that only works when calling PowerShell itself, not other external programs:
# NOTE: Works from PowerShell only.
pwsh -Command { '{"drop_attr": "name"}' }
Background info on PowerShell's broken handling of arguments with embedded " in external-program calls, as of PowerShell 7.2.1:
This GitHub docs issue contains background information.
GitHub issue #1995 discusses the problem and the details of the broken behavior as well as manual workarounds are summarized in this comment; the state of the discussion as of PowerShell [Core] 7 seems to be:
A fix is being considered as an experimental feature, which may become an official feature, in v7.3 at the earliest. Whether it will become a regular feature - i.e whether the default behavior will be fixed or whether the fix will require opt-in or even if the feature will become official at all - remains to be seen.
Fixing the default behavior would substantially break backward compatibility; as of this writing, this has never been allowed, but a discussion as to whether to allow breaking changes in the future and how to manage them has begun: see GitHub issue #13129.
See GitHub PR #14692 for the relevant experimental feature, which, however, as of this writing is missing vital accommodations for batch files and msiexec-style executables on Windows - see GitHub issue #15143.
In the meantime, you can use the PSv3+ ie helper function from the Native module (in PSv5+, install with Install-Module Native from the PowerShell Gallery), which internally compensates for all broken behavior and allows passing arguments as expected; e.g.,
ie pwsh -Command ' ''{"drop_attr": "name"}'' ' would then work properly.

Another way. Are you in Windows or Unix?
pwsh -c "[pscustomobject]@{drop_attr='name'} | convertto-json -compress"
{"drop_attr":"name"}

Another way is to use "encoded commands".
> $cmd1 = "Write-Host '{ ""description"": ""Test program"" }'"
> pwsh -encoded ([Convert]::ToBase64String([Text.Encoding]::Unicode.GetBytes($cmd1)))
{ "description": "Test program" }
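Since the thread is about calling pwsh from other programs, the same encoding step can be sketched in Python. This assumes that pwsh expects the Base64 form of the UTF-16LE bytes of the command, which is what [Text.Encoding]::Unicode produces above. The helper name encode_powershell_command is made up for illustration, and the sketch only builds the argument list; it does not start pwsh.

```python
import base64


def encode_powershell_command(command):
    """Return the Base64 text of the command's UTF-16LE bytes."""
    return base64.b64encode(command.encode("utf-16-le")).decode("ascii")


command = "Write-Host '{ \"description\": \"Test program\" }'"
encoded = encode_powershell_command(command)

# Argument list that could be handed to subprocess.run(); no quoting of the
# embedded double quotes is needed because they are hidden inside the Base64 text.
args = ["pwsh", "-EncodedCommand", encoded]
print(args)
```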

Taming shlex.split() behaviour

There are other questions on SO that get close to answering mine, but I have a very specific use case that I have trouble solving. Consider this:
from asyncio import create_subprocess_exec, run
from shlex import split
async def main():
    command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
    # split() turns the command string into the argument list that is unpacked here
    proc = await create_subprocess_exec(*split(command))
    await proc.wait()
run(main())
This causes trouble, because program.exe is called with these arguments:
['C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
That is, the double backslash is no longer there, as shlex.split() removes it. Of course, I could instead (as other answers suggest) do this:
proc = await create_subprocess_exec(*split(command, posix=False))
But then program.exe is effectively called with these arguments:
['"C:\\some folder"', '-o"\\\\server\\share\\some folder"', '"a \\"', 'quote\\""']
That's also no good, because now the double quotes have become part of the content of the first parameter, where they don't belong, even though the second parameter is now fine. The third parameter has become a complete mess.
Replacing backslashes with forward slashes, or removing quotes with regular expressions all don't work for similar reasons.
Is there some way to get shlex.split() to leave double backslashes before server names alone? Or just at all? Why does it remove them in the first place?
Note that, by themselves these are perfectly valid commands (on Windows and Linux respectively anyway):
program.exe "C:\some folder" -o"\\server\share\some folder"
echo "hello \"world\""
And even if I did detect the OS and used posix=True/False accordingly, I'd still be stuck with the double quotes included in the second argument, which they shouldn't be.

For now, I ended up with this (arguably a bit of a hack):
from os import name as os_name
from shlex import split
def arg_split(args, platform=os_name):
    r"""
    Like calling shlex.split, but sets `posix=` according to platform
    and unquotes previously quoted arguments on Windows
    :param args: a command line string consisting of a command with arguments,
        e.g. r'dir "C:\Program Files"'
    :param platform: a value like os.name would return, e.g. 'nt'
    :return: a list of arguments like shlex.split(args) would have returned
    """
    return [a[1:-1].replace('""', '"') if a[0] == a[-1] == '"' else a
            for a in (split(args, posix=False) if platform == 'nt' else split(args))]
Using this instead of shlex.split() gets me what I need, while not breaking UNC paths. However, I'm sure there's some edge cases where correct escaping of double quotes isn't correctly handled, but it has worked for all my test cases and seems to be working for all practical cases so far. Use at your own risk.
@balmy made the excellent observation that most people should probably just use:
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_shell(command)
Instead of
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
proc = await create_subprocess_exec(*split(command))
However, note that this means:
it's not easy to check or replace individual arguments
you have the problem that always comes with using create_subprocess_shell: if part of your command is based on external input, someone can inject code. In the words of the documentation (https://docs.python.org/3/library/asyncio-subprocess.html):
It is the application’s responsibility to ensure that all
whitespace and special characters are quoted appropriately to avoid
shell injection vulnerabilities. The shlex.quote() function can be
used to properly escape whitespace and special shell characters in
strings that are going to be used to construct shell commands.
And that's still a problem, as quote() also doesn't work correctly for Windows (by design).
I'll leave the question open for a bit, in case someone wishes to point out why the above is a really bad idea, or if someone has a better one.

As far as I can tell, the shlex module is the wrong tool if you are dealing with the Windows shell.
The first paragraph of the docs says (my italics):
The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell.
Admittedly, that talks about just one class, not the entire module. Later, the docs for the quote function say (boldface in the original, this time):
Warning The shlex module is only designed for Unix shells.
To be honest, I'm not sure what the non-Posix mode is supposed to be compatible with. It could be, but this is just me guessing, that the original versions of shlex parsed a syntax of its own which was not quite compatible with anything else, and then Posix mode got added to actually be compatible with Posix shells. This mailing list thread, including this mail from ESR seems to support this.

For the -o parameter, put the leading " at the start of the argument rather than in the middle, and double the backslashes.
Then use posix=True:
import shlex
command = r'program.exe "C:\some folder" -o"\\server\share\some folder" "a \"quote\""'
print( "Original command Posix=True", shlex.split(command, posix=True) )
command = r'program.exe "C:\some folder" "-o\\\\server\\share\\some folder" "a \"quote\""'
print( "Updated command Posix=True", shlex.split(command, posix=True) )
result:
Original command Posix=True ['program.exe', 'C:\\some folder', '-o\\server\\share\\some folder', 'a "quote"']
Updated command Posix=True ['program.exe', 'C:\\some folder', '-o\\\\server\\share\\some folder', 'a "quote"']
The backslashes are still double in the result, but that's standard Python representation of a \ in a string.
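The rule behind this result can be checked in isolation. As a small sketch (the sample strings are illustrative only): inside double quotes, POSIX mode treats a backslash as an escape character only when it precedes another backslash or a double quote; before any other character the backslash is kept.

```python
import shlex

# "\\" collapses to a single backslash, while "\s" keeps its backslash.
single = shlex.split(r'"\\server\share"')

# Doubling every backslash in the input preserves the UNC prefix.
doubled = shlex.split(r'"\\\\server\\share"')

# \" inside double quotes yields a literal double quote.
quoted = shlex.split(r'"a \"quote\""')

print(single[0])   # \server\share
print(doubled[0])  # \\server\share
print(quoted[0])   # a "quote"
```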

Custom bash function not feeding python multiple args [duplicate]

This question already has answers here:
When to wrap quotes around a shell variable?
(5 answers)
How to pass all arguments passed to my Bash script to a function of mine? [duplicate]
(7 answers)
Closed 3 years ago.
The problem:
I'm writing a program that performs actions in python based on links, and possibly expanding it to do things beyond that. This program is meant to be quickly used through bash. So, I'm using a custom function to do that.
function youtube() {python3 <youtube program path here> $1}
As for the python file: I'm using sys, os, and re to make it work. sys provides both sys.exit() and var = sys.argv[<argNum>]; the former exits the program from custom errors such as error.searchError() or error.usageError(), and the latter actually reads the arguments from the command itself. os is just for os.system('{}'.format(<your command here>)). And re is for replacing the spaces in the second argument, where my problem lies, with '+', as per query = re.sub(' ', '+', query).
Now, as for the problem itself. As I mentioned before, the problem lies with the second bash argument, or sys.argv[2]. With sys.argv[0] being the python file, and sys.argv[1] being the option, in this case being -s.
sys.argv[2] is meant to be the actual youtube search query. But whenever I use the command with all three arguments, youtube -s Hi this is a test., I get the following output from the custom error I made: No search query provided!. This only happens when python catches an IndexError, which means that the program is not receiving the second argument from bash or zsh. What is actually supposed to happen, when the second argument does exist, is:
os.system('open https://www.youtube.com/results?search_query=Hi+this+is+a+test.')
Which opens that link in my default browser. I have tried to add $2 to the custom function, and various ways of entering the second argument through the python source itself, including using a x = input('Search Query: ') statement. But that isn't optimal for what I'm doing.
The code:
The following is the source code for all the nonsense I just typed out.
The custom function:
function youtube() {python3 <python program path here> $1}
For those that have no idea what this means (i.e.; people that don't know much (or anything) about bash coding); The function method creates a bash object, in this case, youtube(). As for the code in the brackets ({}), this uses the function python3, which just pushes the program in argument 0 to the python 3.x interpreter, to open <python program path here>, which is a placeholder for the actual path of the program. As for $1, this is a variable that always equals the text inputted after the function name.
The custom errors:
class error:
    def usageError():
        usageError = '''Usage: youtube [-s] [<search_query>]
Help: Performs actions related to https://www.youtube.com
Options:
-s Opens https://www.youtube.com/results?search_query=<your query here>'''
        print(usageError)
        sys.exit()
    def searchError():
        searchError = 'No search query provided!'
        print(searchError)
        sys.exit()
Is this irrelevant? I'm not sure, but I'm putting it in anyway! Now, if you don't understand it, the following should explain it.
The error class contains all of the customs errors for this program, ok? Ok, so you get that, but what do these functions do? usageError is raised when argument 1 simply doesn't exist, and prints the usage information to the terminal. Then sys.exit()s the program, basically the equivalent of hitting Alt+f4 in video game. searchError, on the other hand, only happens if argument 2 doesn't exist, meaning there is no search query. It then tells you that you're stupid, and will need to actually enter your query for it to work.
Well, maybe not that exactly, but you get the point.
The Juicy Bits:
option = ''
try: option = sys.argv[1];
except IndexError: raise error.usageError()
if option == '-s':
    try:
        query = sys.argv[2]
        query = re.sub(' ', '+', query)
        os.system('open https://www.youtube.com/results?search_query={}'.format(query))
    except IndexError: raise error.searchError();
Just to explain; First, the program creates the variable option and then sets it to an empty string. Then, it tries to set option to argument 1, or the option. If argument 1 doesn't exist, it raises the error error.usageError, as explained in The Custom Errors. After that, the program tries to create the variable query, and set it to argument 2, then replace all of the spaces in query with '+' signs. If all of that succeeds to happen, it then loads up the youtube search in your default browser. If not, it raises the error error.searchError().
The Edits
Edit 1. The error was in The Custom Function. Where I should have had a $@, I had a $1. As Jeshurun Roach explains in his answer, $1 only holds argument 1 and no other arguments, while $@ contains all arguments.
function youtube() {python3 <python program path here> $@}

$1 refers to the first argument passed into the function. In bash, spaces delimit arguments, so in your example youtube -s Hi this is a test.,
$1 is -s,
$2 is Hi,
$3 is this etc...
What you're looking for is the $@ symbol. This value stands for all the arguments.
But just plugging in $@ instead of $1 won't fix all your problems. In your python script, each argument will be broken up again by spaces, just like in the bash function.
To fix this, you can put quotes around the text after the flag like so: youtube -s 'Hi this is a test.'.

If you call your program like this: youtube -s something cool, then sys.argv[2] is going to be "something".
I'd suggest wrapping your query in quotes. For example youtube -s "something cool".
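If quoting the query at the prompt is inconvenient, the script itself could join every word after the option. This is only a sketch of that alternative, not the asker's code: it assumes the function forwards all arguments, build_search_url is a made-up name, and urllib.parse.quote_plus stands in for the re.sub call so that other special characters get encoded as well.

```python
from urllib.parse import quote_plus


def build_search_url(argv):
    """Build the search URL from every argument after the option (argv[1])."""
    words = argv[2:]
    if not words:
        raise ValueError("No search query provided!")
    # quote_plus turns spaces into '+' and percent-encodes unsafe characters.
    return ("https://www.youtube.com/results?search_query="
            + quote_plus(" ".join(words)))


# Simulates: youtube -s Hi this is a test.
url = build_search_url(["youtube.py", "-s", "Hi", "this", "is", "a", "test."])
print(url)
```

In the real script the list passed in would be sys.argv.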

Returning a text string from fortran subroutine to python using f2py

I got this simple module in Fortran:
test.f90:
module test
  implicit none
contains
  subroutine foo(chid)
    implicit none
    character(len=*),intent(out):: chid ! char. identifier
    chid = "foo"
  end subroutine foo
end module test
program bar
  use test
  character(len=20) text
  call foo(text)
  write(*,*) text
end program bar
Compiling it (on Windows) with gfortran test.f90 -o test.exe and running it gives, as expected:
foo
I can also compile it using f2py: c:\Python27\python.exe c:\Python27\Scripts\f2py.py --fcompiler=gnu95 --compiler=mingw32 -c -m test \test.f90
When I run this Python script:
test.py:
from id_map import test
print "This should be 'foo':"
print test.foo()
print "was it?"
I get the following output:
This should be 'foo':
was it?
As you can see, the string that should be "foo" is empty. Why is this?

The problem here is with the len=* character declaration. You're telling the Fortran compiler to accept a string of whatever length was passed in. That's great, except that when you wrap it with f2py and have intent(out), f2py needs to guess what length of string to allocate and pass to your function, and it has no way of doing that. (After all, what length should it assume?)
It looks to me like f2py assumes a 0-length string. When you assign a bigger string to a smaller string in Fortran, the result gets truncated (although I would need to go back and read the standard to find out whether that could result in memory errors). In any event, it looks like that's what the GNU compiler is doing.
If you change it to len=3, it works.
Alternatively, doing something like this can make this work for f2py without modifying the original code (except for some comments):
!f2py character(len=256),intent(out):: chid
character(len=*),intent(out):: chid ! char. identifier

Full command line as it was typed

I want to get the full command line as it was typed.
This:
" ".join(sys.argv[:])
doesn't work here (deletes double quotes). Also I prefer not to rejoin something that was parsed and split.
Any ideas?

You're too late. By the time that the typed command gets to Python your shell has already worked its magic. For example, quotes get consumed (as you've noticed), variables get interpolated, etc.

In a Unix environment, this is not generally possible...the best you can hope for is the command line as passed to your process.
Because the shell (essentially any shell) may munge the typed command line in several ways before handing it to the OS for execution.

*nix
Look at the initial stack layout (Linux on i386) that provides access to command line and environment of a program: the process sees only separate arguments.
You can't get the command line as it was typed in the general case. On Unix, the shell parses the command line into separate arguments, and eventually the execv(path, argv) function, which invokes the corresponding syscall, is called. sys.argv is derived from the argv parameter passed to the execve() function. You could get something equivalent using " ".join(map(shlex.quote, sys.argv)), though you shouldn't need to; e.g., if you want to restart the script with slightly different command-line parameters, then sys.argv is enough (in many cases), see Is it possible to set the python -O (optimize) flag within a script?
There are some creative (non-practical) solutions:
attach the shell using gdb and interrogate it (most shells are capable of repeating the same command twice)—you should be able to get almost the same command as it was typed— or read its history file directly if it is updated before your process exits
use screen, script utilities to get the terminal session
use a keylogger, to get what was typed.
Windows
On Windows the native CreateProcess() interface is a string but python.exe still receives arguments as a list. subprocess.list2cmdline(sys.argv) might help to reverse the process. list2cmdline is designed for applications using the same rules as the MS C runtime; python.exe is one of them. list2cmdline doesn't return the command line as it was typed, but it returns a functional equivalent in this case.
On Python 2, you might need GetCommandLineW(), to get Unicode characters from the command line that can't be represented in Windows ANSI codepage (such as cp1252).
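A short illustration of list2cmdline follows. Note that this helper exists in CPython's subprocess module but, as far as I know, is not part of its documented interface, so treat this as a sketch. The argument list is made up and stands in for sys.argv.

```python
import subprocess

# Stand-in for sys.argv; the third item contains a space.
argv = ["saferm.py", "-s", "two words", "plain"]

# Only arguments that contain whitespace (or are empty) get double quotes.
command_line = subprocess.list2cmdline(argv)
print(command_line)
```

The result is a functionally equivalent command line, not necessarily what the user typed: an argument originally typed with unnecessary quotes comes back without them.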

As mentioned, this probably cannot be done, at least not reliably. In a few cases, you might be able to find a history file for the shell (e.g. - "bash", but not "tcsh") and get the user's typing from that. I don't know how much, if any, control you have over the user's environment.

On Linux there is /proc/<pid>/cmdline, which is in the format of argv[] (i.e. there is a 0x00 byte between the arguments, and you can't really know how many strings there are since you don't get argc; though you will know it when the file runs out of data ;).
You can be sure that that commandline is already munged too since all escaping/variable filling is done and parameters are nicely packaged (no extra spaces between parameters, etc.).
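Reading that file from Python could look roughly like the sketch below. The function names are made up for illustration, /proc/self/cmdline is Linux-specific, and the sample bytes only imitate the NUL-separated layout described above so that the parsing step can be tried on any platform.

```python
def parse_proc_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    parts = raw.split(b"\0")
    # Every argument is NUL-terminated, so the split leaves one empty
    # trailing piece that is not a real argument.
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode("utf-8", errors="surrogateescape") for part in parts]


def read_own_cmdline():
    """Linux only: return the current process's arguments as the kernel stores them."""
    with open("/proc/self/cmdline", "rb") as handle:
        return parse_proc_cmdline(handle.read())


sample = b"python\0script.py\0--name\0two words\0"
print(parse_proc_cmdline(sample))
```

As the answer says, the quotes the user typed are already gone at this point; only the argument boundaries survive.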

You can use psutil that provides a cross platform solution:
import psutil
import os
my_process = psutil.Process( os.getpid() )
print( my_process.cmdline() )
If that's not what you're after you can go further and get the command line of the parent program(s):
my_parent_process = psutil.Process( my_process.ppid() )
print( my_parent_process.cmdline() )
The variables will still be split into its components, but unlike sys.argv they won't have been modified by the interpreter.

If you're on Linux, I'd suggest monkeying with the ~/.bash_history file or the shell history command, although I believe the command must finish executing before it's added to the shell history.
I started playing with:
import popen2
x,y = popen2.popen4("tail ~/.bash_history")
print x.readlines()
But I'm getting weird behavior where the shell doesn't seem to be completely flushing to the .bash_history file.

Here's how you can do it from within the Python program to get back the full command string. Since the command-line arguments are already handled once before it's sent into sys.argv, this is how you can reconstruct that string.
import sys
commandstring = ''
for arg in sys.argv:
    if ' ' in arg:
        # re-add double quotes around arguments that contain spaces
        commandstring += '"{}" '.format(arg)
    else:
        commandstring += "{} ".format(arg)
print(commandstring)
Example:
Invoking like this from the terminal,
./saferm.py sdkf lsadkf -r sdf -f sdf -fs -s "flksjfksdkfj sdfsdaflkasdf"
will give the same string in commandstring:
./saferm.py sdkf lsadkf -r sdf -f sdf -fs -s "flksjfksdkfj sdfsdaflkasdf"

I am just 10.5 years late to the party, but... here it goes how I have handled exactly the same issue as the OP, under Linux (as others have said, in Windows that info may be possible to retrieve from the system).
First, note that I use the argparse module to parse passed parameters. Also, parameters then are assumed to be passed either as --parname=2, --parname="text", -p2 or -p"text".
import sys
call = ""
for arg in sys.argv:
    if arg[:2] == "--": #case1: longer parameter with value assignment
        before = arg[:arg.find("=")+1]
        after = arg[arg.find("=")+1:]
        parAssignment = True
    elif arg[0] == "-": #case2: shorter parameter with value assignment
        before = arg[:2]
        after = arg[2:]
        parAssignment = True
    else: #case3: #parameter with no value assignment
        parAssignment = False
    if parAssignment:
        try: #check if assigned value is "numeric"
            complex(after) # works for int, long, float and complex
            call += arg + " "
        except ValueError:
            call += before + '"' + after + '" '
    else:
        call += arg + " "
It may not fully cover all corner cases, but it has served me well (it can even detect that a number like 1e-06 does not need quotes).
In the above, for checking whether value passed to a parameter is "numeric", I steal from this pretty clever answer.

I needed to replay a complex command line with multi-line arguments and values that look like options but which are not.
Combining an answer from 2009 and various comments, here is a modern python 3 version that works quite well on unix.
import sys
import shlex
print(sys.executable, " ".join(map(shlex.quote, sys.argv)))
Let's test:
$ cat << EOT > test.py
import sys
import shlex
print(sys.executable, " ".join(map(shlex.quote, sys.argv)))
EOT
then:
$ python test.py --foo 1 --bar " aha " --tar 'multi \
line arg' --nar '--prefix1 --prefix2'
prints:
/usr/bin/python test.py --foo 1 --bar ' aha ' --tar 'multi \
line arg' --nar '--prefix1 --prefix2'
Note that it got '--prefix1 --prefix2' quoted correctly and the multi-line argument too!
The only difference is the full python path.
That was all I needed.
Thank you for the ideas to make this work.
Update: here is a more advanced version of the same that replays desired env vars and also wraps the long output nicely with bash line breaks so that the output can be immediately pasted in forums and not needing to manually deal with breaking up long lines to avoid horizontal scrolling.
import os
import shlex
import sys
def get_orig_cmd(max_width=80, full_python_path=False):
    """
    Return the original command line string that can be replayed
    nicely and wrapped for 80 char width
    Args:
        - max_width: the width to wrap for. defaults to 80
        - full_python_path: whether to replicate the full path
          or just the last part (i.e. `python`). default to `False`
    """
    cmd = []
    # deal with critical env vars
    env_keys = ["CUDA_VISIBLE_DEVICES"]
    for key in env_keys:
        val = os.environ.get(key, None)
        if val is not None:
            cmd.append(f"{key}={val}")
    # python executable (not always needed if the script is executable)
    python = sys.executable if full_python_path else sys.executable.split("/")[-1]
    cmd.append(python)
    # now the normal args
    cmd += list(map(shlex.quote, sys.argv))
    # split up into up to MAX_WIDTH lines with shell multi-line escapes
    lines = []
    current_line = ""
    while len(cmd) > 0:
        current_line += f"{cmd.pop(0)} "
        if len(cmd) == 0 or len(current_line) + len(cmd[0]) + 1 > max_width - 1:
            lines.append(current_line)
            current_line = ""
    return "\\\n".join(lines)
print(get_orig_cmd())
Here is an example that this function produced:
CUDA_VISIBLE_DEVICES=0 python ./scripts/benchmark/trainer-benchmark.py \
--base-cmd \
' examples/pytorch/translation/run_translation.py --model_name_or_path t5-small \
--output_dir output_dir --do_train --label_smoothing 0.1 --logging_strategy no \
--save_strategy no --per_device_train_batch_size 32 --max_source_length 512 \
--max_target_length 512 --num_train_epochs 1 --overwrite_output_dir \
--source_lang en --target_lang ro --dataset_name wmt16 --dataset_config "ro-en" \
--source_prefix "translate English to Romanian: " --warmup_steps 50 \
--max_train_samples 2001 --dataloader_num_workers 2 ' \
--target-metric-key train_samples_per_second --repeat-times 1 --variations \
'|--fp16|--bf16' '|--tf32' --report-metric-keys 'train_loss train_samples' \
--table-format console --repeat-times 2 --base-variation ''
Note, that it's super complex as one argument has multiple arguments as its value and it is multiline too.
Also note that this particular version doesn't rewrap single arguments - if any are longer than the requested width they remain unwrapped (by design).
