Weird issue when parsing a path in Python

Given these variables:
cardIP="00.00.00.00"
dir="D:\\TestingScript"
mainScriptPath='"\\\\XX\\XX\\XX\\Testing\\SNMP Tests\\Python Script\\MainScript.py"'
When using subprocess.call("cmd /c "+mainScriptPath+" "+dir+" "+cardIP) and print(mainScriptPath+" "+dir+" "+cardIP) I get this:
"\\XX\XX\XX\Testing\SNMP Tests\Python Script\MainScript.py" D:\TestingScript 00.00.00.00
which is what I wanted, OK.
But now I also want the 'dir' variable to be wrapped in double quotes, because I am going to use directory names with spaces.
So, I do the same thing I did with 'mainScriptPath':
cardIP="00.00.00.00"
dir='"D:\\Testing Script"'
mainScriptPath='"\\\\XX\\XX\\XX\\Testing\\SNMP Tests\\Python Script\\MainScript.py"'
But now, when I'm doing print(mainScriptPath+" "+dir+" "+cardIP) I get:
"\\XX\XX\XX\Testing\SNMP Tests\Python Script\MainScript.py" "D:\Testing Script" 00.00.00.00
Which is great, but when executed in subprocess.call("cmd /c "+mainScriptPath+" "+dir+" "+cardIP) there is a failure with 'mainScriptPath' variable:
'\\XX\XX\XX\Testing\SNMP' is not recognized as an internal or external command...
It doesn't make sense to me.
Why does it fail?
I also tried:
dir="\""+"D:\\Testing Script"+"\""
This prints correctly, but subprocess.call raises the same problem.
(Windows XP, Python3.3)

Use proper string formatting: use single quotes for the format string and simply include the double quotes in it (with the variables themselves defined without embedded quotes):
subprocess.call('cmd /c "{}" "{}" "{}"'.format(mainScriptPath, dir, cardIP))
The alternative is to pass in a list of arguments and have Python take care of quoting for you:
subprocess.call(['cmd', '/c', mainScriptPath, dir, cardIP])
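One hedged caveat: the error in the question is consistent with cmd's own quote handling rather than with Python. According to `cmd /?`, when the text after /c contains more than two quote characters, cmd strips the leading quote and the last quote, which here leaves `\\XX\XX\XX\Testing\SNMP` as the command name. A list passed to `cmd /c` would likely produce the same kind of command line. A sketch that avoids cmd altogether calls the interpreter directly. It assumes the goal is only to run MainScript.py with Python and that the values are plain paths without embedded quotes (renamed to snake_case, with `test_dir` instead of `dir` to avoid shadowing the builtin). `subprocess.list2cmdline` is an undocumented internal helper, used here only to display the command line Python builds.

```python
import subprocess
import sys

card_ip = "00.00.00.00"
# Plain values: no embedded double quotes; quoting is left to subprocess.
test_dir = r"D:\Testing Script"
main_script_path = r"\\XX\XX\XX\Testing\SNMP Tests\Python Script\MainScript.py"

# Run the script with the current interpreter instead of going through cmd /c,
# so cmd's quote-stripping rule never applies.
args = [sys.executable, main_script_path, test_dir, card_ip]

# Show the command line subprocess would build on Windows for the script part.
print(subprocess.list2cmdline(args[1:]))

# subprocess.call(args)  # run this on the machine that can reach the share
```

The printed line has the same quoted form the question was aiming for, and each list element should arrive in MainScript.py's sys.argv as a separate entry.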
When the first argument to .call() is a list, Python uses the process described in the section "Converting an argument sequence to a string on Windows" of the subprocess documentation:
On Windows, an args sequence is converted to a string that can be parsed using the following rules (which correspond to the rules used by the MS C runtime):
1. Arguments are delimited by white space, which is either a space or a tab.
2. A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
3. A double quotation mark preceded by a backslash is interpreted as a literal double quotation mark.
4. Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
5. If backslashes immediately precede a double quotation mark, every pair of backslashes is interpreted as a literal backslash. If the number of backslashes is odd, the last backslash escapes the next double quotation mark as described in rule 3.
This means that passing your arguments as a sequence lets Python handle all the nitty-gritty details of escaping them properly, including embedded backslashes and double quotes.
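To make these rules concrete, here is a small sketch of a parser that follows them, round-tripped against `subprocess.list2cmdline` (the undocumented helper that performs the sequence-to-string conversion). `split_msvcrt` is a hypothetical name; it implements only the rules as quoted and ignores further quirks of real C runtimes, such as doubled quotes inside a quoted region.

```python
import subprocess


def split_msvcrt(cmdline):
    """Split a Windows command line using the quoted rules (hypothetical helper)."""
    args = []
    current = []
    in_quotes = False
    have_arg = False  # lets "" produce an empty argument
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                # Rule 5: each pair of backslashes becomes one literal backslash.
                current.append("\\" * (count // 2))
                if count % 2:
                    # Odd count: the last backslash escapes the quote (rule 3).
                    current.append('"')
                    j += 1
                # Even count: the quote is handled as a delimiter on the next pass.
            else:
                # Rule 4: backslashes not followed by a quote are literal.
                current.append("\\" * count)
            have_arg = True
            i = j
        elif c == '"':
            # Rule 2: quotes toggle a region where white space does not split.
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            # Rule 1: an unquoted space or tab ends the current argument.
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


samples = ["a b", 'say "hi"', "C:\\dir\\", "C:\\dir with space\\", ""]
line = subprocess.list2cmdline(samples)
print(line)
print(split_msvcrt(line) == samples)
```

Note how the trailing backslash is doubled only in the argument that gets surrounding quotes, which is exactly what rule 5 requires.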

Related

Replace double backslash in string literal with single backslash

I'm trying to print a string that contains double backslashes (one escaping the other) so that only one backslash is printed. I thought this would happen automatically, but I must be missing some detail.
I have this little snippet:
for path in self.tokenized:
    pdb.set_trace()
    print(self.tokenized[path])
When I stop at pdb.set_trace(), I can see that my strings contain double backslashes; when I then enter continue to print the rest, the output shows the same thing.
> /home/kendall/Development/path-parser/tokenize_custom.py(82)print_tokens()
-> print(self.tokenized[path])
(Pdb) self.tokenized[path]
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
(Pdb) c
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
Note that I'm writing a parser that parses Windows file paths -- thus the backslashes.
This is what it looks like to run the program:
kendall#kendall-XPS-8500:~/Development/path-parser$ python main.py -f c:\\home\\kendall\\Desktop
The issue you are having is that you're printing a list, which only knows one way to stringify its contents: repr. repr is intended for debugging use. Idiomatically, when possible (classes are a notable exception), it outputs a syntactically valid Python expression that can be fed directly into the interpreter to reproduce the original object, hence the escaped backslashes.
Instead, you need to loop through each list, and print each string individually.
You can use str.join() to do this for you.
To get the exact same output, minus the doubled backslashes, you'd need to do something like:
print("[{0}]".format(", ".join(self.tokenized[path])))
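A short sketch of the difference, using the token list from the question: printing the list shows each element's repr, while printing the strings themselves (directly or via str.join) shows the single backslashes they actually contain.

```python
tokens = ['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']

print(tokens)             # list display uses repr(): backslashes appear doubled
print(tokens[1])          # \home  (the string holds only one backslash)
print(len(tokens[1]))     # 5
print(repr(tokens[1]))    # '\\home'
print("[{0}]".format(", ".join(tokens)))
```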

Why does PyCharm use double backslash to indicate escaping?

For instance, I write a normal string and another "abnormal" string like this:
When I debug it, the debugger shows the "abnormal" string like this:
Here's the question:
Why does PyCharm show double backslashes instead of a single backslash? As is well known, \' means '. Is there some trick?
What I believe is happening is that the ' in your c string needs to be escaped, and PyCharm knows this at runtime because you surrounded the full string with " (you'll notice that in the debugger your c string is now surrounded by '). To escape the single quote it changes it to \', but now there is a \ in your string that needs escaping, and to escape \ in Python you type \\.
EDIT
Let me see if I can explain the order of escaping going on here.
"u' this is not normal" is assigned to c
PyCharm converts the string in c to 'u' this is not normal' at runtime. See how, without escaping the 2nd ', your string is now closed off right after u.
PyCharm escapes the ' automatically by adding a backslash before it. The string is now 'u\' this is not normal'. At this point everything should be fine, but PyCharm may be taking an additional step for safety.
PyCharm then escapes the backslash it just added to your string, leaving the string as: 'u\\' this is not normal'.
It is likely a setting inside PyCharm. Does it cause an actual issue with your code?
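The answer's hypothesis, escaping applied twice, can be reproduced in plain Python. This is only a sketch of that mechanism under the answer's assumption that the value gets wrapped in single quotes; it is not PyCharm's actual display code, and `quote_single` is a hypothetical helper.

```python
def quote_single(s):
    """Hypothetical helper: wrap s in single quotes, escaping \\ and ' inside."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"


c = "u' this is not normal"
print(repr(c))       # "u' this is not normal"  (repr picks double quotes here)

once = quote_single(c)
print(once)          # 'u\' this is not normal'

# Escaping the already-escaped text again doubles the backslash that was added.
print(repr(once))    # "'u\\' this is not normal'"
```

Python's own repr avoids the problem by switching to double quotes, so the doubled backslash only appears once an escaped form is escaped a second time.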

Why is there a difference between using a list or a string with subprocess.Popen and quotes on the commandline

When running the following script:
import os
import sys
import subprocess

if len(sys.argv) > 1:
    print(sys.argv[1])
    sys.exit(0)

commandline = [sys.executable]
commandline.append(os.path.realpath(__file__))
commandline.append('"test"')
p = subprocess.Popen(commandline)
p.wait()
p = subprocess.Popen(" ".join(commandline))
p.wait()
It returns the following output
"test"
test
Why is there a difference between providing a list of arguments or one string?
This is run on a Windows machine, and you can see backslashes before the quotes in the command line shown in Task Manager.
I expected the same result in both runs.
Edit:
The problem is not so much the automatic escaping of spaces (I consider that the programmer's responsibility) as whether my quotes are escaped in the process command line.
These are the two subprocesses as shown in the Windows Task Manager:
A different non-python process parses the first commandline with the backslashes, which brings unexpected behaviour. How can I have it so that I can use a list and not have the quotes escaped on the commandline?
Edit2:
The escaping of the quotes is definitely added by Python. Run the following:
import subprocess
commandline = ['echo']
commandline.append('"test"')
commandline.append('>')
commandline.append(r'D:\test1.txt')
p = subprocess.Popen(commandline, shell=True)
p.wait()
commandline = 'echo "test" > D:\\test2.txt'
p = subprocess.Popen(commandline, shell=True)
p.wait()
Then you will see that the outputs are
D:\test1.txt:
\"test\"
D:\test2.txt:
"test"
The string API is dangerous since it might change the meaning of arguments. Example: You want to execute C:\Program Files\App\app.exe. If you use the string version of Popen(), you get an error:
C:\Program: Error 13
What happens is that with the string API, the input is split at spaces, and the command C:\Program is started with the single argument Files\App\app.exe. To fix this, you need to quote properly, which gets you into quoting hell when you have quotes in your arguments (i.e. when you really want to pass "test" as an argument with the quotes).
To solve this (and other subtle) bugs, there is the list API, where each element of the list becomes a single item passed to the OS without any modifications. With the list API, you get what you see. If you quote an argument with the list API, it will be passed on with the quotes. If there are spaces, they won't split your argument. If there are arbitrary other special characters (like * or %), they will all be passed on.
[EDIT] As usual, things are much more complex on Windows. From the Python documentation for the subprocess module:
17.1.5.1. Converting an argument sequence to a string on Windows
On Windows, an args sequence is converted to a string that can be parsed using the following rules (which correspond to the rules used by the MS C runtime):
1. Arguments are delimited by white space, which is either a space or a tab.
2. A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
3. A double quotation mark preceded by a backslash is interpreted as a literal double quotation mark.
4. Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
5. If backslashes immediately precede a double quotation mark, every pair of backslashes is interpreted as a literal backslash. If the number of backslashes is odd, the last backslash escapes the next double quotation mark as described in rule 3.
So the backslashes are there because the MS C runtime expects them.
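A sketch that reproduces the Edit2 result without running cmd: `subprocess.list2cmdline` (the undocumented helper Popen uses on Windows to turn a list into a command line) escapes the embedded quotes as `\"` following the rules above. cmd's echo does not interpret `\"`, so it writes it out literally, whereas a child program that parses its command line with the MS C runtime, such as python.exe in the first script, turns `\"test\"` back into `"test"`. Passing the exact string, as in the second call, appears to be the way to keep the command line untouched.

```python
import subprocess

# The list from Edit2 in the question.
echo_list = ['echo', '"test"', '>', r'D:\test1.txt']
echo_line = subprocess.list2cmdline(echo_list)
print(echo_line)    # echo \"test\" > D:\test1.txt

# The string version is used as-is, so no backslashes are added.
echo_string = 'echo "test" > D:\\test2.txt'
print(echo_string)  # echo "test" > D:\test2.txt
```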

Why do I need 4 backslashes in a Python path?

When I'm using Python 3 to launch a program via subprocess.call(), why do I need 4 backslashes in paths?
This is my code:
import shlex
import subprocess
cmd = 'C:\\\\Windows\\\\System32\\\\cmd.exe'
cmd = shlex.split(cmd)
subprocess.call(cmd)
When I examine the command line of the launched cmd.exe instance in Task Manager, it shows the path correctly, with only one backslash between path components.
Because of this, I need this on Windows to make the paths work:
if platform.platform().startswith('Windows'):
    cmd = cmd.replace(os.sep, os.sep + os.sep)
Is there a more elegant solution?
Part of the problem is that you're using shlex, which implements escaping rules used by Unix-ish shells. But you're running on Windows, whose command shells use different rules. That accounts for one level of needing to double backslashes (i.e., to work around something shlex does that you didn't need to begin with).
That you're using a regular string instead of a raw string (r"...") accounts for the other level of needing to double backslashes, and 2*2 = 4. QED ;-)
This works fine on Windows:
cmd = subprocess.call(r"C:\Windows\System32\cmd.exe")
By the way, read the docs for subprocess.Popen() carefully: the Windows CreateProcess() API call requires a string for an argument. When you pass a sequence instead, Python tries to turn that sequence into a string, via rules explained in the docs. When feasible, it's better - on Windows - to pass the string you want directly.
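The shlex level of doubling can be seen directly: in its default POSIX mode, a backslash outside quotes escapes the next character. A minimal sketch:

```python
import shlex

# Regular literal with four backslashes per separator: the text holds two.
doubled = 'C:\\\\Windows\\\\System32\\\\cmd.exe'
print(doubled)                 # C:\\Windows\\System32\\cmd.exe
print(shlex.split(doubled))    # one element; each \\ pair became a single \

# A raw string holds single backslashes, which shlex then treats as escapes.
single = r"C:\Windows\System32\cmd.exe"
print(shlex.split(single))     # ['C:WindowsSystem32cmd.exe']
```

Without shlex in the way, the raw string can go straight to subprocess.call, as shown above.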
When you are creating the string, you need to double each backslash for escaping, and then when the string is passed to your shell, you need to double each backslash again. You can cut the backslashes in half by using a raw string:
cmd = r'C:\\Windows\\System32\\cmd.exe'
\ has special meaning - you're using it as part of an escape sequence. Double up the backslashes, and you have a literal backslash \.
The caveat is that, with only one pair of escaped backslashes, you still have only one literal backslash. You need to escape that backslash, too.
Alternatively, why not just use os.sep instead? You'll be able to ensure your code is more portable (since it'll use the system-specific separator), and you won't have to deal [directly] with escaping backslashes.
As John points out, 4 backslashes aren't necessary when accessing files locally.
One place where 4 backslashes are necessary is when connecting to (generally Windows) servers over SMB or CIFS.
Normally you would just use \\servername\share\
But each of those backslashes needs to be escaped, hence the 4 backslashes before server names.
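A small sketch of the two ways to write a UNC path in Python source; `servername`, `share` and `dir` are placeholders. `pathlib.PureWindowsPath` (Python 3.4 and later) parses Windows paths on any OS and recognizes the UNC drive.

```python
from pathlib import PureWindowsPath

# Regular string literal: each backslash in the UNC prefix is written twice.
unc_escaped = "\\\\servername\\share\\dir"
# Raw string literal: backslashes are taken literally.
unc_raw = r"\\servername\share\dir"
print(unc_escaped)          # \\servername\share\dir
print(len("\\\\"))          # 2: four typed backslashes, two real characters

# PureWindowsPath treats \\server\share as the drive of a UNC path.
p = PureWindowsPath(unc_raw)
print(p.drive)              # \\servername\share
```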
You could also pass the path in a list to subprocess.call():
import subprocess as sp
sp.call(['c:\\program files\\<path>'])

Python argparse argument with quotes

Is there any way I can tell argparse to not eat quotation marks?
For example, When I give an argument with quotes, argparse only takes what's inside of the quotes as the argument. I want to capture the quotation marks as well (without having to escape them on the command line.)
pbsnodes -x | xmlparse -t "interactive-00"
produces
interactive-00
I want
"interactive-00"
I think it is the shell that eats them, so Python never actually sees them. Escaping them on the command line may be your only option.
If it's the \"backslash\" style escaping you don't like for some reason, then this way should work instead:
pbsnodes -x | xmlparse -t '"interactive-00"'
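One way to check that argparse itself keeps quotes is to feed it argument lists directly. In this sketch `shlex.split` stands in for a POSIX shell's word splitting (an approximation, not the shell itself); the prog name xmlparse and the -t option mirror the question.

```python
import argparse
import shlex

parser = argparse.ArgumentParser(prog="xmlparse")
parser.add_argument("-t")

typed_lines = [
    'xmlparse -t "interactive-00"',      # shell removes the quotes
    "xmlparse -t '\"interactive-00\"'",  # single quotes keep the inner double quotes
]
for typed in typed_lines:
    argv = shlex.split(typed)            # roughly what the shell hands to Python
    args = parser.parse_args(argv[1:])   # skip the program name, as with sys.argv[1:]
    print(repr(args.t))
```

argparse returns whatever it receives in argv; the quotes are gone before Python starts in the first case and preserved in the second.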
The command line is parsed into an argument vector by the Python process itself. Depending on how Python is built, that is done by some sort of runtime library; for the Windows build, that is most likely the MS Visual C++ runtime library. More details about how it parses the command line can be found in the Visual C++ documentation: Parsing C++ Command-Line Arguments.
In particular:
A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
If you want to see the unprocessed command line on Windows, you can do this:
import win32api
print(win32api.GetCommandLine())
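win32api comes from the third-party pywin32 package. A standard-library alternative, assuming Windows and ctypes, calls the Win32 function GetCommandLineW from kernel32 directly; the function name `raw_command_line` is hypothetical, and it returns None on other platforms.

```python
import sys


def raw_command_line():
    """Return the unparsed process command line on Windows, else None."""
    if sys.platform != "win32":
        return None
    import ctypes
    get_command_line = ctypes.windll.kernel32.GetCommandLineW
    get_command_line.argtypes = []
    get_command_line.restype = ctypes.c_wchar_p  # LPWSTR becomes a Python str
    return get_command_line()


print(raw_command_line())
```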
