Replace double backslash in string literal with single backslash - python

I'm trying to print a string that contains a double backslash (one to escape the other) such that only one of the backslashes is printed. I thought this would happen automatically, but I must be missing some detail.
I have this little snippet:
for path in self.tokenized:
    pdb.set_trace()
    print(self.tokenized[path])
When I debug with that pdb.set_trace() I can see that my strings have double backslashes, and then I enter continue to print the remainder and it prints that same thing.
> /home/kendall/Development/path-parser/tokenize_custom.py(82)print_tokens()
-> print(self.tokenized[path])
(Pdb) self.tokenized[path]
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
(Pdb) c
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
Note that I'm writing a parser that parses Windows file paths -- thus the backslashes.
This is what it looks like to run the program:
kendall@kendall-XPS-8500:~/Development/path-parser$ python main.py -f c:\\home\\kendall\\Desktop

The issue you are having is that you're printing a list, which only knows one way to stringify its contents: repr. repr is only designed for debugging use. Idiomatically, when possible (classes are a notable exception), it outputs a syntactically valid Python expression that can be fed directly into the interpreter to reproduce the original object, hence the escaped backslashes.
Instead, you need to loop through each list, and print each string individually.
You can use str.join() to do this for you.
To get the exact same output, minus the doubled backslashes, you'd need to do something like:
print("[{0}]".format(", ".join(self.tokenized[path])))
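The difference between printing the list and printing its elements can be seen in a minimal sketch like the following. The `tokens` list here is a stand-in for `self.tokenized[path]`, not the asker's actual data.

```python
# Stand-in for self.tokenized[path]; each "\\" in source is ONE backslash in memory.
tokens = ['c:', '\\home', '\\kendall', '\\Desktop']

as_list = str(tokens)                            # a list stringifies its items with repr()
as_joined = "[{0}]".format(", ".join(tokens))    # join() uses the strings themselves

print(as_list)         # ['c:', '\\home', '\\kendall', '\\Desktop']
print(as_joined)       # [c:, \home, \kendall, \Desktop]
print(len(tokens[1]))  # 5: one backslash plus "home"
```

The strings never contained two backslashes; only the list's repr-based display shows them doubled.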

Related

Why do some functions in Python change \ to \\

When I pass a file to shutil.copy as
shutil.copy(r'i:\myfile.txt', r'UNC to where I want it to go')
I get an error
No such file or directory 'i:\\myfile.txt'
I've experienced this problem before with the os module when I have a UNC path. Usually I just get frustrated enough that I forget using the os module and just put the file path into with open() or whatever I'm using it for.
It is my understanding that placing an r before '' is supposed to cause python to ignore escape characters and treat them as string literals, but the behavior I'm seeing leads me to believe that this is not the case. For some reason it takes the \ and changes it to \\.
I've seen this when using os.path.join, where the \\ at the beginning of the UNC path gets turned into \\\\.
What is the best way to pass a string literal to ensure that all escape characters are ignored and the string is preserved?
Your string is not being modified by Python. It's the representation of your string that's coming out differently.
When the error is printed, Python calls repr() to print the value. This function will
Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.
This can be very nice when debugging: if I paste that string (quotes, escapes, and all) into the REPL I'll get the string in memory that you were working with. I can use this to interactively try your copy command, maybe tweaking the string a bit.
If you want to see your string in a printed form, you could do
source_path = r'i:\myfile.txt'
target_path = r'UNC to where I want it to go'
print(f'Copying {source_path} to {target_path}...')
shutil.copy(source_path, target_path)
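To see that the doubling lives only in the error message, here is a small sketch that triggers a similar error on purpose. The path `no_such_dir\myfile.txt` is a made-up name that is assumed not to exist on your machine.

```python
# Hypothetical path, assumed not to exist; the raw string holds ONE backslash.
path = r'no_such_dir\myfile.txt'

caught = None
try:
    open(path)
except OSError as exc:   # FileNotFoundError is a subclass of OSError
    caught = exc

print(caught)            # the message embeds repr(path), so the backslash looks doubled
print(caught.filename)   # the real string that was passed in: a single backslash
```

The exception's message is built with the repr of the filename, while `caught.filename` still holds the unmodified string.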

Why is there a difference between using a list or a string with subprocess.Popen and quotes on the commandline

When running the following script:
import os
import sys
import subprocess
if len(sys.argv) > 1:
    print(sys.argv[1])
    sys.exit(0)
commandline = [sys.executable]
commandline.append(os.path.realpath(__file__))
commandline.append('"test"')
p = subprocess.Popen(commandline)
p.wait()
p = subprocess.Popen(" ".join(commandline))
p.wait()
It returns the following output
"test"
test
Why is there a difference between providing a list of arguments or one string?
This is run on a windows machine and you will see backslashes before the quotes on the command in the task manager.
I expected the same result in both runs.
Edit:
The problem is not so much the automatic escaping of spaces (I find that is the programmer's responsibility), but rather whether my quotes are escaped or not in the process commandline.
These are the two subprocesses taken from the windows task manager:
A different non-python process parses the first commandline with the backslashes, which brings unexpected behaviour. How can I have it so that I can use a list and not have the quotes escaped on the commandline?
Edit2:
The quotes are definitely added by python. If you run the following:
import subprocess
commandline = ['echo']
commandline.append('"test"')
commandline.append('>')
commandline.append(r'D:\test1.txt')
p = subprocess.Popen(commandline, shell=True)
p.wait()
commandline = 'echo "test" > D:\\test2.txt'
p = subprocess.Popen(commandline, shell=True)
p.wait()
Then you will see that the outputs are
D:\test1.txt:
\"test\"
D:\test2.txt:
"test"
The string API is dangerous since it might change the meaning of arguments. Example: You want to execute C:\Program Files\App\app.exe. If you use the string version of Popen(), you get an error:
C:\Program: Error 13
What happens is that with the string API, the command line is split on spaces, so the system tries to start the command C:\Program with the single argument Files\App\app.exe. To fix this, you need to quote properly, which lands you in quote hell when you have quotes in your arguments (i.e. when you really want to pass "test" as an argument with the quotes).
To solve this (and other subtle bugs), there is the list API, where each element of the list will become a single item passed to the OS without any modifications. With the list API, you get what you see. If you quote an argument with the list API, it will be passed on with the quotes. If there are spaces, they won't split your argument. If there are arbitrary other special characters (like * or %), they will all be passed on.
[EDIT] As usual, things are much more complex on Windows. From the Python documentation for the subprocess module:
17.1.5.1. Converting an argument sequence to a string on Windows
On Windows, an args sequence is converted to a string that can be parsed using the following rules (which correspond to the rules used by the MS C runtime):
1. Arguments are delimited by white space, which is either a space or a tab.
2. A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
3. A double quotation mark preceded by a backslash is interpreted as a literal double quotation mark.
4. Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
5. If backslashes immediately precede a double quotation mark, every pair of backslashes is interpreted as a literal backslash. If the number of backslashes is odd, the last backslash escapes the next double quotation mark as described in rule 3.
So the backslashes are there because MS C runtime wants it that way.
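The conversion described by these rules can be inspected without starting a process. The sketch below uses `subprocess.list2cmdline`, a helper that exists in CPython's subprocess module but is not part of its documented public API, so relying on it is an assumption made for illustration only.

```python
import subprocess

# list2cmdline is an undocumented CPython helper; used here only to inspect the result.
quoted_arg = subprocess.list2cmdline(['"test"'])
spaced_path = subprocess.list2cmdline([r'C:\Program Files\App\app.exe'])

print(quoted_arg)    # \"test\"
print(spaced_path)   # "C:\Program Files\App\app.exe"
```

The literal quotes in the first argument are escaped with backslashes (rule 3), while the path containing a space is wrapped in quotes so it stays a single argument (rule 2).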

Weird issue during parsing a path in Python

Given these variables:
cardIP="00.00.00.00"
dir="D:\\TestingScript"
mainScriptPath='"\\\\XX\\XX\\XX\\Testing\\SNMP Tests\\Python Script\\MainScript.py"'
When using subprocess.call("cmd /c "+mainScriptPath+" "+dir+" "+cardIP) and print(mainScriptPath+" "+dir+" "+cardIP) I get this:
"\\XX\XX\XX\Testing\SNMP Tests\Python Script\MainScript.py" D:\TestingScript 00.00.00.00
which is what I wanted, OK.
But now, I want the 'dir' variable to be also inside "" because I am going to use dir names with spaces.
So, I do the same thing I did with 'mainScriptPath':
cardIP="00.00.00.00"
dir='"D:\\Testing Script"'
mainScriptPath='"\\\\XX\\XX\\XX\\Testing\\SNMP Tests\\Python Script\\MainScript.py"'
But now, when I'm doing print(mainScriptPath+" "+dir+" "+cardIP) I get:
"\\XX\XX\XX\Testing\SNMP Tests\Python Script\MainScript.py" "D:\Testing Script" 00.00.00.00
Which is great, but when executed in subprocess.call("cmd /c "+mainScriptPath+" "+dir+" "+cardIP) there is a failure with 'mainScriptPath' variable:
'\\XX\XX\XX\Testing\SNMP' is not recognized as an internal or external command...
It doesn't make sense to me.
Why does it fail?
In addition, I tried also:
dir="\""+"D:\\Testing Script"+"\""
This behaves well in 'print', but in 'subprocess.call' it raises the same problem.
(Windows XP, Python3.3)
Use proper string formatting, use single quotes for the formatting string and simply include the quotes:
subprocess.call('cmd /c "{}" "{}" "{}"'.format(mainScriptPath, dir, cardIP))
The alternative is to pass in a list of arguments and have Python take care of quoting for you:
subprocess.call(['cmd', '/c', mainScriptPath, dir, cardIP])
When the first argument to .call() is a list, Python uses the process described under the section Converting an argument sequence to a string on Windows.
On Windows, an args sequence is converted to a string that can be parsed using the following rules (which correspond to the rules used by the MS C runtime):
1. Arguments are delimited by white space, which is either a space or a tab.
2. A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
3. A double quotation mark preceded by a backslash is interpreted as a literal double quotation mark.
4. Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
5. If backslashes immediately precede a double quotation mark, every pair of backslashes is interpreted as a literal backslash. If the number of backslashes is odd, the last backslash escapes the next double quotation mark as described in rule 3.
This means that passing in your arguments as a sequence makes Python worry about all the nitty gritty details of escaping your arguments properly, including handling embedded backslashes and double quotes.
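As a rough illustration of the list form, the sketch below starts a child Python process instead of cmd and MainScript.py (an assumption made so it runs anywhere) and has the child report the arguments it received. The argument values are taken from the question; no extra quotes are added by hand.

```python
import subprocess
import sys

# The child simply prints the arguments it received.
child_code = "import sys; print(sys.argv[1:])"

# No manual quoting: the directory name with a space is one list element.
args = [sys.executable, "-c", child_code, r"D:\Testing Script", "00.00.00.00"]

result = subprocess.run(args, capture_output=True, text=True, check=True)
received = result.stdout.strip()
print(received)   # ['D:\\Testing Script', '00.00.00.00']
```

The directory arrives as a single argument even though it contains a space. The doubled backslash in the printed output is again just the list's repr.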

How can I read backslashes from a file correctly?

The following code:
key = open(r"C:\Scripts\private.ppk",'rb').read()
reads the file and assigns its data to the var key.
For some reason, backslashes are multiplied in the process. How can I make sure they don't get multiplied?
You ... don't need to. Nothing is added when the file is read; the backslashes only look doubled when the value is displayed in its repr() form, for example when you echo it in the interactive interpreter. If you're declaring strings and don't want to double up the backslashes in source code, you can use raw strings such as r'c:\myfile.txt', but that doesn't really apply to the contents of a file you're reading in.
>>> s = r'c:\boot.ini'
>>> s
'c:\\boot.ini'
>>> repr(s)
"'c:\\\\boot.ini'"
>>> print(s)
c:\boot.ini
>>>
As you can see, the extra backslashes appear only in the repr() display. The string itself holds a single backslash, and that is what you get when you use the value (print it, write it to a file, compare it, etc.).
You should read this great blog post on python and the backslash escape character.
And under some circumstances, if Python prints information to the console, you will see the two backslashes rather than one. For example, this is part of the difference between the repr() function and the str() function.
myFilename = "c:\\newproject\\typenames.txt"
print repr(myFilename), str(myFilename)
produces
'c:\\newproject\\typenames.txt' c:\newproject\typenames.txt
Backslashes are represented as escaped. You'll see two backslashes for each real one existing on the file, but that is normal behaviour.
The reason is that the backslash is used in order to create codes that represent characters that cannot be easily represented, such as new line '\n' or tab '\t'.
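A quick round trip through a file shows that reading does not multiply anything. This is a minimal sketch using a temporary file; the file name `sample.txt` and the text written to it are made up for the example.

```python
import os
import tempfile

original = r"C:\Scripts\private.ppk"   # two backslashes in total

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "sample.txt")   # hypothetical file name
    with open(file_path, "w") as f:
        f.write(original)
    with open(file_path) as f:
        content = f.read()

print(content == original)    # True: the data is unchanged
print(content.count("\\"))    # 2: still two backslashes in memory
print(repr(content))          # 'C:\\Scripts\\private.ppk' (doubled only in the repr)
```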
Are you trying to put single backslashes in a string? In a string literal a backslash must itself be escaped with the escape character, in this case "\\". It will print to the screen as a single backslash.
In fact there is a solution using eval, as long as the file content can be wrapped in quotes of some kind. The following worked for me (PATH contains some script that executes Matlab):
MATLAB_EXE = r"C:\Program Files (x86)\MATLAB\R2012b\bin\matlab.exe"
content = open(PATH).read()
MATLAB_EXE in content # False
content = eval(f'r"""{content}"""')
MATLAB_EXE in content # True
This works by evaluating the content as python string literal, making double escapes transform into single ones. Raw string is used to prevent escapes forming special characters.

Python Command Line "characters" returns 'characters'

Thanks in advance for your help.
When entering "example" at the command line, Python returns 'example'. I cannot find anything on the web to explain this. All reference materials speak to strings in the context of the print command, and I get all of the material about using double quotes, single quotes, triple quotes, escape commands, etc.
I can not, however, find anything explaining why entering text surrounded by double quotes at the command line always returns the same text surrounded by single quotes. What gives? Thanks.
In Python both 'string' and "string" are used to represent string literals. It's not like Java where single and double quotes represent different data types to the compiler.
The interpreter evaluates each line you enter and displays this value to you. In both cases the interpreter is evaluating what you enter, getting a string, and displaying this value. The default way of displaying strings is in single quotes so both times the string is displayed enclosed in single quotes.
It does seem odd, in that it breaks Python's rule of "There should be one - and preferably only one - obvious way to do it", but I think disallowing one of the options would have been worse.
You can also enter a string literal using triple quotes:
>>> """characters
... and
... newlines"""
'characters\nand\nnewlines'
You can use the command line to confirm that these are the same thing:
>>> type("characters")
<type 'str'>
>>> type('characters')
<type 'str'>
>>> "characters" == 'characters'
True
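The display rule can also be checked with repr() directly, which is what the interpreter uses when it echoes a value. A small sketch:

```python
a = "example"
b = 'example'

print(a == b)          # True: the quote style is not part of the value
print(repr(a))         # 'example'  (single quotes are the default display)
print(repr("it's"))    # "it's"     (double quotes are used when the text contains a ')
```

So the quotes you see are chosen when the value is displayed and are not remembered from what you typed.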
The interpreter uses the __repr__ method of an object to get the display to print to you. So on your own objects you can determine how they are displayed in the interpreter. We can't change the __repr__ method for built in types, but we can customise the interpreter output using sys.displayhook:
>>> import sys
>>> def customoutput(value):
...     if isinstance(value, str):
...         print('"%s"' % value)
...     else:
...         sys.__displayhook__(value)
...
>>> sys.displayhook = customoutput
>>> 'string'
"string"
In python, single quotes and double quotes are semantically the same.
It struck me as strange at first, since in C++ and similar languages single quotes denote a char and double quotes denote a string.
Just get used to the fact that Python has no separate character type: a single character is simply a string of length one, so there's no special syntax for marking a string vs. a character. Don't let it cloud your perception of a great language.
Don't get confused.
In Python, single quotes and double quotes are the same. Either one creates a string object.
