How can I read backslashes from a file correctly? - python

The following code:
key = open("C:\Scripts\private.ppk",'rb').read()
reads the file and assigns its data to the var key.
For some reason, backslashes are multiplied in the process. How can I make sure they don't get multiplied?

You ... don't. Nothing is multiplied: the data holds single backslashes, and they are only shown escaped (doubled) when Python displays the string's representation, so that the displayed text is itself a valid string literal. If you're declaring strings and don't want to double up the backslashes you can use raw strings such as r'c:\myfile.txt', but that doesn't really apply to the contents of a file you're reading in.
>>> s = r'c:\boot.ini'
>>> s
'c:\\boot.ini'
>>> repr(s)
"'c:\\\\boot.ini'"
>>> print s
c:\boot.ini
>>>
As you can see, the doubled backslashes appear only in the representation of the string. The string itself holds a single backslash, which is what you get when you use the value in a print statement, write it to a file, compare it, and so on.
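A minimal sketch of the same point in Python 3 (the session above is Python 2). The temporary file and its contents are made up for illustration; the sketch reads a path back from a file and compares the real characters with what repr() shows.

```python
import os
import tempfile

# Hypothetical file content: a Windows path with single backslashes.
original = r"C:\Scripts\private.ppk"

# Write it to a temporary file, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(original)
    temp_name = handle.name

try:
    with open(temp_name) as handle:
        data = handle.read()
finally:
    os.remove(temp_name)

print(data)        # the real characters
print(repr(data))  # the escaped representation, with doubled backslashes

# Count the backslashes actually stored in the string that was read.
backslash_count = data.count("\\")
```

Comparing data with original, or counting the backslashes as above, is a more reliable check than looking at what the interactive prompt echoes.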

You should read this great blog post on python and the backslash escape character.
And under some circumstances, if Python prints information to the console, you will see the two backslashes rather than one. For example, this is part of the difference between the repr() function and the str() function.
myFilename = "c:\\newproject\\typenames.txt"
print repr(myFilename), str(myFilename)
produces
'c:\\newproject\\typenames.txt' c:\newproject\typenames.txt

Backslashes are represented as escaped. You'll see two backslashes for each real one existing on the file, but that is normal behaviour.
The reason is that the backslash is used in order to create codes that represent characters that cannot be easily represented, such as new line '\n' or tab '\t'.
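A short sketch of how escape sequences behave, using only built-in string literals. It compares the length of escaped and raw literals to show that an escape sequence is two characters in the source but one character in the string.

```python
# Each escape sequence is a single character in the resulting string.
newline = "\n"
tab = "\t"
backslash = "\\"

# A raw string keeps the backslash and the letter as two characters.
raw_newline = r"\n"

print(len(newline), len(tab), len(backslash), len(raw_newline))
```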

Are you trying to put single backslashes in a string? Strings with backslashes require an escape character, in this case another "\". It will print to the screen with a single backslash.

In fact there is a solution: using eval, as long as the file content can be wrapped in quotes of some kind. The following worked for me (PATH points to a script that executes Matlab):
MATLAB_EXE = "C:\Program Files (x86)\MATLAB\R2012b\bin\matlab.exe"
content = open(PATH).read()
MATLAB_EXE in content # False
content = eval(f'r"""{content}"""')
MATLAB_EXE in content # True
This works by evaluating the content as python string literal, making double escapes transform into single ones. Raw string is used to prevent escapes forming special characters.
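If the file really does store every backslash doubled, a plain str.replace is an alternative that avoids eval. This is a different technique from the one above, shown only as a sketch; the content string is a made-up stand-in for the text read from the file.

```python
# Hypothetical text as it might be read from a file that stores every
# backslash doubled.
content = r'start "C:\\Program Files (x86)\\MATLAB\\R2012b\\bin\\matlab.exe"'

# Target path with single backslashes (raw string, so "\b" is not a backspace).
matlab_exe = r"C:\Program Files (x86)\MATLAB\R2012b\bin\matlab.exe"

found_before = matlab_exe in content

# Collapse each pair of backslashes into one.
collapsed = content.replace("\\\\", "\\")
found_after = matlab_exe in collapsed

print(found_before, found_after)
```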

Related

Why does python add additional backslashes to the path?

I have a text file with a path that goes like this:
r"\\user\data\t83\rf\Desktop\QA"
When I try to read this file and print a line, it returns the following string, and I'm unable to open the file from this location:
'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
Seems you've got Python code in your text file, so either sanitize your file, so it only includes the actual path (not a Python string representation) or you can try to fiddle with string replace until you're satisfied, or just evaluate the Python string.
Note that using eval() opens Pandora's box (it is as unsafe as it gets); it's safer to use ast.literal_eval() instead.
import ast
file_content = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
print(eval(file_content)) # do not use this, it's only shown for the sake of completeness
print(ast.literal_eval(file_content))
Output:
\\user\data\t83\rf\Desktop\QA
\\user\data\t83\rf\Desktop\QA
Personally, I'd prefer to sanitize the file, so it only contains \\user\data\t83\rf\Desktop\QA
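A sketch of a reader that tolerates both forms of the file, assuming one path per line. The helper name parse_path_line is hypothetical; it tries ast.literal_eval first and falls back to the stripped line when the text is not a Python string literal.

```python
import ast


def parse_path_line(line):
    """Return the path stored on one line of the file (hypothetical helper)."""
    text = line.strip()
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a Python literal: treat the line as the path itself.
        return text
    # Accept only string literals; anything else falls back to the raw text.
    return value if isinstance(value, str) else text


# The line as shown in the question, and the sanitized form of the same path.
quoted = parse_path_line('r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n')
plain = parse_path_line("\\\\user\\data\\t83\\rf\\Desktop\\QA\n")

print(quoted)
print(plain)
```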
A backslash combines with the character that follows it to form an escape sequence such as \n (new line) or \t (tab), so a single backslash merges with the next character. To get a literal backslash, write \\, which represents a single backslash.

Why do some functions in Python change \ to \\

When I pass a file to shutil.copy as
shutil.copy(r'i:\myfile.txt', r'UNC to where I want it to go')
I get an error
No such file or directory 'i:\\myfile.txt'
I've experienced this problem before with the os module when I have a UNC path. Usually I just get frustrated enough that I forget using the os module and just put the file path into with open() or whatever I'm using it for.
It is my understanding that placing an r before '' is supposed to cause python to ignore escape characters and treat them as string literals, but the behavior I'm seeing leads me to believe that this is not the case. For some reason it takes the \ and changes it to \\.
I've seen this when using os.path.join where the \\ at the beginning of the UNC path gets turned into \\\\.
What is the best way to pass a string literal to ensure that all escape characters are ignored and the string is preserved?
Your string is not being modified by Python. It's the representation of your string that's coming out differently.
When the error is printed, Python calls repr() to print the value. This function will
Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a __repr__() method.
This can be very nice when debugging: if I paste that string (quotes, escapes, and all) into the REPL I'll get the string in memory that you were working with. I can use this to interactively try your copy command, maybe tweaking the string a bit.
If you want to see your string in a printed form, you could do
source_path = r'i:\myfile.txt'
target_path = r'UNC to where I want it to go'
print(f'Copying {source_path} to {target_path}...')
shutil.copy(source_path, target_path)
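A small sketch of where the doubled backslashes in the error come from. The path below is hypothetical and assumed not to exist; the sketch compares the error message, which embeds the repr() of the file name, with the filename attribute of the exception, which is the string that was actually passed.

```python
source_path = r"i:\no_such_folder\myfile.txt"  # hypothetical, assumed missing

try:
    open(source_path)
except OSError as error:
    message = str(error)       # file name shown via repr(): doubled backslashes
    filename = error.filename  # the actual string passed to open()
    print(message)
    print(filename)
```

If the message shows doubled backslashes while filename does not, the string was passed through unchanged and the error points to a path that really is missing or unreachable.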

Replace double backslash in string literal with single backslash

I'm trying to print a string that contains a double backslash (one to escape the other) such that only one of the backslashes is printed. I thought this would happen automatically, but I must be missing some detail.
I have this little snippet:
for path in self.tokenized:
    pdb.set_trace()
    print(self.tokenized[path])
When I debug with that pdb.set_trace() I can see that my strings have double backslashes, and then I enter continue to print the remainder and it prints that same thing.
> /home/kendall/Development/path-parser/tokenize_custom.py(82)print_tokens()
-> print(self.tokenized[path])
(Pdb) self.tokenized[path]
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
(Pdb) c
['c:', '\\home', '\\kendall', '\\Desktop', '\\home\\kendall\\Desktop']
Note that I'm writing a parser that parses Windows file paths -- thus the backslashes.
This is what it looks like to run the program:
kendall@kendall-XPS-8500:~/Development/path-parser$ python main.py -f c:\\home\\kendall\\Desktop
The issue you are having is that you're printing a list, which only knows one way to stringify its contents: repr. repr is only designed for debugging use. Idiomatically, when possible (classes are a notable exception), it outputs a syntactically valid Python expression that can be fed directly into the interpreter to reproduce the original object, hence the escaped backslashes.
Instead, you need to loop through each list, and print each string individually.
You can use str.join() to do this for you.
To get the exact same output, minus the doubled backslashes, you'd need to do something like:
print("[{0}]".format(", ".join(self.tokenized[path])))
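A standalone sketch of the difference, using the token list from the debugger output above as sample data. Converting the list to a string goes through repr() for each element, while joining or printing the elements one by one uses the strings themselves.

```python
tokens = ["c:", "\\home", "\\kendall", "\\Desktop", "\\home\\kendall\\Desktop"]

as_list = str(tokens)          # uses repr() on each element
as_joined = ", ".join(tokens)  # uses the strings themselves

print(as_list)
print(as_joined)

# Printing each element separately also shows single backslashes.
for token in tokens:
    print(token)
```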

IO ERROR(ERRNO 20) while Accessing a file inside a folder in python

This is code for accessing files inside a folder using with open() as:
with open("DATABASE\password.txt") as _2_:
    password=_2_.readlines()
with open("DATABASE/names.txt") as _3_:
    names=_3_.readlines()
with open("DATABASE\email.txt") as _4_:
    email=_4_.readlines()
In this code, if I put "DATABASE\names.txt", as I did for password and email, instead of "DATABASE/names.txt", it does not work. Please tell me the reason for this.
You need to add another backslash. Example: open("path\\to\\file.txt")
Your errors are happening because you need to escape the backslash by adding another one. Such a thing won't happen with /.
You need to escape the \, use raw string r or forward slashes as you have already tried:
"DATABASE\\names.txt" # double \
r"DATABASE\names.txt" # raw string
"DATABASE/names.txt" # use forward slashes
\n is a newline character.
In [7]: print "DATABASE\names.txt" # interpreted as two lines
DATABASE
ames.txt
In [8]: print r"DATABASE\names.txt"
DATABASE\names.txt
A backslash has a special meaning in Python: it is used to escape characters.
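Another option, sketched here as an addition to the three spellings above, is to build the path from its parts so that no backslash has to be typed at all. The folder and file names are taken from the question; nothing is opened, so the files do not need to exist.

```python
import os
from pathlib import Path

# A normal literal: "\n" becomes a newline, so this is not the intended path.
broken = "DATABASE\names.txt"

# Building the path from parts avoids writing any backslash at all.
with_os = os.path.join("DATABASE", "names.txt")
with_pathlib = Path("DATABASE") / "names.txt"

print("\n" in broken)  # True
print(with_os)
print(with_pathlib.name)
```

Both os.path.join and pathlib insert the separator of the platform the code runs on, so the same line works on Windows and elsewhere.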

removing weird double quotes (from excel file) in python string

I'm loading in an excel file to python3 using xlrd. They are basically lines of text in a spreadsheet. On some of these lines are quotation marks. For example, one line can be:
She said, "My name is Jennifer."
When I'm reading them into python and making them into strings, the double quotes are read in as a weird double quote character that looks like a double quote in italics. I'm assuming that somewhere along the way, python read in the character as some foreign character rather than actual double quotes due to some encoding issue or something. So in the above example, if I assign that line as "text", then we'll have something like the following (although not exactly since I don't actually type out the line, so imagine "text" was already assigned beforehand):
text = 'She said, “My name is Jennifer.”'
text[10] == '"'
The second line will spit out a False because it doesn't seem to recognize it as a normal double quote character. I'm working within the Mac terminal if that makes a difference.
My questions are:
1. Is there a way to easily strip these weird double quotes?
2. Is there a way when I read in the file to get python to recognize them as double quotes properly?
I'm assuming that somewhere along the way, python read in the character as some foreign character
Yes; it read that in because that's what the file data actually represents.
rather than actual double quotes due to some encoding issue or something.
There's no issue with the encoding. The actual character is not an "actual double quote".
Is there a way to easily strip these weird double quotes?
You can use the .replace method of strings as you would normally, to either replace them with an "actual double quote" or with nothing.
Is there a way when I read in the file to get python to recognize them as double quotes properly?
If you're looking for them, you can compare them to the character they actually are.
As noted in the comment, they are most likely U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK. They're used so that opening and closing quotes can look different (by curving in different directions), which pretty typography normally does (as opposed to using " which is simply more convenient for programmers). You represent them in Python with a Unicode escape, thus:
text[10] == '\u201c'
You could also have directly asked Python for this info, by asking for text[10] at the Python command line (which would evaluate that and show you the representation), or explicitly in a script with e.g. print(repr(text[10])).
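A brief sketch of the replacement step, assuming the characters are indeed U+201C and U+201D. It uses str.maketrans and str.translate to map both curly quotes to the plain double quote in one pass; chained .replace calls would work equally well.

```python
text = "She said, \u201cMy name is Jennifer.\u201d"

# Map both curly quotes to the plain ASCII double quote.
curly_to_straight = str.maketrans({"\u201c": '"', "\u201d": '"'})
normalized = text.translate(curly_to_straight)

print(repr(text[10]))
print(normalized)
```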
