How to append '\\?\' to the front of a file path in Python

I'm trying to work with some long file paths (Windows) in Python and have come across some problems. After reading the question here, it looks as though I need to prepend '\\?\' to my long file paths in order to use them with os.stat(filepath). The problem I'm having is that I can't create a string in Python that ends in a backslash. The question here points out that you can't even end strings in Python with a single '\' character.
Is there anything in the Python standard library or anywhere else that lets you simply prepend '\\?\' to a file path you already have? Or is there any other workaround for working with long file paths in Windows with Python? It seems like such a simple thing to do, but I can't figure it out for the life of me.

"\\\\?\\" should give you exactly the string you want.
Longer answer: of course you can end a string in Python with a backslash. You just can't do so when it's a "raw" string literal (one prefixed with an 'r'), which is what you usually use for strings that contain lots of backslashes (to avoid the infamous "leaning toothpick" syndrome ;-)).

Even with a raw string, you can end in a backslash with:
>>> print(r'\\?\D:\Blah' + '\\')
\\?\D:\Blah\
or even:
>>> print(r'\\?\D:\Blah' '\\')
\\?\D:\Blah\
since Python concatenates two adjacent string literals into one.
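Putting the answers above together, here is a minimal sketch of prepending the prefix to an existing path. The names LONG_PATH_PREFIX and add_long_path_prefix are hypothetical, and the sketch assumes the path is already absolute and uses backslashes.

```python
# The four-character prefix \\?\ written as an ordinary (non-raw) literal:
# every backslash is doubled.
LONG_PATH_PREFIX = "\\\\?\\"


def add_long_path_prefix(path):
    # Hypothetical helper: assumes `path` is already an absolute Windows path
    # that uses backslashes; leaves it alone if the prefix is already there.
    if path.startswith(LONG_PATH_PREFIX):
        return path
    return LONG_PATH_PREFIX + path


prefixed = add_long_path_prefix(r"D:\Blah\some\long\path.txt")
print(prefixed)
```

On Windows the result could then be passed to os.stat(prefixed).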

Related

replacing string from '\' into '/' Python

I've been struggling with some code where I need to change a simple \ into / in Python. It's a file path: Python doesn't read the path the Windows way, so I simply want to convert the Windows path so that Python reads the file correctly.
I want to parse some text from a game to compute statistics. I'm doing it this way:
import re
pathNumbers = "D:\Gry\Tibia\packages\TibiaExternal\log\test server.txt"
pathNumbers = re.sub(r"\\", r"/",pathNumbers)
fileNumbers = open (pathNumbers, "r")
print(fileNumbers.readline())
fileNumbers.close()
But the error I get back is
----> 6 fileNumbers = open (pathNumbers, "r") OSError: [Errno 22] Invalid argument: 'D:/Gry/Tibia/packages/TibiaExternal\test server.txt'
The problem is that re.sub() and .replace() give the same result: almost the whole path is replaced, but the last backslash always stays untouched.
Do you have any solution for this? It seems that changing these characters is a sensitive point for Python.

Simple answer:
If you want to use paths on different platforms, join them with
os.path.join(path,*paths)
This way you don't have to work with the different separators at all.
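As a small illustration of that suggestion, the sketch below joins the same path components three ways. The component list and the "D:\\" and "/home" roots are made up for the example; ntpath and posixpath are the Windows and POSIX flavours of os.path, used here only so both results can be seen on any machine.

```python
import ntpath
import os
import posixpath

# Illustrative path components (not a real directory layout).
parts = ["packages", "TibiaExternal", "log", "test server.txt"]

# os.path.join uses the separator of the platform the script runs on.
native = os.path.join(*parts)

# The same join, forced to Windows rules and to POSIX rules.
windows_style = ntpath.join("D:\\", *parts)
posix_style = posixpath.join("/home", *parts)

print(native)
print(windows_style)  # D:\packages\TibiaExternal\log\test server.txt
print(posix_style)    # /home/packages/TibiaExternal/log/test server.txt
```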
Answer to what you intended to do:
The actual problem is that your pathNumbers string is not a raw string (no leading r in its definition), so its backslashes are treated as escape characters. In most cases this changes nothing, because the backslash combined with the following character has no special meaning. But \t is the tab character (and \n would be the newline character), so the \t in \test is no longer a plain backslash followed by t, and there is no backslash left at that position to replace.
So simply write
pathNumbers = r"D:\Gry\Tibia\packages\TibiaExternal\log\test server.txt"
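The difference can be checked directly. This is a short sketch using a shortened, made-up path fragment rather than the full path from the question:

```python
plain = "log\test server.txt"   # \t is read as one tab character
raw = r"log\test server.txt"    # raw literal: backslash and t stay separate

# The plain string is one character shorter and contains no backslash at all.
print(len(plain), len(raw))
print("\\" in plain)

# With the raw string, replace() finds the backslash as expected.
print(raw.replace("\\", "/"))
```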

re.escape returns unusable directory

Using re.escape() on this directory:
C:\Users\admin\code
Should theoretically return this, right?
C:\\Users\\admin\\code
However, what I actually get is this:
C\:\\Users\\admin\\code
Notice the backslash immediately after C. This makes the string unusable, and trying to use directory.replace('\', '') just breaks Python, because it can't handle a string literal consisting of a single backslash and treats everything after it as part of the string.
Any ideas?
Update
This was a dumb question :p

No, it should not. Its help says: "Escape all the characters in pattern except ASCII letters, numbers and '_'".
What you are reporting is what you get after calling print on the resulting string. In the console, if you type directory and press Enter, you would see something like C\\:\\\\Users\\\\admin\\\\code. Calling directory.replace('\\','') would remove all backslashes. For example, directory.replace('\\','x') gives Cx:xxUsersxxadminxxcode. What works in this case is replacing the backslash-colon pair with a plain colon, i.e. directory.replace('\\:',':').
However, I would suggest doing something else. A neat way to work with Windows directories in Python is to use forward slashes. Python and the OS will understand paths written with forward slashes. Further, if you aren't using absolute paths, your code will be portable to Unix-style OSes as far as paths are concerned.
It also seems to me that you are calling re.escape unnecessarily. If printing the directory gives you C:\Users\admin\code, then it is already a perfectly fine directory to use and you don't need to escape it. If the backslashes in the literal had not been escaped, print('C:\Users\admin\code') would give something like C:\Usersdmin\code, since \a has a special meaning (beep). (In Python 3 that literal is rejected outright with a SyntaxError, because \U starts a Unicode escape.)
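For context, re.escape is meant for building regular expressions, not file paths. The sketch below shows that intended use; note that since Python 3.7 re.escape no longer escapes the colon, so the C\: form from the question only appears on older versions.

```python
import re

directory = r"C:\Users\admin\code"
pattern = re.escape(directory)

# The escaped form is a regex pattern, not a path: it matches the literal text.
matches_itself = re.fullmatch(pattern, directory) is not None
print(matches_itself)

# The unescaped path is not a usable pattern: the regex engine rejects \U.
try:
    re.compile(directory)
    raw_path_is_valid_pattern = True
except re.error:
    raw_path_is_valid_pattern = False
print(raw_path_is_valid_pattern)
```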

Some sensitive filenames cause failure of loading data

When using keras.model.load_weights (the weight file is saved in HDF5 format), I have come across situations where folder names that begin with r or t cause the error: errno = 22, error message = 'invalid argument', flags = 0, o_flags = 0.
I want to know whether there are specific rules about filenames that should be avoided because they lead to such read errors in Python, or whether the situation I encountered is specific to Keras.

It would greatly help in debugging this if you included examples of the filenames that give you trouble. However, I have a good idea of what is probably happening here.
This problem seems to appear with folders whose names start with r or t. Also, since they are folders, in their full path name they are preceded by a \ character (for example "\thisFolder", or similar). This is the case in a Windows environment, which uses \ to separate path components, unlike *nix systems, which use the forward slash /.
Considering this, it seems you are running into the fact that \r and \t are both special characters, meaning carriage return and tab, respectively. If so, many file openers will have trouble processing such a file name.
Moreover, I would not be surprised if you got the same errors on folders that begin with n or other letters that form special characters when they follow a backslash (\n is a newline, \a is the bell character, \b is a backspace, etc.).
To overcome this, you will need to escape the backslash character before passing it in a filename. In Python, an escaped backslash is "\\". Alternatively, you can pass a raw string instead by adding the r prefix to your string, something like r"\a\raw\string". More information on escaping and raw strings can be found in this question and its answers.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python,
As mentioned, you should avoid characters that take on a special meaning after a backslash. I suggest you check here to see which escape sequences Python recognizes, so you can refrain from using such characters (or, better, use raw strings and forget about this problem).
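One way to check whether a path literal has been affected is to look for control characters in it. This is a sketch only: the helper control_chars_in and the folder names are hypothetical, chosen so that the folders begin with r and t as in the question.

```python
def control_chars_in(path):
    # Hypothetical helper: list the control characters hiding in a path string.
    return [ch for ch in path if ord(ch) < 32]


plain = "models\run1\test_weights.h5"      # \r and \t are escape sequences here
raw = r"models\run1\test_weights.h5"       # raw literal keeps the backslashes
escaped = "models\\run1\\test_weights.h5"  # doubled backslashes, same result
forward = "models/run1/test_weights.h5"    # forward slashes avoid the issue

print(control_chars_in(plain))  # ['\r', '\t']
print(control_chars_in(raw))    # []
print(raw == escaped)           # True
```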

removing weird double quotes (from excel file) in python string

I'm loading in an excel file to python3 using xlrd. They are basically lines of text in a spreadsheet. On some of these lines are quotation marks. For example, one line can be:
She said, "My name is Jennifer."
When I'm reading them into python and making them into strings, the double quotes are read in as a weird double quote character that looks like a double quote in italics. I'm assuming that somewhere along the way, python read in the character as some foreign character rather than actual double quotes due to some encoding issue or something. So in the above example, if I assign that line as "text", then we'll have something like the following (although not exactly since I don't actually type out the line, so imagine "text" was already assigned beforehand):
text = 'She said, “My name is Jennifer.”'
text[10] == '"'
The second line will spit out a False because it doesn't seem to recognize it as a normal double quote character. I'm working within the Mac terminal if that makes a difference.
My questions are:
1. Is there a way to easily strip these weird double quotes?
2. Is there a way when I read in the file to get python to recognize them as double quotes properly?

I'm assuming that somewhere along the way, python read in the character as some foreign character
Yes; it read that in because that's what the file data actually represents.
rather than actual double quotes due to some encoding issue or something.
There's no issue with the encoding. The actual character is not an "actual double quote".
Is there a way to easily strip these weird double quotes?
You can use the .replace method of strings as you would normally, to either replace them with an "actual double quote" or with nothing.
Is there a way when I read in the file to get python to recognize them as double quotes properly?
If you're looking for them, you can compare them to the character they actually are.
As noted in the comment, they are most likely U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK. They're used so that opening and closing quotes can look different (by curving in different directions), which pretty typography normally does (as opposed to using " which is simply more convenient for programmers). You represent them in Python with a Unicode escape, thus:
text[10] == '\u201c'
You could also have directly asked Python for this info, by asking for text[10] at the Python command line (which would evaluate that and show you the representation), or explicitly in a script with e.g. print(repr(text[10])).
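A short sketch of both suggestions, using the example sentence from the question: identify the character, then map both curly quotes to the plain ASCII double quote. The name CURLY_TO_STRAIGHT is made up for the example.

```python
import unicodedata

text = "She said, \u201cMy name is Jennifer.\u201d"

# Ask Python what the character really is.
print(repr(text[10]), unicodedata.name(text[10]))

# Map both curly quotes to the ASCII double quote in one pass.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"'})
cleaned = text.translate(CURLY_TO_STRAIGHT)
print(cleaned)
```

Mapping the two characters to None instead of '"' in the table would strip them rather than replace them.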

How can I read backslashes from a file correctly?

The following code:
key = open("C:\Scripts\private.ppk",'rb').read()
reads the file and assigns its data to the var key.
For some reason, backslashes are multiplied in the process. How can I make sure they don't get multiplied?

You ... don't need to. The backslashes are not actually multiplied in the data; they are only shown escaped when the string is displayed in its repr() form (for example, at the interactive prompt), so that the displayed text is a valid string literal. If you're declaring strings and don't want to double up the backslashes you can use raw strings such as r'c:\myfile.txt', but that doesn't really apply to the contents of a file you're reading in.
>>> s = r'c:\boot.ini'
>>> s
'c:\\boot.ini'
>>> repr(s)
"'c:\\\\boot.ini'"
>>> print(s)
c:\boot.ini
As you can see, the doubled backslashes appear only in the repr() form that the interactive prompt displays. The string itself holds a single backslash, which is what you get when you print it, write it to a file, or compare it with another value.
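This can be verified by counting the backslashes that are actually read from a file. The sketch below writes some bytes to a temporary file (the content is made up for the example) and reads them back in binary mode, as in the question.

```python
import os
import tempfile

data = b"C:\\Scripts\\private.ppk"  # two real backslashes in the bytes

# Write the bytes to a temporary file and read them back.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(data)
    temp_name = handle.name
try:
    with open(temp_name, "rb") as handle:
        key = handle.read()
finally:
    os.remove(temp_name)

print(repr(key))            # display form: each backslash shown doubled
print(key.count(b"\\"))     # actual number of backslash bytes
print(key.decode("ascii"))  # printed form: single backslashes
```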

You should read this great blog post on Python and the backslash escape character.
And under some circumstances, if Python prints information to the console, you will see the two backslashes rather than one. For example, this is part of the difference between the repr() function and the str() function.
myFilename = "c:\\newproject\\typenames.txt"
print(repr(myFilename), str(myFilename))
produces
'c:\\newproject\\typenames.txt' c:\newproject\typenames.txt

Backslashes are displayed in escaped form. You'll see two backslashes for each real one in the file, but that is normal behaviour.
The reason is that the backslash is used to build codes for characters that cannot easily be typed directly, such as newline '\n' or tab '\t'.

Are you trying to put single backslashes in a string? A backslash in a string literal must itself be escaped with the escape character, which is also a backslash, so you write "\\". It will print to the screen as a single backslash.

In fact there is a solution: using eval, as long as the file content can be wrapped in quotes of some kind. The following worked for me (PATH points to a script that executes Matlab):
MATLAB_EXE = "C:\Program Files (x86)\MATLAB\R2012b\bin\matlab.exe"
content = open(PATH).read()
MATLAB_EXE in content # False
content = eval(f'r"""{content}"""')
MATLAB_EXE in content # True
This works by evaluating the content as python string literal, making double escapes transform into single ones. Raw string is used to prevent escapes forming special characters.
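Keep in mind that eval executes whatever the file contains, and that an r-prefixed literal keeps backslashes exactly as they are. If the goal is only to collapse doubled backslashes, a plain str.replace is a lower-risk sketch. The file content below is an assumption made for the example (a script line in which every backslash is doubled), not the actual file from this answer.

```python
# Assumed file content: a script line in which every backslash is doubled.
content = r'system("C:\\Program Files (x86)\\MATLAB\\R2012b\\bin\\matlab.exe")'

# Raw literal, so \b in \bin stays a backslash followed by b.
MATLAB_EXE = r"C:\Program Files (x86)\MATLAB\R2012b\bin\matlab.exe"

# The doubled backslashes in the content do not match the single ones.
print(MATLAB_EXE in content)

# Collapse each doubled backslash into a single one, without eval.
collapsed = content.replace("\\\\", "\\")
print(MATLAB_EXE in collapsed)
```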
