Replacing '\' with '/' in a string in Python

I've been struggling with some code where I need to change each \ into / in Python. It's a file path: Python doesn't read the path the Windows way, so I simply want to convert the Windows path so that Python reads the file correctly.
I want to parse some text from a game to compute statistics. I'm doing it this way:
import re
pathNumbers = "D:\Gry\Tibia\packages\TibiaExternal\log\test server.txt"
pathNumbers = re.sub(r"\\", r"/",pathNumbers)
fileNumbers = open (pathNumbers, "r")
print(fileNumbers.readline())
fileNumbers.close()
But the error I get back is:
----> 6 fileNumbers = open (pathNumbers, "r") OSError: [Errno 22] Invalid argument: 'D:/Gry/Tibia/packages/TibiaExternal\test server.txt'
The problem is that re.sub() and .replace() give the same result: almost the whole path is converted, but the last backslash always stays untouched.
Do you have a solution for this? Changing these characters seems to be a sensitive point for Python.

Simple answer:
If you want to use paths on different platforms, join them with
os.path.join(path,*paths)
This way you don't have to work with the different separators at all.
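As a minimal sketch of that idea (the folder and file names below are illustrative, not taken from the question), os.path.join inserts whichever separator the current platform uses:

```python
import os

# Hypothetical path components, used only for illustration.
folder = "log"
filename = "test server.txt"

# os.path.join picks the separator of the platform the code runs on.
joined = os.path.join(folder, filename)
print(joined)
```

On Windows the result contains a backslash, on Linux or macOS a forward slash, so no manual replacement is needed.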
Answer to what you intended to do:
The actual problem is that your pathNumbers variable is not a raw string (no leading r in the definition), so the backslashes are treated as escape characters. In most cases this changes nothing, because the backslash combined with the following character has no special meaning and is kept as is. But \t is the tab character and \n would be the newline character, so these are no longer a backslash followed by a letter. That is why the \t in \test server.txt is never replaced: the string contains a tab there, not a backslash.
So simply write
pathNumbers = r"D:\Gry\Tibia\packages\TibiaExternal\log\test server.txt"
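The difference can be checked directly. The sketch below uses a shortened, hypothetical path so that every escape in the non-raw literal is a valid one; it compares the two literals and then converts the raw one with str.replace:

```python
plain = "D:\test server.txt"   # \t is read as one tab character
raw = r"D:\test server.txt"    # backslash and 't' stay two characters

print("\t" in plain, "\\" in plain)   # tab present, no backslash
print("\t" in raw, "\\" in raw)       # no tab, backslash present

# Only the raw string still has a backslash to replace.
converted = raw.replace("\\", "/")
print(converted)
```

Once the literal is raw, a plain str.replace is enough and re.sub is not required.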

Related

Why does python add additional backslashes to the path?

I have a text file with a path that goes like this:
r"\\user\data\t83\rf\Desktop\QA"
When I try to read this file and print a line, it returns the following string, and I'm unable to open the file from this location:
'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
It seems you've got Python code in your text file. Either sanitize the file so it contains only the actual path (not a Python string representation), fiddle with string replacement until you're satisfied, or evaluate the Python string.
Note that using eval() opens Pandora's box (it is as unsafe as it gets); it's safer to use ast.literal_eval() instead.
import ast
file_content = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'
print(eval(file_content)) # do not use this, it's only shown for the sake of completeness
print(ast.literal_eval(file_content))
Output:
\\user\data\t83\rf\Desktop\QA
\\user\data\t83\rf\Desktop\QA
Personally, I'd prefer to sanitize the file, so it only contains \\user\data\t83\rf\Desktop\QA
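If the file cannot be changed, the string-fiddling option might look like the sketch below. It assumes every line has exactly the form r"..." followed by a newline; anything else is left untouched apart from the surrounding whitespace.

```python
# The line as read from the file (same content as file_content above).
line = 'r"\\\\user\\data\\t83\\rf\\Desktop\\QA"\n'

path = line.strip()                      # drop the trailing newline
if path.startswith('r"') and path.endswith('"'):
    path = path[2:-1]                    # drop the r" prefix and closing "

print(path)
```

Unlike eval(), this never executes the file's content; it only removes the wrapper characters.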
A backslash combines with the character after it to form an escape sequence such as \n (new line) or \t (tab), so a single backslash merges with the next character. To get a literal backslash, write \\, which represents one backslash.

silly file reading question for new python user

I am simply trying to define a path and file name, then use pandas.read_csv().
In the variable display of Spyder, the path and file name appear correct, but in reality they have double \\. I know this has got to be something really stupid...
siteinfopath=r'C:\Users\cpsei\Documents'
siteinfofile=siteinfopath+'\grav_stats.csv'
grav_stats=pd.read_csv(siteinfofile)
When I run the script I get the following error message:
FileNotFoundError: [Errno 2] File
b'C:\Users\cpsei\Documents\grav_stats.csv' does not exist:
b'C:\Users\cpsei\Documents\grav_stats.csv'
and when I type
siteinfofile
Out[145]: 'C:\\Users\\cpsei\\Documents\\grav_stats.csv'
Why the double \? In the variable viewer the path is correct.
You see double \\ instead of one because \ is used in Python as the escape character: it signals that the \ and the next character should be treated in a special way. For example:
\t - means TAB
\r - is carriage return - cursor moves to the beginning of the line
\n - is new line - cursor moves to beginning of new line
If, however, you want just a plain \, you have to use \\: the first one signals as usual that a special character follows, and the second says that this special character is actually \.
You can read more about it, e.g. at https://docs.python.org/3/tutorial/introduction.html#strings, which has a lot of very good examples :)
So everything is OK and your strings work as expected. What you saw is the string's representation (how it is written as a literal); if you want to see what the string actually contains, print it:
>>> print(siteinfofile)
C:\Users\cpsei\Documents\grav_stats.csv
Are you sure path is correct and you can read this file? That's the only advice I can think of here...
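To see the two views side by side, here is a small sketch. It builds the path with pathlib.PureWindowsPath, which is a substitution for the string concatenation in the question, chosen so the result is the same on any operating system:

```python
from pathlib import PureWindowsPath

siteinfopath = PureWindowsPath(r'C:\Users\cpsei\Documents')
siteinfofile = str(siteinfopath / 'grav_stats.csv')

print(repr(siteinfofile))        # literal form, backslashes shown doubled
print(siteinfofile)              # actual contents, single backslashes
print(siteinfofile.count('\\'))  # number of real backslash characters
```

The count confirms that the string holds one backslash per separator; the doubling exists only in the representation.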

Some sensitive filenames cause failure of loading data

When using keras.model.load_weights (the weight file is saved in HDF5 format), I have come across situations where folder names starting with r or t cause the error: errno = 22, error message = 'invalid argument', flags = 0, o_flags = 0.
I want to know whether there are specific rules about filenames that should be avoided because they would otherwise lead to such reading errors in Python, or whether the situation I encountered is specific to Keras.
It would greatly help debug this if you include examples of such filenames that give you trouble. However, I have a good idea on what is probably happening here.
This problem seems to appear with folders whose names start with r or t. Also, since they are folders, in the full path name they are preceded by a \ character (for example "\thisFolder", or similar). This is the case in a Windows environment, which uses \ to separate path components, unlike *nix systems, which use the regular slash /.
Considering this, you are probably experiencing the problem because \r and \t are both escape sequences, meaning carriage return and tab respectively. If so, many file openers will have trouble processing such a file name.
Moreover, I would not be surprised if you got the same errors with folders that begin with n or other letters that form an escape sequence when preceded by a backslash (\n is a newline, \a is the bell character, \b is a backspace, \f is a form feed, \v is a vertical tab).
To overcome this, you need to escape the backslash before passing the string as a filename. In Python, an escaped backslash is "\\". Alternatively, you can pass a raw string by adding the r prefix, something like r"\a\raw\string". More information on escaping and raw strings can be found in this question and its answers.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python,
As mentioned, you should avoid characters that have a special meaning after a backslash. I suggest you check here to see which characters Python treats this way, so you can refrain from using them (or just use raw strings and forget about this problem).
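A quick way to test whether a path literal has been affected is to look for control characters in it. This is only a sketch: has_control_chars is a hypothetical helper and the folder names are made up.

```python
def has_control_chars(path):
    """Return True if the path contains ASCII control characters."""
    return any(ord(ch) < 32 for ch in path)

# Hypothetical folder names starting with r and t.
escaped = "weights\run1\top.h5"    # \r and \t become control characters
raw = r"weights\run1\top.h5"       # raw string keeps the backslashes

print(has_control_chars(escaped))
print(has_control_chars(raw))
```

A path that reports control characters was most likely written as a normal string literal and should be rewritten as a raw string or with doubled backslashes.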

python 3 regex not finding confirmed matches

So I'm trying to parse a bunch of citations from a text file using the re module in python 3.4 (on, if it matters, a mac running mavericks). Here's some minimal code. Note that there are two commented lines: they represent two alternative searches. (Obviously, the little one, r'Rawls', is the one that works)
def makeRefList(reffile):
    print(reffile)
    # namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'
    # namepattern = r'Rawls'
    refsTuplesList = re.findall(namepattern, reffile, re.MULTILINE)
    print(refsTuplesList)
The string in question is ugly, and so I stuck it in a gist: https://gist.github.com/paultopia/6c48c398a42d4834f2ae
As noted, the search string r'Rawls' produces expected output ['Rawls', 'Rawls']. However, the other search string just produces an empty list.
I've confirmed this regex (partially) works using the regex101 tester. Confirmation here: https://regex101.com/r/kP4nO0/1 -- this matches what I expect it to match. Since it works in the tester, it should work in the code, right?
(n.b. I copied the text from terminal output from the first print command, then manually replaced \n characters in the string with carriage returns for regex101.)
One possible issue is that python has appended the bytecode flag (is the little b called a "flag?") to the string. This is an artifact of my attempt to convert the text from utf-8 to ascii, and I haven't figured out how to make it go away.
Yet re clearly is able to parse strings in that form. I know this because I'm converting two text files from utf-8 to ascii, and the following code works perfectly fine on the other string, converted from the other text file, which also has a little b in front of it:
def makeCiteList(citefile):
    print(citefile)
    citepattern = r'[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]? \(?\d\d\d\d[a-z]?[\s.,)]'
    rawCitelist = re.findall(citepattern, citefile)
    cleanCitelist = cleanup(rawCitelist)
    finalCiteList = list(set(cleanCitelist))
    print(finalCiteList)
    return(finalCiteList)
The other chunk of text, which the code immediately above matches correctly: https://gist.github.com/paultopia/a12eba2752638389b2ee
The only hypothesis I can come up with is that the first, broken, regex expression is puking on the combination of newline characters and the string being treated as a byte object, even though a) I know the regex is correct for newlines (because, confirmation from the linked regex101), and b) I know it's matching the strings (because, confirmation from the successful match on the other string).
If that's true, though, I don't know what to do about it.
Thus, questions:
1) Is my hypothesis right that it's the combination of newlines and b that blows up my regex? If not, what is?
2) How do I fix that?
a) replace the newlines with something in the string?
b) rewrite the regex somehow?
c) somehow get rid of that b and make it into a normal string again? (how?)
thanks!
Addition
In case this is a problem I need to fix upstream, here's the code I'm using to get the text files and convert to ascii, replacing non-ascii characters:
this function gets called on utf-8 .txt files saved by textwrangler in mavericks
def makeCorpoi(citefile, reffile):
    citebox = open(citefile, 'r')
    refbox = open(reffile, 'r')
    citecorpus = citebox.read()
    refcorpus = refbox.read()
    citebox.close()
    refbox.close()
    corpoi = [str(citecorpus), str(refcorpus)]
    return corpoi
and then this function gets called on each element of the list the above function returns.
def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    stringstring = str(bigstring)
    return stringstring
Aah. I've tracked it down and answered my own question. Apparently one needs to call the decode method on the encoded bytes. The following code produces an actual string, with newlines and everything, out the other end (though now I have to fix a bunch of other bugs before I can figure out if the final output is as expected):
def conv2ASCII(bigstring):
    def convHandler(error):
        return ('1FOREIGN', error.start + 1)
    codecs.register_error('foreign', convHandler)
    bigstring = bigstring.encode('ascii', 'foreign')
    newstring = bigstring.decode('ascii', 'foreign')
    return newstring
Apparently the str() function doesn't do the same job, for reasons that are mysterious to me. This is despite an answer to "How to make new line commands work in a .txt file opened from the internet?", which suggests that it does.
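A likely explanation, sketched on a made-up two-line sample: in Python 3, calling str() on a bytes object without an encoding returns the text of its printable representation, including the leading b and quotes, with each newline written out as a backslash followed by n. The sketch uses the built-in 'replace' error handler in place of the custom 'foreign' handler above.

```python
import re

sample = "Café, A. (2001)\nRawls, J. (1971)"   # made-up sample text
encoded = sample.encode("ascii", "replace")     # bytes; é becomes ?

via_str = str(encoded)                # text of the bytes repr: b'...'
via_decode = encoded.decode("ascii")  # a real str with a real newline

print(via_str)
print(via_decode)

# A line-anchored pattern only matches when real newlines are present.
pattern = r"^Rawls"
print(re.findall(pattern, via_str, re.MULTILINE))
print(re.findall(pattern, via_decode, re.MULTILINE))
```

This would also account for the original symptom: a pattern anchored with ^ under re.MULTILINE has no line starts to match in the str() version, while an unanchored pattern such as r'Rawls' still matches.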

How to append '\\?\' to the front of a file path in Python

I'm trying to work with some long file paths (Windows) in Python and have come across some problems. After reading the question here, it looks as though I need to append '\\?\' to the front of my long file paths in order to use them with os.stat(filepath). The problem I'm having is that I can't create a string in Python that ends in a backslash. The question here points out that you can't even end strings in Python with a single '\' character.
Is there anything in any of the Python standard libraries or anywhere else that lets you simply append '\\?\' to the front of a file path you already have? Or is there any other work around for working with long file paths in Windows with Python? It seems like such a simple thing to do, but I can't figure it out for the life of me.
"\\\\?\\" should give you exactly the string you want.
Longer answer: of course you can end a string in Python with a backslash. You just can't do so when it's a "raw" string (one prefixed with an 'r'), which you usually use for strings that contain (lots of) backslashes (to avoid the infamous "leaning toothpick" syndrome ;-)).
Even with a raw string, you can end in a backslash with:
>>> print(r'\\?\D:\Blah' + '\\')
\\?\D:\Blah\
or even:
>>> print(r'\\?\D:\Blah' '\\')
\\?\D:\Blah\
since Python concatenates adjacent string literals into one.
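Putting this together for the original goal, a minimal sketch of adding the prefix to an existing path (the file path here is hypothetical):

```python
prefix = "\\\\?\\"              # four characters: \ \ ? \
filepath = r"D:\Blah\file.txt"  # hypothetical path for illustration

# Put the long-path prefix in front of the existing path.
long_path = prefix + filepath
print(long_path)
```

The resulting string can then be passed to functions such as os.stat on Windows.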
