re.escape returns unusable directory

re.escape returns unusable directory - python

using re.escape() on this directory:
C:\Users\admin\code
Should theoratically return this, right?
C:\\Users\\admin\\code
However, what I actually get is this:
C\:\\Users\\admin\\code
Notice the backslash immediately after C. This makes the string unusable, and trying to use directory.replace('\', '') just bugs out Python because it can't deal with a single backslash string, and treats everything after it as string.
Any ideas?
Update
This was a dumb question :p

No it should not. It's help says "Escape all the characters in pattern except ASCII letters, numbers and '_'"
What you are reporting you are getting is after calling the print function on the resulting string. In console, if you type directory and press enter, it would give something like: C\\:\\\\Users\\\\admin\\\\code. When using directory.replace('\\','') it would replace all backslashes. For example: directory.replace('\\','x') gives Cx:xxUsersxxadminxxcode. What might work in this case is replacing both the backslash and colon with ':' i.e. directory.replace('\\:',':'). This will work.
However, I will suggest doing something else. A neat way to work with Windows directories in Python is to use forward slash. Python and the OS will work out a way to understand your paths with forward slashes. Further, if you aren't using absolute paths, as far as the paths are concerned, your code will be portable to Unix-style OSes.
It also seems to me that you are calling re.escape unnecessarily. If the printing the directory is giving you C:\Users\admin\code then it's a perfectly fine directory to use already. And you don't need to escape it. It's already done. If it wasn't escaped print('C:\Users\admin\code') would give something like C:\Usersdmin\code since \a has special meaning (beep).

Related

unicode_escape without deprecation warning

In Python 3.8, the following:
import codecs
codecs.decode("hello\\,world", "unicode_escape")
...produces a deprecation warning:
<ipython-input-7-28f185d30178>:1: DeprecationWarning: invalid escape sequence '\,'
codecs.decode("hello\\,", "unicode_escape", errors="strict")
Is this going to become an error in a future version of Python? Where is the reference for this?
If No, is there a way to not display this warning, if Yes, what can I use instead?
Edit: I am (obviously) not interested in fixing this for a string I would have written literally in my Python script. This is purely an example, in my use case strings come from external files, I cannot change them, and some contains invalid escape sequences such as this.

The problem is that you've got two layers of de-escaping occurring here. The first one is the regular string literal escaping, converting "hello\\,world" to a string with raw characters of hello\,world. Then codecs.decode tries to decode it with unicode_escape, which sees it as an attempt to escape , with \, which is an invalid escape.
The fix for your code as written is to use a raw string so the first level of escaping doesn't happen:
codecs.decode(r"hello\\,world", "unicode_escape")
# ^ Now it's a raw string, and both backslashes are in fact backslashes
If your data comes from elsewhere with invalid escapes, you can suppress the warning for now (see the warnings module for details), but it will eventually cause an exception in a future version of Python, so the long term solution is "Don't provide invalid data". Sorry that's not super-helpful.
The reason you get this warning is that, historically, Python has been kind of lax about spurious escapes. So if you wrote, say, a Windows path of "C:\yes" it said "hey, \y doesn't mean anything, so we'll just assume they wanted a literal backslash", while an equivalent path of "C:\no" saw \n and thought "Yup, they want a newline there".
This taught bad habits (not using raw strings when you should, because it usually worked without them) and created confusion when those habits bit you (why isn't it working this time!?!). So in the future, escapes like \y will be treated as errors, so that you end up writing r'C:\yes' and are used to doing so so you don't get bit by r'C:\no'. The warning is reminding you that this is bad code, and will eventually stop working (for good reason; as your own comment notes, you're okay with it ending up with either no backslashes or two backslashes, starting with just one, which is an insane set of options to accept without knowing the single, correct, desired result).
Alternative solution
If your goal is to "fix" bad strings, the best solution is probably to just write your own simple stripping regex, e.g.:
import re
bad_escape_re = re.compile(r'\\(?=[^\n\\\'"abfnrtv0-7xNuU])')
and then use it to strip unrecognized escapes a la:
good_string = bad_escape_re.sub('', bad_string)
which when run like so:
bad_escape_re.sub('', r'\a\b\c\d\e\f\g\,\.\n\t\x12')
produces a string with the repr '\\a\\bcde\\fg,.\\n\\t\\x12'. Note that's it's not perfect, and if you need it to validate the extended escapes to distinguish valid uses from invalid ones (\[0-7], \x, \N, \u and \U), it gets more complicated, but those are also cases where it's invariably heuristic and there is no good solution; without a human to interpret, \xab is legal and \xag is not, but it's entirely likely the former wasn't intended as an escape either.

Some sensitive filenames cause failure of loading data

In using keras.model.load_weights, by the way, the weight file is saved in a hdf5 format, I come across some situations where the folder names that have initial r or t, cause the error: errno = 22, error message = 'invalid argument', flags = 0, o_flags = 0.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python, or the situation I encountered is only specific to keras.

It would greatly help debug this if you include examples of such filenames that give you trouble. However, I have a good idea on what is probably happening here.
This problems seem to appear on folders that start with r or t on their names. Also, as they are folders, on their full path name they are preceded by a \ character (for example "\thisFolder", or similar). This is true in the case of a Windows environment, as they use \ for separating paths contrary to *nix systems that use the regular slash /.
Considering these things, seems that perhaps you are experiencing this as \r and \t are both special characters that mean Carriage Return and Tabulation, respectively. If this is the case many file openers will have trouble processing such file name.
Even more, I would not be surprised if you got the same errors on folders that begin with n or other letters that when concatenated to a backslash give special characters (\n is new line, \s is a white space, etc.).
To overcome this seems that you will need to escape your backslash character before passing it as a filename. In python, an escaped backslash is "\\"
. In addition, you can also opt to pass a Raw string instead, by adding the r prefix to your string, something like r"\a\raw\string". More information on escaping and raw string can be found on this question and answers.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python,
As mentioned, you should avoid this with characters that have a special meaning with a backslash. I suggest you check here to see the characters Python accepts like this, so you can refrain from using such characters (or well use raw strings and forget about this problem).

Path Separator in Python 3

I have two strings:
C:\Data
and another folder
Foo1
I need, the windows output to be
C:\Data\Foo1
and the Linux output to be
/data/foo1
assuming /data is in linux. Is there any constant separator that can be used in Python, that makes it easy to use irrespective of underlying OS?

Yes, python provides os.sep, which is that character, but for your purpose, the function os.path.join() is what you are looking for.
>>> os.path.join("data", "foo1")
"data/foo1"

os.path.normpath() will normalize a path correctly for Linux and Windows. FYI, Windows OS calls can use either slash, but should be displayed to the user normalized.

The os.path.join() is always better. As Mark Tolonen wrote (my +1 to him), you can use a normal slash also for Windows, and you should prefer this way if you have to write the path explicitly. You should avoid using the backslash for paths in Python at all. Or you would have to double them in strings or you would have to use r'raw strings' to suppress the backslash interpretation. Otherwise, 'c:\for\a_path\like\this' actually contains \f, \a, and \t escape sequences that you may not notice in the time of writing... and they may be source of headaches in future.

Python, trying to run a program from the command prompt

I am trying to run a program from the command prompt in windows. I am having some issues. The code is below:
commandString = "'C:\Program Files\WebShot\webshotcmd.exe' //url '" + columns[3] + "' //out '"+columns[1]+"~"+columns[2]+".jpg'"
os.system(commandString)
time.sleep(10)
So with the single quotes I get "The filename, directory name, or volume label syntax is incorrect." If I replace the single quotes with \" then it says something to the effect of "'C:\Program' is not a valid executable."
I realize it is a syntax error, but I am not quite sure how to fix this....
column[3] contains a full url copy pasted from a web browser (so it should be url encoded). column[1] will only contain numbers and periods. column[2] contains some text, double quotes and colons are replaced. Mentioning just in case...
Thanks!

Windows requires double quotes in this situation, and you used single quotes.
Use the subprocess module rather than os.system, which is more robust and avoids calling the shell directly, making you not have to worry about confusing escaping issues.
Dont use + to put together long strings. Use string formatting (string %s" % (formatting,)), which is more readable, efficient, and idiomatic.
In this case, don't form a long string as a shell command anyhow, make a list and pass it to subprocess.call.
As best as I can tell you are escaping your forward slash but not your backslashes, which is backwards. A string literal with // has both slashes in the string it makes. In any event, rather than either you should use the os.path module which avoids any confusion from parsing escapes and often makes scripts more portable.

Use the subprocess module for calling system commands. Also ,try removing the single quotes and use double quotes.

How to append '\\?\' to the front of a file path in Python

I'm trying to work with some long file paths (Windows) in Python and have come across some problems. After reading the question here, it looks as though I need to append '\\?\' to the front of my long file paths in order to use them with os.stat(filepath). The problem I'm having is that I can't create a string in Python that ends in a backslash. The question here points out that you can't even end strings in Python with a single '\' character.
Is there anything in any of the Python standard libraries or anywhere else that lets you simply append '\\?\' to the front of a file path you already have? Or is there any other work around for working with long file paths in Windows with Python? It seems like such a simple thing to do, but I can't figure it out for the life of me.

"\\\\?\\" should give you exactly the string you want.
Longer answer: of course you can end a string in Python with a backslash. You just can't do so when it's a "raw" string (one prefixed with an 'r'). Which you usually use for strings that contains (lots of) backslashes (to avoid the infamous "leaning toothpick" syndrome ;-))

Even with a raw string, you can end in a backslash with:
>>> print r'\\?\D:\Blah' + '\\'
\\?\D:\Blah\
or even:
>>> print r'\\?\D:\Blah' '\\'
\\?\D:\Blah\
since Python concatenates to literal strings into one.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.