Some sensitive filenames cause failure of loading data

Some sensitive filenames cause failure of loading data - python

In using keras.model.load_weights, by the way, the weight file is saved in a hdf5 format, I come across some situations where the folder names that have initial r or t, cause the error: errno = 22, error message = 'invalid argument', flags = 0, o_flags = 0.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python, or the situation I encountered is only specific to keras.

It would greatly help debug this if you include examples of such filenames that give you trouble. However, I have a good idea on what is probably happening here.
This problems seem to appear on folders that start with r or t on their names. Also, as they are folders, on their full path name they are preceded by a \ character (for example "\thisFolder", or similar). This is true in the case of a Windows environment, as they use \ for separating paths contrary to *nix systems that use the regular slash /.
Considering these things, seems that perhaps you are experiencing this as \r and \t are both special characters that mean Carriage Return and Tabulation, respectively. If this is the case many file openers will have trouble processing such file name.
Even more, I would not be surprised if you got the same errors on folders that begin with n or other letters that when concatenated to a backslash give special characters (\n is new line, \s is a white space, etc.).
To overcome this seems that you will need to escape your backslash character before passing it as a filename. In python, an escaped backslash is "\\"
. In addition, you can also opt to pass a Raw string instead, by adding the r prefix to your string, something like r"\a\raw\string". More information on escaping and raw string can be found on this question and answers.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python,
As mentioned, you should avoid this with characters that have a special meaning with a backslash. I suggest you check here to see the characters Python accepts like this, so you can refrain from using such characters (or well use raw strings and forget about this problem).

Related

pywinauto escaping special characters

I am using type_keys() on a combobox to upload files via a file dialog. As mentioned in similar SO posts, this function omits certain special characters in the text that it actually types into the combobox. I'm resolving this by simply replacing every "specialchar" in that string with "{specialchar}". So far I've found the need to replace the following chars: + ^ % ( ).
I'm wondering where I can find the complete list of characters that require this treatment. I don't think it's this list because I'm not seeing, for example, the % symbol there. I also tried checking keyboard.py from the keyboard library but I don't know if it can be found there.
PS. I realize that instead of using type_keys(), for example, using send_keys() or set_edit_text(), the escaping of special characters might be done for me automatically. However, for various reasons, it looks like type_keys() works the best for my particular file dialog/situation.
Thanks

This is the full documentation: https://pywinauto.readthedocs.io/en/latest/code/pywinauto.keyboard.html All special characters can be wrapped by {}

unicode_escape without deprecation warning

In Python 3.8, the following:
import codecs
codecs.decode("hello\\,world", "unicode_escape")
...produces a deprecation warning:
<ipython-input-7-28f185d30178>:1: DeprecationWarning: invalid escape sequence '\,'
codecs.decode("hello\\,", "unicode_escape", errors="strict")
Is this going to become an error in a future version of Python? Where is the reference for this?
If No, is there a way to not display this warning, if Yes, what can I use instead?
Edit: I am (obviously) not interested in fixing this for a string I would have written literally in my Python script. This is purely an example, in my use case strings come from external files, I cannot change them, and some contains invalid escape sequences such as this.

The problem is that you've got two layers of de-escaping occurring here. The first one is the regular string literal escaping, converting "hello\\,world" to a string with raw characters of hello\,world. Then codecs.decode tries to decode it with unicode_escape, which sees it as an attempt to escape , with \, which is an invalid escape.
The fix for your code as written is to use a raw string so the first level of escaping doesn't happen:
codecs.decode(r"hello\\,world", "unicode_escape")
# ^ Now it's a raw string, and both backslashes are in fact backslashes
If your data comes from elsewhere with invalid escapes, you can suppress the warning for now (see the warnings module for details), but it will eventually cause an exception in a future version of Python, so the long term solution is "Don't provide invalid data". Sorry that's not super-helpful.
The reason you get this warning is that, historically, Python has been kind of lax about spurious escapes. So if you wrote, say, a Windows path of "C:\yes" it said "hey, \y doesn't mean anything, so we'll just assume they wanted a literal backslash", while an equivalent path of "C:\no" saw \n and thought "Yup, they want a newline there".
This taught bad habits (not using raw strings when you should, because it usually worked without them) and created confusion when those habits bit you (why isn't it working this time!?!). So in the future, escapes like \y will be treated as errors, so that you end up writing r'C:\yes' and are used to doing so so you don't get bit by r'C:\no'. The warning is reminding you that this is bad code, and will eventually stop working (for good reason; as your own comment notes, you're okay with it ending up with either no backslashes or two backslashes, starting with just one, which is an insane set of options to accept without knowing the single, correct, desired result).
Alternative solution
If your goal is to "fix" bad strings, the best solution is probably to just write your own simple stripping regex, e.g.:
import re
bad_escape_re = re.compile(r'\\(?=[^\n\\\'"abfnrtv0-7xNuU])')
and then use it to strip unrecognized escapes a la:
good_string = bad_escape_re.sub('', bad_string)
which when run like so:
bad_escape_re.sub('', r'\a\b\c\d\e\f\g\,\.\n\t\x12')
produces a string with the repr '\\a\\bcde\\fg,.\\n\\t\\x12'. Note that's it's not perfect, and if you need it to validate the extended escapes to distinguish valid uses from invalid ones (\[0-7], \x, \N, \u and \U), it gets more complicated, but those are also cases where it's invariably heuristic and there is no good solution; without a human to interpret, \xab is legal and \xag is not, but it's entirely likely the former wasn't intended as an escape either.

re.escape returns unusable directory

using re.escape() on this directory:
C:\Users\admin\code
Should theoratically return this, right?
C:\\Users\\admin\\code
However, what I actually get is this:
C\:\\Users\\admin\\code
Notice the backslash immediately after C. This makes the string unusable, and trying to use directory.replace('\', '') just bugs out Python because it can't deal with a single backslash string, and treats everything after it as string.
Any ideas?
Update
This was a dumb question :p

No it should not. It's help says "Escape all the characters in pattern except ASCII letters, numbers and '_'"
What you are reporting you are getting is after calling the print function on the resulting string. In console, if you type directory and press enter, it would give something like: C\\:\\\\Users\\\\admin\\\\code. When using directory.replace('\\','') it would replace all backslashes. For example: directory.replace('\\','x') gives Cx:xxUsersxxadminxxcode. What might work in this case is replacing both the backslash and colon with ':' i.e. directory.replace('\\:',':'). This will work.
However, I will suggest doing something else. A neat way to work with Windows directories in Python is to use forward slash. Python and the OS will work out a way to understand your paths with forward slashes. Further, if you aren't using absolute paths, as far as the paths are concerned, your code will be portable to Unix-style OSes.
It also seems to me that you are calling re.escape unnecessarily. If the printing the directory is giving you C:\Users\admin\code then it's a perfectly fine directory to use already. And you don't need to escape it. It's already done. If it wasn't escaped print('C:\Users\admin\code') would give something like C:\Usersdmin\code since \a has special meaning (beep).

How can I create files on Windows with embedded slashes, using Python?

After a half hour searching Google, I am surprised I cannot find any way to create a file on Windows with slashes in the name. The customer demands that file names have the following structure:
04/28/2012 04:07 PM 6,781 12Q1_C125_G_04-17.pdf
So far I haven't found any way to encode the slashes so they become part of the file name instead of the path.
Any Suggestions?

You can't.
The forward slash is one of the characters that are not allowed to be used in Windows file names, see
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
The following fundamental rules enable applications to create and
process valid names for files and directories, regardless of the file
system:
Use a period to separate the base file name from the extension in the name of a directory or file.
Use a backslash (\) to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. You cannot use a backslash in the name for the actual file or directory because it is a reserved character that separates the names into components.
Use a backslash as required as part of volume names, for example, the "C:\" in "C:\path\file" or the "\server\share" in
"\server\share\path\file" for Universal Naming Convention (UNC)
names. For more information about UNC names, see the Maximum Path
Length Limitation section.
Do not assume case sensitivity. For example, consider the names OSCAR, Oscar, and oscar to be the same, even though some file systems (such as a POSIX-compliant file system) may consider them as
different. Note that NTFS supports POSIX semantics for case
sensitivity but this is not the default behavior. For more
information, see CreateFile.
Volume designators (drive letters) are similarly case-insensitive. For example, "D:\" and "d:\" refer to the same volume.
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Integer value zero, sometimes referred to as the ASCII NUL character.
Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed. For more information about file streams, see File
Streams.
Any other character that the target file system does not allow.

At least all windows installation i've seen won't let you create files with slashes in them.
Even if it were possible somehow, by doing deepshit magic, it will probably screw up almost all applications, including windows explorer.
you could abuse windows' unicode capabilities, though.
Creating a file with ∕ (this is not a forward slash, it is "division slash", see http://www.fileformat.info/info/unicode/char/2215/index.htm ) in it's name works just fine, for example.

Um... forward slash is not a legal character in a Windows file name?
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

How to append '\\?\' to the front of a file path in Python

I'm trying to work with some long file paths (Windows) in Python and have come across some problems. After reading the question here, it looks as though I need to append '\\?\' to the front of my long file paths in order to use them with os.stat(filepath). The problem I'm having is that I can't create a string in Python that ends in a backslash. The question here points out that you can't even end strings in Python with a single '\' character.
Is there anything in any of the Python standard libraries or anywhere else that lets you simply append '\\?\' to the front of a file path you already have? Or is there any other work around for working with long file paths in Windows with Python? It seems like such a simple thing to do, but I can't figure it out for the life of me.

"\\\\?\\" should give you exactly the string you want.
Longer answer: of course you can end a string in Python with a backslash. You just can't do so when it's a "raw" string (one prefixed with an 'r'). Which you usually use for strings that contains (lots of) backslashes (to avoid the infamous "leaning toothpick" syndrome ;-))

Even with a raw string, you can end in a backslash with:
>>> print r'\\?\D:\Blah' + '\\'
\\?\D:\Blah\
or even:
>>> print r'\\?\D:\Blah' '\\'
\\?\D:\Blah\
since Python concatenates to literal strings into one.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.