How can I create files on Windows with embedded slashes, using Python? - python

After a half hour searching Google, I am surprised I cannot find any way to create a file on Windows with slashes in the name. The customer demands that file names have the following structure:
04/28/2012 04:07 PM 6,781 12Q1_C125_G_04-17.pdf
So far I haven't found any way to encode the slashes so they become part of the file name instead of the path.
Any Suggestions?

You can't.
The forward slash is one of the characters that are not allowed to be used in Windows file names, see
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
The following fundamental rules enable applications to create and
process valid names for files and directories, regardless of the file
system:
Use a period to separate the base file name from the extension in the name of a directory or file.
Use a backslash (\) to separate the components of a path. The backslash divides the file name from the path to it, and one directory name from another directory name in a path. You cannot use a backslash in the name for the actual file or directory because it is a reserved character that separates the names into components.
Use a backslash as required as part of volume names, for example, the "C:\" in "C:\path\file" or the "\\server\share" in
"\\server\share\path\file" for Universal Naming Convention (UNC)
names. For more information about UNC names, see the Maximum Path
Length Limitation section.
Do not assume case sensitivity. For example, consider the names OSCAR, Oscar, and oscar to be the same, even though some file systems (such as a POSIX-compliant file system) may consider them as
different. Note that NTFS supports POSIX semantics for case
sensitivity but this is not the default behavior. For more
information, see CreateFile.
Volume designators (drive letters) are similarly case-insensitive. For example, "D:\" and "d:\" refer to the same volume.
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Integer value zero, sometimes referred to as the ASCII NUL character.
Characters whose integer representations are in the range from 1 through 31, except for alternate data streams where these characters are allowed. For more information about file streams, see File
Streams.
Any other character that the target file system does not allow.

At least, every Windows installation I've seen won't let you create files with slashes in their names.
Even if it were somehow possible through some deep magic, it would probably break almost every application, including Windows Explorer.
You could abuse Windows' Unicode support, though.
Creating a file with ∕ in its name works just fine, for example (this is not a forward slash, it is the "division slash", U+2215; see http://www.fileformat.info/info/unicode/char/2215/index.htm).
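As a minimal sketch of that workaround, the helper below swaps each forward slash for the division slash (U+2215) before the name reaches the file system. The name `slash_safe_name` is made up for illustration, and the other reserved characters in the customer's format (such as the colon in the time) would still need similar handling.

```python
DIVISION_SLASH = "\u2215"  # looks like "/", but is a different code point


def slash_safe_name(name):
    """Return name with every forward slash replaced by U+2215."""
    return name.replace("/", DIVISION_SLASH)


safe = slash_safe_name("04/28/2012 report.pdf")
print(ascii(safe))  # ascii() avoids console encoding problems
```

Keep in mind that anyone typing the name by hand will type a real slash, so lookups by name may not match.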

Um... forward slash is not a legal character in a Windows file name?
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

Related

Python: Weird prefix in path [duplicate]

I found a reference to a file in a log that had the following format:
\\?\C:\Path\path\file.log
I cannot find a reference to what the sequence of \?\ means. I believe the part between the backslashes refers to a hostname.
For instance, on my Windows computer, the following works just fine:
dir \\?\C:\
and also, just fine with same result:
dir \\.\C:\
Questions:
Is there a reference to what the question mark means in this particular path format?
What might generate a file path in such a format?
A long read, but worth reading if you are in this domain: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx
Extract:
The Windows API has many functions that also have Unicode versions to
permit an extended-length path for a maximum total path length of
32,767 characters. This type of path is composed of components
separated by backslashes, each up to the value returned in the
lpMaximumComponentLength parameter of the GetVolumeInformation
function (this value is commonly 255 characters). To specify an
extended-length path, use the "\\?\" prefix. For example,
"\\?\D:\very long path".
and:
The "\\?\" prefix can also be used with paths constructed according to
the universal naming convention (UNC). To specify such a path using
UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share",
where "server" is the name of the computer and "share" is the name of
the shared folder. These prefixes are not used as part of the path
itself. They indicate that the path should be passed to the system
with minimal modification, which means that you cannot use forward
slashes to represent path separators, or a period to represent the
current directory, or double dots to represent the parent directory.
Because you cannot use the "\\?\" prefix with a relative path,
relative paths are always limited to a total of MAX_PATH characters.
The Windows API parses input strings for file I/O. Among other things, it translates / to \ as part of converting the name to an NT-style name, or interpreting the . and .. pseudo directories. With few exceptions, the Windows API also limits path names to 260 characters.
The documented purpose of the \\?\ prefix is:
For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system.
Among other things, this allows using otherwise reserved names in a path (such as . or ..). Because the system opts out of all translation, it no longer has to maintain an internal buffer, so the arbitrary limit of 260 characters can also be lifted (as long as the underlying file system supports it). Note that lifting the length limit is not the documented purpose of the \\?\ prefix but a corollary of it, even though the prefix is primarily used for that corollary.
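The rules quoted above can be sketched in Python. This is only an illustration: `to_extended_length` is a hypothetical helper, not a standard-library function, and it uses `ntpath` (the Windows flavour of `os.path`) so that it behaves the same on any platform. Because the prefix turns off parsing, the sketch resolves forward slashes, `.` and `..` before adding it.

```python
import ntpath  # Windows path rules, importable on any platform

PREFIX = "\\\\?\\"  # backslash, backslash, question mark, backslash
UNC_PREFIX = PREFIX + "UNC\\"


def to_extended_length(path):
    """Return an absolute Windows path in extended-length form."""
    if path.startswith(PREFIX):
        return path  # already prefixed, leave untouched
    if not ntpath.isabs(path):
        raise ValueError("the prefix cannot be used with a relative path")
    # Parsing is disabled after the prefix, so normalize first.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # UNC form: drop the two leading backslashes, add the UNC prefix
        return UNC_PREFIX + path[2:]
    return PREFIX + path


print(to_extended_length("C:/very/long/./path"))
```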

Some sensitive filenames cause failure of loading data

When using keras.model.load_weights (the weight file is saved in HDF5 format), I come across situations where folder names that begin with r or t cause the error: errno = 22, error message = 'invalid argument', flags = 0, o_flags = 0.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python, or the situation I encountered is only specific to keras.
It would greatly help debug this if you include examples of such filenames that give you trouble. However, I have a good idea on what is probably happening here.
This problem seems to appear with folders whose names start with r or t. Also, since they are folders, they are preceded by a \ character in the full path name (for example "\thisFolder", or similar). That is the case in a Windows environment, which uses \ to separate path components, unlike *nix systems, which use the regular slash /.
Considering this, you are probably running into the fact that \r and \t are special characters meaning carriage return and tab, respectively. If so, many file openers will have trouble processing such a file name.
What's more, I would not be surprised if you got the same errors with folders that begin with n or other letters that form an escape sequence when they follow a backslash (\n is a newline, \b is a backspace, etc.).
To overcome this, you will need to escape the backslash character before passing the string as a file name. In Python, an escaped backslash is "\\". Alternatively, you can pass a raw string instead by adding the r prefix to your string, something like r"\a\raw\string". More information on escaping and raw strings can be found in this question and its answers.
I want to know if there are some specified rules on the filenames which should be avoided and otherwise would lead to such reading error in python,
As mentioned, you should avoid letters that take on a special meaning after a backslash. I suggest you check here to see which escape sequences Python recognizes, so you can refrain from using such names (or just use raw strings and forget about this problem).
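A short demonstration of the effect, using a made-up path (`data\road\test` is not taken from the question):

```python
bad = "data\road\test"        # \r and \t become control characters
escaped = "data\\road\\test"  # each backslash written twice
raw = r"data\road\test"       # raw string: backslashes stay literal

print(len(bad), len(raw))     # the bad string is shorter
print("\r" in bad, "\t" in bad)
print(escaped == raw)
```

The escaped and raw spellings produce the same string; only the first literal silently loses its separators.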

What is the difference between using / and \\ in specifying folder location in python?

I am using python v3.6 on Windows 10. When specifying a string to represent a directory location, what is the difference between the 2 approaches below?
folder_location = 'C:\\Users\\username\\Dropbox\\Inv'
folder_location = 'C:/Users/username/Dropbox/Inv'
This is a follow-up question to another question I just posted. My problem was solved when I used \\ instead of /.
What is wrong with this selenium firefox profile to download file into customized folder?
On Unix systems, the folder separator is /, while on Windows systems, the separator is \. Unfortunately this \ is also an escape character in most programming languages and text-based formats (including C, Python and many others). Note also that a / character is not allowed inside a Windows file name.
So Python on Windows is designed to accept both / and \ as the folder separator when dealing with the file system, for convenience. But the \ must be escaped by another \ (unless of course you use raw strings like r'backslashes are now normal characters \\\ !')
Selenium, on the other hand, will write values into Firefox preferences, which, unlike Python, expects the appropriate kind of separator. That's why using forward slashes does not work in your example.
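When a path has to be handed to something that expects native Windows separators, one option is to normalize it first. A minimal sketch, assuming the consumer wants backslashes; it uses `ntpath` so the result is the same on any platform (on Windows, `os.path.normpath` behaves the same way):

```python
import ntpath  # Windows flavour of os.path, importable on any platform

folder_location = "C:/Users/username/Dropbox/Inv"
native = ntpath.normpath(folder_location)  # forward slashes become backslashes
print(native)
```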
Windows uses backslashes as the file/folder separator by default; the \\ is an escaped \. The POSIX-compliant file/folder separator / is also supported by the Windows API, but the library you use (which is not recognizable in your example) needs to support it too.
The standard Windows path separator is the backslash \. But it is also the escape character in string literals, so, for example, \n is an end of line.
For this reason you may prefer not to use unescaped backslashes in your path: if a folder name starts with a letter that forms a special character after a backslash, you will run into trouble.
To use the native backslash separator on Windows you have two options. You can use a raw string, in which all special characters are read literally: path = r"C:\user\myFolder". Or you can escape each backslash with the escape character, which happens to be the backslash too: path = "C:\\user\\myFolder".
That said, going back to DOS, Windows has accepted the forward slash in path strings too.
Python is able to accept both separators. It is advised to use the native format of your system.
If you want your script to work on both systems, try:
import os
if os.name == 'posix':
    path = '/net/myFolder/'
else:
    path = r'C:\Users\myFolder'
Windows inherited backslashes as a path separator from Microsoft DOS. DOS initially didn't support subdirectories and opted to use the (on US keyboards) easily typed slash / character for command line switches.
When they did introduce subdirectories in DOS 2, either slash / or backslash \ worked as a path separator, but to use slashes on the command line you had to reconfigure the switch character, a feature they later removed entirely.
Thus the command line for certain commands that look for switches without space in front (like dir/w) is the one place you can't use forward slashes (this has to do with the command line being passed as a single string, unlike POSIX which passes distinct arguments in a list). That, and poorly written code that tries things like splitting on backslash, not knowing that slash is also a path separator.
It's also sometimes complicated by either character having other meanings, such as \ being the escape character in string literals; that's why you use \\ unless you use a raw string r'foo\bar'.
The other path separator I know of is classic Mac OS, which uses colon :. Python handles these differences by including reasonable routines in os.path or pathlib.
Windows and Linux/macOS use different path separators: UNIX-like systems use forward slashes (/) while Windows uses backslashes (\).
You should never type your own separators, always use os.path.join or os.sep, which handle this for you based on the platform you're running on. Example:
import os
folder_location = os.path.join('C:\\', 'Users', 'username', 'Dropbox', 'Inv')
# or
folder_location = os.sep.join(['C:', 'Users', 'username', 'Dropbox', 'Inv'])
Also, with os.path.join you need to include the drive letter's trailing backslash yourself, as noted in the Python docs:
Note that on Windows, since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
Hard-coding a full path like this is usually useless, as C: will only work on Windows anyway. You will most likely want to use this later on using relative paths or paths that were fetched elsewhere and need to have segments added to them.
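The points above can be checked with pathlib, which was mentioned earlier as one of the routines Python provides. This sketch uses `PureWindowsPath`, which applies Windows rules without touching the file system, so it runs on any platform; the folder names are the ones from the question.

```python
import ntpath
from pathlib import PureWindowsPath

with_slashes = PureWindowsPath("C:/Users/username/Dropbox/Inv")
with_backslashes = PureWindowsPath(r"C:\Users\username\Dropbox\Inv")
built = PureWindowsPath("C:/") / "Users" / "username" / "Dropbox" / "Inv"

# All three spellings describe the same path.
print(with_slashes == with_backslashes == built)
print(str(built))        # native form, with backslashes
print(built.as_posix())  # same path rendered with forward slashes

# Without the trailing separator the result is relative to the
# current directory on drive C:, as the docs quoted above warn.
print(ntpath.join("c:", "foo"))
```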

Are variable names in Python valid file names?

I want to use the name of variables as file names. Are there any cases in which this might not work?
example:
import pickle

my_var = "value"
with open("my_var.txt", 'wb') as fp:
    pickle.dump(my_var, fp)
Here "my_var" is both a part of a file name and the name of a variable.
So basically I'm asking if python allows any characters in variable names, that would make trouble in filenames.
This may vary for other OSes, but under Windows, all the reserved (forbidden) characters for file names are also not allowed within Python variable names:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
However, there are a few different problems: Certain names are not allowed as Windows file names, specifically:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
So, you can't name a file CON.txt under Windows, even though CON would be a perfectly valid variable name in Python.
Then, there is a length limit for filenames, so very long variable names might cause a problem.
Lastly, Python variable names may be comprised of a wide range of Unicode characters, not all of which may be available for filenames.
Since Python variable names can only contain underscores, letters, and numbers they could all be used as file names.
The one issue that could come up, although hard to imagine, is that you could have a Python variable name longer than some file systems allow. (260 characters in Windows)
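A rough check along these lines, assuming only the reserved characters and device names listed above matter. `is_safe_windows_filename` is a hypothetical helper, and it deliberately ignores length limits and any file-system-specific restrictions.

```python
RESERVED_CHARS = set('<>:"/\\|?*')
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}


def is_safe_windows_filename(name):
    """Rough check of a file name against the rules listed above."""
    if not name or any(ch in RESERVED_CHARS or ord(ch) < 32 for ch in name):
        return False
    stem = name.split(".", 1)[0]  # part before the first dot
    return stem.upper() not in RESERVED_NAMES


for candidate in ("my_var.txt", "CON.txt", "04/28/2012.pdf"):
    print(candidate, is_safe_windows_filename(candidate))
```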

How to make Python use a path that contains colons in it?

I have a program that includes an embedded Python 2.6 interpreter. When I invoke the interpreter, I call PySys_SetPath() to set the interpreter's import-path to the subdirectories installed next to my executable that contain my Python script files... like this:
PySys_SetPath("/path/to/my/program/scripts/type1:/path/to/my/program/scripts/type2");
(except that the path strings are dynamically generated based on the current location of my program's executable, not hard-coded as in the example above)
This works fine... except when the clever user decides to install my program underneath a folder that has a colon in its name. In that case, my PySys_SetPath() command ends up looking like this (note the presence of a folder named "path:to"):
PySys_SetPath("/path:to/my/program/scripts/type1:/path:to/my/program/scripts/type2");
... and this breaks all my Python scripts, because now Python looks for script files in "/path", and "to/my/program/scripts/type1" instead of in "/path:to/my/program/scripts/type1", and so none of the import statements work.
My question is, is there any fix for this issue, other than telling the user to avoid colons in his folder names?
I looked at the makepathobject() function in Python/sysmodule.c, and it doesn't appear to support any kind of quoting or escaping to handle literal colons.... but maybe I am missing some nuance.
The problem you're running into is that the PySys_SetPath function parses the string you pass using a colon as the delimiter. That parser sees each : character as delimiting a path, and there isn't a way around this (it can't be escaped).
However, you can bypass this by creating a list of the individual paths (each of which may contain colons) and use PySys_SetObject to set the sys.path:
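PyObject *path = PyList_New(0);
PyObject *entry = PyString_FromString("foo:bar");
PyList_Append(path, entry);      /* the list takes its own reference to entry */
Py_DECREF(entry);                /* so release ours */
PySys_SetObject("path", path);   /* sys keeps its own reference to the list */
Py_DECREF(path);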
Now the interpreter will see "foo:bar" as a distinct component of the sys.path.
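The difference is easy to see from the Python side. This is only an illustration of the splitting behaviour described above, with the separator hard-coded as ":" to match the question; it does not call the C API.

```python
paths = [
    "/path:to/my/program/scripts/type1",
    "/path:to/my/program/scripts/type2",
]

# What a colon-delimited parser makes of the joined string:
joined = ":".join(paths)
fragments = joined.split(":")
print(fragments)  # four fragments, none of them a real directory

# Kept as a list, each entry survives as one path:
search_path = list(paths)
print(search_path)
```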
Supporting colons in a file path opens up a huge can of worms on multiple operating systems; it is not a valid path character on Windows or Mac OS X, for example, and it doesn't seem like a particularly reasonable thing to support in the context of a scripting environment either for exactly this reason. I'm actually a bit surprised that Linux allows colon filenames too, especially since : is a very common path separator character.
You might try escaping the colon, i.e. converting /path:to/ to /path\:to/, and see if that works. Other than that, just tell the user to avoid using colons in their file names. They will run into all sorts of problems in quite a few different environments, and it's just a plain bad idea.
