Unpredictable results from os.path.join in windows - python

So what I'm trying to do is to join something in the form of
os.path.join('C:\path\to\folder', 'filename').
**edit :
Actual code is :
filename = 'creepy_%s.pcl' % identifier
file = open(os.path.join(self.cache_dir, filename), 'w')
where self.cache_dir is read from a file using configobj (returns string) and in the particular case is '\Documents and Settings\Administrator\creepy\cache'
The first part is returned from a configuration file, using configobj. The second is a concatenation of 2 strings like: 'file%s' % name
When I run the application through the console in windows using the python interpreter installed, I get the expected result which is
C:\\path\\to\\folder\\filename
When I bundle the same application and the python interpreter (same version, 2.6) in an executable in windows and run the app the result is instead
C:\\path\\to\\folderfilename
Any clues as to what might be the problem, or what would cause such inconsistencies in the output ?

Your code is malformed. You need to double those backslashes or use a raw string.
os.path.join('C:\\path\\to\\folder', 'filename').
I don't know why it works in one interpreter and not the other but your code will not be interpreted properly as is. The weird thing is i'd have expected a different output, ie: C:pathtofolder\filename.

It is surprising behavior. There is no reason it should behave in such a way.
Just be be cautious, you can change the line to the following.
os.path.join(r'C:\path\to\folder\', 'filename').
Note the r'' raw string and the final \

Three things you can do:
Use double-slashes in your original string, 'C:\\path\\to\\folder'
Use a raw string, r'C:\path\to\folder'
Use forward-slashes, 'C:/path/to/folder'

I figure it out yesterday. As usual when things seem really strange, the explanation is very simple and most of the times involve you being stupid.
To cut a long story short there were leftovers from some previous installations in dist-packages. The bundled interpreter loaded the module from there , but when i ran the python script from the terminal , the module (newer version) in the current dir was loaded. Hence the "unpredictable" results.

Related

How can I create a file with `/` in its file name? [duplicate]

I know that this is not something that should ever be done, but is there a way to use the slash character that normally separates directories within a filename in Linux?
The answer is that you can't, unless your filesystem has a bug. Here's why:
There is a system call for renaming your file defined in fs/namei.c called renameat:
SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
int, newdfd, const char __user *, newname)
When the system call gets invoked, it does a path lookup (do_path_lookup) on the name. Keep tracing this, and we get to link_path_walk which has this:
static int link_path_walk(const char *name, struct nameidata *nd)
{
struct path next;
int err;
unsigned int lookup_flags = nd->flags;
while (*name=='/')
name++;
if (!*name)
return 0;
...
This code applies to any file system. What's this mean? It means that if you try to pass a parameter with an actual '/' character as the name of the file using traditional means, it will not do what you want. There is no way to escape the character. If a filesystem "supports" this, it's because they either:
Use a unicode character or something that looks like a slash but isn't.
They have a bug.
Furthermore, if you did go in and edit the bytes to add a slash character into a file name, bad things would happen. That's because you could never refer to this file by name :( since anytime you did, Linux would assume you were referring to a nonexistent directory. Using the 'rm *' technique would not work either, since bash simply expands that to the filename. Even rm -rf wouldn't work, since a simple strace reveals how things go on under the hood (shortened):
$ ls testdir
myfile2 out
$ strace -vf rm -rf testdir
...
unlinkat(3, "myfile2", 0) = 0
unlinkat(3, "out", 0) = 0
fcntl(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
unlinkat(AT_FDCWD, "testdir", AT_REMOVEDIR) = 0
...
Notice that these calls to unlinkat would fail because they need to refer to the files by name.
You could use a Unicode character that displays as / (for example the fraction slash), assuming your filesystem supports it.
It depends on what filesystem you are using. Of some of the more popular ones:
ext3: No
ext4: No
jfs: Yes
reiserfs: No
xfs: No
Only with an agreed-upon encoding. For example, you could agree that % will be encoded as %% and that %2F will mean a /. All the software that accessed this file would have to understand the encoding.
The short answer is: No, you can't. It's a necessary prohibition because of how the directory structure is defined.
And, as mentioned, you can display a unicode character that "looks like" a slash, but that's as far as you get.
In general it's a bad idea to try to use "bad" characters in a file name at all; even if you somehow manage it, it tends to make it hard to use the file later. The filesystem separator is flat-out not going to work at all, so you're going to need to pick an alternative method.
Have you considered URL-encoding the URL then using that as the filename? The result should be fine as a filename, and it's easy to reconstruct the name from the encoded version.
Another option is to create an index - create the output filename using whatever method you like - sequentially-numbered names, SHA1 hashes, whatever - then write a file with the generated filename/URL pair. You can save that into a hash and use it to do a URL-to-filename lookup or vice-versa with the reversed version of the hash, and you can write it out and reload it later if needed.
The short answer is: you must not. The long answer is, you probably can or it depends on where you are viewing it from and in which layer you are working with.
Since the question has Unix tag in it, I am going to answer for Unix.
As mentioned in other answers that, you must not use forward slashes in a filename.
However, in MacOS you can create a file with forward slashes / by:
# avoid doing it at all cost
touch 'foo:bar'
Now, when you see this filename from terminal you will see it as foo:bar
But, if you see it from finder: you will see finder converted it as foo/bar
Same thing can be done the other way round, if you create a file from finder with forward slashes in it like /foobar, there will be a conversion done in the background. As a result, you will see :foobar in terminal but the other way round when viewed from finder.
So, : is valid in the unix layer, but it is translated to or from / in the Mac layers like Finder window, GUI. : the colon is used as the separator in HFS paths and the slash / is used as the separator in POSIX paths
So there is a two-way translation happening, depending on which “layer” you are working with.
See more details here: https://apple.stackexchange.com/a/283095/323181
You can have a filename with a / in Linux and Unix. This is a very old question, but surprisingly nobody has said it in almost 10 years since the question was asked.
Every Unix and Linux system has the root directory named /. A directory is just a special kind of file. Symbolic links, character devices, etc are also special kinds of files. See here for an in depth discussion.
You can't create any other files with a /, but you certainly have one -- and a very important one at that.

Vim and Python configuration

I want to compile and run a Python script using Gvim.
I have configured this in my _vimrc:
map <F5>:!D:\Python27\python.exe%
But when I complete the Python code and then enter :F5enter, I receive the error:
E448 Extra tail characters
How can I solve this?
Your first problem is that you do not normally compile python - you just run it.
Second I suspect that you need a space before the % in your configuration assuming that it gets replaced by the filename.
You're missing spaces, in two important areas:
First, there must be whitespace separating the {lhs} (<F5> in your case) from the {rhs} (to what the mapping is expanded.
Second, the % that Vim will expand to the filename must be separated from the Python executable path, or else you'll end up with something like python.exeFoo.py, which the operating system doesn't understand.
Additionally:
You should select the proper modes for your mapping; these usually aren't invoked by :F5Enter (for which you would need :cmap, and omit the :), but rather by just pressing F5 from normal mode (:nmap).
You should use :noremap; it makes the mapping immune to remapping and recursion.
:nnoremap <F5> :!D:\Python27\python.exe %<CR>

Convert UNC path to local path (and general path handling in Python)

System: Python 2.6 on Windows 7 64 bit
Recently I did a lot of path formatting in Python. Since the string modifications are always dangerous I started to do it the proper way by using the 'os.path' module.
The first question is if this is the right way to to handle incoming paths? Or can I somehow optimize this?
sCleanPath = sRawPath.replace('\n', '')
sCleanPath = sCleanPath.replace('\\', '/')
sCleanPythonPath = os.path.normpath(sCleanPath)
For formatting the 'sCleanPythonPath' now I use only functions from the 'os.path' module. This works very nice and I didn't had any problems so far.
There is only one exception. I have to change paths so they point not longer on a network storage but on a local drive. Is started to use 'os.path.splitunc()' in conjunction with 'os.path.join()'.
aUncSplit = os.path.splitunc(sCleanPythonUncPath)
sLocalDrive = os.path.normpath('X:/mount')
sCleanPythonLocalPath = (os.path.join(sLocalDrive, aUncSplit[1]))
Unfortunately this doesn't work due to the nature of how absolute paths are handled using 'os.path.join()'. All the solutions I found in the web are using string replacements again which I want to avoid by using the 'os.path' module. Have I overseen anything? Is there another, maybe a better way to do this?
All comments on this are very welcome!
You could optimize the first part slightly by removing the replace() call because on Windows normpath() converts forward slashes to backward slashes:
sCleanPath = sRawPath.replace('\n', '')
sCleanPythonPath = os.path.normpath(sCleanPath)
Here's something that would make the second part of your question work without doing a string replacement:
sSharedFolder = os.path.relpath(os.path.splitunc(sCleanPythonUncPath)[1], os.sep)
sLocalDrive = os.path.normpath('X:/mount') # why not hardcode the result?
sCleanPythonLocalPath = os.path.join(sLocalDrive, sSharedFolder)

How to make Python use a path that contains colons in it?

I have a program that includes an embedded Python 2.6 interpreter. When I invoke the interpreter, I call PySys_SetPath() to set the interpreter's import-path to the subdirectories installed next to my executable that contain my Python script files... like this:
PySys_SetPath("/path/to/my/program/scripts/type1:/path/to/my/program/scripts/type2");
(except that the path strings are dynamically generated based on the current location of my program's executable, not hard-coded as in the example above)
This works fine... except when the clever user decides to install my program underneath a folder that has a colon in its name. In that case, my PySys_SetPath() command ends up looking like this (note the presence of a folder named "path:to"):
PySys_SetPath("/path:to/my/program/scripts/type1:/path:to/my/program/scripts/type2");
... and this breaks all my Python scripts, because now Python looks for script files in "/path", and "to/my/program/scripts/type1" instead of in "/path:to/myprogram/scripts/type1", and so none of the import statements work.
My question is, is there any fix for this issue, other than telling the user to avoid colons in his folder names?
I looked at the makepathobject() function in Python/sysmodule.c, and it doesn't appear to support any kind of quoting or escaping to handle literal colons.... but maybe I am missing some nuance.
The problem you're running into is the PySys_SetPath function parses the string you pass using a colon as the delimiter. That parser sees each : character as delimiting a path, and there isn't a way around this (can't be escaped).
However, you can bypass this by creating a list of the individual paths (each of which may contain colons) and use PySys_SetObject to set the sys.path:
PyListObject *path;
path = (PyListObject *)PyList_New(0);
PyList_Append((PyObject *) path, PyString_FromString("foo:bar"));
PySys_SetObject("path", (PyObject *)path);
Now the interpreter will see "foo:bar" as a distinct component of the sys.path.
Supporting colons in a file path opens up a huge can of worms on multiple operating systems; it is not a valid path character on Windows or Mac OS X, for example, and it doesn't seem like a particularly reasonable thing to support in the context of a scripting environment either for exactly this reason. I'm actually a bit surprised that Linux allows colon filenames too, especially since : is a very common path separator character.
You might try escaping the colon out, i.e. converting /path:to/ to /path\:to/ and see if that works. Other than that, just tell the user to avoid using colons in their file names. They will run into all sorts of problems in quite a few different environments and it's a just plain bad idea.

How do I handle Python unicode strings with null-bytes the 'right' way?

Question
It seems that PyWin32 is comfortable with giving null-terminated unicode strings as return values. I would like to deal with these strings the 'right' way.
Let's say I'm getting a string like: u'C:\\Users\\Guest\\MyFile.asy\x00\x00sy'. This appears to be a C-style null-terminated string hanging out in a Python unicode object. I want to trim this bad boy down to a regular ol' string of characters that I could, for example, display in a window title bar.
Is trimming the string off at the first null byte the right way to deal with it?
I didn't expect to get a return value like this, so I wonder if I'm missing something important about how Python, Win32, and unicode play together... or if this is just a PyWin32 bug.
Background
I'm using the Win32 file chooser function GetOpenFileNameW from the PyWin32 package. According to the documentation, this function returns a tuple containing the full filename path as a Python unicode object.
When I open the dialog with an existing path and filename set, I get a strange return value.
For example I had the default set to: C:\\Users\\Guest\\MyFileIsReallyReallyReallyAwesome.asy
In the dialog I changed the name to MyFile.asy and clicked save.
The full path part of the return value was: u'C:\Users\Guest\MyFile.asy\x00wesome.asy'`
I expected it to be: u'C:\\Users\\Guest\\MyFile.asy'
The function is returning a recycled buffer without trimming off the terminating bytes. Needless to say, the rest of my code wasn't set up for handling a C-style null-terminated string.
Demo Code
The following code demonstrates null-terminated string in return value from GetSaveFileNameW.
Directions: In the dialog change the filename to 'MyFile.asy' then click Save. Observe what is printed to the console. The output I get is u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'.
import win32gui, win32con
if __name__ == "__main__":
initial_dir = 'C:\\Users\\Guest'
initial_file = 'MyFileIsReallyReallyReallyAwesome.asy'
filter_string = 'All Files\0*.*\0'
(filename, customfilter, flags) = \
win32gui.GetSaveFileNameW(InitialDir=initial_dir,
Flags=win32con.OFN_EXPLORER, File=initial_file,
DefExt='txt', Title="Save As", Filter=filter_string,
FilterIndex=0)
print repr(filename)
Note: If you don't shorten the filename enough (for example, if you try MyFileIsReally.asy) the string will be complete without a null byte.
Environment
Windows 7 Professional 64-bit (no service pack), Python 2.7.1, PyWin32 Build 216
UPDATE: PyWin32 Tracker Artifact
Based on the comments and answers I have received so far, this is likely a pywin32 bug so I filed a tracker artifact.
UPDATE 2: Fixed!
Mark Hammond reported in the tracker artifact that this is indeed a bug. A fix was checked in to rev f3fdaae5e93d, so hopefully that will make the next release.
I think Aleksi Torhamo's answer below is the best solution for versions of PyWin32 before the fix.
I'd say it's a bug. The right way to deal with it would probably be fixing pywin32, but in case you aren't feeling adventurous enough, just trim it.
You can get everything before the first '\x00' with filename.split('\x00', 1)[0].
This doesn't happen on the version of PyWin32/Windows/Python I tested; I don't get any nulls in the returned string even if it's very short. You might investigate if a newer version of one of the above fixes the bug.
ISTR that I had this issue some years ago, then I discovered that such Win32 filename-dialog-related functions return a sequence of 'filename1\0filename2\0...filenameN\0\0', while including possible garbage characters depending on the buffer that Windows allocated.
Now, you might prefer a list instead of the raw return value, but that would be a RFE, not a bug.
PS When I had this issue, I quite understood why one would expect GetOpenFileName to possibly return a list of filenames, while I couldn't imagine why GetSaveFileName would. Perhaps this is considered as API uniformity. Who am I to know, anyway?

Categories

Resources