os.path.join - can I get consistency between Windows and Cygwin?


I would quite like a set of filename components that will give me consistent and "nice-looking" filenames on both Windows and Cygwin. Here's what I've tried:
#  Input                                 Windows       Cygwin
1  os.path.join('c:', 'foo', 'bar')      c:foo\bar     c:/foo/bar
2  os.path.join('c:\\', 'foo', 'bar')    c:\foo\bar    c:\/foo/bar
3  os.path.join('c:/', 'foo', 'bar')     c:/foo\bar    c:/foo/bar
Option 1 isn't what I want on Windows: I want an absolute path, not one relative to the current directory.
Options 2 and 3 both work, but are not (I hereby define) "nice-looking", since each mixes forward and backward slashes on one platform or the other. My error and logging messages will be more readable if I can avoid this.
Option 4 is to define a driveroot variable equal to c:\ on Windows and /cygdrive/c on Cygwin, or a function that takes a drive letter and returns the same. But I'd prefer to avoid per-platform special cases between these two.
Can I have everything I want (join identical path components to get a result that refers to the same absolute path on both platforms and doesn't mix path separators on either)? Or do I have to compromise somewhere?
[Edit: in case it helps, the main use case is that c:\foo is a path that I know about at configuration time, whereas bar (and further components) are computed later. So my actual code currently looks a bit more like this:
dir = os.path.join('c:\\', 'foo')
# some time later
os.path.join(dir, 'bar')
That's using option 2, which results in "nice" reporting of filenames in Windows, but "not nice" reporting of filenames in Cygwin. What I want to avoid, if it's possible, is:
if this_is_cygwin():
    dir = '/cygdrive/c/foo'
else:
    dir = 'c:\\foo'
]

Edit: David pointed out that my understanding of 'current directory' on Windows is wrong. Sorry. However, try using os.path.abspath to get nice pretty paths, or os.path.normpath if you don't need absolute paths.
In method 1,
os.path.join('c:', 'foo', 'bar')
The output, c:foo\bar, is correct. This means the path foo\bar below the current directory on c:, which is not the same thing as the root directory. os.path.join is doing exactly what you tell it to, which is to take c: (the current directory on drive c) and add some extra bits to it.
In method 2,
os.path.join('c:\\', 'foo', 'bar')
The output for Cygwin is correct, because in Cygwin, c:\ is not the root of a drive.
What you want is the os.path.abspath function. This will give you the absolute, normalized version of the path that you give it.
However, you won't get the same string on Cygwin and Windows. I hope you're not looking for that.
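The behaviour in the question's table can be reproduced on any machine, because os.path is ntpath on native Windows Python and posixpath on POSIX builds such as Cygwin Python. The following is a minimal sketch under that assumption; the variable names are only illustrative.

```python
import ntpath      # what os.path is on native Windows Python
import posixpath   # what os.path is on POSIX builds, including Cygwin Python

ROOTS = ['c:', 'c:\\', 'c:/']

for root in ROOTS:
    win = ntpath.join(root, 'foo', 'bar')
    cyg = posixpath.join(root, 'foo', 'bar')
    print('root=%r windows=%r cygwin=%r' % (root, win, cyg))

# normpath tidies the mixed separators that join leaves behind on Windows ...
tidy = ntpath.normpath(ntpath.join('c:/', 'foo', 'bar'))
# ... but a drive-relative path is still drive-relative, not absolute.
relative = ntpath.join('c:', 'foo', 'bar')
print(tidy, ntpath.isabs(tidy), ntpath.isabs(relative))
```

Note that normpath only cleans up the string for the platform whose rules it implements; it does not translate between c:\foo and /cygdrive/c/foo.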

At work, I have to deal with pretty random combinations of Windows without Cygwin, Cygwin with non-Cygwin Python, Cygwin with Cygwin Python, and Unix. I haven't found a way of coping with this stuff that I'm particularly proud of, but the least hateful approach I've found so far is to always use what Cygwin calls a "mixed style" path on Windows. That's a Windows style path, but with forward slashes instead of backslashes, e.g. c:/foo/bar.txt. It also avoids a lot of gotchas, such as Cygwin bash shells seeing "\" as an escape character. Sadly, this means missing out on a lot of Python's built-in path manipulation utilities and doing things the hard way.
I don't have access to a machine with Python and Cygwin on it at the moment, so I can't test the code snippets below. I apologize for any errors...
#Combine a path
path = '/'.join([ 'c:', 'foo', 'bar'])
#Split it back apart
pieces = path.split('/')
When in doubt, try to call Cygwin's cygpath utility. A lot of weird edge cases pop up in Cygwin, such as the fact that /cygdrive/d/ == d:\ and yet /cygdrive/d/../../ == c:\cygwin (or wherever you have Cygwin installed). Also remember that backslashes are used as escape characters in Unix style paths, such as /cygdrive/c/Documents\ and\ Settings. Cygpath does an astonishing job of taking care of these, and if it's not available it's generally safe to assume that your weird edge cases don't exist.
import sys
import subprocess
#Buncha code here ...
#We got somepath from somewhere, and don't know what format it's in.
try:
    #cygpath -m prints the mixed style path followed by a newline, so strip it.
    mypath = subprocess.check_output(['cygpath', '-m', somepath],
                                     universal_newlines=True).strip()
except (OSError, subprocess.CalledProcessError):
    #Cheap attempt at coping with the possibility that we're in Windows, but cygpath isn't available.
    #(subprocess raises OSError when the cygpath executable can't be found.)
    if sys.platform.startswith('win32'):
        mypath = somepath.replace('\\', '/')
    else:
        mypath = somepath
#Now we can assume mypath is using forward slashes for delimiters.
Some Windows commands get confused if you pass them a mixed style path, and some Cygwin commands get confused if you pass in a Windows or a mixed style path. For instance, rsync can get confused by "c:/foo/bar.txt" because that looks like you're trying to specify "/foo/bar.txt" on a remote computer named "c". When you're calling one of these finicky Windows or Cygwin programs, use cygpath to make it happy. If you're calling a finicky Windows program and cygpath isn't available, try the crude "winpath = mypath.replace('/', '\\')" approach. I think that can fail if you're converting a Unix style path and don't have Cygwin available, but hopefully if you don't have Cygwin available you don't have any Unix style paths on Windows to start with...
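The string-replacement fallback can be wrapped in a pair of small helpers. This is only a sketch for the case where cygpath is unavailable: the names to_mixed and to_windows are hypothetical, it assumes Cygwin's default /cygdrive prefix, and it does not handle shell-escaped Unix paths such as Documents\ and\ Settings.

```python
import re

# Assumption: the default "/cygdrive/<letter>" mount prefix. A customised
# mount table needs cygpath itself, which this sketch does not call.
_CYGDRIVE = re.compile(r'^/cygdrive/([A-Za-z])(?:/|$)')

def to_mixed(path):
    """Return a forward-slash ("mixed style") path such as c:/foo/bar.txt."""
    path = path.replace('\\', '/')
    return _CYGDRIVE.sub(r'\1:/', path)

def to_windows(path):
    """Return a backslash Windows path such as c:\\foo\\bar.txt."""
    return to_mixed(path).replace('/', '\\')

print(to_mixed('/cygdrive/c/foo/bar.txt'))
print(to_windows('c:/foo/bar.txt'))
```

Paths that are neither Windows style nor under /cygdrive (for example /usr/local) pass through to_mixed unchanged, which is where the real cygpath would be needed.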

Related

What's the use case for Python's os.path.join dropping all arguments before when one is an absolute path? [duplicate]

I'm learning Python and I noticed something strange with one of my scripts. Doing a little testing I discovered the problem stemmed from this behavior:
>>> import os
>>> os.path.join('a','b')
'a/b'
>>> os.path.join('a','/b')
'/b'
Checking the documentation, this is, in fact, the design of the function:
os.path.join(path1[, path2[, ...]])
Join one or more path components intelligently. If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues. ...
My question isn't why my script failed, but rather why the function was designed this way. I mean, on Unix at least, a//b is a perfectly acceptable way to designate a path, if not elegant. Why was the function designed this way? Is there any way to tell if one or more path elements have been discarded short of testing each path string with os.path.isabs()?
Out of curiosity, I also checked the case where a path component ends in an os.sep character:
>>> os.path.join('a/','b')
'a/b'
That works as expected.

One case where it is useful for os.path.join('a', '/b') to return /b would be if you ask a user for a filename.
The user can enter either a path relative to the current directory, or a full path, and your program could handle both cases like this:
os.path.join(os.getcwd(), filename)
In [54]: os.getcwd()
Out[54]: '/tmp'
In [55]: os.path.join(os.getcwd(), 'foo')
Out[55]: '/tmp/foo'
In [56]: os.path.join(os.getcwd(), '/foo/bar')
Out[56]: '/foo/bar'

Suppose you're writing a utility like cd. To work out the new directory, you would use
os.path.join(currdir, newdir)
If the user enters /b, you would expect it to throw away the first argument. This holds for plenty of things that use the current directory.
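For callers who would rather be told when components are about to be discarded, one option is a thin wrapper that checks each later component with os.path.isabs before joining. The sketch below is one possible approach, not something the standard library provides; strict_join is a hypothetical name. On Windows it would not catch drive-relative components such as c:foo, which also discard what came before them.

```python
import os

def strict_join(first, *rest):
    """Join like os.path.join, but refuse to silently drop components."""
    for part in rest:
        if os.path.isabs(part):
            raise ValueError(
                'absolute component %r would discard the preceding path' % part)
    return os.path.join(first, *rest)

print(strict_join('a', 'b'))
try:
    strict_join('a', os.path.abspath('b'))
except ValueError as exc:
    print('rejected:', exc)
```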

Methods to avoid hard-coding file paths in Python

Working with scientific data, specifically climate data, I am constantly hard-coding paths to data directories in my Python code. Even if I were to write the most extensible code in the world, the hard-coded file paths prevent it from ever being truly portable. I also feel that having information about your machine's file system coded into your programs could be a security issue.
What solutions are out there for handling the configuration of paths in Python to avoid having to code them out explicitly?

One solution relies on configuration files.
You can store all your paths in a JSON file like so:
{
    "base_path": "/home/bob/base_folder",
    "low_temp_area_path": "/home/bob/base/folder/low_temp"
}
and then in your Python code, you could just do:
import json
with open("conf.json") as json_conf:
    CONF = json.load(json_conf)
and then you can use your paths (or any configuration variable you like) like so:
print("The base path is {}".format(CONF["base_path"]))

First off, it's always good practice to add a main function to each file so you can test the classes or functions in it. Along with this, you determine the directory that contains the script. This becomes incredibly important when running Python from a cron job, or from any directory other than the one the script lives in. No JSON files or environment variables are then needed, and the same code works across Mac, RHEL and Debian distributions.
This is how you do it. It will also work on Windows if you use '\' instead of '/' (if that is even necessary in your case). os.path.realpath resolves the script's own path, and os.path.dirname strips the file name from it.
import os
import sys
if "__main__" == __name__:
    workingDirectory = os.path.dirname(os.path.realpath(sys.argv[0]))
The directory is computed correctly whether you run the script with a full path or a relative path, which means it will work in a cron job automatically.
After that, if you want to work with data stored below that directory, use:
fileName = os.path.join(workingDirectory, './sub-folder-of-current-directory/filename.csv')
fp = open(fileName, 'r')
or, for data in a folder at the same level as your project directory:
fileName = os.path.join(workingDirectory, '../folder-at-same-level-as-my-project/filename.csv')
fp = open(fileName, 'r')

I believe there are many ways around this, but here is what I would do:
Create a JSON config file with all the paths I need defined.
For even more portability, I'd have a default path where I look for this config file but also have a command line input to change it.
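A minimal sketch of that idea, assuming a JSON file of paths and a hypothetical --config option (the default file name conf.json and the function name load_paths are illustrative, not part of any existing tool):

```python
import argparse
import json

DEFAULT_CONFIG = "conf.json"  # hypothetical default location

def load_paths(argv=None):
    """Load the path configuration, letting the command line override its location."""
    parser = argparse.ArgumentParser(description="Process data using configured paths.")
    parser.add_argument("--config", default=DEFAULT_CONFIG,
                        help="JSON file that maps names to data paths")
    args = parser.parse_args(argv)
    with open(args.config) as handle:
        return json.load(handle)
```

Calling load_paths() with no arguments reads sys.argv, so running the script with --config /some/other/conf.json switches machines without touching the code.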

In my opinion, passing arguments on the command line would be the best solution. You should take a look at argparse, which gives you a nice way to handle command-line arguments. For example:
myDataScript.py /home/userName/datasource1/
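A minimal sketch of how the script might read that argument with argparse; the argument name datasource and the function name parse_args are assumptions made for illustration.

```python
import argparse

def parse_args(argv=None):
    """Parse the data directory given on the command line."""
    parser = argparse.ArgumentParser(description="Process files from a data directory.")
    parser.add_argument("datasource", help="directory that contains the input data")
    return parser.parse_args(argv)
```

Inside the script, parse_args().datasource would then hold /home/userName/datasource1/ for the invocation above, and nothing about the local file system needs to live in the code.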

How do I tell which actual dll is being returned (x86 v x64)?

Let's focus on one dll: C:\Windows\System32\wbem\wmiutils.dll. Why? Because it's the file in which I personally discovered Windows delivers a different dll depending on process architecture.
TLDR; Is there a way to programmatically determine the actual path of the dll that was returned by the file system redirector?
I understand that if launched as a x86 process, I get C:\Windows\SysWOW64\wbem\wmiutils.dll. And, if launched as a x64 process, I get C:\Windows\System32\wbem\wmiutils.dll.
I need to determine which wmiutils.dll I'm actually looking at. The redirector makes system32\wbem\wmiutils.dll look and feel identical but it's not. If I use parent path, I get C:\Windows\System32\wbem even though I may/may not be looking at C:\Windows\SysWOW64\wbem.
Any sweet python magic to make this happen? I can't seem to see anything from other languages I can port. Based on my use case, I've come up with a couple hacks but they're just that. Hoping somebody has found a solution as easy as parent path that actually works in this case.

import ctypes, hashlib
k32 = ctypes.windll.kernel32
oldValue = ctypes.c_void_p()
k32.Wow64DisableWow64FsRedirection(ctypes.byref(oldValue)) # Redirection off: System32 is the real (64-bit) directory
with open(r"C:\Windows\System32\wbem\wmiutils.dll", "rb") as f:
    checksumNative = hashlib.md5(f.read()).hexdigest()
k32.Wow64RevertWow64FsRedirection(oldValue) # Redirection restored: Windows picks the copy for this process
with open(r"C:\Windows\System32\wbem\wmiutils.dll", "rb") as f:
    checksumDefault = hashlib.md5(f.read()).hexdigest()
if checksumNative != checksumDefault:
    # The two reads differ only when this process is being redirected.
    print("This process is being redirected to the 32-bit (SysWOW64) wmiutils.dll")
I don't have Windows Python to test this, but it should work according to https://msdn.microsoft.com/en-us/library/windows/desktop/aa365745%28v=vs.85%29.aspx.
I think an easier way would be to test the size of a pointer, for example by creating a struct and seeing whether it's 8 bytes or 4 bytes. If it's 8 bytes, the process is 64-bit and you can assume Windows is giving you the 64-bit version of the DLLs.
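The pointer-size test can be sketched with the struct module, where the "P" format is a native pointer. The helper expected_system_dir is hypothetical, and it assumes the C:\Windows layout from the question; it only states which directory the redirector is expected to serve, without asking Windows.

```python
import struct

def process_bits():
    """Return 32 or 64 depending on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8

def expected_system_dir(bits, windows_is_64bit=True):
    """Guess which directory a request for C:\\Windows\\System32 really reaches."""
    # Only a 32-bit process on 64-bit Windows is redirected to SysWOW64.
    if bits == 32 and windows_is_64bit:
        return r"C:\Windows\SysWOW64"
    return r"C:\Windows\System32"

print(process_bits(), expected_system_dir(process_bits()))
```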

Does os.walk take advantage of the file type returned by the OS for efficiency?

The os.walk function returns separate lists for directories and files. The underlying OS calls on many common operating systems such as Windows and Linux return a file type or flag specifying whether each directory entry is a file or a directory; without this flag it's necessary to query the OS again for each returned filename. Does the code for os.walk make use of this information or does it throw it away as os.listdir does?

Nope, it does not.
Under the hood, os.walk() uses os.listdir() and os.path.isdir() to list files and directories separately. See the source code of walk().
Specifically:
try:
    # Note that listdir and error are globals in this module due
    # to earlier import-*.
    names = listdir(top)
except error, err:
    if onerror is not None:
        onerror(err)
    return
dirs, nondirs = [], []
for name in names:
    if isdir(join(top, name)):
        dirs.append(name)
    else:
        nondirs.append(name)
where listdir and isdir are module globals for the os.listdir() and os.path.isdir() functions. It calls itself recursively for subdirs.
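For comparison, Python 3.5 and later provide os.scandir, whose entries carry the file type reported by the OS where it is available, and os.walk has been built on it since that release. A minimal sketch of the same dirs/nondirs split using scandir follows; split_entries is a hypothetical name, this is not the standard library's own code, and the with-statement form needs Python 3.6 or later.

```python
import os

def split_entries(top):
    """Split the entries of `top` into directory names and other names."""
    dirs, nondirs = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() can normally answer from data the directory listing
            # already returned, so most entries need no extra stat call.
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                nondirs.append(entry.name)
    return sorted(dirs), sorted(nondirs)
```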

As Martijn Pieters's answer explains, os.walk just uses os.listdir and os.path.isdir.
There's been some discussion on this a few times on the mailing lists, but no concrete suggestion for the stdlib has ever come out of it. There are various edge cases that make this less trivial than it seems. Also, if Python 3.4 or later grows a new path module, there's a good chance os.walk will just be replaced/deprecated rather than improved in place.
However, there are a number of third-party modules that you can use.
The simplest is probably Ben Hoyt's betterwalk. I believe he's intending to get this on PyPI, and maybe even submit it for Python 3.4 or later, but at present you have to install it off github. betterwalk provides an os.listdir replacement called iterdir_stat, and a 90%-complete os.walk replacement built on top of it. On most POSIX systems, and Win32, it can usually avoid unnecessary stat calls. (There are some cases where it can't do as good a job as fts (3)/nftw (3)/find (1), but at worst it just does some unnecessary calls, rather than failing. The parts that may not be complete, last I checked, are dealing with symlinks, and maybe error handling.)
There's also a nice wrapper around fts for POSIX systems, which is obviously ideal as far as performance goes on modern POSIX systems—but it has a different (better, in my opinion, but still different) interface, and doesn't support Windows or other platforms (or even older POSIX systems).
There are also about 30-odd "everything under the sun to do with paths" modules on PyPI and elsewhere, some of which have new walk-like functions.

Python: prevent os.getcwd() from returning lower-case results on Windows under the MKS shell

On Windows, if I use the MKS Toolkit shell, os.getcwd() returns the path in lower case. Under the Windows cmd shell, however, it returns the exact path.
Is there any way in Python to make os.getcwd() return the exact path (without converting it to lower case) on Windows?

Are you sure about this behavior? It's not documented, seems counter-intuitive, and I'm not able to reproduce it (on Windows 7 using Python 2.7.2):
>>> import os
>>> print os.getcwd()
C:\Users\foofoofoo
Note the capital characters at the start.

Before starting Python and using os.getcwd(), in your console you probably used "cd c:\your_path". It matters if this 'c' is lower or upper.
