This problem should be a no-brainer, but I haven't yet been able to nail it.
I need a function that takes two parameters, each a file path, relative or absolute, and returns a filepath which is the first path (target) resolved relative to the second path (start). The resolved path may be relative to the current directory or may be absolute (I don't care).
Here is an attempted implementation, complete with several doctests, that exercises some sample use cases (and demonstrates where it fails). A runnable script is also available on my source code repository, but it may change. The runnable script will run the doctests if no parameters are supplied, or will pass one or two parameters to findpath if supplied.
def findpath(target, start=os.path.curdir):
    r"""
    Find a path from start to target where target is relative to start.

    >>> orig_wd = os.getcwd()
    >>> os.chdir('c:\\windows') # so we know what the working directory is
    >>> findpath('d:\\')
    'd:\\'
    >>> findpath('d:\\', 'c:\\windows')
    'd:\\'
    >>> findpath('\\bar', 'd:\\')
    'd:\\bar'
    >>> findpath('\\bar', 'd:\\foo') # fails with '\\bar'
    'd:\\bar'
    >>> findpath('bar', 'd:\\foo')
    'd:\\foo\\bar'
    >>> findpath('bar\\baz', 'd:\\foo')
    'd:\\foo\\bar\\baz'
    >>> findpath('\\baz', 'd:\\foo\\bar') # fails with '\\baz'
    'd:\\baz'

    Since we're on the C drive, findpath may be allowed to return
    relative paths for targets on the same drive. I use abspath to
    confirm that the ultimate target is what we expect.

    >>> os.path.abspath(findpath('\\bar'))
    'c:\\bar'
    >>> os.path.abspath(findpath('bar'))
    'c:\\windows\\bar'
    >>> findpath('..', 'd:\\foo\\bar')
    'd:\\foo'
    >>> findpath('..\\bar', 'd:\\foo')
    'd:\\bar'

    The parent of the root directory is the root directory.

    >>> findpath('..', 'd:\\')
    'd:\\'

    restore the original working directory

    >>> os.chdir(orig_wd)
    """
    return os.path.normpath(os.path.join(start, target))
As you can see from the comments in the doctest, this implementation fails when the start specifies a drive letter and the target is relative to the root of the drive.
This brings up a few questions:
Is this behavior a limitation of os.path.join? In other words, should os.path.join('d:\foo', '\bar') resolve to 'd:\bar'? As a Windows user, I tend to think so, but I hate to think that a mature function like path.join would need alteration to handle this use case.
Is there an example of an existing target path resolver such as findpath that will work in all of these test cases?
If 'no' to the above questions, how would you implement this desired behavior?
I agree with you: this seems like a deficiency in os.path.join. Looks like you have to deal with the drives separately. This code passes all your tests:
def findpath(target, start=os.path.curdir):
    sdrive, start = os.path.splitdrive(start)
    tdrive, target = os.path.splitdrive(target)
    rdrive = tdrive or sdrive
    return os.path.normpath(os.path.join(rdrive, os.path.join(start, target)))
(and yes, I had to nest two os.path.join's to get it to work...)
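For anyone who wants to try the drive-splitting approach without a Windows machine, here is a minimal sketch of the same function written against ntpath, the Windows flavour of os.path that can be imported on any platform. Swapping os.path for ntpath is my assumption for portability, not part of the original answer, and drive-relative targets such as 'd:bar' are not handled specially.

```python
import ntpath  # Windows flavour of os.path; importable on any platform


def findpath(target, start=ntpath.curdir):
    """Resolve target relative to start using Windows path rules."""
    sdrive, start = ntpath.splitdrive(start)
    tdrive, target = ntpath.splitdrive(target)
    # A drive on the target wins; otherwise inherit the drive of start.
    rdrive = tdrive or sdrive
    return ntpath.normpath(ntpath.join(rdrive, ntpath.join(start, target)))


print(findpath('\\bar', 'd:\\foo'))
print(findpath('..\\bar', 'd:\\foo'))
print(findpath('d:\\', 'c:\\windows'))
```

Note that on recent Python 3 versions the join function itself no longer resets the drive when it meets a rooted segment such as '\\bar', so the plain one-line implementation from the question may already behave as desired there.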
I am very new to WAF. I have a configuration function like:
def configure(ctx):
    ########################################################################
    # **/myexe does not work either; an absolute path in path_list does not work!
    ctx.find_program('myexe', var='MYEXE', path_list=['mydir/here'])
and it does not find the myexe binary, unless I pass 'mydir/here/this_dir'. It seems that find_program() is not recursive. How can I make the search recursive? Is there another method?
find_program is not recursive, meaning that it doesn't look in subdirectories of the ones you provide. That is for efficiency and security reasons. It's the same when your OS looks for binaries: it looks in a list of paths (usually taken from the PATH environment variable) but not recursively in subdirectories. Otherwise, an attacker could put a modified command in a subdirectory, and it would be used instead of the real one. That's also why the current directory is usually not in PATH :)
As waf is Python, if you absolutely want that behavior, you can implement it :)
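One way to "implement it" could look like the sketch below. The helper name dirs_containing is hypothetical (it is not part of waf); it only uses os.walk to collect every directory under a root that directly contains the wanted file, so that the resulting list can be handed to find_program through path_list. Keep the security caveat above in mind before searching directories you do not control.

```python
import os


def dirs_containing(filename, root):
    """Return every directory at or below root that directly contains filename."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(dirpath)
    return sorted(matches)


# Inside a wscript this could feed find_program, for example:
# def configure(ctx):
#     ctx.find_program('myexe', var='MYEXE',
#                      path_list=dirs_containing('myexe', 'mydir/here'))
```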
Say, I have two absolute paths. I need to check if the location referring to by one of the paths is a descendant of the other. If true, I need to find out the relative path of the descendant from the ancestor. What's a good way to implement this in Python? Any library that I can benefit from?
os.path.commonprefix() and os.path.relpath() are your friends:
>>> print os.path.commonprefix(['/usr/var/log', '/usr/var/security'])
/usr/var/
>>> print os.path.commonprefix(['/tmp', '/usr/var']) # No common prefix: the root is the common prefix
/
You can thus test whether the common prefix is one of the paths, i.e. if one of the paths is a common ancestor:
paths = […, …, …]
common_prefix = os.path.commonprefix(paths)
if common_prefix in paths:
    …
You can then find the relative paths:
relative_paths = [os.path.relpath(path, common_prefix) for path in paths]
You can even handle more than two paths, with this method, and test whether all the paths are all below one of them.
PS: depending on what your paths look like, you might want to perform some normalization first (this is useful in situations where one does not know whether they always end with '/' or not, or if some of the paths are relative). Relevant functions include os.path.abspath() and os.path.normpath().
PPS: as Peter Briggs mentioned in the comments, the simple approach described above can fail:
>>> os.path.commonprefix(['/usr/var', '/usr/var2/log'])
'/usr/var'
even though /usr/var is not an ancestor directory of /usr/var2/log: commonprefix() compares character by character, not path component by path component. Forcing all paths to end with '/' before calling commonprefix() solves this (specific) problem.
PPPS: as bluenote10 mentioned, adding a slash does not solve the general problem. Here is his followup question: How to circumvent the fallacy of Python's os.path.commonprefix?
PPPPS: starting with Python 3.4, we have pathlib, a module that provides a saner path manipulation environment. I guess that the common prefix of a set of paths can be obtained by getting all the prefixes of each path (with the PurePath.parents property), taking the intersection of all these parent sets, and selecting the longest common prefix.
PPPPPS: Python 3.5 introduced a proper solution to this question: os.path.commonpath(), which returns a valid path.
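To tie this back to the original question, here is a minimal sketch of a descendant check built on os.path.commonpath() and os.path.relpath(). The function name relative_if_descendant is made up for illustration, and the sketch assumes both inputs are absolute paths (commonpath() raises ValueError when absolute and relative paths are mixed).

```python
import os.path


def relative_if_descendant(ancestor, path):
    """Return path relative to ancestor, or None if path is not at or below it."""
    ancestor = os.path.normpath(ancestor)
    path = os.path.normpath(path)
    # commonpath() works on whole components, so '/usr/var' and
    # '/usr/var2/log' share only '/usr'.
    if os.path.commonpath([ancestor, path]) != ancestor:
        return None
    return os.path.relpath(path, ancestor)


print(relative_if_descendant('/usr/var', '/usr/var/log'))
print(relative_if_descendant('/usr/var', '/usr/var2/log'))
```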
os.path.relpath:
Return a relative filepath to path either from the current directory or from an optional start point.
>>> from os.path import relpath
>>> relpath('/usr/var/log/', '/usr/var')
'log'
>>> relpath('/usr/var/log/', '/usr/var/sad/')
'../log'
So, if the first component of the relative path is '..', the first path is not a descendant of the second path.
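A small sketch of that test follows. The helper name relpath_if_inside is hypothetical. It compares the first component with os.pardir rather than using a plain string prefix test, so that a directory literally named '..foo' is not mistaken for a step up. On Windows, relpath() raises ValueError for paths on different drives, which this sketch does not handle.

```python
import os


def relpath_if_inside(path, start):
    """Return path relative to start, or None if path lies outside start."""
    rel = os.path.relpath(path, start)
    first_component = rel.split(os.sep, 1)[0]
    if first_component == os.pardir:
        return None
    return rel


print(relpath_if_inside('/usr/var/log/', '/usr/var'))
print(relpath_if_inside('/usr/var/log/', '/usr/var/sad/'))
```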
In Python3 you can use PurePath.relative_to:
Python 3.5.1 (default, Jan 22 2016, 08:54:32)
>>> from pathlib import Path
>>> Path('/usr/var/log').relative_to('/usr/var/log/')
PosixPath('.')
>>> Path('/usr/var/log').relative_to('/usr/var/')
PosixPath('log')
>>> Path('/usr/var/log').relative_to('/etc/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pathlib.py", line 851, in relative_to
    .format(str(self), str(formatted)))
ValueError: '/usr/var/log' does not start with '/etc'
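Since relative_to raises ValueError when the path is not below the other one, the check and the computation can be combined in a single try/except. This is only a sketch: relative_or_none is a made-up name, and PurePosixPath is used instead of Path so that the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def relative_or_none(path, ancestor):
    """Return path relative to ancestor as a PurePosixPath, or None."""
    try:
        return PurePosixPath(path).relative_to(ancestor)
    except ValueError:
        # path is not located at or below ancestor
        return None


print(relative_or_none('/usr/var/log', '/usr/var'))
print(relative_or_none('/usr/var/log', '/etc'))
```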
A write-up of jme's suggestion, using pathlib, in Python 3.
from pathlib import Path
parent = Path(r'/a/b')
son = Path(r'/a/b/c/d')
if parent in son.parents or parent == son:
    print(son.relative_to(parent))  # prints c/d
Another option is
>>> print os.path.relpath('/usr/var/log/', '/usr/var')
log
Pure Python2 w/o dep:
import os
import sys

def relpath(cwd, path):
    """Create a relative path for path from cwd, if possible"""
    if sys.platform == "win32":
        # Windows paths are case-insensitive, so compare in lower case
        cwd = cwd.lower()
        path = path.lower()
    _cwd = os.path.abspath(cwd).split(os.path.sep)
    _path = os.path.abspath(path).split(os.path.sep)
    # index of the last leading component shared by both paths
    eq_until_pos = None
    for i in xrange(min(len(_cwd), len(_path))):
        if _cwd[i] == _path[i]:
            eq_until_pos = i
        else:
            break
    if eq_until_pos is None:
        return path
    # one ".." for every remaining component of cwd, then the rest of path
    newpath = [".." for i in xrange(len(_cwd[eq_until_pos+1:]))]
    newpath.extend(_path[eq_until_pos+1:])
    return os.path.join(*newpath) if newpath else "."
Edit: See jme's answer for the best way with Python3.
Using pathlib, you have the following solution:
Let's say we want to check if son is a descendant of parent, and both are Path objects.
We can get a list of the parts in the path with list(parent.parts).
Then, we just check that the beginning of son's list is equal to the list of segments of parent.
>>> lparent = list(parent.parts)
>>> lson = list(son.parts)
>>> if lson[:len(lparent)] == lparent:
...     pass  # parent is an ancestor of son (or the same path) :)
If you want to get the remaining part, you can just do
>>> '/'.join(lson[len(lparent):])
It's a string, but you can of course pass it to the constructor of another Path object.
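The component-wise comparison described above can be wrapped up as follows. This is a sketch under my own naming (remainder_below is hypothetical), using PurePosixPath so it runs identically everywhere. Because it compares whole components, '/usr/var' is not treated as an ancestor of '/usr/var2/log'.

```python
from pathlib import PurePosixPath


def remainder_below(parent, son):
    """Return the part of son below parent as a PurePosixPath, or None."""
    parent_parts = PurePosixPath(parent).parts
    son_parts = PurePosixPath(son).parts
    if son_parts[:len(parent_parts)] != parent_parts:
        return None
    # Rebuild a path from the leftover components (empty means '.').
    return PurePosixPath(*son_parts[len(parent_parts):])


print(remainder_below('/a/b', '/a/b/c/d'))
```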
I'm learning Python and I noticed something strange with one of my scripts. Doing a little testing I discovered the problem stemmed from this behavior:
>>> import os
>>> os.path.join('a','b')
'a/b'
>>> os.path.join('a','/b')
'/b'
Checking the documentation, this is, in fact, the design of the function:
os.path.join(path1[, path2[, ...]])
Join one or more path components intelligently. If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues. ...
My question isn't why my script failed, but rather why the function was designed this way. I mean, on Unix at least, a//b is a perfectly acceptable way to designate a path, if not elegant. Why was the function designed this way? Is there any way to tell if one or more path elements have been discarded short of testing each path string with os.path.isabs()?
Out of curiosity, I also checked the case where a path component ends in an os.sep character:
>>> os.path.join('a/','b')
'a/b'
That works as expected.
One case where it is useful for os.path.join('a', '/b') to return /b would be if you ask a user for a filename.
The user can enter either a path relative to the current directory, or a full path, and your program could handle both cases like this:
os.path.join(os.getcwd(), filename)
In [54]: os.getcwd()
Out[54]: '/tmp'
In [55]: os.path.join(os.getcwd(), 'foo')
Out[55]: '/tmp/foo'
In [56]: os.path.join(os.getcwd(), '/foo/bar')
Out[56]: '/foo/bar'
Suppose you're writing a utility like cd and need to work out the new directory; you would use
os.path.join(currdir, newdir)
If the user enters /b, you would expect it to throw away the first argument. This holds for plenty of things that use the current directory.
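If silently discarding earlier components is not what you want, one option is to wrap the join in a check. The sketch below is an assumption of mine rather than anything in the standard library: strict_join is a hypothetical helper that refuses absolute components instead of letting them replace the base. It uses posixpath so the behaviour matches the Unix examples above on any platform.

```python
import posixpath


def strict_join(base, *parts):
    """Join like posixpath.join, but refuse components that would discard base."""
    for part in parts:
        if posixpath.isabs(part):
            raise ValueError(
                'absolute component would discard earlier parts: %r' % (part,))
    return posixpath.join(base, *parts)


print(strict_join('a', 'b'))
```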
I would quite like a set of filename components that will give me consistent and "nice-looking" filenames on both Windows and Cygwin. Here's what I've tried:
   Input                                Windows       Cygwin
1  os.path.join('c:', 'foo', 'bar')     c:foo\bar     c:/foo/bar
2  os.path.join('c:\\', 'foo', 'bar')   c:\foo\bar    c:\/foo/bar
3  os.path.join('c:/', 'foo', 'bar')    c:/foo\bar    c:/foo/bar
1 isn't what I want on Windows: I do want an absolute path, not one relative to the current directory.
2 and 3 both work, but are not (I hereby define) "nice-looking" since they mix up forward and backward slashes on one platform or the other. My error and logging messages will be more readable if I can avoid this.
Option 4 is to define myself a driveroot variable equal to c:\ on Windows and /cygdrive/c on Cygwin. Or a function taking a drive letter and returning same. But I'd also prefer to avoid per-platform special cases between these two.
Can I have everything I want (join identical path components, to give a result that refers to the same absolute path on both platforms, and doesn't mix path separators on either platform)? Or do I have to compromise somewhere?
[Edit: in case it helps, the main use case is that c:\foo is a path that I know about at configuration time, whereas bar (and further components) are computed later. So my actual code currently looks a bit more like this:
dir = os.path.join('c:\\', 'foo')
# some time later
os.path.join(dir, 'bar')
That's using option 2, which results in "nice" reporting of filenames in Windows, but "not nice" reporting of filenames in Cygwin. What I want to avoid, if it's possible, is:
if this_is_cygwin():
    dir = '/cygdrive/c/foo'
else:
    dir = 'c:\\foo'
]
Edit: David pointed out that my understanding of 'current directory' on Windows is wrong. Sorry. However, try using os.path.abspath to get nice pretty paths, or os.path.normpath if you don't need absolute paths.
In method 1,
os.path.join('c:', 'foo', 'bar')
The output, c:foo\bar, is correct. This means the path foo\bar below the current directory on c:, which is not the same thing as the root directory. os.path.join is doing exactly what you tell it to, which is to take c: (the current directory on drive c) and add some extra bits to it.
In method 2,
os.path.join('c:\\', 'foo', 'bar')
The output for Cygwin is correct, because in Cygwin, c:\ is not the root of a drive.
What you want is the os.path.abspath function. This will give you the absolute, normalized version of the path that you give to it.
However: You won't get the same string on Cygwin and Windows. I hope you're not looking for that.
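The two columns of the table in the question can be reproduced on a single machine, which makes the difference easier to see. The sketch below assumes that Windows Python's os.path is ntpath and that Cygwin Python's os.path is posixpath; both modules can be imported directly on any platform.

```python
import ntpath      # os.path as seen by native Windows Python
import posixpath   # os.path as seen by a POSIX-style Python (assumed for Cygwin)

for parts in (('c:', 'foo', 'bar'),
              ('c:\\', 'foo', 'bar'),
              ('c:/', 'foo', 'bar')):
    # Same components, joined under each set of path rules.
    print(parts, ntpath.join(*parts), posixpath.join(*parts))
```

Under POSIX rules 'c:' is just an ordinary relative name, which is why no choice of first component gives an absolute, single-separator result on both sides.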
At work, I have to deal with pretty random combinations of Windows without Cygwin, Cygwin with non-Cygwin Python, Cygwin with Cygwin Python, and Unix. I haven't found a way of coping with this stuff that I'm particularly proud of, but the least hateful approach I've found so far is to always use what Cygwin calls a "mixed style" path on Windows. That's a Windows style path, but with forward slashes instead of backslashes. E.g., c:/foo/bar.txt. It also avoids a lot of gotchas, such as Cygwin bash shells seeing "\" as an escape character. Sadly, this means missing out on a lot of Python's built-in path manipulation utilities and doing things the hard way.
I don't have access to a machine with Python and Cygwin on it ATM, so I can't test the code snippets below. I apologize for any errors...
#Combine a path
path = '/'.join([ 'c:', 'foo', 'bar'])
#Split it back apart
pieces = path.split('/')
When in doubt, try to call Cygwin's cygpath utility. A lot of weird edge cases pop up in Cygwin, such as the fact that /cygdrive/d/ == d:\ and yet /cygdrive/d/../../ == c:\cygwin (or wherever you have Cygwin installed). Also remember that backslashes are used as escape characters in Unix style paths, such as /cygdrive/c/Documents\ and\ Settings. Cygpath does an astonishing job of taking care of these, and if it's not available it's generally safe to assume that your weird edge cases don't exist.
import sys
import subprocess
#Buncha code here ...
#We got somepath from somewhere, and don't know what format it's in.
try:
    #check_output returns bytes ending in a newline, so decode and strip.
    mypath = subprocess.check_output(['cygpath', '-m', somepath]).decode().strip()
except (OSError, subprocess.CalledProcessError):
    #OSError covers the case where cygpath isn't installed at all.
    #Cheap attempt at coping with the possibility that we're in Windows, but cygpath isn't available.
    if sys.platform.startswith('win32'):
        mypath = somepath.replace('\\', '/')
    else:
        mypath = somepath
#Now we can assume mypath is using forward slashes for delimiters.
Some Windows commands get confused if you pass them a mixed style path, and some Cygwin commands get confused if you pass in a Windows or a mixed style path. For instance, rsync can get confused by "c:/foo/bar.txt" because that looks like you're trying to specify "/foo/bar.txt" on a remote computer named "c". When you're calling one of these finicky Windows or Cygwin programs, use cygpath to make it happy. If you're calling a finicky Windows program and cygpath isn't available, try the crude "winpath = mypath.replace('/', '\\')" approach. I think that can fail if you're converting a Unix style path and don't have Cygwin available, but hopefully if you don't have Cygwin available you don't have any Unix style paths on Windows to start with...
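For completeness, the two string-replacement fallbacks mentioned above can be kept in one place. These helper names are hypothetical, and the functions only swap separators: unlike cygpath, they do not translate /cygdrive/... prefixes or handle escaped spaces, so treat them as a last resort.

```python
def to_mixed_style(path):
    """Windows path with forward slashes, e.g. c:/foo/bar.txt."""
    return path.replace('\\', '/')


def to_windows_style(path):
    """Mixed style path converted back to backslashes, e.g. c:\\foo\\bar.txt."""
    return path.replace('/', '\\')


print(to_mixed_style('c:\\foo\\bar.txt'))
print(to_windows_style('c:/foo/bar.txt'))
```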