Get relative path from comparing two absolute paths - python

Say I have two absolute paths. I need to check whether the location referred to by one of the paths is a descendant of the other. If it is, I need to find the relative path of the descendant from the ancestor. What's a good way to implement this in Python? Is there a library I can benefit from?

os.path.commonprefix() and os.path.relpath() are your friends:
>>> import os
>>> os.path.commonprefix(['/usr/var/log', '/usr/var/security'])
'/usr/var/'
>>> os.path.commonprefix(['/tmp', '/usr/var'])  # No common directory: the root is the common prefix
'/'
You can thus test whether the common prefix is one of the paths, i.e. if one of the paths is a common ancestor:
paths = […, …, …]
common_prefix = os.path.commonprefix(paths)
if common_prefix in paths:
    …
You can then find the relative paths:
relative_paths = [os.path.relpath(path, common_prefix) for path in paths]
With this method you can even handle more than two paths, and test whether all the paths are below one of them.
PS: depending on what your paths look like, you might want to perform some normalization first (this is useful when you do not know whether they always end with '/' or not, or if some of the paths are relative). Relevant functions include os.path.abspath() and os.path.normpath().
PPS: as Peter Briggs mentioned in the comments, the simple approach described above can fail:
>>> os.path.commonprefix(['/usr/var', '/usr/var2/log'])
'/usr/var'
even though /usr/var is not a common ancestor directory of the paths (commonprefix() compares strings character by character, not path components). Forcing all paths to end with '/' before calling commonprefix() solves this (specific) problem.
PPPS: as bluenote10 mentioned, adding a slash does not solve the general problem. Here is his followup question: How to circumvent the fallacy of Python's os.path.commonprefix?
PPPPS: starting with Python 3.4, we have pathlib, a module that provides a saner path manipulation environment. I guess that the common prefix of a set of paths can be obtained by getting all the prefixes of each path (with PurePath.parents), taking the intersection of all these parent sets, and selecting the longest common prefix.
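A minimal sketch of that parents-intersection idea, in case it helps. The function name common_ancestor is made up for illustration, and PurePosixPath is used only so the example behaves the same on any OS:

```python
from pathlib import PurePosixPath

def common_ancestor(paths):
    """Return the deepest directory shared by all paths (hypothetical helper)."""
    paths = [PurePosixPath(p) for p in paths]
    # Candidate ancestors of each path: the path itself plus all of its parents.
    candidate_sets = [set(p.parents) | {p} for p in paths]
    shared = set.intersection(*candidate_sets)
    if not shared:
        return None
    # Shared ancestors form a chain, so the one with the most parts is the deepest.
    return max(shared, key=lambda p: len(p.parts))

print(common_ancestor(['/usr/var', '/usr/var2/log']))          # /usr
print(common_ancestor(['/usr/var/log', '/usr/var/security']))  # /usr/var
```

Because the comparison is done on whole path components, /usr/var and /usr/var2 are not confused the way they are with commonprefix().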
PPPPPS: Python 3.5 introduced a proper solution to this question: os.path.commonpath(), which returns a valid path.
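Here is a small sketch of how os.path.commonpath() could answer the original question for two paths. The helper name relative_if_descendant is my own, and the outputs shown assume a POSIX system:

```python
import os.path

def relative_if_descendant(ancestor, descendant):
    """Return descendant relative to ancestor, or None if it is not below it."""
    # Normalize first so trailing slashes and relative inputs do not matter.
    ancestor = os.path.abspath(ancestor)
    descendant = os.path.abspath(descendant)
    # commonpath() works on path components, unlike commonprefix().
    if os.path.commonpath([ancestor, descendant]) != ancestor:
        return None
    return os.path.relpath(descendant, ancestor)

print(relative_if_descendant('/usr/var', '/usr/var/log'))   # log
print(relative_if_descendant('/usr/var', '/usr/var2/log'))  # None
```

A path is treated as a descendant of itself here, in which case the result is '.'.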

os.path.relpath:
Return a relative filepath to path either from the current directory or from an optional start point.
>>> from os.path import relpath
>>> relpath('/usr/var/log/', '/usr/var')
'log'
>>> relpath('/usr/var/log/', '/usr/var/sad/')
'../log'
So, if the relative path starts with '..' (that is, its first component is '..'), it means that the first path is not a descendant of the second path.
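A hedged sketch of that check. The function name is_descendant is hypothetical; the test looks at the first component rather than the raw string, so that a directory whose name merely begins with two dots (such as '..log') is not misclassified:

```python
import os

def is_descendant(path, start):
    """True if path is start itself or lies somewhere below it."""
    rel = os.path.relpath(path, start)
    # Outside of start means the result is '..' or begins with '../'.
    return rel != os.pardir and not rel.startswith(os.pardir + os.sep)

print(is_descendant('/usr/var/log', '/usr/var'))      # True
print(is_descendant('/usr/var/log', '/usr/var/sad'))  # False
```

Note that relpath() works purely on the strings; it does not resolve symlinks.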
In Python3 you can use PurePath.relative_to:
Python 3.5.1 (default, Jan 22 2016, 08:54:32)
>>> from pathlib import Path
>>> Path('/usr/var/log').relative_to('/usr/var/log/')
PosixPath('.')
>>> Path('/usr/var/log').relative_to('/usr/var/')
PosixPath('log')
>>> Path('/usr/var/log').relative_to('/etc/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/pathlib.py", line 851, in relative_to
    .format(str(self), str(formatted)))
ValueError: '/usr/var/log' does not start with '/etc'
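Since relative_to() raises ValueError when the path is not below the other one, one way to use it is to catch the exception. This is only a sketch; relative_or_none is a made-up name, and PurePosixPath is used so the result does not depend on the OS:

```python
from pathlib import PurePosixPath

def relative_or_none(path, ancestor):
    """Return path relative to ancestor, or None if path is not below ancestor."""
    try:
        return PurePosixPath(path).relative_to(ancestor)
    except ValueError:
        return None

print(relative_or_none('/usr/var/log', '/usr/var'))   # log
print(relative_or_none('/usr/var/log', '/etc'))       # None
print(relative_or_none('/usr/var2/log', '/usr/var'))  # None
```

The last call shows that the comparison is made component by component, so /usr/var2 is not mistaken for something under /usr/var.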

A write-up of jme's suggestion, using pathlib, in Python 3.
from pathlib import Path

parent = Path(r'/a/b')
son = Path(r'/a/b/c/d')

if parent in son.parents or parent == son:
    print(son.relative_to(parent))  # relative_to() returns a Path equivalent to 'c/d'

Another option is
>>> print(os.path.relpath('/usr/var/log/', '/usr/var'))
log

Pure Python 2, without dependencies:
import os
import sys

def relpath(cwd, path):
    """Create a relative path for path from cwd, if possible"""
    if sys.platform == "win32":
        # Windows paths are case-insensitive, so compare them lowercased.
        cwd = cwd.lower()
        path = path.lower()
    _cwd = os.path.abspath(cwd).split(os.path.sep)
    _path = os.path.abspath(path).split(os.path.sep)
    # Index of the last leading component shared by both paths.
    eq_until_pos = None
    for i in xrange(min(len(_cwd), len(_path))):
        if _cwd[i] == _path[i]:
            eq_until_pos = i
        else:
            break
    if eq_until_pos is None:
        return path
    # Climb out of the unshared part of cwd, then descend into path.
    newpath = [".." for i in xrange(len(_cwd[eq_until_pos+1:]))]
    newpath.extend(_path[eq_until_pos+1:])
    return os.path.join(*newpath) if newpath else "."

Edit : See jme's answer for the best way with Python3.
Using pathlib, you have the following solution :
Let's say we want to check if son is a descendant of parent, and both are Path objects.
We can get a list of the parts in the path with list(parent.parts).
Then, we just check that the beginning of the son's list is equal to the list of segments of the parent.
>>> lparent = list(parent.parts)
>>> lson = list(son.parts)
>>> if lson[:len(lparent)] == lparent:
...     ...  # parent is an ancestor of son (or the same path) :)
If you want to get the remaining part, you can just do
>>> '/'.join(lson[len(lparent):])
It's a string, but you can of course pass it to the constructor of another Path object.
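Put together as a function, the parts comparison might look like the sketch below. The name remainder is hypothetical, and the leftover parts are passed straight to the path constructor instead of being joined into a string first:

```python
from pathlib import PurePosixPath

def remainder(parent, son):
    """Return the part of son below parent, or None if parent is not an ancestor."""
    lparent = list(PurePosixPath(parent).parts)
    lson = list(PurePosixPath(son).parts)
    if lson[:len(lparent)] != lparent:
        return None
    # Rebuild a path from the leftover components.
    return PurePosixPath(*lson[len(lparent):])

print(remainder('/a/b', '/a/b/c/d'))  # c/d
print(remainder('/a/b', '/a/bc/d'))   # None
```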

Related

What's the use case for pythons os.path.join dropping all arguments before when one is an absolute path? [duplicate]

I'm learning Python and I noticed something strange with one of my scripts. Doing a little testing I discovered the problem stemmed from this behavior:
>>> import os
>>> os.path.join('a','b')
'a/b'
>>> os.path.join('a','/b')
'/b'
Checking the documentation, this is, in fact, the design of the function:
os.path.join(path1[, path2[, ...]])
Join one or more path components intelligently. If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away, and joining continues. ...
My question isn't why my script failed, but rather why the function was designed this way. I mean, on Unix at least, a//b is a perfectly acceptable way to designate a path, if not elegant. Why was the function designed this way? Is there any way to tell if one or more path elements have been discarded short of testing each path string with os.path.isabs()?
Out of curiosity, I also checked the case where a path component ends in an os.sep character:
>>> os.path.join('a/','b')
'a/b'
That works as expected.
One case where it is useful for os.path.join('a', '/b') to return /b would be if you ask a user for a filename.
The user can enter either a path relative to the current directory, or a full path, and your program could handle both cases like this:
os.path.join(os.getcwd(), filename)
In [54]: os.getcwd()
Out[54]: '/tmp'
In [55]: os.path.join(os.getcwd(), 'foo')
Out[55]: '/tmp/foo'
In [56]: os.path.join(os.getcwd(), '/foo/bar')
Out[56]: '/foo/bar'
Suppose you're writing a utility like cd and need to compute the new directory; you would use
os.path.join(currdir, newdir)
If the user enters /b, you would expect it to throw away the first argument. This holds for plenty of things that use the current directory.
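If silently dropping components is not what you want, one option is a thin wrapper that refuses absolute components. This is just a sketch: join_strict is a made-up name, and it still relies on os.path.isabs() under the hood:

```python
import os.path

def join_strict(base, *parts):
    """Join like os.path.join(), but refuse parts that would discard base."""
    for part in parts:
        if os.path.isabs(part):
            raise ValueError("absolute component would discard earlier ones: %r" % (part,))
    return os.path.join(base, *parts)

print(join_strict('a', 'b'))  # a/b on POSIX
try:
    join_strict('a', '/b')
except ValueError as exc:
    print("rejected:", exc)
```

For the user-supplied filename case described above you would keep the plain os.path.join(), since there the dropping behavior is exactly what you want.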

Why does the python pathlib Path('').exists() return True?

I was expecting Path('') to be a path that does not exist because it does not correspond to a file or directory name. Why is this considered to exist?
from pathlib import Path
print(Path('').exists())
I assume there is an advantage gained by defining the Path('') to be the same as Path('.'). In what case is there an advantage?
As others said, it resolves to the current path and therefore exists, but here's why:
pathlib.Path is actually a subclass of pathlib.PurePath, which assumes the current directory when the pathsegments argument is empty (equivalent to '').
You can prove that empirically like this,
>>> from pathlib import PurePath
>>> print(PurePath())
.
I assume there is an advantage gained by defining the Path('') to be the same as Path('.').
Correct. Even though I'm not the creator of that lib, I assume this is for syntax and logical reasons. Indeed, people often want to refer to the current directory to compute something dynamically. Therefore, for the same reason . points to the current directory, the lib creator probably wanted to let you write something like this,
>>> p = Path() # or possibly Path('.')
>>> [x for x in p.iterdir() if x.is_dir()]
that would list subdirectories.
Basically, see this as a default. It was logical that the default path returned by Path() was the current directory. Thus, logically, an empty string value should have the same behavior.
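A quick way to see the equivalence for yourself (a small sketch; nothing here goes beyond the standard pathlib API):

```python
from pathlib import Path, PurePath

# An empty string, no argument, and '.' all produce the same pure path.
print(PurePath('') == PurePath('.'))  # True
print(PurePath() == PurePath('.'))    # True
print(str(PurePath('')))              # .
print(PurePath('').parts)             # ()

# The concrete path therefore refers to the current directory, which exists.
print(Path('').exists())
```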
If you try touch you get:
$ touch ""
touch: cannot touch '': No such file or directory
but if you peek inside, the story is different:
$ strace -e file touch ""
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=122530, ...}, AT_EMPTY_PATH) = 0
openat(AT_FDCWD, "", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOENT (No such file or directory)
So you can stat "" because it's the CWD, but you can't open it as a file because no such file exists. Indeed:
$ strace -e file ipython3
In [1]: import pathlib
In [2]: p = pathlib.Path()
In [3]: p.exists()
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|0755, st_size=20480, ...}, 0) = 0
Out[3]: True
So this is not an assumption made by Python's pathlib module; the behavior goes all the way down to the C library and the kernel.
Slightly off topic: I want to have a Path whose boolean value is False. Seems to be not possible. I'm doing this inside argparse with type=Path. I suspect that OP wanted something similar.
I ended up using one of two options, neither as "elegant" as it would be if I could simply test "if the_possibly_false_Path:..."
1. Set the default (which I wish tested False) to something like '%%', which causes argparse to create a Path with the name "%%" that I can then test for.
2. Leave the result as a default type, set the default to something that tests False, and then call the Path constructor if the value isn't false.
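A variation on the second option, sketched under my own assumptions: keep type=Path but use None as the default, since argparse only applies type to string defaults. The option name --out is hypothetical:

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
# None is not a string, so argparse leaves it alone instead of calling Path(None).
parser.add_argument('--out', type=Path, default=None)

args = parser.parse_args([])  # no --out given
if args.out is None:
    print("no path supplied")

args = parser.parse_args(['--out', 'results'])
print(repr(args.out))
```

Testing with "is None" avoids relying on the truthiness of a Path object at all.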

How can I check how much of a path exists in python

Let's say on my filesystem the following directory exists:
/foo/bar/
In my python code I have the following path:
/foo/bar/baz/quix/
How can I tell that only the /foo/bar/ part of the path exists?
I can walk the path recursively and check it step by step, but is there an easier way?
No easy function in the standard lib but not really a difficult one to make yourself.
Here's a function that takes a path and returns only the path that does exist.
In [129]: def exists(path):
     ...:     if os.path.exists(path): return path
     ...:     return exists(os.path.split(path)[0])
     ...:
In [130]: exists("/home/sevanteri/src/mutta/oisko/siellä/jotain/mitä/ei ole/")
Out[130]: '/home/sevanteri/src'
I think a simple while loop with os.path.dirname() will satisfy the requirement:
import os

path_string = '/home/moin/Desktop/my/dummy/path'
while path_string:
    if not os.path.exists(path_string):
        path_string = os.path.dirname(path_string)
    else:
        break
# path_string = '/home/moin/Desktop' # which is valid path in my system
I don't quite understand your requirements: whether you want every path component to be checked, or only up to some specific level. But for simple sanity checks you can just iterate through the full path, build up each intermediate path, and check that it exists.
import os
sample_path = '/foo/bar/baz/quix/'  # the path from the question
_path = '/'
for i in filter(lambda s: s, sample_path.split('/')):
    _path = os.path.join(_path, i)
    if os.path.exists(_path):
        print("correct path")
Well, I think the only way is to work recursively... Though, I would work up the directory tree. The code isn't too hard to implement:
import os

def doesItExist(directory):
    if not os.path.exists(directory):
        # Walk up one level and return whatever the recursive call finds.
        return doesItExist(os.path.dirname(directory))
    else:
        print("Found: " + directory)
        return directory
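For Python 3, the same walk up the tree can be written with pathlib. This is only a sketch and existing_prefix is a made-up name:

```python
from pathlib import Path

def existing_prefix(path):
    """Return the longest leading part of path that exists, or None."""
    path = Path(path)
    # Path.parents yields the ancestors from nearest to farthest.
    for candidate in [path, *path.parents]:
        if candidate.exists():
            return candidate
    return None

print(existing_prefix('/foo/bar/baz/quix/'))
```

For an absolute path the loop always ends at the filesystem root at the latest, so None is only possible for relative paths.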

Find a path in Windows relative to another

This problem should be a no-brainer, but I haven't yet been able to nail it.
I need a function that takes two parameters, each a file path, relative or absolute, and returns a filepath which is the first path (target) resolved relative to the second path (start). The resolved path may be relative to the current directory or may be absolute (I don't care).
Here is an attempted implementation, complete with several doctests, that exercises some sample use cases (and demonstrates where it fails). A runnable script is also available on my source code repository, but it may change. The runnable script will run the doctest if no parameters are supplied, or will pass one or two parameters to findpath if supplied.
import os

def findpath(target, start=os.path.curdir):
    r"""
    Find a path from start to target where target is relative to start.

    >>> orig_wd = os.getcwd()
    >>> os.chdir('c:\\windows') # so we know what the working directory is

    >>> findpath('d:\\')
    'd:\\'
    >>> findpath('d:\\', 'c:\\windows')
    'd:\\'
    >>> findpath('\\bar', 'd:\\')
    'd:\\bar'
    >>> findpath('\\bar', 'd:\\foo') # fails with '\\bar'
    'd:\\bar'
    >>> findpath('bar', 'd:\\foo')
    'd:\\foo\\bar'
    >>> findpath('bar\\baz', 'd:\\foo')
    'd:\\foo\\bar\\baz'
    >>> findpath('\\baz', 'd:\\foo\\bar') # fails with '\\baz'
    'd:\\baz'

    Since we're on the C drive, findpath may be allowed to return
    relative paths for targets on the same drive. I use abspath to
    confirm that the ultimate target is what we expect.

    >>> os.path.abspath(findpath('\\bar'))
    'c:\\bar'
    >>> os.path.abspath(findpath('bar'))
    'c:\\windows\\bar'
    >>> findpath('..', 'd:\\foo\\bar')
    'd:\\foo'
    >>> findpath('..\\bar', 'd:\\foo')
    'd:\\bar'

    The parent of the root directory is the root directory.

    >>> findpath('..', 'd:\\')
    'd:\\'

    restore the original working directory

    >>> os.chdir(orig_wd)
    """
    return os.path.normpath(os.path.join(start, target))
As you can see from the comments in the doctest, this implementation fails when the start specifies a drive letter and the target is relative to the root of the drive.
This brings up a few questions
Is this behavior a limitation of os.path.join? In other words, should os.path.join('d:\foo', '\bar') resolve to 'd:\bar'? As a Windows user, I tend to think so, but I hate to think that a mature function like path.join would need alteration to handle this use case.
Is there an example of an existing target path resolver such as findpath that will work in all of these test cases?
If 'no' to the above questions, how would you implement this desired behavior?
I agree with you: this seems like a deficiency in os.path.join. Looks like you have to deal with the drives separately. This code passes all your tests:
def findpath(target, start=os.path.curdir):
    sdrive, start = os.path.splitdrive(start)
    tdrive, target = os.path.splitdrive(target)
    rdrive = tdrive or sdrive
    return os.path.normpath(os.path.join(rdrive, os.path.join(start, target)))
(and yes, I had to nest two os.path.join's to get it to work...)
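To try the drive-splitting approach on a non-Windows machine, here is a sketch that swaps os.path for the ntpath module (the Windows flavor of os.path, importable on any platform). That swap is my own substitution, not part of the answer above:

```python
import ntpath

def findpath(target, start=ntpath.curdir):
    # Split off the drive letters so they can be handled separately.
    sdrive, start = ntpath.splitdrive(start)
    tdrive, target = ntpath.splitdrive(target)
    # Prefer the target's drive; fall back to the start's drive.
    rdrive = tdrive or sdrive
    return ntpath.normpath(ntpath.join(rdrive, ntpath.join(start, target)))

print(findpath('\\bar', 'd:\\foo'))        # d:\bar
print(findpath('bar', 'd:\\foo'))          # d:\foo\bar
print(findpath('..', 'd:\\foo\\bar'))      # d:\foo
print(findpath('d:\\', 'c:\\windows'))     # d:\

# As far as I can tell, join() in current Python 3 releases already keeps
# the drive of the earlier component when a later one is rooted:
print(ntpath.join('d:\\foo', '\\bar'))     # d:\bar
```

If that last line holds on your Python version, the question's one-line normpath(join(start, target)) version may already pass the rooted-target cases there.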
