normalize non-existing path using pathlib only - python

python has recently added the pathlib module (which i like a lot!).
there is just one thing i'm struggling with: is it possible to normalize a path to a file or directory that does not exist? i can do that perfectly well with os.path.normpath. but wouldn't it be absurd to have to use something other than the library that should take care of path related stuff?
the functionality i would like to have is this:
from os.path import normpath
from pathlib import Path
pth = Path('/tmp/some_directory/../i_do_not_exist.txt')
pth = Path(normpath(str(pth)))
# -> /tmp/i_do_not_exist.txt
but without having to resort to os.path and without having to type-cast to str and back to Path. also pth.resolve() does not work for non-existing files.
is there a simple way to do that with just pathlib?

is it possible to normalize a path to a file or directory that does not exist?
Starting from 3.6, it's the default behavior. See https://docs.python.org/3.6/library/pathlib.html#pathlib.Path.resolve
Path.resolve(strict=False)
...
If strict is False, the path is resolved as far as possible and any remainder is appended without checking whether it exists

As of Python 3.5: No, there's not.
PEP 0428 states:
Path resolution
The resolve() method makes a path absolute, resolving
any symlink on the way (like the POSIX realpath() call). It is the
only operation which will remove " .. " path components. On Windows,
this method will also take care to return the canonical path (with the
right casing).
Since resolve() is the only operation to remove the ".." components, and it fails when the file doesn't exist, there won't be a simple means using just pathlib.
Also, the pathlib documentation gives a hint as to why:
Spurious slashes and single dots are collapsed, but double dots ('..')
are not, since this would change the meaning of a path in the face of
symbolic links:
PurePath('foo//bar') produces PurePosixPath('foo/bar')
PurePath('foo/./bar') produces PurePosixPath('foo/bar')
PurePath('foo/../bar') produces PurePosixPath('foo/../bar')
(a naïve approach would make PurePosixPath('foo/../bar') equivalent to PurePosixPath('bar'), which is wrong if foo is a symbolic link to another directory)
All that said, you could create a 0 byte file at the location of your path, and then it'd be possible to resolve the path (thus eliminating the ..). I'm not sure that's any simpler than your normpath approach, though.

If this fits you usecase(e.g. ifle's directory already exists) you might try to resolve path's parent and then re-append file name, e.g.:
from pathlib import Path
p = Path()/'hello.there'
print(p.parent.resolve()/p.name)

Old question, but here is another solution in particular if you want POSIX paths across the board (like nix paths on Windows too).
I found pathlib resolve() to be broken as of Python 3.10, and this method is not currently exposed by PurePosixPath.
What I found worked was to use posixpath.normpath(). Also found PurePosixPath.joinpath() to be broken. I.E. It will not join ".." with "myfile.txt" as expected. It will return just "myfile.txt". But posixpath.join() works perfectly; will return "../myfile.txt".
Note this is in path strings, but easily back to pathlib.Path(my_posix_path) et al for an OOP container.
And easily transpose to Windows platform paths too by just constructing this way, as the module takes care of the platform independence for you.
Might be the solution for others with Python file path woes..

Related

Python - How to write a windows path to a json file?

I am working together with a colleague and he has Ubuntu while I have windows. We have a dataset of json files which have in them a "path" written. His paths look like this:
'C:/Users/krock/Desktop/FIIT/BP/Ubuntu/luadb/etc/luarocks_test/modules/30log/share/lua/5.3/30log.lua'
But this doesn't work on Windows, I was trying to do
some_string.replace('/', '\\')
But this results in strings written in json that look like this:
'C:\\Users\\krock\\Desktop\\FIIT\\BP\\Ubuntu\\luadb\\etc\\luarocks_test\\data_all'
On my windows machine, I can't read (the program) these paths as it give an error:
No such file or directory
Is there a solution to this?
EDIT: I tried using Path from pathlib, but I got another error saying:
TypeError: Object of type WindowsPath is not JSON serializable
I found the solution to this is to do str(Path(path_string)), but the result is again the path in double quotes.
Yes, the solution is to use Python's built in pathlib. Also, using string literals might help the clarity of your program.
https://docs.python.org/3/library/pathlib.html
This question is missing code samples, so can't be more specific, but generally speaking, doing this manually is error-prone. Consider using a library, such as pathlib. E.G:
>>> from pathlib import Path
>>> Path('luarocks_test/modules/30log/share/lua/5.3/30log.lua')
PosixPath('luarocks_test/modules/30log/share/lua/5.3/30log.lua')
On Windows, instantiating a Path would give you a WindowsPath. You'll also want to use relative, rather than absolute references, as the paths will be different on your workstations.

os.path.basename() is inconsistent and I'm not sure why

While creating a program that backs up my files, I found that os.path.basename() was not working consistently. For example:
import os
folder = '\\\\server\\studies\\backup\\backup_files'
os.path.basename(folder)
returns 'backup_files'
folder = '\\\\server\\studies'
os.path.basename(folder)
returns ''
I want that second basename function to return 'studies' but it returns an empty string. I ran os.path.split(folder) to see how it's splitting the string and it turns out it's considering the entire path to be the directory, i.e. ('\\\\server\\studies', ' ').
I can't figure out how to get around it.. The weirdest thing is I ran the same line earlier and it worked, but it won't anymore! Does it have something to do with the very first part being a shared folder on the network drive?
that looks like a Windows UNC specificity
UNC paths can be seen as equivalent of unix path, only with double backslashes at the start.
A workaround would be to use classical rsplit:
>>> r"\\server\studies".rsplit(os.sep,1)[-1]
'studies'
Fun fact: with 3 paths it works properly:
>>> os.path.basename(r"\\a\b\c")
'c'
Now why this? let's check the source code of ntpath on windows:
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
okay now split
def split(p):
seps = _get_bothseps(p)
d, p = splitdrive(p)
now splitdrive
def splitdrive(p):
"""Split a pathname into drive/UNC sharepoint and relative path specifiers.
Returns a 2-tuple (drive_or_unc, path); either part may be empty.
Just reading the documentation makes us understand what's going on.
A Windows sharepoint has to contain 2 path parts:
\\server\shareroot
So \\server\studies is seen as the drive, and the path is empty. Doesn't happen when there are 3 parts in the path.
Note that it's not a bug, since it's not possible to use \\server like a normal directory, create dirs below, etc...
Note that the official documentation for os.path.basename doesn't mention that (because os.path calls ntpath behind the scenes) but it states:
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program
That last emphasised part at least is true! (and the documentation for os.path.split() doesn't mention that issue or even talks about windows)

How to deal with multiple dots in a file name with Python pathlib?

I am having an issue with pathlib when I try to construct a file path that has a "." in its name, the pathlib module ignores it.
Here are example lines (I tried multiple versions, all resulted the same issue)
The issue is that the original file name will be coming from another application, so it is not like I can edit the name myself. I also do not want to do string replacement work arounds, if possible.
path=r"c:\temp"
1
p=Path(path).joinpath("myfile.001").with_suffix(".bat")
2
p=Path(path, "myfile.001").with_suffix(".bat")
3
p=Path(path).with_name("myfile.001").with_suffix(".bat")
All these lines will yield to
WindowsPath('C:/temp/myfile.bat')
So how do I make pathlib.Path to construct this full path properly. The final path has to be
WindowsPath('C:/temp/myfile.001.bat')
Not
WindowsPath('C:/temp/myfile.bat')
Naturally I am looking for a way to do it through pathlib itself, otherwise I can just use os.
thanks
You are telling pathlib to replace the suffix .001 with the suffix .bat. pathlib complies.
Tell pathlib to add .bat to the existing suffix.
p = Path(path, 'myfile.001')
p = p.with_suffix(p.suffix+'.001')

Python os.path.dirname returns unexpected path when changing directory

Currently I do not understand, why pythons os.path.dirname behave like it does.
Let's assume I have the following script:
# Not part of the script, just for the current sample
__file__ = 'C:\\Python\\Test\\test.py'
Then I try to get the absolute path to the following directory: C:\\Python\\doc\\py
With this code:
base_path = os.path.dirname(os.path.dirname(os.path.realpath(__file__)) + '\\..\\doc\\py\\')
But why does the method os.path.dirname does not resolve the path, and print out (print (base_path):
C:\Python\Test\..\doc\py
I've expected the method to resolve the path to:
C:\Python\Test\doc\py
I just know this behaviour from the .NET Framework, that getting directory paths will always resolve the complete path and remove directory changes with ..\\. What do I have in Python for possibilities to do this?
Look into os.path.normpath
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes.
The reason os.path.dirname works the way it does is because it's not very smart - it even work for URLs!
os.path.dirname("http://www.google.com/test") # outputs http://www.google.com
It simply chops off anything after the last slash. It doesn't look at anything before the last slash, so it doesn't care if you have /../ in there somewhere.
os.path.normpath() will return a normalized path, with all references to the current or parent directory removed or replaced appropriately.

Python os module path functions

From the documentation:
os.path.realpath(path)
Return the canonical path of the specified filename, eliminating any
symbolic links encountered in the path (if they are supported by the
operating system).
When I invoke this with an extant file's name, I get the path to it: /home/myhome/myproject.
When I invoke this with a 'nonsense.xxx' string argument, I still get a path to /home/myhome/myproject/nonsense.xxx. This is a little inconsistent because it looks like nonsense.xxx is taken to be a directory not a file (though it is neither: it does not exist).
When I invoke this with a null string file name, I still get a path to /home/myhome/myproject.
How can I account for this behaviour when the documentation says so little about realpath()? (I am using Python 2.5.)
Edit: Somebody suggested a way to test if files exist. My concern is not to test if files exist. My concern is to account for behaviour.
os.path isn't interested in whether or not the files exist. It is merely concerned with constructing paths.
realpath eliminates known symlinks from the equation, but directories that do not exist are assumed to be valid elements of a path regardless.
Rather than guess, just read the code! It's there in your python installation. Or browse here, it's only 14 lines minus comments.
Place test such as "os.path.isfile(x)", "x is not None" and "os.path.isdir(x)" before the call?

Categories

Resources