os.path.basename() is inconsistent and I'm not sure why

os.path.basename() is inconsistent and I'm not sure why - python

While creating a program that backs up my files, I found that os.path.basename() was not working consistently. For example:
import os
folder = '\\\\server\\studies\\backup\\backup_files'
os.path.basename(folder)
returns 'backup_files'
folder = '\\\\server\\studies'
os.path.basename(folder)
returns ''
I want that second basename function to return 'studies' but it returns an empty string. I ran os.path.split(folder) to see how it's splitting the string and it turns out it's considering the entire path to be the directory, i.e. ('\\\\server\\studies', ' ').
I can't figure out how to get around it.. The weirdest thing is I ran the same line earlier and it worked, but it won't anymore! Does it have something to do with the very first part being a shared folder on the network drive?

that looks like a Windows UNC specificity
UNC paths can be seen as equivalent of unix path, only with double backslashes at the start.
A workaround would be to use classical rsplit:
>>> r"\\server\studies".rsplit(os.sep,1)[-1]
'studies'
Fun fact: with 3 paths it works properly:
>>> os.path.basename(r"\\a\b\c")
'c'
Now why this? let's check the source code of ntpath on windows:
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
okay now split
def split(p):
seps = _get_bothseps(p)
d, p = splitdrive(p)
now splitdrive
def splitdrive(p):
"""Split a pathname into drive/UNC sharepoint and relative path specifiers.
Returns a 2-tuple (drive_or_unc, path); either part may be empty.
Just reading the documentation makes us understand what's going on.
A Windows sharepoint has to contain 2 path parts:
\\server\shareroot
So \\server\studies is seen as the drive, and the path is empty. Doesn't happen when there are 3 parts in the path.
Note that it's not a bug, since it's not possible to use \\server like a normal directory, create dirs below, etc...
Note that the official documentation for os.path.basename doesn't mention that (because os.path calls ntpath behind the scenes) but it states:
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program
That last emphasised part at least is true! (and the documentation for os.path.split() doesn't mention that issue or even talks about windows)

Related

Os.path gives unexpected output

lately I started working with the Os module in python . And I finally arrived to this Os.path method . So here is my question . I ran this method in one of my kivy project just for testing and it actually didn't returned the correct output.The method consisted of finding if any directory exist and return a list of folders in the directory . otherwise print Invalid Path and return -1 . I passed in an existing directory and it returned -1 but the weird path is that when I run similar program out of my kivy project using the same path present in thesame folder as my python file it return the desired output .here is the image with the python file and the directory name image I have tested which returns invalid path.
and here is my code snippet
def get_imgs(self, img_path):
if not os.path.exists(img_path):
print("Invalid Path...")
return -1
else:
all_files = os.listdir(img_path)
imgs = []
for f in all_files:
if (
f.endswith(".png")
or f.endswith(".PNG")
or f.endswith(".jpg")
or f.endswith(".JPG")
or f.endswith(".jpeg")
or f.endswith(".JPEG")
):
imgs.append("/".join([img_path, f]))
return imgs

It's tough to tell without seeing the code with your function call. Whatever argument you're passing must not be a valid path. I use the os module regularly and have slowly learned a lot of useful methods. I always print out paths that I'm reading or where I'm writing before doing it in case anything unexpected happens, I can see that img_path variable, for example. Copy and paste the path in file explorer up to the directory and make sure that's all good.
Some other useful os.path methods you will find useful, based on your code:
os.join(<directory>, <file_name.ext>) is much more intuitive than imgs.append("/".join([img_path, f]))
os.getcwd() gets your working directory (which I print at the start of scripts in dev to quickly address issues before debugging). I typically use full paths to play it safe because Python pathing can cause differences/issues when running from cmd vs. PyCharm
os.path.basename(f) gives you the file, while os.path.dirname(f) gives you the directory.
It seems like a better approach to this is to use pathlib and glob. You can iterate over directories and use wild cards.
Look at these:
iterating over directories: How can I iterate over files in a given directory?
different file types: Python glob multiple filetypes
Then you don't even need to check whether os.path.exists(img_path) because this will read the files directly from your file system. There's also more wild cards in the glob library such as * for anything/any length, ? for any character, [0-9] for any number, found here: https://docs.python.org/3/library/glob.html

Why does Path.rglob() returns filenames in lower case if whole name is specified as pattern?

I'h have an executable named "MyCamelCase.exe" in the current python script directory and in a subfolder "MyFolder". Additionally, in "MyFolder" there is another executable "DontWannaFindThis.exe". I'd like to find all occurences of "MyCamelCase.exe" in the current directory and all subfolders. Therefore, I'm using Path.rglob(pattern):
from pathlib import Path
if __name__ == '__main__':
[print(f) for f in Path.cwd().rglob('MyCamelCase.exe')]
[print(f) for f in Path.cwd().rglob('.\MyCamelCase.exe')]
[print(f) for f in Path.cwd().rglob('*.exe')]
This code leads to the following output:
D:\PyTesting\mycamelcase.exe
D:\PyTesting\MyFolder\mycamelcase.exe
D:\PyTesting\mycamelcase.exe
D:\PyTesting\MyFolder\mycamelcase.exe
D:\PyTesting\MyCamelCase.exe
D:\PyTesting\MyFolder\DontWannaFindThis.exe
D:\PyTesting\MyFolder\MyCamelCase.exe
Why does rglob returns a string with only lower case if a provide the full file name and on the other hand return a string containing the original notation when using a pattern with '.*'?
Note: The same happens when using Path.glob()

This is because all paths on Windows are case insensitive (in fact, before Windows 10 there was no way to make Windows case sensitive). For some reason, when looking for an exact match, pathlib makes the path lowercase in Windows. When it is doing normal globbing with *, it takes whatever the normal representation is from Windows.
The casing not matching in Windows should not matter though, and it will not if the only consumer of the information is the computer itself when it is processing the files.

As it have already been answered, this is the correct behaviour on Windows. However, if you want to display the full path without discarding the case, the resolve() method may be useful.
E.g.
from pathlib import Path
if __name__ == '__main__':
[print(f) for f in Path.cwd().rglob('MyCamelCase.exe').resolve()]
[print(f) for f in Path.cwd().rglob('.\MyCamelCase.exe').resolve()]
[print(f) for f in Path.cwd().rglob('*.exe').resolve()]
Note that .resolve() is normally used to convert relative and/or symlinked paths to absolute ones.

How to manipulate directory paths to work across multiple OS?

I'm writing a python script which has to internally create output path from the input path. However, I am facing issues to create the path which I can use irrespective of OS.
I have tried to use os.path.join and it has its own limitations.
Apart from that, I think simple string concatenation is not the way to go.
Pathlib can be an option but I am not allowed to use it.
import os
inputpath = "C:\projects\django\hereisinput"
lastSlash = left.rfind("\\")
# This won't work as os path join stops at a slash
outputDir = os.path.join(left[:lastSlash], "\internal\morelevel\outputpath")
OR
OutDir = left[:lastSlash] + "\internal\morelevel\outputpath"
Expected output path :
C:\projects\django\internal\morelevel\outputpath
Also, the above code doesn't do it OS Specific where in Linux, the slash will be different.
Is os.sep() some option ?

From the documentation os.path.join can join "one or more path components...". So you could split "\internal\morelevel\outputpath" up into each of its components and pass all of them to your os.path.join function instead. That way you don't need to "hard-code" the separator between the path components. For example:
paths = ("internal", "morelevel", "outputpath")
outputDir = os.path.join(left[:lastSlash], *paths)
Remember that the backslash (\) is a special character in Python so your strings containing singular backslashes wouldn't work as you expect them to! You need to escape them with another \ in front.
This part of your code lastSlash = left.rfind("\\") might also not work on any operating system. You could rather use something like os.path.split to get the last part of the path that you need. For example, _, lastSlash = os.path.split(left).

Assuming your original path is "C:\projects\django\hereisinput", your other part of the path as "internal\morelevel\outputpath" (notice this is a relative path, not absolute), you could always move your primary back one folder (or more) and then append the second part. Do note that your first path needs to contain only folders and can be absolute or relative, while your second path must always be relative for this hack to work
path_1 = r"C:\projects\django\hereisinput"
path_2 = r"internal\morelevel\outputpath"
path_1_one_folder_down = os.path.join(path_1, os.path.pardir)
final_path = os.path.join(path_1_one_folder_down, path_2)
'C:\\projects\\django\\hereisinput\\..\\internal\\morelevel\\outputpath'

Python os.path.dirname returns unexpected path when changing directory

Currently I do not understand, why pythons os.path.dirname behave like it does.
Let's assume I have the following script:
# Not part of the script, just for the current sample
__file__ = 'C:\\Python\\Test\\test.py'
Then I try to get the absolute path to the following directory: C:\\Python\\doc\\py
With this code:
base_path = os.path.dirname(os.path.dirname(os.path.realpath(__file__)) + '\\..\\doc\\py\\')
But why does the method os.path.dirname does not resolve the path, and print out (print (base_path):
C:\Python\Test\..\doc\py
I've expected the method to resolve the path to:
C:\Python\Test\doc\py
I just know this behaviour from the .NET Framework, that getting directory paths will always resolve the complete path and remove directory changes with ..\\. What do I have in Python for possibilities to do this?

Look into os.path.normpath
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes.
The reason os.path.dirname works the way it does is because it's not very smart - it even work for URLs!
os.path.dirname("http://www.google.com/test") # outputs http://www.google.com
It simply chops off anything after the last slash. It doesn't look at anything before the last slash, so it doesn't care if you have /../ in there somewhere.

os.path.normpath() will return a normalized path, with all references to the current or parent directory removed or replaced appropriately.

normalize non-existing path using pathlib only

python has recently added the pathlib module (which i like a lot!).
there is just one thing i'm struggling with: is it possible to normalize a path to a file or directory that does not exist? i can do that perfectly well with os.path.normpath. but wouldn't it be absurd to have to use something other than the library that should take care of path related stuff?
the functionality i would like to have is this:
from os.path import normpath
from pathlib import Path
pth = Path('/tmp/some_directory/../i_do_not_exist.txt')
pth = Path(normpath(str(pth)))
# -> /tmp/i_do_not_exist.txt
but without having to resort to os.path and without having to type-cast to str and back to Path. also pth.resolve() does not work for non-existing files.
is there a simple way to do that with just pathlib?

is it possible to normalize a path to a file or directory that does not exist?
Starting from 3.6, it's the default behavior. See https://docs.python.org/3.6/library/pathlib.html#pathlib.Path.resolve
Path.resolve(strict=False)
...
If strict is False, the path is resolved as far as possible and any remainder is appended without checking whether it exists

As of Python 3.5: No, there's not.
PEP 0428 states:
Path resolution
The resolve() method makes a path absolute, resolving
any symlink on the way (like the POSIX realpath() call). It is the
only operation which will remove " .. " path components. On Windows,
this method will also take care to return the canonical path (with the
right casing).
Since resolve() is the only operation to remove the ".." components, and it fails when the file doesn't exist, there won't be a simple means using just pathlib.
Also, the pathlib documentation gives a hint as to why:
Spurious slashes and single dots are collapsed, but double dots ('..')
are not, since this would change the meaning of a path in the face of
symbolic links:
PurePath('foo//bar') produces PurePosixPath('foo/bar')
PurePath('foo/./bar') produces PurePosixPath('foo/bar')
PurePath('foo/../bar') produces PurePosixPath('foo/../bar')
(a naïve approach would make PurePosixPath('foo/../bar') equivalent to PurePosixPath('bar'), which is wrong if foo is a symbolic link to another directory)
All that said, you could create a 0 byte file at the location of your path, and then it'd be possible to resolve the path (thus eliminating the ..). I'm not sure that's any simpler than your normpath approach, though.

If this fits you usecase(e.g. ifle's directory already exists) you might try to resolve path's parent and then re-append file name, e.g.:
from pathlib import Path
p = Path()/'hello.there'
print(p.parent.resolve()/p.name)

Old question, but here is another solution in particular if you want POSIX paths across the board (like nix paths on Windows too).
I found pathlib resolve() to be broken as of Python 3.10, and this method is not currently exposed by PurePosixPath.
What I found worked was to use posixpath.normpath(). Also found PurePosixPath.joinpath() to be broken. I.E. It will not join ".." with "myfile.txt" as expected. It will return just "myfile.txt". But posixpath.join() works perfectly; will return "../myfile.txt".
Note this is in path strings, but easily back to pathlib.Path(my_posix_path) et al for an OOP container.
And easily transpose to Windows platform paths too by just constructing this way, as the module takes care of the platform independence for you.
Might be the solution for others with Python file path woes..

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

os.path.basename() is inconsistent and I'm not sure why - python

Related

Os.path gives unexpected output

Why does Path.rglob() returns filenames in lower case if whole name is specified as pattern?

How to manipulate directory paths to work across multiple OS?

Python os.path.dirname returns unexpected path when changing directory

normalize non-existing path using pathlib only

Categories

Resources