How to manipulate directory paths to work across multiple OS? - python

I'm writing a python script which has to internally create output path from the input path. However, I am facing issues to create the path which I can use irrespective of OS.
I have tried to use os.path.join and it has its own limitations.
Apart from that, I think simple string concatenation is not the way to go.
Pathlib can be an option but I am not allowed to use it.
import os
inputpath = "C:\projects\django\hereisinput"
lastSlash = left.rfind("\\")
# This won't work as os path join stops at a slash
outputDir = os.path.join(left[:lastSlash], "\internal\morelevel\outputpath")
OR
OutDir = left[:lastSlash] + "\internal\morelevel\outputpath"
Expected output path :
C:\projects\django\internal\morelevel\outputpath
Also, the above code doesn't do it OS Specific where in Linux, the slash will be different.
Is os.sep() some option ?

From the documentation os.path.join can join "one or more path components...". So you could split "\internal\morelevel\outputpath" up into each of its components and pass all of them to your os.path.join function instead. That way you don't need to "hard-code" the separator between the path components. For example:
paths = ("internal", "morelevel", "outputpath")
outputDir = os.path.join(left[:lastSlash], *paths)
Remember that the backslash (\) is a special character in Python so your strings containing singular backslashes wouldn't work as you expect them to! You need to escape them with another \ in front.
This part of your code lastSlash = left.rfind("\\") might also not work on any operating system. You could rather use something like os.path.split to get the last part of the path that you need. For example, _, lastSlash = os.path.split(left).

Assuming your original path is "C:\projects\django\hereisinput", your other part of the path as "internal\morelevel\outputpath" (notice this is a relative path, not absolute), you could always move your primary back one folder (or more) and then append the second part. Do note that your first path needs to contain only folders and can be absolute or relative, while your second path must always be relative for this hack to work
path_1 = r"C:\projects\django\hereisinput"
path_2 = r"internal\morelevel\outputpath"
path_1_one_folder_down = os.path.join(path_1, os.path.pardir)
final_path = os.path.join(path_1_one_folder_down, path_2)
'C:\\projects\\django\\hereisinput\\..\\internal\\morelevel\\outputpath'

Related

Python Pathlib escaping string stored in variable

I'm on windows and have an api response that includes a key value pair like
object = {'path':'/my_directory/my_subdirectory/file.txt'}
I'm trying to use pathlib to open and create that structure relative to the current working directory as well as a user supplied directory name in the current working directory like this:
output = "output_location"
path = pathlib.Path.cwd().joinpath(output,object['path'])
print(path)
What this gives me is this
c:\directory\my_subdirectory\file.txt
Whereas I'm looking for it to output something like:
'c:\current_working_directory\output_location\directory\my_subdirectory\file.txt'
The issue is because the object['path'] is a variable I'm not sure how to escape it as a raw string. And so I think the escapes are breaking it. I can't guarantee there will always be a leading slash in the object['path'] value so I don't want to simply trim the first character.
I was hoping there was an elegant way to do this using pathlib that didn't involve ugly string manipulation.
Try lstrip('/')
You want to remove your leading slash whenever it’s there, because pathlib will ignore whatever comes before it.
import pathlib
object = {'path': '/my_directory/my_subdirectory/file.txt'}
output = "output_location"
# object['path'][1:] removes the initial '/'
path = pathlib.PureWindowsPath(pathlib.Path.cwd()).joinpath(output,object[
'path'][1:])
# path = pathlib.Path.cwd().joinpath(output,object['path'])
print(path)

How to remove the first part of a path?

Say I have the path fodler1/folder2/folder3, and I don't know in advance the names of the folders.
How can I remove the first part of this path to get only folder2/folder3?
You can use pathlib.Path for that:
from pathlib import Path
p = Path("folder1/folder2/folder3")
And either concatenate all parts except the first:
new_path = Path(*p.parts[1:])
Or create a path relative_to the first part:
new_path = p.relative_to(p.parts[0])
This code doesn't require specifying the path delimiter, and works for all pathlib supported platforms (Python >= 3.4).
Use str.split with 1 as maxsplit argument:
path = "folder1/folder2/folder3"
path.split("/", 1)[1]
# 'folder2/folder3'
If there is no / in there, you might be safer with:
path.split("/", 1)[-1] # pick the last of one or two tokens
but that depends on your desired logic in that case.
For better protability across systems, you could replace the slash "/" with os.path.sep:
import os
path.split(os.path.sep, 1)[1]

how to list the images paths in the directory in python?

I created one folder called dataset, then in this folder i created subfolder called subfold1, subfold2
names=[]
for users in os.listdir("dataset"):
names.append(users)
print(names)
Output:
['subfold1','subfold2']
In the subfold1 , i have 5 images and subfold2 also i have 5 images
Then, i want to list the images paths which i have inside the subfold1 and subfol2?
path= []
for name in names:
for image in os.listdir("dataset/{}".format(name)):
path_string = os.path.join("dataset/{}".format(name), image)
path.append(path_string)
print(path)
My output is
['dataset/subfold1\\1_1.jpg', 'dataset/subfold1\\1_2.jpg', 'dataset/subfold1\\1_3.jpg', 'dataset/subfold1\\1_4.jpg', 'dataset/subfold1\\1_5.jpg', 'dataset/subfold2\\2_1.jpg', 'dataset/subfold3\\2_2.jpg', 'dataset/subfold2\\2_3.jpg', 'dataset/subfold2\\2_4.jpg', 'dataset/subfold2\\2_5.jpg']
I want the correct paths
You code works correctly in Linux.
However you may want to simplify it by using os.walk. Please see below:
new_names = []
for dirpath, dirnames, filenames in os.walk('dataset'):
for filename in filenames:
new_names.append(os.path.join(dirpath, filename))
print(new_names)
which gives me following output:
['dataset/subfold2/93.jpg', 'dataset/subfold2/99.jpg', 'dataset/subfold2/97.jpg', 'dataset/subfold1/3.jpg', 'dataset/subfold1/2.jpg', 'dataset/subfold1/1.jpg']
I think you're in windows OS. As you know in Windows \ is the address separator.
And as \ is the escape character in Python (it will be followed by another character indicating a special character, for example, \t means tab), thus the \\ means \, and your addresses are totally correct and you can change the / to \\ in compliance with the Windows rule. BTW, I strongly suggest you to apply the pathlib library. It is more convenient and powerful.
from pathlib import Path
p = Path('MyPictures')
for image in p.iterdir():
print(image)
quick solution;
use os.path.normpath for normalizing a path (modifying every separator to os.path.sep AND adapting the path to the operating system)
paths = [
'dataset/subfold1\\1_1.jpg',
'dataset/subfold1\\1_2.jpg',
'dataset/subfold1\\1_3.jpg',
'dataset/subfold1\\1_4.jpg',
'dataset/subfold1\\1_5.jpg',
'dataset/subfold2\\2_1.jpg',
'dataset/subfold3\\2_2.jpg',
'dataset/subfold2\\2_3.jpg',
'dataset/subfold2\\2_4.jpg',
'dataset/subfold2\\2_5.jpg'
]
import os
paths = list(map(os.path.normpath, paths))
>>> paths
out
['dataset\\subfold1\\1_1.jpg',
'dataset\\subfold1\\1_2.jpg',
'dataset\\subfold1\\1_3.jpg',
'dataset\\subfold1\\1_4.jpg',
'dataset\\subfold1\\1_5.jpg',
'dataset\\subfold2\\2_1.jpg',
'dataset\\subfold3\\2_2.jpg',
'dataset\\subfold2\\2_3.jpg',
'dataset\\subfold2\\2_4.jpg',
'dataset\\subfold2\\2_5.jpg']
extra info:
you cant get rid of this \\ from this 'dataset/subfold1\\1_1.jpg' because that is the string __repr__ of the element, and when you do __repr__ you see double backslash because its escaped. if you will actually print the value on the screen you will see just one \
quick demo:
print('dataset\\subfold1\\1_1.jpg')
out
dataset\subfold1\1_1.jpg
if you really want to join the paths with / then make your own join function 4 paths (also i dont recommend this, i made that in the past and i realised that it was worthless, because os.path is handling everything for you)
but if you are on windows and you really want to have linux path separator you can try this:
paths = [
'dataset/subfold1\\1_1.jpg',
'dataset/subfold1\\1_2.jpg',
'dataset/subfold1\\1_3.jpg',
'dataset/subfold1\\1_4.jpg',
'dataset/subfold1\\1_5.jpg',
'dataset/subfold2\\2_1.jpg',
'dataset/subfold3\\2_2.jpg',
'dataset/subfold2\\2_3.jpg',
'dataset/subfold2\\2_4.jpg',
'dataset/subfold2\\2_5.jpg'
]
paths = [path.replace("\\", "/") for path in paths]
>>> paths
out
['dataset/subfold1/1_1.jpg',
'dataset/subfold1/1_2.jpg',
'dataset/subfold1/1_3.jpg',
'dataset/subfold1/1_4.jpg',
'dataset/subfold1/1_5.jpg',
'dataset/subfold2/2_1.jpg',
'dataset/subfold3/2_2.jpg',
'dataset/subfold2/2_3.jpg',
'dataset/subfold2/2_4.jpg',
'dataset/subfold2/2_5.jpg']`

Python os.path.dirname returns unexpected path when changing directory

Currently I do not understand, why pythons os.path.dirname behave like it does.
Let's assume I have the following script:
# Not part of the script, just for the current sample
__file__ = 'C:\\Python\\Test\\test.py'
Then I try to get the absolute path to the following directory: C:\\Python\\doc\\py
With this code:
base_path = os.path.dirname(os.path.dirname(os.path.realpath(__file__)) + '\\..\\doc\\py\\')
But why does the method os.path.dirname does not resolve the path, and print out (print (base_path):
C:\Python\Test\..\doc\py
I've expected the method to resolve the path to:
C:\Python\Test\doc\py
I just know this behaviour from the .NET Framework, that getting directory paths will always resolve the complete path and remove directory changes with ..\\. What do I have in Python for possibilities to do this?
Look into os.path.normpath
Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes.
The reason os.path.dirname works the way it does is because it's not very smart - it even work for URLs!
os.path.dirname("http://www.google.com/test") # outputs http://www.google.com
It simply chops off anything after the last slash. It doesn't look at anything before the last slash, so it doesn't care if you have /../ in there somewhere.
os.path.normpath() will return a normalized path, with all references to the current or parent directory removed or replaced appropriately.

os.path.join producing an extra forward slash

I am trying to join an absolute path and variable folder path depending on the variable run. However when I use the following code it inserts a forward slash after a string, which I don't require. How can I remove the slash after Folder_?
import os
currentwd = os.getcwd()
folder = '001'
run_folder = os.path.join(currentwd, 'Folder_', folder)
print run_folder
The output I get using this code is:
/home/xkr/Workspace/Folder_/001
You are asking os.path.join() to take multiple path elements and join them. It is doing its job.
Don't use os.path.join() to produce filenames; just use concatenation:
run_folder = os.path.join(currentwd, 'Folder_' + folder)
or use string formatting; the latter can give you such nice features such as automatic padding of integers:
folder = 1
run_folder = os.path.join(currentwd, 'Folder_{:03d}'.format(folder))
That way you can increment folder past 10 or 100 and still have the correct number of leading zeros.
Note that you don't have to use os.getcwd(); you could also use os.path.abspath(), it'll make relative paths absolute based on the current working directory:
run_folder = os.path.abspath('Folder_' + folder)

Categories

Resources