Working with relative paths - python

When I run the following script:
c:\Program Files\foo\bar\scripy.py
How can I refer to directory 'foo'?
Is there a convenient way of using relative paths?
I've done it before with the string module, but there must be a better way (I couldn't find it in os.path).

The os.path module includes various functions for working with paths like this. The convention in most operating system is to use .. to go "up one level", so to get the outside directory you could do this:
import os
import os.path
current_dir = os.getcwd() # find the current directory
print current_dir # c:\Program Files\foo\bar\scripy.py
parent = os.path.join(current_dir, "..") # construct a path to its parent
print parent # c:\Program Files\foo\bar\..
normal_parent = os.path.normpath(parent) # "normalize" the path
print normal_parent # c:\Program Files\foo
# or on one line:
print os.path.normpath(os.path.join(os.getcwd(), ".."))

os.path.dirname(path)
Will return the second half of a SPLIT that is performed on the path parameter. (head - the directory and tail, the file) Put simply it returns the directory the path is in. You'll need to do it twice but this is probably the best way.
Python Docs on path functions:
http://docs.python.org/library/os.path#os.path.expanduser

I have recently started using the unipath library instead of os.path. Its object-oriented representations of paths are much simpler:
from unipath import Path
original = Path(__file__) # .absolute() # r'c:\Program Files\foo\bar\scripy.py'
target = original.parent.parent
print target # Path(u'c:\\Program Files\\foo')
Path is a subclass of str so you can use it with standard filesystem functions, but it also provides alternatives for many of them:
print target.isdir() # True
numbers_dir = target.child('numbers')
print numbers_dir.exists() # False
numbers_dir.mkdir()
print numbers_dir.exists() # True
for n in range(10):
file_path = numbers_dir.child('%s.txt' % (n,))
file_path.write_file("Hello world %s!\n" % (n,), 'wt')

This is a bit tricky. For instance, the following code:
import sys
import os
z = sys.argv[0]
p = os.path.dirname(z)
f = os.path.abspath(p)
print "argv[0]={0} , dirname={1} , abspath={2}\n".format(z,p,f)
gives this output on Windows
argv[0]=../zzz.py , dirname=.. , abspath=C:\Users\michael\Downloads
First of all, notice that argv has the slash which I typed in the command python ../zzz.py and the absolute path has the normal Windows backslashes. If you need to be cross platform you should probably refrain from putting regular slashes on Python command lines, and use os.sep to refer to the character that separated pathname components.
So far I have only partly answered your question. There are a couple of ways to use the value of f to get what you want. Brute force is to use something like:
targetpath = f + os.sep + ".." + os.sep + ".."
which would result in something like C:\Users\michael\Downloads\..\.. on Windows and /home/michael/../.. on Unix. Each .. goes back one step and is the equivalent of removing the pathname component.
But you could do better by breaking up the path:
target = f.split(os.sep)
targetpath = os.sep.join(target[:-2]
and rejoining all but the last two bits to get C:\Users on Windows and / on Unix. If you do that it might be a good idea to check that there are enough pathname components to remove.
Note that I ran the program above by typing python ../xxx.py. In other words I was not in the same working directory as the script, therefore getcwd() would not be useful.

Related

how to list the images paths in the directory in python?

I created one folder called dataset, then in this folder i created subfolder called subfold1, subfold2
names=[]
for users in os.listdir("dataset"):
names.append(users)
print(names)
Output:
['subfold1','subfold2']
In the subfold1 , i have 5 images and subfold2 also i have 5 images
Then, i want to list the images paths which i have inside the subfold1 and subfol2?
path= []
for name in names:
for image in os.listdir("dataset/{}".format(name)):
path_string = os.path.join("dataset/{}".format(name), image)
path.append(path_string)
print(path)
My output is
['dataset/subfold1\\1_1.jpg', 'dataset/subfold1\\1_2.jpg', 'dataset/subfold1\\1_3.jpg', 'dataset/subfold1\\1_4.jpg', 'dataset/subfold1\\1_5.jpg', 'dataset/subfold2\\2_1.jpg', 'dataset/subfold3\\2_2.jpg', 'dataset/subfold2\\2_3.jpg', 'dataset/subfold2\\2_4.jpg', 'dataset/subfold2\\2_5.jpg']
I want the correct paths
You code works correctly in Linux.
However you may want to simplify it by using os.walk. Please see below:
new_names = []
for dirpath, dirnames, filenames in os.walk('dataset'):
for filename in filenames:
new_names.append(os.path.join(dirpath, filename))
print(new_names)
which gives me following output:
['dataset/subfold2/93.jpg', 'dataset/subfold2/99.jpg', 'dataset/subfold2/97.jpg', 'dataset/subfold1/3.jpg', 'dataset/subfold1/2.jpg', 'dataset/subfold1/1.jpg']
I think you're in windows OS. As you know in Windows \ is the address separator.
And as \ is the escape character in Python (it will be followed by another character indicating a special character, for example, \t means tab), thus the \\ means \, and your addresses are totally correct and you can change the / to \\ in compliance with the Windows rule. BTW, I strongly suggest you to apply the pathlib library. It is more convenient and powerful.
from pathlib import Path
p = Path('MyPictures')
for image in p.iterdir():
print(image)
quick solution;
use os.path.normpath for normalizing a path (modifying every separator to os.path.sep AND adapting the path to the operating system)
paths = [
'dataset/subfold1\\1_1.jpg',
'dataset/subfold1\\1_2.jpg',
'dataset/subfold1\\1_3.jpg',
'dataset/subfold1\\1_4.jpg',
'dataset/subfold1\\1_5.jpg',
'dataset/subfold2\\2_1.jpg',
'dataset/subfold3\\2_2.jpg',
'dataset/subfold2\\2_3.jpg',
'dataset/subfold2\\2_4.jpg',
'dataset/subfold2\\2_5.jpg'
]
import os
paths = list(map(os.path.normpath, paths))
>>> paths
out
['dataset\\subfold1\\1_1.jpg',
'dataset\\subfold1\\1_2.jpg',
'dataset\\subfold1\\1_3.jpg',
'dataset\\subfold1\\1_4.jpg',
'dataset\\subfold1\\1_5.jpg',
'dataset\\subfold2\\2_1.jpg',
'dataset\\subfold3\\2_2.jpg',
'dataset\\subfold2\\2_3.jpg',
'dataset\\subfold2\\2_4.jpg',
'dataset\\subfold2\\2_5.jpg']
extra info:
you cant get rid of this \\ from this 'dataset/subfold1\\1_1.jpg' because that is the string __repr__ of the element, and when you do __repr__ you see double backslash because its escaped. if you will actually print the value on the screen you will see just one \
quick demo:
print('dataset\\subfold1\\1_1.jpg')
out
dataset\subfold1\1_1.jpg
if you really want to join the paths with / then make your own join function 4 paths (also i dont recommend this, i made that in the past and i realised that it was worthless, because os.path is handling everything for you)
but if you are on windows and you really want to have linux path separator you can try this:
paths = [
'dataset/subfold1\\1_1.jpg',
'dataset/subfold1\\1_2.jpg',
'dataset/subfold1\\1_3.jpg',
'dataset/subfold1\\1_4.jpg',
'dataset/subfold1\\1_5.jpg',
'dataset/subfold2\\2_1.jpg',
'dataset/subfold3\\2_2.jpg',
'dataset/subfold2\\2_3.jpg',
'dataset/subfold2\\2_4.jpg',
'dataset/subfold2\\2_5.jpg'
]
paths = [path.replace("\\", "/") for path in paths]
>>> paths
out
['dataset/subfold1/1_1.jpg',
'dataset/subfold1/1_2.jpg',
'dataset/subfold1/1_3.jpg',
'dataset/subfold1/1_4.jpg',
'dataset/subfold1/1_5.jpg',
'dataset/subfold2/2_1.jpg',
'dataset/subfold3/2_2.jpg',
'dataset/subfold2/2_3.jpg',
'dataset/subfold2/2_4.jpg',
'dataset/subfold2/2_5.jpg']`

How to use pathlib to process paths that begin with ~?

I am writing a cli-tool that needs some path as input.
I am writing this tool in python and would like to use no python interpreter below 3.6. Using the packagepathlibseems to be the modern way to go when dealing with paths in python. So I would like to leave os and os.path behind if possible.
It seems like pathlib interprets the path ~/test/ as a relative path to the current working directory, the code below shows it
import pathlib
test_path = pathlib.Path('~/test')
absolute_path = test_path.absolute()
print(f"{str(test_path):>31}\n{str(absolute_path):>31}")
# output:
# ~/test
# /home/myUser/~/test
How can I use pathlib to recognize every path starting with ~ as an absolute path and automatically expand ~ into the users home directory?
The answer is easy, use .expanduser() on your Path object instead of .absolute()and it will replace ~ with the home directory of the user running the script, the result is also an absolute path only if ~ is at the beginning:
import pathlib
test_path = pathlib.Path('~/test')
absolute_path = test_path.expanduser()
# If ~ is somewhere in the middle of the path, use .resolve() to get an absolute path.
print(f"{str(test_path):>31}\n{str(absolute_path):>31}")
# output:
# ~/test
# /home/myUser/test

How to manipulate directory paths to work across multiple OS?

I'm writing a python script which has to internally create output path from the input path. However, I am facing issues to create the path which I can use irrespective of OS.
I have tried to use os.path.join and it has its own limitations.
Apart from that, I think simple string concatenation is not the way to go.
Pathlib can be an option but I am not allowed to use it.
import os
inputpath = "C:\projects\django\hereisinput"
lastSlash = left.rfind("\\")
# This won't work as os path join stops at a slash
outputDir = os.path.join(left[:lastSlash], "\internal\morelevel\outputpath")
OR
OutDir = left[:lastSlash] + "\internal\morelevel\outputpath"
Expected output path :
C:\projects\django\internal\morelevel\outputpath
Also, the above code doesn't do it OS Specific where in Linux, the slash will be different.
Is os.sep() some option ?
From the documentation os.path.join can join "one or more path components...". So you could split "\internal\morelevel\outputpath" up into each of its components and pass all of them to your os.path.join function instead. That way you don't need to "hard-code" the separator between the path components. For example:
paths = ("internal", "morelevel", "outputpath")
outputDir = os.path.join(left[:lastSlash], *paths)
Remember that the backslash (\) is a special character in Python so your strings containing singular backslashes wouldn't work as you expect them to! You need to escape them with another \ in front.
This part of your code lastSlash = left.rfind("\\") might also not work on any operating system. You could rather use something like os.path.split to get the last part of the path that you need. For example, _, lastSlash = os.path.split(left).
Assuming your original path is "C:\projects\django\hereisinput", your other part of the path as "internal\morelevel\outputpath" (notice this is a relative path, not absolute), you could always move your primary back one folder (or more) and then append the second part. Do note that your first path needs to contain only folders and can be absolute or relative, while your second path must always be relative for this hack to work
path_1 = r"C:\projects\django\hereisinput"
path_2 = r"internal\morelevel\outputpath"
path_1_one_folder_down = os.path.join(path_1, os.path.pardir)
final_path = os.path.join(path_1_one_folder_down, path_2)
'C:\\projects\\django\\hereisinput\\..\\internal\\morelevel\\outputpath'

How to format the beginning a loop correctly?

I have a program which I designed both for myself and my colleague to use, with all the data being stored in a directories. However, I want to set up the loop so that it work both for me and him. I tried all of these:
file_location = glob.glob('/../*.nc')
file_location = glob('/../*.nc')
But none of them are picking up any files. How can I fix this?
You can get a directory relative to a user's home (called ~ in the function call) using os.path.expanduser(). In your case, the line would be
file_location = glob.glob(os.path.expanduser('~/Dropbox/Argo/Data/*.nc'))
Usually is a good practice not hardcoding paths if you're going to use your paths for other tasks which need well-formed paths (ie: subprocess, writing paths to shell scripts), I'd recommend to manage paths using the os.path module instead, for example:
import os, glob
home_path = os.path.expanduser("~")
dropbox_path = os.path.join(home_path, "Dropbox")
good_paths = glob.glob(os.path.join(dropbox_path,"Argo","Data","*.nc"))
bad_paths = glob.glob(dropbox_path+"/Argo\\Data/*.nc")
print len(good_paths)==len(bad_paths)
print all([os.path.exists(p) for p in good_paths])
print all([os.path.exists(p) for p in bad_paths])
The example shows a comparison between bad and well formed paths. Both of them will work, but good_paths will be more flexible and portable in the long term.

Why doesn't os.path.join() work in this case?

The below code will not join, when debugged the command does not store the whole path but just the last entry.
os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/')
When I test this it only stores the /new_sandbox/ part of the code.
The latter strings shouldn't start with a slash. If they start with a slash, then they're considered an "absolute path" and everything before them is discarded.
Quoting the Python docs for os.path.join:
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
Note on Windows, the behaviour in relation to drive letters, which seems to have changed compared to earlier Python versions:
On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
The idea of os.path.join() is to make your program cross-platform (linux/windows/etc).
Even one slash ruins it.
So it only makes sense when being used with some kind of a reference point like
os.environ['HOME'] or os.path.dirname(__file__).
os.path.join() can be used in conjunction with os.path.sep to create an absolute rather than relative path.
os.path.join(os.path.sep, 'home','build','test','sandboxes',todaystr,'new_sandbox')
Do not use forward slashes at the beginning of path components, except when refering to the root directory:
os.path.join('/home/build/test/sandboxes', todaystr, 'new_sandbox')
see also: http://docs.python.org/library/os.path.html#os.path.join
To help understand why this surprising behavior isn't entirely terrible, consider an application which accepts a config file name as an argument:
config_root = "/etc/myapp.conf/"
file_name = os.path.join(config_root, sys.argv[1])
If the application is executed with:
$ myapp foo.conf
The config file /etc/myapp.conf/foo.conf will be used.
But consider what happens if the application is called with:
$ myapp /some/path/bar.conf
Then myapp should use the config file at /some/path/bar.conf (and not /etc/myapp.conf/some/path/bar.conf or similar).
It may not be great, but I believe this is the motivation for the absolute path behaviour.
It's because your '/new_sandbox/' begins with a / and thus is assumed to be relative to the root directory. Remove the leading /.
Try combo of split("/") and * for strings with existing joins.
import os
home = '/home/build/test/sandboxes/'
todaystr = '042118'
new = '/new_sandbox/'
os.path.join(*home.split("/"), todaystr, *new.split("/"))
How it works...
split("/") turns existing path into list: ['', 'home', 'build', 'test', 'sandboxes', '']
* in front of the list breaks out each item of list its own parameter
To make your function more portable, use it as such:
os.path.join(os.sep, 'home', 'build', 'test', 'sandboxes', todaystr, 'new_sandbox')
or
os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')
do it like this, without too the extra slashes
root="/home"
os.path.join(root,"build","test","sandboxes",todaystr,"new_sandbox")
Try with new_sandbox only
os.path.join('/home/build/test/sandboxes/', todaystr, 'new_sandbox')
os.path.join("a", *"/b".split(os.sep))
'a/b'
a fuller version:
import os
def join (p, f, sep = os.sep):
f = os.path.normpath(f)
if p == "":
return (f);
else:
p = os.path.normpath(p)
return (os.path.join(p, *f.split(os.sep)))
def test (p, f, sep = os.sep):
print("os.path.join({}, {}) => {}".format(p, f, os.path.join(p, f)))
print(" join({}, {}) => {}".format(p, f, join(p, f, sep)))
if __name__ == "__main__":
# /a/b/c for all
test("\\a\\b", "\\c", "\\") # optionally pass in the sep you are using locally
test("/a/b", "/c", "/")
test("/a/b", "c")
test("/a/b/", "c")
test("", "/c")
test("", "c")
Note that a similar issue can bite you if you use os.path.join() to include an extension that already includes a dot, which is what happens automatically when you use os.path.splitext(). In this example:
components = os.path.splitext(filename)
prefix = components[0]
extension = components[1]
return os.path.join("avatars", instance.username, prefix, extension)
Even though extension might be .jpg you end up with a folder named "foobar" rather than a file called "foobar.jpg". To prevent this you need to append the extension separately:
return os.path.join("avatars", instance.username, prefix) + extension
you can strip the '/':
>>> os.path.join('/home/build/test/sandboxes/', todaystr, '/new_sandbox/'.strip('/'))
'/home/build/test/sandboxes/04122019/new_sandbox'
I'd recommend to strip from the second and the following strings the string os.path.sep, preventing them to be interpreted as absolute paths:
first_path_str = '/home/build/test/sandboxes/'
original_other_path_to_append_ls = [todaystr, '/new_sandbox/']
other_path_to_append_ls = [
i_path.strip(os.path.sep) for i_path in original_other_path_to_append_ls
]
output_path = os.path.join(first_path_str, *other_path_to_append_ls)
The problem is your laptop maybe running Window. And Window annoyingly use back lash instead of forward slash'/'. To make your program cross-platform (linux/windows/etc).
You shouldn't provide any slashes (forward or backward) in your path if you want os.path.join to handle them properly. you should using:
os.path.join(os.environ.get("HOME"), 'test', 'sandboxes', todaystr, 'new_sandbox')
Or throw some Path(__file__).resolve().parent (path to parent of current file) or anything so that you don't use any slash inside os.path.join
Please refer following code snippet for understanding os.path.join(a, b)
a = '/home/user.name/foo/'
b = '/bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
OR
a = '/home/user.name/foo'
b = '/bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
But, when
a = '/home/user.name/foo/'
b = 'bar/file_name.extension'
print(os.path.join(a, b))
>>> /bar/file_name.extension
OR
a = '/home/user.name/foo'
b = 'bar/file_name.extension'
print(os.path.join(a, b))
>>> /home/user.name/foo/bar/file_name.extension

Categories

Resources