Get the directory path of an absolute file path in Python

I want to get the directory where the file resides. For example the full path is:
fullpath = "/absolute/path/to/file"
# something like:
os.getdir(fullpath) # if this existed and behaved like I wanted, it would return "/absolute/path/to"
I could do it like this:
dir = '/'.join(fullpath.split('/')[:-1])
But the example above relies on a specific directory separator and is not really pretty. Is there a better way?

You are looking for this:
>>> import os.path
>>> fullpath = '/absolute/path/to/file'
>>> os.path.dirname(fullpath)
'/absolute/path/to'
Related functions:
>>> os.path.basename(fullpath)
'file'
>>> os.path.split(fullpath)
('/absolute/path/to', 'file')
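Because os.path is simply posixpath on Linux/macOS and ntpath on Windows, dirname takes care of the separator for you. Both modules can be imported on any platform, so a small sketch can compare the two conventions side by side (the example paths are made up):

```python
import ntpath
import posixpath

# os.path is posixpath on POSIX systems and ntpath on Windows;
# both are importable everywhere, so we can compare them directly.
posix_full = "/absolute/path/to/file"
windows_full = r"C:\absolute\path\to\file"

print(posixpath.dirname(posix_full))   # /absolute/path/to
print(ntpath.dirname(windows_full))    # C:\absolute\path\to

# ntpath also accepts forward slashes as separators
print(ntpath.dirname("C:/absolute/path/to/file"))  # C:/absolute/path/to
```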

Related

python Path from pathlib module doesn't add path separator

I have the following issue using Path from the pathlib library:
from pathlib import Path
import os
path_list = ["path","to","directory1","directory2"]
join_str = os.path.sep
my_path = join_str.join(path_list)
my_path = str(Path(my_path))
dir_exists = os.path.isdir(my_path)
Output:
my_path = "path/to/directory1/directory2"
dir_exists = False
I have tried with Path itself and manually giving it a path that I know exists:
(Pdb) test2 = str(Path("Users","userK","my_directory"))
(Pdb) os.path.isdir(test2)
False
(Pdb) test2
Users/userK/my_directory
I was wondering why Path doesn't simply add the leading slash at the beginning. Is there a better way to build my path with the needed path separator from a list?
I didn't notice it before, because I use os.getcwd() and add to it to build my paths:
(Pdb) my_cwd = os.getcwd()
(Pdb) my_path = str(Path(my_cwd,"directory1"))
(Pdb) os.path.isdir(my_path)
True
However, code-wise I am not in a position to explicitly build the path I need, hence the issue.
Thanks!
Path cannot assume how the result will be used further, so it constructs the path object from the input data as is.
I will demonstrate with examples.
To begin with, prepare the directories in /tmp:
zyzop@localhost:~$ cd /tmp/
zyzop@localhost:/tmp$ pwd
/tmp
zyzop@localhost:/tmp$ mkdir -p path/to/directory1/directory2
zyzop@localhost:/tmp$ ls -ld path/to/directory1/directory2
drwxr-xr-x 2 zyzop zyzop 4096 Jul 8 03:28 path/to/directory1/directory2
zyzop@localhost:/tmp$ python3
Let's create a relative path from the list of its parts
>>> from pathlib import Path
>>> path_list = ["path", "to", "directory1", "directory2"]
>>> my_path = Path(*path_list)
>>> my_path
PosixPath('path/to/directory1/directory2')
Since the relative path exists in the current directory, we can check its existence
>>> my_path.exists()
True
>>> my_path.is_absolute()
False
>>> my_path.resolve()
PosixPath('/tmp/path/to/directory1/directory2')
There are two ways to create an absolute path. The first is to prepend the current working directory to the path:
>>> my_path1 = Path(Path.cwd(), *path_list)
>>> my_path1
PosixPath('/tmp/path/to/directory1/directory2')
>>> my_path1.is_absolute()
True
>>> my_path1.exists()
True
The second way is to explicitly specify the path from the root
>>> path_list2 = ["/", "tmp", "path", "to", "directory1", "directory2"]
>>> my_path2 = Path(*path_list2)
>>> my_path2
PosixPath('/tmp/path/to/directory1/directory2')
>>> my_path2.exists()
True
>>> my_path2.is_absolute()
True
Now let's see what happens to relative paths when we are in another directory. Let's move to the home directory:
>>> import os
>>> os.chdir(Path.home())
>>> Path.cwd()
PosixPath('/home/zyzop')
Let's check the existence of paths
>>> my_path
PosixPath('path/to/directory1/directory2')
>>> my_path.exists()
False
>>> my_path2
PosixPath('/tmp/path/to/directory1/directory2')
>>> my_path2.exists()
True
Now let's see how the absolute paths have changed. Since my_path is a relative path, it will always resolve relative to the current directory
>>> my_path.resolve()
PosixPath('/home/zyzop/path/to/directory1/directory2')
my_path2 is an absolute path, so it resolves to the same location regardless of the current directory:
>>> my_path2.resolve()
PosixPath('/tmp/path/to/directory1/directory2')
More details can be found here: https://docs.python.org/3/library/pathlib.html
I do not know why pathlib does not add a slash at the beginning. If I had to guess, it is because it is not always obvious what the "root" of the file system is (think of Windows with multiple drives), and it is probably better this way.
Anyway, it is easy to add it yourself:
Path("/path/to/directory1/directory2")
Path("/Users/userK/my_directory")
Path("./directory1") # or Path.cwd() / "directory1"
To get the root, you can use this syntax:
from pathlib import Path
Path.cwd().root  # '/' on Unix; on Windows it is a single backslash and the drive letter is in .drive
Another approach could be:
from pathlib import Path
Path("/")  # Or Path("/").root
This works on Unix; I have not tested it on Windows.
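To build an absolute path from a list of parts, one option is to pass the root as the first argument; everything after it then becomes a child of the root. The sketch below uses PurePosixPath so the result is the same on every OS, and the part names are only illustrative. On a real system, Path(Path.cwd().anchor, *parts) prepends the anchor (drive plus root) of the current directory, e.g. '/' on Unix or 'C:\' on Windows.

```python
from pathlib import Path, PurePosixPath

path_list = ["Users", "userK", "my_directory"]

# Relative: no leading root, so it is interpreted against the current directory.
relative = PurePosixPath(*path_list)

# Absolute: prepend the root explicitly.
absolute = PurePosixPath("/", *path_list)

# Platform-aware variant: the anchor (drive + root) of the current directory.
native_absolute = Path(Path.cwd().anchor, *path_list)

print(relative)                       # Users/userK/my_directory
print(absolute)                       # /Users/userK/my_directory
print(absolute.is_absolute())         # True
print(native_absolute.is_absolute())  # True
```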

Proper way to remove last element of delimiter while it's in a string? [duplicate]

I need to extract the name of the parent directory of a certain path. This is what it looks like:
C:\stuff\directory_i_need\subdir\file.jpg
I would like to extract directory_i_need.
import os
## first file in current dir (with full path)
file = os.path.join(os.getcwd(), os.listdir(os.getcwd())[0])
file
os.path.dirname(file) ## directory of file
os.path.dirname(os.path.dirname(file)) ## directory of directory of file
...
And you can continue doing this as many times as necessary...
Edit: from os.path, you can use either os.path.split or os.path.basename:
dir = os.path.dirname(os.path.dirname(file)) ## dir of dir of file
## once you're at the directory level you want, with the desired directory as the final path node:
dirname1 = os.path.basename(dir)
dirname2 = os.path.split(dir)[1] ## if you look at the documentation, this is exactly what os.path.basename does.
For Python 3.4+, try the pathlib module:
>>> from pathlib import Path
>>> p = Path('C:\\Program Files\\Internet Explorer\\iexplore.exe')
>>> str(p.parent)
'C:\\Program Files\\Internet Explorer'
>>> p.name
'iexplore.exe'
>>> p.suffix
'.exe'
>>> p.parts
('C:\\', 'Program Files', 'Internet Explorer', 'iexplore.exe')
>>> p.relative_to('C:\\Program Files')
WindowsPath('Internet Explorer/iexplore.exe')
>>> p.exists()
True
If you use pathlib, all you need is the parent attribute.
from pathlib import Path
p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
print(p.parent)
Will output:
C:\Program Files\Internet Explorer
If you need all the parts (already covered in other answers), use parts:
p = Path(r'C:\Program Files\Internet Explorer\iexplore.exe')
print(p.parts)
Then you will get a tuple:
('C:\\', 'Program Files', 'Internet Explorer', 'iexplore.exe')
Saves a ton of time.
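For the original question (extracting directory_i_need), parents[1] or .parent.parent followed by .name does it in one expression. This sketch uses PureWindowsPath, which parses Windows-style paths on any OS without touching the filesystem; on Windows, plain Path behaves the same way.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"C:\stuff\directory_i_need\subdir\file.jpg")

# parents[0] is the immediate parent (subdir); parents[1] is one level higher.
print(p.parents[1].name)     # directory_i_need
print(p.parent.parent.name)  # same result, spelled out
print(p.parts[-3])           # directory_i_need, counted from the end of parts
```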
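First, see if you have splitunc() available in os.path. The first item returned should be what you want... but I am on Linux and I do not have this function when I import os and try to use it. (os.path.splitunc() was only available on Windows; it was deprecated in Python 3.1 in favour of os.path.splitdrive() and removed in Python 3.7.)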
Otherwise, one semi-ugly way that gets the job done is to use:
>>> pathname = "\\C:\\mystuff\\project\\file.py"
>>> pathname
'\\C:\\mystuff\\project\\file.py'
>>> print(pathname)
\C:\mystuff\project\file.py
>>> "\\".join(pathname.split('\\')[:-2])
'\\C:\\mystuff'
>>> "\\".join(pathname.split('\\')[:-1])
'\\C:\\mystuff\\project'
which shows retrieving the directory just above the file, and the directory just above that.
import os
directory = os.path.abspath('\\') # root directory
print(directory) # e.g. 'C:\'
directory = os.path.abspath('.') # current directory
print(directory) # e.g. 'C:\Users\User\Desktop'
parent_directory, directory_name = os.path.split(directory)
print(directory_name) # e.g. 'Desktop'
parent_parent_directory, parent_directory_name = os.path.split(parent_directory)
print(parent_directory_name) # e.g. 'User'
This should also do the trick.
This is what I did to extract the piece of the directory:
for path in file_list:
    directories = path.rsplit('\\')
    directories.reverse()
    line_replace_add_directory = line_replace+directories[2]
Thank you for your help.
You have to pass the entire path as the argument to os.path.split. See the docs. It doesn't work like str.split.

Get files from specific folders in python

I have the following directory structure with the following files:
Folder_One
├─file1.txt
├─file1.doc
└─file2.txt
Folder_Two
├─file2.txt
├─file2.doc
└─file3.txt
I would like to get only the .txt files from each folder listed. Example:
Folder_One-> file1.txt and file2.txt
Folder_Two-> file2.txt and file3.txt
Note: this entire directory is inside a folder called dataset. My code looks like this, but I believe something is missing. Can someone help me?
import glob
import os
path_dataset = "./dataset/"
filedataset = os.listdir(path_dataset)
for i in filedataset:
    pasta = ''
    pasta = pasta.join(i)
    for file in glob.glob(path_dataset+"*.txt"):
        print(file)
from pathlib import Path
for path in Path('dataset').rglob('*.txt'):
    print(path.name)
Using glob:
import glob
for x in glob.glob('dataset/**/*.txt', recursive=True):
    print(x)
You can use the re module to check that a file name ends with .txt.
import re
import os
path_dataset = "./dataset/"
l = os.listdir(path_dataset)
for e in l:
    if os.path.isdir(path_dataset + e):
        ll = os.listdir(path_dataset + e)
        for file in ll:
            if re.match(r".*\.txt$", file):
                print(e + '->' + file)
One may also find all the files with the os module (an advantage if you already use this module):
import os
# get the current directory; you may also provide an absolute path
path = os.getcwd()
# walk recursively through all folders and gather information
for root, dirs, files in os.walk(path):
    # keep only files whose name ends with .txt (find() would also match e.g. "notes.txt.bak")
    check = [f for f in files if f.endswith(".txt")]
    if check:
        print(root, check)
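To get output grouped per folder, as asked in the question (Folder_One-> file1.txt and file2.txt), one sketch is to iterate over the immediate subdirectories of dataset and glob each one for *.txt. It assumes the folders sit directly inside dataset, as described; txt_files_by_folder is a hypothetical helper name.

```python
from pathlib import Path


def txt_files_by_folder(dataset_dir):
    """Map each immediate subfolder of dataset_dir to its sorted .txt file names."""
    result = {}
    for folder in sorted(Path(dataset_dir).iterdir()):
        if folder.is_dir():
            # glob (not rglob) keeps the search inside this one folder
            result[folder.name] = sorted(
                f.name for f in folder.glob("*.txt") if f.is_file()
            )
    return result
```

For the layout in the question, txt_files_by_folder("dataset") would return {'Folder_One': ['file1.txt', 'file2.txt'], 'Folder_Two': ['file2.txt', 'file3.txt']}, which can then be printed in whatever format you need.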

Find absolute path of a file in python when knowing the last part of the path and the base directory?

Using python, I have the last parts of paths to existing files, like that:
sub_folder1/file1.txt
sub_folder2/file120.txt
sub_folder78/file99.txt
Note that these paths are not relative to the current folder I am working in, e.g., pandas.read_csv('sub_folder1/file1.txt') would throw a file-not-found error. Nevertheless, I know all the files have the same base directory base_dir, but I don't know the absolute path. This means a file could be located like this:
base_dir/inter_folder1/sub_folder1/file1.txt
Or like this:
base_dir/inter_folder7/inter_folder4/.../sub_folder1/file1.txt
Is there a function that returns the absolute path, when given the last part of the path and the base directory of a file (or equivalently, finding the intermediate folders)? Should be looking like that:
absolute_path = some_func(end_path='bla/bla.txt', base_dir='BLAH')
I thought pathlib might have a solution, but couldn't find anything there. Thanks
I need this to do something like the below:
for end_path in list_of_paths:
    full_path = some_func(end_path=end_path, base_dir='base_dir')
    image = cv2.imread(full_path)
This should be fairly easy to implement with pathlib:
from pathlib import Path
def find(end_path: str, base_dir: str):
    for file in Path(base_dir).rglob("*"):
        if str(file).endswith(end_path):
            yield file
This is a generator, to match the pathlib interface; as such it yields Path objects (PosixPath on Linux/macOS). It will also find all matching files, for example:
[str(f) for f in find(end_path="a.txt", base_dir="my_dir")]
# creates:
# ['my_dir/a.txt', 'my_dir/sub_dir/a.txt']
If you just want the first value, you can return the first match instead (the function returns None if nothing matches):
def find_first(end_path: str, base_dir: str):
    for file in Path(base_dir).rglob("*"):
        if str(file).endswith(end_path):
            return str(file)
abs_path = find_first(end_path="a.txt", base_dir="my_dir")
A better function that narrows the lookup with a glob pattern:
import cv2
from pathlib import Path
def find(pattern, suffixes, base_dir):
    for file in Path(base_dir).rglob(pattern):
        if any(str(file).endswith(suffix) for suffix in suffixes):
            yield str(file)
base_dir = "base_directory"
suffixes = [
    'sub_folder1/file1.txt',
    'sub_folder2/file120.txt',
    'sub_folder78/file99.txt',
]
for full_path in find(pattern="*.txt", suffixes=suffixes, base_dir=base_dir):
    image = cv2.imread(full_path)
You need to search for the sub-folder within the base folder, e.g.:
import os
for dirpath, dirnames, files in os.walk(os.path.abspath(base_dir)):
    if dirpath.endswith("sub_folder1"):
        print(dirpath)
You might also want to make sure that the file exists in that folder using:
if dirpath.endswith("sub_folder1") and "file1.txt" in files:
    print(dirpath)
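One caveat with comparing strings via endswith: "a.txt" also matches "banana.txt", and on Windows str(file) uses backslashes, so "sub_folder1/file1.txt" would never match. A sketch that compares whole path components instead; find_by_tail is a hypothetical name, and it assumes the file name contains no glob metacharacters such as [ or *, since the last component is passed to rglob.

```python
from pathlib import Path, PurePath


def find_by_tail(end_path, base_dir):
    """Yield absolute paths under base_dir whose trailing components equal end_path's."""
    tail = PurePath(end_path).parts
    # Only visit entries whose name matches the last component, then compare
    # the trailing components one by one (no partial-name matches).
    for candidate in Path(base_dir).rglob(tail[-1]):
        if candidate.is_file() and candidate.parts[-len(tail):] == tail:
            yield candidate.resolve()
```

Usage: next(find_by_tail("sub_folder1/file1.txt", base_dir), None) gives the first match, or None if there is none.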

How do I get the parent directory in Python?

Could someone tell me how to get the parent directory of a path in Python in a cross-platform way? E.g.
C:\Program Files ---> C:\
and
C:\ ---> C:\
If the directory doesn't have a parent directory, it returns the directory itself. The question might seem simple but I couldn't dig it up through Google.
Python 3.4
Use the pathlib module.
from pathlib import Path
path = Path("/here/your/path/file.txt")
print(path.parent.absolute())
Old answer
Try this:
import os
print(os.path.abspath(os.path.join(yourpath, os.pardir)))
where yourpath is the path you want the parent for.
Using os.path.dirname:
>>> os.path.dirname(r'C:\Program Files')
'C:\\'
>>> os.path.dirname('C:\\')
'C:\\'
>>>
Caveat: os.path.dirname() gives different results depending on whether a trailing slash is included in the path. This may or may not be the semantics you want. Cf. @kender's answer using os.path.join(yourpath, os.pardir).
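A quick demonstration of that caveat, using posixpath (what os.path is on Linux/macOS) so the result does not depend on the OS running it; the example path is made up:

```python
import posixpath

with_slash = "/home/user/docs/"
without_slash = "/home/user/docs"

# dirname only removes what follows the last separator, so with a trailing
# slash "docs" itself is treated as the directory part.
print(posixpath.dirname(without_slash))  # /home/user
print(posixpath.dirname(with_slash))     # /home/user/docs

# Joining with pardir ("..") and normalising gives the same parent for both.
print(posixpath.normpath(posixpath.join(with_slash, posixpath.pardir)))     # /home/user
print(posixpath.normpath(posixpath.join(without_slash, posixpath.pardir)))  # /home/user
```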
The Pathlib method (Python 3.4+)
from pathlib import Path
Path(r'C:\Program Files').parent
# Returns a Path object (WindowsPath on Windows; elsewhere use PureWindowsPath for Windows-style paths)
The traditional method
import os.path
os.path.dirname(r'C:\Program Files')
# Returns a string
Which method should I use?
Use the traditional method if:
You are worried about existing code generating errors if it were to use a Pathlib object (since Pathlib objects cannot be concatenated with strings using +).
Your Python version is older than 3.4.
You need a string, and you received a string. Say for example you have a string representing a filepath, and you want to get the parent directory so you can put it in a JSON string. It would be kind of silly to convert to a Pathlib object and back again for that.
If none of the above apply, use Pathlib.
What is Pathlib?
If you don't know what Pathlib is, the Pathlib module is a terrific module that makes working with files even easier for you. Since Python 3.6, most if not all of the built-in Python modules that work with files accept both Pathlib objects and strings. I've highlighted below a couple of examples from the Pathlib documentation that showcase some of the neat things you can do with Pathlib.
Navigating inside a directory tree:
>>> p = Path('/etc')
>>> q = p / 'init.d' / 'reboot'
>>> q
PosixPath('/etc/init.d/reboot')
>>> q.resolve()
PosixPath('/etc/rc.d/init.d/halt')
Querying path properties:
>>> q.exists()
True
>>> q.is_dir()
False
import os
p = os.path.abspath('..')  # parent of the current working directory
C:\Program Files ---> C:\
C:\ ---> C:\
An alternative to @kender's solution:
import os
os.path.dirname(os.path.normpath(yourpath))
where yourpath is the path you want the parent for.
But this solution is not perfect, since it does not handle the case where yourpath is an empty string or a dot.
This other solution handles that corner case more nicely:
import os
os.path.normpath(os.path.join(yourpath, os.pardir))
Here are the outputs for every case I could find (input path is relative):
os.path.dirname(os.path.normpath('a/b/')) => 'a'
os.path.normpath(os.path.join('a/b/', os.pardir)) => 'a'
os.path.dirname(os.path.normpath('a/b')) => 'a'
os.path.normpath(os.path.join('a/b', os.pardir)) => 'a'
os.path.dirname(os.path.normpath('a/')) => ''
os.path.normpath(os.path.join('a/', os.pardir)) => '.'
os.path.dirname(os.path.normpath('a')) => ''
os.path.normpath(os.path.join('a', os.pardir)) => '.'
os.path.dirname(os.path.normpath('.')) => ''
os.path.normpath(os.path.join('.', os.pardir)) => '..'
os.path.dirname(os.path.normpath('')) => ''
os.path.normpath(os.path.join('', os.pardir)) => '..'
os.path.dirname(os.path.normpath('..')) => ''
os.path.normpath(os.path.join('..', os.pardir)) => '../..'
Input path is absolute (Linux path):
os.path.dirname(os.path.normpath('/a/b')) => '/a'
os.path.normpath(os.path.join('/a/b', os.pardir)) => '/a'
os.path.dirname(os.path.normpath('/a')) => '/'
os.path.normpath(os.path.join('/a', os.pardir)) => '/'
os.path.dirname(os.path.normpath('/')) => '/'
os.path.normpath(os.path.join('/', os.pardir)) => '/'
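For comparison, here is pathlib's .parent on the same inputs, sketched with PurePosixPath so it behaves the same on any OS. It agrees on the absolute cases, returns '.' rather than '' for the relative ones, and for '..' it returns '.' instead of '../..', because .parent is purely lexical. If inputs may contain '..', the normpath/os.pardir approach (or Path.resolve() on real paths) is the safer choice.

```python
from pathlib import PurePosixPath

# .parent never touches the filesystem, never collapses "..",
# and stops at "." (relative paths) or "/" (absolute paths).
cases = ["a/b/", "a/b", "a/", "a", ".", "", "..", "/a/b", "/a", "/"]
parents = {case: str(PurePosixPath(case).parent) for case in cases}

for case, parent in parents.items():
    print(f"PurePosixPath({case!r}).parent => {parent!r}")
```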
os.path.split(os.path.abspath(mydir))[0]
os.path.abspath(os.path.join(somepath, '..'))
Observe:
import posixpath
import ntpath
print(ntpath.abspath(ntpath.join('C:\\', '..')))
print(ntpath.abspath(ntpath.join('C:\\foo', '..')))
print(posixpath.abspath(posixpath.join('/', '..')))
print(posixpath.abspath(posixpath.join('/home', '..')))
import os
print("------------------------------------------------------------")
SITE_ROOT = os.path.dirname(os.path.realpath(__file__))
print("example 1: "+SITE_ROOT)
PARENT_ROOT=os.path.abspath(os.path.join(SITE_ROOT, os.pardir))
print("example 2: "+PARENT_ROOT)
GRANDPAPA_ROOT=os.path.abspath(os.path.join(PARENT_ROOT, os.pardir))
print("example 3: "+GRANDPAPA_ROOT)
print("------------------------------------------------------------")
>>> import os
>>> os.path.basename(os.path.dirname(<your_path>))
For example, on Ubuntu:
>>> my_path = '/home/user/documents'
>>> os.path.basename(os.path.dirname(my_path))
# Output: 'user'
For example, on Windows (os.path only treats backslashes as separators there):
>>> my_path = r'C:\WINDOWS\system32'
>>> os.path.basename(os.path.dirname(my_path))
# Output: 'WINDOWS'
Both examples were tried in Python 2.7.
Suppose we have a directory structure like the following, and the current working directory is R (these paths are resolved against the working directory, not the script's location):
1]
/home/User/P/Q/R
To get the path of "P" from the directory R, use
ROOT = os.path.abspath(os.path.join("..", os.pardir))
2]
/home/User/P/Q/R
To get the path of "Q" from the directory R, use
ROOT = os.path.abspath(os.path.join(".", os.pardir))
If you want only the name of the folder that is the immediate parent of the file provided as an argument and not the absolute path to that file:
os.path.split(os.path.dirname(currentDir))[1]
i.e. with a currentDir value of /home/user/path/to/myfile/file.ext
The above command will return:
myfile
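With pathlib the same thing is the name attribute of parent. This sketch uses PurePosixPath and the example path from above, so the file does not need to exist:

```python
from pathlib import PurePosixPath

current = PurePosixPath("/home/user/path/to/myfile/file.ext")

# .parent drops the last component; .name is the final component of what remains.
print(current.parent.name)  # myfile
```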
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
parent_path = os.path.abspath(os.path.join(dir_path, os.pardir))
import os.path
os.path.abspath(os.pardir)
Just adding something to Tung's answer: use rstrip('/') to be on the safer side if you're on a Unix box.
>>> input1 = "../data/replies/"
>>> os.path.dirname(input1.rstrip('/'))
'../data'
>>> input1 = "../data/replies"
>>> os.path.dirname(input1.rstrip('/'))
'../data'
But, if you don't use rstrip('/'), given your input is
>>> input1 = "../data/replies/"
would output,
>>> os.path.dirname(input1)
'../data/replies'
which is probably not what you're looking for, as you want both "../data/replies/" and "../data/replies" to behave the same way.
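pathlib drops a trailing slash when it parses the path, so .parent gives the same answer for both spellings without any rstrip. A sketch with PurePosixPath:

```python
from pathlib import PurePosixPath

for raw in ["../data/replies/", "../data/replies"]:
    # The trailing slash is removed at construction time, so both print ../data
    print(PurePosixPath(raw).parent)
```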
print(os.path.abspath(os.path.join(os.getcwd(), os.path.pardir)))
You can use this to get the parent directory of the current working directory (which is not necessarily the directory your .py file is in).
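Get the parent directory path and make a new directory (named new_dir)
Get the parent directory path
os.path.abspath('..')
os.pardir  # just the string '..'; combine it with a path using os.path.join
Example 1
import os
# new_dir inside the parent of the directory containing this script
os.makedirs(os.path.join(os.path.dirname(__file__), os.pardir, 'new_dir'))
Example 2
import os
# new_dir inside the parent of the current working directory
# (os.path.join discards earlier parts when it meets an absolute one, so adding dirname(__file__) first would have no effect)
os.makedirs(os.path.join(os.path.abspath('..'), 'new_dir'))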
>>> os.path.abspath(r'D:\Dir1\Dir2\..')
'D:\\Dir1'
So appending .. takes you up one level (output from Windows).
import os
def parent_filedir(n):
    # n=1 is the directory containing this file, n=2 its parent, and so on
    return parent_filedir_iter(n, os.path.dirname(__file__))
def parent_filedir_iter(n, path):
    n = int(n)
    if n <= 1:
        return path
    return parent_filedir_iter(n - 1, os.path.dirname(path))
test_dir = os.path.abspath(parent_filedir(2))
The answers given above are all perfectly fine for going up one or two directory levels, but they may get a bit cumbersome if one needs to traverse the directory tree by many levels (say, 5 or 10). This can be done concisely by joining a list of N os.pardirs in os.path.join. Example:
import os
# Create list of ".." times 5
upup = [os.pardir]*5
# Extract list as arguments of join()
go_upup = os.path.join(*upup)
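# Get abspath relative to the current file; note that the first ".." only strips
# the file name, so this ends up 4 levels above the file's own directory
up_dir = os.path.abspath(os.path.join(__file__, go_upup))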
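With pathlib, going up many levels can use the parents sequence instead: parents[0] is the immediate parent and parents[n-1] is n levels up. A sketch, where up_n_levels is a hypothetical helper and the path is made up; for a real script you would pass Path(__file__).resolve(), which (unlike os.path.abspath) also follows symlinks.

```python
from pathlib import PurePosixPath


def up_n_levels(path, n):
    """Return the ancestor n levels above path (n=1 is the immediate parent)."""
    if n < 1:
        return path
    # parents raises IndexError if you ask to go above the root
    return path.parents[n - 1]


p = PurePosixPath("/home/User/P/Q/R/script.py")
print(up_n_levels(p, 1))  # /home/User/P/Q/R
print(up_n_levels(p, 3))  # /home/User/P
```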
To find the parent of the current working directory:
import pathlib
pathlib.Path().resolve().parent
import os
def parent_directory():
    # Create a relative path to the parent of the current working directory
    relative_parent = os.path.join(os.getcwd(), "..")  # .. means parent directory
    # Return the absolute path of the parent directory
    return os.path.abspath(relative_parent)
print(parent_directory())
