Remove file name and extension from path and just keep path - python

Hi, I have a string that is dynamic and can take any of the following forms:
'new/file.csv'
'new/mainfolder/file.csv'
'new/mainfolder/subfolder/file.csv'
'new/mainfolder/subfolder/secondsubfolder/file.csv'
In every case I want to split the string into two parts: the path and the file name. The path should not include the file name.
Expected result:
'new'
'new/mainfolder'
'new/mainfolder/subfolder'
'new/mainfolder/subfolder/secondsubfolder'
So far I have tried several things, including
path = 'new/mainfolder/file.csv'
final_path = path.split('/', 1)[-1]
and rstrip(), but nothing has worked.

You can use pathlib for this.
For example,
>>> import pathlib
>>> path = pathlib.Path('new/mainfolder/file.csv')
>>> path.name
'file.csv'
>>> str(path.parent)
'new/mainfolder'
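Applied to all four inputs from the question, the same idea could look like the sketch below. It uses PurePosixPath instead of Path so that '/' is treated as the separator on every platform; that choice and the variable names are assumptions made for illustration.

```python
from pathlib import PurePosixPath

paths = [
    'new/file.csv',
    'new/mainfolder/file.csv',
    'new/mainfolder/subfolder/file.csv',
    'new/mainfolder/subfolder/secondsubfolder/file.csv',
]

# Split every string into (directory part, file name).
split_paths = [(str(PurePosixPath(p).parent), PurePosixPath(p).name) for p in paths]

for folder, name in split_paths:
    print(folder, name)
```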

paths = ['new/file.csv',
         'new/mainfolder/file.csv',
         'new/mainfolder/subfolder/file.csv',
         'new/mainfolder/subfolder/secondsubfolder/file.csv']
output = []
for p in paths:
    # drop the last component (the file name) and re-join the rest
    parts = p.split("/")[:-1]
    output.append("/".join(parts))
print(output)
Output:
['new', 'new/mainfolder', 'new/mainfolder/subfolder', 'new/mainfolder/subfolder/secondsubfolder']

An alternative to pathlib is os.path:
import os
fullPath = 'new/mainfolder/file.csv'
parent = os.path.dirname(fullPath)  # get path only
file = os.path.basename(fullPath)  # get file name
print(parent)
Output:
new/mainfolder
os.path.dirname:
Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split(). Source: https://docs.python.org/3.3/library/os.path.html#os.path.dirname
os.path.basename:
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program; where basename for '/foo/bar/' returns 'bar', the basename() function returns an empty string ('').
Source: https://docs.python.org/3.3/library/os.path.html#os.path.basename
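The relationship between split(), dirname() and basename() described in the quoted documentation can be checked with a small sketch. It uses posixpath, the POSIX flavour of os.path, so the result does not depend on the operating system; the variable names are illustrative only.

```python
import posixpath  # POSIX flavour of os.path, so '/' is always the separator

full_path = 'new/mainfolder/file.csv'

# split() returns the (directory, file name) pair in one call ...
parent, file_name = posixpath.split(full_path)

# ... and dirname()/basename() return the two halves of that pair.
same = (posixpath.dirname(full_path), posixpath.basename(full_path))

# With a trailing slash, basename() is the empty string.
trailing = posixpath.basename('/foo/bar/')

print(parent, file_name, repr(trailing))
```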

You almost got it; just use rsplit, like this:
path = 'new/mainfolder/file.csv'
file_path, file_name = path.rsplit('/', 1)
print(f'{file_path=}\n{file_name=}')  # the '=' specifier needs Python 3.8+
Output:
file_path='new/mainfolder'
file_name='file.csv'
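One caveat with rsplit: unpacking into two names raises ValueError when the string contains no '/' at all. If that can happen, str.rpartition is one possible alternative because it always returns three parts. The helper name split_path below is hypothetical.

```python
def split_path(path):
    """Return (directory, file name); directory is '' when there is no '/'."""
    directory, _sep, file_name = path.rpartition('/')
    return directory, file_name


print(split_path('new/mainfolder/file.csv'))
print(split_path('file.csv'))
```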

Related

searching specific string in list

How can I find every string in a list that starts with a specific string? For example:
path = (r"C:\Users\Example\Desktop")
desktop = os.listdir(path)
print(desktop)
#['faf.docx', 'faf.txt', 'faad.txt', 'gas.docx']
So my question is: how do I keep only the files whose names start with "fa"?
For this specific case, involving filenames in one directory, you can use globbing:
import glob
import os
path = (r"C:\Users\Example\Desktop")
pattern = os.path.join(path, 'fa*')
files = glob.glob(pattern)
This code keeps only the items that start with "fa" and stores them in a separate list:
filtered = [item for item in os.listdir(path) if item.startswith("fa")]
All strings have a .startswith() method!
results = []
for value in os.listdir(path):
    if value.startswith("fa"):
        results.append(value)
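The approaches above can be compared on the sample listing from the question without touching the file system. This is only a sketch: the list is copied from the question, and the second prefix 'ga' is an assumption added to show that str.startswith also accepts a tuple.

```python
import fnmatch

desktop = ['faf.docx', 'faf.txt', 'faad.txt', 'gas.docx']

# Plain prefix test on each name.
by_prefix = [name for name in desktop if name.startswith('fa')]

# str.startswith also accepts a tuple of prefixes.
by_any_prefix = [name for name in desktop if name.startswith(('fa', 'ga'))]

# The same filter written as a shell-style pattern, applied to a list of names.
by_pattern = fnmatch.filter(desktop, 'fa*')

print(by_prefix)
```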

Python grab substring between two specific characters

I have a folder with hundreds of files named like:
"2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
Convention:
year_month_ID_zone_date_0_L2A_B01.tif ("_0_L2A_B01.tif", and "zone" never change)
What I need is to iterate through every file and build a path based on their name in order to download them.
For example:
name = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
path = "2017/5/S2B_7VEG_20170528_0_L2A/B01.tif"
The path convention needs to be: path = year/month/ID_zone_date_0_L2A/B01.tif
I thought of making a loop which would "cut" my string into several parts every time it encounters a "_" character, then stitch the different parts in the right order to create my path name.
I tried this but it didn't work:
import re
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
try:
    found = re.search('_(.+?)_', filename).group(1)
except AttributeError:
    # _ not found in the original string
    found = ''  # apply your error handling
How could I achieve that in Python?
Since you only have one separator character, you may as well simply use Python's built in split function:
import os
items = filename.split('_')
year, month = items[:2]
new_filename = '_'.join(items[2:])
path = os.path.join(year, month, new_filename)
Try the following code snippet:
import re
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
found = re.sub(r'(\d+)_(\d+)_(.*)_(.*)\.tif', r'\1/\2/\3/\4.tif', filename)
print(found) # prints 2017/05/S2B_7VEG_20170528_0_L2A/B01.tif
No need for a regex -- you can just use split().
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
parts = filename.split("_")
year = parts[0]
month = parts[1]
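Building on the split() idea, a complete version that produces the layout requested in the question (month without the leading zero, band file as the last component) might look like this. The function name build_path is hypothetical, and the sketch assumes every file follows the stated naming convention.

```python
import posixpath  # joins with '/' on every platform


def build_path(filename):
    """Turn 'year_month_<rest>_<band>.tif' into 'year/month/<rest>/<band>.tif'."""
    # Split off year and month, keep everything else together.
    year, month, rest = filename.split('_', 2)
    # Split the band file (e.g. 'B01.tif') off the right-hand end.
    folder, band_file = rest.rsplit('_', 1)
    # int() drops the leading zero of the month, as in the question's example.
    return posixpath.join(year, str(int(month)), folder, band_file)


print(build_path("2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"))
```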
Maybe you can do it like this:
from os import listdir, makedirs
from os.path import isfile, join
my_path = 'your_source_dir'
files_name = [f for f in listdir(my_path) if isfile(join(my_path, f))]
def create_dir(files_name):
    for file in files_name:
        # the first two '_'-separated fields are the year and the month
        year, month = file.split('_')[:2]
        # create <my_path>/<year>/<month> unless it already exists
        makedirs(join(my_path, year, month), exist_ok=True)
        ### your download code
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
temp = filename.split('_')
result = "/".join(temp)
print(result)
result is
2017/05/S2B/7VEG/20170528/0/L2A/B01.tif

How to append the grand-parent folder name to the filename?

I have a directory containing multiple folders, and within each folder a file exists inside another folder. The structure is shown below:
C:\users\TPCL\New\20190919_xz.txt
C:\users\TPCH\New\20190919_abc.txt
Objective:
I want to rename the file names like below:
C:\users\TPCL\New\20190919_xz_TPCL.txt
C:\users\TPCH\New\20190919_abc_TPCH.txt
My Approach:
for root, dirs, filename in os.walk(r'C:\users\TPCL\New'):
    prefix = os.path.basename(root)
    for f in filename:
        os.rename(os.path.join(root, f), os.path.join(root, "{}_{}".format(f, prefix)))
The above approach is yielding the following result:
C:\users\TPCL\New\20190919_xz_New.txt
C:\users\TPCH\New\20190919_abc_New.txt
So the question is: how do I append the grand-parent folder name instead of the parent folder name?
You need to use both dirname and basename to do this.
Use os.path.dirname to get the directory name (excluding the last part) and
then use os.path.basename to get the last part of the pathname.
Replace
prefix = os.path.basename(root)
with
prefix = os.path.basename(os.path.dirname(root))
Please refer to:
https://docs.python.org/3.7/library/os.path.html#os.path.basename
https://docs.python.org/3.7/library/os.path.html#os.path.dirname
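Put together, the dirname/basename combination could be used as in the sketch below, which only computes the new name and does not rename anything. It uses ntpath, the Windows flavour of os.path, so that the backslash paths from the question are parsed the same way on any OS. The function name add_grandparent is hypothetical, and splitext is used so that the suffix lands before the extension.

```python
import ntpath  # Windows flavour of os.path; parses '\\' separators on any OS


def add_grandparent(file_path):
    """Return file_path with the grand-parent folder name appended to the stem."""
    folder, name = ntpath.split(file_path)
    # Parent of the containing folder, then its last component.
    grandparent = ntpath.basename(ntpath.dirname(folder))
    stem, ext = ntpath.splitext(name)
    return ntpath.join(folder, f'{stem}_{grandparent}{ext}')


print(add_grandparent(r'C:\users\TPCL\New\20190919_xz.txt'))
```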
Using PurePath from pathlib you can get the parts of the path. If the path contains the filename its grand-parent folder will be at index -3.
In [23]: from pathlib import PurePath
In [24]: p = r'C:\users\TPCL\New\20190919_xz_TPCL.txt'
In [25]: g = PurePath(p)
In [26]: g.parts
Out[26]: ('C:\\', 'users', 'TPCL', 'New', '20190919_xz_TPCL.txt')
In [27]: g.parts[-3]
Out[27]: 'TPCL'
If the path does not contain the filename, the grand-parent will be at index -2.
Your process would look something like this:
import os.path
from pathlib import PurePath
for root, dirs, fnames in os.walk(topdirectory):
    #print(root)
    try:
        addition = PurePath(root).parts[-2]
        for name in fnames:
            n, ext = os.path.splitext(name)
            newname = n + '_' + addition + ext
            print(name, os.path.join(root, newname))
    except IndexError:
        pass
I added the try/except to filter out paths that don't have grand-parents; you can drop it if you know every path has one.
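A pathlib-only variant of the same idea is sketched below. PurePath picks its flavour from the operating system it runs on, so this sketch uses PureWindowsPath explicitly to parse the backslash paths from the question identically everywhere. The function name renamed_with_grandparent is hypothetical, and nothing is renamed on disk.

```python
from pathlib import PureWindowsPath


def renamed_with_grandparent(file_path):
    """Return a new path whose stem ends with the grand-parent folder name."""
    p = PureWindowsPath(file_path)
    grandparent = p.parent.parent.name  # same component as p.parts[-3]
    return p.with_name(f'{p.stem}_{grandparent}{p.suffix}')


new_path = renamed_with_grandparent(r'C:\users\TPCL\New\20190919_xz.txt')
print(new_path)
```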
You can split the path string on '\' and then count back to what would be considered the grandparent
directory for any given file and then append it. For example, if you have
filename = r"dir1\dir2\file.txt"
splitPaths = filename.split('\\')  # gives you ['dir1', 'dir2', 'file.txt']
The last entry is the file name, the second to last is the parent, the third to last is the grandparent, and so on. You can then append whichever directory you want to the end of the string.

How to insert a directory in the middle of a file path in Python?

I want to insert a directory name in the middle of a given file path, like this:
directory_name = 'new_dir'
file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
file_path1 = some_func(file_path0, directory_name, position=2)
print(file_path1)
>>> 'dir1/dir2/new_dir/dir3/dir4/file.txt'
I looked through the os.path and pathlib packages, but it looks like they don't have a function that allows for inserting in the middle of a file path. I tried:
import sys,os
from os.path import join
path_ = file_path0.split(os.sep)
path_.insert(2, 'new_dir')
print(join(path_))
but this results in the error
"expected str, bytes or os.PathLike object, not list"
Does anyone know of a standard Python function that allows inserting a directory in the middle of a file path? Alternatively, how can I turn path_ into something that os.path can process? I am new to pathlib, so maybe I missed something there.
Edit: Following the answers to the question I can suggest the following solutions:
1.) As Zach Favakeh suggests and as written in this answer just correct my code above to join(*path_) by using the 'splat' operator * and everything is solved.
2.) As suggested by buran you can use the pathlib package, in very short it results in:
from pathlib import PurePath
path_list = list(PurePath(file_path0).parts)
path_list.insert(2, 'new_dir')
file_path1 = PurePath('').joinpath(*path_list)
print(file_path1)
>>> 'dir1/dir2/new_dir/dir3/dir4/file.txt'
Take a look at pathlib.PurePath.parts. It will return separate components of the path and you can insert at desired position and construct the new path
>>> from pathlib import PurePath
>>> file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
>>> p = PurePath(file_path0)
>>> p.parts
('dir1', 'dir2', 'dir3', 'dir4', 'file.txt')
>>> spam = list(p.parts)
>>> spam.insert(2, 'new_dir')
>>> new_path = PurePath('').joinpath(*spam)
>>> new_path
PurePosixPath('dir1/dir2/new_dir/dir3/dir4/file.txt')
This will work with path as a str as well as with pathlib.Path objects
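Wrapped into the some_func(file_path0, directory_name, position=2) shape asked for in the question, the parts-based approach could be sketched as follows. The name insert_dir is hypothetical, and PurePosixPath is an assumption chosen so that '/' is the separator on every platform. For absolute paths the root ('/') counts as the first part, so positions shift by one.

```python
from pathlib import PurePosixPath


def insert_dir(file_path, directory_name, position):
    """Insert directory_name before the path component at index `position`."""
    parts = list(PurePosixPath(file_path).parts)
    parts.insert(position, directory_name)
    return str(PurePosixPath(*parts))


print(insert_dir('dir1/dir2/dir3/dir4/file.txt', 'new_dir', position=2))
```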
Since you want to use join on a list to produce the pathname, unpack the list with the "splat" operator, i.e. os.path.join(*path_). See: Python os.path.join() on a list
Edit: You could also take your np array and concatenate its elements into a string using np.array2string, using '/' as your separator parameter: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.array2string.html
Hope this helps.
Solution using regex. The regex will create groups of the following
[^\/]+ - non-'/' characters(i.e. directory names)
\w+\.\w+ - word characters then '.' then word characters (i.e. file name)
import re
directory_name = 'new_dir'
file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
position = 2
regex = re.compile(r'([^\/]+|\w+\.\w+)')
tokens = re.findall(regex, file_path0)
tokens.insert(position, directory_name)
file_path1 = '/'.join(tokens)
Result:
'dir1/dir2/new_dir/dir3/dir4/file.txt'
Your solution has only one flaw. After inserting the new directory into the path list with path_.insert(2, 'new_dir'), you need to call os.path.join(*path_) to get the modified path. The error occurs because you are passing a list to join, which expects the components as separate arguments, so you have to unpack it.
In my case, I knew the portion of path that would precede the insertion point (i.e., "root"). However, the position of the insertion point was not constant due to the possibility of having varying number of path components in the root path. I used Path.relative_to() to break the full path to yield an insertion point for the new_dir.
from pathlib import Path
directory_name = Path('new_dir')
root = Path('dir1/dir2/')
file_path0 = Path('dir1/dir2/dir3/dir4/file.txt')
# non-root component of path
chld = file_path0.relative_to(root)
file_path1 = root / directory_name / chld
print(file_path1)
Result:
'dir1/dir2/new_dir/dir3/dir4/file.txt'
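Note that relative_to() raises ValueError when the path is not located under the given root. A defensive version of the approach above might look like the sketch below. The function name insert_after_root and the policy of returning the path unchanged in that case are assumptions, and PurePosixPath is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath


def insert_after_root(file_path, root, directory_name):
    """Insert directory_name directly after root; leave other paths unchanged."""
    file_path, root = PurePosixPath(file_path), PurePosixPath(root)
    try:
        child = file_path.relative_to(root)  # non-root component of the path
    except ValueError:
        # file_path is not under root (assumed policy: return it as-is)
        return file_path
    return root / directory_name / child


print(insert_after_root('dir1/dir2/dir3/dir4/file.txt', 'dir1/dir2/', 'new_dir'))
```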
Here is my attempt at what you need:
directory_name = '/new_dir'
file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
before_the_newpath = 'dir1/dir2'
position = file_path0.split(before_the_newpath)
new_directory = before_the_newpath + directory_name + position[1]
Hope it helps.

How to get folder name, in which given file resides, from pathlib.path?

Is there something similar to os.path.dirname(path), but in pathlib?
It looks like there is a parents element that contains all the parent directories of a given path. E.g., if you start with:
>>> import pathlib
>>> p = pathlib.Path('/path/to/my/file')
Then p.parents[0] is the directory containing file:
>>> p.parents[0]
PosixPath('/path/to/my')
...and p.parents[1] will be the next directory up:
>>> p.parents[1]
PosixPath('/path/to')
Etc.
p.parent is another way to ask for p.parents[0]. You can convert a Path into a string and get pretty much what you would expect:
>>> str(p.parent)
'/path/to/my'
On any Path you can also use the .absolute() method to get an absolute path:
>>> import os
>>> os.chdir('/etc')
>>> p = pathlib.Path('../relative/path')
>>> str(p.parent)
'../relative'
>>> str(p.parent.absolute())
'/etc/../relative'
Note that os.path.dirname and pathlib treat paths with a trailing slash differently. The pathlib parent of some/path/ is some:
>>> p = pathlib.Path('some/path/')
>>> p.parent
PosixPath('some')
While os.path.dirname on some/path/ returns some/path:
>>> os.path.dirname('some/path/')
'some/path'
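The trailing-slash difference can be reproduced side by side. The sketch below uses posixpath and PurePosixPath so it behaves the same on every platform. Normalising the string first with normpath is one possible way, not the only one, to make os.path.dirname agree with pathlib.

```python
import posixpath
from pathlib import PurePosixPath

path = 'some/path/'

# os.path style: everything before the final slash.
dirname_result = posixpath.dirname(path)

# pathlib style: the trailing slash is dropped before taking the parent.
pathlib_result = str(PurePosixPath(path).parent)

# Normalising first removes the trailing slash, so both approaches agree.
normalised_result = posixpath.dirname(posixpath.normpath(path))

print(dirname_result, pathlib_result, normalised_result)
```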
Summary
from pathlib import Path
file_path = Path("/Users/yuanz/PycharmProjects/workenv/little_code/code09/sample.csv")
1. get dir path
file_path.parent
# >>> /Users/yuanz/PycharmProjects/workenv/little_code/code09
2. get filename
file_path.name
# >>> sample.csv
3. get file type
file_path.suffix
# >>> .csv
4. add a new file in this dir path
file_path.parent.joinpath("dd.png")
I came here looking for something very similar. My solution, based on the above by @larsks, and assuming you want to preserve the entire path except the filename, is to do:
>>> import pathlib
>>> p = pathlib.Path('/path/to/my/file')
>>> pathlib.Path('/'.join(list(p.parts)[1:-1])+'/')
Essentially, list(p.parts)[1:-1] creates a list of the path components from the second through the second-to-last; you join them with '/' and make a Path of the resulting string. Edit: the final +'/' adds the trailing slash; adjust as required.
