How to append the grand-parent folder name to the filename?

How to append the grand-parent folder name to the filename? - python

I have directory where multiple folders exist and within each folder,a file exist inside another folder. Below is the structure
C:\users\TPCL\New\20190919_xz.txt
C:\users\TPCH\New\20190919_abc.txt
Objective:
I want to rename the file names like below:
C:\users\TPCL\New\20190919_xz_TPCL.txt
C:\users\TPCH\New\20190919_abc_TPCH.txt
My Approach:
for root,dirs,filename in os.walk('C\users\TPCL\New'):
prefix = os.path.basename(root)
for f in filename:
os.rename(os.path.join(root,f),os.path.join(root,"{}_{}".format(f,prefix)))
The above approach is yielding the following result:
C:\users\TPCL\New\20190919_xz_New.txt
C:\users\TPCH\New\20190919_abc_New.txt
So the question is: How to get the grand-parent folder name get appended, instead of parent folder name?

You need to use both dirname and basename to do this.
Use os.path.dirname to get the directory name (excluding the last part) and
then use os.path.basename to get the last part of the pathname.
Replace
prefix = os.path.basename(root)
with
os.path.basename(os.path.dirname(root))
Please refer this:
https://docs.python.org/3.7/library/os.path.html#os.path.basename
https://docs.python.org/3.7/library/os.path.html#os.path.dirname

Using PurePath from pathlib you can get the parts of the path. If the path contains the filename its grand-parent folder will be at index -3.
In [23]: from pathlib import PurePath
In [24]: p = r'C:\users\TPCL\New\20190919_xz_TPCL.txt'
In [25]: g = PurePath(p)
In [26]: g.parts
Out[26]: ('C:\\', 'users', 'TPCL', 'New', '20190919_xz_TPCL.txt')
In [27]: g.parts[-3]
Out[27]: 'TPCL'
If the path does not contain the filename the grand=parent would be at index -2.
Your process would look something like this:
import os.path
from pathlib import PurePath
for root,dirs,fnames in os.walk(topdirectory):
#print(root)
try:
addition = PurePath(root).parts[-2]
for name in fnames:
n,ext = os.path.splitext(name)
newname = n + '_' + addition + ext
print(name, os.path.join(root,newname))
except IndexError:
pass
I added the try/except to filter out paths that don't have grand-parents - it isn't necessary if you know it isn't needed.

You can split the path string using '\' and then count back to what would be considered the grandparent
directory for any given file and then append it. For example, if you have
filename = "dir1\dir2\file.txt"
splitPaths = filename.split('\') // gives you ['dir1', 'dir2', 'file.txt']
the last entry is the file name, the second to last is the parent, and the third to last is the grandparent and so on. You can then append whichever directory you want to the end of the string.

Related

Remove file name and extension from path and just keep path

Hi I have a string like this which will be dynamic and can be in following combinations.
'new/file.csv'
'new/mainfolder/file.csv'
'new/mainfolder/subfolder/file.csv'
'new/mainfolder/subfolder/secondsubfolder/file.csv'
Something like these. In any case I just want this string in 2 parts like path and filename. Path will not consist of file name for example.
End result expected
'new'
'new/mainfolder'
'new/mainfolder/subfolder'
'new/mainfolder/subfolder/secondsubfolder'
Till now tried many things included
path = 'new/mainfolder/file.csv'
final_path = path.split('/', 1)[-1]
And rstrip() but nothing worked till now.

You can use pathlib for this.
For example,
>>>import pathlib
>>>path = pathlib.Path('new/mainfolder/file.csv')
>>>path.name
'file.csv'
>>>str(path.parent)
'new/mainfolder'

input = ['new/file.csv',
'new/mainfolder/file.csv',
'new/mainfolder/subfolder/file.csv',
'new/mainfolder/subfolder/secondsubfolder/file.csv']
output = []
for i in input:
i = i.split("/")[:-1]
i = "/".join(i)
output.append(i)
print(output)
Output:
['new', 'new/mainfolder', 'new/mainfolder/subfolder', 'new/mainfolder/subfolder/secondsubfolder']

An option to pathlib is os
import os
fullPath = 'new/mainfolder/file.csv'
parent = os.path.dirname(fullPath) # get path only
file = os.path.basename(fullPath) # get file name
print (parent)
Output:
new/mainfolder
path.dirname:
Return the directory name of pathname path. This is the first element of the pair returned by passing path to the function split(). Source: https://docs.python.org/3.3/library/os.path.html#os.path.dirname
path.basename:
Return the base name of pathname path. This is the second element of the pair returned by passing path to the function split(). Note that the result of this function is different from the Unix basename program; where basename for '/foo/bar/' returns 'bar', the basename() function returns an empty string ('').
Source: https://docs.python.org/3.3/library/os.path.html#os.path.basename

you almost got it, just use rsplit, like this:
path = 'new/mainfolder/file.csv'
file_path, file_name = path.rsplit('/', 1)
print(f'{file_path=}\n{file_name=}')
'''
file_path='new/mainfolder'
file_name='file.csv'

Python grab substring between two specific characters

I have a folder with hundreds of files named like:
"2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
Convention:
year_month_ID_zone_date_0_L2A_B01.tif ("_0_L2A_B01.tif", and "zone" never change)
What I need is to iterate through every file and build a path based on their name in order to download them.
For example:
name = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
path = "2017/5/S2B_7VEG_20170528_0_L2A/B01.tif"
The path convention needs to be: path = year/month/ID_zone_date_0_L2A/B01.tif
I thought of making a loop which would "cut" my string into several parts every time it encounters a "_" character, then stitch the different parts in the right order to create my path name.
I tried this but it didn't work:
import re
filename =
"2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
try:
found = re.search('_(.+?)_', filename).group(1)
except AttributeError:
# _ not found in the original string
found = '' # apply your error handling
How could I achieve that on Python ?

Since you only have one separator character, you may as well simply use Python's built in split function:
import os
items = filename.split('_')
year, month = items[:2]
new_filename = '_'.join(items[2:])
path = os.path.join(year, month, new_filename)

Try the following code snippet
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
found = re.sub('(\d+)_(\d+)_(.*)_(.*)\.tif', r'\1/\2/\3/\4.tif', filename)
print(found) # prints 2017/05/S2B_7VEG_20170528_0_L2A/B01.tif

No need for a regex -- you can just use split().
filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
parts = filename.split("_")
year = parts[0]
month = parts[1]

Maybe you can do like this:
from os import listdir, mkdir
from os.path import isfile, join, isdir
my_path = 'your_soure_dir'
files_name = [f for f in listdir(my_path) if isfile(join(my_path, f))]
def create_dir(files_name):
for file in files_name:
month = file.split('_', '1')[0]
week = file.split('_', '2')[1]
if not isdir(my_path):
mkdir(month)
mkdir(week)
### your download code

filename = "2017_05_S2B_7VEG_20170528_0_L2A_B01.tif"
temp = filename.split('_')
result = "/".join(temp)
print(result)
result is
2017/05/S2B/7VEG/20170528/0/L2A/B01.tif

How to insert a directory in the middle of a file path in Python?

I want to insert a directory name in the middle of a given file path, like this:
directory_name = 'new_dir'
file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
file_path1 = some_func(file_path0, directory_name, position=2)
print(file_path1)
>>> 'dir1/dir2/new_dir/dir3/dir4/file.txt'
I looked through the os.path and pathlib packages, but it looks like they don't have a function that allows for inserting in the middle of a file path. I tried:
import sys,os
from os.path import join
path_ = file_path0.split(os.sep)
path_.insert(2, 'new_dir')
print(join(path_))
but this results in the error
"expected str, bytes or os.PathLike object, not list"
Does anyone know standard python functions that allow such inserting in the middle of a file path? Alternatively - how can I turn path_ to something that can be processed by os.path. I am new to pathlib, so maybe I missed something out there
Edit: Following the answers to the question I can suggest the following solutions:
1.) As Zach Favakeh suggests and as written in this answer just correct my code above to join(*path_) by using the 'splat' operator * and everything is solved.
2.) As suggested by buran you can use the pathlib package, in very short it results in:
from pathlib import PurePath
path_list = list(PurePath(file_path0).parts)
path_list.insert(2, 'new_dir')
file_path1 = PurePath('').joinpath(*path_list)
print(file_path1)
>>> 'dir1/dir2/new_dir/dir3/dir4/file.txt'

Take a look at pathlib.PurePath.parts. It will return separate components of the path and you can insert at desired position and construct the new path
>>> from pathlib import PurePath
>>> file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
>>> p = PurePath(file_path0)
>>> p.parts
('dir1', 'dir2', 'dir3', 'dir4', 'file.txt')
>>> spam = list(p.parts)
>>> spam.insert(2, 'new_dir')
>>> new_path = PurePath('').joinpath(*spam)
>>> new_path
PurePosixPath('dir1/dir2/new_dir/dir3/dir4/file.txt')
This will work with path as a str as well as with pathlib.Path objects

Since you want to use join on a list to produce the pathname, you should do the following using the "splat" operator: Python os.path.join() on a list
Edit: You could also take your np array and concatenate its elements into a string using np.array2string, using '/' as your separator parameter:https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.array2string.html
Hope this helps.

Solution using regex. The regex will create groups of the following
[^\/]+ - non-'/' characters(i.e. directory names)
\w+\.\w+ - word characters then '.' then word characters (i.e. file name)
import re
directory_name = 'new_dir'
file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
position = 2
regex = re.compile(r'([^\/]+|\w+\.\w+)')
tokens = re.findall(regex, file_path0)
tokens.insert(position, directory_name)
file_path1 = '/'.join(tokens)
Result:
'dir1/dir2/new_dir/dir3/dir4/file.txt'

Your solution has only one flaw. After inserting the new directory in the path list path_.insert(2, 'new_dir')you need to call os.path.join(*path_) to get the new modified path. The error that you get is because you are passing a list as parameter to the join function, but you have to unpack it.

In my case, I knew the portion of path that would precede the insertion point (i.e., "root"). However, the position of the insertion point was not constant due to the possibility of having varying number of path components in the root path. I used Path.relative_to() to break the full path to yield an insertion point for the new_dir.
from pathlib import Path
directory_name = Path('new_dir')
root = Path('dir1/dir2/')
file_path0 = Path('dir1/dir2/dir3/dir4/file.txt')
# non-root component of path
chld = file_path0.relative_to(root)
file_path1 = root / directory_name / chld
print(file_path1)
Result:
'dir1/dir2/new_dir/dir3/dir4/file.txt'

I made a try with your need:
directory_name = '/new_dir'
file_path0 = 'dir1/dir2/dir3/dir4/file.txt'
before_the_newpath = 'dir1/dir2'
position = file_path0.split(before_the_newpath)
new_directory = before_the_newpath + directory_name + position[1]
Hope it helps.

Change only a part of filename

I have these images in my folder:
area11.tif
area12.tif
area14.tif
area21.tif
area22.tif
area25.tif
How can I change only the last digit so they became ordered and "more incremental" ?
Instead if area14.tif it should be area13.tif and the same thing for area22/area25.
I have a code but it's a bit broken because it delete some files (it's strange, I know...).
EDIT: added (maybe broken..) code
try:
path = (os.path.expanduser('~\\FOLDER\\'))
files = os.listdir(path)
idx = 0
for file in files:
idx =+ 1
i = 'ex_area'
if file.endswith('.tif'):
i = i + str(idx)
os.rename(os.path.join(path, file), os.path.join(path, str(i) + '.tif'))
except OSError as e:
if e.errno != errno.EEXIST:
raise

1) Read the files names in the directory into array (of strings).
2) Iterate over the array of filenames
3) For each filename, slice the string and insert the index
4) Rename
For example:
import os
import glob
[os.rename(n, "{}{}.tif".format(n[:5], i)) for i, n in enumerate(glob.glob("area*"))]

First you get the list of the images pathes with the glob module :
images = glob.glob("/sample/*.tif")
then you just rename all of them with the os module :
for i in range(len(images)): os.rename(images[i], ‘area’+i+’.tif’)

First rename all filename to temp name and then add whatever name you prefer
import glob,os
images = glob.glob("*.tif")
for i in range(len(images)):
os.rename(images[i], 'temp_'+str(i)+'.tif')
tempImages = glob.glob("temp*.tif")
for i in range(len(tempImages)):
os.rename(tempImages[i], 'area'+str(i+1)+'.tif')

Found also this other solution. But there is a small difference in this one, and a better way of do the job in the end (at least for me): create a folder for each area. So simple I didn't think of it before...
BTW, here is the code, commented. I am using this one just because I achieved what I want. Thanks to all who answered, made me learn new things.
path = (os.path.expanduser('~\\FOLDER\\AREA1\\')) #select folder
files = os.listdir(path)
i = 1 #counter
name = 'area' #variable which the file will take as name
for file in files:
if file.endswith('.tif'): #search only for .tif. Can change with any supported format
os.rename(os.path.join(path, file), os.path.join(path, name + str(i)+'.tif')) #name + str(i)+'.tif' will take the name and concatenate to first number in counter. #If you put as name "area1" the str(i) will add another number near the name so, here is the second digit.
i += 1 #do this for every .tif file in the folder
It's a little bit simple, but because I put the files in two separate folders. If you keep the files in the same folder, this will not work properly.
EDIT: now that I see, it's the same as my code above....

Python: moving file to a newly created directory

I've got my script creating a bunch of files (size varies depending on inputs) and I want to be certain files in certain folders based on the filenames.
So far I've got the following but although directories are being created no files are being moved, I'm not sure if the logic in the final for loop makes any sense.
In the below code I'm trying to move all .png files ending in _01 into the sub_frame_0 folder.
Additionally is their someway to increment both the file endings _01 to _02 etc., and the destn folder ie. from sub_frame_0 to sub_frame_1 to sub_frame_2 and so on.
for index, i in enumerate(range(num_sub_frames+10)):
path = os.makedirs('./sub_frame_{}'.format(index))
# Slice layers into sub-frames and add to appropriate directory
list_of_files = glob.glob('*.tif')
for fname in list_of_files:
image_slicer.slice(fname, num_sub_frames) # Slices the .tif frames into .png sub-frames
list_of_sub_frames = glob.glob('*.png')
for i in list_of_sub_frames:
if i == '*_01.png':
shutil.move(os.path.join(os.getcwd(), '*_01.png'), './sub_frame_0/')

As you said, the logic of the final loop does not make sense.
if i == '*_01.ng'
It would evaluate something like 'image_01.png' == '*_01.png' and be always false.
Regexp should be the way to go, but for this simple case you just can slice the number from the file name.
for i in list_of_sub_frames:
frame = int(i[-6:-4]) - 1
shutil.move(os.path.join(os.getcwd(), i), './sub_frame_{}/'.format(frame))
If i = 'image_01.png' then i[-6:-4] would take '01', convert it to integer and then just subtract 1 to follow your schema.

A simple fix would be to check if '*_01.png' is in the file name i and change the shutil.move to include i, the filename. (It's also worth mentioning that iis not a good name for a filepath
list_of_sub_frames = glob.glob('*.png')
for i in list_of_sub_frames:
if '*_01.png' in i:
shutil.move(os.path.join(os.getcwd(), i), './sub_frame_0/')
Additionally is [there some way] to increment both the file endings _01 to _02 etc., and the destn folder ie. from sub_frame_0 to sub_frame_1 to sub_frame_2 and so on.
You could create file names doing something as simple as this:
for i in range(10):
#simple string parsing
file_name = 'sub_frame_'+str(i)
folder_name = 'folder_sub_frame_'+str(i)

Here is a complete example using regular expressions. This also implements the incrementing of file names/destination folders
import os
import glob
import shutil
import re
num_sub_frames = 3
# No need to enumerate range list without start or step
for index in range(num_sub_frames+10):
path = os.makedirs('./sub_frame_{0:02}'.format(index))
# Slice layers into sub-frames and add to appropriate directory
list_of_files = glob.glob('*.tif')
for fname in list_of_files:
image_slicer.slice(fname, num_sub_frames) # Slices the .tif frames into .png sub-frames
list_of_sub_frames = glob.glob('*.png')
for name in list_of_sub_frames:
m = re.search('(?P<fname>.+?)_(?P<num>\d+).png', name)
if m:
num = int(m.group('num'))+1
newname = '{0}_{1:02}.png'.format(m.group('fname'), num)
newpath = os.path.join('./sub_frame_{0:02}/'.format(num), newname)
print m.group() + ' -> ' + newpath
shutil.move(os.path.join(os.getcwd(), m.group()), newpath)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to append the grand-parent folder name to the filename? - python

Related

Remove file name and extension from path and just keep path

Python grab substring between two specific characters

How to insert a directory in the middle of a file path in Python?

Change only a part of filename

Python: moving file to a newly created directory

Categories

Resources