Python: Retrieving and renaming indexed files in a directory


I created a script to rename indexed files in a given directory.
For example, if the directory has the following files: (bar001.txt, bar004.txt, bar007.txt, foo2.txt, foo5.txt, morty.dat, rick.py), my script should rename only the indexed files and close the gaps, like this: (bar001.txt, bar002.txt, bar003.txt, foo1.txt, foo2.txt, ...).
I put the full script below, but it doesn't work. The error must be logical, because no error messages are given, yet the files in the directory remain unchanged.
#! python3
import os, re
working_dir = os.path.abspath('.')
# A regex pattern that matches files with prefix,numbering and then extension
pattern = re.compile(r'''
    ^(.*?)         # text before the file number
    (\d+)          # file index
    (\.([a-z]+))$  # file extension
''',re.VERBOSE)
# Method that renames the items of an array
def rename(array):
    for i in range(len(array)):
        matchObj = pattern.search(array[i])
        temp = list(matchObj.group(2))
        temp[-1] = str(i+1)
        index = ''.join(temp)
        array[i] = matchObj.group(1) + index + matchObj.group(3)
    return(array)
array = []
directory = sorted(os.listdir('.'))
for item in directory:
    matchObj = pattern.search(item)
    if not matchObj:
        continue
    if len(array) == 0 or matchObj.group(1) in array[0]:
        array.append(item)
    else:
        temp = array
        newNames = rename(temp)
        for i in range(len(temp)):
            os.rename(os.path.join(working_dir,temp[i]),
                      os.path.join(working_dir,newNames[i]))
        array.clear() #reset array for other files
        array.append(item)
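One likely reason nothing changes on disk, offered as a reading of the code above rather than a confirmed diagnosis: temp = array does not copy the list, it only gives the same list a second name, and rename() modifies the list it receives and returns that same object. So temp and newNames are one list, and each os.rename call renames a file to its own name. (The last group of files is also never reached, because renaming only happens when a new prefix turns up.) The sketch below isolates the aliasing with plain strings; rename_in_place is a hypothetical stand-in for rename(), and no files are touched:

```python
def rename_in_place(names):
    # Stand-in for rename() in the question: it mutates the list it is given
    for i in range(len(names)):
        names[i] = 'new_' + names[i]
    return names

array = ['bar001.txt', 'bar004.txt']
temp = array                      # no copy: temp and array are the same list
newNames = rename_in_place(temp)
print(newNames is temp)           # True: the "old" and "new" names are one object
print(temp)                       # the old names are gone

# Copying first keeps the old names available for os.rename(old, new)
array = ['bar001.txt', 'bar004.txt']
old_names = list(array)
new_names = rename_in_place(list(array))
for old, new in zip(old_names, new_names):
    print(old, '->', new)
```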

To summarise: you want to find every file whose name ends with a number and fill in the gaps for every set of files that share the same name apart from the number suffix. You don't want to create any new files; instead, the files with the highest numbers should be renamed to fill the gaps.
Since this summary translates rather nicely into code, I will do that rather than working from your code.
import re
import os
from os import path
folder = 'path/to/folder/'
pattern = re.compile(r'(.*?)(\d+)(\.[a-z]+)$')
summary = {}
for fn in os.listdir(folder):
    m = pattern.match(fn)
    if m and path.isfile(path.join(folder, fn)):
        # Create a key if there isn't one, add the 'index' to the set
        # The first item in the tuple - len(n) - tells us how the numbers should be formatted later on
        name, n, ext = m.groups()
        summary.setdefault((name, ext), (len(n), set()))[1].add(int(n))
for (name, ext), (n, current) in summary.items():
    required = set(range(1, len(current)+1))  # You want these
    gaps = required - current  # You're missing these
    superfluous = current - required  # You don't need these, so they should be renamed to fill the gaps
    assert len(gaps) == len(superfluous), 'Something has gone wrong'
    # Sets are unordered; sorting pairs the lowest surplus number with the lowest gap
    for old, new in zip(sorted(superfluous), sorted(gaps)):
        oldname = '{name}{n:>0{pad}}{ext}'.format(pad=n, name=name, n=old, ext=ext)
        newname = '{name}{n:>0{pad}}{ext}'.format(pad=n, name=name, n=new, ext=ext)
        # Dry run: replace this print with os.rename(path.join(folder, oldname), path.join(folder, newname))
        print('{old} should be replaced with {new}'.format(old=oldname, new=newname))
That about covers it I think.
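The answer above only prints the planned renames. Below is a sketch that can also apply them, assuming the same regex and the same rule (surplus numbers fill the gaps). close_gaps and its dry_run flag are hypothetical names. One difference from the answer: each file's actual name is remembered, rather than rebuilt from its number, so the rename source is always a real file name:

```python
import os
import re

# Same pattern as in the answer: prefix, number, lowercase extension
PATTERN = re.compile(r'(.*?)(\d+)(\.[a-z]+)$')

def close_gaps(folder, dry_run=True):
    """Return the (old, new) renames that close numbering gaps; apply them if dry_run is False."""
    groups = {}
    for fn in os.listdir(folder):
        m = PATTERN.match(fn)
        if m and os.path.isfile(os.path.join(folder, fn)):
            name, n, ext = m.groups()
            # Remember the padding width and the actual file name for each number
            groups.setdefault((name, ext), (len(n), {}))[1][int(n)] = fn
    moves = []
    for (name, ext), (pad, files) in sorted(groups.items()):
        required = set(range(1, len(files) + 1))
        surplus = sorted(set(files) - required)
        gaps = sorted(required - set(files))
        # Sorting makes the pairing deterministic: lowest surplus fills lowest gap
        for old, new in zip(surplus, gaps):
            moves.append((files[old], f'{name}{new:0{pad}d}{ext}'))
    if not dry_run:
        for old, new in moves:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
    return moves
```

Calling close_gaps(folder) first shows the plan without touching anything; close_gaps(folder, dry_run=False) then performs it. Every target name is a missing number, so no existing file in the group is overwritten.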

Related

Renaming a single directory of files with a specific syntax

I'm trying to rename and pad the names of a few hundred files to the same length.
So far I've managed to correctly rename and pad the file names within my IDE, but I'm not sure how to connect that to actually renaming the files themselves. The padded names look like this:
Atomic Samurai__________.png
BabyYodatheBased________.png
Baradum_________________.png
bcav____________________.png
This is the code that does the renaming and padding within my IDE:
import glob, os
pad_images = glob.glob(r"C:\Users\test\*.png")
split_images = []
for i in pad_images:
    split = i.split("\\")[-1]
    split_images.append(split)
longest_file_name = max(split_images, key=len)
longest_int = len(longest_file_name)
new_images = []
for i in split_images:
    parts = i.split('.')
    new_name = parts[0].ljust(longest_int, '_') + "." + parts[1]
    new_images.append(new_name)
I've been trying to get os.rename(old_name, new_name) to work, but I'm not sure where to get the old name from, since I've split things up into different for loops.

Try saving the old file names to a list and do all the modifications (split and rename) in a single loop thereafter:
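import os
path = "C:/Users/test"
images = [f for f in os.listdir(path) if f.endswith(".png")]
# Pad to the longest name *without* its extension
length = max((len(os.path.splitext(f)[0]) for f in images), default=0)
for file in images:
    stem, ext = os.path.splitext(file)  # listdir returns bare names, so no path splitting is needed
    new_name = f'{stem.ljust(length, "_")}{ext}'
    if new_name != file:
        os.rename(os.path.join(path, file), os.path.join(path, new_name))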
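The new names can also be worked out first and applied afterwards, which makes it possible to preview the result and to refuse to overwrite an existing file (on POSIX systems os.rename silently replaces an existing target). A minimal sketch under those assumptions; padded_names and pad_folder are hypothetical helper names, and defaulting to a dry run is a choice made here:

```python
import os

def padded_names(names, fill='_'):
    """Map each file name to its stem padded with `fill` to the longest stem length."""
    parts = {name: os.path.splitext(name) for name in names}
    width = max((len(stem) for stem, _ in parts.values()), default=0)
    return {name: stem.ljust(width, fill) + ext for name, (stem, ext) in parts.items()}

def pad_folder(folder, ext='.png', dry_run=True):
    names = [f for f in os.listdir(folder) if f.endswith(ext)]
    plan = padded_names(names)
    for old, new in plan.items():
        if old == new:
            continue  # already as long as the longest name
        target = os.path.join(folder, new)
        if os.path.exists(target):
            raise FileExistsError(target)  # never overwrite another file
        if dry_run:
            print(old, '->', new)
        else:
            os.rename(os.path.join(folder, old), target)
    return plan
```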

How to compare two files from filelist using regex?

The files are read from a folder with os.listdir. I wrote a regex for one kind of file name, r'^[1-9\w]{2}_[1-9\w]{4}[1][7][\d\w]+\.[\d\w]+', and a similar one for the other kind, r'^[1-9\w]{2}_[1-9\w]{4}[1][8]+'. The comparison condition is: when the first seven characters of two file names match, remove a file with os.remove(os.path.join(dir_name, each)). A few example names: bh_txbh171002.xml, bh_txbh180101.xml, ce_txce170101.xml...
As I understand it, I can't use match, because there is no string to match against and it returns None; moreover, it only compares a file name with a regex. I am thinking about a condition like if folder.itself(file) and file.startswith("......."):, but I can't figure out how to refer to the first seven characters of the file names that should be compared.
Honestly, I posted a worse version of my code in the original request; I have learned a little more since then.

Regex is the wrong tool here. I do not have your files, so I create randomized demo data:
import random
import string
random.seed(42) # make random repeatable
def generateFileNames(amount):
    """Generate 2*amount of names XX_XXXX with X in [a-z0-9] with duplicates in it"""
    def rndName():
        """generate one random name XX_XXXX with X in [a-z0-9]"""
        characters = string.ascii_lowercase + string.digits
        return random.choices(characters,k=2)+['_']+random.choices(characters,k=4)
    for _ in range(amount): # create 2*amount names, some duplicates
        name = rndName()
        yield ''.join(name) # yield name once
        if random.randint(1,10) > 3: # more likely to get same names twice
            yield ''.join(name) # same name twice
        else:
            yield ''.join(rndName()) # different 2nd name
def generateNumberParts(amount):
    """Generate 2*amount of 6-digit-strings, some with 17+18 as starting numbers"""
    def rndNums(nr):
        """Generate nr digits as string list"""
        return random.choices(string.digits,k=nr)
    for _ in range(amount):
        choi = rndNums(4)
        # I yield 18 first to demonstrate that sorting later works
        yield ''.join(['18']+choi) # 18xxxx numbers
        if random.randint(1,10) > 5:
            yield ''.join(['17']+choi) # 17xxxx
        else:
            yield ''.join(rndNums(6)) # make it something else
# half the amount of files generated
m = 10
# generate filenames
filenames = [''.join(x)+'.xml' for x in zip(generateFileNames(m),
                                            generateNumberParts(m))]
Now I have my names as a list and can start finding out which ones are duplicates with newer timestamps:
# make a dict out of your filenames, use the first 7 characters as key
# with a list of the files starting with this key as value:
fileDict={}
for names in filenames:
    fileDict.setdefault(names[0:7],[]).append(names) # create key=[] or/and append names
for k,v in fileDict.items():
    print (k, " " , v)
# get files to delete (all the lower nr of the value-list if multiple in it)
filesToDelete = []
for k,v in fileDict.items():
    if len(v) == 1: # nothing to do, it's only 1 file
        continue
    print(v, " to ", end = "" ) # debugging output
    v.sort(key = lambda x: int(x[7:9])) # sort by a lambda that integerfies 17/18
    print (v) # debugging output
    filesToDelete.extend(v[:-1]) # add all but the last file to the delete list
print("")
print(filesToDelete)
Output:
# the created filenames in your dict by "key [values]"
xa_ji0y ['xa_ji0y188040.xml', 'xa_ji0y501652.xml']
v3_a3zm ['v3_a3zm181930.xml']
mm_jbqe ['mm_jbqe171930.xml']
ck_w5ng ['ck_w5ng180679.xml', 'ck_w5ng348136.xml']
zy_cwti ['zy_cwti184296.xml', 'zy_cwti174296.xml']
41_iblj ['41_iblj182983.xml', '41_iblj172983.xml']
5x_ff0t ['5x_ff0t187453.xml']
sd_bdw2 ['sd_bdw2177453.xml']
vn_vqjt ['vn_vqjt189618.xml', 'vn_vqjt179618.xml']
ep_q85j ['ep_q85j185198.xml', 'ep_q85j175198.xml']
vf_1t2t ['vf_1t2t180309.xml', 'vf_1t2t089040.xml']
11_ertj ['11_ertj188425.xml', '11_ertj363842.xml']
# sorting the names by its integer at 8/9 position of name
['xa_ji0y188040.xml','xa_ji0y501652.xml'] to ['xa_ji0y188040.xml','xa_ji0y501652.xml']
['ck_w5ng180679.xml','ck_w5ng348136.xml'] to ['ck_w5ng180679.xml','ck_w5ng348136.xml']
['zy_cwti184296.xml','zy_cwti174296.xml'] to ['zy_cwti174296.xml','zy_cwti184296.xml']
['41_iblj182983.xml','41_iblj172983.xml'] to ['41_iblj172983.xml','41_iblj182983.xml']
['vn_vqjt189618.xml','vn_vqjt179618.xml'] to ['vn_vqjt179618.xml','vn_vqjt189618.xml']
['ep_q85j185198.xml','ep_q85j175198.xml'] to ['ep_q85j175198.xml','ep_q85j185198.xml']
['vf_1t2t180309.xml','vf_1t2t089040.xml'] to ['vf_1t2t089040.xml','vf_1t2t180309.xml']
['11_ertj188425.xml','11_ertj363842.xml'] to ['11_ertj188425.xml','11_ertj363842.xml']
# list of files to delete
['xa_ji0y188040.xml', 'ck_w5ng180679.xml', 'zy_cwti174296.xml', '41_iblj172983.xml',
'vn_vqjt179618.xml', 'ep_q85j175198.xml', 'vf_1t2t089040.xml', '11_ertj188425.xml']

I can't understand what's wrong with my code. I built the list from a certain folder so that I could work with the name string of each file, right? Then I applied the conditions for filtering and for choosing the one file to delete.
import os
dir_name = "/Python/Test_folder/Schems"
filenames = os.listdir(dir_name)
for names in filenames:
    filenames.setdefault(names[0:7],[]).append(names) # create key=[] or/and append names
for k,v in filenames.items():
    filesToDelete = [] # there's a syntax mistake. But I can't get it - is it a list or not?
for k,v in filenames.items():
    if len(v) == 1:
        continue
    v.sort(key = lambda x: int(x[7:9]))
    filesToDelete.extend(v[:-1])
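The immediate problem in the code above is that filenames is the list returned by os.listdir, and lists have no setdefault or items methods; the answer builds a separate dictionary (fileDict) for that purpose. Here is a sketch of the whole flow against a real folder. It assumes, as the answer does, that characters 7 and 8 of each name hold the two digits to compare, and that only .xml files are of interest. find_older_duplicates and delete_older_duplicates are hypothetical names, and the sketch defaults to a dry run:

```python
import os

def find_older_duplicates(filenames):
    """Group names by their first 7 characters; return all but the highest of each group."""
    groups = {}                                  # a dict, separate from the list of names
    for name in filenames:
        groups.setdefault(name[:7], []).append(name)
    to_delete = []
    for names in groups.values():
        if len(names) > 1:
            names.sort(key=lambda x: int(x[7:9]))  # compare the two digits after the prefix
            to_delete.extend(names[:-1])           # keep only the last (highest) one
    return to_delete

def delete_older_duplicates(dir_name, dry_run=True):
    filenames = [f for f in os.listdir(dir_name) if f.endswith('.xml')]
    to_delete = find_older_duplicates(filenames)
    for name in to_delete:
        if dry_run:
            print('would delete', name)
        else:
            os.remove(os.path.join(dir_name, name))
    return to_delete
```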

Python: moving file to a newly created directory

My script creates a bunch of files (how many depends on the inputs), and I want to put certain files in certain folders based on their file names.
So far I've got the following, but although the directories are being created, no files are being moved. I'm not sure the logic in the final for loop makes any sense.
In the code below I'm trying to move all .png files ending in _01 into the sub_frame_0 folder.
Additionally, is there some way to increment both the file endings (_01 to _02, etc.) and the destination folder (i.e. from sub_frame_0 to sub_frame_1 to sub_frame_2 and so on)?
for index, i in enumerate(range(num_sub_frames+10)):
    path = os.makedirs('./sub_frame_{}'.format(index))
# Slice layers into sub-frames and add to appropriate directory
list_of_files = glob.glob('*.tif')
for fname in list_of_files:
    image_slicer.slice(fname, num_sub_frames) # Slices the .tif frames into .png sub-frames
list_of_sub_frames = glob.glob('*.png')
for i in list_of_sub_frames:
    if i == '*_01.png':
        shutil.move(os.path.join(os.getcwd(), '*_01.png'), './sub_frame_0/')

As you said, the logic of the final loop does not make sense:
if i == '*_01.png':
It would evaluate something like 'image_01.png' == '*_01.png', which is always false, because == does no wildcard matching.
A regular expression would be the general way to go, but for this simple case you can just slice the number out of the file name:
for i in list_of_sub_frames:
    frame = int(i[-6:-4]) - 1  # 'image_01.png'[-6:-4] is '01', so frame 0
    shutil.move(os.path.join(os.getcwd(), i), './sub_frame_{}/'.format(frame))
If i = 'image_01.png', then i[-6:-4] takes '01'; the code converts it to an integer and subtracts 1 to follow your naming scheme.
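If shell-style wildcard matching is what the question's == was meant to do, the standard-library fnmatch module provides it. A small sketch; files_for_frame is a hypothetical helper, and two-digit zero-padding is assumed from names like image_01.png:

```python
import fnmatch

def files_for_frame(names, frame):
    """Return the names that end in _<frame>.png, with frame zero-padded to two digits."""
    pattern = '*_{:02d}.png'.format(frame)
    return [name for name in names if fnmatch.fnmatch(name, pattern)]

print('image_01.png' == '*_01.png')                 # False: == compares literally
print(fnmatch.fnmatch('image_01.png', '*_01.png'))  # True: * matches any characters

names = ['image_01.png', 'image_02.png', 'other_01.png', 'notes.txt']
print(files_for_frame(names, 1))
```

Note that fnmatch.fnmatch normalizes case with os.path.normcase, so it is case-insensitive on Windows; fnmatch.fnmatchcase always compares case-sensitively.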

A simple fix would be to check whether '_01.png' is in the file name i, and to change the shutil.move call to use i, the actual file name. (The * has to be dropped, because it is not a wildcard in an in test. It's also worth mentioning that i is not a good name for a file path.)
list_of_sub_frames = glob.glob('*.png')
for i in list_of_sub_frames:
    if '_01.png' in i:
        shutil.move(os.path.join(os.getcwd(), i), './sub_frame_0/')
Additionally is [there some way] to increment both the file endings _01 to _02 etc., and the destn folder ie. from sub_frame_0 to sub_frame_1 to sub_frame_2 and so on.
You could create file names doing something as simple as this:
for i in range(10):
    #simple string parsing
    file_name = 'sub_frame_'+str(i)
    folder_name = 'folder_sub_frame_'+str(i)
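To increment the file ending and the destination folder together, one loop index can produce both. The sketch below assumes the mapping from the question (files ending in _01 go to sub_frame_0, _02 to sub_frame_1, and so on). frame_targets and plan_moves are hypothetical names:

```python
def frame_targets(count):
    """Yield (file suffix, folder) pairs: ('_01.png', 'sub_frame_0'), ('_02.png', 'sub_frame_1'), ..."""
    for i in range(count):
        yield '_{:02d}.png'.format(i + 1), 'sub_frame_{}'.format(i)

def plan_moves(names, count):
    """Map each file name to its destination folder; names matching no suffix are left out."""
    plan = {}
    for suffix, folder in frame_targets(count):
        for name in names:
            if name.endswith(suffix):
                plan[name] = folder
    return plan

print(plan_moves(['a_01.png', 'a_02.png', 'b_01.png', 'notes.txt'], 2))
```

Each entry of the plan could then be passed to shutil.move(name, folder), after creating the folder with os.makedirs(folder, exist_ok=True).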

Here is a complete example using regular expressions. It also implements the incrementing of file names and destination folders:
import os
import glob
import shutil
import re
import image_slicer
num_sub_frames = 3
# No need to enumerate range list without start or step
for index in range(num_sub_frames+10):
    os.makedirs('./sub_frame_{0:02}'.format(index), exist_ok=True)
# Slice layers into sub-frames and add to appropriate directory
list_of_files = glob.glob('*.tif')
for fname in list_of_files:
    image_slicer.slice(fname, num_sub_frames) # Slices the .tif frames into .png sub-frames
list_of_sub_frames = glob.glob('*.png')
for name in list_of_sub_frames:
    m = re.search(r'(?P<fname>.+?)_(?P<num>\d+)\.png$', name)
    if m:
        num = int(m.group('num'))+1
        newname = '{0}_{1:02}.png'.format(m.group('fname'), num)
        newpath = os.path.join('./sub_frame_{0:02}/'.format(num), newname)
        print(name + ' -> ' + newpath)
        shutil.move(os.path.join(os.getcwd(), name), newpath)

Python - Error when opening two files [duplicate]

I'm creating a program that creates a file and saves it to the directory with the filename sample.xml. Once the file is saved, if I run the program again, it overwrites the old file, because the new one has the same file name. How do I increment the file names so that whenever I run the code again it increments the file name and does not overwrite the existing file? I am thinking of first checking the file names in the directory and, if the name is already taken, generating a new file name:
fh = open("sample.xml", "w")
rs = [blockresult]
fh.writelines(rs)
fh.close()

I would iterate through sample[int].xml, for example, and grab the next available name that is not used by a file or directory.
import os
i = 0
while os.path.exists("sample%s.xml" % i):
    i += 1
fh = open("sample%s.xml" % i, "w")
....
That should give you sample0.xml initially, then sample1.xml, etc.
Note that relative paths are resolved against the current working directory, i.e. the directory you run the code from, which is not necessarily the directory the script is in. Use absolute paths if necessary. Use os.getcwd() to read your current directory and os.chdir(path_to_dir) to set a new one.

Sequentially checking each file name to find the next available one works fine with small numbers of files, but quickly becomes slower as the number of files increases.
Here is a version that finds the next available file name in log(n) time:
import os
def next_path(path_pattern):
    """
    Finds the next free path in a sequentially named list of files
    e.g. path_pattern = 'file-%s.txt':
    file-1.txt
    file-2.txt
    file-3.txt
    Runs in log(n) time where n is the number of existing files in sequence
    """
    i = 1
    # First do an exponential search
    while os.path.exists(path_pattern % i):
        i = i * 2
    # Result lies somewhere in the interval (i/2..i]
    # We call this interval (a..b] and narrow it down until a + 1 = b
    a, b = (i // 2, i)
    while a + 1 < b:
        c = (a + b) // 2 # interval midpoint
        a, b = (c, b) if os.path.exists(path_pattern % c) else (a, c)
    return path_pattern % b
To measure the speed improvement I wrote a small test function that creates 10,000 files:
for i in range(1,10000):
    with open(next_path('file-%s.foo'), 'w'):
        pass
And implemented the naive approach:
def next_path_naive(path_pattern):
    """
    Naive (slow) version of next_path
    """
    i = 1
    while os.path.exists(path_pattern % i):
        i += 1
    return path_pattern % i
And here are the results:
Fast version:
real 0m2.132s
user 0m0.773s
sys 0m1.312s
Naive version:
real 2m36.480s
user 1m12.671s
sys 1m22.425s
Finally, note that either approach is susceptible to race conditions if multiple actors are trying to create files in the sequence at the same time.
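One way to close that race, assuming a local filesystem where exclusive creation is atomic (on some network filesystems it may not be), is to drop the separate os.path.exists check. Instead, open() with mode 'x' raises FileExistsError if the file already exists, and the code moves on to the next number. A sketch reusing the %s-style pattern from above; create_next is a hypothetical name:

```python
import itertools

def create_next(path_pattern, start=1):
    """Create the first free path_pattern % i (i >= start) and return (path, open file)."""
    for i in itertools.count(start):
        path = path_pattern % i
        try:
            f = open(path, 'x')  # 'x': exclusive creation, fails if the file exists
        except FileExistsError:
            continue             # number already taken (maybe by another process); try the next
        return path, f
```

Usage would look like path, fh = create_next('file-%s.txt'), followed by with fh: fh.write(...). The scan is linear, like the naive version, but once a name is returned, no other process can have created that file in the meantime.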

import os
def get_nonexistant_path(fname_path):
    """
    Get the path to a filename which does not exist by incrementing path.
    Examples
    --------
    >>> get_nonexistant_path('/etc/issue')
    '/etc/issue-1'
    >>> get_nonexistant_path('whatever/1337bla.py')
    'whatever/1337bla.py'
    """
    if not os.path.exists(fname_path):
        return fname_path
    filename, file_extension = os.path.splitext(fname_path)
    i = 1
    new_fname = "{}-{}{}".format(filename, i, file_extension)
    while os.path.exists(new_fname):
        i += 1
        new_fname = "{}-{}{}".format(filename, i, file_extension)
    return new_fname
Before you open the file, call
fname = get_nonexistant_path("sample.xml")
This will give you either 'sample.xml' or, if that already exists, 'sample-i.xml', where i is the lowest positive integer such that the file does not already exist.
I recommend using os.path.abspath("sample.xml"). If your path starts with ~ for the home directory, expand it first with os.path.expanduser, because os.path.exists does not expand ~.
Please note that race conditions might occur with this simple code if you have multiple instances running at the same time. If this might be a problem, please check this question.

Try setting a count variable and incrementing it inside the same loop in which you write your file. Include the count in the file name with a format placeholder (%d), so that on every iteration both the counter and the number in the file name go up by 1.
Some code from a project I just finished:
numberLoops = int(input("How many files? "))  # some limit determined by the user (read interactively here as an example)
currentLoop = 1
while currentLoop < numberLoops:
    currentLoop = currentLoop + 1
    fileName = ("log%d_%d.txt" % (currentLoop, now()))
For reference:
from time import mktime, gmtime
def now():
    return mktime(gmtime())
This is probably irrelevant in your case, but I was running multiple instances of this program and making tons of files. Hope this helps!

The two ways to do it are:
Check for the existence of the old file and, if it exists, try the next file name (+1)
Save state data somewhere
An easy way to do it off the bat would be:
import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(pth.abspath(filename+str(filenum)+".py")):
    filenum+=1
my_next_file = open(filename+str(filenum)+".py",'w')
As a design matter, while True slows things down and isn't great for code readability.
Edited to include EOL's contributions/thoughts:
I think not using .format is more readable at first glance, but using .format is better for generality and convention, so:
import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(pth.abspath(filename+str(filenum)+".py")):
    filenum+=1
my_next_file = open("{}{}.py".format(filename, filenum),'w')
# or
my_next_file = open(filename + "{}.py".format(filenum),'w')
And you don't have to use abspath; you can use relative paths if you prefer. I sometimes prefer absolute paths because they help normalize the paths passed :).
import os.path as pth
filename = "myfile"
filenum = 1
while pth.exists(filename+str(filenum)+".py"):
    filenum+=1
##removed for conciseness

Another solution, which avoids a while loop, is the os.listdir() function: it returns a list of all the files and directories contained in the directory whose path is passed as its argument.
To answer the example in the question: supposing that the directory you are working in contains only "sample_i.xml" files, indexed from 0 with no gaps, you can easily obtain the next index for the new file with the following code.
import os
new_index = len(os.listdir('path_to_file_containing_only_sample_i_files'))
new_file = open('path_to_file_containing_only_sample_i_files/sample_%s.xml' % new_index, 'w')

You can use a while loop with a counter that checks whether a file with the given name and the counter's value exists: if it does, move on to the next value; otherwise, stop and create the file.
I have done it this way for one of my projects:
from os import path
import os
i = 0
flnm = "Directory\\Filename" + str(i) + ".txt"
while path.exists(flnm):
    i += 1  # try the next number
    flnm = "Directory\\Filename" + str(i) + ".txt"
f = open(flnm, "w") #do what you want to with that file...
f.write(str(var))
f.close() # make sure to close it.
Here the counter i starts from 0, and the while loop checks each time whether the file exists; if it does, it moves on to the next number, otherwise it exits and the file is created, which you can then customize. Also make sure to close the file; otherwise it stays open, which can cause problems when you try to delete it.
I used path.exists() to check whether a file exists.
Don't use from os import *: it can cause problems with open(), because os has its own os.open() function, which would shadow the built-in open() and expects integer flags, giving an error such as: TypeError: Integer expected. (got str)

Without storing state data in an extra file, a quicker solution than the ones presented here would be the following:
from glob import glob
import os
files = glob("somedir/sample*.xml")
# Compare the numbers as integers; a plain string sort puts sample10 before sample9
nums = [int(os.path.basename(f)[6:-4]) for f in files]  # strip "sample" and ".xml"
cur_num = max(nums, default=0)  # 0 if there are no files yet
cur_num += 1
fh = open("somedir/sample%s.xml" % cur_num, 'w')
rs = [blockresult]
fh.writelines(rs)
fh.close()
This will also keep incrementing, even if some of the lower-numbered files disappear.
The other solution here that I like (pointed out by Eiyrioü) is the idea of keeping a temporary file that contains your most recent number:
with open('somedir/curr_num.txt', 'r') as temp_fh:
    curr_num = int(temp_fh.readline().strip())
curr_num += 1
fh = open("somedir/sample%s.xml" % curr_num, 'w')
rs = [blockresult]
fh.writelines(rs)
fh.close()
# Store the new number so the next run continues from it
with open('somedir/curr_num.txt', 'w') as temp_fh:
    temp_fh.write(str(curr_num))
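The [6:-4] slice in the glob version assumes that every name matched by sample*.xml is exactly sample<digits>.xml; a stray sample.xml or sample_old.xml would make int() fail. A slightly more defensive sketch of the same "highest number plus one" idea parses the number with a regular expression and skips anything that doesn't fit. next_number and its default arguments are hypothetical:

```python
import os
import re

def next_number(folder, prefix='sample', ext='.xml'):
    """Return one more than the highest N among files named <prefix><N><ext> (1 if none)."""
    pattern = re.compile(re.escape(prefix) + r'(\d+)' + re.escape(ext))
    nums = [int(m.group(1)) for m in map(pattern.fullmatch, os.listdir(folder)) if m]
    return max(nums, default=0) + 1  # compared as integers, so 10 beats 9
```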

Another example, using recursion:
import os
def checkFilePath(testString, extension, currentCount):
    if os.path.exists(testString + str(currentCount) +extension):
        return checkFilePath(testString, extension, currentCount+1)
    else:
        return testString + str(currentCount) +extension
Use:
checkFilePath("myfile", ".txt" , 0)
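Each recursive call adds a stack frame, and CPython's default recursion limit is 1000 (see sys.getrecursionlimit()). So checkFilePath can raise RecursionError once roughly that many numbered files already exist. An iterative equivalent with the same arguments, sketched with itertools.count (the name check_file_path_iter is hypothetical):

```python
import itertools
import os

def check_file_path_iter(test_string, extension, current_count=0):
    """Iterative version of checkFilePath: first free test_string + N + extension."""
    for n in itertools.count(current_count):
        candidate = test_string + str(n) + extension
        if not os.path.exists(candidate):
            return candidate
```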

I needed to do something similar, but for output directories in a data processing pipeline. I was inspired by Vorticity's answer, but added a regex to grab the trailing number. This method keeps incrementing from the highest-numbered directory, even if intermediate numbered output directories are deleted. It also adds leading zeros so the names sort alphabetically (e.g. width 3 gives 001).
import logging
import os
import re
from glob import glob
log = logging.getLogger(__name__)
def get_unique_dir(path, width=3):
    # if it doesn't exist, create
    if not os.path.isdir(path):
        log.debug("Creating new directory - {}".format(path))
        os.makedirs(path)
        return path
    # if it's empty, use
    if not os.listdir(path):
        log.debug("Using empty directory - {}".format(path))
        return path
    # otherwise, increment the highest number folder in the series
    def get_trailing_number(search_text):
        search_obj = re.search(r"([0-9]+)$", search_text)
        if not search_obj:
            return 0
        else:
            return int(search_obj.group(1))
    dirs = glob(path + "*")
    num_list = sorted([get_trailing_number(d) for d in dirs])
    highest_num = num_list[-1]
    next_num = highest_num + 1
    new_path = "{0}_{1:0>{2}}".format(path, next_num, width)
    log.debug("Creating new incremented directory - {}".format(new_path))
    os.makedirs(new_path)
    return new_path
get_unique_dir("output")

Here is one more example. The code tests whether a file already exists in the directory; if it does, it increments the last index in the file name, and then saves the file.
The typical file name is the three-letter month plus the day, then an underscore and the last index, e.g. May10_1.dat.
import time
import datetime
import shutil
import os
import os.path
da=datetime.datetime.now()
data_id =1
ts = time.time()
st = datetime.datetime.fromtimestamp(ts).strftime("%b%d")
data_id=str(data_id)
filename = st+'_'+data_id+'.dat'
while (os.path.isfile(str(filename))):
    data_id=int(data_id)
    data_id=data_id+1
    print(data_id)
    filename = st+'_'+str(data_id)+'.dat'
print(filename)
shutil.copyfile('Autonamingscript1.py',filename)
f = open(filename,'a+')
f.write("\n\n\n")
f.write("Data comments: \n")
f.close()

This continues the sequence numbering from the given file name, with or without an appended sequence number.
The given file name is used if it doesn't exist; otherwise a sequence number is applied, and gaps between numbers are candidates.
This version is quick if the given file name is not already sequenced or is the highest-numbered pre-existing file in the sequence.
For example, the provided file name can be
sample.xml
sample-1.xml
sample-23.xml
import os
import re
def get_incremented_filename(filename):
    name, ext = os.path.splitext(filename)
    seq = 0
    # continue from existing sequence number if any
    rex = re.search(r"^(.*)-(\d+)$", name)
    if rex:
        name = rex[1]
        seq = int(rex[2])
    while os.path.exists(filename):
        seq += 1
        filename = f"{name}-{seq}{ext}"
    return filename

My 2 cents: an always increasing, macOS-style incremental naming procedure
get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir ; then
get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (1) ; then
get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (2) ; etc.
If ./some_new_dir (2) exists but not ./some_new_dir (1), then get_increased_path("./some_new_dir").mkdir() creates ./some_new_dir (3) anyway, so that indexes always increase and you always know which one is the latest.
from pathlib import Path
import re
def get_increased_path(file_path):
    fp = Path(file_path).resolve()
    # Siblings named "<stem> (<n>)<suffix>", e.g. "some_new_dir (2)"; re.escape guards
    # against regex characters such as "." or "(" in the name
    pattern = re.compile(r"^{} \((\d+)\){}$".format(re.escape(fp.stem), re.escape(fp.suffix)))
    vals = []
    for n in fp.parent.glob("*"):
        m = pattern.match(n.name)
        if m:
            vals.append(int(m.group(1)))
    if vals:
        ext = " ({})".format(max(vals) + 1)
    elif fp.exists():
        ext = " (1)"
    else:
        ext = ""
    return fp.parent / (fp.stem + ext + fp.suffix)

Rename a group of files in python

I'm trying to rename some files in a directory using Python. I've looked around the forums here, and because I'm a newbie, I can't adapt what I need from what is out there.
Say in a directory I have a group of files called
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125225754_7_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125236347_8_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125236894_5_S110472_I238621.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125248691_6_S110472_I238621.jpg
and I want to remove "125225754", "125236347", "125236894" and "125248691" here so my resulting filename will be
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_7_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_8_S110472_I238620.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_5_S110472_I238621.jpg
FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_6_S110472_I238621.jpg
I'm trying to use the os.path.split but it's not working properly.
I have also considered using string manipulations, but have not been successful with that either.
Any help would be greatly appreciated. Thanks.

os.path.split splits a path (/home/mattdmo/work/projects/python/2014/website/index.html) into a (head, tail) pair, i.e. the directory part and the file name, so it won't help with editing the file name itself.
As wim suggested, if the file names are all exactly the same length, you can use string slicing to cut out whatever occurs between two indexes and then join the remaining parts back together. So, in your example:
filename = "FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125248691_6_S110472_I238621.jpg"
newname = filename[:57] + filename[66:]
print(newname)
# FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602_6_S110472_I238621.jpg
This takes the first 57 characters of the string (indices 0 to 56; remember that Python string indexes are 0-based) and joins them to everything from index 66 (the 67th character) onward, which drops the nine digits in between.
Now that you can do this, just put all the file names into a list and iterate over it to get your new file names:
import os
filelist = os.listdir('.') # get files in current directory
for filename in filelist:
    if ".jpg" in filename: # only process pictures
        newname = filename[:57] + filename[66:]
        print(filename + " will be renamed as " + newname)
        os.rename(filename, newname)

Can we assume that the files all have the same name up to the date, _20110602[difference here]?
If that's the case, then it's actually fairly easy to do.
First you need the index of that difference. Starting from character 0, which is 'F' in this case, count right until you hit the first difference. You can do this programmatically like this:
s1 = 'String1'
s2 = 'String2'
i = 0
while i < len(s1) and i < len(s2):
    if s1[i] == s2[i]:
        i += 1
    else:
        break
Now i is set to the index of the first difference between s1 and s2 (or, if there is none, the length of the shorter string).
From here you know that you want to strip everything from this index up to the following _.
j = i
while j < len(s1):
    if s1[j] != '_':
        j += 1
    else:
        break
# j is the index of the _ character after i
p1 = s1[:i] # Everything up to i
p2 = s1[j:] # Everything from j on (the _ is kept)
s1 = p1 + p2
# Do the same for s2, or even better, do this in a loop.
The only caveat here is that they have to be the same name up to this point for this to work. If they are the same length then this is still fairly easy, but you have to figure out yourself what the indices are rather than using the string difference method.

If you always have the exact string '20110602' in the file names stored in the 'my_directory' folder:
import re #for regular expression
from os import rename
from glob import glob
for filename in glob('my_directory/*.jpg'):
    match = re.search('20110602', filename)
    if match:
        newname = re.sub(r'20110602[0-9]+_','20110602_', filename)
        rename(filename, newname)
A more general version that matches any YYYYMMDD (or YYYYDDMM) date:
import re #for regular expression
from os import rename
from glob import glob
for filename in glob('my_directory/*.jpg'):
    match = re.search(r'\d{4}\d{2}\d{2}\d+_', filename)
    if match:
        newname = re.sub(r'(\d{4}\d{2}\d{2})(\d+)(_)', '\\1'+'\\3', filename)
        rename(filename, newname)
'\\1': this is match.group(1), which refers to the first set of parentheses
'\\3': this is match.group(3), which refers to the third set of parentheses
\d or [0-9]: both match the digits 0-9 (in Python 3, \d also matches other Unicode decimal digits unless re.ASCII is used)
{number}: the number of times the previous token (in this case a digit) is repeated
+ : 1 or more of the previous expression (in this case a digit)
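The substitution can be checked on the question's four example names before any file is touched. This sketch only builds strings and renames nothing; it uses the same pattern and replacement as the general version above:

```python
import re

names = [
    'FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125225754_7_S110472_I238620.jpg',
    'FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125236347_8_S110472_I238620.jpg',
    'FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125236894_5_S110472_I238621.jpg',
    'FILENAME_002_S_0295_MR_3_Plane_Localizer__br_raw_20110602125248691_6_S110472_I238621.jpg',
]
# Keep the 8-digit date (group 1) and the underscore (group 3); drop the digits in between
new_names = [re.sub(r'(\d{4}\d{2}\d{2})(\d+)(_)', r'\1\3', name) for name in names]
for old, new in zip(names, new_names):
    print(old, '->', new)

# Two files ending up with the same new name would make one rename overwrite or fail
print(len(set(new_names)) == len(new_names))
```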
