Renaming files in Folder - python

I am working on a script to rename multiple files in a folder each month, a task we currently do manually at my company. I am fairly new to Python and have reached the lists chapter of the Python Crash Course book.
I managed to put together the code below, but I have some questions:
import os
import glob
#Asks the user for the current month for renaming and for the path of the
#files
month = input("Which month's reports? Type the full name of the month: ")
path = input("Enter the file path: ")
pattern = path + "\\A_BCD_012345" + "*.pdf"
result = glob.glob(pattern)
for file_name in result:
    old_name = file_name
    new_name = path + '\\' + old_name[90:99] + month + ' Report' + old_name[-4:]
    print(new_name)
Now, my question is how to use a wildcard to make this more flexible, as my current code is not great.
The file names always follow the same pattern:
A_BCD_0123456_20220901_20220930_02_V2_0000_00000_FILE_5-8 digits number which is important to keep_AB0001.pdf
I would like the files to be renamed to: the important 5-8 digit number + company name + current month + Report.
Where should I look to finish my code? I know I am very close. The os.rename call is still missing because I did not want to add it yet, so only the wildcards are puzzling me. The important digits always come after the 10th underscore and before the 11th one. Everything after the 11th underscore should be dropped from the new name.

OK, your last comment makes it clearer.
First of all, you need to extract that data from the file name, not from the whole path. Otherwise, if there are underscores in the directory name, you will have the same problem as with slicing.
Then, from the file name, you can, for example, use split to separate on '_' and pick the part you want.
It could look like this:
import pathlib
# path and month come from the input() calls in the question
folder = pathlib.Path(path)
result = folder.glob("A_BCD_012345*.pdf")
for fullpath in result:
    filename = fullpath.stem
    num = filename.split('_')[10]
    new_name = num + month + " Report" + fullpath.suffix
    new_fullpath = folder / new_name
    # os.rename(str(fullpath), str(new_fullpath))
Here pathlib gives you two things:
Extraction of the file name without the parent directory, so there is no need to worry about the underscores or the number of characters in the directory name.
Extraction of the suffix, so there is no need for the [-4:] slice (which only works for four-character suffixes such as ".pdf").
It also helps you write something more OS-independent. As you can see, there is no \ in my code. The / operator of pathlib joins a parent directory and a child name using the separator the OS needs (\ on Windows, / on Unix), and it avoids the doubled separators you often end up with when concatenating path strings (\\ instead of \).
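To make those pieces concrete, here is a minimal sketch of what name, stem, suffix and the / operator return. The directory, the number 1234567 and the month are made-up values for illustration, and PureWindowsPath is used only so that the result is the same on any OS:

```python
from pathlib import PureWindowsPath

# Hypothetical directory and file name, for illustration only
folder = PureWindowsPath(r"C:\my_reports\2022_09")
fullpath = folder / "A_BCD_0123456_20220901_20220930_02_V2_0000_00000_FILE_1234567_AB0001.pdf"

print(fullpath.name)    # file name without the directory
print(fullpath.stem)    # file name without the suffix
print(fullpath.suffix)  # .pdf

# The underscores in the directory name do not disturb the split,
# because only the stem of the file name is split
number = fullpath.stem.split("_")[10]
new_fullpath = folder / (number + " September Report" + fullpath.suffix)
print(new_fullpath)     # C:\my_reports\2022_09\1234567 September Report.pdf
```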
That said, pathlib is not essential here; I just took the opportunity to show it. You can keep your glob.glob, but you need to work on the file name without the path if, as you said, you don't want to make assumptions about what is in the path (the number of characters for your method, or the number of underscores for the new one).
You can also do that with os.path.basename, for example.
So here is another version, closer to yours:
import os
import glob
#Asks the user for the current month for renaming and for the path of the
#files
month = input("Which month's reports? Type the full name of the month: ")
path = input("Enter the file path: ")
pattern = path + "\\A_BCD_012345" + "*.pdf"
result = glob.glob(pattern)
for file_name in result:
    old_name = file_name
    # basename drops the directory part, so underscores in the path cannot shift the index
    number = os.path.basename(file_name).split('_')[10]
    new_name = path + '\\' + number + month + ' Report' + old_name[-4:]
    print(new_name)
(Style note: the variable name "file_name" is not the best choice here, since it matters to distinguish the full path from the file name, which is the name without the directory.)
Last remark: you may also want to read about regular expressions (the re module in Python). They can be very useful for extracting this kind of information. For example, if you discover in the future that there are sometimes only 9 underscores before the wanted part instead of 10, but some pattern still identifies the important part, a simple split may not cut it, whereas a regular expression can do quite convoluted extractions in one line.
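As a rough sketch of that idea, the pattern below assumes the important number is always the run of 5 to 8 digits between "_FILE_" and the next underscore. That marker is taken from the sample name in the question and is an assumption, and extract_number is a hypothetical helper name:

```python
import re

# Assumption: the number always sits between "_FILE_" and the next underscore
NUMBER_RE = re.compile(r"_FILE_(\d{5,8})_")

def extract_number(filename):
    """Return the 5-8 digit number as a string, or None if the name does not match."""
    match = NUMBER_RE.search(filename)
    return match.group(1) if match else None

# Made-up sample name following the pattern in the question
sample = "A_BCD_0123456_20220901_20220930_02_V2_0000_00000_FILE_1234567_AB0001.pdf"
print(extract_number(sample))  # 1234567
```

Unlike split('_')[10], this does not depend on how many underscores come before the number, only on the marker in front of it.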

Related

How to correctly apply a RE for obtaining the last name (of a file or folder) from a given path and print it on Python?

I have written code that creates a dictionary whose keys are the absolute paths of the folders in the current directory and whose values are the lists of file names in each folder. This code would only be applied to directories whose folders contain only image files. Here it is:
import os
import re
# Main method
the_dictionary_list = {}
for name in os.listdir("."):
    if os.path.isdir(name):
        path = os.path.abspath(name)
        print(f'\u001b[45m{path}\033[0m')
        match = re.match(r'/(?:[^\\])[^\\]*$', path)
        print(match)
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[path] = list_of_file_contents
        print('\n')
print('\u001b[43mthe_dictionary_list:\033[0m')
print(the_dictionary_list)
The thing is that I want this dictionary to store only the last folder names as keys instead of the absolute paths. So I was planning to use the regular expression /(?:[^\\])[^\\]*$, which should obtain the last name (of a file or folder) from a given path, and then add those last names as keys to the dictionary in the for loop.
I wanted to test the code above first to see if it was doing what I wanted, but it was not: the value of the match variable was None in every iteration, which didn't make sense to me. Everything else works fine.
So I would like to know what I'm doing wrong here.
I would highly recommend using the built-in pathlib library. It looks like you are interested in the name attribute of a path (f.name). Here is a cheat sheet.
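A likely reason for the None results: re.match only matches at the start of the string, and the pattern starts with /, whereas an absolute Windows path starts with a drive letter (assuming the code ran on Windows, as the backslashes in the pattern suggest). The sketch below uses a made-up path and compares the original pattern with re.search, pathlib and basename. PureWindowsPath and ntpath are used only so that the result is the same on any OS:

```python
import ntpath  # the Windows flavour of os.path, importable on any OS
import re
from pathlib import PureWindowsPath

# Hypothetical absolute path, for illustration only
path = r"C:\Users\user\pictures\holiday"

# The original pattern: re.match anchors at the start, and the path does not start with "/"
print(re.match(r'/(?:[^\\])[^\\]*$', path))  # None

# re.search scans the whole string, so a pattern anchored at the end can work
last_by_regex = re.search(r'[^\\/]+$', path).group()

# No regex needed: both of these return the last path component
last_by_pathlib = PureWindowsPath(path).name
last_by_basename = ntpath.basename(path)

print(last_by_regex, last_by_pathlib, last_by_basename)  # holiday holiday holiday
```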
I decided to rewrite the code above so that it applies only to the current directory (where the program is located).
import os
# Main method
the_dictionary_list = {}
for subdir in os.listdir("."):
    if os.path.isdir(subdir):
        path = os.path.abspath(subdir)
        print(f'\u001b[45m{path}\033[0m')
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[subdir] = list_of_file_contents
        print('\n')
print('\033[1;37;40mThe dictionary list:\033[0m')
for subdir in the_dictionary_list:
    print('\u001b[43m'+subdir+'\033[0m')
    for archivo in the_dictionary_list[subdir]:
        print(" ", archivo)
    print('\n')
print(the_dictionary_list)
This is useful when the user wants to run the program with a double click in a specific location (my personal case).

Add character in a link

I want to add characters to a link.
The link is C:\Users\user\Documents\test.csv and I want to turn it into C:\Users\user\Documents\test_new.csv.
As you can see, I added _new to the file name.
Should I extract the name with Path(path).name and then use a regex? What is the best way to do that?
Since you said you want to "add" _new rather than rename the file, here is a short solution: apart from the variable and the call, it is just two lines. It is compact, so it may be hard to read. You can also change the keyword and the extension through the function's arguments.
PATH = "C:\\User\\Folder\\file.csv"
def new_name(path, ext="csv", keyword="_new"):
    print('\\'.join(path.split("\\")[:-1])+"\\"+path.split("\\")[-1].split(".")[0] + keyword + "." + ext)
new_name(PATH)
Here's a solution using the os module:
import os
path = r"C:\User\Folder\file.csv"
root, ext = os.path.splitext(path)
new_path = f'{root}_new{ext}'
And here's one using pathlib (with_stem requires Python 3.9 or later):
import pathlib
path = pathlib.Path(r"C:\User\Folder\file.csv")
new_path = str(path.with_stem(path.stem + '_new'))
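On Python versions older than 3.9, where with_stem is not available, with_name can do the same job. A minimal sketch, where add_to_stem is a hypothetical helper name and PureWindowsPath is used only so that the result is the same on any OS:

```python
from pathlib import PureWindowsPath

def add_to_stem(path, keyword="_new"):
    """Return path with keyword inserted before the suffix (hypothetical helper)."""
    return path.with_name(path.stem + keyword + path.suffix)

new_path = add_to_stem(PureWindowsPath(r"C:\Users\user\Documents\test.csv"))
print(new_path)  # C:\Users\user\Documents\test_new.csv
```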

How to remove characters from multiple files in python

I'm trying to write a simple program to batch rename files in a folder.
file format:
11170_tcd001-20160824-094716.txt
11170_tcd001-20160824-094716.rst
11170_tcd001-20160824-094716.raw
I have 48 files like the above, each with a different 14-digit sequence after the first "-".
My final goal is to convert the above to:
11170_tcd001.txt
11170_tcd001.rst
11170_tcd001.raw
I know it's possible to rename files with os.rename in Python. However, I can't figure out how to batch rename multiple files that each have a different digit sequence.
Is this possible?
Here is some pseudocode of what I would like to achieve:
import os
pathiter = (os.path.join(root, filename)
    for root, _, filenames in os.walk(folder)
    for filename in filenames
)
for path in pathiter:
    newname = path.replace('14 digits.txt', ' 0 digits.txt')
    if newname != path:
        os.rename(path,newname)
If you are looking for a non-regex approach, and assuming all your files match the pattern you expect, you can first get the extension of the file using splitext:
from os.path import splitext
file_name = '11170_tcd001-20160824-094716.txt'
extension = splitext(file_name)[1]
print(extension) # outputs: .txt
Then, with the extension in hand, split the file_name on the - and get the first item since you know that is the part that you want to keep:
new_filename = file_name.split('-')[0]
print(new_filename) # 11170_tcd001
Now, append the extension:
new_filename = new_filename + extension
print(new_filename) # 11170_tcd001.txt
Now you can proceed with the rename:
import os
os.rename(file_name, new_filename)
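Putting those steps into a loop over a folder could look like the sketch below. The helper names shortened_name and rename_all and the dry_run flag are assumptions added for illustration, not part of the answer above. With dry_run=True it only prints what it would do:

```python
import os
from os.path import splitext

def shortened_name(file_name):
    """Keep the part before the first '-' and re-attach the extension."""
    base, extension = splitext(file_name)
    return base.split('-')[0] + extension

def rename_all(folder, dry_run=True):
    """Rename every file in folder; hypothetical helper, prints only when dry_run is True."""
    for file_name in os.listdir(folder):
        new_name = shortened_name(file_name)
        if new_name == file_name:
            continue  # nothing to strip, leave the file alone
        if dry_run:
            print(file_name, "->", new_name)
        else:
            os.rename(os.path.join(folder, file_name),
                      os.path.join(folder, new_name))

print(shortened_name('11170_tcd001-20160824-094716.txt'))  # 11170_tcd001.txt
```

Note that os.rename may fail or overwrite (depending on the OS) if two files shorten to the same name, so a dry run first is a cheap safety check.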
You should probably try using regular expressions, like
import re
<...>
newfilename = re.sub(r'-\d{8}-\d{6}\b', '', oldfilename)
<...>
This replaces any 'hyphen, 8 digits, hyphen, 6 digits' sequence that is not followed by a letter, digit or underscore with an empty string. Hope I got you right.
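Applied to the three sample names from the question, that substitution behaves as follows (a small sketch; TIMESTAMP_RE is just an illustrative name for the compiled pattern):

```python
import re

# hyphen, 8 digits, hyphen, 6 digits, then a word boundary
TIMESTAMP_RE = re.compile(r'-\d{8}-\d{6}\b')

old_names = [
    '11170_tcd001-20160824-094716.txt',
    '11170_tcd001-20160824-094716.rst',
    '11170_tcd001-20160824-094716.raw',
]
new_names = [TIMESTAMP_RE.sub('', name) for name in old_names]
print(new_names)  # ['11170_tcd001.txt', '11170_tcd001.rst', '11170_tcd001.raw']
```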

Removing numbers and spaces in multiple file names with Python

I am trying to rename multiple mp3 files I have in a folder. Their names start with a number, like "1 Hotel California - The Eagles", and I would like them to be just "Hotel California - The Eagles".
There could also be a "05 Hotel California - The Eagles", which means that removing the numbers from different files would create duplicates, and that is the problem I am facing. I want the rename to replace, overwrite or delete one of the existing files, or whatever a solution might be.
P.S. Adding "3" to the "124567890 " would remove the "3" from the .mp3 extension.
I am new to Python, but here is the code I am using to implement this:
import os
def renamefiles():
    list = os.listdir(r"E:\NEW")
    print(list)
    path = os.getcwd()
    print(path)
    os.chdir(r"E:\NEW")
    for name in list:
        os.rename(name, name.translate(None, "124567890 "))
    os.chdir(path)
renamefiles()
And here is the error I get
WindowsError: [Error 183] Cannot create a file when that file already exists
Any help on how I could rename the files correctly would be highly appreciated!
You need to verify that the names being changed actually changed. If the name doesn't have digits or spaces in it, the translate will return the same string, and you'll try to rename name to name, which Windows rejects. Try:
for name in list:
    newname = name.translate(None, "124567890 ")
    if name != newname:
        os.rename(name, newname)
Note that this will still fail if the target file exists, which is probably what you want if you were accidentally collapsing two names into one. If you want silent replacement instead, on Python 3.3 or higher you can change os.rename to os.replace; on earlier versions, you can explicitly call os.remove before os.rename.
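A minimal Python 3 sketch of the os.replace variant follows. The helper rename_with_overwrite and its make_new_name callback are hypothetical names introduced here for illustration. Whichever file is renamed last wins when two names collapse into one:

```python
import os

def rename_with_overwrite(folder, make_new_name):
    """Rename files in folder, silently overwriting existing targets.

    make_new_name maps an old file name to the new one (hypothetical callback).
    Returns the list of (old, new) pairs that were renamed.
    """
    renamed = []
    for name in sorted(os.listdir(folder)):
        new_name = make_new_name(name)
        if not new_name or new_name == name:
            continue  # unchanged (or empty) names are skipped
        # os.replace overwrites an existing target file on every platform
        os.replace(os.path.join(folder, name), os.path.join(folder, new_name))
        renamed.append((name, new_name))
    return renamed
```

It could be called as rename_with_overwrite(r"E:\NEW", lambda n: n.lstrip("0123456789 ")), for example.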
You can catch an OSError and also use glob to find the .mp3 files:
import os
from glob import iglob
def renamefiles(pth):
    os.chdir(pth)
    for name in iglob("*.mp3"):
        try:
            os.rename(name, name.translate(None, "124567890").lstrip())
        except OSError:
            print("Caught error for {}".format(name))
            # os.remove(name) ?
What you do when you catch the error is up to you. You could keep a record of the names found and increment a count for each, or leave the file as is.
If the numbers are always at the start, you can also just lstrip them away, which lets you include 3 safely:
os.rename(name, name.lstrip("0123456789 "))
Using one of your example strings:
In [2]: "05 Hotel California - The Eagles.mp3".lstrip("01234567890 ")
Out[2]: 'Hotel California - The Eagles.mp3'
Your original approach could never work as desired, because it removes all spaces:
In [3]: "05 Hotel California - The Eagles.mp3".translate(None,"0124567890 ")
Out[3]: 'HotelCalifornia-TheEagles.mp3'
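The two-argument name.translate(None, ...) form shown here is Python 2 only. On Python 3, str.translate takes a single table, which str.maketrans can build. A small sketch of the same comparison on Python 3:

```python
name = "05 Hotel California - The Eagles.mp3"

# Python 3 equivalent of translate(None, chars): a table that deletes chars
delete_table = str.maketrans("", "", "0124567890 ")
translated = name.translate(delete_table)
print(translated)  # HotelCalifornia-TheEagles.mp3

# lstrip only touches the leading run of digits and spaces
stripped = name.lstrip("0123456789 ")
print(stripped)    # Hotel California - The Eagles.mp3
```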
If you don't care what file gets overwritten you can use shutil.move:
import os
from glob import iglob
from shutil import move
def renamefiles(pth):
    os.chdir(pth)
    for name in iglob("*.mp3"):
        move(name, name.translate(None, "124567890").lstrip())
On another note, don't use list as a variable name, since it shadows the built-in list type.
Instead of using name.translate, import the re module (regular expressions) and use something like
r"(?:\d*)?\s*(.+?)\.mp3"
as your pattern. You can then use
Match.group(1)
as the new name (without the extension).
To handle duplicate names, add an if statement that checks whether the target file already exists, like this:
os.path.exists(newpath)
where newpath is the full path of the renamed file.
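A sketch of how the regular expression and the existence check might fit together is shown below. The pattern is a slightly simplified variant of the one above, and clean_name and rename_if_free are hypothetical helper names. Files whose target already exists are skipped rather than overwritten:

```python
import os
import re

# optional leading digits, optional whitespace, then the part to keep
LEADING_NUMBER_RE = re.compile(r"\d*\s*(.+)\.mp3$")

def clean_name(name):
    """Return the name without its leading number, or None if it is not an .mp3."""
    match = LEADING_NUMBER_RE.match(name)
    return match.group(1) + ".mp3" if match else None

def rename_if_free(folder, name):
    """Rename name inside folder unless the target exists; returns True if renamed."""
    new_name = clean_name(name)
    if new_name is None or new_name == name:
        return False
    target = os.path.join(folder, new_name)
    if os.path.exists(target):
        return False  # a file with the cleaned name is already there
    os.rename(os.path.join(folder, name), target)
    return True

print(clean_name("05 Hotel California - The Eagles.mp3"))  # Hotel California - The Eagles.mp3
```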
I was unable to easily get any of the answers to work with Python 3.5, so here's one that works under that condition:
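import os
import re
def rename_files():
    path = os.getcwd()
    file_names = os.listdir(path)
    for name in file_names:
        # drop every digit except a trailing run (the "3" in ".mp3"),
        # then strip the space left at the start of the name
        os.rename(name, re.sub(r"[0-9](?!\d*$)", "", name).lstrip())
rename_files()
This should work for a list of files like "1 Hotel California - The Eagles.mp3", renaming them to "Hotel California - The Eagles.mp3" (so the extension is untouched). Note that digits elsewhere in the name are removed as well.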
Ok so what you want is:
create a new filename removing leading numbers
if that new filename exists, remove it
rename the file to that new filename
The following code should work (not tested).
import os
import string
class FileExists(Exception):
    pass
def rename_files(path, ext, remove_existing=True):
    for fname in os.listdir(path):
        # test if the file name ends with the expected
        # extension else skip it
        if not fname.endswith(ext):
            continue
        # chdir is not a good idea, better to work
        # with absolute path whenever possible
        oldpath = os.path.join(path, fname)
        # remove _leading_ digits then strip surrounding whitespace
        newname = fname.lstrip(string.digits).strip()
        newpath = os.path.join(path, newname)
        # check if the file already exists
        if os.path.exists(newpath):
            if remove_existing:
                # it exists and we were told to
                # remove existing file:
                os.remove(newpath)
            else:
                # it exists and we were told to
                # NOT remove existing file:
                raise FileExists(newpath)
        # ok now we should be safe
        os.rename(oldpath, newpath)
# only execute the function if we are called directly
# we don't want to do anything if we are just imported
# from the Python shell or another script or module
if __name__ == "__main__":
    # exercise left to the reader:
    # add command line options / arguments handling
    # to specify the path to browse, the target
    # extension and whether to remove existing files
    # or not
    rename_files(r"E:\NEW", ".mp3", True)
You just need to change to the directory where the *.mp3 files are located and run the few lines below with Python:
import os, re
for filename in os.listdir():
    # only rename files that start with a two-digit number
    match = re.match(r"[0-9]{2}\s*", filename)
    if match:
        os.rename(filename, filename[match.end():])

os.path.join producing an extra forward slash

I am trying to join an absolute path and variable folder path depending on the variable run. However when I use the following code it inserts a forward slash after a string, which I don't require. How can I remove the slash after Folder_?
import os
currentwd = os.getcwd()
folder = '001'
run_folder = os.path.join(currentwd, 'Folder_', folder)
print(run_folder)
The output I get using this code is:
/home/xkr/Workspace/Folder_/001
You are asking os.path.join() to take multiple path elements and join them. It is doing its job.
Don't use os.path.join() to produce filenames; just use concatenation:
run_folder = os.path.join(currentwd, 'Folder_' + folder)
or use string formatting, which gives you nice features such as automatic padding of integers:
folder = 1
run_folder = os.path.join(currentwd, 'Folder_{:03d}'.format(folder))
That way you can increment folder past 10 or 100 and still have the correct number of leading zeros.
Note that you don't have to use os.getcwd(); you could also use os.path.abspath(), it'll make relative paths absolute based on the current working directory:
run_folder = os.path.abspath('Folder_' + folder)
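The difference between the two calls can be checked with a fixed base path. This sketch uses posixpath (the POSIX flavour of os.path) only so that the separators are the same on any OS, and the base path is the one from the question's output:

```python
import posixpath

base = "/home/xkr/Workspace"

# Each argument is a separate path element, so a separator is inserted between them
separate = posixpath.join(base, "Folder_", "001")
print(separate)  # /home/xkr/Workspace/Folder_/001

# Build the folder name first, then join it as a single element
combined = posixpath.join(base, "Folder_" + "001")
print(combined)  # /home/xkr/Workspace/Folder_001

# Zero padding keeps the names the same width as the run number grows
padded = [posixpath.join(base, "Folder_{:03d}".format(run)) for run in (1, 10, 100)]
print(padded)
```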
