I want to add some characters to a file path.
The path is C:\Users\user\Documents\test.csv and I want to get C:\Users\user\Documents\test_new.csv.
So, as you can see, I added _new to the file name.
Should I extract the name with Path(path).name and then use a regex? What is the best way to do that?
Since you said you want to "add" _new rather than rename the file, here is a tiny solution: just two lines of code apart from the variable and the call. It may look complex because I compressed everything into a single expression built from str.split and str.join. You can also change the keyword and the extension through the function's arguments.
PATH = "C:\\User\\Folder\\file.csv"
def new_name(path, ext="csv", keyword="_new"):
    print('\\'.join(path.split("\\")[:-1])+"\\"+path.split("\\")[-1].split(".")[0] + keyword + "." + ext)
new_name(PATH)
Here's a solution using the os module:
import os
path = r"C:\User\Folder\file.csv"
root, ext = os.path.splitext(path)
new_path = f'{root}_new{ext}'
And here's one using pathlib:
import pathlib
path = pathlib.Path(r"C:\User\Folder\file.csv")
# with_stem requires Python 3.9 or later
new_path = str(path.with_stem(path.stem + '_new'))
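Path.with_stem only exists on Python 3.9 and later. On older versions the same result can be obtained with with_name. Here is a minimal sketch, where add_to_stem is a hypothetical helper name and PureWindowsPath is used only so the example behaves the same on any OS:

```python
from pathlib import PureWindowsPath


def add_to_stem(path, keyword="_new"):
    """Return path with keyword inserted between the stem and the suffix."""
    p = PureWindowsPath(path)
    # with_name replaces the final component; stem + keyword + suffix rebuilds it
    return p.with_name(p.stem + keyword + p.suffix)


new_path = add_to_stem(r"C:\Users\user\Documents\test.csv")
print(new_path)  # C:\Users\user\Documents\test_new.csv
```

Only the last suffix is treated as the extension, so a name like b.tar.gz would become b.tar_new.gz.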
I am working on some code to rename multiple files in a folder each month, something we currently have to do manually within my company. I am fairly new to Python, currently on lists in the Python Crash Course book.
I managed to put together the below code, but I have some questions:
import os
import glob
#Asks the user for the current month for renaming and for the path of the
#files
month = input("Which month's reports? Type the full name of the month: ")
path = input("Enter the file path: ")
pattern = path + "\\A_BCD_012345" + "*.pdf"
result = glob.glob(pattern)
for file_name in result:
    old_name = file_name
    new_name = path + '\\' + old_name[90:99] + month + ' Report' + old_name[-4:]
    print(new_name)
Now, my question is how to use a wildcard to be more flexible, as my current code is not great.
The files always look like this:
A_BCD_0123456_20220901_20220930_02_V2_0000_00000_FILE_5-8 digits number which is important to keep_AB0001.pdf
I would like the files to be renamed to: 5-8 digits of important number + company name + current month Report.
Where should I look to be able to finish my code? I know I am very close. The os.rename call is still missing because I did not want to add it yet, so only the wildcards are still boggling my mind. The important digits always come after the 10th underscore character and before the 11th one. Everything after the 11th underscore should be discarded so that I can rename the files the way I want.
OK, your last comment makes it clearer.
First of all, you need to extract that data from the file name, not from the whole path name. Otherwise, if there are _ characters in the directory name, you will have the same problem as with slicing.
Then, from that part, you can for example use split to separate on '_' and extract the piece you want.
It could look like this:
import pathlib
# path and month come from the input() calls in your code
folder = pathlib.Path(path)
result = folder.glob("A_BCD_012345*.pdf")
for fullpath in result:
    filename = fullpath.stem
    num = filename.split('_')[10]
    new_name = num + month + " Report" + fullpath.suffix
    new_fullpath = folder / new_name
    # os.rename(str(fullpath), str(new_fullpath))
Here pathlib gives you two things:
Extraction of the file name without the parent directory name, so there is no need to worry about the _ characters or the number of letters the directory might contain.
Extraction of the suffix, so there is no need for the -4 (which only works for a four-character suffix such as ".pdf").
Plus, it helps you create something more OS-independent. As you can see, there is no \ in my code. The / operator of pathlib joins a parent directory with a child name using the separator the OS needs (a \ on Windows and a / on Unix). It also avoids the redundancy you often end up with when concatenating path strings (a \\ instead of a \).
But, well, pathlib is not vital here. You could do without it; I just took the occasion to show it. You can also keep your glob.glob. But you need to take the file name (without the path) before extracting the number if you don't want, as you said, to make assumptions about what is in the path (the number of characters for your method, or the number of _ for the new one).
You can also do that with os.path.basename, for example.
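To illustrate why the base name should be taken first, here is a small sketch. The directory and file names are hypothetical, and ntpath (the Windows flavour of os.path) is used only so that it runs the same way on any OS:

```python
import ntpath  # Windows implementation of os.path, importable on any OS

# Hypothetical full path whose directory names also contain underscores
full_path = (
    r"C:\monthly_reports\2022_09"
    r"\A_BCD_0123456_20220901_20220930_02_V2_0000_00000_FILE_1234567_AB0001.pdf"
)

# Splitting the whole path picks the wrong field because of the extra underscores
wrong = full_path.split("_")[10]
# Splitting only the base name gives the field after the 10th underscore
right = ntpath.basename(full_path).split("_")[10]

print(wrong, right)  # 00000 1234567
```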
So here is another version, closer to yours:
import os
import glob
#Asks the user for the current month for renaming and for the path of the
#files
month = input("Which month's reports? Type the full name of the month: ")
path = input("Enter the file path: ")
pattern = path + "\\A_BCD_012345" + "*.pdf"
result = glob.glob(pattern)
for file_name in result:
    old_name = file_name
    # basename drops the directory part, so only the file name is split
    number = os.path.basename(file_name).split('_')[10]
    new_name = path + '\\' + number + month + ' Report' + old_name[-4:]
    print(new_name)
(Style note: the variable name "file_name" is not the best choice here, since it is important to distinguish between the full path and the file name, which is the name without the directory.)
Last remark: you may also want to read about regular expressions (module re in Python). They can be very useful for extracting that kind of information. For example, if you discover in the future that there are sometimes only 9 _ before the wanted part instead of 10, but with a pattern that shows which part is the important one, a simple split may not cut it, whereas a regular expression can do quite convoluted extraction in a one-liner.
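As a rough sketch of the regular-expression approach: the file name below is hypothetical, and the pattern assumes the literal text FILE always comes right before the important number, which the question suggests but does not guarantee.

```python
import re

# Hypothetical file name following the pattern described in the question
filename = "A_BCD_0123456_20220901_20220930_02_V2_0000_00000_FILE_1234567_AB0001.pdf"

# Capture the 5-8 digit run that sits between "_FILE_" and the next underscore
match = re.search(r"_FILE_(\d{5,8})_", filename)
number = match.group(1) if match else None

print(number)  # 1234567
```

Unlike split('_')[10], this does not depend on how many underscores come before the number.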
Is it using the os.path.join() method, or concatenating strings? Examples:
fullpath1 = os.path.join(dir, subdir)
fullpath2 = os.path.join(dir, "subdir")
fullpath3 = os.path.join("dir", subdir)
fullpath4 = os.path.join(os.path.join(dir, subdir1), subdir2)
etc
or
fullpath1 = dir + "\\" + subdir
fullpath2 = dir + "\\" + "subdir"
fullpath3 = "dir" + "\\" + subdir
fullpath4 = dir + "\\" + subdir1 + "\\" + subdir2
etc
Edit with some more info.
This is a disagreement between a colleague and me. He insists the second method is "purer", while I insist that using the built-in functions is actually "purer" because it is more Pythonic, and of course it makes the path handling OS-independent.
We tried searching to see if this question had been answered before, either here on SO or elsewhere, but found nothing.
In my opinion (I know, no one asked) it is indeed using Path from pathlib
import pathlib
folder = pathlib.Path('path/to/folder')
subfolder = folder / 'subfolder'
file = subfolder / 'file1.txt'
Please read up on pathlib for more useful functions. Ones I often use are resolve(), folder.exists() to check whether a folder exists, and subfolder.mkdir(parents=True, exist_ok=True) to create a new folder including its parents. Those are random examples; the module can do a lot more.
See https://docs.python.org/3/library/pathlib.html
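A minimal sketch of those three calls, run inside a temporary directory so that nothing real is touched (the folder names "a" and "b" are made up for the example):

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    subfolder = folder / "a" / "b"                 # hypothetical nested folders
    existed_before = subfolder.exists()            # nothing created yet
    subfolder.mkdir(parents=True, exist_ok=True)   # creates "a" and "a/b"
    subfolder.mkdir(parents=True, exist_ok=True)   # second call is a no-op
    existed_after = subfolder.exists()
    is_absolute = subfolder.resolve().is_absolute()

print(existed_before, existed_after, is_absolute)  # False True True
```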
You can use the first method, with os.path.join().
A second option is to use the pathlib module, as @DeepSpace suggested.
String concatenation, on the other hand, is worse and harder to read, so you shouldn't use it.
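To make the OS-independence point concrete, here is a small sketch that calls the Windows and POSIX implementations of os.path directly (ntpath and posixpath), so both results can be seen on one machine. The folder names are placeholders.

```python
import ntpath      # os.path as it behaves on Windows
import posixpath   # os.path as it behaves on Linux/macOS

parts = ("dir", "subdir1", "subdir2")  # hypothetical folder names

# join accepts any number of components, so nesting calls is unnecessary
windows_path = ntpath.join(*parts)
posix_path = posixpath.join(*parts)

print(windows_path)  # dir\subdir1\subdir2
print(posix_path)    # dir/subdir1/subdir2
```

In normal code you would simply call os.path.join, which resolves to whichever of the two matches the running OS.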
Say I have the path folder1/folder2/folder3, and I don't know the names of the folders in advance.
How can I remove the first part of this path to get only folder2/folder3?
You can use pathlib.Path for that:
from pathlib import Path
p = Path("folder1/folder2/folder3")
And either concatenate all parts except the first:
new_path = Path(*p.parts[1:])
Or create a path relative_to the first part:
new_path = p.relative_to(p.parts[0])
This code doesn't require specifying the path delimiter, and works for all pathlib supported platforms (Python >= 3.4).
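One caveat worth checking, shown here as a sketch with PurePosixPath so it runs the same on any OS: for an absolute path the first element of parts is the root itself, so dropping only parts[0] would not remove folder1.

```python
from pathlib import PurePosixPath

relative = PurePosixPath("folder1/folder2/folder3")
absolute = PurePosixPath("/folder1/folder2/folder3")

print(relative.parts)  # ('folder1', 'folder2', 'folder3')
print(absolute.parts)  # ('/', 'folder1', 'folder2', 'folder3')

# For a relative path, dropping parts[0] removes the first folder
from_relative = PurePosixPath(*relative.parts[1:])
# For an absolute path, parts[0] is the root, so skip two parts instead
from_absolute = PurePosixPath(*absolute.parts[2:])

print(from_relative, from_absolute)  # folder2/folder3 folder2/folder3
```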
Use str.split with 1 as maxsplit argument:
path = "folder1/folder2/folder3"
path.split("/", 1)[1]
# 'folder2/folder3'
If there is no / in there, you might be safer with:
path.split("/", 1)[-1] # pick the last of one or two tokens
but that depends on your desired logic in that case.
For better portability across systems, you could replace the slash "/" with os.path.sep (this only helps if the path string was built with the current OS's separator):
import os
path.split(os.path.sep, 1)[1]
I have two paths to a file, like below:
old_path_1 = 'old_path_1/12374994/12324515/000000.dcm'
old_path_2 = 'old_path_2/07-20-2016-DDSM-74994/1_full_24515/000000.dcm'
I want to build a new path like the one below, so that I can create a .csv file that contains the correct paths to all images.
new_path = 'old_path_1/07-20-2016-DDSM-74994/1_full_24515/000000.dcm'
** Only 12374994/12324515 must be replaced by 07-20-2016-DDSM-74994/1_full_24515.
I have to do this because there are some inconsistencies in the paths of the original files. Could anyone show me how to do this in Python in a simpler way?
This is what I did:
old_path_1.split('/')[0]+ '/' + old_path_2.split('/')[1]+'/' +old_path_2.split('/')[2]+'/' +old_path_1.split('/')[3]
Is there a better way?
I think your question needs some more explanation about the general case you're dealing with.
However, if this is the only case you're dealing with, then you only need to replace the '2' in old_path_2 with a '1'. Strings are immutable, so go through a list of characters:
chars = list(old_path_2)
chars[9] = '1'
new_path = ''.join(chars)
Or, if you're looking for a one liner:
new_path = old_path_1[:10] + old_path_2[10:]
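If the folder names can vary in length, slicing at a fixed index becomes fragile. A sketch of the same recombination by path component, mirroring the indices used in the question (first component and file name from old_path_1, the two middle folders from old_path_2), and assuming both paths always have this four-part layout:

```python
from pathlib import PurePosixPath

old_path_1 = "old_path_1/12374994/12324515/000000.dcm"
old_path_2 = "old_path_2/07-20-2016-DDSM-74994/1_full_24515/000000.dcm"

p1 = PurePosixPath(old_path_1)
p2 = PurePosixPath(old_path_2)

# Top-level folder and file name from the first path, middle folders from the second
new_path = str(PurePosixPath(p1.parts[0], *p2.parts[1:-1], p1.name))

print(new_path)  # old_path_1/07-20-2016-DDSM-74994/1_full_24515/000000.dcm
```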
I have this path:
c:\JAVA\eclipse\java-neon\eclipse\configuration\
I want to get the last folder, "configuration",
or for
c:\JAVA\eclipse\java-neon\eclipse\configuration\S\D\CV\S\D\D\AAAAA
get "AAAAA".
I can't find a function for this in os.path.
Thanks
Suppose you know you have a separator character sep, this should accomplish what you ask:
path.split(sep)[-1]
Where path is the str containing your path.
If you don't know what the separator is, you can use
os.path.sep
You can use os.path.split to split according to path separator:
os.path.split(path)[-1]
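Note that the first path in the question ends with a separator, in which case taking the last element of a split gives an empty string. A sketch of one way around that, using PureWindowsPath so the backslashes are understood on any OS:

```python
from pathlib import PureWindowsPath

with_trailing = "c:\\JAVA\\eclipse\\java-neon\\eclipse\\configuration\\"
without_trailing = r"c:\JAVA\eclipse\java-neon\eclipse\configuration\S\D\CV\S\D\D\AAAAA"

# .name is the final path component; a trailing separator is ignored
last_1 = PureWindowsPath(with_trailing).name
last_2 = PureWindowsPath(without_trailing).name
print(last_1, last_2)  # configuration AAAAA

# Plain splitting returns an empty string when the path ends with a separator
naive = with_trailing.split("\\")[-1]
print(repr(naive))  # ''
```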
Please check this code:
import os
def getFolderName(path):
    # Drop a single trailing backslash so the last component is not empty
    if path.endswith("\\"):
        path = path[:-1]
    return os.path.split(path)[-1]
print(getFolderName(r'c:\JAVA\eclipse\java-neon\eclipse\configuration\S\D\CV\S\D\D\AAAAA'))
If you want to explore your paths, try something like this:
import os
def explore(path):
    finalpaths = []
    for name in os.listdir(path):
        nextpath = os.path.join(path, name)
        if os.path.isdir(nextpath):
            # Recurse into subfolders
            finalpaths.extend(explore(nextpath))
        else:
            # Found a file: record the folder that contains it
            finalpaths.append(path)
    return finalpaths
Then if you run
set(explore(path))
you'll get the set of all folders under that directory that directly contain at least one file.
Because os.path.join is used, this works on Unix and on Windows alike.
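The standard library's os.walk already performs this kind of recursive traversal. A sketch of the same idea built on it, where folders_with_files is a hypothetical helper name and the directory tree is created in a temporary folder just for the demonstration:

```python
import os
import tempfile


def folders_with_files(root):
    """Return the set of directories under root that directly contain files."""
    # os.walk yields (dirpath, dirnames, filenames) for every directory
    return {dirpath for dirpath, _dirs, files in os.walk(root) if files}


# Build a small throwaway tree: root/a/b/file.txt and an empty root/c
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    os.makedirs(os.path.join(tmp, "c"))
    with open(os.path.join(tmp, "a", "b", "file.txt"), "w") as fh:
        fh.write("x")
    found = folders_with_files(tmp)
    expected = {os.path.join(tmp, "a", "b")}

print(found == expected)  # True
```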