So I have a lot of folders with a certain name. In each folder I have 200+ items. The items inside the folders have names like:
CT.34562346.246.dcm
RD.34562346.dcm
RN.34562346.LAO.dcm
And some others along the same lines.
I now wish to rename all files inside all folders so that the number (34562346) is replaced with the name of the folder. So for example in the folder named "1" the files inside should become:
CT.1.246.dcm
RD.1.dcm
RN.1.LAO.dcm
So only the large number is replaced. And yes, all files follow this pattern: it is the number after the first . that should be replaced.
So far I have:
import os

base_dir = "foo/bar/"  # In this dir I have all my folders
dir_list = []
for dirname in os.walk(base_dir):
    dir_list.append(dirname[0])
This one just lists the entire paths of all folders.
dir_list_split = []
for name in dir_list[1:]:  # The 1 is because it lists the base_dir as well
    x = name.split('/')[2]
    dir_list_split.append(x)
This one extracts the name of each folder.
The next step would be to enter each folder and rename the files inside it, and this is where I'm stuck.
The pathlib module, which was new in Python 3.4, is often overlooked. I find that it often makes code simpler than it would otherwise be with os.walk.
In this case, .glob('**/*.*') looks recursively through all of the folders and subfolders that I created in a sample folder called example. The *.* part means that it considers all files.
I put path.parts in the loop to show you that pathlib arranges to parse pathnames for you.
I check that the string constant '34562346' is in its correct position in each filename first. If it is, then I simply replace it with the item from .parts that is one level up the folder tree, i.e. the name of the containing folder.
Then I can replace the rightmost element of .parts with the newly altered filename to create the new pathname and then do the rename. In each case I display the new pathname, if it was appropriate to create one.
>>> from pathlib import Path
>>> from os import rename
>>> for path in Path('example').glob('**/*.*'):
...     path.parts
...     if path.parts[-1][3:11]=='34562346':
...         new_name = path.parts[-1].replace('34562346', path.parts[-2])
...         new_path = '/'.join(list(path.parts[:-1])+[new_name])
...         new_path
...         ## rename(str(path), new_path)
...     else:
...         'no change'
...
('example', 'folder_1', 'id.34562346.6.a.txt')
'example/folder_1/id.folder_1.6.a.txt'
('example', 'folder_1', 'id.34562346.wax.txt')
'example/folder_1/id.folder_1.wax.txt'
('example', 'folder_2', 'subfolder_1', 'ty.34562346.90.py')
'example/folder_2/subfolder_1/ty.subfolder_1.90.py'
('example', 'folder_2', 'subfolder_1', 'tz.34562346.98.py')
'example/folder_2/subfolder_1/tz.subfolder_1.98.py'
('example', 'folder_2', 'subfolder_2', 'doc.34.34562346.implication.rtf')
'no change'
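The session above hard-codes the number and the length of the prefix. As a rough sketch of the same pathlib idea without those constants, the function below replaces the second dot-separated field with the parent folder's name. The function name rename_to_parent and the dry_run flag are my own, and I am assuming that files sitting directly in the root folder should be left alone.

```python
from pathlib import Path


def rename_to_parent(root, dry_run=True):
    """Replace the second dot-separated field of each file name with the
    name of the folder that directly contains the file.

    Returns a list of (old_path, new_path) pairs. Nothing is renamed
    while dry_run is True.
    """
    root = Path(root)
    planned = []
    # sorted() collects every path first, so renaming cannot disturb the walk.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.parent == root:
            continue
        fields = path.name.split(".")
        if len(fields) < 3:
            # No field between the prefix and the extension to replace.
            continue
        fields[1] = path.parent.name
        target = path.with_name(".".join(fields))
        if target == path:
            continue
        planned.append((path, target))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling rename_to_parent('example') first only reports what would change; pass dry_run=False once the list looks right. Two files in one folder that differ only in the replaced field would end up with the same new name, so that case is worth checking in the dry run.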
This will rename files in subdirectories too:
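import os

rootdir = "foo" + os.sep + "bar"
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filepath = subdir + os.sep + file
        foldername = subdir.split(os.sep)[-1]
        # Find the first run of digits in the file name. Scanning the full
        # path would pick up digits in folder names such as "1" instead.
        number = ""
        foundnumber = False
        for c in file:
            if c.isdigit():
                foundnumber = True
                number = number + c
            elif foundnumber:
                break
        if foundnumber:
            # Replace only the first occurrence, and only in the file name.
            newfilepath = subdir + os.sep + file.replace(number, foldername, 1)
            os.rename(filepath, newfilepath)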
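If the character-by-character scan feels fragile, a regular expression can pin the match to "the digits right after the first dot". This is only a sketch: the helper name replace_number is mine, and it assumes the number is always followed by another dot (as in all the file names in the question).

```python
import re

# Prefix up to the first dot, then a run of digits that is followed by a dot.
_NUMBER_AFTER_PREFIX = re.compile(r"^([^.]+)\.\d+(?=\.)")


def replace_number(file_name, folder_name):
    """Return file_name with the number after the first dot replaced by
    folder_name; names that do not fit the pattern are returned unchanged."""
    # A function is used as the replacement so that backslashes or group
    # references inside folder_name are never interpreted by re.sub.
    return _NUMBER_AFTER_PREFIX.sub(
        lambda match: match.group(1) + "." + folder_name, file_name, count=1
    )
```

Inside the os.walk loop this would replace the digit scan: compute replace_number(file, foldername) and call os.rename only when the result differs from file.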
Split each file name on the . and replace the second item with the folder name, then join on .'s again for the new file name. Here's some sample code that demonstrates the concept.
folder_name = ['1', '2']
file_names = ['CT.2345.234.dcm', 'BG.234234.222.dcm', "RA.3342.221.dcm"]
for folder in folder_name:
    new_names = []
    for x in file_names:
        file_name = x.split('.')
        file_name[1] = folder
        back_together = '.'.join(file_name)
        new_names.append(back_together)
    print(new_names)
Output
['CT.1.234.dcm', 'BG.1.222.dcm', 'RA.1.221.dcm']
['CT.2.234.dcm', 'BG.2.222.dcm', 'RA.2.221.dcm']
Related
I have a folder with many files named like homeXXX_roomXXX_high.csv or homeXXX_roomXXX_low.csv, where the XXX part is replaced with a three-digit number.
I want to use some code to move the files into separate folders based on the number next to "home" in the filename. For example, I want to specify that files with names starting home101, home103, home320, home553, etc. should all be moved into FolderA, whereas those starting with home555, home431, home105 should go to FolderB.
I have this code so far:
import shutil
import os

source = '/path/to/source_folder'
dest1 = '/path/to/FolderA'
dest2 = '/path/to/FolderB'
files = os.listdir(source)
for f in files:
    if (f.startswith("home101") or f.startswith("home103")):
        shutil.move(f, dest1)
    elif (f.startswith("home431") or f.startswith("home555")):
        shutil.move(f, dest2)
However, it's tedious to specify all the if and else cases. I'd like to use some kind of structured data, such as a list, to specify groups of "home" numbers and the corresponding folder paths. How can I do this in Python?
It seems like you can use another for loop. It would look something like this:
import shutil
import os

source = '/path/to/source_folder'
dest1 = '/path/to/FolderA'
dest2 = '/path/to/FolderB'
list1 = ["home101", "home103"]
list2 = ["home431", "home555"]
files = os.listdir(source)
for f in files:
    for home1 in list1:
        if f.startswith(home1):
            # os.listdir() returns bare names, so prepend the source folder
            shutil.move(os.path.join(source, f), dest1)
            break
    for home2 in list2:
        if f.startswith(home2):
            shutil.move(os.path.join(source, f), dest2)
            break
You can also create a function:
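def check_and_move(file, list_of_patterns, destination):
    for pattern in list_of_patterns:
        if file.startswith(pattern):
            # file is a bare name from os.listdir(source)
            shutil.move(os.path.join(source, file), destination)
            break  # stop after the first match; the file has been moved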
and the code will get cleaner because you avoid repetition :)
for f in files:
    check_and_move(f, list1, dest1)
    check_and_move(f, list2, dest2)
    # etc...
You can make a list for FolderA that contains the "home" + number prefixes:
FolderAGroup = ['home101', 'home103', 'homeXXX', 'homeXXX']
If the names are split with a "_" as you describe, you can use this code to filter them. It won't work if they are not split like that.
files = os.listdir(source)
for f in files:
    parts = f.split('_')
    # Get the first part of the filename before the _
    home_number = parts[0]
    # Check if the home number is in the FolderA group list
    if home_number in FolderAGroup:
        shutil.move(os.path.join(source, f), dest1)
    else:
        shutil.move(os.path.join(source, f), dest2)
You can add elif branches if you want more folders.
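Rather than adding one elif per folder, the groups can live in a dictionary. The sketch below is one way to do that, not the only one. The names move_by_home and groups are made up for the example, and it assumes every file name starts with "homeXXX_" as in the question. Files whose home number is not listed are left where they are.

```python
import os
import shutil


def move_by_home(source, groups):
    """Move files from source into the destination folder of their group.

    groups maps a destination folder to the home prefixes that belong
    there, e.g. {'/path/to/FolderA': ['home101', 'home103']}.
    Returns a list of (file_name, destination) pairs for the moved files.
    """
    # Invert the mapping once so each file needs a single dictionary lookup.
    dest_for = {home: dest for dest, homes in groups.items() for home in homes}
    moved = []
    for name in os.listdir(source):
        home = name.split("_", 1)[0]
        dest = dest_for.get(home)
        if dest is None:
            continue  # not listed in any group, leave it in place
        os.makedirs(dest, exist_ok=True)
        shutil.move(os.path.join(source, name), os.path.join(dest, name))
        moved.append((name, dest))
    return moved
```

Adding a third folder then only means adding one more key to the dictionary.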
If the names homexxx are incremental, you could try something like this:
home_names_list_1 = []
home_names_list_2 = []
for i in range(100):
    # zero-pad to three digits so that "home1" cannot match "home150"
    home_names_list_1.append("home" + str(i).zfill(3))
for i in range(100, 200):
    home_names_list_2.append("home" + str(i))
for file in files:
    moved = False
    for name in home_names_list_1:
        if file.startswith(name):
            print("move somewhere")
            moved = True
            break
    if moved:
        continue  # go on to the next file instead of leaving the loop
    for name in home_names_list_2:
        if file.startswith(name):
            print("move somewhere else")
            moved = True
            break
    if not moved:
        print("did not move because it did not match anything")
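When the groups really are numeric ranges, you may not need the name lists at all. Here is a small sketch that parses the number out of the file name and tests it against range objects. The helper name folder_for and the (range, destination) pair layout are assumptions of mine, as is the "home" + digits + "_" naming taken from the question.

```python
import re

# "home", then the house number, then the underscore that follows it.
_HOME_NUMBER = re.compile(r"^home(\d+)_")


def folder_for(file_name, ranges):
    """Return the destination whose range contains the file's home number,
    or None if the name does not fit the pattern or no range matches.

    ranges is a list of (range, destination) pairs.
    """
    match = _HOME_NUMBER.match(file_name)
    if match is None:
        return None
    number = int(match.group(1))
    for number_range, destination in ranges:
        if number in number_range:  # constant-time check for range objects
            return destination
    return None
```

For example, with ranges = [(range(0, 100), dest1), (range(100, 200), dest2)] a file is moved only when folder_for(file, ranges) returns something other than None.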
I have quite a complex problem. I have multiple filenames in a list, and the root directory of those files is the same: mother_directory. However, every file has a different subdirectory. I have a script that processes some of the files, and I need to know the exact full path, including the subdirectories, of every file. I know that I could use os.walk, but that would make my function too nested, as inside this function I'm planning to use another function which uses those full paths.
This is the file structure:
mother_directory:
    |_child1:
        20211011.xml
        20211001.xml
    |_child2:
        20211002.xml
This is my current code:
import os

mother_path = r'c:\data\user1\Desktop\mother_directory'
blue_dates = ['20211011', '20211012', '20211013', '20211001', '20211002']
red_dates = ['20211011', '20211009', '20211008', '20211001', '20211002']
file_names = ['20211011.xml', '20211001.xml', '20211002.xml']

def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        file_path = os.path.join(mother_path, match_file[0])
        print(file_path)

for x in blue_dates:
    process_files(x)
My current output:
c:\data\user1\Desktop\mother_directory\20211011.xml
c:\data\user1\Desktop\mother_directory\20211001.xml
c:\data\user1\Desktop\mother_directory\20211002.xml
When I run my function I want my desired output to be like this:
c:\data\user1\Desktop\mother_directory\child1\20211011.xml
c:\data\user1\Desktop\mother_directory\child1\20211001.xml
c:\data\user1\Desktop\mother_directory\child2\20211002.xml
I added a condition, I believe it will work now.
def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        for root, dirs, files in os.walk(mother_path):
            for file in files:
                if match_file[0] in file:
                    print(os.path.join(root, match_file[0]))
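The version above walks the whole tree again for every date. If the nesting is the concern, one option is to walk once up front, build a dictionary from file name to full path, and keep the processing function flat. This is a sketch under the assumption that file names are unique across the subdirectories. The names index_files and paths_for_dates are invented for the example.

```python
import os


def index_files(root):
    """Walk root once and map each file name to its full path.

    If the same name occurs in several subdirectories, the first one
    found wins.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.setdefault(name, os.path.join(dirpath, name))
    return index


def paths_for_dates(dates, allowed, index, ext=".xml"):
    """Return the full paths of the files for every date that is in
    allowed and has a matching file in the index, in the order of dates."""
    return [index[d + ext] for d in dates if d in allowed and d + ext in index]
```

With index = index_files(mother_path), the call paths_for_dates(blue_dates, red_dates, index) gives the full paths to hand on to the next function.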
In this code, I want to be able to print a chosen result by index: x[0] should print the first path and x[1] should print the second path. I don't know how to do this. I tried split, but it didn't give me the expected result.
Given result
/home/runner/TestP1/folder1/
/home/runner/TestP1/folder2/
/home/runner/TestP1/folder1/sub
Required result
/home/runner/TestP1/folder2/
Code
import os

def filesystem(rootdir):
    for rootdir, dirs, files in os.walk(rootdir):
        for subdir in dirs:
            x = os.path.join(rootdir, subdir)
            print(x)

filesystem("/home/runner/TestP1")
If you want to access each directory as if it were a list, then you need to construct one. Below I have given an example:
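import os

all_directories = []
for root, dirs, files in os.walk(r"/home/runner/TestP1"):
    # Append the full paths of the subdirs in this directory to the master list.
    all_directories += [os.path.join(root, d) for d in dirs]
print(all_directories[1])
Be mindful of the order. With the default topdown=True, os.walk visits a directory before its subdirectories, so the list starts with the subdirectories of the root, followed by those of each directory in the order it is visited. Passing topdown=False makes os.walk yield the deepest directories first (bottom-up); it does not give breadth-first ordering.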
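If you need a strict level-by-level (breadth-first) order, os.walk does not provide it directly, but a queue does. The following is a sketch. The function name directories_breadth_first is mine, and sorting the children is my own choice to make the result repeatable.

```python
import os
from collections import deque


def directories_breadth_first(root):
    """Return the full paths of all directories below root, level by level."""
    found = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        with os.scandir(current) as entries:
            # Sort so siblings come back in a predictable order.
            children = sorted(
                entry.path for entry in entries
                if entry.is_dir(follow_symlinks=False)
            )
        found.extend(children)
        queue.extend(children)
    return found
```

For the tree in the question, directories_breadth_first("/home/runner/TestP1")[1] would then be the folder2 path.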
path = '/Users/my/path/tofile'
files = os.listdir(path)
names= ["GAT4", "LO", "sds"]
for filename in files:
    if files.startswith("sample" + str[i]):
        original_file= os.path.join(path, filename)
        new_file= os.path.join(path, names.join([str(i), '.html']))
        os.rename(original_file, new_file)
I have many files and I want to rename all of them using Python code that changes each name according to a given list of names.
For example, I have a list x = [sample1, sample236, GAT988] and my files are named like: exp1.html exp2.html exp3.html
How can I make the file name become GAT988.html instead of exp3.html?
Thank you.
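One possible reading of the question is that the number in expN.html is the 1-based position of the new name in the list, so exp3.html takes the third entry. Under that assumption (and it is only an assumption), a sketch could look like this. The function name rename_from_list and its prefix and ext parameters are made up for the example.

```python
import os
import re


def rename_from_list(folder, names, prefix="exp", ext=".html"):
    """Rename prefix<N>ext in folder to names[N - 1] + ext.

    Files whose number has no entry in names are skipped.
    Returns a list of (old_name, new_name) pairs.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(ext))
    renamed = []
    for filename in sorted(os.listdir(folder)):
        match = pattern.fullmatch(filename)
        if match is None:
            continue
        index = int(match.group(1)) - 1  # exp1 -> first list entry
        if not 0 <= index < len(names):
            continue
        new_name = names[index] + ext
        os.rename(os.path.join(folder, filename), os.path.join(folder, new_name))
        renamed.append((filename, new_name))
    return renamed
```

With names = ["sample1", "sample236", "GAT988"], exp3.html would become GAT988.html.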
How do I recursively compare two directories (comparison should be based only on file name) and print out files/folders only in one or the other directory?
I'm using Python 3.3.
I've seen the filecmp module, however, it doesn't seem to quite do what I need. Most importantly, it compares files based on more than just the filename.
Here's what I've got so far:
import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()
dir1 looks like this:
dir1
  - atextfile.txt
  - anotherfile.xml
  - afolder
    - testscript.py
  - anotherfolder
    - file.txt
  - athirdfolder
And dir2 looks like this:
dir2
  - atextfile.txt
  - afolder
    - testscript.py
  - anotherfolder
    - file.txt
    - file2.txt
I want results to look something like:
files/folders only in dir1
* anotherfile.xml
* athirdfolder
files/folders only in dir2
* anotherfolder/file2.txt
I need a simple pythonic way to compare two directories based only on file/folder name, and print out the differences.
Also, I need a way to check whether the directories are identical or not.
Note: I have searched on Stack Overflow and Google for something like this. I see lots of examples of how to compare files taking into account the file content, but I can't find anything about just file names.
My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.
import os
import re

def build_files_set(rootdir):
    # re.escape() keeps characters such as '.' in rootdir from acting as regex syntax
    root_to_subtract = re.compile(r'^.*?' + re.escape(rootdir) + r'[\\/]{0,1}')
    files_set = set()
    for (dirpath, dirnames, filenames) in os.walk(rootdir):
        for filename in filenames + dirnames:
            full_path = os.path.join(dirpath, filename)
            relative_path = root_to_subtract.sub('', full_path, count=1)
            files_set.add(relative_path)
    return files_set

def compare_directories(dir1, dir2):
    files_set1 = build_files_set(dir1)
    files_set2 = build_files_set(dir2)
    return (files_set1 - files_set2, files_set2 - files_set1)

if __name__ == '__main__':
    dir1 = 'old'
    dir2 = 'new'
    in_dir1, in_dir2 = compare_directories(dir1, dir2)
    # print() calls, since the question targets Python 3
    print('\nFiles only in {}:'.format(dir1))
    for relative_path in in_dir1:
        print('* {0}'.format(relative_path))
    print('\nFiles only in {}:'.format(dir2))
    for relative_path in in_dir2:
        print('* {0}'.format(relative_path))
Discussion
The workhorse is the function build_files_set(). It traverses a directory and creates a set of relative file/dir names.
The function compare_directories() builds the two sets and returns the differences, which is very straightforward.
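The same set idea can be written without a regular expression by letting os.path.relpath compute the relative names, and the "are they identical?" part of the question then falls out as a set comparison. This is a sketch of that variation, not the code above. The function names relative_entries, compare_names and same_names are mine.

```python
import os


def relative_entries(root):
    """Return the set of file and folder paths below root, relative to root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.add(os.path.relpath(os.path.join(dirpath, name), root))
    return entries


def compare_names(dir1, dir2):
    """Return (only_in_dir1, only_in_dir2) as sets of relative paths."""
    entries1 = relative_entries(dir1)
    entries2 = relative_entries(dir2)
    return entries1 - entries2, entries2 - entries1


def same_names(dir1, dir2):
    """True if both trees contain exactly the same relative names."""
    return relative_entries(dir1) == relative_entries(dir2)
```

Note that a folder that exists on one side only is reported together with everything inside it, and that only names are compared, never file contents.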
Basic idea: use the os.walk method to populate dictionaries of filenames and then compare the dictionaries.
import os

fpa = {}
for root, dirs, files in os.walk('/your/path'):
    for name in files:
        fpa[name] = 1
fpb = {}
for root, dirs, files in os.walk('/your/path2'):
    for name in files:
        fpb[name] = 1
# print() calls, since the question targets Python 3
print("files only in a")
for name in fpa:
    if name not in fpb:
        print(name, "\n")
print("files only in b")
for name in fpb:
    if name not in fpa:
        print(name, "\n")
I didn't test this, so you may have to fix it.
It can also easily be refactored to avoid the repetition.
Actually, filecmp can and should be used for this, but you have to do a little coding.
You give filecmp.dircmp() two directories, which it calls left and right.
filecmp.dircmp.left_only is a list of the files and dirs that are only in the left dir.
filecmp.dircmp.right_only is a list of the files and dirs that are only in the right dir.
filecmp.dircmp.common_dirs is a list of the dirs that are in both.
You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.
Code:
from os.path import join
from filecmp import dircmp

def find_uncommon(L_dir, R_dir):
    dcmp = dircmp(L_dir, R_dir)
    L_only = [join(L_dir, f) for f in dcmp.left_only]
    R_only = [join(R_dir, f) for f in dcmp.right_only]
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
Test Case:
C:/
    L_dir/
        file_in_both_trees.txt
        file_in_L_tree.txt
        dir_in_L_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_L_tree.txt
            dir_in_L_tree/
                file_inside_dir_only_in_L_tree.txt
    R_dir/
        file_in_both_trees.txt
        file_in_R_tree.txt
        dir_in_R_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_R_tree.txt
            dir_in_R_tree/
                file_inside_dir_only_in_R_tree.txt
Demo:
L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))
Result:
Left only:
	C:\L_dir\file_in_L_tree.txt
	C:\L_dir\dir_in_L_tree
	C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
	C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right only:
	C:\R_dir\file_in_R_tree.txt
	C:\R_dir\dir_in_R_tree
	C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
	C:\R_dir\dir_in_both_trees\dir_in_R_tree
Note that you would have to modify the above code a bit if you wanted to see inside of uncommon directories. What I'm talking about would be these 2 files in my example above:
file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
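One way that modification could look is sketched below: every entry that exists on one side only is expanded with os.walk, so the files inside an uncommon directory are listed too. The names find_uncommon_deep and _expand are mine, and this is just one possible variation of the function above.

```python
import os
from filecmp import dircmp


def _expand(path):
    """Yield path itself and, if it is a directory, everything below it."""
    yield path
    if os.path.isdir(path):
        for dirpath, dirnames, filenames in os.walk(path):
            for name in dirnames + filenames:
                yield os.path.join(dirpath, name)


def find_uncommon_deep(left, right):
    """Like find_uncommon(), but also lists the contents of directories
    that exist in only one of the two trees."""
    dcmp = dircmp(left, right)
    left_only = [p for name in dcmp.left_only
                 for p in _expand(os.path.join(left, name))]
    right_only = [p for name in dcmp.right_only
                  for p in _expand(os.path.join(right, name))]
    # Recurse only into directories that exist on both sides.
    for sub_dir in dcmp.common_dirs:
        new_left, new_right = find_uncommon_deep(
            os.path.join(left, sub_dir), os.path.join(right, sub_dir)
        )
        left_only.extend(new_left)
        right_only.extend(new_right)
    return left_only, right_only
```

It is called the same way as find_uncommon() in the demo.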
Python 2:
import os
folder1 = os.listdir('/path1')
folder2 = os.listdir('/path2')
folder_diff = set(folder1) - set(folder2) if folder1 > folder2 else set(folder2) - set(folder1)
print(folder_diff)  # the parenthesised form also runs on Python 3