I have 2 folders, each with the same number of files. I want to rename the files in folder 2 based on the names of the files in folder 1. So in folder 1 there might be three files titled:
Landsat_1,
Landsat_2,
Landsat_3
and in folder 2 these files are called:
1,
2,
3
and I want to rename them based on the folder 1 names. I thought about turning the item names of each folder into a .txt file, then turning the .txt file into a list and renaming from that, but I'm not sure if this is the best way to do it. Any suggestions?
Edit:
I have simplified the file names above, so just appending Landsat_ will not work for me.
The real file names in folder 1 are more like LT503002011_band1, LT5040300201_band1, LT50402312_band4. In folder 2 they are extract1, extract2, extract3. There are 500 files in total, and folder 2 is just a running count: extract plus a number for each file.
As someone said, "sort each list and zip them together in order to rename".
Notes:
the key() function extracts all of the numbers so that sorted() can sort the lists numerically based on the embedded numbers.
we sort both lists: os.listdir() returns files in arbitrary order.
The for loop is a common way to use zip: for itemA, itemB in zip(listA, listB):
os.path.join() provides portability: no worries about / or \
A typical invocation on Windows: python doit.py c:\data\lt c:\data\extract, assuming those are the directories you described.
A typical invocation on *nix: python doit.py ./lt ./extract
import sys
import re
import os
assert len(sys.argv) == 3, "Usage: %s LT-dir extract-dir"%sys.argv[0]
_, ltdir, exdir = sys.argv
def key(x):
    return [int(y) for y in re.findall(r'\d+', x)]

ltfiles = sorted(os.listdir(ltdir), key=key)
exfiles = sorted(os.listdir(exdir), key=key)

for exfile, ltfile in zip(exfiles, ltfiles):
    os.rename(os.path.join(exdir, exfile), os.path.join(exdir, ltfile))
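To see what the key() helper produces for names like the ones in the question, here is a small self-contained check (the sample strings are taken from the question):
import re

def key(x):
    # Same helper as in the script above: pull out every run of digits.
    return [int(y) for y in re.findall(r'\d+', x)]

print(key('LT503002011_band1'))  # [503002011, 1]
print(key('extract12'))          # [12]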
You might want to use the glob module, which takes a filename pattern and returns the matching names as a list. For example, in folder 1
glob.glob('*')
gives you
['Landsat_1', 'Landsat_2', 'Landsat_3']
Then you can loop over the filenames in the list and change the filenames accordingly:
import glob
import os
folderlist = glob.glob('*')
for folder in folderlist:
    filelist = glob.glob(folder + '*')
    for fil in filelist:
        os.rename(fil, folder + fil)
Hope this helps
I went for more completeness :D.
# WARNING: BACKUP your data before running this code. I've checked to
# see that it mostly works, but I would want to test this very well
# against my actual data before I trusted it with that data! Especially
# if you're going to be modifying anything in the directories while this
# is running. Also, make sure you understand what this code is expecting
# to find in each directory.
import os
import re
main_dir_demo = 'main_dir_path'
extract_dir_demo = 'extract_dir_path'
def generate_paths(directory, filenames, target_names):
    for filename, target_name in zip(filenames, target_names):
        yield (os.path.join(directory, filename),
               os.path.join(directory, target_name))

def sync_filenames(main_dir, main_regex, other_dir, other_regex, key=None):
    main_files = [f for f in os.listdir(main_dir) if main_regex.match(f)]
    other_files = [f for f in os.listdir(other_dir) if other_regex.match(f)]
    # Do not proceed if there aren't the same number of things in each
    # directory; better safe than sorry.
    assert len(main_files) == len(other_files)
    main_files.sort(key=key)
    other_files.sort(key=key)
    path_pairs = generate_paths(other_dir, other_files, main_files)
    for other_path, target_path in path_pairs:
        os.rename(other_path, target_path)

def demo_key(item):
    """Sort by the numbers in a string ONLY; not the letters."""
    return [int(y) for y in re.findall(r'\d+', item)]

def main(main_dir, extract_dir, key=None):
    main_regex = re.compile(r'LT\d+_band\d')
    other_regex = re.compile(r'extract\d+')
    sync_filenames(main_dir, main_regex, extract_dir, other_regex, key=key)

if __name__ == '__main__':
    main(main_dir_demo, extract_dir_demo, key=demo_key)
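Given the WARNING above, one way to rehearse a run is to print the planned renames first. This is only a sketch, not part of the original answer; preview_renames() is a hypothetical helper that reuses generate_paths() and the regexes defined above:
def preview_renames(main_dir, main_regex, other_dir, other_regex, key=None):
    # Hypothetical helper: same pairing as sync_filenames(), but it only
    # prints the planned moves instead of calling os.rename().
    main_files = sorted((f for f in os.listdir(main_dir) if main_regex.match(f)), key=key)
    other_files = sorted((f for f in os.listdir(other_dir) if other_regex.match(f)), key=key)
    for src, dst in generate_paths(other_dir, other_files, main_files):
        print('{} -> {}'.format(src, dst))

# e.g. preview_renames(main_dir_demo, re.compile(r'LT\d+_band\d'),
#                      extract_dir_demo, re.compile(r'extract\d+'), key=demo_key)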
I have a quite complex problem. I have multiple filenames in a list, and the root directory of those files is the same: mother_directory. However, every file is in a different subdirectory. I have a script that processes some files, and I need to know the exact full path, including the subdirectories, of every file. I know that I could use os.walk, but that would make my function too nested, because inside this function I'm planning to use another function that uses those full paths.
This is the file structure:
mother_directory:
|_child1:
      20211011.xml
      20211001.xml
|_child2:
      20211002.xml
This is my current code:
mother_path = r'c:\data\user1\Desktop\mother_directory'
blue_dates = ['20211011', '20211012', '20211013', '20211001', '20211002']
red_dates = ['20211011', '20211009', '20211008', '20211001', '20211002']
file_names = ['20211011.xml', '20211001.xml', '20211002.xml']
def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        file_path = os.path.join(mother_path, match_file[0])
        print(file_path)

for x in blue_dates:
    process_files(x)
My current output:
c:\data\user1\Desktop\mother_directory\20211011.xml
c:\data\user1\Desktop\mother_directory\20211001.xml
c:\data\user1\Desktop\mother_directory\20211002.xml
When I run my function I want my desired output to be like this:
c:\data\user1\Desktop\mother_directory\child1\20211011.xml
c:\data\user1\Desktop\mother_directory\child1\20211001.xml
c:\data\user1\Desktop\mother_directory\child2\20211002.xml
I added a condition; I believe it will work now.
def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        for root, dirs, files in os.walk(mother_path):
            for file in files:
                if match_file[0] in file:
                    print(os.path.join(root, match_file[0]))
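If walking the tree inside process_files() still feels too nested (the concern in the question), an alternative sketch is to walk mother_path once, build a filename-to-path lookup, and reuse it. This reuses mother_path, red_dates, blue_dates and file_names from the question; file_lookup is a name introduced here, not part of the original code:
import os

# Walk the tree once and remember where every file lives.
file_lookup = {}
for root, dirs, files in os.walk(mother_path):
    for name in files:
        file_lookup[name] = os.path.join(root, name)

def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        if match_file and match_file[0] in file_lookup:
            print(file_lookup[match_file[0]])

for x in blue_dates:
    process_files(x)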
I've got my script creating a bunch of files (the number varies depending on the inputs) and I want to put certain files in certain folders based on the filenames.
So far I've got the following, but although the directories are being created, no files are being moved. I'm not sure if the logic in the final for loop makes any sense.
In the code below I'm trying to move all .png files ending in _01 into the sub_frame_0 folder.
Additionally, is there some way to increment both the file endings (_01 to _02 etc.) and the destination folder, i.e. from sub_frame_0 to sub_frame_1 to sub_frame_2 and so on?
for index, i in enumerate(range(num_sub_frames+10)):
    path = os.makedirs('./sub_frame_{}'.format(index))

# Slice layers into sub-frames and add to appropriate directory
list_of_files = glob.glob('*.tif')
for fname in list_of_files:
    image_slicer.slice(fname, num_sub_frames)  # Slices the .tif frames into .png sub-frames

list_of_sub_frames = glob.glob('*.png')
for i in list_of_sub_frames:
    if i == '*_01.png':
        shutil.move(os.path.join(os.getcwd(), '*_01.png'), './sub_frame_0/')
As you said, the logic of the final loop does not make sense.
if i == '*_01.png'
It would evaluate something like 'image_01.png' == '*_01.png' and always be false.
A regexp would be the way to go, but for this simple case you can just slice the number from the file name.
for i in list_of_sub_frames:
    frame = int(i[-6:-4]) - 1
    shutil.move(os.path.join(os.getcwd(), i), './sub_frame_{}/'.format(frame))
If i = 'image_01.png' then i[-6:-4] would take '01'; convert that to an integer and subtract 1 to follow your scheme.
A simple fix would be to check whether the file name i ends with '_01.png' and change the shutil.move to include i, the filename. (It's also worth mentioning that i is not a good name for a filepath.)
list_of_sub_frames = glob.glob('*.png')
for i in list_of_sub_frames:
    if i.endswith('_01.png'):
        shutil.move(os.path.join(os.getcwd(), i), './sub_frame_0/')
Additionally is [there some way] to increment both the file endings _01 to _02 etc., and the destn folder ie. from sub_frame_0 to sub_frame_1 to sub_frame_2 and so on.
You could create file names doing something as simple as this:
for i in range(10):
    # simple string parsing
    file_name = 'sub_frame_' + str(i)
    folder_name = 'folder_sub_frame_' + str(i)
Here is a complete example using regular expressions. It also implements the incrementing of file names/destination folders.
import os
import glob
import shutil
import re
num_sub_frames = 3

# No need to enumerate range list without start or step
for index in range(num_sub_frames + 10):
    path = os.makedirs('./sub_frame_{0:02}'.format(index))

# Slice layers into sub-frames and add to appropriate directory
list_of_files = glob.glob('*.tif')
for fname in list_of_files:
    image_slicer.slice(fname, num_sub_frames)  # Slices the .tif frames into .png sub-frames

list_of_sub_frames = glob.glob('*.png')
for name in list_of_sub_frames:
    m = re.search(r'(?P<fname>.+?)_(?P<num>\d+)\.png', name)
    if m:
        num = int(m.group('num')) + 1
        newname = '{0}_{1:02}.png'.format(m.group('fname'), num)
        newpath = os.path.join('./sub_frame_{0:02}/'.format(num), newname)
        print(m.group() + ' -> ' + newpath)
        shutil.move(os.path.join(os.getcwd(), m.group()), newpath)
My training in Python is ongoing, and I'm currently trying to rename many files sequentially. They have this kind of root and extension:
Ite_1_0001.eps
Ite_2_0001.eps
Ite_3_0001.eps
Ite_4_0001.eps
However, I'm trying to rename all these files as follows:
Ite_0001.eps
Ite_0002.eps
Ite_0003.eps
Ite_0004.eps
So I'm proceeding in this way:
for path, subdirs, files in os.walk(newpath):
    num = len(os.listdir(newpath))
    for filename in files:
        basename, extension = os.path.splitext(filename)
        for x in range(1, num+1):
            new_filename = '_%04d' % x + extension
            os.rename(os.path.join(newpath, filename), os.path.join(newpath, new_filename))
It's not working at all: the files are erased from the directory, and when running the script one run at a time I get this:
First run: _00004
Second run: _00005
.... and so on.
Could anyone give me some tips to help me achieve this task? :)
Thank you very much for your help.
You could test the approach with a list of strings, so you do not run the risk of deleting the files. ;-)
files = ["Ite_1_0001.eps", "Ite_2_0001.eps", "Ite_3_0001.eps", "Ite_4_0001.eps",]
for f in files:
# Get the value between underscores. This is the index.
index = int(f[4:f.index('_', 4)])
new_name = '_%04d' % index
# Join the prefix, index and sufix file
print ''.join([f[:3], new_name, f[-4:]])
Ite_0001.eps
Ite_0002.eps
Ite_0003.eps
Ite_0004.eps
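Once the printed names look right, the same slicing can drive the actual rename. This is a sketch; the folder path below is an assumption, and the extension is carried over the same way as above:
import os

folder = '/path/to/eps/files'  # assumed location of the .eps files
for f in os.listdir(folder):
    if f.endswith('.eps'):
        index = int(f[4:f.index('_', 4)])
        new_name = ''.join([f[:3], '_%04d' % index, f[-4:]])
        os.rename(os.path.join(folder, f), os.path.join(folder, new_name))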
You can dynamically change the value you're substituting in within your loop, like so (the pattern matches the _N_0001 part of each name and replaces it with a running counter):
import os, re

n = 1
for i in sorted(os.listdir('.')):
    if i.endswith('.eps'):
        os.rename(i, re.sub(r'_\d+_\d{4}', '_{:04d}'.format(n), i))
        n += 1
I wrote a function that, given your basename as input, returns the correct name.
def newname(old_name):
    # old_name is expected to look like 'Ite_1_0001' (no extension).
    num = old_name[4]
    return old_name[0:3] + old_name[5:-1] + num
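A quick check against the basenames from the question (adding the extension back afterwards is assumed):
for base in ['Ite_1_0001', 'Ite_2_0001', 'Ite_3_0001', 'Ite_4_0001']:
    print(newname(base) + '.eps')
# Ite_0001.eps, Ite_0002.eps, Ite_0003.eps, Ite_0004.eps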
How do I recursively compare two directories (comparison should be based only on file name) and print out files/folders only in one or the other directory?
I'm using Python 3.3.
I've seen the filecmp module, however, it doesn't seem to quite do what I need. Most importantly, it compares files based on more than just the filename.
Here's what I've got so far:
import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()
dir1 looks like this:
dir1
- atextfile.txt
- anotherfile.xml
- afolder
    - testscript.py
- anotherfolder
    - file.txt
- athirdfolder
And dir2 looks like this:
dir2
- atextfile.txt
- afolder
    - testscript.py
- anotherfolder
    - file.txt
    - file2.txt
I want results to look something like:
files/folders only in dir1
* anotherfile.xml
* athirdfolder
files/folders only in dir2
* anotherfolder/file2.txt
I need a simple pythonic way to compare two directories based only on file/folder names, and print out the differences.
Also, I need a way to check whether the directories are identical or not.
Note: I have searched on Stack Overflow and Google for something like this. I see lots of examples of how to compare files taking the file content into account, but I can't find anything about just file names.
My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.
import os
import re
def build_files_set(rootdir):
    root_to_subtract = re.compile(r'^.*?' + rootdir + r'[\\/]{0,1}')
    files_set = set()
    for (dirpath, dirnames, filenames) in os.walk(rootdir):
        for filename in filenames + dirnames:
            full_path = os.path.join(dirpath, filename)
            relative_path = root_to_subtract.sub('', full_path, count=1)
            files_set.add(relative_path)
    return files_set

def compare_directories(dir1, dir2):
    files_set1 = build_files_set(dir1)
    files_set2 = build_files_set(dir2)
    return (files_set1 - files_set2, files_set2 - files_set1)

if __name__ == '__main__':
    dir1 = 'old'
    dir2 = 'new'
    in_dir1, in_dir2 = compare_directories(dir1, dir2)

    print('\nFiles only in {}:'.format(dir1))
    for relative_path in in_dir1:
        print('* {0}'.format(relative_path))

    print('\nFiles only in {}:'.format(dir2))
    for relative_path in in_dir2:
        print('* {0}'.format(relative_path))
Discussion
The workhorse is the function build_files_set(). It traverses a directory and creates a set of relative file/dir names.
The function compare_directories() takes two sets of files and returns the differences; very straightforward.
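If you also need the identical-or-not check asked about in the question, a minimal sketch on top of compare_directories() (the name directories_identical is an addition, not part of the answer above):
def directories_identical(dir1, dir2):
    # Identical (by name) means nothing is unique to either side.
    only_in_1, only_in_2 = compare_directories(dir1, dir2)
    return not only_in_1 and not only_in_2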
Basic idea: use the os.walk method to populate dictionaries of filenames and then compare the dictionaries.
import os
from os.path import join
fpa = {}
for root, dirs, files in os.walk('/your/path'):
    for name in files:
        fpa[name] = 1

fpb = {}
for root, dirs, files in os.walk('/your/path2'):
    for name in files:
        fpb[name] = 1

print("files only in a")
for name in fpa.keys():
    if name not in fpb:
        print(name)

print("files only in b")
for name in fpb.keys():
    if name not in fpa:
        print(name)
I didn't test this, so you may have to fix it.
Also, it can easily be refactored to avoid code duplication.
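A minimal sketch of that refactor (not from the original answer; written for Python 3, since the question uses 3.3):
import os

def filename_set(path):
    # Collect just the file names (not full paths) found anywhere under path.
    names = set()
    for _root, _dirs, files in os.walk(path):
        names.update(files)
    return names

fpa = filename_set('/your/path')
fpb = filename_set('/your/path2')
print("files only in a:", sorted(fpa - fpb))
print("files only in b:", sorted(fpb - fpa))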
Actually, filecmp can and should be used for this, but you have to do a little coding.
You give filecmp.dircmp() two directories, which it calls left and right.
filecmp.dircmp.left_only is a list of the files and dirs that are only in the left dir.
filecmp.dircmp.right_only is a list of the files and dirs that are only in the right dir.
filecmp.dircmp.common_dirs is a list of the dirs that are in both.
You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.
Code:
from os.path import join
from filecmp import dircmp
def find_uncommon(L_dir, R_dir):
    dcmp = dircmp(L_dir, R_dir)
    L_only = [join(L_dir, f) for f in dcmp.left_only]
    R_only = [join(R_dir, f) for f in dcmp.right_only]
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
Test Case:
C:/
    L_dir/
        file_in_both_trees.txt
        file_in_L_tree.txt
        dir_in_L_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_L_tree.txt
            dir_in_L_tree/
                file_inside_dir_only_in_L_tree.txt
    R_dir/
        file_in_both_trees.txt
        file_in_R_tree.txt
        dir_in_R_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_R_tree.txt
            dir_in_R_tree/
                file_inside_dir_only_in_R_tree.txt
Demo:
L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))
Result:
Left only:
    C:\L_dir\file_in_L_tree.txt
    C:\L_dir\dir_in_L_tree
    C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
    C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right only:
    C:\R_dir\file_in_R_tree.txt
    C:\R_dir\dir_in_R_tree
    C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
    C:\R_dir\dir_in_both_trees\dir_in_R_tree
Note that you would have to modify the above code a bit if you wanted to see inside the uncommon directories. What I'm talking about would be these 2 files in my example above:
file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
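One possible way to do that (a sketch, not part of the original answer) is to expand each uncommon directory with os.walk() after find_uncommon() returns; list_tree() is a helper name introduced here:
import os

def list_tree(top):
    # Everything underneath an uncommon directory is, by definition,
    # also uncommon, so just collect it all.
    paths = []
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            paths.append(os.path.join(root, name))
    return paths

L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
for path_list in (L_only, R_only):
    for path in list(path_list):   # iterate over a copy while extending
        if os.path.isdir(path):
            path_list.extend(list_tree(path))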
Python 2:
import os
folder1 = os.listdir('/path1')
folder2 = os.listdir('/path2')
folder_diff = set(folder1) - set(folder2) if folder1 > folder2 else set(folder2) - set(folder1)
print folder_diff
I would like to ask how to efficiently access the filenames in a folder in the right order (alphabetical and increasing in number).
For example, I have the following files in a folder: apple1.dat, apple2.dat, apple10.dat, banana1.dat, banana2.dat, banana10.dat. I would like to read the contents of the files such that apple1.dat will be read first and banana10.dat will be read last.
Thanks.
This is what I did so far.
from glob import glob
files=glob('*.dat')
for fname in files:
    # I read the files here in order
But as pointed out, apple10.dat comes before apple2.dat
from glob import glob
import os
files_list = glob(os.path.join(my_folder, '*.dat'))
for a_file in sorted(files_list):
    # do whatever with the file
    # 'open' or 'with' statements depending on your python version
Try this one:
import os

def get_sorted_files(directory):
    filenamelist = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            fullname = os.path.join(root, name)
            filenamelist.append(fullname)
    return sorted(filenamelist)
You have to cast the numbers to int first. Doing it the long way would require breaking the names into strings and numbers, casting the numbers to int, and sorting. Perhaps someone else has a shorter or more efficient way.
def split_in_two(str_in):
    ## go from right to left until a letter is found
    ## assume first letter of name is not a digit
    for ctr in range(len(str_in)-1, 0, -1):
        if not str_in[ctr].isdigit():
            return str_in[:ctr+1], str_in[ctr+1:]  ## ctr+1 = first digit
    ## default for no letters found
    return str_in, "0"

files = ['apple1.dat', 'apple2.dat', 'apple10.dat', 'apple11.dat',
         'banana1.dat', 'banana10.dat', 'banana2.dat']
print(sorted(files))  ## sorted as you say

sort_numbers = []
for f in files:
    ## split off '.dat'
    no_ending = f[:-4]
    str_1, str_2 = split_in_two(no_ending)
    sort_numbers.append([str_1, int(str_2), ".dat"])

sort_numbers.sort()
print(sort_numbers)
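For a shorter route (a sketch, not from the original answers), the same embedded-number idea can be expressed as a sort key and passed straight to sorted():
import re

def natural_key(name):
    # Split the name into text and number chunks so 'apple2.dat'
    # sorts before 'apple10.dat'.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]

files = ['apple1.dat', 'apple2.dat', 'apple10.dat',
         'banana1.dat', 'banana2.dat', 'banana10.dat']
print(sorted(files, key=natural_key))
# ['apple1.dat', 'apple2.dat', 'apple10.dat',
#  'banana1.dat', 'banana2.dat', 'banana10.dat']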