In this code, I want to print the desired output: x[0] should print the first path and x[1] should print the second path, but I don't know how to do it. I used split, but it didn't give me the expected result.
Given result
/home/runner/TestP1/folder1/
/home/runner/TestP1/folder2/
/home/runner/TestP1/folder1/sub
Required result
/home/runner/TestP1/folder2/
Code
import os

def filesystem(rootdir):
    for rootdir, dirs, files in os.walk(rootdir):
        for subdir in dirs:
            x = os.path.join(rootdir, subdir)
            print(x)

filesystem("/home/runner/TestP1")
If you want to access each directory as if it were a list, then you need to construct one. Below I have given an example:
import os

all_directories = []
for root, dirs, files in os.walk(r"/home/runner/TestP1"):
    # Append the full path of each subdir in this directory to the master list.
    all_directories += [os.path.join(root, d) for d in dirs]

print(all_directories[1])
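Be mindful of the order. os.walk works top-down by default, so the immediate subdirectories of a directory are added to the list together, before the subdirectories of anything below it. Within one directory the order is whatever the file system returns, so sort dirs in place if you need a stable order. The parameter is spelled topdown, and setting topdown=False walks bottom-up (children before their parent); it does not give breadth-first ordering.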
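The list-building idea above could be sketched as follows, storing full paths so that x[0] and x[1] can be used directly. The helper name list_directories and the throwaway demo tree are assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile


def list_directories(rootdir):
    """Return the full path of every directory below rootdir."""
    found = []
    for root, dirs, _files in os.walk(rootdir):
        dirs.sort()  # in-place sort makes the traversal order repeatable
        found.extend(os.path.join(root, d) for d in dirs)
    return found


# Demo on a temporary tree that mirrors the layout in the question.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "folder1", "sub"))
    os.makedirs(os.path.join(base, "folder2"))
    x = list_directories(base)
    print(x[0])  # path ending in folder1
    print(x[1])  # path ending in folder2
```

Because both top-level folders are appended before os.walk descends into folder1, index 1 is folder2 rather than folder1/sub.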
Not sure if logical is the right word here. However, when I run os.walk I'm appending paths to a list, and I would like the order to be so that if you were to read top to bottom, it would make sense.
For example, if the path I was looping through was C:\test which has a single file, along with folders (each with their own subfolders and files), this is what I'd want the list output to resemble.
C:\test
C:\test\test1.txt
C:\test\subfolder1
C:\test\subfolder1\file1.txt
C:\test\subfolder1\file2.txt
C:\test\subfolder2
C:\test\subfolder2\file3.txt
However, my output is the following.
C:\test\subfolder1
C:\test\subfolder2
C:\test\test1.txt
C:\test\subfolder1\file1.txt
C:\test\subfolder1\file2.txt
C:\test\subfolder2\file3.txt
The first problem is that C:\test doesn't appear. I could just append C:\test to the list, but I would want C:\test\test1.txt to appear directly below it, and sorting in ascending order would just stick it at the end.
When using os.walk, is there a way for me to append to my list in such a way that everything ends up in order?
Code:
import os

tree = []
for root, dirs, files in os.walk(r'C:\test', topdown = True):
    for d in dirs:
        tree.append(os.path.join(root, d))
    for f in files:
        tree.append(os.path.join(root, f))

for x in tree:
    print(x)
Edit: By logical order I mean I would like it to appear as top folder, followed by subfolders and files in that folder, files and subfolders in those folders, and so on.
e.g.
C:\test
C:\test\test1.txt
C:\test\subfolder1
C:\test\subfolder1\file1.txt
C:\test\subfolder1\file2.txt
C:\test\subfolder2
C:\test\subfolder2\file3.txt
The order you want is the order in which os.walk iterates over the folders. You just have to append root to your list, instead of the folders.
import os

tree = []
for root, _, files in os.walk('test'):
    tree.append(root)
    for f in files:
        tree.append(os.path.join(root, f))

for x in tree:
    print(x)
Output
test
test/test1.txt
test/subfolder1
test/subfolder1/file1.txt
test/subfolder1/file2.txt
test/subfolder2
test/subfolder2/file3.txt
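The order in which os.walk returns names inside one folder depends on the file system, so the listing above may come out differently on another machine. A minimal sketch of a repeatable version follows, assuming that alphabetical order within each folder is acceptable; the function name ordered_tree and the temporary demo tree are made up for illustration.

```python
import os
import tempfile


def ordered_tree(top):
    """List each folder, then its files, then its subfolders, recursively."""
    tree = []
    for root, dirs, files in os.walk(top):  # topdown=True is the default
        dirs.sort()  # sorting in place controls the order subfolders are visited
        tree.append(root)
        tree.extend(os.path.join(root, f) for f in sorted(files))
    return tree


# Demo on a temporary copy of the layout from the question.
with tempfile.TemporaryDirectory() as top:
    for rel in ("test1.txt", "subfolder1/file1.txt",
                "subfolder1/file2.txt", "subfolder2/file3.txt"):
        path = os.path.join(top, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    # Show paths relative to the top folder ("." is the top folder itself).
    listing = [os.path.relpath(p, top) for p in ordered_tree(top)]
    print("\n".join(listing))
```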
This code should solve your problem
Explanation:
Loop through your tree variable and create a list of tuples, where the first element of each tuple is the dir/file path and the second element is the number of \ characters in that path
Sort the list of tuples by the second element of each tuple
Create a list of the first elements (the paths) of the sorted tuples
import os

tree = []
for root, dirs, files in os.walk('C:\\test', topdown = True):
    for d in dirs:
        tree.append(os.path.join(root, d))
    for f in files:
        tree.append(os.path.join(root, f))

tup = []

def Sort_Tuple(tup):
    """function code sourced from https://www.geeksforgeeks.org/python-program-to-sort-a-list-of-tuples-by-second-item/"""
    # getting length of list of tuples
    lst = len(tup)
    for i in range(0, lst):
        for j in range(0, lst-i-1):
            if (tup[j][1] > tup[j + 1][1]):
                temp = tup[j]
                tup[j]= tup[j + 1]
                tup[j + 1]= temp
    return tup

for x in tree:
    i = x.count('\\')
    tup.append((x,i))

# Named sorted_tup so that the built-in sorted() is not shadowed.
sorted_tup = Sort_Tuple(tup)
answer = [t[0] for t in sorted_tup]
print(answer)
This should work, but it is in no way optimized.
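As a shorter alternative to the hand-written sort, the same depth-based ordering can be sketched with the built-in sorted() and a key function. The helper name sort_by_depth and the sample paths are assumptions for illustration. Note that ordering by depth groups all entries of the same level together, which is not the same as listing each folder followed by its own contents.

```python
def sort_by_depth(paths, sep="\\"):
    """Return paths ordered by how many separators each one contains."""
    # sorted() is stable, so entries at the same depth keep their relative order.
    return sorted(paths, key=lambda p: p.count(sep))


# Hypothetical sample entries in the style of the question.
tree = [
    r"C:\test\subfolder1",
    r"C:\test\subfolder1\file1.txt",
    r"C:\test\test1.txt",
]
answer = sort_by_depth(tree)
print(answer)
```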
I have quite a complex problem. I have multiple filenames in a list, and the root directory of those files is the same: mother_directory. However, every file sits in a different subdirectory. I have a script that processes some of these files, and I need to know the exact full path, including the subdirectories, of every file. I know that I could use os.walk, but that would make my function too nested, because inside this function I'm planning to call another function that uses those full paths.
This is the file structure:
mother_directory:
    |_child1:
        20211011.xml
        20211001.xml
    |_child2:
        20211002.xml
This is my current code:
import os

mother_path = r'c:\data\user1\Desktop\mother_directory'
blue_dates = ['20211011', '20211012', '20211013', '20211001', '20211002']
red_dates = ['20211011', '20211009', '20211008', '20211001', '20211002']
file_names = ['20211011.xml', '20211001.xml', '20211002.xml']

def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        file_path = os.path.join(mother_path, match_file[0])
        print(file_path)

for x in blue_dates:
    process_files(x)
My current output:
c:\data\user1\Desktop\mother_directory\20211011.xml
c:\data\user1\Desktop\mother_directory\20211001.xml
c:\data\user1\Desktop\mother_directory\20211002.xml
When I run my function I want my desired output to be like this:
c:\data\user1\Desktop\mother_directory\child1\20211011.xml
c:\data\user1\Desktop\mother_directory\child1\20211001.xml
c:\data\user1\Desktop\mother_directory\child2\20211002.xml
I added a condition, I believe it will work now.
def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        for root, dirs, files in os.walk(mother_path):
            for file in files:
                if match_file[0] in file:
                    print(os.path.join(root,match_file[0]))
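Calling os.walk inside process_files repeats the whole directory scan for every date. One way to avoid both the repeated scan and the nesting is to walk the tree once, build a lookup from file name to full path, and consult it afterwards. The sketch below assumes that file names are unique across the subdirectories; index_files, find_paths and the temporary demo tree are hypothetical names introduced for illustration.

```python
import os
import tempfile


def index_files(top):
    """Map each file name below top to its full path (last one wins on clashes)."""
    index = {}
    for root, _dirs, files in os.walk(top):
        for name in files:
            index[name] = os.path.join(root, name)
    return index


def find_paths(dates, index, suffix=".xml"):
    """Return the full path for every date that has a matching file."""
    return [index[d + suffix] for d in dates if d + suffix in index]


blue_dates = ['20211011', '20211012', '20211013', '20211001', '20211002']
red_dates = ['20211011', '20211009', '20211008', '20211001', '20211002']

# Demo on a temporary copy of the structure from the question.
with tempfile.TemporaryDirectory() as mother_path:
    layout = {"child1": ["20211011.xml", "20211001.xml"],
              "child2": ["20211002.xml"]}
    for child, names in layout.items():
        os.makedirs(os.path.join(mother_path, child))
        for name in names:
            open(os.path.join(mother_path, child, name), "w").close()

    index = index_files(mother_path)  # one walk, reused for every lookup
    wanted = [d for d in blue_dates if d in red_dates]
    found = [os.path.relpath(p, mother_path) for p in find_paths(wanted, index)]
    print(found)
```

The function that later consumes the full paths can then take the list returned by find_paths without knowing anything about the directory walk.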
So I have a lot of folders with a certain name. In each folder I have 200+ items. The items inside the folders have names like:
CT.34562346.246.dcm
RD.34562346.dcm
RN.34562346.LAO.dcm
And some along that style.
I now wish to rename all files inside all folders so that the number (34562346) is replaced with the name of the folder. So for example in the folder named "1" the files inside should become:
CT.1.246.dcm
RD.1.dcm
RN.1.LAO.dcm
So only the large number is replaced. And yes, all files follow this pattern: it is the number after the first . that should be replaced.
So far I have:
import os

base_dir = "foo/bar/" #In this dir I have all my folders
dir_list = []
for dirname in os.walk(base_dir):
    dir_list.append(dirname[0])
This one just lists the entire paths of all folders.
dir_list_split = []
for name in dir_list[1:]: #The 1 is because it lists the base_dir as well
    x = name.split('/')[2]
    dir_list_split.append(x)
This one extracts the name of each folder.
The next step would be to enter the folders and rename the files, and this is where I'm stuck.
The pathlib module, which was new in Python 3.4, is often overlooked. I find that it often makes code simpler than it would otherwise be with os.walk.
In this case, .glob('**/*.*') looks recursively through all of the folders and subfolders that I created in a sample folder called example. The *.* part means that it considers all files.
I put path.parts in the loop to show you that pathlib arranges to parse pathnames for you.
I first check that the string constant '34562346' is in its correct position in each filename. If it is, I simply replace it with the item from .parts that is the next level 'up' the folder tree.
Then I can replace the rightmost element of .parts with the newly altered filename to create the new pathname and then do the rename. In each case I display the new pathname, if it was appropriate to create one.
>>> from pathlib import Path
>>> from os import rename
>>> for path in Path('example').glob('**/*.*'):
...     path.parts
...     if path.parts[-1][3:11]=='34562346':
...         new_name = path.parts[-1].replace('34562346', path.parts[-2])
...         new_path = '/'.join(list(path.parts[:-1])+[new_name])
...         new_path
...         ## rename(str(path), new_path)
...     else:
...         'no change'
...
('example', 'folder_1', 'id.34562346.6.a.txt')
'example/folder_1/id.folder_1.6.a.txt'
('example', 'folder_1', 'id.34562346.wax.txt')
'example/folder_1/id.folder_1.wax.txt'
('example', 'folder_2', 'subfolder_1', 'ty.34562346.90.py')
'example/folder_2/subfolder_1/ty.subfolder_1.90.py'
('example', 'folder_2', 'subfolder_1', 'tz.34562346.98.py')
'example/folder_2/subfolder_1/tz.subfolder_1.98.py'
('example', 'folder_2', 'subfolder_2', 'doc.34.34562346.implication.rtf')
'no change'
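The session above hard-codes the number and its position. A more general sketch with pathlib could replace the second dot-separated field whenever it is all digits, and offer a dry run before touching anything. The function name rename_to_folder_name, the dry_run flag and the "second field is numeric" rule are assumptions drawn from the examples in the question, so test on a copy of the data first.

```python
import tempfile
from pathlib import Path


def rename_to_folder_name(base, dry_run=True):
    """Replace the numeric second field of each file name with its folder name.

    Returns the list of (old_path, new_path) pairs; renames only if dry_run is False.
    """
    planned = []
    # sorted() materialises the listing first, so renaming cannot disturb the scan.
    for path in sorted(Path(base).rglob("*")):
        if not path.is_file():
            continue
        fields = path.name.split(".")
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # name does not look like PREFIX.NUMBER.rest
        fields[1] = path.parent.name
        target = path.with_name(".".join(fields))
        if target == path:
            continue  # already renamed
        planned.append((path, target))
        if not dry_run:
            path.rename(target)
    return planned


# Demo in a temporary folder named "1", as in the question.
with tempfile.TemporaryDirectory() as base:
    folder = Path(base) / "1"
    folder.mkdir()
    for name in ("CT.34562346.246.dcm", "RD.34562346.dcm", "RN.34562346.LAO.dcm"):
        (folder / name).touch()
    preview = rename_to_folder_name(base)       # dry run, nothing changes
    rename_to_folder_name(base, dry_run=False)  # performs the renames
    renamed = sorted(p.name for p in folder.iterdir())
    print(renamed)
```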
This will rename files in subdirectories too:
import os

rootdir = "foo" + os.sep + "bar"

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        foldername = subdir.split(os.sep)[-1]
        # Find the first run of digits in the file name. Scanning the file
        # name rather than the whole path ignores digits in folder names.
        number = ""
        foundnumber = False
        for c in file:
            if c.isdigit():
                foundnumber = True
                number = number + c
            elif foundnumber:
                break
        if foundnumber:
            # Replace only that first run of digits.
            newfile = file.replace(number, foldername, 1)
            os.rename(subdir + os.sep + file, subdir + os.sep + newfile)
Split each file name on the . and replace the second item with the folder name, then join on .'s again for the new file name. Here's some sample code that demonstrates the concept.
folder_name = ['1', '2']
file_names = ['CT.2345.234.dcm', 'BG.234234.222.dcm', "RA.3342.221.dcm"]

for folder in folder_name:
    new_names = []
    for x in file_names:
        file_name = x.split('.')
        file_name[1] = folder
        back_together = '.'.join(file_name)
        new_names.append(back_together)
    print(new_names)
Output
['CT.1.234.dcm', 'BG.1.222.dcm', 'RA.1.221.dcm']
['CT.2.234.dcm', 'BG.2.222.dcm', 'RA.2.221.dcm']
I have 2 folders, each with the same number of files. I want to rename the files in folder 2 based on the names of the files in folder 1. So in folder 1 there might be three files titled:
Landsat_1,
Landsat_2,
Landsat_3
and in folder 2 these files are called:
1,
2,
3
and I want to rename them based on the folder 1 names. I thought about writing the item names of each folder to a .txt file, then turning the .txt file into a list and renaming from that, but I'm not sure if this is the best way to do it. Any suggestions?
Edit:
I have simplified the file names above, so just prepending Landsat_ will not work for me.
The real file names in folder 1 are more like LT503002011_band1, LT5040300201_band1, LT50402312_band4. In folder 2 they are extract1, extract2, extract3. There are 500 files in total and in folder 2 it is just a running count of extract and a number for each file.
As someone said, "sort each list and zip them together in order to rename".
Notes:
the key() function extracts all of the numbers so that sorted() can sort the lists numerically based on the embedded numbers.
we sort both lists: os.listdir() returns files in arbitrary order.
The for loop is a common way to use zip: for itemA, itemB in zip(listA, listB):
os.path.join() provides portability: no worries about / or \
A typical invocation on Windows: python doit.py c:\data\lt c:\data\extract, assuming those are the directories you described.
A typical invocation on *nix: python doit.py ./lt ./extract
import sys
import re
import os

assert len(sys.argv) == 3, "Usage: %s LT-dir extract-dir"%sys.argv[0]
_, ltdir, exdir = sys.argv

def key(x):
    # Raw string so that \d reaches the regex engine unchanged.
    return [int(y) for y in re.findall(r'\d+', x)]

ltfiles = sorted(os.listdir(ltdir), key=key)
exfiles = sorted(os.listdir(exdir), key=key)
for exfile,ltfile in zip(exfiles, ltfiles):
    os.rename(os.path.join(exdir,exfile), os.path.join(exdir,ltfile))
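Since the pairing depends entirely on both lists sorting into corresponding order, it may be worth previewing the pairs before calling os.rename. The following is a minimal sketch on plain lists of names; plan_renames, numeric_key and the sample names are hypothetical, and the assumption that numeric order matches the two folders up correctly has to be checked against the real data.

```python
import re


def numeric_key(name):
    """Sort key: the numbers embedded in a name, compared as integers."""
    return [int(n) for n in re.findall(r"\d+", name)]


def plan_renames(source_names, target_names):
    """Pair each source name with the target name at the same sorted position."""
    if len(source_names) != len(target_names):
        raise ValueError("both folders must hold the same number of files")
    return list(zip(sorted(source_names, key=numeric_key),
                    sorted(target_names, key=numeric_key)))


# Made-up names; in practice these would come from os.listdir().
pairs = plan_renames(["extract10", "extract2", "extract1"],
                     ["LT504_band1", "LT5040_band4", "LT503_band1"])
for old, new in pairs:
    print(old, "->", new)
```

A plain string sort would place extract10 before extract2, which is why the key converts the digits to integers.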
You might want to use the glob module, which takes a filename pattern and returns the matching names as a list. For example, in that directory
glob.glob('*')
gives you
['Landsat_1', 'Landsat_2', 'Landsat_3']
Then you can loop over the filenames in the list and change the filenames accordingly:
import glob
import os

folderlist = glob.glob('*')
for folder in folderlist:
    filelist = glob.glob(folder + '*')
    for fil in filelist:
        os.rename(fil, folder + fil)
Hope this helps
I went for more completeness :D.
# WARNING: BACKUP your data before running this code. I've checked to
# see that it mostly works, but I would want to test this very well
# against my actual data before I trusted it with that data! Especially
# if you're going to be modifying anything in the directories while this
# is running. Also, make sure you understand what this code is expecting
# to find in each directory.
import os
import re

main_dir_demo = 'main_dir_path'
extract_dir_demo = 'extract_dir_path'

def generate_paths(directory, filenames, target_names):
    for filename, target_name in zip(filenames, target_names):
        yield (os.path.join(directory, filename),
               os.path.join(directory, target_name))

def sync_filenames(main_dir, main_regex, other_dir, other_regex, key=None):
    main_files = [f for f in os.listdir(main_dir) if main_regex.match(f)]
    other_files = [f for f in os.listdir(other_dir) if other_regex.match(f)]
    # Do not proceed if there aren't the same number of things in each
    # directory; better safe than sorry.
    assert len(main_files) == len(other_files)
    main_files.sort(key=key)
    other_files.sort(key=key)
    path_pairs = generate_paths(other_dir, other_files, main_files)
    for other_path, target_path in path_pairs:
        os.rename(other_path, target_path)

def demo_key(item):
    """Sort by the numbers in a string ONLY; not the letters."""
    return [int(y) for y in re.findall(r'\d+', item)]

def main(main_dir, extract_dir, key=None):
    main_regex = re.compile(r'LT\d+_band\d')
    other_regex = re.compile(r'extract\d+')
    sync_filenames(main_dir, main_regex, extract_dir, other_regex, key=key)

if __name__ == '__main__':
    main(main_dir_demo, extract_dir_demo, key=demo_key)
How do I recursively compare two directories (comparison should be based only on file name) and print out files/folders only in one or the other directory?
I'm using Python 3.3.
I've seen the filecmp module, however, it doesn't seem to quite do what I need. Most importantly, it compares files based on more than just the filename.
Here's what I've got so far:
import filecmp
dcmp = filecmp.dircmp('./dir1', './dir2')
dcmp.report_full_closure()
dir1 looks like this:
dir1
- atextfile.txt
- anotherfile.xml
- afolder
    - testscript.py
- anotherfolder
    - file.txt
- athirdfolder
And dir2 looks like this:
dir2
- atextfile.txt
- afolder
    - testscript.py
- anotherfolder
    - file.txt
    - file2.txt
I want results to look something like:
files/folders only in dir1
* anotherfile.xml
* athirdfolder
files/folders only in dir2
* anotherfolder/file2.txt
I need a simple pythonic way to compare two directories based only on file/folder name, and print out the differences.
Also, I need a way to check whether the directories are identical or not.
Note: I have searched on stackoverflow and google for something like this. I see lots of examples of how to compare files taking into account the file content, but I can't find anything about just file names.
My solution uses the set() type to store relative paths. Then comparison is just a matter of set subtraction.
import os
import re

def build_files_set(rootdir):
    # re.escape keeps characters such as . or \ in rootdir from being
    # interpreted as regex syntax.
    root_to_subtract = re.compile(r'^.*?' + re.escape(rootdir) + r'[\\/]{0,1}')
    files_set = set()
    for (dirpath, dirnames, filenames) in os.walk(rootdir):
        for filename in filenames + dirnames:
            full_path = os.path.join(dirpath, filename)
            relative_path = root_to_subtract.sub('', full_path, count=1)
            files_set.add(relative_path)
    return files_set

def compare_directories(dir1, dir2):
    files_set1 = build_files_set(dir1)
    files_set2 = build_files_set(dir2)
    return (files_set1 - files_set2, files_set2 - files_set1)

if __name__ == '__main__':
    dir1 = 'old'
    dir2 = 'new'
    in_dir1, in_dir2 = compare_directories(dir1, dir2)

    # print() calls, since the question targets Python 3.
    print('\nFiles only in {}:'.format(dir1))
    for relative_path in in_dir1:
        print('* {0}'.format(relative_path))

    print('\nFiles only in {}:'.format(dir2))
    for relative_path in in_dir2:
        print('* {0}'.format(relative_path))
Discussion
The workhorse is the function build_files_set(). It traverses a directory and creates a set of relative file/dir names.
The function compare_directories() takes two directories, builds a set for each, and returns the differences in both directions, which is straightforward.
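The same set-subtraction idea can be sketched without a regular expression by letting os.path.relpath strip the root. This is an alternative formulation, not the code above; relative_names, compare_names, touch and the temporary demo trees are hypothetical names, and the demo layout copies dir1 and dir2 from the question.

```python
import os
import tempfile


def relative_names(rootdir):
    """Return the set of file and folder paths below rootdir, relative to it."""
    names = set()
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for name in dirnames + filenames:
            names.add(os.path.relpath(os.path.join(dirpath, name), rootdir))
    return names


def compare_names(dir1, dir2):
    """Return (names only in dir1, names only in dir2)."""
    names1, names2 = relative_names(dir1), relative_names(dir2)
    return names1 - names2, names2 - names1


def touch(root, *parts):
    """Create an empty file, making parent folders as needed (demo helper)."""
    path = os.path.join(root, *parts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


with tempfile.TemporaryDirectory() as base:
    dir1, dir2 = os.path.join(base, "dir1"), os.path.join(base, "dir2")
    for parts in [("atextfile.txt",), ("anotherfile.xml",),
                  ("afolder", "testscript.py"), ("anotherfolder", "file.txt")]:
        touch(dir1, *parts)
    os.makedirs(os.path.join(dir1, "athirdfolder"))
    for parts in [("atextfile.txt",), ("afolder", "testscript.py"),
                  ("anotherfolder", "file.txt"), ("anotherfolder", "file2.txt")]:
        touch(dir2, *parts)

    only1, only2 = compare_names(dir1, dir2)
    identical = not only1 and not only2  # True only when both sets are empty
    print(sorted(only1), sorted(only2), identical)
```

Two trees have identical names exactly when both returned sets are empty, which covers the "are the directories identical" part of the question.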
Basic idea, use the os.walk method to populate dictionaries of filenames and then compare the dictionaries.
import os
from os.path import join

fpa = {}
for root, dirs, files in os.walk('/your/path'):
    for name in files:
        fpa[name] = 1

fpb = {}
for root, dirs, files in os.walk('/your/path2'):
    for name in files:
        fpb[name] = 1

# print() calls, since the question targets Python 3.
print("files only in a")
for name in fpa.keys():
    if name not in fpb:
        print(name, "\n")

print("files only in b")
for name in fpb.keys():
    if name not in fpa:
        print(name, "\n")
I didn't test this, so you may have to fix it.
It can also easily be refactored to avoid the duplicated loops.
Actually, filecmp can and should be used for this, but you have to do a little coding.
You give filecmp.dircmp() two directories, which it calls left and right.
filecmp.dircmp.left_only is a list of the files and dirs that are only in the left dir.
filecmp.dircmp.right_only is a list of the files and dirs that are only in the right dir.
filecmp.dircmp.common_dirs is a list of the dirs that are in both.
You can use those to build a simple recursive function for finding all the files and dirs that are not common to both trees.
Code:
from os.path import join
from filecmp import dircmp

def find_uncommon(L_dir, R_dir):
    dcmp = dircmp(L_dir, R_dir)
    L_only = [join(L_dir, f) for f in dcmp.left_only]
    R_only = [join(R_dir, f) for f in dcmp.right_only]
    for sub_dir in dcmp.common_dirs:
        new_L, new_R = find_uncommon(join(L_dir, sub_dir), join(R_dir, sub_dir))
        L_only.extend(new_L)
        R_only.extend(new_R)
    return L_only, R_only
Test Case:
C:/
    L_dir/
        file_in_both_trees.txt
        file_in_L_tree.txt
        dir_in_L_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_L_tree.txt
            dir_in_L_tree/
                file_inside_dir_only_in_L_tree.txt
    R_dir/
        file_in_both_trees.txt
        file_in_R_tree.txt
        dir_in_R_tree/
        dir_in_both_trees/
            file_in_both_trees.txt
            file_in_R_tree.txt
            dir_in_R_tree/
                file_inside_dir_only_in_R_tree.txt
Demo:
L_only, R_only = find_uncommon('C:\\L_dir', 'C:\\R_dir')
print('Left only:\n\t' + '\n\t'.join(L_only))
print('Right only:\n\t' + '\n\t'.join(R_only))
Result:
Left only:
    C:\L_dir\file_in_L_tree.txt
    C:\L_dir\dir_in_L_tree
    C:\L_dir\dir_in_both_trees\file_in_L_tree.txt
    C:\L_dir\dir_in_both_trees\dir_in_L_tree
Right only:
    C:\R_dir\file_in_R_tree.txt
    C:\R_dir\dir_in_R_tree
    C:\R_dir\dir_in_both_trees\file_in_R_tree.txt
    C:\R_dir\dir_in_both_trees\dir_in_R_tree
Note that you would have to modify the above code a bit if you wanted to see inside the uncommon directories. The function stops at a directory that exists on only one side, so these 2 files from my example above are not reported:
file_inside_dir_only_in_L_tree.txt
file_inside_dir_only_in_R_tree.txt
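The question also asks how to tell whether two trees are identical by name. Using the same dircmp attributes described above, a short recursive check might look like the sketch below. The function name same_names and the temporary demo trees are assumptions for illustration. Keep in mind that dircmp skips a handful of names by default (its ignore and hide parameters), so those entries are not compared.

```python
import os
import tempfile
from filecmp import dircmp


def same_names(left, right):
    """Return True if both trees contain the same file and folder names."""
    dcmp = dircmp(left, right)
    if dcmp.left_only or dcmp.right_only:
        return False  # some name exists on one side only
    # Recurse only into folders present on both sides.
    return all(
        same_names(os.path.join(left, d), os.path.join(right, d))
        for d in dcmp.common_dirs
    )


# Demo: two matching trees, then one extra file on the right.
with tempfile.TemporaryDirectory() as base:
    left, right = os.path.join(base, "L_dir"), os.path.join(base, "R_dir")
    for root in (left, right):
        os.makedirs(os.path.join(root, "dir_in_both_trees"))
        open(os.path.join(root, "dir_in_both_trees",
                          "file_in_both_trees.txt"), "w").close()
    before = same_names(left, right)
    open(os.path.join(right, "dir_in_both_trees",
                      "file_in_R_tree.txt"), "w").close()
    after = same_names(left, right)
    print(before, after)
```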
Python 2:
import os
folder1 = os.listdir('/path1')
folder2 = os.listdir('/path2')
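# Subtract the shorter listing from the longer one (compare lengths, not the lists themselves).
folder_diff = set(folder1) - set(folder2) if len(folder1) > len(folder2) else set(folder2) - set(folder1)
print(folder_diff)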
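A one-directional difference misses names that exist only in the other folder. If names unique to either side are wanted, a symmetric difference is one option. This sketch works on plain lists standing in for the os.listdir() results; listing_difference and the sample names are hypothetical, and like the snippet above it compares a single directory level only.

```python
def listing_difference(names1, names2):
    """Return the names that appear in exactly one of the two listings."""
    return set(names1) ^ set(names2)  # ^ is the symmetric difference


# Stand-ins for os.listdir('/path1') and os.listdir('/path2').
folder1 = ["a.txt", "b.txt", "c.txt"]
folder2 = ["b.txt", "c.txt", "d.txt"]
folder_diff = listing_difference(folder1, folder2)
print(sorted(folder_diff))
```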