Output directory tree in logical order using os.walk - python

Not sure if logical is the right word here. However, when I run os.walk I'm appending paths to a list, and I would like the order to be so that if you were to read top to bottom, it would make sense.
For example, if the path I was looping through was C:\test which has a single file, along with folders (each with their own subfolders and files), this is what I'd want the list output to resemble.
C:\test
C:\test\test1.txt
C:\test\subfolder1
C:\test\subfolder1\file1.txt
C:\test\subfolder1\file2.txt
C:\test\subfolder2
C:\test\subfolder2\file3.txt
However, my output is the following.
C:\test\subfolder1
C:\test\subfolder2
C:\test\test1.txt
C:\test\subfolder1\file1.txt
C:\test\subfolder1\file2.txt
C:\test\subfolder2\file3.txt
The first problem is that C:\test doesn't appear. I could just append C:\test to the list, but I want C:\test\test1.txt to appear directly below it, and sorting in ascending order would just push it to the end.
When using os.walk, is there a way to append to my list in such a way that everything comes out in order?
Code:
import os
tree = []
for root, dirs, files in os.walk(r'C:\test', topdown=True):
    for d in dirs:
        tree.append(os.path.join(root, d))
    for f in files:
        tree.append(os.path.join(root, f))
for x in tree:
    print(x)
Edit: By logical order I mean I would like it to appear as top folder, followed by subfolders and files in that folder, files and subfolders in those folders, and so on.
e.g.
C:\test
C:\test\test1.txt
C:\test\subfolder1
C:\test\subfolder1\file1.txt
C:\test\subfolder1\file2.txt
C:\test\subfolder2
C:\test\subfolder2\file3.txt

The order you want is the order in which os.walk iterates over the folders. You just have to append root to your list, instead of the folders.
import os
tree = []
for root, _, files in os.walk('test'):
    tree.append(root)
    for f in files:
        tree.append(os.path.join(root, f))
for x in tree:
    print(x)
Output
test
test/test1.txt
test/subfolder1
test/subfolder1/file1.txt
test/subfolder1/file2.txt
test/subfolder2
test/subfolder2/file3.txt
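One caveat: os.walk yields names in whatever order the operating system returns them, so the listing is not guaranteed to be alphabetical. Here is a minimal sketch, assuming you want sorted output. The function name ordered_tree is made up for illustration. Sorting dirs in place is what tells os.walk (with the default topdown=True) which order to descend in.

```python
import os

def ordered_tree(top):
    """Return each folder followed by its files, then its subfolders."""
    tree = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # in-place sort controls the order os.walk descends in
        tree.append(root)
        for name in sorted(files):
            tree.append(os.path.join(root, name))
    return tree

# "test" is the folder used in the answer above; nothing prints if it is missing.
for path in ordered_tree("test"):
    print(path)
```

Because dirs is sorted before os.walk moves on, each subfolder's contents are listed before the next sibling folder starts.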

This code should solve your problem.
Explanation:
Loop through your tree variable and create a list of tuples, where the first element of each tuple is the dir/file path and the second element is the count of \ in that path
Sort the list of tuples by the second element of each tuple
Create a list of the first elements (the paths) of the sorted tuples
import os
tree = []
for root, dirs, files in os.walk('C:\\test', topdown=True):
    for d in dirs:
        tree.append(os.path.join(root, d))
    for f in files:
        tree.append(os.path.join(root, f))
tup = []
def Sort_Tuple(tup):
    """function code sourced from https://www.geeksforgeeks.org/python-program-to-sort-a-list-of-tuples-by-second-item/"""
    # getting length of list of tuples
    lst = len(tup)
    for i in range(0, lst):
        for j in range(0, lst-i-1):
            if (tup[j][1] > tup[j + 1][1]):
                temp = tup[j]
                tup[j] = tup[j + 1]
                tup[j + 1] = temp
    return tup
for x in tree:
    i = x.count('\\')
    tup.append((x, i))
sorted_tup = Sort_Tuple(tup)  # renamed so the built-in sorted() is not shadowed
answer = [t[0] for t in sorted_tup]
print(answer)
This should work. It is not optimized in any way.
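For what it's worth, the same three steps can be written with the built-in sorted() and a key function, which avoids the hand-written bubble sort. This is only a sketch: sort_by_depth and its sep parameter are names I made up. Note that it groups paths by depth, so entries from different subfolders at the same depth end up next to each other rather than under their own parent.

```python
def sort_by_depth(paths, sep="\\"):
    """Order paths by how many separators they contain, shallowest first."""
    # sorted() is stable, so paths at the same depth keep their original order.
    return sorted(paths, key=lambda p: p.count(sep))

paths = [
    r"C:\test\subfolder1\file1.txt",
    r"C:\test\test1.txt",
    r"C:\test\subfolder1",
]
print(sort_by_depth(paths))
```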

Related

searching specific string in list

How to search for every string in a list that starts with a specific string like:
path = (r"C:\Users\Example\Desktop")
desktop = os.listdir(path)
print(desktop)
#['faf.docx', 'faf.txt', 'faad.txt', 'gas.docx']
So my question is: how do I pick out every file whose name starts with "fa"?
For this specific case, involving filenames in one directory, you can use globbing:
import glob
import os
path = (r"C:\Users\Example\Desktop")
pattern = os.path.join(path, 'fa*')
files = glob.glob(pattern)
This code keeps every item that starts with "fa" and stores the matches in a separate list
filtered = [item for item in desktop if item.startswith("fa")]
All strings have a .startswith() method!
results = []
for value in os.listdir(path):
    if value.startswith("fa"):
        results.append(value)
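If you ever need more than one prefix, str.startswith() also accepts a tuple of prefixes. A small sketch, assuming the names are already in a list; filter_by_prefix is a hypothetical helper name, and prefixes must be a list or tuple of strings, not a single string.

```python
def filter_by_prefix(names, prefixes):
    """Keep the names that start with any of the given prefixes."""
    # startswith() takes a tuple, so convert whatever iterable was passed in.
    return [name for name in names if name.startswith(tuple(prefixes))]

desktop = ['faf.docx', 'faf.txt', 'faad.txt', 'gas.docx']
print(filter_by_prefix(desktop, ["fa"]))
```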

How to get the full file path including the directory?

I have a quite complex problem. I have multiple filenames in a list, and the root directory of those files is the same: mother_directory. However, every file is in a different subdirectory. I have a script which processes some of these files, and I need to know the exact full path, including the subdirectories, of every file. I know that I could use os.walk, but that would make my function too nested, because inside this function I'm planning to call another function which uses those full paths.
This is the file structure:
mother_directory:
|_child1:
    20211011.xml
    20211001.xml
|_child2:
    20211002.xml
This is my current code:
import os
mother_path = r'c:\data\user1\Desktop\mother_directory'
blue_dates = ['20211011', '20211012', '20211013', '20211001', '20211002']
red_dates = ['20211011', '20211009', '20211008', '20211001', '20211002']
file_names = ['20211011.xml', '20211001.xml', '20211002.xml']
def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        file_path = os.path.join(mother_path, match_file[0])
        print(file_path)
for x in blue_dates:
    process_files(x)
My current output:
c:\data\user1\Desktop\mother_directory\20211011.xml
c:\data\user1\Desktop\mother_directory\20211001.xml
c:\data\user1\Desktop\mother_directory\20211002.xml
When I run my function I want my desired output to be like this:
c:\data\user1\Desktop\mother_directory\child1\20211011.xml
c:\data\user1\Desktop\mother_directory\child1\20211001.xml
c:\data\user1\Desktop\mother_directory\child2\20211002.xml
I added a condition, I believe it will work now.
def process_files(x):
    if x in red_dates:
        match_file = [s for s in file_names if x in s]
        for root, dirs, files in os.walk(mother_path):
            for file in files:
                if match_file[0] in file:
                    print(os.path.join(root, match_file[0]))
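Calling os.walk once per date repeats the same directory scan every time. A possible alternative, sketched under the assumption that file names are unique across the subdirectories (a later duplicate would overwrite an earlier one): walk the tree once, build a name-to-path dictionary, and look dates up in it. build_index and full_paths are names invented for this sketch, and it matches on the exact file name rather than on a substring.

```python
import os

def build_index(top):
    """Map each file name under top to its full path, using a single walk."""
    index = {}
    for root, _dirs, files in os.walk(top):
        for name in files:
            index[name] = os.path.join(root, name)
    return index

def full_paths(dates, index, suffix=".xml"):
    """Return the full path for every date that has a matching file."""
    return [index[d + suffix] for d in dates if d + suffix in index]
```

With the variables from the question you would call index = build_index(mother_path) once, then full_paths([d for d in blue_dates if d in red_dates], index). This also keeps the walking out of process_files, so the function stays flat.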

How to print desired pathname

In this code, I want to be able to index into the output, so that x[0] gives the first path and x[1] gives the second path, but I don't know how to do it. I tried split, but it didn't give me the expected result.
Given result
/home/runner/TestP1/folder1/
/home/runner/TestP1/folder2/
/home/runner/TestP1/folder1/sub
Required result
/home/runner/TestP1/folder2/
Code
import os
def filesystem(rootdir):
    for rootdir, dirs, files in os.walk(rootdir):
        for subdir in dirs:
            x = os.path.join(rootdir, subdir)
            print(x)
filesystem("/home/runner/TestP1")
If you want to access each directory as if it were a list, then you need to construct one. Below I have given an example:
import os
all_directories = []
for root, dirs, files in os.walk(r"/home/runner/TestP1"):
    # Append the full path of each subdir in this directory to the master list.
    all_directories += [os.path.join(root, d) for d in dirs]
print(all_directories[1])
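Be mindful of the ordering. os.walk visits directories depth-first, but each directory's subdirectories are added to the list together, so siblings appear before anything nested under them. os.walk has no breadth-first mode: passing topdown=False (the parameter is topdown, not top_down) switches to bottom-up traversal, in which a directory is yielded only after all of its subdirectories.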
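If you really do need a level-by-level (breadth-first) listing, one way is to manage the queue yourself with os.scandir. This is a rough sketch, not something os.walk provides; directories_breadth_first is a made-up name, and I'm assuming symlinked directories should be skipped.

```python
import os
from collections import deque

def directories_breadth_first(top):
    """List every directory below top, level by level."""
    found = []
    queue = deque([top])
    while queue:
        current = queue.popleft()
        with os.scandir(current) as entries:
            # Sort so the order within one directory is predictable.
            subdirs = sorted(
                e.path for e in entries if e.is_dir(follow_symlinks=False)
            )
        found.extend(subdirs)
        queue.extend(subdirs)
    return found
```

You could then index the result directly, e.g. directories_breadth_first("/home/runner/TestP1")[1].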

Rename files in a folder based on names in a different folder

I have 2 folders, each with the same number of files. I want to rename the files in folder 2 based on the names of the files in folder 1. So in folder 1 there might be three files titled:
Landsat_1,
Landsat_2,
Landsat_3
and in folder 2 these files are called:
1,
2,
3
and I want to rename them based on the folder 1 names. I thought about writing the item names of each folder to a .txt file, reading that file back into a list, and then renaming, but I'm not sure this is the best way to do it. Any suggestions?
Edit:
I have simplified the file names above, so just prefixing with Landsat_ will not work for me.
The real file names in folder 1 are more like LT503002011_band1, LT5040300201_band1, LT50402312_band4. In folder 2 they are extract1, extract2, extract3. There are 500 files in total and in folder 2 it is just a running count of extract and a number for each file.
As someone said, "sort each list and zip them together in order to rename".
Notes:
The key() function extracts all of the numbers so that sorted() can sort the lists numerically based on the embedded numbers.
We sort both lists because os.listdir() returns files in arbitrary order.
The for loop is a common way to use zip: for itemA, itemB in zip(listA, listB):
os.path.join() provides portability: no worries about / or \
A typical invocation on Windows: python doit.py c:\data\lt c:\data\extract, assuming those are the directories you have described.
A typical invocation on *nix: python doit.py ./lt ./extract
import sys
import re
import os
assert len(sys.argv) == 3, "Usage: %s LT-dir extract-dir" % sys.argv[0]
_, ltdir, exdir = sys.argv
def key(x):
    # Raw string so \d is passed to the regex engine unchanged.
    return [int(y) for y in re.findall(r'\d+', x)]
ltfiles = sorted(os.listdir(ltdir), key=key)
exfiles = sorted(os.listdir(exdir), key=key)
for exfile, ltfile in zip(exfiles, ltfiles):
    os.rename(os.path.join(exdir, exfile), os.path.join(exdir, ltfile))
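Before renaming 500 files for real, it may be worth printing the pairing first. Here is a dry-run sketch of the same sort-and-zip idea that touches no files; numeric_key and plan_renames are hypothetical names, and the file names in the demo are shortened stand-ins, not your real ones.

```python
import re

def numeric_key(name):
    """Sort key: every run of digits in the name, as integers."""
    return [int(n) for n in re.findall(r"\d+", name)]

def plan_renames(extract_names, lt_names):
    """Pair the two listings after numeric sorting; no files are touched."""
    if len(extract_names) != len(lt_names):
        raise ValueError("both folders must contain the same number of files")
    return list(zip(sorted(extract_names, key=numeric_key),
                    sorted(lt_names, key=numeric_key)))

# Made-up sample names, only to show the pairing.
for old, new in plan_renames(["extract10", "extract2", "extract1"],
                             ["LT503_band1", "LT504_band1", "LT505_band4"]):
    print(old, "->", new)
```

Once the printed pairs look right, the same pairs can be fed to os.rename as in the script above.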
You might want to use the glob module, which takes a filename pattern and returns the matching names as a list. For example, in that directory
glob.glob('*')
gives you
['Landsat_1', 'Landsat_2', 'Landsat_3']
Then you can loop over the filenames in the list and change the filenames accordingly:
import glob
import os
folderlist = glob.glob('*')
for folder in folderlist:
    filelist = glob.glob(folder + '*')
    for fil in filelist:
        os.rename(fil, folder + fil)
Hope this helps
I went for more completeness :D.
# WARNING: BACKUP your data before running this code. I've checked to
# see that it mostly works, but I would want to test this very well
# against my actual data before I trusted it with that data! Especially
# if you're going to be modifying anything in the directories while this
# is running. Also, make sure you understand what this code is expecting
# to find in each directory.
import os
import re
main_dir_demo = 'main_dir_path'
extract_dir_demo = 'extract_dir_path'
def generate_paths(directory, filenames, target_names):
    for filename, target_name in zip(filenames, target_names):
        yield (os.path.join(directory, filename),
               os.path.join(directory, target_name))
def sync_filenames(main_dir, main_regex, other_dir, other_regex, key=None):
    main_files = [f for f in os.listdir(main_dir) if main_regex.match(f)]
    other_files = [f for f in os.listdir(other_dir) if other_regex.match(f)]
    # Do not proceed if there aren't the same number of things in each
    # directory; better safe than sorry.
    assert len(main_files) == len(other_files)
    main_files.sort(key=key)
    other_files.sort(key=key)
    path_pairs = generate_paths(other_dir, other_files, main_files)
    for other_path, target_path in path_pairs:
        os.rename(other_path, target_path)
def demo_key(item):
    """Sort by the numbers in a string ONLY; not the letters."""
    return [int(y) for y in re.findall(r'\d+', item)]
def main(main_dir, extract_dir, key=None):
    # Raw strings keep \d intact for the regex engine.
    main_regex = re.compile(r'LT\d+_band\d')
    other_regex = re.compile(r'extract\d+')
    sync_filenames(main_dir, main_regex, extract_dir, other_regex, key=key)
if __name__ == '__main__':
    main(main_dir_demo, extract_dir_demo, key=demo_key)

In python exclude folders which start with underscore or more than six characters long

I want to store all the folder names except those that start with an underscore (_) or have more than 6 characters. To get the list, I use this code:
folders = [name for name in os.listdir(".") if os.path.isdir(name)]
What change do I need to make to get the desired output?
Well the simplest way is to extend the if clause of your list comprehension to contain two more clauses:
folders = [name for name in os.listdir(".")
           if os.path.isdir(name) and name[0] != '_' and len(name) <= 6]
Another approach would be to use os.walk. This would traverse the entire directory tree from the top level directory you specify.
import os
from os.path import join
all_dirs = []
for root, dirs, filenames in os.walk('/dir/path'):
    # Keep names that do not start with '_' and are at most 6 characters long.
    x = [join(root, d) for d in dirs if not d.startswith('_') and len(d) <= 6]
    all_dirs.extend(x)
print(all_dirs)  # list of all directories matching the criteria
A list comprehension might be too unwieldy for this, so I have expanded it to make it clear what the conditions are:
folders = []
for name in os.listdir('.'):
    if os.path.isdir(name):
        dirname = os.path.basename(name)
        if not (dirname.startswith('_') or len(dirname) > 6):
            folders.append(name)
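If the condition keeps growing, it may be easier to pull it out into a small predicate that can be checked on its own, without touching the file system. A minimal sketch; keep_folder is a name made up here, and the sample names are invented.

```python
def keep_folder(name):
    """True unless the name starts with '_' or is longer than six characters."""
    return not name.startswith("_") and len(name) <= 6

# Invented sample names to show which ones pass the filter.
names = ["_cache", "src", "documents", "build", "_x"]
print([n for n in names if keep_folder(n)])
```

The original comprehension then becomes folders = [n for n in os.listdir(".") if os.path.isdir(n) and keep_folder(n)].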
