I'm writing a simple function that walks through a directory tree looking for folders of a certain name. What I'm after is a match's parent path. E.g., for "C:/a/b/c/MATCH" I want "C:/a/b/c". I don't need duplicate parents or paths to subfolder matches, so if there's a "C:/a/b/c/d/e/f/MATCH", I don't need it. So, during my walk, once I have a parent, I want to iterate to the next current root.
Below is what I have so far, including a comment where I'm stuck.
def FindProjectSubfolders(masterPath, projectSubfolders):
    for currRoot, dirnames, filenames in os.walk(masterPath):
        # check if have a project subfolder
        foundMatch = False
        for dirname in dirnames:
            for projectSubfolder in projectSubfolders:
                if (dirname == projectSubfolder):
                    foundMatch = True;
                    break
        if (foundMatch == True):
            # what goes here to stop traversing "currRoot"
            # and iterate to the next one?
Trim dirnames to an empty list to prevent further traversing of directories below the current one:
import os

def FindProjectSubfolders(masterPath, projectSubfolders):
    for currRoot, dirnames, filenames in os.walk(masterPath):
        # check if have a project subfolder
        foundMatch = False
        for dirname in dirnames:
            for projectSubfolder in projectSubfolders:
                if dirname == projectSubfolder:
                    foundMatch = True
                    break
        if foundMatch:
            # currRoot contains a match: stop os.walk from
            # descending into any of its subfolders
            dirnames[:] = []  # replace all items in `dirnames` with an empty list
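Note that the code above modifies the list that dirnames refers to in place, via slice assignment; it does not rebind dirnames to a new list. That matters because os.walk keeps using that same list to decide which subdirectories to visit next, and it only honours this pruning with topdown=True (the default).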
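A minimal sketch that also collects the parent paths the question asks for, assuming the goal is a list of every directory that directly contains a folder named in projectSubfolders. The name find_match_parents is hypothetical; it replaces the nested loops and the flag with a set lookup and any(), and prunes exactly as above.

```python
import os

def find_match_parents(master_path, project_subfolders):
    """Return each directory that directly contains a folder named in project_subfolders."""
    wanted = set(project_subfolders)  # fast membership tests
    parents = []
    # topdown=True (the default) is required for pruning dirnames to work
    for curr_root, dirnames, _filenames in os.walk(master_path, topdown=True):
        if any(d in wanted for d in dirnames):
            parents.append(curr_root)
            # Prune in place so os.walk does not descend below curr_root,
            # which skips deeper matches such as c/d/e/f/MATCH
            dirnames[:] = []
    return parents
```

For "C:/a/b/c/MATCH" this returns "C:/a/b/c" and never visits "C:/a/b/c/d", so the deeper match is ignored.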
I have list containing some paths:
['folder1/folder2/Module1', 'folder4/folder5/Module2', 'folder7/folder8/Module3', 'folder12/folder13/Module4', 'folder17/folder20/folder50/Module5' .. etc]
What would be the best way to extract each element of that list and create a new list (or some other place to store that path) with its specific name?
My current code goes through each element of the list and stores it one by one, but I can't generate a new list for each element, and I'm not sure that is even possible:
for j in range(len(listOfPaths)):
    del pathList[:]
    path = listOfPaths[j]
    pathList.append(path)
So to clarify, at the end I need one list, list[Module1], that contains only 'folder1/folder2/Module1', a second one, list[Module2], with only the path to Module2, etc.
It might be better to use a dictionary here instead of a list.
#!/usr/bin/env python3
import os

paths = []
paths.append("folder1/subfolderA/Module1")
paths.append("folder2/subfolderB/Module1")
paths.append("folder3/subfolderC/Module1")
paths.append("folder4/subfolderD/Module2")
paths.append("folder5/subfolderE/Module2")
paths.append("folder6/subfolderF/Module50")

# create an empty dictionary
modulesDict = {}
# it will look like this:
# "ModuleX" -> ["path1/to/ModuleX", "path2/to/ModuleX", ...]
# "ModuleY" -> ["path3/to/ModuleY", "path4/to/ModuleY", ...]
for path in paths:  # loop over original list of paths
    # take only the "ModuleX" part
    moduleName = os.path.basename(os.path.normpath(path))
    # check if it's already in our dict or not
    if moduleName in modulesDict:
        # add the path to the list of paths for that module
        modulesDict[moduleName].append(path)
    else:
        # create a new list with only one element (the path)
        modulesDict[moduleName] = [path]

print(modulesDict)
OUTPUT: (formatted a bit)
{
'Module1':
['folder1/subfolderA/Module1', 'folder2/subfolderB/Module1', 'folder3/subfolderC/Module1'],
'Module2':
['folder4/subfolderD/Module2', 'folder5/subfolderE/Module2'],
'Module50':
['folder6/subfolderF/Module50']
}
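The same grouping can be written with collections.defaultdict, which removes the "already in the dict?" check. A sketch using a shortened copy of the paths above (the snake_case names are just for this example):

```python
import os
from collections import defaultdict

paths = [
    "folder1/subfolderA/Module1",
    "folder2/subfolderB/Module1",
    "folder4/subfolderD/Module2",
    "folder6/subfolderF/Module50",
]

# defaultdict(list) creates an empty list the first time a key is used
modules_dict = defaultdict(list)
for path in paths:
    module_name = os.path.basename(os.path.normpath(path))  # the "ModuleX" part
    modules_dict[module_name].append(path)

print(dict(modules_dict))
# {'Module1': ['folder1/subfolderA/Module1', 'folder2/subfolderB/Module1'],
#  'Module2': ['folder4/subfolderD/Module2'],
#  'Module50': ['folder6/subfolderF/Module50']}
```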
Try this:
temp = ['folder1/folder2/Module1', 'folder4/folder5/Module2', 'folder7/folder8/Module3', 'folder12/folder13/Module4', 'folder17/folder20/folder50/Module5']
# List initialization
Output = []
# Use iteration to wrap each
# element in its own list
for elem in temp:
    temp3 = []
    temp3.append(elem)
    Output.append(temp3)
# printing
print(Output)
Output:
[['folder1/folder2/Module1'], ['folder4/folder5/Module2'], ['folder7/folder8/Module3'], ['folder12/folder13/Module4'], ['folder17/folder20/folder50/Module5']]
I think you can try this:
paths = ["stra", "strb"]
output = [[x] for x in paths]  # same result as list(map(lambda x: [x], paths))
I have the following code for processing an XML file:
for el in root:
    checkChild(rootDict, el)
    for child in el:
        checkChild(rootDict, el, child)
        for grandchild in child:
            checkChild(rootDict, el, child, grandchild)
            for grandgrandchild in grandchild:
                checkChild(rootDict, el, child, grandchild, grandgrandchild)
                ...
                ...
As you can see, on every iteration I just call the same function with one extra parameter. Is there a way to avoid writing so many nested for loops that basically do the same thing?
Any help would be appreciated. Thank you.
Whatever operation you want to perform on files and directories, you first need to traverse them. In Python, the easiest way I know is:
#!/usr/bin/env python3
import os

# Set the directory you want to start from
root_dir = '.'
for dir_name, subdir_list, file_list in os.walk(root_dir):
    print(f'Found directory: {dir_name}')
    for file_name in file_list:
        print(f'\t{file_name}')
While traversing, you can add them to groups or perform other operations.
Assuming that root comes from ElementTree parsing, you can build a data structure containing the list of all the ancestors of each node, and then iterate over it to call checkChild:
from xml.etree import ElementTree as ET

def checkChild(*element_chain):
    # Code placeholder: prints the chain from the root down to the element
    print("Checking %s" % '.'.join(t.tag for t in reversed(element_chain)))

tree = ET.fromstring(xml)  # `xml` holds the XML document as a string

# Build a dict mapping each node to its ancestors (nearest first)
nodes_and_parents = {}
for elem in tree.iter():  # tree.iter yields every element in the XML, not only the root's children
    for child in elem:
        nodes_and_parents[child] = [elem] + nodes_and_parents.get(elem, [])

for t, parents in nodes_and_parents.items():
    checkChild(t, *parents)
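If precomputing the dict is not needed, the ancestor chain can also be passed down with a short recursive function that replaces the nested loops directly and, unlike the dict above, calls checkChild in the same order as the nested loops. A sketch: visit is a hypothetical name, checkChild here is a stand-in that only records the tags it receives, and rootDict is assumed to be an ordinary dict as in the question's calls.

```python
from xml.etree import ElementTree as ET

calls = []  # records what the stand-in checkChild receives

def checkChild(rootDict, *chain):
    # Hypothetical stand-in for the question's checkChild:
    # record the chain of tags, e.g. "el/child/grandchild"
    calls.append("/".join(e.tag for e in chain))

def visit(rootDict, parent, chain=()):
    """Call checkChild for each child of parent, then recurse into that child."""
    for child in parent:
        path = chain + (child,)       # (el,), (el, child), (el, child, grandchild), ...
        checkChild(rootDict, *path)   # the same call the nested loops make
        visit(rootDict, child, path)  # one level deeper, like the next nested loop

root = ET.fromstring("<root><a><b/><c><d/></c></a><e/></root>")
visit({}, root)
print(calls)  # ['a', 'a/b', 'a/c', 'a/c/d', 'e']
```

Recursion depth equals the nesting depth of the XML, which is normally far below Python's default recursion limit.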
def recurse(tree):
    """Walks a tree depth-first and yields the path at every step."""
    # We convert the tree to a list of paths through it,
    # with the most recently discovered path last. This is the stack.
    stack = [(tree,)]
    while stack:
        # Popping from the stack means reading the most recently
        # discovered but not yet explored path in the tree. We yield it
        # so you can call your method on it.
        path = stack.pop()
        yield path
        # Then we expand this path further, adding all extended paths to the
        # stack. In reversed order so the first child element will end up at
        # the end, and thus will be yielded first.
        stack.extend(path + (elm,) for elm in reversed(path[-1]))

# The linear structure yields tuples (root, child, ...)
linear = recurse(root)
# Then call checkChild(rootDict, child, ...)
next(linear)  # skip checkChild(rootDict)
for path in linear:
    checkChild(rootDict, *path[1:])
For your understanding, suppose the root looked something like this:
root
    child1
        sub1
        sub2
    child2
        sub3
            subsub1
        sub4
    child3
This is a tree. We can find a few paths through it, e.g. (root, child1). Feeding that path to the loop above results in the call checkChild(rootDict, child1). In the end, checkChild is called exactly once for every path in the tree, except (root,) itself, which is skipped. We can thus write the tree as a list of paths like so:
[(root,),
(root, child1),
(root, child1, sub1),
(root, child1, sub2),
(root, child2),
(root, child2, sub3),
(root, child2, sub3, subsub1),
(root, child2, sub4),
(root, child3)]
The order of paths in this list happens to match your loop structure. It is called depth-first. (Another sort order, breadth-first, would first list all child nodes, then all sub nodes and finally all subsub nodes.)
The list above is the order in which recurse yields the paths. The stack variable in the code never holds that whole list at once: it only stores the paths that have been discovered but not yet yielded.
To conclude, recurse yields those paths one by one, and the last bit of code calls checkChild on each of them, just as your nested loops do.
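To check the order claimed above, here is a sketch that builds the example tree with ElementTree (tag names taken from the drawing) and prints the tags of every path the generator yields. It repeats the loop-based recurse so it runs on its own.

```python
from xml.etree import ElementTree as ET

def recurse(tree):
    """Yield every path (root, child, ...) through the tree, depth-first."""
    stack = [(tree,)]
    while stack:
        path = stack.pop()
        yield path
        # Reversed so the first child is popped (and yielded) first
        stack.extend(path + (elm,) for elm in reversed(path[-1]))

root = ET.fromstring(
    "<root><child1><sub1/><sub2/></child1>"
    "<child2><sub3><subsub1/></sub3><sub4/></child2>"
    "<child3/></root>"
)

for path in recurse(root):
    print(tuple(e.tag for e in path))
# ('root',)
# ('root', 'child1')
# ('root', 'child1', 'sub1')
# ('root', 'child1', 'sub2')
# ('root', 'child2')
# ('root', 'child2', 'sub3')
# ('root', 'child2', 'sub3', 'subsub1')
# ('root', 'child2', 'sub4')
# ('root', 'child3')
```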
I am looping through a directory and want to store all files in each folder as a list in a dictionary, where each key is a folder and the list of its files is the value.
The first print in the loop shows exactly the output I am expecting.
However, the second print shows empty values.
The third print, after the class is initialized, shows the list of the last subfolder as the value for every key.
What am I overlooking or doing wrong?
class FileAndFolderHandling() :
    folders_and_files = dict()

    def __init__(self) :
        self.getSubfolderAndImageFileNames()

    def getSubfolderAndImageFileNames(self) :
        subfolder = ""
        files_in_subfolder = []
        for filename in glob.iglob('X:\\Some_Directory\\**\\*.tif', recursive=True) :
            if not subfolder == os.path.dirname(filename) and not subfolder == "" :
                print(subfolder + " / / " + str(files_in_subfolder))
                self.folders_and_files[subfolder] = files_in_subfolder
                files_in_subfolder.clear()
                print(self.folders_and_files)
            subfolder = os.path.dirname(filename) # new subfolder
            files_in_subfolder.append(os.path.basename(filename))

folder_content = FileAndFolderHandling()
print(folder_content.folders_and_files)
It seems like the problem is that you are always using the same list.
Writing files_in_subfolder = [] creates one list and binds the name files_in_subfolder to it. So when you assign self.folders_and_files[subfolder] = files_in_subfolder, you are only storing a reference to that list (which is the same list in every iteration) in the dictionary, not a copy of it.
Later, when you call files_in_subfolder.clear(), you are clearing the one list that reference points to, and therefore every entry of the dictionary (as they all share that same list).
To solve this, I would recommend creating a new list for each entry in your dictionary instead of clearing the old one. That is, replace files_in_subfolder.clear() with files_in_subfolder = [] inside the loop, so the name is rebound to a fresh, empty list while the dictionary keeps the previous one.
Hope it helps!
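A minimal sketch of the question's class with that fix applied. Two further changes are assumptions of this sketch: the glob pattern is passed in (instead of the hard-coded X:\ path) so it can be tried anywhere, and the dict is an instance attribute. It also stores the last folder's files after the loop, which the original never does because it only stores a group when the folder changes. Like the original, it assumes glob yields each folder's files consecutively.

```python
import glob
import os

class FileAndFolderHandling:
    def __init__(self, pattern):
        # Instance attribute, so each object gets its own dict
        self.folders_and_files = {}
        self.getSubfolderAndImageFileNames(pattern)

    def getSubfolderAndImageFileNames(self, pattern):
        subfolder = ""
        files_in_subfolder = []
        for filename in glob.iglob(pattern, recursive=True):
            if subfolder != os.path.dirname(filename) and subfolder != "":
                self.folders_and_files[subfolder] = files_in_subfolder
                files_in_subfolder = []  # new list instead of clear()
            subfolder = os.path.dirname(filename)
            files_in_subfolder.append(os.path.basename(filename))
        if subfolder:
            # Store the last group, which the loop body never reaches
            self.folders_and_files[subfolder] = files_in_subfolder
```

Usage would be FileAndFolderHandling('X:\\Some_Directory\\**\\*.tif').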
It sounds like you are after defaultdict.
I adapted your code like this:
import glob
import os
from collections import defaultdict

class FileAndFolderHandling:
    def __init__(self):
        # Instance attribute: each object gets its own dict
        self.folders_and_files = defaultdict(list)
        self.getSubfolderAndImageFileNames()

    def getSubfolderAndImageFileNames(self):
        for filename in glob.iglob(r'C:\Temp\T\**\*.txt', recursive=True):
            subfolder = os.path.dirname(filename)
            # defaultdict creates an empty list the first time a subfolder is seen
            self.folders_and_files[subfolder].append(os.path.basename(filename))

folder_content = FileAndFolderHandling()
print(dict(folder_content.folders_and_files))
Output:
{'C:\\Temp\\T': ['X.txt'], 'C:\\Temp\\T\\X': ['X1.txt', 'X2.txt'], 'C:\\Temp\\T\\X2': ['X1.txt']}
The defaultdict(list) creates a new, empty list for every new key, which is what you seem to want in your code.
You are clearing the list, from what I see:
files_in_subfolder.clear()
Remove that line, and instead start a new list (files_in_subfolder = []) once the current one has been added to the folders_and_files dictionary; otherwise every key ends up pointing at the same list.
I have a main folder like this:
mainf/01/streets/streets.shp
mainf/02/streets/streets.shp #normal files
mainf/03/streets/streets.shp
...
and another main folder like this:
mainfo/01/streets/streets.shp
mainfo/02/streets/streets.shp #empty files
mainfo/03/streets/streets.shp
...
I want to use a function that takes as its first parameter a normal file from the first main folder (normal files) and as its second parameter the corresponding file from the other folder (empty files).
The files should be matched on the folder number at path level [-3] (e.g. 01, 02, 03, etc.).
Example with a function:
appendfunc(first_file_from_normal_files,first_file_from_empty_files)
How to do this in a loop?
My code:
for i in mainf and j in mainfo:
    appendfunc(i,j)
Update
Correct version:
first = ["mainf/01/streets/streets.shp", "mainf/02/streets/streets.shp", "mainf/03/streets/streets.shp"]
second = ["mainfo/01/streets/streets.shp", "mainfo/02/streets/streets.shp", "mainfo/03/streets/streets.shp"]
final = [(f,s) for f,s in zip(first,second)]
for i, j in final:
    appendfunc(i, j)
Is there an alternative way to automatically put all the files in a main folder, with their full paths, into a list? This is what I tried:
first = []
for (dirpath, dirnames, filenames) in walk(mainf):
    first.append(os.path.join(dirpath,dirnames,filenames))
second = []
for (dirpath, dirnames, filenames) in walk(mainfo):
    second.append(os.path.join(dirpath,dirnames,filenames))
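A sketch of one way to do both steps, assuming the layout shown in the question (mainf/NN/streets/streets.shp and mainfo/NN/streets/streets.shp) with one .shp file per numbered folder. The attempt above fails because dirnames and filenames are lists, so each file name has to be joined with dirpath individually. The helper names are hypothetical; files are paired by their [-3] path component (the folder number) rather than by list position, so a folder missing on either side does not shift the pairs.

```python
import os

def collect_shapefiles(main_folder):
    """Map folder number (path component [-3]) to the full path of its .shp file."""
    by_number = {}
    for dirpath, _dirnames, filenames in os.walk(main_folder):
        for filename in filenames:
            if filename.endswith(".shp"):
                # Join each file name with its own directory, one at a time
                full_path = os.path.join(dirpath, filename)
                # dirpath is e.g. mainf/01/streets, so its parent's name is "01"
                folder_number = os.path.basename(os.path.dirname(dirpath))
                by_number[folder_number] = full_path
    return by_number

def pair_files(normal_folder, empty_folder):
    """Return (normal_file, empty_file) tuples for folder numbers present in both."""
    first = collect_shapefiles(normal_folder)
    second = collect_shapefiles(empty_folder)
    return [(first[n], second[n]) for n in sorted(first.keys() & second.keys())]
```

Usage: for i, j in pair_files("mainf", "mainfo"): appendfunc(i, j).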
Use zip:
first = ["mainf/01/streets/streets.shp", "mainf/02/streets/streets.shp", "mainf/03/streets/streets.shp"]
second = ["mainfo/01/streets/streets.shp", "mainfo/02/streets/streets.shp", "mainfo/03/streets/streets.shp"]
final = list(zip(first, second))  # [(first[0], second[0]), (first[1], second[1]), ...]
print(final)
You can't write a for ... and ... loop. You can loop over one iterable in one statement and over another iterable in another statement, but this still won't give you what you want, because it pairs every file in mainf with every file in mainfo:
for i in mainf:
    for j in mainfo:
        appendfunc(i, j)
What you probably want is something like this (I'm assuming mainf and mainfo are essentially the same, except one is empty):
for folder_num in range(len(mainf)):
    appendfunc(mainf[folder_num], mainfo[folder_num])
You haven't said what appendfunc is supposed to do, so I'll leave that to you. I'm also assuming that, depending on how you're accessing the files, you can work out how to modify the mainf[folder_num] and mainfo[folder_num] expressions. For example, you may need to inject the number back into the directory structure, e.g. "mainf/{:02d}/streets/streets.shp".format(folder_num + 1), since range starts at 0 while the folders start at 01.
I am new to Python. I need to traverse the files in a directory and build a 2D list of files (keys), each with a value. Then I need to sort it by value and delete the files in the lower half of values. How can I do that?
This is what I have done so far. I can't figure out how to create such a 2D array.
dir = "images"
num_files = len(os.listdir(dir))
for file in os.listdir(dir):
    print(file)
    value = my_function(file)
    # this is wrong:
    _list[0][0].append(value)
# and then sorting, and removing the files associated with lower half
Basically, the 2D array should look like [[file1, 0.876], [file2, 0.5], [file3, 1.24]], and it needs to be sorted by the second element of each entry.
Based on the comments, it looks like I have to do this when appending:
mylist.append([file, value])
And for sorting, I have to do this:
mylist.sort(key=lambda item: item[1])
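Putting those two lines together, a minimal sketch of the whole loop. my_function is not shown in the question, so score_file below is a hypothetical stand-in (it uses the file size); the lower half is taken by rank, and the directory is joined back onto each name because os.listdir returns bare names.

```python
import os

def score_file(path):
    # Hypothetical stand-in for the question's my_function
    return os.path.getsize(path)

def delete_lower_half(directory):
    scored = []  # the "2D list": [[file_name, value], ...]
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):  # skip subdirectories
            scored.append([name, score_file(full_path)])
    # Sort by the value in the second position, lowest first
    scored.sort(key=lambda item: item[1])
    lower_half = scored[:len(scored) // 2]
    for name, _value in lower_half:
        os.remove(os.path.join(directory, name))
    return [name for name, _value in lower_half]
```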
I don't understand what this part of your question means:
delete the files with lower half of values
Does it mean that you have to select the files whose value is less than the midpoint between the minimum and maximum values, or that you just have to select the lower half of the files?
There isn't any need for a 2D array, since my_function computes the second coordinate from the first: you can sort the file names directly, using the scoring function as the sort key. Here is a function that does what you need under either interpretation:
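import os

def delete_low_score_files(directory, func, criterion="midpoint"):
    """Delete files having a low score according to func.

    Args:
        directory (str): path of the directory;
        func (callable): function that scores a file name;
        criterion (str): can be "midpoint" or "half-list";
    Returns:
        (list) deleted file names.
    """
    # Only consider regular files, not subdirectories
    files = [f for f in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, f))]
    if not files:
        return []
    # Score each file once, then sort by score (lowest first)
    scores = {f: func(f) for f in files}
    sorted_files = sorted(files, key=scores.get)
    if criterion == "midpoint":
        # Midpoint between the lowest and the highest score
        midpoint = (scores[sorted_files[0]] + scores[sorted_files[-1]]) / 2
        files_to_delete = [f for f in sorted_files if scores[f] < midpoint]
    elif criterion == "half-list":
        # Lower half by rank; integer division so it can be used as a slice index
        files_to_delete = sorted_files[:len(sorted_files) // 2]
    else:
        raise ValueError("criterion must be 'midpoint' or 'half-list'")
    for f in files_to_delete:
        # os.listdir returns bare names, so join them with the directory
        os.remove(os.path.join(directory, f))
    return files_to_delete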