I have the following code for processing an XML file:
for el in root:
checkChild(rootDict, el)
for child in el:
checkChild(rootDict, el, child)
for grandchild in child:
checkChild(rootDict, el, child, grandchild)
for grandgrandchild in grandchild:
checkChild(rootDict, el, child, grandchild, grandgrandchild)
...
...
As you can see, on every iteration I just call the same function with one extra parameter. Is there a way to avoid writing so many nested for loops that basically do the same thing?
Any help would be appreciated. Thank you.
Whatever operation you wish to perform on files and directories you can traverse them. In python the easiest way I know is:
#!/usr/bin/env python
import os
# Set the directory you want to start from
root_dir = '.'
for dir_name, subdirList, file_list in os.walk(root_dir):
print(f'Found directory: {dir_name}s')
for file_name in file_list:
print(f'\t{file_name}s')
while traversing you can add the to groups or perform other operations
Assuming that root comes from an ElemenTree parsing, you can make a datastructure containing the list of all the ancestors for each node, cnd then iterate over this to call checkChild:
def checkChild(*element_chain):
# Code placeholder
print("Checking %s" % '.'.join(t.tag for t in reversed(element_chain)))
tree = ET.fromstring(xml)
# Build a dict containing each node and its ancestors
nodes_and_parents = {}
for elem in tree.iter(): # tree.iter yields every tag in the XML, not only the root childs
for child in elem:
nodes_and_parents[child] = [elem, ] + nodes_and_parents.get(elem, [])
for t, parents in nodes_and_parents.items():
checkChild(t, *parents)
def recurse(tree):
"""Walks a tree depth-first and yields the path at every step."""
# We convert the tree to a list of paths through it,
# with the most recently visited path last. This is the stack.
def explore(stack):
try:
# Popping from the stack means reading the most recently
# discovered but yet unexplored path in the tree. We yield it
# so you can call your method on it.
path = stack.pop()
except IndexError:
# The stack is empty. We're done.
return
yield path
# Then we expand this path further, adding all extended paths to the
# stack. In reversed order so the first child element will end up at
# the end, and thus will be yielded first.
stack.extend(path + (elm,) for elm in reversed(path[-1]))
yield from explore([(tree,)])
# The linear structure yields tuples (root, child, ...)
linear = recurse(root)
# Then call checkChild(rootDict, child, ...)
next(linear) # skip checkChild(rootDict)
for path in linear:
checkChild(rootDict, *path[1:])
For your understanding, suppose the root looked something like this:
root
child1
sub1
sub2
child2
sub3
subsub1
sub4
child3
That is like a tree. We can find a few paths through this tree, e.g. (root, child1). And as you feed these paths to checkChild this would result in a call checkChild(rootNode, child1). Eventually checkChild will be called exactly once for every path in the tree. We can thus write the tree as a list of paths like so:
[(root,),
(root, child1),
(root, child1, sub1),
(root, child1, sub2),
(root, child2),
(root, child2, sub3),
(root, child2, sub3, subsub1),
(root, child2, sub4),
(root, child3)]
The order of paths in this list happens to match your loop structure. It is called depth-first. (Another sort order, breadth-first, would first list all child nodes, then all sub nodes and finally all subsub nodes.)
The list above is the same as the stack variable in the code, with a small change that stack only stores the minimal number of paths it needs to remember.
To conclude, recurse yields those paths one-by-one and the last bit of code invokes the checkChild method as you do in your question.
Related
I solved an exercise where I had to apply a recursive algorithm to a tree that's so defined:
class GenericTree:
""" A tree in which each node can have any number of children.
Each node is linked to its parent and to its immediate sibling on the right
"""
def __init__(self, data):
self._data = data
self._child = None
self._sibling = None
self._parent = None
I had to concatenate the data of the leaves with the data of the parents and so on until we arrive to the root that will have the sum of all the leaves data. I solved it in this way and it works but it seems very tortuous and mechanic:
def marvelous(self):
""" MODIFIES each node data replacing it with the concatenation
of its leaves data
- MUST USE a recursive solution
- assume node data is always a string
"""
if not self._child: #If there isn't any child
self._data=self._data #the value remains the same
if self._child: #If there are children
if self._child._child: #if there are niece
self._child.marvelous() #reapply the function to them
else: #if not nieces
self._data=self._child._data #initializing the name of our root node with the name of its 1st son
#if there are other sons, we'll add them to the root name
if self._child._sibling: #check
current=self._child._sibling #iterating through the sons-siblings line
while current:
current.marvelous() #we reapplying the function to them to replacing them with their concatenation (bottom-up process)
self._data+=current._data #we sum the sibling content to the node data
current=current._sibling #next for the iteration
#To add the new names to the new root node name:
self._data="" #initializing the root str value
current=self._child #having the child that through recursion have the correct str values, i can sum all them to the root node
while current:
self._data+=current._data
current=current._sibling
if self._sibling: #if there are siblings, they need to go through the function themselves
self._sibling.marvelous()
Basically I check if the node tree passed has children: if not, it remains with the same data.
If there are children, I check if there are nieces: in this case I restart the algorithm until I can some the leaves to the pre-terminal nodes, and I sum the leaves values to put that sum to their parents'data.
Then, I act on the root node with the code after the first while loop, so to put its name as the sum of all the leaves.
The final piece of code serves as to make the code ok for the siblings in each step.
How can I improve it?
It seems to me that your method performs a lot of redundant recursive calls.
For example this loop in your code:
while current:
current.marvelous()
self._data += current._data
current = current._sibling
is useless because the recursive call will be anyway performed by the last
instruction in your method (self._sibling.marvelous()). Besides,
you update self._data and then right after the loop you reset
self._data to "".
I tried to simplify it and came up with this solution that seems to
work.
def marvelous(self):
if self.child:
self.child.marvelous()
# at that point we know that the data for all the tree
# rooted in self have been computed. we collect these
self.data = ""
current = self.child
while current:
self.data += current.data
current = current.sibling
if self.sibling:
self.sibling.marvelous()
And here is a simpler solution:
def marvelous2(self):
if not self.child:
result = self.data
else:
result = self.child.marvelous2()
self.data = result
if self.sibling:
result += self.sibling.marvelous2()
return result
marvelous2 returns the data computed for a node and all its siblings. This avoids performing the while loop of the previous solution.
I have a tree of Python objects. The tree is defined intrinsically: each object has a list (potentially empty) of children.
I would like to be able to print a list of all paths from the root to each leaf.
In the case of the tree above, this would mean:
result = [
[Node_0001,Node_0002,Node_0004],
[Node_0001,Node_0002,Node_0005,Node_0007],
[Node_0001,Node_0003,Node_0006],
]
The nodes must be treated as objects and not as integers (only their integer ID is displayed).
I don't care about the order of branches in the result. Each node has an arbitrary number of children, and the level of recursion is not fixed either.
I am trying a recursive approach:
def get_all_paths(node):
if len(node.children)==0:
return [[node]]
else:
return [[node] + get_all_paths(child) for child in node.children]
but I end-up with nested lists, which is not what I want:
[[Node_0001,
[Node_0002, [Node_0004]],
[Node_0002, [Node_0005, [Node_0007]]]],
[Node_0001, [Node_0003, [Node_0006]]]]
Any help would be gladly welcomed, this problem is driving me crazy :p
Thanks
I think this is what you are trying:
def get_all_paths(node):
if len(node.children) == 0:
return [[node]]
return [
[node] + path for child in node.children for path in get_all_paths(child)
]
For each child of a node, you should take all paths of the child and prepend the node itself to each path. You prepended the node to the list of paths, not every path individually.
I'm trying to traverse a tree, and get certain subtrees into a particular data structure. I think an example is the best way to explain it:
For this tree, I want the root node and it's children. Then any children that have their own children should be traversed in the same way, and so on. So for the above tree, we would end up with a data structure such as:
[
(a, [b, c]),
(c, [d, e, f]),
(f, [g, h]),
]
I have some code so far to produce this, but there's an issue that it stops too early (or that's what it seems like):
from spacy.en import English
def _subtrees(sent, root=None, subtrees=[]):
if not root:
root = sent.root
children = list(root.children)
if not children:
return subtrees
subtrees.append((root, [child for child in children]))
for child in children:
return _subtrees(sent, child, subtrees)
nlp = English()
doc = nlp('they showed us an example')
print(_subtrees(list(doc.sents)[0]))
Note that this code won't produce the same tree as in the image. I feel like a generator would be better suited here also, but my generator-fu is even worse than my recursion-fu.
Let's first sketch the recursive algorithm:
Given a tree node, return:
A tuple of the node with its children
The subtrees of each child.
That's all it takes, so let's convert it to pseudocode, ehm, python:
def subtrees(node):
if not node.children:
return []
result = [ (node.dep, list(node.children)) ]
for child in node.children:
result.extend(subtrees(child))
return result
The root is just a node, so it shouldn't need special treatment. But please fix the member references if I misunderstood the data structure.
def _subtrees(root):
subtrees=[]
queue = []
queue.append(root)
while(len(queue)=!0):
root=queue[0]
children = list(root.children)
if (children):
queue = queue + list(root.children)
subtrees.append((root.dep, [child.dep for child in children]))
queue=queue.pop(0)
return subtrees
Assuming you want to know this for using spaCy specifically, why not just:
[(word, list(word.children)) for word in sent]
The Doc object lets you iterate over all nodes in order. So you don't need to walk the tree recursively here --- just iterate.
I can't quite comment yet, but if you modify the response by #syllogism_ like so and it'll omit all nodes that haven't any children in them.
[(word, list(word.children)) for word in s if bool(list(word.children))]
So I'm trying to represent an XML file using python. I already managed to do that. But each child node in my tree has to be doubly linked and I don't know how to do that. I've found a few examples of code online but they all use classes and the professor doesn't want us using classes. Here's my code:
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
import xml.etree.ElementTree as etree
def create_tree(): #This function creates the root element of my tree
n = input("Enter your root Element: ")
root = Element(n)
tree = ElementTree(root)
print(etree.tostring(root))
return root
def add_child():
root = create_tree()
new = True
while new == True:
ask = input("If you wish to add a new child type 'yes', else type something else: ")
if ask == 'yes':
n = input("Enter the name of the node: ") #This block of code creates a child and appends it to the root element
node = Element(n)
root.append(node)
print(etree.tostring(root))
else:
break
return etree.tostring(root)
add_child()
For those of you wondering, the purpose of this project is to create a rooted tree with unbounded branching. I hope that once I can implement the doubly linked list I will be able to add a child node within a child node.
You should be able to implement a linked list using lists. You can define
each element as a list with the element itself, the item before it, and the item after it. If the list contains the previous node, the next node, and the element itself, respectively, and the head is stored in head, the syntax to append newElement to the beginning of the list would be as follows:
newNode = [None,head,newElement]
head[0] = newNode
head = newNode
I'm writing a simple function that walks through a directory tree looking for folders of a certain name. What I'm after is a match's parent path. E.g., for "C:/a/b/c/MATCH" I want "C:/a/b/c". I don't need duplicate parents or paths to subfolder matches, so if there's a "C:/a/b/c/d/e/f/MATCH", I don't need it. So, during my walk, once I have a parent, I want to iterate to the next current root.
Below is what I have so far, including a comment where I'm stuck.
def FindProjectSubfolders(masterPath, projectSubfolders):
for currRoot, dirnames, filenames in os.walk(masterPath):
# check if have a project subfolder
foundMatch = False
for dirname in dirnames:
for projectSubfolder in projectSubfolders:
if (dirname == projectSubfolder):
foundMatch = True;
break
if (foundMatch == True):
# what goes here to stop traversing "currRoot"
# and iterate to the next one?
Trim dirnames to an empty list to prevent further traversing of directories below the current one:
def FindProjectSubfolders(masterPath, projectSubfolders):
for currRoot, dirnames, filenames in os.walk(masterPath):
# check if have a project subfolder
foundMatch = False
for dirname in dirnames:
for projectSubfolder in projectSubfolders:
if (dirname == projectSubfolder):
foundMatch = True;
break
if (foundMatch == True):
# what goes here to stop traversing "currRoot"
# and iterate to the next one?
dirnames[:] = [] # replace all indices in `dirnames` with the empty list
Note that the above code alters the list dirnames refers to with a slice assignment, it does not rebind dirnames to a new list.