List of branches of a Python tree - python

I have a tree of Python objects. The tree is defined intrinsically: each object has a list (potentially empty) of children.
I would like to be able to print a list of all paths from the root to each leaf.
In the case of the tree above, this would mean:
result = [
[Node_0001,Node_0002,Node_0004],
[Node_0001,Node_0002,Node_0005,Node_0007],
[Node_0001,Node_0003,Node_0006],
]
The nodes must be treated as objects and not as integers (only their integer ID is displayed).
I don't care about the order of branches in the result. Each node has an arbitrary number of children, and the level of recursion is not fixed either.
I am trying a recursive approach:
def get_all_paths(node):
    if len(node.children)==0:
        return [[node]]
    else:
        return [[node] + get_all_paths(child) for child in node.children]
but I end up with nested lists, which is not what I want:
[[Node_0001,
[Node_0002, [Node_0004]],
[Node_0002, [Node_0005, [Node_0007]]]],
[Node_0001, [Node_0003, [Node_0006]]]]
Any help would be gladly welcomed, this problem is driving me crazy :p
Thanks

I think this is what you are trying:
def get_all_paths(node):
    if len(node.children) == 0:
        return [[node]]
    return [
        [node] + path for child in node.children for path in get_all_paths(child)
    ]
For each child of a node, you should take all paths of the child and prepend the node itself to each path. You prepended the node to the list of paths, not every path individually.
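To check this against the tree from the question, here is a minimal sketch. The `Node` class with `name` and `children` is a hypothetical stand-in for your own node type, not something from the question.

```python
class Node:
    """Hypothetical stand-in for the question's node type."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def __repr__(self):
        return f"Node_{self.name:04d}"


def get_all_paths(node):
    # A leaf contributes exactly one path: itself.
    if not node.children:
        return [[node]]
    # Otherwise prepend this node to every path found below each child.
    return [
        [node] + path
        for child in node.children
        for path in get_all_paths(child)
    ]


# Tree from the question: 1 -> (2 -> (4, 5 -> 7), 3 -> 6)
n7 = Node(7)
n6 = Node(6)
n5 = Node(5, [n7])
n4 = Node(4)
n3 = Node(3, [n6])
n2 = Node(2, [n4, n5])
n1 = Node(1, [n2, n3])

paths = get_all_paths(n1)
for path in paths:
    print(path)
```

Each printed line is one flat root-to-leaf path, which is the shape asked for in the question.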

Related

How can I write nested for loops recursively?

I have the following code for processing an XML file:
for el in root:
    checkChild(rootDict, el)
    for child in el:
        checkChild(rootDict, el, child)
        for grandchild in child:
            checkChild(rootDict, el, child, grandchild)
            for grandgrandchild in grandchild:
                checkChild(rootDict, el, child, grandchild, grandgrandchild)
                ...
            ...
As you can see, on every iteration I just call the same function with one extra parameter. Is there a way to avoid writing so many nested for loops that basically do the same thing?
Any help would be appreciated. Thank you.
Whatever operation you wish to perform on files and directories, you can traverse them. In Python, the easiest way I know is:
#!/usr/bin/env python
import os
# Set the directory you want to start from
root_dir = '.'
for dir_name, subdirList, file_list in os.walk(root_dir):
    print(f'Found directory: {dir_name}')
    for file_name in file_list:
        print(f'\t{file_name}')
While traversing, you can add them to groups or perform other operations.
Assuming that root comes from an ElementTree parsing, you can build a data structure containing the list of all the ancestors of each node, and then iterate over it to call checkChild:
import xml.etree.ElementTree as ET
def checkChild(*element_chain):
    # Code placeholder
    print("Checking %s" % '.'.join(t.tag for t in reversed(element_chain)))
tree = ET.fromstring(xml) # xml is the XML source string
# Build a dict containing each node and its ancestors
nodes_and_parents = {}
for elem in tree.iter(): # tree.iter yields every element in the XML, not only the root's children
    for child in elem:
        nodes_and_parents[child] = [elem, ] + nodes_and_parents.get(elem, [])
for t, parents in nodes_and_parents.items():
    checkChild(t, *parents)
def recurse(tree):
    """Walks a tree depth-first and yields the path at every step."""
    # We convert the tree to a list of paths through it,
    # with the most recently visited path last. This is the stack.
    def explore(stack):
        try:
            # Popping from the stack means reading the most recently
            # discovered but yet unexplored path in the tree. We yield it
            # so you can call your method on it.
            path = stack.pop()
        except IndexError:
            # The stack is empty. We're done.
            return
        yield path
        # Then we expand this path further, adding all extended paths to the
        # stack. In reversed order so the first child element will end up at
        # the end, and thus will be yielded first.
        stack.extend(path + (elm,) for elm in reversed(path[-1]))
        # Continue with whatever is now on top of the stack.
        yield from explore(stack)
    yield from explore([(tree,)])
# The linear structure yields tuples (root, child, ...)
linear = recurse(root)
# Then call checkChild(rootDict, child, ...)
next(linear) # skip checkChild(rootDict)
for path in linear:
    checkChild(rootDict, *path[1:])
For your understanding, suppose the root looked something like this:
root
    child1
        sub1
        sub2
    child2
        sub3
            subsub1
        sub4
    child3
That is like a tree. We can find a few paths through this tree, e.g. (root, child1). And as you feed these paths to checkChild this would result in a call checkChild(rootNode, child1). Eventually checkChild will be called exactly once for every path in the tree. We can thus write the tree as a list of paths like so:
[(root,),
(root, child1),
(root, child1, sub1),
(root, child1, sub2),
(root, child2),
(root, child2, sub3),
(root, child2, sub3, subsub1),
(root, child2, sub4),
(root, child3)]
The order of paths in this list happens to match your loop structure. It is called depth-first. (Another sort order, breadth-first, would first list all child nodes, then all sub nodes and finally all subsub nodes.)
The list above is the same as the stack variable in the code, with a small change that stack only stores the minimal number of paths it needs to remember.
To conclude, recurse yields those paths one-by-one and the last bit of code invokes the checkChild method as you do in your question.
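For comparison, the same depth-first order can also be produced with a plainly recursive generator. This is only a sketch: `walk` is a hypothetical helper name, and the sample XML is made up to mirror the tree drawn above. In the question's setting you would call `checkChild(rootDict, *chain)` inside the final loop instead of printing.

```python
import xml.etree.ElementTree as ET


def walk(element, ancestors=()):
    """Yield the chain (el, child, grandchild, ...) for every descendant."""
    for child in element:
        chain = ancestors + (child,)
        yield chain                      # one call site per nesting level
        yield from walk(child, chain)    # replaces the next nested for loop


# Hypothetical sample document mirroring the tree sketched above.
xml_text = (
    "<root>"
    "<child1><sub1/><sub2/></child1>"
    "<child2><sub3><subsub1/></sub3><sub4/></child2>"
    "<child3/>"
    "</root>"
)
root = ET.fromstring(xml_text)

chains = [tuple(el.tag for el in chain) for chain in walk(root)]
for chain in chains:
    print(chain)
```

The recursion depth here equals the nesting depth of the XML, not the number of elements, so it stays shallow for typical documents.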

Find the smallest item in a singly-linked List and move to the head?

I had a quiz recently and this is what the question looked like:-
You may use the following Node class:
class Node:
    """Lightweight, nonpublic class for storing a singly linked node."""
    __slots__ = 'element', 'next' # streamline memory usage
    def __init__(self, element, next): # initialize node's fields
        self.element = element # reference to user's element
        self.next = next # reference to next node
Assume you have a singly-linked list of unique integers. Write a Python method that traverses this list to find the smallest element, removes the node that contains that value, and inserts the smallest value in a new node at the front of the list. Finally, return the head pointer. For simplicity, you may assume that the node containing the smallest value is not already at the head of the list (ie, you will never have to remove the head node and re-add it again).
Your method will be passed the head of the list as a parameter (of type Node), as in the following method signature:
def moveSmallest(head):
You may use only the Node class; no other methods (like size(), etc) are available. Furthermore, the only pointer you have is head (passed in as a parameter); you do not have access to a tail pointer.
For example, if the list contains:
5 → 2 → 1 → 3
the resulting list will contain:
1 → 5 → 2 → 3
Hint 1: There are several parts to this question; break the problem down and think about how to do each part separately.
Hint 2: If you need to exit from a loop early, you can use the break command.
Hint 3: For an empty list or a list with only one element, there is nothing to do!
My answer:
def moveSmallest(h):
    if h==None or h.next==None:
        return h
    # find part
    temp=h
    myList=[]
    while temp!=None:
        myList.append(temp.element)
        temp=temp.next
    myList.sort()
    sv=myList[0]
    # remove part
    if h.element==sv and h.next!=None:
        h=h.next
    else:
        start=h
        while start!=None:
            if start.next.element==sv and start.next.next!=None:
                start.next=start.next.next
                break
            if start.next.element==sv and start.next.next==None:
                start.next=None
                break
            start=start.next
    # Insert part
    newNode=Node(sv)
    newNode.next=h
    h=newNode
    return h
Mark received=10/30
Feedback on my answer:
"Not supposed to use sorting; searching the list should be the way we've covered in class.
You're advancing too far ahead in the list without checking whether nodes exist.
Review the 'singly-linked list' slides and answer this question as the examples suggest."
As you can see, I find the element in the list, remove it, and then add it back to the list as the head node. I ran this code and it works fine. The feedback says "You're advancing too far ahead in the list without checking whether nodes exist.", which I think is taken care of by the first if statement in my answer. As for "Not supposed to use sorting; searching the list should be the way we've covered in class.", I believe my mistake was to use a list in the first place, but given the code, the final score should be at least 20/30. Can you please check this or give your opinion on this feedback?
As you can see in the feedback he says "You're advancing too far ahead
in the list without checking whether nodes exist." which is taken care
by the first if statement in my answer
It's not taken care of. The first if-statement of your function just checks to see if the head exists and that the node following the head exists as well (basically, asserting that you have at least two nodes in the linked list).
What you have:
if h.element==sv and h.next!=None:
    h=h.next
else:
    start=h
    while start!=None:
        if start.next.element==sv and start.next.next!=None:
If you enter the while loop, you only know the following things:
Your linked list has at least two elements
h.element != sv or h.next == None
The current node is not None
The node following the current node (start.next), however, may be None at some point - when you reach the end of your linked list. Therefore, you are trying to access a node that doesn't exist.
Here's how I would have done it. I haven't tested this, but I'm pretty sure this works:
def moveSmallest(head):
    if head is None or head.next is None:
        return head
    # First, determine the smallest element.
    # To do this, we need to visit each node once (except the head).
    # Create a cursor that "iterates" through the nodes
    # (The cursor can start at the 2nd element because we're guaranteed the head will never have the smallest element.)
    cursor = head.next
    current_minimum = head.next.element
    while cursor is not None:
        if current_minimum > cursor.element:
            # We've found the new minimum.
            current_minimum = cursor.element
        cursor = cursor.next
    # At this point, current_minimum is the smallest element.
    # Next, go through the linked list again until right before we reach the node with the smallest element.
    cursor = head
    while cursor.next is not None:
        if cursor.next.element == current_minimum:
            # We want to reconnect the arrows.
            if cursor.next.next is None:
                cursor.next = None
            else:
                cursor.next = cursor.next.next
            break
        # Not found yet: advance, otherwise this loop never terminates.
        cursor = cursor.next
    new_node = Node(current_minimum, head)
    head = new_node
    return head
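The two passes can also be folded into one by remembering the node in front of the smallest one seen so far. The following is a sketch only: `move_smallest` and `to_list` are hypothetical names, and the `Node` class is repeated so the example runs on its own.

```python
class Node:
    """Same shape as the Node class given in the quiz."""
    __slots__ = 'element', 'next'

    def __init__(self, element, next):
        self.element = element
        self.next = next


def move_smallest(head):
    """Single-pass variant (hypothetical name, not from the quiz)."""
    if head is None or head.next is None:
        return head
    smallest = head         # node holding the smallest value seen so far
    before_smallest = None  # its predecessor (None while smallest is the head)
    prev = head
    # prev.next is only dereferenced after checking that it exists, which
    # avoids the "advancing too far" problem from the feedback.
    while prev.next is not None:
        if prev.next.element < smallest.element:
            before_smallest = prev
            smallest = prev.next
        prev = prev.next
    if before_smallest is None:
        return head  # smallest is already at the head
    before_smallest.next = smallest.next  # unlink the smallest node
    return Node(smallest.element, head)   # new node at the front


def to_list(head):
    """Hypothetical helper: collect the elements for printing."""
    values = []
    while head is not None:
        values.append(head.element)
        head = head.next
    return values


head = Node(5, Node(2, Node(1, Node(3, None))))
result = to_list(move_smallest(head))
print(result)  # [1, 5, 2, 3]
```

Unlike the quiz statement, this sketch also tolerates the smallest value already being at the head, in which case the list is returned unchanged.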

Sum of all Nodes Iteratively - Not Recursively - Without 'left' and 'right'

I have this Binary Tree Structure:
# A Node is an object
# - value : Number
# - children : List of Nodes
class Node:
    def __init__(self, value, children):
        self.value = value
        self.children = children
I can easily sum the Nodes, recursively:
def sumNodesRec(root):
    sumOfNodes = 0
    for child in root.children:
        sumOfNodes += sumNodesRec(child)
    return root.value + sumOfNodes
Example Tree:
exampleTree = Node(1,[Node(2,[]),Node(3,[Node(4,[Node(5,[]),Node(6,[Node(7,[])])])])])
sumNodesRec(exampleTree)
> 28
However, I'm having difficulty figuring out how to sum all the nodes iteratively. Normally, with a binary tree that has 'left' and 'right' in the definition, I can find the sum. But, this definition is tripping me up a bit when thinking about it iteratively.
Any help or explanation would be great. I'm trying to make sure I'm not always doing things recursively, so I'm trying to practice creating normally recursive functions as iterative types, instead.
If we're talking iteration, this is a good use case for a queue.
total = 0
queue = [exampleTree]
while queue:
    v = queue.pop(0)
    queue.extend(v.children)
    total += v.value
print(total)
28
This is a common idiom. Iterative graph traversal algorithms also work in this manner.
You can simulate stacks/queues using Python's vanilla lists. A better alternative is the collections.deque structure in the standard library: its enqueue/dequeue operations are more efficient than a vanilla list's, where pop(0) has to shift every remaining element.
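As a sketch of the deque variant mentioned above (the function name `sum_nodes` is made up for illustration; the `Node` class is the one from the question):

```python
from collections import deque


class Node:
    def __init__(self, value, children):
        self.value = value
        self.children = children


def sum_nodes(root):
    """Breadth-first sum using a deque as the queue."""
    total = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()        # O(1), unlike list.pop(0)
        total += node.value
        queue.extend(node.children)   # enqueue the children for later
    return total


example_tree = Node(1, [Node(2, []), Node(3, [Node(4, [Node(5, []), Node(6, [Node(7, [])])])])])
print(sum_nodes(example_tree))  # 28
```

Swapping `popleft()` for `pop()` turns the queue into a stack and the traversal into depth-first; the sum is the same either way.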
Iteratively you can create a list, stack, queue, or other structure that can hold the items you run through. Put the root into it. Start going through the list, take an element and add its children into the list also. Add the value to the sum. Take next element and repeat. This way there’s no recursion but performance and memory usage may be worse.
In response to the first answer:
def sumNodes(root):
    current = [root]
    nodeList = []
    while current:
        next_level = []
        for n in current:
            nodeList.append(n.value)
            next_level.extend(n.children)
        current = next_level
    return sum(nodeList)
Thank you! That explanation helped me think through it more clearly.

Tree traversal and getting neighbouring child nodes in Python

I'm trying to traverse a tree, and get certain subtrees into a particular data structure. I think an example is the best way to explain it:
For this tree, I want the root node and its children. Then any children that have their own children should be traversed in the same way, and so on. So for the above tree, we would end up with a data structure such as:
[
(a, [b, c]),
(c, [d, e, f]),
(f, [g, h]),
]
I have some code so far to produce this, but there's an issue that it stops too early (or that's what it seems like):
from spacy.en import English
def _subtrees(sent, root=None, subtrees=[]):
    if not root:
        root = sent.root
    children = list(root.children)
    if not children:
        return subtrees
    subtrees.append((root, [child for child in children]))
    for child in children:
        return _subtrees(sent, child, subtrees)
nlp = English()
doc = nlp('they showed us an example')
print(_subtrees(list(doc.sents)[0]))
Note that this code won't produce the same tree as in the image. I feel like a generator would be better suited here also, but my generator-fu is even worse than my recursion-fu.
Let's first sketch the recursive algorithm:
Given a tree node, return:
A tuple of the node with its children
The subtrees of each child.
That's all it takes, so let's convert it to pseudocode, ehm, python:
def subtrees(node):
    if not node.children:
        return []
    result = [ (node.dep, list(node.children)) ]
    for child in node.children:
        result.extend(subtrees(child))
    return result
The root is just a node, so it shouldn't need special treatment. But please fix the member references if I misunderstood the data structure.
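Since the question mentions generators, here is a sketch of the same recursion written as one. It uses a hypothetical `Node` class with `name` and `children` rather than spaCy tokens, and it materialises `children` with `list()` so it would also cope with a `children` attribute that is itself a generator.

```python
class Node:
    """Hypothetical tree node used only for this illustration."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __repr__(self):
        return self.name


def iter_subtrees(node):
    """Yield (node, children) for every node that has children, depth-first."""
    children = list(node.children)
    if not children:
        return  # leaves produce nothing
    yield node, children
    for child in children:
        yield from iter_subtrees(child)


# Tree from the question: a -> (b, c), c -> (d, e, f), f -> (g, h)
f = Node("f", [Node("g"), Node("h")])
c = Node("c", [Node("d"), Node("e"), f])
a = Node("a", [Node("b"), c])

result = list(iter_subtrees(a))
print(result)
```

The key difference from the question's code is `yield from` inside the loop: a bare `return` in the loop body stops after the first child, which is why the original appeared to stop too early.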
def _subtrees(root):
    subtrees = []
    queue = [root]
    while queue:
        # Take the next node off the front of the queue.
        node = queue.pop(0)
        children = list(node.children)
        if children:
            queue.extend(children)
            subtrees.append((node.dep, [child.dep for child in children]))
    return subtrees
Assuming you want to know this for using spaCy specifically, why not just:
[(word, list(word.children)) for word in sent]
The Doc object lets you iterate over all nodes in order. So you don't need to walk the tree recursively here --- just iterate.
I can't quite comment yet, but if you modify the response by @syllogism_ like so, it'll omit all nodes that don't have any children:
[(word, list(word.children)) for word in sent if list(word.children)]

Tree traversal in a customised way in Python?

I have two trees in python. I need to compare them in a customized way according to the following specifications. Suppose I have a tree for entity E1 and a tree for entity E2. I need to traverse both the trees starting from E1 and E2 and moving upwards till I get to a common root. (Please note that I have to start the traversal from node E1 on the first tree and node E2 on the second tree.) Then I need to compare the count of the lengths of both their paths.
Can someone provide me an insight as to how to do this in Python? Can the classical tree traversal algorithms be useful here?
This is not a "traversal" (which visits each node in a tree); what you describe is merely following the parents.
The algorithm would look as follows. One of the tricks is to interleave calculations, so that the algorithm is guaranteed to terminate quickly. You also have to consider cases like node1==node2. This is also an O(A+B=N) rather than O(A*B=N^2) algorithm, where we consider the distance between the nodes and their youngest common ancestor.
def findYoungestCommonAncestor(node1, node2):
    visited = set()
    if node1==node2:
        return node1
    while True:
        if node1 in visited:
            return node1
        if node2 in visited:
            return node2
        if not node1.parent and not node2.parent:
            # Both cursors are at a root: it is shared only if it is the same node.
            return node1 if node1==node2 else None
        if node1.parent:
            visited.add(node1)
            node1 = node1.parent
        if node2.parent:
            visited.add(node2)
            node2 = node2.parent
The nodes node1 and node2 may be part of a forest (a set of interlinked trees) and this should still work.
More elegant would be something like:
from itertools import zip_longest
def ancestors(node):
    """
    an iterator of node, node.parent, node.parent.parent...
    """
    yield node
    while node.parent:
        # Step up first, so each ancestor (including the root) is yielded exactly once.
        node = node.parent
        yield node
def interleave(*iters):
    """
    interleave(range(3), range(10,16)) -> 0,10,1,11,2,12,13,14,15
    """
    ignore = object()
    for group in zip_longest(*iters, fillvalue=ignore):
        for x in group:
            if x is not ignore:
                yield x
def findYoungestCommonAncestor(node1, node2):
    # implementation: find first repeated value in interleaved ancestors
    visited = set()
    for node in interleave(ancestors(node1), ancestors(node2)):
        if node in visited:
            return node
        else:
            visited.add(node)
That they are trees is not even relevant to the solution. You're looking for how long it takes for two single-linked (the parent link) lists to converge into the same list.
Simply follow the links, but keep a length count for each visited node. Once you reach an already visited node, sum the previously found count and the new one. This won't work if either list ends up circular, but if they do it's not a proper tree anyway. A way to fix that case is to track separate visited dictionaries for either branch; if you reach a node visited in its own branch, you can stop traversing that branch as there's no point recounting the loop.
This all naturally assumes you can find the parent of any node. The simplest tree structures don't actually have that link.
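The counting idea can be sketched as follows. This is a simpler, non-interleaved variant: it walks all of the first node's ancestors before starting on the second. The `Node` class with a `parent` attribute and the function name are assumptions for illustration, and the sketch assumes the parent links contain no cycles.

```python
class Node:
    """Hypothetical node that only knows its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def path_lengths_to_common_ancestor(node1, node2):
    """Return (ancestor, steps_from_node1, steps_from_node2), or (None, None, None)."""
    # Record how many parent links separate node1 from each of its ancestors.
    steps_from_1 = {}
    node, steps = node1, 0
    while node is not None:
        steps_from_1[node] = steps
        node, steps = node.parent, steps + 1
    # Climb from node2 until we reach a node that node1 also reaches.
    node, steps = node2, 0
    while node is not None:
        if node in steps_from_1:
            return node, steps_from_1[node], steps
        node, steps = node.parent, steps + 1
    return None, None, None


root = Node("root")
a = Node("a", root)
e1 = Node("E1", a)
e2 = Node("E2", root)

ancestor, len1, len2 = path_lengths_to_common_ancestor(e1, e2)
print(ancestor.name, len1, len2)  # root 2 1
```

The two returned counts are the path lengths the question wants to compare; their sum is the distance between the two nodes through the common ancestor.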
def closest_common_ancestor(ds1, ds2):
    while ds1 is not None:
        dd = ds2
        while dd is not None:
            if ds1 == dd:
                return dd
            dd = dd.parent
        ds1 = ds1.parent
    return None
