I have two trees in Python. I need to compare them in a customized way according to the following specifications. Suppose I have a tree for entity E1 and a tree for entity E2. I need to traverse both trees, starting from E1 and E2 and moving upwards until I get to a common root. (Please note that I have to start the traversal from node E1 on the first tree and node E2 on the second tree.) Then I need to compare the lengths of the two paths.
Can someone provide me an insight as to how to do this in Python? Can the classical tree traversal algorithms be useful here?
This is not a "traversal" (which visits every node in a tree); what you describe is merely following the parent links.
The algorithm would look as follows. One of the tricks is to interleave the two walks, so that the algorithm terminates as soon as the two paths meet. You also have to consider cases like node1 == node2. This is an O(A+B) rather than an O(A*B) algorithm, where A and B are the distances from each node to their youngest common ancestor.
def findYoungestCommonAncestor(node1, node2):
    visited = set()
    # Walk both parent chains in lockstep; the first node seen twice is the answer.
    while node1 is not None or node2 is not None:
        for node in (node1, node2):
            if node is None:
                continue
            if node in visited:
                return node
            visited.add(node)
        node1 = node1.parent if node1 is not None else None
        node2 = node2.parent if node2 is not None else None
    return None  # the nodes are in different trees
The nodes node1 and node2 may be part of a forest (a set of separate trees) and this should still work: if they belong to different trees, there is no common ancestor and the function returns None.
More elegant would be something like:
from itertools import zip_longest

def ancestors(node):
    """
    an iterator of node, node.parent, node.parent.parent...
    """
    while node is not None:
        yield node
        node = node.parent

def interleave(*iters):
    """
    interleave(range(3), range(10,16)) -> 0,10,1,11,2,12,13,14,15
    """
    ignore = object()  # sentinel marking exhausted iterators
    for group in zip_longest(*iters, fillvalue=ignore):
        for x in group:
            if x is not ignore:
                yield x

def findYoungestCommonAncestor(node1, node2):
    # implementation: find first repeated value in interleaved ancestors
    visited = set()
    for node in interleave(ancestors(node1), ancestors(node2)):
        if node in visited:
            return node
        visited.add(node)
    return None  # no common ancestor
That they are trees is not even relevant to the solution. You're looking for how long it takes for two singly linked lists (linked through the parent link) to converge into the same list.
Simply follow the links, but keep a length count for each visited node. Once you reach an already visited node, sum the previously found count and the new one. This won't work if either list ends up circular, but if they do it's not a proper tree anyway. A way to fix that case is to track separate visited dictionaries for either branch; if you reach a node visited in its own branch, you can stop traversing that branch as there's no point recounting the loop.
This all naturally assumes you can find the parent of any node. The simplest tree structures don't actually have that link.
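The counting idea above can be sketched as follows. This is only an illustration under stated assumptions: the Node class and the function name path_lengths_to_common_ancestor are hypothetical, every node is assumed to expose a parent attribute that is None at the root, and the parent links are assumed to contain no cycles. For simplicity it walks one chain completely before the other rather than interleaving them.

```python
class Node:
    """Hypothetical minimal node: only a name and a parent link."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def path_lengths_to_common_ancestor(node1, node2):
    """Return (ancestor, steps_from_node1, steps_from_node2), or None if the
    nodes share no ancestor. Assumes the parent links contain no cycles."""
    # Record how many steps each ancestor of node1 is away from node1.
    depth_from_node1 = {}
    steps = 0
    node = node1
    while node is not None:
        depth_from_node1[node] = steps
        node = node.parent
        steps += 1
    # Climb from node2 until we hit a node already seen from node1.
    steps = 0
    node = node2
    while node is not None:
        if node in depth_from_node1:
            return node, depth_from_node1[node], steps
        node = node.parent
        steps += 1
    return None


root = Node("root")
a = Node("a", root)
e1 = Node("E1", a)
e2 = Node("E2", root)
ancestor, len1, len2 = path_lengths_to_common_ancestor(e1, e2)
print(ancestor.name, len1, len2)  # root 2 1
```

The two returned counts are the path lengths the question asks to compare.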
def closest_common_ancestor(ds1, ds2):
    while ds1 is not None:
        dd = ds2
        while dd is not None:
            if ds1 == dd:
                return dd
            dd = dd.parent
        ds1 = ds1.parent
    return None
I solved an exercise where I had to apply a recursive algorithm to a tree defined as follows:
class GenericTree:
    """ A tree in which each node can have any number of children.
        Each node is linked to its parent and to its immediate sibling on the right
    """
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None
I had to concatenate the data of the leaves into their parents, and so on up to the root, which ends up holding the concatenation of all the leaves' data. I solved it this way and it works, but it seems very tortuous and mechanical:
def marvelous(self):
    """ MODIFIES each node data replacing it with the concatenation
        of its leaves data
        - MUST USE a recursive solution
        - assume node data is always a string
    """
    if not self._child:  # If there isn't any child
        self._data = self._data  # the value remains the same
    if self._child:  # If there are children
        if self._child._child:  # if there are nieces
            self._child.marvelous()  # reapply the function to them
        else:  # if no nieces
            self._data = self._child._data  # initializing the name of our root node with the name of its 1st son
            # if there are other sons, we'll add them to the root name
            if self._child._sibling:  # check
                current = self._child._sibling  # iterating through the sons-siblings line
                while current:
                    current.marvelous()  # reapplying the function to them to replace them with their concatenation (bottom-up process)
                    self._data += current._data  # we sum the sibling content to the node data
                    current = current._sibling  # next for the iteration
        # To add the new names to the new root node name:
        self._data = ""  # initializing the root str value
        current = self._child  # the children have the correct str values through recursion, so I can sum them all into the root node
        while current:
            self._data += current._data
            current = current._sibling
    if self._sibling:  # if there are siblings, they need to go through the function themselves
        self._sibling.marvelous()
Basically, I check whether the node passed in has children: if not, it keeps the same data.
If there are children, I check whether there are grandchildren: in that case I reapply the algorithm until I can sum the leaves into the pre-terminal nodes, and I store that sum in their parents' data.
Then I act on the root node with the code after the first while loop, to set its data to the sum of all the leaves.
The final piece of code makes it work for the siblings at each step.
How can I improve it?
It seems to me that your method performs a lot of redundant recursive calls.
For example, this loop in your code:
while current:
    current.marvelous()
    self._data += current._data
    current = current._sibling
is useless because the recursive call will be performed anyway by the last instruction in your method (self._sibling.marvelous()). Besides, you update self._data and then, right after the loop, you reset self._data to "".
I tried to simplify it and came up with this solution that seems to work.
def marvelous(self):
    if self._child:
        self._child.marvelous()
        # at that point we know that the data for all the trees
        # rooted in the children of self have been computed. we collect these
        self._data = ""
        current = self._child
        while current:
            self._data += current._data
            current = current._sibling
    if self._sibling:
        self._sibling.marvelous()
And here is a simpler solution:
def marvelous2(self):
    if not self._child:
        result = self._data
    else:
        result = self._child.marvelous2()
        self._data = result
    if self._sibling:
        result += self._sibling.marvelous2()
    return result
marvelous2 returns the data computed for a node and all its siblings. This avoids performing the while loop of the previous solution.
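To see the return-value version at work, here is a small self-contained sketch. It reuses the GenericTree fields from the question; the insert_child helper is hypothetical and exists only to build a test tree, since the original class's insertion method is not shown.

```python
class GenericTree:
    """Child/sibling tree as in the question; insert_child is a hypothetical helper."""
    def __init__(self, data):
        self._data = data
        self._child = None
        self._sibling = None
        self._parent = None

    def insert_child(self, node):
        """Attach node as the new leftmost child of self."""
        node._sibling = self._child
        node._parent = self
        self._child = node

    def marvelous2(self):
        """Set each node's data to the concatenation of its leaves' data.
        Returns the data of this node followed by that of its right siblings."""
        if self._child is None:
            result = self._data
        else:
            result = self._child.marvelous2()
            self._data = result
        if self._sibling is not None:
            result += self._sibling.marvelous2()
        return result


root = GenericTree("a")
b, c = GenericTree("b"), GenericTree("c")
d, e = GenericTree("d"), GenericTree("e")
# Insert right-to-left so the children end up in the order b, c and d, e.
root.insert_child(c)
root.insert_child(b)
b.insert_child(e)
b.insert_child(d)
root.marvelous2()
print(root._data, b._data, c._data)  # dec de c
```

Each node is visited exactly once, so the work is linear in the number of nodes (ignoring the cost of string concatenation).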
I am trying to understand why inserting the same node twice into a singly linked list causes an infinite loop.
I tried to insert the last node as the new head node. But when I run the code, it goes into an infinite loop, which I can see because I call a method to print the nodes at the end. Here's my code.
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insertLast(self, newNode):
        if self.head is None:
            self.head = newNode
        else:
            lastNode = self.head
            while True:
                if lastNode.next is None:
                    break
                lastNode = lastNode.next
            lastNode.next = newNode

    def insertHead(self, newNode):
        # x, y, z => new head, x, y, z
        if self.head is None:
            print("List is empty, please call insertLast()")
        else:
            currentHead = self.head
            self.head = newNode
            self.head.next = currentHead

    def printList(self):
        if self.head is None:
            print("EMPTY List. No Data found!")
            return
        else:
            currentNode = self.head
            while True:
                print(currentNode.data)
                currentNode = currentNode.next
                if currentNode is None:
                    break

node1 = Node("Head")
node2 = Node("Some Data")
node3 = Node("Some More Data")
# I am adding this node at the end of the list
newnode1 = Node("New Head")
linkedList = LinkedList()
# create a new linked list by inserting at end
linkedList.insertLast(node1)
linkedList.insertLast(node2)
linkedList.insertLast(newnode1)
# using a node i have already added in the list as New head of the list
linkedList.insertHead(newnode1)
linkedList.printList()
When you re-add an existing node to the list, you don't do anything about existing nodes that already referenced the re-added node. In your case, you now have a node at the end of the list whose next points back to the node which has just become the head, and so now your list is circular, which is going to cause any function which tries to traverse the list to loop forever.
There are three ways I can think of to solve this, and I'll save the best for last:
1. Add cycle detection logic to protect you from looping forever when your list becomes circular (i.e. break the cycles or just stop traversing once you get back to someplace you've already been). This is nontrivial and IMO not a great solution; better to keep the list from getting broken in the first place than have to constantly check to see if it needs fixing.
2. Add a removeNode function to ensure that a given node has been fully removed from the list (i.e. by scanning through the whole list to see what other nodes, if any, reference the given node, and adjusting pointers to skip over it). Then you can safely re-insert a node by first removing it. Note: if the caller passes you a Node that belongs to a DIFFERENT list, you can still end up in trouble, because you'll have no way of finding that other list! Circular structures will still be possible by building lists and then inserting the nodes of each into the other.
3. Don't allow callers of your list class to access the actual nodes at all; have them insert values (rather than nodes) so that you can ensure that they're never giving you a node with weird pointer values (a node that belongs to another list, a node that points to itself, etc). The pointers should be completely internal to your linked list class rather than something that the caller can mess up.
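A minimal sketch of the last option, assuming a value-based API. The method names insert_last and insert_head and the private _Node class are illustrative choices, not part of the original code.

```python
class LinkedList:
    """Singly linked list whose nodes are private to the class."""

    class _Node:
        def __init__(self, data):
            self.data = data
            self.next = None

    def __init__(self):
        self._head = None

    def insert_last(self, value):
        node = self._Node(value)  # always a fresh node, never one from the caller
        if self._head is None:
            self._head = node
            return
        last = self._head
        while last.next is not None:
            last = last.next
        last.next = node

    def insert_head(self, value):
        node = self._Node(value)
        node.next = self._head
        self._head = node

    def __iter__(self):
        node = self._head
        while node is not None:
            yield node.data
            node = node.next


values = LinkedList()
values.insert_last("Head")
values.insert_last("Some Data")
values.insert_last("New Head")
values.insert_head("New Head")  # same value twice, but two distinct nodes
print(list(values))  # ['New Head', 'Head', 'Some Data', 'New Head']
```

Because every insertion wraps the value in a new node, inserting the same value twice can never create a cycle.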
Briefly, the infinite loop in your code results from the fact that the linked list does not arrange the nodes physically; it just tells each node which node comes next.
Let us explain it in detail:
As shown, each node has a next attribute, which LinkedList sets to the following node.
After creating the nodes:
node1.next points to None
node2.next points to None
newnode1.next points to None
After building the linked list:
node1.next points to node2
node2.next points to newnode1
newnode1.next points to None
After inserting newnode1 again as the head, the linked list sets newnode1.next to point to node1, and the whole picture becomes:
node1.next points to node2
node2.next points to newnode1
newnode1.next points to node1
Therefore, it becomes a closed loop, and any traversal that follows next never reaches None.
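The closed loop can be confirmed without printing forever. The sketch below rebuilds the three next links by hand and checks them with Floyd's tortoise-and-hare technique; has_cycle is a hypothetical helper, not part of the original code.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def has_cycle(head):
    """Floyd's tortoise-and-hare check: True if following next links loops."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False


node1, node2, newnode1 = Node("Head"), Node("Some Data"), Node("New Head")
node1.next = node2
node2.next = newnode1
before = has_cycle(node1)
newnode1.next = node1  # what insertHead(newnode1) effectively does
after = has_cycle(newnode1)
print(before, after)  # False True
```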
Good Luck
I have this tree structure, in which each node can have any number of children:
# A Node is an object
# - value : Number
# - children : List of Nodes
class Node:
    def __init__(self, value, children):
        self.value = value
        self.children = children
I can easily sum the Nodes, recursively:
def sumNodesRec(root):
    sumOfNodes = 0
    for child in root.children:
        sumOfNodes += sumNodesRec(child)
    return root.value + sumOfNodes
Example Tree:
exampleTree = Node(1,[Node(2,[]),Node(3,[Node(4,[Node(5,[]),Node(6,[Node(7,[])])])])])
sumNodesRec(exampleTree)
> 28
However, I'm having difficulty figuring out how to sum all the nodes iteratively. Normally, with a binary tree that has 'left' and 'right' in the definition, I can find the sum. But, this definition is tripping me up a bit when thinking about it iteratively.
Any help or explanation would be great. I'm trying to make sure I'm not always doing things recursively, so I'm trying to practice creating normally recursive functions as iterative types, instead.
If we're talking iteration, this is a good use case for a queue.
total = 0
queue = [exampleTree]
while queue:
    v = queue.pop(0)
    queue.extend(v.children)
    total += v.value
print(total)
28
This is a common idiom. Iterative graph traversal algorithms also work in this manner.
You can simulate stacks and queues using Python's plain lists. A better alternative is the collections.deque structure in the standard library: its enqueue and dequeue operations are O(1) at both ends, whereas list.pop(0) is O(n) because every remaining element has to shift.
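For illustration, the same breadth-first sum with collections.deque as the queue might look like this. The function name sum_nodes_iter is made up for the example; the Node class is the one from the question.

```python
from collections import deque


class Node:
    def __init__(self, value, children):
        self.value = value
        self.children = children


def sum_nodes_iter(root):
    """Breadth-first sum of all node values using a deque as the queue."""
    total = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()  # O(1), unlike list.pop(0)
        total += node.value
        queue.extend(node.children)
    return total


example_tree = Node(1, [Node(2, []), Node(3, [Node(4, [Node(5, []), Node(6, [Node(7, [])])])])])
print(sum_nodes_iter(example_tree))  # 28
```

Replacing popleft() with pop() turns the queue into a stack and the traversal into depth-first order; the sum is the same either way.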
Iteratively you can create a list, stack, queue, or other structure that can hold the items you run through. Put the root into it. Start going through the list, take an element and add its children into the list also. Add the value to the sum. Take next element and repeat. This way there’s no recursion but performance and memory usage may be worse.
In response to the first answer:
def sumNodes(root):
    current = [root]
    nodeList = []
    while current:
        next_level = []
        for n in current:
            nodeList.append(n.value)
            next_level.extend(n.children)
        current = next_level
    return sum(nodeList)
Thank you! That explanation helped me think through it more clearly.
I have a tree of Python objects. The tree is defined intrinsically: each object has a list (potentially empty) of children.
I would like to be able to print a list of all paths from the root to each leaf.
In the case of the tree above, this would mean:
result = [
[Node_0001,Node_0002,Node_0004],
[Node_0001,Node_0002,Node_0005,Node_0007],
[Node_0001,Node_0003,Node_0006],
]
The nodes must be treated as objects and not as integers (only their integer ID is displayed).
I don't care about the order of branches in the result. Each node has an arbitrary number of children, and the level of recursion is not fixed either.
I am trying a recursive approach:
def get_all_paths(node):
    if len(node.children)==0:
        return [[node]]
    else:
        return [[node] + get_all_paths(child) for child in node.children]
but I end-up with nested lists, which is not what I want:
[[Node_0001,
[Node_0002, [Node_0004]],
[Node_0002, [Node_0005, [Node_0007]]]],
[Node_0001, [Node_0003, [Node_0006]]]]
Any help would be gladly welcomed, this problem is driving me crazy :p
Thanks
I think this is what you are trying:
def get_all_paths(node):
    if len(node.children) == 0:
        return [[node]]
    return [
        [node] + path for child in node.children for path in get_all_paths(child)
    ]
For each child of a node, you should take all paths of the child and prepend the node itself to each path. You prepended the node to the list of paths, not every path individually.
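As a quick check, here is the same function run on a tree shaped like the one in the question. The Node class with a node_id field is a hypothetical stand-in, since the real node type is not shown.

```python
class Node:
    """Hypothetical node type: an integer id and a list of children."""
    def __init__(self, node_id, children=None):
        self.node_id = node_id
        self.children = children or []


def get_all_paths(node):
    if not node.children:
        return [[node]]
    return [[node] + path
            for child in node.children
            for path in get_all_paths(child)]


tree = Node(1, [
    Node(2, [Node(4), Node(5, [Node(7)])]),
    Node(3, [Node(6)]),
])
# The paths hold Node objects; only the ids are extracted here for display.
paths = [[n.node_id for n in path] for path in get_all_paths(tree)]
print(paths)  # [[1, 2, 4], [1, 2, 5, 7], [1, 3, 6]]
```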
I'm trying to traverse a tree, and get certain subtrees into a particular data structure. I think an example is the best way to explain it:
For this tree, I want the root node and its children. Then any children that have their own children should be traversed in the same way, and so on. So for the above tree, we would end up with a data structure such as:
[
(a, [b, c]),
(c, [d, e, f]),
(f, [g, h]),
]
I have some code so far to produce this, but there's an issue that it stops too early (or that's what it seems like):
from spacy.en import English

def _subtrees(sent, root=None, subtrees=[]):
    if not root:
        root = sent.root
    children = list(root.children)
    if not children:
        return subtrees
    subtrees.append((root, [child for child in children]))
    for child in children:
        return _subtrees(sent, child, subtrees)

nlp = English()
doc = nlp('they showed us an example')
print(_subtrees(list(doc.sents)[0]))
Note that this code won't produce the same tree as in the image. I feel like a generator would be better suited here also, but my generator-fu is even worse than my recursion-fu.
Let's first sketch the recursive algorithm:
Given a tree node, return:
1. A tuple of the node with its children
2. The subtrees of each child.
That's all it takes, so let's convert it to pseudocode, ehm, python:
def subtrees(node):
    children = list(node.children)  # materialize once; children may be a generator
    if not children:
        return []
    result = [(node.dep, children)]
    for child in children:
        result.extend(subtrees(child))
    return result
The root is just a node, so it shouldn't need special treatment. But please fix the member references if I misunderstood the data structure.
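Since the question mentions a generator, here is a sketch of the same recursion written as one. It runs on a hypothetical stand-in Node class rather than on spaCy tokens, so the names Node and iter_subtrees are assumptions made for the example.

```python
class Node:
    """Hypothetical stand-in for a parse-tree token: a name plus children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __repr__(self):
        return self.name


def iter_subtrees(node):
    """Yield (node, children) for every node that has children, parents first."""
    children = list(node.children)
    if not children:
        return
    yield node, children
    for child in children:
        yield from iter_subtrees(child)


g, h = Node("g"), Node("h")
f = Node("f", [g, h])
c = Node("c", [Node("d"), Node("e"), f])
a = Node("a", [Node("b"), c])
print(list(iter_subtrees(a)))
# [(a, [b, c]), (c, [d, e, f]), (f, [g, h])]
```

Unlike the code in the question, this does not return from inside the for loop, so every child is visited, and it has no mutable default argument shared between calls.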
def _subtrees(root):
    subtrees = []
    queue = [root]
    while queue:
        node = queue.pop(0)  # take the next node in breadth-first order
        children = list(node.children)
        if children:
            queue.extend(children)
            subtrees.append((node.dep, [child.dep for child in children]))
    return subtrees
Assuming you want to know this for using spaCy specifically, why not just:
[(word, list(word.children)) for word in sent]
The Doc object lets you iterate over all nodes in order. So you don't need to walk the tree recursively here --- just iterate.
I can't quite comment yet, but if you modify the response by @syllogism_ like so, it will omit all nodes that have no children.
[(word, list(word.children)) for word in sent if list(word.children)]