Construct a tree from a list of data


Suppose I have a list of numbers like below,
a = [1,23,5,72,3,5,15,7,78,1,5,77,23]
Now I need to build a tree from these data.
Divide the dataset into two parts according to the self-defined function par. Let's call these two parts a0, a1.
Apply the function par to a0 and a1 respectively, and get a00, a01, a10, a11.
Repeat the process until there is only one number left at the end node.
For each end node, we have a "binary code" like "0010100", where 0 represents left, and 1 represents right at each specific step.
I tried to use a tree class like the one below, but I got stuck right at the start.
class Node:
    def __init__(self, data):
        self.data = data
        self.left = '*'
        self.right = '*'

It is not clear which values the internal nodes should have. It may not matter, but you could always assign the first value in the data array that belongs to that particular subtree.
The "binary code" is like a navigation path from the root to the node. So for the data you presented, we would expect something like this tree:
value:                             1
path:                ____________ "" _____________
                    /                             \
value:             1                               15
path:         ____ 0 _____                  ______ 1 _______
             /            \                /                \
value:      1              72             15                 1
path:       00             01             10              __ 11 __
           /   \          /   \          /   \           /        \
value:   1      23      72     3       15     7         1         77
path:    000    001     010    011     100    101       110       111
               /   \          /   \          /   \     /   \     /   \
value:        23   5         3    5         7    78   1    5    77   23
path:         0010 0011      0110 0111      1010 1011 1100 1101 1110 1111
You can use a simple Node class, which can store both a value and a path:
class Node:
    def __init__(self, value, path, left=None, right=None):
        self.value = value
        self.path = path
        self.left = left
        self.right = right
The par function would do something like this:
def par(data):
    i = len(data) // 2
    return (data[:i], data[i:]) if i else ()
The if..else operator is used to return an empty tuple when there is only one data element. This will be useful in the main function:
def maketree(data, path=""):
    return Node(data[0], path, *(
        maketree(part, path + str(i)) for i, part in enumerate(par(data))
    ))
This enumerates the parts that are returned by par and passes those parts to the recursive call (if any parts are returned). At the same time that recursive call gets a path string that is extended with a 0 or 1 (i.e. the index of the enumeration).
Example call:
a = [1,23,5,72,3,5,15,7,78,1,5,77,23]
root = maketree(a)
# Output the properties of one particular node:
node = root.left.left.right.left
print("value: {}, path: {}".format(node.value, node.path))
# Outputs: "value: 23, path: 0010"
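To list the binary code of every end node, as the question asks, the tree can be walked once and the path stored on each leaf collected. The following is a minimal sketch: it repeats Node, par and maketree from above so that it runs on its own, and leaf_codes is a hypothetical helper name that is not part of the original answer.

```python
class Node:
    def __init__(self, value, path, left=None, right=None):
        self.value = value
        self.path = path
        self.left = left
        self.right = right


def par(data):
    # Split in the middle; a single element cannot be split any further.
    i = len(data) // 2
    return (data[:i], data[i:]) if i else ()


def maketree(data, path=""):
    return Node(data[0], path, *(
        maketree(part, path + str(i)) for i, part in enumerate(par(data))
    ))


def leaf_codes(node):
    """Hypothetical helper: (path, value) pairs of all leaves, left to right."""
    if node.left is None and node.right is None:
        return [(node.path, node.value)]
    codes = []
    for child in (node.left, node.right):
        if child is not None:
            codes.extend(leaf_codes(child))
    return codes


a = [1, 23, 5, 72, 3, 5, 15, 7, 78, 1, 5, 77, 23]
codes = leaf_codes(maketree(a))
for path, value in codes:
    print(path, value)
```

Because par keeps the elements in order, the leaf values read from left to right reproduce the input list, one code per number.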

class Node:
    def __init__(self, data, path='0'):
        self.path = path # 0 means left, 1 means right
        self.left = self.right = None
        if len(data) == 1:
            self.value = data[0] # this assumes the leaf is a list with single data, rather than the data itself
        else:
            left, right = par(data)
            self.left = Node(left, path+'0')
            self.right = Node(right, path+'1')
The path attribute of each node gives you the binary code.

Related

Count Number of Good Nodes

problem statement
I am having trouble understanding what is wrong with my code and understanding the constraint below.
My pseudocode:
Traverse the tree Level Order and construct the array representation (input is actually given as a single root, but they use array representation to show the full tree)
iterate over this array representation, skipping null nodes
for each node, let's call it X, iterate upwards until we reach the root checking to see if at any point in the path, parentNode > nodeX, meaning, nodeX is not a good node.
increment counter if the node is good
Constraints:
The number of nodes in the binary tree is in the range [1, 10^5].
Each node's value is between [-10^4, 10^4]
First of all:
My confusion about the constraint is that the automated tests give inputs such as [2,4,4,4,null,1,3,null,null,5,null,null,null,null,5,4,4], and if we follow the rules that the children are at c1 = 2k+1 and c2 = 2k+2 and the parent is at (k-1)//2, then this means that there are nodes with value null.
Secondly:
For the input above, my code outputs 8, the expected value is 6, but when I draw the tree from the array, I also think the answer should be 8!
tree of input
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def goodNodes(self, root: TreeNode) -> int:
        arrRepresentation = []
        queue = []
        queue.append(root)
        # while queue not empty
        while queue:
            # remove node
            node = queue.pop(0)
            if node is None:
                arrRepresentation.append(None)
            else:
                arrRepresentation.append(node.val)
            if node is not None:
                # add left to queue
                queue.append(node.left)
                # add right to queue
                queue.append(node.right)
        print(arrRepresentation)
        goodNodeCounter = 1
        # iterate over array representation of binary tree
        for k in range(len(arrRepresentation)-1, 0, -1):
            child = arrRepresentation[k]
            if child is None:
                continue
            isGoodNode = self._isGoodNode(k, arrRepresentation)
            print('is good: ' + str(isGoodNode))
            if isGoodNode:
                goodNodeCounter += 1
        return goodNodeCounter
    def _isGoodNode(self, k, arrRepresentation):
        child = arrRepresentation[k]
        print('child: '+str(child))
        # calculate index of parent
        parentIndex = (k-1)//2
        isGood = True
        # if we have not reached root node
        while parentIndex >= 0:
            parent = arrRepresentation[parentIndex]
            print('parent: '+ str(parent))
            # calculate index of parent
            parentIndex = (parentIndex-1)//2
            if parent is None:
                continue
            if parent > child:
                isGood = False
                break
        return isGood

Recursion might be easier:
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right
def good_nodes(root, maximum=float('-inf')):
    if not root: # null-root
        return 0
    is_this_good = maximum <= root.val # is this root a good node?
    maximum = max(maximum, root.val) # update max
    good_from_left = good_nodes(root.left, maximum) if root.left else 0
    good_from_right = good_nodes(root.right, maximum) if root.right else 0
    return is_this_good + good_from_left + good_from_right
tree = Node(2, Node(4, Node(4)), Node(4, Node(1, Node(5, None, Node(5, Node(4), Node(4)))), Node(3)))
print(good_nodes(tree)) # 6
Basically, the recursion traverses the tree while carrying along the maximum value seen so far on the current path. At each call, the value of the node is compared with that maximum, and the count goes up by one if the node is good.
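The hand-built tree in the example call can also be derived from the test input itself. The sketch below assumes the input list is a level-order serialization in which a null entry gets no child slots of its own (as far as I know this is how LeetCode prints trees), rather than a heap layout with children at 2k+1 and 2k+2. build_level_order is a hypothetical helper name.

```python
from collections import deque


class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def build_level_order(values):
    """Hypothetical helper: build a tree from a level-order list.

    Assumption: None marks a missing child, and missing nodes do not
    reserve slots for their own children.
    """
    if not values or values[0] is None:
        return None
    it = iter(values)
    root = Node(next(it))
    queue = deque([root])
    for left_val in it:
        parent = queue.popleft()
        right_val = next(it, None)
        if left_val is not None:
            parent.left = Node(left_val)
            queue.append(parent.left)
        if right_val is not None:
            parent.right = Node(right_val)
            queue.append(parent.right)
    return root


def good_nodes(root, maximum=float('-inf')):
    if root is None:
        return 0
    is_good = root.val >= maximum
    maximum = max(maximum, root.val)
    return is_good + good_nodes(root.left, maximum) + good_nodes(root.right, maximum)


data = [2, 4, 4, 4, None, 1, 3, None, None, 5, None, None, None, None, 5, 4, 4]
root = build_level_order(data)
print(good_nodes(root))
```

Under that reading every value in the list is part of the tree, and the count agrees with the expected result of 6.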

Since you wanted to solve with breadth first search:
from collections import deque
class Solution:
    def goodNodes(self, root: TreeNode) -> int:
        if not root:
            return 0
        queue = deque()
        # run BFS, tracking for each node the max value among its ancestors
        queue.append((root, float('-inf')))
        res = 0
        while queue:
            current, max_val = queue.popleft()
            if current.val >= max_val:
                res += 1
            if current.left:
                queue.append((current.left, max(max_val, current.val)))
            if current.right:
                queue.append((current.right, max(max_val, current.val)))
        return res
Each queue entry holds a node together with the maximum value seen on the path from the root down to its parent. I could not use a single global max_value; to see why, look at this tree:
For the first 3 nodes, you would have [3,1,4], and if you kept max_val globally, it would be 4 at that point.
The next node is 3, the leaf node on the left. Comparing it against the global maximum gives 3 < 4, so 3 would wrongly be rejected as a good node. So instead, I keep a separate max_val for each node that covers only its own ancestors.
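The difference can be checked on a small tree. The sketch below assumes a tree with root 3, children 1 and 4, and a leaf 3 under the 1, which matches the description above but is my own reconstruction. count_good_global_max is a deliberately wrong variant written only for comparison.

```python
from collections import deque


class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def count_good_per_path(root):
    """BFS where each queue entry carries the max value of the node's ancestors."""
    res = 0
    queue = deque([(root, float('-inf'))])
    while queue:
        node, max_val = queue.popleft()
        if node.val >= max_val:
            res += 1
        for child in (node.left, node.right):
            if child:
                queue.append((child, max(max_val, node.val)))
    return res


def count_good_global_max(root):
    """Deliberately wrong: compares against one maximum shared by all paths."""
    res = 0
    max_val = float('-inf')
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.val >= max_val:
            res += 1
        max_val = max(max_val, node.val)
        for child in (node.left, node.right):
            if child:
                queue.append(child)
    return res


# Assumed example tree: 3 -> (1 -> (3, -), 4)
tree = TreeNode(3, TreeNode(1, TreeNode(3)), TreeNode(4))
print(count_good_per_path(tree), count_good_global_max(tree))
```

The per-path version counts the leaf 3 as good because its only ancestors are 3 and 1, while the global version has already seen the 4 from the other branch.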

The binary heap you provided corresponds to the following hierarchy:
tree = [2,4,4,4,None,1,3,None,None,5,None,None,None,None,5,4,4]
printHeapTree(tree)
      2
     / \
    4   4
   /   / \
  4   1   3
           \
            5
In that tree, only item value 1 has an ancestor that is greater than itself. The 6 other nodes are good, because they have no ancestor that are greater than themselves (counting the root as good).
Note that there are values in the list that are unreachable because their parent is null (None) so they are not part of the tree (this could be a copy/paste mistake though). If we replace these None values by something else to make them part of the tree, we can see where the unreachable nodes are located in the hierarchy:
t = [2,4,4,4,'*', 1,3,'*',None, 5,None, None,None,None,5,4,4]
printHeapTree(t)
             2
          __/ \_
         4      4
        / \    / \
       4   *  1   3
      /   /        \
     *   5          5
    / \
   4   4
This is likely where the difference between a result of 8 (not counting root as good) vs 6 (counting root as good) comes from.
You can find the printHeapTree() function here.
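The unreachable entries can also be found without drawing the tree. This is a small sketch under the same heap-layout assumption as above (parent of index k at (k-1)//2); unreachable_entries is a hypothetical helper name.

```python
def unreachable_entries(heap):
    """Hypothetical helper: (index, value) of non-null entries cut off from the root.

    Assumes a heap layout: the parent of index k is at (k - 1) // 2.
    """
    reachable = [False] * len(heap)
    for k, value in enumerate(heap):
        if value is None:
            continue
        # An entry is reachable if it is the root or hangs off a reachable parent.
        reachable[k] = k == 0 or reachable[(k - 1) // 2]
    return [(k, v) for k, v in enumerate(heap)
            if v is not None and not reachable[k]]


tree = [2, 4, 4, 4, None, 1, 3, None, None, 5, None, None, None, None, 5, 4, 4]
print(unreachable_entries(tree))
```

For this list the entries at indices 9, 15 and 16 hang below null parents at indices 4 and 7, the same positions marked with '*' in the second diagram.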

Diameter of binary tree fails 4 out of 104 test cases

I am working on Leet Code problem 543. Diameter of Binary Tree:
Given the root of a binary tree, return the length of the diameter of the tree.
The diameter of a binary tree is the length of the longest path between any two nodes in a tree. This path may or may not pass through the root.
The length of a path between two nodes is represented by the number of edges between them.
Example 1
Input: root = [1,2,3,4,5]
Output: 3
Explanation: 3 is the length of the path [4,2,1,3] or [5,2,1,3].
This is my attempt:
def diameterOfBinaryTree(self, root):
    return self.getHeight(root.left) + self.getHeight(root.right)
def getHeight(self, root):
    if not root:
        return 0
    return max(self.getHeight(root.left), self.getHeight(root.right)) + 1
I got 100/104 test cases passed.
The test case that I got wrong had an input of [4,-7,-3,null,null,-9,-3,9,-7,-4,null,6,null,-6,-6,null,null,0,6,5,null,9,null,null,-1,-4,null,null,null,-2] with an expected result of 8. However, following the logic of my solution, I got 7 instead, and I have no idea where I went wrong.

The only test case that I got wrong had an input of [4,-7,-3,null,null,-9,-3,9,-7,-4,null,6,null,-6,-6,null,null,0,6,5,null,9,null,null,-1,-4,null,null,null,-2] ... have no idea how I could be wrong.
The provided tree looks like this:
            ___4___
           /       \
         -7      ___-3_____
                /          \
           ___-9___         -3
          /        \       /
         9         -7    -4
        /          / \
     __6__       -6   -6
    /     \      /    /
   0       6    5    9
    \     /         /
    -1   -4       -2
The expected diameter is indeed 8: that is the length of the longest path within the subtree rooted at -9. Your code does not find that path because it requires the root to be part of it.
So, you should check what the longest path is for any subtree (recursively) and retain the longest among these options:
The longest path found in the left subtree
The longest path found in the right subtree
The longest path that can be made by including the root (i.e. left-height + right-height, when heights are counted in nodes and the path length in edges)
Your code does not take the two first options into account and always goes for the third.
The above is implemented in this (hidden) solution. First try it yourself:
class Solution(object):
    def diameterOfBinaryTree(self, root):
        self.longestPath = 0
        def levels(node): # With side-effect: it updates longestPath
            if not node:
                return 0
            leftLevels = levels(node.left)
            rightLevels = levels(node.right)
            self.longestPath = max(self.longestPath, leftLevels + rightLevels)
            return 1 + max(leftLevels, rightLevels)
        levels(root) # Not interested in the returned value...
        return self.longestPath # ...but all the more in the side effect!

Your code assumes that the diameter of the binary tree always goes through the root, which is not the case. You have to consider subtrees with longer diameters. One way to do it is shown below; it is a slightly modified version of your code:
class Solution(object):
    maximum = 0
    def getHeight(self, root):
        if not root:
            return 0
        left = self.getHeight(root.left)
        right = self.getHeight(root.right)
        sub_diameter = left + right
        if(self.maximum < sub_diameter): self.maximum = sub_diameter
        return max(left, right) + 1
    def diameterOfBinaryTree(self, root):
        return max(self.getHeight(root.left) + self.getHeight(root.right), self.maximum)
Tested, it passes all testcases.
Main logic is to keep the maximum diameter value for subtrees and compare it with the diameter that goes through the root at the end.
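The same idea can be written without any shared state by letting each call return both numbers at once. This is a sketch rather than either answer's code; height_and_diameter and the second test tree are my own additions.

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def height_and_diameter(node):
    """Return (height counted in nodes, diameter counted in edges) of a subtree."""
    if node is None:
        return 0, 0
    left_h, left_d = height_and_diameter(node.left)
    right_h, right_d = height_and_diameter(node.right)
    # Best of: a path inside the left subtree, inside the right subtree,
    # or one that passes through this node.
    return 1 + max(left_h, right_h), max(left_d, right_d, left_h + right_h)


# Example 1 from the problem statement: [1,2,3,4,5]
example = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))

# A tree whose longest path avoids the root: the root has a single child,
# and that child has a two-node chain on each side.
lopsided = TreeNode(0, TreeNode(1,
                                TreeNode(2, TreeNode(3)),
                                TreeNode(4, None, TreeNode(5))))

print(height_and_diameter(example)[1], height_and_diameter(lopsided)[1])
```

On the second tree, adding the heights of the root's two subtrees gives 3, while the longest path (3 to 5 through node 1) has 4 edges.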

Longest increasing subsequence with binary search

I'm trying to implement an algorithm in Python and I need help.
Given an array of integers,
I want to build a BST and find the longest increasing subsequence.
The idea is to give each node an index (its insertion order).
Next, we take all the indices from the left subtree and put them in a stack.
Then, for each index in that stack, we check whether the tree contains an index bigger than that of the current node. If so, we push it onto the stack and update the value max, which is the number of elements in the stack.
I need help with scanning the tree and inserting the elements into a stack.
Here is my code so far:
class Node:
    def __init__(self, key, index = -1):
        self.right = None
        self.left = None
        self.key = key
        self.index = index
    def __str__(self):
        return "key: %s, index: %s" % (str(self.key), str(self.index))
def insert(root, key, value=-1):
    if root is None:
        root = Node(key, value)
    else:
        if key < root.key:
            root.left = insert(root.left, key, value)
        elif key > root.key:
            root.right = insert(root.right, key, value)
        else:
            pass
    return root
def LeftSideIndices(root):
    res = []
    if root:
        res = LeftSideIndices(root.left)
        res.append(root.index)
    return res
def InOrderWithInsert(root,A):
    newStack = []
    if root:
        for i in range(0, len(A)):
            newStack = upInOrder(root.left,A)
            if root.index > A[i]:
                newStack.append(root.key)
            newStack = newStack + upInOrder(root.right, A)
    return newStack
Example:
The right stack should be: s=[0,2,8,11]

Some general remarks:
providing an MWE (minimal working example) is appreciated.
your code does not define upInOrder, so we can't run part of it.
nitpick: in the insert function, your value parameter is passed as the index parameter of the Node constructor, so the naming is confusing.
reformulating your question to make it explicit: "given a Binary Search Tree, find the Longest Increasing Subsequence".
there is a bug in your LeftSideIndices: it only follows the chain of left children (plus the node it starts from) instead of collecting the indices of the whole left half of the tree. Example:
bst = insert(insert(insert(None, 2, 0), 0, 1), 1, 2)
#        val=2,idx=0
#         /
# val=0,idx=1
#         \
#        val=1,idx=2
print(LeftSideIndices(bst)) # [1, 0]
for reference :
print("value | " + " | ".join(str(value).ljust(2) for value in A) + " |")
print("index | " + " | ".join(str(index).ljust(2) for index in range(len(A))) + " |")
# | value | 4 | 1 | 13 | 7 | 0 | 2 | 8 | 11 | 3 |
# | index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
as said in the comments, another solution would be to not use a BST at all and search for the LIS directly in A (which uses a much more common algorithm). I would even say that using a BST makes it pointlessly hard, because the result has nothing to do with the BST structure: it is strictly defined by the insertion order, regardless of the data structure used.
There may be multiple different subsequences having the same (longest possible) length. Example:
bst = insert(insert(insert(None, 1, 0), 22, 1), 14, 2)
# val=1,idx=0
#     \
#     val=22,idx=1
#     /
# val=14,idx=2
# Expected LIS = { (1, 22), (1, 14) }
I read your algorithm explanation several times (slightly reformatted here):
take all the indices from the left tree and put them in stack
check for each index in the stack if we have in the tree index that is bigger than the current node
if yes, insert it to stack and update the value max which is the number of elements in the stack.
I am not sure I understand your algorithm, and I don't think it works.
I am not even sure there is a simple solution to this problem.
The way your indices are scattered across the tree prevents a node from knowing which solution(s) from its left and/or right subtrees are interesting: there are too many holes in the index sequence within a subtree, so the information available at the node level has too many gaps.
Usually, for tree algorithms, it is simple to apply divide and conquer, but in this case I can't even tell how the root node could recursively determine the longest subsequence, given the results of its left and right subtrees.
In your case, it looks extra hard to implement what you want, and I can't convince myself that such an algorithm actually exists. So I hope you will be able to prove me wrong.
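For comparison, the BST-free alternative mentioned in the remarks, searching for the LIS directly in the array, can be sketched with the standard library's bisect module. This swaps in the usual O(n log n) "smallest tails" technique, which is not the asker's algorithm. The array A is read from the reference table above, and longest_increasing_subsequence is a hypothetical name.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one strictly increasing subsequence of maximum length."""
    tails = []      # tails[k]: smallest last value of an increasing run of length k + 1
    tails_idx = []  # position in seq of the value stored in tails[k]
    prev = [-1] * len(seq)  # prev[i]: position of the element before seq[i] in its run
    for i, x in enumerate(seq):
        k = bisect_left(tails, x)  # bisect_left keeps the subsequence strictly increasing
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    # Walk the predecessor links back from the end of the longest run.
    result = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        result.append(seq[i])
        i = prev[i]
    return result[::-1]


A = [4, 1, 13, 7, 0, 2, 8, 11, 3]
print(longest_increasing_subsequence(A))
```

When several subsequences share the maximum length, this sketch returns only one of them; for A it happens to be the stack given in the question, [0, 2, 8, 11].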

Maximum Path Sum between 2 Leaf Nodes(GeeksForGeeks)

Given a binary tree in which each node element contains a number. Find the maximum possible sum from one leaf node to another.
Example 1:
Input :
        3
      /   \
     4     5
   /   \
 -10    4
Output: 16
Explanation :
Maximum Sum lies between leaf node 4 and 5.
4 + 4 + 3 + 5 = 16.
Example 2:
Input :
            -15
         /       \
        5         6
      /  \       / \
    -8    1     3   9
   /  \              \
  2   -3              0
                     / \
                    4  -1
                       /
                     10
Output : 27
Explanation:
The maximum possible sum from one leaf node
to another is (3 + 6 + 9 + 0 + -1 + 10 = 27)
This is the solution:
'''
# Node Class:
class Node:
    def __init__(self,val):
        self.data = val
        self.left = None
        self.right = None
'''
res = -999999999
def maxPathSumUtil(root):
    global res
    if root is None:
        return 0
    if root.left is None and root.right is None:
        return root.data
    ls=maxPathSumUtil(root.left)
    rs=maxPathSumUtil(root.right)
    if root.left and root.right:
        res=max(res,ls+rs+root.data)
        return max(ls+root.data,rs+root.data) #Line: Problem
    if root.left is None:
        return rs+root.data
    else:
        return ls+root.data
def maxPathSum(root):
    global res
    res = -999999999
    maxPathSumUtil(root)
    return res
Can anyone tell me why we use return max(ls+root.data,rs+root.data)? And if we use return max(ls+root.data,rs+root.data) to pick the maximum value, then why do we use res=max(res,ls+rs+root.data) and not just res = max(ls+root.data,rs+root.data)?
EDIT:
For example:
Let's take this tree for example:
      10
     /  \
    8    2
   / \
  3   5
In this, after recursive calls, ls becomes 3 and rs becomes 5.
res becomes ls+rs+root.data which is 3+5+8 = 16.
Then return max(ls+root.data,rs+root.data) which is max(11,13) = 13.
Now, after this, I think the function should just return 13, but that does not happen, even though return is not a recursive statement. How does the control flow of the code work?

There are two things that are measured in parallel during execution:
ls+rs+root.data is the max path in the tree rooted by root, between two of the leaves below it. So it is (the value) of a leaf-to-leaf path
The function return value is the maximum path from root to any of the leaves below it. So it is (the value) of a root-to-leaf path
These are two different concepts and should not be mixed up.
Both ls and rs are function return values: ls represents the maximum path from root.left to a leaf. And in the same way rs represents the maximum path from root.right to a leaf.
ls+rs+root.data on the other hand, represents a path from leaf to leaf passing through root.
res should be updated if that latter expression is greater than res, hence the max().
But the function's return value should not represent a leaf-to-leaf path, but a root-to-leaf path. So that is why we have:
return max(ls+root.data,rs+root.data)
This tells the caller what the maximum root-to-leaf path is, not what the maximum leaf-to-leaf path is. The latter is used for determining res, not the function's return value.
I hope this clarifies the distinction between these two concepts and the roles they play in the algorithm.
The example
You presented this tree as example:
      10
     /  \
    8    2
   / \
  3   5
Indeed, when the function is called for the node 8, it:
sets res to 16 (the max path between two leaves below the node)
returns 13 (the max path from the node to one of its leaves)
You then ask:
Now after this according to me the function should just return 13 but that does not happen.
But it does happen like that. Keep in mind, however, that this is the return value of maxPathSumUtil, not of maxPathSum. Also, this is not the top-level call of maxPathSumUtil. The value 13 is returned to another execution context of maxPathSumUtil, where root is the node 10. Then, after another recursive call is made (with root equal to node 2), this top-level execution of the function maxPathSumUtil will:
set res to 25 (the max path between two leaves below the node 10)
return 23 (the max path from the node 10 to one of its leaves)
This top-level call was made from within maxPathSum, which ignores the value returned by maxPathSumUtil.
It only takes the value of res (25), and returns that:
maxPathSumUtil(root) # notice that return value is ignored.
return res
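The two quantities can be observed side by side by returning both from a wrapper. The following is a sketch that keeps the logic of the code in the question but replaces the global with a closure variable; max_leaf_path and down are hypothetical names, and the Node constructor takes the children as extra arguments only to keep the example short.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def max_leaf_path(root):
    """Return (best node-to-leaf sum from root, best leaf-to-leaf sum in the tree)."""
    best = float('-inf')  # plays the role of res

    def down(node):
        # Returns the best sum of a path from node down to a leaf.
        nonlocal best
        if node is None:
            return 0
        if node.left is None and node.right is None:
            return node.data
        ls = down(node.left)
        rs = down(node.right)
        if node.left is not None and node.right is not None:
            # A leaf-to-leaf path bending at this node is a candidate for best.
            best = max(best, ls + rs + node.data)
            return max(ls, rs) + node.data
        return (rs if node.left is None else ls) + node.data

    top = down(root)
    return top, best


tree = Node(10, Node(8, Node(3), Node(5)), Node(2))
print(max_leaf_path(tree))
```

For the example tree the top-level call returns 23 as its own value while the tracked maximum is 25, which are the two numbers discussed above.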

At each node, we have to check whether that node's left and right child resulted in max path. But when we return, we need to return either the left or right path depending upon whichever is max.
Let's take this tree for example:
      10
     /  \
    8    2
   / \
  3   5
In this, after recursive calls, ls becomes 3 and rs becomes 5. res becomes ls+rs+root.data which is 3+5+8 = 16. So res(result) will be updated to 16, and return will be max(11,13) which is 13. Now this 13 value will be used by node 10 as ls(left value).

Python Regex for parsing a tree structure defined in a given BNF like format

I have a txt file representing a tree that is defined in the following BNF-like format:
branches : '(' <value1> <value2> <n_children> <branches> ')'
| ''
Each node for the tree has two values value1 and value2 that are integers, number of children that is an integer, and branches.
an example of a subset of the file looks like:
(1796161 2205411 3
  (1796288 2205425 0 )
  (1811141 2205419 1
    (1811652 2205480 1
      (1812161 2205496 4
        (1812288 2205521 1
          (1812415 2205526 0 ))
        (1812034 2205516 0 )
        (1827651 2205510 2
          (1827906 2205581 2
            (1843777 2205588 2
              (1844032 2205626 1
                (1844159 2205632 0 ))
              (1843138 2205617 0 ))
            (1828288 2205591 1
              (1828161 2205597 0 )))
          (1827012 2205563 0 ))
        (1811907 2205511 0 ))))
  (1796034 2205420 0 ))
Is there a nice way to parse this data using regular expression(s) that preserves the parent-child relationships and ordering, and that doesn't involve me manually reading the file character by character while keeping track of all the parentheses?

This is easiest to do with a parser that lets you write your own grammar, like pyPEG. The following creates a tree of Node objects, where each Node can have zero or more children:
import re
from pypeg2 import attr, ignore, List, maybe_some, parse
Int = re.compile(r'[0-9]+') # a positive integer
class Values(List):
    '''A pair of values associated with each node in the tree.
    For example, in the node
        ( 1 2 0 )
    the values are 1 and 2 (and 0 is the number of children).
    '''
    grammar = 2, Int
class Node(List):
    '''A node in the tree.
    Attributes:
        values      The pair of values associated with this node
        children    A list of child nodes
    '''
    def __repr__(self):
        return 'Values: ' + ', '.join(self.values)
# Grammar for Node is recursive (a Node can contain other Nodes),
# so we have to set it outside of the Node class definition.
Node.grammar = (
    '(',
    attr('values', Values),
    ignore(Int),
    attr('children', maybe_some(Node)),
    ')'
)
def print_tree(root, indent=0):
    '''Print a tree of Nodes recursively'''
    print(' ' * indent, end='')
    print(root)
    if len(root.children) > 0:
        for child in root.children:
            print_tree(child, indent + 2)
if __name__ == '__main__':
    tree = '''
    (1796161 2205411 3
      (1796288 2205425 0 )
      (1811141 2205419 1
        (1811652 2205480 1
          (1812161 2205496 4
            (1812288 2205521 1
              (1812415 2205526 0 ))
            (1812034 2205516 0 )
            (1827651 2205510 2
              (1827906 2205581 2
                (1843777 2205588 2
                  (1844032 2205626 1
                    (1844159 2205632 0 ))
                  (1843138 2205617 0 ))
                (1828288 2205591 1
                  (1828161 2205597 0 )))
              (1827012 2205563 0 ))
            (1811907 2205511 0 ))))
      (1796034 2205420 0 ))
    '''
    root = parse(tree, Node)
    print_tree(root)
Output:
Values: 1796161, 2205411
  Values: 1796288, 2205425
  Values: 1811141, 2205419
    Values: 1811652, 2205480
      Values: 1812161, 2205496
        Values: 1812288, 2205521
          Values: 1812415, 2205526
        Values: 1812034, 2205516
        Values: 1827651, 2205510
          Values: 1827906, 2205581
            Values: 1843777, 2205588
              Values: 1844032, 2205626
                Values: 1844159, 2205632
              Values: 1843138, 2205617
            Values: 1828288, 2205591
              Values: 1828161, 2205597
          Values: 1827012, 2205563
        Values: 1811907, 2205511
  Values: 1796034, 2205420
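If adding a third-party parser is not an option, a regular expression can still do the tokenizing while a short recursive function handles the nesting. This is a sketch using only the standard library and is not the pyPEG approach shown above. It assumes the file contains nothing but parentheses and non-negative integers, and that each n_children count is correct. parse_tree and the (value1, value2, children) tuple layout are my own choices.

```python
import re


def parse_tree(text):
    """Parse '(v1 v2 n child*)' into nested (value1, value2, children) tuples."""
    tokens = re.findall(r'[()]|\d+', text)  # parentheses and integers only
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != '(':
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        value1, value2, n_children = (int(t) for t in tokens[pos:pos + 3])
        pos += 3
        # Trust the declared child count to know how many subtrees follow.
        children = [node() for _ in range(n_children)]
        if pos >= len(tokens) or tokens[pos] != ')':
            raise ValueError("expected ')' at token %d" % pos)
        pos += 1
        return (value1, value2, children)

    return node()


sample = """
(1 2 2
  (3 4 0 )
  (5 6 1
    (7 8 0 )))
"""
root = parse_tree(sample)
print(root)
```

The children keep their order of appearance, so the parent-child relationships are preserved. For very deeply nested files the recursion could hit Python's recursion limit, in which case an explicit stack would be the safer design.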
