Counting all nodes of all paths from root to leaves - python

Given a tree whose nodes hold the integers 1 to 10, with a branching factor of 3 for all nodes, how can I write a function that traverses the tree and counts the nodes from the root to a leaf for EVERY path?
So for this example, let's say it needs to return this:
{1: 1, 2: 5}
I've tried this helper function:
def tree_lengths(t):
    temp = []
    for i in t.children:
        temp.append(1)
        temp += [e + 1 for e in tree_lengths(i)]
    return temp
There are several problems with this code. For one, it leaves an entry for every visited node in the returned list, so it's difficult to tell which values in that list are the ones I need. For another, if the tree is large, it does not record the root and the earlier nodes in the path before reaching the line "for i in t.children". It needs to, first, reproduce every path from the root to the leaves and, second, return a list containing only the final count for each path.
Please help! This is so difficult.

I'm not sure exactly what you are trying to do, but you'll likely need to define a recursive function that takes a node (the head of a tree or subtree) and an integer (the number of children you've traversed so far), and maybe a list of each visited node so far. If the node has no children, you've reached a leaf and you can print out whatever info you need. Otherwise, for each child, call this recursive function again with new parameters (+1 to count, the child node as head node, etc).
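The recursive function described above can be sketched as follows. This is only a minimal sketch under assumptions: the Node class, its children attribute, and the name path_lengths are hypothetical stand-ins, since the question does not show the actual tree class. The function returns one entry per leaf, holding the number of nodes on the path from the root to that leaf.

```python
from collections import Counter


class Node:
    """Hypothetical tree node: a value plus a list of child nodes."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def path_lengths(node, count=1):
    """Return the node count of every root-to-leaf path below `node`.

    `count` is the number of nodes on the path so far, including `node`.
    """
    if not node.children:
        # Leaf reached: this path is complete, so report its length.
        return [count]
    lengths = []
    for child in node.children:
        lengths.extend(path_lengths(child, count + 1))
    return lengths


# Small example: the root has three children, and the first child has two.
root = Node(1, [Node(2, [Node(5), Node(6)]), Node(3), Node(4)])
lengths = path_lengths(root)
histogram = dict(Counter(lengths))  # path length -> number of paths
```

Because the count is passed down as a parameter and only reported at leaves, the intermediate nodes leave no entries of their own in the result.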


Prune sklearn decision tree to ensure monotony

I need to prune a sklearn decision tree classifier in such a way that the indicated probability (the value on the right in the image) is monotonically increasing. For example, if you fit a basic tree in Python, you have:
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.tree._tree import TREE_LEAF
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data[:, 0].reshape(-1,1), np.where(iris.target==0,0,1)
tree = DecisionTreeClassifier(max_depth=3, random_state=123)
tree.fit(X,y)
percentages = tree.tree_.value[:,0,1]/np.sum(tree.tree_.value.reshape(-1,2), axis=1)
Now the leaves that break the monotonicity, as indicated, must be eliminated, leaving the following:
Although the indicated example does not show it, a rule to consider is that if the leaves have different parents, then the leaf with the largest amount of data is kept. To deal with this I have been trying a brute-force algorithm, but it only performs the first iteration, and I need to apply the algorithm to bigger trees. The answer is probably recursion, but with the sklearn tree structure, I don't really know how to do it.
The following approach meets the pruning requirements you described: traverse the tree, identify the non-monotonic leaves, remove the non-monotonic leaves of the parent node with the fewest samples, and repeat until the leaves are monotonic. Removing one node per pass adds time complexity, but the trees usually have limited depth. The conference paper "Pruning for Monotone Classification Trees" helped me understand monotonicity in trees, and I derived this approach from it for your scenario.
Since the non-monotonic leaves must be identified from left to right, the first step is a post-order traversal of the tree. If you are not familiar with tree traversals, that is completely normal; I suggest studying the mechanics from online sources before reading the function. You could also run the traversal function and inspect its output, which will help you understand it.
#We will define a traversal algorithm which will scan the nodes and leaves from left to right
#The traversal is recursive, we declare global lists to collect values from each recursion
traversal=[] #List to collect traversal steps
parents=[] #List to collect the parents of the collected nodes or leaves
is_leaves=[] #List to collect if the collected traversal item are leaves or not
# A function to do postorder tree traversal
def postOrderTraversal(tree,root,parent):
    if root!=-1:
        #Recursion on left child
        postOrderTraversal(tree,tree.tree_.children_left[root],root)
        #Recursion on right child
        postOrderTraversal(tree,tree.tree_.children_right[root],root)
        traversal.append(root) #Collect the name of node or leaf
        parents.append(parent) #Collect the parent of the collected node or leaf
        is_leaves.append(is_leaf(tree,root)) #Collect if the collected object is leaf
Above, the left and right children of each node are visited recursively, using the children_left and children_right arrays of the decision tree structure. The is_leaf() helper function used there is defined below.
def is_leaf(tree,node):
    if tree.tree_.children_left[node]==-1:
        return True
    else:
        return False
Every internal node of the decision tree has exactly two children. Therefore, checking only for the existence of a left child tells us whether the object in question is a node or a leaf. The children arrays hold -1 where the requested child does not exist.
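To see what the traversal produces without fitting a model, here is a small sketch on hand-written arrays. The two lists are hypothetical stand-ins laid out the way children_left and children_right are (index = node id, value = child id, -1 = no child), and leaves_with_parents is an illustrative name, not part of scikit-learn. It returns the leaves from left to right together with their parents, without global lists.

```python
# Hypothetical tree: node 0 has children 1 and 4; node 1 has children 2 and 3.
children_left = [1, 2, -1, -1, -1]
children_right = [4, 3, -1, -1, -1]


def leaves_with_parents(left, right, node=0, parent=None):
    """Return (leaf, parent) pairs ordered from the leftmost to the rightmost leaf."""
    if left[node] == -1:
        # No left child means no children at all, so this node is a leaf.
        return [(node, parent)]
    # Visit the left subtree first, then the right one, so leaves come out left to right.
    return (leaves_with_parents(left, right, left[node], node)
            + leaves_with_parents(left, right, right[node], node))


pairs = leaves_with_parents(children_left, children_right)
```

Here pairs holds the three leaves 2, 3 and 4 in that order, each paired with the node it hangs from.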
As you have defined the non-monotonicity condition, the ratio of class-1 samples within each leaf is required. I have called this positive_ratio() (this is what you called "percentages").
def positive_ratio(tree): #The frequency of 1 values of leaves in binary classification tree:
    #Number of samples with value 1 in leaves/total number of samples in nodes/leaves
    return tree.tree_.value[:,0,1]/np.sum(tree.tree_.value.reshape(-1,2), axis=1)
The final helper function below returns the tree index of the node (1, 2, 3, etc.) with the minimum number of samples. It takes the list of nodes whose leaves exhibit non-monotonic behavior and reads the n_node_samples property of the tree structure. The node it finds is the one whose leaves will be removed.
def min_samples_node(tree, nodes): #Finds the node with the minimum number of samples among the provided list
    #Make a dictionary of number of samples of given nodes, and their index in the nodes list
    samples_dict={tree.tree_.n_node_samples[node]:i for i,node in enumerate(nodes)}
    min_samples=min(samples_dict.keys()) #The minimum number of samples among the samples of nodes
    i_min=samples_dict[min_samples] #Index of the node with minimum number of samples
    return nodes[i_min] #The number of node with the minimum number of samples
With the helper functions defined, the wrapper function below performs the pruning, iterating until the tree is monotonic, and returns the resulting tree.
def prune_nonmonotonic(tree): #Prune non-monotonic nodes of a binary classification tree
    while True: #Repeat until monotonicity is sustained
        #Clear the traversal lists for a new scan
        traversal.clear()
        parents.clear()
        is_leaves.clear()
        #Do a post-order traversal of tree so that the leaves will be returned in order from left to right
        postOrderTraversal(tree,0,None)
        #Filter the traversal outputs by keeping only leaves and leaving out the nodes
        leaves=[traversal[i] for i,leaf in enumerate(is_leaves) if leaf == True]
        leaves_parents=[parents[i] for i,leaf in enumerate(is_leaves) if leaf == True]
        pos_ratio=positive_ratio(tree) #List of positive samples ratio of the nodes of binary classification tree
        leaves_pos_ratio=[pos_ratio[i] for i in leaves] #List of positive samples ratio of the traversed leaves
        #Detect the non-monotonic pairs by comparing the leaves side-by-side
        nonmonotone_pairs=[[leaves[i],leaves[i+1]] for i,ratio in enumerate(leaves_pos_ratio[:-1]) if (ratio>=leaves_pos_ratio[i+1])]
        #Make a flattened and unique list of leaves out of pairs
        nonmonotone_leaves=[]
        for pair in nonmonotone_pairs:
            for leaf in pair:
                if leaf not in nonmonotone_leaves:
                    nonmonotone_leaves.append(leaf)
        if len(nonmonotone_leaves)==0: #If all leaves show monotonic properties, then break
            break
        #List the parent nodes of the non-monotonic leaves
        nonmonotone_leaves_parents=[leaves_parents[i] for i in [leaves.index(leave) for leave in nonmonotone_leaves]]
        node_min=min_samples_node(tree, nonmonotone_leaves_parents) #The node with minimum number of samples
        #Prune the tree by removing the children of the detected non-monotonic and lowest number of samples node
        tree.tree_.children_left[node_min]=-1
        tree.tree_.children_right[node_min]=-1
    return tree
The enclosing "while" loop continues until a pass in which the traversed leaves no longer exhibit non-monotonicity. min_samples_node() identifies the node with the fewest samples among those that have non-monotonic leaves. When its left and right children are replaced with the value -1, the tree is pruned, and the next "while" iteration traverses the pruned tree to identify and remove any remaining non-monotonicity.
The below images show the unpruned and pruned trees, respectively.
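The side-by-side comparison at the heart of the loop can also be tried in isolation. The sketch below works on plain lists rather than a fitted tree; the function name flag_nonmonotone_leaves and the sample numbers are made up for illustration. It flags every leaf that takes part in an adjacent pair whose positive ratio does not strictly increase.

```python
def flag_nonmonotone_leaves(leaves, ratios):
    """Return the leaves involved in any adjacent pair with ratios[i] >= ratios[i + 1].

    `leaves` are node ids ordered left to right; `ratios` are their positive ratios.
    """
    flagged = []
    for i in range(len(leaves) - 1):
        if ratios[i] >= ratios[i + 1]:
            # Both members of the offending pair are flagged, without duplicates.
            for leaf in (leaves[i], leaves[i + 1]):
                if leaf not in flagged:
                    flagged.append(leaf)
    return flagged


# Illustrative data: the ratio drops between leaves 3 and 5.
flagged = flag_nonmonotone_leaves([2, 3, 5, 6], [0.0, 0.4, 0.3, 1.0])
```

An empty result corresponds to the break condition of the "while" loop: no adjacent pair violates the ordering, so no further pruning is needed.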

Maximum Binary Tree (Leetcode) - Optimal Solution Explanation?

I was going through the Maximum Binary Tree leetcode problem. The TL;DR is that you have an array, such as this one:
[3,2,1,6,0,5]
You're supposed to take the maximum element and make that the root of your tree. Then split the array into the part to the left of that element and the part to its right, and these are used to recursively create the left and right subtrees in the same way, respectively.
LeetCode claims that the optimal solution (shown in the "Solution" tab) uses a linear search for the maximum value of the sub-array in each recursive step. This is O(n^2) in the worst case. This is the solution I came up with, and it's simple enough.
However, I was looking through other submissions and found a linear time solution, but I've struggled to understand how it works! It looks something like this:
def constructMaximumBinaryTree(nums):
    nodes=[]
    for num in nums:
        node = TreeNode(num)
        while nodes and num>nodes[-1].val:
            node.left = nodes.pop()
        if nodes:
            nodes[-1].right = node
        nodes.append(node)
    return nodes[0]
I've analysed this function and in aggregate, this appears to be linear time (O(n)), since each unique node is added to and popped from the nodes array at most once. I've tried running it with different example inputs, but I'm struggling to connect the dots and wrap my head around how this works. Can someone please explain it to me?
One way to understand the algorithm is to consider the loop invariants. In this case, the array of nodes always satisfies the condition that before and after each execution of the for-loop, either:
nodes is empty and a max binary tree does not exist (for example, if the input nums was empty)
the first item in nodes is the max binary tree based on the data processed so far from the input nums
The while-loop ensures that the current max binary tree is the first item in the nodes array, since otherwise, it would have been popped and added as a left subtree.
During each iteration of the for-loop, the check:
if nodes:
    nodes[-1].right = node
adds the current node as a right subtree to the last item in the nodes array. And when this happens, the current node is less than the last node in the nodes array (since each input integer is defined to be unique). And since the current node is less than the last node in the array, the last node acts as a partition point whose value is greater than the current item, which is why the current node is added as a right subtree.
When there are multiple items in the nodes array, each item is a subtree of the item to its left.
Running Time
For the running time, let n be the length of the input nums. There are n executions of the for-loop. If the input data were sorted in descending order, but with the max input value at the end of the input (such as: 4, 3, 2, 1, 5), then the inner while-loop would be skipped during each iteration until the last for-loop iteration. During the last for-loop iteration, the while loop would run n - 1 times, for a total running time of n + (n - 1) => 2n - 1 => O(n).
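One way to convince yourself that the stack-based version builds the same tree is to compare it against a direct transcription of the problem statement. The sketch below is self-contained under assumptions: TreeNode is a minimal stand-in for LeetCode's class, and the names build_with_stack, build_by_definition and preorder are made up for this comparison. Unlike the snippet in the question, the stack version here returns None for empty input.

```python
class TreeNode:
    """Minimal stand-in for LeetCode's TreeNode (assumed fields: val, left, right)."""

    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def build_with_stack(nums):
    """Linear-time construction using a stack of nodes with decreasing values."""
    nodes = []
    for num in nums:
        node = TreeNode(num)
        # Every smaller node on top of the stack ends up in the new node's left subtree.
        while nodes and num > nodes[-1].val:
            node.left = nodes.pop()
        # The nearest larger node to the left adopts the new node as its right child.
        if nodes:
            nodes[-1].right = node
        nodes.append(node)
    return nodes[0] if nodes else None


def build_by_definition(nums):
    """Quadratic reference: take the maximum as root and recurse on both sides."""
    if not nums:
        return None
    i = nums.index(max(nums))
    root = TreeNode(nums[i])
    root.left = build_by_definition(nums[:i])
    root.right = build_by_definition(nums[i + 1:])
    return root


def preorder(node):
    """Serialize a tree, marking missing children with None, so shapes can be compared."""
    if node is None:
        return [None]
    return [node.val] + preorder(node.left) + preorder(node.right)


nums = [3, 2, 1, 6, 0, 5]
same = preorder(build_with_stack(nums)) == preorder(build_by_definition(nums))
```

For the example array, both builders put 6 at the root, the chain 3, 2, 1 down the right spine of its left subtree, and 5 with left child 0 on the right.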

Balanced binary tree python

# stack_depth is initialised to 0
def find_in_tree(node, find_condition, stack_depth):
    assert (stack_depth < max_stack_depth), 'Deeper than max depth'
    stack_depth += 1
    result = []
    if find_condition(node):
        result += [node]
    for child_node in node.children:
        result.extend(find_in_tree(child_node, find_condition, stack_depth))
    return result
I need help understanding this piece of code. The question I want to answer is:
The Python function above searches the contents of a balanced binary tree.
If an upper limit of 1,000,000 nodes is assumed, what should the max_stack_depth constant be set to?
From what I understand, this is a trick question. If you think about it, stack_depth is incremented every time the find_in_tree() function is called in the recursion, and we are trying to find a particular node in the tree. So the worst case would be when we have to search through all the nodes in the tree before we find it. Hence, max_stack_depth should be 1,000,000?
If you look at when stack_depth increments, it looks like we increment every time we access a node. And in our case we access every single node every time, because there is no return condition that stops the algorithm when the correct node is found.
Can someone please explain their thought process to me?
Instead of multiplying the number of nodes on each layer, you have to add them. For example, the number of nodes in the first four layers is 1+2+4+8=15, not 1*2*4*8=64.
# 1
# # + 2
# # # # + 4
# # # # # # # # + 8 = 15
In general, the number of nodes in the first n layers is 2**n - 1. You can use logarithms to get the correct power and take the floor of that number. If you want fewer than that number, you would also have to subtract one from the power.
>>> import math
>>> math.floor(math.log(1000000, 2))
19
>>> sum(2**i for i in range(20))
1048575
Concerning your edit: Yes, stack_depth is incremented with each node, but you are incrementing a local variable. The increment carries to the child nodes (passed as a parameter) but not to the siblings, i.e. all the nodes at level n are called with stack_depth == n-1 (assuming it started as 0 on the first level). A balanced tree needs 20 levels to hold 1,000,000 nodes (19 levels hold only 2**19 - 1 = 524,287), so the deepest calls receive stack_depth == 19. Because the assert uses a strict <, max_stack_depth should be 20 (or 21 if stack_depth starts at 1).
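The level arithmetic can be checked with a few lines of Python. This is a sketch under the assumption that "balanced" means the tree is as shallow as possible for its node count; levels_needed and min_max_stack_depth are hypothetical helper names. It relies on the fact that the first k levels hold 2**k - 1 nodes, so the number of levels needed for n nodes equals the bit length of n.

```python
def levels_needed(n_nodes):
    """Number of levels in the shallowest binary tree holding n_nodes nodes (n_nodes >= 1)."""
    # k levels hold up to 2**k - 1 nodes, so the smallest sufficient k is n_nodes.bit_length().
    return n_nodes.bit_length()


def min_max_stack_depth(n_nodes, start=0):
    """Smallest max_stack_depth that lets the strict assert pass at the deepest level."""
    deepest_value = start + levels_needed(n_nodes) - 1  # stack_depth on entry to the last level
    return deepest_value + 1  # the assert is `stack_depth < max_stack_depth`


levels = levels_needed(1_000_000)
limit = min_max_stack_depth(1_000_000)
```

The extra 1 in min_max_stack_depth comes only from the strict comparison in the assert; with a non-strict comparison the deepest entry value itself would be enough.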

I want to add values while a recursive loop unfolds

This is a bottom-up approach to check whether a tree is an AVL tree. This is how the code works:
Suppose this is the tree:
8
3 10
2
1
First, the leaf node (here 1) is checked. The recursion then unwinds one level, so the node with data 2 becomes the current node. The value of cl is 1 when the right subtree is compared. The right branch of 2 is empty, i.e. it has no children, so avl_compare receives (1, 0), which is allowed.
After this, I want to add one to cl so that when the node with data 3 is the current node, cl = 2. avl_check is an assignment question. I have done this on my own, but I need some help here working with recursive functions.
def avl_check(self):
    cl = cr = 0
    if(self.left):
        self.left.avl_check()
        cl+=1
    if(self.right):
        self.right.avl_check()
        cr += 1
    if(not self.avl_compare(cl,cr)):
        print("here")
Your immediate problem is that you don't seem to understand local and global variables. cl and cr are local variables; with the given control flow, the only values they can ever have are 0 and 1. Remember that each instance of the routine gets a new set of local variables: you set them to 0, perhaps increment to 1, and then you return. This does not affect the values of the variables in other instances of the function.
A deeper problem is that you haven't thought this through for larger trees. Assume that you do learn to use global variables and correct these increments. Take your current tree, insert nodes 4, 9, 10, and 11 (nicely balanced). Walk through your algorithm, tracing the values of cl and cr. By the time you get to node 10, cl is disturbingly more than the tree depth -- I think this is a fatal error in your logic.
Think through this again: a recursive routine should not have global variables, except perhaps for the data store of a dynamic programming implementation (which does not apply here). The function should check for the base case and return something trivial (such as 0 or 1). Otherwise, the function should reduce the problem one simple step and recur; when the recursion returns, the function does something simple with the result and returns the new result to its parent.
Your task is relatively simple:
Find the depths of the two subtrees.
If their difference > 1, return False
else return True
You should already know how to check the depth of a tree. Implement this first. After that, make your implementation more intelligent: checking the depth of a subtree should also check its balance at each step. That will be your final solution.
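The shape of such a routine, where each call returns its result to the parent instead of updating shared counters, might look like the sketch below. It is not the assignment's avl_check: the Node class and the function height_and_balance are hypothetical, and it checks only the height condition, not the search-tree ordering.

```python
class Node:
    """Hypothetical binary tree node with data, left and right."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def height_and_balance(node):
    """Return (height, balanced) for the subtree rooted at `node`."""
    if node is None:
        # Base case: an empty subtree has height 0 and is trivially balanced.
        return 0, True
    left_height, left_ok = height_and_balance(node.left)
    right_height, right_ok = height_and_balance(node.right)
    balanced = left_ok and right_ok and abs(left_height - right_height) <= 1
    # The height is handed back to the caller; no global or shared counter is needed.
    return 1 + max(left_height, right_height), balanced


# The tree from the question: 8 -> (3, 10), 3 -> (2, -), 2 -> (1, -).
tree = Node(8, Node(3, Node(2, Node(1))), Node(10))
height, balanced = height_and_balance(tree)
```

For the tree from the question, node 3 has a left subtree of height 2 and an empty right subtree, so the check fails there and the failure propagates up to the root.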

List implemented using an inorder binary tree

For the new computer science assignment we are to implement a list/array using an inorder binary tree. I would just like a suggestion rather than a solution.
The idea is having a binary tree that has its nodes accessible via indexes, e.g.
t = ListTree()
t.insert(2,0) # 1st argument is the value, 2nd the index to insert at
t.get(0) # returns 2
The Node class that the values are stored in is not modifiable but has a property size which contains the total number of children below, along with left, right and value that point to children and store the value accordingly.
My chief problem at the moment is keeping track of the index. As we're not allowed to store the index of the node in the node itself, I must rely on traversing to track it. As I always start with the left node when traversing, I haven't yet thought of a way to recursively figure out what index we are currently at.
Any suggestions would be welcome.
Thanks!
You really wouldn't want to store it on the node itself, because then the index would have to be updated on inserts for all nodes with index less than insert index. I think the real question is how to do an in-order traversal. Try having your recursive function return the number of nodes to its left.
I don't think you want to store the index, but rather just the size of each subtree. For instance, if you wanted to look up the 10th element in the list, and the left and right subtrees had 7 elements each, you would know that the root is the eighth element (since it's an in-order binary tree), and the first element of the right subtree is the 9th. Armed with this knowledge, you would recurse into the right subtree, looking for the 2nd element.
HTH
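The size-based lookup described above might be sketched like this. Everything here is an assumption made for illustration: the Node class is a stand-in for the assignment's unmodifiable class, size is taken to count the node itself plus all of its descendants, and indexes are 0-based.

```python
def size_of(node):
    """Size of a possibly missing subtree."""
    return node.size if node is not None else 0


class Node:
    """Hypothetical node; `size` is assumed to include the node itself."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.size = 1 + size_of(left) + size_of(right)


def get(node, index):
    """Return the value at 0-based in-order position `index` below `node`."""
    while node is not None:
        left_size = size_of(node.left)
        if index < left_size:
            node = node.left           # target lies in the left subtree
        elif index == left_size:
            return node.value          # exactly left_size elements precede this node
        else:
            index -= left_size + 1     # skip the left subtree and this node
            node = node.right
    raise IndexError("index out of range")


# In-order contents: a, b, c, d, e
tree = Node("c", Node("b", Node("a")), Node("e", Node("d")))
values = [get(tree, i) for i in range(tree.size)]
```

Each step discards either the left subtree plus the current node or the right subtree, so the lookup takes time proportional to the height of the tree rather than to the index.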
Well, a node in a binary tree cannot have a value and an index. They can have multiple pieces of data but the tree can only be keyed/built on one.
Maybe your assignment wants you to use the index value as the key to the tree and attach the value to the node for quick retrieval of the value given an index.
Does the tree have to be balanced? Does the algorithm need to be efficient?
If not, then the simplest thing to do is make a tree in which all the left children are null, i.e., a tree that devolves to a linked list.
To insert, recursively go to the right child, and then update the size of the node on the way back out. Something like (pseudocode):
function insert(node, value, index, depth)
    if depth < index
        create the rightchild if necessary
        insert(rightchild, value, index, depth + 1)
    if depth == size
        node.value = value
    node.size = rightchild.size + 1
After you have this working, you can modify it to be more balanced. When increasing the length of the list, add nodes to the left or right child nodes depending on which currently has the least, and update the size on the way out of the recursion.
To generalize to be more efficient, you need to work on the index in terms of its binary representation.
For example, an empty list has one node, without children, with value null and size 0.
Say you want to insert "hello" at index 1034. Then you want to end up with a tree whose root has two children, with sizes 1024 and 10. The left child has no actual children, but the right node has a right child of its own of size 2. (The left of size 8 is implied.) That node in turn, has one right child of size 0, with the value "hello". (This list has a 1-based index, but a 0-based index is similar.)
So you need to recursively break down the index into its binary parts, and add nodes as necessary. When searching the list, you need to take care when traversing a node with null children.
A very easy solution is to call GetFirstNode() to get the first node of the tree (this is simply the leftmost node of the tree). If your index N is 0, return the first node. Otherwise, call GetNodeNext() N times to get the appropriate node. This isn't very efficient, though, since accessing a node by index takes O(N log N) time.
Node *Tree::GetFirstNode()
{
    Node *pN,*child;
    pN=GetRootNode();
    while(NOTNIL(child=GetNodeLeft(pN)))
    {
        pN=child;
    }
    return(pN);
}
Node *Tree::GetNodeNext(Node *pNode)
{
    Node *temp;
    temp=GetNodeRight(pNode);
    if(NOTNIL(temp))
    {
        pNode=temp;
        temp=GetNodeLeft(pNode);
        while(NOTNIL(temp))
        {
            pNode=temp;
            temp=GetNodeLeft(pNode);
        }
        return(pNode);
    }
    else
    {
        temp=GetNodeParent(pNode);
        while( (NOTNIL(temp)) && (GetNodeRight(temp)==pNode) )
        {
            pNode=temp;
            temp=GetNodeParent(pNode);
        }
        return(temp);
    }
}
