Faster implementation, of tree height function

Faster implementation, of tree height function - python

i' m new to algorithm, and I developed such function for finding the tree height of the input sequence.
import sys, threading
class TreeHeight:
def read(self):
self.n = int(sys.stdin.readline())
self.parent = list(map(int, sys.stdin.readline().split()))
def compute_height(self):
# Replace this code with a faster implementation
maxHeight = 0
for vertex in range(self.n):
height = 0
i = vertex
while i != -1:
height += 1
i = self.parent[i]
maxHeight = max(maxHeight, height)
return maxHeight
def main():
tree = TreeHeight()
tree.read()
print(tree.compute_height())
threading.Thread(target=main).start()
That works like this:
5
4 -1 4 1 1
And output
3
Because there are 5 nodes with numbers from 0 to 4, node 0 is a child of node 4, node 1 is the root,node 2 is a child of node 4, node 3 is a child of node 1 and node 4 is a child of node1. The height of this tree is 3, because the number of vertices on the path from root 1 to leaf 2 is 3
My solution is working, but it is not optimized for deep trees. For example, program crashes for input 100000.
What is the better solution in my case?

Related

Bad Tree design, Data Structure

I tried making a Tree as a part of my Data Structures course. The code works but is extremely slow, almost double the time that is accepted for the course. I do not have experience with Data Structures and Algorithms but I need to optimize the program. If anyone has any tips, advices, criticism I would greatly appreciate it.
The tree is not necessarily a binary tree.
Here is the code:
import sys
import threading
class Node:
def __init__(self,value):
self.value = value
self.children = []
self.parent = None
def add_child(self,child):
child.parent = self
self.children.append(child)
def compute_height(n, parents):
found = False
indices = []
for i in range(n):
indices.append(i)
for i in range(len(parents)):
currentItem = parents[i]
if currentItem == -1:
root = Node(parents[i])
startingIndex = i
found = True
break
if found == False:
root = Node(parents[0])
startingIndex = 0
return recursion(startingIndex,root,indices,parents)
def recursion(index,toWhomAdd,indexes,values):
children = []
for i in range(len(values)):
if index == values[i]:
children.append(indexes[i])
newNode = Node(indexes[i])
toWhomAdd.add_child(newNode)
recursion(i, newNode, indexes, values)
return toWhomAdd
def checkHeight(node):
if node == '' or node == None or node == []:
return 0
counter = []
for i in node.children:
counter.append(checkHeight(i))
if node.children != []:
mostChildren = max(counter)
else:
mostChildren = 0
return(1 + mostChildren)
def main():
n = int(int(input()))
parents = list(map(int, input().split()))
root = compute_height(n, parents)
print(checkHeight(root))
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
threading.Thread(target=main).start()
Edit:
For this input(first number being number of nodes and other numbers the node's values)
5
4 -1 4 1 1
We expect this output(height of the tree)
3
Another example:
Input:
5
-1 0 4 0 3
Output:
4

It looks like the value that is given for a node, is a reference by index of another node (its parent). This is nowhere stated in the question, but if that assumption is right, you don't really need to create the tree with Node instances. Just read the input into a list (which you already do), and you actually have the tree encoded in it.
So for example, the list [4, -1, 4, 1, 1] represents this tree, where the labels are the indices in this list:
1
/ \
4 3
/ \
0 2
The height of this tree — according to the definition given in Wikipedia — would be 2. But apparently the expected result is 3, which is the number of nodes (not edges) on the longest path from the root to a leaf, or — otherwise put — the number of levels in the tree.
The idea to use recursion is correct, but you can do it bottom up (starting at any node), getting the result of the parent recursively, and adding one to 1. Use the principle of dynamic programming by storing the result for each node in a separate list, which I called levels:
def get_num_levels(parents):
levels = [0] * len(parents)
def recur(node):
if levels[node] == 0: # this node's level hasn't been determined yet
parent = parents[node]
levels[node] = 1 if parent == -1 else recur(parent) + 1
return levels[node]
for node in range(len(parents)):
recur(node)
return max(levels)
And the main code could be as you had it:
def main():
n = int(int(input()))
parents = list(map(int, input().split()))
print(get_num_levels(parents))

Is there a bug in my custom tree implementation for finding shortest path to leaf node?

I'm trying to implement a custom tree, for a problem that takes the input as the number of nodes and the edges/connections between each node and gives the output as the length of the shortest path to the leaf node from the root.
For the smaller test cases, I've tried it gave correct output but for some reason, it gives runtime error on hackerank for all the other cases, is there a bug? I tried to find it but I coudnt
Code:
class tree:
class Node:
def __init__(self,parent,children=[]):
self.Parent=parent # p capital
self.children=children
def __init__(self,n,connections=[]):
self.root = self.Node(None,[])
self.n = n # tree.n
self.connections = connections # list of all edges
self.Positions = {} # initializing all positions in dictionary , for all nodes
self.Positions[1]=self.root # 1 is always root
if self.n>1:
for i in range(2,n+1):
self.Positions[i]=self.Node(None,[])
def joinNode(self,parent,node): # number of the two nodes , like 1-3 etc..
self.Positions[parent].children.append(self.Positions[node])
self.Positions[node].Parent=self.Positions[parent]
def joinAll(self):
for p,n in self.connections:
self.joinNode(p,n)
def Depth(self,p):
"""Return the number of levels separating Position p from the root."""
if p==self.root:
return 0
else:
return 1 + self.Depth(p.Parent)
def minDepth(self):
return min(self.Depth(self.Positions[p]) for p in self.leafnodes())
def leafnodes(self):
return list(set(x[0] for x in self.connections)^set(i for i in range(1,n+1)))
if __name__=='__main__':
n=int(input())
if n==1:
print(0)
else:
connections=[]
for i in range(1,n):
array=input()
edge=list(map(int,array.split()))
connections.append(edge)
mytree=tree(n,connections)
mytree.joinAll()
print(mytree.minDepth())
print(mytree.connections)
Sample input
5
1 4
4 2
2 3
3 5
sample output
4

Can someone explain how the root function works in this quick union implementation?

I am trying to implement Quick Union algorithm where ids of the nodes are their root nodes.
id is implemented like this:
self.id = []
for i in range(N)
self.id.append(i)
Which stores the index of the nodes 0 to N - 1
This is the code that I can't understand:
def root(self, i):
while(i != self.id[i]):
i = self.id[i]
return i
What is happening? what is "i"? how does it point to the root node?
I understand the union method:
def union(self, p, q):
i = self.root(p)
j = self.root(q)
self.id[i] = j
Which means something like: change the id, where the value is root of p, to root of q, right?

id[i] points to the parent of i-th element. If this element is not a root (root id points to itself), then code moves to the parent, then to the parent of parent and so on until reaches a root.
Example: we want to find a root of 0-th element
i 0 1 2 3
\/ ----->
id 2 1 1 3
id[0] is not equal to 0, so this element is leaf, we make a step to it's parent 2
i 0 1 2 3
<---\/
id 2 1 1 3
id[2] is not equal to 2, so this element is leaf, we make a step to it's parent 1
i 0 1 2 3
\/
id 2 1 1 3
id[1] is equal to 1, so we have found root fo the 0-th element
Trees stored in that array:
1 3
/
2
\
0

Kosaraju's Algorithm for SCCs, non-recursive

I have an implementation of Kosaraju's algorithm for finding SCCs in Python. The code below contains a recursive (fine on the small test cases) version and a non-recursive one (which I ultimately need because of the size of the real dataset).
I have run both the recursive and non-recursive version on a few test datasets and get the correct answer. However running it on the much larger dataset that I ultimately need to use, produces the wrong result. Going through the real data is not really an option because it contains nearly a million nodes.
My problem is that I don't know how to proceed from here. My suspision is that I either forgot a certain case of graph constellation in my test cases, or that I have a more fundamental misunderstanding about how this algo is supposed to work.
#!/usr/bin/env python3
import heapq
class Node():
"""A class to represent nodes in a DirectedGraph. It has attributes for
performing DFS."""
def __init__(self, i):
self.id = i
self.edges = []
self.rev_edges = []
self.explored = False
self.fin_time = 0
self.leader = 0
def add_edge(self, edge_id):
self.edges.append(edge_id)
def add_rev_edge(self, edge_id):
self.rev_edges.append(edge_id)
def mark_explored(self):
self.explored = True
def set_leader(self, leader_id):
self.leader = leader_id
def set_fin_time(self, fin_time):
self.fin_time = fin_time
class DirectedGraph():
"""A class to represent directed graphs via the adjacency list approach.
Each dictionary entry is a Node."""
def __init__(self, length, list_of_edges):
self.nodes = {}
self.nodes_by_fin_time = {}
self.length = length
self.fin_time = 1 # counter for the finishing time
self.leader_count = 0 # counter for the size of leader nodes
self.scc_heapq = [] # heapq to store the ssc by size
self.sccs_computed = False
for n in range(1, length + 1):
self.nodes[str(n)] = Node(str(n))
for n in list_of_edges:
ns = n[0].split(' ')
self.nodes[ns[0]].add_edge(ns[1])
self.nodes[ns[1]].add_rev_edge(ns[0])
def n_largest_sccs(self, n):
if not self.sccs_computed:
self.compute_sccs()
return heapq.nlargest(n, self.scc_heapq)
def compute_sccs(self):
"""First compute the finishing times and the resulting order of nodes
via a DFS loop. Second use that new order to compute the SCCs and order
them by their size."""
# Go through the given graph in reverse order, computing the finishing
# times of each node, and create a second graph that uses the finishing
# times as the IDs.
i = self.length
while i > 0:
node = self.nodes[str(i)]
if not node.explored:
self.dfs_fin_times(str(i))
i -= 1
# Populate the edges of the nodes_by_fin_time
for n in self.nodes.values():
for e in n.edges:
e_head_fin_time = self.nodes[e].fin_time
self.nodes_by_fin_time[n.fin_time].add_edge(e_head_fin_time)
# Use the nodes ordered by finishing times to calculate the SCCs.
i = self.length
while i > 0:
self.leader_count = 0
node = self.nodes_by_fin_time[str(i)]
if not node.explored:
self.dfs_leaders(str(i))
heapq.heappush(self.scc_heapq, (self.leader_count, node.id))
i -= 1
self.sccs_computed = True
def dfs_fin_times(self, start_node_id):
stack = [self.nodes[start_node_id]]
# Perform depth-first search along the reversed edges of a directed
# graph. While doing this populate the finishing times of the nodes
# and create a new graph from those nodes that uses the finishing times
# for indexing instead of the original IDs.
while len(stack) > 0:
curr_node = stack[-1]
explored_rev_edges = 0
curr_node.mark_explored()
for e in curr_node.rev_edges:
rev_edge_head = self.nodes[e]
# If the head of the rev_edge has already been explored, ignore
if rev_edge_head.explored:
explored_rev_edges += 1
continue
else:
stack.append(rev_edge_head)
# If the current node has no valid, unexplored outgoing reverse
# edges, pop it from the stack, populate the fin time, and add it
# to the new graph.
if len(curr_node.rev_edges) - explored_rev_edges == 0:
sink_node = stack.pop()
# The fin time is 0 if that node has not received a fin time.
# Prevents dealing with the same node twice here.
if sink_node and sink_node.fin_time == 0:
sink_node.set_fin_time(str(self.fin_time))
self.nodes_by_fin_time[str(self.fin_time)] = \
Node(str(self.fin_time))
self.fin_time += 1
def dfs_leaders(self, start_node_id):
stack = [self.nodes_by_fin_time[start_node_id]]
while len(stack) > 0:
curr_node = stack.pop()
curr_node.mark_explored()
self.leader_count += 1
for e in curr_node.edges:
if not self.nodes_by_fin_time[e].explored:
stack.append(self.nodes_by_fin_time[e])
###### Recursive verions below ###################################
def dfs_fin_times_rec(self, start_node_id):
curr_node = self.nodes[start_node_id]
curr_node.mark_explored()
for e in curr_node.rev_edges:
if not self.nodes[e].explored:
self.dfs_fin_times_rec(e)
curr_node.set_fin_time(str(self.fin_time))
self.nodes_by_fin_time[str(self.fin_time)] = Node(str(self.fin_time))
self.fin_time += 1
def dfs_leaders_rec(self, start_node_id):
curr_node = self.nodes_by_fin_time[start_node_id]
curr_node.mark_explored()
for e in curr_node.edges:
if not self.nodes_by_fin_time[e].explored:
self.dfs_leaders_rec(e)
self.leader_count += 1
To run:
#!/usr/bin/env python3
import utils
from graphs import scc_computation
# data = utils.load_tab_delimited_file('data/SCC.txt')
data = utils.load_tab_delimited_file('data/SCC_5.txt')
# g = scc_computation.DirectedGraph(875714, data)
g = scc_computation.DirectedGraph(11, data)
g.compute_sccs()
# for e, v in g.nodes.items():
# print(e, v.fin_time)
# for e, v in g.nodes_by_fin_time.items():
# print(e, v.edges)
print(g.n_largest_sccs(20))
Most complex test case (SCC_5.txt):
1 5
1 4
2 3
2 11
2 6
3 7
4 2
4 8
4 10
5 7
5 5
5 3
6 8
6 11
7 9
8 2
8 8
9 3
10 1
11 9
11 6
Drawing of that test case: https://imgur.com/a/LA3ObpN
This produces 4 SCCs:
Bottom: Size 4, nodes 2, 8, 6, 11
Left: Size 3, nodes 1, 10, 4
Top: Size 1, node 5
Right: Size 3, nodes 7, 3, 9

Ok, I figured out the missing cases. The algorithm wasn't performing correctly on very strongly connected graphs and duplicated edges. Here is an adjusted version of the test case I posted above with a duplicated edge and more edges to turn the whole graph into one big SCC.
1 5
1 4
2 3
2 6
2 11
3 2
3 7
4 2
4 8
4 10
5 1
5 3
5 5
5 7
6 8
7 9
8 2
8 2
8 4
8 8
9 3
10 1
11 9
11 6

Non-binary Tree Height (Optimization)

Introduction
So I'm doing a course on edX and have been working on this practice assignment for
the better part of 3 hours, yet I still can't find a way to implement this method
without it taking to long and timing out the automatic grader.
I've tried 3 different methods all of which did the same thing.
Including 2 recursive approaches and 1 non-recursive approach (my latest).
The problem I think I'm having with my code is that the method to find children just takes way to long because it has to iterate over the entire list of nodes.
Input and output format
Input includes N on the first line which is the size of the list which is given on line 2.
Example:
5
-1 0 4 0 3
To build a tree from this:
Each of the values in the array are a pointer to another index in the array such that in the example above 0 is a child node of -1 (index 0). Since -1 points to no other index it is the root node.
The tree in the example has the root node -1, which has two children 0 (index 1) and 0 (index 3). The 0 with index 1 has no children and the 0 with index 3 has 1 child: 3 (index 4) which in turn has only one child which is 4 (index 2).
The output resulting from the above input is 4. This is because the max height of the branch which included -1 (the root node), 0, 3, and 4 was of height 4 compared to the height of the other branch (-1, and 0) which was height 2.
If you need more elaborate explanation then I can give another example in the comments!
The output is the max height of the tree. The size of the input goes up to 100,000 which was where I was having trouble as it has to do that it in exactly 3 seconds or under.
My code
Here's my latest non-recursive method which I think is the fastest I've made (still not fast enough). I used the starter from the website which I will also include beneath my code. Anyways, thanks for the help!
My code:
# python3
import sys, threading
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
def height(node, parent_list):
h = 0
while not node == -1:
h = h + 1
node = parent_list[node]
return h + 1
def search_bottom_nodes(parent_list):
bottom_nodes = []
for index, value in enumerate(parent_list):
children = [i for i, x in enumerate(parent_list) if x == index]
if len(children) == 0:
bottom_nodes.append(value)
return bottom_nodes
class TreeHeight:
def read(self):
self.n = int(sys.stdin.readline())
self.parent = list(map(int, sys.stdin.readline().split()))
def compute_height(self):
# Replace this code with a faster implementation
bottom_nodes = search_bottom_nodes(self.parent)
h = 0
for index, value in enumerate(bottom_nodes):
h = max(height(value, self.parent), h)
return h
def main():
tree = TreeHeight()
tree.read()
print(tree.compute_height())
threading.Thread(target=main).start()
edX starter:
# python3
import sys, threading
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
class TreeHeight:
def read(self):
self.n = int(sys.stdin.readline())
self.parent = list(map(int, sys.stdin.readline().split()))
def compute_height(self):
# Replace this code with a faster implementation
maxHeight = 0
for vertex in range(self.n):
height = 0
i = vertex
while i != -1:
height += 1
i = self.parent[i]
maxHeight = max(maxHeight, height);
return maxHeight;
def main():
tree = TreeHeight()
tree.read()
print(tree.compute_height())
threading.Thread(target=main).start()

Simply cache the previously computed heights of the nodes you've traversed through in a dict and reuse them when they are referenced as parents.
import sys, threading
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
class TreeHeight:
def height(self, node):
if node == -1:
return 0
if self.parent[node] in self.heights:
self.heights[node] = self.heights[self.parent[node]] + 1
else:
self.heights[node] = self.height(self.parent[node]) + 1
return self.heights[node]
def read(self):
self.n = int(sys.stdin.readline())
self.parent = list(map(int, sys.stdin.readline().split()))
self.heights = {}
def compute_height(self):
maxHeight = 0
for vertex in range(self.n):
maxHeight = max(maxHeight, self.height(vertex))
return maxHeight;
def main():
tree = TreeHeight()
tree.read()
print(tree.compute_height())
threading.Thread(target=main).start()
Given the same input from your question, this (and your original code) outputs:
4

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Faster implementation, of tree height function - python

Related

Bad Tree design, Data Structure

Is there a bug in my custom tree implementation for finding shortest path to leaf node?

Can someone explain how the root function works in this quick union implementation?

Kosaraju's Algorithm for SCCs, non-recursive

Non-binary Tree Height (Optimization)

Categories

Resources