recursion in traversing a binary tree

I am trying to solve LeetCode problem #113: "Given a binary tree and a sum, find all root-to-leaf paths where each path's sum equals the given sum."
My question is: why does code #1 below print the values of all nodes in the tree? How does the recursion stack work in code #1, as opposed to code #2, which is a correct solution?
Thank you so much for helping me!
#code #1
class Solution:
    def pathSum(self, root, sum):
        self.res = []
        self.dfs(root, sum, [])
        return self.res
    def dfs(self, root, sum, path):
        if not root:
            return
        sum -= root.val
        path += [root.val]
        if not root.left and not root.right and sum == 0:
            self.res.append(path)
        self.dfs(root.left, sum, path)
        self.dfs(root.right, sum, path)
#code #2
class Solution:
    def pathSum(self, root, sum):
        self.res = []
        self.dfs(root, sum, [])
        return self.res
    def dfs(self, root, sum, path):
        if not root:
            return
        sum -= root.val
        if not root.left and not root.right and sum == 0:
            self.res.append(path + [root.val])
        self.dfs(root.left, sum, path + [root.val])
        self.dfs(root.right, sum, path + [root.val])

Well, the comments above are well deserved: some examples would help here. For instance, you talk about printing, but there are no print statements here, so how are you driving this code and how are you using the result?
Having said that, the problem traces back to the fact that code #1 mutates path, while code #2 does not. Python passes a reference to the same object into every call. If path were a number this wouldn't matter: numbers are immutable, so sum -= root.val only rebinds the local name. But path is a list, and path += [root.val] extends that list in place. As a result, every recursive call in code #1 (and every entry appended to self.res) shares one list, which ends up holding every node visited. In code #2, path + [root.val] builds a new list for each call, so the list the caller passed in never changes.
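The difference can be seen in isolation with a small sketch. The helper names (extend_in_place, extend_copy, path_sum) and the TreeNode class are hypothetical stand-ins rather than code from the question, and path_sum is only one possible way to repair code #1 while still sharing a single list: it records a copy of the path and undoes its own append before returning.

```python
class TreeNode:
    """Minimal stand-in for LeetCode's TreeNode (assumed shape)."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def extend_in_place(path, value):
    path += [value]        # mutates the list object the caller passed in
    return path


def extend_copy(path, value):
    return path + [value]  # builds a new list; the caller's list is untouched


def path_sum(root, target):
    """Backtracking variant of code #1: one shared list, copied when recorded."""
    res = []

    def dfs(node, remaining, path):
        if not node:
            return
        remaining -= node.val
        path.append(node.val)
        if not node.left and not node.right and remaining == 0:
            res.append(list(path))  # snapshot, not the shared list itself
        dfs(node.left, remaining, path)
        dfs(node.right, remaining, path)
        path.pop()                  # undo this call's append before returning

    dfs(root, target, [])
    return res


shared = []
extend_in_place(shared, 1)
extend_in_place(shared, 2)
print(shared)    # the caller's list has grown

original = []
extend_copy(original, 1)
print(original)  # the caller's list is still empty

tree = TreeNode(1, TreeNode(2), TreeNode(3))
print(path_sum(tree, 3))
```

Copying on every call (code #2) is simpler to reason about; the backtracking form trades that simplicity for fewer list copies.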

Related

Traverse Tree and finding all possible path to target

Original Question:
Given the root of a binary tree and an integer targetSum, return the number of paths where the sum of the values along the path equals targetSum. The path does not need to start or end at the root or a leaf, but it must go downwards (i.e., traveling only from parent nodes to child nodes).
I have posted my implementation below
It fails on the test case [1,null,2,null,3,null,4,null,5] with targetSum = 3:
it returns 3 instead of 2.
After some debugging I found that the node with value 3 is counted twice. Does anyone know why?
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def pathSum(self, root: Optional[TreeNode], targetSum: int) -> int:
        self.res = 0
        def dfs(node, sum):
            if not node:
                return
            sum += node.val
            if sum == targetSum:
                self.res += 1
            dfs(node.left, 0)
            dfs(node.right, 0)
            dfs(node.left, sum)
            dfs(node.right, sum)
        dfs(root, 0)
        return self.res
I asked ChatGPT and got a correct solution:
class Solution:
    def pathSum(self, root: Optional[TreeNode], targetSum: int) -> int:
        self.res = 0
        def dfs(node, sum):
            if not node:
                return
            sum += node.val
            if sum == targetSum:
                self.res += 1
            dfs(node.left, sum)
            dfs(node.right, sum)
        def traverse(node):
            if not node:
                return
            dfs(node, 0)
            traverse(node.left)
            traverse(node.right)
        traverse(root)
        return self.res
Can someone explain the difference between the two implementations? I feel like they are doing the same thing.

As mentioned in the comments, your version makes two recursive calls on node.left (and likewise two on node.right). They are given different sums (unless sum is 0), but each of those executions of dfs then makes its own call of dfs(node.left, 0) to start new paths at the next node down. So dfs is called several times with exactly the same arguments, and every path found by such a repeated call is counted more than once.
This does not happen in the correct solution. There, the call with 0 is only made in traverse, which visits each node exactly once. Those calls determine the starting node of the paths to inspect, and dfs takes care of trying all downward paths from that starting node. In your code the two concerns are mixed, which leads to duplicates.
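To make the repeated calls visible, here is a small instrumented sketch of the failing version. The function name count_buggy, the Counter bookkeeping and the TreeNode class are additions for illustration only; the recursion itself mirrors the question's dfs.

```python
from collections import Counter


class TreeNode:
    """Minimal stand-in for LeetCode's TreeNode (assumed shape)."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def count_buggy(root, target_sum):
    """Run the question's dfs and record how often each call is made."""
    calls = Counter()  # (node value, incoming sum) -> number of calls
    res = 0

    def dfs(node, total):
        nonlocal res
        if not node:
            return
        calls[(node.val, total)] += 1
        total += node.val
        if total == target_sum:
            res += 1
        dfs(node.left, 0)
        dfs(node.right, 0)
        dfs(node.left, total)
        dfs(node.right, total)

    dfs(root, 0)
    return res, calls


# Build the chain 1 -> 2 -> 3 -> 4 -> 5 (right children only),
# i.e. the failing test case [1,null,2,null,3,null,4,null,5].
chain = None
for val in (5, 4, 3, 2, 1):
    chain = TreeNode(val, None, chain)

result, calls = count_buggy(chain, 3)
print(result)
print(calls[(3, 0)])  # how many times dfs was started fresh at node 3
```

On this input the call that starts a fresh path at node 3 is made once from each of the two executions on node 2, which is where the extra count comes from.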
Here is an alternative implementation that keeps track of the root-to-node sums in a list, and then checks whether the current sum (from the root) differs by exactly targetSum from a previous sum on the path.
class Solution:
    def pathSum(self, root: Optional[TreeNode], targetSum: int) -> int:
        def dfs(node, total, sums):
            if not node:
                return 0
            total += node.val
            count = sum((total - start == targetSum) for start in sums)
            sums.append(total)
            count += dfs(node.left, total, sums) + dfs(node.right, total, sums)
            sums.pop()
            return count
        return dfs(root, 0, [0])

Construct Binary tree from Preorder and inorder Optimized solution not working

For Leetcode 105 I've got the O(N^2) solution working and I'm attempting to optimize to an O(N) solution but the output tree is the wrong structure.
Given two integer arrays preorder and inorder where preorder is the preorder traversal of a binary tree and inorder is the inorder traversal of the same tree, construct and return the binary tree.
My original O(N^2) solution.
def buildTree(self, preorder: List[int], inorder: List[int]) -> Optional[TreeNode]:
    preorder_queue = deque(preorder)
    def helper(preorder, inorder):
        if not preorder or not inorder:
            return None
        root = TreeNode(preorder.popleft())
        root_index = inorder.index(root.val)
        root.left = helper(preorder, inorder[:root_index])
        root.right = helper(preorder, inorder[root_index+1:])
        return root
    return helper(preorder_queue, inorder)
The slow operation here is inorder.index(root.val), since an index lookup in a list is O(N). I'm trying to optimize this by storing a map of node values -> indexes in my solution below.
def buildTree(self, preorder: List[int], inorder: List[int]) -> Optional[TreeNode]:
    preorder_queue = deque(preorder)
    cache = {}
    for i, node_val in enumerate(inorder):
        cache[node_val] = i
    def helper(preorder, inorder):
        if not preorder or not inorder:
            return None
        root = TreeNode(preorder.popleft())
        root_index = cache[root.val]
        root.left = helper(preorder, inorder[:root_index])
        root.right = helper(preorder, inorder[root_index+1:])
        return root
    return helper(preorder_queue, inorder)
However for the input preorder = [3,9,20,15,7] and inorder = [9,3,15,20,7] my output is giving me [3,9,20,null,null,15,null,7] instead of the correct tree structure [3,9,20,null,null,15,7]. Can anyone explain what's wrong with the cache solution?

The reason for the different output is this:
In the first version the value of root_index is taken from the local name inorder, which in general is just a slice of the original inorder (see how the recursive call is made with such a slice as argument). So the value of root_index is an index that is meaningful to the slice, but not to the overall, initial inorder list. Yet the second version of the code will give root_index an index that is meaningful in the initial inorder list, not to the local slice of it.
You can fix that by not creating a local inorder name (as parameter), but instead pass start and end indices which define which slice should be assumed, without actually creating it. That way the code can continue with the initial inorder list and use the indexing that the cache offers.
So:
def helper(preorder, start, end):
    if not preorder or start >= end:
        return None
    root = TreeNode(preorder.popleft())
    root_index = cache[root.val]
    root.left = helper(preorder, start, root_index)
    root.right = helper(preorder, root_index+1, end)
    return root
return helper(preorder_queue, 0, len(inorder))
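Put together as a standalone function, the index-based approach might look like the sketch below. The names build_tree, index_of, preorder_of and inorder_of are illustrative choices, the TreeNode class is a minimal stand-in, and the sketch assumes all node values are unique (otherwise the value -> index map is ambiguous).

```python
from collections import deque


class TreeNode:
    """Minimal stand-in for LeetCode's TreeNode (assumed shape)."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def build_tree(preorder, inorder):
    # Map each value to its position in the FULL inorder list.
    index_of = {val: i for i, val in enumerate(inorder)}
    remaining = deque(preorder)

    def helper(start, end):
        # [start, end) is the window of the full inorder list for this subtree.
        if start >= end:
            return None
        root = TreeNode(remaining.popleft())
        mid = index_of[root.val]
        root.left = helper(start, mid)
        root.right = helper(mid + 1, end)
        return root

    return helper(0, len(inorder))


def preorder_of(node):
    if node is None:
        return []
    return [node.val] + preorder_of(node.left) + preorder_of(node.right)


def inorder_of(node):
    if node is None:
        return []
    return inorder_of(node.left) + [node.val] + inorder_of(node.right)


root = build_tree([3, 9, 20, 15, 7], [9, 3, 15, 20, 7])
print(preorder_of(root), inorder_of(root))
```

Because no slices are created, the indexes stored in the map stay valid at every level of the recursion.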

How to optimise the solution to not get memory limit exceeded error or what might be getting me the error?

I came across the following problem.
You are given the root of a binary tree with n nodes.
Each node is uniquely assigned a value from 1 to n.
You are also given an integer startValue representing
the value of the start node s,
and a different integer destValue representing
the value of the destination node t.
Find the shortest path starting from node s and ending at node t.
Generate step-by-step directions of such path as a string consisting of only the
uppercase letters 'L', 'R', and 'U'. Each letter indicates a specific direction:
'L' means to go from a node to its left child node.
'R' means to go from a node to its right child node.
'U' means to go from a node to its parent node.
Return the step-by-step directions of the shortest path from node s to node t
Example 1:
Input: root = [5,1,2,3,null,6,4], startValue = 3, destValue = 6
Output: "UURL"
Explanation: The shortest path is: 3 → 1 → 5 → 2 → 6.
Example 2:
Input: root = [2,1], startValue = 2, destValue = 1
Output: "L"
Explanation: The shortest path is: 2 → 1.
I created a solution that finds the lowest common ancestor and then does a depth-first search to find the two nodes, like this:
# Definition for a binary tree node.
# class TreeNode(object):
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution(object):
    def getDirections(self, root, startValue, destValue):
        """
        :type root: Optional[TreeNode]
        :type startValue: int
        :type destValue: int
        :rtype: str
        """
        def lca(root):
            if root == None or root.val == startValue or root.val == destValue:
                return root
            left = lca(root.left)
            right = lca(root.right)
            if left and right:
                return root
            return left or right
        def dfs(root, value, path):
            if root == None:
                return ""
            if root.val == value:
                return path
            return dfs(root.left, value, path + "L") + dfs(root.right, value, path + "R")
        root = lca(root)
        return "U"*len(dfs(root, startValue, "")) + dfs(root, destValue, "")
The solution runs fine, but for a very large input it fails with a "Memory Limit Exceeded" error. Can anyone tell me how I can optimise the solution, or what I might be doing that causes it?

The reason you're getting a memory limit exceeded is the arguments to the dfs function. Your 'path' variable is a string that can be as large as the height of the tree (which can be the size of the whole tree if it's unbalanced).
Normally that wouldn't be a problem, but path + "L" creates a new string for every recursive call of the function. Besides being very slow, this means that your memory usage is O(n^2), where n is the number of nodes in the tree.
For example, if your final path is "L" * 1000, your call stack for dfs will look like this:
Depth 0: dfs(root, path = "")
Depth 1: dfs(root.left, path = "L")
Depth 2: dfs(root.left.left, path = "LL")
...
Depth 999: path = "L"*999
Depth 1000: path = "L"*1000
Despite all those variables being called path, they are all completely different strings, for a total memory usage of ~(1000*1000)/2 = 500,000 characters at one time. With one million nodes, this is half a trillion characters.
Now, this doesn't happen just because strings are immutable; in fact, even if you were using lists (which are mutable), you'd still have this problem, as path + ["L"] would still be forced to create a copy of path.
To solve this, you need to have exactly one variable for the path stored outside of the dfs function, and only append to it from the recursive dfs function. This will ensure you only ever use O(n) space.
def dfs(root, value, path):
    if root is None:
        return False
    if root.val == value:
        return True
    if dfs(root.left, value, path):
        path.append("L")
        return True
    elif dfs(root.right, value, path):
        path.append("R")
        return True
    return False
root = lca(root)
start_to_root = []
dfs(root, startValue, start_to_root)
dest_to_root = []
dfs(root, destValue, dest_to_root)
return "U" * len(start_to_root) + ''.join(reversed(dest_to_root))
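For reference, the pieces above can be assembled into one runnable sketch. The function name get_directions and the TreeNode class are illustrative stand-ins. Note that the shared list collects letters on the way back up, so it holds the path in target-to-ancestor order, which is why the destination path is reversed at the end. The sketch is still recursive, so on a very deep tree Python's recursion limit may become the next obstacle.

```python
class TreeNode:
    """Minimal stand-in for LeetCode's TreeNode (assumed shape)."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def get_directions(root, start_value, dest_value):
    def lca(node):
        # Lowest common ancestor of the start and destination nodes.
        if node is None or node.val in (start_value, dest_value):
            return node
        left = lca(node.left)
        right = lca(node.right)
        if left and right:
            return node
        return left or right

    def dfs(node, value, path):
        # Appends to the single shared list only while unwinding,
        # so no intermediate copies of the path are created.
        if node is None:
            return False
        if node.val == value:
            return True
        if dfs(node.left, value, path):
            path.append("L")
            return True
        if dfs(node.right, value, path):
            path.append("R")
            return True
        return False

    ancestor = lca(root)
    start_to_root = []
    dfs(ancestor, start_value, start_to_root)
    dest_to_root = []
    dfs(ancestor, dest_value, dest_to_root)
    return "U" * len(start_to_root) + "".join(reversed(dest_to_root))


# Example 1 from the problem statement: root = [5,1,2,3,null,6,4]
example1 = TreeNode(5, TreeNode(1, TreeNode(3)), TreeNode(2, TreeNode(6), TreeNode(4)))
# Example 2 from the problem statement: root = [2,1]
example2 = TreeNode(2, TreeNode(1))

print(get_directions(example1, 3, 6))
print(get_directions(example2, 2, 1))
```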

Maximum depth of binary tree- do we need a 'holder' to keep track of the maximum current depth?

I am writing code to solve the following LeetCode problem: https://leetcode.com/problems/maximum-depth-of-binary-tree/
Here is the iterative solution that passes all the tests:
def maxDepth(root):
    stack = []
    if not root:
        return 0
    if root:
        stack.append((1,root))
    depth =0
    while stack:
        current_depth, root = stack.pop()
        depth = max(current_depth,depth)
        if root.left:
            stack.append((current_depth+1,root.left))
        if root.right:
            stack.append((current_depth+1,root.right))
    return depth
I do understand on the whole what is happening, but my question is with depth = max(current_depth,depth). Am I right in understanding that the only purpose of 'depth' is to act as a holder to hold the current maximum depth as we traverse the tree?
Because when reading the code initially, the first thing that struck me is why not ONLY have current_depth? But then it hit me that we need to store the current_depth somewhere and only keep the largest. Am I right on this point?

"my question is with depth = max(current_depth,depth). Am I right in understanding that the only purpose of 'depth' is to act as a holder to hold the current maximum depth as we traverse the tree?"
Yes, that is correct. It may help to replace that line with this equivalent code:
    if current_depth > depth:
        depth = current_depth
"we need to store the current_depth somewhere and only keep the largest. Am I right on this point?"
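Yes, that is correct. During the execution of the algorithm, current_depth fluctuates up and down as nodes at different levels are popped. Note that current_depth is not the same as len(stack) in your code: the stack also holds siblings that are still waiting to be visited, so its size does not equal the depth of the node being processed. (For a root whose right child is a leaf and whose left child has children, the stack never holds more than two nodes although the depth is 3.) If you really wanted to do this without the current_depth variable, the stack would have to hold only the nodes on the path from the root to the current node, as the call stack does in a recursive version. One way is to push an iterator over each node's children instead of the node itself. Then you don't have to push the depth on the stack, and the outcome of the algorithm is really the maximum size that the stack reached during the whole execution:
def maxDepth(root):
    if not root:
        return 0
    # Each entry iterates over the children of one node on the current path
    stack = [iter((root.left, root.right))]
    depth = 1
    while stack:
        for child in stack[-1]:
            if child:
                stack.append(iter((child.left, child.right)))
                depth = max(len(stack), depth)
                break
        else:
            # No unvisited children left: backtrack
            stack.pop()
    return depth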
Recursive versions
The original code you presented really is an almost literal conversion of a recursive function to an iterative function, introducing an explicit stack variable instead of the call stack frames you would produce in a recursive version.
It may also help to see the recursive implementation that this code mimics:
def maxDepth(root):
    if not root:
        return 0
    depth = 0
    def dfs(current_depth, root):  # <-- these variables live on THE stack
        nonlocal depth
        depth = max(current_depth, depth)
        if root.left:
            dfs(current_depth + 1, root.left)
        if root.right:
            dfs(current_depth + 1, root.right)
    dfs(1, root)
    return depth
Moving the three similar if checks one level deeper in the recursion, so that only one if remains, we get:
def maxDepth(root):
    depth = 0
    def dfs(current_depth, root):
        nonlocal depth
        if root:
            depth = max(current_depth, depth)
            dfs(current_depth + 1, root.left)
            dfs(current_depth + 1, root.right)
    dfs(1, root)
    return depth
It is essentially the same algorithm, but it may help clarify what is happening.
We can turn this into a more functional version, which makes dfs return the depth value: that way you can avoid the nonlocal trick to mutate the depth value from inside that function:
def maxDepth(root):
    def dfs(current_depth, root):
        return max(current_depth,
            dfs(current_depth + 1, root.left),
            dfs(current_depth + 1, root.right)
        ) if root else current_depth
    return dfs(0, root)
And now we can even merge that inner function with the outside function, by providing it an optional argument (current_depth) -- it should not be provided in the main call of maxDepth:
def maxDepth(root, current_depth=0):
    return max(current_depth,
        maxDepth(root.left, current_depth + 1),
        maxDepth(root.right, current_depth + 1)
    ) if root else current_depth
And finally, the most elegant solution is to make maxDepth return the depth of the subtree that it is given, so without any context of the larger tree. In that case it is no longer necessary to pass a current_depth argument. The 1 is added after the recursive call is made, to account for the parent node:
def maxDepth(root):
    return 1 + max(
        maxDepth(root.left), maxDepth(root.right)
    ) if root else 0
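As a quick sanity check, the iterative version with the depth holder and the final recursive version can be run side by side on a small tree. The snake_case function names and the TreeNode class are stand-ins introduced for this sketch, and the example tree is made up.

```python
class TreeNode:
    """Minimal stand-in for LeetCode's TreeNode (assumed shape)."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def max_depth_iterative(root):
    """Explicit stack of (depth, node) pairs; `depth` holds the maximum seen."""
    if not root:
        return 0
    stack = [(1, root)]
    depth = 0
    while stack:
        current_depth, node = stack.pop()
        depth = max(current_depth, depth)
        if node.left:
            stack.append((current_depth + 1, node.left))
        if node.right:
            stack.append((current_depth + 1, node.right))
    return depth


def max_depth_recursive(root):
    """Depth of the subtree rooted at `root`, with no outside context."""
    return 1 + max(
        max_depth_recursive(root.left), max_depth_recursive(root.right)
    ) if root else 0


# Unbalanced tree: the left subtree is one level deeper than the right.
tree = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(max_depth_iterative(tree), max_depth_recursive(tree))
```

In the recursive form the "holder" disappears because each call returns its own subtree's depth and max combines the two children directly.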

Root to leaf algo bug

Wrote an unnecessarily complex solution to the following question:
Given a binary tree and a sum, determine if the tree has a
root-to-leaf path such that adding up all the values along the path
equals the given sum.
Anyway, I'm trying to debug what went wrong here. I used a named tuple so that I can track both whether the number has been found and the running sum, but it looks like running sum is never incremented beyond zero. At any given leaf node, though, the running sum will be the leaf node's value, and in the next iteration the running sum should be incremented by the current running sum. Anyone know what's wrong with my "recursive leap of faith" here?
import collections
def root_to_leaf(target_sum, tree):
    NodeData = collections.namedtuple('NodeData', ['running_sum', 'num_found'])
    def root_to_leaf_helper(node, node_data):
        if not node:
            return NodeData(0, False)
        if node_data.num_found:
            return NodeData(target_sum, True)
        left_check = root_to_leaf_helper(node.left, node_data)
        if left_check.num_found:
            return NodeData(target_sum, True)
        right_check = root_to_leaf_helper(node.right, node_data)
        if right_check.num_found:
            return NodeData(target_sum, True)
        new_running_sum = node.val + node_data.running_sum
        return NodeData(new_running_sum, new_running_sum == target_sum)
    return root_to_leaf_helper(tree, NodeData(0, False)).num_found
EDIT: I realize this is actually just checking if any path (not ending at leaf) has the correct value, but my question still stands on understanding why running sum isn't properly incremented.

I think you need to think clearly about whether information is flowing down the tree (from root to leaf) or up the tree (from leaf to root). It looks like the node_data argument to root_to_leaf_helper is initialized at the top of the tree, the root, and then passed down through each node via recursive calls. That's fine, but as far as I can tell, it's never changed on the way down the tree. It's just passed along untouched. Therefore the first check, for node_data.num_found, will always be false.
Even worse, since node_data is always the same ((0, False)) on the way down the tree, the following line that tries to add the current node's value to a running sum:
new_running_sum = node.val + node_data.running_sum
will always be adding node.val to 0, since node_data.running_sum is always 0.
Hopefully this is clear, I realize that it's a little difficult to explain.
But trying to think very clearly about information flowing down the tree (in the arguments to recursive calls) vs information flowing up the tree (in the return value from the recursive calls) will make this a little bit more clear.
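One way to make the two directions explicit is sketched below: the running sum flows down through an argument, and a plain boolean flows back up through the return value. This is a minimal illustration rather than a repair of the namedtuple code; the name has_path_sum, the running_sum default argument and the TreeNode class are assumptions made for the example.

```python
class TreeNode:
    """Minimal stand-in for a binary tree node (assumed shape)."""
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def has_path_sum(node, target_sum, running_sum=0):
    if node is None:
        return False
    # Information flowing DOWN: extend the sum before recursing.
    running_sum += node.val
    if node.left is None and node.right is None:
        # Only a leaf may complete a root-to-leaf path.
        return running_sum == target_sum
    # Information flowing UP: combine the children's answers.
    return (has_path_sum(node.left, target_sum, running_sum)
            or has_path_sum(node.right, target_sum, running_sum))


# Root-to-leaf sums in this tree: 5 + 4 + 11 = 20 and 5 + 8 = 13.
tree = TreeNode(5, TreeNode(4, TreeNode(11)), TreeNode(8))
print(has_path_sum(tree, 20), has_path_sum(tree, 9))
```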

You can keep a running list of the path as part of the signature of the recursive function/method, and call the function/method on both the right and left nodes using a generator. The generator will enable you to find the paths extending from the starting nodes. For simplicity, I have implemented the solution as a class method:
class Tree:
    def __init__(self, *args):
        self.__dict__ = dict(zip(['value', 'left', 'right'], args))
    def get_sums(self, current = []):
        if self.left is None:
            yield current + [self.value]
        else:
            yield from self.left.get_sums(current+[self.value])
        if self.right is None:
            yield current+[self.value]
        else:
            yield from self.right.get_sums(current+[self.value])
tree = Tree(4, Tree(10, Tree(4, Tree(7, None, None), Tree(6, None, None)), Tree(12, None, None)), Tree(5, Tree(6, None, Tree(11, None, None)), None))
paths = list(tree.get_sums())
new_paths = [a for i, a in enumerate(paths) if a not in paths[:i]]
final_path = [i for i in paths if sum(i) == 15]
Output:
[[4, 5, 6]]
