Root to leaf algo bug - python

Wrote an unnecessarily complex solution to the following question:
Given a binary tree and a sum, determine if the tree has a
root-to-leaf path such that adding up all the values along the path
equals the given sum.
Anyway, I'm trying to debug what went wrong here. I used a named tuple so that I can track both whether the number has been found and the running sum, but it looks like running sum is never incremented beyond zero. At any given leaf node, though, the running sum will be the leaf node's value, and in the next iteration the running sum should be incremented by the current running sum. Anyone know what's wrong with my "recursive leap of faith" here?
import collections

def root_to_leaf(target_sum, tree):
    NodeData = collections.namedtuple('NodeData', ['running_sum', 'num_found'])

    def root_to_leaf_helper(node, node_data):
        if not node:
            return NodeData(0, False)
        if node_data.num_found:
            return NodeData(target_sum, True)
        left_check = root_to_leaf_helper(node.left, node_data)
        if left_check.num_found:
            return NodeData(target_sum, True)
        right_check = root_to_leaf_helper(node.right, node_data)
        if right_check.num_found:
            return NodeData(target_sum, True)
        new_running_sum = node.val + node_data.running_sum
        return NodeData(new_running_sum, new_running_sum == target_sum)

    return root_to_leaf_helper(tree, NodeData(0, False)).num_found
EDIT: I realize this is actually just checking if any path (not ending at leaf) has the correct value, but my question still stands on understanding why running sum isn't properly incremented.

I think you need to think clearly about whether information is flowing down the tree (from root to leaf) or up the tree (from leaf to root). It looks like the node_data argument to root_to_leaf_helper is initialized at the top of the tree, the root, and then passed down through each node via recursive calls. That's fine, but as far as I can tell, it's never changed on the way down the tree. It's just passed along untouched. Therefore the first check, for node_data.num_found, will always be false.
Even worse, since node_data is always the same ((0, False)) on the way down the tree, the following line that tries to add the current node's value to a running sum:
new_running_sum = node.val + node_data.running_sum
will always be adding node.val to 0, since node_data.running_sum is always 0.
Hopefully this is clear, I realize that it's a little difficult to explain.
But trying to think very clearly about information flowing down the tree (in the arguments to recursive calls) vs information flowing up the tree (in the return value from the recursive calls) will make this a little bit more clear.
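To make the down-vs-up distinction concrete, here is a minimal sketch (not the poster's code; it assumes a simple hypothetical node type with val, left and right attributes) in which the running sum flows down through an argument and the answer flows up through the return value, with the target checked only at leaves:

```python
from collections import namedtuple

# A hypothetical immutable tree node; any object with .val/.left/.right works.
Node = namedtuple('Node', ['val', 'left', 'right'])

def has_root_to_leaf_sum(node, target_sum, running_sum=0):
    """True if some root-to-leaf path sums to target_sum."""
    if node is None:
        return False
    running_sum += node.val          # information flowing DOWN the tree
    if node.left is None and node.right is None:
        return running_sum == target_sum   # decide only at a leaf
    # the boolean answer flows back UP via the return values
    return (has_root_to_leaf_sum(node.left, target_sum, running_sum) or
            has_root_to_leaf_sum(node.right, target_sum, running_sum))

tree = Node(5, Node(4, Node(11, Node(7, None, None),
                                Node(2, None, None)), None),
               Node(8, None, None))
print(has_root_to_leaf_sum(tree, 22))   # True: 5 + 4 + 11 + 2
print(has_root_to_leaf_sum(tree, 100))  # False
```

Note that nothing needs to be threaded back down: each call only reports its own subtree's answer upward.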

You can keep a running list of the path as part of the signature of the recursive function/method, and call the function/method on both the right and left nodes using a generator. The generator will enable you to find the paths extending from the starting nodes. For simplicity, I have implemented the solution as a class method:
class Tree:
    def __init__(self, *args):
        self.__dict__ = dict(zip(['value', 'left', 'right'], args))

    def get_sums(self, current=[]):
        if self.left is None:
            yield current + [self.value]
        else:
            yield from self.left.get_sums(current + [self.value])
        if self.right is None:
            yield current + [self.value]
        else:
            yield from self.right.get_sums(current + [self.value])
tree = Tree(4, Tree(10, Tree(4, Tree(7, None, None), Tree(6, None, None)), Tree(12, None, None)), Tree(5, Tree(6, None, Tree(11, None, None)), None))
paths = list(tree.get_sums())
new_paths = [a for i, a in enumerate(paths) if a not in paths[:i]]
final_path = [i for i in paths if sum(i) == 15]
Output:
[[4, 5, 6]]

Change in a list in Python when it is called recursively

I was solving some problems on a binary tree and I got stuck in the question Lowest Common Ancestor in a Binary Tree | Set 1.
I am using Python to solve the question.
My question is: in the case where we want to find the common ancestor of 4 and 5, how, when the function is called the first time, is [1, 2, 4] returned...
How? Shouldn't 3, 6, 7 also be appended to the list along with 1, 2, 4 when (root.right != None and findPath(root.right, path, k)) is called recursively?
PS: Please refer to the Python code given in the link
The code you're asking about appears to be the findPath function, copied here:
# Finds the path from root node to given root of the tree.
# Stores the path in a list path[], returns true if path
# exists otherwise false
def findPath(root, path, k):
    # Base Case
    if root is None:
        return False
    # Store this node in the path vector. The node will be
    # removed if not in path from root to k
    path.append(root.key)
    # See if k is same as root's key
    if root.key == k:
        return True
    # Check if k is found in left or right sub-tree
    if ((root.left != None and findPath(root.left, path, k)) or
            (root.right != None and findPath(root.right, path, k))):
        return True
    # If not present in subtree rooted with root, remove
    # root from path and return False
    path.pop()
    return False
(Source: https://www.geeksforgeeks.org/lowest-common-ancestor-binary-tree-set-1/)
And you're confused about how the path output argument works. That's a fair question, as the algorithm is a little indirect about how it builds that list.
Let's look at all of the return codepaths, and specifically focus on what's happening to path when this happens. I've copied the above code without the comments, focusing just on the main logic of the function. The comments are mine. I've also renamed "root" to "current" as this makes the method more clear.
def findPath(current, path, k):
    # Put the current node's key at the end of the path list.
    path.append(current.key)
    # Return immediately if the current node is the target.
    if current.key == k:
        return True
    # Return immediately if one of the current node's children contains the target.
    if ((current.left != None and findPath(current.left, path, k)) or
            (current.right != None and findPath(current.right, path, k))):
        return True
    # Remove the current node's key from the path list.
    path.pop()
    return False
So, we can think of this function as essentially doing the following (in pseudocode):
def findPath(current, path, target):
    path += current.key
    if (current or its children contain target):
        return True
    else:
        path -= current.key
        return False
Note that only in the "happy" case, where current or its children contain target, do we leave current.key in path. In all other code paths, we remove current.key from path. So, in the end, path will contain exactly the keys between root and target.
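To see the backtracking in action, here is a self-contained run (with a minimal hypothetical Node class standing in for the article's tree node) on the tree from the linked article. Searching for 5 appends and then pops 4, and never visits 3, 6, or 7 at all, because the or short-circuits once the left subtree reports success:

```python
class Node:
    # Minimal stand-in for the article's tree node.
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def findPath(current, path, k):
    path.append(current.key)
    if current.key == k:
        return True
    if ((current.left is not None and findPath(current.left, path, k)) or
            (current.right is not None and findPath(current.right, path, k))):
        return True
    path.pop()  # backtrack: current is not on the path to k
    return False

#        1
#       / \
#      2   3
#     / \ / \
#    4  5 6  7
root = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6), Node(7)))
path = []
findPath(root, path, 5)
print(path)  # [1, 2, 5] -- 4 was appended then popped; 3, 6, 7 never visited
```

So 3, 6 and 7 are never appended when the target lies in the left subtree: the recursion returns True before root.right is ever examined.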

Is it right approach to fix problem of BST?

I have to check whether a binary tree is balanced, and I am pretty sure my solution should work.
import sys

class Solution:
    def isBalanced(self, root: TreeNode) -> bool:
        cache = {
            max: -sys.maxsize,  # min possible number
            min: sys.maxsize    # max possible number
        }
        self.checkBalanced(root, cache, 0)
        return cache[max] - cache[min] <= 1

    def checkBalanced(self, node, cache, depth):
        if node is None:
            if depth < cache[min]:
                cache[min] = depth
            if depth > cache[max]:
                cache[max] = depth
        else:
            self.checkBalanced(node.left, cache, depth+1)
            self.checkBalanced(node.right, cache, depth+1)
But in this case I have an error
Here is link for question on Leetcode: https://leetcode.com/problems/balanced-binary-tree
Here is the definition from the link that you provided:
For this problem, a height-balanced binary tree is defined as: a binary tree in which the depth of the two subtrees of every node never differ by more than 1.
For given "bad" input your code calculates cache[max] = 5, cache[min] = 3, so it returns false. However, if we consider root, its left subtree has depth 4 and right subtree has depth 3, so this satisfies the definition.
You should compare the depths of the left and right subtrees at every node; however, your code calculates only the overall depth of the tree (your cache[max]) and the length of the shortest path to any leaf (your cache[min]).
I hope you will easily fix your code after these clarifications.
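For reference, the standard fix is to compute each subtree's height and compare the two child heights at every node. A sketch (assuming the usual TreeNode shape with left/right attributes; the demo node type is a throwaway stand-in):

```python
def is_balanced(root):
    """LeetCode definition: at every node, the two subtree heights
    differ by at most 1."""
    def height(node):
        # Returns the subtree height, or -1 to signal "unbalanced below here".
        if node is None:
            return 0
        left = height(node.left)
        right = height(node.right)
        if left == -1 or right == -1 or abs(left - right) > 1:
            return -1
        return 1 + max(left, right)
    return height(root) != -1

# Tiny demo with a throwaway node type:
from collections import namedtuple
T = namedtuple('T', ['left', 'right'])
leaf = T(None, None)
print(is_balanced(T(T(leaf, None), leaf)))           # True
print(is_balanced(T(T(T(leaf, None), None), None)))  # False: left-only chain
```

Using -1 as a sentinel lets one pass over the tree both measure heights and report imbalance, without any shared cache.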

recursion in traversing a binary tree

I am trying to do the leetcode problem problem #113, which is "Given a binary tree and a sum, find all root-to-leaf paths where each path's sum equals the given sum"
My problem is: why does my code #1, shown below, print the values of all nodes in the tree? How does the recursion stack work in code #1 as opposed to code #2, which is a correct solution?
Thank you so much for helping me!
# code #1
class Solution:
    def pathSum(self, root, sum):
        self.res = []
        self.dfs(root, sum, [])
        return self.res

    def dfs(self, root, sum, path):
        if not root:
            return
        sum -= root.val
        path += [root.val]
        if not root.left and not root.right and sum == 0:
            self.res.append(path)
        self.dfs(root.left, sum, path)
        self.dfs(root.right, sum, path)

# code #2
class Solution:
    def pathSum(self, root, sum):
        self.res = []
        self.dfs(root, sum, [])
        return self.res

    def dfs(self, root, sum, path):
        if not root:
            return
        sum -= root.val
        if not root.left and not root.right and sum == 0:
            self.res.append(path + [root.val])
        self.dfs(root.left, sum, path + [root.val])
        self.dfs(root.right, sum, path + [root.val])
Well, the comments above are well deserved; some examples would help here. For instance, you talk about printing, but there are no print statements here, so how are you driving this and how are you using the result?
Having said that, I suspect the problem traces back to the fact that code #1 mutates the list bound to path, while code #2 does not. If path were an immutable value such as a number, rebinding it inside dfs would not matter to the caller. But you passed in [] (an empty list), which is a mutable object, and every recursive call shares that same object. So as code #1 proceeds, path += [root.val] keeps extending the one list that every caller, and every entry already appended to self.res, is still holding. In code #2, path + [root.val] builds a fresh list at each step, so the lists already collected are never changed.
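The distinction is easy to demonstrate in isolation: path += [x] extends the one shared list in place, while path + [x] builds a new list and leaves the caller's list alone.

```python
def mutate(path):
    path += [1]        # in-place extend: the caller's list changes (code #1's pattern)

def rebuild(path):
    path = path + [1]  # new list; the caller's list is untouched (code #2's pattern)

shared = []
mutate(shared)
print(shared)  # [1]

shared = []
rebuild(shared)
print(shared)  # []
```

This is also why code #1's self.res ends up full of node values: it appends the same ever-growing list object over and over.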

What's wrong with this least common ancestor algorithm?

I was asked the following question in a job interview:
Given a root node (to a well formed binary tree) and two other nodes (which are guaranteed to be in the tree, and are also distinct), return the lowest common ancestor of the two nodes.
I didn't know any least common ancestor algorithms, so I tried to make one on the spot. I produced the following code:
def least_common_ancestor(root, a, b):
    lca = [None]

    def check_subtree(subtree, lca=lca):
        if lca[0] is not None or subtree is None:
            return 0
        if subtree is a or subtree is b:
            return 1
        else:
            ans = sum(check_subtree(n) for n in (subtree.left, subtree.right))
            if ans == 2:
                lca[0] = subtree
                return 0
            return ans

    check_subtree(root)
    return lca[0]

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right
I tried the following test cases and got the answer that I expected:
a = Node(None, None)
b = Node(None, None)
tree = Node(Node(Node(None, a), b), None)
tree2 = Node(a, Node(Node(None, None), b))
tree3 = Node(a, b)
but my interviewer told me that "there is a class of trees for which your algorithm returns None." I couldn't figure out what it was and I flubbed the interview. I can't think of a case where the algorithm would make it to the bottom of the tree without ans ever becoming 2 -- what am I missing?
You forgot to account for the case where a is a direct ancestor of b, or vice versa. You stop searching as soon as you find either node and return 1, so you'll never find the other node in that case.
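One way to patch the original algorithm along those lines (a sketch, keeping the poster's structure): instead of returning 1 the moment a target is found, keep counting matches below it, so an ancestor-descendant pair still reaches a count of 2.

```python
def least_common_ancestor(root, a, b):
    lca = [None]

    def check_subtree(subtree):
        if lca[0] is not None or subtree is None:
            return 0
        # Keep searching *below* a match so the case where one target is
        # an ancestor of the other is still counted.
        count = int(subtree is a or subtree is b)
        count += check_subtree(subtree.left) + check_subtree(subtree.right)
        if count == 2 and lca[0] is None:
            lca[0] = subtree
        return count

    check_subtree(root)
    return lca[0]

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

a = Node()
b = Node(a)      # a is a direct descendant of b
root = Node(b)
print(least_common_ancestor(root, a, b) is b)  # True; the original returned None here
```

(When one node is an ancestor of the other, the usual convention is that the ancestor itself is the LCA, which is what this returns.)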
You were given a well-formed binary search tree; one of the properties of such a tree is that you can easily find elements based on their size relative to the current node: smaller elements go into the left sub-tree, greater ones into the right. As such, if you know that both elements are in the tree, you only need to compare keys; as soon as you find a node that is in between the two target nodes, or equal to one of them, you have found the lowest common ancestor.
Your sample nodes never included the keys stored in the tree, so you cannot make use of this property, but if you did, you'd use:
def lca(tree, a, b):
    # Assumes a.key <= b.key; swap the two arguments first if necessary.
    if a.key <= tree.key <= b.key:
        return tree
    if a.key < tree.key and b.key < tree.key:
        return lca(tree.left, a, b)
    return lca(tree.right, a, b)
If the tree is merely a 'regular' binary tree, and not a search tree, your only option is to find the paths for both elements and find the point at which these paths diverge.
If your binary tree maintains parent references and depth, this can be done efficiently; simply walk up the deeper of the two nodes until you are at the same depth, then continue upwards from both nodes until you have found a common node; that is the least-common-ancestor.
If you don't have those two elements, you'll have to find the path to both nodes with separate searches, starting from the root, then find the last common node in those two paths.
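The "find both paths, then compare" approach for a plain binary tree can be sketched as follows (hypothetical helper names, using the question's children-only Node shape):

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def path_to(root, target):
    """Root-to-target list of nodes, or None if target is absent."""
    if root is None:
        return None
    if root is target:
        return [root]
    for child in (root.left, root.right):
        sub = path_to(child, target)
        if sub is not None:
            return [root] + sub
    return None

def lca_by_paths(root, a, b):
    lca = None
    # The LCA is the last node the two root-to-target paths have in common.
    for x, y in zip(path_to(root, a), path_to(root, b)):
        if x is y:
            lca = x
        else:
            break
    return lca

a, b = Node(), Node()
tree = Node(Node(Node(None, a), b), None)  # one of the poster's test trees
print(lca_by_paths(tree, a, b) is tree.left)  # True
```

This does two full searches, but it has no trouble with the ancestor-descendant case: the shorter path is simply a prefix of the longer one, and its last node is the answer.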
You are missing the case where a is an ancestor of b.
Look at the simple counter example:
      a
     / \
    b   None
a is also given as root. When you invoke check_subtree(root), which is a, the stop clause that returns 1 fires, because this is one of the nodes you are looking for; the function returns 1 immediately, without ever setting lca as it should have been.

Mapping a list to a Huffman Tree whilst preserving relative order

I'm having an issue with a search algorithm over a Huffman tree: for a given probability distribution I need the Huffman tree to be identical regardless of permutations of the input data.
Here is a picture of what's happening vs what I want:
Basically I want to know if it's possible to preserve the relative order of the items from the list to the tree. If not, why is that so?
For reference, I'm using the Huffman tree to generate sub groups according to a division of probability, so that I can run the search() procedure below. Notice that the data in the merge() sub-routine is combined, along with the weight. The codewords themselves aren't as important as the tree (which should preserve the relative order).
For example if I generate the following Huffman codes:
probabilities = [0.30, 0.25, 0.20, 0.15, 0.10]
items = ['a','b','c','d','e']
items = zip(items, probabilities)
t = encode(items)
d,l = hi.search(t)
print(d)
Using the following Class:
class Node(object):
    left = None
    right = None
    weight = None
    data = None
    code = None

    def __init__(self, w, d):
        self.weight = w
        self.data = d

    def set_children(self, ln, rn):
        self.left = ln
        self.right = rn

    def __repr__(self):
        return "[%s,%s,(%s),(%s)]" % (self.data, self.code, self.left, self.right)

    def __cmp__(self, a):
        return cmp(self.weight, a.weight)

    def merge(self, other):
        total_freq = self.weight + other.weight
        new_data = self.data + other.data
        return Node(total_freq, new_data)

    def index(self, node):
        return node.weight
def encode(symbfreq):
    pdb.set_trace()
    tree = [Node(sym, wt) for wt, sym in symbfreq]
    heapify(tree)
    while len(tree) > 1:
        lo, hi = heappop(tree), heappop(tree)
        n = lo.merge(hi)
        n.set_children(lo, hi)
        heappush(tree, n)
    tree = tree[0]

    def assign_code(node, code):
        if node is not None:
            node.code = code
            if isinstance(node, Node):
                assign_code(node.left, code+'0')
                assign_code(node.right, code+'1')

    assign_code(tree, '')
    return tree
I get:
'a'->11
'b'->01
'c'->00
'd'->101
'e'->100
However, an assumption I've made in the search algorithm is that more probable items get pushed toward the left: that is I need 'a' to have the '00' codeword - and this should always be the case regardless of any permutation of the 'abcde' sequence. An example output is:
codewords = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
(N.b even though the codeword for 'c' is a suffix for 'd' this is ok).
For completeness, here is the search algorithm:
def search(tree):
    print(tree)
    pdb.set_trace()
    current = tree.left
    other = tree.right
    loops = 0
    while current:
        loops += 1
        print(current)
        if current.data != 0 and current is not None and other is not None:
            previous = current
            current = current.left
            other = previous.right
        else:
            previous = other
            current = other.left
            other = other.right
    return previous, loops
It works by searching for the 'leftmost' 1 in a group of 0s and 1s - the Huffman tree has to put more probable items on the left. For example if I use the probabilities above and the input:
items = [1,0,1,0,0]
Then the index of the item returned by the algorithm is 2 - which isn't what should be returned (0 should, as it's leftmost).
The usual practice is to use Huffman's algorithm only to generate the code lengths. Then a canonical process is used to generate the codes from the lengths. The tree is discarded. Codes are assigned in order from shorter codes to longer codes, and within a code, the symbols are sorted. This gives the codes you are expecting, a = 00, b = 01, etc. This is called a Canonical Huffman code.
The main reason this is done is to make the transmission of the Huffman code more compact. Instead of sending the code for each symbol along with the compressed data, you only need to send the code length for each symbol. Then the codes can be reconstructed on the other end for decompression.
A Huffman tree is not normally used for decoding either. With a canonical code, simple comparisons to determine the length of the next code, and an index using the code value will take you directly to the symbol. Or a table-driven approach can avoid the search for the length.
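For concreteness, here is a minimal sketch of the canonical assignment step itself: given only a symbol-to-code-length mapping, hand out codes in order of (length, symbol order). With the lengths implied by the probabilities above, it reproduces exactly the table the question asks for:

```python
def canonical_codes(sym_lengths):
    """sym_lengths: list of (symbol, code_length) pairs, listed in the
    desired tie-breaking order for symbols of equal length."""
    codes = {}
    code = prev_len = 0
    # sorted() is stable: equal lengths keep their given symbol order.
    for sym, length in sorted(sym_lengths, key=lambda p: p[1]):
        code <<= (length - prev_len)  # append zeros when the code gets longer
        codes[sym] = format(code, '0{}b'.format(length))
        code += 1
        prev_len = length
    return codes

lengths = [('a', 2), ('b', 2), ('c', 2), ('d', 3), ('e', 3)]
print(canonical_codes(lengths))
# {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
```

Note the tree never appears: the lengths are the only thing kept from Huffman's algorithm, and the codes are fully determined by them plus the tie-breaking order.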
As for your tree, there are arbitrary choices being made when there are equal frequencies. In particular, on the second step the first node pulled is c with probability 0.2, and the second node pulled is b with probability 0.25. However it would have been equally valid to pull, instead of b, the node that was made in the first step, (e,d), whose probability is also 0.25. In fact that is what you'd prefer for your desired end state. Alas, you have relinquished the control of that arbitrary choice to the heapq library.
(Note: since you are using floating point values, 0.1 + 0.15 is not necessarily exactly equal to 0.25. Though it turns out it is. As another example, 0.1 + 0.2 is not equal to 0.3. You would be better off using integers for the frequencies if you want to see what happens when sums of frequencies are equal to other frequencies or sums of frequencies. E.g. 6,5,4,3,2.)
Some of the wrong ordering can be fixed by fixing some mistakes: change lo.merge(hi) to hi.merge(lo), and reverse the order of the bits: assign_code(node.left, code+'1') followed by assign_code(node.right, code+'0'). Then at least a gets assigned 00, d comes before e, and b comes before c. The ordering is then adebc.
Now that I think about it, even if you pick (e,d) over b, e.g. by setting the probability of b to 0.251, you still don't get the complete order that you're after. No matter what, the probability of (e,d) (0.25) is greater than the probability of c (0.2). So even in that case, the final ordering would be (with the fixes above) abdec instead of your desired abcde. So it is not possible to get what you want assuming a consistent tree ordering and bit assignment with respect to the probabilities of groups of symbols. E.g., assuming that for each branch the stuff on the left has a greater or equal probability than the stuff on the right, and 0 is always assigned to left and 1 is always assigned to right. You would need to do something different.
The different thing that comes to mind is what I said at the start of the answer. Use the Huffman algorithm just to get the code lengths. Then you can assign the codes to the symbols in whatever order you like, and build a new tree. That would be much easier than trying to come up with some sort of scheme to coerce the original tree to be what you want, and proving that that works in all cases.
I'll flesh out what Mark Adler said with working code. Everything he said is right :-) The high points:
You must not use floating-point weights, or any other scheme that loses information about weights. Use integers. Simple and correct. If, e.g., you have 3-digit floating probabilities, convert each to an integer via int(round(the_probability * 1000)), then maybe fiddle them to ensure the sum is exactly 1000.
heapq heaps are not "stable": nothing is defined about which item is popped if multiple items have the same minimal weight.
So you can't get what you want while building the tree.
A small variation of "canonical Huffman codes" appears to be what you do want. Constructing a tree for that is a long-winded process, but each step is straightforward enough. The first tree built is thrown away: the only information taken from it is the lengths of the codes assigned to each symbol.
Running:
syms = ['a','b','c','d','e']
weights = [30, 25, 20, 15, 10]
t = encode(syms, weights)
print t
prints this (formatted for readability):
[abcde,,
([ab,0,
([a,00,(None),(None)]),
([b,01,(None),(None)])]),
([cde,1,
([c,10,(None),(None)]),
([de,11,
([d,110,(None),(None)]),
([e,111,(None),(None)])])])]
Best I understand, that's exactly what you want. Complain if it isn't ;-)
EDIT: there was a bug in the assignment of canonical codes, which didn't show up unless weights were very different; fixed it.
class Node(object):
    def __init__(self, data=None, weight=None,
                 left=None, right=None,
                 code=None):
        self.data = data
        self.weight = weight
        self.left = left
        self.right = right
        self.code = code

    def is_symbol(self):
        return self.left is self.right is None

    def __repr__(self):
        return "[%s,%s,(%s),(%s)]" % (self.data,
                                      self.code,
                                      self.left,
                                      self.right)

    def __cmp__(self, a):
        return cmp(self.weight, a.weight)

def encode(syms, weights):
    from heapq import heapify, heappush, heappop
    tree = [Node(data=s, weight=w)
            for s, w in zip(syms, weights)]
    sym2node = {s.data: s for s in tree}
    heapify(tree)
    while len(tree) > 1:
        a, b = heappop(tree), heappop(tree)
        heappush(tree, Node(weight=a.weight + b.weight,
                            left=a, right=b))

    # Compute code lengths for the canonical coding.
    sym2len = {}
    def assign_codelen(node, codelen):
        if node is not None:
            if node.is_symbol():
                sym2len[node.data] = codelen
            else:
                assign_codelen(node.left, codelen + 1)
                assign_codelen(node.right, codelen + 1)
    assign_codelen(tree[0], 0)

    # Create canonical codes, but with a twist: instead
    # of ordering symbols alphabetically, order them by
    # their position in the `syms` list.
    # Construct a list of (codelen, index, symbol) triples.
    # `index` breaks ties so that symbols with the same
    # code length retain their original ordering.
    triples = [(sym2len[name], i, name)
               for i, name in enumerate(syms)]
    code = oldcodelen = 0
    for codelen, _, name in sorted(triples):
        if codelen > oldcodelen:
            code <<= (codelen - oldcodelen)
        sym2node[name].code = format(code, "0%db" % codelen)
        code += 1
        oldcodelen = codelen

    # Create a tree corresponding to the new codes.
    tree = Node(code="")
    dir2attr = {"0": "left", "1": "right"}
    for snode in sym2node.values():
        scode = snode.code
        codesofar = ""
        parent = tree
        # Walk the tree creating any needed interior nodes.
        for d in scode:
            assert parent is not None
            codesofar += d
            attr = dir2attr[d]
            child = getattr(parent, attr)
            if codesofar == scode:
                # We're at the leaf position.
                assert child is None
                setattr(parent, attr, snode)
            elif child is not None:
                assert child.code == codesofar
            else:
                child = Node(code=codesofar)
                setattr(parent, attr, child)
            parent = child

    # Finally, paste the `data` attributes together up
    # the tree. Why? Don't know ;-)
    def paste(node):
        if node is None:
            return ""
        elif node.is_symbol():
            return node.data
        else:
            result = paste(node.left) + paste(node.right)
            node.data = result
            return result
    paste(tree)

    return tree
Duplicate symbols
Could I swap the sym2node dict to an ordereddict to deal with
repeated 'a'/'b's etc?
No, and for two reasons:
No mapping type supports duplicate keys; and,
The concept of "duplicate symbols" makes no sense for Huffman encoding.
So, if you're determined ;-) to pursue this, first you have to ensure that symbols are unique. Just add this line at the start of the function:
syms = list(enumerate(syms))
For example, if the syms passed in is:
['a', 'b', 'a']
that will change to:
[(0, 'a'), (1, 'b'), (2, 'a')]
All symbols are now 2-tuples, and are obviously unique since each starts with a unique integer. The only thing the algorithm cares about is that symbols can be used as dict keys; it couldn't care less whether they're strings, tuples, or any other hashable type that supports equality testing.
So nothing in the algorithm needs to change. But before the end, we'll want to restore the original symbols. Just insert this before the paste() function:
def restore_syms(node):
    if node is None:
        return
    elif node.is_symbol():
        node.data = node.data[1]
    else:
        restore_syms(node.left)
        restore_syms(node.right)

restore_syms(tree)
That simply walks the tree and strips the leading integers off the symbols' .data members. Or, perhaps simpler, just iterate over sym2node.values(), and transform the .data member of each.
