I am trying to write a solution for LeetCode problem 261, Graph Valid Tree:
Given n nodes labeled from 0 to n-1 and a list of undirected edges (each edge is a pair of nodes), write a function to check whether these edges make up a valid tree.
Example 1:
Input: n = 5, and edges = [[0,1], [0,2], [0,3], [1,4]]
Output: true
Example 2:
Input: n = 5, and edges = [[0,1], [1,2], [2,3], [1,3], [1,4]]
Output: false
Here is my solution thus far. I believe the goal is to detect cycles in the graph, and I use DFS to do this.
class Node:
    def __init__(self, val):
        self.val = val
        self.outgoing = []

class Solution:
    def validTree(self, n: int, edges: List[List[int]]) -> bool:
        visited = {}
        for pre, end in edges:
            if pre not in visited:
                "we add a new node to the visited set"
                visited[pre] = Node(pre)
            if end not in visited:
                visited[end] = Node(end)
            "We append it to the list"
            visited[pre].outgoing.append(visited[end])

        def dfs(current, dvisit = set()):
            if current.val in dvisit:
                print("is the condition happening here")
                return True
            dvisit.add(current.val)
            for nodes in current.outgoing:
                dfs(nodes, dvisit)
            return False

        mdict = set()
        for key in visited.keys():
            mdict.clear()
            if dfs(visited[key], mdict) == True:
                return False
        return True
It fails this test: n = 5, edges = [[0,1],[1,2],[2,3],[1,3],[1,4]].
It is supposed to return false, but it returns true.
I placed some print statements in my dfs helper function, and it does seem to hit the case where dfs is supposed to return True. However, for some reason the check in my main for loop never triggers, which causes the whole function to return true. Can I get some guidance on how to modify this?
A few issues:
The given graph is undirected, so edges should be added in both directions when the tree data structure is built. Without doing this, you might miss cycles.
Once edges are made undirected, the algorithm should not travel back along the edge it just came from. For this purpose keep track of the parent node that the traversal just came from.
In dfs the returned value from the recursive call is ignored. It should not: when the returned value indicates there is a cycle, the loop should be exited and the same indication should be returned to the caller.
The main loop should not clear mdict. In fact, if after the first call to dfs, that loop finds another node that has not been visited, then this means the graph is not a tree: in a tree every pair of nodes is connected. No second call of dfs needs to be made from the main code, which means the main code does not need a loop. It can just call dfs on any node and then check that all nodes were visited by that call.
The function could do a preliminary "sanity" check, since a tree always has one less edge than it has vertices. If that is not true, then there is no need to continue: it is not a tree.
One boundary check could be made: when n is 1, and thus there are no edges, then there is nothing to call dfs on. In that case we can just return True, as this is a valid boundary case.
So a correction could look like this:
class Solution:
    def validTree(self, n: int, edges: List[List[int]]) -> bool:
        if n != len(edges) + 1:  # Quick sanity check
            return False
        if n == 1:  # Boundary case
            return True

        visited = {}
        for pre, end in edges:
            if pre not in visited:
                visited[pre] = Node(pre)
            if end not in visited:
                visited[end] = Node(end)
            visited[pre].outgoing.append(visited[end])
            visited[end].outgoing.append(visited[pre])  # It's undirected

        def dfs(parent, current, dvisit):
            if current.val in dvisit:
                return True  # Cycle detected!
            dvisit.add(current.val)
            for node in current.outgoing:
                # Avoid going back along the same edge, and
                # check the returned boolean!
                if node is not parent and dfs(current, node, dvisit):
                    return True  # Quit as soon as cycle is found
            return False

        mdict = set()
        # Start in any node:
        if dfs(None, visited[pre], mdict):
            return False  # Cycle detected!
        # After one DFS traversal there should not be any node that has not been visited
        return len(mdict) == n
A tree is a special undirected graph. It satisfies two properties
It is connected
It has no cycle.
Having no cycle, given that the graph is connected, can be expressed as NumberOfNodes == NumberOfEdges + 1.
Based on this, given edges:
1- Create the graph
2- then traverse the graph and store the nodes in a set
3- Finally check if two conditions above are met
class Solution:
    def validTree(self, n: int, edges: List[List[int]]) -> bool:
        from collections import defaultdict
        graph = defaultdict(list)
        for src, dest in edges:
            graph[src].append(dest)
            graph[dest].append(src)

        visited = set()

        def dfs(root):
            visited.add(root)
            for node in graph[root]:
                if node in visited:
                    # already visited: dfs has already run from here, so do not run it again
                    continue
                dfs(node)

        dfs(0)
        # this shows the graph is connected and has no cycle
        return len(visited) == n and len(edges) + 1 == n
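For very deep graphs, the recursive dfs above can hit Python's recursion limit. The same two checks (connected, and edge count equal to n - 1) can also be done iteratively; this is a hedged sketch of that variant, not the poster's code:

```python
from collections import defaultdict, deque

def valid_tree(n, edges):
    # A tree must have exactly n - 1 edges.
    if len(edges) != n - 1:
        return False
    graph = defaultdict(list)
    for src, dest in edges:
        graph[src].append(dest)
        graph[dest].append(src)
    # BFS from node 0; with exactly n - 1 edges,
    # full connectivity also rules out cycles.
    visited = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return len(visited) == n
```

On the two examples from the question this returns True and False respectively.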
This question is locked in leetcode but you can test it here for now:
https://www.lintcode.com/problem/178/description
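Union-find, which also comes up later on this page, solves this problem neatly as well: process each edge as a union, and an edge whose endpoints already share a root closes a cycle. A minimal sketch of the idea, not taken from any answer here:

```python
def valid_tree_uf(n, edges):
    if len(edges) != n - 1:  # a tree has exactly n - 1 edges
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # edge inside one component -> cycle
        parent[rb] = ra
    return True
```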
The approach is to use UnionFind to solve the problem.
Below is the correct solution:
from collections import defaultdict

class UnionFind:
    def __init__(self, n):
        self.arr = [i for i in range(n)]

    def find(self, x):
        if self.arr[x] == x:
            return x
        self.arr[x] = self.find(self.arr[x])
        return self.arr[x]  # this is the correct find()

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            self.arr[rootY] = rootX

class Solution:
    def smallestStringWithSwaps(self, s: str, pairs: List[List[int]]) -> str:
        size = len(s)
        m = UnionFind(size)
        for x, y in pairs:
            m.union(x, y)
        dict = defaultdict(list)
        for i in range(size):
            root = m.find(i)
            dict[root].append(s[i])
        for key in dict:
            dict[key].sort(reverse=True)
        result = []
        for i in range(size):
            root = m.find(i)
            result.append(dict[root].pop())
        return ''.join(result)
I kept getting the "time limit exceeded" error if my find() in UnionFind Class is either:
def find(self, x):
    if self.arr[x] == x:
        return x
    return self.find(self.arr[x])  # time limit exceeded error will occur
or
def find(self, x):
    while x != self.root[x]:
        x = self.root[x]
    return x  # time limit exceeded error will occur
There are total 36 test cases for this problem. Using the two find methods above will only get 35/36 passed.
So what's the difference between the correct find() and the two above?
Thanks!
Your time-limit-exceeding versions (more or less equivalent to each other) give up on path compression. Path compression avoids repeating the work of walking up a long path to the root of a tree: whenever the structure touches a node, it makes that node a direct child of its root. Normally, dropping it means O(log n) operations instead of practically O(1), which wouldn't be a big deal on its own. But your implementation of union is also suboptimal, in that it doesn't take into account the relative sizes of the two trees it's merging. The combination is very slow, O(n) worst case per operation, which pretty much defeats the point of using a union-find structure in the first place.
For detailed explanations, the terms to look up are:
union by weight
union by rank
path compression
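For reference, here is a minimal sketch of those ideas combined (the class and method names are mine, not from the question's code): find compresses paths, and union attaches by rank.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            # Path compression: point x directly at the root.
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: attach the shorter tree under the taller one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

Either optimization alone keeps each operation around O(log n); together they make operations effectively constant (inverse-Ackermann) amortized.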
I'm trying leetcode problem 572.
Given two non-empty binary trees s and t, check whether tree t has exactly the same structure and node values with a subtree of s. A subtree of s is a tree consists of a node in s and all of this node's descendants. The tree s could also be considered as a subtree of itself.
Since trees are great for recursion, I thought about splitting the problem into cases:
a) If the current tree s is not the subtree t, then recurse on the left and right subtrees of s if possible
b) If tree s is the subtree t, then return True
c) If s is empty, then we've exhausted all the subtrees in s and should return False
def isSubtree(self, s: TreeNode, t: TreeNode) -> bool:
    if not s:
        return False
    if s == t:
        return True
    else:
        if s.left and s.right:
            return any([self.isSubtree(s.left, t), self.isSubtree(s.right, t)])
        elif s.left:
            return self.isSubtree(s.left, t)
        elif s.right:
            return self.isSubtree(s.right, t)
        else:
            return False
However, for some reason this returns False even for cases that are obviously True.
Ex:
My code here returns False, but it should be True. Any pointers on what to do?
This'll simply get through:
class Solution:
    def isSubtree(self, a, b):
        def sub(node):
            return f'A{node.val}#{sub(node.left)}{sub(node.right)}' if node else 'Z'
        return sub(b) in sub(a)
References
For additional details, you can see the Discussion Board. There are plenty of accepted solutions in a variety of languages, with explanations, efficient algorithms, and asymptotic time/space complexity analyses in there.
You need to change the second if statement.
To check whether tree t is a subtree of tree s, each time a node in s matches the root of t, call a check method that determines whether the two subtrees are identical.
if s.val == t.val and check(s, t):
    return True
The check method looks like this:
def check(self, s, t):
    if s is None and t is None:
        return True
    if s is None or t is None or s.val != t.val:
        return False
    return self.check(s.left, t.left) and self.check(s.right, t.right)
While the rest of your code works, the else branch can be much simpler, as follows. You don't need to check whether the left and right children are None, because the first if statement already handles that.
else:
    return self.isSubtree(s.left, t) or self.isSubtree(s.right, t)
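Putting those pieces together, the whole method could look like the sketch below (the same logic as the snippets above, assembled by me; TreeNode is assumed to have val, left, and right attributes):

```python
class Solution:
    def isSubtree(self, s, t):
        if s is None:
            return False
        # When the roots match, verify the whole subtree with check().
        if s.val == t.val and self.check(s, t):
            return True
        return self.isSubtree(s.left, t) or self.isSubtree(s.right, t)

    def check(self, s, t):
        if s is None and t is None:
            return True
        if s is None or t is None or s.val != t.val:
            return False
        return self.check(s.left, t.left) and self.check(s.right, t.right)
```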
I was asked the following question in a job interview:
Given a root node (to a well formed binary tree) and two other nodes (which are guaranteed to be in the tree, and are also distinct), return the lowest common ancestor of the two nodes.
I didn't know any least common ancestor algorithms, so I tried to make one on the spot. I produced the following code:
def least_common_ancestor(root, a, b):
    lca = [None]

    def check_subtree(subtree, lca=lca):
        if lca[0] is not None or subtree is None:
            return 0
        if subtree is a or subtree is b:
            return 1
        else:
            ans = sum(check_subtree(n) for n in (subtree.left, subtree.right))
            if ans == 2:
                lca[0] = subtree
                return 0
            return ans

    check_subtree(root)
    return lca[0]

class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right
I tried the following test cases and got the answer that I expected:
a = Node(None, None)
b = Node(None, None)
tree = Node(Node(Node(None, a), b), None)
tree2 = Node(a, Node(Node(None, None), b))
tree3 = Node(a, b)
but my interviewer told me that "there is a class of trees for which your algorithm returns None." I couldn't figure out what it was and I flubbed the interview. I can't think of a case where the algorithm would make it to the bottom of the tree without ans ever becoming 2 -- what am I missing?
You forgot to account for the case where a is a direct ancestor of b, or vice versa. You stop searching as soon as you find either node and return 1, so you'll never find the other node in that case.
You were given a well-formed binary search tree; one of the properties of such a tree is that you can easily find elements based on their size relative to the current node: smaller elements go into the left subtree, greater ones into the right. As such, if you know that both elements are in the tree, you only need to compare keys; as soon as you find a node that is between the two target nodes, or equal to one of them, you have found the lowest common ancestor.
Your sample nodes never included the keys stored in the tree, so you cannot make use of this property, but if you did, you'd use:
def lca(tree, a, b):
    # Note: this assumes a.key <= b.key; swap a and b first if not.
    if a.key <= tree.key <= b.key:
        return tree
    if a.key < tree.key and b.key < tree.key:
        return lca(tree.left, a, b)
    return lca(tree.right, a, b)
If the tree is merely a 'regular' binary tree, and not a search tree, your only option is to find the paths for both elements and find the point at which these paths diverge.
If your binary tree maintains parent references and depth, this can be done efficiently; simply walk up the deeper of the two nodes until you are at the same depth, then continue upwards from both nodes until you have found a common node; that is the least-common-ancestor.
If you don't have those two elements, you'll have to find the path to both nodes with separate searches, starting from the root, then find the last common node in those two paths.
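Sketching that root-based fallback (my own helper names; Node here mirrors the question's class, with only left and right attributes): find the path from the root to each target, then take the last node the two paths share.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def find_path(root, target):
    # Returns the list of nodes from root down to target, or None.
    if root is None:
        return None
    if root is target:
        return [root]
    for child in (root.left, root.right):
        path = find_path(child, target)
        if path is not None:
            return [root] + path
    return None

def lca_by_paths(root, a, b):
    pa, pb = find_path(root, a), find_path(root, b)
    lca = None
    for x, y in zip(pa, pb):
        if x is y:
            lca = x  # deepest common node seen so far
        else:
            break
    return lca
```

Note that this also handles the ancestor case: if a is an ancestor of b, a's path is a prefix of b's path, so a itself is returned.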
You are missing the case where a is an ancestor of b.
Look at the simple counter example:
    a
   / \
  b  None
Here a is also given as the root. When the function is invoked as check_subtree(root), root is a, so the stop clause that returns 1 fires immediately, and the function returns without ever setting lca as it should have.
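One possible repair of the original function, sketched under the same Node layout as the question: keep counting below a matched node instead of returning 1 immediately, so the case where one target is an ancestor of the other is still seen.

```python
class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

def least_common_ancestor(root, a, b):
    lca = [None]

    def count(subtree):
        if lca[0] is not None or subtree is None:
            return 0
        # Do NOT stop at a matched node: its subtree may contain
        # the other target (the a-is-ancestor-of-b case).
        found = 1 if subtree is a or subtree is b else 0
        found += count(subtree.left) + count(subtree.right)
        if found == 2 and lca[0] is None:
            lca[0] = subtree  # lowest node covering both targets
        return found

    count(root)
    return lca[0]
```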
I'm having an issue with a search algorithm over a Huffman tree: for a given probability distribution I need the Huffman tree to be identical regardless of permutations of the input data.
Here is a picture of what's happening vs what I want:
Basically I want to know if it's possible to preserve the relative order of the items from the list to the tree. If not, why is that so?
For reference, I'm using the Huffman tree to generate sub groups according to a division of probability, so that I can run the search() procedure below. Notice that the data in the merge() sub-routine is combined, along with the weight. The codewords themselves aren't as important as the tree (which should preserve the relative order).
For example if I generate the following Huffman codes:
probabilities = [0.30, 0.25, 0.20, 0.15, 0.10]
items = ['a','b','c','d','e']
items = zip(items, probabilities)
t = encode(items)
d,l = hi.search(t)
print(d)
Using the following Class:
import pdb
from heapq import heapify, heappop, heappush

class Node(object):
    left = None
    right = None
    weight = None
    data = None
    code = None

    def __init__(self, w, d):
        self.weight = w
        self.data = d

    def set_children(self, ln, rn):
        self.left = ln
        self.right = rn

    def __repr__(self):
        return "[%s,%s,(%s),(%s)]" % (self.data, self.code, self.left, self.right)

    def __cmp__(self, a):
        return cmp(self.weight, a.weight)

    def merge(self, other):
        total_freq = self.weight + other.weight
        new_data = self.data + other.data
        return Node(total_freq, new_data)

    def index(self, node):
        return node.weight

def encode(symbfreq):
    pdb.set_trace()
    tree = [Node(sym, wt) for wt, sym in symbfreq]
    heapify(tree)
    while len(tree) > 1:
        lo, hi = heappop(tree), heappop(tree)
        n = lo.merge(hi)
        n.set_children(lo, hi)
        heappush(tree, n)
    tree = tree[0]

    def assign_code(node, code):
        if node is not None:
            node.code = code
            if isinstance(node, Node):
                assign_code(node.left, code + '0')
                assign_code(node.right, code + '1')

    assign_code(tree, '')
    return tree
I get:
'a'->11
'b'->01
'c'->00
'd'->101
'e'->100
However, an assumption I've made in the search algorithm is that more probable items are pushed toward the left: that is, I need 'a' to have the codeword '00', and this should always be the case regardless of any permutation of the 'abcde' sequence. An example of the output I want is:
codewords = {'a':'00', 'b':'01', 'c':'10', 'd':'110', 'e':'111'}
(N.B. even though the codeword for 'c' is a suffix of the codeword for 'd', this is OK.)
For completeness, here is the search algorithm:
def search(tree):
    print(tree)
    pdb.set_trace()
    current = tree.left
    other = tree.right
    loops = 0
    while current:
        loops += 1
        print(current)
        if current.data != 0 and current is not None and other is not None:
            previous = current
            current = current.left
            other = previous.right
        else:
            previous = other
            current = other.left
            other = other.right
    return previous, loops
It works by searching for the 'leftmost' 1 in a group of 0s and 1s - the Huffman tree has to put more probable items on the left. For example if I use the probabilities above and the input:
items = [1,0,1,0,0]
Then the index of the item returned by the algorithm is 2 - which isn't what should be returned (0 should, as it's leftmost).
The usual practice is to use Huffman's algorithm only to generate the code lengths. Then a canonical process is used to generate the codes from the lengths. The tree is discarded. Codes are assigned in order from shorter codes to longer codes, and within a code, the symbols are sorted. This gives the codes you are expecting, a = 00, b = 01, etc. This is called a Canonical Huffman code.
The main reason this is done is to make the transmission of the Huffman code more compact. Instead of sending the code for each symbol along with the compressed data, you only need to send the code length for each symbol. Then the codes can be reconstructed on the other end for decompression.
A Huffman tree is not normally used for decoding either. With a canonical code, simple comparisons to determine the length of the next code, and an index using the code value will take you directly to the symbol. Or a table-driven approach can avoid the search for the length.
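To make the "simple comparisons" idea concrete, here is a hedged sketch of canonical decoding driven only by the code lengths (the function and variable names are my assumptions, not Mark Adler's code; within a length, symbols keep their given order):

```python
def canonical_decode(bits, lengths):
    # lengths: list of (symbol, code_length) pairs in canonical order;
    # bits: iterable of 0/1 ints.
    # Assign canonical codes: shorter codes first, stable within a length.
    codes = {}
    code = prev_len = 0
    for sym, length in sorted(lengths, key=lambda p: p[1]):
        code <<= (length - prev_len)
        codes[(length, code)] = sym
        code += 1
        prev_len = length
    # Decode by accumulating bits until (nbits, value) names a code.
    out = []
    acc = nbits = 0
    for bit in bits:
        acc = (acc << 1) | bit
        nbits += 1
        if (nbits, acc) in codes:
            out.append(codes[(nbits, acc)])
            acc = nbits = 0
    return out
```

With lengths [('a',2), ('b',2), ('c',2), ('d',3), ('e',3)] this reproduces exactly the a=00, b=01, c=10, d=110, e=111 assignment the question asks for.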
As for your tree, there are arbitrary choices being made when there are equal frequencies. In particular, on the second step the first node pulled is c with probability 0.2, and the second node pulled is b with probability 0.25. However it would have been equally valid to pull, instead of b, the node that was made in the first step, (e,d), whose probability is also 0.25. In fact that is what you'd prefer for your desired end state. Alas, you have relinquished the control of that arbitrary choice to the heapq library.
(Note: since you are using floating point values, 0.1 + 0.15 is not necessarily exactly equal to 0.25. Though it turns out it is. As another example, 0.1 + 0.2 is not equal to 0.3. You would be better off using integers for the frequencies if you want to see what happens when sums of frequencies are equal to other frequencies or sums of frequencies. E.g. 6,5,4,3,2.)
Some of the wrong ordering can be fixed by correcting a couple of mistakes: change lo.merge(hi) to hi.merge(lo), and reverse the bit assignment to assign_code(node.left, code+'1') followed by assign_code(node.right, code+'0'). Then at least a gets assigned 00, d comes before e, and b comes before c. The ordering is then adebc.
Now that I think about it, even if you pick (e,d) over b, e.g by setting the probability of b to 0.251, you still don't get the complete order that you're after. No matter what, the probability of (e,d) (0.25) is greater than the probability of c (0.2). So even in that case, the final ordering would be (with the fixes above) abdec instead of your desired abcde. So it is not possible to get what you want assuming a consistent tree ordering and bit assignment with respect to the probabilities of groups of symbols. E.g., assuming that for each branch the stuff on the left has a greater or equal probability than the stuff on the right, and 0 is always assigned to left and 1 is always assigned to right. You would need to do something different.
The different thing that comes to mind is what I said at the start of the answer. Use the Huffman algorithm just to get the code lengths. Then you can assign the codes to the symbols in whatever order you like, and build a new tree. That would be much easier than trying to come up with some sort of scheme to coerce the original tree to be what you want, and proving that that works in all cases.
I'll flesh out what Mark Adler said with working code. Everything he said is right :-) The high points:
You must not use floating-point weights, or any other scheme that loses information about weights. Use integers. Simple and correct. If, e.g., you have 3-digit floating probabilities, convert each to an integer via int(round(the_probability * 1000)), then maybe fiddle them to ensure the sum is exactly 1000.
heapq heaps are not "stable": nothing is defined about which item is popped if multiple items have the same minimal weight.
So you can't get what you want while building the tree.
A small variation of "canonical Huffman codes" appears to be what you do want. Constructing a tree for that is a long-winded process, but each step is straightforward enough. The first tree built is thrown away: the only information taken from it is the lengths of the codes assigned to each symbol.
Running:
syms = ['a','b','c','d','e']
weights = [30, 25, 20, 15, 10]
t = encode(syms, weights)
print t
prints this (formatted for readability):
[abcde,,
  ([ab,0,
    ([a,00,(None),(None)]),
    ([b,01,(None),(None)])]),
  ([cde,1,
    ([c,10,(None),(None)]),
    ([de,11,
      ([d,110,(None),(None)]),
      ([e,111,(None),(None)])])])]
Best I understand, that's exactly what you want. Complain if it isn't ;-)
EDIT: there was a bug in the assignment of canonical codes, which didn't show up unless weights were very different; fixed it.
class Node(object):
    def __init__(self, data=None, weight=None,
                 left=None, right=None,
                 code=None):
        self.data = data
        self.weight = weight
        self.left = left
        self.right = right
        self.code = code

    def is_symbol(self):
        return self.left is self.right is None

    def __repr__(self):
        return "[%s,%s,(%s),(%s)]" % (self.data,
                                      self.code,
                                      self.left,
                                      self.right)

    def __cmp__(self, a):
        return cmp(self.weight, a.weight)

def encode(syms, weights):
    from heapq import heapify, heappush, heappop
    tree = [Node(data=s, weight=w)
            for s, w in zip(syms, weights)]
    sym2node = {s.data: s for s in tree}
    heapify(tree)
    while len(tree) > 1:
        a, b = heappop(tree), heappop(tree)
        heappush(tree, Node(weight=a.weight + b.weight,
                            left=a, right=b))

    # Compute code lengths for the canonical coding.
    sym2len = {}

    def assign_codelen(node, codelen):
        if node is not None:
            if node.is_symbol():
                sym2len[node.data] = codelen
            else:
                assign_codelen(node.left, codelen + 1)
                assign_codelen(node.right, codelen + 1)

    assign_codelen(tree[0], 0)

    # Create canonical codes, but with a twist: instead
    # of ordering symbols alphabetically, order them by
    # their position in the `syms` list.
    # Construct a list of (codelen, index, symbol) triples.
    # `index` breaks ties so that symbols with the same
    # code length retain their original ordering.
    triples = [(sym2len[name], i, name)
               for i, name in enumerate(syms)]
    code = oldcodelen = 0
    for codelen, _, name in sorted(triples):
        if codelen > oldcodelen:
            code <<= (codelen - oldcodelen)
        sym2node[name].code = format(code, "0%db" % codelen)
        code += 1
        oldcodelen = codelen

    # Create a tree corresponding to the new codes.
    tree = Node(code="")
    dir2attr = {"0": "left", "1": "right"}
    for snode in sym2node.values():
        scode = snode.code
        codesofar = ""
        parent = tree
        # Walk the tree creating any needed interior nodes.
        for d in scode:
            assert parent is not None
            codesofar += d
            attr = dir2attr[d]
            child = getattr(parent, attr)
            if codesofar == scode:
                # We're at the leaf position.
                assert child is None
                setattr(parent, attr, snode)
            elif child is not None:
                assert child.code == codesofar
            else:
                child = Node(code=codesofar)
                setattr(parent, attr, child)
            parent = child

    # Finally, paste the `data` attributes together up
    # the tree. Why? Don't know ;-)
    def paste(node):
        if node is None:
            return ""
        elif node.is_symbol():
            return node.data
        else:
            result = paste(node.left) + paste(node.right)
            node.data = result
            return result

    paste(tree)
    return tree
Duplicate symbols
Could I swap the sym2node dict to an ordereddict to deal with
repeated 'a'/'b's etc?
No, and for two reasons:
No mapping type supports duplicate keys; and,
The concept of "duplicate symbols" makes no sense for Huffman encoding.
So, if you're determined ;-) to pursue this, first you have to ensure that symbols are unique. Just add this line at the start of the function:
syms = list(enumerate(syms))
For example, if the syms passed in is:
['a', 'b', 'a']
that will change to:
[(0, 'a'), (1, 'b'), (2, 'a')]
All symbols are now 2-tuples, and are obviously unique since each starts with a unique integer. The only thing the algorithm cares about is that symbols can be used as dict keys; it couldn't care less whether they're strings, tuples, or any other hashable type that supports equality testing.
So nothing in the algorithm needs to change. But before the end, we'll want to restore the original symbols. Just insert this before the paste() function:
def restore_syms(node):
    if node is None:
        return
    elif node.is_symbol():
        node.data = node.data[1]
    else:
        restore_syms(node.left)
        restore_syms(node.right)

restore_syms(tree)
That simply walks the tree and strips the leading integers off the symbols' .data members. Or, perhaps simpler, just iterate over sym2node.values(), and transform the .data member of each.