LeetCode 1202. Smallest String With Swaps, time limit exceeded problem using Python3 - python

My approach is to use Union-Find to solve the problem.
Below is the correct solution:
from collections import defaultdict
from typing import List

class UnionFind:
    def __init__(self, n):
        self.arr = [i for i in range(n)]

    def find(self, x):
        if self.arr[x] == x:
            return x
        self.arr[x] = self.find(self.arr[x])
        return self.arr[x]  # this is the correct find()

    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            self.arr[rootY] = rootX

class Solution:
    def smallestStringWithSwaps(self, s: str, pairs: List[List[int]]) -> str:
        size = len(s)
        m = UnionFind(size)
        for x, y in pairs:
            m.union(x, y)
        dict = defaultdict(list)
        for i in range(size):
            root = m.find(i)
            dict[root].append(s[i])
        for key in dict:
            dict[key].sort(reverse=True)
        result = []
        for i in range(size):
            root = m.find(i)
            result.append(dict[root].pop())
        return ''.join(result)
I kept getting the "time limit exceeded" error if the find() in my UnionFind class is either:

    def find(self, x):
        if self.arr[x] == x:
            return x
        return self.find(self.arr[x])  # time limit exceeded error will occur

or

    def find(self, x):
        while x != self.arr[x]:
            x = self.arr[x]
        return x  # time limit exceeded error will occur
There are 36 test cases in total for this problem. With either of the two find() methods above, only 35/36 pass.
So what's the difference between the correct find() and the two above?
Thanks!

Your time-limit-exceeding versions (which are more or less equivalent to each other) give up on path compression. Path compression avoids repeating the work of walking a long path up to the root of a tree: whenever find() touches a node, it makes that node a direct child of its root. Dropping it would normally mean O(log n) operations instead of (practically) O(1), which wouldn't be a big deal on its own, but your union() is also suboptimal in that it doesn't take into account the sizes (or ranks) of the two trees it merges. The combination is very slow - O(n) worst case per operation - which pretty much defeats the point of using a union-find structure in the first place.
For detailed explanations, the terms to look up are:
union by weight
union by rank
path compression
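For reference, a minimal sketch (not the poster's exact class) combining union by rank with path compression looks like this:

    class DSU:
        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            # Path compression: point every visited node directly at the root.
            if self.parent[x] != x:
                self.parent[x] = self.find(self.parent[x])
            return self.parent[x]

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            # Union by rank: attach the shallower tree under the deeper one.
            if self.rank[rx] < self.rank[ry]:
                rx, ry = ry, rx
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1

With both optimizations, a sequence of operations runs in nearly constant amortized time per operation (inverse-Ackermann), which is what keeps the LeetCode test cases within the time limit.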

Related

Rotate a linked list k times (Python)

While practicing for my final in Python programming I ran into this question, "def rotaten", about rotating a linked list k times. The problem says that k can range from 0 to any positive integer (even greater than the list size); if k < 0, raise ValueError. It must execute in O((n-k) % n), where n is the length of the list. It also has the following warnings:
WARNING: DO NOT call .rotate() k times !!!!
WARNING: DO NOT try to convert whole linked list to a python list
WARNING: DO NOT swap node data or create nodes
The problem is that I'm not understanding the solution given. Is there an easier way to solve this problem? Thank you in advance
class Node:
    def __init__(self, initdata):
        self._data = initdata
        self._next = None

    def get_data(self):
        return self._data

    def get_next(self):
        return self._next

    def set_data(self, newdata):
        self._data = newdata

    def set_next(self, newnext):
        self._next = newnext

class LinkedList:
    def rotaten(self, k):
        if k < 0:
            raise ValueError
        if self._size > 1:
            m = k % self._size
            if m > 0 and m < self._size:
                current = self._head
                for i in range(self._size - m - 1):
                    current = current.get_next()
                chain_b = current.get_next()
                old_head = self._head
                old_last = self._last
                self._last = current
                self._last.set_next(None)
                self._head = chain_b
                old_last.set_next(old_head)
The easiest ways are forbidden:
The first warning forbids implementing a single rotation and then calling it repeatedly.
The second warning forbids using a native Python structure - you could otherwise convert the linked list to a built-in list or a deque, rotate it by popping from the end and inserting at the beginning, and convert it back.
The third warning prevents you from making your life easier by creating other nodes or, worse, copying data. (Copying data from one node to another would remove any advantage of storing the data in a linked list in the first place.)
The example solution is basically this:
Take the current first node (the head of the list) and the tail node, which is stored in the list structure. (The bookkeeping of the example list consists of holding the head and the tail; the rest is done in the nodes themselves.)
Find the node where the list needs to be cut, so that you rotate the whole list at once.
Add the last node as the new head, doing all the necessary reference-linking and unlinking. (Which is, I guess, the point of this question - to test whether you understand the references. In C these would be pointers; Python has the references implicitly.)
So as far as the linked lists go, this is the most straightforward solution with the requested O((n-k)%n) complexity.
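For context, here is a minimal sketch of the bookkeeping the example assumes (the _head, _size and _last attributes are not shown in the question); the append and to_list helpers are hypothetical, added only to make the sketch runnable:

    class LinkedList:
        def __init__(self):
            # Bookkeeping the example solution relies on.
            self._head = None
            self._last = None
            self._size = 0

        def append(self, data):
            # Hypothetical helper, just to build a list for testing.
            node = Node(data)
            if self._head is None:
                self._head = node
            else:
                self._last.set_next(node)
            self._last = node
            self._size += 1

        def to_list(self):
            # Hypothetical helper for inspecting the result.
            out, current = [], self._head
            while current is not None:
                out.append(current.get_data())
                current = current.get_next()
            return out

        # rotaten(self, k) as given in the question goes here.

With that scaffolding, building the list 1, 2, 3, 4, 5 and calling rotaten(2) yields [4, 5, 1, 2, 3].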
Good luck with your pointers :-).

Same Tree | leetcode wrong complexity analysis of the solution

I have one question regarding the space complexity analysis of the solution of the leetcode problem: 100. Same Tree
The problem:
Given two binary trees, write a function to check if they are the same or not. Two binary trees are considered the same if they are structurally identical and the nodes have the same value.
The solution code:
from collections import deque

class Solution:
    def isSameTree(self, p, q):
        """
        :type p: TreeNode
        :type q: TreeNode
        :rtype: bool
        """
        def check(p, q):
            # if both are None
            if not p and not q:
                return True
            # one of p and q is None
            if not q or not p:
                return False
            if p.val != q.val:
                return False
            return True

        deq = deque([(p, q),])
        while deq:
            p, q = deq.popleft()
            if not check(p, q):
                return False
            if p:
                deq.append((p.left, q.left))
                deq.append((p.right, q.right))
        return True
My question:
The space complexity analysis of the solution on LeetCode says it's O(N) for a completely unbalanced tree, to keep the deque. However, I think the space complexity is definitely not O(N). I use the following example to show why. Assume root = 1, root.left = 2, root.left.left = 3, root.left.left.left = 4; N denotes None. Below are the different stages of the deque; at each stage we popleft one item and append two, except when we meet N.
[1]->[2,N]->[N,3,N]->[3,N]->[N,4,N]->[4,N]->[N,N,N]
As you can see from the deque above, the size never reaches N (where N is the number of nodes)
I can tell the space complexity of this algorithm is not O(N) (Am I right?), but I can't tell for sure what it is.
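As a quick empirical check (a minimal sketch, using a stand-in TreeNode class), one can instrument the BFS loop and track the maximum deque size. For a completely unbalanced (left-only) tree it stays bounded by a small constant, while a perfect binary tree pushes it to roughly half the node count, which is where the O(N) worst case over all tree shapes comes from:

    from collections import deque

    class TreeNode:
        # Stand-in for LeetCode's TreeNode, just for this experiment.
        def __init__(self, val):
            self.val = val
            self.left = None
            self.right = None

    def max_deque_size(p, q):
        # Same traversal as the solution above, but only measuring the deque.
        deq = deque([(p, q)])
        biggest = 0
        while deq:
            biggest = max(biggest, len(deq))
            p, q = deq.popleft()
            if p and q:
                deq.append((p.left, q.left))
                deq.append((p.right, q.right))
        return biggest

    # Completely unbalanced (left-only) tree with 1000 nodes.
    root = TreeNode(0)
    node = root
    for i in range(1, 1000):
        node.left = TreeNode(i)
        node = node.left
    print(max_deque_size(root, root))   # prints 3: bounded by a constant, not O(N)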

From recursive algorithm to bottom up dynamic programming approach

I have a recursive algorithm in which I calculate some probability values. The input is a list of integers and a single integer value, which represents a constant value.
For instance, p([12,19,13], 2) makes three pairs of recursive calls, which are
p([12,19],0) and p([13], 2)
p([12,19],1) and p([13], 1)
p([12,19],2) and p([13], 0)
since 2 can be decomposed as 0+2, 1+1 or 2+0. Then each call follows a similar approach and makes several other recursive calls.
The recursive algorithm I have
limit = 20

def p(listvals, cval):
    # base case
    if len(listvals) == 0:
        return 0
    if len(listvals) == 1:
        if cval == 0:
            return listvals[0]/limit
        elif listvals[0] + cval > limit:
            return 0
        else:
            return 1/limit

    result = 0
    for c in range(0, cval+1):
        c1 = c
        c2 = cval-c
        listvals1 = listvals[:-1]
        listvals2 = [listvals[-1]]
        if listvals[-1] + c2 <= limit:
            r = p(listvals1, c1) * p(listvals2, c2)
            result = result+r
    return result
I have been trying to convert this into a bottom up DP code, but could not figure out the way I need to make the iteration.
I wrote down all the intermediate steps that are needed to be calculated for the final result, and it is apparent that there are lots of repetitions at the bottom of the recursive calls.
I tried creating a dictionary of pre-calculated values as given below
m[single_value]=[list of calculated values]
and use those values instead of making the second recursive call p(listvals2, c2), but it did not help much as far as the running time is concerned.
How can I improve the running time by using a proper bottom-up approach?
I'm not sure I understand what your program is meant to compute, so I can't help on that - maybe explain a bit more?
Regarding performance: you are caching only the leaf nodes of the computations that are repeated in recursive calls. A better way would be to make the first parameter of your function p a tuple instead of a list, and then use the tuple of both arguments to p as the caching key in a dictionary.
Python's standard library functools helps write such a caching decorator cleanly (functools.wraps below just preserves the wrapped function's metadata):
from functools import wraps

def cached(func):
    cache = {}
    @wraps(func)
    def wrapped(listvals, cval):
        key = (listvals, cval)
        if key not in cache:
            cache[key] = func(listvals, cval)
        return cache[key]
    return wrapped
Use this decorator to cache all calls to the function:

    @cached
    def p(listvals, cval):
        ...

Now have your p take a tuple instead of a list (and build listvals2 as a tuple too, e.g. (listvals[-1],), so every cached argument stays hashable):
p((12,19,13), 2)
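As an alternative sketch (assuming the arguments are made hashable as above), the standard library's functools.lru_cache gives the same memoization without a hand-written decorator:

    from functools import lru_cache

    limit = 20

    @lru_cache(maxsize=None)   # memoize every distinct (listvals, cval) call
    def p(listvals, cval):
        # listvals must be a tuple so the arguments are hashable.
        if len(listvals) == 0:
            return 0
        if len(listvals) == 1:
            if cval == 0:
                return listvals[0] / limit
            elif listvals[0] + cval > limit:
                return 0
            else:
                return 1 / limit
        result = 0
        for c in range(cval + 1):
            c2 = cval - c
            if listvals[-1] + c2 <= limit:
                # Note the one-element tuple (listvals[-1],) to keep arguments hashable.
                result += p(listvals[:-1], c) * p((listvals[-1],), c2)
        return result

    print(p((12, 19, 13), 2))

Because every recursive call goes through the decorated function, intermediate prefixes are memoized automatically as well.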

Python HeapSort Time Complexity

I've written the following code for HeapSort, which is working fine:
class Heap(object):
    def __init__(self, a):
        self.a = a

    def heapify(self, pos):
        left = 2*pos + 1
        right = 2*pos + 2
        maximum = pos
        if left < len(self.a) and self.a[left] > self.a[maximum]:
            maximum = left
        if right < len(self.a) and self.a[right] > self.a[maximum]:
            maximum = right
        if maximum != pos:
            self.a[pos], self.a[maximum] = self.a[maximum], self.a[pos]
            self.heapify(maximum)

    def buildHeap(self):
        for i in range(len(self.a)/2, -1, -1):
            self.heapify(i)

    def heapSort(self):
        elements = len(self.a)
        for i in range(elements):
            print self.a[0]
            self.a[0] = self.a[-1]
            self.a = self.a[:-1]
            self.heapify(0)

    def printHeap(self):
        print self.a

if __name__ == '__main__':
    h = Heap(range(10))
    h.buildHeap()
    h.printHeap()
    h.heapSort()
However, it seems that the heapSort function here will take O(n^2) time, due to the list slicing. (For a list of size n, slicing off the last element takes O(n) time.)
Can anyone confirm whether my thinking is correct here?
If yes, what should be the minimal change in heapSort function to make it run in O(nlogn) ?
Yes, I believe you are correct. To make it faster, replace things like this:
self.a = self.a[:-1]
with:
self.a.pop()
The pop() member function of lists removes and returns the last element in the list, with constant time complexity.
Python lists are stored in contiguous memory, meaning all the elements of a list are stored one after the other. This is why inserting an element in the middle of a list is so expensive: Python has to shift every element after the insertion point down by one to make room for the new element. Deleting the element at the end of the list, however, takes constant time, as Python merely has to drop that element.
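A minimal sketch of the change, as a drop-in replacement for the heapSort method in the class above (Python 2, like the rest of the code):

    def heapSort(self):
        elements = len(self.a)
        for i in range(elements):
            print self.a[0]
            last = self.a.pop()    # O(1), replaces the O(n) slice self.a[:-1]
            if self.a:             # popping the final element leaves nothing at index 0
                self.a[0] = last
                self.heapify(0)

Each of the n extractions now costs O(log n) for the sift-down plus O(1) for the pop, so the sort runs in O(n log n) overall.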

Mapping a list to a Huffman Tree whilst preserving relative order

I'm having an issue with a search algorithm over a Huffman tree: for a given probability distribution I need the Huffman tree to be identical regardless of permutations of the input data.
Here is a picture of what's happening vs what I want:
Basically I want to know if it's possible to preserve the relative order of the items from the list to the tree. If not, why is that so?
For reference, I'm using the Huffman tree to generate sub groups according to a division of probability, so that I can run the search() procedure below. Notice that the data in the merge() sub-routine is combined, along with the weight. The codewords themselves aren't as important as the tree (which should preserve the relative order).
For example if I generate the following Huffman codes:
probabilities = [0.30, 0.25, 0.20, 0.15, 0.10]
items = ['a','b','c','d','e']
items = zip(items, probabilities)
t = encode(items)
d,l = hi.search(t)
print(d)
Using the following Class:
class Node(object):
    left = None
    right = None
    weight = None
    data = None
    code = None

    def __init__(self, w, d):
        self.weight = w
        self.data = d

    def set_children(self, ln, rn):
        self.left = ln
        self.right = rn

    def __repr__(self):
        return "[%s,%s,(%s),(%s)]" % (self.data, self.code, self.left, self.right)

    def __cmp__(self, a):
        return cmp(self.weight, a.weight)

    def merge(self, other):
        total_freq = self.weight + other.weight
        new_data = self.data + other.data
        return Node(total_freq, new_data)

    def index(self, node):
        return node.weight
from heapq import heapify, heappush, heappop
import pdb

def encode(symbfreq):
    pdb.set_trace()
    tree = [Node(sym, wt) for wt, sym in symbfreq]
    heapify(tree)
    while len(tree) > 1:
        lo, hi = heappop(tree), heappop(tree)
        n = lo.merge(hi)
        n.set_children(lo, hi)
        heappush(tree, n)
    tree = tree[0]

    def assign_code(node, code):
        if node is not None:
            node.code = code
            if isinstance(node, Node):
                assign_code(node.left, code+'0')
                assign_code(node.right, code+'1')

    assign_code(tree, '')
    return tree
I get:
'a'->11
'b'->01
'c'->00
'd'->101
'e'->100
However, an assumption I've made in the search algorithm is that more probable items get pushed toward the left: that is I need 'a' to have the '00' codeword - and this should always be the case regardless of any permutation of the 'abcde' sequence. An example output is:
codewords = {'a':'00', 'b':'01', 'c':'10', 'd':'110', 'e':'111'}
(N.B. even though the codeword for 'c' is a suffix of the codeword for 'd', this is OK.)
For completeness, here is the search algorithm:
def search(tree):
    print(tree)
    pdb.set_trace()
    current = tree.left
    other = tree.right
    loops = 0
    while current:
        loops += 1
        print(current)
        if current.data != 0 and current is not None and other is not None:
            previous = current
            current = current.left
            other = previous.right
        else:
            previous = other
            current = other.left
            other = other.right
    return previous, loops
It works by searching for the 'leftmost' 1 in a group of 0s and 1s - the Huffman tree has to put more probable items on the left. For example if I use the probabilities above and the input:
items = [1,0,1,0,0]
Then the index of the item returned by the algorithm is 2 - which isn't what should be returned (it should be 0, as that's the leftmost 1).
The usual practice is to use Huffman's algorithm only to generate the code lengths. Then a canonical process is used to generate the codes from the lengths. The tree is discarded. Codes are assigned in order from shorter codes to longer codes, and within a code, the symbols are sorted. This gives the codes you are expecting, a = 00, b = 01, etc. This is called a Canonical Huffman code.
The main reason this is done is to make the transmission of the Huffman code more compact. Instead of sending the code for each symbol along with the compressed data, you only need to send the code length for each symbol. Then the codes can be reconstructed on the other end for decompression.
A Huffman tree is not normally used for decoding either. With a canonical code, simple comparisons to determine the length of the next code, and an index using the code value will take you directly to the symbol. Or a table-driven approach can avoid the search for the length.
As for your tree, there are arbitrary choices being made when there are equal frequencies. In particular, on the second step the first node pulled is c with probability 0.2, and the second node pulled is b with probability 0.25. However it would have been equally valid to pull, instead of b, the node that was made in the first step, (e,d), whose probability is also 0.25. In fact that is what you'd prefer for your desired end state. Alas, you have relinquished the control of that arbitrary choice to the heapq library.
(Note: since you are using floating point values, 0.1 + 0.15 is not necessarily exactly equal to 0.25. Though it turns out it is. As another example, 0.1 + 0.2 is not equal to 0.3. You would be better off using integers for the frequencies if you want to see what happens when sums of frequencies are equal to other frequencies or sums of frequencies. E.g. 6,5,4,3,2.)
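A quick illustration of that floating-point pitfall:

    print(0.1 + 0.15 == 0.25)    # True - the rounding errors happen to cancel here
    print(0.1 + 0.2 == 0.3)      # False
    print(repr(0.1 + 0.2))       # 0.30000000000000004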
Some of the wrong ordering can be fixed by fixing some mistakes: change lo.merge(hi) to hi.merge(lo), and reverse the order of the bits to: assign_code(node.left, code+'1') followed by assign_code(node.right, code+'0'). Then at least a gets assigned 00, d is before e, and b is before c. The ordering is then adebc.
Now that I think about it, even if you pick (e,d) over b, e.g. by setting the probability of b to 0.251, you still don't get the complete order you're after. No matter what, the probability of (e,d) (0.25) is greater than the probability of c (0.2). So even in that case the final ordering would be (with the fixes above) abdec instead of your desired abcde. So it is not possible to get what you want assuming a consistent tree ordering and bit assignment with respect to the probabilities of groups of symbols - e.g., assuming that for each branch the stuff on the left has a greater or equal probability than the stuff on the right, and 0 is always assigned to left and 1 is always assigned to right. You would need to do something different.
The different thing that comes to mind is what I said at the start of the answer. Use the Huffman algorithm just to get the code lengths. Then you can assign the codes to the symbols in whatever order you like, and build a new tree. That would be much easier than trying to come up with some sort of scheme to coerce the original tree to be what you want, and proving that that works in all cases.
I'll flesh out what Mark Adler said with working code. Everything he said is right :-) The high points:
You must not use floating-point weights, or any other scheme that loses information about weights. Use integers. Simple and correct. If, e.g., you have 3-digit floating probabilities, convert each to an integer via int(round(the_probability * 1000)), then maybe fiddle them to ensure the sum is exactly 1000.
heapq heaps are not "stable": nothing is defined about which item is popped if multiple items have the same minimal weight.
So you can't get what you want while building the tree.
A small variation of "canonical Huffman codes" appears to be what you do want. Constructing a tree for that is a long-winded process, but each step is straightforward enough. The first tree built is thrown away: the only information taken from it is the lengths of the codes assigned to each symbol.
Running:
syms = ['a','b','c','d','e']
weights = [30, 25, 20, 15, 10]
t = encode(syms, weights)
print t
prints this (formatted for readability):
[abcde,,
([ab,0,
([a,00,(None),(None)]),
([b,01,(None),(None)])]),
([cde,1,
([c,10,(None),(None)]),
([de,11,
([d,110,(None),(None)]),
([e,111,(None),(None)])])])]
Best I understand, that's exactly what you want. Complain if it isn't ;-)
EDIT: there was a bug in the assignment of canonical codes, which didn't show up unless weights were very different; fixed it.
class Node(object):
    def __init__(self, data=None, weight=None,
                 left=None, right=None,
                 code=None):
        self.data = data
        self.weight = weight
        self.left = left
        self.right = right
        self.code = code

    def is_symbol(self):
        return self.left is self.right is None

    def __repr__(self):
        return "[%s,%s,(%s),(%s)]" % (self.data,
                                      self.code,
                                      self.left,
                                      self.right)

    def __cmp__(self, a):
        return cmp(self.weight, a.weight)

def encode(syms, weights):
    from heapq import heapify, heappush, heappop
    tree = [Node(data=s, weight=w)
            for s, w in zip(syms, weights)]
    sym2node = {s.data: s for s in tree}
    heapify(tree)
    while len(tree) > 1:
        a, b = heappop(tree), heappop(tree)
        heappush(tree, Node(weight=a.weight + b.weight,
                            left=a, right=b))

    # Compute code lengths for the canonical coding.
    sym2len = {}
    def assign_codelen(node, codelen):
        if node is not None:
            if node.is_symbol():
                sym2len[node.data] = codelen
            else:
                assign_codelen(node.left, codelen + 1)
                assign_codelen(node.right, codelen + 1)
    assign_codelen(tree[0], 0)

    # Create canonical codes, but with a twist: instead
    # of ordering symbols alphabetically, order them by
    # their position in the `syms` list.
    # Construct a list of (codelen, index, symbol) triples.
    # `index` breaks ties so that symbols with the same
    # code length retain their original ordering.
    triples = [(sym2len[name], i, name)
               for i, name in enumerate(syms)]
    code = oldcodelen = 0
    for codelen, _, name in sorted(triples):
        if codelen > oldcodelen:
            code <<= (codelen - oldcodelen)
        sym2node[name].code = format(code, "0%db" % codelen)
        code += 1
        oldcodelen = codelen

    # Create a tree corresponding to the new codes.
    tree = Node(code="")
    dir2attr = {"0": "left", "1": "right"}
    for snode in sym2node.values():
        scode = snode.code
        codesofar = ""
        parent = tree
        # Walk the tree creating any needed interior nodes.
        for d in scode:
            assert parent is not None
            codesofar += d
            attr = dir2attr[d]
            child = getattr(parent, attr)
            if codesofar == scode:
                # We're at the leaf position.
                assert child is None
                setattr(parent, attr, snode)
            elif child is not None:
                assert child.code == codesofar
            else:
                child = Node(code=codesofar)
                setattr(parent, attr, child)
            parent = child

    # Finally, paste the `data` attributes together up
    # the tree. Why? Don't know ;-)
    def paste(node):
        if node is None:
            return ""
        elif node.is_symbol():
            return node.data
        else:
            result = paste(node.left) + paste(node.right)
            node.data = result
            return result
    paste(tree)

    return tree
Duplicate symbols
Could I swap the sym2node dict to an ordereddict to deal with repeated 'a'/'b's etc?
No, and for two reasons:
No mapping type supports duplicate keys; and,
The concept of "duplicate symbols" makes no sense for Huffman encoding.
So, if you're determined ;-) to pursue this, first you have to ensure that symbols are unique. Just add this line at the start of the function:
syms = list(enumerate(syms))
For example, if the syms passed in is:
['a', 'b', 'a']
that will change to:
[(0, 'a'), (1, 'b'), (2, 'a')]
All symbols are now 2-tuples, and are obviously unique since each starts with a unique integer. The only thing the algorithm cares about is that symbols can be used as dict keys; it couldn't care less whether they're strings, tuples, or any other hashable type that supports equality testing.
So nothing in the algorithm needs to change. But before the end, we'll want to restore the original symbols. Just insert this before the paste() function:
def restore_syms(node):
    if node is None:
        return
    elif node.is_symbol():
        node.data = node.data[1]
    else:
        restore_syms(node.left)
        restore_syms(node.right)

restore_syms(tree)
That simply walks the tree and strips the leading integers off the symbols' .data members. Or, perhaps simpler, just iterate over sym2node.values(), and transform the .data member of each.
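A minimal sketch of that simpler alternative (run inside encode(), where sym2node is in scope, before paste() is called):

    # Strip the disambiguating integers added by enumerate(), leaving the
    # original symbols as each leaf's .data.
    for snode in sym2node.values():
        snode.data = snode.data[1]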
