What's the time complexity for max heap? - python

I'm trying to figure out the time complexity of this whole algorithm. Is it O(n log n) or O(n)? I've been searching online, and some sources say building a max heap is O(n log n) while others say it's O(n). I am trying to get the time complexity down to O(n).
def max_heapify(A, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < len(A) and A[left] > A[largest]:
        largest = left
    if right < len(A) and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest)

def build_max_heap(A):
    for i in range(len(A) // 2, -1, -1):
        max_heapify(A, i)
    return A

The code in your question rearranges the array elements so that they satisfy the heap property, i.e. the value of each parent node is greater than or equal to the values of its child nodes. The time complexity of the heapify operation (your build_max_heap) is O(n).
Here's an extract from the [Wikipedia page on Min-max heap](https://en.wikipedia.org/wiki/Min-max_heap#Build):
Creating a min-max heap is accomplished by an adaption of Floyd's linear-time heap construction algorithm, which proceeds in a bottom-up fashion.[10] A typical Floyd's build-heap algorithm[11] goes as follows:
function FLOYD-BUILD-HEAP (h):
    for each index i from floor(length(h)/2) down to 1 do:
        push-down(h, i)
    return h
Here the function FLOYD-BUILD-HEAP is the same as your build_max_heap function, and push-down is the same as your max_heapify function.
A suggestion: the naming of your functions is a little confusing. Your max_heapify is not actually heapifying. It is just a part of the heapify operation. A better name could be something like push_down (as used in Wikipedia) or fix_heap.
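One way to sanity-check the O(n) claim is to count the swaps performed during construction. The sketch below is an instrumented variant of the code in the question, not a replacement for it: the names sift_down and build_max_heap_counting are made up for illustration, the helper is iterative rather than recursive, and the loop starts at len(A) // 2 - 1 (the last index that has a child). A node can move down at most as many levels as its height, and the heights of all nodes in a binary heap add up to less than n, so the total swap count should stay below n.

```python
import random


def sift_down(A, i):
    """Push A[i] down until the subtree rooted at i is a max heap.

    Hypothetical iterative counterpart of max_heapify from the question.
    Returns the number of swaps performed.
    """
    swaps = 0
    n = len(A)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and A[left] > A[largest]:
            largest = left
        if right < n and A[right] > A[largest]:
            largest = right
        if largest == i:
            return swaps
        A[i], A[largest] = A[largest], A[i]
        swaps += 1
        i = largest


def build_max_heap_counting(A):
    """Bottom-up (Floyd-style) construction; returns the total swap count."""
    total = 0
    for i in range(len(A) // 2 - 1, -1, -1):
        total += sift_down(A, i)
    return total


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        data = [random.random() for _ in range(n)]
        # the second column grows roughly in proportion to n
        print(n, build_max_heap_counting(data))
```

If construction were O(n log n), the swap count would pull ahead of n as n grows; here it stays below n.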

A heap is a data structure which supports operations including insertion and retrieval. Each operation has its own runtime complexity.
Maybe you were thinking of the runtime complexity of heapsort which is a sorting algorithm that uses a heap. In that case, the runtime complexity is O(n*log(n)).
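For comparison, here is a minimal heapsort sketch built on Python's standard heapq module (a min-heap, so it produces ascending order directly). The function name heapsort is only for illustration. heapq.heapify builds the heap in O(n); it is the n calls to heapq.heappop, at O(log n) each, that make the sort O(n log n) overall.

```python
import heapq


def heapsort(items):
    """Return a new ascending list, sorted via a binary heap."""
    heap = list(items)      # copy so the caller's data is left untouched
    heapq.heapify(heap)     # bottom-up construction, O(n)
    # n pops at O(log n) each: this loop dominates the running time
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heapsort([54, 26, 93, 17, 77, 31, 44, 55, 20]))
```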

Related

How to find the recurrence relation, and calculate Master Theorem of a Merge Sort Code?

I'm trying to apply the Master Theorem to this merge sort code, but first I need to find its recurrence relation, and I'm struggling to do and understand both. I already saw some similar questions here but couldn't understand the explanations. For instance, do I first need to count how many operations the code performs? Could someone help me with that?
def mergeSort(alist):
    print("Splitting ",alist)
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        mergeSort(lefthalf)
        mergeSort(righthalf)
        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k]=lefthalf[i]
                i=i+1
            else:
                alist[k]=righthalf[j]
                j=j+1
            k=k+1
        while i < len(lefthalf):
            alist[k]=lefthalf[i]
            i=i+1
            k=k+1
        while j < len(righthalf):
            alist[k]=righthalf[j]
            j=j+1
            k=k+1
    print("Merging ",alist)

alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print(alist)
To determine the run-time of a divide-and-conquer algorithm using the Master Theorem, you need to express the algorithm's run-time as a recursive function of input size, in the form:
T(n) = aT(n/b) + f(n)
T(n) is how we're expressing the total runtime of the algorithm on an input size n.
a stands for the number of recursive calls the algorithm makes.
T(n/b) represents the recursive calls: The n/b signifies that the input size to the recursive calls is some particular fraction of original input size (the divide part of divide-and-conquer).
f(n) represents the amount of work you need to do in the main body of the algorithm, generally just to combine solutions from recursive calls into an overall solution (you could say this is the conquer part).
Here's a slightly re-factored definition of mergeSort:
def mergeSort(arr):
    if len(arr) <= 1: return # array size 1 or 0 is already sorted
    # split the array in half
    mid = len(arr)//2
    L = arr[:mid]
    R = arr[mid:]
    mergeSort(L) # sort left half
    mergeSort(R) # sort right half
    merge(L, R, arr) # merge sorted halves
We need to determine a, n/b, and f(n).
Because each call of mergeSort makes two recursive calls: mergeSort(L) and mergeSort(R), a=2:
T(n) = 2T(n/b) + f(n)
n/b represents the fraction of the current input that recursive calls are made with. Because we are finding the midpoint and splitting the input in half, passing one half the current array to each recursive call, n/b = n/2 and b=2. (if each recursive call instead got 1/4 of the original array b would be 4)
T(n) = 2T(n/2) + f(n)
f(n) represents all the work the algorithm does besides making recursive calls. Every time we call mergeSort, we calculate the midpoint in O(1) time.
We also split the array into L and R, and technically creating these two sub-array copies is O(n). Then, presuming mergeSort(L) sorted the left half of the array and mergeSort(R) sorted the right half, we still have to merge the sorted sub-arrays together with the merge function to sort the entire array. Together, this makes f(n) = O(1) + O(n) + complexity of merge. Now let's take a look at merge:
def merge(L, R, arr):
    i = j = k = 0 # 3 assignments
    while i < len(L) and j < len(R): # 2 comparisons
        if L[i] < R[j]: # 1 comparison, 2 array idx
            arr[k] = L[i] # 1 assignment, 2 array idx
            i += 1 # 1 assignment
        else:
            arr[k] = R[j] # 1 assignment, 2 array idx
            j += 1 # 1 assignment
        k += 1 # 1 assignment
    while i < len(L): # 1 comparison
        arr[k] = L[i] # 1 assignment, 2 array idx
        i += 1 # 1 assignment
        k += 1 # 1 assignment
    while j < len(R): # 1 comparison
        arr[k] = R[j] # 1 assignment, 2 array idx
        j += 1 # 1 assignment
        k += 1 # 1 assignment
This function has more going on, but we just need its overall complexity class to be able to apply the Master Theorem accurately. We can count every single operation, that is, every comparison, array index, and assignment, or just reason about it more generally. Generally speaking, across the three while loops we iterate through every member of L and R and assign them in order to the output array, arr, doing a constant amount of work for each element. Noting that we process every element of L and R (n total elements) and do a constant amount of work for each element is enough to say that merge is in O(n).
But you can get more particular with counting operations if you want. In the first while loop, every iteration makes 3 comparisons, 4 array indexes, and 3 assignments (constant numbers, matching the comments in the code), and the loop runs until one of L and R is fully processed. Then, one of the next two while loops may run to process any leftover elements from the other array, performing 1 comparison, 2 array indexes, and 3 variable assignments for each of those elements (constant work). Therefore, because each of the n total elements of L and R causes at most a constant number of operations to be performed across the while loops (either 10 or 6, by my count, so at most 10), and the i=j=k=0 statement is only 3 constant assignments, merge is in O(3 + 10*n) = O(n). Returning to the overall problem, this means:
f(n) = O(1) + O(n) + complexity of merge
= O(1) + O(n) + O(n)
= O(2n + 1)
= O(n)
T(n) = 2T(n/2) + n
One final step before we apply the Master Theorem: we want f(n) written as n^c. For f(n) = n = n^1, c=1. (Note: things change very slightly if f(n) = n^c*log^k(n) rather than simply n^c, but we don't need to worry about that here)
You can now apply the Master Theorem, which in its most basic form says to compare a (how quickly the number of recursive calls grows) to b^c (how quickly the amount of work per recursive call shrinks). There are 3 possible cases, the logic of which I try to explain, but you can ignore the parenthetical explanations if they aren't helpful:
a > b^c, T(n) = O(n^log_b(a)). (The total number of recursive calls is growing faster than the work per call is shrinking, so the total work is determined by the number of calls at the bottom level of the recursion tree. The number of calls starts at 1 and is multiplied by a log_b(n) times because log_b(n) is the depth of the recursion tree. Therefore, total work = a^log_b(n) = n^log_b(a))
a = b^c, T(n) = O(f(n)*log(n)). (The growth in number of calls is balanced by the decrease in work per call. The total work at each level of the recursion tree is therefore the same, so total work is just f(n)*(depth of tree) = f(n)*log_b(n) = O(f(n)*log(n)))
a < b^c, T(n) = O(f(n)). (The work per call shrinks faster than the number of calls increases. Total work is therefore dominated by the work at the top level of the recursion tree, which is just f(n))
For the case of mergeSort, we've seen that a = 2, b = 2, and c = 1. As a = b^c, we apply the 2nd case:
T(n) = O(f(n)*log(n)) = O(n*log(n))
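The case analysis above can be sketched as a small lookup. This is only an illustration for recurrences of the simple form T(n) = a*T(n/b) + n^c discussed here: the function name master_theorem and its string output are invented for this example, and it does not handle the n^c*log^k(n) variant.

```python
import math


def master_theorem(a, b, c):
    """Classify T(n) = a*T(n/b) + n^c, assuming a >= 1, b > 1, c >= 0."""
    threshold = b ** c
    if math.isclose(a, threshold):
        return f"O(n^{c:g} * log n)"        # balanced: same work on every level
    if a > threshold:
        return f"O(n^{math.log(a, b):g})"   # leaves dominate: n^log_b(a)
    return f"O(n^{c:g})"                    # root dominates: f(n)


print(master_theorem(2, 2, 1))  # merge sort: T(n) = 2T(n/2) + n
print(master_theorem(8, 2, 2))  # T(n) = 8T(n/2) + n^2
print(master_theorem(1, 2, 1))  # T(n) = T(n/2) + n
```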
And you're done. This may seem like a lot of work, but coming up with a recurrence for T(n) gets easier the more you do it, and once you have a recurrence it's very quick to check which case it falls under, making the Master Theorem quite a useful tool for solving more complicated divide-and-conquer recurrences.
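As an empirical check on the O(n*log(n)) result, the sketch below counts how many element writes the merge step performs in total. The name merge_sort_writes is hypothetical, and it folds the merge function into the recursive routine so the counter is easy to return. Each call writes every one of its n elements exactly once, so the count follows W(n) = 2W(n/2) + n, which for n a power of two works out to exactly n*log2(n).

```python
import math
import random


def merge_sort_writes(arr):
    """Sort arr in place and return the number of writes made while merging."""
    if len(arr) <= 1:
        return 0                      # base case: nothing to merge
    mid = len(arr) // 2
    left, right = arr[:mid], arr[mid:]
    writes = merge_sort_writes(left) + merge_sort_writes(right)

    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # at most one of these slices is non-empty: copy the leftovers
    for value in left[i:] + right[j:]:
        arr[k] = value
        k += 1
    return writes + k                 # k == len(arr) writes at this level


for n in (8, 1024, 65536):
    data = [random.random() for _ in range(n)]
    # columns: n, measured writes, n * log2(n)
    print(n, merge_sort_writes(data), n * int(math.log2(n)))
```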

Sorting a concatenation of sorted arrays

What is the sorting algorithm most optimized for sorting an array that consists of 2 sorted sub-arrays?
This question came up when I was solving: https://leetcode.com/problems/squares-of-a-sorted-array/
My solution is as follows:
def sortedSquares(self, nums: List[int]) -> List[int]:
    n = len(nums)
    l, r = 0, n-1
    res = [0] * n
    for i in range(n-1, -1, -1):
        if abs(nums[l]) > abs(nums[r]):
            res[i] = nums[l] ** 2
            l += 1
        else:
            res[i] = nums[r] ** 2
            r -= 1
    return res
However this solution beats mine:
def sortedSquares(self, nums: List[int]) -> List[int]:
    return sorted([i * i for i in nums])
If timsort is doing insertion sort on small chunks, then wouldn't an array like this cause O(n^2) on half the runs? How is that better than my supposedly O(n) solution?
Thank you!
In this case, the theoretical time complexity is O(n) because you don't need to sort at all (you merely merge two ordered lists). Performing a general comparison sort has O(n log n) complexity.
Complexity and performance are two different things, however. Your O(n) solution in Python code is competing with O(n log n) in highly optimized low-level C code. In theory there should be a point where the list is large enough for the Python-based O(n) solution to catch up with the O(n log n) profile of the native sort, but you can expect that this would take a very large number of elements (probably more than your computer's memory can hold). If Python's native sort were smart enough to switch to a radix sort for integers, or if its algorithm benefits from partially sorted data (Timsort is designed to exploit already-ordered runs), it will be impossible to catch up.
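To see how much the built-in sort gains from partially sorted data, one can count comparisons instead of timing. The sketch below assumes CPython's list sort (Timsort); the Counted wrapper and count_comparisons helper are hypothetical names introduced only for this experiment. It compares the squares of a sorted range (one descending run followed by one ascending run) against the same values shuffled.

```python
import math
import random


class Counted:
    """Wraps a value and counts every '<' comparison made on it."""
    comparisons = 0
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Return how many '<' comparisons sorted() makes on the given values."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


n = 1000
squares = [x * x for x in range(-n // 2, n // 2)]  # descending run, then ascending run
shuffled = random.sample(squares, len(squares))

print("two runs :", count_comparisons(squares))
print("shuffled :", count_comparisons(shuffled))
print("n*log2(n):", round(n * math.log2(n)))
```

On the two-run input the comparison count should come out at a small multiple of n, well under the n*log2(n) figure that the shuffled input approaches.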

Python - How to calculate this recursive function time complexity?

I wanted to solve the tower hopper problem in as many ways as I can and calculate each way's time complexity (just for self-practice).
One of the solutions is this:
def is_hopable(arr):
    if len(arr) < 1 or arr[0] == 0:
        return False
    if arr[0] >= len(arr):
        return True
    res = False
    for i in range(1,arr[0]+1):
        res = res or is_hopable(arr[i:]) # This line
    return res
I know the general idea of calculating the time complexity of a recursive function, but I'm having trouble analyzing the commented line (inside the for loop). Usually I calculate the time complexity as T(n) = C + T(that line) and reduce it with a general expression (for example T(n-k)) until I reach the base case and can express k in terms of n, but what is the time complexity of that for loop?
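The complexity of that for loop could be up to O(n^2), because every iteration of the loop (up to n iterations) does a slice arr[i:], which returns a copy of arr without its first i elements in O(n). With that in mind, the overall time is O(n^3), provided each suffix ends up being explored only once (the `or` skips the remaining recursive calls as soon as one of them returns True).
The mentioned upper bound is tight.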
Example: arr = [n-1, n-2, n-3, ..., 1, 1]
Alternative form: arr[i] = n - 1 - i for all i, 0 <= i < n - 1, and arr[n-1] = 1 where n is length of arr.
The recurrence to calculate amount of elemental operations (avoiding the use of constant) can be stated as:
Simplify summation:
Evaluate (unroll) lesser terms of T and search a lower bound:
Use formula of sum of squares from 1 to n:
As the lower bound on T(n) is a polynomial of degree 3, the running time on such an instance is Ω(n^3), which shows that the upper bound for the problem (O(n^3)) is tight.
Side note:
If you pass the original array and the current index as parameters instead of slicing, the runtime of the for loop will be O(n) and the overall time O(n^2).
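A minimal sketch of that index-based variant follows. The name is_hopable_indexed and the inner helper hop are hypothetical, and the lru_cache memoization is an addition that is not in the original code: it is there so that each start index is solved at most once, which is what keeps the total at O(n^2) even on inputs where no hop succeeds and the same positions would otherwise be re-explored many times.

```python
from functools import lru_cache


def is_hopable_indexed(arr):
    """Same rules as is_hopable, but recursing on a start index, not a slice."""
    n = len(arr)

    @lru_cache(maxsize=None)      # assumption: caching added, not in the original
    def hop(pos):
        remaining = n - pos       # length of the suffix arr[pos:]
        if remaining < 1 or arr[pos] == 0:
            return False
        if arr[pos] >= remaining:
            return True
        # up to n candidate hops; no copying, and cached results are O(1)
        return any(hop(pos + step) for step in range(1, arr[pos] + 1))

    return hop(0)


print(is_hopable_indexed([4, 2, 0, 0, 2, 0]))
print(is_hopable_indexed([1, 0, 5]))
```

Because the recursion depth can reach n, very long inputs would need an iterative rewrite or a higher recursion limit.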

Time complexity of solution to the four sum problem?

Given an array of integers, find all unique quartets summing up to a specified integer.
I provide two different solutions below. Which one is more efficient with respect to time complexity?
Solution 1:
def four_sum(arr, s):
    n = len(arr)
    output = set()
    for i in range(n-2):
        for j in range(i+1, n-1):
            seen = set()
            for k in range(j+1, n):
                target = s - arr[i] - arr[j] - arr[k]
                if target in seen:
                    output.add((arr[i], arr[j], arr[k], target))
                else:
                    seen.add(arr[k])
    return print('\n'.join(map(str, list(output))))
I know that this has time complexity of O(n^3).
Solution 2:
def four_sum2(arr, s):
    n = len(arr)
    seen = {}
    for i in range(n-1):
        for j in range(i+1, n):
            if arr[i] + arr[j] in seen:
                seen[arr[i] + arr[j]].add((i, j))
            else:
                seen[arr[i] + arr[j]] = {(i, j)}
    output = set()
    for key in seen:
        if s - key in seen:
            for (i, j) in seen[key]:
                for (p, q) in seen[s - key]:
                    sorted_index = tuple(sorted((arr[i], arr[j], arr[p], arr[q])))
                    if i not in (p, q) and j not in (p, q):
                        output.add(sorted_index)
    return output
Now, the first block has a time complexity of O(n^2), but I'm not sure what the time complexity is on the second block?
TLDR: the complexity of this algorithm is O(n^4).
In the first part, a tuple is added to seen for every pair (i, j) with j > i.
Thus the number of tuples in seen is (n-1)*n/2 = O(n^2), as you guessed.
The second part is a bit more complex. If we ignore the condition s - key in seen (it holds in the critical case), the loop over the keys together with the loop over seen[key] iterates over every tuple stored in seen, so these two loops alone already cost on the order of n^2. The innermost loop is trickier: its cost is hard to state without making assumptions about the input data. However, there is a critical case where seen[s - key] contains O(n^2) tuples. In that case, the overall algorithm runs in O(n^4)!
Is this theoretical critical case practical?
Well, sadly yes. Take the input arr = [5, 5, ..., 5, 5] with s = 20, for example. The seen map will contain a single key (10) associated with a set of (n-1)*n/2 = O(n^2) elements. In this case the first two loops of the second part run O(n^2) times in total, and the innermost loop runs O(n^2) times for each of those iterations.
Thus the overall algorithm runs in O(n^4).
However, note that in practice such a case should be quite rare, and the algorithm should run much faster on random inputs with many different numbers. The complexity can probably be improved to O(n^3) or even O(n^2) if this critical case is handled (e.g. by computing this pathological case separately).
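The critical case is easy to reproduce by counting how often the innermost loop body of four_sum2 would run. The helper below is a hypothetical sketch (the name innermost_iterations is made up): it rebuilds the same seen map and then multiplies the sizes of the two pair sets instead of actually looping over them.

```python
def innermost_iterations(arr, s):
    """Count how many times the innermost loop body of four_sum2 would execute."""
    seen = {}
    for i in range(len(arr) - 1):
        for j in range(i + 1, len(arr)):
            seen.setdefault(arr[i] + arr[j], set()).add((i, j))
    # for every key with a matching partner, the two inner loops run
    # len(seen[key]) * len(seen[s - key]) times
    return sum(len(pairs) * len(seen[s - key])
               for key, pairs in seen.items() if s - key in seen)


for n in (10, 20, 40):
    # all-equal input: doubling n multiplies the count by roughly 16
    print(n, innermost_iterations([5] * n, 20))
```

For arr = [5] * n the count is ((n-1)*n/2)^2, which is the O(n^4) behaviour described above.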

Is the big O runtime of my function N^2 or N log N?

from linkedlist import LinkedList

def find_max(linked_list): # Complexity: O(N)
    current = linked_list.get_head_node()
    maximum = current.get_value()
    while current.get_next_node():
        current = current.get_next_node()
        val = current.get_value()
        if val > maximum:
            maximum = val
    return maximum

def sort_linked_list(linked_list): # <----- WHAT IS THE COMPLEXITY OF THIS FUNCTION?
    print("\n---------------------------")
    print("The original linked list is:\n{0}".format(linked_list.stringify_list()))
    new_linked_list = LinkedList()
    while linked_list.head_node:
        max_value = find_max(linked_list)
        print(max_value)
        new_linked_list.insert_beginning(max_value)
        linked_list.remove_node(max_value)
    return new_linked_list
Since we go through the while loop N times, the runtime is at least N. On each iteration we call find_max; however, with each call, the linked_list we are passing to find_max has been reduced by one element. Based on that, isn't the runtime N log N?
Or is it N^2?
It's still O(n²); the reduction in size by 1 each time just makes the effective work n * n / 2 (because on average, you have to deal with half the original length on each pass, and you're still doing n passes). But since constant factors aren't included in big-O notation, that simplifies to just O(n²).
For it to be O(n log n), each step would have to halve the size of the list to scan, not simply reduce it by one.
It's n + (n-1) + (n-2) + ... + 1, which is an arithmetic series, so it equals n(n+1)/2. In big O notation that is O(n^2).
Don't forget, O-notation gives an upper bound on growth and describes an entire class of functions. As far as O-notation goes, the following two functions have the same complexity:
64n^2 + 128n + 256 --> O(n^2)
n^2 - 2n + 1 --> O(n^2)
In your case (your algorithm is what's called a selection sort: it picks the largest element in the list and puts it in the new list; other O(n^2) sorts include insertion sort and bubble sort), you have the following costs per iteration:
0th iteration: n
1st iteration: n-1
2nd iteration: n-2
...
(n-1)th iteration: 1
So the entire complexity would be
n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 = (1/2)n^2 + (1/2)n
which is still O(n^2), though it'd be on the low side of that class.
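The sum can be checked with a short simulation. Since the linkedlist module from the question is not shown, this sketch uses a plain Python list as a stand-in (an assumption), and the name selection_sort_visits is made up. It mirrors sort_linked_list while counting how many elements the max search has to look at on each pass.

```python
def selection_sort_visits(values):
    """Mimic sort_linked_list on a plain list; count elements scanned by the max search."""
    remaining = list(values)
    result = []
    visits = 0
    while remaining:
        visits += len(remaining)       # find_max walks every remaining node
        largest = max(remaining)
        result.insert(0, largest)      # same effect as insert_beginning
        remaining.remove(largest)      # same effect as remove_node
    return result, visits


for n in (10, 100, 1000):
    _, visits = selection_sort_visits(range(n))
    # columns: n, measured visits, n(n+1)/2
    print(n, visits, n * (n + 1) // 2)
```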
