This question is general, but it also concerns a specific function:
def quick_sort(lst):
    if len(lst) < 2: return lst
    pivot_lst = lst[0]
    left_side = [el for el in lst[1:] if el < pivot_lst]
    right_side = [el for el in lst[1:] if el >= pivot_lst]
    return quick_sort(left_side) + [pivot_lst] + quick_sort(right_side)
Time complexity: O(n log(n)) expected, O(n^2) worst case
Space complexity: ???
So for the expected time complexity, whose best case occurs when the left and right sides split evenly, the following series would apply for an input of size n:
n + n/2 + n/4 + n/8 +... +1
= n(1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ...)
= O(n)
It follows that in the worst case, which occurs when the pivot point selected is the largest or smallest value in the list, this would apply:
n + (n-1) + (n-2) +... + 1
= (n^2 + n) / 2
= O(n^2)
My question is, do the series above represent expected and worst space complexities of O(n) and O(n^2), respectively?
I'm struggling with the idea of how stack frame memory comes into play here.
Would we just add it on?
So, if it's O(log(n)), then space complexity is O(n) + O(log(n)) -> O(n)
Or would its relationship with the auxiliary data be something else?
Can I conclude that when both an auxiliary data structure and recursive stack are present, we only need to calculate the larger of the two?
Summary
In this implementation of Quicksort, yes—the expected auxiliary space complexity is O(n) and the worst-case auxiliary space complexity is O(n^2).
I'm struggling with the idea of how stack frame memory comes into play here. Would we just add it on?
So, if it's O(log(n)), then space complexity is O(n) + O(log(n)) -> O(n)
[...]
Can I conclude that when both an auxiliary data structure and recursive stack are present, we only need to calculate the larger of the two?
No.
I think you're correctly noticing that the recursive stack depth is O(log(n)) in the expected case, but incorrectly thinking that that means its space complexity is also O(log(n)) in the expected case. That's not necessarily true.
An individual stack frame can represent more space than O(1).
How much space a frame represents might vary from frame to frame.
So, when finding an algorithm's total space complexity, you can't analyze its recursion depth separately from its data requirements, and then add the two up at the end. You need to analyze them together.
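For instance, here's a contrived sketch (my own example, not from the question or your code) where the recursion depth is O(log(n)) but each frame holds O(n) of data, so the total space is O(n log(n)):
def shrink_range(lst, lo, hi):
    # Contrived example: the recursion depth is O(log(n)) because the
    # (lo, hi) range halves on each call, but every frame keeps its own
    # full-size copy of the list, so the simultaneously active frames
    # hold O(n) each -- O(n log(n)) in total.
    if hi - lo <= 1:
        return
    copy = list(lst)                        # O(n) held by this frame
    shrink_range(copy, lo, (lo + hi) // 2)  # range halves: O(log(n)) depth

shrink_range(list(range(16)), 0, 16)
Taking the larger of the two figures here, max(O(n), O(log(n))) = O(n), would understate the real usage.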
In general, you'll need to understand:
How deep the recursion goes—how many stack frames there will be.
For each of those stack frames, what its space complexity is. This includes function arguments, local variables, and so on.
Then, you can add up the space complexities of all the stack frames that will be simultaneously active.
Example: Expected case
Imagine this function call tree for n=8. I'm using the notation quick_sort(n) to mean "quicksort with a list of n elements."
quick_sort(8)
    quick_sort(4)
        quick_sort(2)
            quick_sort(1)
            quick_sort(1)
        quick_sort(2)
            quick_sort(1)
            quick_sort(1)
    quick_sort(4)
        quick_sort(2)
            quick_sort(1)
            quick_sort(1)
        quick_sort(2)
            quick_sort(1)
            quick_sort(1)
Since your implementation is single-threaded, only one branch will be active at a time. At its deepest, that will look like:
quick_sort(8)
    quick_sort(4)
        quick_sort(2)
            quick_sort(1)
Or, in general:
quick_sort(n)
    quick_sort(n/2)
        quick_sort(n/4)
            ...
                quick_sort(1)
Let's look at the space that each frame will consume.
<calling function>
    lst: O(n)
quick_sort(n)
    lst: O(1)
    pivot_lst: O(1)
    left_side: O(n/2)
    right_side: O(n/2)
quick_sort(n/2)
    lst: O(1)
    pivot_lst: O(1)
    left_side: O(n/4)
    right_side: O(n/4)
quick_sort(n/4)
    lst: O(1)
    pivot_lst: O(1)
    left_side: O(n/8)
    right_side: O(n/8)
...
quick_sort(1)
    lst: O(1)
Note that I'm considering the lst argument to always have a space complexity of O(1), to reflect the fact that Python passes lists by reference. If we made it O(n), O(n/2), etc., we would be double-counting it, because it's really the same object as the calling function's left_side or right_side. This won't end up mattering for the final result of this particular algorithm, but you'll need to keep it in mind in general.
I'm also being notationally sloppy. Writing O(n/2) makes it tempting to immediately simplify it to O(n). Don't do that yet: if you do, you'll end up overstating the total space complexity.
Simplifying a bit:
<calling function>
    lst: O(n)
quick_sort(n)
    everything: O(n/2)
quick_sort(n/2)
    everything: O(n/4)
quick_sort(n/4)
    everything: O(n/8)
...
quick_sort(1)
    everything: O(1)
Adding them up:
O(n) + O(n/2) + O(n/4) + O(n/8) + ... + O(1)
= O(n)
Example: Worst case
Using the same methodology as above, but skipping some steps for brevity:
<calling function>
    lst: O(n)
quick_sort(n)
    everything: O(n-1)
quick_sort(n-1)
    everything: O(n-2)
quick_sort(n-2)
    everything: O(n-3)
...
quick_sort(1)
    everything: O(1)
O(n) + O(n-1) + O(n-2) + O(n-3) + ... + O(1)
= O(n^2)
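If you want to see this play out, here's a rough instrumentation sketch (my own addition, not part of the question's code) that tracks the total number of list elements held by all simultaneously active frames. On shuffled input the peak stays around a small multiple of n; on already-sorted input (the worst case for this pivot choice) it grows roughly like n^2/2:
import random

peak = 0

def quick_sort_tracked(lst, live=None):
    # Same algorithm as in the question, instrumented to track the total
    # number of list elements held by all simultaneously active frames
    # (a rough proxy for auxiliary space). `live` counts elements already
    # held by the caller chain; lst is one of those lists, so it is not
    # counted again, mirroring the O(1) treatment of lst above.
    global peak
    if live is None:        # top-level call: count the caller's list once
        live = len(lst)
    peak = max(peak, live)
    if len(lst) < 2:
        return lst
    pivot_lst = lst[0]
    left_side = [el for el in lst[1:] if el < pivot_lst]
    right_side = [el for el in lst[1:] if el >= pivot_lst]
    live += len(left_side) + len(right_side)
    peak = max(peak, live)
    return (quick_sort_tracked(left_side, live)
            + [pivot_lst]
            + quick_sort_tracked(right_side, live))

n = 256
peak = 0
quick_sort_tracked(random.sample(range(n), n))
print("shuffled input, peak live elements:", peak)   # a small multiple of n

peak = 0
quick_sort_tracked(list(range(n)))
print("sorted input, peak live elements:", peak)     # roughly n * n / 2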
Related
From what I've understood so far, the time complexity of a for loop with n as the input is O(n), but what about the code inside the loop?
while var in arr:
    arr.remove(var)
arr is a list with n elements and var can be a string or a number.
How do I know if I should multiply or add time complexities? Is the time complexity of the above code O(n**2) or O(n)?
for i in range(n):
    arr.remove(var)
    arr.remove(var1)
What would the time complexity be now? What should I add or multiply?
I tried learning about time complexity but couldn't understand how to deal with code having more than one time complexity.
You need to know the time complexity of the content inside the loop.
for i in arr:            # O(n)
    print(sum(arr) - i)  # O(n)
In this case, the sum(arr) call is nested in the for loop, so you need to multiply its complexity by the for loop's complexity: O(n) * O(n) -> O(n*n) -> O(n²).
for i in arr:            # O(n)
    print(sum(arr) - i)  # O(n)
    print(sum(arr) - i)  # O(n)
In this case, it's
O(n) * (O(n) + O(n))
O(n) * O(n+n)
O(n) * O(2n)
O(n) * O(n)
O(n*n)
O(n²)
See When to add and when to multiply to find time complexity for more information about that.
For a while loop, nothing changes: multiply the complexity of the loop body by the number of iterations of the while loop.
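Applied to the loop from the question (a sketch with my own sample data; the cost comments are the usual assumptions about CPython list operations): both the membership test and remove() scan the list, and the loop runs once per occurrence of var, so the worst case is O(n) iterations times O(n) work each.
arr = [2, 7, 2, 2, 9, 2]   # example data (mine, not the asker's)
var = 2

while var in arr:          # membership test scans the list: O(n)
    arr.remove(var)        # remove also scans the list: O(n)

print(arr)                 # [7, 9]
# Up to O(n) iterations (e.g. when every element equals var), each doing
# O(n) work, gives O(n**2) in the worst case.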
This is the method, written in Python:
def count_con(s, lo, hi):
    vls = ['a', 'e', 'i', 'o', 'u']
    if lo == hi:
        if s[lo] not in vls:
            return 1
        else:
            return 0
    mid = (lo + hi) // 2
    count_l = count_con(s, lo, mid)
    count_r = count_con(s, mid + 1, hi)
    return count_l + count_r
Assume we want to find its time complexity. I've come up with a recurrence relation:
T(n) = 2T(n/2) + f(n) if n > 1, and T(n) = O(1) if n <= 1
However, I cannot determine what f(n) would be. Will combining the two halves take linear O(n) time just like in merge-sort or constant O(1) time? I tend to think that since it's only an addition constant O(1) time is more suitable.
And overall, is this method O(n*logn) or O(logn)?
Will combining the two halves take linear O(n) time just like in merge-sort or constant O(1) time? I tend to think that since it's only an addition constant O(1) time is more suitable.
You are correct.
So if we assume one call does 1 unit of work (ignoring the recursive calls), all we have to do is establish the number of calls. If we assume that n = 2^k, we end up with 1 + 2 + 4 + ... + 2^k = 2n - 1 calls, which is just O(n).
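As a quick sanity check (my own sketch, reusing the question's logic with a call counter added), you can count the calls directly and compare against 2n - 1:
calls = 0

def count_con_counted(s, lo, hi):
    # Same logic as count_con above, plus a global call counter.
    global calls
    calls += 1
    vls = ['a', 'e', 'i', 'o', 'u']
    if lo == hi:
        return 0 if s[lo] in vls else 1
    mid = (lo + hi) // 2
    return count_con_counted(s, lo, mid) + count_con_counted(s, mid + 1, hi)

s = "abcdefgh" * 4                 # n = 32 = 2**5
count_con_counted(s, 0, len(s) - 1)
print(calls, 2 * len(s) - 1)       # 63 63 -- i.e. 1 + 2 + 4 + ... + 32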
As mentioned in your question, combining the halves consumes only constant time.
What the code does is simply recursively check (access) each index to see whether it holds a consonant and increment the counter if so. It's overall only O(n).
If you need some insight into when logarithmic complexities come into the picture, you can see this question to understand time complexities in general: here
f(n), as mentioned in your question, would take only constant time here, but in mergesort it is O(n), because after the recursive calls, before the sorted arrays are returned, you are accessing and linearly rearranging the values of the entire array between indices left and right.
from linkedlist import LinkedList

def find_max(linked_list):  # Complexity: O(N)
    current = linked_list.get_head_node()
    maximum = current.get_value()
    while current.get_next_node():
        current = current.get_next_node()
        val = current.get_value()
        if val > maximum:
            maximum = val
    return maximum

def sort_linked_list(linked_list):  # <----- WHAT IS THE COMPLEXITY OF THIS FUNCTION?
    print("\n---------------------------")
    print("The original linked list is:\n{0}".format(linked_list.stringify_list()))
    new_linked_list = LinkedList()
    while linked_list.head_node:
        max_value = find_max(linked_list)
        print(max_value)
        new_linked_list.insert_beginning(max_value)
        linked_list.remove_node(max_value)
    return new_linked_list
Since we loop through the while loop N times, the runtime is at least N. For each loop we call find_max, HOWEVER, for each call to find_max, the linked_list we are passing to find_max is reduced by one element. Based on that, isn't the runtime N log N?
Or is it N^2?
It's still O(n²); the reduction in size by 1 each time just makes the effective work n * n / 2 (because on average, you have to deal with half the original length on each pass, and you're still doing n passes). But since constant factors aren't included in big-O notation, that simplifies to just O(n²).
For it to be O(n log n), each step would have to halve the size of the list to scan, not simply reduce it by one.
It's n + (n-1) + (n-2) + ... + 1, which is an arithmetic series, so it is n(n+1)/2. So in big O notation it is O(n^2).
Don't forget, O-notation gives an asymptotic upper bound and describes an entire class of functions. As far as O-notation goes, the following two functions are the same complexity:
64n^2 + 128n + 256 --> O(n^2)
n^2 - 2n + 1 --> O(n^2)
In your case (and your algorithm is what's called a selection sort: picking the best element in the list and putting it in the new list; other O(n^2) sorts include insertion sort and bubble sort), you have the following complexities:
0th iteration: n
1st iteration: n-1
2nd iteration: n-2
...
(n-1)th iteration: 1
So the entire complexity would be
n + (n-1) + (n-2) + ... + 1 = n(n+1)/2 = (1/2)n^2 + (1/2)n
which is still O(n^2), though it'd be on the low side of that class.
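To see how far apart the two growth rates are, here's a small sketch (my own numbers, not from the question) comparing the pass-by-pass total n(n+1)/2 against an n*log2(n) reference:
import math

# Node visits made by the selection sort above: n + (n-1) + ... + 1
# = n(n+1)/2, compared with what an O(n log n) sort would roughly do.
for n in (10, 100, 1000, 10_000):
    quadratic = n * (n + 1) // 2
    nlogn = round(n * math.log2(n))
    print(f"n={n:>6}  n(n+1)/2={quadratic:>12}  n*log2(n)={nlogn:>10}")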
Let's say we have the following code.
def problem(n):
    list = []
    for i in range(n):
        list.append(i)
        length = len(list)
    return list
The program has a time complexity of O(n) if we don't calculate len(list). But if we do, will the time complexity be O(n * log(n)) or O(n^2)?
No, the len() function takes constant time in Python and does not depend on the length of the list, so the time complexity of the above code remains O(n), governed by the for i in range(n) loop. Here is the time complexity for many CPython operations, like len() (see "Get Length" in the table).
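If you want to convince yourself, here's a rough timing sketch (my own, using the standard timeit module); the per-call cost of len() should stay flat as the list grows:
import timeit

# len() reads the size stored in the list object, so its cost should not
# grow with the number of elements.
for size in (10, 10_000, 1_000_000):
    lst = list(range(size))
    t = timeit.timeit(lambda: len(lst), number=1_000_000)
    print(f"len() on {size:>9} elements: {t:.3f}s per million calls")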
I'm trying to figure out the time complexity of this whole algorithm. Is it O(n log n) or O(n)? I've been searching online, and some sources say building a max heap is O(n log n) while others say O(n). I am trying to get a time complexity of O(n).
def max_heapify(A, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < len(A) and A[left] > A[largest]:
        largest = left
    if right < len(A) and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest)

def build_max_heap(A):
    for i in range(len(A) // 2, -1, -1):
        max_heapify(A, i)
    return A
The code you have in the question rearranges the array elements such that they satisfy the heap property, i.e. the value of a parent node is greater than or equal to that of its child nodes. The time complexity of the heapify operation is O(n).
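To see that concretely, here's a quick usage check with my own sample data that the output of build_max_heap satisfies the max-heap property (every parent at index i is >= its children at 2i+1 and 2i+2):
A = build_max_heap([3, 9, 2, 1, 4, 5, 10, 7])
print(A)                                     # [10, 9, 5, 7, 4, 3, 2, 1]

heap_ok = all(A[i] >= A[c]
              for i in range(len(A))
              for c in (2 * i + 1, 2 * i + 2)
              if c < len(A))
print("max-heap property holds:", heap_ok)   # True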
Here's an extract from the [Wikipedia page on Min-max heap](https://en.wikipedia.org/wiki/Min-max_heap#Build):
Creating a min-max heap is accomplished by an adaption of Floyd's linear-time heap construction algorithm, which proceeds in a bottom-up fashion.[10] A typical Floyd's build-heap algorithm[11] goes as follows:
function FLOYD-BUILD-HEAP (h):
    for each index i from floor(length(h)/2) down to 1 do:
        push-down(h, i)
    return h
Here the function FLOYD-BUILD-HEAP is the same as your build_max_heap function, and push-down is the same as your max_heapify function.
A suggestion: the naming of your functions is a little confusing. Your max_heapify is not actually heapifying. It is just a part of the heapify operation. A better name could be something like push_down (as used in Wikipedia) or fix_heap.
A heap is a data structure which supports operations including insertion and retrieval. Each operation has its own runtime complexity.
Maybe you were thinking of the runtime complexity of heapsort which is a sorting algorithm that uses a heap. In that case, the runtime complexity is O(n*log(n)).
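For contrast, here's a minimal heapsort sketch (my own, not from the question; it uses a sift_down that takes an explicit heap-size bound, which the question's max_heapify does not): the build phase is O(n), but the n subsequent extractions each cost O(log n), which is where the O(n*log(n)) comes from.
def sift_down(A, i, size):
    # Push A[i] down within A[:size] until the max-heap property holds.
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and A[left] > A[largest]:
            largest = left
        if right < size and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heap_sort(A):
    # Build phase: O(n), same idea as build_max_heap in the question.
    for i in range(len(A) // 2, -1, -1):
        sift_down(A, i, len(A))
    # Extraction phase: n iterations, each sift_down costs O(log n),
    # so this phase (and heapsort overall) is O(n log n).
    for end in range(len(A) - 1, 0, -1):
        A[0], A[end] = A[end], A[0]   # move the current max to its final slot
        sift_down(A, 0, end)
    return A

print(heap_sort([3, 9, 2, 1, 4, 5, 10, 7]))   # [1, 2, 3, 4, 5, 7, 9, 10]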