Python HeapSort Time Complexity

I've written the following code for HeapSort, which is working fine:
class Heap(object):
    def __init__(self, a):
        self.a = a

    def heapify(self, pos):
        left = 2 * pos + 1
        right = 2 * pos + 2
        maximum = pos
        if left < len(self.a) and self.a[left] > self.a[maximum]:
            maximum = left
        if right < len(self.a) and self.a[right] > self.a[maximum]:
            maximum = right
        if maximum != pos:
            self.a[pos], self.a[maximum] = self.a[maximum], self.a[pos]
            self.heapify(maximum)

    def buildHeap(self):
        for i in range(len(self.a) // 2, -1, -1):
            self.heapify(i)

    def heapSort(self):
        elements = len(self.a)
        for i in range(elements):
            print(self.a[0])
            self.a[0] = self.a[-1]
            self.a = self.a[:-1]
            self.heapify(0)

    def printHeap(self):
        print(self.a)

if __name__ == '__main__':
    h = Heap(list(range(10)))
    h.buildHeap()
    h.printHeap()
    h.heapSort()
However, it seems that the heapSort function here will take O(n^2) time, due to the list slicing: for a list of size n, slicing it down to n-1 elements takes O(n-1) time, since it copies the list.
Can anyone confirm if my thinking is correct here?
If yes, what would be the minimal change to the heapSort function to make it run in O(n log n)?

Yes, I believe you are correct. To make it faster, replace things like this:
self.a = self.a[:-1]
with:
self.a.pop()
The pop() method of lists removes and returns the last element of the list in constant time.
Lists are stored in contiguous memory, meaning all the elements of a list (more precisely, references to them) sit one after the other. This is why inserting an element in the middle of a list is so expensive: Python has to shift every element after the insertion point down by one slot, to make space for the new element. Simply deleting the element at the end of the list, however, takes negligible time, as Python merely has to discard that last slot.
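For concreteness, here is a minimal sketch of the corrected method, assuming the rest of the Heap class above stays unchanged:

def heapSort(self):
    for _ in range(len(self.a)):
        print(self.a[0])
        self.a[0] = self.a[-1]
        self.a.pop()     # O(1), unlike the O(n) copy made by self.a[:-1]
        self.heapify(0)

With that one change, each of the n extractions costs O(log n) for heapify plus O(1) for the pop, giving O(n log n) overall.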

Related

How to Initialize a Min Heap?

I'm trying to figure out how I can initialize a min heap using an array. So far my function looks like this:
def start_heap(self, n):
    # None positions to be filled with Nodes
    arr = [None] * n
    for i in arr:
        heapify(i)
def heapify(self):
    start = self.parent(len(self) - 1)  # Start at parent of last leaf
    for j in range(start, -1, -1):      # going to and including the root.
        self.heapify_down(j)

def heapify_down(self, i):
    n = len(self._pq)
    left, right = self.left(i), self.right(i)
    if left < n:
        child = left
        if right < n and self._pq[right] < self._pq[left]:
            child = right
        if self._pq[child] < self._pq[i]:
            self.swap(i, child)
            self.heapify_down(child)
Heapify Down Pseudocode:
Heapify-down(H, i):
    Let n = length(H)
    If 2i > n then
        Terminate with H unchanged
    Else if 2i < n then
        Let left = 2i, and right = 2i + 1
        Let j be the index that minimizes key[H[left]] and key[H[right]]
    Else if 2i = n then
        Let j = 2i
    Endif
    If key[H[j]] < key[H[i]] then
        swap the array entries H[i] and H[j]
        Heapify-down(H, j)
    Endif
I'm going to build a simple node class that just holds data, but I'm not sure how to actually get the start_heap function working. Keep in mind n is the maximum number of elements that can be stored.
Some remarks on the code you provided (not on the code you didn't provide):
arr is a local variable, so whatever you do with it, once the function returns, that arr will be out of scope... and lost. You need to store the list as an attribute (or else subclass list).
It is not common practice to "allocate" the array and fill it with None. Python lists are dynamic, so you don't need to reserve slots ahead of time. You just need an empty list to start with.
There is no need to call heapify when the heap is empty.
There is certainly no need to call heapify many times in a loop. All the logic for heapifying is already present in that method, so there is no need to call it on each index individually. And, as stated in the previous point, there is no need to call it on an empty list -- there is nothing to move.
So the correction is quite basic:
def start_heap(self, max_size):
    self.max_size = max_size
    self._pq = []
Then, in many of your other methods, you will have to work with self._pq and self.max_size.
For instance, the class could have a method that indicates whether the heap is full:
def is_full(self):
    return len(self._pq) >= self.max_size
If you have an add method, it would first check if there is still room:
def add(self, node):
    if self.is_full():
        raise ValueError("Cannot add value to the heap: it is full")
    # ... rest of your code ...
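To see how these pieces fit together, here is a minimal sketch of such a class, assuming plain comparable values in place of the Node class that isn't shown (the _heapify_up helper is a hypothetical name):

class MinHeap:
    def __init__(self, max_size):
        self.max_size = max_size
        self._pq = []  # dynamic list; no None pre-allocation needed

    def is_full(self):
        return len(self._pq) >= self.max_size

    def add(self, value):
        if self.is_full():
            raise ValueError("Cannot add value to the heap: it is full")
        self._pq.append(value)               # grow the list...
        self._heapify_up(len(self._pq) - 1)  # ...then restore the heap property

    def _heapify_up(self, i):
        # Bubble the new element up while it is smaller than its parent.
        parent = (i - 1) // 2
        while i > 0 and self._pq[i] < self._pq[parent]:
            self._pq[i], self._pq[parent] = self._pq[parent], self._pq[i]
            i, parent = parent, (parent - 1) // 2

h = MinHeap(8)
for v in [5, 3, 7, 1]:
    h.add(v)
print(h._pq[0])  # 1 -- the minimum sits at index 0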

What is the space complexity of a list in a while loop?

I believe a similar question has been asked for Java, but I'm not sure whether the same applies to Python, since we don't explicitly use the new keyword.
For this particular code:
x = 5
while (x > 0):
    arr = []
    arr2 = []
    arr.append(1)
    arr2.append(2)
    x -= 1
After executing this code, will there be a total of 10 different lists created, which would be a space complexity of O(10), or only 2 lists created, which would be a space complexity of O(2)?
I understand the overall space complexity is still O(1), but I just wanted to find out what happens under the hood.
Firstly, since you wrote arr = [] inside the while loop, the name is rebound to a brand-new list on each iteration and the previous list is discarded, hence both arr and arr2 will hold at most 1 element.
Secondly, under the formal definition of Big-O complexity, O(1) and O(2) are the same constant complexity; Big-O is meant to be used with a variable, to describe how the cost grows relative to that variable.
If you want to know whether or not Python creates a new list in your code, you can subclass the built-in list to log its construction and destruction:
class ilist(list):
    def __init__(self, r=list()):
        print("Making new list: " + str(r))
        list.__init__(self, r)

    def __del__(self):
        print("Deleting list")

x = 5
while (x > 0):
    arr = ilist()
    arr2 = []
    arr.append(1)
    arr2.append(2)
    x -= 1
print("Finished while")
print("Finished while")
output:
Making new list: []
Making new list: []
Deleting list
Making new list: []
Deleting list
Making new list: []
Deleting list
Making new list: []
Deleting list
Finished while
Deleting list
As you can see, the list is indeed created and deleted on every iteration, since it is only referenced inside the body of the while loop.
This is the expected behaviour; if your intention was to create the list once, you should declare it in the outer scope.
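For comparison, here is a minimal variant with the creation moved out of the loop (this is the change suggested above, not code from the question):

x = 5
arr = []    # created once, in the enclosing scope
arr2 = []
while x > 0:
    arr.append(1)
    arr2.append(2)
    x -= 1
print(len(arr), len(arr2))  # 5 5 -- the same two lists now accumulate across iterations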

Recursion step doesn't update output

I have a problem with recursion. The function I wrote should recursively generate and return a list of pairs, called chain. The breaking condition is: when the pair (remainder, quotient) already belongs to the chain list, stop recursing and return the list. Instead of completing, the recursion just blows up, raising a RecursionError. The list doesn't update and contains only a single term, so the breaking condition is never triggered. I don't understand why...
How should I properly implement the recursive step to make the list update?
def proper_long_division(a, b):
    """a < b"""
    chain = []
    block_size = len(str(b)) - len(str(a))
    a_new_str = str(a) + '0' * block_size
    a_new = int(a_new_str)
    if a_new < b:
        a_new = int(a_new_str + '0')
    quotient = a_new // b
    remainder = a_new - b * quotient
    print(remainder)
    #print(chain)
    # breaking condition <--- !
    if (remainder, quotient) in chain:
        return chain
    # next step
    chain.append((remainder, quotient))
    chain.extend(proper_long_division(remainder, b))
    return chain

try:
    a = proper_long_division(78, 91)
    print(a)
except RecursionError:
    print('boom')
Here is an example of a recursion which follows (or should follow) the same structure, but where the returned list is updated. I don't know why one works while the other does not.
import random

random.seed(1701)

def recursion():
    nrs = []
    # breaking condition
    if (r := random.random()) > .5:
        return nrs
    # update
    nrs.append(r)
    # recursive step
    nrs.extend(recursion())
    return nrs

a = recursion()
print(a)
# [0.4919374389681155, 0.4654907396198952]
When you enter proper_long_division, the first thing you do is chain = []. That means that the local variable chain refers to a new empty list. Then you do some algebra, which does not affect chain, and check if (remainder, quotient) in chain:. Clearly this will always be False, since chain was and has remained empty.
The next line, chain.append((remainder, quotient)) runs just fine, but remember that only this call to proper_long_division has a reference to it.
Now you call chain.extend(proper_long_division(remainder, b)). You seem to expect that the recursive call will be able to check and modify chain. However, the object referred to by chain in a given call of proper_long_division is only visible within that call.
To fix that, you can use a piece of shared memory that every invocation of the recursive function can see. You could use a global variable, but that would give the function unpredictable behavior, since anyone could modify the list. A better way is to use a nested function that has access to a list in the enclosing scope:
def proper_long_division(a, b):
    """a < b"""
    chain = {}

    def nested(a, b):
        while a < b:
            a *= 10
        quotient = a // b
        remainder = a - b * quotient
        key = (remainder, quotient)
        if key in chain:
            return chain
        # next step
        chain[key] = None
        nested(remainder, b)

    nested(a, b)
    return list(chain.keys())
A couple of suggested changes are showcased above. Multiplying by 10 is the same as padding with a zero on the right, so you don't need to play games with strings. Lookup in a hash table is much faster than in a list. Since ordering is important you can't use a set; instead, chain became a dict, which preserves insertion order as of Python 3.6 (guaranteed from 3.7), and only its keys are used for lookup. The values all refer to the singleton None.
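As a quick check of the rewritten function (the expected pairs can be verified by doing the long division by hand; 78/91 = 0.857142857142...):

print(proper_long_division(78, 91))
# [(52, 8), (65, 5), (13, 7), (39, 1), (26, 4), (78, 2)]
# reading off the quotient digits gives the repeating block 857142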
The second example does not match the structure of the first in the one way that matters: you do not use nrs as part of your exit criterion.

What is the difference between heappop and pop(0) on a sorted list?

I have a list stored in this format: [(int, (int, int)), ...]
My code looks like this for the first case:
heap.heappush(list_, (a, b))  # b is a tuple
while len(list_):
    temp = heap.heappop(list_)[1]
Now my ideal implementation would be:
list_.append((a, b))  # b is a tuple
while len(list_):
    list_.sort(key=lambda x: x[0])
    temp = list_.pop(0)[1]
The second implementation causes issues in other parts of my code. Is there any reason the second one is incorrect, and how could I change it so it behaves like the heapq version?
EDIT: I know heappop() pops the smallest value out, which is why I have sorted the list on 'a' (which heappop uses too, I assume).
To work with heapq you have to be aware that Python implements a min heap. That means the element at index 0 is always the smallest.
Here is what you want implemented with heapq:
import heapq
from typing import Tuple

class MyObject:
    def __init__(self, a: int, b: Tuple[int, int]):
        self.a = a
        self.b = b

    def __lt__(self, other):
        return self.a < other.a

l = [MyObject(..,..), MyObject(..,..), .., MyObject(..,..)]  # your list
heapq.heapify(l)   # convert l to a heap, in place
heapq.heappop(l)   # retrieves and removes the smallest element (which lies at index 0)
# After popping the smallest element, the heap internally rearranges the elements
# to get the new smallest value to index 0. I.e. it maintains the "heap invariant"
# automatically, and you don't need to explicitly sort!
Notice:
I didn't need to sort. A heap is by nature a semi-sorted structure.
I created a dedicated class for my objects. This is cleaner than working with tuples of tuples, and it also allowed me to override the less-than operator (__lt__) so that the heap knows how to arrange its tree internally.
Here is more detailed info for the avid reader :D
When you work with heaps, you shouldn't need to sort explicitly. A heap by nature is a semi-sorted structure that (in the case of Python's heapq) keeps the smallest element at index 0. You don't, however, see the tree structure underneath (remember, a heap is a tree in which each parent node is smaller than all its children and descendants).
Moreover, you don't need to append the elements one by one; rather, use heapq.heapify(), which builds the heap in O(n) and avoids redundant heapify-up/heapify-down operations. Otherwise you will have a serious runtime issue if your list is huge :)
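Note that for data shaped like the asker's [(int, (int, int)), ...], heapq can also be used directly on the tuples, because tuples compare lexicographically. A minimal sketch, assuming the inner tuples are safe to compare whenever two 'a' values tie:

import heapq

pairs = [(3, (1, 2)), (1, (5, 6)), (2, (0, 0))]
heapq.heapify(pairs)             # O(n); rearranges the list in place
while pairs:
    a, b = heapq.heappop(pairs)  # O(log n) per pop; smallest 'a' comes out first
    print(a, b)
# 1 (5, 6)
# 2 (0, 0)
# 3 (1, 2)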

Data structure with O(1) random removal and adds for shuffling generator order

I need a data structure that lets you add elements and remove them randomly in O(1) time.
The reason for this is that I need to shuffle data from a generator, but I can't load everything into memory at once due to its size.
This is an example of usage, which automatically shuffles the order of the results generated by a generator expression without loading everything into memory:
def generator_shuffler(generator):
    a = magical_data_structure_described_above
    for i in generator:
        a.add(i)
        if len(a) > 10: yield a.poprandom()
Initially I tried a Python set(); however, per Set.pop() isn't random?, it seems that set() doesn't actually remove items in an arbitrary order. How would I implement a data structure with the above usage?
If you want to pop randomly, why not just use a list, and implement pop by swapping a randomly-selected element with the last element and then dropping the new last element? That won't preserve the order of the remaining elements in the data structure, but "pop randomly" and "shuffle" suggest that you don't really care.
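A minimal sketch of that idea (the class and method names here are made up for illustration):

import random

class RandomPopList:
    # O(1) add and O(1) random removal; element order is not preserved.
    def __init__(self):
        self._data = []

    def __len__(self):
        return len(self._data)

    def add(self, value):
        self._data.append(value)

    def poprandom(self):
        i = random.randrange(len(self._data))
        # Swap the chosen element to the end, then pop it off: both steps are O(1).
        self._data[i], self._data[-1] = self._data[-1], self._data[i]
        return self._data.pop()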
Finding and removing a random element from a list is generally O(n) when using pop(i). You can, however, move the shuffling into the length check, so that the add and pop operations themselves stay O(1) -- note that the shuffle makes __len__ cost O(n), so the total work is relocated rather than eliminated:
import random

class RandomStack:
    def __init__(self, _d=None):
        self.stack = _d if _d else []

    def __len__(self):
        random.shuffle(self.stack)
        return len(self.stack)

    def add(self, _val):
        self.stack.append(_val)

    def poprandom(self):
        return self.stack.pop()

a = RandomStack()
for i in range(16):
    a.add(i)
    if len(a) > 10:
        val = a.poprandom()
        print(val)
Output:
2
4
9
0
6
12
