Heap Sort Algorithm number of comparisons - python

I'm trying to count the number of comparisons in this heap sort algorithm:
import random
import time
#HeapSort Algorithm
def heapify(arr, n, i):
    count = 0
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        count += 1
        arr[i],arr[largest] = arr[largest],arr[i]
        heapify(arr, n, largest)
    return count
def heapSort(arr):
    n = len(arr)
    count = 0
    for i in range(n, -1, -1):
        heapify(arr, n, i)
        count += heapify(arr, i, 0)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
    return count
print("For n = 1000:")
print("a) Random Generation:")
arr = [x for x in range(1000)]
random.shuffle(arr)
print("Before Sort:")
print (arr)
print("After Sort:")
start_time = time.time()
heapSort(arr)
time = time.time() - start_time
print(arr)
print("Comparisions")
print(heapSort(arr))
print("Time:")
print(time)
I expect the result to be 8421 for n = 1000 integers and 117681 for n = 10000.
However, it shows either 0 or 2001 every time, depending on where I put count += 1 around the loops, and never the number of comparisons.

You are not accounting for the comparisons that your recursive solution makes while solving the smaller subproblems. In other words, you are only counting what happens at the topmost level of your solution. Instead, whenever you call heapify, add its return value to the count variable in the calling scope. Notice the updates below, where the local count variables are increased by the return value of each call to heapify.
def heapify(arr, n, i):
    count = 0
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        count += 1
        arr[i],arr[largest] = arr[largest],arr[i]
        count += heapify(arr, n, largest)
    return count
def heapSort(arr):
    n = len(arr)
    count = 0
    for i in range(n, -1, -1):
        heapify(arr, n, i)
        count += heapify(arr, i, 0)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        count += heapify(arr, i, 0)
    return count
Here is a working example of your code including the fix given above. The output is still slightly different from the exact number of comparisons you are expecting, but it is in the ballpark. The small difference comes from the fact that you are randomizing the initial state of the array.
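Note that count += 1 in the code above runs once per swap, not once per element comparison. Below is a minimal sketch of one way to count comparisons instead, assuming that a "comparison" means every arr[x] < arr[y] test between two elements. The names heap_sort and comparisons are illustrative and not from the original post, and the total depends on the shuffle, so it will not necessarily match the expected figures.

```python
import random

def heapify(arr, n, i):
    """Sift arr[i] down within arr[:n]; return the number of element comparisons."""
    comparisons = 0
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n:
        comparisons += 1  # one element comparison against the left child
        if arr[largest] < arr[left]:
            largest = left
    if right < n:
        comparisons += 1  # one element comparison against the right child
        if arr[largest] < arr[right]:
            largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        comparisons += heapify(arr, n, largest)  # include the recursive work
    return comparisons

def heap_sort(arr):
    """Sort arr in place and return the total number of element comparisons."""
    n = len(arr)
    comparisons = 0
    # Build the max heap; indices from n // 2 upward have no children.
    for i in range(n // 2 - 1, -1, -1):
        comparisons += heapify(arr, n, i)
    # Move the current maximum to the end and restore the heap on the rest.
    for i in range(n - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        comparisons += heapify(arr, i, 0)
    return comparisons

arr = list(range(1000))
random.shuffle(arr)
print(heap_sort(arr))  # varies with the shuffle
```

Calling heap_sort once and storing its return value also avoids sorting the already sorted list a second time just to print the count.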

Related

Duplicate pairs in an array

Given a zero-indexed array A of N integers, find equal elements at different positions in the array, that is, count the pairs of indices (P, Q) with 0 <= P < Q < N such that A[P] = A[Q].
My idea:
def function(arr, n) :
    count = 0
    arr.sort()
    i = 0
    while i < (n-1) :
        if (arr[i] == arr[i + 1]) :
            count += 1
            i = i + 2
        else :
            i += 1
    return count
Two questions:
How do I avoid counting elements whose first indices are not smaller than the second indices?
How do I build a function where the input is only the array? (So not (arr, n))
You can do something like this.
The naive approach:
def function(arr) :
    count = 0
    n = len(arr)
    i = 0
    for i in range(n):
        for j in range(i+1,n):
            if arr[i]==arr[j]:
                count+=1
    return count
A more optimized approach you can try:
def function(arr) :
    mp = dict()
    n = len(arr)
    for i in range(n):
        if arr[i] in mp.keys():
            mp[arr[i]] += 1
        else:
            mp[arr[i]] = 1
    ans = 0
    for it in mp:
        count = mp[it]
        ans += (count * (count - 1)) // 2
    return ans
You can use collections.Counter to count the number of occurrences of every integer,
then use math.comb with n=count and k=2 to get the number of such pairs for every integer, and simply sum them:
from collections import Counter
from math import comb
def function(arr):
    return sum(comb(count, 2) for num,count in Counter(arr).items())
print(function([1,2,3,6,3,6,3,2]))
The reason math.comb(count,2) is exactly the number of pairs is that any 2 elements out of the count you choose, regardless of their order, are a single pair: the former one is P and the latter is Q.
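As a sanity check on that reasoning, here is a small sketch that compares the counting formula with a direct enumeration of index pairs. The function names pairs_by_counting and pairs_brute_force are made up for this illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb

def pairs_by_counting(arr):
    # For a value that occurs c times, there are C(c, 2) index pairs (P, Q) with P < Q.
    return sum(comb(c, 2) for c in Counter(arr).values())

def pairs_brute_force(arr):
    # combinations() yields index pairs (p, q) with p < q exactly once each.
    return sum(1 for p, q in combinations(range(len(arr)), 2) if arr[p] == arr[q])

sample = [1, 2, 3, 6, 3, 6, 3, 2]
# Counts: 1 -> 1, 2 -> 2, 3 -> 3, 6 -> 2, so 0 + 1 + 3 + 1 = 5 pairs.
print(pairs_by_counting(sample), pairs_brute_force(sample))  # 5 5
```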
EDIT: Added timeit benchmarks:
Here's a full example you can run to compare the performance of both methods:
from timeit import timeit
from random import randint
from collections import Counter
from math import comb
def with_comb(arr):
    return sum(comb(count, 2) for num,count in Counter(arr).items())
def with_loops(arr):
    mp = dict()
    n = len(arr)
    for i in range(n):
        if arr[i] in mp.keys():
            mp[arr[i]] += 1
        else:
            mp[arr[i]] = 1
    ans = 0
    for it in mp:
        count = mp[it]
        ans += (count * (count - 1)) // 2
    return ans
a = [randint(1,1000) for _ in range(10000)]
time1 = timeit('with_loops(a)', globals=globals(), number=1000)
time2 = timeit('with_comb(a)', globals=globals(), number=1000)
print(time1)
print(time2)
print(time1/time2)
Output (on my laptop):
2.9549962
0.8175686999999998
3.6143705110041524

The indexing of the heap sort algorithm from Geeks for Geeks seems a little off

Just a quick question. I have been looking at the HeapSort algorithm from Geeks for Geeks in Python. When it builds the max heap in the heapSort function, the range is (n, -1, -1). Should it not be (n-1, -1, -1)? Below is the code snippet:
# Python program for implementation of heap Sort
# To heapify subtree rooted at index i.
# n is size of heap
def heapify(arr, n, i):
    largest = i # Initialize largest as root
    l = 2 * i + 1 # left = 2*i + 1
    r = 2 * i + 2 # right = 2*i + 2
    # See if left child of root exists and is
    # greater than root
    if l < n and arr[i] < arr[l]:
        largest = l
    # See if right child of root exists and is
    # greater than root
    if r < n and arr[largest] < arr[r]:
        largest = r
    # Change root, if needed
    if largest != i:
        arr[i],arr[largest] = arr[largest],arr[i] # swap
        # Heapify the root.
        heapify(arr, n, largest)
# The main function to sort an array of given size
def heapSort(arr):
    n = len(arr)
    # Build a maxheap.
    for i in range(n, -1, -1):
        heapify(arr, n, i)
    # One by one extract elements
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i] # swap
        heapify(arr, i, 0)
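On the question of range(n, -1, -1) versus range(n-1, -1, -1): for any index i >= n // 2, the left child index 2*i + 1 is at least n, so both conditions in heapify short-circuit and the call does nothing. The sketch below checks this on one small list; build_max_heap and its start parameter are hypothetical helpers introduced only for the comparison.

```python
def heapify(arr, n, i):
    largest = i
    l = 2 * i + 1
    r = 2 * i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def build_max_heap(arr, start):
    """Heapify every index from start down to 0; only start is varied."""
    n = len(arr)
    for i in range(start, -1, -1):
        heapify(arr, n, i)
    return arr

data = [7, 11, 13, 6, 5, 12]
n = len(data)
from_n = build_max_heap(list(data), n)                     # as in the snippet
from_n_minus_1 = build_max_heap(list(data), n - 1)         # as the question suggests
from_last_parent = build_max_heap(list(data), n // 2 - 1)  # skips all leaf indices
print(from_n == from_n_minus_1 == from_last_parent)  # True
```

So starting at n is harmless but redundant; starting at n // 2 - 1 does the same work with fewer calls.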

Why doesn't the following heapsort function produce an error

I took the following code from GeeksforGeeks to try and understand heap sort
def heapify(arr, n, i):
    largest = i
    l = 2*i + 1
    r = 2*i + 2
    if l < n and arr[i] < arr[l]:
        largest = l
    if r < n and arr[largest] < arr[r]:
        largest = r
    if largest != i:
        arr[i],arr[largest] = arr[largest],arr[i]
        heapify(arr, n, largest)
def heapSort(arr):
    n = len(arr)
    for i in range(n, -1, -1):
        heapify(arr, n, i)
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]
        heapify(arr, i, 0)
arr = [7, 11, 13, 6, 5, 12]
heapSort(arr)
print ("Sorted array is", arr)
On the very first iteration, i = 6, so
n = 6 and l = 13
Then for the following line of code
if l < n and arr[i] < arr[l]
arr[l] points to an index that doesn't exist.
I don't understand why this doesn't raise an error like "index out of range" or something. Even though it's an "if" statement, it is surely still checking the value in arr[l]. As this doesn't exist, shouldn't it "break" and raise an error?
Thanks
The conditions of an if statement are evaluated left to right, and the and operator short-circuits.
if l < n and arr[i] < arr[l]
l < n is evaluated first, and it is False. Since False and anything is False anyway, arr[i] < arr[l] is never evaluated. Hence you never get the IndexError.
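The short-circuit behaviour can be seen directly with the values from the question (i = 6, so l = 13). This is only an illustrative snippet; the raised flag is added here for demonstration.

```python
arr = [7, 11, 13, 6, 5, 12]
n = len(arr)   # 6
i = 6
l = 2 * i + 1  # 13

# l < n is False, so `and` returns it without evaluating arr[i] < arr[l].
print(l < n and arr[i] < arr[l])  # False

# Evaluating the right-hand side on its own does raise an IndexError.
raised = False
try:
    arr[i] < arr[l]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```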

Dynamic Programming: Rod cutting and remembering where cuts are made

So I have this code in Python, and currently it only returns the maximum value for cutting a rod. How can I modify it to also tell me where the cuts were made? It takes a list of prices, where the price at index i is the value of a rod of length i+1, and n, the length of the rod.
The problem: http://www.radford.edu/~nokie/classes/360/dp-rod-cutting.html
def cutRod(price, n):
    val = [0 for x in range(n+1)]
    val[0] = 0
    for i in range(1, n+1):
        max_val = 0
        for j in range(i):
            max_val = max(max_val, price[j] + val[i-j-1])
        val[i] = max_val
    return val[n]
If this is the question: Rod cutting
Assuming the code works fine, you will have to replace the max operation with a condition that checks which of the two values was picked, and push that choice into an array:
def cutRod(price, n):
    val = [0 for x in range(n+1)]
    val[0] = 0
    output = list()
    for i in range(1, n+1):
        max_val = 0
        cur_max_index = -1
        for j in range(i):
            cur_val = price[j] + val[i-j-1]
            if(cur_val>max_val):
                max_val = cur_val #store current max
                cur_max_index = j #and index
        if cur_max_index != -1:
            output.append(cur_max_index) #append in output index list
        val[i] = max_val
    print(output) #print array
    return val[n]
I know this is old, but in case someone else has a look: I was just looking at this problem myself. I think the issue here is that these DP problems can be tricky when handling indices. The previous answer will not print the solution correctly, simply because this line needs to be adjusted:
cur_max_index = j should be cur_max_index = j + 1
The rest:
def cut_rod(prices, length):
    values = [0] * (length + 1)
    cuts = [-1] * (length + 1)
    for i in range(1, length + 1):
        max_val = -1  # reset for every rod length i
        for j in range(i):
            temp = prices[j] + values[i - j - 1]
            if temp > max_val:
                max_val = temp
                cuts[i] = j + 1  # length of the first piece for a rod of length i
        values[i] = max_val
    return values[length], cuts
def print_cuts(cuts, length):
    while length > 0:
        print(cuts[length], end=" ")
        length -= cuts[length]
prices = [1, 5, 8, 9, 10, 17, 17, 20]  # example prices, not from the original post
length = len(prices)
max_value, cuts = cut_rod(prices, length)
print(max_value)
print_cuts(cuts, length)
Well, if you need the actual pieces that result from this process, then you'd probably need recursion.
For example, something like this:
def cutRod(price, n):
    val = [0 for x in range(n + 1)]
    pieces = [[0, 0]]
    val[0] = 0
    for i in range(1, n + 1):
        max_val = 0
        max_pieces = [0, 0]
        for j in range(i):
            curr_val = price[j] + val[i - j - 1]
            if curr_val > max_val:
                max_val = curr_val
                max_pieces = [j + 1, i - j - 1]
        pieces.append(max_pieces)
        val[i] = max_val
    arr = []
    def f(left, right):
        if right == 0:
            arr.append(left)
            return
        f(pieces[left][0], pieces[left][1])
        f(pieces[right][0], pieces[right][1])
    f(pieces[n][0], pieces[n][1])
    return val[n], arr
In this code, the additional pieces array records, for each rod length, the best way to split a rod of that length into two parts.
The function f then walks through pieces recursively and works out the optimal way to divide the whole rod.
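Recursion is one option, but the pieces can also be recovered with a plain loop. The following is a sketch under the assumption that it is enough to store, for each rod length, the length of the first piece in an optimal split; cut_rod_with_pieces, first_piece and the price list are illustrative and not taken from the posts above.

```python
def cut_rod_with_pieces(prices, n):
    """Return (best value, list of piece lengths) for a rod of length n.

    prices[j] is the price of a piece of length j + 1.
    """
    values = [0] * (n + 1)
    first_piece = [0] * (n + 1)  # first_piece[i]: first piece length for rod length i
    for i in range(1, n + 1):
        best = float("-inf")
        for j in range(i):
            candidate = prices[j] + values[i - j - 1]
            if candidate > best:
                best = candidate
                first_piece[i] = j + 1
        values[i] = best
    # Walk back from n, peeling off the recorded first piece each time.
    pieces = []
    remaining = n
    while remaining > 0:
        pieces.append(first_piece[remaining])
        remaining -= first_piece[remaining]
    return values[n], pieces

prices = [1, 5, 8, 9, 10, 17, 17, 20]  # example prices (assumed)
value, pieces = cut_rod_with_pieces(prices, 8)
print(value, pieces)  # 22 [2, 6]
```

A quick consistency check for any reconstruction is that the piece lengths add up to n and their prices add up to the reported value.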

Reducing time complexity of contiguous subarray

I was wondering how I could reduce the time complexity of this algorithm.
It calculates the length of the longest contiguous subarray whose elements sum to at most k.
a = an array of integers
k = the maximum allowed sum
ex: a = [1,2,3], k = 3
possible subarrays = [1],[1,2]
length of the max subarray = 2
import sys
sys.setrecursionlimit(20000)
def maxLength(a, k):
    #a = [1,2,3]
    #k = 4
    current_highest = 0
    no_bigger = len(a)-1
    for i in range(len(a)): #0 in [0,1,2]
        current_sum = a[i]
        sub_total = 1
        for j in range(len(a)):
            if current_sum <= k and ((i+sub_total)<=no_bigger) and (k>=(current_sum + a[i+sub_total])):
                current_sum += a[i+sub_total]
                sub_total += 1
            else:
                break
        if sub_total > current_highest:
            current_highest = sub_total
    return current_highest
You can use the sliding window algorithm for this.
Start at index 0 and accumulate the sum of the subarray as you move forward. When the sum exceeds k, drop elements from the front of the window until the sum is at most k again, then continue extending the window.
The Python code is below:
def max_length(a,k):
    s = 0
    m_len = 0
    i,j=0,0
    l = len(a)
    while i<l:
        if s<=k and m_len<(j-i):
            m_len = j-i
        if s<=k and j<l:
            s+=a[j]
            j+=1
        else:
            s-=a[i]
            i+=1
    return m_len
a = [1,2,3]
k = 3
print(max_length(a,k))
OUTPUT:
2
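The sliding window idea relies on the window sum never decreasing when the window grows, which holds when all elements are non-negative. As a rough sketch under that assumption, the window version can be checked against a quadratic brute force; max_length_window and max_length_brute are names made up for this comparison.

```python
def max_length_window(a, k):
    """Longest contiguous run with sum <= k, assuming non-negative elements."""
    best = 0
    window_sum = 0
    start = 0
    for end, value in enumerate(a):
        window_sum += value
        # Shrink from the left until the window fits again (or is empty).
        while window_sum > k and start <= end:
            window_sum -= a[start]
            start += 1
        best = max(best, end - start + 1)
    return best

def max_length_brute(a, k):
    """Check every contiguous subarray; O(n^2), used only as a reference."""
    best = 0
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            if total <= k:
                best = max(best, j - i + 1)
    return best

print(max_length_window([1, 2, 3], 3), max_length_brute([1, 2, 3], 3))  # 2 2
```

Each index enters and leaves the window at most once, so the window version runs in linear time.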
