How to implement the Hoare partition scheme in Quickselect? - python

I am trying to implement the Hoare partition scheme as part of a Quickselect algorithm, but it gives me a different answer each time I run it.
This is the findKthBest function, which finds the Kth largest number in an array, given the array (data) and the index range to search (low = 0, high = 4 in case of 5 elements):
def findKthBest(k, data, low, high):
    # choose random pivot
    pivotindex = random.randint(low, high)
    # move the pivot to the end
    data[pivotindex], data[high] = data[high], data[pivotindex]
    # partition
    pivotmid = partition(data, low, high, data[high])
    # move the pivot back
    data[pivotmid], data[high] = data[high], data[pivotmid]
    # continue with the relevant part of the list
    if pivotmid == k:
        return data[pivotmid]
    elif k < pivotmid:
        return findKthBest(k, data, low, pivotmid - 1)
    else:
        return findKthBest(k, data, pivotmid + 1, high)
The function partition() gets four variables:
data (a list of, for example, 5 elements),
l (the start position of the relevant part in the list, for example 0)
r (the end position of the relevant part in the list, where also the pivot is placed, for example 4)
pivot (the value of the pivot)
def partition(data, l, r, pivot):
    while True:
        while data[l] < pivot:
            #statistik.nrComparisons += 1
            l = l + 1
        r = r - 1    # skip the pivot
        while r != 0 and data[r] > pivot:
            #statistik.nrComparisons += 1
            r = r - 1
        if r > l:
            data[r], data[l] = data[l], data[r]
        return r
Right now I simply get various results each time and it seems that the recursion doesn't work so well (sometimes it ends with reaching max-recursion error), instead of giving a constant result each time. What am I doing wrong?

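First, there appears to be a mistake in the function partition().
If you compare your code carefully with the one on Wikipedia, you will find the difference: the check for crossed indices has to come before the swap, and the loop has to keep running after a swap. Because findKthBest swaps the pivot back in from data[high], the function should return l, the first position holding a value that is not smaller than the pivot. The function should be:
def partition(data, l, r, pivot):
    while True:
        while data[l] < pivot:
            #statistik.nrComparisons += 1
            l = l + 1
        r = r - 1    # skip the pivot
        while r != 0 and data[r] > pivot:
            #statistik.nrComparisons += 1
            r = r - 1
        if r <= l:
            # the indices have met or crossed: the scan is finished
            return l
        data[r], data[l] = data[l], data[r]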
Second, for example:
You get an array data = [1, 0, 2, 4, 3] with pivotmid=3 after partition.
You want to find the 4th largest value (k=4), which is 1.
The next array passed to findKthBest() will become [1, 0].
Therefore, the next findKthBest() should find the largest value of the array [1, 0]:
def findKthBest(k, data, low, high):
    ......
    # continue with the relevant part of the list
    if pivotmid == k:
        return data[pivotmid]
    elif k < pivotmid:
        #Corrected
        return findKthBest(k-pivotmid, data, low, pivotmid - 1)
    else:
        return findKthBest(k, data, pivotmid + 1, high)
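For comparison, here is a minimal, self-contained sketch of Quickselect built on a Hoare-style partition. It is not the code from the question: the names hoare_partition, quickselect and kth_largest are made up for illustration, and the pivot is drawn from data[low..high-1] (an assumption that keeps the returned split index below high). In this sketch k is an absolute 0-based index into the ascending order, so it is used unchanged while the search range shrinks.

```python
import random


def hoare_partition(data, low, high):
    """Rearrange data[low..high] and return a split index p (low <= p < high)
    such that every item in data[low..p] <= every item in data[p+1..high]."""
    # Assumption of this sketch: never pick data[high] as the pivot,
    # otherwise the returned index could equal high and the caller would stall.
    pivot = data[random.randint(low, high - 1)]
    i, j = low - 1, high + 1
    while True:
        i += 1
        while data[i] < pivot:
            i += 1
        j -= 1
        while data[j] > pivot:
            j -= 1
        if i >= j:
            # The indices met or crossed: stop before swapping.
            return j
        data[i], data[j] = data[j], data[i]


def quickselect(data, k):
    """Return the k-th smallest item (0-based) of data; reorders data in place."""
    if not 0 <= k < len(data):
        raise IndexError("k out of range")
    low, high = 0, len(data) - 1
    while low < high:
        p = hoare_partition(data, low, high)
        if k <= p:
            high = p        # the answer lies in the left part
        else:
            low = p + 1     # the answer lies in the right part
    return data[low]


def kth_largest(data, k):
    """Return the k-th largest item (1-based), e.g. k=1 is the maximum."""
    return quickselect(data, len(data) - k)
```

With the example above, kth_largest([1, 0, 2, 4, 3], 4) gives 1 regardless of which pivots are drawn, because the conversion from "Kth largest" to an index happens once, before the search starts.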

Related

Making the complexity smaller (better)

I have an algorithm that counts the good pairs in a list of numbers. A good pair is a pair of indices i < j with arr[i] < arr[j]. It currently has a complexity of O(n^2), but I want to make it O(n log n) using divide and conquer. How can I go about doing that?
Here's the algorithm:
def goodPairs(nums):
    count = 0
    for i in range(0,len(nums)):
        for j in range(i+1,len(nums)):
            if i < j and nums[i] < nums[j]:
                count += 1
            j += 1
        j += 1
    return count
Here's my attempt at making it but it just returns 0:
def goodPairs(arr):
    count = 0
    if len(arr) > 1:
        # Finding the mid of the array
        mid = len(arr)//2
        # Dividing the array elements
        left_side = arr[:mid]
        # into 2 halves
        right_side = arr[mid:]
        # Sorting the first half
        goodPairs(left_side)
        # Sorting the second half
        goodPairs(right_side)
        for i in left_side:
            for j in right_side:
                if i < j:
                    count += 1
    return count
The previously accepted answer by Fire Assassin doesn't really answer the question, which asks for better complexity. It's still quadratic, and about as fast as a much simpler quadratic solution. Benchmark with 2000 shuffled ints:
387.5 ms original
108.3 ms pythonic
104.6 ms divide_and_conquer_quadratic
4.1 ms divide_and_conquer_nlogn
4.6 ms divide_and_conquer_nlogn_2
Code:
def original(nums):
    count = 0
    for i in range(0,len(nums)):
        for j in range(i+1,len(nums)):
            if i < j and nums[i] < nums[j]:
                count += 1
            j += 1
        j += 1
    return count
def pythonic(nums):
    count = 0
    for i, a in enumerate(nums, 1):
        for b in nums[i:]:
            if a < b:
                count += 1
    return count
def divide_and_conquer_quadratic(arr):
    count = 0
    left_count = 0
    right_count = 0
    if len(arr) > 1:
        mid = len(arr) // 2
        left_side = arr[:mid]
        right_side = arr[mid:]
        left_count = divide_and_conquer_quadratic(left_side)
        right_count = divide_and_conquer_quadratic(right_side)
        for i in left_side:
            for j in right_side:
                if i < j:
                    count += 1
    return count + left_count + right_count
def divide_and_conquer_nlogn(arr):
    mid = len(arr) // 2
    if not mid:
        return 0
    left = arr[:mid]
    right = arr[mid:]
    count = divide_and_conquer_nlogn(left)
    count += divide_and_conquer_nlogn(right)
    i = 0
    for r in right:
        while i < mid and left[i] < r:
            i += 1
        count += i
    arr[:] = left + right
    arr.sort()  # linear, as Timsort takes advantage of the two sorted runs
    return count
def divide_and_conquer_nlogn_2(arr):
    mid = len(arr) // 2
    if not mid:
        return 0
    left = arr[:mid]
    right = arr[mid:]
    count = divide_and_conquer_nlogn_2(left)
    count += divide_and_conquer_nlogn_2(right)
    i = 0
    arr.clear()
    append = arr.append
    for r in right:
        while i < mid and left[i] < r:
            append(left[i])
            i += 1
        append(r)
        count += i
    arr += left[i:]
    return count
from timeit import timeit
from random import shuffle
arr = list(range(2000))
shuffle(arr)
funcs = [
    original,
    pythonic,
    divide_and_conquer_quadratic,
    divide_and_conquer_nlogn,
    divide_and_conquer_nlogn_2,
]
for func in funcs:
    print(func(arr[:]))
for _ in range(3):
    print()
    for func in funcs:
        arr2 = arr[:]
        t = timeit(lambda: func(arr2), number=1)
        print('%5.1f ms ' % (t * 1e3), func.__name__)
One of the most well-known divide-and-conquer algorithms is merge sort. And merge sort is actually a really good foundation for this algorithm.
The idea is that when comparing two numbers from two different 'partitions', you already have a lot of information about the remaining part of these partitions, as they're sorted in every iteration.
Let's take an example!
Consider the following partitions, which have already been sorted individually and whose "good pairs" have already been counted.
Partition x: [1, 3, 6, 9].
Partition y: [4, 5, 7, 8].
It is important to note that the numbers from partition x are located further to the left in the original list than those from partition y. In particular, the index i of every element in x is smaller than the index j of every element in y.
We will start off by comparing 1 and 4. Obviously 1 is smaller than 4. But since 4 is the smallest element in partition y, 1 must also be smaller than the rest of the elements in y. Consequently, we can conclude that there are 4 additional good pairs, since the index of 1 is also smaller than the index of the remaining elements of y.
The exact same thing happens with 3, and we can add 4 new good pairs to the sum.
For 6 we will conclude that there are two new good pairs. The comparison between 6 and 4 did not yield a good pair, and likewise for 6 and 5.
You might now notice how these additional good pairs are counted: if the element from x is less than the current element from y, add the number of elements remaining in y (including the current one) to the sum. Rinse and repeat.
Since merge sort is an O(n log n) algorithm, and the additional work in this algorithm is constant for each merge step, we can conclude that this algorithm is also an O(n log n) algorithm.
I will leave the actual programming as an exercise for you.
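For readers who want to check their own attempt, the idea described above can be sketched roughly as follows. This is only one way to write it: count_good_pairs is a made-up name, and the sketch assumes the function returns both the sorted list and the pair count so that each level can merge two already sorted halves.

```python
def count_good_pairs(nums):
    """Return (sorted copy of nums, number of pairs i < j with nums[i] < nums[j])."""
    if len(nums) <= 1:
        return list(nums), 0
    mid = len(nums) // 2
    left, left_count = count_good_pairs(nums[:mid])
    right, right_count = count_good_pairs(nums[mid:])
    count = left_count + right_count
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            # left[i] is smaller than right[j] and than everything after it
            # in right, so all remaining right elements form good pairs.
            count += len(right) - j
            merged.append(left[i])
            i += 1
        else:
            # Not a good pair (right[j] <= left[i]); move on in right.
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count
```

For the two partitions in the example, the merge step adds 4 + 4 + 2 = 10 cross pairs, matching the walk-through.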
@niklasaa has added an explanation for the merge sort analogy, but your implementation still has an issue.
You are partitioning the array and calculating the result for either half, but:
1. You haven't actually sorted either half. So when you're comparing their elements, your two pointer approach isn't correct.
2. You haven't used their results in the final computation. That's why you're getting an incorrect answer.
For point #1, you should look at merge sort, especially the merge() function. That logic is what will give you the correct pair count without having O(N^2) iteration.
For point #2, store the result for either half first:
# Sorting the first half
leftCount = goodPairs(left_side)
# Sorting the second half
rightCount = goodPairs(right_side)
While returning the final count, add these two results as well.
return count + leftCount + rightCount
As @Abhinav Mathur stated, you have most of the code down; your problem is with these lines:
# Sorting the first half
goodPairs(left_side)
# Sorting the second half
goodPairs(right_side)
You want to store these in variables that should be declared before the if statement. Here's an updated version of your code:
def goodPairs(arr):
    count = 0
    left_count = 0
    right_count = 0
    if len(arr) > 1:
        mid = len(arr) // 2
        left_side = arr[:mid]
        right_side = arr[mid:]
        left_count = goodPairs(left_side)
        right_count = goodPairs(right_side)
        for i in left_side:
            for j in right_side:
                if i < j:
                    count += 1
    return count + left_count + right_count
Recursion can be difficult at times. Look into merge sort and quick sort to get a better idea of how divide-and-conquer algorithms work.

3sum algorithm. I am not getting results for numbers less than the target

How can I get this to print all triplets that have a sum less than or equal to a target? Currently it only returns triplets whose sum is equal to the target. I've tried changing it but can't figure it out.
def triplets(nums):
    # Sort array first
    nums.sort()
    output = []
    # We use -2 because at this point the left and right pointers will be at same index
    # For example [1,2,3,4,5] current index is 4 and left and right pointer will be at 5, so we know we cant have a triplet
    # _ LR
    for i in range(len(nums) - 2):
        # check if current index and index -1 are same if same continue because we need distinct results
        if i > 0 and nums[i] == nums[i - 1]:
            continue
        left = i + 1
        right = len(nums) - 1
        while left < right:
            currentSum = nums[i] + nums[left] + nums[right]
            if currentSum <= 8:
                output.append([nums[i], nums[left], nums[right]])
                # below checks again to make sure index isnt same with adjacent index
                while left < right and nums[left] == nums[left + 1]:
                    left += 1
                while left < right and nums[right] == nums[right - 1]:
                    right -= 1
                # In this case we have to change both pointers since we found a solution
                left += 1
                right -= 1
            elif currentSum > 8:
                left += 1
            else:
                right -= 1
    return output
For example, for the input array [1,2,3,4,5] the result should be (1,2,3), (1,2,4), (1,2,5), (1,3,4), because these have a sum less than or equal to the target of 8.
The main barrier to small changes to your code to solve the new problem is that your original goal of outputting all distinct triplets with sum == target can be solved in O(n^2) time using two loops, as in your algorithm. The size of the output can be of size proportional to n^2, so this is optimal in a certain sense.
The problem of outputting all distinct triplets with sum <= target cannot always be solved in O(n^2) time, since the output can have size proportional to n^3; for example, with an array nums = [1,2,...,n] and target = n^2 + 1, the answer is all possible triples of elements. So your algorithm has to change in a way equivalent to adding a third loop.
One O(n^3) solution is shown below. Being a bit more clever about filtering duplicate elements (like using a hashmap and working with frequencies), this should be improvable to O(max(n^2, H)) where H is the size of your output.
def triplets(nums, target=8):
    nums.sort()
    output = set()
    for i, first in enumerate(nums[:-2]):
        if first * 3 > target:
            break
        # Filter some distinct results
        if i + 3 < len(nums) and first == nums[i + 3]:
            continue
        for j, second in enumerate(nums[i + 1:], i + 1):
            if first + 2 * second > target:
                break
            if j + 2 < len(nums) and second == nums[j + 2]:
                continue
            for k, third in enumerate(nums[j + 1:], j + 1):
                if first + second + third > target:
                    break
                if k + 1 < len(nums) and third == nums[k + 1]:
                    continue
                output.add((first, second, third))
    return list(map(list, output))
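When changing the pruning or the duplicate filtering, it helps to have a slow reference to compare against. The sketch below is not part of the solution above: triplets_bruteforce is a hypothetical name, and it simply enumerates every combination with itertools.combinations and keeps the distinct value triplets whose sum does not exceed the target.

```python
from itertools import combinations


def triplets_bruteforce(nums, target=8):
    """Return the distinct value triplets (a <= b <= c) with a + b + c <= target."""
    # Sorting first makes every combination come out in ascending order,
    # so equal triplets collapse to the same tuple inside the set.
    found = {t for t in combinations(sorted(nums), 3) if sum(t) <= target}
    return sorted(found)


print(triplets_bruteforce([1, 2, 3, 4, 5]))
# [(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4)]
```

This always takes O(n^3) time, so it is only meant for checking results on small inputs.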

Python query in list without for loop

I want to find a pair of numbers in a Python list that adds up to a given sum.
List is sorted
Need to check consecutive combinations
Avoid using for loop
I used a for loop to get the job done and it's working fine. I want to learn other, more optimized ways to get the same result.
Can I get the same result in other ways without using a for loop?
How could I use binary search in this situation?
This is my code:
def query_sum(list, find_sum):
    """
    This function will find sum of two pairs in list
    and return True if sum exist in list
    :param list:
    :param find_sum:
    :return:
    """
    previous = 0
    for number in list:
        sum_value = previous + number
        if sum_value == find_sum:
            print("Yes sum exist with pair {} {}".format(previous, number))
            return True
        previous = number
x = [1, 2, 3, 4, 5]
y = [1, 2, 4, 8, 16]
query_sum(x, 7)
query_sum(y, 3)
This is the result:
Yes sum exist with pair 3 4
Yes sum exist with pair 1 2
You can indeed use binary search if your list is sorted (and you are only looking at sums of successive elements), since the sums will be monotonically increasing as well. In a list of N elements, there are N-1 successive pairs. You can copy and paste any properly implemented binary search algorithm you find online and replace the criteria with the sum of successive elements. For example:
def query_sum(seq, target):
    def bsearch(l, r):
        if r >= l:
            mid = l + (r - l) // 2
            s = sum(seq[mid:mid + 2])
            if s == target:
                return mid
            elif s > target:
                return bsearch(l, mid - 1)
            else:
                return bsearch(mid + 1, r)
        else:
            return -1
    # there are only len(seq) - 1 pairs, so the last valid start index is len(seq) - 2
    i = bsearch(0, len(seq) - 2)
    if i < 0:
        return False
    print("Sum {} exists with pair {} {}".format(target, *seq[i:i + 2]))
    return True
You could use the built-in bisect module, but then you would have to pre-compute the sums. This is a much cheaper method since you only have to compute log2(N) sums.
Also, this solution avoids looping using recursion, but you might be better off writing a loop like while r >= l: around the logic instead of using recursion:
def query_sum(seq, target):
    def bsearch(l, r):
        while r >= l:
            mid = l + (r - l) // 2
            s = sum(seq[mid:mid + 2])
            if s == target:
                return mid
            elif s > target:
                r = mid - 1
            else:
                l = mid + 1
        return -1
    # there are only len(seq) - 1 pairs, so the last valid start index is len(seq) - 2
    i = bsearch(0, len(seq) - 2)
    if i < 0:
        return False
    print("Yes sum exist with pair {} {}".format(*seq[i:i + 2]))
    return True
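If you would still rather use the built-in bisect module, one way to avoid pre-computing the sums is to hand bisect a small read-only view that computes each pair sum on demand. The sketch below assumes a sorted input list, so that the pair sums are non-decreasing; PairSums and query_sum_bisect are made-up names for illustration.

```python
from bisect import bisect_left


class PairSums:
    """Read-only view whose item i is seq[i] + seq[i + 1], computed on demand."""

    def __init__(self, seq):
        self._seq = seq

    def __len__(self):
        # A sequence of N items has N - 1 successive pairs (none if N < 2).
        return max(len(self._seq) - 1, 0)

    def __getitem__(self, i):
        return self._seq[i] + self._seq[i + 1]


def query_sum_bisect(seq, target):
    """Return the successive pair of seq that adds up to target, or None."""
    sums = PairSums(seq)
    i = bisect_left(sums, target)  # only looks at O(log N) pair sums
    if i < len(sums) and sums[i] == target:
        return seq[i], seq[i + 1]
    return None


print(query_sum_bisect([1, 2, 3, 4, 5], 7))   # (3, 4)
print(query_sum_bisect([1, 2, 4, 8, 16], 3))  # (1, 2)
```

bisect_left only needs len() and indexing from its first argument, which is why the view can stay this small.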
# simpler one:
def query_sum(seq, target):
    def search(seq, index, target):
        # stop at len(seq) - 1 so that seq[index:index+2] is always a full pair
        if index < len(seq) - 1:
            if sum(seq[index:index+2]) == target:
                return index
            else:
                return search(seq, index+1, target)
        else:
            return -1
    return search(seq, 0, target)

Python: Quicksort with median of three

I'm trying to change this quicksort code to work with a pivot that takes a "median of three" instead.
def quickSort(L, ascending = True):
    quicksorthelp(L, 0, len(L), ascending)
def quicksorthelp(L, low, high, ascending = True):
    result = 0
    if low < high:
        pivot_location, result = Partition(L, low, high, ascending)
        result += quicksorthelp(L, low, pivot_location, ascending)
        result += quicksorthelp(L, pivot_location + 1, high, ascending)
    return result
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    L[low], L[pidx] = L[pidx], L[low]
    i = low + 1
    for j in range(low+1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
liste1 = list([3.14159, 1./127, 2.718, 1.618, -23., 3.14159])
quickSort(liste1, False)  # descending order
print('sorted:')
print(liste1)
But I'm not really sure how to do that. The median has to be the median of the first, middle and last element of a list. If the list has an even number of elements, middle becomes the last element of the first half.
Here's my median function:
def median_of_three(L, low, high):
    mid = (low+high-1)//2
    a = L[low]
    b = L[mid]
    c = L[high-1]
    if a <= b <= c:
        return b, mid
    if c <= b <= a:
        return b, mid
    if a <= c <= b:
        return c, high-1
    if b <= c <= a:
        return c, high-1
    return a, low
Let us first implement median-of-three for three numbers as an independent function. We can do that by sorting the list of three elements and returning the second element:
def median_of_three(a, b, c):
    return sorted([a, b, c])[1]
Now for a range low .. high (with low included, and high excluded), we should determine what the elements are for which we should construct the median of three:
the first element: L[low],
the last element L[high-1], and
the middle element (in case there are two such, take the first) L[(low+high-1)//2].
So now we only need to patch the partitioning function to:
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot = median_of_three(L[low], L[(low+high-1)//2], L[high-1])
    i = low + 1
    for j in range(low + 1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
EDIT: determining the median of three elements.
The median of three elements is the element that is in the middle of the two other values. So in case a <= b <= c, then b is the median.
So we need to determine in what order the elements are, such that we can determine the element in the middle. Like:
def median_of_three(a, b, c):
    if a <= b and b <= c:
        return b
    if c <= b and b <= a:
        return b
    if a <= c and c <= b:
        return c
    if b <= c and c <= a:
        return c
    return a
So now we have defined the median of three with four if cases.
EDIT2: There is still a problem with this. After partitioning, your original code swaps the element L[i-1] with L[low] (the location of the pivot). But this of course does not work anymore, since the pivot can now be located at any of the three positions. Therefore we need to make median_of_three(..) smarter: it should return not only the pivot element, but the location of that pivot as well:
def median_of_three(L, low, high):
    mid = (low+high-1)//2
    a = L[low]
    b = L[mid]
    c = L[high-1]
    if a <= b <= c:
        return b, mid
    if c <= b <= a:
        return b, mid
    if a <= c <= b:
        return c, high-1
    if b <= c <= a:
        return c, high-1
    return a, low
Now we can solve this problem with:
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    i = low + (low == pidx)
    for j in range(low, high, 1):
        if j == pidx:
            continue
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1 + (i+1 == pidx)
    L[pidx], L[i-1] = L[i-1], L[pidx]
    return i - 1, result
EDIT3: cleaning it up.
Although the above seems to work, it is quite complicated: we need to let i and j "skip" the location of the pivot.
It is probably simpler if we first move the pivot to the front of the sublist (so to the low index):
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    L[low], L[pidx] = L[pidx], L[low]
    i = low + 1
    for j in range(low+1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
In a "median of three" version of quicksort, you do not only want to find the median to use it as the pivot, you also want to place the maximum and the minimum values in their places so some of the pivoting is already done. In other words, you want to sort those three items in those three places. (Some variations do not want them sorted in the usual way, but I'll stick to a simpler-to-understand version for you here.)
You probably don't want to do this in a function, since function calls are fairly expensive in Python and this particular capability is not broadly useful. So you can do some code like this. Let's say the three values you want to sort are in indices i, j, and k, with i < j < k. In practice you probably would use low, low + 1, and high, but you can make those changes as you like.
if L[i] > L[j]:
    L[i], L[j] = L[j], L[i]
if L[i] > L[k]:
    L[i], L[k] = L[k], L[i]
if L[j] > L[k]:
    L[j], L[k] = L[k], L[j]
There are some optimizations that can be done. For example, you probably will want to use the median value in the pivot process, so you can change the code to have stored the final value of L(j) in a simple variable, which reduces array lookups. Note that you cannot do this in less than three comparisons in general--you cannot reduce it to two comparisons, though in some special cases you could do that.
One possible approach is to pick three random positions between left and right and use their median as the pivot position.
import random
def median_of_three(left, right):
    """
    Function to choose pivot point
    :param left: Left index of sub-list
    :param right: right-index of sub-list
    """
    # Pick 3 random positions within the range of the list
    i1 = left + random.randint(0, right - left)
    i2 = left + random.randint(0, right - left)
    i3 = left + random.randint(0, right - left)
    # Return their median (the median of the three positions, not of the values stored there)
    return max(min(i1, i2), min(max(i1, i2), i3))

Heap sort Algorithms issue

I followed the CLRS book for the algorithm.
I'm trying to implement heapsort in Python, but it gives me an error that r falls outside of the index range, and I don't know why.
def Max_Heapify(A,i,size_of_array):
    l = 2*i
    r = l + 1
    if l <= size_of_array and A[l] > A[i]:
        largest = l
    else:
        largest = i
    if r <= size_of_array and A[r] > A[largest]:
        largest = r
    if i != largest:
        A[i], A[largest] = A[largest], A[i]
        Max_Heapify(A,largest,size_of_array)
def Build_Max_Heap(A,size_of_array):
    for i in range((math.floor(size_of_array/2)) - 1 , 0 ,-1):
        Max_Heapify(A,i,size_of_array)
def Heapsort(A,size_of_array):
    Build_Max_Heap(A,size_of_array)
    for i in range(size_of_array - 1 ,0 ,-1):
        A[0],A[i] = A[i],A[0]
        size_of_array = size_of_array - 1
        Max_Heapify(A,0,size_of_array)
In most programming languages, the size of an array is one greater than its last index. For example, the array A = [1, 2, 3] has size 3, but the index of the last element is 2 (A[3] is out of range). You are checking whether r is less than or equal to the array size, so when it is equal, it is already past the last index. The same applies to the check on l. Your check should be:
if r < size_of_array
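Putting this together for a 0-based Python list, a minimal heapsort sketch could look like the following. It goes a little beyond the single check above: it assumes the children of node i sit at 2*i + 1 and 2*i + 2 (CLRS uses 1-based arrays, where they are at 2*i and 2*i + 1), and it lets the build loop run all the way down to index 0. The lower-case function names are my own, not from the question.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap."""
    left = 2 * i + 1           # 0-based children (assumption of this sketch)
    right = left + 1
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


def build_max_heap(a):
    # The last parent is at len(a) // 2 - 1; the root (index 0) must be included.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort(a):
    """Sort the list a in place in ascending order."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        # Move the current maximum behind the shrinking heap a[0:end].
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)


values = [5, 3, 8, 1, 9, 2]
heapsort(values)
print(values)  # [1, 2, 3, 5, 8, 9]
```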
