Python: Quicksort with median of three

I'm trying to change this quicksort code to use a "median of three" pivot instead.
def quickSort(L, ascending = True):
    quicksorthelp(L, 0, len(L), ascending)

def quicksorthelp(L, low, high, ascending = True):
    result = 0
    if low < high:
        pivot_location, result = Partition(L, low, high, ascending)
        result += quicksorthelp(L, low, pivot_location, ascending)
        result += quicksorthelp(L, pivot_location + 1, high, ascending)
    return result
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    L[low], L[pidx] = L[pidx], L[low]
    i = low + 1
    for j in range(low+1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
liste1 = list([3.14159, 1./127, 2.718, 1.618, -23., 3.14159])
quickSort(liste1, False) # descending order
print('sorted:')
print(liste1)
But I'm not really sure how to do that. The median has to be the median of the first, middle and last element of a list. If the list has an even number of elements, middle becomes the last element of the first half.
Here's my median function:
def median_of_three(L, low, high):
    mid = (low+high-1)//2
    a = L[low]
    b = L[mid]
    c = L[high-1]
    if a <= b <= c:
        return b, mid
    if c <= b <= a:
        return b, mid
    if a <= c <= b:
        return c, high-1
    if b <= c <= a:
        return c, high-1
    return a, low

Let us first implement the median-of-three for three numbers as an independent function. We can do that by sorting a list of the three elements and returning the second element, like:
def median_of_three(a, b, c):
    return sorted([a, b, c])[1]
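For example, the order of the arguments does not matter:

median_of_three(3, 1, 2)   # 2
median_of_three(2, 3, 1)   # 2
median_of_three(5, 5, 1)   # 5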
Now for a range low .. high (with low included and high excluded), we need to determine which elements to construct the median of three from:
the first element: L[low],
the last element L[high-1], and
the middle element (in case there are two such, take the first) L[(low+high-1)//2].
So now we only need to patch the partitioning function to:
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot = median_of_three(L[low], L[(low+high-1)//2], L[high-1])
    i = low + 1
    for j in range(low + 1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
EDIT: determining the median of three elements.
The median of three elements is the element that is in the middle of the two other values. So in case a <= b <= c, then b is the median.
So we need to determine the order of the elements, such that we can pick the one in the middle. Like:
def median_of_three(a, b, c):
    if a <= b and b <= c:
        return b
    if c <= b and b <= a:
        return b
    if a <= c and c <= b:
        return c
    if b <= c and c <= a:
        return c
    return a
So now we have defined the median of three with four if cases.
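As a quick sanity check, we can verify this version against the sorted-based one above for every ordering of three sample values:

from itertools import permutations

for a, b, c in permutations([1, 2, 3]):
    # both versions must agree on the median, which is 2 for these values
    assert median_of_three(a, b, c) == sorted([a, b, c])[1] == 2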
EDIT2: There is still a problem with this. After you perform a pivot, you swap the element L[i-1] with L[low] in your original code (the location of the pivot). But this of course does not work anymore, since the pivot can now be located at any of the three positions. Therefore we need to make median_of_three(..) smarter: not only should it return the pivot element, but the location of that pivot as well:
def median_of_three(L, low, high):
    mid = (low+high-1)//2
    a = L[low]
    b = L[mid]
    c = L[high-1]
    if a <= b <= c:
        return b, mid
    if c <= b <= a:
        return b, mid
    if a <= c <= b:
        return c, high-1
    if b <= c <= a:
        return c, high-1
    return a, low
Now we can solve this problem with:
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    i = low + (low == pidx)
    for j in range(low, high, 1):
        if j == pidx:
            continue
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1 + (i+1 == pidx)
    L[pidx], L[i-1] = L[i-1], L[pidx]
    return i - 1, result
EDIT3: cleaning it up.
Although the above seems to work, it is quite complicated: we need to let i and j "skip" the location of the pivot.
It is probably simpler if we first move the pivot to the front of the sublist (so to the low index):
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    L[low], L[pidx] = L[pidx], L[low]
    i = low + 1
    for j in range(low+1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
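Using the quickSort and quicksorthelp drivers from the question, a quick run looks like this:

liste1 = [3.14159, 1./127, 2.718, 1.618, -23., 3.14159]
quickSort(liste1, False)   # descending order
print(liste1)              # largest to smallest: 3.14159, 3.14159, 2.718, 1.618, 1/127, -23.0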

In a "median of three" version of quicksort, you do not only want to find the median to use it as the pivot, you also want to place the maximum and the minimum values in their places so some of the pivoting is already done. In other words, you want to sort those three items in those three places. (Some variations do not want them sorted in the usual way, but I'll stick to a simpler-to-understand version for you here.)
You probably don't want to do this in a function, since function calls are fairly expensive in Python and this particular capability is not broadly useful. So you can use inline code like this. Let's say the three values you want to sort are at indices i, j, and k, with i < j < k. In practice you would probably use low, low + 1, and high, but you can adjust those as you like.
if L[i] > L[j]:
    L[i], L[j] = L[j], L[i]
if L[i] > L[k]:
    L[i], L[k] = L[k], L[i]
if L[j] > L[k]:
    L[j], L[k] = L[k], L[j]
There are some optimizations that can be done. For example, you will probably want to use the median value in the pivot process, so you can change the code to store the final value of L[j] in a simple variable, which reduces array lookups. Note that in general you cannot do this in fewer than three comparisons; you cannot reduce it to two comparisons, though in some special cases you could.
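A minimal sketch of that optimization, assuming L, i, j and k are already in scope as above:

if L[i] > L[j]:
    L[i], L[j] = L[j], L[i]
if L[i] > L[k]:
    L[i], L[k] = L[k], L[i]
if L[j] > L[k]:
    L[j], L[k] = L[k], L[j]
pivot = L[j]   # the median now sits at index j; reuse it while partitioning instead of re-indexing L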

One possible approach is to pick three random positions between the left and right indices and use the median of those positions as the pivot index:
import random

def median_of_three(left, right):
    """
    Function to choose a pivot point.
    :param left: left index of the sub-list
    :param right: right index of the sub-list
    """
    # Pick 3 random indices within the range of the sub-list
    i1 = left + random.randint(0, right - left)
    i2 = left + random.randint(0, right - left)
    i3 = left + random.randint(0, right - left)
    # Return their median
    return max(min(i1, i2), min(max(i1, i2), i3))
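A hedged usage sketch (the partition scheme itself is whichever one you already use; here the chosen index is simply swapped to the front first, as in the Partition above):

data = [9, 4, 7, 1, 8, 3]
low, high = 0, len(data) - 1
pidx = median_of_three(low, high)              # random pivot index in [low, high]
data[low], data[pidx] = data[pidx], data[low]  # move the pivot value to the front before partitioning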

Related

Quicksort algorithm with Binary Search

I'm solving a problem on LeetCode. It must be solved in O(n log n) time. I've used Quicksort and Binary Search, but it fails on test case 17 of 18.
My code:
class Solution(object):
    def sortArray(self, nums):
        self.quickSort(nums, 0, len(nums)-1)  # => [arr, low, high] it's for binary search
        return nums

    def quickSort(self, arr, low, high):
        mid = (low + high) // 2
        arr[mid], arr[high] = arr[high], arr[mid]  # pick the mid as pivot every time
        if low < high:
            pivot = self.partition(arr, low, high)
            self.quickSort(arr, low, pivot-1)
            self.quickSort(arr, pivot+1, high)

    def partition(self, arr, low, high):
        i = low
        pivot = arr[high]
        for n in range(low, high):
            if arr[n] < pivot:
                arr[i], arr[n] = arr[n], arr[i]
                i += 1
        arr[high], arr[i] = arr[i], arr[high]
        return i
Quicksort has a worst-case time complexity of O(𝑛²), and in the version of the algorithm you've chosen this worst case materialises when all values in the input are the same. You could try some other variants of quicksort, or you could first count the frequency of each number and then sort only the unique values:
def sortArray(self, nums):
    d = {}
    for n in nums:
        if n in d:
            d[n] += 1
        else:
            d[n] = 1
    nums = list(d.keys())
    self.quickSort(nums, 0, len(nums)-1)
    return [n for n in nums for _ in range(d[n])]
This change to your sortArray will pass the tests, but its running time is not that great, nor its space usage.
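As an aside, the counting step could also be written with collections.Counter; this is just a sketch, and it ignores the challenge's "without using any built-in functions" wording:

from collections import Counter

def sortArray(self, nums):
    counts = Counter(nums)                      # frequency of each value
    unique = list(counts)                       # the distinct values only
    self.quickSort(unique, 0, len(unique)-1)    # quicksort now sees no duplicates
    return [n for n in unique for _ in range(counts[n])]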
I'd go for an algorithm whose worst-case complexity is O(𝑛log𝑛). As the challenge also wants you to use the smallest space complexity possible, a standard merge sort is off the table (but some variants can do without extra memory). Here I suggest heap sort.
Below is an implementation. I've also avoided the use of len and range so as to take the instruction "without using any built-in functions" literally:
class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        def length():  # As we are not allowed to use len()?
            size = 0
            for _ in nums:
                size += 1
            return size

        def siftdown(i, size):
            val = nums[i]
            while True:
                child = i*2 + 1
                if child + 1 < size and nums[child+1] > nums[child]:
                    child += 1
                if child >= size or val > nums[child]:
                    break
                nums[i] = nums[child]
                i = child
            nums[i] = val

        size = length()
        # create max heap
        i = size // 2
        while i:
            i -= 1
            siftdown(i, size)
        # heap sort
        while size > 1:
            size -= 1
            nums[0], nums[size] = nums[size], nums[0]
            siftdown(0, size)
        return nums
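A quick usage check; the List annotation assumes from typing import List is in scope before the class definition, which LeetCode provides:

print(Solution().sortArray([5, 2, 3, 1, 5, 0]))   # [0, 1, 2, 3, 5, 5]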

Recursive binary search algorithm doesn't stop executing after condition is met, returns a nonetype object

I have made a binary search algorithm, biSearch(A, high, low, key). It takes in a sorted array and a key, and spits out the position of key in the array. High and low are the min and max of the search range.
It almost works, save for one problem:
On the second "iteration" (not sure what the recursive equivalent of that is), a condition is met and the algorithm should stop running and return index. I commented where this happens. Instead, the code continues on to the next condition, even though the preceding condition is true. The correct result, 5, is then overwritten and the new result is a NoneType object.
Within my code, I have commented the problems in caps at the locations where they occur. Help is much appreciated, and I thank you in advance!
"""
Created on Sat Dec 28 18:40:06 2019
"""
def biSearch(A, key, low = False, high = False):
    if low == False:
        low = 0
    if high == False:
        high = len(A)-1
    if high == low:
        return A[low]
    mid = low + int((high - low) / 2)
    # if key == A[mid]: two cases
    if key == A[mid] and high - low == 0:  # case 1: key is in the last pos. SHOULD STOP RUNNING HERE
        index = mid
        return index
    elif key == A[mid] and (high - low) > 0:
        if A[mid] == A[mid + 1] and A[mid] == A[mid - 1]:  # case 2: key isn't last and might be repeated
            i = mid - 1
            while A[i] == A[i+1]:
                i += 1
            index = list(range(mid - 1, i+1))
        elif A[mid] == A[mid + 1]:
            i = mid
            while A[i] == A[i+1]:
                i += 1
            index = list(range(mid, i+1))
        elif A[mid] == A[mid - 1]:
            i = mid - 1
            while A[i] == A[i + 1]:
                i += 1
            index = list(range(mid, i + 1))
    elif key > A[mid] and high - low > 0:  # BUT CODE EXECUTES THIS LINE EVEN THOUGH PRECEDING IS ALREADY MET
        index = biSearch(A, key, mid+1, high)
    elif key < A[mid] and high - low > 0:
        index = biSearch(A, key, low, mid - 1)
        return index
    elif A[mid] != key:  # if key DNE in A
        return -1

#biSearch([1,3,5, 4, 7, 7,7,9], 1, 8, 7)
#x = biSearch([1,3,5, 4, 7,9], 1, 6, 9)
x = biSearch([1,3,5, 4, 7,9], 9)
print(x)
# x = search([1,3,5, 4, 7,9], 9)
This function is not a binary search. Binary search's time complexity should be O(log(n)) and works on pre-sorted lists, but the complexity of this algorithm is at least O(n log(n)) because it sorts its input parameter list for every recursive call. Even without the sorting, there are linear statements like list(range(mid, i +1)) on each call, making the complexity quadratic. You'd be better off with a linear search using list#index.
The function mutates its input parameter, which no search function should do (we want to search, not search and sort).
Efficiencies and mutation aside, the logic is difficult to parse and is overkill in any circumstance. Not all nested conditionals lead to a return, so it's possible to return None by default.
You can use the builtin bisect module:
>>> from bisect import *
>>> bisect_left([1,2,2,2,2,3,4,4,4,4,5], 2)
1
>>> bisect_left([1,2,2,2,2,3,4,4,4,4,5], 4)
6
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 4)
10
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 2)
5
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 15)
11
>>> bisect_right([1,2,5,6], 3)
2
If you have to write this by hand as an exercise, start by looking at bisect_left's source code:
def bisect_left(a, x, lo=0, hi=None):
    """Return the index where to insert item x in list a, assuming a is sorted.

    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x. So if x already appears in the list, a.insert(x) will
    insert just before the leftmost x already there.

    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        # Use __lt__ to match the logic in list.sort() and in heapq
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo
This is easy to implement recursively (if desired) and then test against the builtin:
def bisect_left(a, target, lo=0, hi=None):
    if hi is None: hi = len(a)
    mid = (hi + lo) // 2
    if lo >= hi:
        return mid
    elif a[mid] < target:
        return bisect_left(a, target, mid + 1, hi)
    return bisect_left(a, target, lo, mid)

if __name__ == "__main__":
    from bisect import bisect_left as builtin_bisect_left
    from random import choice, randint
    from sys import exit

    for _ in range(10000):
        a = sorted(randint(0, 100) for _ in range(100))
        if any(bisect_left(a, x) != builtin_bisect_left(a, x) for x in range(-1, 101)):
            print("fail")
            exit(1)
Logically, for any call frame, there are only 3 possibilities:
The lo and hi pointers have crossed, in which case we've either found the element or figured out where it should be if it were in the list; either way, return the midpoint.
The element at the midpoint is less than the target, which guarantees that the target is in the tail half of the search space, if it exists.
The element at the midpoint matches or is greater than the target, which guarantees that the target is in the front half of the search space, if it exists.
Python doesn't overflow integers, so you can use the simplified midpoint test.
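To illustrate that last point, the defensive midpoint formula used in fixed-width-integer languages is equivalent but unnecessary in Python:

lo, hi = 0, 2_000_000_000
mid_simple = (lo + hi) // 2           # cannot overflow: Python ints are arbitrary precision
mid_defensive = lo + (hi - lo) // 2   # the form used to avoid overflow in e.g. C or Java
assert mid_simple == mid_defensive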

How to implement the Hoare partition scheme in Quickselect?

I'm trying to implement the Hoare partition scheme as part of a Quickselect algorithm, but it seems to give me different answers each time.
This is the findKthBest function that finds the Kth largest number in an array, given the array (data) and its index bounds (low = 0, high = 4 in the case of 5 elements):
import random

def findKthBest(k, data, low, high):
    # choose random pivot
    pivotindex = random.randint(low, high)
    # move the pivot to the end
    data[pivotindex], data[high] = data[high], data[pivotindex]
    # partition
    pivotmid = partition(data, low, high, data[high])
    # move the pivot back
    data[pivotmid], data[high] = data[high], data[pivotmid]
    # continue with the relevant part of the list
    if pivotmid == k:
        return data[pivotmid]
    elif k < pivotmid:
        return findKthBest(k, data, low, pivotmid - 1)
    else:
        return findKthBest(k, data, pivotmid + 1, high)
The function partition() gets four variables:
data (a list, of for example 5 elements),
l (the start position of the relevant part in the list, for example 0)
r (the end position of the relevant part in the list, where the pivot is also placed, for example 4)
pivot (the value of the pivot)
def partition(data, l, r, pivot):
    while True:
        while data[l] < pivot:
            #statistik.nrComparisons += 1
            l = l + 1
        r = r - 1  # skip the pivot
        while r != 0 and data[r] > pivot:
            #statistik.nrComparisons += 1
            r = r - 1
        if r > l:
            data[r], data[l] = data[l], data[r]
        return r
Right now I simply get various results each time and it seems that the recursion doesn't work so well (sometimes it ends with reaching max-recursion error), instead of giving a constant result each time. What am I doing wrong?
First, there appears to be a mistake in the function partition().
If you compare your code carefully with the one in the Wikipedia article, you will find the difference. The function should be:
def partition(data, l, r, pivot):
    while True:
        while data[l] < pivot:
            #statistik.nrComparisons += 1
            l = l + 1
        r = r - 1  # skip the pivot
        while r != 0 and data[r] > pivot:
            #statistik.nrComparisons += 1
            r = r - 1
        if r >= l:
            return r
        data[r], data[l] = data[l], data[r]
Second, for example:
You get an array data = [1, 0, 2, 4, 3] with pivotmid=3 after partition
You want to find the 4th largest value (k=4), which is 1
The sub-list that the next findKthBest() call works on will then be [1, 0].
Therefore, the next findKthBest() should find the largest value of the array [1, 0] :
def findKthBest(k, data, low, high):
    ......
    # continue with the relevant part of the list
    if pivotmid == k:
        return data[pivotmid]
    elif k < pivotmid:
        # Corrected
        return findKthBest(k - pivotmid, data, low, pivotmid - 1)
    else:
        return findKthBest(k, data, pivotmid + 1, high)

What is the runtime for this particular algorithm?

I think this particular code is O((log n)²) because each findindex call has log n depth and we call it log n times. Can someone confirm this?
I hope one of you can think of this as a small quiz and help me with it.
Given a sorted array of n integers that has been rotated an unknown
number of times, write code to find an element in the array. You may
assume that the array was originally sorted in increasing order.
# Ex
# input: find 5 in {15,16,19,20,25,1,3,4,5,7,10,14}
# output: 8
# runtime: O(log n)
def findrotation(a, tgt):
    return findindex(a, 0, len(a)-1, tgt, 0)

def findindex(a, low, high, target, index):
    if low > high:
        return -1
    mid = int((high + low) / 2)
    if a[mid] == target:
        index = index + mid
        return index
    else:
        b = a[low:mid]
        result = findindex(b, 0, len(b)-1, target, index)
        if result == -1:
            index = index + mid + 1
            c = a[mid+1:]
            return findindex(c, 0, len(c)-1, target, index)
        else:
            return result
This algorithm is supposed to be O(log n), but this implementation is not.
In your algorithm you are not deciding whether to descend into the left subarray or the right subarray only; you try both subarrays, which is O(n).
You are also slicing the array with a[low:mid] and a[mid+1:], which is O(n).
That makes your overall complexity O(n²) in the worst case.
Assuming there are no duplicates in the array, an ideal O(log n) binary search implementation in Python 3 looks like this:
A = [15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14]
low = 0
hi = len(A) - 1

def findindex(A, low, hi, target):
    if low > hi:
        return -1
    mid = round((hi + low) / 2.0)
    if A[mid] == target:
        return mid
    if A[mid] >= A[low]:
        if target < A[mid] and target >= A[low]:
            return findindex(A, low, mid - 1, target)
        else:
            return findindex(A, mid + 1, hi, target)
    if A[mid] < A[low]:
        if target < A[mid] or target >= A[low]:
            return findindex(A, low, mid - 1, target)
        else:
            return findindex(A, mid + 1, hi, target)
    return -1

print(findindex(A, low, hi, 3))

Binary Search Algorithm with interval

I am trying to change my code so that instead of finding a specific value in the array, it will output the values that fall within an interval, for example 60-70. Any help is appreciated.
def binary(array, value):
    while len(array) != 0:
        mid = len(array) // 2
        if value == array[mid]:
            return value
        elif value > array[mid]:
            array = array[mid+1:]
        elif value < array[mid]:
            array = array[0:mid]

sequence = [1,2,5,9,13,42,69,123,256]
print("found", binary(sequence, 70))
I have this so far and want it to find a specified interval, so if I specify 60-70 it will find what is in between.
Actually this is pretty simple:
To find the elements in the interval (lower, upper), perform a binary search on the array arr for the index n of the smallest element such that arr[n] >= lower, and for the index m of the largest element such that arr[m] <= upper.
Now there are several possibilities:
n < m: there exist multiple solutions in the array. All of them are in the subarray starting at index n up to index m inclusively
n = m: there exists precisely one solution: arr[n]
n > m: no solutions exist
Searching for values beyond a certain threshold can be done using binary search like this:
import math

def lowestGreaterThan(arr, threshold):
    low = 0
    high = len(arr)
    while low < high:
        mid = math.floor((low + high) / 2)
        print("low = ", low, " mid = ", mid, " high = ", high)
        if arr[mid] == threshold:
            return mid
        elif arr[mid] < threshold and mid != low:
            low = mid
        elif arr[mid] > threshold and mid != high:
            high = mid
        else:
            # terminate with index pointing to the first element greater than low
            high = low = low + 1
    return low
Sorry about the look of the code; my Python is far from perfect. Anyway, this ought to show the basic idea behind the approach. The algorithm basically searches for the index ind of the first element in the array with the property arr[ind] >= threshold.
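For comparison, here is a sketch of the same interval query written with the standard bisect module (the function name interval_search is my own):

from bisect import bisect_left, bisect_right

def interval_search(arr, lower, upper):
    """Return the elements of the sorted list arr that lie in [lower, upper]."""
    n = bisect_left(arr, lower)        # index of the smallest element >= lower
    m = bisect_right(arr, upper) - 1   # index of the largest element <= upper
    if n > m:
        return []                      # no elements fall inside the interval
    return arr[n:m + 1]

sequence = [1, 2, 5, 9, 13, 42, 69, 123, 256]
print(interval_search(sequence, 60, 70))   # [69]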
