Quicksort algorithm with Binary Search - python

I'm solving a problem on LeetCode that must be solved in O(n log n) time. I used quicksort and binary search, but my solution fails on the 17th of 18 test cases.
My code:
class Solution(object):
    def sortArray(self, nums):
        self.quickSort(nums, 0, len(nums)-1) #=> [arr, low, high] it's for binary search
        return nums
    def quickSort(self, arr, low, high):
        mid = (low + high) // 2
        arr[mid], arr[high] = arr[high], arr[mid] # pick the mid as pivot every time
        if low < high:
            pivot = self.partition(arr, low, high)
            self.quickSort(arr, low, pivot-1)
            self.quickSort(arr, pivot+1, high)
    def partition(self, arr, low, high):
        i = low
        pivot = arr[high]
        for n in range(low, high):
            if arr[n] < pivot:
                arr[i], arr[n] = arr[n], arr[i]
                i += 1
        arr[high], arr[i] = arr[i], arr[high]
        return i

Quicksort has a worst time complexity of O(𝑛²) and in the algorithm version you've chosen this worst case materialises when all values in the input are the same. You could try some other variants of quicksort, or you could first count the frequency of each number and then only sort the unique values:
    def sortArray(self, nums):
        d = {}
        for n in nums:
            if n in d:
                d[n] += 1
            else:
                d[n] = 1
        nums = list(d.keys())
        self.quickSort(nums, 0, len(nums)-1)
        return [n for n in nums for _ in range(d[n])]
This change to your sortArray will pass the tests, but its running time is not that great, nor its space usage.
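One of the "other variants of quicksort" mentioned above is three-way partitioning, which groups all copies of the pivot together so that an input of identical values is finished in a single pass. The following is a minimal sketch, not the code from the question: the name quicksort_3way is hypothetical, and the random pivot is a substitution for the question's middle-element pivot.

```python
import random

def quicksort_3way(arr, low=0, high=None):
    """Sort arr[low..high] in place using three-way partitioning."""
    if high is None:
        high = len(arr) - 1
    while low < high:
        # Assumption: a random pivot replaces the question's mid pivot.
        pivot = arr[random.randint(low, high)]
        lt, i, gt = low, low, high
        while i <= gt:
            if arr[i] < pivot:
                arr[lt], arr[i] = arr[i], arr[lt]
                lt += 1
                i += 1
            elif arr[i] > pivot:
                arr[i], arr[gt] = arr[gt], arr[i]
                gt -= 1
            else:
                i += 1
        # arr[lt..gt] now holds every copy of the pivot.
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if lt - low < high - gt:
            quicksort_3way(arr, low, lt - 1)
            low = gt + 1
        else:
            quicksort_3way(arr, gt + 1, high)
            high = lt - 1
    return arr

print(quicksort_3way([3, 3, 1, 2, 3]))  # [1, 2, 3, 3, 3]
```

With all values equal, the inner loop only advances i, so both sides are empty afterwards and the call returns after one linear pass.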
I'd go for an algorithm whose worst-case complexity is O(𝑛log𝑛). As the challenge also asks for the smallest space complexity possible, a standard merge sort is off the table (though some variants work without extra memory). I suggest heap sort.
Below is an implementation. I've also avoided len and range, to take the instruction "without using any built-in functions" literally:
from typing import List

class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        def length(): # As we are not allowed to use len()?
            size = 0
            for _ in nums:
                size += 1
            return size
        def siftdown(i, size):
            # Move nums[i] down until both children are no larger than it
            val = nums[i]
            while True:
                child = i*2 + 1
                if child + 1 < size and nums[child+1] > nums[child]:
                    child += 1
                if child >= size or val > nums[child]:
                    break
                nums[i] = nums[child]
                i = child
            nums[i] = val
        size = length()
        # create max heap
        i = size // 2
        while i:
            i -= 1
            siftdown(i, size)
        # heap sort
        while size > 1:
            size -= 1
            nums[0], nums[size] = nums[size], nums[0]
            siftdown(0, size)
        return nums

Related

Optimizing the closest 3sum solution to avoid time limit exceeded error

I was going through this closest 3-sum leetcode problem which says:
Given an integer array nums of length n and an integer target, find three integers in nums such that the sum is closest to target.
Return the sum of the three integers.
You may assume that each input would have exactly one solution.
I have created the following solution. It appears correct, but it fails with a Time Limit Exceeded error. How can I optimize this code? I have already added one optimization, but I'm not sure how to improve it further.
class Solution:
    def threeSumClosest(self, nums: List[int], target: int) -> int:
        nums.sort()
        csum = None
        min_diff = float("+inf")
        for i in range(0,len(nums)-2):
            l = i + 1
            r = len(nums)-1
            if i > 0 and nums[i] == nums[i-1]:
                continue # OPTIMIZATION TO AVOID SAME CALCULATION
            while l < r:
                sum = nums[i] + nums[l] + nums[r]
                diff = abs(target-sum)
                if sum == target:
                    csum = target
                    min_diff = 0
                    break
                elif sum > target:
                    r -= 1
                else:
                    l += 1
                if min_diff > diff:
                    min_diff = diff
                    csum = sum
        return nums[0] if csum is None else csum
Maybe this reference approach can help. Try it first and see if you have any questions. Note: this is from a recent post and performs really well, beating 90% of submissions in the Python category.
from typing import List

class Solution:
    def threeSumClosest(self, nums: List[int], target: int) -> int:
        nums.sort()
        return self.kSumClosest(nums, 3, target)
    def kSumClosest(self, nums: List[int], k: int, target: int) -> int:
        N = len(nums)
        if N == k: return sum(nums[:k]) # found it
        # the smallest possible sum is already >= target
        tot = sum(nums[:k])
        if tot >= target: return tot
        # the largest possible sum is still <= target
        tot = sum(nums[-k:])
        if tot <= target: return tot
        if k == 1:
            return min([(x, abs(target - x)) for x in nums], key = lambda x: x[1])[0]
        closest = sum(nums[:k])
        for i, x in enumerate(nums[:-k+1]):
            if i > 0 and x == nums[i-1]:
                continue
            current = self.kSumClosest(nums[i+1:], k-1, target - x) + x
            if abs(target - current) < abs(target - closest):
                if current == target:
                    return target
                else:
                    closest = current
        return closest
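For comparison, the same pruning idea (stop early when the smallest or largest reachable sum already lies on one side of target) can be folded into an iterative two-pointer loop that avoids list slicing. This is a minimal sketch under stated assumptions: three_sum_closest is a hypothetical standalone function, not code from either post, and it assumes the input has at least three elements.

```python
from typing import List

def three_sum_closest(nums: List[int], target: int) -> int:
    nums = sorted(nums)  # work on a sorted copy
    n = len(nums)
    best = nums[0] + nums[1] + nums[2]
    for i in range(n - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue  # same first element was already handled
        # Smallest sum reachable with this i.
        lo_sum = nums[i] + nums[i + 1] + nums[i + 2]
        if lo_sum >= target:
            if abs(lo_sum - target) < abs(best - target):
                best = lo_sum
            break  # every later i can only give larger sums
        # Largest sum reachable with this i.
        hi_sum = nums[i] + nums[n - 2] + nums[n - 1]
        if hi_sum <= target:
            if abs(hi_sum - target) < abs(best - target):
                best = hi_sum
            continue  # no pair for this i gets closer than hi_sum
        l, r = i + 1, n - 1
        while l < r:
            s = nums[i] + nums[l] + nums[r]
            if s == target:
                return s
            if abs(s - target) < abs(best - target):
                best = s
            if s < target:
                l += 1
            else:
                r -= 1
    return best

print(three_sum_closest([-1, 2, 1, -4], 1))  # 2
```

The two bound checks cost O(1) per value of i, so the worst case stays O(n²) while many iterations skip the inner loop entirely.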

Why am I getting Time Limit Exceeded error on O(n) time complexity code?

The question, https://leetcode.com/problems/first-missing-positive/, asks:
Given an unsorted integer array nums, return the smallest missing positive integer.
You must implement an algorithm that runs in O(n) time and uses constant extra space.
Example 1:
Input: nums = [1,2,0]
Output: 3
Example 2:
Input: nums = [3,4,-1,1]
Output: 2
Example 3:
Input: nums = [7,8,9,11,12]
Output: 1
Constraints:
1 <= nums.length <= 5 * 10**5
-2**31 <= nums[i] <= 2**31 - 1
Thus my code satisfies this:
class Solution:
    def firstMissingPositive(self, nums: List[int]) -> int:
        nums=sorted(list(filter(lambda x: x>=0, nums)))
        nums= list(dict.fromkeys(nums))
        if 1 not in nums: return 1
        x=nums[0]
        for num in nums:
            if nums.index(num) != 0:
                dif = num - x
                if dif!=1:
                    return x + 1
                x=num
        return num+1
Glad for anyone to offer help.
As the comments described, sorted() doesn't take linear time. sorted() also creates a new list, so your solution also violates the O(1) memory constraint.
Here's a linear-time, constant-space solution. The problem asks for two things (for simplicity, let n = len(nums)):
A data structure that can determine, in O(1) time, whether a positive integer in the interval [1, n] is in nums. (We have n numbers to check, and the runtime of our algorithm has to be linear.) For this problem, our strategy is to build a table such that for every integer i between 1 and n, if i is in nums, then nums[i - 1] = i. (The answer has to be positive, and it can't be greater than n + 1: the only way for the answer to be n + 1 is if nums contains every integer in the interval [1, n].)
A procedure to generate the data structure in place, to meet the memory constraint.
Here's a solution that does this.
from typing import List

class Solution:
    def firstMissingPositive(self, nums: List[int]) -> int:
        # Match elements to their indices.
        for index, num in enumerate(nums):
            num_to_place = num
            while num_to_place > 0 and num_to_place <= len(nums) and num_to_place != nums[num_to_place - 1]:
                next_num_to_place = nums[num_to_place - 1]
                nums[num_to_place - 1] = num_to_place
                num_to_place = next_num_to_place
        # Find smallest number that doesn't exist in the array.
        for i in range(len(nums)):
            if nums[i] != i + 1:
                return i + 1
        return len(nums) + 1
Both for loops take linear time. The reasoning for the second is obvious, but the time analysis of the first is a bit more subtle:
Notice that the while loop contains this condition: num_to_place != nums[num_to_place - 1]. For each iteration of this while loop, the number of values that meet this condition decreases by 1. So, this while loop can only execute at most n times across all iterations, meaning the first for loop takes O(n) time.
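The bound in this argument can be checked empirically. The sketch below is the same placement loop with a counter added; the name first_missing_positive_counted and the copy of the input are assumptions made for illustration (the copy costs O(n) extra space, so this is a demonstration rather than a submission).

```python
from typing import List, Tuple

def first_missing_positive_counted(nums: List[int]) -> Tuple[int, int]:
    """Return (answer, number of while-loop iterations performed)."""
    nums = list(nums)  # demo-only copy so the caller's list is untouched
    n = len(nums)
    placements = 0
    for num in nums:
        num_to_place = num
        while 0 < num_to_place <= n and num_to_place != nums[num_to_place - 1]:
            next_num = nums[num_to_place - 1]
            nums[num_to_place - 1] = num_to_place  # this slot is now correct
            num_to_place = next_num
            placements += 1
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1, placements
    return n + 1, placements

answer, placements = first_missing_positive_counted([3, 4, -1, 1])
print(answer, placements <= 4)  # 2 True
```

Each iteration of the while loop fixes one slot permanently, which is why the counter can never exceed the length of the list.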
# O(n) time and O(1) space
from typing import List

class Solution:
    def firstMissingPositive(self, nums: List[int]) -> int:
        index = 0
        while index < len(nums):
            # Swap the value into its home slot if it has one and is not already there
            if nums[index] > 0 and nums[index] - 1 < len(nums) and nums[index] != nums[nums[index] - 1]:
                nums[nums[index]-1], nums[index] = nums[index], nums[nums[index] - 1]
            else:
                index += 1
        for i, integer in enumerate(nums):
            if integer != i + 1:
                return i + 1
        return len(nums) + 1

Recursive binary search algorithm doesn't stop executing after condition is met, returns a nonetype object

I have made a binary search algorithm, biSearch(A, high, low, key). It takes in a sorted array and a key, and spits out the position of key in the array. High and low are the min and max of the search range.
It almost works, save for one problem:
On the second "iteration" (not sure what the recursive equivalent of that is), a condition is met and the algorithm should stop running and return "index". I commented where this happens. Instead, what ends up happening is that the code continues on to the next condition, even though the preceding condition is true. The correct result, 5, is then overridden and the new result is a nonetype object.
Within my code, I have commented in caps the problems at the locations where they occur. Help is much appreciated, and I thank you in advance!
"""
Created on Sat Dec 28 18:40:06 2019
"""
def biSearch(A, key, low = False, high = False):
if low == False:
low = 0
if high == False:
high = len(A)-1
if high == low:
return A[low]
mid = low + int((high -low)/ 2)
# if key == A[mid] : two cases
if key == A[mid] and high - low == 0: #case 1: key is in the last pos. SHOULD STOP RUNNING HERE
index = mid
return index
elif key == A[mid] and (high - low) > 0:
if A[mid] == A[mid + 1] and A[mid]==A[mid -1]: #case 2: key isnt last and might be repeated
i = mid -1
while A[i] == A[i+1]:
i +=1
index = list(range(mid- 1, i+1))
elif A[mid] == A[mid + 1]:
i = mid
while A[i]== A[i+1]:
i += 1
index = list(range(mid, i+1))
elif A[mid] == A[mid -1]:
i = mid -1
while A[i] == A[i +1]:
i += 1
index = list(range(mid, i +1))
elif key > A[mid] and high - low > 0: # BUT CODE EXECTUES THIS LINE EVEN THOUGH PRECEDING IS ALREADY MET
index = biSearch(A, key, mid+1, high)
elif key < A[mid] and high - low > 0:
index = biSearch(A, key, low, mid -1)
return index
elif A[mid] != key: # if key DNE in A
return -1
#biSearch([1,3,5, 4, 7, 7,7,9], 1, 8, 7)
#x = biSearch([1,3,5, 4, 7,9], 1, 6, 9)
x = biSearch([1,3,5, 4, 7,9],9)
print(x)
# x = search([1,3,5, 4, 7,9], 9)
This function is not a binary search. Binary search's time complexity should be O(log(n)) and it works on pre-sorted lists, but the complexity of this algorithm is at least O(n log(n)) because it sorts its input parameter list for every recursive call. Even without the sorting, there are linear statements like list(range(mid, i +1)) on each call, making the complexity quadratic. You'd be better off with a linear search using list.index.
The function mutates its input parameter, which no search function should do (we want to search, not search and sort).
Efficiencies and mutation aside, the logic is difficult to parse and is overkill in any circumstance. Not all nested conditionals lead to a return, so it's possible to return None by default.
You can use the standard-library bisect module:
>>> from bisect import *
>>> bisect_left([1,2,2,2,2,3,4,4,4,4,5], 2)
1
>>> bisect_left([1,2,2,2,2,3,4,4,4,4,5], 4)
6
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 4)
10
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 2)
5
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 15)
11
>>> bisect_right([1,2,5,6], 3)
2
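To get the behaviour the question asks for (the position of key, or -1 when it is absent), bisect_left can be wrapped in a membership check. This is a minimal sketch; index_of is a hypothetical helper name, and the input is assumed to be sorted already (the list [1,3,5,4,7,9] used in the question is not).

```python
from bisect import bisect_left

def index_of(a, key):
    """Return the leftmost index of key in sorted list a, or -1 if absent."""
    i = bisect_left(a, key)
    if i < len(a) and a[i] == key:
        return i
    return -1

print(index_of([1, 3, 4, 5, 7, 9], 9))  # 5
print(index_of([1, 3, 4, 5, 7, 9], 6))  # -1
```

The bounds check comes first because bisect_left returns len(a) when key is larger than every element.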
If you have to write this by hand as an exercise, start by looking at bisect_left's source code:
def bisect_left(a, x, lo=0, hi=None):
    """Return the index where to insert item x in list a, assuming a is sorted.
    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x. So if x already appears in the list, a.insert(i, x) will
    insert just before the leftmost x already there.
    Optional args lo (default 0) and hi (default len(a)) bound the
    slice of a to be searched.
    """
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        # Use __lt__ to match the logic in list.sort() and in heapq
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo
This is easy to implement recursively (if desired) and then test against the builtin:
def bisect_left(a, target, lo=0, hi=None):
    if hi is None: hi = len(a)
    mid = (hi + lo) // 2
    if lo >= hi:
        return mid
    elif a[mid] < target:
        return bisect_left(a, target, mid + 1, hi)
    return bisect_left(a, target, lo, mid)

if __name__ == "__main__":
    from bisect import bisect_left as builtin_bisect_left
    from random import randint
    from sys import exit
    for _ in range(10000):
        a = sorted(randint(0, 100) for _ in range(100))
        if any(bisect_left(a, x) != builtin_bisect_left(a, x) for x in range(-1, 101)):
            print("fail")
            exit(1)
Logically, for any call frame, there are only three possibilities:
The lo and hi pointers have met, in which case we've either found the element or figured out where it should be if it were in the list; either way, return the midpoint.
The element at the midpoint is less than the target, which guarantees that the target is in the tail half of the search space, if it exists.
The element at the midpoint matches or is greater than the target, which guarantees that the target is in the front half of the search space (midpoint included), if it exists.
Python doesn't overflow integers, so you can use the simplified midpoint calculation (hi + lo) // 2.

Python: Quicksort with median of three

I'm trying to change this quicksort code to work with a pivot that takes a "median of three" instead.
def quickSort(L, ascending = True):
    quicksorthelp(L, 0, len(L), ascending)

def quicksorthelp(L, low, high, ascending = True):
    result = 0
    if low < high:
        pivot_location, result = Partition(L, low, high, ascending)
        result += quicksorthelp(L, low, pivot_location, ascending)
        result += quicksorthelp(L, pivot_location + 1, high, ascending)
    return result

def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    L[low], L[pidx] = L[pidx], L[low]
    i = low + 1
    for j in range(low+1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result

liste1 = list([3.14159, 1./127, 2.718, 1.618, -23., 3.14159])
quickSort(liste1, False) # descending order
print('sorted:')
print(liste1)
But I'm not really sure how to do that. The median has to be the median of the first, middle and last element of a list. If the list has an even number of elements, middle becomes the last element of the first half.
Here's my median function:
def median_of_three(L, low, high):
    mid = (low+high-1)//2
    a = L[low]
    b = L[mid]
    c = L[high-1]
    if a <= b <= c:
        return b, mid
    if c <= b <= a:
        return b, mid
    if a <= c <= b:
        return c, high-1
    if b <= c <= a:
        return c, high-1
    return a, low
Let us first implement the median-of-three for three numbers, so an independent function. We can do that by sorting the list of three elements, and then return the second element, like:
def median_of_three(a, b, c):
    return sorted([a, b, c])[1]
Now for a range low .. high (with low included, and high excluded), we should determine what the elements are for which we should construct the median of three:
the first element: L[low],
the last element L[high-1], and
the middle element (in case there are two such, take the first) L[(low+high-1)//2].
So now we only need to patch the partitioning function to:
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot = median_of_three(L[low], L[(low+high-1)//2], L[high-1])
    i = low + 1
    for j in range(low + 1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
EDIT: determining the median of three elements.
The median of three elements is the element that is in the middle of the two other values. So in case a <= b <= c, then b is the median.
So we need to determine in what order the elements are, such that we can determine the element in the middle. Like:
def median_of_three(a, b, c):
    if a <= b and b <= c:
        return b
    if c <= b and b <= a:
        return b
    if a <= c and c <= b:
        return c
    if b <= c and c <= a:
        return c
    return a
So now we have defined the median of three with four if cases.
EDIT2: There is still a problem with this. After partitioning, your original code swaps the element L[i-1] with L[low] (the location of the pivot). But this of course does not work anymore, since the pivot can now be located at any of the three positions. Therefore we need to make median_of_three(..) smarter: not only should it return the pivot element, but the location of that pivot as well:
def median_of_three(L, low, high):
    mid = (low+high-1)//2
    a = L[low]
    b = L[mid]
    c = L[high-1]
    if a <= b <= c:
        return b, mid
    if c <= b <= a:
        return b, mid
    if a <= c <= b:
        return c, high-1
    if b <= c <= a:
        return c, high-1
    return a, low
Now we can solve this problem with:
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    i = low + (low == pidx)
    for j in range(low, high, 1):
        if j == pidx:
            continue
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1 + (i+1 == pidx)
    L[pidx], L[i-1] = L[i-1], L[pidx]
    return i - 1, result
EDIT3: cleaning it up.
Although the above seems to work, it is quite complicated: we need to let i and j "skip" the location of the pivot.
It is probably simpler if we first move the pivot to the front of the sublist (so to the low index):
def Partition(L, low, high, ascending = True):
    print('Quicksort, Parameter L:')
    print(L)
    result = 0
    pivot, pidx = median_of_three(L, low, high)
    L[low], L[pidx] = L[pidx], L[low]
    i = low + 1
    for j in range(low+1, high, 1):
        result += 1
        if (ascending and L[j] < pivot) or (not ascending and L[j] > pivot):
            L[i], L[j] = L[j], L[i]
            i += 1
    L[low], L[i-1] = L[i-1], L[low]
    return i - 1, result
In a "median of three" version of quicksort, you do not only want to find the median to use it as the pivot, you also want to place the maximum and the minimum values in their places so some of the pivoting is already done. In other words, you want to sort those three items in those three places. (Some variations do not want them sorted in the usual way, but I'll stick to a simpler-to-understand version for you here.)
You probably don't want to do this in a function, since function calls are fairly expensive in Python and this particular capability is not broadly useful. So you can do some code like this. Let's say the three values you want to sort are in indices i, j, and k, with i < j < k. In practice you probably would use low, low + 1, and high, but you can make those changes as you like.
if L[i] > L[j]:
    L[i], L[j] = L[j], L[i]
if L[i] > L[k]:
    L[i], L[k] = L[k], L[i]
if L[j] > L[k]:
    L[j], L[k] = L[k], L[j]
There are some optimizations that can be done. For example, you will probably want to use the median value in the pivot process, so you can change the code to store the final value of L[j] in a simple variable, which reduces array lookups. Note that you cannot do this in fewer than three comparisons in general: you cannot reduce it to two comparisons, though in some special cases you could.
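The three comparisons above can be exercised on their own. The sketch below wraps them in a function purely so they can be tested (the answer itself suggests inlining them for speed); sort_three_in_place is a hypothetical name, and returning the median is one way to keep it in a simple variable.

```python
def sort_three_in_place(L, i, j, k):
    """Order L[i] <= L[j] <= L[k] using three comparisons; return the median."""
    if L[i] > L[j]:
        L[i], L[j] = L[j], L[i]
    if L[i] > L[k]:
        L[i], L[k] = L[k], L[i]
    if L[j] > L[k]:
        L[j], L[k] = L[k], L[j]
    return L[j]  # median, ready to be used as the pivot

data = [9, 4, 7]
pivot = sort_three_in_place(data, 0, 1, 2)
print(data, pivot)  # [4, 7, 9] 7
```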
One possible way is to pick three random positions between left and right and use their median:
import random

def median_of_three(left, right):
    """
    Function to choose pivot point
    :param left: Left index of sub-list
    :param right: right-index of sub-list
    """
    # Pick 3 random indices within the range of the sub-list
    i1 = left + random.randint(0, right - left)
    i2 = left + random.randint(0, right - left)
    i3 = left + random.randint(0, right - left)
    # Return their median (the median of the three indices, not of the values stored there)
    return max(min(i1, i2), min(max(i1, i2), i3))

What is the runtime for this particular algorithm?

I am thinking this particular code is (log n)^2 because each findindex function takes logn depth and we are calling it logn times? Can someone confirm this?
I hope one of you can think of this as a small quiz and help me with it.
Given a sorted array of n integers that has been rotated an unknown
number of times, write code to find an element in the array. You may
assume that the array was originally sorted in increasing order.
# Ex
# input find 5 in {15,16,19,20,25,1,3,4,5,7,10,14}
# output 8
# runtime(log n)
def findrotation(a, tgt):
    return findindex(a, 0, len(a)-1, tgt, 0)

def findindex(a, low, high, target, index):
    if low>high:
        return -1
    mid = int((high + low) / 2)
    if a[mid] == target:
        index = index + mid
        return index
    else:
        b = a[low:mid]
        result = findindex(b, 0, len(b)-1, target, index)
        if result == -1:
            index = index + mid + 1
            c = a[mid+1:]
            return findindex(c, 0, len(c)-1, target, index)
        else:
            return result
The problem calls for O(log n), but this implementation does not achieve it.
Your algorithm never decides to search only the left or only the right subarray. In the worst case it tries both, which visits O(n) elements.
You also slice the array with a[low:mid] and a[mid + 1:], which costs O(n) per call.
Together this gives the recurrence T(n) = 2T(n/2) + O(n), which is O(n log n) in the worst case, not O(log n) and not O((log n)^2).
Assuming there are no duplicates in the array, an O(log n) binary search in Python 3 looks like this:
A = [15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14]
low = 0
hi = len(A) - 1

def findindex(A, low, hi, target):
    if low > hi:
        return -1
    mid = (hi + low) // 2  # integer midpoint
    if A[mid] == target:
        return mid
    if A[mid] >= A[low]:
        # A[low..mid] is sorted
        if target < A[mid] and target >= A[low]:
            return findindex(A, low, mid - 1, target)
        else:
            return findindex(A, mid + 1, hi, target)
    if A[mid] < A[low]:
        # the rotation point lies in A[low..mid], so A[mid..hi] is sorted
        if target < A[mid] or target >= A[low]:
            return findindex(A, low, mid - 1, target)
        else:
            return findindex(A, mid + 1, hi, target)
    return -1

print(findindex(A, low, hi, 3))
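The same decision rule can also be written as a loop, which makes it easier to see that each step discards half of the remaining range. This is a minimal sketch assuming distinct values, as above; find_in_rotated is a hypothetical name.

```python
def find_in_rotated(a, target):
    """Return the index of target in a rotated sorted list, or -1 if absent."""
    low, hi = 0, len(a) - 1
    while low <= hi:
        mid = (low + hi) // 2
        if a[mid] == target:
            return mid
        if a[low] <= a[mid]:
            # a[low..mid] is sorted: the target is inside it or not at all.
            if a[low] <= target < a[mid]:
                hi = mid - 1
            else:
                low = mid + 1
        else:
            # a[mid..hi] is sorted.
            if a[mid] < target <= a[hi]:
                low = mid + 1
            else:
                hi = mid - 1
    return -1

print(find_in_rotated([15, 16, 19, 20, 25, 1, 3, 4, 5, 7, 10, 14], 5))  # 8
```

No slices are created, so each iteration costs O(1) and the loop runs O(log n) times.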
