find upper and lower bound value using binary search - python

Here is the code for binary search, wondering if there is no match value, the question is the upper bound should be positioned at either small or small+1? And lower bound should be located at small or small-1, correct? Thanks.
def binary_search(my_list, key):
assert my_list == sorted(my_list)
large = len(my_list) -1
small = 0
while (small <= large):
mid = (small + large) // 2 )
# print small, mid, large, " - ", my_list[mid], "~", key
if my_list[mid] < key:
small = mid + 1
elif my_list[mid] > key:
large = mid - 1
else:
return mid
raise ValueError

#Paul is close, it should return mid+1 for the upper value. Instead of raising an exception, here is the code in Python that will return the lower and upper values:
value = my_list[mid]
lower = mid if value < key else mid-1
upper = mid if value > key else mid+1
return (lower, upper)
Here is an even simpler answer:
return (small-1, small)
Because small ends up above the key's index when you're looking for it, the lower bound is small-1 and the upper bound is small.

That's not that simple. You'll have to take account of the fact, that the algorithm might either approximate the position of the searched item from the left (lower values) or right (higher values), which can't be determined after exiting the while-loop. Thus you'll have to check whether the value at mid is smaller or greater than the key:
lower = (my_list[mid] < key ? my_list[mid] : my_list[mid - 1]);
upper = (my_list[mid] > key ? my_list[mid] : my_list[mid + 1]);

Related

First occurrence of a positive integer using binary search

I've tried to write a code that finds the first occurrence of a positive integer in a sorted array using binary search, but it doesn't work.
Here's the code:
def findFirstOccurrence(arr):
(left, right) = (0, len(arr) - 1)
result = -1
while left <= right:
mid = (left + right) // 2
if 0 < arr[mid]:
result = mid
right = mid - 1
elif 0 > arr[mid]:
right = mid - 1
else:
left = mid + 1
return result
Below code will help.
def findFirstOccurrence(arr):
left,right=0,len(arr)-1
while not((left==right) or (right==left+1)):
mid=(left+right)//2
if arr[mid]>0:
right=mid
else:
left=mid
if arr[left]>0:
return left
if arr[right]>0:
return right
return None
print(findFirstOccurrence([-4,-3,-2,-1,0,1,2,5,3,4,6]))
Output
5
The error comes from the lines
elif 0 > arr[mid]:
right = mid - 1
If arr[mid] is negative, we decrease the right pointer — i.e. we look further to the left. But if the array is sorted in ascending order, then anything further to the left of a negative number is still negative, so we'll never reach a positive number and the program will fail.
What we want to do instead is look to the right, where the positive numbers are. The lines
else:
left = mid + 1
already do that in the case that arr[mid] == 0. Removing the two erroneous lines would allow the case that arr[mid] < 0 to fall through and do what we want.
Final code:
if arr[mid] > 0:
result = mid
right = mid - 1
else:
left = mid + 1

Binary Search on Sorted List with Duplicates

To learn divide-and-conquer algorithms, I am implementing a function in Python called binary_search that will get the index of the first occurrence of a number in a non-empty, sorted list (elements of the list are non-decreasing, positive integers). For example, binary_search([1,1,2,2,3,4,4,5,5], 4) == 5, binary_search([1,1,1,1,1], 1) == 0, and binary_search([1,1,2,2,3], 5) == -1, where -1 means the number cannot be found in the list.
Below is my solution. Although the solution below passed all the tests I created manually it failed test cases from a black box unit tester. Could someone let me know what's wrong with the code below?
def find_first_index(A,low,high,key):
if A[low] == key:
return low
if low == high:
return -1
mid = low+(high-low)//2
if A[mid]==key:
if A[mid-1]==key:
return find_first_index(A,low,mid-1,key)
else:
return mid
if key <A[mid]:
return find_first_index(A,low,mid-1,key)
else:
return find_first_index(A, mid+1, high,key)
def binary_search(keys, number):
index = find_first_index(A=keys, low=0,high=len(keys)-1,key=number)
return(index)
This should work:
def find_first_index(A, low, high, key):
if A[low] == key:
return low
if low == high:
return -1
mid = low + (high - low) // 2
if A[mid] >= key:
return find_first_index(A, low, mid, key)
return find_first_index(A, mid + 1, high, key)
def binary_search(keys, number):
return find_first_index(keys, 0, len(keys) - 1, number)
Your solution does not work, as you have already realized. For example, it breaks with the following input:
>>> binary_search([1, 5], 0)
...
RecursionError: maximum recursion depth exceeded in comparison
As you can see, the function does not even terminate, there's an infinite recursion going on here. Try to "run" your program on a piece of paper to understand what's going on (or use a debugger), it's very formative.
So, what's the error? The problem is that starting from some function call high < low. In this specific case, in the first function call low == 0 and high == 1. Then mid = 0 (because int(low + (high - low) / 2) == 0). But then you call find_first_index(A, low, mid - 1, key), which is basically find_first_index(A, 0, -1, key). The subsequent call will be exactly the same (because with low == 0 and high == -1 you will have again mid == 0). Therefor, you have an infinite recursion.
A simple solution in this case would be to have
if low >= high:
return -1
Or just use my previous solution: checking mid - 1 in my opinion is not a good idea, or at least you must be much more careful when doing that.

Implementing binary search to find index?

So I understand conceptually how binary search works, but I always have problems with implementing it when trying to find an index in an array. For instance, for the Search Insert Position on LC, this is what I wrote:
def searchInsert(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: int
"""
if target > nums[-1]:
return len(nums)
low = 0
high = len(nums) - 1
while high > low:
mid = (low + high) // 2
if nums[mid] == target:
return mid
elif nums[mid] > target:
high = mid
else:
low = mid + 1
return low
It works, but I don't understand why I have to update low as mid + 1 instead of updating low as mid. Similarly, why am I updating high as mid instead of mid - 1. I've tried updating low/high as every combination of mid, mid - 1, and mid + 1 and the above is the only one that works but I have no idea why.
When implementing binary search for these kinds of problems, is there a way to reason through how you update the low/high values?
This is personal favorite:
while high >= low:
mid = (low + high) // 2
if nums[mid] >= target:
high = mid - 1
else:
low = mid + 1
return low
# or return nums[low] == target for boolean
It has difference in the case that has same values.
for example, Let's assume the array is [1,1,2,2,3,3,3,3,4].
with your function, search(arr, 1) returned 1 BUT search(arr, 2) returned 2.
why does it returned most RIGHT index on the interval 1s, and returned most LEFT index on 2s?
As i think, the key is at if nums[mid] >= target:.
when it finds the target exactly same one, the range changes by high = mid - 1. it means high might not be answer because the answer we found is mid. [1]
At the last step of binary search, the range is going to close to zero. and finally loop breaks by they crossed. thus, the answer must be low or high. but we know high is not an answer at [1].

Recursive binary search algorithm doesn't stop executing after condition is met, returns a nonetype object

I have made a binary search algorithm, biSearch(A, high, low, key). It takes in a sorted array and a key, and spits out the position of key in the array. High and low are the min and max of the search range.
It almost works, save for one problem:
On the second "iteration" (not sure what the recursive equivalent of that is), a condition is met and the algorithm should stop running and return "index". I commented where this happens. Instead, what ends up happening is that the code continues on to the next condition, even though the preceding condition is true. The correct result, 5, is then overridden and the new result is a nonetype object.
within my code, I have commented in caps the problems at the location in which they occur. Help is much appreciated, and I thank you in advance!
"""
Created on Sat Dec 28 18:40:06 2019
"""
def biSearch(A, key, low = False, high = False):
if low == False:
low = 0
if high == False:
high = len(A)-1
if high == low:
return A[low]
mid = low + int((high -low)/ 2)
# if key == A[mid] : two cases
if key == A[mid] and high - low == 0: #case 1: key is in the last pos. SHOULD STOP RUNNING HERE
index = mid
return index
elif key == A[mid] and (high - low) > 0:
if A[mid] == A[mid + 1] and A[mid]==A[mid -1]: #case 2: key isnt last and might be repeated
i = mid -1
while A[i] == A[i+1]:
i +=1
index = list(range(mid- 1, i+1))
elif A[mid] == A[mid + 1]:
i = mid
while A[i]== A[i+1]:
i += 1
index = list(range(mid, i+1))
elif A[mid] == A[mid -1]:
i = mid -1
while A[i] == A[i +1]:
i += 1
index = list(range(mid, i +1))
elif key > A[mid] and high - low > 0: # BUT CODE EXECTUES THIS LINE EVEN THOUGH PRECEDING IS ALREADY MET
index = biSearch(A, key, mid+1, high)
elif key < A[mid] and high - low > 0:
index = biSearch(A, key, low, mid -1)
return index
elif A[mid] != key: # if key DNE in A
return -1
#biSearch([1,3,5, 4, 7, 7,7,9], 1, 8, 7)
#x = biSearch([1,3,5, 4, 7,9], 1, 6, 9)
x = biSearch([1,3,5, 4, 7,9],9)
print(x)
# x = search([1,3,5, 4, 7,9], 9)
This function is not a binary search. Binary search's time complexity should be O(log(n)) and works on pre-sorted lists, but the complexity of this algorithm is at least O(n log(n)) because it sorts its input parameter list for every recursive call. Even without the sorting, there are linear statements like list(range(mid, i +1)) on each call, making the complexity quadratic. You'd be better off with a linear search using list#index.
The function mutates its input parameter, which no search function should do (we want to search, not search and sort).
Efficiencies and mutation aside, the logic is difficult to parse and is overkill in any circumstance. Not all nested conditionals lead to a return, so it's possible to return None by default.
You can use the builtin bisect module:
>>> from bisect import *
>>> bisect_left([1,2,2,2,2,3,4,4,4,4,5], 2)
1
>>> bisect_left([1,2,2,2,2,3,4,4,4,4,5], 4)
6
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 4)
10
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 2)
5
>>> bisect_right([1,2,2,2,2,3,4,4,4,4,5], 15)
11
>>> bisect_right([1,2,5,6], 3)
2
If you have to write this by hand as an exercise, start by looking at bisect_left's source code:
def bisect_left(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x. So if x already appears in the list, a.insert(x) will
insert just before the leftmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
# Use __lt__ to match the logic in list.sort() and in heapq
if a[mid] < x: lo = mid+1
else: hi = mid
This is easy to implement recursively (if desired) and then test against the builtin:
def bisect_left(a, target, lo=0, hi=None):
if hi is None: hi = len(a)
mid = (hi + lo) // 2
if lo >= hi:
return mid
elif a[mid] < target:
return bisect_left(a, target, mid + 1, hi)
return bisect_left(a, target, lo, mid)
if __name__ == "__main__":
from bisect import bisect_left as builtin_bisect_left
from random import choice, randint
from sys import exit
for _ in range(10000):
a = sorted(randint(0, 100) for _ in range(100))
if any(bisect_left(a, x) != builtin_bisect_left(a, x) for x in range(-1, 101)):
print("fail")
exit(1)
Logically, for any call frame, there's only 3 possibilities:
The lo and hi pointers have crossed, in which case we've either found the element or figured out where it should be if it were in the list; either way, return the midpoint.
The element at the midpoint is less than the target, which guarantees that the target is in the tail half of the search space, if it exists.
The element at the midpoint matches or is less than the target, which guarantees that the target is in the front half of the search space.
Python doesn't overflow integers, so you can use the simplified midpoint test.

Binary Search Algorithm with interval

I am trying to change my code so instead of finding a specific value of the array it will output the value of an interval if found, example being 60-70. Any help is appreciated.
def binary (array, value):
while len(array)!= 0:
mid = len(array) // 2
if value == array[mid]:
return value
elif value > array[mid]:
array = array[mid+1:]
elif value < array [mid]:
array = array[0:mid]
sequence = [1,2,5,9,13,42,69,123,256]
print( "found", binary(sequence,70) )
I have this so far and want it to find an specified interval, so if i specify 60-70 it will find what is in between.
Actually this is pretty simple:
While searching for the elements in the interval (lower, upper), perform a binary search on the array arr for the index of the smallest element arr[n], such that arr[n] >= lower and the index of the largest element arr[m], such that arr[m] <= upper.
Now there are several possibilities:
n < m: there exist multiple solutions in the array. All of the are in the subarray starting at index n up to index m inclusively
n = m: there exists precisely one solution: arr[n]
n > m: no solutions exist
Searching for values beyond a certain threshold can be done using binary search like this:
def lowestGreaterThan(arr, threshold):
low = 0
high = len(arr)
while low < high:
mid = math.floor((low + high) / 2)
print("low = ", low, " mid = ", mid, " high = ", high)
if arr[mid] == threshold:
return mid
elif arr[mid] < threshold and mid != low:
low = mid
elif arr[mid] > threshold and mid != high:
high = mid
else:
# terminate with index pointing to the first element greater than low
high = low = low + 1
return low
Sorry bout the looks of the code, my python is far from perfect. Anyways, this ought to show the basic idea behind the approach. The algorithm basically searches for the index ind of the first element in the array with the property arr[ind] >= threshold.

Categories

Resources