O(log n) Search in a sorted python dictionary


I'm solving a programming question and stuck on the last piece of the puzzle.
This is the question: https://leetcode.com/problems/daily-temperatures/
I have a dictionary sorted by value, and now I want to do an O(log n) search on it. Here's the code I have written so far.
def dailyTemperatures(self, T):
    if len(T) == 0:
        return []
    if len(T) == 1:
        return [0]
    R = [None] * len(T)
    #create map, populate map
    M = {}
    for i in range(0, len(T)):
        M[i] = T[i]
    #sort map by value(temps)
    MS = sorted(M.items(), key=lambda x: x[1])
    for i in MS:
        print(i[0], i[1])
    for i in range(0,len(T)):
        t = T[i] #base value for comparison
        R[i] = 0
        x = 0
        # find smallest x for which temp T[x] > T[i]
        # Dictionary is sorted for Temps
        R[i] = x - i
    return R
The commented part in the loop is where I have trouble. I could not find an answer anywhere which would search a sorted dictionary and then filter by key.
Any tips or new suggestions to tackle this are also appreciated.

Your code could possibly be made to work, but: This algorithm is really just adding more layers of complexity on top of the naive brute force bubble sort-like algorithm, due to needing to backtrack for indexes.
The simplest modification is just to search for the minimum index greater than the current index. Store the position in the dict's .items() as part of the value so you can retrieve it. But you can't binary search on index, because the list is sorted by value and the indexes are not in order. This should give you an acceptable O(N) lookup.
You still have to search by index in the end (which has priority over temperature). Even with binary search, your attempted algorithm, ignoring the N log N complexity of pre-sorting, would at best still require O(N * log N * log N) for searching. Your current attempt would actually be O(N^2 log N), but with a third cached index table, nearest index lookup could be turned into log N.
It will be a very convoluted and inefficient algorithm, due to basically having to backtrack your search order. And it will have no advantage over a naive brute force (it's objectively worse).
Note: key point is that you need the nearest index, which is not in sorted order if you sort by value
If you still want to do it that way (I guess as a code golf challenge), you will want to add its position index in .items() of the dict to your dictionary, so when you look up your key in dict, you can find which starting position to start your search through the temperature sorted list. To get the log N, you will need to store each range of temperatures and their range of indexes. This part will probably be particularly complicated to implement. And of course you'll need to implement a binary search algorithm.
Stack algorithm:
Basic idea of below algorithm is that any lower temperatures that follow no longer matter.
eg: [...] 10 >20< 9 6 7 21. After 20; 9 6 7 (or anything <= 20) do not matter. After 9; 6 and 7 don't matter. etc.
So iterate from the end, adding numbers to the stack, popping off the stack numbers less than the current number.
Note that because the number of distinct temperatures is bounded to 70 values, and numbers less than the current temperature are pruned off the stack at each iteration, both the cost of searching for the next temperature and the size of the stack are bounded by 70. In other words, constant.
So for each item in T, you will search a maximum of 70 values in the worst case, ie: len(T) * 70.
Thus the complexity of the algorithm is O(N): number of items in T.
def dailyTemperatures(T):
    res = [0]*len(T)
    stack = []
    for i, x in reversed([*enumerate(T)]):
        if len(stack) < 1:
            stack.append((i,x))
        else:
            while(len(stack)>0 and stack[-1][1]<=x):
                stack.pop()
            if len(stack)>0 and stack[-1][1]>x:
                res[i] = stack[-1][0] - i
            print(x, stack)
            stack.append((i,x))
    return res
print(dailyTemperatures([73, 74, 75, 71, 69, 72, 76, 73]))
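The same pruning idea can also be run left to right with a stack of indices. The sketch below is an alternative formulation, not the code above, and the name daily_temperatures is chosen only for illustration. Each index is pushed once and popped at most once, so the running time is O(N) even without the bound on the temperature range.

```python
def daily_temperatures(temps):
    """For each day, the number of days to wait for a warmer one (0 if none)."""
    res = [0] * len(temps)
    stack = []  # indices of days still waiting for a warmer day
    for i, t in enumerate(temps):
        # Day i resolves every waiting day that is strictly colder.
        while stack and temps[stack[-1]] < t:
            j = stack.pop()
            res[j] = i - j
        stack.append(i)
    return res


print(daily_temperatures([73, 74, 75, 71, 69, 72, 76, 73]))
```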

Related

Finding minimum number of jumps increasing the value of the element

Optimizing a leetcode-style question - DP/DFS
The task is the following:
Given N heights, find the minimum number of suboptimal jumps required to go from start to end. [1-D Array]
A jump is suboptimal, if the height of the starting point i is less or equal to the height of the target point j.
A jump is possible if j-i <= k, where k is the maximal jump distance.
For the first subtask, there is only one k value.
For the second subtask, there are two k values; output the amount of suboptimal jumps for each k value.
For the third subtask, there are 100 k values; output the amount of suboptimal jumps for each k value.
My Attempt
The following snippet is my shot at solving the problem, it gives the correct solution.
This was optimized to handle multiple k values without having to do a lot of unnecessary work.
The problem is that even a solution with a single k value is O(N^2) in the worst case (as k <= N).
A solution would be to eliminate the nested for loop, but I'm uncertain how to approach this.
def solve(testcase):
    N, Q = 10, 1
    h = [1 , 2 , 4 ,2 , 8, 1, 2, 4, 8, 16] # output 3
    # ^---- + ---^ 0 ^--- + --^ + ^
    k = [3]
    l_k = max(k)
    distances = [99999999999] * N
    distances[N-1] = 0
    db = [ [0]*N for i in range(N)]
    for i in range(N-2, -1, -1):
        minLocalDistance = 99999999999
        for j in range(min(i+l_k, N-1), i, -1):
            minLocalDistance = min(minLocalDistance, distances[j] + (h[i] <= h[j]))
            db[i][j] = distances[j] + (h[i] <= h[j])
        distances[i] = minLocalDistance
    print(f"Case #{testcase}: {distances[0]}")
NOTE: This is different from the classic min. jumps problem

Consider the best cost to get to a position i. It is the smaller of:
(1) The minimum cost to get to any of the preceding k positions, plus one (a suboptimal jump); or
(2) The minimum cost to get to any of the lower-height position in the same window (an optimal jump).
Case (1) can be handled with the sliding-window-minimum algorithm that you can find described, for example, here: Sliding window maximum in O(n) time. This takes amortized constant time per position, or O(N) all together.
Case (2) has a somewhat obvious solution with a BST: As the window moves, insert each new position into a BST sorted by height. Remove positions that are no longer in the window. Additionally, in each node, store the minimum cost within its subtree. With this structure, you can find the minimum cost for any height bound in O(log k) time.
The expense in case 2 leads to a total complexity of O(N log k) for a single k-value. That's not too bad for complexity, but such BSTs are somewhat complicated and aren't usually provided in standard libraries.
You can make this simpler and faster by recognizing that if the minimum cost in the window is C, then optimal jumps are only beneficial if they come from predecessors of cost C, because cost C+1 is attainable with a sub-optimal jump.
For each cost, then, you can use that same sliding-window-minimum algorithm to keep track of the minimum height in the window for nodes with that cost. Then for case (2), you just need to check to see if that minimum height for the minimum cost is lower than the height you want to jump to.
Maintaining these sliding windows again takes amortized constant time per operation, leading to O(N) time for the whole single-k-value algorithm.
I doubt that there would be any benefit in trying to manage multiple k-values at once.
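A minimal sketch of the single-k idea, under some assumptions: it follows the question's definition (a jump from i to j costs 1 when h[i] <= h[j], so free jumps come from strictly higher predecessors), and instead of one window per cost it keeps a single sliding-window minimum over the pair (cost, -height). The front of the window is then the cheapest predecessor, and the tallest one among ties. The name min_suboptimal_jumps is hypothetical.

```python
from collections import deque


def min_suboptimal_jumps(h, k):
    """Minimum number of suboptimal jumps from index 0 to the last index.

    Assumes h is non-empty and k >= 1.
    """
    n = len(h)
    cost = [0] * n
    window = deque([0])  # indices; keys (cost, -height) increase front to back
    for i in range(1, n):
        # Drop predecessors that are more than k steps behind.
        while window[0] < i - k:
            window.popleft()
        p = window[0]  # cheapest predecessor, tallest among equal costs
        cost[i] = cost[p] + (h[p] <= h[i])
        key = (cost[i], -h[i])
        # Keep the deque monotone so its front is always the window minimum.
        while window and (cost[window[-1]], -h[window[-1]]) >= key:
            window.pop()
        window.append(i)
    return cost[-1]


print(min_suboptimal_jumps([1, 2, 4, 2, 8, 1, 2, 4, 8, 16], 3))
```

Each index enters and leaves the deque at most once, which is where the amortized constant time per position comes from.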

recursion vs iteration time complexity

Could anyone explain exactly what's happening under the hood to make the recursive approach in the following problem much faster and efficient in terms of time complexity?
The problem: Write a program that would take an array of integers as input and return the largest three numbers sorted in an array, without sorting the original (input) array.
For example:
Input: [22, 5, 3, 1, 8, 2]
Output: [5, 8, 22]
Even though we can simply sort the original array and return the last three elements, that would take at least O(nlog(n)) time as the fastest sorting algorithm would do just that. So the challenge is to perform better and complete the task in O(n) time.
So I was able to come up with a recursive solution:
def findThreeLargestNumbers(array, largest=[]):
    if len(largest) == 3:
        return largest
    max = array[0]
    for i in array:
        if i > max:
            max = i
    array.remove(max)
    largest.insert(0, max)
    return findThreeLargestNumbers(array, largest)
In which I kept finding the largest number, removing it from the original array, appending it to my empty array, and recursively calling the function again until there are three elements in my array.
However, when I looked at the suggested iterative method, I composed this code:
def findThreeLargestNumbers(array):
    sortedLargest = [None, None, None]
    for num in array:
        check(num, sortedLargest)
    return sortedLargest
def check(num, sortedLargest):
    for i in reversed(range(len(sortedLargest))):
        if sortedLargest[i] is None:
            sortedLargest[i] = num
            return
        if num > sortedLargest[i]:
            shift(sortedLargest, i, num)
            return
def shift(array, idx, element):
    if idx == 0:
        array[0] = element
        return array
    array[0] = array[1]
    array[idx-1] = array[idx]
    array[idx] = element
    return array
Both codes passed successfully all the tests and I was convinced that the iterative approach is faster (even though not as clean..). However, I imported the time module and put the codes to the test by providing an array of one million random integers and calculating how long each solution would take to return back the sorted array of the largest three numbers.
The recursive approach was much faster (about 9 times faster) than the iterative approach!
Why is that? The recursive approach traverses the huge array three times and, on top of that, removes an element each time (which takes O(n) time, as all the elements after it need to be shifted in memory), whereas the iterative approach traverses the input array only once, doing some work at every iteration but on a negligible array of size 3 that should take hardly any time at all.
I really want to be able to judge and pick the most efficient algorithm for any given problem so any explanation would tremendously help.

Advice for optimization.
Avoid function calls. Avoid creating temporary garbage. Avoid extra comparisons. Have logic that looks at elements as little as possible. Walk through how your code works by hand and look at how many steps it takes.
Your recursive code makes only 3 function calls, and as pointed out elsewhere does an average of 1.5 comparisons per element per call. (1 while looking for the max, 0.5 while figuring out where to remove the element.)
Your iterative code makes lots of comparisons per element, calls excess functions, and makes calls to things like reversed and range that create/destroy junk.
Now compare with this iterative solution:
def find_largest(array, limit=3):
    if len(array) <= limit:
        # Special logic not needed.
        return sorted(array)
    else:
        # Initialize the answer to values that will be replaced.
        min_val = min(array[0:limit])
        answer = [min_val for _ in range(limit)]
        # Now scan for the largest.
        for i in array:
            if answer[0] < i:
                # Sift elements down until we find the right spot.
                j = 1
                while j < limit and answer[j] < i:
                    answer[j-1] = answer[j]
                    j = j+1
                # Now insert.
                answer[j-1] = i
        return answer
There are no function calls. It is possible that you can make up to 6 comparisons per element (verify that answer[0] < i, verify that (j=1) < 3, verify that answer[1] < i, verify that (j=2) < 3, verify that answer[2] < i, then find that (j=3) < 3 is not true). You will hit that worst case if array is sorted. But most of the time you only do the first comparison then move to the next element. No muss, no fuss.
How does it benchmark?
Note that if you wanted the smallest 100 elements, then you'd find it worthwhile to use a smarter data structure such as a heap to avoid the bubble sort.
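For the heap-based variant mentioned above, a minimal sketch using the standard library's heapq.nlargest could look like this. The wrapper name largest_k_sorted is hypothetical. It does not modify the input list, and for a small fixed k it stays close to a single pass over the data.

```python
import heapq


def largest_k_sorted(array, k=3):
    """Return the k largest values of array in ascending order."""
    # nlargest returns them largest-first; sort to get ascending order.
    return sorted(heapq.nlargest(k, array))


data = [22, 5, 3, 1, 8, 2]
print(largest_k_sorted(data))
```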

I am not really comfortable with Python, but I have a different approach to the problem, for what it's worth.
As far as I saw, all solutions posted are O(NM), where N is the length of the array and M the length of the largest-elements array.
Because in your specific situation N >> M, you could say it's O(N), but the longer the inputs, the more it will behave like O(NM).
I agree with @zvone that you seem to have more steps in the iterative solution, which sounds like a valid explanation for the different computing speeds.
Back to my proposal, which implements a binary search, O(N*logM), with recursion:
import math
def binarySearch(arr, target, origin = 0):
    """
    Recursive binary search
    Args:
        arr (list): List of numbers to search in
        target (int): Number to search with
    Returns:
        int: index + 1 from immediate lower element to target in arr or -1 if already present or lower than the lowest in arr
    """
    half = math.floor((len(arr) - 1) / 2)
    if target > arr[-1]:
        return origin + len(arr)
    if len(arr) == 1 or target < arr[0]:
        return -1
    if arr[half] < target and arr[half+1] > target:
        return origin + half + 1
    if arr[half] == target or arr[half+1] == target:
        return -1
    if arr[half] < target:
        return binarySearch(arr[half:], target, origin + half)
    if arr[half] > target:
        return binarySearch(arr[:half + 1], target, origin)
def findLargestNumbers(array, limit = 3, result = []):
    """
    Recursive linear search of the largest values in an array
    Args:
        array (list): Array of numbers to search in
        limit (int): Length of array returned. Default: 3
    Returns:
        list: Array of max values with length as limit
    """
    if len(result) == 0:
        result = [float('-inf')] * limit
    if len(array) < 1:
        return result
    val = array[-1]
    foundIndex = binarySearch(result, val)
    if foundIndex != -1:
        result.insert(foundIndex, val)
        return findLargestNumbers(array[:-1],limit, result[1:])
    return findLargestNumbers(array[:-1], limit,result)
It is quite flexible and might be inspiration for a more elaborated answer.

The recursive solution
The recursive function goes through the list 3 times to find the largest number and removes the largest number from the list 3 times.
for i in array:
    if i > max:
        ...
and
array.remove(max)
So, you have 3×N comparisons, plus 3x removal. I guess the removal is optimized in C, but there is again about 3×(N/2) comparisons to find the item to be removed.
So, a total of approximately 4.5 × N comparisons.
The other solution
The other solution goes through the list only once, but each time it compares to the three elements in sortedLargest:
for i in reversed(range(len(sortedLargest))):
    ...
and almost each time it sorts the sortedLargest with these three assignments:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
So, you are N times:
calling check
creating and reversing a range(3)
accessing sortedLargest[i]
comparing num > sortedLargest[i]
calling shift
comparing idx == 0
and about 2×N/3 times doing:
array[0] = array[1]
array[idx-1] = array[idx]
array[idx] = element
and N/3 times array[0] = element
It is difficult to count, but that is much more than 4.5×N comparisons.

Find the maximum Index difference given constraints

In my quest to learn algorithm design, I started practicing questions, and there is one particular question for which I have trouble finding an efficient solution.
Given an array A of integers, find the maximum of j - i subject to the constraint A[i] <= A[j]. A: [3 5 4 2], Output: 2 for the pair (3, 4).
def maxIndex(arr):
    max_val = float("-inf")
    for i in range(0,len(arr)):
        for j in range(i + 1 , len(arr)):
            #print(arr[i],arr[j])
            if arr[i] <= arr[j]:
                diff_i = j - i
                if diff_i > max_val:
                    max_val = diff_i
    return max_val
A = [3, 5, 4, 2]
print("Result :",maxIndex(A))
My naive approach above will work but the time complexity is O(n^2) with a space complexity of O(1).
Here both the values and the indexes are important. If I sort the list out of place and store the indices in a dictionary, I will still have to use a nested for loop to check the j - i constraint.
How can I improve the time complexity?

You can create two auxiliary arrays: the min array stores at index i the minimum value up to index i, and the max array stores at index i the maximum value from index i to the end (it is filled by traversing in reverse).
You can find the answer here https://www.geeksforgeeks.org/given-an-array-arr-find-the-maximum-j-i-such-that-arrj-arri/
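A minimal sketch of the two-auxiliary-array idea, assuming the usual two-pointer scan over them. The names max_index_diff, left_min and right_max are chosen for illustration. Unlike the question's brute force, this version allows i == j, so it returns 0 rather than -inf when no pair with i < j qualifies.

```python
def max_index_diff(arr):
    """Largest j - i with arr[i] <= arr[j], in O(n) time and O(n) space."""
    n = len(arr)
    if n == 0:
        return 0
    # left_min[i]: smallest value in arr[0..i]
    left_min = [arr[0]] * n
    for i in range(1, n):
        left_min[i] = min(left_min[i - 1], arr[i])
    # right_max[j]: largest value in arr[j..n-1]
    right_max = [arr[-1]] * n
    for j in range(n - 2, -1, -1):
        right_max[j] = max(right_max[j + 1], arr[j])
    # Advance j while some value on the right can still pair with left_min[i].
    i = j = best = 0
    while i < n and j < n:
        if left_min[i] <= right_max[j]:
            best = max(best, j - i)
            j += 1
        else:
            i += 1
    return best


print(max_index_diff([3, 5, 4, 2]))
```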

As has been mentioned, there is an O(n) solution which is the most efficient. I will add another way of solving it in O(n log n):
We can think of this problem as for each index i, know the furthest index j > i where a[i] <= a[j]. If we had this, we only need to evaluate the difference of the indexes and keep a maximum over it. So, how to calculate this information?
Add all elements to a set, in the form of a pair (element, index) so it first sorts by element, and then by index.
Now iterate the array backwards starting from last element. For every pair in the set where element is lower or equal to current element, we set its furthest index as current index and we remove it from set.
After all is done, evaluate the furthest index j of each i and the answer is the max of those
Note that for each element, we need to search in the set all values that are lower. The search is O(log n), and while we could iterate more, as we remove it later from the set we only end up iterating each element once, so the overall complexity is O(n log n).

A possible solution that I can think from the top of my head for the given problem would be to create a list of pair from the given list which preserves the list indices along with the list value, that is, list of (Ai, i) for all elements in the list.
You can sort this list of pairs in ascending order and iterate from left to right. We maintain a variable min_index, which represents the minimum index we have encountered so far in our iteration. At every step i, we update our answer as ans = max(ans, index_i - min_index) iff index_i > min_index, and then update min_index as min_index = min(index_i, min_index). Since our list is sorted, it's guaranteed that the value at index_i is >= A[min_index].
Since we need to sort the array initially, the overall complexity of the solution is O(nlog(n))

There is an approach with O(n log n) time (though I have a feeling that a linear algorithm should exist).
Make list min-candidates
Walk through the source list.
If current item is less than current minimum, add its index to min-candidates. So corresponding values are sorted in descending order.
If current item is larger than current minimum, search for the first less item in min-candidates with binary search. Find index difference and compare with the current best result.
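The steps above can be sketched with the standard bisect module. This is one possible reading of them: the candidate values are stored negated so that the list is ascending and bisect_left can find the leftmost candidate whose value is <= the current item. The function and variable names are hypothetical.

```python
from bisect import bisect_left


def max_index_diff_bisect(arr):
    """Largest j - i with arr[i] <= arr[j], using prefix-minimum candidates."""
    cand_idx = []  # indices of strictly decreasing prefix minima
    cand_neg = []  # their values, negated so the list is ascending
    best = 0
    for j, x in enumerate(arr):
        if not cand_idx or x < -cand_neg[-1]:
            # New minimum: it can only serve as a left end point.
            cand_idx.append(j)
            cand_neg.append(-x)
        else:
            # Leftmost candidate with value <= x (i.e. negated value >= -x).
            pos = bisect_left(cand_neg, -x)
            best = max(best, j - cand_idx[pos])
    return best


print(max_index_diff_bisect([3, 5, 4, 2]))
```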

This could be solved in O(nlogn) time and O(n) space.
Create a list of tuples of [value, index]
Sort them by value
Initialize min_index to some max value ( list.length + 1 )
Initialize a max value for difference of indices
Initialize a tuple to capture indices that has max difference.
Now go through following steps ( pseudo code ):
min_index = list.length + 1
max = 0
max_tuple = []
for tuple t in list:
    min_index = minimum( t.index, min_index )
    if ( t.index != min_index )
        if ( t.index - min_index >= max )
            max = t.index - min_index
            max_tuple = [min_index, t.index]
In other words, you keep track of minimum index and because your list is sorted, as you go through the list in increasing value order, you will get a difference between the min index and your current index which you need to maximize.
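A runnable Python rendering of the pseudocode might look like the sketch below. The name max_index_diff_sorted is hypothetical. Sorting (value, index) tuples breaks ties by index, so equal values are visited in increasing index order, which is what the constraint A[i] <= A[j] needs.

```python
def max_index_diff_sorted(arr):
    """Return (best difference, (i, j)) with arr[i] <= arr[j]; O(n log n)."""
    pairs = sorted((value, index) for index, value in enumerate(arr))
    min_index = len(arr) + 1
    best = 0
    best_pair = None
    for _, index in pairs:
        # Smallest index seen so far among values <= the current value.
        min_index = min(min_index, index)
        if index - min_index > best:
            best = index - min_index
            best_pair = (min_index, index)
    return best, best_pair


print(max_index_diff_sorted([3, 5, 4, 2]))
```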

max sum of list elements each separated by (at least) k elements

Given a list of numbers, to find the maximum sum of non-adjacent elements with time complexity O(n) and space complexity O(1), I could use this:
sum1= list[0]
sum2= 0
for i in range(1, len(list)):
    num= sum1
    sum1= sum2+ list[i]
    sum2= max(num, sum2)
print(max(sum2, sum1))
This code works only for k = 1 (only one element between the summed numbers). How could I improve it using dynamic programming so that it works for other values of k, where k is the number of elements between the summed numbers?
for example:
list = [5,6,4,1,2] k=1
answer = 11 # 5+4+2
list = [5,6,4,1,2] k=2
answer = 8 # 6+2
list = [5,3,4,10,2] k=1
answer = 15 # 5+10

It's possible to solve this with space O(k) and time O(nk). if k is a constant, this fits the requirements in your question.
The algorithm loops from position k + 1 to n. (If the array is shorter than that, it can obviously be solved in O(k)). At each step, it maintains an array best of length k + 1, such that the jth entry of best is the best solution found so far, such that the last element it used is at least j to the left of the current position.
Initializing best is done by setting, for its entry j, the largest non-negative entry in the array in positions 1, ..., k + 1 - j. So, for example, best[1] is the largest non-negative entry in positions 1, ..., k, and best[k + 1] is 0.
When at position i of the array, element i is used or not. If it is used, the relevant best until now is best[1], so write u = max(best[1] + a[i], best[1]). If element i is not used, then each "at least" part shifts one, so for j = 2, ..., k + 1, best[j] = max(best[j], best[j - 1]). Finally, set best[1] = u.
At the termination of the algorithm, the solution is the largest item in best.

EDIT:
I had misunderstood the question. If you need to have at least k elements in between, then the following is an O(n^2) solution.
If the numbers are non-negative, then the DP recurrence relation is:
DP[i] = max (DP[j] + A[i]) For all j st 0 <= j < i - k
= A[i] otherwise.
If there are negative numbers in the array as well, then we can use the idea from Kadane's algorithm:
DP[i] = max (DP[j] + A[i]) For all j st 0 <= j < i - k && DP[j] + A[i] > 0
= max(0,A[i]) otherwise.
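A direct sketch of this recurrence, assuming that DP[i] is the best sum of a valid selection ending at position i and that an empty selection (sum 0) is allowed, which covers the negative-number case. The name max_sum_dp is hypothetical.

```python
def max_sum_dp(a, k):
    """Max sum of elements with at least k elements between any two picks.

    O(n^2) time, O(n) space. Returns 0 for an empty or all-negative list.
    """
    n = len(a)
    dp = [0] * n  # dp[i]: best sum of a selection whose last pick is a[i]
    for i in range(n):
        best_prev = 0  # starting a new selection at i is always allowed
        for j in range(0, i - k):  # all j with 0 <= j < i - k
            best_prev = max(best_prev, dp[j])
        dp[i] = a[i] + best_prev
    return max(dp + [0])


print(max_sum_dp([5, 6, 4, 1, 2], 1))
```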

Here's a quick implementation of the algorithm described by Ami Tavory (as far as I understand it). It should work for any sequence, though if your list is all negative, the maximum sum will be 0 (the sum of an empty subsequence).
import collections
def max_sum_separated_by_k(iterable, k):
    best = collections.deque([0]*(k+1), k+1)
    for item in iterable:
        best.appendleft(max(item + best[-1], best[0]))
    return best[0]
This uses O(k) space and O(N) time. All of the deque operations, including appending a value to one end (and implicitly removing one from the other end so the length limit is maintained) and reading from the ends, are O(1).
If you want the algorithm to return the maximum subsequence (rather than only its sum), you can change the initialization of the deque to start with empty lists rather than 0, and then append max([item] + best[-1], best[0], key=sum) in the body of the loop. That will be quite a bit less efficient though, since it adds O(N) operations all over the place.

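Not sure about the complexity, but aiming for concise code landed me with
max([sum(l[i::j]) for j in range(k+1,len(l)+1) for i in range(len(l))])
(I've replaced the list variable with l so as not to shadow a built-in. The stride starts at k+1 so that k elements lie between the summed numbers; note that this only considers evenly spaced picks.)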

Better algorithm (than using a dict) for enumerating pairs with a given sum.

Given a number, I have to find out all possible index-pairs in a given array whose sum equals that number. I am currently using the following algo:
def myfunc(array,num):
    dic = {}
    for x in xrange(len(array)): # if 6 is the current key,
        if dic.has_key(num-array[x]): #look at whether num-x is there in dic
            for y in dic[num-array[x]]: #if yes, print all key-pair values
                print (x,y),
        if dic.has_key(array[x]): #check whether the current keyed value exists
            dic[array[x]].append(x) #if so, append the index to the list of indexes for that keyed value
        else:
            dic[array[x]] = [x] #else create a new array
Will this run in O(N) time? If not, then what should be done to make it so? And in any case, will it be possible to make it run in O(N) time without using any auxiliary data structure?

Will this run in O(N) time?
Yes and no. The complexity is actually O(N + M) where M is the output size.
Unfortunately, the output size is O(N^2) in the worst case. For example, the array [3,3,3,3,3,...,3] with number == 6 produces a quadratic number of pairs.
However, asymptotically speaking, it cannot be done better than this, because it is linear in the input size plus the output size.
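To illustrate the gap between input size and output size, here is a small sketch that counts the matching index pairs in O(N) without listing them. The name count_pairs is hypothetical, and it assumes only the number of pairs is wanted. Enumerating the pairs themselves still costs time proportional to that count.

```python
from collections import Counter


def count_pairs(arr, target):
    """Number of index pairs (i, j), i < j, with arr[i] + arr[j] == target."""
    counts = Counter(arr)
    total = 0
    for value, c in counts.items():
        other = target - value
        if value < other:
            # Each unordered pair of distinct values is counted once.
            total += c * counts.get(other, 0)
        elif value == other:
            # Pairs drawn from equal values: c choose 2.
            total += c * (c - 1) // 2
    return total


print(count_pairs([3] * 5, 6))
```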

Very, very simple solution that actually does run in O(N) time by using array references. If you want to enumerate all the output pairs, then of course (as amit notes) it must take O(N^2) in the worst case.
from collections import defaultdict
def findpairs(arr, target):
    flip = defaultdict(list)
    for i, j in enumerate(arr):
        flip[j].append(i)
    for i, j in enumerate(arr):
        if target-j in flip:
            yield i, flip[target-j]
Postprocessing to get all of the output values (and filter out (i,i) answers):
def allpairs(arr, target):
    for i, js in findpairs(arr, target):
        for j in js:
            if i < j: yield (i, j)

This might help - Optimal Algorithm needed for finding pairs divisible by a given integer k
(With a slight modification, there we are seeing for all pairs divisible by given number and not necessarily just equal to given number)
