Two sum running time O(n^2) or O(n) - python

For the two sum problem, find two numbers in a list that add up to the target.
My solution is to create a dictionary/hash table and store everything in it as (value, index). (Note: for duplicate numbers in the list, the higher index overrides the lower index.)
Then traverse the list again to find the other item.
def twoSum(nums, target):
    lookup = dict((v, i) for i, v in enumerate(nums))
    for i, v in enumerate(nums):
        if target - v in lookup and i != lookup[target - v]:
            return [lookup[target - v], i]
I think the above algorithm would take O(n * n/2), hence O(n^2), time, but I have seen others say that it only takes linear time. Can someone confirm this?

That algorithm takes linear time, because the operation target - v in lookup runs in constant time and there is only one level of for loop.
def twoSum(nums, target):
    lookup = dict((v, i) for i, v in enumerate(nums))         # N
    for i, v in enumerate(nums):                               # N
        if target - v in lookup and i != lookup[target - v]:  # average constant
            return [lookup[target - v], i]                     # constant
If you perform an O(N) operation followed by another O(N) operation, the sequence is still O(N).
Here we're talking only about average time complexity. It's possible to have a really bad hashing function with a lot of collisions, such that target - v in lookup actually takes O(N) time, so the worst-case complexity is actually O(N^2). But with a dict you're unlikely to run into this scenario.
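To illustrate that worst case, here is a small hypothetical sketch (not from the answer above): a wrapper class whose __hash__ is constant forces every key into the same bucket, so membership tests in the dict degrade from average O(1) toward O(N).
class BadHash:
    # Hypothetical key type: a constant hash makes every key collide.
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 1                     # all keys land in one bucket
    def __eq__(self, other):
        return self.value == other.value

good = {i: i for i in range(10000)}          # int keys: x in good is O(1) on average
bad = {BadHash(i): i for i in range(10000)}  # BadHash(x) in bad degrades toward O(N)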


O(log n) Search in a sorted python dictionary

I'm solving a programming question and am stuck on the last piece of the puzzle.
This is the question: https://leetcode.com/problems/daily-temperatures/
I have a dictionary sorted by value, and now I want to do an O(log n) search on it. Here's the code I have written so far.
def dailyTemperatures(self, T):
    if len(T) == 0:
        return []
    if len(T) == 1:
        return [0]
    R = [None] * len(T)
    # create map, populate map
    M = {}
    for i in range(0, len(T)):
        M[i] = T[i]
    # sort map by value (temps)
    MS = sorted(M.items(), key=lambda x: x[1])
    for i in MS:
        print(i[0], i[1])
    for i in range(0, len(T)):
        t = T[i]  # base value for comparison
        R[i] = 0
        x = 0
        # find smallest x for which temp T[x] > T[i]
        # Dictionary is sorted for Temps
        R[i] = x - i
    return R
The commented part in the loop is where I have trouble. I could not find an answer anywhere which would search a sorted dictionary and then filter by key.
Any tips or new suggestions to tackle this are also appreciated.
Your code could possibly be made to work, but this algorithm really just adds more layers of complexity on top of a naive brute-force, bubble-sort-like algorithm, because it has to backtrack over indexes.
The simplest modification is to search for the minimum index greater than the current index. Store each entry's position in the dict's .items() as part of the value so you can retrieve it. But you can't binary search on index, because the items are sorted by value and the indexes are not in order. This gives you an acceptable O(N) lookup.
You still have to search by index in the end (which takes priority over temperature). Even with binary search, your attempted algorithm, ignoring the N log N complexity of pre-sorting, would at best still require O(N * log N * log N) for searching. Your current attempt is actually O(N^2 log N), though with a third cached index table the nearest-index lookup could be turned into log N.
It will be a very convoluted and inefficient algorithm, because you basically have to backtrack your search order, and it has no advantage over a naive brute force (it's objectively worse).
Note: the key point is that you need the nearest index, which is not in sorted order once you sort by value.
If you still want to do it that way (I guess as a code-golf challenge), you will want to add each entry's position in the dict's .items() to your dictionary, so that when you look up your key you know which position to start your search from in the temperature-sorted list. To get the log N, you will need to store each range of temperatures and their range of indexes; this part will probably be particularly complicated to implement. And of course you'll need to implement a binary search algorithm.
Stack algorithm:
The basic idea of the algorithm below is that any lower temperatures that follow no longer matter.
e.g. [...] 10 >20< 9 6 7 21. After 20, the 9 6 7 (or anything <= 20) no longer matter. After 9, the 6 and 7 don't matter. Etc.
So iterate from the end, adding numbers to the stack and popping off the stack any numbers less than the current number.
Note that because the number of possible temperatures is bounded (about 70 values), and numbers less than the current temperature are pruned off the stack at each iteration, both the cost of searching for the next warmer temperature and the size of the stack are bounded by that constant.
So for each item in T you will search at most around 70 values in the worst case, i.e. len(T) * 70.
Thus the complexity of the algorithm is O(N), where N is the number of items in T.
def dailyTemperatures(T):
    res = [0] * len(T)
    stack = []
    for i, x in reversed([*enumerate(T)]):
        if len(stack) < 1:
            stack.append((i, x))
        else:
            while len(stack) > 0 and stack[-1][1] <= x:
                stack.pop()
            if len(stack) > 0 and stack[-1][1] > x:
                res[i] = stack[-1][0] - i
            print(x, stack)
            stack.append((i, x))
    return res
print(dailyTemperatures([73, 74, 75, 71, 69, 72, 76, 73]))

Time complexity of solution to the four sum problem?

Given an array of integers, find all unique quartets summing up to a
specified integer.
I will provide two different solutions below; I was just wondering which one is more efficient with respect to time complexity.
Solution 1:
def four_sum(arr, s):
    n = len(arr)
    output = set()
    for i in range(n-2):
        for j in range(i+1, n-1):
            seen = set()
            for k in range(j+1, n):
                target = s - arr[i] - arr[j] - arr[k]
                if target in seen:
                    output.add((arr[i], arr[j], arr[k], target))
                else:
                    seen.add(arr[k])
    return print('\n'.join(map(str, list(output))))
I know that this has time complexity of O(n^3).
Solution 2:
def four_sum2(arr, s):
    n = len(arr)
    seen = {}
    for i in range(n-1):
        for j in range(i+1, n):
            if arr[i] + arr[j] in seen:
                seen[arr[i] + arr[j]].add((i, j))
            else:
                seen[arr[i] + arr[j]] = {(i, j)}
    output = set()
    for key in seen:
        if s - key in seen:
            for (i, j) in seen[key]:
                for (p, q) in seen[s - key]:
                    sorted_index = tuple(sorted((arr[i], arr[j], arr[p], arr[q])))
                    if i not in (p, q) and j not in (p, q):
                        output.add(sorted_index)
    return output
Now, the first block of Solution 2 has a time complexity of O(n^2), but I'm not sure what the time complexity of the rest of it is.
TL;DR: the complexity of this algorithm is O(n^4).
In the first part, a tuple is added to seen for every pair (i, j) with j > i.
Thus the number of tuples in seen is about (n-1)*n/2 = O(n^2), as you guessed.
The second part is a bit more complex. If we ignore the first condition of the nested loops (the critical case), the first two loops can iterate over all possible tuples in seen, so the complexity is at least O(n^2). For the third loop it is trickier: it is hard to know the complexity without making any assumption about the input data. However, there is a theoretical critical case in which seen[s - key] contains O(n^2) tuples, and in such a case the overall algorithm runs in O(n^4)!
Is this theoretical critical case practical?
Sadly, yes. Take, for example, the input arr = [5, 5, ..., 5, 5] with s = 20. The seen map will contain a single key (10) associated with a set of (n-1)*n/2 = O(n^2) pairs. In this case the first two loops of the second part run in O(n^2), and the third nested loop runs in O(n^2) too.
Thus the overall algorithm runs in O(n^4).
However, note that in practice such cases should be quite rare, and the algorithm should run much faster on random inputs with many different numbers. The complexity can probably be improved to O(n^3) or even O(n^2) if this critical case is fixed (e.g. by handling the pathological case separately).
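As a quick sanity check of that pathological case, here is a hypothetical snippet that reuses just the first phase of Solution 2:
arr = [5] * 100
s = 20
n = len(arr)

seen = {}
for i in range(n - 1):
    for j in range(i + 1, n):
        seen.setdefault(arr[i] + arr[j], set()).add((i, j))

print(len(seen))      # 1: the single key 10
print(len(seen[10]))  # 4950 = n*(n-1)/2 index pairs under that key
# The second phase then cross-combines seen[10] with itself: ~4950**2, about 24.5 million steps.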

Merging n/k sorted lists in nlog(n/k) - python

I have round(n/k) sorted sublists, meaning that the length of each sublist is k (plus a single list with fewer than k elements). I need to merge them into a single sorted list of length n using the classic O(m+n) merge function, so the whole thing should take O(n*log(n/k)).
I had two implementations. One uses recursion (it seems to be right, but it won't work unless I change the recursion depth, which I am not allowed to do, and I don't understand why it blows up when the input has no more than 10 sublists, each of length k=3):
def merge_sorted_blocks(lst):
    i = 0
    pairs_lst = []
    n = len(lst)
    while i < n-1:
        pairs_lst.append(merge(lst[i], lst[i+1]))
        i += 2
    if n % 2 > 0:
        pairs_lst.append(lst[n-1])
    if type(pairs_lst[0]) != list:
        return pairs_lst
    return merge_sorted_blocks(pairs_lst)
and one that repeatedly merges the output list with the next sublist:
def merge_sorted_blocks(lst):
    pairs_lst = []
    for i in lst:
        pairs_lst = merge(pairs_lst, i)
    return pairs_lst
but I don't think it has the desired complexity; it looks more like O(n*(k + 2k + ...)) = O(n^2).
I found this thread, which suggests it does, but I don't understand how:
https://math.stackexchange.com/questions/881599/on-log-k-for-merging-of-k-lists-with-total-of-n-elements
Is there something I'm missing regarding each of these solutions?
For the second algorithm your computation has a fallacy. Moreover, the thread you mentioned differs a bit from your question.
You have n/k sublists, each of size k. Since the complexity of merging two lists of sizes n1 and n2 is O(n1 + n2), the first merge of two sublists costs O(2k), merging the result with the third sublist costs O(3k), and so on. Hence the complexity of the second algorithm is O(2k + 3k + ... + (n/k)*k) = O(n^2/k).
For the first implementation, some details are missed: once the input is reduced to a single merged sublist, the while loop no longer runs, that lone sublist just gets appended again, and the function recurses on the same input forever, which is why you hit the recursion limit.
In addition, the complexity analysis for the first algorithm is not accurate. If the pairwise merging from the referenced thread is implemented correctly, it runs in O(n * log(n/k)).
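For reference, here is a minimal sketch of that pairwise, level-by-level merging with an explicit termination case; the merge helper is my own standard two-way merge, since it isn't shown in the question:
def merge(a, b):
    # classic O(len(a) + len(b)) merge of two sorted lists
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def merge_sorted_blocks(blocks):
    # Each pass merges neighbouring blocks pairwise, costing O(n) in total and
    # halving the number of blocks, so n/k blocks need O(log(n/k)) passes:
    # O(n log(n/k)) overall, with no recursion depth issues.
    if not blocks:
        return []
    while len(blocks) > 1:
        merged = [merge(blocks[i], blocks[i + 1]) for i in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2:
            merged.append(blocks[-1])
        blocks = merged
    return blocks[0]

print(merge_sorted_blocks([[1, 4, 9], [2, 3, 7], [5, 6, 8], [0, 10]]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]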

max sum of list elements each separated by (at least) k elements

Given a list of numbers, find the maximum sum of non-adjacent elements with time complexity O(n) and space complexity O(1). I could use this:
sum1 = list[0]
sum2 = 0
for i in range(1, len(list)):
    num = sum1
    sum1 = sum2 + list[i]
    sum2 = max(num, sum2)
print(max(sum2, sum1))
This code works only when k = 1 (exactly one element between the summed numbers). How could I improve it, using dynamic programming, for a general k, where k is the number of elements between the summed numbers?
for example:
list = [5,6,4,1,2] k=1
answer = 11 # 5+4+2
list = [5,6,4,1,2] k=2
answer = 8 # 6+2
list = [5,3,4,10,2] k=1
answer = 15 # 5+10
It's possible to solve this with space O(k) and time O(nk). If k is a constant, this fits the requirements in your question.
The algorithm loops from position k + 1 to n. (If the array is shorter than that, it can obviously be solved in O(k).) At each step, it maintains an array best of length k + 1, such that the jth entry of best is the best solution found so far whose last used element is at least j positions to the left of the current position.
Initializing best is done by setting, for entry j, the largest non-negative entry of the array in positions 1, ..., k + 1 - j. So, for example, best[1] is the largest non-negative entry in positions 1, ..., k, and best[k + 1] is 0.
When at position i of the array, element i is either used or not. If it is used, the relevant best so far is best[k + 1] (the previously used element must be at least k + 1 to the left), so write u = max(best[k + 1] + a[i], best[1]). In either case, each "at least" constraint then shifts by one: for j = k + 1 down to 2, best[j] = max(best[j], best[j - 1]). Finally, set best[1] = u.
At the termination of the algorithm, the solution is the largest item in best.
EDIT:
I had misunderstood the question; if you need to have at least k elements in between, then the following is an O(n^2) solution.
If the numbers are non-negative, then the DP recurrence relation is:
DP[i] = max (DP[j] + A[i]) For all j st 0 <= j < i - k
= A[i] otherwise.
If there are negative numbers in the array as well, then we can use the idea from Kadane's algorithm:
DP[i] = max (DP[j] + A[i]) For all j st 0 <= j < i - k && DP[j] + A[i] > 0
= max(0,A[i]) otherwise.
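A minimal sketch of that recurrence (my own function name; it also allows the empty selection, so an all-negative input yields 0):
def max_sum_at_least_k_apart(A, k):
    n = len(A)
    if n == 0:
        return 0
    DP = [0] * n                # DP[i] = best sum of a valid selection ending exactly at i
    for i in range(n):
        best_prev = 0           # selecting nothing before i is always allowed
        for j in range(i - k):  # all j with j < i - k, i.e. at least k elements in between
            best_prev = max(best_prev, DP[j])
        DP[i] = best_prev + A[i]
    return max(0, max(DP))      # the empty selection is allowed

print(max_sum_at_least_k_apart([5, 6, 4, 1, 2], 1))   # 11
print(max_sum_at_least_k_apart([5, 6, 4, 1, 2], 2))   # 8
print(max_sum_at_least_k_apart([5, 3, 4, 10, 2], 1))  # 15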
Here's a quick implementation of the algorithm described by Ami Tavory (as far as I understand it). It should work for any sequence, though if your list is all negative, the maximum sum will be 0 (the sum of an empty subsequence).
import collections

def max_sum_separated_by_k(iterable, k):
    best = collections.deque([0]*(k+1), k+1)
    for item in iterable:
        best.appendleft(max(item + best[-1], best[0]))
    return best[0]
This uses O(k) space and O(N) time. All of the deque operations, including appending a value to one end (and implicitly removing one from the other end so the length limit is maintained) and reading from the ends, are O(1).
If you want the algorithm to return the maximum subsequence (rather than only its sum), you can change the initialization of the deque to start with empty lists rather than 0, and then append max([item] + best[-1], best[0], key=sum) in the body of the loop. That will be quite a bit less efficient though, since it adds O(N) operations all over the place.
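A minimal sketch of that variant (hypothetical name; it appends each item at the right end so the subsequence comes back in the original order):
import collections

def max_subsequence_separated_by_k(iterable, k):
    # Each slot holds the best subsequence itself instead of just its sum.
    best = collections.deque([[] for _ in range(k + 1)], k + 1)
    for item in iterable:
        best.appendleft(max(best[-1] + [item], best[0], key=sum))
    return best[0]

print(max_subsequence_separated_by_k([5, 6, 4, 1, 2], 1))  # [5, 4, 2]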
Not sure about the complexity, but going for coding brevity landed me with
max([sum(l[i::j]) for j in range(k + 1, len(l) + 1) for i in range(len(l))])
(I've replaced the list variable with l so as not to shadow the built-in.)

Better algorithm (than using a dict) for enumerating pairs with a given sum.

Given a number, I have to find all possible index pairs in a given array whose elements sum to that number. I am currently using the following algorithm:
def myfunc(array, num):
    dic = {}
    for x in xrange(len(array)):        # if 6 is the current key,
        if dic.has_key(num-array[x]):   # look at whether num-x is there in dic
            for y in dic[num-array[x]]: # if yes, print all key-pair values
                print (x, y),
        if dic.has_key(array[x]):       # check whether the current keyed value exists
            dic[array[x]].append(x)     # if so, append the index to the list of indexes for that keyed value
        else:
            dic[array[x]] = [x]         # else create a new list
Will this run in O(N) time? If not, then what should be done to make it so? And in any case, will it be possible to make it run in O(N) time without using any auxiliary data structure?
Will this run in O(N) time?
Yes and no. The complexity is actually O(N + M) where M is the output size.
Unfortunately, the output size is O(N^2) in the worst case; for example, the array [3,3,3,3,3,...,3] with number == 6 forces a quadratic number of pairs to be produced.
However, asymptotically speaking, it cannot be done better than this, because the work is linear in the input size plus the output size.
Here is a very, very simple solution that does run in O(N) time, by yielding each index together with the list of matching indices rather than expanding every pair. If you want to enumerate all the output pairs, then of course (as amit notes) it must take O(N^2) in the worst case.
from collections import defaultdict

def findpairs(arr, target):
    flip = defaultdict(list)
    for i, j in enumerate(arr):
        flip[j].append(i)
    for i, j in enumerate(arr):
        if target-j in flip:
            yield i, flip[target-j]
Postprocessing to get all of the output values (and filter out (i,i) answers):
def allpairs(arr, target):
    for i, js in findpairs(arr, target):
        for j in js:
            if i < j:
                yield (i, j)
This might help - Optimal Algorithm needed for finding pairs divisible by a given integer k
(With a slight modification: there we are looking for all pairs whose sum is divisible by a given number, not necessarily equal to it.)
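For reference, a minimal sketch (my own code, not taken from that linked answer) of the remainder-counting idea for counting pairs whose sum is divisible by a given k:
from collections import Counter

def count_pairs_with_sum_divisible_by(arr, k):
    rem = Counter(x % k for x in arr)                  # bucket values by remainder mod k
    count = rem[0] * (rem[0] - 1) // 2                 # remainder 0 pairs with itself
    if k % 2 == 0:
        count += rem[k // 2] * (rem[k // 2] - 1) // 2  # remainder k/2 pairs with itself
    for r in range(1, (k + 1) // 2):
        count += rem[r] * rem[k - r]                   # complementary remainders
    return count

print(count_pairs_with_sum_divisible_by([1, 2, 3, 4, 5, 6], 3))  # 5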
