Space complexity of a dictionary in Python

It's a LeetCode question:
https://leetcode.com/problems/find-the-duplicate-number/
Here they are saying:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
So in my code I am creating a dictionary using Counter from Python's collections module.
How is my code satisfying the line "You must use only constant, O(1) extra space", and what do they mean by it; are they talking about space complexity? Below is my code, which passes all test cases.
from typing import List  # provided implicitly by the LeetCode environment
from collections import Counter

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        dict1 = Counter(nums)
        for i in dict1:
            if dict1[i] > 1:
                return i
Please help. Thanks in advance.

Generally, a dictionary always has a space complexity of O(N), because its size depends on the number of elements in your array.
A space complexity of O(1) means that you use the same, fixed number of extra variables regardless of the array size. For instance, if your search algorithm only needed a boolean variable to find the duplicate, that would be O(1).
Side note:
Runtime complexity is a different matter: for a dictionary it is O(1), since dictionaries are based on hash tables where you only need the key to get the value. Conversely, to find a particular value in a list, the runtime complexity is O(N), since in the worst case you have to iterate over all the elements.
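As a small illustration of that side note (my own example, not part of the original answer), the same membership test behaves very differently on a list and on a set:

data_list = list(range(100000))
data_set = set(data_list)

# Worst case on a list: the value sits at the end, so the whole list is scanned -> O(N)
print(99999 in data_list)   # True, but only after ~100,000 comparisons
# Average case on a set/dict: one hash lookup -> O(1)
print(99999 in data_set)    # True, found via its hash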

Dictionaries take O(n) space, so your solution takes O(n) space and violates the O(1) space requirement.
This is an old LeetCode problem, from when LeetCode's focus was on job interviews, where such requirements can come up and be discussed (and used to be discussed in LeetCode's forum). It was never enforced by the LeetCode system, which is why your solution gets accepted despite violating the requirement. By now LeetCode is competition-focused and has become just like any other coding-challenge site: it only matters whether you get your solution accepted, not how. They still don't (can't?) enforce such space requirements, and I think their new questions don't ask for something like that anymore. I miss the old days.

Your main question has been answered already. For this problem, we'd binary search over the range of possible values:

class Solution:
    def findDuplicate(self, nums):
        lo, hi = 0, len(nums) - 1
        mid = (lo + hi) // 2
        while hi - lo > 1:
            # count how many values fall into the upper half of the range, (mid, hi]
            count = 0
            for num in nums:
                if mid < num <= hi:
                    count += 1
            if count > hi - mid:
                # more values than slots in (mid, hi], so the duplicate is in the upper half
                lo = mid
            else:
                hi = mid
            mid = (lo + hi) // 2
        return hi
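For reference, a quick sanity check of that solution outside the LeetCode harness (my own example input; 2 is the repeated value):

print(Solution().findDuplicate([1, 3, 4, 2, 2]))   # -> 2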
References
For additional details, you can see the Discussion Board. There are plenty of accepted solutions with a variety of languages and explanations, efficient algorithms, and asymptotic time/space complexity analysis in there.

Related

How to write "in" using its fundamental brute-force steps?

I was looking to solve the Longest Consecutive Sequence question on LeetCode, and this is the provided solution.
The question is about the inner loop, right after the comment # how to rewrite this part without in?
class Solution:
    def longestConsecutive(self, nums):
        longest_streak = 0
        for num in nums:
            current_num = num
            current_streak = 1
            # how to rewrite this part without in?
            while current_num + 1 in nums:
                current_num += 1
                current_streak += 1
            longest_streak = max(longest_streak, current_streak)
        return longest_streak
I wrote a version of the inner loop that doesn't use in, like this:
while j < n and i != j:
    if nums[j] == currentSequenceNumber + 1:
        currentSequenceLength += 1
        currentSequenceNumber = nums[j]
    j += 1
I realized after running pdb that this approach only works for 2 consecutive numbers but not more. How could I rewrite my original portion to keep checking without using in? I have a feeling that in uses a similar approach to find when it comes to sequences. I have seen this link for find in strings, but it is not the brute-force approach that I would like to write out.
I think seeing how this can be rewritten would clarify why the time complexity is O(n^3), as the solution states. I currently can't understand why from their explanation.
I would write this as a comment but I lack rep (!). I think the time complexity of in depends on the structure it is applied to: for example, if it is done on a list it has to search every item in the list in series, but for a set or dict it can look up by hash.
So I can't tell you how to write in unless I know what type nums is (a good use case for type hints here also). However:
list: average O(n)
set: average O(1) worst O(n)
BTW, sets are awesome and a great reason to use Python. Oftentimes people use lists all over the place and wonder why their code doesn't scale...
EDIT: due to stupid rules I still can't comment, so leaving this here. No, I'm not going to write a for, if, break loop for you, it's absolutely trivial.
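For completeness, here is one way that for/if/break rewrite could look; this is my own sketch, not part of the original answer, and it assumes nums is a plain list (which is exactly why the overall brute force ends up cubic: outer loop times while loop times one linear scan per step).

while True:
    found = False
    for x in nums:              # this linear scan is what `current_num + 1 in nums` does on a list
        if x == current_num + 1:
            found = True
            break
    if not found:               # the next consecutive value is not present; the streak ends
        break
    current_num += 1
    current_streak += 1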

Most efficient way to find the mode in an array using Python? Return type is an array of integers

Here is my solution, which works in O(N) time and O(N) space:
def find_mode(array):
    myDict = {}
    result = []
    for i in range(len(array)):
        if array[i] in myDict:
            myDict[array[i]] += 1
        else:
            myDict[array[i]] = 1
    maximum = max(myDict.values())
    for key, value in myDict.items():
        if value == maximum:
            result.append(key)
    return result
I can't think of a more efficient solution than O(N) but if anyone has any improvements to this function please let me know. The return type is an array of integers.
First, you should note that O(n) worst-case time cannot be improved upon with a deterministic, non-randomized algorithm, since we may need to check all elements.
Second, since you want all modes, not just one, the best space complexity of any possible algorithm is O(|output|), not O(1).
Third, this is as hard as the Element distinctness problem. This implies that any algorithm that is 'expressible' in terms of decision trees only, can at best achieve Omega(n log n) runtime. To beat this, you need to be able to hash elements or use numbers to index the computer's memory or some other non-combinatorial operation. This isn't a rigorous proof that O(|output|) space complexity with O(n) time is impossible, but it means you'll need to specify a model of computation to get a more precise bound on runtime, or specify bounds on the range of integers in your array.
Lastly, and most importantly, you should profile your code if you are worried about performance. If this is truly the bottleneck in your program, then Python may not be the right language to achieve the absolute minimum number of operations needed to solve this problem.
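As a small illustration of the profiling advice above (my own sketch, not from the original answer), the standard library's timeit module is enough for a quick check:

import random
import timeit

sample = [random.randint(0, 9) for _ in range(10000)]          # made-up test data
print(timeit.timeit(lambda: find_mode(sample), number=100))    # total seconds for 100 runs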
Here's a more Pythonic approach, using the standard library's very useful collections.Counter(). The Counter initialization (in CPython) is usually done through a C function, which will be faster than your for loop. It is still O(n) time and space, though.
import collections
from typing import List

def find_mode(array: List[int]) -> List[int]:
    counts = collections.Counter(array)
    maximum = max(counts.values())
    return [key for key, value in counts.items()
            if value == maximum]
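A quick hand-checked usage example (my own, not from the original answer); both versions of find_mode agree on it:

print(find_mode([1, 2, 2, 3, 3]))   # -> [2, 3], both values appear twice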

O(N) Time complexity for simple Python function

I just took a Codility demo test. The question and my answer can be seen here, but I'll paste my answer here as well. My response:
def solution(A):
    # write your code in Python 2.7
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i == retresult:
                retresult += 1  # increment the result since the current result exists in the array
            elif i > retresult:
                break  # we can go out of the loop since we found a bigger number than our current positive integer result
    return retresult
My question is about time complexity, which I hope to better understand through your response. The question asks for an expected worst-case time complexity of O(N).
Does my function have O(N) time complexity? Does the fact that I sort the array increase the complexity, and if so how?
Codility reports (for my answer)
Detected time complexity:
O(N) or O(N * log(N))
So, what is the complexity for my function? And if it is O(N*log(N)), what can I do to decrease the complexity to O(N) as the problem states?
Thanks very much!
p.s. my background reading on time complexity comes from this great post.
EDIT
Following the reply below, and the answers described here for this problem, I would like to expand on this with my take on the solutions:
basicSolution has an expensive time complexity and so is not the right answer for this Codility test:
def basicSolution(A):
    # O(N*log(N)) time complexity
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i == retresult:
                retresult += 1  # increment the result since the current result exists in the array
            elif i > retresult:
                break  # we can go out of the loop since we found a bigger number than our current positive integer result
        else:
            continue  # negative numbers and 0 don't need any work
    return retresult
hashSolution is my take on what is described in the above article, in the "use hashing" paragraph. As I am new to Python, please let me know if you have any improvements to this code (it does work against my test cases, though), and what time complexity it has.
def hashSolution(A):
    # O(N) time complexity, I think? But it requires O(N) extra space
    # (the requirement states that O(N) space may be used)
    table = {}
    for i in A:
        if i > 0:
            table[i] = True  # collision/duplicate will just overwrite
    for i in range(1, 100000 + 1):  # the problem says that the array has a maximum of 100,000 integers
        if not table.get(i):
            return i
    return 1  # default
Finally, there is the actual O(N) solution (O(N) time and O(1) extra space), which I am having trouble understanding. I understand that negative/0 values are pushed to the back of the array, and then we have an array of just positive values. But I do not understand the findMissingPositive function; could anyone please describe it with Python code/comments? With an example, perhaps? I've been trying to work through it in Python and just cannot figure it out :(
It does not, because you sort A.
The Python list.sort() method uses Timsort (named after Tim Peters) and has a worst-case time complexity of O(N log N).
Rather than sort your input, you'll have to iterate over it and determine if any integers are missing by some other means. I'd build a set from a range() object:
def solution(A):
    expected = set(range(1, len(A) + 1))
    for i in A:
        expected.discard(i)
    if not expected:
        # all consecutive digits for len(A) were present, so next is missing
        return len(A) + 1
    return min(expected)
This is O(N); we create a set of len(A) (O(N) time), then we loop over A, removing elements from expected (again O(N) time, removing elements from a set is O(1)), then test for expected being empty (O(1) time), and finally get the smallest element in expected (at most O(N) time).
So we make at most 3 O(N) time steps in the above function, making it a O(N) solution.
This also fits the storage requirement: all we use is a set of at most N elements. Sets have some per-element overhead, but the total space is still O(N).
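For illustration, two hand-checked calls (my own examples, not from the original answer):

print(solution([1, 3, 6, 4, 1, 2]))   # -> 5, the smallest positive integer not in the list
print(solution([1, 2, 3]))            # -> 4, since 1..3 are all present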
The hash solution you found is based on the same principle, except that it uses a dictionary instead of a set. Note that the dictionary values are never actually used; they are either set to True or absent. I'd rewrite that as:
def hashSolution(A):
    seen = {i for i in A if i > 0}
    if not seen:
        # there were no positive values, so 1 is the first missing.
        return 1
    for i in range(1, 10**5 + 1):
        if i not in seen:
            return i
    # we can never get here because the inputs are limited to integers up to
    # 100,000. So either `seen` has a limited number of positive values below
    # 100,000 or none at all.
The above avoids looping all the way to 100,000 if there were no positive integers in A.
The difference between mine and theirs is that mine starts with the set of expected numbers, while they start with the set of positive values from A, inverting the storage and test.
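Since the question's last paragraph asks specifically about the findMissingPositive step, here is a minimal sketch of the usual sign-marking trick behind it; this is my own reconstruction, not part of the original answer, and it assumes the array may be modified and that positives holds only the values greater than zero (the front part left after segregating out negatives and zeros):

def find_missing_positive(positives):
    n = len(positives)
    for v in positives:
        idx = abs(v) - 1                       # the value v "claims" slot idx
        if idx < n and positives[idx] > 0:
            positives[idx] = -positives[idx]   # negate to mark "value idx + 1 was seen"
    for i in range(n):
        if positives[i] > 0:                   # slot i was never claimed
            return i + 1
    return n + 1                               # 1..n are all present

print(find_missing_positive([1, 3, 2, 5]))     # -> 4

This runs in O(N) time and uses O(1) extra space, which is what the question calls "the actual O(N) solution".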

Recursive algorithm using memoization

My problem is as follows:
I have a list of missions each taking a specific amount of time and grants specific amount of points, and a time 'k' given to perform them:
e.g: missions = [(14,3),(54,5),(5,4)] and time = 15
in this example I have 3 missions and the first one gives me 14 points and takes 3 minutes.
I have 15 minutes total.
Each mission is a tuple with the first value being num of points for this mission and second value being num of minutes needed to perform this mission.
I have to find, recursively and using memoization, the maximum number of points I am able to get for a given list of missions and a given time.
I am trying to implement a function called choose(missions, time) that operates recursively and uses the function choose_mem(missions, time, mem, k) to achieve my goal.
The function choose_mem should get k, which is the number of missions to choose from, and mem, an initially empty dictionary that will contain all the sub-problems that have already been solved.
This is what I have so far. I need help implementing what is required above, namely the dictionary usage (which is currently just there and stays empty all the time), and also the fact that my choose_mem function's input is i, j, missions, d when it should be choose_mem(missions, time, mem, k), where mem = d and k is the number of missions to choose from.
If anyone can help me adjust my code it would be very appreciated.
mem = {}

def choose(missions, time):
    j = time
    result = []
    for i in range(len(missions), 0, -1):
        if choose_mem(missions, j, mem, i) != choose_mem(missions, j, mem, i - 1):
            j -= missions[i - 1][1]
    return choose_mem(missions, time, mem, len(missions))

def choose_mem(missions, time, mem, k):
    if k == 0:
        return 0
    points, a = missions[k - 1]
    if a > time:
        return choose_mem(missions, time, mem, k - 1)
    else:
        return max(choose_mem(missions, time, mem, k - 1),
                   choose_mem(missions, time - a, mem, k - 1) + points)
This is a bit vague, but your problem roughly translates to a very famous NP-complete problem, the Knapsack Problem.
You can read a bit more about it on Wikipedia; if you replace weight with time, you have your problem.
Dynamic programming is a common way to approach that problem, as you can see here:
http://en.wikipedia.org/wiki/Knapsack_problem#Dynamic_programming
Memoization is more or less equivalent to dynamic programming for practical purposes, so don't let the fancy name fool you.
The base concept is that you use an additional data structure to store parts of your problem that you already solved. Since the solution you're implementing is recursive, many sub-problems will overlap, and memoization allows you to only calculate each of them once.
So, the hard part is for you to think about your problem and what you need to store in the dictionary, so that when you call choose_mem with values you have already calculated, you simply retrieve them from the dictionary instead of doing another recursive call.
If you want to check an implementation of the generic 0-1 Knapsack Problem (your case, since you can't add items partially), then this seemed to me like a good resource:
https://sites.google.com/site/mikescoderama/Home/0-1-knapsack-problem-in-p
It's well explained, and the code is readable enough. If you understand the usage of the matrix to store costs, then you'll have your problem worked out for you.
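For concreteness, here is one way the mem dictionary could be wired into the question's choose_mem; this is a sketch of my own, not taken from the linked resource, keyed on the (k, time) pair because those two values fully determine a sub-problem:

def choose_mem(missions, time, mem, k):
    if k == 0:
        return 0
    if (k, time) in mem:                   # this sub-problem was already solved
        return mem[(k, time)]
    points, minutes = missions[k - 1]
    if minutes > time:
        best = choose_mem(missions, time, mem, k - 1)
    else:
        best = max(choose_mem(missions, time, mem, k - 1),
                   choose_mem(missions, time - minutes, mem, k - 1) + points)
    mem[(k, time)] = best                  # remember the answer for later calls
    return best

print(choose_mem([(14, 3), (54, 5), (5, 4)], 15, {}, 3))   # -> 73 for the example above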

Creating a hash-based sorting algorithm

For experimental and learning purposes, I was trying to create a sorting algorithm from a hash function that gives a value based on the alphabetical sequence of the string; it would then ideally place each string in the right position from that hash. I tried looking for a hash-based sorting function, but only found one for integers, and it would be a memory hog if adapted for my purposes.
The reasoning is that, theoretically, if done right this algorithm could achieve O(n) speed, or nearly so.
So here is what I have worked out in Python so far:
letters = {'a':0,'b':1,'c':2,'d':3,'e':4,'f':5,'g':6,'h':7,'i':8,'j':9,
           'k':10,'l':11,'m':12,'n':13,'o':14,'p':15,'q':16,'r':17,
           's':18,'t':19,'u':20,'v':21,'w':22,'x':23,'y':24,'z':25,
           'A':0,'B':1,'C':2,'D':3,'E':4,'F':5,'G':6,'H':7,'I':8,'J':9,
           'K':10,'L':11,'M':12,'N':13,'O':14,'P':15,'Q':16,'R':17,
           'S':18,'T':19,'U':20,'V':21,'W':22,'X':23,'Y':24,'Z':25}

def sortlist(listToSort):
    listLen = len(listToSort)
    newlist = []
    for i in listToSort:
        k = letters[i[0]]
        for j in i[1:]:
            k = (k * 26) + letters[j]
        norm = k / pow(26, len(i))  # get a float hash that is normalized (I think that's what it is called)
        # 2nd part
        idx = int(norm * len(newlist))  # get a general idea of where it should go
        if newlist:  # find the right place starting from idx
            if norm < newlist[idx][1]:
                while norm < newlist[idx][1] and idx > 0:
                    idx -= 1
                if norm > newlist[idx][1]:
                    idx += 1
            else:
                while norm > newlist[idx][1] and idx < (len(newlist) - 1):
                    idx += 1
                if norm > newlist[idx][1]:
                    idx += 1
        newlist.insert(idx, [i, norm])  # put it in the right place with the "norm" to reference later when sorting
    return newlist
I think that the 1st part is good, but the 2nd part needs help. So the questions would be: what would be the best way to do something like this, and is it even possible to get O(n) time (or near that) out of this?
The testing I did with an 88,000-word list took probably about 5 minutes, and 10,000 words took about 30 seconds; it got a lot worse as the list count went up.
If this idea actually works out, then I would recode it in C to get some real speed and optimizations.
The 2nd part is there only because it works, even if slowly, and I can't think of a better way to do it for the life of me; I would like to replace it with something that does not have to do the extra loops, if at all possible.
Thanks for any advice or ideas that you could give.
On sorting in O(n): you can't do it generally for all inputs, period. It is simply, fundamentally, mathematically impossible.
Here's the nice, short information-theoretic proof of impossibility: to sort, you have to be able to distinguish among the n! possible orderings of the input; to do so, you have to obtain log2(n!) bits of information; to do that, you need at least log2(n!) comparisons, which by Stirling's approximation is Θ(n log n). Any sorting algorithm that claims to run in O(n) is either running on specialized data (e.g. data with a fixed number of bits), or is not correct.
Implementing a sorting algorithm is a good learning exercise, but you may want to stick to existing algorithms until you are comfortable with the concepts and methods commonly employed. It might be rather frustrating otherwise if the algorithm doesn't work.
Have fun learning!
P.S. Python's built-in timsort algorithm is really good on a lot of real-world data. So, if you need a general sorting algorithm for production code, you can usually rely on .sort/sorted to be fast enough for your needs. (And, if you can understand timsort, you'll do better than 90% of the Python-wielding population :)
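If the goal is simply a fast sort keyed on that letter value rather than a from-scratch algorithm, a minimal sketch (my own, not from the original answer) is to reuse the question's letters table and normalized key and let the built-in sorted() (Timsort, O(n log n) comparisons) do the placing. Note that words differing only by trailing 'a's get equal keys and keep their input order:

def letter_key(word):
    k = letters[word[0]]              # `letters` is the table from the question above
    for ch in word[1:]:
        k = k * 26 + letters[ch]
    return k / pow(26, len(word))     # normalized float in [0, 1), same as the question computes

def sortlist_builtin(listToSort):
    return sorted(listToSort, key=letter_key)

print(sortlist_builtin(['banana', 'Apple', 'cherry']))   # -> ['Apple', 'banana', 'cherry']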
