I am trying to wrap my head around time complexity when it comes to algorithm design using Python.
I've been tasked with writing a function that meets the following requirements:
Must run in linear O(n) time
Must return the nth smallest number from a list of random numbers
I have found the following example online:
def nsmallest(numbers, nth):
    result = []
    for i in range(nth):
        result.append(min(numbers))
        numbers.remove(min(numbers))
    return result
As I understand it, Big-O is an approximation, and only the dominant part of the function is considered when analyzing its time complexity.
So my question is:
Does calling min() within the loop influence the time complexity or does the function remain O(n) because min() executes in O(n) time?
Also, would adding another loop (not nested) to further parse the resulting list for the specific number keep the algorithm in linear time even if it contains two or three more constant operations per loop?
Does calling min() within the loop influence the time complexity or does the function remain O(n) because min() executes in O(n) time?
Yes, it does. min() takes O(N) time to run, and if you call it inside a loop that itself runs O(N) times, the total time becomes O(N^2).
Also, would adding another loop (not nested) to further parse the resulting list for the specific number keep the algorithm in linear time even if it contains two or three more constant operations per loop?
Depends on what your loop does. Since you haven't mentioned what that loop does, it is difficult to guess.
min() has O(n) complexity (it does a linear search).
list.remove() is also O(n).
So each loop iteration is O(n).
You run the loop k times (and k can be as large as n), so the resulting complexity is O(kn), i.e. O(n^2) in the worst case.
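As an aside, a minimal sketch using the standard library: heapq.nsmallest keeps a heap of size k while scanning the list once, so it runs in roughly O(n log k) rather than O(kn).

import heapq
import random

numbers = [random.randint(0, 100) for _ in range(20)]

# heapq.nsmallest returns the k smallest values in sorted order.
k_smallest = heapq.nsmallest(5, numbers)
print(k_smallest)      # the five smallest values
print(k_smallest[-1])  # the 5th smallest value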
The idea of the worst-case linear-time algorithm you are looking for is described here (for example):
kthSmallest(arr[0..n-1], k)
1) Divide arr[] into ⌈n/5⌉ groups, each of size 5 except possibly the last group, which may have fewer than 5 elements.
2) Sort each of the ⌈n/5⌉ groups and find its median. Store the medians in an auxiliary array median[0..⌈n/5⌉-1].
3) Recursively find the median of the medians:
   medOfMed = kthSmallest(median[0..⌈n/5⌉-1], ⌈n/10⌉)
4) Partition arr[] around medOfMed and obtain its position:
   pos = partition(arr, n, medOfMed)
5) If pos == k, return medOfMed.
6) If pos > k, return kthSmallest(arr[l..pos-1], k).
7) If pos < k, return kthSmallest(arr[pos+1..r], k-pos+l-1).
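For reference, here is a compact Python sketch of that scheme. It is illustrative rather than production code: it partitions with list comprehensions instead of in place, so it uses O(n) extra space, but the selection logic mirrors the numbered steps (k is 1-based):

def kth_smallest(arr, k):
    if len(arr) == 1:
        return arr[0]

    # 1)-2) Split into groups of 5 and take the median of each group.
    groups = [arr[i:i + 5] for i in range(0, len(arr), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # 3) Recursively find the median of the medians (the pivot).
    med_of_med = kth_smallest(medians, (len(medians) + 1) // 2)

    # 4) Partition around the pivot.
    lows = [x for x in arr if x < med_of_med]
    highs = [x for x in arr if x > med_of_med]
    pivots = [x for x in arr if x == med_of_med]

    # 5)-7) Recurse into the side that contains the k-th smallest.
    if k <= len(lows):
        return kth_smallest(lows, k)
    if k <= len(lows) + len(pivots):
        return med_of_med
    return kth_smallest(highs, k - len(lows) - len(pivots))

print(kth_smallest([7, 10, 4, 3, 20, 15], 3))  # 7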
Related
I'm looking to iterate over every third element in my list. But in thinking about Big-O notation, would the Big-O complexity be O(n) where n is the number of elements in the list, or O(n/3) for every third element?
In other words, even if I specify that the list should only be iterated over every third element, is Python still looping through the entire list?
Example code:
def function(lst):
    # iterate over every third element of the list
    for i in lst[2::3]:
        pass
When using Big-O notation we ignore any scalar multiples in front of the functions, because the algorithm still takes "linear time". We do this because Big-O notation describes the behaviour of an algorithm as it scales to large inputs.
It therefore doesn't matter whether the algorithm looks at every element of the list or only every third element: the time complexity still scales linearly with the input size. For example, if the input size is doubled, the loop takes twice as long to execute, whether you are looking at every element or every third element.
Mathematically we can say this because of the constant M in the definition (https://en.wikipedia.org/wiki/Big_O_notation):
f(x) = O(g(x)) if there is a constant M such that abs(f(x)) <= M * abs(g(x)) for all sufficiently large x
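As a concrete instance: take f(n) = n/3 (one step per third element) and g(n) = n. Choosing M = 1 gives abs(n/3) <= 1 * n for all n >= 1, so f(n) = O(n); the factor of 1/3 is simply absorbed into M.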
Big O notation would remain O(n) here.
Consider the following:
n = some big number
for i in range(n):
print(i)
print(i)
print(i)
Does doing 3 actions count as O(3n) or O(n)? O(n). Does the real world performance slow down by doing three actions instead of one? Absolutely!
Big O notation is about looking at the growth rate of the function, not about the physical runtime.
Consider the following from the pandas library:
import pandas as pd

df = pd.DataFrame([{"a": 4}, {"a": 3}, {"a": 2}, {"a": 1}])

# simple iteration over the index, O(n)
for idx in df.index:
    print(df.loc[idx, "a"])

# iterrows iteration, O(n)
for idx, row in df.iterrows():
    print(row["a"])

# apply/lambda iteration, O(n)
df.apply(lambda row: print(row["a"]), axis=1)
All of these implementations can be considered O(n) (the constant is dropped); however, that doesn't necessarily mean the runtime will be the same. In fact, method 3 should be about 800 times faster than method 1 (https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06)!
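If you want to see the constant-factor gap yourself, here is a rough timing sketch (numbers will vary by machine and pandas version; the column name "a" is just an example):

import timeit
import pandas as pd

df = pd.DataFrame({"a": range(10_000)})

def use_iterrows():
    return sum(row["a"] for _, row in df.iterrows())

def use_apply():
    return df.apply(lambda row: row["a"], axis=1).sum()

def use_vectorized():
    return df["a"].sum()

# All three are O(n), but the constant factors differ enormously in practice.
for fn in (use_iterrows, use_apply, use_vectorized):
    print(fn.__name__, timeit.timeit(fn, number=3))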
Another answer that may help you: Why is the constant always dropped from big O analysis?
Which is the best algorithm for a plain search in an array whose length is not known:
O(n) (a simple for loop)
or
O(n log n) (sorted(array)) in Python?
Correct me if the time complexities are wrong, and please add any useful additional information.
Thanks in advance
It depends on how often you'll be doing the search.
Looking for a specific item in an unsorted list takes O(n) time.
Looking for a specific item in a sorted list takes O(lg n) time.
Turning an unsorted list into a sorted list takes O(n lg n) time.
Doing a single linear search in an unsorted list is faster than sorting the list in order to do a single binary search.
But doing multiple linear searches in an unsorted list can be much slower than sorting the list once and doing multiple binary searches in the sorted list.
When will it be faster overall to spend the time sorting the list to speed up future searches, compared to just biting the bullet and doing linear searches? You can only answer that by carefully considering the size of the list and trying to anticipate your likely workload.
Consider k searches. Sorting comes out ahead if (using some very handwavy math) n lg n + k lg n is less than k n. That's true roughly once k starts to exceed lg n. Given how slowly lg n grows, it typically does not take long for the initial investment in sorting to pay off.
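A small sketch of that trade-off, using the standard bisect module (the helper name contains is just for illustration):

import bisect

data = [42, 7, 19, 3, 88, 54, 21]

# One-off lookup: a linear scan is O(n) and needs no preparation.
print(19 in data)  # True

# Many lookups: pay O(n lg n) once to sort, then each search is O(lg n).
sorted_data = sorted(data)

def contains(sorted_lst, x):
    i = bisect.bisect_left(sorted_lst, x)
    return i < len(sorted_lst) and sorted_lst[i] == x

for query in (19, 100, 3):
    print(query, contains(sorted_data, query))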
Yes, you are correct:
For a single for loop over n integers, the time complexity is O(n):
for (int i = 0; i < n; i++) { ... }
If we have two for loops, one nested inside the other, then it is O(n^2); the classic example is bubble sort:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        ...
    }
}
If i is halved on every iteration, the complexity picks up a logarithmic factor; classic examples are merge sort and binary search:
for (int i = n; i > 1; i = i / 2) { ... }
Searching for an item in a sorted list takes O(log n) time; binary search is the standard method.
Converting an unsorted list into a sorted list takes O(n*log n) time, e.g. merge sort or heap sort.
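A quick way to convince yourself of the logarithmic case is to count the iterations of a halving loop in Python:

# The loop below runs roughly log2(n) times, not n times.
n = 1_000_000
steps = 0
i = n
while i > 1:
    i //= 2
    steps += 1
print(steps)  # about 20, since 2**20 is roughly 1,000,000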
I am learning how to traverse a 2D matrix spirally, and I came across this following algorithm:
def spiralOrder(self, matrix):
    result = []
    while matrix:
        result.extend(matrix.pop(0))
        # Note: this line is Python 2; in Python 3 zip() returns an iterator,
        # so you would need matrix = list(zip(*matrix))[::-1]
        matrix = zip(*matrix)[::-1]
    return result
I am currently having a hard time figuring out the time complexity of this algorithm, with the zip call being inside the while loop.
It would be greatly appreciated if anyone could help me figure out the time complexity with explanations.
Thank you!
The known time complexity for this problem is O(M×N), where M is the number of rows and N the number of columns of the matrix. This is an elegant-looking algorithm, but it may well be slower than that.
Looking at it more closely, every iteration of the loop performs the following operations:
pop(0)        # O(R), where R is the number of remaining rows: popping from the front shifts the rest
extend(...)   # O(k), where k is the number of elements added in that call
zip(*matrix)  # O(j), where j is the number of remaining elements: unpacking and re-zipping touches each once
              # (Python 2.7 builds the list directly; Python 3 would need list(zip(*matrix)))
[::-1]        # O(length of the zipped result): a reversed copy (a reverse, not a sort)
Regardless of how many loop iterations there are, by the time this completes result.extend will have been called on every element (M×N of them) in the matrix, so the best case is O(M×N).
Where I am less sure is how much time the repeated zips and list reversals are adding. The loop is only getting called roughly M+N-1 times but the zip/reverse is done on (M-1) * N elements and then on (M-1) * (N-1) elements, and so on. My best guess is that this type of function is at least logarithmic so I would guess overall time complexity is somewhere around O(MxN log(MxN)).
https://wiki.python.org/moin/TimeComplexity
No matter how you traverse a 2D matrix, you have to visit every entry, so the time complexity is proportional to the number of entries.
An m×n matrix therefore takes O(mn) time to traverse, regardless of whether the order is spiral or row-major.
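For comparison, here is a sketch of a boundary-walk spiral traversal (my own illustrative version, not the code from the question) where it is obvious that each element is appended exactly once, i.e. O(M×N) time:

def spiral_order(matrix):
    # Walk the outer boundary layer by layer; every element is appended once.
    if not matrix:
        return []
    result = []
    top, bottom = 0, len(matrix) - 1
    left, right = 0, len(matrix[0]) - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):              # top row, left to right
            result.append(matrix[top][c])
        for r in range(top + 1, bottom + 1):          # right column, top to bottom
            result.append(matrix[r][right])
        if top < bottom and left < right:
            for c in range(right - 1, left - 1, -1):  # bottom row, right to left
                result.append(matrix[bottom][c])
            for r in range(bottom - 1, top, -1):      # left column, bottom to top
                result.append(matrix[r][left])
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return result

print(spiral_order([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [1, 2, 3, 6, 9, 8, 7, 4, 5]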
def check_set(S, k):
    S2 = k - S
    set_from_S2 = set(S2.flatten())
    for x in S:
        if x in set_from_S2:
            return True
    return False
I have a given integer k. I want to check whether k is equal to the sum of two elements of the array S.
import numpy as np

S = np.array([1, 2, 3, 4])
k = 8
It should return False in this case because no two elements of S sum to 8. The code above effectively treats 8 = 4 + 4 (reusing the same element), so it returns True.
I can't find an algorithm to solve this problem with complexity of O(n).
Can someone help me?
You have to account for multiple instances of the same item, so a set is not a good choice here.
Instead you can use a dictionary that maps each value to the number of times it occurs (or, as a variant, collections.Counter; a Counter-based version is sketched after the code below).
A = [3, 1, 2, 3, 4]
Cntr = {}
for x in A:
    if x in Cntr:
        Cntr[x] += 1
    else:
        Cntr[x] = 1

# k = 11
k = 8

ans = False
for x in A:
    if (k - x) in Cntr:
        if k == 2 * x:
            if Cntr[k - x] > 1:
                ans = True
                break
        else:
            ans = True
            break

print(ans)
Returns True for k=5,6 (I added one more 3) and False for k=8,11
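Since the answer mentions collections.Counter as a variant, here is a minimal sketch of that version (same O(n) idea, wrapped in a function whose name is just illustrative):

from collections import Counter

def has_pair_sum(A, k):
    cnt = Counter(A)
    for x in A:
        need = k - x
        # The complement must exist, and if it equals x itself we need a second copy.
        if need in cnt and (need != x or cnt[x] > 1):
            return True
    return False

print(has_pair_sum([3, 1, 2, 3, 4], 6))  # True (3 + 3)
print(has_pair_sum([1, 2, 3, 4], 8))     # False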
Adding onto MBo's answer.
"Optimal" can be an ambiguous term in terms of algorithmics, as there is often a compromise between how fast the algorithm runs and how memory-efficient it is. Sometimes we may also be interested in either worst-case resource consumption or in average resource consumption. We'll loop at worst-case here because it's simpler and roughly equivalent to average in our scenario.
Let's call n the length of our array, and let's consider 3 examples.
Example 1
We start with a very naive algorithm for our problem, with two nested loops that iterate over the array, and check for every two items of different indices if they sum to the target number.
Time complexity: the worst-case scenario (where the answer is False, or where it is True but we only find it on the last pair of items we check) takes n^2 loop iterations. If you're familiar with big-O notation, we say the algorithm's time complexity is O(n^2), which basically means that, in terms of the input size n, the time it takes to run grows more or less like n^2 up to a multiplicative factor (technically the notation means "at most like n^2 up to a multiplicative factor", but it's a common abuse of language to read it as "more or less like").
Space complexity (memory consumption): we only store an array, plus a fixed set of objects whose sizes do not depend on n (everything Python needs to run, the call stack, maybe two iterators and/or some temporary variables). The part of the memory consumption that grows with n is therefore just the size of the array, which is n times the amount of memory required to store an integer in an array (let's call that sizeof(int)).
Conclusion: Time is O(n^2), Memory is n*sizeof(int) (+O(1), that is, up to an additional constant factor, which doesn't matter to us, and which we'll ignore from now on).
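A minimal sketch of such a naive double loop (the function name is just for illustration):

def has_pair_sum_naive(A, k):
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):  # pairs of distinct indices
            if A[i] + A[j] == k:
                return True
    return False

print(has_pair_sum_naive([1, 2, 3, 4], 5))  # True (1 + 4, or 2 + 3)
print(has_pair_sum_naive([1, 2, 3, 4], 8))  # False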
Example 2
Let's consider the algorithm in MBo's answer.
Time complexity: much, much better than in Example 1. We start by creating a dictionary, which is done in a single loop over n. Setting keys in a dictionary is a constant-time operation under proper conditions, so the time taken by each step of that first loop does not depend on n; so far we have used O(n) time. We then have one remaining loop over n, and the time spent accessing elements of our dictionary is independent of n, so that loop is O(n) as well. Combining the two loops: since both grow like n up to a multiplicative factor, so does their sum (up to a different multiplicative factor). Total: O(n).
Memory: Basically the same as before, plus a dictionary of n elements. For the sake of simplicity, let's consider that these elements are integers (we could have used booleans), and forget about some of the aspects of dictionaries to only count the size used to store the keys and the values. There are n integer keys and n integer values to store, which uses 2*n*sizeof(int) in terms of memory. Add to that what we had before and we have a total of 3*n*sizeof(int).
Conclusion: Time is O(n), Memory is 3*n*sizeof(int). The algorithm is considerably faster when n grows, but uses three times more memory than example 1. In some weird scenarios where almost no memory is available (embedded systems maybe), this 3*n*sizeof(int) might simply be too much, and you might not be able to use this algorithm (admittedly, it's probably never going to be a real issue).
Example 3
Can we find a trade-off between Example 1 and Example 2?
One way to do that is to replicate the same kind of nested loop structure as in Example 1, but with some pre-processing to replace the inner loop with something faster. To do that, we sort the initial array, in place. Done with well-chosen algorithms, this has a time-complexity of O(n*log(n)) and negligible memory usage.
Once we have sorted our array, we write our outer loop (a regular loop over the whole array), and inside it use binary search (dichotomy) to look for the number we're missing to reach our target k. Written recursively, this binary search has a memory consumption of O(log(n)) for its call stack, and its time complexity is O(log(n)) as well.
Time complexity: The pre-processing sort is O(n*log(n)). Then in the main part of the algorithm, we have n calls to our O(log(n)) dichotomy search, which totals to O(n*log(n)). So, overall, O(n*log(n)).
Memory: Ignoring the constant parts, we have the memory for our array (n*sizeof(int)) plus the memory for our call stack in the dichotomy search (O(log(n))). Total: n*sizeof(int) + O(log(n)).
Conclusion: Time is O(n*log(n)), Memory is n*sizeof(int) + O(log(n)). Memory is almost as small as in Example 1. Time complexity is slightly more than in Example 2. In scenarios where the Example 2 cannot be used because we lack memory, the next best thing in terms of speed would realistically be Example 3, which is almost as fast as Example 2 and probably has enough room to run if the very slow Example 1 does.
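A sketch of Example 3 in Python, using the standard bisect module for the binary search (the function name is illustrative; bisect keeps each lookup at O(log(n))):

import bisect

def has_pair_sum_sorted(A, k):
    A.sort()                             # O(n*log(n)), in place
    for i, x in enumerate(A):
        need = k - x
        j = bisect.bisect_left(A, need)  # O(log(n)) per lookup
        while j < len(A) and A[j] == need:
            if j != i:                   # don't pair an element with itself
                return True
            j += 1
    return False

print(has_pair_sum_sorted([3, 1, 2, 3, 4], 6))  # True (3 + 3)
print(has_pair_sum_sorted([1, 2, 3, 4], 8))     # False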
Overall conclusion
This answer was just to show that "optimal" is context-dependent in algorithmics. It's very unlikely that in this particular example, one would choose to implement Example 3. In general, you'd see either Example 1 if n is so small that one would choose whatever is simplest to design and fastest to code, or Example 2 if n is a bit larger and we want speed. But if you look at the wikipedia page I linked for sorting algorithms, you'll see that none of them is best at everything. They all have scenarios where they could be replaced with something better.
I just took a Codility demo test. The question and my answer can be seen here, but I'll paste my answer here as well. My response:
def solution(A):
    # write your code in Python 2.7
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i == retresult: retresult += 1  # increment the result since the current result exists in the array
            elif i > retresult: break  # we can stop since we found a number bigger than our current candidate
    return retresult
My question is about time complexity, which I hope to understand better from your responses. The problem asks for an expected worst-case time complexity of O(N).
Does my function have O(N) time complexity? Does the fact that I sort the array increase the complexity, and if so how?
Codility reports (for my answer)
Detected time complexity:
O(N) or O(N * log(N))
So, what is the complexity for my function? And if it is O(N*log(N)), what can I do to decrease the complexity to O(N) as the problem states?
Thanks very much!
p.s. my background reading on time complexity comes from this great post.
EDIT
Following the reply below, and the answers described here for this problem, I would like to expand on this with my take on the solutions:
basicSolution has an expensive time complexity and so is not the right answer for this Codility test:
def basicSolution(A):
    # O(N*log(N)) time complexity
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i == retresult: retresult += 1  # increment the result since the current result exists in the array
            elif i > retresult: break  # we can stop since we found a number bigger than our current candidate
        else:
            continue  # negative numbers and 0 don't need any work
    return retresult
hashSolution is my take on what is described in the above article, in the "use hashing" paragraph. As I am new to Python, please let me know if you have any improvements to this code (it does work though against my test cases), and what time complexity this has?
def hashSolution(A):
    # O(N) time complexity, I think? But it requires O(N) extra space
    # (the requirement states we may use O(N) space).
    table = {}
    for i in A:
        if i > 0:
            table[i] = True  # a collision/duplicate will just overwrite
    for i in range(1, 100000 + 1):  # the problem says the array has at most 100,000 integers
        if not table.get(i):
            return i
    return 1  # default
Finally, the actual O(N) solution (O(N) time and O(1) extra space) is the one I am having trouble understanding. I understand that negative/zero values are pushed to the back of the array, leaving an array of just positive values, but I do not understand the findMissingPositive function. Could anyone describe it with Python code/comments, perhaps with an example? I've been trying to work through it in Python and just cannot figure it out :(
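(For reference, here is a hedged sketch of that O(N) time / O(1) extra space idea. It is not the article's exact code: instead of segregating non-positive values to the back, it overwrites them with n + 1, then uses the sign of A[v-1] to record that the value v was seen.)

def first_missing_positive(A):
    n = len(A)
    # Step 1: non-positive values can never be the answer; replace them with n + 1.
    for i in range(n):
        if A[i] <= 0:
            A[i] = n + 1
    # Step 2: for every value v in 1..n, make A[v-1] negative to mark "v was seen".
    for i in range(n):
        v = abs(A[i])
        if 1 <= v <= n and A[v - 1] > 0:
            A[v - 1] = -A[v - 1]
    # Step 3: the first index still holding a positive value was never marked,
    # so index + 1 is the smallest missing positive integer.
    for i in range(n):
        if A[i] > 0:
            return i + 1
    return n + 1

print(first_missing_positive([1, 3, 6, 4, 1, 2]))  # 5
print(first_missing_positive([-1, -3]))            # 1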
It does not, because you sort A.
The Python list.sort() function uses Timsort (named after Tim Peters) and has a worst-case time complexity of O(N log N).
Rather than sort your input, you'll have to iterate over it and determine if any integers are missing by some other means. I'd use a set of a range() object:
def solution(A):
    expected = set(range(1, len(A) + 1))
    for i in A:
        expected.discard(i)
    if not expected:
        # all consecutive digits for len(A) were present, so the next one is missing
        return len(A) + 1
    return min(expected)
This is O(N); we create a set of len(A) (O(N) time), then we loop over A, removing elements from expected (again O(N) time, removing elements from a set is O(1)), then test for expected being empty (O(1) time), and finally get the smallest element in expected (at most O(N) time).
So we make at most 3 O(N) time steps in the above function, making it an O(N) solution.
This also fits the storage requirement: all it uses is a set of size N. Sets have some per-element overhead, but the total storage is still O(N).
The hash solution you found is based on the same principle, except that it uses a dictionary instead of a set. Note that the dictionary values are never actually used, they are either set to True or absent. I'd rewrite that as:
def hashSolution(A):
    seen = {i for i in A if i > 0}
    if not seen:
        # there were no positive values, so 1 is the first missing.
        return 1
    for i in range(1, 10**5 + 1):
        if i not in seen:
            return i
    # we can never get here because the inputs are limited to integers up to
    # 100,000. So either `seen` holds a limited number of positive values
    # below 100,000, or none at all.
The above avoids looping all the way to 100,000 if there were no positive integers in A.
The difference between mine and theirs is that mine starts with the set of expected numbers, while they start with the set of positive values from A, inverting the storage and test.