What is the time complexity of heapq.merge in Python?

I read that the heapq.merge function is used to merge 2 sorted arrays. Is its time complexity O(n)? If not, what is it and why? Also, what is its space complexity?
I was solving the problem of merging 2 sorted arrays with 2 pointers and could achieve O(n) time complexity and O(n) space complexity.

If you are merging K sorted arrays and each array has N elements, then heapq.merge will perform the operation in O(NK log K) time. NK is the total number of elements across all K arrays, and each of them has to pass through the heap once, while log K is the cost of the sift-down operation from the top of the heap (that is, the height of the heap).
In your case K=2 and log(2) = 1, so in your special case it is O(n).
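For example, here is a minimal sketch (with made-up input lists) showing heapq.merge consuming two sorted lists; note that it returns a lazy iterator, so the merging work is spread across the iteration rather than done up front:

import heapq

a = [1, 4, 7, 10]   # hypothetical sorted inputs
b = [2, 3, 8, 9]

merged = list(heapq.merge(a, b))
print(merged)  # [1, 2, 3, 4, 7, 8, 9, 10]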

heapq.merge can be used to merge any number of sorted iterables. Its time complexity is O(N log K), where N is the total number of elements and K is the number of iterables, since the min-heap holds one item per iterable for comparison.
The space complexity is O(K) because the min-heap contains at most K items at any given point during execution.
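A rough sketch of how such a K-way merge can be built on heapq (this is only an illustration, not heapq.merge's actual implementation, and it assumes the inputs contain no None values):

import heapq

def kway_merge(*iterables):
    # Seed the heap with the first item of each iterable: O(K) space.
    iters = [iter(it) for it in iterables]
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, idx))
    # Each of the N elements is pushed and popped once: O(N log K) time.
    while heap:
        value, idx = heapq.heappop(heap)
        yield value
        nxt = next(iters[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))

print(list(kway_merge([1, 4, 7], [2, 5], [3, 6, 8])))  # [1, 2, 3, 4, 5, 6, 7, 8]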

Related

Big-O Notation for iteration over steps in list - Python

I'm looking to iterate over every third element in my list. But in thinking about Big-O notation, would the Big-O complexity be O(n) where n is the number of elements in the list, or O(n/3) for every third element?
In other words, even if I specify that the list should only be iterated over every third element, is Python still looping through the entire list?
Example code:
def function(lst):
    # iterate over every third element
    for i in lst[2::3]:
        pass
When using Big-O notation we ignore any scalar multiples in front of the functions, because the algorithm still takes "linear time". We do this because Big-O notation considers the behaviour of an algorithm as it scales to large inputs.
Meaning it doesn't matter whether the algorithm considers every element of the list or every third element: the time complexity still scales linearly with the input size. For example, if the input size is doubled, it takes twice as long to execute, no matter whether you are looking at every element or every third element.
Mathematically we can say this because of the constant M in the definition (https://en.wikipedia.org/wiki/Big_O_notation): f(x) is O(g(x)) if
abs(f(x)) <= M * abs(g(x)) for all sufficiently large x
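As an illustrative, machine-dependent timing sketch: both loops grow linearly with the input size, the every-third version simply has a smaller constant factor.

import timeit

def every_element(lst):
    for _ in lst:
        pass

def every_third(lst):
    for _ in lst[2::3]:
        pass

for n in (100_000, 200_000):  # doubling the input roughly doubles both times
    lst = list(range(n))
    t_all = timeit.timeit(lambda: every_element(lst), number=100)
    t_third = timeit.timeit(lambda: every_third(lst), number=100)
    print(n, round(t_all, 3), round(t_third, 3))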
Big O notation would remain O(n) here.
Consider the following:
n = 1_000_000  # some big number
for i in range(n):
    print(i)
    print(i)
    print(i)
Does doing 3 actions count as O(3n) or O(n)? O(n). Does the real world performance slow down by doing three actions instead of one? Absolutely!
Big O notation is about looking at the growth rate of the function, not about the physical runtime.
Consider the following from the pandas library:
import pandas as pd

df = pd.DataFrame([{"a": 4}, {"a": 3}, {"a": 2}, {"a": 1}])

# simple index-based iteration, O(n)
for i in range(len(df)):
    print(df["a"].iloc[i])

# iterrows iteration, O(n)
for idx, row in df.iterrows():
    print(row["a"])

# apply/lambda iteration, O(n)
df.apply(lambda row: print(row["a"]), axis=1)
All of these implementations can be considered O(n) (constant is dropped), however that doesn't necessarily mean that the runtime will be the same. In fact, method 3 should be about 800 times faster than method 1 (https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06)!
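A rough timing sketch of that difference (the exact ratio depends heavily on the pandas version and machine, so treat the numbers as illustrative only):

import timeit
import pandas as pd

df = pd.DataFrame({"a": range(10_000)})

def index_loop():
    return [df["a"].iloc[i] for i in range(len(df))]

def apply_loop():
    return df.apply(lambda row: row["a"], axis=1)

print("index loop:", timeit.timeit(index_loop, number=10))
print("apply:     ", timeit.timeit(apply_loop, number=10))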
Another answer that may help you: Why is the constant always dropped from big O analysis?

Python algorithms and Big-O [duplicate]

I am trying to wrap my head around time complexity when it comes to algorithm design using Python.
I've been tasked with writing a function that meets the following requirements:
Must be linear O(n) time
must return the nth smallest number from a list of random numbers
I have found the following example online:
def nsmallest(numbers, nth):
    result = []
    for i in range(nth):
        result.append(min(numbers))
        numbers.remove(min(numbers))
    return result
As I understand it, Big-O is an approximation and only the dominant part of the function is considered when analyzing its time complexity.
So my question is:
Does calling min() within the loop influence the time complexity or does the function remain O(n) because min() executes in O(n) time?
Also, would adding another loop (not nested) to further parse the resulting list for the specific number keep the algorithm in linear time even if it contains two or three more constant operations per loop?
Does calling min() within the loop influence the time complexity or does the function remain O(n) because min() executes in O(n) time?
Yes it does. min() takes O(N) time to run, and if you call it inside a loop that runs up to N times, the total time becomes O(N^2).
Also, would adding another loop (not nested) to further parse the resulting list for the specific number keep the algorithm in linear time even if it contains two or three more constant operations per loop?
Depends on what your loop does. Since you haven't mentioned what that loop does, it is difficult to guess.
min() complexity is O(n) (linear search).
list.remove() is also O(n).
So each loop iteration is O(n).
You run it k times (and k can be up to n), so the resulting complexity is O(kn), which is O(n^2) in the worst case.
The idea of worst case linear time algorithm that you are looking for is described here (for example).
kthSmallest(arr[0..n-1], k)
1) Divide arr[] into ⌈n/5⌉ groups where the size of each group is 5, except possibly the last group, which may have fewer than 5 elements.
2) Sort the ⌈n/5⌉ groups and find the median of each. Create an auxiliary array median[] and store the medians of all ⌈n/5⌉ groups in it.
3) Recursively call this method to find the median of the medians: medOfMed = kthSmallest(median[0..⌈n/5⌉-1], ⌈n/10⌉)
4) Partition arr[] around medOfMed and obtain its position: pos = partition(arr, n, medOfMed)
5) If pos == k, return medOfMed
6) If pos > k, return kthSmallest(arr[l..pos-1], k)
7) If pos < k, return kthSmallest(arr[pos+1..r], k-pos+l-1)
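A minimal Python sketch of that median-of-medians idea (using list filtering instead of in-place partitioning, which trades extra memory for clarity; k is 1-indexed):

def kth_smallest(arr, k):
    # Worst-case O(n) selection of the k-th smallest element (k is 1-indexed).
    if len(arr) == 1:
        return arr[0]

    # Split into groups of 5 and take the median of each group.
    groups = [sorted(arr[i:i + 5]) for i in range(0, len(arr), 5)]
    medians = [g[len(g) // 2] for g in groups]

    # Recursively find the median of the medians to use as the pivot.
    med_of_med = kth_smallest(medians, (len(medians) + 1) // 2)

    # Partition around the pivot (not in place, for clarity).
    lows = [x for x in arr if x < med_of_med]
    pivots = [x for x in arr if x == med_of_med]
    highs = [x for x in arr if x > med_of_med]

    if k <= len(lows):
        return kth_smallest(lows, k)
    elif k <= len(lows) + len(pivots):
        return med_of_med
    else:
        return kth_smallest(highs, k - len(lows) - len(pivots))

print(kth_smallest([7, 10, 4, 3, 20, 15], 3))  # -> 7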

Comparison between O(n) and O(n log n)

Which is the better algorithm for a plain search in an array whose length is not known in advance:
O(n) (a simple for loop), or
O(n log n) (sorted(array) in Python)?
Correct me if the time complexities are wrong, and please add any useful additional information.
Thanks in advance.
It depends on how often you'll be doing the search.
Looking for a specific item in an unsorted list takes O(n) time.
Looking for a specific item in a sorted list takes O(lg n) time.
Turning an unsorted list into a sorted list takes O(n lg n) time.
Doing a single linear search in an unsorted list is faster than sorting the list in order to do a single binary search.
But doing multiple linear searches in an unsorted list can be much slower than sorting the list once and doing multiple binary searches in the sorted list.
When will it be faster overall to spend the time sorting the list to speed up future searches, compared to just biting the bullet and doing linear searches? You can only answer that by carefully considering the size of the list and trying to anticipate your likely workload.
Consider k searches. Sorting comes out on top if (using some very handwavy math) O(n lg n) + kO(lg n) is less than kO(n). That's true somewhere around the time k starts to exceed lg n. Given how slowly lg n grows, it typically does not take long for the initial investment in sorting to pay off.
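An illustrative sketch of that trade-off (timings are machine-dependent; the binary search requires the one-off sort):

import bisect
import random
import timeit

data = [random.randint(0, 1_000_000) for _ in range(100_000)]
queries = [random.randint(0, 1_000_000) for _ in range(1_000)]

def linear_searches():
    return [q in data for q in queries]   # k * O(n)

def sort_then_binary_searches():
    s = sorted(data)                      # O(n lg n), paid once
    def contains(q):
        i = bisect.bisect_left(s, q)
        return i < len(s) and s[i] == q
    return [contains(q) for q in queries]  # k * O(lg n)

print("linear:     ", timeit.timeit(linear_searches, number=1))
print("sort+binary:", timeit.timeit(sort_then_binary_searches, number=1))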
Yes, you are correct:
A single for loop over n integers has time complexity O(n):
for (int i = 0; i < n; i++) { }
Two nested for loops give O(n^2); the classic example is bubble sort:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        // ...
    }
}
If the loop variable is halved on every iteration, the complexity is logarithmic; the classic examples are merge sort and binary search:
for (int i = n; i > 1; i = i / 2) { }
Searching for an item in a sorted list takes O(log n) time; the standard method is binary search.
Converting an unsorted list into a sorted list takes O(n log n) time, e.g. merge sort or heap sort.

find longest noncontiguous nondecreasing subsequence in array

I want to prove that finding the longest non-contiguous non-decreasing subsequence of an array of size n cannot be done in O(n).
By "find" I mean know its length, and a list of the relevant indices.
Here is a solution in O(n log n).
Here is the Wikipedia article.
I want to convince myself that it can't be done any faster.
Here is partial proof:
Assume this were possible faster than O(n log n); for simplicity say O(n), but the argument holds for anything better than O(n log n).
We can merge two sorted arrays into a single sorted array composed of all elements of both in O(n1 + n2).
Given an array A, we could then find its longest non-contiguous non-decreasing subsequence in O(n).
If this subsequence is shorter than n/2, then for reversed(A) it has length greater than or equal to n/2 [I need proof for that].
This way, we can split the array into sorted chunks, each time in O(n), being left with a remainder of size k that can also be split and sorted in O(k) + O(remainder), until we are left with one element, which is O(1).
Thus, sorting the array would take O(n), contradicting the Ω(n log n) lower bound for comparison sorting.
Here is a solution in O(n + k log k), where k is the number of elements that are out of their sorted position.
Armed with the above algorithm, we can find which of the k elements are out of position by sorting with it in O(n + k log k) and traversing the unsorted and sorted arrays together, finding the indices of the k (or fewer) differences (single pass, O(n)).
The remaining indices define a sorted array which is also a non-contiguous subsequence of the input array of maximal length, because by definition those indices are in sorted order and the other k aren't.
In general this is O(n log n), but when there is a need to find a longest non-contiguous subsequence it can make sense to optimize like this, because the problem will probably be such that k << n.
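For reference, a minimal sketch of the standard O(n log n) patience-sorting approach for the length of the longest non-decreasing subsequence, using bisect (it returns only the length, not the indices):

from bisect import bisect_right

def longest_nondecreasing_length(arr):
    # tails[i] holds the smallest possible tail of a non-decreasing
    # subsequence of length i + 1 seen so far.
    tails = []
    for x in arr:
        # bisect_right allows equal elements, which makes the
        # subsequence non-decreasing rather than strictly increasing.
        i = bisect_right(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

print(longest_nondecreasing_length([3, 1, 2, 2, 5, 4]))  # -> 4 (e.g. 1, 2, 2, 4)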

Time Complexity of Spirally Traversing a 2D Matrix?

I am learning how to traverse a 2D matrix spirally, and I came across this following algorithm:
def spiralOrder(self, matrix):
    result = []
    while matrix:
        result.extend(matrix.pop(0))
        matrix = zip(*matrix)[::-1]  # Python 2; in Python 3 this needs list(zip(*matrix))[::-1]
    return result
I am currently having a hard time figuring out the time complexity of this question with the zip function being in the while loop.
It would be greatly appreciated if anyone could help me figure out the time complexity with explanations.
Thank you!
The known time complexity for this problem is O(M×N), where M is the number of rows and N is the number of columns of an M×N matrix. This is an elegant algorithm, but it looks like it might be slower than that.
Looking at it more closely, with every iteration of the loop you are undergoing the following operations:
pop(0) # O(M) for a Python list, since the remaining row references are shifted
extend() # O(k), where k is the number of elements added in that operation
*matrix # unpacking the remaining rows as arguments
zip() # O(j), where j is the number of remaining elements; Python 2.7 constructs the list automatically, Python 3 requires an explicit list() call
[::-1] # O(j) -> reversing the transposed rows
Regardless of how many loop iterations, by the time this completes you will have at least called result.extend on every element (MxN elements) in the matrix. So best case is O(MxN).
Where I am less sure is how much time the repeated zips and list reversals are adding. The loop is only getting called roughly M+N-1 times but the zip/reverse is done on (M-1) * N elements and then on (M-1) * (N-1) elements, and so on. My best guess is that this type of function is at least logarithmic so I would guess overall time complexity is somewhere around O(MxN log(MxN)).
https://wiki.python.org/moin/TimeComplexity
No matter how you traverse a 2D matrix, the time complexity is at least proportional to the number of elements, since every element has to be visited once.
An m×n matrix therefore takes O(mn) time to traverse, regardless of whether the order is spiral or row-major.
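For comparison, here is a boundary-pointer spiral traversal that does O(1) work per element, so it runs in O(M×N) overall (a sketch, not the asker's code):

def spiral_order(matrix):
    result = []
    if not matrix:
        return result
    top, bottom = 0, len(matrix) - 1
    left, right = 0, len(matrix[0]) - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):               # top row, left to right
            result.append(matrix[top][c])
        for r in range(top + 1, bottom + 1):           # right column, top to bottom
            result.append(matrix[r][right])
        if top < bottom and left < right:
            for c in range(right - 1, left - 1, -1):   # bottom row, right to left
                result.append(matrix[bottom][c])
            for r in range(bottom - 1, top, -1):       # left column, bottom to top
                result.append(matrix[r][left])
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return result

print(spiral_order([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [1, 2, 3, 6, 9, 8, 7, 4, 5]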
