How do I identify O(nlogn) exactly? - python

I understand O(log n) in the sense that it grows quickly at first, but the rate of increase slows down for larger inputs.
I am not able to completely understand:
O(n log n)
the difference between an algorithm with complexity n log n and one with complexity n + log n.
A modification of the phone book example and/or some basic Python code would help me understand these two points.

How do you think of O(n ^ 2)?
Personally, I like to think of it as doing O(n) work O(n) times.
A contrived O(n ^ 2) algorithm would be to iterate through all pairs of numbers in 0, 1, ..., n - 1:
def print_pairs(n):
    for i in range(n):
        for j in range(i + 1, n):
            print('({},{})'.format(i, j))
Using similar logic as above, you could do O(log n) work O(n) times and have a time complexity of O(n log n).
As an example, we are going to use binary search to find all indices of elements in an array.
Yes, I understand this is a dumb example, but here I don't want to focus on the usefulness of the algorithm but rather on its complexity. For the sake of correctness, let us assume that the input array is sorted. Otherwise, our binary search does not work as intended and may report elements as missing even though they are present.
def find_indices(arr):
    indices = []
    for num in arr:
        # r is the last valid index, so the search covers arr[0] .. arr[len(arr) - 1]
        index = binary_search(arr, 0, len(arr) - 1, num)
        indices.append(index)
    return indices
def binary_search(arr, l, r, x):
    # Check base case
    if r >= l:
        # Integer division keeps mid usable as an index
        mid = l + (r - l) // 2
        # If element is present at the middle itself
        if arr[mid] == x:
            return mid
        # If element is smaller than mid, then it
        # can only be present in left subarray
        elif arr[mid] > x:
            return binary_search(arr, l, mid - 1, x)
        # Else the element can only be present
        # in right subarray
        else:
            return binary_search(arr, mid + 1, r, x)
    else:
        # Element is not present in the array
        return -1
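To make the "O(log n) work, O(n) times" idea concrete, here is a minimal sketch that counts how many probes binary search makes in total. It uses an iterative variant of the search, and the names binary_search_steps and total_steps are made up for this illustration. It assumes Python 3 and a sorted list.

```python
import math

def binary_search_steps(arr, x):
    """Iterative binary search; returns (index, number of probes made)."""
    lo, hi, steps = 0, len(arr) - 1, 0
    while lo <= hi:
        steps += 1
        mid = lo + (hi - lo) // 2
        if arr[mid] == x:
            return mid, steps
        if arr[mid] > x:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1, steps

def total_steps(n):
    """Total probes when every element of a sorted n-element list is looked up."""
    arr = list(range(n))
    return sum(binary_search_steps(arr, x)[1] for x in arr)

for n in (16, 1024, 65536):
    # n searches, each making at most floor(log2 n) + 1 probes
    bound = n * (math.floor(math.log2(n)) + 1)
    print(n, total_steps(n), bound)
```

The total stays below n * (floor(log2 n) + 1), which is the n log n shape: a logarithmic amount of work repeated n times.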
As for your second question,
log n << n as n tends to infinity, so
O(n + log n) = O(n)
In theory, the log n is dwarfed by the n as n gets arbitrarily large, so we don't include it in our Big O analysis.
In practice, however, you might want to consider this extra log n work if your algorithm is suffering from performance and/or scaling issues.

log n is a much slower growing function than n. When computer scientists speak of big-O, they are interested in the growth of the function for extremely large input values. What the function does near some small number or inflection point is immaterial.
Many common algorithms have a time complexity of n log n. For example, merge sort does about n steps of work at each of log_2(n) levels, because the input is split in half at every level. After studying the algorithm, the fact that its complexity is n log n may come to you by intuition, but you could arrive at the same conclusion by studying the recurrence relation that describes the (recursive) algorithm, in this case T(n) = 2 * T(n / 2) + n. More generally but perhaps least intuitively, the master theorem can be applied to arrive at this n log n expression. In short, don't feel intimidated if it isn't immediately obvious why certain algorithms have certain running times; there are several ways to approach the analysis.
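As a rough illustration of the merge sort claim, the sketch below counts element comparisons and prints them next to n * log2(n). This is one straightforward top-down merge sort written for this example (Python 3), not a reference implementation.

```python
import math
import random

def merge_sort(items):
    """Return (sorted copy, number of element comparisons made while merging)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_cmp = merge_sort(items[:mid])
    right, right_cmp = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    comparisons = left_cmp + right_cmp
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the rest of the other side is already sorted
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons

for n in (1000, 10000, 100000):
    data = [random.random() for _ in range(n)]
    result, comparisons = merge_sort(data)
    print(n, comparisons, round(n * math.log2(n)))
```

The comparison count tracks n * log2(n) closely and never exceeds n * ceil(log2(n)).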
Regarding "complexity n + log n", this isn't how big-O notation tends to get used. You may have an algorithm that does n + log n work, but instead of calling that O(n + log n), we'd call that O(n) because n grows so much faster than log n that the log n term is negligible. The point of big-O is to state only the growth rate of the fastest growing term.
Compared with n log n, a log n algorithm is less complex. If log n is the time complexity of inserting an item into a self-balancing search tree, n log n would be the complexity of inserting n items into such a structure.

The book Grokking Algorithms is an awesome resource that explains how to determine algorithm complexity (among other things) thoroughly and in very simple language.

Technically, algorithms with complexity O(n + log n) and complexity O(n) are the same, as the log n term becomes negligible when n grows.
O(n) grows linearly. The slope is constant.
O(n log n) grows super-linearly. The slope increases (slowly).
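A small numeric sketch of these statements, assuming Python 3. The helper slope is a made-up name for a finite-difference estimate of how fast each function grows between n and 2n.

```python
import math

def linear(n):
    return n

def linearithmic(n):
    return n * math.log2(n)

def slope(f, n):
    """Growth of f between n and 2n, per unit of n."""
    return (f(2 * n) - f(n)) / n

for n in (2**10, 2**15, 2**20):
    # Columns: n, slope of n, slope of n log n, and n + log n for comparison with n
    print(n, slope(linear, n), slope(linearithmic, n), n + math.log2(n))
```

The slope of n stays at 1, the slope of n log n creeps up (it equals log2(n) + 2 here), and n + log n is barely distinguishable from n.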

Related

I have written Python code to find the maximum element in a list

Can you tell me the time complexity of the code? I am using the divide and conquer technique.
def max_of_list(l):
    if len(l) == 1:
        return l[0]
    else:
        left_max = max_of_list(l[:len(l)//2])
        right_max = max_of_list(l[len(l)//2:])
        return max(left_max, right_max)
You'll need to use the master theorem since this is a recursive algorithm:
T(n) = a T(n/b) + f(n)
a: number of subproblems
b: size reduction of subproblems
f(n): complexity of split/join of subproblems process
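This algorithm is recursion-heavy when f(n), the cost of splitting and joining, is O(1). In that case the complexity of the algorithm is O(n^c), where c is the critical exponent, given by:
c = log(a) / log(b)
In this particular case:
c = log(2)/log(2) = 1
Thus, the complexity of the algorithm is linear, i.e. O(n), as long as the split really is O(1). Note that the slices l[:len(l)//2] and l[len(l)//2:] in the posted code copy the list elements, so there f(n) is O(n) and the recurrence T(n) = 2 T(n/2) + O(n) gives O(n log n); passing index bounds instead of slices restores O(n).
You can read more about the Master theorem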
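The O(1) split/join case can be sketched by passing index bounds instead of slicing, since slicing a Python list copies its elements. The function name max_of_range and the call counter are hypothetical additions for this illustration (Python 3).

```python
def max_of_range(values, lo, hi, counter):
    """Maximum of values[lo:hi] by divide and conquer, without copying.

    counter is a one-element list used only to count calls for the demo.
    """
    counter[0] += 1
    if hi - lo == 1:
        return values[lo]
    mid = (lo + hi) // 2
    return max(max_of_range(values, lo, mid, counter),
               max_of_range(values, mid, hi, counter))

for n in (8, 1000, 4096):
    calls = [0]
    data = list(range(n))
    assert max_of_range(data, 0, n, calls) == n - 1
    # n leaf calls plus n - 1 internal calls, each doing O(1) work
    print(n, calls[0])
```

The number of calls is 2n - 1, and each call does constant work besides recursing, which matches T(n) = 2 T(n/2) + O(1) = O(n).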

Time complexity of python function

I am trying to solve the time complexity of this function (I'm still new to solving complexity problems) and was wondering what the time complexity of this function would be:
def mystery(lis):
    n = len(lis)
    for index in range(n):
        x = 2*index % n
        lis[index], lis[x] = lis[x], lis[index]
    print(lis)
I believe the answer is O(n) but I am not 100% sure as the line: x = 2*index % n is making me wonder if it is maybe O(n log n).
Multiplying two operands with * is usually considered a constant-time operation in time complexity analysis. The same goes for %.
Having n as one of the operands doesn't make it O(n), because n is a single number. To make it O(n) you would need to perform an operation n times.

Why the time complexity for shell sort is nlogn in my data?

environment: python3.6, Anaconda 5.1, Jupyter notebook, numba.
I used a random array generated by Python to measure the time complexity of shell sort, but found that its time complexity is more in line with NlogN.
I understand that the time complexity of shell sort is O(n^2), I am confused.
Shell sort code:
def shell_sort(list):
    n = len(list)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            temp = list[i]
            j = i
            while j >= gap and list[j - gap] > temp:
                list[j] = list[j - gap]
                j -= gap
            list[j] = temp
        gap = gap // 2
    return list
shell sort time complexity analysis
O(n^2) is only the worst case time complexity, so the algorithm can run in less time than that on a random input and even on average (or even on almost all its inputs...).
Also the complexity of Shellsort depends on the "gap sequence" you selected.
Certain gap sequences result in a worst time case smaller than O(n^2), like O(n^1.5) for the gap sequence 1, 4, 13, 40, 121, ... or even O(nlog^2(n)) for 1, 2, 3, 4, 6, 8, 9, 12, ... (both due to Pratt, 1971). In other words: just trying on one input is not significant at all and the claim about O(n^2) may be false depending on the exact implementation of the algorithm.
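To show how the gap sequence is a parameter of the algorithm rather than a fixed part of it, here is a minimal sketch that generates the sequence 1, 4, 13, 40, 121, ... (each gap is 3 times the previous one plus 1) and sorts with it. The function names gaps_3k and shell_sort_with_gaps are invented for this example (Python 3).

```python
def gaps_3k(n):
    """Gaps 1, 4, 13, 40, 121, ... (h -> 3h + 1) that are below n, largest first."""
    gaps, h = [], 1
    while h < n:
        gaps.append(h)
        h = 3 * h + 1
    return gaps[::-1]

def shell_sort_with_gaps(items, gaps):
    """Sort items in place using a decreasing gap sequence that ends in 1."""
    for gap in gaps:
        for i in range(gap, len(items)):
            temp, j = items[i], i
            while j >= gap and items[j - gap] > temp:
                items[j] = items[j - gap]
                j -= gap
            items[j] = temp
    return items

print(gaps_3k(200))
print(shell_sort_with_gaps([5, 2, 9, 1, 7, 3], gaps_3k(6)))
```

Swapping gaps_3k for the halving sequence used in the question (n // 2, n // 4, ..., 1) changes the worst-case bound without changing the sorting loop.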
There are many open problems concerning the complexity of Shell sort, and it is suspected that with an appropriate choice of parameters and for some inputs, its complexity could be O(n log n).
I have studied shellsort for smaller n and I can state unequivocally that the gap sequence that produces the best average (number of comparisons) for n = 10, which my software tested on n! different orderings (the entire set) and on 2^(n-2) gap sequences (all possible sequences for n = 10) is {9,6,1}.
The average is clearly O(n * log(n)) as is the worst case.
The best case (already ordered data) is similar to insertion sort's n-1 comparisons, i.e. O(n) complexity, and it can be calculated in a similar way:
(n-9)+(n-6)+(n-1) = (n * # of gaps)-sum(gaps) = 14.
I know that most would say this is O(n * log(n)) complexity. Personally, I think this is unwarranted because it does not need to be estimated: for any n worked on by any gap sequence the best case is easily and exactly determined.
I'll be stubborn and drive this home: let's just assume, for the sake of argument, that the best case for any shellsort is (n * # of gaps). For n = 10 & # of gaps = 3 the best case would be 10 * 3 or 30. Would this be O(n) complexity? I can see why it could be. Yet shellsort's best case is significantly less than (n * # of gaps) so why is it O(n * log(n))?
It is possible (though extreeeeemely unlikely) that the OP managed to pinpoint the best (or close to it) gap sequence for his n, resulting in O(n * log(n)) complexity. But determining that requires more analysis than guts can provide.
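The best-case count quoted in this answer, (n-9)+(n-6)+(n-1) = 14 for n = 10, can be checked with a short sketch that counts element comparisons. The function count_comparisons is a hypothetical helper written for this check; it counts each evaluation of data[j - gap] > temp as one comparison (Python 3).

```python
def count_comparisons(items, gaps):
    """Shell-sort a copy of items and return the number of element comparisons."""
    data, comparisons = list(items), 0
    for gap in gaps:
        for i in range(gap, len(data)):
            temp, j = data[i], i
            while j >= gap:
                comparisons += 1
                if data[j - gap] <= temp:
                    break
                data[j] = data[j - gap]
                j -= gap
            data[j] = temp
    return comparisons

n, gaps = 10, [9, 6, 1]
print(count_comparisons(range(n), gaps))   # already sorted input: 14
print(n * len(gaps) - sum(gaps))           # (n * number of gaps) - sum(gaps): 14
```

On sorted input every pass makes exactly n - gap comparisons, so the best case is determined by n and the gap sequence alone.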

What's the time complexity for the following python function?

def func(n):
    if n == 1:
        return 1
    return func(n-1) + n*(n-1)
print(func(5))
Getting confused. Not sure what exactly it is. Is it O(n)?
Calculating n*(n-1) is a fixed-time operation. The interesting part of the function is calling func(n-1) until n is 1. The function will make n such calls, so its complexity is O(n).
If we assume that arithmetic operations are constant time operations (and they really are when numbers are relatively small) then time complexity is O(n):
T(n) = T(n-1) + C = T(n-2) + C + C = ... = n * C = O(n)
But the multiplication complexity in practice depends on the underlying type (and we are talking about Python where the type depends on the value). It depends on the N as N approaches infinity. Thus, strictly speaking, the complexity is equal to:
T(n) = O(n * multComplexity(n))
And this multComplexity(n) depends on a specific algorithm that is used for multiplication of huge numbers.
As described in other answers, the answer is close to O(n) for practical purposes. For a more precise analysis, if you don't want to make the approximation that multiplication is constant-time:
Calculating n*(n-1) takes O(log n * log n) (or O((log n)^1.58), depending on the algorithm Python uses, which depends on the size of the integer). See here. Note that we need to take the log because the complexity is measured relative to the number of digits.
Adding the two terms takes O(log n), so we can ignore that.
The multiplication gets done O(n) times, so the total is O(n * log n * log n). (It might be possible to get this bound tighter, but it's certainly larger than O(n) - see the WolframAlpha plot).
In practice, the log terms won't really matter unless n gets very large.
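The "number of digits" point can be made visible with a sketch that evaluates the same recurrence as a loop and records how wide the products get. The name func_with_sizes is made up for this illustration, and the loop form is a substitution for the recursive func (it avoids Python's recursion limit for large n). Python 3 is assumed.

```python
def func_with_sizes(n):
    """Same recurrence as func, written as a loop.

    Returns (result, bit length of the largest product k * (k - 1)).
    """
    total, widest = 1, 0
    for k in range(2, n + 1):
        product = k * (k - 1)
        widest = max(widest, product.bit_length())
        total += product
    return total, widest

for n in (5, 2**10, 2**20):
    result, widest = func_with_sizes(n)
    print(n, widest)
```

The loop runs n - 1 times, while the operands of each multiplication have about log2(n) bits, which is where the extra log factors in the more precise bound come from.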

Time Complexity - Codility - Ladder - Python

The question is available here. My Python code is
def solution(A, B):
    if len(A) == 1:
        return [1]
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]
    result = [1] * len(A)
    for i in xrange(len(A)):
        result[i] = ways[A[i]] & ((1<<B[i]) - 1)
    return result
The detected time complexity by the system is O(L^2) and I can't see why. Thank you in advance.
First, let's show that the runtime genuinely is O(L^2). I copied a section of your code, and ran it with increasing values of L:
import time
import matplotlib.pyplot as plt
def solution(L):
    if L == 0:
        return
    ways = [0] * (L+5)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]
points = []
for L in xrange(0, 100001, 10000):
    start = time.time()
    solution(L)
    points.append(time.time() - start)
plt.plot(points)
plt.show()
The resulting graph (image not reproduced here) plots the measured runtime for each value of L.
To understand why this is O(L^2) when the obvious "time complexity" calculation suggests O(L), note that "time complexity" is not a well-defined concept on its own since it depends on which basic operations you're counting. Normally the basic operations are taken for granted, but in some cases you need to be more careful. Here, if you count additions as a basic operation, then the code is O(L). However, if you count bit (or byte) operations then the code is O(L^2). Here's the reason:
You're building an array of the first L Fibonacci numbers. The length (in digits) of the i'th Fibonacci number is Theta(i). So ways[i] = ways[i-1] + ways[i-2] adds two numbers with approximately i digits, which takes O(i) time if you count bit or byte operations.
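The claim that the i'th number has Theta(i) digits can be checked directly with bit_length. This sketch rebuilds the ways list from the question (using range, so it assumes Python 3) and prints the ratio of bit length to index; ladder_ways is a name chosen for this example.

```python
def ladder_ways(limit):
    """ways[1] = 1, ways[2] = 2, ways[i] = ways[i-1] + ways[i-2]; assumes limit >= 2."""
    ways = [0] * (limit + 1)
    ways[1], ways[2] = 1, 2
    for i in range(3, limit + 1):
        ways[i] = ways[i - 1] + ways[i - 2]
    return ways

ways = ladder_ways(4000)
for i in (1000, 2000, 4000):
    bits = ways[i].bit_length()
    # The ratio settles near log2 of the golden ratio, about 0.694
    print(i, bits, round(bits / i, 3))
```

Because the bit length grows linearly in i, the L additions cost on the order of 1 + 2 + ... + L bit operations in total, which is quadratic in L.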
This observation gives you an O(L^2) bit operation count for this loop:
for i in xrange(3, len(ways)):
    ways[i] = ways[i-1] + ways[i-2]
In the case of this program, it's quite reasonable to count bit operations: your numbers are unboundedly huge as L increases and addition of huge numbers is linear in clock time rather than O(1).
You can fix the complexity of your code by computing the Fibonacci numbers mod 2^32 -- since 2^32 is a multiple of 2^B[i]. That will keep a finite bound on the numbers you're dealing with:
for i in xrange(3, len(ways)):
    ways[i] = (ways[i-1] + ways[i-2]) & ((1<<32) - 1)
There are some other issues with the code, but this will fix the slowness.
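Putting the masking idea together, a possible complete version could look like the sketch below. It is written for Python 3 and is not the original poster's code: it masks with 2**max(B) instead of a fixed 2**32 (any power of two that is a multiple of every 2**B[i] works), it sizes the table from max(A), and it assumes A and B are non-empty lists of equal length with every A[i] >= 1.

```python
def solution(A, B):
    """Sketch: ways[A[i]] modulo 2**B[i], keeping all stored numbers bounded."""
    mask = (1 << max(B)) - 1          # 2**max(B) is a multiple of every 2**B[i]
    ways = [1, 1]                     # ways[0] and ways[1]; ways[2] becomes 2
    for i in range(2, max(A) + 1):
        ways.append((ways[i - 1] + ways[i - 2]) & mask)
    # Reduce each answer to its own modulus 2**B[i]
    return [ways[a] & ((1 << b) - 1) for a, b in zip(A, B)]

print(solution([4, 4, 5, 5, 1], [3, 2, 4, 3, 1]))   # [5, 1, 8, 0, 1]
```

Every stored value now fits in max(B) bits, so each addition takes bounded time and the loop is linear in the table size.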
I've taken the relevant parts of the function:
def solution(A, B):
    for i in xrange(3, len(A) + 1): # replaced ways for clarity
        # ...
    for i in xrange(len(A)):
        # ...
    return result
Observations:
1. A is an iterable object (e.g. a list)
2. You're iterating over the elements of A in sequence
3. The behavior of your function depends on the number of elements in A, making it O(A)
4. You're iterating over A twice, meaning 2 O(A) -> O(A)
On point 4, since 2 is a constant factor, 2 O(A) is still in O(A).
I think the page is not correct in its measurement. Had the loops been nested, then it would've been O(A²), but the loops are not nested.
This short sample is O(N²):
def process_list(my_list):
    for i in range(0, len(my_list)):
        for j in range(0, len(my_list)):
            # do something with my_list[i] and my_list[j]
            pass
I've not seen the code the page is using to 'detect' the time complexity of the code, but my guess is that the page is counting the number of loops you're using without understanding much of the actual structure of the code.
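The difference between sequential and nested loops can be checked by counting iterations, under the assumption used in this answer that each iteration is one basic step. Both function names are made up for this sketch (Python 3).

```python
def sequential_steps(n):
    """Two loops one after the other: n + n iterations."""
    steps = 0
    for _ in range(n):
        steps += 1
    for _ in range(n):
        steps += 1
    return steps

def nested_steps(n):
    """One loop inside the other: n * n iterations."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

for n in (10, 100, 1000):
    print(n, sequential_steps(n), nested_steps(n))
```

Multiplying n by 10 multiplies the sequential count by 10 and the nested count by 100.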
EDIT1:
Note that, based on this answer, the time complexity of the len function is actually O(1), not O(N), so the page is not incorrectly trying to count its use for the time-complexity. If it were doing that, it would've incorrectly claimed a larger order of growth because it's used 4 separate times.
EDIT2:
As @PaulHankin notes, asymptotic analysis also depends on what's considered a "basic operation". In my analysis, I've counted additions and assignments as "basic operations" by using the uniform cost method, not the logarithmic cost method, which I did not mention at first.
Simple arithmetic operations are usually treated as basic operations. This is what I see most commonly being done, unless the algorithm being analysed is for a basic operation itself (e.g. time complexity of a multiplication function), which is not the case here.
The only reason why we have different results appears to be this distinction. I think we're both correct.
EDIT3:
While an algorithm in O(N) is also in O(N²), I think it's reasonable to state that the code is still in O(N) b/c, at the level of abstraction we're using, the computational steps that seem more relevant (i.e. are more influential) are in the loop as a function of the size of the input iterable A, not the number of bits being used to represent each value.
Consider the following algorithm to compute a^n:
def function(a, n):
    r = 1
    for i in range(0, n):
        r *= a
    return r
Under the uniform cost method, this is in O(N), because the loop is executed n times, but under logarithmic cost method, the algorithm above turns out to be in O(N²) instead due to the time complexity of the multiplication at line r *= a being in O(N), since the number of bits to represent each number is dependent on the size of the number itself.
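To see why the logarithmic cost method gives a different answer, this sketch records the bit length of r after every multiplication. The name power_with_sizes is hypothetical, and Python 3 integers (arbitrary precision) are assumed.

```python
def power_with_sizes(a, n):
    """Compute a**n by repeated multiplication.

    Returns (result, list with the bit length of r after each step).
    """
    r, sizes = 1, []
    for _ in range(n):
        r *= a
        sizes.append(r.bit_length())
    return r, sizes

result, sizes = power_with_sizes(3, 1000)
print(sizes[9], sizes[99], sizes[999])
```

The operand r gains a roughly constant number of bits per iteration, so the cost of the i'th multiplication grows linearly with i when bit operations are counted.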
The Codility Ladder task is best solved as described here:
It is super tricky.
We first compute the Fibonacci sequence for the first L+2 numbers. The first two numbers are used only as fillers, so we have to index the sequence as A[idx]+1 instead of A[idx]-1. The second step is to replace the modulo operation by removing all but the n lowest bits
