What's the Big-O complexity of this loop? - python

def cups(a, b):
    i = 0
    j = 0
    done = False
    while not done:
        if a[i] == b[j]:
            print("A[" + str(i) + "] with B[" + str(j) + "]")
            i += 1
            j = 0
            if i == len(a):
                i = 0
                done = True
        if a[i] != b[j]:
            j += 1
I'm trying to compare two lists and print the indices of values that are the same in both lists.
I'm curious whether the complexity is O(1) or O(n)?

I suspect that your code might not be working the way you want it to, but this question seems to be more about algorithmic complexity and less about your particular implementation, so I'll focus on that.
In general, two sorted lists can be compared in the way you describe in linear time by advancing pointers. We can use two streets of numbered houses as a concrete example: if you and I want to find out which house numbers are duplicated on Main Street and Elm Street, we can do the following:
1. You start at the bottom of Main Street, I start at the bottom of Elm Street.
2. We each report the number that we see.
3. If the numbers match, we record that number.
4. If not, one of us is seeing a number that is lower than the other's. That one walks up the street until they find a number which is equal to or greater than the last one reported by the other, and we repeat from step 3.
In this case, neither of us ever goes back to the bottom of the street.
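Here is a minimal Python sketch of that two-pointer walk (my own illustration, not the asker's code), assuming both lists are already sorted:
def common_values(a, b):
    # report values present in both sorted lists in O(len(a) + len(b))
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a's "house number" is lower: advance along a
        else:
            j += 1  # b's is lower: advance along b
    return matches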
However, if the lists are not sorted, then we have to use a different approach. Assuming that Main Street and Elm Street are numbered in random order, we have to do the following:
1. I start at the bottom of Elm Street.
2. I report the number that I see.
3. You start at the bottom of Main Street and walk up until you find a house that has that number, or until you reach the end of Main. If you find a match, we record it.
4. I advance to the next house, report its number, and we repeat from step 3.
This is an O(n^2) algorithm, as you can see: you walk up Main St. once for each house on Elm, so if there are n houses on each street, we're looking at n * n operations.
This is the state you're in - with unsorted lists, the problem you're stating cannot be solved in O(n) by pairwise comparisons alone.
However, I will point out that sorting a list is an O(n log n) operation, which would allow you to reduce the problem to the linear case, for a final complexity of O(n log n).
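In code, that sort-first route is just a wrapper around the linear walk (again my own sketch, reusing the common_values helper above):
def common_values_unsorted(a, b):
    # sorting dominates: O(n log n) sorts, then an O(n) merge-style walk
    return common_values(sorted(a), sorted(b))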

Putting aside any bugs you have in the code - and concentrating on the runtime complexity question:
O(1) means the code runs in constant time regardless of the size of the input, and that is clearly not the case here, since different sizes of a and b will result in different numbers of loop iterations.
In your case, the while loop can iterate at most len(a) * len(b) times (roughly).
So you can say that the code has a run-time complexity of the order of O(n), where n = len(a) * len(b).

You're scanning all elements of at least one list, so it cannot be constant.
The best case, since you reset j = 0 after every match, is when every element of a equals the first element of b (which means all elements of a are the same); that is linear time, but big-O is conventionally used to describe the worst case, not the best.
In the worst case, you're scanning all elements of both lists: for each non-matching element you scan all of b until a match, and when you find a match you reset j back to the start of b. So it is really O(N*M), and for equal-length lists this is O(N^2).
Note: the more generic algorithm for this is (remember that enumerate yields (index, value) pairs):
for i, a_elem in enumerate(a):
    for j, b_elem in enumerate(b):
        if a_elem == b_elem:
            print(f"a[{i}] with b[{j}]")
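For example, with a = [1, 2, 3] and b = [3, 1] (my own sample data), this prints:
a[0] with b[1]
a[2] with b[0]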

Related

Find runtime (number of operations of function) and calculate Big O

For the python function given below, I have to find the number of Operations and Big O.
def no_odd_number(list_nums):
    i = 0
    while i < len(list_nums):
        num = list_nums[i]
        if num % 2 != 0:
            return False
        i += 1
    return True
From my calculation, the number of operations is 4 + 3n, but I'm not sure, as I don't know how to deal with if...else statements.
I am also given options to choose the correct Big O from; from my calculation I think it should be d. O(n), but I'm not sure. Help please!
a. O(n^2)
b. O(1)
c. O(log n)
d. O(n)
e. None of these
Big O notation typically considers the worst case scenario. The function you have is pretty simple, but the early return seems to complicate things. However, since we care about the worst case you can ignore the if block. The worst case will be one where you don't return early. It would be a list like [2,4,6,8], which would run the loop four times.
Now, look at the things inside the while loop, with the above in mind. It doesn't matter how big list_nums is: inside the loop you just increment i and lookup something in a list. Both of those are constant time operations that are the same regardless of how large list_nums is.
The number of times you do this loop is the length of list_nums. This means as list_nums grows, the number of operations grows at the same rate. That makes this O(n) as you suspect.
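To make the contrast concrete, here is a quick illustration (my own example data, not part of the original question):
# worst case: no odd numbers, so the loop runs len(list_nums) times -> O(n)
print(no_odd_number([2, 4, 6, 8]))  # True, after scanning all 4 elements
# best case: an odd number comes first, so the function returns immediately,
# but the big-O classification still goes by the worst case: O(n)
print(no_odd_number([3, 4, 6, 8]))  # False, after checking 1 element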

Big-O notation for iteration over steps in a list - Python

I'm looking to iterate over every third element in my list. But in thinking about Big-O notation, would the Big-O complexity be O(n) where n is the number of elements in the list, or O(n/3) for every third element?
In other words, even if I specify that the list should only be iterated over every third element, is Python still looping through the entire list?
Example code:
def function(lst):
    # iterating over every third element
    for i in lst[2::3]:
        pass
When using Big-O notation we ignore any scalar multiples in front of the functions. This is because the algorithm still takes "linear time": Big-O notation considers the behaviour of an algorithm as it scales to large inputs.
It doesn't matter whether the algorithm considers every element of the list or every third element; the time complexity still scales linearly with the input size. For example, if the input size is doubled, it takes twice as long to execute, whether you are looking at every element or every third element. (Note also that the slice lst[2::3] itself copies about n/3 elements into a new list before the loop even starts, so the whole function is linear either way.)
Mathematically, this follows from the constant M in the definition of Big-O (https://en.wikipedia.org/wiki/Big_O_notation): f(x) is O(g(x)) if there exist constants M and x0 such that
|f(x)| <= M * g(x) for all x >= x0
Here f(n) = n/3 is O(n), since n/3 <= M * n with M = 1/3; the constant simply gets absorbed.
Big O notation would remain O(n) here.
Consider the following:
n = 1000  # some big number
for i in range(n):
    print(i)
    print(i)
    print(i)
Does doing 3 actions count as O(3n) or O(n)? O(n). Does the real world performance slow down by doing three actions instead of one? Absolutely!
Big O notation is about looking at the growth rate of the function, not about the physical runtime.
Consider the following from the pandas library:
from pandas import DataFrame

# simple iteration, O(n)
df = DataFrame([{"a": 4}, {"a": 3}, {"a": 2}, {"a": 1}])
for i in range(len(df)):
    print(df["a"][i])
# iterrows iteration, O(n)
for idx, row in df.iterrows():
    print(row["a"])
# apply/lambda iteration, O(n)
df.apply(lambda row: print(row["a"]), axis=1)
All of these implementations can be considered O(n) (constant is dropped), however that doesn't necessarily mean that the runtime will be the same. In fact, method 3 should be about 800 times faster than method 1 (https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06)!
Another answer that may help you: Why is the constant always dropped from big O analysis?

Coding exercise for practicing adjacency list and BFS

I have a coding exercise with a row of trampolines, each with a minimum and maximum "bounciness". I have the index of a starting trampoline and an end trampoline, and with this, I need to find the minimum amount of jumps required to reach the end trampoline from the start trampoline.
I have tried creating an adjacency list in which I list all possible jumps from each trampoline. This is fine until I reach a large number of trampolines; the problem is that it takes O(n^2) time.
This is how I create the Adjacency List:
def createAL(a, b, l):
    al = [list() for _ in range(l)]
    for i in range(l):
        for j in range(a[i], b[i]+1):
            if (i+j) <= l-1:
                al[i].append(i+j)
            if (i-j) >= 0:
                al[i].append(i-j)
    for i in range(len(al)):
        al[i] = list(set(al[i]))
    return al
"a" is the min. bounciness, "b" the max bounciness and "l" is the length of the two lists.
As you can see, the problem is that I have 2 nested loops. Does anyone have an idea for a more efficient way of doing this? (preferably without the loops)
Assuming "bounciness" is strictly positive, you can omit this part:
for i in range(len(al)):
    al[i] = list(set(al[i]))
...as there is no way you could have duplicates in those lists.
(If however bounciness could be 0 or negative, then first replace any values below 1 by 1 in a)
The building of al can be made a bit faster by:
- making the ranges over the actual target indexes (so you don't need i+j in every iteration),
- clipping those ranges using min() and max(), avoiding if statements in the loop,
- avoiding individual append calls, using a list comprehension.
Result:
al = [
    [*range(max(0, i-b[i]), i-a[i]+1), *range(i+a[i], min(l, i+b[i]+1))]
    for i in range(l)
]
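A quick check with small made-up inputs (my own sample values, assuming strictly positive bounciness):
a = [1, 1, 2]  # min bounciness per trampoline
b = [2, 3, 2]  # max bounciness per trampoline
l = 3
# the comprehension above then yields al == [[1, 2], [0, 2], [0]]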
Finally, as this adjacency list presumably serves a BFS algorithm, you could also consider that building the adjacency list may not be necessary, as finding the adjacent nodes during BFS is a piece of cake using a and b on-the-spot. I wonder if you really gain time by creating the adjacency list.
In your BFS code, you probably have something like this (where i is the "current" node):
for neighbor in al[i]:
This could be replaced with:
for neighbor in (*range(max(0, i-b[i]), i-a[i]+1), *range(i+a[i], min(l, i+b[i]+1))):
We should also realise that if the target trampoline is found in a number of steps that is much smaller than the number of trampolines, then there is a probability that not all trampolines are visited during the BFS search. And in that case it would have been a waste of time to have created the complete adjacency list...
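To make that concrete, here is a minimal BFS sketch (my own illustration of the suggestion above, with assumed parameter names) that generates neighbours on the fly instead of building al:
from collections import deque

def min_jumps(a, b, l, start, end):
    # BFS over trampoline indexes; neighbours are computed on the spot
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i == end:
            return dist[i]
        for neighbor in (*range(max(0, i - b[i]), i - a[i] + 1),
                         *range(i + a[i], min(l, i + b[i] + 1))):
            if neighbor not in dist:
                dist[neighbor] = dist[i] + 1
                queue.append(neighbor)
    return -1  # end trampoline is not reachable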

how to calculate the minimum unfairness sum of a list

I have tried to summarize the problem statement like this:
Given n, k and an array (a list) arr, where n = len(arr) and k is an integer in the range [1, n] inclusive.
For an array (or list) myList, the unfairness sum is defined as the sum of the absolute differences between all possible pairs (combinations with 2 elements each) in myList.
To explain: if myList = [1, 2, 5, 5, 6], then its unfairness sum (the quantity we want to minimise is the Minimum Unfairness Sum, or MUS) is computed as below. Please note that elements are considered unique by their index in the list, not by their values.
MUS = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6|
If you actually need to look at the problem statement, it's HERE.
My Objective
Given n, k, arr (as described above), find the minimum unfairness sum out of all the unfairness sums of the subarrays possible, with the constraint that each len(subarray) = k [which is a good thing to make our lives easy, I believe :) ]
What I have tried
Well, there is a lot to be added in here, so I'll try to be as short as I can.
My first approach was this, where I used itertools.combinations to get all the possible combinations and statistics.variance to check the spread of the data (yeah, I know I'm a mess).
Before you see the code below: do you think variance and unfairness sum are perfectly related (I know they are strongly related), i.e. does the subarray with minimum variance have to be the subarray with the MUS?
You only have to check the LetMeDoIt(n, k, arr) function. If you need an MCVE, check the second code snippet below.
from itertools import combinations as cmb
from statistics import variance as varn

def LetMeDoIt(n, k, arr):
    v = []
    s = []
    subs = [list(x) for x in list(cmb(arr, k))]  # getting all sub arrays from arr in a list
    i = 0
    for sub in subs:
        if i != 0:
            var = varn(sub)  # the variance thingy
            if float(var) < float(min(v)):
                v.remove(v[0])
                v.append(var)
                s.remove(s[0])
                s.append(sub)
            else:
                pass
        elif i == 0:
            var = varn(sub)
            v.append(var)
            s.append(sub)
            i = 1
    final = []
    f = list(cmb(s[0], 2))  # getting list of all pairs (after determining sub array with least variance)
    for r in f:
        final.append(abs(r[0]-r[1]))  # calculating the MUS in my messy way
    return sum(final)
The above code works fine for n < 30 but raises a MemoryError beyond that.
In Python chat, Kevin suggested I try a generator, which is memory efficient (it really is), but as a generator also produces those combinations on the fly as we iterate over them, it was estimated to take over 140 hours (:/) for n=50, k=8.
I posted the same as a question on SO HERE (you might want to have a look to understand me properly - it has discussions and an answer by fusion which takes me to my second approach - a better one (I should say fusion's approach xD)).
Second Approach
from itertools import combinations as cmb

def myvar(arr):  # a function to calculate variance
    l = len(arr)
    m = sum(arr)/l
    return sum((i-m)**2 for i in arr)/l

def LetMeDoIt(n, k, arr):
    sorted_list = sorted(arr)  # i think sorting the array makes it easy to get the sub array with MUS quickly
    variance = None
    min_variance_sub = None
    for i in range(n - k + 1):
        sub = sorted_list[i:i+k]
        var = myvar(sub)
        if variance is None or var < variance:
            variance = var
            min_variance_sub = sub
    final = []
    f = list(cmb(min_variance_sub, 2))  # again getting all possible pairs in my messy way
    for r in f:
        final.append(abs(r[0] - r[1]))
    return sum(final)

def MainApp():
    n = int(input())
    k = int(input())
    arr = list(int(input()) for _ in range(n))
    result = LetMeDoIt(n, k, arr)
    print(result)

if __name__ == '__main__':
    MainApp()
This code works perfectly for n up to 1000 (maybe more), but terminates due to a timeout (5 seconds is the limit on the online judge :/ ) for n beyond 10000 (the biggest test case has n = 100000).
=====
How would you approach this problem to take care of all the test cases within the given time limit (5 sec)? (The problem was listed under algorithms & dynamic programming.)
For your reference, you can have a look at:
- successful submissions (py3, py2, C++, Java) on this problem by other candidates, so that you can explain that approach for me and future visitors,
- an editorial by the problem setter explaining how to approach the question,
- a solution code by the problem setter himself (py2, C++),
- input data (test cases) and expected output.
Edit 1:
For future visitors of this question, the conclusions I have reached so far:
- Variance and unfairness sum are not perfectly related (they are strongly related), which implies that among many lists of integers, the list with minimum variance doesn't always have to be the list with the minimum unfairness sum. If you want to know why, I actually asked that as a separate question on Math Stack Exchange HERE, where one of the mathematicians proved it for me xD (and it's worth taking a look, 'cause it was unexpected).
- As far as the question is concerned overall, you can read the answers by archer & Attersson below (I'm still trying to figure out a naive approach to carry this out - it shouldn't be far off by now, though).
Thank you for any help or suggestions :)
You must work on your list SORTED and check only sublists with consecutive elements. This is because, BY DEFAULT, any sublist that includes at least one non-consecutive element will have a higher unfairness sum.
For example, if the list is [1,3,7,10,20,35,100,250,2000,5000] and you want to check sublists of length 3, then the solution must be one of [1,3,7], [3,7,10], [7,10,20], etc.
Any other sublist, e.g. [1,3,10], will have a higher unfairness sum, because 10 > 7 and therefore all its differences with the rest of the elements will be larger than 7's.
The same holds for [1,7,10] (non-consecutive on the left side), as 1 < 3.
Given that, you only have to check consecutive sublists of length k, which reduces the execution time significantly.
Regarding coding, something like this should work:
import itertools

def myvar(array):
    # unfairness sum of one subarray: sum of |x - y| over all pairs
    return sum(abs(p[0] - p[1]) for p in itertools.combinations(array, 2))

def minsum(n, k, arr):
    arr = sorted(arr)  # the list must be sorted first
    res = float('inf')  # alternatively initialise with the first subarray
    for i in range(n - k + 1):  # every consecutive window of length k
        res = min(res, myvar(arr[i:i+k]))
    return res
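For example (my own sanity check):
print(minsum(5, 3, [10, 1, 3, 20, 7]))  # 12, from the sorted window [1, 3, 7]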
I see this question still has no complete answer, so I will sketch a correct algorithm which will pass the judge. I will not write the code, in order to respect the purpose of the HackerRank challenge, since we already have working solutions.
1. The original array must be sorted. This has a complexity of O(N log N).
2. At this point you can check consecutive subarrays only, as non-consecutive ones will result in a worse (or equal, but not better) "unfairness sum". This is also explained in archer's answer.
3. The last pass, finding the minimum "unfairness sum", can be done in O(N). You need to calculate the US of every consecutive k-long subarray. The mistake is recalculating it from scratch at every step, which takes O(k) and brings the complexity of this pass to O(k*N). The update can instead be done in O(1), as the editorial you posted shows, including the mathematical formulae. It requires initialising a cumulative (prefix-sum) array after step 1, done in O(N) with O(N) space too.
It works but terminates due to time out for n >= 10000.
(from the comments on archer's answer)
To explain step 3, think about k = 100. As you scroll the N-long array, on the first iteration you must calculate the US for the subarray from element 0 to 99 as usual, requiring 100 passes. The next step needs the same for a subarray that differs from the previous one by only one element: 1 to 100. Then 2 to 101, etc.
If it helps, think of it like a snake: one block is removed and one is added.
There is no need to perform the whole O(k) scroll. Just work out the maths as explained in the editorial and you will do it in O(1).
So the final complexity will asymptotically be O(N log N), due to the initial sort.
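For completeness (since editorial links tend to rot), here is a minimal sketch of that pipeline with my own variable names - a reconstruction of the sliding-window update, not the problem setter's code. It uses the fact that in a sorted window x[0..k-1], element x[j] contributes to the sum of pairwise differences with net weight (2j - k + 1):
def min_unfairness_sum(n, k, arr):
    x = sorted(arr)  # step 1: O(N log N)
    prefix = [0] * (n + 1)  # prefix[i] = x[0] + ... + x[i-1]
    for i in range(n):
        prefix[i + 1] = prefix[i] + x[i]
    # unfairness sum of the first window x[0..k-1]
    us = sum((2 * j - k + 1) * x[j] for j in range(k))
    best = us
    for i in range(n - k):  # slide the window: O(1) per step
        middle = prefix[i + k] - prefix[i + 1]  # x[i+1] + ... + x[i+k-1]
        us += (k - 1) * (x[i] + x[i + k]) - 2 * middle
        best = min(best, us)
    return best
For example, min_unfairness_sum(5, 3, [10, 1, 3, 20, 7]) returns 12, matching the brute-force result above.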

Find the element in virtually infinite list

I'm trying to solve this problem:
A list is initialized to ["Sheldon", "Leonard", "Penny", "Rajesh", "Howard"], and then undergoes a series of operations. In each operation, the first element of the list is moved to the end of the list and duplicated. For example, in the first operation, the list becomes ["Leonard", "Penny", "Rajesh", "Howard", "Sheldon", "Sheldon"] (with "Sheldon" being moved and duplicated); in the second operation, it becomes ["Penny", "Rajesh", "Howard", "Sheldon", "Sheldon", "Leonard", "Leonard"] (with "Leonard" being moved and duplicated); etc. Given a positive integer n, find the string that is moved and duplicated in the nth operation. [paraphrased from https://codeforces.com/problemset/problem/82/A]
I've written a working solution, but it's too slow when n is huge:
l = ['Sheldon', 'Leonard', 'Penny', 'Rajesh', 'Howard']
n = int(input())  # taking input from user to print the name of the person
                  # standing at that position
for i in range(n):
    t = l.pop(0)
    l.append(t)
    l.append(t)
    # debug
    # print(l)
print(t)
How can I do this faster?
Here's a solution that runs in O(log(n/len(l))) time without doing any actual list operations:
l = ['Sheldon', 'Leonard', 'Penny', 'Rajesh', 'Howard']
n = int(input())  # taking input from user to print the name of the person
                  # standing at that position
i = 0
while n > len(l) * 2**i:
    n = n - len(l) * 2**i
    i = i + 1
index = (n - 1) // (2**i)  # integer division avoids float precision issues for huge n
print(l[index])
Explanation: every time you work through the whole list, the list length grows by exactly len(l) * 2**i. But you first have to find out how many times this happens, which is what the while loop is doing (that's what n = n - len(l) * 2**i is for): it stops once it knows how many doubling rounds i are completed before the nth operation. Finally, after you have figured out i, you have to compute the index. In the i-th round every name is copied 2**i times, so you divide by 2**i. One minor detail: you subtract 1 from n because lists in Python are 0-indexed while your input is 1-indexed.
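A quick trace with n = 8 (my own example): the loop sees 8 > 5, so n becomes 3 and i becomes 1; then 3 > 10 fails, so index = (3 - 1) // 2 = 1 and the program prints 'Leonard'. Indeed, operations 6 and 7 move the two copies of 'Sheldon' in ['Sheldon', 'Sheldon', 'Leonard', 'Leonard', 'Penny', 'Penny', 'Rajesh', 'Rajesh', 'Howard', 'Howard'], and operation 8 moves 'Leonard'.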
As @khelwood said, you can deduce how many times you have to double the list.
To understand this, note that if you start with a list of 5 people and run 5 operations, you get the same order as before, just with everyone appearing twice.
I am not 100% sure what you mean by the nth position as it shifts all the time, but if you mean the person moved in the nth operation, solve for the largest integer i that fulfills
5*(2^i - 1) < n
to get the number of completed doubling rounds. Then just look at the remaining list (each name now appears 2^i times) to get the name at position n - 5*(2^i - 1).
You are not going to be able to avoid calculating the list entirely, but maybe you can make it a bit easier:
Every cycle (when Sheldon is first again) the length of the list has doubled, so it looks like this:
After 1 cycle: SSLLPPRRHH
After 2 cycles: SSSSLLLLPPPPRRRRHHHH
...
while the number of colas drunk so far is 5*((2**c)-1), where c is the number of completed cycles.
So you can calculate the state of the list at the closest completed cycle.
E.g.
Cola number 50:
5*((2**3)-1) = 35 means that after 35 cokes Sheldon is next in line.
Then you can use the algorithm described in the task and step forward to the one in line for cola 50.
Hope this helps.
