Python: Algorithm [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Question: Given a list of unordered timestamps, find the largest span of time that overlaps
For example: [1,3],[10,15],[2,7],[11,13],[12,16],[5,8] => [1,8] and [10,16]
I was asked to solve the above question.
My initial approach was the following:
times = [[1,3],[10,15],[2,7],[11,13],[12,16],[5,8]]
import itertools
def flatten(listOfLists):
    return itertools.chain.from_iterable(listOfLists)
start = [i[0] for i in times]
end = [i[1] for i in times]
times = sorted(list(flatten(times)))
# 1=s, 2=s, 3=e, 5=s, 7=e, 8=e, 10=s, 11=s, 12=s, 13=e, 15=e, 16=e
num_of_e = 0
num_of_s = 0
first_s = 0
for time in times:
    if first_s == 0:
        first_s = time
    if time not in end:
        num_of_s += 1
    if time in end:
        num_of_e += 1
    if num_of_e == num_of_s:
        # every span opened so far has been closed
        num_of_e = 0
        num_of_s = 0
        print([first_s, time])
        first_s = 0
Then the questioner insisted that I should solve it by ordering the times first because "it's better", so I did the following:
times = [[1,3],[10,15],[2,7],[11,13],[12,16],[5,8]]
def merge(a,b):
    return [min(a[0],b[0]), max(a[1],b[1])]
times.sort()
# [1,3] [2,7] [5,8] [10,15] [11,13] [12,16]
cur = []
for time in times:
    if not cur:
        cur = time
        continue
    if time[0] > cur[0] and time[0] < cur[1]:
        cur = merge(time,cur)
    else:
        print(cur)
        cur = time
print(cur)
Is there such a thing as a "better" approach (or maybe another approach that could be better)? I know I could time it and see which one is faster, or just evaluate based on big O notation (both O(N) for the actual work part).
Just wanted to see if you guys have any opinions on this?
Which one would you prefer and why?
Or maybe other ways to do it?

Here is a suggestion that avoids the cost of the repeated time in end lookup and the special cases it mishandles:
times = [[1,3],[10,15],[2,7],[11,13],[12,16],[5,8]]
start = [(i[0], 0) for i in times]
end = [(i[1], 1) for i in times]
# Using 0 for start and 1 for end ensures that starts are resolved before ends
times = sorted(start + end)
span_count = 0
first_s = 0
for time, is_start in times:
    if first_s == 0:
        first_s = time
    if is_start == 0:  # 0 marks a start, 1 marks an end
        span_count += 1
    else:
        span_count -= 1
    if span_count == 0:
        print([first_s, time])
        first_s = 0
Also, it has an easily computable complexity of O(n) (actual work) + O(n*log(n)) (sort) = O(n*log(n))

Speed is often the most important consideration when evaluating an algorithm, but it is not the only one. Let's look at speed first.
In this case, there are two kinds of speed to consider: asymptotic (which is what big Ω-Θ-O notation characterizes), and non-asymptotic. Even if two algorithms have the same asymptotic behavior, one may still perform considerably better than the other because of other costs in the algorithm that will be significant at smaller data sizes.
In your first algorithm you iterate through the list two times before sorting it, and then iterate through the list a third time after sorting it. In the second algorithm you only iterate through the list once. I would expect the second to be faster, but in Python, performance can sometimes be surprising, so it's good to measure if you need the speed.
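As a rough way to do that measurement, here is a minimal sketch using the standard timeit module. The function names, the random test data and the sizes are all assumptions made for illustration; the two functions are cleaned-up variants of the event-counting and sort-then-merge ideas, not the exact code from the question.

```python
import random
import timeit

def spans_by_events(intervals):
    """Sweep over sorted (time, kind) events; kind 0 = start, 1 = end."""
    events = sorted([(s, 0) for s, _ in intervals] +
                    [(e, 1) for _, e in intervals])
    result, depth, first = [], 0, None
    for t, kind in events:
        if depth == 0:
            first = t                  # a new merged span opens here
        depth += 1 if kind == 0 else -1
        if depth == 0:
            result.append([first, t])  # every open span has closed
    return result

def spans_by_sort_merge(intervals):
    """Sort by start time, then extend or close the current span."""
    result = []
    for start, end in sorted(intervals):
        if result and start <= result[-1][1]:
            result[-1][1] = max(result[-1][1], end)
        else:
            result.append([start, end])
    return result

def time_both(n=10_000, repeats=5, seed=0):
    """Time both functions on the same hypothetical random data set."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        start = rng.randrange(1_000_000)
        data.append([start, start + rng.randrange(1, 100)])
    t_events = timeit.timeit(lambda: spans_by_events(data), number=repeats)
    t_merge = timeit.timeit(lambda: spans_by_sort_merge(data), number=repeats)
    return t_events, t_merge

print(time_both())
```

The absolute numbers depend on the machine and the Python version, so only the comparison between the two is meaningful.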
You may also evaluate an algorithm's use of memory. Your first algorithm creates two temporary lists of start and end times, and a third temporary list holding the sorted time spans. Those could be expensive if the data set is large! The second algorithm avoids much of this, but creates a new list of length 2 each time merge is called. That could still be a significant amount of memory being allocated, and might be something to look at optimizing further. There may also be some memory use hidden behind the scenes: your use of sort, for example, may not in fact use much less memory than sorted does when you look at how it's implemented.
A final consideration when evaluating an algorithm is your audience. If you are in an interview, for example, speed and memory may not be as critical for your first attempt at implementing an algorithm as clarity and style.

Related

what approach is best to decrease the time complexity of this problem

I want to preface this thread by stating that I am still learning the basics of data structures and algorithms. I'm not looking for the correct code for this problem but rather for the correct approach, so that I can learn which situations call for which data structure. That being said, I am now going to try to explain this code correctly.
The code below is a solution I wrote for a medium-level leetcode problem. Please see the link to read the problem.
Correct me if I am wrong, but currently the time complexity of this algorithm is O(n).
from typing import List

class Solution:
    def canCompleteCircuit(self, gas: List[int], cost: List[int]):
        startingStation = 0
        didCircuit = -1
        tank = 0
        i = 0
        while i <= len(gas):
            if startingStation == len(gas):
                return -1
            if startingStation == i:
                didCircuit += 1
            if didCircuit == 1:
                return startingStation
            tank += gas[i] - cost[i]
            if tank >= 0:
                i += 1
            if i == len(gas):
                i = 0
            if tank < 0:
                # ran dry: restart the attempt from the next station
                didCircuit = -1
                startingStation += 1
                i = startingStation
                tank = 0
The code works fine, but it is too slow to get through every test case. What I am asking is: if this algorithm is O(n), what approach could I have used to make its runtime complexity O(log(n)), or just faster?
Side question: I know that having a lot of if statements is bad and ugly code, but if all of the iterations are O(1), does the number of if statements have any impact on the performance of this function when scaled to a high iteration count?

"Correct me if I am wrong, currently the time complexity of this algorithn is O(n)"
This algorithm is O(n^2) rather than O(n). In the best case, it will return an answer in only "n" iterations of the while loop, but in the situation where there is no answer, it needs to run the loop (n*(n+1))/2 times.
O() notation tells us to ignore practical values of n and remove terms that become insignificant as n grows very large. So we ignore the +n and the /2 in the iterations, with the most significant component being the n^2.
So it is an O(n^2) algorithm.
"if all of the iterations are O(1) does the amount of if statements have any impact on the performance of this function if scaled to a high iteration count"
No, the O() of the algorithm is not impacted by the number of logic statements, but beware of hidden loops and expensive operations. For example, a logic statement of if x in list can be O(n) on the number of items in the list without data-specific optimizations, so if you have an O(n) loop around it (for the same list) you could have an O(n^2) algorithm. None of your logic statements have this issue, so you can ignore them for O() purposes.
Assignments can be treated the same.
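To make the hidden-loop point concrete, here is a small sketch. The function names and the sample data are hypothetical and are not taken from the question. Both functions look like a single loop with one if test, but the first one scans a list on every test.

```python
def count_common_slow(values, candidates):
    """O(n * m): `in` on a list scans the list on every test."""
    return sum(1 for v in values if v in candidates)

def count_common_fast(values, candidates):
    """O(n + m) on average: build a set once, then each test is O(1) on average."""
    lookup = set(candidates)
    return sum(1 for v in values if v in lookup)

values = list(range(0, 2000, 2))       # even numbers below 2000
candidates = list(range(0, 2000, 3))   # multiples of 3 below 2000
print(count_common_slow(values, candidates))
print(count_common_fast(values, candidates))
```

Both calls return the same count (the multiples of 6 below 2000); only the amount of work done per membership test differs.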
"What I am asking is if this algorithm is O(n) what approach could I have used to make the runtime complexity of this algorithm O(log(n)) or just faster?"
Since the algorithm is not O(n), it is better to ask how you might get there. You can get there by finding a way to avoid looping over the arrays more than once.
You ask about data structures, but you talk about time complexity.
The best algorithm in this case is O(n) in time, and O(1) in additional space. It requires you to store one integer in addition to the two arrays. You can even implement it with three integers of storage if you keep reading the gas and cost values from streams of data.
"I'm not looking for the correct code for this problem but rather what the correct approach is"
They've given you a gift with the statement that any successful solution is unique. From this we know that the amount of gas available is no more than the sum of all costs plus the smallest difference between a station's cost and gas. If it were otherwise, then there would be two points in the loop where you could start.
That means that as soon as we find an i where the sum of the gas available at stations 0 to i exceeds the cost of travel from 0 to i we have found the unique starting position. If we get to the end of the line and have not found this, we know it is impossible to do so for any starting position.
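For reference, here is a minimal sketch of a single-pass approach. It uses the commonly cited greedy formulation: reset the candidate start whenever the running tank goes negative, then check the overall total. It is not a literal transcription of the argument above, and the function name is an assumption. It runs in O(n) time with a constant number of extra integers.

```python
from typing import List

def can_complete_circuit(gas: List[int], cost: List[int]) -> int:
    """Return a valid starting station index, or -1 if none exists."""
    total = 0   # net fuel over the whole loop
    tank = 0    # net fuel since the current candidate start
    start = 0
    for i in range(len(gas)):
        diff = gas[i] - cost[i]
        total += diff
        tank += diff
        if tank < 0:
            # No station from `start` through i can be the answer,
            # so the next candidate is the station after i.
            start = i + 1
            tank = 0
    # If the loop as a whole loses fuel, no start can work.
    return start if total >= 0 else -1

print(can_complete_circuit([1, 2, 3, 4, 5], [3, 4, 5, 1, 2]))
print(can_complete_circuit([2, 3, 4], [3, 4, 3]))
```

The first call reports station 3 and the second reports -1, because the total gas in the second case is less than the total cost.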

What is the difference in time complexity between these two blocks of code (if any) and why?

Trying to solidify my knowledge about Time Complexity. I think I know the answer to this, but would like to hear some good explanations.
import random

main = []
while len(main) < 5:
    sub = []
    while len(sub) < 5:
        sub.append(random.randint(1,10))
    main.append(sub)
VS
import random

main = []
sub = []
while len(main) < 5:
    sub.append(random.randint(1,10))
    if len(sub) == 5:
        main.append(list(sub))
        sub = []

There's no difference, since the time complexity is constant in both cases - you perform a constant amount of operations both times.

The time complexity of both is O(1), constant time, because they both perform a constant number of operations, as @Yakov Dan already stated.
This is because time complexity is usually expressed as a function of a variable (say n) and shows how changing the value of n changes the time the algorithm takes.
Now, assuming you had n instead of 5, you would have O(n^2) for both cases. The second case can be tricky, since a basic way of estimating polynomial complexity is to count the number of nested loops, and you can be led to conclude that the second version is O(n) because it has a single loop.
However, looking at it carefully shows that the loop runs n (5 in this case) times for sub for each value appended to main, so it is essentially the same.
This of course assumes that the built-in list.append runs in constant time (amortized).
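To check the n-instead-of-5 argument empirically, the sketch below counts how many times each version appends to sub. The function names are hypothetical, and the random values are replaced by a constant because they do not affect the count.

```python
def iterations_nested(n):
    """Count appends to sub in the nested-loop version, with n in place of 5."""
    count = 0
    main = []
    while len(main) < n:
        sub = []
        while len(sub) < n:
            sub.append(0)   # the stored value is irrelevant to the count
            count += 1
        main.append(sub)
    return count

def iterations_single(n):
    """Count appends to sub in the single-loop version, with n in place of 5."""
    count = 0
    main, sub = [], []
    while len(main) < n:
        sub.append(0)
        count += 1
        if len(sub) == n:
            main.append(list(sub))
            sub = []
    return count

for n in (5, 10, 20):
    print(n, iterations_nested(n), iterations_single(n))
```

For each n, both versions report n * n appends, which is why counting loops alone can be misleading.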

O(N) Time complexity for simple Python function

I just took a Codility demo test. The question and my answer can be seen here, but I'll paste my answer here as well. My response:
def solution(A):
    # write your code in Python 2.7
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i==retresult: retresult += 1 # increment the result since the current result exists in the array
            elif i>retresult: break # we can go out of the loop since we found a bigger number than our current positive integer result
    return retresult
My question is about time complexity, which I hope to understand better from your response. The question states that the expected worst-case time complexity is O(N).
Does my function have O(N) time complexity? Does the fact that I sort the array increase the complexity, and if so how?
Codility reports (for my answer)
Detected time complexity:
O(N) or O(N * log(N))
So, what is the complexity for my function? And if it is O(N*log(N)), what can I do to decrease the complexity to O(N) as the problem states?
Thanks very much!
p.s. my background reading on time complexity comes from this great post.
EDIT
Following the reply below, and the answers described here for this problem, I would like to expand on this with my take on the solutions:
basicSolution has an expensive time complexity and so is not the right answer for this Codility test:
def basicSolution(A):
    # O(N*log(N)) time complexity
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i==retresult: retresult += 1 # increment the result since the current result exists in the array
            elif i>retresult: break # we can go out of the loop since we found a bigger number than our current positive integer result
        else:
            continue  # negative numbers and 0 don't need any work
    return retresult
hashSolution is my take on what is described in the above article, in the "use hashing" paragraph. As I am new to Python, please let me know if you have any improvements to this code (it does work though against my test cases), and what time complexity this has?
def hashSolution(A):
    # O(N) time complexity, I think? but requires O(N) extra space (requirement states to use O(N) space)
    table = {}
    for i in A:
        if i > 0:
            table[i] = True # collision/duplicate will just overwrite
    for i in range(1,100000+1): # the problem says that the array has a maximum of 100,000 integers
        if not(table.get(i)): return i
    return 1 # default
Finally, I am having trouble understanding the actual O(N) solution (O(n) time and O(1) extra space). I understand that negative/0 values are pushed to the back of the array, and then we have an array of just positive values. But I do not understand the findMissingPositive function - could anyone please describe this with Python code/comments? With an example perhaps? I've been trying to work through it in Python and just cannot figure it out :(

It does not, because you sort A.
The Python list.sort() function uses Timsort (named after Tim Peters), and has a worst-case time complexity of O(NlogN).
Rather than sort your input, you'll have to iterate over it and determine if any integers are missing by some other means. I'd use a set of a range() object:
def solution(A):
    expected = set(range(1, len(A) + 1))
    for i in A:
        expected.discard(i)
    if not expected:
        # all consecutive integers from 1 to len(A) were present, so the next one is missing
        return len(A) + 1
    return min(expected)
This is O(N); we create a set of len(A) elements (O(N) time), then we loop over A, removing elements from expected (again O(N) time, since removing an element from a set is O(1)), then test for expected being empty (O(1) time), and finally get the smallest element in expected (at most O(N) time).
So we make at most 3 O(N) time steps in the above function, making it an O(N) solution.
This also fits the storage requirement; all we use is a set of size N. Sets have some overhead, but it stays proportional to N.
The hash solution you found is based on the same principle, except that it uses a dictionary instead of a set. Note that the dictionary values are never actually used, they are either set to True or absent. I'd rewrite that as:
def hashSolution(A):
    seen = {i for i in A if i > 0}
    if not seen:
        # there were no positive values, so 1 is the first missing.
        return 1
    for i in range(1, 10**5 + 1):
        if i not in seen:
            return i
    # we only get here if every integer from 1 to 100,000 is present,
    # which requires the largest allowed input; the next integer is missing.
    return 10**5 + 1
The above avoids looping all the way to 100,000 if there were no positive integers in A.
The difference between mine and theirs is that mine starts with the set of expected numbers, while they start with the set of positive values from A, inverting the storage and test.
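The question also asks how an O(N) time, O(1) extra space version could work. The sketch below is one common way to do it, using in-place sign marking. It is an assumption that this resembles the linked article (which is not reproduced here): instead of moving non-positive values to the back, it overwrites them with a harmless sentinel. The function name is hypothetical, and note that it modifies its argument.

```python
def first_missing_positive(values):
    """Return the smallest positive integer absent from values.

    Mutates values in place and uses O(1) extra space.
    """
    n = len(values)
    # Step 1: anything outside 1..n cannot affect the answer,
    # so replace it with n + 1 (a positive, out-of-range sentinel).
    for i in range(n):
        if values[i] <= 0 or values[i] > n:
            values[i] = n + 1
    # Step 2: record "v is present" by making the entry at index v - 1 negative.
    for i in range(n):
        v = abs(values[i])          # abs() because the entry may already be marked
        if v <= n and values[v - 1] > 0:
            values[v - 1] = -values[v - 1]
    # Step 3: the first index still holding a positive entry was never marked.
    for i in range(n):
        if values[i] > 0:
            return i + 1
    return n + 1                    # every integer 1..n was present

print(first_missing_positive([1, 3, 6, 4, 1, 2]))
```

For [1, 3, 6, 4, 1, 2] the marking pass leaves [-1, -3, -6, -4, 1, -2]; index 4 is the first unmarked position, so the answer is 5.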

Counting number of steps of codes (time complexity) [duplicate]

This question already has answers here:
How can I find the time complexity of an algorithm?
(10 answers)
Closed 5 years ago.
I need help in counting the number of steps regarding the time complexity of code fragments.
total = 0
i = 0
while i<3:
    j=0
    while j<3:
        total = total + 1
        j = j+1
    i = i+1
return total
I have the solution stating: 2+3*(2+3*3+2)+2 = 43
For the first two lines from the top, total = 0 and i = 0, I know that each of them is 1 time step, so adding them up gives me 2. For the while statement, I'm not sure how the count is obtained, but since i<3, is it 3 time steps? And then j = 0 is 1 time step.
Now here's where I don't quite get it. If there is a nested i and j loop, how do I determine the time complexity? In the solution, I notice there is a multiplication (*), and I would appreciate it if anyone could break it down in simpler terms for me.

Time complexity takes an argument. For example, O(n^2).
As it's written, I don't know what part of your function would change, so it's just constant, O(1).
Let's say the thing that i is compared to, 3 in this case, is what can change. Like your function is "do a j-thing three times for each i." In that case, you'll see that if you increase that variable, you'll add three more steps to the loop. That means the complexity would look like O(3n). Since we can remove constant multiples, it's just O(n).
What I just wrote is hypothetical, though. It depends on what varies in your function.
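To see where a count like 2+3*(2+3*3+2)+2 can come from, here is a sketch that tallies steps for the fragment in the question. The cost model is an assumption, since the quoted solution does not spell it out: each assignment, each loop test (passing or failing) and the return count as one step. The function name and its two parameters are hypothetical.

```python
def count_steps(outer, inner):
    """Tally steps for the nested-loop fragment under the assumed cost model."""
    steps = 2              # total = 0 and i = 0
    i = 0
    while i < outer:
        steps += 2         # passing outer test, then j = 0
        j = 0
        while j < inner:
            steps += 3     # passing inner test, total = total + 1, j = j + 1
            j += 1
        steps += 2         # failing inner test, then i = i + 1
        i += 1
    steps += 2             # failing outer test, then return total
    return steps

print(count_steps(3, 3))
print([count_steps(n, 3) for n in (1, 2, 3, 4)])
```

With both bounds at 3 this gives 2 + 3*(2 + 3*3 + 2) + 2 = 43. If only the outer bound n varies, the count is 13n + 4, which is linear in n; if both bounds vary together, it is 3n^2 + 4n + 4.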

Why does backward recursion execute faster than forward recursion in python

I made an algorithm in Python for counting the number of ways of getting an amount of money with different coin denominations:
# @measure  (timing decorator from the original post; its definition is not shown)
def countChange(n, coin_list):
    maxIndex = len(coin_list)
    def count(n, current_index):
        if n>0 and maxIndex>current_index:
            c = 0
            current = coin_list[current_index]
            max_coeff = int(n/current)
            for coeff in range(max_coeff+1):
                c+=count(n-coeff*current, current_index+1)
        elif n==0: return 1
        else: return 0
        return c
    return count(n, 0)
My algorithm uses an index to get a coin denomination and, as you can see, my index is increasing in each stack frame I get in. I realized that the algorithm could be written in this way also:
# @measure  (timing decorator from the original post; its definition is not shown)
def countChange2(n, coin_list):
    maxIndex = len(coin_list)
    def count(n, current_index):
        if n>0 and 0<=current_index:
            c = 0
            current = coin_list[current_index]
            max_coeff = int(n/current)
            for coeff in range(max_coeff+1):
                c+=count(n-coeff*current, current_index-1)
        elif n==0: return 1
        else: return 0
        return c
    return count(n, maxIndex-1)
This time, the index is decreasing each stack frame I get in. I compared the execution time of the functions and I got a very noteworthy difference:
print(countChange(30, range(1, 31)))
print(countChange2(30, range(1, 31)))
>> Call to countChange took 0.9956174254208345 secods.
>> Call to countChange2 took 0.037631815734429974 secods.
Why is there a great difference in the execution times of the algorithms if I'm not even caching the results? Why does the increasing order of the index affect this execution time?

This doesn't really have anything to do with dynamic programming, as I understand it. Just reversing the indices shouldn't make something "dynamic".
What's happening is that the algorithm is input sensitive. Try feeding the input in reversed order. For example,
print(countChange(30, list(reversed(range(1, 31)))))
print(countChange2(30, list(reversed(range(1, 31)))))
Just as some sorting algorithms are extremely fast with already sorted data and very slow with reversed data, you've got that kind of algorithm here.
In the case where the input is increasing, countChange needs a lot more iterations to arrive at its final answer, and thus seems a lot slower. However, when the input is decreasing, the performance characteristics are reversed.

The number of combinations is not huge.
The reason is that going forward you have to explore every possibility, whereas going backwards you can eliminate large chunks of invalid solutions without actually calculating them.
Going forward, you call count 500k times.
Going backwards, your code only makes 30k calls to count.
You can make both of these faster by memoizing the calls (or by changing your algorithm so that it does not make duplicate calls).
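As a sketch of the memoization idea, the version below caches each (remaining amount, coin index) pair with functools.lru_cache so that a repeated subproblem is computed only once. The function name is hypothetical, and the code assumes every coin value is a positive integer.

```python
from functools import lru_cache

def count_change_memo(amount, coin_list):
    """Count the ways to form amount from coin_list, caching subproblems."""
    coins = tuple(coin_list)   # materialise ranges and other iterables once

    @lru_cache(maxsize=None)
    def count(remaining, index):
        if remaining == 0:
            return 1           # exactly paid: one valid combination
        if index == len(coins):
            return 0           # no coins left but money still owed
        coin = coins[index]
        # Try every multiple of this coin that does not overshoot.
        return sum(count(remaining - k * coin, index + 1)
                   for k in range(remaining // coin + 1))

    return count(amount, 0)

print(count_change_memo(10, [1, 2, 5]))
print(count_change_memo(30, range(1, 31)))
```

With the cache in place the number of distinct calls is bounded by (amount + 1) * (number of coins + 1), so the order of the coins matters far less than in the uncached versions.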
