What is the space complexity of the following algorithm? (Python)

The following algorithm finds the largest element of a list using recursion.
def largest(s):
    if len(s) == 0:
        return 'List can\'t be empty'
    elif len(s) == 1:
        return s[0]
    elif s[0] <= s[1]:
        return largest(s[1:])
    else:
        s.remove(s[1])
        return largest(s)
The time complexity is O(n) because we are making a total of n calls to the function largest, and each call does O(1) operations.
I am having trouble figuring out the space complexity. I think it's O(n), but I am not sure.

First of all, the time complexity is not O(n), because the list.remove operation is not O(1) but O(n).
So your time complexity would be O(n^2). Imagine applying largest to the array [5, 4, 3, 2, 1]: every call takes the else branch, and each remove has to shift all the remaining elements.
You can see here a list of Python operation complexities.
The space complexity is O(n^2), because when you do return largest(s[1:]) you are copying the list, not passing a reference, so all the intermediate cuts of the list are kept alive at once. Doing s.remove(s[0]) and then return largest(s) would give you O(n) space complexity, because then every call works with a reference to the same list (the O(n) is the depth of the recursion stack).
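A minimal sketch of that in-place variant (my reading of the suggestion, not the answerer's code; note that it mutates the caller's list):

def largest(s):
    # in-place variant: no slices, so no copies pile up; the only extra
    # space is the O(n)-deep recursion stack
    if len(s) == 0:
        return 'List can\'t be empty'
    elif len(s) == 1:
        return s[0]
    elif s[0] <= s[1]:
        s.remove(s[0])  # the first occurrence of s[0] is index 0, so this drops the front element
    else:
        s.remove(s[1])
    return largest(s)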

Slicing a standard list does create a (shallow!) copy of the slice. You're correct about this making it O(n). (In additional memory allocated; not counting the list itself, which is of course already in memory.)
As Reut points out in the comments, this is an implementation detail of the Python interpreter, but I couldn't say for sure whether any interpreters handle slices differently. Any implementation that does create a slice without copying would have to use copy-on-write instead.


What is the time complexity of the *in* operation on arrays in python

This code returns the first two numbers in the array that sum up to the targetSum. So for example
print(twoNumberSum([3, 5, -4, 8, 11, 1, -1, 6],10)) should return [11,-1]
def twoNumberSum(array, targetSum):
    for i in array:
        remainder = targetSum - i
        if (remainder != i) and (remainder in array):
            return [i, remainder]
    return []
The code works, but it is said to execute in O(n) time. My intuition is this: we first loop through the array and choose a number. For each number, we find the remainder. Using each remainder, we again loop through the entire array. Shouldn't this be an O(n^2) operation? Is the in operation in Python not an O(n) operation?
The in operation has different complexities depending on the type of container it is applied to. Here i in array becomes array.__contains__(i), and array is a list-type container.
(list, tuple) as you guessed are O(n).
Trees would be average O(log n).
set/dict - Average: O(1), Worst: O(n).
See this document if you have any further queries.
Take a look at this. For the case of a list, it makes no sense that this would take less than O(n^2) time. The outer loop takes O(n) time, and each iteration takes O(n) time to check whether the element is present or not.
If instead of a list you use a dict (or a set), then the in operation is O(1) on average. Then I could say that the whole of this code takes linear time.
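For illustration, a possible O(n) rewrite using a set (my sketch, not from the answers above). Building the set as we go also makes the remainder != i check unnecessary, because an element is never tested against itself:

def twoNumberSum(array, targetSum):
    # one pass; `in` on a set is O(1) on average
    seen = set()
    for i in array:
        remainder = targetSum - i
        if remainder in seen:
            return [remainder, i]
        seen.add(i)
    return []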

Does one for loop mean a time complexity of n in this case?

So, I've run into this problem in the daily coding problem challenge, and I've devised two solutions. However, I am unsure if one is better than the other in terms of time complexity (Big O).
# Given a list of numbers and a number k,
# return whether any two numbers from the list add up to k.
#
# For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
#
# Bonus: Can you do this in one pass?
# The above part seemed to denote this can be done in O(n).
def can_get_value(lst=[11, 15, 3, 7], k=17):
    for x in lst:
        for y in lst:
            if x + y == k:
                return True
    return False
def optimized_can_get_value(lst=[10, 15, 3, 7], k=17):
    temp = lst
    for x in lst:
        if k - x in temp:
            return True
        else:
            return False
def main():
    print(can_get_value())
    print(optimized_can_get_value())

if __name__ == "__main__":
    main()
I think the second is better than the first since it has one for loop, but I'm not sure it is O(n), since I'm still running through two lists. Another solution I had in mind, which is apparently an O(n) solution, was using the Python equivalent of Java's HashSet. Would appreciate confirmation, and an explanation of why/why not it is O(n).
The first solution can_get_value() is textbook O(n^2). You know this.
The second solution is as well. This is because elm in list has O(n) complexity, and you're executing it n times. O(n) * O(n) = O(n^2).
The O(n) solution here is to convert from a list into a set (or, well, any type of hash table - dict would work too). The following code runs through the list exactly twice, which is O(n):
def can_get_value(lst, k):
    st = set(lst)  # make a hashtable (set) where each key is the same as its value
    for x in st:  # this executes n times --> O(n)
        if k - x in st:  # unlike for lists, `in` is O(1) for hashtables
            return True
    return False
This is thus O(n) * O(1) = O(n) in most cases.
In order to analyze the asymptotic runtime of your code, you need to know the runtime of each of the functions which you call as well. We generally think of arithmetic expressions like addition as being constant time (O(1)), so your first function has two for loops over n elements and the loop body only takes constant time, coming out to O(n * n * 1) = O(n^2).
The second function has only one for loop, but checking membership for a list is an O(n) function in the length of the list, so you still have O(n * n) = O(n^2). The latter option may still be faster (Python probably has optimized code for checking list membership), but it won't be asymptotically faster (the runtime still increases quadratically in n).
EDIT - as @Mark_Meyer pointed out, your second function is actually O(1) because there's a bug in it (it returns False on the very first mismatch instead of trying the rest of the list); sorry, I skimmed it and didn't notice. This answer assumes a corrected version of the second function like
def optimized_can_get_value(lst, k=17):
    for x in lst:
        if k - x in lst:
            return True
    return False
(Note - don't give your function a mutable default value. See this SO question for the troubles that can bring. I also removed the temporary list because there's no need for it; it was just another name for the same list object anyway.)
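A quick toy illustration of the mutable-default pitfall (my example, not from the question):

def append_one(lst=[]):  # the default list is created once, at function definition time
    lst.append(1)
    return lst

print(append_one())  # [1]
print(append_one())  # [1, 1] - the same default list object is reused across calls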
EDIT 2: for fun, here are a couple of O(n) solutions to this (both use that checking containment for a set is O(1)).
A one-liner which still stops as soon as a solution is found:
def get_value_one_liner(lst, k):
    return any(k - x in set(lst) for x in lst)
EDIT 3: I think this is actually O(n^2) because we call set(lst) for each x. Using Python 3.8's assignment expressions could, I think, give us a one-liner that is still efficient. Does anybody have a good Python <3.8 one-liner?
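For what it's worth, here are two possibilities (my own sketches, not part of the original answer). The first uses a 3.8+ assignment expression; the second works before 3.8 by binding the set in a one-element outer loop, so set(lst) is evaluated exactly once:

# Python 3.8+: bind the set once with an assignment expression
def get_value_one_liner_38(lst, k):
    return bool((s := set(lst)) and any(k - x in s for x in lst))

# Python < 3.8: the one-element outer loop evaluates set(lst) exactly once
def get_value_one_liner_pre38(lst, k):
    return any(k - x in s for s in (set(lst),) for x in lst)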
And a version which tries not to do extra work by building up a set as it goes (not sure if this is actually faster in practice than creating the whole set at the start; it probably depends on the actual input data):
def get_value_early_stop(lst, k):
    values = set()
    for x in lst:
        if x in values:
            return True
        values.add(k - x)
    return False

Why this answer passes Leetcode Q41

Question here: https://leetcode.com/problems/first-missing-positive/description/
Your algorithm should run in O(n) time and use constant extra space.
I have a very naive solution that passes, even though the question is marked as hard and most people's solutions in the discussion are much more complicated.
def firstMissingPositive(self, nums):
    """
    :type nums: List[int]
    :rtype: int
    """
    if nums == []:
        return 1
    for i in range(1, max(nums) + 2):
        if i not in nums:
            return i
find max is O(n). Since the loop stops once it finds the first missing positive, the loop is O(n) too. range in Python 3 returns an iterable, so every pass of the for statement produces the next number on the fly. So the time complexity should be O(n).
The space complexity is O(1), since only i is created.
I suppose the OJ only checks correctness, not space/time complexity. However, I can't see how this solution is wrong. Could anyone point it out?
Explicit loop for i in range(1, max(nums)+2): with nested implicit loop if i not in nums: is not O(n) ;)
You have two loops nested inside each other: i iterates from 1 to max(nums)+2, and inside that, if i not in nums: iterates over nums. So your complexity will be something like O(n^2).
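For comparison, here is a sketch of the kind of O(n)-time, O(1)-extra-space solution the problem is asking for (my sketch, written as a plain function rather than a method; not from these answers): place each value v in 1..n at index v - 1 by swapping, then scan for the first slot that doesn't hold i + 1.

def firstMissingPositive(nums):
    # cyclic placement: each swap moves at least one value to its final
    # slot, so the while loop does O(n) total work across all iterations
    n = len(nums)
    for i in range(n):
        while 1 <= nums[i] <= n and nums[nums[i] - 1] != nums[i]:
            j = nums[i] - 1
            nums[i], nums[j] = nums[j], nums[i]
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1
    return n + 1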

What is the big-Oh runtime of two recursive O(logn) calls?

def f(L):
    if len(L) < 1_000_000_000:  # "1 billion" in the original pseudocode
        return L
    else:
        return f(L[:len(L) // 2]) + f(L[len(L) // 2:])
L is a list of size n
I know that if it was a single recursive call, then it would be O(logn), but there are two recursive calls here.
But it started to exhibit more of an O(n) runtime as I began to run it on a visualizer.
In my opinion it should be O(logn+logn) = O(2logn) = O(logn). Am I correct?
Consider how many calls you're doing. At the first level of the recursion you'll do 2 calls. For each of those you'll do two more calls. Etc ... This means that at level i of the recursion you'll have made a total of O(2^i) function calls.
How many levels of the recursion are there? This is just the height of a binary tree with n elements, which is O(log_2 n).
So by the time you reach all the leaves of the recursion you will have done O(2^(log_2 n)) = O(n) function calls.
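If you want to convince yourself numerically, here is a hypothetical helper (not from the answer) that counts the calls f would make; with a cutoff of 2 in place of 1 billion, a list of 1024 elements produces 2047 calls, roughly 2n - 1:

def count_calls(n, cutoff):
    # counts the calls f(L) would make on a list of length n,
    # using `cutoff` in place of the 1 billion threshold
    if n < cutoff:
        return 1
    return 1 + count_calls(n // 2, cutoff) + count_calls(n - n // 2, cutoff)

print(count_calls(1024, 2))  # 2047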
--
Another way of looking at it is that you eventually have to piece back together the entire list, so how could you possibly do that in less than O(n) time?
Your algorithm as it stands is going to be O(n) if len(L) is at least 1 billion because you will break the list into two, and then add the two halves back together. Both slicing and adding are O(n) operations.
If you want to test the runtime of the two recursive calls:
1. Pass in a start and end index, and call f(L, start, start + (end - start) // 2) + f(L, start + (end - start) // 2, end)
2. Return end - start or some other O(1) value when end - start is less than 1 billion
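A possible realization of that suggestion (my sketch; the function name and the O(1) return value are placeholders):

def f_indexed(L, start=0, end=None):
    # index-based variant: no slicing and no list concatenation, so each
    # call does O(1) work and the total cost is just the number of calls
    if end is None:
        end = len(L)
    if end - start < 1_000_000_000:
        return end - start  # some O(1) stand-in for returning the sublist
    mid = start + (end - start) // 2
    return f_indexed(L, start, mid) + f_indexed(L, mid, end)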

What is a typical way to add a reverse feature to an insertion sort?

I wrote the following insertion sort algorithm
def insertionSort(L, reverse=False):
    for j in xrange(1, len(L)):
        valToInsert = L[j]
        i = j - 1
        while i >= 0 and L[i] > valToInsert:
            L[i+1] = L[i]
            i -= 1
        L[i+1] = valToInsert
    return L
Edit: All you need to do is change the final > to < to get it to work in reverse.
However, what do most people do in these situations? Write the algorithm twice in two if-statements, one where it's > and the other where it's < instead? What is the "correct" way to typically handle these kinds of scenarios where the change is minor but it simply changes the nature of the loop/code entirely?
I know this question is a little subjective.
You could use a variable for the less-than operator:
import operator

def insertionSort(L, reverse=False):
    lt = operator.gt if reverse else operator.lt
    for j in xrange(1, len(L)):
        valToInsert = L[j]
        i = j - 1
        while 0 <= i and lt(valToInsert, L[i]):
            L[i+1] = L[i]
            i -= 1
        L[i+1] = valToInsert
    return L
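A quick check of both directions:

print(insertionSort([3, 1, 2]))                # [1, 2, 3]
print(insertionSort([3, 1, 2], reverse=True))  # [3, 2, 1]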
Option 1:
def insertionSort(L, reverse=False):
    # loop is the same...
    if reverse:
        L.reverse()
    return L
Option 2:
def insertionSort(L, reverse=False):
    if reverse:
        cmpfunc = lambda a, b: cmp(b, a)
    else:
        cmpfunc = cmp
    for j in xrange(1, len(L)):
        valToInsert = L[j]
        i = j - 1
        while i >= 0 and cmpfunc(L[i], valToInsert) > 0:
            L[i+1] = L[i]
            i -= 1
        L[i+1] = valToInsert
    return L
You'll probably notice that sorted and list.sort and all other functions that do any kind of potentially-decorated processing have a key parameter, and those that specifically do ordering also have a reverse parameter. (The Sorting Mini-HOWTO covers this.)
So, you can look at how they're implemented. Unfortunately, in CPython, all of this stuff is implemented in C. Plus, it uses a custom algorithm called "timsort" (described in listsort.txt). But I think I can explain the key parts here, since it's blindingly simple. The list.sort code is separate from the sorted code, and they're both spread out over a slew of functions. But if you just look at the top-level function listsort, you can see how it handles the reverse flag:
/* Reverse sort stability achieved by initially reversing the list,
   applying a stable forward sort, then reversing the final result. */
if (reverse) {
    if (keys != NULL)
        reverse_slice(&keys[0], &keys[saved_ob_size]);
    reverse_slice(&saved_ob_item[0], &saved_ob_item[saved_ob_size]);
}
Why reverse the list at the start as well as the end? Well, in the case where the list is nearly-sorted in the first place, many sort algorithms—including both timsort and your insertion sort—will do a lot better starting in the right order than in backward order. Yes, it wastes an O(N) reverse call, but you're already doing one of those—and, since any sort algorithm is at least O(N log N), and yours is specifically O(N^2), this doesn't make it algorithmically worse. Of course for smallish N, and a better sort, and a list in random order, this wasted 2N is pretty close to N log N, so it can make a difference in practice. It'll be a difference that vanishes as N gets huge, but if you're sorting millions of smallish lists, rather than a few huge ones, it might be worth worrying about.
Second, notice that it does the reversing by creating a reverse slice. This, at least potentially, could be optimized by referencing the original list object with __getitem__ in reverse order, meaning the two reversals are actually O(1). The simplest way to do this is to literally create a reverse slice: lst[::-1]. Unfortunately, this actually creates a new reversed list, so timsort includes its own custom reverse-slice object. But you can do the same thing in Python by creating a ReversedList class.
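A minimal sketch of what such a class might look like (my illustration; negative indices and slicing are left out):

class ReversedList(object):
    # wraps a list and presents it in reverse order without copying
    def __init__(self, lst):
        self.lst = lst
    def __len__(self):
        return len(self.lst)
    def __getitem__(self, index):
        if not 0 <= index < len(self.lst):
            raise IndexError(index)
        return self.lst[len(self.lst) - 1 - index]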
This probably won't actually be faster in CPython, because the cost of the extra function calls is probably high enough to swamp the differences. But you're complaining about the algorithmic cost of the two reverse calls, and this solves the problem, in effectively the same way that the built-in sort functions do.
You can also look at how PyPy does it. Its list is implemented in listobject.py. It delegates to one of a few different Strategy classes depending on what the list contains, but if you look over all of the strategies (except the ones that have nothing to do), they basically do the same thing: sort the list, then reverse it.
So, it's good enough for CPython, and for PyPy… it's probably good enough for you.
