Can anyone advise the space and time complexity of the below code?
I know that the time complexity should be O(n) because the function is called n times, and the space complexity is at least O(n) (because of stack space), but does passing a[1:] to the function result in an increase in the space complexity? I think a[1:] will create a new copy of a while omitting the first element, is that right?
def sum(a):
    if len(a) == 1:
        return a[0]
    return a[0] + sum(a[1:])
Since this is a recursive function and no tail-call optimization is applied, it will certainly have a space complexity of at least O(n) in this case, just from the call stack. But let us analyze it further:
Time complexity
We know that sum is recursive and that its stopping criterion is an input array of length one, so sum will be called O(n) times in the worst case for an input array of size n. Think of the recursion as what it effectively is: a loop.
Inside the function, however, we have a slice operation. Slicing l[a:b] is O(b-a) because it copies that part of the list, so the slice costs O(n-1) on the first call, O(n-2) on the second, and so on. Summing these copies gives (n-1) + (n-2) + ... + 1 = n(n-1)/2, so the overall time complexity is O(n^2): the function creates one slice per element of an array of size n.
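As a quick sanity check of that total, here is a minimal instrumented sketch (the counter argument and helper name are additions of mine, not part of the original code) that adds up how many elements the slices copy:

import sys

def sum_counting_copies(a, counter):
    if len(a) == 1:
        return a[0]
    counter[0] += len(a) - 1              # a[1:] copies len(a) - 1 elements
    return a[0] + sum_counting_copies(a[1:], counter)

for n in (10, 100, 500):
    copies = [0]
    sum_counting_copies(list(range(1, n + 1)), copies)
    print(n, copies[0])                   # n*(n-1)/2 -> 45, 4950, 124750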
Space complexity
Now let's talk about memory usage.
len(a) == 1
Here we only hold the return value of len(a), which is a constant amount of space.
return a[0]
&
return a[0] + sum(a[1:])
In both lines above we make another copy of a value that is stored as the function's return value. The slice also has an O(n) space complexity.
Given this, and assuming no major optimizations are applied by the interpreter, such as a reduction, we can say that each call of this function needs O(n) space: it makes a constant number of copies AND performs a slice of an array of size up to n.
Since we said at the beginning that this recursion behaves like a loop, and no tail-call optimization is applied, the whole function body is executed n times in the worst case. The program keeps growing the call stack until it reaches the stopping criterion, and only then can it 'pop' return values off the stack of calls. At the deepest point all n frames are alive at once, each holding its own slice (of sizes n, n-1, ..., 1), so the total space complexity is O(n^2).
Ps:
I also considered len(a) to have an O(1) time complexity, according to this.
The time complexity is Θ(n^2) because each time you do a[1:] you copy the list from index 1 to the end, so you have to iterate through it. As for space complexity, the call stack holds all of the lists you pass down: first a list with n elements, then n-1, and so on down to 1, at which point you start emptying the stack. So you end up with Θ(n^2) complexity for that too.
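For contrast, here is a slice-free sketch (not from either answer above) that recurses on an index instead of copying the list; the time drops to O(n), while the recursion depth, and hence stack space, stays O(n):

# Illustrative sketch: pass an index instead of slicing, so no copies are made.
def sum_from(a, i=0):
    if i == len(a) - 1:
        return a[i]
    return a[i] + sum_from(a, i + 1)

print(sum_from([1, 2, 3, 4]))  # 10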
Related
I'm looking to iterate over every third element in my list. But in thinking about Big-O notation, would the Big-O complexity be O(n) where n is the number of elements in the list, or O(n/3) for every third element?
In other words, even if I specify that the list should only be iterated over every third element, is Python still looping through the entire list?
Example code:
def function(lst):
    # iterating over every third element
    for i in lst[2::3]:
        pass
When using Big-O notation we ignore any scalar multiples in front of the functions. This is because the algorithm still takes "linear time". We do this because Big-O notation considers the behaviour of an algorithm as it scales to large inputs.
Meaning it doesn't matter whether the algorithm considers every element of the list or every third element: the time complexity still scales linearly with the input size. For example, if the input size is doubled, the loop takes twice as long to execute, whether you are looking at every element or every third element.
Mathematically we can say this because of the constant M in the definition (https://en.wikipedia.org/wiki/Big_O_notation): f(x) = O(g(x)) if there exist constants M and x0 such that
abs(f(x)) <= M * abs(g(x)) for all x >= x0
Here the every-third loop does about n/3 iterations, which satisfies this with g(n) = n and M = 1/3.
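If you want to see this empirically, here is a rough micro-benchmark sketch (timings are machine-dependent and only illustrative): doubling n roughly doubles both loops; the every-third loop just has a smaller constant factor.

import timeit

setup = "lst = list(range({n}))"
every_element = "for x in lst: pass"
every_third = "for x in lst[2::3]: pass"

for n in (100000, 200000):
    t_all = timeit.timeit(every_element, setup=setup.format(n=n), number=100)
    t_third = timeit.timeit(every_third, setup=setup.format(n=n), number=100)
    print(n, round(t_all, 4), round(t_third, 4))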
Big O notation would remain O(n) here.
Consider the following:
n = 10000  # some big number
for i in range(n):
    print(i)
    print(i)
    print(i)
Does doing 3 actions count as O(3n) or O(n)? O(n). Does the real world performance slow down by doing three actions instead of one? Absolutely!
Big O notation is about looking at the growth rate of the function, not about the physical runtime.
Consider the following from the pandas library:
import pandas as pd

df = pd.DataFrame([{"a": 4}, {"a": 3}, {"a": 2}, {"a": 1}])

# simple index-based iteration, O(n)
for i in range(len(df)):
    print(df.iloc[i]["a"])

# iterrows iteration, O(n)
for idx, row in df.iterrows():
    print(row["a"])

# apply/lambda iteration, O(n)
df.apply(lambda row: print(row["a"]), axis=1)
All of these implementations can be considered O(n) (constant is dropped), however that doesn't necessarily mean that the runtime will be the same. In fact, method 3 should be about 800 times faster than method 1 (https://towardsdatascience.com/how-to-make-your-pandas-loop-71-803-times-faster-805030df4f06)!
Another answer that may help you: Why is the constant always dropped from big O analysis?
Can you please help me with how to calculate the space and time complexity of the code below?
def isPalindrome(string):
    # Write your code here.
    string1 = string[::-1]
    if string1 == string:
        return True
    else:
        return False
The easiest way to find the complexity is to go line by line and understand each operation.
Let's start with the first line
string1 = string[::-1]
This is a string slicing operation, which reverses the string. According to this, slicing takes time proportional to the number of characters being copied; in this case (your code) that is the whole string, hence it is O(n).
This is just line 1. Let's move ahead
if string1 == string:
Here we are doing a string comparison in the condition of the if statement. According to this, it is again O(n) for line 2.
Now, the remaining lines are just the return statements and the else block, which run in constant time, i.e. O(1).
Hence, for the total complexity, we just sum up each line's complexity, i.e.
O(n) + O(n) + O(1) + O(1)
You can refer to this to learn more about simplifying it.
So the final time complexity is O(n).
This function can be broken down into the complexity of its sub-processes. For this calculation, let the number of characters in string be n (n = len(string) in Python terms). Now, let's look at the two sub-processes:
Traverse all characters in string in reverse order and assign to string1 (this is done by string1=string[::-1]) - O(n) linear time since there are n characters in string.
Compare if string1 == string - O(n) linear time because n characters in each string will be compared to each other, so that is n 1-to-1 comparisons.
Therefore, the total time complexity is O(n) + O(n) where n is len(string). In shorter terms, we simplify this to mean O(n) complexity.
The following two operations decide the time complexity of the above code:
With the operation below you create a reversed copy of the string, which takes O(n) time and O(n) space:
string1 = string[::-1]
Checking the strings for equality in the line below again takes O(n) operations, as you need to compare all the characters in the worst case:
if string1 == string:
From the above, we can conclude the following:
Time complexity: O(n)
Space complexity: O(n)
where n represents the length of the input string.
You might also like to go through this document, which summarizes the time complexities of different operations in Python.
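As a side note not raised in the answers above: if the O(n) extra space of the reversed copy matters, a two-pointer check keeps O(n) time but only O(1) extra space. A minimal sketch:

# Compare characters from both ends instead of building a reversed copy.
def is_palindrome(string):
    left, right = 0, len(string) - 1
    while left < right:
        if string[left] != string[right]:
            return False
        left += 1
        right -= 1
    return True

print(is_palindrome("abcba"))  # True
print(is_palindrome("abca"))   # False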
Below I have provided a function to compute the LCP (longest common prefix). I want to know its Big O time complexity and space complexity. Can I say it is O(n)? Or do zip() and join() affect the time complexity? I am also wondering whether the space complexity is O(1). Please correct me if I am wrong. The input to the function is a list containing strings, e.g. ["flower", "flow", "flight"].
def longestCommonPrefix(self, strs):
    res = []
    for x in zip(*strs):
        if len(set(x)) == 1:
            res.append(x[0])
        else:
            break
    return "".join(res)
Iterating to get a single tuple value from zip(*strs) takes O(len(strs)) time and space. That's just the time it takes to allocate and fill a tuple of that length.
Iterating to consume the whole iterator takes O(len(strs) * min(len(s) for s in strs)) time, but shouldn't take any additional space over a single iteration.
Your iteration code is a bit trickier, because you may stop iterating early, when you find the first place within your strings where some characters don't match. In the worst case, all the strings are identical (up to the length of the shortest one) and so you'd use the time complexity above. And in the best case there is no common prefix, so you can use the single-value iteration as your best case.
But there's no good way to describe "average case" performance because it depends a lot on the distributions of the different inputs. If your inputs were random strings, you could do some statistics and predict an average number of iterations, but if your input strings are words, or even more likely, specific words expected to have common prefixes, then it's very likely that all bets are off.
Perhaps the best way to describe that part of the function's performance is actually in terms of its own output. It takes O(len(strs) * len(self.longestCommonPrefix(strs))) time to run.
As for str.join, running "".join(res) if we know nothing about res takes O(len(res) + len("".join(res))) for both time and space. Because your code only joins individual characters, the two lengths are going to be the same, so we can say that the join in your function takes O(len(self.longestCommonPrefix(strs))) time and space.
Putting things together, we can see that the main loop takes a multiple of the time taken by the join call, so we can ignore the latter and say that the function's time complexity is just O(len(strs) * len(self.longestCommonPrefix(strs))). However, the memory usage complexities for the two parts are independent and we can't easily predict whether the number of strings or the length of the output will grow faster. So we need to combine them and say that you need O(len(strs) + len(self.longestCommonPrefix(strs))) space.
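To make that output-sensitive behaviour concrete, here is a small demonstration with the example input from the question (the variable names are mine):

strs = ["flower", "flow", "flight"]

# zip(*strs) is lazy: it yields one "column" tuple at a time.
columns = zip(*strs)
print(next(columns))  # ('f', 'f', 'f')  -> all equal, prefix grows
print(next(columns))  # ('l', 'l', 'l')  -> all equal, prefix grows
print(next(columns))  # ('o', 'o', 'i')  -> mismatch, the loop would break here
# Only len(longestCommonPrefix(strs)) + 1 tuples are ever materialized.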
Time:
Your code is O(n * m), where n is the length of the list and m is the length of the longest string in the list.
zip() itself is O(1) in Python 3.x: the call just allocates a special iterable (the zip object) and stores the argument sequences in an internal field. In the case of zip(*x) (as pointed out by @juanpa.arrivillaga), the argument unpacking builds a tuple first, so that part is O(n). As a result you end up with O(n): iterating over the list (the tuple) plus the zip(*x) call itself.
join() is O(n), where n is the total length of the input.
set() is O(k), where k is the number of elements passed to it (here, one character per string).
Space:
It is O(n), because in the worst case res will need to append x[0] n times.
import numpy as np

def check_set(S, k):
    S2 = k - S
    set_from_S2 = set(S2.flatten())
    for x in S:
        if x in set_from_S2:
            return True
    return False
I have a given integer k. I want to check whether k is equal to the sum of two elements of the array S.
S = np.array([1,2,3,4])
k = 8
It should return False in this case because no two elements of S sum to 8. The above code treats it as 8 = 4 + 4, so it returned True.
I can't find an algorithm to solve this problem with complexity of O(n).
Can someone help me?
You have to account for multiple instances of the same item, so a set is not a good choice here.
Instead you can use a dictionary whose values count the occurrences of each key (or, as a variant, collections.Counter):
A = [3, 1, 2, 3, 4]
Cntr = {}
for x in A:
    if x in Cntr:
        Cntr[x] += 1
    else:
        Cntr[x] = 1

# k = 11
k = 8
ans = False
for x in A:
    if (k - x) in Cntr:
        if k == 2 * x:
            if Cntr[k - x] > 1:
                ans = True
                break
        else:
            ans = True
            break
print(ans)
This returns True for k = 5, 6 (I added one more 3 to the array) and False for k = 8, 11.
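Here is the same idea written with collections.Counter, the variant mentioned above (the function name is hypothetical):

from collections import Counter

def has_pair_with_sum(values, k):
    counts = Counter(values)
    for x in values:
        need = k - x
        if need in counts:
            # When the pair uses the same value twice, it must occur at least twice.
            if need != x or counts[x] > 1:
                return True
    return False

print(has_pair_with_sum([3, 1, 2, 3, 4], 6))  # True  (2 + 4, or 3 + 3)
print(has_pair_with_sum([1, 2, 3, 4], 8))     # False (4 + 4 needs two 4s)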
Adding onto MBo's answer.
"Optimal" can be an ambiguous term in terms of algorithmics, as there is often a compromise between how fast the algorithm runs and how memory-efficient it is. Sometimes we may also be interested in either worst-case resource consumption or in average resource consumption. We'll loop at worst-case here because it's simpler and roughly equivalent to average in our scenario.
Let's call n the length of our array, and let's consider 3 examples.
Example 1
We start with a very naive algorithm for our problem: two nested loops that iterate over the array and check, for every two items at different indices, whether they sum to the target number.
Time complexity: the worst-case scenario (where the answer is False, or where it's True but we only find it on the last pair of items we check) has about n^2 loop iterations. If you're familiar with big-O notation, we say the algorithm's time complexity is O(n^2), which basically means that, in terms of the input size n, the running time grows more or less like n^2 up to a multiplicative factor (technically the notation means "at most like n^2 up to a multiplicative factor", but it's a generalized abuse of language to read it as "more or less like" instead).
Space complexity (memory consumption): we only store an array, plus a fixed set of objects whose sizes do not depend on n (everything Python needs to run, the call stack, maybe two iterators and/or some temporary variables). The part of the memory consumption that grows with n is therefore just the size of the array, which is n times the amount of memory required to store an integer in an array (let's call that sizeof(int)).
Conclusion: Time is O(n^2), Memory is n*sizeof(int) (+O(1), that is, up to an additional constant factor, which doesn't matter to us, and which we'll ignore from now on).
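A minimal sketch of this naive version, under the assumption that the input is a plain Python list (the function name is hypothetical):

def has_pair_with_sum_naive(values, k):
    n = len(values)
    # Check every pair of distinct indices: O(n^2) time, O(1) extra space.
    for i in range(n):
        for j in range(n):
            if i != j and values[i] + values[j] == k:
                return True
    return False

print(has_pair_with_sum_naive([1, 2, 3, 4], 5))  # True
print(has_pair_with_sum_naive([1, 2, 3, 4], 8))  # False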
Example 2
Let's consider the algorithm in MBo's answer.
Time complexity: much, much better than in Example 1. We start by creating a dictionary, which is done in a loop over the n elements. Setting keys in a dictionary is a constant-time operation under normal conditions, so the time taken by each step of that first loop does not depend on n; so far we've spent O(n) time. Then we have one remaining loop over n, and the time spent accessing elements of our dictionary is independent of n, so that loop is O(n) as well. Combining the two loops: since they both grow like n up to a multiplicative factor, so does their sum (up to a different multiplicative factor). Total: O(n).
Memory: Basically the same as before, plus a dictionary of n elements. For the sake of simplicity, let's consider that these elements are integers (we could have used booleans), and forget about some of the aspects of dictionaries to only count the size used to store the keys and the values. There are n integer keys and n integer values to store, which uses 2*n*sizeof(int) in terms of memory. Add to that what we had before and we have a total of 3*n*sizeof(int).
Conclusion: Time is O(n), Memory is 3*n*sizeof(int). The algorithm is considerably faster when n grows, but uses three times more memory than example 1. In some weird scenarios where almost no memory is available (embedded systems maybe), this 3*n*sizeof(int) might simply be too much, and you might not be able to use this algorithm (admittedly, it's probably never going to be a real issue).
Example 3
Can we find a trade-off between Example 1 and Example 2?
One way to do that is to replicate the same kind of nested loop structure as in Example 1, but with some pre-processing to replace the inner loop with something faster. To do that, we sort the initial array, in place. Done with well-chosen algorithms, this has a time-complexity of O(n*log(n)) and negligible memory usage.
Once we have sorted our array, we write our outer loop (a regular loop over the whole array), and then inside that outer loop we use binary search (dichotomy) to look for the number we're missing to reach our target k. Implemented recursively, this binary search consumes O(log(n)) memory for its call stack, and its time complexity is O(log(n)) as well.
Time complexity: The pre-processing sort is O(n*log(n)). Then in the main part of the algorithm, we have n calls to our O(log(n)) dichotomy search, which totals to O(n*log(n)). So, overall, O(n*log(n)).
Memory: Ignoring the constant parts, we have the memory for our array (n*sizeof(int)) plus the memory for our call stack in the dichotomy search (O(log(n))). Total: n*sizeof(int) + O(log(n)).
Conclusion: Time is O(n*log(n)), Memory is n*sizeof(int) + O(log(n)). Memory is almost as small as in Example 1. Time complexity is slightly more than in Example 2. In scenarios where the Example 2 cannot be used because we lack memory, the next best thing in terms of speed would realistically be Example 3, which is almost as fast as Example 2 and probably has enough room to run if the very slow Example 1 does.
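A sketch of this sort-plus-binary-search approach (function names are hypothetical; note that sorted() below returns a copy rather than sorting strictly in place, a minor deviation from the description above):

def contains(sorted_values, target, lo, hi):
    # Recursive binary search over sorted_values[lo:hi]: O(log n) time and stack space.
    if lo >= hi:
        return False
    mid = (lo + hi) // 2
    if sorted_values[mid] == target:
        return True
    if sorted_values[mid] < target:
        return contains(sorted_values, target, mid + 1, hi)
    return contains(sorted_values, target, lo, mid)

def has_pair_with_sum_sorted(values, k):
    values = sorted(values)                 # O(n log n) pre-processing
    n = len(values)
    for i, x in enumerate(values):          # n binary searches: O(n log n) total
        # Search on both sides of i so an element is never paired with itself.
        if contains(values, k - x, 0, i) or contains(values, k - x, i + 1, n):
            return True
    return False

print(has_pair_with_sum_sorted([1, 2, 3, 4], 5))  # True
print(has_pair_with_sum_sorted([1, 2, 3, 4], 8))  # False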
Overall conclusion
This answer was just to show that "optimal" is context-dependent in algorithmics. It's very unlikely that in this particular example, one would choose to implement Example 3. In general, you'd see either Example 1 if n is so small that one would choose whatever is simplest to design and fastest to code, or Example 2 if n is a bit larger and we want speed. But if you look at the wikipedia page I linked for sorting algorithms, you'll see that none of them is best at everything. They all have scenarios where they could be replaced with something better.
These two code snippets give the sum of the even integers in a list without using a loop statement. I would like to know the time complexity and space complexity of both. Which is best?
CODE 1:
class EvenSum:
    # Initialize the class
    def __init__(self):
        self.res = 0

    def sumEvenIntegers(self, integerlist):
        if integerlist:
            if not integerlist[0] % 2:
                self.res += integerlist[0]
                del integerlist[0]
                self.sumEvenIntegers(integerlist)
            else:
                del integerlist[0]
                self.sumEvenIntegers(integerlist)
        return self.res

# main method
if __name__ == "__main__":
    l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    even = EvenSum()
    print even.sumEvenIntegers(l)
CODE 2:
import numpy as np

def sum_of_all_even_integers(list):
    list_sum = sum(list)
    bin_arr = map(lambda x: x % 2, list)
    return list_sum - sum(list * bin_arr)

if __name__ == "__main__":
    list = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print sum_of_all_even_integers(list)
According to the Python wiki, deleting an item from a list takes time proportional to the number of elements in the list. Since you delete every item in the list, and each deletion takes linear time, the overall runtime is proportional to the square of the number of items in the list.
In your second code snippet, both sum and map take linear time, so the overall complexity is linear, proportional to the number of elements in the list. Interestingly, sum_of_elements isn't used at all (but it doesn't sum all even elements either).
What about the following?
import numpy as np
a = np.arange(20)
print np.sum(a[a%2==0])
It seems to be much more lightweight compared to your two code snippets.
Small timings with an np.arange(998):
Pure numpy:
248502
0.0
Class recursion:
248502
0.00399994850159
List/Numpy one:
248502
0.00200009346008
And with a 999-element array, your class-based version fails because the maximum recursion depth is reached.
The first code uses item deletion from a list and recursion, two things at which Python is not so good: each deletion takes O(n) time, since the rest of the list has to be shifted, and Python does not optimize recursive calls (in order to keep full traceback information, I think).
So I would go for the second code (which I think actually uses "for loops" internally, only the loops are hidden inside map and sum).
If you use numpy, you could actually do something like:
a = np.array([1,2,3,4,5,6,7,8,9,10])
np.sum(np.where((a+1)%2,a,0))
Or, as anki proposed:
np.sum(a[a % 2 == 0])
Which I think would be best since numpy is optimized for array manipulation.
By the way, never name an object list, as it shadows the built-in list constructor.
EDIT:
If you just want the sum of all even numbers in [0, n], you don't need a sum or a loop at all; there is a closed-form formula for that:
s = (n//2) * (n//2 + 1)
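A quick sanity check of that formula, assuming the interval [0, n] is inclusive:

# Verify the closed form against a direct sum of the even numbers in [0, n].
for n in range(0, 51):
    direct = sum(i for i in range(n + 1) if i % 2 == 0)
    assert (n // 2) * (n // 2 + 1) == direct
print("formula matches for n = 0..50")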
The first has O(N^2) time complexity and O(N) space complexity. The second has O(N) time complexity and O(N) space complexity.
The first uses one stack frame (a piece of stack memory of constant but quite large size) for each element in the array. In addition, it executes the function once per element, and each call deletes the first element of the array, which is an O(N) operation.
The second does most of its work behind the scenes. The map function generates a new list of the same size as the original, and it calls a function for each element, which gives the complexity directly. The sum calls likewise perform one operation per element of the list, although they use no more than constant extra memory. Adding these up doesn't make the complexities worse: twice or thrice O(N) is still O(N).
The best is probably the latter, but then again, it depends on your preferences. Maybe you want to consume a lot of time and stack space? In that case the first would suit you much better.
Also note that the first solution modifies the input data; in other words, the two don't do the same thing. After calling the first, the list passed to the function will be empty (which may or may not be a bad thing).
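A minimal illustration of that side effect, using a hypothetical stand-alone helper with the same delete-the-head pattern (not the question's class):

def consume_evens(items):
    # Sums the even items while deleting the head of the list at each step,
    # mirroring the mutation done by CODE 1.
    total = 0
    while items:
        if items[0] % 2 == 0:
            total += items[0]
        del items[0]
    return total

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(consume_evens(data))  # 30
print(data)                 # [] -- the caller's list has been emptied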