What is the Big O of this naive solution? - python

Here is a simple function that takes two input strings. It returns True if the second string is an anagram of the first.
def validAnagram(str1, str2):
    if len(str1) != len(str2):
        return False
    str1_arr = [char for char in str1]
    str2_arr = [char for char in str2]
    for char in str1_arr:
        if char in str2_arr:
            str2_arr.remove(char)
        else:
            return False
    return True
I am learning to calculate the Big O of the programs I write. Is this function's runtime O(N^2) or O(N^3)?
I assume it's O(N^3), because the "if" condition also runs in O(N), so there are three nested O(N) operations, resulting in O(N^3) runtime. Please correct me if I am wrong.

It is O(N^2). You have O(N) iterations, in which you perform an O(N) operation. This results in O(N^2) complexity overall.
I think what you got wrong is calculating this part to be O(N^2), while it's actually O(N):
if char in str2_arr:
    str2_arr.remove(char)
because you have O(N) + O(N) here, which is still just O(N).
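For contrast, here is a minimal sketch of what a linear-time version could look like, using collections.Counter (this is an editor's illustration, not part of the original question or answer):

from collections import Counter

def valid_anagram_linear(str1, str2):
    # Building each Counter is O(N), and comparing the two counters is O(N) too.
    if len(str1) != len(str2):
        return False
    return Counter(str1) == Counter(str2)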


Why is '==' faster than manually traversing two strings to compare them in Python 3?

I am trying to solve problem 28, Implement strStr(), on LeetCode.
However, I have some questions about the time complexity of the two versions of my implementation.
# Version 1
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i < len_h:
            found = True
            if haystack[i] == needle[0]:
                for j in range(len_n):
                    if i + j >= len_h or haystack[i + j] != needle[j]:
                        found = False
                        break
                if found:
                    return i
            i += 1
        return -1
In this version, I try to find the needle substring in the haystack using two nested loops.
I think the time complexity of the code is O(mn) where m is the length of the haystack and n is the length of the needle.
Unfortunately, the code cannot pass the tests because it exceeds the time limit.
So I tried to optimize my code and arrived at version 2.
# Version 2
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i < len_h:
            if haystack[i] == needle[0]:
                if haystack[i:i+len_n] == needle:
                    return i
            i += 1
        return -1
I compare the needle with a substring of the haystack using a string slice and '==' instead of the manual comparison, and then the code passes the tests.
Now I have some questions:
What is the time complexity of a string slice?
What is the time complexity of the equality check (==) between two strings?
Why is version 2 faster than version 1 if the time complexity of the equality check is O(n)?
Thanks for any advice.
str.__eq__(self, other) (that is, equality for strings) is implemented in C and is lightning fast (as fast as in any other language once it gets going).
Your Python-implemented character-wise string comparison is slow for two reasons. First, the looping logic runs in the Python interpreter, and Python-level loops are never very fast. Second, when you write needle[j], that indexes one string to construct another one-character string. That by itself is slow, and you do it in a nested loop, so the overall runtime is disastrous. You end up calling str.__eq__ once per character, and every time it is called it has to check the lengths of the strings on each side (it does not know you just extracted a single character).
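To make the constant-factor gap concrete, here is a rough timeit sketch (an editor's illustration, not from the original answer; absolute numbers vary by machine) comparing a Python-level character loop against one C-level == on two distinct but equal strings:

import timeit

a = "x" * 5000 + "y"
b = "x" * 5000 + "y"

def manual_eq(s, t):
    # Character-by-character comparison driven by the Python interpreter.
    if len(s) != len(t):
        return False
    for i in range(len(s)):
        if s[i] != t[i]:
            return False
    return True

# The Python loop is typically orders of magnitude slower than C-level ==.
print(timeit.timeit(lambda: manual_eq(a, b), number=1000))
print(timeit.timeit(lambda: a == b, number=1000))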

Does one for loop mean a time complexity of n in this case?

So, I've run into this problem in the daily coding problem challenge, and I've devised two solutions. However, I am unsure if one is better than the other in terms of time complexity (Big O).
# Given a list of numbers and a number k,
# return whether any two numbers from the list add up to k.
#
# For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
#
# Bonus: Can you do this in one pass?
# The above part seemed to denote this can be done in O(n).
def can_get_value(lst=[10, 15, 3, 7], k=17):
    for x in lst:
        for y in lst:
            if x + y == k:
                return True
    return False

def optimized_can_get_value(lst=[10, 15, 3, 7], k=17):
    temp = lst
    for x in lst:
        if k - x in temp:
            return True
        else:
            return False

def main():
    print(can_get_value())
    print(optimized_can_get_value())

if __name__ == "__main__":
    main()
I think the second is better than the first since it has one for loop, but I'm not sure it is O(n), since I'm still running through two lists. Another solution I had in mind that is apparently O(n) would use the Python equivalent of Java's HashSet. I would appreciate confirmation, and an explanation of why it is or isn't O(n).
The first solution can_get_value() is textbook O(n^2). You know this.
The second solution is as well. This is because `elem in list` has O(n) complexity, and you're executing it n times: O(n) * O(n) = O(n^2).
The O(n) solution here is to convert from a list into a set (or, well, any type of hash table - dict would work too). The following code runs through the list exactly twice, which is O(n):
def can_get_value(lst, k):
    st = set(lst)        # make a hash table (set) where each key is its own value
    for x in st:         # this executes n times --> O(n)
        if k - x in st:  # unlike for lists, `in` is O(1) for hash tables
            return True
    return False
This is thus O(n) * O(1) = O(n) in most cases.
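A quick sanity check with the question's example, assuming the set-based function above:

print(can_get_value([10, 15, 3, 7], 17))  # True: 10 + 7 == 17
print(can_get_value([1, 2, 4], 17))       # False: no pair sums to 17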
In order to analyze the asymptotic runtime of your code, you need to know the runtime of each of the functions which you call as well. We generally think of arithmetic expressions like addition as being constant time (O(1)), so your first function has two for loops over n elements and the loop body only takes constant time, coming out to O(n * n * 1) = O(n^2).
The second function has only one for loop, but checking membership for a list is an O(n) function in the length of the list, so you still have O(n * n) = O(n^2). The latter option may still be faster (Python probably has optimized code for checking list membership), but it won't be asymptotically faster (the runtime still increases quadratically in n).
EDIT - as @Mark_Meyer pointed out, your second function is actually O(1) because there's a bug in it (the else branch returns during the first loop iteration, so no more than one element is ever examined); sorry, I skimmed it and didn't notice. This answer assumes a corrected version of the second function like
def optimized_can_get_value(lst, k=17):
    for x in lst:
        if k - x in lst:
            return True
    return False
(Note - don't use a mutable default value for your function's parameters; see this SO question for the trouble that can bring. I also removed the temporary list, because there was no need for it; it just pointed to the same list object anyway.)
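As a quick illustration of the mutable-default pitfall (a hypothetical function, purely for demonstration):

def append_to(x, acc=[]):  # the default list is created once, at definition time
    acc.append(x)
    return acc

print(append_to(1))  # [1]
print(append_to(2))  # [1, 2] -- the same default list persisted between calls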
EDIT 2: for fun, here are a couple of O(n) solutions to this (both use that checking containment for a set is O(1)).
A one-liner which still stops as soon as a solution is found:
def get_value_one_liner(lst, k):
    return any(k - x in set(lst) for x in lst)
EDIT 3: I think this is actually O(n^2) because we call set(lst) for each x. Using Python 3.8's assignment expressions could, I think, give us a one-liner that is still efficient. Does anybody have a good Python <3.8 one-liner?
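One possible pre-3.8 workaround (an editor's sketch, not from the original answer) is to bind the set once through a single-element for clause:

def get_value_one_liner(lst, k):
    # `for s in [set(lst)]` runs exactly once, so the set is built a single time.
    return any(k - x in s for s in [set(lst)] for x in lst)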
And a version which tries not to do extra work by building up a set as it goes (not sure if this is actually faster in practice than creating the whole set at the start; it probably depends on the actual input data):
def get_value_early_stop(lst, k):
    values = set()
    for x in lst:
        if x in values:
            return True
        values.add(k - x)
    return False

What is the space complexity of the following algorithm?

The following algorithm finds the largest element of a list using recursion.
def largest(s):
    if len(s) == 0:
        return "List can't be empty"
    elif len(s) == 1:
        return s[0]
    elif s[0] <= s[1]:
        return largest(s[1:])
    else:
        s.remove(s[1])
        return largest(s)
The time complexity is O(n) because we make a total of n calls to the function largest and each call does O(1) operations.
I am having trouble figuring out the space complexity. I think it's O(n) but I am not sure.
First of all, the time complexity is not O(n), because the list.remove operation is not O(1) but O(n).
So your time complexity would be O(n^2); imagine applying largest to the array [5, 4, 3, 2, 1].
You can see a list of Python operation complexities here (the Python wiki's TimeComplexity page).
The space complexity is O(n^2), because when you do return largest(s[1:]) you are copying the list, not passing a reference, so all the intermediate cuts of the list are kept alive at once. Doing s.remove(s[0]) and then return largest(s) would give you O(n) space complexity, because you would be working with references.
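A minimal sketch of the variant this answer describes (an editor's illustration; note that it mutates the caller's list). It is still O(n^2) time because list.remove is O(n), but only O(n) space, since no slices are created and each stack frame holds a reference to the same list:

def largest(s):
    if len(s) == 0:
        return "List can't be empty"
    elif len(s) == 1:
        return s[0]
    elif s[0] <= s[1]:
        s.remove(s[0])   # drop the smaller head in place instead of slicing
        return largest(s)
    else:
        s.remove(s[1])
        return largest(s)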
Slicing a standard list does create a (shallow!) copy of the slice. You're correct that this makes it O(n) (in additional memory allocated, not counting the list itself, which is of course already in memory).
As Reut points out in the comments, this is an implementation detail of the Python interpreter, but I couldn't say for sure whether any interpreters handle slices differently. Any implementation that creates a slice without copying would have to use copy-on-write instead.
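For comparison, here is an editor's sketch (not from the answers) of an iterative version that runs in O(n) time with O(1) extra space, since it needs no recursion and no slicing:

def largest_iterative(s):
    if not s:
        return "List can't be empty"
    best = s[0]
    for x in s:      # one pass over the list, no copies made
        if x > best:
            best = x
    return best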

What is the big-O notation of these permutation algorithms?

Working my way through "Cracking the Coding Interview", a practice question says:
Given 2 strings, write a method to decide if one is a permutation of the other.
The author's python solution is as follows:
from collections import Counter

def check_permutation(str1, str2):
    if len(str1) != len(str2):
        return False
    counter = Counter()
    for c in str1:
        counter[c] += 1
    for c in str2:
        if counter[c] == 0:
            return False
        counter[c] -= 1
    return True
The author claims this runs in O(N) time.
My solution is as follows:
def perm(str1, str2):
    if len(str1) != len(str2):
        return False
    for c in str1:
        if c not in str2:
            return False
    return True
And I believe this is also O(N). Is this true? Which algorithm is preferable? The author's data type seems unnecessary.
And lastly, is this algorithm O(NlogN)?
def perm(str1, str2):
    return sorted(str1) == sorted(str2)
First, the author's solution is an optimized version of Counter(str1) == Counter(str2) (it returns False faster and creates a single instance of a Counter).
It is, indeed, O(n) because hash table (Counter) access is O(1).
Next, your solution is quadratic (O(n^2)), because each `in` test is O(n): it has to traverse the whole string.
It is also wrong on strings with repetitions.
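To see the repetition problem concretely, here is a small illustration (an editor's sketch, not part of the original answer), using the questioner's membership-based check with the Str2 typo fixed:

def perm(str1, str2):
    # Membership-only check: ignores how many times each character occurs.
    if len(str1) != len(str2):
        return False
    for c in str1:
        if c not in str2:
            return False
    return True

print(perm("aab", "abb"))  # True, yet "aab" is not a permutation of "abb"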
Third, sorted(str1) == sorted(str2) is, indeed, linearithmic (O(n log n)) and thus worse than the original linear solution.
Note, however, that for small strings the constants may make a difference, and the linearithmic (sorted) solution may turn out to be faster than the linear (Counter) one.
Finally, beware that Python is usually implemented using an interpreter, so the actual performance may depend on whether you are using features implemented in C or in Python. E.g., if Counter is implemented in C, then Counter(str1) == Counter(str2) will probably outperform the author's solution hands down, even though algorithmically the author's solution is better.
For the first code, it can be simplified by using collections.Counter instead of the loops:
from collections import Counter

def check_permutation(str1, str2):
    if len(str1) != len(str2):
        return False
    return Counter(str1) == Counter(str2)
And it is O(n) again. The last algorithm, since it sorts both strings with sorted, is O(n log n).
Your algorithm is not correct, because it looks for each character inside the other string without accounting for how many times that character is repeated. Even if it were correct, it would be O(n^2).
Therefore, in a general sense, the first algorithm has the best time complexity and is easy to implement.

Algorithm analysis, which one is correct?

I'm a programmer, but I haven't studied CS, so I have a poor understanding of algorithm analysis. I'm reading a book on the topic and I have a question:
Suppose we have a problem: given two strings, we need to determine whether the first string is an anagram of the second.
The first solution that I thought of was:
def anagram(s1, s2):
    for char in s1:
        if not char in s2:
            return False
    return True
In analysis of such an algorithm, should I care about complexity of this piece of code?
if not char in s2
To be more precise is it important, which algorithm is used in search operation, which will be executed in each iteration of the for loop?
P.S. Sorry for the confusion: I know the algorithm is wrong, because anagram strings must have the same length, but that is not important for now.
First, you analyse the complexity of each line (n, m = len(s1), len(s2) and I will assume n > m):
def anagram(s1, s2):
    for char in s1:          # O(n)
        if not char in s2:   # O(m)
            return False     # O(1)
    return True              # O(1)
Note that if not char in s2: is O(m) as, in the worst case, you have to check every character in s2 to be sure char isn't there.
Then you combine; as you have nested operations, the overall complexity is O(n * m).
As pointed out in the comments, you can significantly improve by noting that membership testing for a set is O(1) (except where every hash collides, see e.g. https://wiki.python.org/moin/TimeComplexity):
def anagram(s1, s2):
    s2 = set(s2)             # O(m)
    for char in s1:          # O(n)
        if not char in s2:   # O(1)
            return False     # O(1)
    return True              # O(1)
By moving the O(m) operation out of the loop, you reduce the overall complexity to O(n).
However, this algorithm does not actually determine whether s1 and s2 are anagrams.
If you are looking for an efficient way to actually solve the problem, note that Python's sort ("Timsort") is O(n log n):
def anagram(s1, s2):
    s1 = sorted(s1)          # O(n log n)
    s2 = sorted(s2)          # O(m log m)
    return s1 == s2          # O(m)
Now you have no nesting, so the total complexity is O(n log n). This is admittedly slower than O(n), but it has the advantage of working.
It just depends what you are analyzing the code for.
If your question is "how many times is an 'in' search performed", then you needn't worry about what 'in' does.
If your question is about the running time of the global algorithm, then yes you have to worry about the running time of 'in'.
This is why, when you use the C++ STL containers, you should read about their time complexity in the documentation. Unfortunately, this information is less readily available for Python, though the Python wiki's TimeComplexity page (linked in the answer above) covers the built-in containers.
