Big O notation of simple anagram function - python

I've written the following code, which finds anagrams. I had thought its Big O complexity was O(n), but my instructor told me that I am incorrect. I'm confused about why this isn't correct; could anyone offer any advice?
# Define an anagram.
def anagram(s1, s2):
    return sorted(s1) == sorted(s2)
# Main function.
def Question1(t, s):
    # use built in any function to check any anagram of t is substring of s
    return any(anagram(s[i: i+len(t)], t)
               for i in range(len(s)-len(t)+ 1))
Function Call:
# Simple test case.
print(Question1("app", "paple"))
# True

any anagram of t is substring of s
That's not what your code says.
You have "any substring of s is an anagram of t", which might be equivalent, but it's easier to understand that way.
As for complexity, you need to define what you mean by N. Is it len(s) - len(t) + 1?
In that case, yes, any() contributes O(N) iterations.
However, on every iteration you also call anagram on inputs of length T, and you seem to have ignored that cost.
anagram calls sorted twice. Each call to sorted is O(T * log(T)) on its own (Python's sorted uses Timsort, a merge-sort hybrid). You're also slicing s on each iteration, which adds another O(T) per window.
Let's say your complexity is on the order of (S - T + 1) * 2 * (T * log(T)), where S and T are the lengths of s and t.
The answer depends on which string of your input is larger.
The best case is that they are the same length, because then your range has only one element.
Big O is usually quoted for the worst case, though, so you need to figure out which conditions generate the most total operations. For example, what if T > S? Then len(s) - len(t) + 1 will be non-positive, so does the code run more or less than it does for equal-length strings? And what about S > T, or S = 0?
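To make the (S - T + 1) * 2 * T estimate concrete, here is a hypothetical instrumented copy of Question1 (anagram_counted and question1_counted are made-up names) that counts how many characters are handed to sorted(). The character count is only a proxy: each sort of T characters also costs an extra log(T) factor in comparisons. Using strings with no letters in common keeps any() from stopping early, which gives the worst case for a given S and T.

```python
def anagram_counted(s1, s2, work):
    # Same test as the question's anagram(), but it also adds the number
    # of characters passed to sorted() to work[0].
    work[0] += len(s1) + len(s2)
    return sorted(s1) == sorted(s2)


def question1_counted(t, s):
    # Hypothetical instrumented copy of Question1: returns the answer plus
    # the total number of characters sorted along the way.
    work = [0]
    found = any(anagram_counted(s[i:i + len(t)], t, work)
                for i in range(len(s) - len(t) + 1))
    return found, work[0]


# s and t share no letters, so any() never short-circuits (worst case).
s = "a" * 100
for size in (1, 25, 50, 75, 100, 101):
    print(size, question1_counted("b" * size, s)[1])
```

With S = 100 the count is 200 for T = 1, 5100 for T = 50, 200 again for T = 100 and 0 for T = 101, so the work peaks when T is around S/2 rather than at either extreme.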

This is not O(N) complexity, for a few reasons. First, sorted has O(n log n) complexity. Second, you potentially call it many times (sorting both t and a slice of s on every window).
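For contrast, a common way to avoid re-sorting every window (a technique neither answer proposes) is to keep letter counts for the current window of s and update them as the window slides by one position. has_anagram_substring is a hypothetical name. Each slide does O(1) count updates plus a comparison of two Counter objects, which costs time proportional to the number of distinct letters, so for a fixed alphabet the whole scan is O(S + T).

```python
from collections import Counter


def has_anagram_substring(t, s):
    """Return True if some substring of s is an anagram of t."""
    k = len(t)
    if k > len(s):
        return False
    need = Counter(t)
    window = Counter(s[:k])  # counts for the first window s[0:k]
    if window == need:
        return True
    for i in range(k, len(s)):
        # Slide the window one step: add s[i], drop s[i - k].
        window[s[i]] += 1
        left = s[i - k]
        window[left] -= 1
        if window[left] == 0:
            del window[left]  # keep zero counts out so equality stays exact
        if window == need:
            return True
    return False


print(has_anagram_substring("app", "paple"))  # True
```

Deleting zero counts keeps the equality test correct on older Python versions, where Counter comparison is plain dict comparison.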

Related

Find runtime (number of operations of function) and calculate Big O

For the Python function given below, I have to find the number of operations and the Big O.
def no_odd_number(list_nums):
    i = 0
    while i < len(list_nums):
        num = list_nums[i]
        if num % 2 != 0:
            return False
        i += 1
    return True
By my calculation, the number of operations is 4 + 3n, but I'm not sure, as I don't know how to deal with if...else statements.
I am also given options to choose the correct Big O from. Based on my calculation, I think it should be d. O(n), but I'm not sure. Help please!
a. O(n^2)
b. O(1)
c. O(log n)
d. O(n)
e. None of these
Big O notation typically considers the worst-case scenario. The function you have is pretty simple, but the early return seems to complicate things. However, since we care about the worst case, you can ignore the early return in the if block: the worst case is one where you don't return early. It would be a list like [2,4,6,8], which runs the loop four times.
Now, with the above in mind, look at what happens inside the while loop. It doesn't matter how big list_nums is: inside the loop you just look up an element in the list, test it, and increment i. All of those are constant-time operations that cost the same regardless of how large list_nums is.
The number of times you do this loop is the length of list_nums. This means as list_nums grows, the number of operations grows at the same rate. That makes this O(n) as you suspect.
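A quick way to check this is a hypothetical instrumented copy of no_odd_number (no_odd_number_counted is a made-up name) that also returns how many times the loop body ran:

```python
def no_odd_number_counted(list_nums):
    # Same logic as no_odd_number, but also reports how many times the
    # loop body ran, so the worst case can be checked directly.
    iterations = 0
    i = 0
    while i < len(list_nums):
        iterations += 1
        num = list_nums[i]
        if num % 2 != 0:
            return False, iterations
        i += 1
    return True, iterations


for nums in ([2, 4, 6, 8], [1, 2, 4, 6], [2, 4, 6, 7], list(range(0, 2000, 2))):
    print(len(nums), no_odd_number_counted(nums))
```

For the all-even lists the iteration count equals len(list_nums), while an odd first element stops after one pass; that linear growth in the worst case is what makes the function O(n).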

Trying to calculate algorithm time complexity

Last night I solved this LeetCode question. My solution is not great; it's quite slow. So I'm trying to calculate the complexity of my algorithm to compare it with the standard algorithms that LeetCode lists in the Solution section. Here's my solution:
from typing import List

class Solution:
    def longestCommonPrefix(self, strs: List[str]) -> str:
        # Get lengths of all strings in the list and get the minimum
        # since common prefix can't be longer than the shortest string.
        # Catch ValueError if list is empty
        try:
            min_len = min(len(i) for i in strs)
        except ValueError:
            return ''
        # split strings into sets character-wise
        foo = [set(list(zip(*strs))[k]) for k in range(min_len)]
        # Now go through the resulting list and check whether resulting sets have length of 1
        # If true then add those characters to the prefix list. Break as soon as encounter
        # a set of length > 1.
        prefix = []
        for i in foo:
            if len(i) == 1:
                x, = i
                prefix.append(x)
            else:
                break
        common_prefix = ''.join(prefix)
        return common_prefix
I'm struggling a bit with calculating complexity. The first step, getting the minimum length of the strings, takes O(n), where n is the number of strings in the list. The last step is also easy: it should take O(m), where m is the length of the shortest string.
But the middle bit is confusing. set(list(zip(*strs))) should hopefully take O(m) again and then we do it n times so O(mn). But then overall complexity is O(mn + m + n) which seems way too low for how slow the solution is.
The other option is that the middle step is O(m^2*n), which makes a bit more sense. What is the proper way to calculate complexity here?
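If list(zip(*strs)) were built only once, the middle portion would be O(mn), and so would the overall cost, because that dwarfs the O(m) and O(n) terms for large values of m and n. That is the ideal order of runtime complexity for this problem.
As written, though, list(zip(*strs)) sits inside the list comprehension and is rebuilt for every k, so the middle step is actually O(m^2 * n), as your second option suspected.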
Optimize: Short-Circuit
However, you are probably dismayed that others have faster solutions. I suspect they short-circuit at the first non-matching index.
Let's consider a test case of 26 strings (['a'*500, 'b'*500, 'c'*500, ...]). Your solution would create a list that is 500 long, with each entry containing a set of 26 elements. If you short-circuited instead, you would only process the first index, i.e. one set of 26 characters.
Try changing your list into a generator. This might be all you need to short-circuit.
foo = (set(x) for x in zip(*strs))
You can also skip the min_len check, because zip by default stops at the end of the shortest input.
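To see the short-circuit effect in isolation, this sketch counts how many column sets get built for the 26-string example, once with a list (everything built up front) and once with a generator. sets_built is a hypothetical helper; unlike the question's code, its eager branch calls zip only once, so it isolates the cost of eagerness rather than the repeated list(zip(*strs)) calls.

```python
def sets_built(strs, lazy):
    """Count how many per-column sets get built before the scan stops."""
    built = 0

    def make_set(column):
        nonlocal built
        built += 1
        return set(column)

    columns = zip(*strs)
    if lazy:
        sets = (make_set(c) for c in columns)  # built on demand
    else:
        sets = [make_set(c) for c in columns]  # all built up front
    for s in sets:
        if len(s) != 1:
            break  # first mismatching column
    return built


strs = [ch * 500 for ch in "abcdefghijklmnopqrstuvwxyz"]
print(sets_built(strs, lazy=False), sets_built(strs, lazy=True))  # 500 1
```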
Optimize: Generating Intermediate Results
I see that you append each letter to a list, then ''.join(lst). This is efficient, especially compared to the alternative of iteratively appending to a string.
However, we could just as easily keep a counter, match_len, of matching characters. Then, when we detect the first mismatch, just:
return strs[0][:match_len]
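Putting both suggestions together, a sketch might look like this. It is a standalone function rather than the LeetCode Solution method, and the name longest_common_prefix is just for illustration. It does O(n) work per matching column plus at most one mismatching column, so the cost is O(n * (p + 1)), where p is the length of the prefix.

```python
from typing import List


def longest_common_prefix(strs: List[str]) -> str:
    # zip() is lazy in Python 3, so columns are produced one at a time and
    # the loop stops at the first mismatch.
    if not strs:
        return ""
    match_len = 0
    for column in zip(*strs):  # stops at the end of the shortest string
        if len(set(column)) != 1:
            break
        match_len += 1
    return strs[0][:match_len]
```

Slicing strs[0] once at the end replaces building a list of characters and joining it.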

What's the overhead of using built-in python functions like zip() and join() on the performance of my function?

Below is my function to calculate the LCP (longest common prefix). I want to know its Big O time complexity and space complexity. Can I say it is O(n)? Or do zip() and join() affect the time complexity? I am also wondering whether the space complexity is O(1). Please correct me if I am wrong. The input to the function is a list of strings, e.g. ["flower","flow","flight"].
def longestCommonPrefix(self, strs):
    res = []
    for x in zip(*strs):
        if len(set(x)) == 1:
            res.append(x[0])
        else:
            break
    return "".join(res)
Iterating to get a single tuple value from zip(*strs) takes O(len(strs)) time and space. That's just the time it takes to allocate and fill a tuple of that length.
Iterating to consume the whole iterator takes O(len(strs) * min(len(s) for s in strs)) time, but shouldn't take any additional space over a single iteration.
Your iteration code is a bit trickier, because you may stop iterating early, when you find the first place within your strings where some characters don't match. In the worst case, all the strings are identical (up to the length of the shortest one) and so you'd use the time complexity above. And in the best case there is no common prefix, so you can use the single-value iteration as your best case.
But there's no good way to describe "average case" performance because it depends a lot on the distributions of the different inputs. If your inputs were random strings, you could do some statistics and predict an average number of iterations, but if your input strings are words, or even more likely, specific words expected to have common prefixes, then it's very likely that all bets are off.
Perhaps the best way to describe that part of the function's performance is in terms of its own output. It takes O(len(strs) * len(self.longestCommonPrefix(strs))) time to run.
As for str.join, running "".join(res) if we know nothing about res takes O(len(res) + len("".join(res))) for both time and space. Because your code only joins individual characters, the two lengths are going to be the same, so we can say that the join in your function takes O(len(self.longestCommonPrefix(strs))) time and space.
Putting things together, we can see that the main loop takes a multiple of the time taken by the join call, so we can ignore the latter and say that the function's time complexity is just O(len(strs) * len(self.longestCommonPrefix(strs))). However, the memory usage of the two parts is independent, and we can't easily predict whether the number of strings or the length of the output will grow faster. So we need to combine them and say that you need O(len(strs) + len(self.longestCommonPrefix(strs))) space.
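The output-sensitive bound can be checked with a hypothetical instrumented version of the loop (lcp_with_work is a made-up name) that also counts how many zip tuples it examines. Each examined tuple holds len(strs) characters and feeds one set(), so columns * len(strs) approximates the work done.

```python
def lcp_with_work(strs):
    """Same loop as the question's longestCommonPrefix, plus a count of
    how many zip tuples (columns) were examined."""
    res = []
    columns = 0
    for x in zip(*strs):
        columns += 1
        if len(set(x)) == 1:
            res.append(x[0])
        else:
            break
    return "".join(res), columns


for words in (["flower", "flow", "flight"], ["dog", "racecar", "car"], ["abc", "abc"]):
    prefix, columns = lcp_with_work(words)
    # each examined column costs O(len(words)) for its tuple and its set
    print(repr(prefix), columns, columns * len(words))
```

The column count never exceeds len(prefix) + 1: every matching column, plus at most one mismatching column that stops the loop.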
Time:
Your code is O(n * m), where n is the length of the list and m is the length of the shortest string in the list, since zip stops there (using the longest string also gives a valid, looser upper bound).
zip() is O(1) in Python 3.x: the function allocates a special iterable (called the zip object) and assigns the parameter array to an internal field. In the case of zip(*x) (as pointed out by @juanpa.arrivillaga), the call builds an argument tuple from the list, so it is O(n). As a result, the zip(*x) call itself costs O(n), because the list has to be iterated to build that tuple.
join() is O(n), where n is the total length of the input.
set(x) is O(k), where k is the length of x; here each x holds one character from each string, so each call is O(n).
Space:
It is O(n + m): in the worst case, res appends x[0] once per matching column (up to m times), and each tuple produced by zip(*strs) holds n characters.

O(N) Time complexity for simple Python function

I just took a Codility demo test. The question and my answer can be seen here, but I'll paste my answer here as well. My response:
def solution(A):
    # write your code in Python 2.7
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i == retresult: retresult += 1  # increment the result since the current result exists in the array
            elif i > retresult: break  # we can go out of the loop since we found a bigger number than our current positive integer result
    return retresult
My question is about time complexity, which I hope to understand better from your responses. The question states that the expected worst-case time complexity is O(N).
Does my function have O(N) time complexity? Does the fact that I sort the array increase the complexity, and if so how?
Codility reports (for my answer)
Detected time complexity:
O(N) or O(N * log(N))
So, what is the complexity for my function? And if it is O(N*log(N)), what can I do to decrease the complexity to O(N) as the problem states?
Thanks very much!
p.s. my background reading on time complexity comes from this great post.
EDIT
Following the reply below, and the answers described here for this problem, I would like to expand on this with my take on the solutions:
basicSolution has an expensive time complexity and so is not the right answer for this Codility test:
def basicSolution(A):
    # O(N*log(N)) time complexity
    retresult = 1  # the smallest integer we can return, if it is not in the array
    A.sort()
    for i in A:
        if i > 0:
            if i == retresult: retresult += 1  # increment the result since the current result exists in the array
            elif i > retresult: break  # we can go out of the loop since we found a bigger number than our current positive integer result
        else:
            continue  # negative numbers and 0 don't need any work
    return retresult
hashSolution is my take on what is described in the above article, in the "use hashing" paragraph. As I am new to Python, please let me know if you have any improvements to this code (it does pass my test cases, though), and what its time complexity is.
def hashSolution(A):
    # O(N) time complexity, I think? But it requires O(N) extra space (the requirement states O(N) space)
    table = {}
    for i in A:
        if i > 0:
            table[i] = True  # a collision/duplicate will just overwrite
    for i in range(1, 100000+1):  # the problem says that the array has a maximum of 100,000 integers
        if not(table.get(i)): return i
    return 1  # default
Finally, I am having trouble understanding the actual O(N) solution (O(n) time and O(1) extra space). I understand that negative/0 values are pushed to the back of the array, leaving a part of the array with just positive values. But I do not understand the findMissingPositive function. Could anyone please describe it with Python code/comments? With an example perhaps? I've been trying to work through it in Python and just cannot figure it out :(
It does not, because you sort A.
The Python list.sort() function uses Timsort (named after Tim Peters) and has a worst-case time complexity of O(N log N).
Rather than sort your input, you'll have to iterate over it and determine whether any integers are missing by some other means. I'd use a set built from a range() object:
def solution(A):
    expected = set(range(1, len(A) + 1))
    for i in A:
        expected.discard(i)
    if not expected:
        # all consecutive digits for len(A) were present, so next is missing
        return len(A) + 1
    return min(expected)
This is O(N); we create a set of len(A) (O(N) time), then we loop over A, removing elements from expected (again O(N) time, removing elements from a set is O(1)), then test for expected being empty (O(1) time), and finally get the smallest element in expected (at most O(N) time).
So we make at most 3 O(N) time steps in the above function, making it a O(N) solution.
This also fits the storage requirement; the only extra storage is a set of at most N elements. Sets have some overhead, but it is a constant factor, so the space is still O(N).
The hash solution you found is based on the same principle, except that it uses a dictionary instead of a set. Note that the dictionary values are never actually used, they are either set to True or absent. I'd rewrite that as:
def hashSolution(A):
    seen = {i for i in A if i > 0}
    if not seen:
        # there were no positive values, so 1 is the first missing.
        return 1
    for i in range(1, 10**5 + 2):
        if i not in seen:
            return i
    # we can never get here: A holds at most 100,000 integers, so at
    # least one of 1..100,001 is missing from `seen`.
The early check makes the no-positive-values case explicit; in that case the loop would also return 1 on its first iteration.
The difference between mine and theirs is that mine starts with the set of expected numbers, while they start with the set of positive values from A, inverting the storage and test.
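The question also asks how the O(1)-extra-space approach works. The sketch below shows the usual index-marking idea in Python; it is not the article's findMissingPositive code, and first_missing_positive is a made-up name. It moves the positive values to the front (the mirror image of pushing non-positive values to the back), then uses the sign of A[v - 1] as a "seen" flag for each value v, so no extra set or dict is needed. Note that it modifies A in place.

```python
def first_missing_positive(A):
    """Smallest positive integer not in A, in O(n) time and O(1) extra
    space. Modifies A in place."""
    # Step 1: move every positive value to the front; j counts them.
    j = 0
    for i in range(len(A)):
        if A[i] > 0:
            A[i], A[j] = A[j], A[i]
            j += 1
    # Now A[0:j] holds only positive values, so the answer is in 1..j+1.
    # Step 2: for each value v in 1..j, mark index v - 1 by negating it.
    # abs() recovers the original value if this slot was already marked.
    for i in range(j):
        v = abs(A[i])
        if v <= j and A[v - 1] > 0:
            A[v - 1] = -A[v - 1]
    # Step 3: the first unmarked (still positive) index is the gap.
    for i in range(j):
        if A[i] > 0:
            return i + 1
    return j + 1  # 1..j were all present


print(first_missing_positive([1, 3, 6, 4, 1, 2]))  # 5
```

For [1, 3, 6, 4, 1, 2], all six values are positive, so j = 6; marking negates positions 0, 1, 2, 3 and 5 (for the values 1, 2, 3, 4 and 6), leaving index 4 positive, so the function returns 5.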

Permute a string to print all possible words

The code that I have written seems to have poor asymptotic running time and space.
I am getting
T(N) = T(N-1)*N + O((N-1)!*N), where N is the size of the input. I need advice on optimizing it.
Since this is an algorithm-based interview question, we are required to implement the logic in the most efficient way, without using any libraries.
Here is my code
def str_permutations(str_input, i):
    if len(str_input) == 1:
        return [str_input]
    comb_list = []
    while i < len(str_input):
        key = str_input[i]
        if i+1 != len(str_input):
            remaining_str = "".join((str_input[0:i], str_input[i+1:]))
        else:
            remaining_str = str_input[0:i]
        all_combinations = str_permutations(remaining_str, 0)
        for index, value in enumerate(all_combinations):
            all_combinations[index] = "".join((key, value))
        comb_list.extend(all_combinations)
        i = i+1
    return comb_list
As I mentioned in a comment to the question, in the general case you won't get below exponential complexity: for n distinct characters there are n! permutations of the input string, and O(2^n) is a subset of O(n!).
Now, the following won't improve the asymptotic complexity for the general case, but you can optimize the brute-force approach of producing all permutations for strings in which some characters occur more than once. Take for example the string daedoid: if you blindly produce all permutations of it, you'll get every distinct permutation 6 = 3! times, since d occurs three times. You can avoid that by first eliminating multiple occurrences of the same letter and instead remembering how often to use each letter. So if a letter c has k_c occurrences, you produce k_c! times fewer permutations. In total, this saves you a factor of "product over k_c! for all c".
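Here is a sketch of that counting idea, written without imports to respect the "no libraries" constraint from the question; unique_permutations is a made-up name. Each position is filled by trying each distinct letter that still has a remaining count, so two copies of the same letter are never treated as different choices. The output is still exponential in general; this only removes the duplicate work.

```python
def unique_permutations(s):
    """All distinct permutations of s, each produced exactly once."""
    # Count each letter once instead of treating repeated letters as distinct.
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1

    result = []
    current = []

    def backtrack():
        if len(current) == len(s):
            result.append("".join(current))
            return
        # Try each distinct letter (not each position) at this slot, so
        # swapping two copies of the same letter never yields a duplicate.
        for ch in counts:
            if counts[ch] > 0:
                counts[ch] -= 1
                current.append(ch)
                backtrack()
                current.pop()
                counts[ch] += 1

    backtrack()
    return result


print(len(unique_permutations("daedoid")))  # 840, i.e. 7! / 3!
```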
If you don't need to write your own, see itertools.permutations and combinations.
