What is the big O complexity of these permutation algorithms - Python

Working my way through "Cracking the coding interview", and a practice question says
Given 2 strings, write a method to decide if one is a permutation of the other.
The author's python solution is as follows:
from collections import Counter

def check_permutation(str1, str2):
    if len(str1) != len(str2):
        return False
    counter = Counter()
    for c in str1:
        counter[c] += 1
    for c in str2:
        if counter[c] == 0:
            return False
        counter[c] -= 1
    return True
Which claims to be in O(N) time.
My solution is as follows:
def perm(str1,str2):
    if(len(str1) != len(str2)):
        return False
    for c in str1:
        if c not in str2:
            return False
    return True
And I believe this to also be O(N). Is this true? Which algorithm is favorable? The author's data type seems unnecessary.
And lastly, is this algorithm O(NlogN)?
def perm(str1,str2):
    return sorted(str1)==sorted(str2)

First, the author's solution is an optimized version of Counter(str1) == Counter(str2) (it returns False faster and creates a single instance of a Counter).
It is, indeed, O(n) because hash table (Counter) access is O(1).
Next, your solution is quadratic (O(n^2)) because each in is O(n) - it has to traverse the whole string.
It is also wrong on strings with repetitions.
Third, sorted(str1) == sorted(str2) is, indeed, linearithmic (O(n*log(n)))
and thus is worse than the original linear solution.
Note, however, that for small strings the constants may make a
difference and the linearithmic (sorted) solution may turn out to be
faster than the linear (Counter) one.
Finally, beware that Python is usually implemented using an interpreter, so the actual performance may depend on whether you are using features implemented in C or in Python. E.g., if Counter is implemented in C, then Counter(str1) == Counter(str2) will probably outperform the author's solution hands down, even though algorithmically the author's solution is better.
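To make the repetition problem concrete, here is a small sketch that runs the three approaches discussed above on an input with repeated characters. The names perm_membership, perm_counter and perm_sorted are made up for this illustration; perm_membership mirrors the asker's membership test.

```python
from collections import Counter


def perm_membership(str1, str2):
    # Mirrors the asker's idea: only checks that every character of str1
    # occurs somewhere in str2, ignoring how often it occurs.
    if len(str1) != len(str2):
        return False
    return all(c in str2 for c in str1)


def perm_counter(str1, str2):
    # Compares character counts; linear on average thanks to hashing.
    return len(str1) == len(str2) and Counter(str1) == Counter(str2)


def perm_sorted(str1, str2):
    # Compares the sorted character lists; O(n log n).
    return sorted(str1) == sorted(str2)


# "aab" and "abb" use the same letters but with different counts.
print(perm_membership("aab", "abb"))  # True (wrong answer)
print(perm_counter("aab", "abb"))     # False
print(perm_sorted("aab", "abb"))      # False
```

The membership version accepts the pair because both letters occur in each string, while the counting and sorting versions notice that the counts differ.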

The first solution can be written more simply with collections.Counter instead of explicit loops:
from collections import Counter

def check_permutation(str1, str2):
    if len(str1) != len(str2):
        return False
    return Counter(str1) == Counter(str2)
And it is O(n) again. The last algorithm, since it relies on sorted, is O(n log n).
Your algorithm is not correct, because it looks for each character in the other string without regard to how many times that character is repeated. Even if it were correct, it would be O(n^2).
Therefore, in general, the first algorithm has the best time complexity and is easy to implement.

Related

Is My Approach Brute-Force or Linear Search?

My Problem is the following:
Given a sequence of integer values, determine if there is a distinct pair of numbers in the sequence whose product is odd. Please provide two Python functions, oddpair_bf() and oddpair_linear(), for this problem. The functions will take the given sequence of integers as a list. The oddpair_bf() function uses a brute-force approach and checks the possible pairs sequentially. When there is a pair whose product is odd, it returns True; otherwise, it returns False. The second one, oddpair_linear(), uses a linear-scan approach and visits each element once. Please find a way to determine this with a linear scan.
I tried solving it on my own and got:
def oddpair_bf(list):
    for i in list:
        for j in list:
            if i != j:
                product = i*j
                if product & 1:
                    return True
    return False
Now my question is, is this a brute-force approach or "linear-scan" approach? And how would I need to approach it differently?
Here is a concise linear function that checks to see if there is more than one odd number in the sequence (which would give at least one odd product) and returns True.
def oddpair_linear(seq):
    return len([x for x in seq if x & 1]) > 1
Your approach is brute force as it explores all possible combinations.
Nice usage of the & operator instead of the classical (x % 2) == 1.
Brute force
I would suggest two improvements in your code:
Do not use list as a variable name: it is not a reserved keyword, but it shadows the built-in list type.
Halve tests as multiplication is commutative (symmetrical).
It leads to:
def oddpair_bf(seq):
    n = len(seq)
    for i in range(n):
        for j in range(i+1, n):
            if seq[i]*seq[j] & 1:
                return True
    return False
Which can be condensed using itertools:
import itertools

def oddpair_bf2(seq):
    for x, y in itertools.combinations(seq, 2):
        if x*y & 1:
            return True
    return False
This new version is still O(n^2) in the worst case. But you spare unnecessary comparisons by removing n + n*(n-1)/2 cases (diagonal and lower triangle) from the n^2 cases (square, two nested loops of size n), simply because multiplication is commutative: we do not need to check y*x in addition to x*y.
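The pair counts above can be checked with a short sketch. The helper count_pairs is a hypothetical name used only for this illustration; it counts the pairs visited by two full nested loops and by itertools.combinations.

```python
from itertools import combinations, product


def count_pairs(n):
    seq = range(n)
    # Two nested loops over the whole sequence: every ordered pair,
    # diagonal included.
    full_square = sum(1 for _ in product(seq, repeat=2))
    # Upper triangle only: each unordered pair of distinct positions once.
    upper_triangle = sum(1 for _ in combinations(seq, 2))
    return full_square, upper_triangle


for n in (4, 10, 100):
    square, triangle = count_pairs(n)
    # The difference is the diagonal plus the lower triangle.
    assert square - triangle == n + n * (n - 1) // 2
    print(n, square, triangle)
```

The number of pairs is roughly halved, but it still grows quadratically with n.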
Linear
Reducing complexity below that of the brute-force version is generally done by exploiting an inherent property of the problem that makes the computation easier, less intensive and thus more tractable.
For the linear version, use a well-known property of the problem: any product involving an even number is always even, because it has at least one factor of 2 coming from the even number.
Therefore, solving this problem is equivalent to checking whether there are at least two odd numbers in the list. This can be written as:
def oddpair_linear(seq):
    n = 0
    for x in seq:
        if x & 1:
            n += 1
        if n >= 2:
            return True
    return False
This snippet is O(n) in the worst case (a single loop of size n). This check has been nicely condensed into a one-liner by @pakpe.
Since you said distinct, you can use a set:
def oddpair_linear(seq):
    return len({s for s in seq if s&1})>1
or a slightly better way:
def oddpair_linear(seq):
    found=0
    for s in seq:
        if s&1:
            if not found:
                found=s
            else:
                return True
    return False
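The answers above read "distinct pair" in two different ways, and the difference only shows up when an odd value is repeated. The sketch below illustrates this; the names oddpair_by_count and oddpair_by_value are hypothetical, and which reading the exercise intends is an assumption either way.

```python
def oddpair_by_count(seq):
    # "Distinct" read as two different positions: any two odd elements do.
    return sum(1 for x in seq if x & 1) > 1


def oddpair_by_value(seq):
    # "Distinct" read as two different values: duplicates collapse in the set.
    return len({x for x in seq if x & 1}) > 1


print(oddpair_by_count([3, 3, 4]))  # True
print(oddpair_by_value([3, 3, 4]))  # False
print(oddpair_by_count([3, 5, 4]))  # True
print(oddpair_by_value([3, 5, 4]))  # True
```

Both functions make a single pass over the sequence, so either reading stays linear.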

What is the Big O of this naive solution?

Here is a simple function that takes in two input strings. It returns True if the second string is an anagram of the first.
def validAnagram(str1, str2):
    if len(str1) != len(str2):
        return False
    str1_arr = [char for char in str1]
    str2_arr = [char for char in str2]
    for char in str1_arr:
        if char in str2_arr:
            str2_arr.remove(char)
        else:
            return False
    return True
I am learning to calculate the Big O of the programs I write. Is this function's runtime O(N^2) or O(N^3)?
I assume it's O(N^3) because the "if" condition also runs in O(N). So it's 3 nested O(N) operations, resulting in O(N^3) runtime. Please correct me if I am wrong.
It is O(N^2). You have O(N) iterations, in which you perform an O(N) operation. This results in O(N^2) complexity overall.
I think what you got wrong is calculating this part to be O(N^2), while it's actually O(N):
if char in str2_arr:
    str2_arr.remove(char)
because you have O(N) + O(N) here, which is still just O(N).
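The quadratic growth can be observed by counting element comparisons. The sketch below is an instrumented imitation of the loop in validAnagram, not the original function: count_comparisons is a hypothetical helper, and it counts only the comparisons made by the in test and by remove, not the shifting of elements that remove also performs (which is at most linear per iteration as well).

```python
import string


def count_comparisons(str1, str2):
    """Mimic the loop of validAnagram and count element comparisons."""
    remaining = list(str2)
    comparisons = 0
    for char in str1:
        if char not in remaining:
            # A failed `in` scans the whole remaining list.
            return comparisons + len(remaining), False
        idx = remaining.index(char)
        # `in` inspects idx + 1 elements, then `remove` repeats the same scan.
        comparisons += 2 * (idx + 1)
        del remaining[idx]
    return comparisons, True


for n in (5, 10, 20):
    word = string.ascii_lowercase[:n]
    # Reversing the word puts each sought character at the end of the list.
    total, ok = count_comparisons(word, word[::-1])
    print(n, total, ok)
```

On this worst-case style input the count is n*(n+1), so doubling n roughly quadruples the work, which matches O(N^2) rather than O(N^3).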

Why is '==' faster than a manual traversal when comparing two strings in Python 3?

I am trying to solve problem 28, Implement strStr(), on LeetCode.
However, I have some questions about the time complexity of the two versions of the implemented codes.
# Version 1
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i<len_h :
            found = True
            if haystack[i] == needle[0]:
                for j in range(len_n):
                    if i+j >= len_h or haystack[i+j] != needle[j]:
                        found = False
                        break
                if found:
                    return i
            i += 1
        return -1
In this version, I try to find the needle substring in the haystack using a double loop.
I think the time complexity of the code is O(mn), where m is the length of the haystack and n is the length of the needle.
Unfortunately, the code does not pass the tests because it exceeds the time limit.
Then I tried to optimize my code and got version 2.
# Version 2
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i<len_h :
            found = True
            if haystack[i] == needle[0]:
                if haystack[i:i+len_n] == needle:
                    return i
            i += 1
        return -1
I compare the needle and the substring of the haystack using string-slice and '==' instead of the manual comparison. Then, the code passes the tests.
Now, I have some questions:
What is the time complexity of the string slice?
What is the time complexity of the check operation (==) between two strings?
Why is version 2 faster than version 1 if the time complexity of the check operation is O(n)?
Thanks for any advice.
str.__eq__(self, other) (that is, equality for strings) is implemented in C and is lightning fast (as fast as any other language once it starts).
Your Python-implemented character-wise string comparison is slow for two reasons. First, the looping logic is implemented in Python, and Python loops are never very fast. Second, when you write needle[j], you are indexing one string to produce another (one-character) string. That by itself is slow, and you do it in a nested loop, so the overall runtime will be disastrous. You end up calling str.__eq__ once per character, and every time it's called it has to check the length of the strings on each side (it does not know you just took a single character).
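On the complexity questions themselves: in CPython, slicing a string copies the selected characters, so a slice of length n costs O(n), and == on two strings of equal length is O(n) in the worst case. Both versions are therefore O(mn) in the worst case; the difference is the constant factor. The sketch below isolates the two comparison styles. The helper names matches_manual, matches_slice and compare_timings are made up for this illustration, and any timings will vary by machine and Python version.

```python
import timeit


def matches_manual(haystack, i, needle):
    # Character-by-character comparison driven by a Python-level loop.
    for j in range(len(needle)):
        if i + j >= len(haystack) or haystack[i + j] != needle[j]:
            return False
    return True


def matches_slice(haystack, i, needle):
    # One slice plus one ==, both executed inside the interpreter's C code.
    return haystack[i:i + len(needle)] == needle


def compare_timings(size=2000, repeats=200):
    # Worst-case style input: a long run of equal characters, mismatch last.
    haystack = "a" * (2 * size)
    needle = "a" * size + "b"
    manual = timeit.timeit(
        lambda: matches_manual(haystack, 0, needle), number=repeats)
    sliced = timeit.timeit(
        lambda: matches_slice(haystack, 0, needle), number=repeats)
    return manual, sliced
```

Calling compare_timings() returns the two elapsed times in seconds, so the gap can be measured directly instead of assumed.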

Does one for loop mean a time complexity of n in this case?

So, I've run into this problem in the daily coding problem challenge, and I've devised two solutions. However, I am unsure if one is better than the other in terms of time complexity (Big O).
# Given a list of numbers and a number k,
# return whether any two numbers from the list add up to k.
#
# For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
#
# Bonus: Can you do this in one pass?
# The above part seemed to denote this can be done in O(n).
def can_get_value(lst=[11, 15, 3, 7], k=17):
    for x in lst:
        for y in lst:
            if x+y == k:
                return True
    return False
def optimized_can_get_value(lst=[10, 15, 3, 7], k=17):
    temp = lst
    for x in lst:
        if k-x in temp:
            return True
        else:
            return False
def main():
    print(can_get_value())
    print(optimized_can_get_value())
if __name__ == "__main__":
    main()
I think the second is better than the first since it has one for loop, but I'm not sure if it is O(n), since I'm still running through two lists. Another solution I had in mind that was apparently a O(n) solution was using the python equivalent of "Java HashSets". Would appreciate confirmation, and explanation of why/why not it is O(n).
The first solution can_get_value() is textbook O(n^2). You know this.
The second solution is as well. This is because elm in list has O(n) complexity, and you're executing it n times. O(n) * O(n) = O(n^2).
The O(n) solution here is to convert from a list into a set (or, well, any type of hash table - dict would work too). The following code runs through the list exactly twice, which is O(n):
def can_get_value(lst, k):
    st = set(lst)       # make a hashtable (set) where each key is the same as its value
    for x in st:        # this executes n times --> O(n)
        if k-x in st:   # unlike for lists, `in` is O(1) for hashtables
            return True
    return False
This is thus O(n) * O(1) = O(n) in most cases.
In order to analyze the asymptotic runtime of your code, you need to know the runtime of each of the functions which you call as well. We generally think of arithmetic expressions like addition as being constant time (O(1)), so your first function has two for loops over n elements and the loop body only takes constant time, coming out to O(n * n * 1) = O(n^2).
The second function has only one for loop, but checking membership for a list is an O(n) function in the length of the list, so you still have O(n * n) = O(n^2). The latter option may still be faster (Python probably has optimized code for checking list membership), but it won't be asymptotically faster (the runtime still increases quadratically in n).
EDIT - as @Mark_Meyer pointed out, your second function is actually O(1) because there's a bug in it; sorry, I skimmed it and didn't notice. This answer assumes a corrected version of the second function like
def optimized_can_get_value(lst, k=17):
    for x in lst:
        if k - x in lst:
            return True
    return False
(Note - don't use a mutable default value for your function. See this SO question for the troubles that can bring. I also removed the temporary list because there's no need for that; it was just pointing to the same list object anyway.)
EDIT 2: for fun, here are a couple of O(n) solutions to this (both use that checking containment for a set is O(1)).
A one-liner which still stops as soon as a solution is found:
def get_value_one_liner(lst, k):
    return any(k - x in set(lst) for x in lst)
EDIT 3: I think this is actually O(n^2) because we call set(lst) for each x. Using Python 3.8's assignment expressions could, I think, give us a one-liner that is still efficient. Does anybody have a good Python <3.8 one-liner?
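One possible answer to that question, as a sketch: an extra for clause over a one-element list binds the set once, so it is not rebuilt for each x. This is a common idiom rather than anything specific to this problem, it also runs on Python 3.8 and later, and the name seen is chosen here only for illustration.

```python
def get_value_one_liner(lst, k):
    # `for seen in [set(lst)]` runs exactly once, so the set is built a
    # single time and then reused for every x.
    return any(k - x in seen for seen in [set(lst)] for x in lst)


print(get_value_one_liner([10, 15, 3, 7], 17))  # True
print(get_value_one_liner([10, 15, 3, 7], 19))  # False
```

Like the set-based version above, this lets a number be paired with itself (for example 10 with k = 20).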
And a version which tries not to do extra work by building up a set as it goes (not sure if this is actually faster in practice than creating the whole set at the start; it probably depends on the actual input data):
def get_value_early_stop(lst, k):
    values = set()
    for x in lst:
        if x in values:
            return True
        values.add(k - x)
    return False
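One behavioural difference between building the set up front and building it incrementally is worth noting: the up-front version can pair a number with itself, while the incremental version needs two separate elements. The sketch below restates both functions so it runs on its own; can_get_value is condensed with any() but is meant to behave like the set-based answer above.

```python
def can_get_value(lst, k):
    st = set(lst)
    # Every x is looked up in a set that already contains x itself.
    return any(k - x in st for x in st)


def get_value_early_stop(lst, k):
    values = set()
    for x in lst:
        # Only complements of earlier elements are in the set at this point.
        if x in values:
            return True
        values.add(k - x)
    return False


print(can_get_value([10], 20))             # True: 10 is paired with itself
print(get_value_early_stop([10], 20))      # False: needs a second element
print(get_value_early_stop([10, 10], 20))  # True
```

Which behaviour is wanted depends on how "any two numbers from the list" is read.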

Sorting numbers of 10**6 digits in Python 2 efficiently

I used the long type to store numbers and sort using normal sort method, but it was not efficient enough.
I think long(raw_input()) takes too much time.
Can someone think of an efficient way to solve it?
n = int(raw_input().strip())
unsorted = []
for i in xrange(n):
    term = long(raw_input())
    unsorted.append(term)
for i in sorted(unsorted):
    print i
Since you said this is a competitive programming question, this solution should work. In general it would not, because it needs far too much memory and would break in the bad cases, but in competitive programming the constraints are usually mild enough.
We merely add comparators, i.e. the __gt__ and __lt__ methods, which decide the comparison between two objects.
class VeryBigNumber(object):
    def __init__(self,number_string):
        self.number = number_string
    def __gt__(self,other):
        # A longer digit string is the larger number
        if len(self.number)>len(other.number):
            return True
        elif len(self.number)<len(other.number):
            return False
        # Same length: compare digit by digit from the left
        for i in xrange(min(len(self.number),len(other.number))):
            if int(self.number[i])>int(other.number[i]):
                return True
            elif int(self.number[i])<int(other.number[i]):
                return False
        return False
    def __eq__(self,other):
        return self.number == other.number
    def __lt__(self,other):
        return not (self.__eq__(other) or self.__gt__(other))
arr = []
for i in xrange(n):
    arr.append(VeryBigNumber(raw_input().strip()))
arr = sorted(arr)
for a in arr:
    print a.number,
Big Explanation:
In the comments you said it's for a coding competition. This assures us that the memory is small enough for us to be able to read it in under a second.
We don't convert such a big string to a number, as that would be unnecessary. Instead we keep the string as is.
How do we sort it then? We make our own class and use string comparisons.
These work by comparing the digit characters between strings, so only one character is converted to int at a time, which is very efficient.
I've tested the above code and it works correctly.
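The same idea of keeping the numbers as strings can also be expressed with a sort key instead of a comparator class. This is a different technique from the class above, shown as a Python 3 sketch, and it assumes the inputs are non-negative integers written without signs or leading zeros. The name sort_numeric_strings is hypothetical.

```python
def sort_numeric_strings(numbers):
    # Shorter strings are smaller numbers; strings of equal length compare
    # lexicographically, which matches numeric order for digit strings
    # without leading zeros.
    return sorted(numbers, key=lambda s: (len(s), s))


data = ["31415926535897932384626433832795", "1", "3", "10", "3", "5"]
print(sort_numeric_strings(data))
```

Because the key is a tuple of a length and the string itself, the comparison of equal-length strings is done by the built-in string comparison rather than by a Python loop over digits.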
