Why is '==' faster than manual traversal when comparing two strings in Python 3?

I am trying to solve problem 28, Implement strStr(), on LeetCode.
However, I have some questions about the time complexity of two versions of my implementation.
# Version 1
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i < len_h:
            found = True
            if haystack[i] == needle[0]:
                for j in range(len_n):
                    if i + j >= len_h or haystack[i + j] != needle[j]:
                        found = False
                        break
                if found:
                    return i
            i += 1
        return -1
In this version, I try to find the needle substring in the haystack using two nested loops.
I think the time complexity of the code is O(mn), where m is the length of the haystack and n is the length of the needle.
Unfortunately, the code cannot pass the tests because it exceeds the time limit.
Then I tried to optimize my code and got version 2.
# Version 2
class Solution:
    def strStr(self, haystack, needle):
        len_h = len(haystack)
        len_n = len(needle)
        if not needle:
            return 0
        if len_n > len_h:
            return -1
        i = 0
        while i < len_h:
            found = True
            if haystack[i] == needle[0]:
                if haystack[i:i+len_n] == needle:
                    return i
            i += 1
        return -1
Here I compare the needle with a slice of the haystack using '==' instead of the manual character-by-character comparison, and the code passes the tests.
Now, I have some questions:
What is the time complexity of the string slice?
What is the time complexity of the check operation (==) between two strings?
Why is version 2 faster than version 1 if the time complexity of the comparison is O(n)?
Thanks for any advice.

str.__eq__(self, other) (that is, equality for strings) is implemented in C and is lightning fast (as fast as any other language once it starts).
Your Python-implemented character-wise string comparison is slow for two reasons. First, the looping logic is implemented in Python, and Python loops are never very fast. Second, when you write needle[j], indexing constructs a new one-character string. That by itself is slow, and you do it in a nested loop, so the overall runtime will be disastrous. You end up calling str.__eq__ once per character, and every time it is called it has to check the length of the strings on each side (it does not know you just extracted a single character).
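A rough way to see the gap (a hypothetical micro-benchmark added for illustration, assuming CPython 3; the function names are made up) is to time a slice-plus-== check against a character-by-character Python loop:
# Hypothetical micro-benchmark: C-level slice comparison vs. a Python-level loop.
import timeit

haystack = "a" * 10000 + "b"
needle = "a" * 5000 + "b"

def manual_match(i):
    # character-by-character comparison driven by a Python loop
    for j in range(len(needle)):
        if haystack[i + j] != needle[j]:
            return False
    return True

def slice_match(i):
    # one slice plus a single C-implemented str.__eq__ call
    return haystack[i:i + len(needle)] == needle

print(timeit.timeit(lambda: manual_match(0), number=1000))
print(timeit.timeit(lambda: slice_match(0), number=1000))
# Both do O(len(needle)) work per call, but the slice version is expected
# to be far faster because the loop and comparisons happen in C.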

how does python indexing affect runtime of o-notation?

I am new to O-notation and am trying to find the worst-case runtime for some of my code. The only issue is that I'm confused about how O-notation works with indexing and appending, so I thought I'd ask for help with the following sample code:
def sums_1(L):
    n = len(L)
    tot = 0
    M = []
    for i in L[:n//2]:
        M.append(i)
    for i in L[n//2:]:
        M.extend(L)
    return sum(M)

def sums_2(s):
    def help_e(s, pos):
        if pos >= len(s):
            return ''
        return help_e(s, pos+1) + s[pos]
    return help_e(s, 0)
I think both functions would run in O(n) time, but I wanted some clarification on indexing and how that may affect the runtime. Thanks!
Here is the wiki page with the big-O complexity of almost every Python data-structure operation: https://wiki.python.org/moin/TimeComplexity
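To tie that table to the operations in the question, here is a short annotated sketch (my reading of the table, not part of the original answer):
# Complexity notes per the CPython wiki page linked above (sketch).
L = list(range(1000000))
M = []

x = L[10]               # indexing: O(1)
half = L[:len(L) // 2]  # slicing: O(k), where k is the slice length (it copies)
M.append(x)             # append: amortized O(1)
M.extend(half)          # extend: O(k), where k is len(half)
total = sum(M)          # sum: O(len(M))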

Is My Approach Brute-Force or Linear Search?

My Problem is the following:
Given a sequence of integer values, determine if there is a distinct pair of numbers in the sequence whose product is odd. Please provide two Python functions, oddpair_bf() and oddpair_linear(), for this problem. The functions will take the given sequence of integers as a list. The oddpair_bf() function uses a brute-force approach and checks the possible pairs sequentially. When there is a pair whose product is odd, it returns True; otherwise, it returns False. The second one, oddpair_linear(), uses a linear-scan approach and will visit each element once. Please find a way to determine this with a linear scan.
I tried solving it on my own and got:
def oddpair_bf(list):
    for i in list:
        for j in list:
            if i != j:
                product = i*j
                if product & 1:
                    return True
    return False
Now my question is: is this a brute-force approach or a "linear-scan" approach? And how would I need to approach it differently?
Here is a concise linear function that returns True when there is more than one odd number in the sequence (which would give at least one odd product).
def oddpair_linear(seq):
    return len([x for x in seq if x & 1]) > 1
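For example (hypothetical inputs):
print(oddpair_linear([2, 4, 6, 8]))  # False: no odd numbers at all
print(oddpair_linear([2, 3, 6, 8]))  # False: only one odd number
print(oddpair_linear([3, 4, 5, 8]))  # True: 3 * 5 is odd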
Your approach is brute force as it explores all possible combinations.
Nice usage of the & operator instead of the classical (x % 2) == 1.
Brute force
I would suggest two improvements in your code:
Do not use list as a variable name, because it shadows the built-in list type.
Halve the number of tests, since multiplication is commutative (symmetric).
This leads to:
def oddpair_bf(seq):
    n = len(seq)
    for i in range(n):
        for j in range(i+1, n):
            if seq[i]*seq[j] & 1:
                return True
    return False
Which can be condensed using itertools:
import itertools

def oddpair_bf2(seq):
    for x, y in itertools.combinations(seq, 2):
        if x*y & 1:
            return True
    return False
This new version is still O(n^2) in the worst case, but you spare unnecessary comparisons: because multiplication is commutative, we do not need to check y*x in addition to x*y, which removes n + n*(n-1)/2 cases (the diagonal and the lower triangle) from the n^2 cases of two full nested loops of size n.
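A quick sanity check of that count (a small sketch added here, not part of the original answer):
import itertools

n = 6
nested = sum(1 for i in range(n) for j in range(n))           # n**2 = 36 pairs
halved = sum(1 for _ in itertools.combinations(range(n), 2))  # n*(n-1)/2 = 15 pairs
print(nested - halved == n + n * (n - 1) // 2)                # True: diagonal + lower triangle removed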
Linear
Reducing complexity below the brute-force version is generally done by exploiting an inherent property of the problem that makes the computation easier, less intensive, and thus more tractable.
For the linear version, use a well-known property of the problem: any product involving an even number is always even, because it has at least one factor of 2 coming from that even number.
Therefore, solving this problem is equivalent to checking whether there are at least two odd numbers in the list. This can be written as:
def oddpair_linear(seq):
    n = 0
    for x in seq:
        if x & 1:
            n += 1
            if n >= 2:
                return True
    return False
This snippet is O(n) in the worst case (a single loop of size n). This check has been nicely condensed into a one-liner by @pakpe.
Since you said distinct, you can use a set:
def oddpair_linear(seq):
    return len({s for s in seq if s & 1}) > 1
Or, a slightly better way:
def oddpair_linear(seq):
    found = 0
    for s in seq:
        if s & 1:
            if not found:
                found = s
            else:
                return True
    return False

"Time Limit Exceeded" on LeetCode's Longest Palindromic Subsequence question

I'm trying to solve the Longest Palindromic Subsequence problem on LeetCode.
Following the most upvoted Java solution, I came up with the following memoized solution:
import functools

class Solution:
    def longestPalindromeSubseq(self, s):
        return longest_palindromic_subsequence(s)

@functools.lru_cache(maxsize=None)
def longest_palindromic_subsequence(s):
    if not s:
        return 0
    if len(s) == 1:
        return 1
    if s[0] == s[-1]:
        return 2 + longest_palindromic_subsequence(s[1:-1])
    return max(
        longest_palindromic_subsequence(s[0:-1]),
        longest_palindromic_subsequence(s[1:]))
The problem is that the time limit is exceeded for an input string which appears to have many repeated characters.
As I understand from the cited discussion, without the functools.lru_cache, the time complexity of this algorithm is O(2^N) because, at each reduction of the string length by one character, two recursive calls are made.
However, the discussion states that the memoized solution is O(N^2), which shouldn't exceed the time limit. I don't really see how memoization reduces the time complexity, however, and it doesn't seem to be the case here.
What further puzzles me is that if the solution consists of many repeated characters, it should actually run in O(N) time since each time the first and last characters are the same, only one recursive call is made.
Can someone explain to me why this test is failing?
String slicing in Python is O(n) (n being the length of the slice), while Java's substring is O(1) because it merely creates a view on the same underlying char[]. You can take the slices out of the equation, however, by operating on the same string with two moving indexes. Moreover, you can move the indexes past blocks of identical letters when the first and last characters are not the same:
@functools.lru_cache(maxsize=None)
def longest_palindromic_subsequence(s, start=None, end=None):
    if start is None:
        start = 0
    if end is None:
        end = len(s) - 1
    if end < start:
        return 0
    if end == start:
        return 1
    if s[start] == s[end]:
        return 2 + longest_palindromic_subsequence(s, start+1, end-1)
    # you can move indexes until you meet a different letter!
    start_ = start
    end_ = end
    while s[start_] == s[start]:
        start_ += 1
    while s[end_] == s[end]:
        end_ -= 1
    return max(
        longest_palindromic_subsequence(s, start, end_),
        longest_palindromic_subsequence(s, start_, end))
Memoization should help significantly. Take the input "abcde". In the return max(...) part, eventually two recursive calls will be made for "bcd", and even more calls for the further embedded substrings.
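One way to see this effect (a hypothetical check, not from the original answer) is to keep the question's slicing recurrence, decorate it with functools.lru_cache, and inspect cache_info(): a non-zero hit count means repeated substrings such as "bcd" were answered from the cache instead of being recomputed.
import functools

@functools.lru_cache(maxsize=None)
def lps(s):
    # same recurrence as in the question, memoized on the substring
    if len(s) <= 1:
        return len(s)
    if s[0] == s[-1]:
        return 2 + lps(s[1:-1])
    return max(lps(s[:-1]), lps(s[1:]))

print(lps("abcde"))      # 1 (no repeated characters)
print(lps.cache_info())  # hits > 0: duplicated substrings were reused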

What is the big-O notation of these permutation algorithms?

Working my way through "Cracking the coding interview", and a practice question says
Given 2 strings, write a method to decide if one is a permutation of the other.
The author's python solution is as follows:
from collections import Counter

def check_permutation(str1, str2):
    if len(str1) != len(str2):
        return False
    counter = Counter()
    for c in str1:
        counter[c] += 1
    for c in str2:
        if counter[c] == 0:
            return False
        counter[c] -= 1
    return True
This is claimed to run in O(N) time.
My solution is as follows:
def perm(str1, str2):
    if len(str1) != len(str2):
        return False
    for c in str1:
        if c not in str2:
            return False
    return True
And I believe this is also O(N). Is this true? Which algorithm is preferable? The author's Counter data type seems unnecessary.
And lastly, is this algorithm O(NlogN)?
def perm(str1, str2):
    return sorted(str1) == sorted(str2)
First, the author's solution is an optimized version of Counter(str1) == Counter(str2) (it returns False faster and creates a single instance of a Counter).
It is, indeed, O(n) because hash table (Counter) access is O(1).
Next, your solution is quadratic (O(n^2)) because each in check is O(n): it has to traverse the whole string.
It is also wrong on strings with repeated characters.
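For instance (a hypothetical check using the perm function from the question):
print(perm("aab", "abb"))  # True, although "aab" is not a permutation of "abb"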
Third, sorted(str1) == sorted(str2) is, indeed, linearithmic (O(n*log(n))) and thus worse than the original linear solution.
Note, however, that for small strings the constants may make a difference, and the linearithmic (sorted) solution may turn out to be faster than the linear (Counter) one.
Finally, beware that Python is usually implemented using an interpreter, so the actual performance may depend on whether you are using features implemented in C or in Python. E.g., if Counter is implemented in C, then Counter(str1) == Counter(str2) will probably outperform the author's solution hands down, even though algorithmically the author's solution is better.
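A rough way to check that last point on your own machine (a hypothetical micro-benchmark, not part of the original answer):
import timeit

setup = "from collections import Counter; s1 = 'abcdefgh' * 4; s2 = 'hgfedcba' * 4"
print(timeit.timeit("Counter(s1) == Counter(s2)", setup=setup, number=100000))
print(timeit.timeit("sorted(s1) == sorted(s2)", setup=setup, number=100000))
# For short strings like these, the sorted() version often wins despite its O(n log n) bound.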
For the first snippet, it can be simplified by using collections.Counter instead of the explicit loops:
from collections import Counter

def check_permutation(str1, str2):
    if len(str1) != len(str2):
        return False
    return Counter(str1) == Counter(str2)
And it is O(n) again. The last algorithm is O(n log n), since it sorts both strings with sorted.
Your algorithm is not correct, because it only checks whether each character appears somewhere in the other string, without taking the number of repetitions of that character into account. Even if it were correct, it would be O(n^2).
Therefore, in general, the first algorithm has the best time complexity and is easy to implement.

Sorting numbers of 10**6 digits in Python 2 efficiently

I used the long type to store the numbers and sorted them with the normal sort method, but it was not efficient enough.
I think long(raw_input()) takes too much time.
Can someone think of a more efficient way to solve it?
n = int(raw_input().strip())
unsorted = []
for i in xrange(n):
    term = long(raw_input())
    unsorted.append(term)
for i in sorted(unsorted):
    print i
Since you said it is a competitive-programming question, this solution should work; outside such constraints it would not, because it needs far too much memory and would break on bad cases. But in competitive programming the limits are modest enough that it should be fine.
We merely add comparators, i.e. the __gt__ and __lt__ methods, which decide how two objects are compared.
class VeryBigNumber(object):
    def __init__(self, number_string):
        self.number = number_string

    def __gt__(self, other):
        if len(self.number) > len(other.number):
            return True
        elif len(self.number) < len(other.number):
            return False
        for i in xrange(min(len(self.number), len(other.number))):
            if int(self.number[i]) > int(other.number[i]):
                return True
            elif int(self.number[i]) < int(other.number[i]):
                return False
        return False

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return not (self.__eq__(other) or self.__gt__(other))

n = int(raw_input().strip())
arr = []
for i in xrange(n):
    arr.append(VeryBigNumber(raw_input().strip()))
arr = sorted(arr)
for a in arr:
    print a.number,
Big Explanation:
In the comments you said that this is for a coding competition. This assures us that the input is small enough for us to read it in under a second.
We don't convert such a big string to a number as that would be unnecessary. Instead we keep the string as is.
How do we sort it then? We make our own class and use string comparisons.
These work by comparing the digit characters of the two strings, so only one character is converted to int at a time, which is efficient.
I've tested the above code and it works correctly.
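An alternative sketch (my addition, assuming the inputs are non-negative integers without leading zeros, as is usual in such problems): skip the custom class and sort the digit strings with a (length, string) key, since a longer decimal string is always the larger number and equal-length strings compare lexicographically just like numbers.
n = int(raw_input().strip())
numbers = [raw_input().strip() for _ in xrange(n)]
# sort by length first, then lexicographically within the same length
for s in sorted(numbers, key=lambda t: (len(t), t)):
    print s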
