I wrote code for a problem using two double-nested loops, which I take to be O(n^2), but it runs too long.
So I googled for a faster solution to the same problem and found the second code below, which uses a triple-nested loop, so O(n^3) by the same reasoning.
Is it because the number of computations is higher for the first code, although it has the lower big O?
If so, can I conclude that big O is not reliable for small "n" values and that I have to experiment to be able to judge?
Code 1:
def sherlockAndAnagrams(s):
    # 1 . Traverse all possible substrings within string
    count = 0
    lst_char_poss_str = []
    len_s = len(s)
    for i in range(len_s):#for each char in string
        temp_str = ""#a temp string to include characters next to evaluating char
        for j in range(i , len_s):#for all possible length of string from that char
            temp_str += s[j] #possible substrings from that char
            lst_char_poss_str.append(temp_str)#All possible substrings within string
    # 2 . Check if any two substrings of equal length are anagrams
    new_lst_char_poss_str = []
    for i in lst_char_poss_str:
        i = list(i)#sorted list, so, "abb" and "bba" will be both "abb"
        i.sort()
        new_lst_char_poss_str.append(i)#a 2-d list of lists of characters for All possible substrings within string
    len_new_s = len(new_lst_char_poss_str)
    for i in range (len_new_s - 1):
        for j in range (i + 1, len_new_s):
            if new_lst_char_poss_str[i] == new_lst_char_poss_str[j]:
                count += 1
    return(count)
Code 2:
def sherlockAndAnagrams(s):
    count = 0
    slen = len(s)
    for i in range(slen):
        for j in range(i+1, slen):
            substr = ''.join(sorted(s[i:j]))#Sorting all characters after a char in string
            sublen = len(substr)
            for x in range(i+1, slen):
                if x + sublen > slen: #if index out of range
                    break
                substr2 = ''.join(sorted(s[x:x+sublen]))
                if substr == substr2:
                    anagrams += 1
    return count
You might have an algorithm whose running time is 1,000,000 n, because of the other operations it performs. That is O(n), because it is bounded by some constant times n. You might also have another algorithm whose running time is 2 n^2.
You would say that the 1,000,000 n algorithm is better than the 2 n^2 one, since O(n) beats O(n^2). That is true, but only in the limit, and the limit is reached very late, when n is really large. For small instances, the 2 n^2 algorithm might actually take less time than the 1,000,000 n one. We must be careful about the constants too.
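To make the point about constants concrete, here is a minimal sketch that compares the two running-time formulas from the paragraph above. The function names and the `CROSSOVER` constant are hypothetical, chosen only for illustration; the numbers are the formulas themselves, not measurements.

```python
def linear_cost(n):
    """Cost model 1,000,000 * n (the O(n) algorithm)."""
    return 1_000_000 * n


def quadratic_cost(n):
    """Cost model 2 * n^2 (the O(n^2) algorithm)."""
    return 2 * n ** 2


# 1,000,000 * n == 2 * n^2  exactly when  n == 500,000
CROSSOVER = 1_000_000 // 2

for n in (10, 1_000, CROSSOVER, 10 * CROSSOVER):
    print(n, linear_cost(n), quadratic_cost(n))
```

Below the crossover the quadratic formula is the smaller one; only above it does the linear formula win.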
There are a lot of points to consider:
the second algorithm never increments count (it increments an undefined anagrams instead), so it returns 0 when it finds no match and raises an error as soon as it finds one.
in the first, temp_str += s[j] is not efficient; the second does not use this string concatenation.
the second is probably faster because it uses slicing to retrieve pieces of the string, but to be sure you should profile the code precisely.
other than this, as @pjs said, big O notation is an asymptotic description.
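For reference when profiling, here is a sketch of a third variant that is neither of the two codes in the question: it groups substrings by their sorted characters with collections.Counter and counts pairs per group. The function name count_anagram_pairs is hypothetical, and this is one possible approach under the assumption that the task is to count unordered pairs of anagrammatic substrings.

```python
from collections import Counter


def count_anagram_pairs(s):
    """Count unordered pairs of substrings of s that are anagrams of each other."""
    signatures = Counter()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            # Anagrams share the same sorted-character signature.
            signatures["".join(sorted(s[i:j]))] += 1
    # A group of k substrings with the same signature contributes k*(k-1)/2 pairs.
    return sum(k * (k - 1) // 2 for k in signatures.values())
```

Because each substring is sorted once and then looked up in a dictionary, no substring is ever compared against every other substring.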
This is my current approach:
def isPalindrome(s):
    if (s[::-1] == s):
        return True
    return False

def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        subStr = s[i]
        for j in range(i + 1, l):
            subStr += s[j]
            if (j - i + 1 <= len(ans)):
                continue
            if (isPalindrome(subStr)):
                ans = max(ans, subStr, key=len)
    return ans if len(ans) > 1 else s[0]

print(solve(input()))
My code exceeds the time limit according to the auto scoring system. I've already spent some time looking this up on Google; all of the solutions I found either have the same idea with no optimization or use dynamic programming, but sadly I must use only brute force to solve this problem. I tried to break out of the loop earlier by skipping all the substrings that are no longer than the longest palindrome found so far, but it still fails to meet the time requirement. Is there any other way to break out of these loops earlier, or a more time-efficient approach than the above?
With subStr += s[j], a new string is created over the length of the previous subStr. And with s[::-1], the substring from the previous offset j is copied over and over again. Both are inefficient because strings are immutable in Python and have to be copied as a new string for any string operation. On top of that, the string comparison in s[::-1] == s is also inefficient because you've already compared all of the inner substrings in the previous iterations and need to compare only the outermost two characters at the current offset.
You can instead keep track of just the index and the offset of the longest palindrome so far, and only slice the string upon return. To account for palindromes of both odd and even lengths, you can either increase the index by 0.5 at a time, or double the length to avoid having to deal with float-to-int conversions:
def solve(s):
    length = len(s) * 2
    index_longest = offset_longest = 0
    for index in range(length):
        offset = 0
        for offset in range(1 + index % 2, min(index, length - index), 2):
            if s[(index - offset) // 2] != s[(index + offset) // 2]:
                offset -= 2
                break
        if offset > offset_longest:
            index_longest = index
            offset_longest = offset
    return s[(index_longest - offset_longest) // 2: (index_longest + offset_longest) // 2 + 1]
Solved by using the "Expand Around Center" approach, thanks @Maruthi Adithya
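For readers who have not met the technique, here is a minimal sketch of "Expand Around Center". It is not the poster's actual submission, which is not shown in the thread; the function name longest_palindrome is hypothetical. Each index is tried as the center of an odd-length palindrome and as the left half of an even-length center, and the window is widened while the outer characters match.

```python
def longest_palindrome(s):
    """Return the longest palindromic substring of s (first one found on ties)."""
    if not s:
        return ""
    best_start, best_len = 0, 1
    for center in range(len(s)):
        # (center, center) covers odd lengths, (center, center + 1) even lengths.
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            # The loop overshoots by one on each side.
            length = right - left - 1
            if length > best_len:
                best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]
```

Only the two outermost characters are compared at each step and no substring is built until the final slice, which matches the inefficiencies pointed out in the answer above.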
This modification of your code should improve performance. You can stop as soon as the longest possible remaining substring is no longer than your already computed answer. Also, you should start your second loop at i + len(ans) instead of i + 1 to avoid useless iterations:
def solve(s):
    l = len(s)
    ans = ""
    for i in range(l):
        # no substring starting at i can be longer than ans
        if (l - i <= len(ans)):
            break
        # start from the first len(ans) characters, so every check is on a longer string
        subStr = s[i:i + len(ans)]
        for j in range(i + len(ans), l):
            subStr += s[j]
            if (isPalindrome(subStr)):
                ans = subStr
    return ans if len(ans) > 1 else s[0]
This is a solution that has a time complexity greater than the solutions provided.
Note: This post is to think about the problem better and does not specifically answer the question. I have taken a mathematical approach to find a time complexity greater than 2^L (where L is size of input string)
Note: This is a post to discuss potential algorithms. You will not find the answer here. And the logic shown here has not been proven extensively.
Do let me know if there is something that I haven't considered.
Approach: Create the set of possible substrings. Compare and find the maximum pair* from this set that has the longest possible palindrome.
Example case with input string: "abc".
In this example, substring set has: "a","b","c","ab","ac","bc","abc".
7 elements.
Comparing each element with all other elements will involve: 7^2 = 49 calculations.
Hence, input size is 3 & no of calculations is 49.
Time Complexity:
First compute time complexity for generating the substring set:
$$\sum_{a=1}^{L} \left( C_{a}^{L} \right)$$
Here, we are adding all the different substring size combination from the input size L.
To make it clear: In the above example input size is 3. So we find all the pairs with size =1 (i.e: "a","b","c"). Then size =2 (i.e: "ab","ac","bc") and finally size = 3 (i.e: "abc").
So choosing 1 character from input string = combination of taking L things 1 at a time without repetition.
In our case number of combinations = 3.
This can be mathematically shown as (where a = 1):
$$C_{a}^{L}$$
Similarly choosing 2 char from input string = 3
Choosing 3 char from input string = 1
Finding time complexity of palindrome pair from generated set with maximum length:
Size of generated set: N
For this we have to compare each string in set with all other strings in set.
So N*N, or 2 for loops. Hence the final time complexity is:
$$\left( \sum_{a=1}^{L} C_{a}^{L} \right)^{2}$$
This is a diverging function, greater than 2^L for L > 1.
However, there can be multiple optimizations applied to this. For example: there is no need to compare "a" with "abc" as "a" will also be compared with "a". Even if this optimization is applied, it will still have a time complexity > 2^L (For the most cases).
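The counts in this post can be checked numerically. The sketch below assumes the reading used in the text: N is the size of the generated set (the sum of C(L, a) for a = 1..L) and the comparison step costs N * N. The function names are hypothetical.

```python
from math import comb


def generated_set_size(L):
    """Sum of C(L, a) for a = 1..L, which equals 2^L - 1."""
    return sum(comb(L, a) for a in range(1, L + 1))


def pairwise_comparisons(L):
    """Comparing every element of the set with every element: N * N."""
    return generated_set_size(L) ** 2


for L in range(1, 6):
    print(L, generated_set_size(L), pairwise_comparisons(L), 2 ** L)
```

For L = 3 this reproduces the 7 elements and 49 comparisons of the "abc" example.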
Hope this gave you a new perspective to the problem.
PS: This is my first post.
You should not search for the palindrome from the beginning of the string; instead, start from the middle of a candidate and expand outward.
For example, for the string xyzabccbalmn, your solution will cost ~ 6 * 11 comparisons, but searching from the middle will cost ~ 11 * 2 + 2 operations.
But anyhow, brute force will never guarantee that your solution runs fast enough for an arbitrary string.
Try this:
def solve(s):
    if len(s)==1:
        print(0)
        return '1'
    if len(s)<=2 and not(isPalindrome(s)):
        print (0)
        return '1'
    elif isPalindrome(s):
        print( len(s))
        return '1'
    elif isPalindrome(s[0:len(s)-1]) or isPalindrome(s[1:len(s)]):
        print (len(s)-1)
        return '1'
    elif len(s)>=2:
        solve(s[0:len(s)-1])
        return '1'
    return 0
I am trying to learn Python and ran into a problem. My course has requirements: max time 1 sec and max memory 512 MB. The task is to find the smallest palindrome in alphabetical order. The minimum length of a palindrome is 2.
For example, in ghghwwdkjnccjknjn there are ghg, cc, ww, njn. We need the smallest, cc or ww; in the alphabet c comes before w (as in dictionaries). aba comes before aca (c > b), and so on.
Here is my code:
s = input("")
lst = []
for i in range(0, len(s)):
    for j in range(i + 1, len(s) + 1):
        p = s[i:j]
        if p == p[::-1] and len(p)>=2:
            lst.append(p)
            lst.sort()
        del p
if not lst:
    print("-1")
else:
    #lst.sort()
    print(sorted(lst, key = len)[0])
In this way I get 1.088s 9.89Mb and with lst.sort() moving to the end I get 0.901s 527.30Mb - both bad. How can I do it better? Thank you!
Efficient clean implementation of all improvements mentioned below:
def substrings(string, length):
    for i in range(len(string) - length + 1):
        yield string[i : i+length]

def palindromes(strings):
    for string in strings:
        if string == string[::-1]:
            yield string

def best_palindrome(string):
    for length in 2, 3:
        if result := min(palindromes(substrings(string, length)), default=None):
            return result
    return -1

print(best_palindrome(input()))
Got accepted at Code Forces with 218 ms and 796 KB (using Python 3.7.2).
Perhaps the simplest modification to make it a lot more efficient is to add this as the first thing in the j-loop:
if j - i > 3:
    break
That is, don't check lengths above 3. Because any longer palindrome, like abba or abcba, contains a shorter one, like bb or bcb in those cases. Since you want a shortest anyway, any longer ones are always useless.
Also, do sort only at the end, not after every append.
With those two changes, I got it accepted at Code Forces (link from your comment below).
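As a sketch of what those two changes look like when applied to the loop from the question, wrapped in a function for easier testing (the name smallest_palindrome is hypothetical and this is not necessarily the exact code that was submitted):

```python
def smallest_palindrome(s):
    """Shortest palindrome of length >= 2 in s, alphabetically first; "-1" if none."""
    found = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            # Change 1: never look at substrings longer than 3 characters.
            if j - i > 3:
                break
            p = s[i:j]
            if p == p[::-1] and len(p) >= 2:
                found.append(p)
    if not found:
        return "-1"
    # Change 2: sort once at the end instead of after every append.
    found.sort()
    # sorted() is stable, so alphabetical order is kept within each length.
    return sorted(found, key=len)[0]


print(smallest_palindrome("ghghwwdkjnccjknjn"))
```

The inner loop now runs at most three times per start index, so the work grows linearly with the input length instead of quadratically.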
Further possible improvements:
Don't sort, just get the minimum.
Start j at i + 2 instead of at i + 1 and remove the len(p)>=2 check.
For memory reduction, don't collect everything in a list (use a set or produce the candidates in a generator).
First try only all substrings of length 2, and only if that fails, try length 3.
I have to calculate the lingo score of two given strings, either recursively or with a list comprehension. There is one point for every letter that the two strings share.
I tried doing this, but it only works if s[0] is in t; otherwise it doesn't do what it is supposed to, and I cannot see what is actually going wrong here.
def count(e, L):
    lc = [1 for x in L if x == e]
    return sum(lc)

def lingo(s, t):
    if s == '' or t == '':
        return 0
    elif s == t:
        return len(s)
    if s[0] in t:
        lc = [count(s[x], t) for x in range(len(t))]
        return sum(lc)
    else:
        #remove s[0] and try again
        lingo(s[:1], t)
these assertions are with the assignment:
assert lingo('diner', 'proza') == 1
assert lingo('beeft', 'euvel') == 2
assert lingo('gattaca', 'aggtccaggcgc') == 5
assert lingo('gattaca', '') == 0
The most obvious mistake
You are missing a return statement on the last line of your code. Instead of:
else:
    #remove s[0] and try again
    lingo(s[:1], t)
it should be (note also that s[1:], not s[:1], is the slice that removes s[0]):
else:
    #remove s[0] and try again
    return lingo(s[1:], t)
A redundancy in your code
The following piece of your code is unnecessary:
elif s == t:
    return len(s)
Although this returns the correct result, it is a special case and doesn't particularly help the general case. In most cases s and t will be different; and the logic to calculate their amount of shared letters should work also when they are equal.
A mistake in the algorithm logic
This line of your code is highly suspicious:
lc = [count(s[x], t) for x in range(len(t))]
First of all, x is in range of the length of t, but is used as an index for s. If t is longer than s, this will immediately raise an IndexError exception. If t is shorter than or same length as s, then it will not raise an exception, but will most likely return the wrong result.
Note this interesting test case that was provided:
assert lingo('beeft', 'euvel') == 2
The letter 'e' appears twice in 'beeft' and twice in 'euvel', and the result is 2. Yet if you calculate count(s[1], t) + count(s[2], t) you will find the value 4. This is because the first 'e' of s is found twice in t, and the second 'e' of s is also found twice in t.
Janecx's answer provides one way to carefully fix this. You need to understand the logic behind min(s.count(s[0]), t.count(s[0])).
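The double counting described above, and the effect of capping with min, can be seen in a small sketch. Both function names are hypothetical; naive_score mimics the per-character counting from the question and capped_score applies the min(...) idea once per distinct letter.

```python
def naive_score(s, t):
    """Add t's count of every character of s: repeated letters are counted repeatedly."""
    return sum(t.count(ch) for ch in s)


def capped_score(s, t):
    """For each distinct letter, a shared letter can only be matched min(...) times."""
    return sum(min(s.count(ch), t.count(ch)) for ch in set(s))


print(naive_score('beeft', 'euvel'))   # each 'e' of s sees both 'e's of t
print(capped_score('beeft', 'euvel'))  # the two 'e's are matched pairwise
```

With 'beeft' and 'euvel' the naive sum gives 4 while the capped sum gives the expected 2.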
Other python solutions
Right now you absolutely want to use recursion and list comprehensions. In case you are interested in other ways to solve your problem, here are different algorithms.
Sorting the strings (sorting is a powerful tool that makes many problems easy)
def lingo(s, t):
    s = sorted(s) # this doesn't modify the original string, it makes a local copy
    t = sorted(t) # this doesn't modify the original string, it makes a local copy
    result = 0
    i = 0
    j = 0
    while (i < len(s) and j < len(t)):
        if s[i] == t[j]:
            result += 1
            i += 1
            j += 1
        elif s[i] < t[j]:
            i += 1
        else:
            j += 1
    return result
Complexity analysis: sorting takes N log N + M log M operations, where N=len(s) and M=len(t). The whole while loop only takes N + M operations; it is that fast because s and t are sorted in the same order, so we reach an element of s at the same time as the corresponding element in t, and we don't need to compare every element of s against every element of t.
collections.Counter (a python object specifically designed for counting occurrences)
import collections

def lingo(s, t):
    return sum((collections.Counter(s) & collections.Counter(t)).values())
Complexity analysis: this takes N + M operations, where N=len(s) and M=len(t). Counter simply counts the number of occurrences of each letter in s by going through s once, and the number of occurrences of each letter in t by going through t once; then the & operation keeps the minimum of the two counts for each letter (reminiscent of Janecx's min(...) operation); then all the counts are summed up. Summing up only takes as many operations as there are distinct letters, which in the case of a DNA sequence is 4; in the case of an alphabetical word is 26; and in general in a ASCII/Latin-1 string is at most 256.
Recursive approach from Janecx's answer
Complexity analysis: takes N * M operations, where N=len(s) and M=len(t). This is much slower than the other two approaches, because for every element of s we need to go through every element of t; written iteratively, this would be a for loop nested inside a second for loop.
There you go. What does this code do? If one of the strings is empty, it returns 0. Otherwise, it takes the smaller of the numbers of occurrences of s[0] in s and in t, then recurses on the rest of s and on t with every occurrence of s[0] removed, and so on.
def lingo(s, t):
    if s == '' or t == '':
        return 0
    return min(s.count(s[0]), t.count(s[0])) + lingo(s[1:], t.replace(s[0], ''))

assert lingo('diner', 'proza') == 1
assert lingo('beeft', 'euvel') == 2
assert lingo('gattaca', 'aggtccaggcgc') == 5
assert lingo('gattaca', '') == 0
There is a problem called "Last Digit of the Sum of Fibonacci Numbers". I have already optimised the naive approach, but this code does not work for larger values, for example 613455.
Every time I run this program, the OS crashes (memory limit exceeded).
My code is:
def SumFib(n):
    result = []
    for i in range(0, n+1):
        if i <= 1:
            result.append(i)
        else:
            result.append(result[i-1] + result[i-2])
    return (sum(result) % 10)

print(SumFib(int(input())))
Need some help to get over this problem.
In order to consume less memory, store fewer values. For example:
def sum_fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    elif n == 2:
        return 2
    res_sum = 2
    a, b = 1, 1
    for _ in range(n-2):
        a, b = b, a+b
        res_sum += b
    return res_sum % 10
The memory requirements (at a glance) are constant. Your original snippet has linear memory requirements (as n grows, the memory requirements grow).
(More intelligent things can be done regarding the modulo operation if n is really huge, but that is off topic to the original question)
Edit:
I was curious and ended up doing some homework.
Lemma 1: The last digit of the Fibonacci numbers follows a cycle of length 60.
Demonstration: One can check that, starting from (0, 1), element 60 has last digit 0 and element 61 has last digit 1 (this can be checked empirically). From this it follows that the last digit of fib(n) equals the last digit of fib(n % 60).
Lemma 2: The last digit of the sum of the first 60 Fibonacci numbers is 0.
Demonstration: Can be checked empirically too.
Conclusion: sum_fib(n) == sum_fib(n % 60). So the problem "last digit of the sum of the first n Fibonacci numbers" can be solved in O(1) by taking n % 60 and evaluating sum_fib of that number, i.e., sum_fib(n % 60). Or one can precompute the list of 60 digits and do a simple lookup.
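The lookup variant mentioned in the conclusion could be sketched as follows. The names PERIOD, SUM_LAST_DIGIT and sum_fib_last_digit are hypothetical; the table is built once with all arithmetic done modulo 10, so no large integers are ever created.

```python
PERIOD = 60  # length of the last-digit cycle from Lemma 1


def _build_table():
    """table[k] = last digit of F(0) + F(1) + ... + F(k), for k in 0..59."""
    table = []
    a, b, running = 0, 1, 0
    for _ in range(PERIOD):
        running = (running + a) % 10
        table.append(running)
        a, b = b, (a + b) % 10
    return table


SUM_LAST_DIGIT = _build_table()


def sum_fib_last_digit(n):
    """Last digit of the sum of the Fibonacci numbers F(0)..F(n)."""
    return SUM_LAST_DIGIT[n % PERIOD]


print(sum_fib_last_digit(613455))
```

Each query is a single modulo and a list index, whatever the size of n.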
Instead of using range you can try a custom function that works as a generator.
Generators calculate values on the fly instead of building them all at once like a list does (in Python 3, range is already lazy, so this matters mainly for Python 2's range).
Use the function below in place of range:
def generate_value(start, end, step):
    count = start
    # yield values one at a time; unlike range, end is inclusive here
    while count <= end:
        yield count
        count += step
I'm doing some exercises in a Python course and the one where I'm stuck is below:
Given a digit sequence that represents a message where each uppercase letter
is replaced with a number (A - 1, B - 2, ... , Z - 26) and space - 0.
Find the number of the initial messages, from which that sequence
could be obtained.
Example: 12345 - 3 (ABCDE, LCDE, AWDE)
11 - 2 (AA, K)
The naive solution is easy; it is a simple brute-force algorithm:
import string

def count_init_messages(sequence):
    def get_alpha(seq):
        nonlocal count
        if len(seq) == 0:
            count += 1
            return
        for i in range(1, len(seq) + 1):
            if seq[:i] not in alph_table:
                break
            else:
                get_alpha(seq[i:])

    alphabet = " " + string.ascii_uppercase
    # generate dictionary of possible digit combination
    alph_table = {str(n): alph for n, alph in zip(range(len(alphabet)), alphabet)}
    # counter for the right combination met
    count = 0
    get_alpha(sequence)
    return count

def main():
    sequence = input().rstrip()
    print(count_init_messages(sequence))

if __name__ == "__main__":
    main()
But as the input sequence might be as long as 100 characters and there might be lots of repetition, I have hit the time limit. For example, one of the sample inputs is 2222222222222222222222222222222222222222222222222222222222222222222222 (the number of possible messages is 308061521170129). As my implementation repeats too much work, it takes ages to process such an input. I am thinking of using a backtracking algorithm, but I haven't yet figured out how to implement memoization of the successive results.
I'd be glad if someone could point me in the right direction on how to crack that task.
The recurrence relation you have to solve (where s is a string of digits, and a and b are single digits) is this:
S("") = 1
S(a) = 1
S(s + a + b) = S(s+a) + (S(s) if ab is between 10 and 26)
That can be computed using dynamic programming rather than backtracking. If you do it right, it's O(n) time complexity, and O(1) space complexity.
def seq(s):
    a1, a2 = 1, 1
    for i in range(1, len(s)):
        a1, a2 = a1 + (a2 if 9 < int(s[i-1:i+1]) < 27 else 0), a1
    return a1

print(seq('2222222222222222222222222222222222222222222222222222222222222222222222'))
The largest number in the lookup table is 26 so you never need to lookup strings of lengths greater than 2. Modify the for loop accordingly. That might be enough to make brute force viable.
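Since the question also asks how memoization could be added, here is one possible sketch that combines it with the "at most two digits" observation above. It is not the asker's code: the inner helper ways and the use of functools.lru_cache are choices made for illustration, and the set of valid codes ("0" for space through "26" for Z) is taken from the problem statement.

```python
from functools import lru_cache


def count_init_messages(sequence):
    """Number of messages that encode to the given digit sequence."""
    # "0" is space, "1".."26" are A..Z; codes such as "01" are not valid.
    valid = {str(n) for n in range(27)}

    @lru_cache(maxsize=None)
    def ways(pos):
        # Number of ways to decode sequence[pos:].
        if pos == len(sequence):
            return 1
        total = 0
        for width in (1, 2):  # a code is never longer than two digits
            chunk = sequence[pos:pos + width]
            if len(chunk) == width and chunk in valid:
                total += ways(pos + width)
        return total

    return ways(0)


print(count_init_messages("12345"))
```

Each position is solved once and then served from the cache, so the work grows linearly with the length of the input rather than exponentially.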
You may have also recognized 308061521170129 as the 71st Fibonacci number. This relationship corresponds with the Fibonacci numbers giving "the solution to certain enumerative problems. The most common such problem is that of counting the number of compositions of 1s and 2s that sum to a given total n: there are Fn+1 ways to do this" (https://en.wikipedia.org/wiki/Fibonacci_number#Use_in_mathematics).
Every contiguous subsequence in the string that can be divided into either single or double digit codes represents such an n with multiple possible compositions of 1s and 2s; and thus, for every such subsequence within the string, the result must be multiplied by the (subsequence's length + 1) Fibonacci number (in the case of the 70 2's, we just multiply 1 by the 71st Fibonacci number).