Naive implementation of Karp-Rabin pattern matching algorithm - python

I'm having a problem implementing the naive version of the Karp-Rabin pattern matcher; I'm not getting the expected result. Here's my example:
string='today is a good day'
sub='good'
I would like to find the pattern good in the string above.
def kapr(n,m):
    for i in range(len(n)-len(m)+1):
        for j in range(len(m)):
            if n[i+j-1]!=m[j]:
                continue
        return i
    return "not found"
print(kapr(string, sub))
Output=0
Expected output=11, should correspond with the offset of good in the string.
Thanks for your help.

You want break instead of continue. Continue will move on to the next iteration of the inner loop, while break will exit the inner loop. Furthermore, you aren't jumping directly to the next iteration of the outer loop by using break, so you will hit the return i statement. To stop this happening, you can use a for/else branch.
E.g.
for j in range(len(m)):
    if n[i+j-1]!=m[j]:
        break
else:
    return i
It will only return i if the inner loop completes normally.
The index it returns is also not zero indexed, so with the above modifications it will return 12. Should be simple to update if you want it to be zero-indexed!
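Putting the two fixes together (break with for/else, and indexing n[i+j] instead of n[i+j-1] so the result is zero-indexed) gives something like the sketch below. The function name naive_search and the -1 return value for "not found" are choices made here for illustration; they are not part of the original post.

```python
def naive_search(text, pattern):
    """Return the zero-based offset of the first match, or -1 (assumed convention)."""
    for i in range(len(text) - len(pattern) + 1):
        for j in range(len(pattern)):
            if text[i + j] != pattern[j]:
                break  # mismatch: give up on this offset
        else:
            # The inner loop finished without break, so every character matched.
            return i
    return -1


print(naive_search('today is a good day', 'good'))  # 11
```

The else branch belongs to the inner for loop, not to the if, which is why it only runs when no break occurred.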

Related

How is my almostIncreasingSequence(sequence) code incorrect? [CodeSignal]

I've seen some posts about this same question, and I think my logic is pretty much the same as their answers. But I cannot find where exactly I'm wrong here.
My code first checks the length of the provided sequence, if it is 2 or less it automatically returns True.
Next, it removes (pops) the first element and checks whether the rest are in ascending order.
If the sequence isn't in order, it restores the original sequence and repeats the second step, but this time it removes the next element (pop(i)).
This continues until there are no more elements to remove, at which point the function returns False.
If in any of the iterations, the list is found to be in ascending order, the function returns True.
This is the code:
def almostIncreasingSequence(sequence):
    original = sequence.copy()
    if len(sequence) <= 2: return True
    for i in range(len(sequence)):
        sequence.pop(i)
        # print(sequence)
        for j in range(len(sequence)-1):
            if sequence[j+1] <= sequence[j]:
                sequence = original.copy()
            elif j+1 == len(sequence)-1:
                return True
        if i == len(sequence)-1:
            return False
And this is my result :'(
I think my logic may not be correctly implemented in the code. But I don't know how to test it. It'd be helpful if you can give me a sequence where this function will give a wrong answer.
Solve almostIncreasingSequence (Codefights)
This is one of the posts I was referring to at the very beginning. It also explains the almostIncreasingSequence(sequence) question and the answer explains the logic behind the code.
You don't have to try every element. Just find the violation of the ascension, and try to resolve it by removing one of the violators. Then check the rest of the list.
More formally, suppose that sequence[:i+1] is in ascending order, but sequence[i] >= sequence[i+1]. You cannot keep them both; one must be gone. Which one depends on sequence[i-1].
If sequence[i+1] <= sequence[i-1], removal of sequence[i] wouldn't help: a violation will remain. Therefore, remove sequence[i+1]. Otherwise (including when i is 0 and there is no sequence[i-1]), remove sequence[i] (do you see why?). Finally, check that the rest of the sequence is ascending.
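The approach described above can be sketched roughly as follows. This is one possible reading of the answer, not the answerer's own code; the snake_case function name and the helper is_strictly_increasing are introduced here for illustration, and "ascending" is assumed to mean strictly increasing.

```python
def almost_increasing_sequence(sequence):
    """True if removing at most one element leaves a strictly increasing list."""

    def is_strictly_increasing(seq):
        # Hypothetical helper: compare each element with its successor.
        return all(a < b for a, b in zip(seq, seq[1:]))

    for i in range(len(sequence) - 1):
        if sequence[i] >= sequence[i + 1]:
            # First violation found: one of the two violators has to go.
            if i > 0 and sequence[i + 1] <= sequence[i - 1]:
                # Dropping sequence[i] would leave a violation, so drop sequence[i+1].
                candidate = sequence[:i + 1] + sequence[i + 2:]
            else:
                candidate = sequence[:i] + sequence[i + 1:]
            return is_strictly_increasing(candidate)
    return True  # no violation at all


print(almost_increasing_sequence([1, 3, 2]))     # True
print(almost_increasing_sequence([1, 3, 2, 1]))  # False
```

Only one removal is ever attempted, so the work stays linear in the length of the list instead of trying every element in turn.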

Trouble understanding break vs else statements with nested loops

I'm practicing python and one of the coding tasks assigned was to create a function that looks through a list and ignores numbers that occur between a 6 and a 9 and returns the sum of all other values.
Edit: this does not mean to add numbers whose values are less than 6 or greater than 9. It means to add all numbers of any value, but to ignore any numbers that come after a 6, until a 9 is seen. Symbolically if i means include and x means exclude, the code should return all the values marked as i:
[i,i...6, x,x,...,9,i,i...,6,x,x,...]
In other words, 6 turns off adding and if adding is off, 9 turns it back on.
Note that a 9 with no preceding 6 is just a number and will be added.
For example if I have a list:
[4,5,6,7,8,9,9]
the output should be:
18 <---(4+5+9)
The solution is provided but I'm having trouble understanding the code. I don't understand the purpose of the break statements in the code. The solution provided is as follows:
def summer_69(*arr):
    total = 0
    add = True
    for num in arr:
        while add == True:
            if num!=6:
                total = total + num
                break
            else:
                add = False
        while add == False:
            if num !=9:
                break
            else:
                add = True
                break
    return total
I'm really confused how the break statements help with the code. Particularly, I'm confused why the first 'break' is needed when there is already an 'else'.
The second break confuses me as well.
I understand that 'break' statements stop a loop and go onto the next loop.
My interpretation of the code so far is 'if the number does not equal 6 then total = total + num, if it does equal 6 then the loop is broken and if it is broken, add changes to False'.
I'm not sure if that interpretation is correct or not.
I was wondering how seasoned Python coders interpret 'breaks' vs 'else'.
break will exit whatever loop the statement is in. It's useful for many things, but often it's used to "short-circuit" the loop. If we know that the rest of the loop is irrelevant after some condition is met, then it's just wasted resources to keep looping through.
The break statement lets you leave the while loop immediately, whereas the if/else by itself keeps you inside the loop until the condition of the while loop changes or a break statement is executed.
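The solution you've provided is convoluted and hard to understand.
If the task were simply to skip every value from 6 through 9, a much simpler loop would do (note that the edit to the question describes a different rule, where a 6 switches adding off and a 9 switches it back on):
total = 0
for num in arr_2:
    if 6 <= num <= 9:
        continue
    total += num
Or, in a more functional style (in Python 3, reduce has to be imported from functools):
filtered_arr = filter(lambda x: x < 6 or x > 9, arr_2)
total = reduce(lambda x, y: x + y, filtered_arr, 0)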
Anyway, in your solution, the first break exists only because a while loop is used where an if statement would do. When you've found a number that doesn't equal 6, you add it, and the break gets you out of the while loop; without it, the loop would keep adding the same number forever.
In other words, the solution should have used an if statement instead of the while statement. The break is there to make the while loop execute only once.
If a number does equal 6, add becomes False and the while loop terminates because its condition no longer holds. If a number does not equal 6, the break gets you out of the while loop. So the while loop is pointless and is really an if statement in disguise.
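To illustrate the point about if statements, here is a sketch of the same toggle logic with the while loops replaced by plain conditionals. The name summer_69_if is made up here to keep it apart from the provided solution; the behaviour is intended to match the original, under the rule described in the question's edit.

```python
def summer_69_if(*arr):
    """Sum all numbers, ignoring those from a 6 up to and including the next 9."""
    total = 0
    add = True
    for num in arr:
        if add:
            if num == 6:
                add = False   # a 6 switches adding off (and is not added)
            else:
                total += num
        elif num == 9:
            add = True        # a 9 switches adding back on (and is not added)
    return total


print(summer_69_if(4, 5, 6, 7, 8, 9, 9))  # 18
```

No break is needed because an if body runs at most once per number, which is what the break was emulating in the while version.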
This is a tricky way to handle program flow with a toggle nested in conditional loops.
It's a little hard to follow, but it is a well-known classic pattern.
Initially add == True, so if we start with a number that is not 6 (as in your example), the algorithm adds the number and breaks out of the first while loop. When it breaks, the next statement executed is the line while add == False.
At this point add == True, so the second while loop is not entered. The next statement executed is for num in arr (the outermost loop).
The outer for loop goes around again and this process repeats.
When you encounter a 6, the number is not added and the break does not occur. The program executes the else clause, setting add = False.
After the else clause, the condition of the first while loop is re-checked and fails, so execution continues with the statement while add == False. Since add == False at this point, the second while loop is entered, and because 6 is not 9 it breaks out again immediately.
From now on add is False, so the first while loop is not entered and numbers are not added. Instead, the second while loop is entered for each number. As long as the number is not equal to 9, it breaks out immediately without changing anything.
When you encounter a 9, the second while loop switches add back to True and breaks out.
The first 9 comes after a 6 (add is False), so it just toggles add from False to True and the number 9 doesn't get added.
When the next 9 is encountered, add is True and the number is not 6, so the first while loop is entered and the number 9 gets added.
This is a classic pattern that used to be used in assembly language code perhaps 40 years ago. As written, the IF statements toggle a state variable. The state variable is turned on when the start condition is met, and turned off when a stop condition is met. The while loops ensure that the toggle can only be turned ON when it was OFF and vice versa, and provide places to put in different handling when the state is ON vs when it is OFF. This pattern brings certain efficiencies that are completely irrelevant in modern high-level languages.
There are better ways to do this in all modern languages, but as an exercise in following tricky program flow it's quite good :)

Function result varies on each run

I have the following function that generates the longest palindrome of a string by removing and re-ordering the characters:
from collections import Counter
def find_longest_palindrome(s):
    count = Counter(s)
    chars = list(set(s))
    beg, mid, end = '', '', ''
    for i in range(len(chars)):
        if count[chars[i]] % 2 != 0:
            mid = chars[i]
            count[chars[i - 1]] -= 1
        else:
            for j in range(0, int(count[chars[i]] / 2)):
                beg += chars[i]
    end = beg
    end = ''.join(list(reversed(end)))
    return beg + mid + end
out = find_longest_palindrome('aacggg')
print(out)
I got this function by 'translating' this example from C++
Whenever I run my function, I get one of the following outputs, seemingly at random:
a
aca
agcga
The correct one in this case is 'agcga' as this is the longest palindrome for the input string 'aacggg'.
Could anyone suggest why this is occurring and how I could get the function to reliably return the longest palindrome?
P.S. The C++ code does not have this issue.
Your code depends on the order of list(set(s)).
But sets are unordered.
In CPython 3.4-3.7, the specific order you happen to get for sets of strings depends on the hash values for strings, which are explicitly randomized at startup, so it makes sense that you’d get different results on each run.
The reason you don’t see this in C++ is that the C++ set class template is not an unordered set, but a sorted set (based on a binary search tree, instead of a hash table), so you always get the same order in every run.
You could get the same behavior in Python by calling sorted on the set instead of just copying it to a list in whatever order it has.
But the code still isn’t correct; it just happens to work for some examples because the sorted order happens to give you the characters in most-repeated order. But that’s obviously not true in general, so you need to rethink your logic.
The most obvious difference introduced in your translation is this:
count[ch--]--;
… or, since you're looping over the characters by index instead of directly, more like:
count[chars[i--]]--;
Either way, this decrements the count of the current character, and then decrements the current character so that the loop will re-check the same character the next time through. You've turned this into something completely different:
count[chars[i - 1]] -= 1
This just decrements the count of the previous character.
In a for-each loop, you can't just change the loop variable and have any effect on the looping. To exactly replicate the C++ behavior, you'd either need to switch to a while loop, or put a while True: loop inside the for loop to get the same "repeat the same character" effect.
And, of course, you have to decrement the count of the current character, not decrement the count of the previous character that you're never going to see again.
for i in range(len(chars)):
    while True:
        if count[chars[i]] % 2 != 0:
            mid = chars[i]
            count[chars[i]] -= 1
        else:
            for j in range(0, int(count[chars[i]] / 2)):
                beg += chars[i]
            break
Of course you could obviously simplify this—starting with just looping for ch in chars:, but if you think about the logic of how the two loops work together, you should be able to see how to remove a whole level of indentation here. But this seems to be the smallest change to your code.
Notice that if you do this change, without the sorted change, the answer is chosen randomly when the correct answer is ambiguous—e.g., your example will give agcga one time, then aggga the next time.
Adding the sorted will make that choice consistent, but no less arbitrary.
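As one possible simplification along the lines hinted at above, the sketch below loops over the characters directly and takes count // 2 copies of each for the first half, keeping the first odd-count character as the middle. The function name and the "first odd character in sorted order wins" rule are assumptions made here, not taken from the original C++ code.

```python
from collections import Counter


def find_longest_palindrome_sorted(s):
    """Build a longest palindrome from the characters of s (ties broken by sorted order)."""
    count = Counter(s)
    half = []
    mid = ''
    for ch in sorted(count):              # sorted() makes the result repeatable
        half.append(ch * (count[ch] // 2))
        if count[ch] % 2 and not mid:
            mid = ch                      # keep one odd-count character for the centre
    beg = ''.join(half)
    return beg + mid + beg[::-1]


print(find_longest_palindrome_sorted('aacggg'))  # agcga
```

Any odd-count character would do as the middle, so the choice remains arbitrary; sorting only makes it the same on every run.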

How to understand this Python code for Twosum

The leetcode twosum problem:
Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
I read someone's python code as following:
vis = {}
for i,num in enumerate(nums):
    diff = target - num
    if diff in vis:
        return[vis[diff],i]
    vis[num] = i
I can understand most of the logic behind this code. However, why does the last line have to be in this position? It seems odd to me to make an assignment after the return statement.
So I tried moving it to other places, but then the output is null. Why does the last line have to be at that place?
The return statement is inside the if statement, so it only executes when diff is in vis, i.e., when an earlier number has already been seen such that nums[vis[diff]] + nums[i] == target. Only if that doesn't happen is the final statement executed. It records the current num and its index in vis for later comparisons, and the loop moves on to the next iteration. Doing the assignment before the check would let a number pair with itself whenever num * 2 == target.
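For reference, here is the snippet wrapped in a complete function so the control flow can be run and traced. The names two_sum and seen, and the None fallback when no pair exists, are illustrative choices rather than part of the code quoted in the question.

```python
def two_sum(nums, target):
    """Return indices of two distinct elements that add up to target, or None."""
    seen = {}  # maps a value already visited to its index
    for i, num in enumerate(nums):
        diff = target - num
        if diff in seen:
            # An earlier element complements the current one.
            return [seen[diff], i]
        # Record the current value only after the check, so it cannot match itself.
        seen[num] = i
    return None


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
print(two_sum([3, 2, 4], 6))       # [1, 2]
```

In the second call the first element is 3, which is half of the target; because it is stored only after the lookup, it is not paired with itself.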

Python IndexError : string index out of range in substring program

I am writing code for a class assignment that asks me to check whether one string is a substring of another using nested loops.
Basically, my teacher wants to demonstrate how the in operator works, as in:
"ana" in "banana" will return True.
The goal of the program is to write a function of two parameters,
substring(subStr,fullStr)
that prints a sentence saying whether subStr is a substring of fullStr. My program is as follows:
def substring(subStr,fullStr):
    tracker=""
    for i in (0,(len(fullStr)-1)):
        for j in (0,(len(subStr)-1)):
            if fullStr[i]==subStr[j]:
                tracker=tracker+subStr[j]
                i+=1
                if i==(len(fullStr)-1):
                    break
    if tracker==subStr:
        print "Yes",subStr,"is a substring of",fullStr
When I called the function in the interpreter with substring("ana","banana"), it printed a traceback pointing at line 5 and saying string index out of range:
if fullStr[i]==subStr[j]:
I'm banging my head trying to find the error. Any help would be appreciated
There are a few separate issues.
You are not resetting tracker in every iteration of the outer loop. This means that the leftovers from previous iterations contaminate later iterations.
You are not using range, and are instead looping over a tuple containing just 0 and the length of each string minus one.
You are incrementing the outer counter inside the inner loop, which skips checks for that iteration of the outer loop.
You are not doing the bounds check correctly before trying to index into the outer string.
Here is a corrected version.
def substring(subStr,fullStr):
    for i in range(0,(len(fullStr))):
        tracker=""
        for j in range(0,(len(subStr))):
            if i + j >= len(fullStr):
                break
            if fullStr[i+j]==subStr[j]:
                tracker=tracker+subStr[j]
        if tracker==subStr:
            print "Yes",subStr,"is a substring of",fullStr
            return
substring("ana", "banana")
First off, your loops should be
for i in xrange(0,(len(fullStr))):
for example. i in (0, len(fullStr)-1) will have i take on the value of 0 the first time around, then take on len(fullStr)-1 the second time. I assume by your algorithm you want it to take on the intermediate values as well.
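To see the difference concretely, the small sketch below collects the values each kind of loop would visit for "banana". It uses Python 3 syntax, with range in place of xrange, and the variable names are illustrative.

```python
full_str = "banana"

# Looping over a tuple visits only the two values written in the tuple.
tuple_values = [i for i in (0, len(full_str) - 1)]

# Looping over range visits every index from 0 up to len(full_str) - 1.
range_values = [i for i in range(0, len(full_str))]

print(tuple_values)  # [0, 5]
print(range_values)  # [0, 1, 2, 3, 4, 5]
```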
Now as for the error, consider i on the very last pass of the for loop. i is going to be equal to len(fullStr)-1. Now when we execute i+=1, i is now equal to len(fullStr). This does not fulfill the condition i==len(fullStr)-1, so we do not break; we loop again, and we crash. It would be better if you either made it if i>=len(fullStr) or checked for i==len(fullStr) before your if fullStr[i]==subStr[j]: statement.
Lastly, though not related to the question specifically, you do not reset tracker each time you stop checking a certain match. You should place tracker = "" after the for i in xrange(0,(len(fullStr))): line. You also do not check whether tracker is correct after looping through the string starting at i, nor do you break from the loop when you get a mismatch (instead continuing and possibly picking up more letters that match, but not consecutively).
Here is a fully corrected version:
def substring(subStr,fullStr):
    for i in xrange(0,(len(fullStr))):
        tracker="" #this is going to contain the consecutive matches we find
        for j in xrange(0,(len(subStr))):
            if i==(len(fullStr)): #end of i; no match.
                break
            if fullStr[i]==subStr[j]: #okay, looks promising, check the next letter to see if it is a match,
                tracker=tracker+subStr[j]
                i+=1
            else: #found a mismatch, leave inner loop and check what we have so far.
                break
        if tracker==subStr:
            print "Yes",subStr,"is a substring of",fullStr
            return #we already know it is a substring, so we don't need to check the rest
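The versions above are written for Python 2 (print statement, xrange). For readers on Python 3, a rough equivalent might look like the sketch below. It returns a boolean instead of printing, uses a matched flag rather than a tracker string, and the name is_substring is made up for this example.

```python
def is_substring(sub_str, full_str):
    """Return True if sub_str occurs in full_str, using explicit nested loops."""
    # Only start positions that leave room for all of sub_str need checking.
    for start in range(len(full_str) - len(sub_str) + 1):
        matched = True
        for offset in range(len(sub_str)):
            if full_str[start + offset] != sub_str[offset]:
                matched = False
                break  # stop at the first mismatch
        if matched:
            return True
    return False


print(is_substring("ana", "banana"))  # True
print(is_substring("nab", "banana"))  # False
```

Limiting the outer range to len(full_str) - len(sub_str) + 1 start positions means start + offset can never run past the end of full_str, so no separate bounds check is needed.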
