improving Boyer-Moore string search - python

I've been playing around with the Boyer-Moore sting search algorithm and starting with a base code set from Shriphani Palakodety I created 2 additional versions (v2 and v3) - each making some modifications such as removing len() function from the loop and than refactoring the while/if conditions. From v1 to v2 I see about a 10%-15% improvement and from v1 to v3 a 25%-30% improvement (significant).
My question is: does anyone have any additional mods that would improve performance even more (if you can submit as a v4) - keeping the base 'algorithm' true to Boyer-Moore.
#!/usr/bin/env python
import time
bcs = {} #the table
def goodSuffixShift(key):
for i in range(len(key)-1, -1, -1):
if key[i] not in bcs.keys():
bcs[key[i]] = len(key)-i-1
#---------------------- v1 ----------------------
def searchv1(text, key):
"""base from Shriphani Palakodety fixed for single char"""
i = len(key)-1
index = len(key) -1
j = i
while True:
if i < 0:
return j + 1
elif j > len(text):
return "not found"
elif text[j] != key[i] and text[j] not in bcs.keys():
j += len(key)
i = index
elif text[j] != key[i] and text[j] in bcs.keys():
j += bcs[text[j]]
i = index
else:
j -= 1
i -= 1
#---------------------- v2 ----------------------
def searchv2(text, key):
"""removed string len functions from loop"""
len_text = len(text)
len_key = len(key)
i = len_key-1
index = len_key -1
j = i
while True:
if i < 0:
return j + 1
elif j > len_text:
return "not found"
elif text[j] != key[i] and text[j] not in bcs.keys():
j += len_key
i = index
elif text[j] != key[i] and text[j] in bcs.keys():
j += bcs[text[j]]
i = index
else:
j -= 1
i -= 1
#---------------------- v3 ----------------------
def searchv3(text, key):
"""from v2 plus modified 3rd if condition
breaking down the comparison for efficiency,
modified the while loop to include the first
if condition (opposite of it)
"""
len_text = len(text)
len_key = len(key)
i = len_key-1
index = len_key -1
j = i
while i >= 0 and j <= len_text:
if text[j] != key[i]:
if text[j] not in bcs.keys():
j += len_key
i = index
else:
j += bcs[text[j]]
i = index
else:
j -= 1
i -= 1
if j > len_text:
return "not found"
else:
return j + 1
key_list = ["POWER", "HOUSE", "COMP", "SCIENCE", "SHRIPHANI", "BRUAH", "A", "H"]
text = "SHRIPHANI IS A COMPUTER SCIENCE POWERHOUSE"
t1 = time.clock()
for key in key_list:
goodSuffixShift(key)
#print searchv1(text, key)
searchv1(text, key)
bcs = {}
t2 = time.clock()
print('v1 took %0.5f ms' % ((t2-t1)*1000.0))
t1 = time.clock()
for key in key_list:
goodSuffixShift(key)
#print searchv2(text, key)
searchv2(text, key)
bcs = {}
t2 = time.clock()
print('v2 took %0.5f ms' % ((t2-t1)*1000.0))
t1 = time.clock()
for key in key_list:
goodSuffixShift(key)
#print searchv3(text, key)
searchv3(text, key)
bcs = {}
t2 = time.clock()
print('v3 took %0.5f ms' % ((t2-t1)*1000.0))

Using "in bcs.keys()" is creating a list and then doing an O(N) search of the list -- just use "in bcs".
Do the goodSuffixShift(key) thing inside the search function. Two benefits: the caller has only one API to use, and you avoid having bcs as a global (horrid ** 2).
Your indentation is incorrect in several places.
Update
This is not the Boyer-Moore algorithm (which uses TWO lookup tables). It looks more like the Boyer-Moore-Horspool algorithm, which uses only the first BM table.
A probable speedup: add the line 'bcsget = bcs.get' after setting up the bcs dict. Then replace:
if text[j] != key[i]:
if text[j] not in bcs.keys():
j += len_key
i = index
else:
j += bcs[text[j]]
i = index
with:
if text[j] != key[i]:
j += bcsget(text[j], len_key)
i = index
Update 2 -- back to basics, like getting the code correct before you optimise
Version 1 has some bugs which you have carried forward into versions 2 and 3. Some suggestions:
Change the not-found response from "not found" to -1. This makes it compatible with text.find(key), which you can use to check your results.
Get some more text values e.g. "R" * 20, "X" * 20, and "XXXSCIENCEYYY" for use with your existing key values.
Lash up a test harness, something like this:
func_list = [searchv1, searchv2, searchv3]
def test():
for text in text_list:
print '==== text is', repr(text)
for func in func_list:
for key in key_list:
try:
result = func(text, key)
except Exception, e:
print "EXCEPTION: %r expected:%d func:%s key:%r" % (e, expected, func.__name__, key)
continue
expected = text.find(key)
if result != expected:
print "ERROR actual:%d expected:%d func:%s key:%r" % (result, expected, func.__name__, key)
Run that, fix the errors in v1, carry those fixes forward, run the tests again until they're all OK. Then you can tidy up your timing harness along the same lines, and see what the performance is. Then you can report back here, and I'll give you my idea of what a searchv4 function should look like ;-)

Related

How to generate alternating substrings using recursion

I have a practice question that requires me to generate x number of alternating substrings, namely "#-" & "#--" using both recursion as well as iteration. Eg.string_iteration(3) generates "#-#--#-".
I have successfully implemented the solution for the iterative method,
but I'm having trouble getting started on the recursive method. How can I proceed?
Iterative method
def string_iteration(x):
odd_block = '#-'
even_block = '#--'
current_block = ''
if x == 0:
return ''
else:
for i in range(1,x+1):
if i % 2 != 0:
current_block += odd_block
elif i % 2 == 0:
current_block += even_block
i += 1
return current_block
For recursion, you almost always just need a base case and everything else. Here, your base case it pretty simple — when x < 1, you can return an empty string:
if x < 1:
return ''
After than you just need to return the block + the result of string_iteration(x-1). After than it's just a matter of deciding which block to choose. For example:
def string_iteration(x):
# base case
if x < 1:
return ''
blocks = ('#--', '#-')
# recursion
return string_iteration(x-1) + blocks[x % 2]
string_iteration(5)
# '#-#--#-#--#-'
This boils down to
string_iteration(1) + string_iteration(2) ... string_iteration(x)
The other answer doesn't give the same result as your iterative method. If you always want it to start with the odd block, you should add the block on the right of the recursive call instead of the left:
def string_recursion(x):
odd_block = '#-'
even_block = '#--'
if x == 0:
return ''
if x % 2 != 0:
return string_recursion(x - 1) + odd_block
elif x % 2 == 0:
return string_recursion(x - 1) + even_block
For recursive solution, you need a base case and calling the function again with some other value so that at the end you will have the desired output. Here, we can break this problem recursively like - string_recursive(x) = string_recursive(x-1) + string_recursive(x-2) + ... + string_recursive(1).
def string_recursion(x, parity):
final_str = ''
if x == 0:
return ''
if parity == -1: # when parity -1 we will add odd block
final_str += odd_block
elif parity == 1:
final_str += even_block
parity *= -1 # flip the parity every time
final_str += string_recursion(x-1, parity)
return final_str
odd_block = '#-'
even_block = '#--'
print(string_recursion(3, -1)) # for x=1 case we have odd parity, hence -1
# Output: #-#--#-

how to decrease time complexity in python?

Given a paragraph of space-separated lowercase English words and a list of unique lowercase English keywords, find the minimum length of the substring of which contains all the keywords that are separated by space in any order.
i put the following code where is the error ? How can i decrease time complexity.
import sys
def minimumLength(text, keys):
answer = 10000000
text += " $"
for i in xrange(len(text) - 1):
dup = list(keys)
word = ""
if i > 0 and text[i - 1] != ' ':
continue
for j in xrange(i, len(text)):
if text[j] == ' ':
for k in xrange(len(dup)):
if dup[k] == word:
del(dup[k])
break
word = ""
else:
word += text[j]
if not dup:
answer = min(answer, j - i)
break
if(answer == 10000000):
answer = -1
return answer
text = raw_input()
keyWords = int(raw_input())
keys = []
for i in xrange(keyWords):
keys.append(raw_input())
print(minimumLength(text, keys))
The trick is to scan from left to right and, once you find a window containing all the keys, try to reduce it on the left and enlarge it on the right preserving the property that all the terms remain inside the window.
Using this strategy you can solve the task in linear time.
The following code is a draft of the code that I tested on few strings, I hope the comments are enough to highlight the most critical steps:
def minimum_length(text, keys):
assert isinstance(text, str) and (isinstance(keys, set) or len(keys) == len(set(keys)))
minimum_length = None
key_to_occ = dict((k, 0) for k in keys)
text_words = [word if word in key_to_occ else None for word in text.split()]
missing_words = len(keys)
left_pos, last_right_pos = 0, 0
# find an interval with all the keys
for right_pos, right_word in enumerate(text_words):
if right_word is None:
continue
key_to_occ[right_word] += 1
occ_word = key_to_occ[right_word]
if occ_word == 1: # the first time we see this word in the current interval
missing_words -= 1
if missing_words == 0: # we saw all the words in this interval
key_to_occ[right_word] -= 1
last_right_pos = right_pos
break
if missing_words > 0:
return None
# reduce the interval on the left and enlarge it on the right preserving the property that all the keys are inside
for right_pos in xrange(last_right_pos, len(text_words)):
right_word = text_words[right_pos]
if right_word is None:
continue
key_to_occ[right_word] += 1
while left_pos < right_pos: # let's try to reduce the interval on the left
left_word = text_words[left_pos]
if left_word is None:
left_pos += 1
continue
if key_to_occ[left_word] == 1: # reduce the interval only if it doesn't decrease the number of occurrences
interval_size = right_pos + 1 - left_pos
if minimum_length is None or interval_size < minimum_length:
minimum_length = interval_size
break
else:
left_pos += 1
key_to_occ[left_word] -= 1
return minimum_length

Encryption code in python, call upon a function, python returns nothing. No error messages show

Okay, so I've written the following series of functions in Python 3.6.0:
def code_char(c, key):
result = ord(c) + key
if c.isupper():
while result > ord("Z"):
result -= 26
while result < ord("A"):
result += 26
return chr(result)
else:
while result > ord("z"):
result -= 26
while result < ord("a"):
result += 26
result = chr(result)
return result
def isletter(char):
if 65 <= ord(char) <= 90 or 97<= ord(char) <= 122:
return True
else:
return False
def encrypt(string, key):
result = ""
length = len(string)
key = key * (length // len(key)) + key[0:(length % len(key))]
for i in range(0,length):
if (isletter for i in string):
c = string[i]
num = int("".join("".join(i) for i in key))
result += code_char(c, num)
else:
c = string[i]
result += i
return result
Then I try to call on the functions with:
encrypt("This is a secret message!!", "12345678")
When python runs the program absolutely nothing happens. Nothing gets returned, and in the shell python forces me onto a blank line without indents, or >>>. i don't know what is right or wrong with the code as no error messages appear, and no results appear. Any kind of advice would be appreciated.
Thank you.
Looking at your code, I don't think this is an infinite loop. I think your loop will not be infinite but will run for a very long time since the value of key is very big, and so, subtracting 26 at a time, until it gets to an English letter ascii value, will just take forever (but not really forever)
>>> key = '12345678'
>>> length = len("This is a secret message!!")
>>> key * (length // len(key)) + key[0:(length % len(key))]
'12345678123456781234567812'
It might be a problem in the your logic, maybe in the logic generating the key, but if this is indeed the logic you want, how about using modulus rather than iterating:
def code_char(c, key):
result = ord(c) + key
if c.isupper():
if result > ord("Z"):
result = ord("Z") + result % 26 - 26
if result < ord("A"):
result = ord("A") - result % 26 + 26
return chr(result)
else:
if result > ord("z"):
result = ord("z") + result % 26 - 26
if result < ord("a"):
result = ord("a") - result % 26 + 26
return chr(result)
def isletter(char):
if 65 <= ord(char) <= 90 or 97<= ord(char) <= 122:
return True
else:
return False
def encrypt(string, key):
result = ""
length = len(string)
key = key * (length // len(key)) + key[0:(length % len(key))]
for i in range(0,length):
if (isletter for i in string):
c = string[i]
num = int("".join("".join(i) for i in key))
result += code_char(c, num)
else:
c = string[i]
result += i
return result
>>> encrypt("This is a secret message!!", "12345678")
'Rlmwrmwrerwigvixrqiwwekiss'
Should you be having while loop here , or are you intening if loop? I don't see any exit for while loop. That may be where your code is hanging.
if c.isupper():
while result > ord("Z"):
result -= 26
while result < ord("A"):
result += 26
return chr(result)
else:
while result > ord("z"):
result -= 26
while result < ord("a"):
result += 26
Also, if I replace while with if above, it's giving me overflow error.
OverflowError: Python int too large to convert to C long
EDIT
After looking at #polo's comment and taking a second look at code, I believe #polo is correct. I put while loop back and added print statements. I have commented them, but you can uncomment at your end.
I've also reduced key's complexity to just key = key and reduced key from 12345678 to just 1234 to see if the code works and if it completes in reasonable time.. You can make it as complex as you want once code runs smoothly.
Here is result I got after:
>>>
key =1234
coding char = T
coding char = h
coding char = i
coding char = s
coding char =
coding char = i
coding char = s
coding char =
coding char = a
coding char =
coding char = s
coding char = e
coding char = c
coding char = r
coding char = e
coding char = t
coding char =
coding char = m
coding char = e
coding char = s
coding char = s
coding char = a
coding char = g
coding char = e
coding char = !
coding char = !
encrypted_message = Ftuezuezmzeqodqfzyqeemsqaa
Modified code below:
def code_char(c, key):
result = ord(c) + key
if c.isupper():
while result > ord("Z"):
#print("result1 = {}",format(result))
result -= 26
while result < ord("A"):
#print("result2 = {}",format(result))
result += 26
return chr(result)
else:
while result > ord("z"):
#print("result3 = {}",format(result))
result -= 26
while result < ord("a"):
#print("result4 = {}",format(result))
result += 26
result = chr(result)
return result
def isletter(char):
if 65 <= ord(char) <= 90 or 97<= ord(char) <= 122:
return True
else:
return False
def encrypt(string, key):
result = ""
length = len(string)
#key = key * (length // len(key)) + key[0:(length % len(key))]
key = key
print "key ={}".format(key)
for i in range(0,length):
if (isletter for i in string):
c = string[i]
num = int("".join("".join(i) for i in key))
print("coding char = {}".format(c))
result += code_char(c, num)
else:
c = string[i]
result += i
return result
#encrypt("This is a secret message!!", "12345678")
encrypted_message = encrypt("This is a secret message!!", "1234")
print("encrypted_message = {}".format(encrypted_message))

How do I run a binary search for words that start with a certain letter?

I am asked to binary search a list of names and if these names start with a particular letter, for example A, then I am to print that name.
I can complete this task by doing much more simple code such as
for i in list:
if i[0] == "A":
print(i)
but instead I am asked to use a binary search and I'm struggling to understand the process behind it. We are given base code which can output the position a given string. My problem is not knowing what to edit so that I can achieve the desired outcome
name_list = ["Adolphus of Helborne", "Aldric Foxe", "Amanita Maleficant", "Aphra the Vicious", "Arachne the Gruesome", "Astarte Hellebore", "Brutus the Gruesome", "Cain of Avernus"]
def bin_search(list, item):
low_b = 0
up_b = len(list) - 1
found = False
while low_b <= up_b and found == False:
midPos = ((low_b + up_b) // 2)
if list[midPos] < item:
low_b = midPos + 1
elif list[midPos] > item:
up_b = midPos - 1
else:
found = True
if found:
print("The name is at positon " + str(midPos))
return midPos
else:
print("The name was not in the list.")
Desired outcome
bin_search(name_list,"A")
Prints all the names starting with A (Adolphus of HelBorne, Aldric Foxe .... etc)
EDIT:
I was just doing some guess and check and found out how to do it. This is the solution code
def bin_search(list, item):
low_b = 0
up_b = len(list) - 1
true_list = []
count = 100
while low_b <= up_b and count > 0:
midPos = ((low_b + up_b) // 2)
if list[midPos][0] == item:
true_list.append(list[midPos])
list.remove(list[midPos])
count -= 1
elif list[midPos] < item:
low_b = midPos + 1
count -= 1
else:
up_b = midPos - 1
count -= 1
print(true_list)
Not too sure if this is what you want as it seems inefficient... as you mention it seems a lot more intuitive to just iterate over the entire list but using binary search i found here i have:
def binary_search(seq, t):
min = 0
max = len(seq) - 1
while True:
if max < min:
return -1
m = (min + max) // 2
if seq[m][0] < t:
min = m + 1
elif seq[m][0] > t:
max = m - 1
else:
return m
index=0
while True:
index=binary_search(name_list,"A")
if index!=-1:
print(name_list[index])
else:
break
del name_list[index]
Output i get:
Aphra the Vicious
Arachne the Gruesome
Amanita Maleficant
Astarte Hellebore
Aldric Foxe
Adolphus of Helborne
You just need to found one item starting with the letter, then you need to identify the range. This approach should be fast and memory efficient.
def binary_search(list,item):
low_b = 0
up_b = len(list) - 1
found = False
midPos = ((low_b + up_b) // 2)
if list[low_b][0]==item:
midPos=low_b
found=True
elif list[up_b][0]==item:
midPos = up_b
found=True
while True:
if found:
break;
if list[low_b][0]>item:
break
if list[up_b][0]<item:
break
if up_b<low_b:
break;
midPos = ((low_b + up_b) // 2)
if list[midPos][0] < item:
low_b = midPos + 1
elif list[midPos] > item:
up_b = midPos - 1
else:
found = True
break
if found:
while True:
if midPos>0:
if list[midPos][0]==item:
midPos=midPos-1
continue
break;
while True:
if midPos<len(list):
if list[midPos][0]==item:
print list[midPos]
midPos=midPos+1
continue
break
else:
print("The name was not in the list.")
the output is
>>> binary_search(name_list,"A")
Adolphus of Helborne
Aldric Foxe
Amanita Maleficant
Aphra the Vicious
Arachne the Gruesome
Astarte Hellebore

Implementing Knuth-Morris-Pratt (KMP) algorithm for string matching with Python

I am following Cormen Leiserson Rivest Stein (clrs) book and came across "kmp algorithm" for string matching. I implemented it using Python (as-is).
However, it doesn't seem to work for some reason. where is my fault?
The code is given below:
def kmp_matcher(t,p):
n=len(t)
m=len(p)
# pi=[0]*n;
pi = compute_prefix_function(p)
q=-1
for i in range(n):
while(q>0 and p[q]!=t[i]):
q=pi[q]
if(p[q]==t[i]):
q=q+1
if(q==m):
print "pattern occurs with shift "+str(i-m)
q=pi[q]
def compute_prefix_function(p):
m=len(p)
pi =range(m)
pi[1]=0
k=0
for q in range(2,m):
while(k>0 and p[k]!=p[q]):
k=pi[k]
if(p[k]==p[q]):
k=k+1
pi[q]=k
return pi
t = 'brownfoxlazydog'
p = 'lazy'
kmp_matcher(t,p)
This is a class I wrote based on CLRs KMP algorithm, which contains what you are after. Note that only DNA "characters" are accepted here.
class KmpMatcher(object):
def __init__(self, pattern, string, stringName):
self.motif = pattern.upper()
self.seq = string.upper()
self.header = stringName
self.prefix = []
self.validBases = ['A', 'T', 'G', 'C', 'N']
#Matches the motif pattern against itself.
def computePrefix(self):
#Initialize prefix array
self.fillPrefixList()
k = 0
for pos in range(1, len(self.motif)):
#Check valid nt
if(self.motif[pos] not in self.validBases):
self.invalidMotif()
#Unique base in motif
while(k > 0 and self.motif[k] != self.motif[pos]):
k = self.prefix[k]
#repeat in motif
if(self.motif[k] == self.motif[pos]):
k += 1
self.prefix[pos] = k
#Initialize the prefix list and set first element to 0
def fillPrefixList(self):
self.prefix = [None] * len(self.motif)
self.prefix[0] = 0
#An implementation of the Knuth-Morris-Pratt algorithm for linear time string matching
def kmpSearch(self):
#Compute prefix array
self.computePrefix()
#Number of characters matched
match = 0
found = False
for pos in range(0, len(self.seq)):
#Check valid nt
if(self.seq[pos] not in self.validBases):
self.invalidSequence()
#Next character is not a match
while(match > 0 and self.motif[match] != self.seq[pos]):
match = self.prefix[match-1]
#A character match has been found
if(self.motif[match] == self.seq[pos]):
match += 1
#Motif found
if(match == len(self.motif)):
print(self.header)
print("Match found at position: " + str(pos-match+2) + ':' + str(pos+1))
found = True
match = self.prefix[match-1]
if(found == False):
print("Sorry '" + self.motif + "'" + " was not found in " + str(self.header))
#An invalid character in the motif message to the user
def invalidMotif(self):
print("Error: motif contains invalid DNA nucleotides")
exit()
#An invalid character in the sequence message to the user
def invalidSequence(self):
print("Error: " + str(self.header) + "sequence contains invalid DNA nucleotides")
exit()
You might want to try out my code:
def recursive_find_match(i, j, pattern, pattern_track):
if pattern[i] == pattern[j]:
pattern_track.append(i+1)
return {"append":pattern_track, "i": i+1, "j": j+1}
elif pattern[i] != pattern[j] and i == 0:
pattern_track.append(i)
return {"append":pattern_track, "i": i, "j": j+1}
else:
i = pattern_track[i-1]
return recursive_find_match(i, j, pattern, pattern_track)
def kmp(str_, pattern):
len_str = len(str_)
len_pattern = len(pattern)
pattern_track = []
if len_pattern == 0:
return
elif len_pattern == 1:
pattern_track = [0]
else:
pattern_track = [0]
i = 0
j = 1
while j < len_pattern:
data = recursive_find_match(i, j, pattern, pattern_track)
i = data["i"]
j = data["j"]
pattern_track = data["append"]
index_str = 0
index_pattern = 0
match_from = -1
while index_str < len_str:
if index_pattern == len_pattern:
break
if str_[index_str] == pattern[index_pattern]:
if index_pattern == 0:
match_from = index_str
index_pattern += 1
index_str += 1
else:
if index_pattern == 0:
index_str += 1
else:
index_pattern = pattern_track[index_pattern-1]
match_from = index_str - index_pattern
Try this:
def kmp_matcher(t, d):
n=len(t)
m=len(d)
pi = compute_prefix_function(d)
q = 0
i = 0
while i < n:
if d[q]==t[i]:
q=q+1
i = i + 1
else:
if q != 0:
q = pi[q-1]
else:
i = i + 1
if q == m:
print "pattern occurs with shift "+str(i-q)
q = pi[q-1]
def compute_prefix_function(p):
m=len(p)
pi =range(m)
k=1
l = 0
while k < m:
if p[k] <= p[l]:
l = l + 1
pi[k] = l
k = k + 1
else:
if l != 0:
l = pi[l-1]
else:
pi[k] = 0
k = k + 1
return pi
t = 'brownfoxlazydog'
p = 'lazy'
kmp_matcher(t, p)
KMP stands for Knuth-Morris-Pratt it is a linear time string-matching algorithm.
Note that in python, the string is ZERO BASED, (while in the book the string starts with index 1).
So we can workaround this by inserting an empty space at the beginning of both strings.
This causes four facts:
The len of both text and pattern is augmented by 1, so in the loop range, we do NOT have to insert the +1 to the right interval. (note that in python the last step is excluded);
To avoid accesses out of range, you have to check the values of k+1 and q+1 BEFORE to give them as index to arrays;
Since the length of m is augmented by 1, in kmp_matcher, before to print the response, you have to check this instead: q==m-1;
For the same reason, to calculate the correct shift you have to compute this instead: i-(m-1)
so the correct code, based on your original question, and considering the starting code from Cormen, as you have requested, would be the following:
(note : I have inserted a matching pattern inside, and some debug text that helped me to find logical errors):
def compute_prefix_function(P):
m = len(P)
pi = [None] * m
pi[1] = 0
k = 0
for q in range(2, m):
print ("q=", q, "\n")
print ("k=", k, "\n")
if ((k+1) < m):
while (k > 0 and P[k+1] != P[q]):
print ("entered while: \n")
print ("k: ", k, "\tP[k+1]: ", P[k+1], "\tq: ", q, "\tP[q]: ", P[q])
k = pi[k]
if P[k+1] == P[q]:
k = k+1
print ("Entered if: \n")
print ("k: ", k, "\tP[k]: ", P[k], "\tq: ", q, "\tP[q]: ", P[q])
pi[q] = k
print ("Outside while or if: \n")
print ("pi[", q, "] = ", k, "\n")
print ("---next---")
print ("---end for---")
return pi
def kmp_matcher(T, P):
n = len(T)
m = len(P)
pi = compute_prefix_function(P)
q = 0
for i in range(1, n):
print ("i=", i, "\n")
print ("q=", q, "\n")
print ("m=", m, "\n")
if ((q+1) < m):
while (q > 0 and P[q+1] != T[i]):
q = pi[q]
if P[q+1] == T[i]:
q = q+1
if q == m-1:
print ("Pattern occurs with shift", i-(m-1))
q = pi[q]
print("---next---")
print("---end for---")
txt = " bacbababaabcbab"
ptn = " ababaab"
kmp_matcher(txt, ptn)
(so this would be the correct accepted answer...)
hope that it helps.

Categories

Resources