KMP algorithm pattern calculation issue - python

Here is my code and test case. My question: it seems the values in the KMP pattern table can never increase. In the previous iteration we checked pattern[i] != pattern[j], so in the current round the condition (j == -1) or (pattern[j] == pattern[i]) cannot be true unless j == -1. Is that right?
def findPattern(pattern):
    j = -1
    next = [-1] * len(pattern)
    i = 0 # next[0] is always -1, by KMP definition
    while (i+1 < len(pattern)):
        if (j == -1) or (pattern[j] == pattern[i]):
            i += 1
            j += 1
            if pattern[i] != pattern[j]:
                next[i] = j
            else:
                next[i] = next[j]
        else:
            j = next[j]
    return next
if __name__ == "__main__":
    # print(findPattern("aaaab"))
    print(findPattern("abaabc"))
thanks in advance,
Lin

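By the time you reach the inner check, i and j have already been incremented, so relative to the values tested by if (j == -1) or (pattern[j] == pattern[i]) you are actually checking pattern[i+1] != pattern[j+1]. When those two characters are equal (the else branch), the next iteration's pattern[j] == pattern[i] test succeeds with j != -1, and j keeps growing.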
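To make this concrete, here is a small sketch of the same loop that also records every value j takes after an increment. The names find_pattern, nxt and j_history are illustrative, not from the original post; the logic is otherwise unchanged.

```python
def find_pattern(pattern):
    """Same table construction as in the question, plus a trace of j."""
    nxt = [-1] * len(pattern)   # nxt[0] is always -1
    i, j = 0, -1
    j_history = []              # hypothetical helper: values of j after each increment
    while i + 1 < len(pattern):
        if j == -1 or pattern[j] == pattern[i]:
            i += 1
            j += 1
            j_history.append(j)
            # This compares the characters one past the ones just matched.
            if pattern[i] != pattern[j]:
                nxt[i] = j
            else:
                nxt[i] = nxt[j]
        else:
            j = nxt[j]
    return nxt, j_history


table, history = find_pattern("abaabc")
print(table)    # [-1, 0, -1, 1, 0, 2]
print(history)  # [0, 0, 1, 1, 2]
```

For "abaabc", j climbs to 2, so the table values do increase whenever the characters after a match are equal.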

Related

Changing values of the array but the code does not go into the if else statement (Python)

The code never goes into the if statement, no matter what (that is where I intend to alter the array "dop"), and I have no idea why. Any help will be appreciated.
#The Array dop
dop = ["1878[1879]","1877","[2016]2015","1874","[2871]2987","3012[2019]"]
i = 0;
#Used Try Catch Block because of the Type Error that occurs when '[' is not found in an array
#Element
while(i < 6):
    try:
        strObject = dop[i]
        print(strObject)
        indexOne = strObject.index('[',0)
        indexTwo = strObject.index(']',0)
        print(indexOne)
        print(indexTwo)
        if (indexOne != 0 | indexOne != -1):
            substringObject = strObject[0:indexOne]
            dop[i] = substringObject
            i = i+1
            print(i)
        elif (indexOne == -1):
            i = i+1
        elif (indexOne == 0):
            substringObject = strObject[indexTwo]
            dop[i] = substringObject
            i = i+1
    except ValueError:
        print("Goes Here")
        i = i+1
    print(i)
    i = i+1
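One likely cause is operator precedence. In Python, | binds tighter than !=, so the condition indexOne != 0 | indexOne != -1 is parsed as the chained comparison indexOne != (0 | indexOne) != -1, and since 0 | indexOne equals indexOne, the first comparison is always false. The sketch below shows this with an example value; the variable names original, explicit and intended are illustrative only. It also shows that str.index raises ValueError when the character is missing, so the -1 checks only make sense with str.find.

```python
index_one = 4  # e.g. the position of '[' in "1878[1879]"

# How Python reads the original condition: a chained comparison.
original = (index_one != 0 | index_one != -1)
explicit = (index_one != (0 | index_one)) and ((0 | index_one) != -1)

# What was probably meant: the boolean operator `and`.
intended = (index_one != 0 and index_one != -1)

print(original, explicit, intended)  # False False True

# str.index raises ValueError on a miss; str.find returns -1 instead.
print("1877".find('['))  # -1
try:
    "1877".index('[')
except ValueError:
    print("index() raised ValueError")
```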

How to generate alternating substrings using recursion

I have a practice question that requires me to generate x alternating substrings, namely "#-" and "#--", using both recursion and iteration. For example, string_iteration(3) generates "#-#--#-".
I have successfully implemented the solution for the iterative method,
but I'm having trouble getting started on the recursive method. How can I proceed?
Iterative method:
def string_iteration(x):
    odd_block = '#-'
    even_block = '#--'
    current_block = ''
    if x == 0:
        return ''
    else:
        for i in range(1,x+1):
            if i % 2 != 0:
                current_block += odd_block
            elif i % 2 == 0:
                current_block += even_block
            i += 1
        return current_block
For recursion, you almost always just need a base case and everything else. Here, your base case is pretty simple: when x < 1, you can return an empty string:
if x < 1:
    return ''
After that you just need to return the result of string_iteration(x-1) + the block. Then it's just a matter of deciding which block to choose. For example:
def string_iteration(x):
    # base case
    if x < 1:
        return ''
    blocks = ('#--', '#-')
    # recursion
    return string_iteration(x-1) + blocks[x % 2]
string_iteration(5)
# '#-#--#-#--#-'
This boils down to
blocks[1 % 2] + blocks[2 % 2] + ... + blocks[x % 2]
The other answer doesn't give the same result as your iterative method. If you always want it to start with the odd block, you should add the block on the right of the recursive call instead of the left:
def string_recursion(x):
    odd_block = '#-'
    even_block = '#--'
    if x == 0:
        return ''
    if x % 2 != 0:
        return string_recursion(x - 1) + odd_block
    elif x % 2 == 0:
        return string_recursion(x - 1) + even_block
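A quick way to confirm that a recursive version matches the iterative one is to compare them over a range of inputs. The sketch below uses condensed versions of both functions (same names as above, slightly shortened bodies), so treat it as an illustration, not the posters' exact code.

```python
def string_iteration(x):
    # Build blocks 1..x: odd positions get '#-', even positions get '#--'.
    blocks = []
    for i in range(1, x + 1):
        blocks.append('#-' if i % 2 != 0 else '#--')
    return ''.join(blocks)


def string_recursion(x):
    # Base case: nothing left to generate.
    if x == 0:
        return ''
    # The x-th block goes on the right of everything generated before it.
    block = '#-' if x % 2 != 0 else '#--'
    return string_recursion(x - 1) + block


for x in range(10):
    assert string_iteration(x) == string_recursion(x)

print(string_recursion(3))  # prints #-#--#-
```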
For a recursive solution, you need a base case plus a call to the function itself with a smaller argument, so that the pieces combine into the desired output. Here the problem breaks down as string_recursion(x, parity) = (block selected by parity) + string_recursion(x-1, -parity).
def string_recursion(x, parity):
    final_str = ''
    if x == 0:
        return ''
    if parity == -1: # when parity -1 we will add odd block
        final_str += odd_block
    elif parity == 1:
        final_str += even_block
    parity *= -1 # flip the parity every time
    final_str += string_recursion(x-1, parity)
    return final_str
odd_block = '#-'
even_block = '#--'
print(string_recursion(3, -1)) # for x=1 case we have odd parity, hence -1
# Output: #-#--#-

What is the error in this merge sort implementation?

def merge(l1,l2):
    (lmerged,i,j) = ([],0,0)
    while i+j < len(l1) + len(l2):
        if i == len(l1):
            lmerged.append(l2[j])
            j = j+1
        elif j == len(l2):
            lmerged.append(l1[i])
            i = i+1
        elif l1[i] < l2[j]:
            lmerged.append(l1[i])
            i = i+1
        elif l2[j] < l1[i]:
            lmerged.append(l2[j])
            j = j+1
        else:
            lmerged.append(l1[i])
            i = i+1
            j = j+1
    return(lmerged)
def mergesort(l):
    if len(l) < 2:
        return(l)
    else:
        n = len(l)
        leftsorted = mergesort(l[:n//2])
        rightsorted = mergesort(l[n//2:])
        return(merge(leftsorted,rightsorted))
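What is the error in this code sample? On which list will this implementation fail? Is the logic correct, or is there a flaw in the logic itself?
Failing test: [1, 1] is sorted as [1].
Fix: remove j = j + 1 from the last else block in the merge function. When the two front elements are equal, only one of them is appended, so only one index may advance; advancing both silently drops a duplicate.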
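As a minimal sketch of the fix, the merge step can be written so that every comparison appends exactly one element and advances exactly one index. This is a restructured variant (using <= and list slicing for the leftovers), not the original poster's exact code.

```python
def merge(l1, l2):
    merged, i, j = [], 0, 0
    # Each iteration appends one element and advances one index,
    # so equal elements are never dropped.
    while i < len(l1) and j < len(l2):
        if l1[i] <= l2[j]:
            merged.append(l1[i])
            i += 1
        else:
            merged.append(l2[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(l1[i:])
    merged.extend(l2[j:])
    return merged


def mergesort(l):
    if len(l) < 2:
        return l
    mid = len(l) // 2
    return merge(mergesort(l[:mid]), mergesort(l[mid:]))


print(mergesort([1, 1]))           # [1, 1]
print(mergesort([3, 1, 2, 3, 1]))  # [1, 1, 2, 3, 3]
```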

Binary Subtraction - Python

I want to make a binary calculator and I have a problem with the subtraction part. Here is my code (I have tried to adapt one for sum that I've found on this website).
maxlen = max(len(s1), len(s2))
s1 = s1.zfill(maxlen)
s2 = s2.zfill(maxlen)
result = ''
carry = 0
i = maxlen - 1
while(i >= 0):
    s = int(s1[i]) - int(s2[i])
    if s <= 0:
        if carry == 0 and s != 0:
            carry = 1
            result = result + "1"
        else:
            result = result + "0"
    else:
        if carry == 1:
            result = result + "0"
            carry = 0
        else:
            result = result + "1"
    i = i - 1
if carry>0:
    result = result + "1"
return result[::-1]
The program works fine for some binary subtractions but fails for others.
Can someone please help me? I can't find the mistake. Thanks a lot.
Short answer: Your code is wrong for the case when s1[i] == s2[i] and carry == 1.
Longer answer: You should restructure your code to have three separate cases for s==-1, s==0, and s==1, and then branch on the value of carry within each case:
if s == -1: # 0-1
    if carry == 0:
        ...
    else:
        ...
elif s == 0: # 1-1 or 0-0
    if carry == 0:
        ...
    else:
        ...
else: # 1-0
    if carry == 0:
        ...
    else:
        ...
This way you have a separate block for each possibility, so there is no chance of overlooking a case like you did on your first attempt.
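One possible way to fill in that three-case structure is sketched below. It assumes carry means "a borrow is owed by the current digit" and that s1 >= s2; the function name binary_subtract and the ValueError for negative results are choices made for this illustration, not part of the original answer.

```python
def binary_subtract(s1, s2):
    """Subtract binary string s2 from s1, assuming s1 >= s2."""
    maxlen = max(len(s1), len(s2))
    s1, s2 = s1.zfill(maxlen), s2.zfill(maxlen)
    result = ''
    carry = 0  # 1 means the previous (lower) digit borrowed from this one
    for i in range(maxlen - 1, -1, -1):
        s = int(s1[i]) - int(s2[i])
        if s == -1:        # 0-1
            if carry == 0:
                result += '1'   # borrow: 10 - 1 = 1
                carry = 1
            else:
                result += '0'   # borrow: 10 - 1 - 1 = 0, still borrowing
        elif s == 0:       # 1-1 or 0-0
            if carry == 0:
                result += '0'
            else:
                result += '1'   # borrow: 10 - 0 - 1 = 1, still borrowing
        else:              # 1-0
            if carry == 0:
                result += '1'
            else:
                result += '0'   # 1 - 0 - 1 = 0, borrow is settled
                carry = 0
    if carry:
        raise ValueError("s1 must be >= s2")
    return result[::-1]


print(binary_subtract('100', '011'))  # 001
```

The result keeps leading zeros, matching the padded width of the inputs.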
I hope the answer below helps.
def binarySubstration(str1,str2):
    if len(str1) == 0:
        return
    if len(str2) == 0:
        return
    str1,str2 = normaliseString(str1,str2)
    startIdx = 0
    endIdx = len(str1) - 1
    carry = [0] * len(str1)
    result = ''
    while endIdx >= startIdx:
        x = int(str1[endIdx])
        y = int(str2[endIdx])
        sub = (carry[endIdx] + x) - y
        if sub == -2: # 0-1 with a pending borrow: digit is 0, borrow again
            result += '0'
            carry[endIdx-1] = -1
        elif sub == -1:
            result += '1'
            carry[endIdx-1] = -1
        elif sub == 1:
            result += '1'
        elif sub == 0:
            result += '0'
        else:
            raise Exception('Error')
        endIdx -= 1
    return result[::-1]
Normalising the strings:
def normaliseString(str1,str2):
    diff = abs((len(str1) - len(str2)))
    if diff != 0:
        if len(str1) < len(str2):
            str1 = ('0' * diff) + str1
        else:
            str2 = ('0' * diff) + str2
    return [str1,str2]

Traceback in Needleman-Wunsch global alignment without storing pointer

My understanding is that, although basically every discussion of dynamic programming I can find stores pointers as the matrix is populated, it is faster to recalculate the previous cells during the traceback step instead.
I have my dynamic programming algorithm to build the matrix correctly as far as I can tell, but I am confused on how to do the traceback calculations. I also have been told that it is necessary to recalculate the values (instead of just looking them up) but I don't see how that will come up with different numbers.
The version of SW I am implementing includes an option for gaps in both sequences to open up, so the recurrence relation for each matrix has three options. Below is the current version of my global alignment class. From my hand calculations I believe that score_align properly generates the matrix, but obviously traceback_col_seq does not work.
from numpy import zeros # assumed: zeros() is NumPy's; the import is not shown in the original post
INF = 2147483647 #max size of int32
class global_aligner():
    def __init__(self, subst, open=10, extend=2, double=3):
        self.extend, self.open, self.double, self.subst = extend, open, double, subst
    def __call__(self, row_seq, col_seq):
        #add alphabet error checking?
        self.score_align(row_seq, col_seq)
        return self.traceback_col_seq()
    def init_array(self):
        self.M = zeros((self.maxI, self.maxJ), int)
        self.Ic = zeros((self.maxI, self.maxJ), int)
        self.Ir = zeros((self.maxI, self.maxJ), int)
        for i in xrange(self.maxI):
            self.M[i][0], self.Ir[i][0], self.Ic[i][0] = \
                -INF, -INF, -(self.open+self.extend*i)
        for j in xrange(self.maxJ):
            self.M[0][j], self.Ic[0][j], self.Ir[0][j] = \
                -INF, -INF, -(self.open+self.extend*j)
        self.M[0][0] = 0
        self.Ic[0][0] = -self.open
    def score_cell(self, i, j, chars):
        thisM = [self.Ic[i-1][j-1]+self.subst[chars], self.M[i-1][j-1]+\
                 self.subst[chars], self.Ir[i-1][j-1]+self.subst[chars]]
        thisC = [self.Ic[i][j-1]-self.extend, self.M[i][j-1]-self.open, \
                 self.Ir[i][j-1]-self.double]
        thisR = [self.M[i-1][j]-self.open, self.Ir[i-1][j]-self.extend, \
                 self.Ic[i-1][j]-self.double]
        return max(thisM), max(thisC), max(thisR)
    def score_align(self, row_seq, col_seq):
        self.row_seq, self.col_seq = list(row_seq), list(col_seq)
        self.maxI, self.maxJ = len(self.row_seq)+1, len(self.col_seq)+1
        self.init_array()
        for i in xrange(1, self.maxI):
            row_char = self.row_seq[i-1]
            for j in xrange(1, self.maxJ):
                chars = row_char+self.col_seq[j-1]
                self.M[i][j], self.Ic[i][j], self.Ir[i][j] = \
                    self.score_cell(i, j, chars)
    def traceback_col_seq(self):
        self.traceback = list()
        i, j = self.maxI-1, self.maxJ-1
        while i > 1 and j > 1:
            cell = [self.M[i][j], self.Ic[i][j], self.Ir[i][j]]
            cellMax = max(cell)
            chars = self.row_seq[i-1]+self.col_seq[j-1]
            if cell.index(cellMax) == 0: #M
                diag = [diagM, diagC, diagR] = self.score_cell(i-1, j-1, chars)
                diagMax = max(diag)
                if diag.index(diagMax) == 0: #match
                    self.traceback.append(self.col_seq[j-1])
                elif diag.index(diagMax) == 1: #insert column (open)
                    self.traceback.append('-')
                elif diag.index(diagMax) == 2: #insert row (open other)
                    self.traceback.append(self.col_seq[j-1].lower())
                i, j = i-1, j-1
            elif cell.index(cellMax) == 1: #Ic
                up = [upM, upC, upR] = self.score_cell(i-1, j, chars)
                upMax = max(up)
                if up.index(upMax) == 0: #match (close)
                    self.traceback.append(self.col_seq[j-1])
                elif up.index(upMax) == 1: #insert column (extend)
                    self.traceback.append('-')
                elif up.index(upMax) == 2: #insert row (double)
                    self.traceback.append('-')
                i -= 1
            elif cell.index(cellMax) == 2: #Ir
                left = [leftM, leftC, leftR] = self.score_cell(i, j-1, chars)
                leftMax = max(left)
                if left.index(leftMax) == 0: #match (close)
                    self.traceback.append(self.col_seq[j-1])
                elif left.index(leftMax) == 1: #insert column (double)
                    self.traceback.append('-')
                elif left.index(leftMax) == 2: #insert row (extend other)
                    self.traceback.append(self.col_seq[j-1].lower())
                j -= 1
        for j in xrange(0,j,-1):
            self.traceback.append(self.col_seq[j-1])
        for i in xrange(0,i, -1):
            self.traceback.append('-')
        return ''.join(self.traceback[::-1])
test = global_aligner(blosumMatrix)
test.score_align('AA','AAA')
test.traceback_col_seq()
I think the main problem is that you aren't taking the matrix that you're currently in into account when generating the cells that you could potentially have come from. cell = [self.M[i][j], self.Ic[i][j], self.Ir[i][j]] is right for the first time through the while loop, but after that you can't just choose the matrix that has the highest score. Your options are constrained by where you're coming from. I'm having a bit of trouble following your code, but I think you're taking that into account in the if statements in the while loop. If that's the case, then I think changes along the lines of these should be sufficient:
cell = [self.M[i][j], self.Ic[i][j], self.Ir[i][j]]
cellIndex = cell.index(max(cell))
while i > 1 and j > 1:
    chars = self.row_seq[i-1]+self.col_seq[j-1]
    if cellIndex == 0: #M
        diag = [diagM, diagC, diagR] = self.score_cell(i-1, j-1, chars)
        diagMax = max(diag)
        ...
        cellIndex = diag.index(diagMax) # which matrix we move into next
        i, j = i-1, j-1
    elif cellIndex == 1: #Ic
        up = [upM, upC, upR] = self.score_cell(i-1, j, chars)
        upMax = max(up)
        ...
        cellIndex = up.index(upMax)
        i -= 1
    elif cellIndex == 2: #Ir
        left = [leftM, leftC, leftR] = self.score_cell(i, j-1, chars)
        leftMax = max(left)
        ...
        cellIndex = left.index(leftMax)
        j -= 1
Like I said, I'm not positive that I'm following your code correctly, but see if that helps.
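To illustrate the pointer-free traceback idea on its own, here is a minimal sketch that deliberately simplifies the problem: a single score matrix with a linear gap penalty, not the three-matrix affine scheme in the question. The function name nw_align and the scoring parameters match, mismatch and gap are assumptions made for this example. During traceback, each step recomputes which predecessor could have produced the current cell's score instead of reading a stored pointer.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback without stored pointers: recompute each candidate
    # predecessor and follow one that reproduces the current score.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])   # gap in b
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')        # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


print(nw_align('AA', 'AAA'))  # (0, '-AA', 'AAA')
```

With affine gaps the same recomputation applies, but the traceback also has to remember which matrix it is currently in, which is the point the answer above makes.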
