Longest common substring matrix - python

I am very new to Python and am struggling to create a matrix that expresses the longest common substring. I am looking for a result like this: LCS matrix
This is my code so far.
def compute_lcs(X, Y):
m = len(X)
n = len(Y)
# An (m) times (n) matrix
matrix = [[0] * (n) for _ in range(m)]
for i in range(1, m):
for j in range(1, n):
if X[i] == Y[j]:
if i == 0 or j == 0:
matrix[i][j] = 1
else:
matrix[i][j] = matrix[i-1][j-1]+1
else:
matrix[i][j] = 0
return matrix
b = compute_lcs('AACTGGCAG','TACGCTGGA')
for y in b:
print (y)
Current Output:
[0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 1, 1, 1, 1, 1, 0]
[0, 1, 0, 2, 0, 2, 2, 2, 0]
[0, 1, 2, 1, 3, 0, 3, 3, 0]
[0, 1, 2, 0, 2, 4, 0, 0, 0]
[0, 1, 2, 0, 1, 3, 0, 0, 0]
[0, 1, 0, 3, 0, 2, 4, 1, 0]
[0, 0, 2, 1, 4, 1, 3, 5, 0]
[0, 1, 1, 0, 2, 5, 0, 0, 0]
Expected Output:
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 0]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
However my result is a matrix that shows incorrect values. When I do the matrix by hand, the correct output looks like this: Correct output. I feel like my logic makes sense, what am I doing incorrectly?
Thanks everyone.

First off, to make things clear, the longest common subsequence problem is not the same as the longest common substring problem. What you're trying to solve is the latter; better not to confuse the two.
Secondly, your else branches are not aligned under the appropriate if conditions.
Whenever the characters match (X[i] == Y[j]), we set the matrix element to 1 if either index i or j is 0, because i-1 or j-1 would then be -1 (which in Python silently refers to the last item instead of raising an error), and that is not what we want. Otherwise, for indices i, j >= 1, we extend the diagonal by adding 1 to matrix[i-1][j-1].
Thirdly, your looping should start at 0 not 1 since we start from the first characters in the strings which are at indices 0:
def compute_lcs(X, Y):
    m = len(X)
    n = len(Y)
    # An (m) times (n) matrix
    matrix = [[0] * n for _ in range(m)]
    for i in range(0, m):
        for j in range(0, n):
            if X[i] == Y[j]:
                if i == 0 or j == 0:
                    matrix[i][j] = 1
                else:
                    matrix[i][j] = matrix[i-1][j-1]+1
            else:
                matrix[i][j] = 0
    return matrix
To get the exact matrix shown in the expected output, swap the order of the arguments or transpose the matrix before printing. Neither step is required for correctness; they only change the orientation of the printed matrix:
b = compute_lcs('TACGCTGGA', 'AACTGGCAG')
for y in b:
    print(y)
[0, 0, 0, 1, 0, 0, 0, 0, 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 2, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 1, 1, 0, 0, 1]
[0, 0, 1, 0, 0, 0, 2, 0, 0]
[0, 0, 0, 2, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 1, 0, 0, 1]
[0, 0, 0, 0, 1, 4, 0, 0, 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0]
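As a follow-up to the matrix above, here is a minimal sketch of how the longest common substring itself could be read off the same recurrence. The function name longest_common_substring and the padded (m+1) x (n+1) table are assumptions made for illustration, not part of the original code; the padding simply keeps row 0 and column 0 at zero so that the i == 0 or j == 0 special case is not needed.

```python
def longest_common_substring(x, y):
    # Padded table: row 0 and column 0 stay zero, so table[i-1][j-1]
    # is always a valid lookup and never wraps around to index -1.
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    best_len = 0   # length of the longest match seen so far
    best_end = 0   # position in x just past the end of that match
    for i, cx in enumerate(x, start=1):
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                table[i][j] = table[i - 1][j - 1] + 1
                if table[i][j] > best_len:
                    best_len = table[i][j]
                    best_end = i
    return x[best_end - best_len:best_end]


print(longest_common_substring('TACGCTGGA', 'AACTGGCAG'))  # CTGG
```

The largest entry in the matrix (4 in the output above) marks where the longest shared run ends, which is why tracking its position is enough to slice the substring out of x.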

Related

My code does not perform the task inside a function but works outside of it?

Here's my code where I am flipping a bit, crossing over two lists and selecting random elements of lists:
import random

def selRandom(individuals, k):
    return [random.choice(individuals) for i in range(k)]

def cxOnePoint(ind1, ind2):
    size = min(len(ind1), len(ind2))
    cxpoint = random.randint(1, size - 1)
    ind1[cxpoint:], ind2[cxpoint:] = ind2[cxpoint:], ind1[cxpoint:]
    return ind1, ind2

def mutFlipBit(individual, indpb):
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] = type(individual[i])(not individual[i])
    return individual,

def operators(selection, crossover, mutation, parent, k, indvpb):
    select = ['randomSelection']
    cx = ['OnePoint']
    mutate = ['flipBitMutate']
    if selection not in select:
        return "invalid"
    else:
        if selection == 'randomSelection':
            (parent) = selRandom(parent, k)
    if crossover not in cx:
        return "invalid"
    else:
        if crossover == 'OnePoint':
            ind = cxOnePoint(parent[0], parent[1])
    if mutation not in mutate:
        return "not valid"
    else:
        if mutation == 'flipBitMutate':
            mutatedIndvidual = mutFlipBit(ind[0], indvpb)
    return parent, ind, mutatedIndvidual
I run this to execute the code:
indv = ([1,0,1,0,1,0,1,1],[0,1,0,1,0,0,0,1],[0,0,1,1,1,1,0,0],[0,1,1,1,0,0,0,1],[1,0,0,0,1,1,1,1])
selection = 'randomSelection'
crossover = 'OnePoint'
mutation = 'flipBitMutate'
selected_parent, ind, mutatedIndvidual = operators(selection = selection , crossover = crossover, mutation = mutation, parent = indv, k = 3, indvpb = 0.1 )
print("Parents:\n",indv)
print("Selected parent to reproduce:\n",selected_parent)
print("Crossover offsprings:\n",ind)
print("Mutated offsprings",mutatedIndvidual)
And get the result:
Parents:
([1, 0, 1, 0, 1, 0, 1, 1], [1, 1, 1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 1, 1, 0, 0], [0, 1, 0, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1, 1, 1])
Selected parent to reproduce:
[[1, 1, 1, 1, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0, 0, 1], [1, 1, 1, 1, 0, 0, 1, 0]]
Crossover offsprings:
([1, 1, 1, 1, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0, 0, 1])
Mutated offsprings ([1, 1, 1, 1, 0, 0, 1, 0],)
So the code executes but does not work as intended. It randomly selects from the tuple, but then it neither crosses over (mixes the bits of two lists) nor flips the bits. If I run the functions separately (outside the operators function), they work:
a = [1,1,1,1,1,1,1,1]
b = [0,0,0,0,0,0,0,0]
c = [1,0,0,0,1,1,0,1]
d= (a,b,c)
print("selecting randomely:\n",selRandom(d,1))
print("TESTING CROSSOVER\n", cxOnePoint(a,b))
print("Mutate:\n",mutFlipBit(a,0.4))
and got the proper result:
selecting randomely:
[[0, 0, 0, 0, 0, 0, 0, 0]]
TESTING CROSSOVER
([1, 1, 1, 1, 1, 1, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1])
Mutate:
([0, 1, 1, 1, 1, 1, 1, 1],)
What is the logical mistake that I am making here?
Thank you!
To answer my own question:
Both mutFlipBit() and cxOnePoint() modify the original lists in place, so the parents printed afterwards already contain the changes. I changed my mutFlipBit() to build a new list instead:
def mutation(individual, indp):
    return [not ind if random.random() < indp else ind for ind in individual]
This worked for me.
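The same idea can be applied to the crossover step. The following is only a sketch, assuming the goal is to leave the parent lists untouched; the name cx_one_point_copy is hypothetical and does not appear in the original code. It builds the two children from slices, which are new lists, instead of assigning into the parents.

```python
import random


def cx_one_point_copy(ind1, ind2):
    # Hypothetical non-mutating variant of cxOnePoint: slicing creates
    # new lists, so ind1 and ind2 themselves are never modified.
    size = min(len(ind1), len(ind2))
    cxpoint = random.randint(1, size - 1)
    child1 = ind1[:cxpoint] + ind2[cxpoint:]
    child2 = ind2[:cxpoint] + ind1[cxpoint:]
    return child1, child2


a = [1, 1, 1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 0, 0, 0]
child1, child2 = cx_one_point_copy(a, b)
print("parents: ", a, b)            # still all ones and all zeros
print("children:", child1, child2)  # mixed at a random cut point
```

With this version the "Parents" printout stays the same before and after the call, so the effect of the crossover is visible when comparing parents and children.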

Consecutive values in array with periodic boundaries in Python

I have some 2D-arrays filled with 0 and 1:
import numpy as np
a = np.random.randint(2, size=(20, 20))
b = np.random.randint(2, size=(20, 20))
c = np.random.randint(2, size=(20, 20))
d = np.random.randint(2, size=(20, 20))
and I want to count the consecutive occurrence of the ones with periodic boundaries.
That means (in 1D for clearness):
[1 1 0 0 1 1 0 1 1 1]
should give me 5(last three elements + first two).
The 2D-arrays should be compared/counted in the third (second if you start with 0) axis, like first stacking the arrays in axis=2 and then applying the same algorithm like for 1D. But I am not sure if this is the most simple way.
Here's one way that works for 2D and higher-dimensional arrays a, aimed at performance -
def count_periodic_boundary(a):
    a = a.reshape(-1,a.shape[-1])
    m = a==1
    c0 = np.flip(m,axis=-1).argmin(axis=-1)+m.argmin(axis=-1)
    z = np.zeros(a.shape[:-1]+(1,),dtype=bool)
    p = np.hstack((z,m,z))
    c = (p[:,:-1]<p[:,1:]).sum(1)
    s = np.r_[0,c[:-1].cumsum()]
    l = np.diff(np.flatnonzero(np.diff(p.ravel())))[::2]
    d = np.maximum(c0,np.maximum.reduceat(l,s))
    return np.where(m.all(-1),a.shape[-1],d)
Sample runs -
In [75]: np.random.seed(0)
    ...: a = np.random.randint(2, size=(5, 20))
In [76]: a
Out[76]:
array([[0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
       [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0],
       [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1],
       [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
       [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]])
In [77]: count_periodic_boundary(a)
Out[77]: array([7, 4, 5, 2, 6])
In [72]: np.random.seed(0)
    ...: a = np.random.randint(2, size=(2, 5, 20))
In [73]: a
Out[73]:
array([[[0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
        [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0],
        [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1],
        [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0],
        [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0]],
       [[1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1],
        [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0],
        [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]]])
In [74]: count_periodic_boundary(a)
Out[74]: array([7, 4, 5, 2, 6, 2, 5, 4, 2, 1])
You can use groupby from itertools:
from itertools import groupby
a = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1]
def get_longest_seq(a):
    if all(a):
        return len(a)
    a_lens = [len(list(it)) for k, it in groupby(a) if k != 0]
    if a[0] == 1 and a[-1] == 1:
        m = max(max(a_lens), a_lens[0] + a_lens[-1])
    else:
        m = max(a_lens)
    return m
print(get_longest_seq(a))
Here is a two-liner, admittedly containing one rather long line:
*m,n = a.shape
return np.minimum(n,(np.arange(1,2*n+1)-np.maximum.accumulate(np.where(a[...,None,:],0,np.arange(1,2*n+1).reshape(2,n)).reshape(*m,2*n),-1)).max(-1))
How it works:
Let's first ignore the wrap around and consider a simple example: a = [1 0 0 1 1 0 1 1 1 0]
We want to transform this into b = [1 0 0 1 2 0 1 2 3 0], so we can simply take the maximum. One way of generating b is taking the arange r = [1 2 3 4 5 6 7 8 9 10] and subtracting aux = [0 2 3 3 3 6 6 6 6 10]. aux we create by multiplying r with (1-a) yielding [0 2 3 0 0 6 0 0 0 10] and taking the cumulative maximum.
To deal with the wrap around we simply put two copies of a next to each other and then use the above.
Here is the code again broken down into smaller bits and commented:
*m,n = a.shape
# r has length 2*n because of how we deal with the wrap around
r = np.arange(1,2*n+1)
# create r x (1-a) using essentially np.where(a,0,r)
# it's a bit more involved because we are cloning a in the same step
# a will be doubled along a new axis we insert before the last one
# this will happen by means of broadcasting against r which we distribute
# over two rows along the new axis
# in the very end we merge the new and the last axis
r1_a = np.where(a[...,None,:],0,r.reshape(2,n)).reshape(*m,2*n)
# take cumulative max
aux = np.maximum.accumulate(r1_a,-1)
# finally, take the row wise maximum and deal with all-one rows
return np.minimum(n,(r-aux).max(-1))
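For readers who want to trace the explanation step by step, here is a small 1D sketch of the same trick (arange minus cumulative maximum, with the array doubled to handle the wrap-around). The function name longest_run_periodic_1d is an assumption made for illustration; it is a simplified version for a single 1D array, not the n-dimensional code above.

```python
import numpy as np


def longest_run_periodic_1d(a):
    # Simplified 1D illustration of the arange / cumulative-max idea.
    a = np.asarray(a)
    n = a.size
    # Two copies side by side turn a wrapped run into an ordinary run.
    doubled = np.concatenate((a, a))
    r = np.arange(1, 2 * n + 1)
    # r where the value is 0, else 0; the running maximum then holds
    # the (1-based) position of the most recent zero.
    aux = np.maximum.accumulate(np.where(doubled, 0, r))
    # r - aux is the length of the run of ones ending at each position;
    # cap at n so an all-ones array is not counted twice.
    return int(min(n, (r - aux).max()))


print(longest_run_periodic_1d([1, 1, 0, 0, 1, 1, 0, 1, 1, 1]))  # 5
print(longest_run_periodic_1d([1, 0, 0, 1, 1, 0, 1, 1, 1, 0]))  # 3
```

The first call reproduces the example from the question (last three elements plus the first two), and the second is the non-wrapping example used in the explanation.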

How could I fill my second list with zero and one according to the conditions below?

I have a list containing multiple lists. It's full of random numbers between 0 and 1.
I need to create another list with the same size as the first one, in which every random number less than or equal to 0.75 becomes zero and every number greater than 0.75 becomes one. When I print the second list x, it should contain zeros and ones according to these conditions, but I always get a list full of zeros. Where is my mistake?
Here is my attempt:
import random
y = [[random.uniform(0,1) for i in range(10)]for j in range(10)]
x = [[0 for i in range(len(y[0]))]for j in range(len(y))]
for i in range(len(y)):
    for j in range(len(y[0])):
        if y[i][j] <= 0.75:
            x[i][j] == 0
        else:
            x[i][j] == 1
print(x)
Your if/else code is wrong: you are using '==' (comparison) instead of '=' (assignment), so x[i][j] is never updated. Try this:
import random
y = [[random.uniform(0,1) for i in range(10)]for j in range(10)]
x = [[0 for i in range(len(y[0]))]for j in range(len(y))]
for i in range(len(y)):
    for j in range(len(y[0])):
        if y[i][j] <= 0.75:
            x[i][j] = 0
        else:
            x[i][j] = 1
print(x)
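To see why the original loop left x full of zeros, here is a tiny illustration of the difference between the two operators. The variable names are made up for this example.

```python
x = [0, 0]

# Comparison: evaluates to False, and the result is simply discarded.
# Python does not complain, so the mistake goes unnoticed.
x[0] == 1
after_comparison = list(x)

# Assignment: actually stores the value in the list.
x[0] = 1
after_assignment = list(x)

print(after_comparison)  # [0, 0]
print(after_assignment)  # [1, 0]
```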
Put the condition in your list comprehension:
import random
y = [[random.uniform(0,1) for i in range(10)]for j in range(10)]
x = [[0 if val <=0.75 else 1 for val in sublist] for sublist in y]
print(x)
# [[1, 0, 0, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0, 0, 1, 1, 0], [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], [0, 0, 0, 1, 1, 0, 0, 0, 1, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]]
You can define the condition as a function and then apply it to each element of the initial list:
import random
y = [[random.uniform(0,1) for i in range(10)]for j in range(10)]
cond = lambda x: int(x > 0.75)
x = list(map(lambda x : list(map(cond, x)), y))

Flood fill working only on squared matrix?

I'm trying to implement flood fill to find all the cells in a grid that my robot can move to. If a cell is occupied its value is 1, and if a cell is free its value is 0. My code seems to work on square matrices but not on other matrices. In my code I mark the reachable cells with the number 2.
Here is my code:
def floodfill(matrix, x, y):
    if matrix[x][y] == 0:
        matrix[x][y] = 2
        if x > 0:
            floodfill(matrix,x-1,y)
        if x < len(matrix[y]) - 1:
            floodfill(matrix,x+1,y)
        if y > 0:
            floodfill(matrix,x,y-1)
        if y < len(matrix) - 1:
            floodfill(matrix,x,y+1)
This matrix seems to work:
def main():
    maze = [[0, 1, 1, 1, 1, 0, 0, 0, 1, 0],
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
            [0, 1, 0, 1, 1, 0, 1, 0, 0, 0],
            [0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
            [0, 0, 0, 1, 1, 0, 1, 0, 1, 0]]
    floodfill(maze, 0,0)
    print(maze)
And this matrix does not (same matrix with last column removed):
def main():
    maze = [[0, 1, 1, 1, 1, 0, 0, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 0],
            [0, 1, 0, 0, 0, 0, 1, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1],
            [0, 1, 0, 1, 1, 0, 1, 0, 1],
            [0, 0, 0, 1, 1, 0, 1, 0, 1]]
    floodfill(maze, 0,0)
    print(maze)
Would appreciate your help.
Thanks!
Your first matrix works because it is square: the number of rows and the number of columns are both 10.
In your second case the matrix is not square: it has 10 rows (indexed by your x variable) but only 9 columns (indexed by your y variable). Hence, when you check
y < len(matrix) - 1
len(matrix) is 10, so the check is y < 9 and y + 1 can reach 9, which is out of range for a row with 9 columns (valid indices 0 to 8) and raises "IndexError: list index out of range". To get the correct bound, check against the length of a row, which is the number of columns. One way is to use the length of the first row, len(matrix[0]).
Similarly, for x you should use the number of rows, len(matrix), which is 10 in your case. So you should use
if x < len(matrix) - 1
instead of if x < len(matrix[y]) - 1:, as juvian also pointed out in the comments.
Another way is to convert your list of lists to a NumPy array and use its shape attribute to get the number of rows and columns.
When accessing elements in the matrix, the row index comes first (the matrix is an array of rows), followed by the column index (each row is an array of numbers).
You want matrix[y][x], not matrix[x][y].
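Putting the points above together, a corrected version might look like the following sketch. The parameter names row and col are an assumption chosen to make the index order explicit; the grid is assumed to be rectangular (all rows the same length).

```python
def floodfill(matrix, row, col):
    # Only free cells (0) are filled; occupied (1) or visited (2) stop here.
    if matrix[row][col] != 0:
        return
    matrix[row][col] = 2
    # Rows are bounded by len(matrix), columns by len(matrix[0]).
    if row > 0:
        floodfill(matrix, row - 1, col)
    if row < len(matrix) - 1:
        floodfill(matrix, row + 1, col)
    if col > 0:
        floodfill(matrix, row, col - 1)
    if col < len(matrix[0]) - 1:
        floodfill(matrix, row, col + 1)


# A small non-square grid: 2 rows, 3 columns.
grid = [[0, 1, 0],
        [0, 0, 1]]
floodfill(grid, 0, 0)
print(grid)  # [[2, 1, 0], [2, 2, 1]]
```

The cell in the top-right corner stays 0 because it is walled off by occupied cells. For large grids, the recursion depth can exceed Python's default limit, in which case an explicit stack or queue is a common alternative.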

How does pandas qcut() method select what bins to put extra items in?

I would like to understand how pd.qcut() selects where to put extra items when numItems % binSize != 0. For example, I wrote this code to check how 0-9 items are binned in a decile setting
import pandas as pd
for i in range(10):
    a = pd.qcut(pd.Series(range(i+10)), 10, labels=False).value_counts().sort_index().tolist()
    a = [x-1 for x in a]
    print(str(i),'extra:',a)
0 extra: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 extra: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2 extra: [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
3 extra: [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
4 extra: [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
5 extra: [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
6 extra: [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
7 extra: [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
8 extra: [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
9 extra: [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
Of course, this will change as numItems and binSize changes. Do you have any insight on how the algorithm works to try to select where to put the extra items? It appears that it tries to balance them in some way
