Python: Constructing & Printing matrices

I need to create a matrix that calculates the LCS and then print it out. This is my code, but I'm having trouble with the print function: I don't know how to get the LCSmatrix values into the printed output.
def compute_LCS(seqA, seqB):
    for row in seqA:
        for col in seqB:
            if seqA[row] == seqB[col]:
                if row==0 or col==0:
                    LCSmatrix(row,col) = 1
                else:
                    LCSmatrix(row,col) = LCS(row-1,col-1) + 1
            else:
                LCSmatrix(row,col) = 0
    return LCSmatrix
def printMatrix(parameters...):
    print ' ',
    for i in seqA:
        print i,
    print
    for i, element in enumerate(LCSMatrix):
        print i, ' '.join(element)
matrix = LCSmatrix
print printMatrix(compute_LCS(seqA,seqB))
Any help would be much appreciated.

Try this:
seqA = 'AACTGGCAG'
seqB = 'TACGCTGGA'
def compute_LCS(seqA, seqB):
    # one row per character of seqB, one column per character of seqA
    LCSmatrix = [len(seqA)*[0] for row in seqB]
    for row in range(len(seqB)):
        for col in range(len(seqA)):
            if seqB[row] == seqA[col]:
                if row==0 or col==0:
                    LCSmatrix[row][col] = 1
                else:
                    # extend the run of matches along the diagonal
                    LCSmatrix[row][col] = LCSmatrix[row-1][col-1] + 1
            else:
                LCSmatrix[row][col] = 0
    return LCSmatrix
def printMatrix(seqA, seqB, LCSmatrix):
    # header row: seqA across the top, padded to line up with the columns
    print(' '.join('%2s' % x for x in ' '+seqA))
    for i, element in enumerate(LCSmatrix):
        # each row is labelled with the matching character of seqB
        print('%2s' % seqB[i], ' '.join('%2i' % x for x in element))
matrix = compute_LCS(seqA, seqB)
printMatrix(seqA, seqB, matrix)
The above produces:
    A  A  C  T  G  G  C  A  G
 T  0  0  0  1  0  0  0  0  0
 A  1  1  0  0  0  0  0  1  0
 C  0  0  2  0  0  0  1  0  0
 G  0  0  0  0  1  1  0  0  1
 C  0  0  1  0  0  0  2  0  0
 T  0  0  0  2  0  0  0  0  0
 G  0  0  0  0  3  1  0  0  1
 G  0  0  0  0  1  4  0  0  1
 A  1  1  0  0  0  0  0  1  0
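Each cell in this matrix holds the length of the run of consecutive matches ending at that position, so the largest cell marks the end of the longest stretch the two sequences share. A minimal sketch of reading that stretch back out, assuming that is what the matrix is wanted for (the names compute_matrix and longest_common_substring are made up for this example, not taken from the answer):

```python
def compute_matrix(seqA, seqB):
    """Rows follow seqB and columns follow seqA, as in the answer above."""
    matrix = [[0] * len(seqA) for _ in seqB]
    for row, b in enumerate(seqB):
        for col, a in enumerate(seqA):
            if a == b:
                # Start a new run on the border, otherwise extend the diagonal run.
                if row == 0 or col == 0:
                    matrix[row][col] = 1
                else:
                    matrix[row][col] = matrix[row - 1][col - 1] + 1
    return matrix


def longest_common_substring(seqA, seqB):
    """Return the longest run of characters shared by seqA and seqB."""
    best, end_col = 0, 0
    for row in compute_matrix(seqA, seqB):
        for col, value in enumerate(row):
            if value > best:
                best, end_col = value, col
    # The run ends at end_col in seqA and is `best` characters long.
    return seqA[end_col - best + 1:end_col + 1]


print(longest_common_substring('AACTGGCAG', 'TACGCTGGA'))  # prints CTGG
```

The 4 in the printed matrix sits in the second G row under the second G column, which is where this run ends.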

Related

Counting repeated sequences in transition table

I'm using the following function to generate a transition table:
import numpy as np
import pandas as pd
def make_table(allSeq):
    n = max([ max(s) for s in allSeq ]) + 1
    arr = np.zeros((n,n), dtype=int)
    for seq in allSeq:
        ind = (seq[1:], seq[:-1])
        arr[ind] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')
However, my result is incorrect:
list1 = [1,2,3,4,5,4,5,4,5]
list2 = [4,5,4,5]
make_table([list1, list2])
Current  0  1  2  3  4  5
Next
0        0  0  0  0  0  0
1        0  0  0  0  0  0
2        0  1  0  0  0  0
3        0  0  1  0  0  0
4        0  0  0  1  0  2
5        0  0  0  0  2  0
For example, the transition 4->5 should be counted 5 times, but it is counted only once per sequence (2 in total). I know the issue is the arr[ind] += 1 line, but I just can't figure it out! Do I nest another loop, or is there a slick way to add the total number of instances at once? Thanks!

Figured it out! Switched to the following:
def make_table(allSeq):
    n = max([ max(s) for s in allSeq ]) + 1
    arr = np.zeros((n,n), dtype=int)
    for seq in allSeq:
        for i,j in zip(seq[1:],seq[:-1]):
            ind = (i,j)
            arr[ind] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')

Another loop seems like the easiest solution, with a bit of a twist of using zip:
import numpy as np
import pandas as pd
def make_table(allSeq):
    n = max([ max(s) for s in allSeq ]) + 1
    arr = np.zeros((n,n), dtype=int)
    for seq in allSeq:
        ind = zip(seq[1:], seq[:-1])
        for i in ind:
            arr[i] += 1
    return pd.DataFrame(arr).rename_axis(index='Next', columns='Current')
list1 = [1,2,3,4,5,4,5,4,5]
list2 = [4,5,4,5]
make_table([list1, list2])
returns
  Next    0    1    2    3    4    5
------  ---  ---  ---  ---  ---  ---
     0    0    0    0    0    0    0
     1    0    0    0    0    0    0
     2    0    1    0    0    0    0
     3    0    0    1    0    0    0
     4    0    0    0    1    0    3
     5    0    0    0    0    5    0
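On the "slick way" part of the question: arr[ind] += 1 with index arrays is buffered, so an index pair that occurs several times is still only incremented once. NumPy's np.add.at does the same addition unbuffered, so every occurrence counts. A minimal sketch along those lines (the function name transition_counts is made up here, and it returns the plain array rather than a DataFrame):

```python
import numpy as np


def transition_counts(all_seq):
    """Count transitions; result[next, current] is the number of current -> next steps."""
    n = max(max(s) for s in all_seq) + 1
    arr = np.zeros((n, n), dtype=int)
    for seq in all_seq:
        seq = np.asarray(seq, dtype=int)
        # Unbuffered in-place add: repeated (next, current) pairs each add 1.
        np.add.at(arr, (seq[1:], seq[:-1]), 1)
    return arr


counts = transition_counts([[1, 2, 3, 4, 5, 4, 5, 4, 5], [4, 5, 4, 5]])
print(counts[5, 4])  # 4 -> 5 transitions: prints 5
print(counts[4, 5])  # 5 -> 4 transitions: prints 3
```

The array can be wrapped exactly as before with pd.DataFrame(counts).rename_axis(index='Next', columns='Current').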

Replace one column in a line in a text file using Python

I have a text file and it has the following contents
#
# Keywords:
#
LiFePO4
end
name Li5FeO4
cell
18.557309 18.316802 9.125725 90.047539 90.100646 90.060551 0 0 0 0 0 0
fractional 1
Li core 0.06001 0.059408 0.849507 1 1 0 0 0 0
Li1 core 0.025416 0.339078 0.128746 1 1 0 0 0 0
Li2 core 0.02517 0.838929 0.130747 1 1 0 0 0 0
Li3 core 0.525498 0.339179 0.127632 1 1 0 0 0 0
Li4 core 0.524753 0.841333 0.129329 1 1 0 0 0 0
Li5 core 0.179907 0.158182 0.634012 1 1 0 0 0 0
Li6 core 0.180817 0.666028 0.628327 1 1 0 0 0 0
This is the input that I need to supply to a tool used in a research application. Now I need to replace one 0 on the first line that starts with Li: the one in the third column from the end. That is, each line starting with Li has four zeros towards the end, and I need to replace the second of them, so that the file has the following contents:
#
# Keywords:
#
LiFePO4
end
name Li5FeO4
cell
18.557309 18.316802 9.125725 90.047539 90.100646 90.060551 0 0 0 0 0 0
fractional 1
Li core 0.06001 0.059408 0.849507 1 1 0 1 0 0
Li1 core 0.025416 0.339078 0.128746 1 1 0 0 0 0
Li2 core 0.02517 0.838929 0.130747 1 1 0 0 0 0
Li3 core 0.525498 0.339179 0.127632 1 1 0 0 0 0
Li4 core 0.524753 0.841333 0.129329 1 1 0 0 0 0
Li5 core 0.179907 0.158182 0.634012 1 1 0 0 0 0
Li6 core 0.180817 0.666028 0.628327 1 1 0 0 0 0
This has to be done a number of times for the zeros in various positions and I have the following code. There are some more operations that I am doing in the same code.
import os
import shutil
import time
def edit_file(column, next_column):
    # Processing x.gin file
    file_name = './' + column + '.gin'
    file_handler = open(file_name, 'r')
    print("Processing " + file_name + " file")
    contents = file_handler.readlines()
    find_line = contents[14]
    find_line_array = find_line.split('\t')
    print(find_line_array)
    # change values according to the file name
    if column == 'x':
        find_line_array[8] = 1
    elif column == 'y':
        print(contents)
        print(find_line_array)
        find_line_array[9] = 1
    elif column == 'z':
        find_line_array[10] = 1
    elif column == 'xy':
        find_line_array[8] = 1
        find_line_array[9] = 1
    elif column == 'yz':
        find_line_array[9] = 1
        find_line_array[10] = 1
    elif column == 'xz':
        find_line_array[8] = 1
        find_line_array[10] = 1
    formatted = '\t'.join(map(str, find_line_array))
    contents[14] = formatted
    with open(file_name, 'w') as f:
        for item in contents:
            f.write("%s\n" % item)
    print("Formatting completed for " + file_name)
    print('Executing GULP command ----')
    gulp_command = 'gulp ' + column + '.gin > ' + column + '.gout'
    print(gulp_command)
    shutil.copy(file_name, next_column+'.gin')
    file_handler.close()
    os.system(gulp_command)
    while not os.path.exists('./Li.grs'):
        print('Waiting for output file')
        time.sleep(1)
    if os.path.isfile('./Li.grs'):
        print('renaming file')
        os.rename('./Li.grs', next_column+'.gin')
        os.rename('./Li.grs', column+'.grs')
    return True
if __name__ == '__main__':
    print('Starting Execution')
    column_list = ['x', 'y', 'xy', 'yz', 'xz']
    print(column_list)
    for index, column in enumerate(column_list):
        if column != 'xz':
            edit_file(column, column_list[index + 1])
        else:
            edit_file(column, 'xz')
    print('Execution completed')
The value is replaced correctly and the file is rewritten, but the file no longer appears to be in the correct format because it has additional newlines. Is it possible to rewrite only the single line, so that the rest of the file keeps exactly the same format?

I created a function for that. Try this:
def replace(filename, row, column, value):
    columnspan = " "
    # split the file into lines, and each line into columns
    with open(filename) as f:
        data = [line.split(columnspan) for line in f.read().split("\n")]
    data[row][column] = str(value)
    # rejoin with the same separators so no extra spaces or newlines are added
    with open(filename, "w") as f:
        f.write("\n".join(columnspan.join(fields) for fields in data))

You can use regex to find and update text:
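import re
with open('input.txt', 'r') as f1, open('output.txt', 'w') as f2:
    data = f1.read()
    # raw string, so the \s, \w and \d escapes reach the regex engine unchanged
    match = re.findall(r'Li\s+\w+\s+\d+\.\d+\s+\d\.\d+\s+\d\.\d+\s+\d\s+\d\s+\d\s+\d', data)
    for m in match:
        # each match ends at the zero to be changed; swap that last character for 1
        data = data.replace(m, f'{m[:-1]}1')
    f2.write(data)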
Output:
#
# Keywords:
#
LiFePO4
end
name Li5FeO4
cell
18.557309 18.316802 9.125725 90.047539 90.100646 90.060551 0 0 0 0 0 0
fractional 1
Li core 0.06001 0.059408 0.849507 1 1 0 1 0 0
Li1 core 0.025416 0.339078 0.128746 1 1 0 0 0 0
Li2 core 0.02517 0.838929 0.130747 1 1 0 0 0 0
Li3 core 0.525498 0.339179 0.127632 1 1 0 0 0 0
Li4 core 0.524753 0.841333 0.129329 1 1 0 0 0 0
Li5 core 0.179907 0.158182 0.634012 1 1 0 0 0 0
Li6 core 0.180817 0.666028 0.628327 1 1 0 0 0 0
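On the extra newlines in the question: readlines() keeps the "\n" at the end of every line, so writing each item back with "%s\n" doubles them. A minimal sketch that changes one field and leaves every other character (tabs, spaces, newlines) exactly as it was. The names set_field and replace_in_file are made up for this example, and the line index has to be chosen to match the real file:

```python
import re


def set_field(line, index, value):
    """Replace the index-th whitespace-separated field of line, keeping all whitespace."""
    fields = list(re.finditer(r"\S+", line))
    m = fields[index]  # negative indices count from the end of the line
    return line[:m.start()] + str(value) + line[m.end():]


def replace_in_file(path, line_no, index, value):
    """Rewrite a single field on a single line of the file at path."""
    with open(path) as f:
        lines = f.readlines()  # every line keeps its own newline
    lines[line_no] = set_field(lines[line_no], index, value)
    with open(path, "w") as f:
        f.writelines(lines)  # writelines adds no newlines of its own


line = "Li core 0.06001 0.059408 0.849507 1 1 0 0 0 0\n"
print(repr(set_field(line, -3, 1)))
# prints 'Li core 0.06001 0.059408 0.849507 1 1 0 1 0 0\n'
```

Indexing from the end (-3 here) avoids having to know how many columns precede the zeros.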

Find all the blocks

I am very new to Python and coding. I have this homework that I have to do:
You will receive on the first line the number of rows of the matrix (n), and on the next n lines you will get each row of the matrix as a string (zeros and ones separated by a single space). You have to calculate how many blocks you have (connected ones horizontally or diagonally). Here are examples:
Input:
5
1 1 0 0 0
1 1 0 0 0
0 0 0 0 0
0 0 0 1 1
0 0 0 1 1
Output:
2
Input:
6
1 1 0 1 0 1
0 1 1 1 1 1
0 1 0 0 0 0
0 1 1 0 0 0
0 1 1 1 1 0
0 0 0 1 1 0
Output:
1
Input:
4
0 1 0 1 1 0
1 0 1 1 0 1
1 0 0 0 0 0
0 0 0 1 0 0
Output:
5
the code I came up with for now is :
n = int(input())
blocks = 0
matrix = [[int(i) for i in input().split()] for j in range(n)]
#loop or something to find the blocks in the matrix
print(blocks)
Any help will be greatly appreciated.

def valid(y,x):
    # True when (y, x) lies inside the matrix
    return y>=0 and x>=0 and y<N and x<horizontal_len
def find_blocks(y,x):
    # Q is a flat queue of coordinates: y values and x values alternate
    Q.append(y)
    Q.append(x)
    # search in 4 directions (right, down, left, up)
    dy = [0,1,0,-1]
    dx = [1,0,-1,0]
    # the block is fully explored once Q is empty
    while Q:
        y = Q.pop(0)
        x = Q.pop(0)
        for dir in range(len(dy)):
            next_y = y + dy[dir]
            next_x = x + dx[dir]
            # a neighbour inside the matrix that holds 1 (not 0) is part of the same block
            if valid(next_y,next_x) and matrix[next_y][next_x] == 1:
                Q.append(next_y)
                Q.append(next_x)
                matrix[next_y][next_x] = -1
N = int(input())
matrix = []
for rows in range(N):
    row = list(map(int, input().split()))
    matrix.append(row)
# row length
horizontal_len = len(matrix[0])
blocks = 0
# scan from matrix[0][0] to matrix[N-1][horizontal_len-1]
for start_y in range(N):
    for start_x in range(horizontal_len):
        # an unvisited 1 starts a new block
        if matrix[start_y][start_x] == 1:
            # mark visited 1s as -1 so they are not counted again
            matrix[start_y][start_x] = -1
            Q=[]
            # flood-fill the rest of this block
            find_blocks(start_y, start_x)
            blocks +=1
print(blocks)
I used the BFS algorithm to solve this question. The comments may not be enough to understand the logic.
If you have questions about this solution, let me know!
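For comparison, here is a sketch of the same breadth-first search wrapped in a function, with collections.deque as the queue so that taking from the front is cheap. The name count_blocks is made up for this example. It assumes blocks are joined horizontally and vertically only, which is the reading that reproduces all three sample outputs in the question:

```python
from collections import deque


def count_blocks(matrix):
    """Count groups of 1s that touch horizontally or vertically."""
    grid = [row[:] for row in matrix]  # work on a copy, leave the input untouched
    height = len(grid)
    blocks = 0
    for start_y in range(height):
        for start_x in range(len(grid[start_y])):
            if grid[start_y][start_x] != 1:
                continue
            # An unvisited 1 starts a new block; mark cells as -1 once queued.
            blocks += 1
            grid[start_y][start_x] = -1
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < len(grid[ny]) and grid[ny][nx] == 1:
                        grid[ny][nx] = -1
                        queue.append((ny, nx))
    return blocks


example = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
print(count_blocks(example))  # prints 5
```

Adding the four diagonal offsets to the direction tuple would merge blocks that touch only at a corner; on the third example that gives 2 instead of the expected 5.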

Iterate over indices with constant sum in Python

I am working with polynomials and have to perform some operations where my looping variables have a sum less than (inferior to) some constant d.
Right now I have
for i in range(0, d):
    for j in range(i, d):
        for k in range(j, d):
which seems a bit ugly to me.
Is there some function, probably in itertools, allowing me to iterate for i, j, k in foo(d) ?

You can write your own. Here's a brute force way for 3 variables:
def constant_sum(s):
    for i in range(s+1):
        for j in range(s-i+1):
            k = s - i - j
            yield i,j,k
def inferior_sum(s):
    for i in range(s+1):
        for j in range(s+1):
            if i+j >= s:
                break
            for k in range(s+1):
                if i+j+k < s:
                    yield i,j,k
                else:
                    break
for i,j,k in constant_sum(3):
    print(i,j,k)
print()
for i,j,k in inferior_sum(3):
    print(i,j,k)
Output:
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 2 0
1 0 0
1 0 1
1 1 0
2 0 0
Here are recursive versions that can do any number of variables(n) for any sum(s)...lightly tested:
def constant_sum(n,s):
    if n == 1:
        yield [s]
    else:
        for i in range(s+1):
            for r in constant_sum(n-1,s-i):
                yield [i] + r
def inferior_sum(n,s):
    if n == 1:
        for i in range(s):
            yield [i]
    else:
        for i in range(s):
            for r in inferior_sum(n-1,s-i):
                yield [i] + r
for x in constant_sum(3,3):
    print(*x)
print()
for x in inferior_sum(3,3):
    print(*x)
Output:
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 2 0
1 0 0
1 0 1
1 1 0
2 0 0

The itertools/functional way would be something like:
from itertools import product
inferior_sum3 = filter(lambda x: sum(x)<3, product(range(4),range(4),range(4)))
for permu in inferior_sum3:
    print(permu)
Output:
(0, 0, 0)
(0, 0, 1)
(0, 0, 2)
(0, 1, 0)
(0, 1, 1)
(0, 2, 0)
(1, 0, 0)
(1, 0, 1)
(1, 1, 0)
(2, 0, 0)
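Two more itertools sketches, depending on which reading of the question is meant. The nested loops as written (j starting at i, k starting at j) produce exactly the tuples of combinations_with_replacement, while the "sum less than d" reading can be written for any number of variables with product(..., repeat=n). The function names below are made up for this example:

```python
from itertools import combinations_with_replacement, product


def nondecreasing_indices(d, n=3):
    """Same tuples as the question's nested loops: 0 <= i <= j <= k < d."""
    return combinations_with_replacement(range(d), n)


def sum_below(d, n=3):
    """All n-tuples of non-negative integers whose sum is less than d."""
    return (t for t in product(range(d), repeat=n) if sum(t) < d)


for i, j, k in nondecreasing_indices(2):
    print(i, j, k)
print()
for i, j, k in sum_below(2):
    print(i, j, k)
```

The filter over product is still brute force: it visits d**n tuples, so for large d or n the recursive generators above do less work.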

finding a value by looping, multiple files python

I am very new to Python, so please bear with me.
I have files with atom coordinates. The files look a certain way, but the coordinates are not necessarily on the same line. The files also contain some text; below is the part of a file that is important:
<Gold.Protein.RotatedAtoms>
28.5571 85.1121 3.9003 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
27.3346 84.9085 3.2531 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
28.9141 86.4057 4.2554 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
26.4701 85.9748 2.9810 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
28.0456 87.4704 3.9845 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
26.8436 87.2569 3.3417 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
26.1924 88.0932 3.1196 H 0 0 0 0 0 0 0 0 0 0 0 0
27.0510 83.9062 2.9565 H 0 0 0 0 0 0 0 0 0 0 0 0
What I want to do is the following:
Get Python to recognize whether the number on the 5th row in the 6th column (in our case 3.3417) is more or less than 6. Then, if the value is more than 6, write the FILENAME of the file to a text document. Note that the position of this chunk of information changes between files. That is to say, the number 3.3417 is not always on the same row.
Also, all the numbers change all the time.
I was thinking that I might loop through the text, scanning for a line with "Gold.Protein.RotatedAtoms", and then take the 3rd entry on the line 5 rows down. But how would one do that?
Thanks for your help!

Split all the lines of the text into a list using splitlines().
Find the index of the line with "Gold.Protein.RotatedAtoms" using the enumerate method and a filter in a list comprehension, something like this:
index = [index for index,line in enumerate(all_lines) if "Gold.Protein.RotatedAtoms" in line]
The comprehension returns a list of matching indices, so take its first element, add 5 to it to get the line you need from all_lines, use the split() method to split that line into tokens, and finally take out the 3rd element with the index operator (3rd element = line.split()[2]).
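The steps above can be put together roughly as follows. This is only a sketch: the names value_after_marker and files_above are made up, and the offset of 5 lines and the threshold of 6 are taken from the question. In the excerpt shown there, 3.3417 actually sits 6 lines below the marker, so the offset may need adjusting:

```python
def value_after_marker(text, marker="Gold.Protein.RotatedAtoms", offset=5, column=2):
    """Return the number in `column` on the line `offset` lines below the marker, or None."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if marker in line:
            target = index + offset
            if target < len(lines):
                tokens = lines[target].split()
                if column < len(tokens):
                    return float(tokens[column])
            return None
    return None  # marker not found


def files_above(paths, threshold=6.0, offset=5):
    """Return the paths whose value below the marker is greater than threshold."""
    hits = []
    for path in paths:
        with open(path) as f:
            value = value_after_marker(f.read(), offset=offset)
        if value is not None and value > threshold:
            hits.append(path)
    return hits


sample = """<Gold.Protein.RotatedAtoms>
28.5571 85.1121 3.9003 C.ar 0 0 0
27.3346 84.9085 3.2531 C.ar 0 0 0
28.9141 86.4057 4.2554 C.ar 0 0 0
26.4701 85.9748 2.9810 C.ar 0 0 0
28.0456 87.4704 3.9845 C.ar 0 0 0
26.8436 87.2569 3.3417 C.ar 0 0 0
"""
print(value_after_marker(sample, offset=6))  # prints 3.3417
```

The list returned by files_above can then be written out with "\n".join(hits) to whatever text file should collect the names; the list of paths could come from glob.glob or os.listdir.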

As Lanaru stated... you could read from the file and split output from the file into an array.
Like so:
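#!/usr/bin/env python
def s_coord():
    # 'Gold.Protein.RotatedAtoms' is used here as the name of the input file
    with open('Gold.Protein.RotatedAtoms') as fo:
        # count rows starting from 1
        for count, line in enumerate(fo, start=1):
            array = line.split()
            # skip lines that have fewer than three columns
            if len(array) > 2 and array[2] == "3.3417":
                print("Element 3.3417 is in the {0} row.".format(count))
def main():
    s_coord()
    return 0
if __name__ == '__main__':
    main()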

It seems to me that the value 3.3417 is in the third column, so I may not understand your question.
I think regular expressions are the cleanest way to do this. I used http://kodos.sourceforge.net/ to create the following regular expression and code.
import re
# common variables
rawstr = r"""^\s*([0-9.]+)\s*([0-9.]+)\s*([0-9.]+)\s*([a-zA-Z.]+)"""
matchstr = """<Gold.Protein.RotatedAtoms>
28.5571 85.1121 3.9003 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
27.3346 84.9085 3.2531 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
28.9141 86.4057 4.2554 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
26.4701 85.9748 2.9810 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
28.0456 87.4704 3.9845 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
26.8436 87.2569 3.3417 C.ar 0 0 0 0 0 0 0 0 0 0 0 0
26.1924 88.0932 3.1196 H 0 0 0 0 0 0 0 0 0 0 0 0
27.0510 83.9062 2.9565 H 0 0 0 0 0 0 0 0 0 0 0 0"""
# build a compile object
compile_obj = re.compile(rawstr, re.MULTILINE)
# each result is a tuple: (x, y, z, atom type)
for values in compile_obj.findall(matchstr):
    if values[2] == '3.3417':
        print('found it')
You can modify the conditional in the loop to look for your desired cases and change the print to write a file.
