python3, how to handle a matrix in a TSV file

My source data is in a TSV file containing a 20x20 matrix (http://www.bio.davidson.edu/genomics/2008/Simpson/BLOSUM62.png).
Here's what I'm trying to accomplish:
I need to read the matrix from this source file and then write a function (called distance) that returns the value at the intersection of two indices. For example, given [A][D], the function should return the value where row A and column D intersect. This is the code I tried:
Blosum62 = open('BLOSUM62.tsv','r')
print(Blosum62)
def distance(x,y):
    a = Blosum62.index(x)
    b = Blosum62.index(y)
    dist = Blosum62Matrix[a][b]
    return dist
But it doesn't work. I think there is a problem both in the way I open/print my file and in the function. Thanks for helping!
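One possible way to approach this, sketched under assumptions: open() returns a file object, which has no index() method and is not a matrix, so the rows have to be parsed first. The sketch below assumes the TSV has a header row of column letters (with an empty first cell) and a row letter in the first column of every following line; the real file may be laid out differently. The names read_matrix, sample, and blosum are made up for illustration, and the three-letter sample only stands in for the full 20x20 file.

```python
import csv
import io


def read_matrix(handle):
    """Parse a tab-separated matrix whose first row and first column hold labels."""
    reader = csv.reader(handle, delimiter='\t')
    header = next(reader)
    columns = [name.strip() for name in header[1:]]  # skip the empty corner cell
    matrix = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row_label = row[0].strip()
        matrix[row_label] = {col: int(value) for col, value in zip(columns, row[1:])}
    return matrix


def distance(matrix, x, y):
    """Return the value where row x and column y intersect."""
    return matrix[x][y]


# Small illustrative sample in the assumed layout (not the full file).
sample = "\tA\tR\tD\nA\t4\t-1\t-2\nR\t-1\t5\t-2\nD\t-2\t-2\t6\n"
blosum = read_matrix(io.StringIO(sample))
print(distance(blosum, 'A', 'D'))
```

With the real file, the same function could be called as read_matrix(open('BLOSUM62.tsv', newline='')), ideally inside a with block so the file is closed afterwards. A dictionary of dictionaries is used here so that letters can be looked up directly, without converting them to numeric positions first.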

Related

create a loop in python to loop over files, variables, and matrix

Can anyone help me create a loop in Python? I need to loop over files, variables, and matrices at the same time. To explain better, these are the steps of my code:
Read the files using the h5py library:
file_00 = h5py.File('file0','r')
file_01 = h5py.File('file1','r')
file_1000 = h5py.File('file1000','r')
Extract variables from file:
alpha_00 = numpy.array(file_00['alpha'])[:,:,:,:]
alpha_01 = numpy.array(file_01['alpha'])[:,:,:,:]
alpha_1000= numpy.array(file_1000['alpha'])[:,:,:,:]
Construct new matrix:
new_alpha_1 = np.zeros([100,200,100])
for index in range (100):
    m =
    n =
    p =
    new_alpha_1[m:m+10,n:n+10,p:p+10]=alpha_1[index,:,:,:]
My goal is to create a loop from 0 to 1000 that reads all files, extracts the alpha variable from each file, and constructs the new_alpha matrix for each file.
What I tried first is looping over the files:
for counter in range (0,1000):
    File=h5py.File('file_{0:0=4d}'.format(counter),'r')
These lines work and are able to read all files.
How can I create a loop that extracts the alpha variable and constructs the new_alpha matrix for all files?
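A rough sketch of one way to structure such a loop, under assumptions: instead of numbered variables (alpha_00, alpha_01, ...), each file's result is stored in a dictionary keyed by the counter. The helper names file_name, read_alpha, and process_all are hypothetical, and build stands for whatever function constructs new_alpha from one alpha array; the m, n, p offsets are not given in the question, so they are left to that function.

```python
def file_name(counter):
    """Name pattern taken from the question's loop: file_0000, file_0001, ..."""
    return 'file_{0:04d}'.format(counter)


def read_alpha(path):
    """Read the 'alpha' dataset of one HDF5 file into a NumPy array."""
    import h5py  # imported here so the rest of the sketch loads without h5py
    with h5py.File(path, 'r') as handle:
        return handle['alpha'][...]


def process_all(n_files, build, read=read_alpha):
    """Apply build() to the alpha array of every file; results keyed by counter."""
    results = {}
    for counter in range(n_files):
        alpha = read(file_name(counter))
        results[counter] = build(alpha)
    return results
```

Note that range(0, 1000) stops at 999, so process_all(1001, build) would be needed to include a file numbered 1000. Keeping the results in one dictionary means results[17] replaces a variable such as new_alpha_17.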

How can I find the index position corresponding to a 2D array?

I have an .stl file and a .txt file, and I need to read xyz values (point coordinates) from both. The coordinates in the .stl file include all the coordinates in the .txt file. How can I find the position (that is, the index) in the .stl file of each coordinate from the .txt file?
This is my .py code:
import numpy as np
import pyvista as pv

def find_sequence():
    mesh = pv.read("./data/bai--LL-去掉下表面.stl")
    vertex = np.around(np.array(mesh.points), decimals=4)
    data = np.around(read_data("./data/锚定点1.txt", 0, 3), decimals=4)
    index = []
    for i in range(len(data)):
        for j in range(len(vertex)):
            if np.array_equal(data[i, :], vertex[j, :]):
                index.append(j + 1)
    return index
I think there is no problem with my logic, but it doesn't work properly, please help me!
Thank you so much!

Just replace np.array_equal() with np.allclose(), because the values differ slightly in their decimal places.
The code is shown below:
for i in range(len(data)):
    for j in range(len(vertex)):
        if np.allclose(data[i, :], vertex[j, :]):
            index.append(j)
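As a possible alternative to the nested loops, here is a sketch that swaps in a different technique: a dictionary keyed by rounded coordinates, so each point is looked up once instead of being compared against every vertex. The function name find_indices and the -1 marker for "not found" are made up for illustration. It assumes that rounding to 4 decimals is enough to make matching points identical; two values that sit on opposite sides of a rounding boundary would not match, which np.allclose() would tolerate.

```python
def find_indices(data, vertices, decimals=4):
    """Return, for each point in data, the 0-based index of the matching vertex (or -1)."""
    lookup = {}
    for j, vertex in enumerate(vertices):
        key = tuple(round(float(c), decimals) for c in vertex)
        lookup.setdefault(key, j)  # keep the first index if a vertex repeats
    result = []
    for point in data:
        key = tuple(round(float(c), decimals) for c in point)
        result.append(lookup.get(key, -1))
    return result


vertices = [(0.0, 0.0, 0.0), (1.00001, 2.0, 3.0), (4.0, 5.0, 6.0)]
data = [(4.0, 5.0, 6.0), (1.0, 2.0, 3.00001), (9.0, 9.0, 9.0)]
print(find_indices(data, vertices))
```

The indices are 0-based, like index.append(j) in the answer; add 1 to each if the 1-based numbering from the question is needed.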

Converting pixels into wavelength using 2 FITS files

I am new to python and FITS image files, as such I am running into issues. I have two FITS files; the first FITS file is pixels/counts and the second FITS file (calibration file) is pixels/wavelength. I need to convert pixels/counts into wavelength/counts. Once this is done, I need to output wavelength/counts as a new FITS file for further analysis. So far I have managed to array the required data as shown in the code below.
import numpy as np
from astropy.io import fits
# read the images
image_file = ("run_1.fits")
image_calibration = ("cali_1.fits")
hdr = fits.getheader(image_file)
hdr_c = fits.getheader(image_calibration)
# print headers
sp = fits.open(image_file)
print('\n\nHeader of the spectrum :\n\n', sp[0].header, '\n\n')
sp_c = fits.open(image_calibration)
print('\n\nHeader of the spectrum :\n\n', sp_c[0].header, '\n\n')
# generation of arrays with the wavelengths and counts
count = np.array(sp[0].data)
wave = np.array(sp_c[0].data)
I do not understand how to save two separate arrays into one FITS file. I tried an alternative approach by creating a list, as shown in this code:
file_list = fits.open(image_file)
calibration_list = fits.open(image_calibration)
image_data = file_list[0].data
calibration_data = calibration_list[0].data
# make a list to hold images
img_list = []
img_list.append(image_data)
img_list.append(calibration_data)
# list to numpy array
img_array = np.array(img_list)
# save the array as fits - image cube
fits.writeto('mycube.fits', img_array)
However I could only save as a cube, which is not correct because I just need wavelength and counts data. Also, I lost all the headers in the newly created FITS file. To say I am lost is an understatement! Could someone point me in the right direction please? Thank you.
I am still working on this problem. I have now managed (I think) to produce a FITS file containing the wavelength and counts using this website:
https://www.mubdirahman.com/assets/lecture-3---numerical-manipulation-ii.pdf
This is my code:
# Making a Primary HDU (required):
primaryhdu = fits.PrimaryHDU(flux)  # Makes a header
# or if you have a header that you've created:
# primaryhdu = fits.PrimaryHDU(arr1, header=head1)
# If you have additional extensions:
secondhdu = fits.ImageHDU(wave)
# Making a new HDU List:
hdulist1 = fits.HDUList([primaryhdu, secondhdu])
# Writing the file:
hdulist1.writeto("filename.fits", overwrite=True)
image = ("filename.fits")
hdr = fits.open(image)
image_data = hdr[0].data
wave_data = hdr[1].data
I am sure this is not the correct format for wavelength/counts. I need both wavelength and counts to be contained in hdr[0].data

If you are working with spectral data, it might be useful to look into specutils which is designed for common tasks associated with reading/writing/manipulating spectra.
It's common to store spectral data in FITS files using tables, rather than images. For example you can create a table containing wavelength, flux, and counts columns, and include the associated units in the column metadata.
The docs include an example on how to create a generic "FITS table" writer with wavelength and flux columns. You could start from this example and modify it to suit your exact needs (which can vary quite a bit from case to case, which is probably why a "generic" FITS writer is not built-in).
You might also be able to use the fits-wcs1d format.
If you prefer not to use specutils, that example still might be useful as it demonstrates how to create an Astropy Table from your data and output it to a well-formatted FITS file.
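For the table approach without specutils, a minimal sketch might look like the following. It assumes that the wavelength and counts arrays are one-dimensional and of equal length, which the question does not confirm; the function name write_spectrum_table and the column names are placeholders, not anything prescribed by Astropy.

```python
def write_spectrum_table(wave, counts, path):
    """Write wavelength and counts as two columns of one FITS table."""
    # Imported inside the function so the sketch can be loaded without astropy.
    from astropy.table import Table

    # One row per pixel: column 0 is the wavelength, column 1 the counts.
    table = Table([wave, counts], names=('wavelength', 'counts'))
    table.write(path, format='fits', overwrite=True)
    return table
```

With the arrays from the question this could be called as write_spectrum_table(wave, count, 'spectrum_table.fits'), and the result read back with Table.read('spectrum_table.fits'). The table ends up in the first extension of the file, not in the primary HDU.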

Convert Matlab to Python

I'm converting MATLAB code to Python, and I have a question about the following line of code:
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
the whole code is this:
BD_teste = [];
por_treino = 0;
for l = 1:k
    quant_elementos_t = int64((length(grupos.(['g',int2str(l)]).('elementos')) * por_treino)/100);
    for element_c = 1 : quant_elementos_t
        ind_element = randi([1 length(grupos.(['g',int2str(l)]).('elementos'))]);
        BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
        grupos.(['g',int2str(l)]).('elementos')(ind_element,:) = [];
    end
end
The expression below refers to a structure. In my Python conversion I used a list containing dictionaries, each with its list 'elementos':
grupos.(['g',int2str(l)]).('elementos')
So my question is about the line I quoted above: what is happening in it, how does it work, and how would I write it in Python?
Thank you very much in advance.

BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
This is one very weird line. Let's break it down into pieces:
int2str(l) returns the number l as a char array (from '1' up to the char array for k).
['g',int2str(l)] returns the char array g1, then g2, and so on as l increases.
grupos.(['g',int2str(l)]) returns the value of the field named g1, g2, and so on, which belongs to the struct grupos.
grupos.(['g',int2str(l)]).('elementos') assumes that grupos.(['g',int2str(l)]) is itself a struct, and returns the value of its field named 'elementos'.
grupos.(['g',int2str(l)]).('elementos')(ind_element,:) assumes that grupos.(['g',int2str(l)]).('elementos') is a matrix, and returns a row vector containing the ind_element-th row of that matrix.
grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l appends the number l to the row vector obtained before.
[BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l] appends the row vector [grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l] to the bottom of the matrix BD_teste, creating a new matrix.
Finally:
BD_teste = [BD_teste; grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l]; assigns the obtained matrix to the variable BD_teste, overwriting its previous value. Effectively, this just appends the new row, but because the whole matrix is rebuilt and reassigned each time, it is not very efficient.
It would be better to append with:
BD_teste(end+1,:) = [grupos.(['g',int2str(l)]).('elementos')(ind_element,:),l];
Now, how you will rewrite this in Python is a whole different story, and will depend on how you want to define the variable grupos mostly.
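As one illustration only, suppose grupos is a Python dictionary whose keys are 'g1', 'g2', ... and whose values are dictionaries holding a list of rows under 'elementos'. That layout is an assumption chosen to mirror the MATLAB struct, not necessarily the one used in the question, and the function name draw_test_rows is made up. Under that assumption the whole loop could be sketched as:

```python
import random


def draw_test_rows(grupos, k, por_treino):
    """Move a random share of each group's rows into one labelled test list."""
    bd_teste = []
    for l in range(1, k + 1):
        elementos = grupos['g' + str(l)]['elementos']
        # MATLAB's int64() rounds to the nearest integer; round() is the closest
        # Python match, although exact .5 ties are resolved differently.
        quant = round(len(elementos) * por_treino / 100)
        for _ in range(quant):
            ind = random.randrange(len(elementos))  # random row index
            row = elementos.pop(ind)                # remove the row from the group
            bd_teste.append(list(row) + [l])        # row plus its group number
    return bd_teste


# Made-up example data: two groups with four and two rows.
grupos = {
    'g1': {'elementos': [[1, 2], [3, 4], [5, 6], [7, 8]]},
    'g2': {'elementos': [[9, 10], [11, 12]]},
}
bd_teste = draw_test_rows(grupos, k=2, por_treino=50)
print(bd_teste)
```

One difference to keep in mind: len() counts rows, while MATLAB's length() returns the largest dimension of the matrix, so the two agree only when there are at least as many rows as columns.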

How convert multidimensional array to two dimensional array

My code fetches values from text files and builds matrices as a multidimensional array. The problem is that it creates an array with more than two dimensions, which I can't manipulate. I need a two-dimensional array. How do I do that?
Explanation of my code's algorithm:
Purpose of the code:
My code fetches values from a specific folder. Each subfolder contains 7 'txt' files generated by one user, so multiple folders contain the data of multiple users.
Step 1: Start the first for loop, controlled by how many folders the specific folder contains, and store the path of the first folder in the variable 'path'.
Step 2: Open the path and fetch the data of the 7 txt files using a second for loop. After fetching, the second for loop ends and the rest of the code executes.
Step 3: Concatenate the data of the 7 txt files into one 1D array.
Step 4 (here the problem arises): Store the 1D array of each folder in a 2D array. End the first for loop.
Code:
import numpy as np
from array import *
import os
f_path='Result'
array_control_var=0
# fetch the directory paths
for (path,dirs,file) in os.walk(f_path):
    if(path==f_path):
        continue
    f_path_1= path +'\page_1.txt'
    # get data from page 1 individually because it contains string data
    pgno_1 = np.array(np.loadtxt(f_path_1, dtype='U', delimiter=','))
    # only for page_2.txt
    f_path_2= path +'\page_2.txt'
    with open(f_path_2) as f:
        str_arr = ','.join([l.strip() for l in f])
    pgno_2 = np.asarray(str_arr.split(','), dtype=int)
    # loop to fetch data from the remaining text files; data type = int
    for j in range(3,8):
        # store the file path in a variable
        txt_file_path=path+'\page_'+str(j)+'.txt'
        if os.path.exists(txt_file_path)==True:
            # generate a variable name that auto-increments with the for loop
            foo='pgno_'+str(j)
        else:
            break
        # pass the variable name as a string and store the value
        exec(foo + " = np.array(np.loadtxt(txt_file_path, dtype='i', delimiter=','))")
    #z=np.array([pgno_2,pgno_3,pgno_4,pgno_5,pgno_6,pgno_7])
    # merge all arrays from page 2 onward into a single one-dimensional array
    f_array=np.concatenate((pgno_2,pgno_3,pgno_4,pgno_5,pgno_6,pgno_7), axis=0)
    # on the first pass of the loop, assign this value
    if array_control_var==0:
        main_f_array=f_array
    else:
        # here the problem arises
        main_f_array=np.array([main_f_array,f_array])
    array_control_var+=1
print(main_f_array)
Currently my code generates an array like this (for 3 folders):
[
array([[0,0,0],[0,0,0]]),
array([0,0,0])
]
Note: I don't know how many dimensions it has.
But I want:
[
array(
[0,0,0]
[0,0,0]
[0,0,0])
]

I tried to write recursive code that flattens the list of lists into one list. It gives the desired output for your case, but I did not try it on many other inputs (and it is buggy for certain cases such as list = [0,[[0,0,0],[0,0,0]],[0,0,0]]).
flat = []
def main():
    list =[[[0,0,0],[0,0,0]],[0,0,0]]
    recFlat(list)
    print(flat)
def recFlat(Lists):
    if len(Lists) == 0:
        return Lists
    head, tail = Lists[0], Lists[1:]
    if isinstance(head, (list,)):
        recFlat(head)
        return recFlat(tail)
    else:
        return flat.append(Lists)
if __name__ == '__main__':
    main()
My idea behind the code was to traverse the head of each list, and check whether it is an instance of a list or an element. If the head is an element, this means I have a flat list and I can return the list. Else, I should recursively traverse more.
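The same idea can be sketched without the global list. The version below is a variation, not the code above: it works on plain Python lists only, returns the rows instead of appending to a module-level variable, and raises an error when a bare number sits next to nested lists, since it is unclear which row such a value should belong to. The name collect_rows is made up for illustration.

```python
def collect_rows(nested):
    """Return every innermost flat list in `nested`, in order, as a list of rows."""
    if not any(isinstance(item, list) for item in nested):
        return [list(nested)]  # already a flat row
    rows = []
    for item in nested:
        if not isinstance(item, list):
            raise ValueError("scalar mixed with lists: %r" % (item,))
        rows.extend(collect_rows(item))  # still nested: go one level deeper
    return rows


print(collect_rows([[[1, 2, 3], [4, 5, 6]], [7, 8, 9]]))
```

In the question's loop the nesting could also be avoided at the source: appending each f_array to a Python list and calling np.vstack() on that list once, after the loop, gives a two-dimensional array as long as every f_array has the same length.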
