Create a loop in Python to loop over files, variables, and matrices

Can anyone help me with creating a loop in Python? I need to create a loop that iterates over files, variables, and matrices at the same time. To explain better, these are the steps of my code:
Read the files using the h5py library:
file_00 = h5py.File('file0', 'r')
file_01 = h5py.File('file1', 'r')
...
file_1000 = h5py.File('file1000', 'r')
Extract variables from each file:
alpha_00 = numpy.array(file_00['alpha'])[:,:,:,:]
alpha_01 = numpy.array(file_01['alpha'])[:,:,:,:]
...
alpha_1000 = numpy.array(file_1000['alpha'])[:,:,:,:]
Construct a new matrix:
new_alpha_1 = np.zeros([100, 200, 100])
for index in range(100):
    m =
    n =
    p =
    new_alpha_1[m:m+10, n:n+10, p:p+10] = alpha_1[index, :, :, :]
My goal is to create a loop from 0 to 1000 that reads all the files, extracts the alpha variable from each, and constructs the new_alpha matrix for each file.
What I tried first is looping over the files:
for counter in range(0, 1000):
    File = h5py.File('file_{0:0=4d}'.format(counter), 'r')
These lines work and read all the files.
How can I create a loop that extracts the alpha variable from every file and constructs the new_alpha matrix for every file?
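A minimal sketch of one way to do this, collecting results in a list instead of a thousand numbered variables; the m, n, p arithmetic is kept as a placeholder because it is not specified above:
import h5py
import numpy as np

new_alphas = []  # one new_alpha per file
for counter in range(1001):  # files 0 .. 1000 inclusive
    with h5py.File('file_{0:0=4d}'.format(counter), 'r') as f:
        alpha = np.array(f['alpha'])  # assumed shape: (100, 10, 10, 10)
    new_alpha = np.zeros([100, 200, 100])
    for index in range(100):
        m = n = p = 0  # placeholder: derive the block offsets from index
        new_alpha[m:m+10, n:n+10, p:p+10] = alpha[index, :, :, :]
    new_alphas.append(new_alpha)
A dict keyed by counter works just as well if you prefer new_alphas[counter]-style access.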

Related

Python: Use the "i" counter in while loop as digit for expressions

This seems like it should be very simple, but I am not sure of the proper syntax in Python. To streamline my code I want a while loop (or a for loop if that is better) that cycles through 9 datasets, using the counter to pick out the correct file.
I would like to use the "i" variable within the while loop so that, for each sequentially named file, I can get the average of 2 arrays, the max minus min of that average, and the max minus min of another array.
Below is example code of what I am trying to do, but the avg(i) and temp(i) calls in the loop are not proper syntax. Thank you very much for any help; I will continue to look for solutions but am unsure how best to phrase this when searching.
temp1 = pd.read_excel("/content/113VW.xlsx")
temp2 = pd.read_excel("/content/113W6.xlsx")
# ... and so on up to temp9
i = 1
while i <= 9:
    avg(i) = np.mean(np.array([temp(i)['CC_H='], temp(i)['CC_V=']]), axis=0)
    Delta(i) = np.max(avg(i)) - np.min(avg(i))
    deltaT(i) = np.max(temp(i)['temperature=']) - np.min(temp(i)['temperature='])
    i += 1
E.g., the slow method would be repeating this code for each file:
avg1 = np.mean(np.array([temp1['CC_H='], temp1['CC_V=']]), axis=0)
Delta1 = np.max(avg1) - np.min(avg1)
deltaT1 = np.max(temp1['temperature=']) - np.min(temp1['temperature='])
avg2 = np.mean(np.array([temp2['CC_H='], temp2['CC_V=']]), axis=0)
Delta2 = np.max(avg2) - np.min(avg2)
deltaT2 = np.max(temp2['temperature=']) - np.min(temp2['temperature='])
......
Think of things in terms of lists.
temps = []
for name in ('113VW', '113W6', ...):
    temps.append(pd.read_excel(f"/content/{name}.xlsx"))

avg = []
Delta = []
deltaT = []
for data in temps:
    avg.append(np.mean(np.array([data['CC_H='], data['CC_V=']]), axis=0))
    Delta.append(np.max(avg[-1]) - np.min(avg[-1]))
    deltaT.append(np.max(data['temperature=']) - np.min(data['temperature=']))
You could just do your computations inside the first loop, if you don't need the dataframes after that point.
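For reference, a minimal sketch of that single-loop variant, using the column names from the question (extend the name tuple with the remaining datasets):
import numpy as np
import pandas as pd

avg, Delta, deltaT = [], [], []
for name in ('113VW', '113W6'):
    data = pd.read_excel(f"/content/{name}.xlsx")
    a = np.mean(np.array([data['CC_H='], data['CC_V=']]), axis=0)
    avg.append(a)
    Delta.append(np.max(a) - np.min(a))
    deltaT.append(np.max(data['temperature=']) - np.min(data['temperature=']))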
The way I would tackle this problem is to create a list of filenames and then iterate through them, doing the necessary calculations, as per the following:
import numpy as np
import pandas as pd

# Place the files to read into this list
files_to_read = ["/content/113VW.xlsx", "/content/113W6.xlsx"]
results = []
for i, filename in enumerate(files_to_read):
    temp = pd.read_excel(filename)
    avg_val = np.mean(np.array([temp['CC_H='], temp['CC_V=']]), axis=0)
    Delta = np.max(avg_val) - np.min(avg_val)
    deltaT = np.max(temp['temperature=']) - np.min(temp['temperature='])
    results.append({"avg": avg_val, "Delta": Delta, "deltaT": deltaT})

# Create a dataframe to show the results
df = pd.DataFrame(results)
print(df)
I have included the enumerate feature to grab the index (or i) should you want to access it for anything, or include it in the results. For example, you could change the results.append line to something like this:
results.append({"index":i, "Filename":filename, "avg":avg_val, "Delta":Delta, "deltaT":deltaT})
Not sure if I understood the question correctly, but if you want to read the files inside a loop using an index (the i variable), you can create a list to hold the contents of the Excel files instead of using 9 different variables.
Something like:
files = []
files.append(pd.read_excel("/content/113VW.xlsx"))
files.append(pd.read_excel("/content/113W6.xlsx"))
...
Then use the index variable to iterate over the list (list indices start at 0, so run from 0 to 8):
i = 0
while i < 9:
    avg = np.mean(np.array([files[i]['CC_H='], files[i]['CC_V=']]), axis=0)
    ...
    i += 1
P.S.: I am not a Pandas/NumPy expert, so you may have to adapt the code to your needs.

Reading in image files from a folder indexed with numeric name tags using python

I am trying to read in a series of images from a folder using Python. The images are small pieces of a larger image that has been split into a grid, and they are indexed as "imagename_row_column.jpg" (e.g. img_0_1.jpg). My current code (pasted below) has trouble with the column index and reads numbers 10 and above in the incorrect order. For example, instead of reading in the order (img_0_0, img_0_1, img_0_2, ... img_0_9, img_0_10, ...) I am getting (img_0_0, img_0_10, img_0_11, img_0_1, img_0_2, ...). Any advice would be much appreciated. Thanks!
# Get images from folder
path1 = r'C:\Users\user_\Desktop\Test\IMG_Scan'
images = []
mylist = os.listdir(path1)
for img in mylist:
    curimg = cv2.imread(f'{path1}/{img}')
    images.append(curimg)
"img_0_10" < "img_0_2" by string comparison.
Try custom sorting your files before iterating:
mylist = sorted(os.listdir(path1), key=lambda x: list(map(int, x.split("_")[1:])))
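If pulling in a third-party dependency is an option, the natsort package provides natural-order sorting without a hand-written key; a sketch, assuming the same filename scheme:
from natsort import natsorted
mylist = natsorted(os.listdir(path1))  # img_0_2.jpg sorts before img_0_10.jpg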

Converting pixels into wavelength using 2 FITS files

I am new to Python and FITS image files, so I am running into issues. I have two FITS files: the first contains pixels/counts and the second (the calibration file) contains pixels/wavelength. I need to convert pixels/counts into wavelength/counts. Once this is done, I need to output wavelength/counts as a new FITS file for further analysis. So far I have managed to put the required data into arrays, as shown in the code below.
import numpy as np
from astropy.io import fits

# read the images
image_file = "run_1.fits"
image_calibration = "cali_1.fits"
hdr = fits.getheader(image_file)
hdr_c = fits.getheader(image_calibration)

# print the headers
sp = fits.open(image_file)
print('\n\nHeader of the spectrum :\n\n', sp[0].header, '\n\n')
sp_c = fits.open(image_calibration)
print('\n\nHeader of the spectrum :\n\n', sp_c[0].header, '\n\n')

# generate arrays with the wavelengths and counts
count = np.array(sp[0].data)
wave = np.array(sp_c[0].data)
I do not understand how to save two separate arrays into one FITS file. I tried an alternative approach, creating a list, as shown in this code:
file_list = fits.open(image_file)
calibration_list = fits.open(image_calibration)
image_data = file_list[0].data
calibration_data = calibration_list[0].data
# make a list to hold images
img_list = []
img_list.append(image_data)
img_list.append(calibration_data)
# list to numpy array
img_array = np.array(img_list)
# save the array as fits - image cube
fits.writeto('mycube.fits', img_array)
However, I could only save it as a cube, which is not correct because I just need the wavelength and counts data. Also, I lost all the headers in the newly created FITS file. To say I am lost is an understatement! Could someone point me in the right direction please? Thank you.
I am still working on this problem. I have now managed (I think) to produce a FITS file containing the wavelength and counts using this website:
https://www.mubdirahman.com/assets/lecture-3---numerical-manipulation-ii.pdf
This is my code:
# Making a Primary HDU (required); flux here is the counts array read earlier
primaryhdu = fits.PrimaryHDU(flux)
# or, if you have a header that you've created:
# primaryhdu = fits.PrimaryHDU(arr1, header=head1)
# If you have additional extensions:
secondhdu = fits.ImageHDU(wave)
# Making a new HDU List:
hdulist1 = fits.HDUList([primaryhdu, secondhdu])
# Writing the file:
hdulist1.writeto("filename.fits", overwrite=True)
image = "filename.fits"
hdr = fits.open(image)
image_data = hdr[0].data
wave_data = hdr[1].data
I am sure this is not the correct format for wavelength/counts. I need both wavelength and counts to be contained in hdr[0].data.
If you are working with spectral data, it might be useful to look into specutils, which is designed for common tasks associated with reading/writing/manipulating spectra.
It's common to store spectral data in FITS files using tables rather than images. For example, you can create a table containing wavelength, flux, and counts columns, and include the associated units in the column metadata.
The docs include an example of how to create a generic "FITS table" writer with wavelength and flux columns. You could start from this example and modify it to suit your exact needs (which can vary quite a bit from case to case, which is probably why a generic FITS writer is not built in).
You might also be able to use the fits-wcs1d format.
If you prefer not to use specutils, that example might still be useful, as it demonstrates how to create an Astropy Table from your data and write it out to a well-formatted FITS file.
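For the simple two-column case here, a minimal sketch of the plain-Astropy route, assuming wave and count are the 1-D arrays read earlier (the wavelength unit is an assumption; use whatever your calibration data specifies):
import astropy.units as u
from astropy.table import Table

t = Table([wave, count], names=('wavelength', 'counts'))
t['wavelength'].unit = u.AA  # assumed unit (Angstrom)
t.write('spectrum_table.fits', format='fits', overwrite=True)
Reading the file back with fits.open then gives a binary table in extension 1 containing both columns, with the units preserved in the table header.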

Making a presence/absence matrix (y axis is file names, x axis is reads in file)

I have multiple files (filenames), each with multiple sequence reads in it (each read has a readname that starts with >):
Filename1
>Readname1
>Readname2
Filename2
>Readname1
>Readname3
Given a dictionary that contains all possible readnames, like this:
g={}
g['Readname1']=[]
g['Readname2']=[]
g['Readname3']=[]
How could I write code that iterates over each file and generates the following matrix:
            Filename1  Filename2
Readname1   1          1
Readname2   1          0
Readname3   0          1
The code should scan the contents of each file in the directory. Ideally I could read the dictionary from an input file, rather than hard-coding it, so I can generate matrices for different dictionaries. The content of each read (e.g. its gene sequence) is not relevant, just whether the readname is present or absent in that file.
I am just learning Python, so a colleague shared their code to get me started. Here they were creating a presence/absence matrix of their dictionary (readnames) against the files listed in files.txt. I would like to input the dictionary from a second file (so that it is not static in the code) and to iterate over multiple files.
from Bio import SeqIO
import os

dir_path = ""  # directory path
files = os.listdir(path=dir_path)
with open(dir_path + 'files.txt') as f:
    files = f.readlines()
files = [x.strip() for x in files]
g = {}
g['Readname1'] = []
g['Readname2'] = []
g['Readname3'] = []

for i in files:
    a = list(SeqIO.parse(dir_path + i, 'fasta'))
    for j in a:
        g[j.id].append(i)

print('generating counts...')
counts = {}
for i in g.keys():
    counts[i] = []
for i in files:
    for j in g:
        if i in g[j]:
            counts[j].append(1)
        else:
            counts[j].append(0)

print('writing out...')
outfile = open(dir_path + 'core_withLabels.csv', 'w')
outfile2 = open(dir_path + 'core_noLabels.csv', 'w')
temp_string = ''
for i in files:
    outfile.write(',' + i)
    temp_string = temp_string + i + ','
temp_string = temp_string[:-1]
outfile2.write(temp_string + '\n')
outfile.write('\n')
for i in counts:
    outfile.write(i)
    temp_string = ''
    for j in counts[i]:
        outfile.write(',' + str(j))
        temp_string = temp_string + str(j) + ','
    temp_string = temp_string[:-1]
    outfile2.write(temp_string + '\n')
    outfile.write('\n')
outfile.close()
outfile2.close()
By matrices, do you mean a numpy matrix or List[List[int]]?
If you know the total number of readnames, a numpy matrix is an easy way to go. For a numpy matrix, create a zero matrix of the corresponding size:
matrix = np.zeros((n_filenames, n_readnames), dtype=int)
Alternatively, define
matrix = [[] for _ in range(n_filenames)]
Also, define the map that maps readname to idx in the matrix
mapping = dict()
next_available_idx = 0
Then, iterate over all files, and fill out the corresponding entries with ones.
for i, filename in enumerate(filenames):
    with open(filename) as f:
        for readname in f:
            readname = readname.strip()  # get rid of the trailing newline and extra spaces
            # find the corresponding column
            if readname in mapping:
                col_idx = mapping[readname]
            else:
                col_idx = next_available_idx
                next_available_idx += 1
                mapping[readname] = col_idx
            matrix[i, col_idx] = 1  # for a numpy matrix
            # if you use a list of lists instead, then:
            # matrix[i] += [0] * (col_idx - len(matrix[i])) + [1]
Finally, if you use a list of lists, make sure that the lengths of all the lists are the same: you need to iterate over the rows of the matrix one more time and pad the short rows with zeros.
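A minimal sketch of another route, using plain Python plus pandas; the readnames.txt input file and the example filenames are hypothetical stand-ins for the real inputs:
import pandas as pd

# read the dictionary of all possible readnames from a file, one name per line
with open('readnames.txt') as f:
    readnames = [line.strip().lstrip('>') for line in f if line.strip()]

filenames = ['Filename1', 'Filename2']  # or build this list with os.listdir(...)
table = {}
for fname in filenames:
    with open(fname) as f:
        present = {line.strip().lstrip('>') for line in f if line.startswith('>')}
    table[fname] = [1 if name in present else 0 for name in readnames]

df = pd.DataFrame(table, index=readnames)  # rows: readnames, columns: filenames
df.to_csv('presence_absence.csv')
print(df)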

How to convert a multidimensional array to a two-dimensional array

My code fetches values from text files and builds matrices as a multidimensional array, but it creates an array of more than two dimensions, which I can't manipulate. I need a two-dimensional array; how do I get that?
The algorithm of my code:
Goal of the code:
My code fetches values from a specific folder. Each subfolder contains 7 'txt' files generated from one user; in this way, multiple subfolders contain the data of multiple users.
Step 1: Start the first for loop, controlled by how many subfolders there are; the variable 'path' stores the path of the current subfolder.
Step 2: Open the path and fetch the data of the 7 txt files using a second for loop; after fetching, the second loop ends and the rest of the code executes.
Step 3: Concatenate the data of the 7 txt files into one 1-D array.
Step 4 (here the problem arises): Store the 1-D array of each subfolder as a row of a 2-D array, then end the first for loop.
Code:
import numpy as np
import os

f_path = 'Result'
array_control_var = 0

# fetch each directory path
for (path, dirs, file) in os.walk(f_path):
    if path == f_path:
        continue
    f_path_1 = os.path.join(path, 'page_1.txt')
    # get data from page_1 individually because it contains string-type data
    pgno_1 = np.array(np.loadtxt(f_path_1, dtype='U', delimiter=','))
    # only for page_2.txt
    f_path_2 = os.path.join(path, 'page_2.txt')
    with open(f_path_2) as f:
        str_arr = ','.join([l.strip() for l in f])
    pgno_2 = np.asarray(str_arr.split(','), dtype=int)
    # use a loop to fetch data from the remaining text files (int data)
    for j in range(3, 8):
        # store the file path in a variable
        txt_file_path = os.path.join(path, 'page_' + str(j) + '.txt')
        if os.path.exists(txt_file_path):
            # generate a variable name that auto-increments with the loop
            foo = 'pgno_' + str(j)
        else:
            break
        # pass the variable name as a string and store the value
        exec(foo + " = np.array(np.loadtxt(txt_file_path, dtype='i', delimiter=','))")
    # z = np.array([pgno_2, pgno_3, pgno_4, pgno_5, pgno_6, pgno_7])
    # merge all arrays from page 2 onward into a single 1-D array
    f_array = np.concatenate((pgno_2, pgno_3, pgno_4, pgno_5, pgno_6, pgno_7), axis=0)
    # on the first pass of the loop, just assign the value
    if array_control_var == 0:
        main_f_array = f_array
    else:
        # here the problem arises
        main_f_array = np.array([main_f_array, f_array])
    array_control_var += 1
print(main_f_array)
Currently my code generates an array like this (for 3 folders):
[
  array([[0,0,0],[0,0,0]]),
  array([0,0,0])
]
Note: I don't know how many dimensions it has.
But I want:
array([[0,0,0],
       [0,0,0],
       [0,0,0]])
I tried to write recursive code that flattens the list of lists into one list. It gives the desired output for your case, but I did not try it on many other inputs (and it is buggy for certain cases, such as list = [0, [[0,0,0],[0,0,0]], [0,0,0]])...
flat = []

def main():
    nested = [[[0, 0, 0], [0, 0, 0]], [0, 0, 0]]
    recFlat(nested)
    print(flat)

def recFlat(lists):
    if len(lists) == 0:
        return lists
    head, tail = lists[0], lists[1:]
    if isinstance(head, list):
        recFlat(head)
        return recFlat(tail)
    else:
        return flat.append(lists)

if __name__ == '__main__':
    main()
My idea behind the code was to traverse the head of each list and check whether it is an instance of a list or an element. If the head is an element, this means I have a flat list and I can append it; else, I should recursively traverse further.
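For the original numpy question, a common alternative is to collect each folder's 1-D array in a plain Python list and stack once at the end, instead of re-wrapping main_f_array on every iteration; a sketch, assuming every folder's f_array has the same length:
import numpy as np

rows = []
for f_array in ([0, 0, 0], [0, 0, 0], [0, 0, 0]):  # stand-ins for each folder's data
    rows.append(np.asarray(f_array))
main_f_array = np.vstack(rows)  # 2-D array: one row per folder
print(main_f_array.shape)  # (3, 3) for these stand-ins
np.vstack also raises an error on ragged input, which surfaces length mismatches early instead of silently building a nested object array.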
