While loops for processing FITS files (Python)

I am new to programming in Python and am trying to set up a way to process thousands of files with one piece of Python code. I created a practice folder to do this in. It contains two FITS files (FITS1.fits and FITS2.fits). I listed them both in a .txt file with:
ls > practice.txt
Here is what I did next:
$ python
import numpy
import pyfits
import matplotlib.pyplot as plt
from matplotlib import pylab
from pylab import *
import asciidata
a = asciidata.open('practice.txt')
print a[0][0] # To test that practice.txt really contains my FITS file FITS1.fits
i = 0
# Now I attempt a while loop to read data from columns in the FITS files, plot the
# desired numbers, and save and show the figures. I chose i <= 1 because there are
# only two FITS files in the text file (and because of zero-indexing).
while i <= 1:
    b = pyfits.getdata(a[0][i]) # "i" is the index used to pick a different file on each pass of the loop
    time = b['TIME'] # 'TIME' is a column in the FITS file
    brightness = b['SAP_FLUX']
    plt.plot(time, brightness)
    xlabel('Time(days)')
    ylabel('Brightness (e-/s)')
    title(a[0][i])
    pylab.savefig('a[0][i].png') # Here I am lost on how to get the while loop to name the saved figure something different every time. It takes 'a[0][i].png' as a string and not as the index I am trying to make it be.
    pylab.show()
    i = i + 1 # I placed this here hoping that when the while loop gets to this point, it starts over with a different "i" value
After pressing enter twice, I see the first figure as expected. When I close it, I see the second. However, only the first figure is saved. Does anyone have any suggestions on how I can change my loop to do what I need it to?

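In your code, the i inside the quotes is treated as the literal letter i, not as the variable, because 'a[0][i].png' is a plain string. Every pass therefore saves to the same file name. If you want to keep this naming, you could do something like: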
FileName = 'a[0][%s].png' % i
pylab.savefig(FileName)
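A variation on the same idea, as a minimal sketch: instead of putting the index in the name, derive the PNG name from the FITS file name itself. The helper name png_name_for is hypothetical and not part of the original post.

```python
import os


def png_name_for(fits_name):
    """Return the image name for a FITS file, e.g. FITS1.fits -> FITS1.png."""
    # os.path.splitext splits 'FITS1.fits' into ('FITS1', '.fits').
    stem, _extension = os.path.splitext(fits_name)
    return stem + '.png'


# The two practice files from the question.
for name in ['FITS1.fits', 'FITS2.fits']:
    print(png_name_for(name))
```

In the question's loop this would be used as pylab.savefig(png_name_for(a[0][i])).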

You should use glob to automatically get all the FITS files as a list. From there, a for loop lets you iterate over the file names directly instead of using an index. When you call plt.savefig, you need to construct the file name you want to save the figure as. Here is the code cleaned up and put together:
from glob import glob
import pyfits
from matplotlib import pyplot as plt
files = glob('*.fits')
for file_name in files:
    data = pyfits.getdata(file_name)
    name = file_name[:-len('.fits')] # Remove .fits from the file name
    time = data['TIME']
    brightness = data['SAP_FLUX']
    plt.plot(time, brightness)
    plt.xlabel('Time(days)')
    plt.ylabel('Brightness (e-/s)')
    plt.title(name)
    plt.savefig(name + '.png')
    plt.show()
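With thousands of files, plt.show() pauses the loop at every figure until the window is closed. The following is a minimal sketch of a save-only variant, assuming the non-interactive Agg backend is acceptable. The function name save_light_curve is hypothetical, and the data is passed in as plain sequences so that reading the FITS file stays with pyfits.getdata as above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files, never open a window
import matplotlib.pyplot as plt


def save_light_curve(time, brightness, title, out_path):
    """Plot brightness against time and write the figure to out_path."""
    fig, ax = plt.subplots()
    ax.plot(time, brightness)
    ax.set_xlabel('Time (days)')
    ax.set_ylabel('Brightness (e-/s)')
    ax.set_title(title)
    fig.savefig(out_path)
    plt.close(fig)  # release the figure so memory does not grow with each file
```

Inside the loop it could be called as save_light_curve(data['TIME'], data['SAP_FLUX'], name, name + '.png'). Using one new figure per file also keeps curves from earlier files out of later plots.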

Related

How to automatically flatten and combine .csv files into one matrix in Python?

I have a bunch of .csv files, each with x by y dimensions, that I am reading in Python. I know how to flatten and reshape matrices, but am having trouble doing this automatically for multiple files. Once I have flattened the matrices into one dimension, I would also like to stack them on top of each other in one big matrix.
Is this even the proper way to do a for loop? I have not gotten to the part of stacking the linearized matrices on top of each other into one matrix yet. Would that involve the DataFrame.stack() function? When I run the code below it gives me an error.
import numpy as np
import pandas as pd
file_list = sorted(os.listdir('./E/')) #created the list of files in a specific directory
del file_list[0] #removed an item from the list that I did not want
for file in range(0,26):
    pd.read_csv('./E/' + print(file_list), header=None) #should read files
    A = set(Int.flatten()) #should collapse matrix to one dimension
    B = np.reshape(A, -1) #should make it linear going across
Since I don't know what your files look like, I'm not sure this will work. But still, the below code includes some concepts that should be useful:
import os
import numpy as np
import pandas as pd
file_list = sorted(os.listdir('.\\E'))
del file_list[0]
# Eventually, all_files_array will contain len(file_list) elements, one per file.
all_files_array = []
for i in range(len(file_list)):
    file = file_list[i]
    # If the file was saved with an index column, you may need to pass index_col=0 to read_csv.
    this_file_arr = pd.read_csv('.\\E\\' + file, header=None)
    # Change the dtype to int if that's what you're working with.
    this_file_arr = this_file_arr.to_numpy(dtype=float, copy=False)
    this_file_arr = np.unique(this_file_arr.flatten())
    # The array is already 1-D here, so the following line changes nothing, but it doesn't hurt.
    this_file_arr = np.reshape(this_file_arr, -1)
    all_files_array.append(this_file_arr)
all_files_array = np.array(all_files_array)
# If every file yields the same number n of unique values, all_files_array now has shape (len(file_list), n).
The main takeaways are that:
os.listdir() accepts a path with or without a trailing slash. Python also accepts / in path names on every platform, including Windows; the backslash separators used above work only on Windows, and os.path.join() avoids hard-coding either separator.
Using range(len(file_list)) instead of hard-coding the number of files to read is good practice in case you add more files to file_list later down the line, unless of course you don't want to read all the files in file_list.
A print call inside pd.read_csv does not belong there: print() returns None, so './E/' + print(file_list) raises a TypeError.
this_file_arr.flatten() is a NumPy method, so this_file_arr needs to be a NumPy array, hence the to_numpy() line.
Because np.reshape doesn't take sets, I used np.unique instead to avoid converting to a non-NumPy structure. Note that np.unique, like set, drops duplicate values, and it also sorts them; if you only want to flatten the matrix, leave it out. If you want to use NumPy methods, keep your data in a NumPy array and don't convert it to a list or set or anything else.
Let me know if you have any questions!
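For the stacking step the question asks about, here is a minimal sketch that flattens each file and stacks the results with np.vstack, rather than DataFrame.stack(), which reshapes a single DataFrame. It assumes every file has the same x by y shape and contains only numbers with no header row; the function name stack_flattened is hypothetical, and np.loadtxt stands in for pd.read_csv to keep the example small.

```python
import numpy as np


def stack_flattened(csv_paths):
    """Load each CSV as a 2-D array, flatten it, and stack the rows.

    Assumes every file has the same x-by-y shape and holds only numbers.
    """
    rows = []
    for path in csv_paths:
        matrix = np.loadtxt(path, delimiter=',', ndmin=2)
        rows.append(matrix.ravel())  # row-major flatten to shape (x * y,)
    return np.vstack(rows)  # shape (number of files, x * y)
```

Each row of the result is one file, so result[k] is the flattened matrix of the k-th path.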

Using multiple FITS file

How can I open two FITS files at the same time with astropy? Is it possible to work on multiple FITS files at the same time, or do I have to work on one at a time?
You can open as many FITS files as you like. Each is represented by an HDUList object.
from astropy.io import fits
hdu_list1 = fits.open('file1.fits')
hdu_list2 = fits.open('file2.fits')
Then I'd suggest calling this to see what the FITS files contain:
hdu_list1.info()
hdu_list2.info()
You can then access any header and data information in those FITS files and do what you want. It goes something like this:
array1 = hdu_list1[0].data
array2 = hdu_list2[0].data
ratio = array1 / array2
If you want to make a plot:
import matplotlib.pyplot as plt
plt.imshow(ratio)
The Astropy docs are very good. The astropy.io.fits documentation is a good place to start.

Access data in several matlab files

I am currently trying to get data from several MATLAB files. I am reading from a directory where all my data is saved. I can read the data (e.g. Gain) from a single MATLAB file. I would like to add a for loop that reads the same "Gain" from all the files (different data but same format), but every time I try to write a for loop, I get this error: TypeError: list indices must be integers or slices, not dict
import numpy as np
import sys
import scipy.io as sio
import scipy
import matplotlib.pyplot as plt
import tkinter as tk
from tkinter import *
from tkinter import filedialog
import os
#directory where all data will be stored
dataDir="C:/Users/me/Desktop/Data/"
Files=[] #list of files
lenght=len(Files)
for file in os.listdir(dataDir):
    Files.append(scipy.io.loadmat(dataDir+file))
#initialize arrays
results=[lenght]
Gain=[lenght]
for files in Files:
    results[files]=Files[files]['results']
    #Gain in side of my results key in the file. I can read it in only one file
    Gain[files]=results[files]['PowerDomain'][0,0]['Gain'][0,0]
    print (files)
There are two separate issues with the code that are causing this error:
(1) Your code initializes results and Gain as length-1 lists. Instead (based on what it appears you're trying to do), you should initialize them as lists of length lenght (sic*), i.e. results = [None] * lenght and Gain = [None] * lenght. Note also that lenght is computed while Files is still empty, so it is 0; compute it after the loop that fills Files.
(2) Files is a list of dicts. In each loop iteration, files (sic**) becomes one of those dicts. If you instead want files to be an index into the Files list, then your loop should be for files in range(0, lenght):.
Alternatively, if you want results and Gain to be dicts, you can initialize them as results = {} and similarly for Gain, and then in each iteration you can set their values like results[<filename>] = ... (where <filename> is the current file name).
Also, if you want to keep the loop of the form for files in Files:, that's fine, but keep in mind that files will actually be an element of Files rather than an index into Files.
* From a style perspective, you should correct the spelling of lenght to length.
** Also from a style perspective, you should probably not name the list index files; perhaps something like f. But this is more subjective.
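The difference between looping over elements and looping over indices can be sketched with plain Python objects. The list below is a made-up stand-in for what scipy.io.loadmat would return for each file; the real nested indexing (['PowerDomain'][0,0]['Gain'][0,0]) is simplified to a single 'Gain' key, and all values are hypothetical.

```python
# Stand-ins for the dicts returned by scipy.io.loadmat; the values are made up.
loaded = [
    {'results': {'Gain': 1.5}},
    {'results': {'Gain': 2.0}},
    {'results': {'Gain': 2.5}},
]

# Option 1: iterate over the elements and append, so no index is needed.
gains = []
for contents in loaded:
    gains.append(contents['results']['Gain'])

# Option 2: enumerate() yields both the index and the element,
# which suits a list that was preallocated to the right length.
gains_by_index = [None] * len(loaded)
for index, contents in enumerate(loaded):
    gains_by_index[index] = contents['results']['Gain']

print(gains)
```

Either way, the loop variable that holds a dict is never used as a list index, which is what triggered the TypeError.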

Looping over files and plotting (Python)

My data looks like the sample in the picture. All of my data files are in .txt format, and my aim is to loop over the files and plot them. The first row holds my variable names (WL, ABS, T%), so first I need to delete it before proceeding.
with open('Desktop/100-3.txt', 'r') as f:
    data = f.read().splitlines(True)
with open('Desktop/100-3.txt', 'w') as f:
    f.writelines(data[1:])
This step is probably not necessary, but I am very new to NumPy. Basically the algorithm will be as follows:
Read all the .txt files
Plot T% versus WL, plot ABS versus WL, save. (WL -> x variable)
Continue for the next file, .. (two graphs for every .txt file)
Then finish the loop, exit.
[Image: sample of the data]
What I've tried:
from numpy import loadtxt
import os
dizin = os.listdir(os.getcwd())
for i in dizin:
    if i.endswith('.txt'):
        data = loadtxt("??",float)
For data files like this I would prefer np.genfromtxt over np.loadtxt; it has many useful options you can look up in the docs. The glob module is also handy for iterating over a directory with wildcards as filters:
from glob import glob
import numpy as np
import matplotlib.pyplot as plt
# loop over all files in the current directory ending with .txt
for fname in glob("./*.txt"):
    # read file, skip header (1 line) and unpack into 3 variables
    WL, ABS, T = np.genfromtxt(fname, skip_header=1, unpack=True)
    # first plot: T% versus WL
    plt.plot(WL, T)
    plt.xlabel('WL')
    plt.ylabel('T%')
    plt.show()
    plt.clf()
    # second plot: ABS versus WL
    plt.plot(WL, ABS)
    plt.xlabel('WL')
    plt.ylabel('ABS')
    plt.show()
    plt.clf()
The next step would be to do some research on matplotlib to make the plots look better.
Please let me know if the code does not work, I'll try to fix it then.
EDIT: Added plt.clf() to clear the figure before creating a new one.
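The question also asks for the graphs to be saved. A minimal sketch of that step follows, assuming each file has a one-line header and three whitespace-separated columns in the order WL, ABS, T%. The function name save_plots and the _T.png and _ABS.png suffixes are hypothetical choices.

```python
import os

import matplotlib
matplotlib.use('Agg')  # write image files without opening windows
import matplotlib.pyplot as plt
import numpy as np


def save_plots(fname):
    """Save T%-vs-WL and ABS-vs-WL plots next to fname; return both paths."""
    WL, ABS, T = np.genfromtxt(fname, skip_header=1, unpack=True)
    stem = os.path.splitext(fname)[0]
    saved = []
    for values, label, suffix in [(T, 'T%', '_T.png'), (ABS, 'ABS', '_ABS.png')]:
        fig, ax = plt.subplots()
        ax.plot(WL, values)
        ax.set_xlabel('WL')
        ax.set_ylabel(label)
        fig.savefig(stem + suffix)
        plt.close(fig)  # free the figure before moving on
        saved.append(stem + suffix)
    return saved
```

Calling save_plots(fname) inside the glob loop would produce two PNG files per .txt file.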

Import Multiple Text files (Large Number) using numpy and Post Processing

This forum has been extremely helpful for a Python novice like me to improve my knowledge. I have generated a large amount of raw data in text format from my CFD simulation. My objective is to import these text files into Python and do some postprocessing on them. This is the code that I currently have.
import numpy as np
from matplotlib import pyplot as plt
import os
filename=np.array(['v1-0520.txt','v1-0878.txt','v1-1592.txt','v1-3020.txt','v1-5878.txt'])
for i in filename:
    format_name= i
    path='E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
    data= os.path.join(path,format_name)
    X,Y,U,V,T,Tr = np.loadtxt(data,usecols=(1,2,3,4,5,6),skiprows=1,unpack = True) # Here X and Y represents the X and Y coordinate,U,V,T,Tr represents the Dependent Variables
    plt.figure(1)
    plt.plot(T,Y)
plt.legend(['vt1a','vtb','vtc','vtd','vte','vtf'])
plt.grid(True)
Is there a better way to do this, like importing all the text files (~10000 files) at once into Python and then accessing whichever files I need for post-processing (maybe by indexing)? All the text files will have the same number of columns and rows.
I am just a beginner in Python. I will be grateful if someone can help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back;
then loading it into a NumPy array and plotting it.
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use the content directly in loadtxt.
content = content.replace("nodenumber","#nodenumber") # Ignoring Node number column
data1=np.loadtxt(content.splitlines())
Y=data1[:,2]
temp=data1[:,5]
loadtxt accepts anything that feeds it line by line. content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple-file needs. One way or another, you need to open and read each file, one by one. And it would be best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays. If Y and temp are identical in size for all files, they can be collected into larger dimensional array, e.g. YY[i,:] = Y for the ith file, where YY is preallocated. If they can vary in size, it is better to collect them in lists.
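The preallocation idea can be sketched as follows, assuming every file has a one-line header followed by the same number of data rows. The function name collect_column and its parameters are hypothetical.

```python
import numpy as np


def collect_column(paths, column, n_rows):
    """Read one column from each file into a preallocated 2-D array.

    Assumes every file has exactly n_rows data rows after a one-line header.
    """
    stacked = np.empty((len(paths), n_rows))
    for i, path in enumerate(paths):
        with open(path) as f:  # the file is closed before the next one is opened
            stacked[i, :] = np.loadtxt(f, usecols=(column,), skiprows=1)
    return stacked
```

Row i of the result then holds the chosen column of the i-th file, so a single file's data is available by indexing, for example YY[3] for the fourth file.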
