How can I open two FITS files at the same time with astropy? Is it possible to work on multiple FITS files at the same time, or do I have to work on one at a time?
You can open as many FITS files as you like. Each is represented by an HDUList object.
from astropy.io import fits
hdu_list1 = fits.open('file1.fits')
hdu_list2 = fits.open('file2.fits')
Then I'd suggest calling info() to see what the FITS files contain:
hdu_list1.info()
hdu_list2.info()
You can then access any header and data information in those FITS files and do what you want. It goes something like this:
array1 = hdu_list1[0].data
array2 = hdu_list2[0].data
ratio = array1 / array2
If you want to make a plot:
import matplotlib.pyplot as plt
plt.imshow(ratio)
The Astropy docs are very good; the astropy.io.fits documentation is a good place to start.
I followed the question "Append multiple numpy files to one big numpy file in python" in order to put many numpy files into one big file. The result is:
import matplotlib.pyplot as plt
import numpy as np
import glob
import os, sys
fpath ="path_Of_my_final_Big_File"
npyfilespath ="path_of_my_numpy_files"
os.chdir(npyfilespath)
npfiles= glob.glob("*.npy")
npfiles.sort()
all_arrays = np.zeros((166601,8000))
for i, npfile in enumerate(npfiles):
    all_arrays[i] = np.load(os.path.join(npyfilespath, npfile))
np.save(fpath, all_arrays)
data = np.load(fpath)
print(data)
print(data.shape)
I have thousands of files, and with this code I always get a memory error, so I never get my result file.
How can I resolve this error?
How can I read, write and append to the final numpy file, one file at a time?
Have a look at np.memmap. You can instantiate all_arrays as:
all_arrays = np.memmap("all_arrays.dat", dtype='float64', mode='w+', shape=(166601,8000))
From the documentation:
Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory.
You will be able to access the whole array, but the operating system will take care of loading only the part that you actually need. Read the documentation page carefully, and note that for performance you can decide whether the file should be stored column-wise or row-wise.
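As a minimal sketch of how this could be applied to the question, the array can be filled one file at a time, so that only a single input array sits in memory at once. The function name merge_npy_to_memmap and the file names are hypothetical, and the sketch assumes every .npy file holds a 1-D array of the same length:

```python
import glob
import os

import numpy as np


def merge_npy_to_memmap(src_dir, out_path, dtype="float64"):
    """Copy every .npy file in src_dir into one row of a memory-mapped file."""
    paths = sorted(glob.glob(os.path.join(src_dir, "*.npy")))
    if not paths:
        raise ValueError("no .npy files found in %r" % src_dir)
    # Assumption: all files hold a 1-D array of the same length.
    row_len = np.load(paths[0]).shape[0]
    shape = (len(paths), row_len)
    out = np.memmap(out_path, dtype=dtype, mode="w+", shape=shape)
    for i, path in enumerate(paths):
        out[i, :] = np.load(path)  # only this one input array is in memory
    out.flush()  # push any pending changes to disk
    del out
    return shape
```

The raw file written by np.memmap stores neither the shape nor the dtype, so keep both around and reopen the result with np.memmap(out_path, dtype="float64", mode="r", shape=shape).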
My data looks like the picture. All of my data files are in .txt format, and my aim is to loop over the files and plot them. The first row holds my variable names (WL, ABS, T%), so first I need to delete it before proceeding.
with open('Desktop/100-3.txt', 'r') as f:
    data = f.read().splitlines(True)
with open('Desktop/100-3.txt', 'w') as f:
    f.writelines(data[1:])
That step is probably not necessary, but I am very new to NumPy. Basically, the algorithm will be as follows:
Read all the .txt files
Plot T% versus WL, plot ABS versus WL, save. (WL -> x variable)
Continue for the next file, .. (two graphs for every .txt file)
Then finish the loop, exit.
data looks like this
What I've tried
from numpy import loadtxt
import os
dizin = os.listdir(os.getcwd())
for i in dizin:
    if i.endswith('.txt'):
        data = loadtxt("??",float)
For data files like this I would prefer np.genfromtxt over np.loadtxt; it has many useful options that you can look up in the docs. The glob module is also a nice way to iterate over a directory with wildcards as filters:
from glob import glob
import numpy as np
import matplotlib.pyplot as plt
# loop over all files in the current directory ending with .txt
for fname in glob("./*.txt"):
    # read file, skip header (1 line) and unpack into 3 variables
    WL, ABS, T = np.genfromtxt(fname, skip_header=1, unpack=True)
    # first plot: T% versus WL
    plt.plot(WL, T)
    plt.xlabel('WL')
    plt.ylabel('T%')
    plt.show()
    plt.clf()
    # second plot: ABS versus WL
    plt.plot(WL, ABS)
    plt.xlabel('WL')
    plt.ylabel('ABS')
    plt.show()
    plt.clf()
The next step would be to do some research on matplotlib to make the plots look better.
Please let me know if the code does not work, I'll try to fix it then.
EDIT: Added plt.clf() to clear the figure before creating a new one.
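Since the question also asks to save the graphs, here is a rough sketch of the same loop that writes each plot to a PNG file instead of showing it. The function name save_plots and the output naming scheme are made up for illustration, and it assumes the columns are in the order WL, ABS, T% as in the question:

```python
import os
from glob import glob

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write files, open no windows
import matplotlib.pyplot as plt
import numpy as np


def save_plots(txt_dir, out_dir):
    """Save T%-vs-WL and ABS-vs-WL plots for every .txt file in txt_dir."""
    saved = []
    for fname in sorted(glob(os.path.join(txt_dir, "*.txt"))):
        # Assumption: one header line, then three columns WL, ABS, T%.
        WL, ABS, T = np.genfromtxt(fname, skip_header=1, unpack=True)
        stem = os.path.splitext(os.path.basename(fname))[0]
        for values, label in ((T, "T%"), (ABS, "ABS")):
            fig, ax = plt.subplots()
            ax.plot(WL, values)
            ax.set_xlabel("WL")
            ax.set_ylabel(label)
            # Hypothetical naming scheme: <input name>_T.png, <input name>_ABS.png
            out = os.path.join(out_dir, "%s_%s.png" % (stem, label.rstrip("%")))
            fig.savefig(out)
            plt.close(fig)  # free the figure before creating the next one
            saved.append(out)
    return saved
```

Creating a fresh figure per plot and closing it afterwards keeps memory use flat when there are many files. The output directory has to exist before the call.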
I need to be able to quickly read lots of netCDF variables in python (1 variable per file). I'm finding that the Dataset function in netCDF4 library is rather slow compared to reading utilities in other languages (e.g., IDL).
My variables have shape of (2600,5200) and type float. They don't seem that big to me (filesize = 52Mb).
Here is my code:
import numpy as np
from netCDF4 import Dataset
import time
file = '20151120-235839.netcdf'
t0=time.time()
openFile = Dataset(file,'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']
data = np.copy(raw_data)
openFile.close()
print(time.time() - t0)
It takes about 3 seconds to read one variable (one file). I think the main slowdown is np.copy. raw_data is <type 'netCDF4.Variable'>, thus the copy. Is this the best/fastest way to do netCDF reads in python?
Thanks.
The power of NumPy is that you can create views into the existing data in memory via the metadata it retains about the data. A copy will therefore always be slower than a view, which only needs pointers. As JCOidl says, it's not clear why you don't just use:
raw_data = openFile.variables['MergedReflectivityQCComposite'][:]
For more info see SciPy Cookbook and SO View onto a numpy array?
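To illustrate the view/copy distinction, here is a small sketch using a plain in-memory NumPy array (the array and variable names are made up). Note that it only covers arrays that are already in memory; indexing a netCDF4 variable with [:] reads the data from the file into a NumPy array.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)

view = a[1:, :]            # basic slicing returns a view: no data is copied
copy = np.copy(a[1:, :])   # np.copy allocates a new buffer and fills it

shares_view = np.shares_memory(a, view)   # the view reuses a's buffer
shares_copy = np.shares_memory(a, copy)   # the copy has its own buffer

view[0, 0] = -1.0          # writing through the view also changes a
```

After the last line, a[1, 0] holds the new value, while the copy still holds the old one.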
I'm not sure what to say about the np.copy operation (which is indeed slow), but I find that the PyNIO module from UCAR works well for both NetCDF and HDF files. This will place data into a numpy array:
import Nio
f = Nio.open_file(file, format="netcdf")
data = f.variables['MergedReflectivityQCComposite'][:]
f.close()
Testing your code against the PyNIO code on a netCDF file of mine gave 1.1 seconds for PyNIO versus 3.1 seconds for the netCDF4 module. Your results may vary; worth a look though.
You can use xarray for that.
%matplotlib inline
import xarray as xr
### Single netcdf file ###
ds = xr.open_dataset('path/file.nc')
### Opening multiple NetCDF files and concatenating them by time ####
# (recent xarray versions require combine='nested' when concat_dim is given)
ds = xr.open_mfdataset('path/*.nc', combine='nested', concat_dim='time')
To read the variable you can simply type ds.MergedReflectivityQCComposite or ds['MergedReflectivityQCComposite'][:]
You can also use xr.load_dataset, but I find that it uses more memory than the open function. For xr.open_mfdataset, you can also chunk along the dimensions of the file if you want. There are other options for both functions, and you can learn more about them in the xarray documentation.
This forum has been extremely helpful for a Python novice like me to improve my knowledge. I have generated a large amount of raw data in text format from my CFD simulation. My objective is to import these text files into Python and do some postprocessing on them. This is the code that I currently have.
import numpy as np
from matplotlib import pyplot as plt
import os
filename=np.array(['v1-0520.txt','v1-0878.txt','v1-1592.txt','v1-3020.txt','v1-5878.txt'])
for i in filename:
    format_name = i
    path = 'E:/Fall2015/Research/CFDSimulations_Fall2015/ddn310/Autoexport/v1'
    data = os.path.join(path, format_name)
    # Here X and Y represent the X and Y coordinates; U, V, T, Tr represent the dependent variables
    X, Y, U, V, T, Tr = np.loadtxt(data, usecols=(1,2,3,4,5,6), skiprows=1, unpack=True)
    plt.figure(1)
    plt.plot(T, Y)
plt.legend(['vt1a','vtb','vtc','vtd','vte','vtf'])
plt.grid(True)
Is there a better way to do this, like importing all the text files (~10000 files) at once into Python and then accessing whichever files I need for post processing (maybe by indexing)? All the text files will have the same number of columns and rows.
I am just a beginner in Python. I will be grateful if someone can help me or point me in the right direction.
Your post needs to be edited to show proper indentation.
Based on a quick read, I think you are:
reading a file, making a small edit, and writing it back
then loading it into a numpy array and plotting it
Presumably the purpose of your edit is to correct some header or value.
You don't need to write the file back. You can use content directly in loadtxt.
content = content.replace("nodenumber","#nodenumber") # Ignoring Node number column
data1=np.loadtxt(content.splitlines())
Y=data1[:,2]
temp=data1[:,5]
loadtxt accepts anything that feeds it line by line. content.splitlines() makes a list of lines, which loadtxt can use.
The load could be more compact with:
Y, temp = np.loadtxt(content.splitlines(), usecols=(2,5), unpack=True)
With usecols you might not even need the replace step. You haven't given us a sample file to test.
I don't understand your multiple file needs. One way or another, you need to open and read each file, one by one. And it would be best to close one before going on to the next. The with open(name) as f: syntax is great for ensuring that a file is closed.
You could collect the loaded data in larger lists or arrays. If Y and temp are identical in size for all files, they can be collected into larger dimensional array, e.g. YY[i,:] = Y for the ith file, where YY is preallocated. If they can vary in size, it is better to collect them in lists.
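A minimal sketch of the preallocated-array idea, assuming every file has one header line and the same number of rows. The function name collect_columns and the default column indices are only illustrative:

```python
import numpy as np


def collect_columns(paths, usecols=(2, 5), skiprows=1):
    """Load the same two columns from every file into preallocated arrays."""
    YY = temp_all = None
    for i, path in enumerate(paths):
        Y, temp = np.loadtxt(path, usecols=usecols, skiprows=skiprows,
                             unpack=True)
        if YY is None:
            # Allocate once, using the first file to learn the row count.
            YY = np.empty((len(paths), Y.size))
            temp_all = np.empty((len(paths), temp.size))
        YY[i, :] = Y            # row i holds the data of the ith file
        temp_all[i, :] = temp
    return YY, temp_all
```

Afterwards YY[i] gives the Y column of the ith file, so any file can be picked out by index without reading it again. If the row counts differ between files, the assignment raises an error, and collecting into lists is the safer route.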
I am new at programming in python and am in the process of trying to create a setup of processing thousands of files with one piece of code in python. I created a practice folder to do this in. In it are two FITS files (FITS1.fits and FITS2.fits). I did the following to put them both in a .txt file:
ls > practice.txt
Here is what I did next:
$ python
import numpy
import pyfits
import matplotlib.pyplot as plt
from matplotlib import pylab
from pylab import *
import asciidata
a = asciidata.open('practice.txt')
print(a[0][0]) # To test to see if practice.txt really contains my FITS files FITS1.fits
i = 0
while i <= 1: # Now I attempt a while loop to read data from columns in FITS files, plot the numbers desired, save and show the figures. I chose i <=1 because there are only two FITS files in the text (also because of zero-indexing).
    b = pyfits.getdata(a[0][i]) # "i" will be the index used to use a different file when the while loop gets to the end
    time = b['TIME'] # 'TIME' is a column in the FITS file
    brightness = b['SAP_FLUX']
    plt.plot(time, brightness)
    xlabel('Time(days)')
    ylabel('Brightness (e-/s)')
    title(a[0][i])
    pylab.savefig('a[0][i].png') # Here I am lost on how to get the while loop to name the saved figure something different every time. It takes the 'a[0][i].png' as a string and not as the index I am trying to make it be.
    pylab.show()
    i = i + 1 # I placed this here, hoping that when the while loop gets to this point, it would start over again with a different "i" value
After pressing enter twice, I see the first figure as expected. Then I will close it and see the second. However, only the first figure is saved. Does anyone have any suggestions on how I can change my loop to do what I need it to?
In your code, the i inside the quoted string is treated as the literal letter i, not the variable. If you wanted to keep this naming you could do something like:
FileName = 'a[0][%s].png' % i
pylab.savefig(FileName)
You should use glob to automatically get the FITS files as a list. From there, a for loop lets you iterate over the file names directly instead of using an index. When you call plt.savefig, you need to construct the file name you want to save it as. Here is the code cleaned up and put together:
from glob import glob
import pyfits
from matplotlib import pyplot as plt
files = glob('*.fits')
for file_name in files:
    data = pyfits.getdata(file_name)
    name = file_name[:-len('.fits')] # Remove .fits from the file name
    time = data['TIME']
    brightness = data['SAP_FLUX']
    plt.plot(time, brightness)
    plt.xlabel('Time(days)')
    plt.ylabel('Brightness (e-/s)')
    plt.title(name)
    plt.savefig(name + '.png')
    plt.show()
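As an alternative to slicing off the extension by length, the standard library's os.path.splitext can build the output name. This is only a sketch; the helper name png_name_for is hypothetical:

```python
import os


def png_name_for(fits_path):
    """Return the .png file name that corresponds to a FITS file name."""
    # splitext separates the extension whatever its length
    base, _ext = os.path.splitext(os.path.basename(fits_path))
    return base + ".png"


names = [png_name_for(p) for p in ["FITS1.fits", "data/FITS2.fits"]]
```

Because the extension is detected rather than assumed, this also works for files ending in .fit or .fts, and basename drops any directory part so the figures land in the current directory.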