How to slice and loop through a netCDF variable in Python?

I have a netCDF variable with 372 time-steps, I need to slice this variable to read in each individual time-step for subsequent processing.
I have used glob to read in my 12 netCDF files and then defined the variables:

from netCDF4 import Dataset
import glob

NAME_files = glob.glob('RGL*nc')
NAME_files = NAME_files[0:12]
for n in NAME_files:
    RGL = Dataset(n, mode='r')
    footprint = RGL.variables['fp'][:]
    lons = RGL.variables['lon'][:]
    lats = RGL.variables['lat'][:]
I now need to repeat the code below in a loop for each of the 372 time-steps of the variable 'footprint'.
footprint_2 = RGL.variables['fp'][:,:,1:2]
I'm new to Python and have a poor grasp of looping. Any help would be appreciated, including a better explanation or description of my issue.

You need to determine both the dimensions and the shape of the fp variable in order to access it properly.
I'm making assumptions here about those values.
Your code implies three dimensions: time, lon, lat. Again, just assuming.

footprint_2 = RGL.variables['fp'][:,:,1:2]

But the code above gets all the times and all the lons for one latitude; the slice 1:2 selects a single value.
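As a quick illustration of the difference between indexing and slicing (a toy numpy array, not your data):

import numpy as np

arr = np.zeros((372, 30, 30))   # stand-in for the fp variable
print(arr[:, :, 1:2].shape)     # (372, 30, 1) -- a slice keeps the axis
print(arr[:, :, 1].shape)       # (372, 30)    -- an integer index drops it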
fp_dims = RGL.variables['fp'].dimensions
print(fp_dims)
# a tuple of dimension names:
(u'time', u'lon', u'lat')

fp_shape = RGL.variables['fp'].shape
print(fp_shape)
# a tuple of dimension sizes or lengths:
(372, 30, 30)
n_times = fp_shape[0]   # don't call this "len" -- that shadows the built-in
for time_idx in range(n_times):
    # you don't say if you want a single lon,lat or all the lon,lats for a given time step
    test = RGL.variables['fp'][time_idx, :, :]
    # or, if you really want this:
    test = RGL.variables['fp'][time_idx, :, 1:2]
    # or a single lon, lat:
    test = RGL.variables['fp'][time_idx, 8, 8]
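Putting this together with the file loop from your question, a minimal sketch (assuming every file has the same fp layout; process() is a hypothetical placeholder for whatever you do with each time-step):

from netCDF4 import Dataset
import glob

def process(fp_slice):
    # placeholder for your per-time-step processing
    print(fp_slice.shape)

for n in sorted(glob.glob('RGL*nc'))[:12]:
    RGL = Dataset(n, mode='r')
    fp = RGL.variables['fp']
    for time_idx in range(fp.shape[0]):   # 372 time-steps per file
        process(fp[time_idx, :, :])       # read one time-step at a time
    RGL.close()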

Related

Create arrays of fixed size within a while loop in python

I am trying to create arrays of fixed size within a while loop. Since I do not know how many arrays I have to create, I am using a loop to initiate them. The problem I am facing is with the array declaration: I would like the name of each array to end with the index of the while loop, so it will be useful later in my calculations. I do not expect to find an easy way out, but it would be great if someone could point me in the right direction.
I tried using arrayname + str(i). This returns the error 'Can't assign to operator'.
# parse through the Load vector sheet to load the values of the stress vector into the dataframe
Loadvector = x2.parse('Load_vector')
Lvec_rows = len(Loadvector.index)
Lvec_cols = len(Loadvector.columns)
i = 0
while i < Lvec_cols:
    y_values + str(i) = np.zeros(Lvec_rows)   # SyntaxError: can't assign to operator
    i = i + 1
I expect arrays with names arrayname1, arrayname2 ... to be created.
I think the title is somewhat misleading.
An easy way to do this would be to use a dictionary:

dict_of_array = {}
i = 0
while i < Lvec_cols:
    dict_of_array['y_values' + str(i)] = np.zeros(Lvec_rows)
    i = i + 1

and you can then access, say, y_values1 via dict_of_array['y_values1'].
If you really want to create a batch of named variables, try exec (generally discouraged, as it makes code harder to read and debug):

i = 0
while i < Lvec_cols:
    exec('y_values{} = np.zeros(Lvec_rows)'.format(i))
    i = i + 1
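For completeness, a minimal self-contained version of the dictionary approach (with made-up sizes standing in for the Excel-derived Lvec_rows and Lvec_cols):

import numpy as np

Lvec_rows, Lvec_cols = 6, 3                 # stand-ins for the sheet dimensions
arrays = {}
for i in range(Lvec_cols):                  # a for loop reads more naturally here
    arrays['y_values' + str(i)] = np.zeros(Lvec_rows)

print(arrays['y_values0'])                  # access one array by its generated name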

How to use a function like MATLAB's 'fread' in Python?

This is a .dat file.
In MATLAB, I can use this code to read it:
lonlatfile='NOM_ITG_2288_2288(0E0N)_LE.dat';
f=fopen(lonlatfile,'r');
lat_fy=fread(f,[2288*2288,1],'float32');
lon_fy=fread(f,[2288*2288,1],'float32')+86.5;
lon=reshape(lon_fy,2288,2288);
lat=reshape(lat_fy,2288,2288);
Here are some results from MATLAB (screenshot omitted).
How can I do this in Python and get the same result?
PS: My code is this:
import numpy as np

def fromfileskip(fid, shape, counts, skip, dtype):
    """
    fid : file object, should be an open binary file.
    shape : tuple of ints, the desired shape of each data block.
        For a 2D array with xdim, ydim = 3000, 2000 and xdim the
        fastest dimension, shape = (2000, 3000).
    counts : int, number of times to read a data block.
    skip : int, number of bytes to skip between reads.
    dtype : np.dtype object, type of each binary element.
    """
    data = np.zeros((counts,) + shape)
    for c in range(counts):
        block = np.fromfile(fid, dtype=dtype, count=np.prod(shape))
        data[c] = block.reshape(shape)
        fid.seek(fid.tell() + skip)
    return data

fid = open(r'NOM_ITG_2288_2288(0E0N)_LE.dat', 'rb')
data = fromfileskip(fid, (2288, 2288), 1, 0, np.float32)
loncenter = 86.5 #Footpoint of FY2E
latcenter = 0
lon2e = data+loncenter
lat2e = data+latcenter
Lon = lon2e.reshape(2288,2288)
Lat = lat2e.reshape(2288,2288)
But, the result is different from that of Matlab.
You should be able to translate the code directly into Python with little change:

lonlatfile = 'NOM_ITG_2288_2288(0E0N)_LE.dat'
with open(lonlatfile, 'rb') as f:
    lat_fy = np.fromfile(f, count=2288*2288, dtype='float32')
    lon_fy = np.fromfile(f, count=2288*2288, dtype='float32') + 86.5
lon = lon_fy.reshape([2288, 2288], order='F')
lat = lat_fy.reshape([2288, 2288], order='F')
Normally the numpy reshape would give the transpose of the MATLAB result, due to the different index order. The order='F' part makes sure the final output has the same layout as the MATLAB version. It is optional; if you keep the different index order in mind you can leave it off.
The with open() as f: opens the file in a safe manner, making sure it is closed again when you are done, even if the program hits an error or is cancelled for whatever reason. Strictly speaking it is not needed, but you really should always use it when opening a file.
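A toy example of the index-order difference (illustrative only, not your data):

import numpy as np

a = np.arange(6, dtype='float32')      # [0, 1, 2, 3, 4, 5]
print(a.reshape([2, 3]))               # C order: rows filled first
# [[0. 1. 2.]
#  [3. 4. 5.]]
print(a.reshape([2, 3], order='F'))    # Fortran/MATLAB order: columns filled first
# [[0. 2. 4.]
#  [1. 3. 5.]]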

Using astropy.fits and numpy to apply coincidence corrections to SWIFT fits image

This question may be a little specialist, but hopefully someone might be able to help. I normally use IDL, but for developing a pipeline I'm looking to use python to improve running times.
My fits file handling setup is as follows:
import numpy
from astropy.io import fits

# Directory: /Users/UCL_Astronomy/Documents/UCL/PHASG199/M33_UVOT_sum/UVOTIMSUM/M33_sum_epoch1_um2_norm.img
with fits.open('...') as ima_norm_um2:
    # Open the UVOTIMSUM file once and close it after extracting the relevant values:
    ima_norm_um2_hdr = ima_norm_um2[0].header
    ima_norm_um2_data = ima_norm_um2[0].data
    # Individual dimensions for the number of x and y pixels:
    nxpix_um2_ext1 = ima_norm_um2_hdr['NAXIS1']
    nypix_um2_ext1 = ima_norm_um2_hdr['NAXIS2']
    # Compute the size of the images (you can also do this manually rather than calling these keywords from the header):
    corrfact_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))
    coincorr_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))
    # Check that the dimensions are all the same:
    print(corrfact_um2_ext1.shape)
    print(coincorr_um2_ext1.shape)
    print(ima_norm_um2_data.shape)
    # Make a new image file to save the correction factors:
    hdu_corrfact = fits.PrimaryHDU(corrfact_um2_ext1, header=ima_norm_um2_hdr)
    fits.HDUList([hdu_corrfact]).writeto('.../M33_sum_epoch1_um2_corrfact.img')
    # Make a new image file to save the corrected image to:
    hdu_coincorr = fits.PrimaryHDU(coincorr_um2_ext1, header=ima_norm_um2_hdr)
    fits.HDUList([hdu_coincorr]).writeto('.../M33_sum_epoch1_um2_coincorr.img')
I'm looking to then apply the following corrections:
# Define the variables from Poole et al. (2008), "Photometric calibration of the Swift ultraviolet/optical telescope":
alpha = 0.9842000
ft = 0.0110329
a1 = 0.0658568
a2 = -0.0907142
a3 = 0.0285951
a4 = 0.0308063
for i in range(nxpix_um2_ext1 - 1): #do begin
    for j in range(nypix_um2_ext1 - 1): #do begin
        if (numpy.less_equal(i, 4) | numpy.greater_equal(i, nxpix_um2_ext1-4) | numpy.less_equal(j, 4) | numpy.greater_equal(j, nxpix_um2_ext1-4)): #then begin
            #UVM2
            corrfact_um2_ext1[i,j] == 0
            coincorr_um2_ext1[i,j] == 0
        else:
            xpixmin = i-4
            xpixmax = i+4
            ypixmin = j-4
            ypixmax = j+4
            #UVM2
            ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax])
            xvec_UVM2 = ft*ima_UVM2sum
            fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2*xvec_UVM2) + (a3*xvec_UVM2*xvec_UVM2*xvec_UVM2) + (a4*xvec_UVM2*xvec_UVM2*xvec_UVM2*xvec_UVM2)
            Ctheory_UVM2 = - alog(1-(alpha*ima_UVM2sum*ft))/(alpha*ft)
            corrfact_um2_ext1[i,j] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum)
            coincorr_um2_ext1[i,j] = corrfact_um2_ext1[i,j]*ima_sk_um2[i,j]
The above snippet is where it goes wrong, as I have a mixture of IDL syntax and Python syntax. I'm just not sure how to convert certain aspects of IDL to Python. For example, I don't know how to handle ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax]).
I'm also missing the part that updates the correction-factor and coincidence-correction image files. If anyone has the patience to go over this with a fine-tooth comb and suggest the necessary changes, that would be excellent.
The original normalised image can be downloaded here; replace ... in the above code with this file.
One very important thing about numpy is that it applies every mathematical or comparison function element-wise, so you probably don't need to loop through the arrays at all.
So maybe start where you convolve your image with a sum filter. This can be done for 2D images with astropy.convolution.convolve or scipy.ndimage.uniform_filter.
I'm not sure what you want, but I think you want a 9x9 sum filter. Note that uniform_filter computes the local mean, so multiply by the window area to get the sum:

from scipy.ndimage import uniform_filter
ima_UVM2sum = uniform_filter(ima_norm_um2_data, size=9) * 9**2
Since you want to discard the pixels at the borders (4 pixels), you can simply slice them away:

ima_UVM2sum_valid = ima_UVM2sum[4:-4, 4:-4]

This ignores the first and last 4 rows and the first and last 4 columns (the negative stop value counts from the end).
Now you can calculate the corrections:

xvec_UVM2 = ft*ima_UVM2sum_valid
fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2**2) + (a3*xvec_UVM2**3) + (a4*xvec_UVM2**4)
Ctheory_UVM2 = -np.log(1 - (alpha*ima_UVM2sum_valid*ft)) / (alpha*ft)

(IDL's alog is the natural logarithm, which is np.log in numpy.)
These are all arrays, so you still do not need to loop.
But then you want to fill your two images. Be careful: the correction array is smaller (we ignored the first and last rows/columns), so you have to take the same region in the correction images and in the sky image:

corrfact_um2_ext1[4:-4, 4:-4] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum_valid)
coincorr_um2_ext1[4:-4, 4:-4] = corrfact_um2_ext1[4:-4, 4:-4] * ima_sk_um2[4:-4, 4:-4]

Still no loop, just numpy's mathematical functions. This means it is much faster (MUCH faster!) and does the same.
Maybe I have forgotten some slicing, which would raise a "not broadcastable" error; if so, please report back.
Just a note about your loop: Python's first axis is the second FITS axis, and vice versa. So if you need to loop over the axes, bear that in mind so you don't end up with IndexErrors or unexpected results.
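Pulling the pieces together, a minimal end-to-end sketch of the vectorized version (assuming ima_sk_um2 is the sky image read the same way as ima_norm_um2_data; the coefficients are the ones from the question):

import numpy as np
from scipy.ndimage import uniform_filter

alpha, ft = 0.9842000, 0.0110329
a1, a2, a3, a4 = 0.0658568, -0.0907142, 0.0285951, 0.0308063

ima = ima_norm_um2_data.astype(float)       # normalised image from the FITS file
ima_sum = uniform_filter(ima, size=9) * 81  # 9x9 box sum (local mean * window area)

x = ft * ima_sum[4:-4, 4:-4]                # drop the 4-pixel border
fx = 1 + a1*x + a2*x**2 + a3*x**3 + a4*x**4
Ctheory = -np.log(1 - alpha*ima_sum[4:-4, 4:-4]*ft) / (alpha*ft)

corrfact_um2_ext1 = np.zeros_like(ima)      # borders stay zero
coincorr_um2_ext1 = np.zeros_like(ima)
corrfact_um2_ext1[4:-4, 4:-4] = Ctheory * fx / ima_sum[4:-4, 4:-4]
coincorr_um2_ext1[4:-4, 4:-4] = corrfact_um2_ext1[4:-4, 4:-4] * ima_sk_um2[4:-4, 4:-4]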

Python: slow for loop performance on reading, extracting and writing from a list of thousands of files

I am extracting 150 different cell values from 350,000 (20 kB) ascii raster files. My current code is fine for processing the 150 cell values from hundreds of the ascii files, but it is very slow when running on the full data set.
I am still learning Python, so are there any obvious inefficiencies, or suggestions to improve the code below?
I have tried closing the 'dat' file in the 2nd function; no improvement.
dat = None
First: I have a function which returns the row and column locations from a Cartesian grid.

def world2Pixel(gt, x, y):
    ulX = gt[0]
    ulY = gt[3]
    xDist = gt[1]
    yDist = gt[5]
    rtnX = gt[2]
    rtnY = gt[4]
    pixel = int((x - ulX) / xDist)
    line = int((ulY - y) / xDist)
    return (pixel, line)
Second: a function to which I pass lists of 150 'id', 'x' and 'y' values in a for loop. The first function is called within it and used to extract the cell value, which is appended to a new list. I also have a list of files 'asc_list' and corresponding times in 'date_list'. Please ignore count / enumerate, as I use this later; unless it is impeding efficiency.

def asc2series(id, x, y):
    #count = 1
    ls_id = []
    ls_p = []
    ls_d = []
    for n, (asc, date) in enumerate(zip(asc_list, date_list)):
        dat = gdal.Open(asc_list)
        gt = dat.GetGeoTransform()
        pixel, line = world2Pixel(gt, east, nort)
        band = dat.GetRasterBand(1)
        #dat = None
        value = band.ReadAsArray(pixel, line, 1, 1)[0, 0]
        ls_id.append(id)
        ls_p.append(value)
        ls_d.append(date)
Many thanks
In world2Pixel you are setting rtnX and rtnY, which you don't use.
You probably meant gdal.Open(asc) -- not gdal.Open(asc_list).
You could move gt = dat.GetGeoTransform() out of the loop. (Rereading made me realize you can't really, since each file has its own geotransform.)
You could cache calls to world2Pixel.
You're re-opening each file for every point -- you should turn the logic around and open each file only once, looking up all the points mapped to that file (see the sketch below).
Benchmark; check the links in this podcast to see how: http://talkpython.fm/episodes/show/28/making-python-fast-profiling-python-code
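A rough sketch of that inversion, assuming a hypothetical points list of (id, x, y) tuples and the world2Pixel function from the question:

from osgeo import gdal

def extract_points(asc_list, date_list, points):
    # open each file once and read all 150 points from it
    ls_id, ls_p, ls_d = [], [], []
    for asc, date in zip(asc_list, date_list):
        dat = gdal.Open(asc)
        gt = dat.GetGeoTransform()
        band = dat.GetRasterBand(1)
        for pid, x, y in points:
            pixel, line = world2Pixel(gt, x, y)
            ls_id.append(pid)
            ls_p.append(band.ReadAsArray(pixel, line, 1, 1)[0, 0])
            ls_d.append(date)
        dat = None   # release the GDAL dataset
    return ls_id, ls_p, ls_d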

Can vtk InsertValue take float arguments?

I have a question about InsertValue
If I understand correctly, it only takes integer arguments. I was wondering if there is a way to have it take float values? Or maybe some other function that does the job of InsertValue but takes float values? I know there is InsertNextValue, but I am not sure it will be efficient in my case, since my array is very big (~100.000 by 120).
Below is my code. For now I am casting the fl values to integers to make it work, but ideally it would be great if I didn't have to do that.
Thanks in advance :)
import vtk
import math
from vtk import vtkStructuredGrid, vtkPoints, vtkFloatArray, vtkXMLStructuredGridWriter
import scipy.io
import numpy
import os
#loading the matlab files
mats = scipy.io.loadmat('/home/lusine/data/3DDA/donut_for_vtk/20130228_050000_3D_E=1.mat')
#x,y,z coordinate, fl flux values
xx = mats['xvect']
yy = mats['yvect']
zz = mats['zvect']
fl = mats['fluxmesh3d'] #3d matrix
nx = xx.shape[1]
ny = yy.shape[1]
nz = zz.shape[1]
fl = numpy.nan_to_num(fl)
inx = numpy.nonzero(fl)
l = len(inx[1])
grid = vtk.vtkStructuredGrid()
grid.SetDimensions(nx,ny,nz) # sets the dimensions of the grid
pts = vtk.vtkPoints() # represents 3D points, The data model for vtkPoints is an array of vx-vy-vz triplets accessible by (point or cell) id.
pts.SetNumberOfPoints(nx*ny*nz) # Specify the number of points for this object to hold.
p = 0
for i in range(l):
    pts.InsertPoint(p, xx[0][inx[0][i]], yy[0][inx[1][i]], zz[0][inx[2][i]])
    p = p + 1
SetPoint()
grid.SetPoints(pts)
cdata = vtk.vtkFloatArray()
cdata.SetNumberOfComponents(1)
cdata.SetNumberOfTuples((nx-1)*(ny-1)*(nz-1))
cdata.SetName('cellData')
p = 0
for i in range(l-1):
    cdata.InsertValue(p, inx[0][i]+inx[1][i]+inx[2][i])
    p = p + 1
grid.GetCellData().SetScalars(cdata)
pdata = vtk.vtkFloatArray()
pdata.SetNumberOfComponents(1)
#Get the number of tuples (a component group) in the array
pdata.SetNumberOfTuples(nx*ny*nz)
#Sets the array name
pdata.SetName('pointData')
for i in range(l):
    pdata.InsertValue(int(fl[inx[0][i]][inx[1][i]][inx[2][i]]), inx[0][i]+inx[1][i]+inx[2][i])
grid.GetPointData().SetScalars(pdata)
writer = vtk.vtkXMLStructuredGridWriter()
writer.SetFileName('new_grid.vts')
#writer.SetInput(grid)
writer.SetInputData(grid)
writer.Update()
print 'end'
The first argument of InsertValue requires an integer because it's the index where the value is going to be inserted. If, instead of the vtkFloatArray pdata, you had a numpy array called p, this would be the equivalent of your instruction:

pdata.InsertValue(a, b) becomes p[a] = b

p[0.1] wouldn't make sense; a must be an integer!
But I am a bit lost on the data. What do you mean that your array is ~100.000 by 120? Do you have 100.000 points, each with a vector of 120 components? In that case your pdata should have 120 components, and for each point point_index you call

pdata.SetTuple(point_index, [v0, v1, ..., v119])

or

pdata.SetComponent(point_index, 0, v0)
...
pdata.SetComponent(point_index, 119, v119)

If not, are you sure that you have to index pdata by fl values? (You have to be sure that fl is an int, that 0 <= fl < ntuples, and that you are not going to have holes.) Check whether you can do the same thing you do for cdata (by the way, in your code p is always equal to i, so you can just use i).
It's also possible to copy a numpy array directly to vtk (see http://vtk.1045678.n5.nabble.com/vtk-to-numpy-how-to-get-a-vtk-array-tp1244891p1244895.html), but you have to be very careful with the structure of your data.
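For reference, a minimal sketch of that numpy-to-VTK route using vtk.util.numpy_support (with a made-up flat float array; VTK expects the values in a flat, contiguous layout):

import numpy as np
from vtk.util import numpy_support

values = np.random.rand(100000).astype(np.float32)        # made-up point data
vtk_arr = numpy_support.numpy_to_vtk(values, deep=True)   # deep copy, safe after numpy array is freed
vtk_arr.SetName('pointData')
# then, e.g.: grid.GetPointData().SetScalars(vtk_arr)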
