I need to interpolate bilinearly some air data of a hdf4/netcdf4/hdf5 file from a 240x240 structured grid on an arbitrary collection of coordinates. I have no idea on how to do this. I have tried using pyresample but that needs an AreaDefinition of target grid which is not possible in my case of unstructured target data (arbitrary points). Here is my code:
import numpy as np
import pyresample
from netCDF4 import Dataset
air_file = Dataset('air.hdf', mode='r')
air_data = air_file.variables['air_2m' ][:].flatten()
air_lon = air_file.variables['air_lon'][:].flatten()
air_lat = air_file.variables['air_lat'][:].flatten()
air_data = air_data.reshape(240,240)
air_lon = air_lon. reshape(240,240) # grid size is 240x240
air_lat = air_lat. reshape(240,240)
tar_lon = 100 * np.random.random((100,1)) # random points
tar_lat = 100 * np.random.random((100,1)) # random points
source_def = pyresample.geometry.SwathDefinition(lons=air_lon, lats=air_lat)
target_def = pyresample.geometry.SwathDefinition(lons=tar_lon, lats=tar_lat)
result = pyresample.bilinear.resample_bilinear(gmt_1500, source_def, target_def, radius=50e3, neighbours=32, nprocs=1, fill_value=None, reduce_data=True, segments=None, epsilon=0)
I am getting the following error (which is understood as it needs an AreaDefinition for target):
AttributeError: 'SwathDefinition' object has no attribute 'proj_str'
Is there any other way of doing this?
I'm not familiar with the pyresample package, but for bilinear interpolation in python I suggest referring to this earlier stackexchange thread which gives a number of useful examples:
How to perform bilinear interpolation in Python
p.s: By the way, if anyone wants to perform this task from the command line, you can also extract a set of points using bilinear interpolation with cdo too
# some bash loop over a pairs of x and y
cdo remapbil,lon=${x}/lat=${x} in.nc mypoint_${x}_${y}.nc
Related
I have a netcdf file with a spatial resolution of 0.05º and I want to regrid it to a spatial resolution of 0.01º like this other netcdf. I tried using scipy.interpolate.griddata, but I am not really getting there, I think there is something that I am missing.
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
According to scipy.interpolate.griddata documentation, I need to construct my interpolation pipeline as following:
grid = griddata(points, values, (grid_x_new, grid_y_new),
method='nearest')
So in my case, I assume it would be as following:
#Saving in variables the old and new grids
grid_x_new = target_dataset['lon']
grid_y_new = target_dataset['lat']
grid_x_old = original_dataset ['lon']
grid_y_old = original_dataset ['lat']
points = (grid_x_old,grid_y_old)
values = original_dataset['analysed_sst'] #My variable in the netcdf is the sea surface temp.
Now, when I run griddata:
from scipy.interpolate import griddata
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I am getting the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single
shape
I assume it has something to do with the lat/lon array shapes. I am quite new to netcdf field and don't really know what can be the issue here. Any help would be very appreciated!
In your original code the indices in grid_x_old and grid_y_old should correspond to each unique coordinate in the dataset. To get things working correctly something like the following will work:
import xarray as xr
from scipy.interpolate import griddata
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
#Saving in variables the old and new grids
grid_x_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
grid_x_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
values = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon", "analysed_sst"]].analysed_sst
points = (grid_x_old,grid_y_old)
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I recommend using xesm for regridding xarray datasets. The code below will regrid your dataset:
import xarray as xr
import xesmf as xe
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
regridder = xe.Regridder(original_dataset, target_dataset, "bilinear")
dr_out = regridder(original_dataset)
I need to regrid data on a irregular grid (lambert conical) to a regular grid. I think pyresample is my best bet. Infact my original lat,lon are not 1D (which seems to be needed to use basemap.interp or scipy.interpolate.griddata).
I found this SO's answer helpful. However I get empty interpolated data. I think it has to do with the choice of my radius of influence and with the fact that my data are wrapped (??).
This is my code:
import numpy as np
from matplotlib import pyplot as plt
import netCDF4
%matplotlib inline
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR/Dailies/monolevel/hlcy.2009.nc"
SRHtemp = netCDF4.Dataset(url).variables['hlcy'][0,::]
Y_n = netCDF4.Dataset(url).variables['y'][:]
X_n = netCDF4.Dataset(url).variables['x'][:]
T_n = netCDF4.Dataset(url).variables['time'][:]
lat_n = netCDF4.Dataset(url).variables['lat'][:]
lon_n = netCDF4.Dataset(url).variables['lon'][:]
lat_n and lon_n are irregular and the latitude and longitude corresponding to the projected coordinates x,y.
Because of the way lon_n is, I added:
lon_n[lon_n<0] = lon_n[lon_n<0]+360
so that now if I plot them they look nice and ok:
Then I create my new set of regular coordinates:
XI = np.arange(148,360)
YI = np.arange(0,87)
XI, YI = np.meshgrid(XI,YI)
Following the answer above I wrote the following code:
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest
def_a = SwathDefinition(lons=XI, lats=YI)
def_b = SwathDefinition(lons=lon_n, lats=lat_n)
interp_dat = resample_nearest(def_b,SRHtemp,def_a,radius_of_influence = 70000,fill_value = -9.96921e+36)
the resolution of the data is about 30km, so I put 70km, the fill_value I put is the one from the data, but of course I can just put zero or nan.
however I get an empty array.
What do I do wrong? also - if there is another way of doing it, I am interested in knowing it. Pyresample documentation is a bit thin, and I need a bit more help.
I did find this answer suggesting to use another griddata function:
import matplotlib.mlab as ml
resampled_data = ml.griddata(lon_n.ravel(), lat_n.ravel(),SRHtemp.ravel(),XI,YI,interp = "linear")
and it seems to be ok:
But I would like to understand more about pyresample, since it seems so powerful.
The problem is that XI and XI are integers, not floats. You can fix this by simply doing
XI = np.arange(148,360.)
YI = np.arange(0,87.)
XI, YI = np.meshgrid(XI,YI)
The inability to handle integer datatypes is an undocumented, unintuitive, and possibly buggy behavior from pyresample.
A few more notes on your coding style:
It's not necessary to overwrite the XI and YI variables, you don't gain much by this
You should just load the netCDF dataset once and the access the variables via that object
I am trying to splice a fits array based on the latitudes provided from the Header. However, I cannot seem to do so with my knowledge of Python and the documentation of astropy. The code I have is something like this:
from astropy.io import fits
import numpy as np
Wise1 = fits.open('Image1.fits')
im1 = Wise1[0].data
im1 = np.where(im1 > *latitude1, 0, im1)
newhdu = fits.PrimaryHDU(im1)
newhdulist = fits.HDUList([newhdu])
newhdulist.writeto('1b1_Bg_Removed_2.fits')
Here latitude1 would be a value in degrees, recognized after being called from the header. So there are two things I need to accomplish:
How to call the header to recognize Galactic Latitudes?
Splice the array in such a way that it only contains values for the range of latitudes, with everything else being 0.
I think by "splice" you mean "cut out" or "crop", based on the example you've shown.
astropy.nddata has a routine for world-coordinate-system-based (i.e., lat/lon or ra/dec) cutouts
However, in the simple case you're dealing with, you just need the coordinates of each pixel. Do this by making a WCS:
from astropy import wcs
w = wcs.WCS(Wise1[0].header)
xx,yy = np.indices(im.shape)
lon,lat = w.wcs_pix2world(xx,yy,0)
newim = im[lat > my_lowest_latitude]
But if you want to preserve the header information, you're much better off using the cutout tool, since you then do not have to manually manage this.
from astropy.nddata import Cutout2D
from astropy import coordinates
from astropy import units as u
# example coordinate - you'll have to figure one out that's in your map
center = coordinates.SkyCoord(mylon*u.deg, mylat*u.deg, frame='fk5')
# then make an array cutout
co = nddata.Cutout2D(im, center, size=[0.1,0.2]*u.arcmin, wcs=w)
# create a new FITS HDU
hdu = fits.PrimaryHDU(data=co.data, header=co.wcs.to_header())
# write to disk
hdu.writeto('cropped_file.fits')
An example use case is in the astropy documentation.
I am currently working with BUFR files with wind data. When I read this file on python I get 4 large vectors, latitude vector, longitude vector, wind_direction vector, and wind_speed vector.
Both wind vectors are masked python arrays because there is non-valid data. This happens because the data comes from a non-geostationary satellite. In fact I successfully generated the following image from this BUFR file to show you the general shape that the data takes.
In this image I have plotted a color field to represent the wind speed, while the arrows obviously represent the wind direction.
Please notice the two bands of actual data. Unfortunately the way I am plotting the data, generates a third band (where the color field is smooth), in-between the actual data bands. This is an artefact of the function pcolormesh. If I could superimpose two `pcolormesh plots, each one representing one of the bands, this problem would disappear.
Unfortunately, I do not know how I could separate the data "regions". I have thought about clustering techniques but do not know how to cluster along latlon data using ANOTHER array (the wind data) as the clustering rule.
This is my current code:
#!/usr/bin/python
import bufr
import numpy as np
import sys
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
from matplotlib import mlab
WIND_DIR_INDEX = 97
WIND_SPEED_INDEX = 96
bfrfile = sys.argv[1]
print bfrfile
bfr = bufr.BUFRFile(bfrfile)
lon = []
lat = []
wind_d = []
wind_s = []
for record in bfr:
for entry in record:
if entry.index == WIND_DIR_INDEX:
wind_d.append(entry.data)
if entry.index == WIND_SPEED_INDEX:
wind_s.append(entry.data)
if entry.name.find("LONGITUDE") == 0:
lon.append(entry.data)
if entry.name.find("LATITUDE") == 0:
lat.append(entry.data)
lons = np.concatenate(lon)
lats = np.concatenate(lat)
winds_d = np.concatenate(wind_d)
winds_s = np.concatenate(wind_s)
winds_d = np.ma.masked_greater(winds_d,1.0e+6)
winds_s = np.ma.masked_greater(winds_s,1.0e+6)
windu = np.cos((winds_d-180)*(np.pi/180))
windv = np.sin((winds_d-180)*(np.pi/180))
# Data interpolation for pcolormesh (needs gridded data)
xi = np.linspace(lons.min(),lons.max(),lons.size/10)
yi = np.linspace(lats.min(),lats.max(),lats.size/10)
Z = mlab.griddata(lons,lats,winds_s,xi,yi)
X,Y = np.meshgrid(xi,yi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
plt.quiver(lons[::5],lats[::5],windu[::5],windv[::5],linewidths=0)
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat.png',bbox_inches=0,dpi=5*mydpi)
mydpi = 96
fig = plt.figure(frameon=True)
fig.set_size_inches(1600/mydpi,1200/mydpi)
ax = plt.Axes(fig,[0,0,1,1])
#ax.set_axis_off()
fig.add_axes(ax)
plt.hold(True);
try:
plt.pcolormesh(X,Y,Z,alpha=None)
plt.clim(0,10)
except ValueError:
pass
print "Warning: Empty data array."
for method in (ax.set_xticks,ax.set_xticklabels,ax.set_yticks,ax.set_yticklabels):
method([])
fig.savefig('/home/cendas/bin/python/bufr_ascat_color.png',bbox_inches=0,dpi=5*mydpi)
I then usually follow this python code with the following terminal commands to combine the images:
convert bufr_ascat.png -transparent white bufr_ascat.png
convert bufr_ascat_color.png -transparent white bufr_ascat_color.png
composite bufr_ascat.png bufr_ascat_color.png bufrascat.png
Don't abuse clustering for this.
What you need is a simple selection / filtering; not a structure discovery process.
Choose the mean of the masked data. All non-masked data left of that mean is the left part, all non-masked data on the right is the other?
Clustering is the wrong tool for this task.
In my application, the data data is sampled on a distorted grid, and I would like to resample it to a nondistorted grid. In order to test this, I wrote this program with examplary distortions and a simple function as data:
from __future__ import division
import numpy as np
import scipy.interpolate as intp
import pylab as plt
# Defining some variables:
quadratic = -3/128
linear = 1/16
pn = np.poly1d([quadratic, linear,0])
pixels_x = 50
pixels_y = 30
frame = np.zeros((pixels_x,pixels_y))
x_width= np.concatenate((np.linspace(8,7.8,57) , np.linspace(7.8,8,pixels_y-57)))
def data(x,y):
z = y*(np.exp(-(x-5)**2/3) + np.exp(-(x)**2/5) + np.exp(-(x+5)**2))
return(z)
# Generating grid coordinates
yt = np.arange(380,380+pixels_y*4,4)
xt = np.linspace(-7.8,7.8,pixels_x)
X, Y = np.meshgrid(xt,yt)
Y=Y.T
X=X.T
Y_m = np.zeros((pixels_x,pixels_y))
X_m = np.zeros((pixels_x,pixels_y))
# generating distorted grid coordinates:
for i in range(pixels_y):
Y_m[:,i] = Y[:,i] - pn(xt)
X_m[:,i] = np.linspace(-x_width[i],x_width[i],pixels_x)
# Sample data:
for i in range(pixels_y):
for j in range(pixels_x):
frame[j,i] = data(X_m[j,i],Y_m[j,i])
Y_m = Y_m.flatten()
X_m = X_m.flatten()
frame = frame.flatten()
##
Y = Y.flatten()
X = X.flatten()
ipf = intp.interp2d(X_m,Y_m,frame)
interpolated_frame = ipf(xt,yt)
At this point, I have to questions:
The code works, but I get the the following warning:
Warning: No more knots can be added because the number of B-spline coefficients
already exceeds the number of data points m. Probably causes: either
s or m too small. (fp>s)
kx,ky=1,1 nx,ny=54,31 m=1500 fp=0.000006 s=0.000000
Also, some interpolation artifacts appear, and I assume that they are related to the warning - Do you guys know what I am doing wrong?
For my actual applications, the frames need to be around 500*100, but when doing this, I get a MemoryError - Is there something I can do to help that, apart from splitting the frame into several parts?
Thanks!
This problem is most likely related to the usage of bisplrep and bisplev within interp2d. The docs mention that they use a smooting factor of s=0.0 and that bisplrep and bisplev should be used directly if more control over s is needed. The related docs mention that s should be found between (m-sqrt(2*m),m+sqrt(2*m)) where m is the number of points used to construct the splines. I had a similar problem and found it solved when using bisplrep and bisplev directly, where s is only optional.
For 2d interpolation,
griddata
is solid, local, fast.
Take a look at problem-with-2d-interpolation-in-scipy-non-rectangular-grid on SO.
You might want to look at the following interp method in basemap:
mpl_toolkits.basemap.interp
http://matplotlib.sourceforge.net/basemap/doc/html/api/basemap_api.html
unless you really need spline-based interpolation.