Issues with Scipy interpolate griddata - python

I have a netcdf file with a spatial resolution of 0.05º and I want to regrid it to a spatial resolution of 0.01º like this other netcdf. I tried using scipy.interpolate.griddata, but I am not really getting there, I think there is something that I am missing.
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
According to scipy.interpolate.griddata documentation, I need to construct my interpolation pipeline as following:
grid = griddata(points, values, (grid_x_new, grid_y_new),
method='nearest')
So in my case, I assume it would be as following:
#Saving in variables the old and new grids
grid_x_new = target_dataset['lon']
grid_y_new = target_dataset['lat']
grid_x_old = original_dataset ['lon']
grid_y_old = original_dataset ['lat']
points = (grid_x_old,grid_y_old)
values = original_dataset['analysed_sst'] #My variable in the netcdf is the sea surface temp.
Now, when I run griddata:
from scipy.interpolate import griddata
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I am getting the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single
shape
I assume it has something to do with the lat/lon array shapes. I am quite new to netcdf field and don't really know what can be the issue here. Any help would be very appreciated!

In your original code the indices in grid_x_old and grid_y_old should correspond to each unique coordinate in the dataset. To get things working correctly something like the following will work:
import xarray as xr
from scipy.interpolate import griddata
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
#Saving in variables the old and new grids
grid_x_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_old = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
grid_x_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lon
grid_y_new = target_dataset.to_dataframe().reset_index().loc[:,["lat", "lon"]].lat
values = original_dataset.to_dataframe().reset_index().loc[:,["lat", "lon", "analysed_sst"]].analysed_sst
points = (grid_x_old,grid_y_old)
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')

I recommend using xesm for regridding xarray datasets. The code below will regrid your dataset:
import xarray as xr
import xesmf as xe
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset= xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
regridder = xe.Regridder(original_dataset, target_dataset, "bilinear")
dr_out = regridder(original_dataset)

Related

Bilinear interpolation from structured grid to unstructured grid (arbitrary points)

I need to interpolate bilinearly some air data of a hdf4/netcdf4/hdf5 file from a 240x240 structured grid on an arbitrary collection of coordinates. I have no idea on how to do this. I have tried using pyresample but that needs an AreaDefinition of target grid which is not possible in my case of unstructured target data (arbitrary points). Here is my code:
import numpy as np
import pyresample
from netCDF4 import Dataset
air_file = Dataset('air.hdf', mode='r')
air_data = air_file.variables['air_2m' ][:].flatten()
air_lon = air_file.variables['air_lon'][:].flatten()
air_lat = air_file.variables['air_lat'][:].flatten()
air_data = air_data.reshape(240,240)
air_lon = air_lon. reshape(240,240) # grid size is 240x240
air_lat = air_lat. reshape(240,240)
tar_lon = 100 * np.random.random((100,1)) # random points
tar_lat = 100 * np.random.random((100,1)) # random points
source_def = pyresample.geometry.SwathDefinition(lons=air_lon, lats=air_lat)
target_def = pyresample.geometry.SwathDefinition(lons=tar_lon, lats=tar_lat)
result = pyresample.bilinear.resample_bilinear(gmt_1500, source_def, target_def, radius=50e3, neighbours=32, nprocs=1, fill_value=None, reduce_data=True, segments=None, epsilon=0)
I am getting the following error (which is understood as it needs an AreaDefinition for target):
AttributeError: 'SwathDefinition' object has no attribute 'proj_str'
Is there any other way of doing this?
I'm not familiar with the pyresample package, but for bilinear interpolation in python I suggest referring to this earlier stackexchange thread which gives a number of useful examples:
How to perform bilinear interpolation in Python
p.s: By the way, if anyone wants to perform this task from the command line, you can also extract a set of points using bilinear interpolation with cdo too
# some bash loop over a pairs of x and y
cdo remapbil,lon=${x}/lat=${x} in.nc mypoint_${x}_${y}.nc

Plotting masked array that has been gridded using griddata

I have a 2D array of satellite data, and two corresponding 2D arrays giving the latitude and longitude of each pixel.
The data array is a masked array.
When I plot it up using pcolormesh, it looks like this:
m.pcolormesh(lon, lat, data)
I am attempting to grid this data on to a 0.25x0.25 deg grid.
lonGrid = arange(0, 360, 0.25)
latGrid = arange(-90, 90 0.25)
dataGridded = griddata(lon.ravel(),lat.ravel(),data.ravel(),latGrid,lonGrid, interp='linear')
m.pcolormesh(lonGrid, latGrid, dataGridded)
However, the resulting plot comes out as this:
It seems like this error has something to do with pcolormesh filling in the space between masked values. But I am unsure how to fix this.
Thanks
EDIT:
I was able to use the scipy version of griddata to get this to work...but its much slower and the syntax is more clunky. I would still appreciate some help getting the mpl(?) version above to work
from scipy.interpolate import griddata as griddata2
lonGrid,latGrid = meshgrid(lonGrid,latGrid)
dataGrid = griddata2((lon.ravel(),lat.ravel()),data.ravel(),(lonGrid,latGrid), method = 'linear')
dataGrid = ma.masked_where((dataGrid < 0) | isnan(dataGrid), dataGrid)
m.pcolormesh(lonGrid, latGrid, dataGridded)
Here are a couple initial troubleshooting ideas.
What version of Numpy are you using? If 1.09 or earlier the .ravel() will not return a masked array if given a masked array. See here.
The data array "wind" became "data". Is "data" truly masked? What happened between the two? Some more code would be useful.
dataGridded = griddata(lon.ravel(),lat.ravel(),XXXX.ravel(),latGrid,lonGrid, interp='linear')

understanding pyresample to regrid irregular grid data to a regular grid

I need to regrid data on a irregular grid (lambert conical) to a regular grid. I think pyresample is my best bet. Infact my original lat,lon are not 1D (which seems to be needed to use basemap.interp or scipy.interpolate.griddata).
I found this SO's answer helpful. However I get empty interpolated data. I think it has to do with the choice of my radius of influence and with the fact that my data are wrapped (??).
This is my code:
import numpy as np
from matplotlib import pyplot as plt
import netCDF4
%matplotlib inline
url = "http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/NARR/Dailies/monolevel/hlcy.2009.nc"
SRHtemp = netCDF4.Dataset(url).variables['hlcy'][0,::]
Y_n = netCDF4.Dataset(url).variables['y'][:]
X_n = netCDF4.Dataset(url).variables['x'][:]
T_n = netCDF4.Dataset(url).variables['time'][:]
lat_n = netCDF4.Dataset(url).variables['lat'][:]
lon_n = netCDF4.Dataset(url).variables['lon'][:]
lat_n and lon_n are irregular and the latitude and longitude corresponding to the projected coordinates x,y.
Because of the way lon_n is, I added:
lon_n[lon_n<0] = lon_n[lon_n<0]+360
so that now if I plot them they look nice and ok:
Then I create my new set of regular coordinates:
XI = np.arange(148,360)
YI = np.arange(0,87)
XI, YI = np.meshgrid(XI,YI)
Following the answer above I wrote the following code:
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest
def_a = SwathDefinition(lons=XI, lats=YI)
def_b = SwathDefinition(lons=lon_n, lats=lat_n)
interp_dat = resample_nearest(def_b,SRHtemp,def_a,radius_of_influence = 70000,fill_value = -9.96921e+36)
the resolution of the data is about 30km, so I put 70km, the fill_value I put is the one from the data, but of course I can just put zero or nan.
however I get an empty array.
What do I do wrong? also - if there is another way of doing it, I am interested in knowing it. Pyresample documentation is a bit thin, and I need a bit more help.
I did find this answer suggesting to use another griddata function:
import matplotlib.mlab as ml
resampled_data = ml.griddata(lon_n.ravel(), lat_n.ravel(),SRHtemp.ravel(),XI,YI,interp = "linear")
and it seems to be ok:
But I would like to understand more about pyresample, since it seems so powerful.
The problem is that XI and XI are integers, not floats. You can fix this by simply doing
XI = np.arange(148,360.)
YI = np.arange(0,87.)
XI, YI = np.meshgrid(XI,YI)
The inability to handle integer datatypes is an undocumented, unintuitive, and possibly buggy behavior from pyresample.
A few more notes on your coding style:
It's not necessary to overwrite the XI and YI variables, you don't gain much by this
You should just load the netCDF dataset once and the access the variables via that object

How do I subset a 2D grid from another 2D grid in python?

I have gridded data over the contiguous United States and I'm trying to select a chunk of it over a specific area.
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt
filename = '/Users/me/myfile.nc'
full_data = Dataset(filename,'r')
latitudes = full_data.variables['latitude'][0,:,:]
longitudes = full_data.variables['longitude'][0,:,:]
temperature = full_data.variables['temperature'][0,:,:]
All three variables are 2-dimensional matrices of shape (337,451). I'm trying to do the following to get a sub-selection of the data over a specific region.
index = (latitudes>=44.0)&(latitudes<=45.0)&(longitudes>=-91.0)&(longitudes<=-89.0)
temp_subset = temperature[index]
lat_subset = latitudes[index]
lon_subset = longitudes[index]
I would expect all three of these variables to be 2-dimensional, but instead they all return a flattened array with a shape of (102,). I've tried another approach:
index2 = np.where((latitudes>=44.0)&(latitudes<=45.0)&(longitudes>=-91.0)&(longitudes<=-89.0))
temp = temperatures[index2[0],:]
temp2 = temp[:,index2[1]]
plt.imshow(temp2,origin='lower')
plt.colobar()
But my data looks quite incorrect. Is there a better way to get a 2D subset grid from a larger grid?
Edub,
I suggest looking on at numpy's matrix indexing documentation, specifically http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html#other-indexing-options . Currently, you are providing two dimensions for indexing, but no slicing information (resulting in only receiving one dimensional results). I hope this proves useful!

How to create a grid from LiDAR points (X,Y,Z) with GDAL python?

I'm new really to python programming, and I was just wondering if you can create a regular grid of 0.5 by o.5 m of resolution using LiDAR points.
My data are in LAS format (reading with from liblas import file as lasfile) and they have the following format: X,Y,Z. Where X and Y are coordinates.
The points are randomly positioned and some pixel are empty (NAN value) and in some pixel there are more of one points. Where there are more of one point, I wish to obtain a mean value. In the end i need to save the data in a TIF format or Ascii format.
I am studying osgeo module and GDAL but I honest to say that i don't know if osgeo module is the best solution.
I am really glad for help with some code that i can study and implement,
Thanks in Advance for the help, I really need.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
return np.round( np.array(a, dtype=float) / round_val) * round_val
# load data
data = np.load( 'shape of ndata, 3')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
grid_x, grid_y = np.mgrid[640000.5:637000:0.5, 6067000.5:6070000:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy and converted to a tiff with the Image library, and then georeferenced in ArcMap. There is probably a way to do that with osgeo but I haven't used it.
Hope this helps someone at least...
You can use the histogram function in Numpy to do binning, for instance:
import numpy as np
points = np.random.random(1000)
#create 10 bins from 0 to 1
bins = np.linspace(0, 1, 10)
means = (numpy.histogram(points, bins, weights=data)[0] /
numpy.histogram(points, bins)[0])
Try LAStools, particularly lasgrid or las2dem.

Categories

Resources