Interpolating gridded data to a geographical point location - python

I am a big fan of MetPy and had a look at their interpolation functions (https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.html) but could not find what I was looking for.
I am looking for a function to interpolate a gridded 2D (lon and lat) or 3D (lon, lat and vertical levels) climate data field to a specific geographic location (lat/lon).
The function would take 5 arguments: a 2D/3D data variable and its associated latitude and longitude variables, plus the two desired latitude and longitude coordinate values. It would return either a single value (for a 2D field) or a vertical profile (for a 3D field).
I am basically looking for an equivalent to the old Basemap function bm.interp(). Cartopy does not have an equivalent. The CDO (Climate Data Operators) operator 'remapbil,lon=/lat=' does the same thing, but it works directly on netCDF files from the command line; I'm looking for a Python solution.
I think such a function would be a useful addition to the MetPy library as it allows for comparing gridded data (e.g., model or satellite data) with point observations such as from weather stations or radiosonde profiles (treated as just a vertical profile here).
Can you point me in the right direction?

I think what you're looking for already exists in scipy.interpolate (scipy is one of MetPy's dependencies). Here we can use interpn to interpolate linearly in n dimensions:
import numpy as np
from scipy.interpolate import interpn
# Array of synthetic grid to interpolate--ordered z,y,x
a = np.arange(24).reshape(2, 3, 4)
# Locations of grid points along each dimension
z = np.array([1.5, 2.5])
y = np.array([-1., 0., 1.])
x = np.array([-3.5, -1, 1, 3.5])
interpn((z, y, x), a, (2., 0.5, 2.))
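Applied to the question's use case, a minimal sketch might look like this (the grid coordinates and the random field are synthetic stand-ins for real data); querying every level at the same lat/lon returns a vertical profile:
import numpy as np
from scipy.interpolate import interpn
# Synthetic 3D field ordered (level, lat, lon)
levels = np.array([500., 850., 1000.])
lats = np.linspace(-90., 90., 73)
lons = np.linspace(-180., 180., 145)
field = np.random.rand(levels.size, lats.size, lons.size)
# One query per level at the same point gives a vertical profile
target_lat, target_lon = 52.1, 5.2
queries = np.column_stack([levels,
                           np.full(levels.size, target_lat),
                           np.full(levels.size, target_lon)])
profile = interpn((levels, lats, lons), field, queries)  # shape (levels.size,)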

This can be done easily with my nctoolkit package (https://nctoolkit.readthedocs.io/en/latest/). It uses CDO as a backend, and defaults to bilinear interpolation. The following would regrid a .nc file to a single grid point and then convert it to an xarray dataset.
import nctoolkit as nc
import pandas as pd
data = nc.open_data("example.nc")
grid = pd.DataFrame({"lon":[0], "lat":[50]})
data.regrid(grid)
ds = data.to_xarray()

To add one more solution, if you're already using multidimensional netCDF files and want a Python solution: check out xarray's interpolation tools. They support multidimensional, label-based interpolation with usage similar to xarray's indexing interface. These are built on top of the same scipy.interpolate otherwise mentioned, and xarray is also a MetPy dependency.
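For example, a minimal sketch, assuming a dataset with coordinates named lat/lon and a variable named temperature (both names are placeholders):
import xarray as xr
ds = xr.open_dataset("example.nc")
# Linear interpolation to a point: a scalar for a 2D field, a profile for a 3D field
point = ds["temperature"].interp(lat=52.1, lon=5.2)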

Related

What is the correct way to reproject a raster from a CRS to another using Python?

I have a raster of Land Cover data (specifically this one /eodata/auxdata/S2GLC/2017/S2GLC_T32TMS_2017 in https://finder.creodias.eu) that uses 'epsg:32632' as CRS. I want to reproject this raster on 'epsg:21781'. This is what the raster looks like when I open it with xarray.
fn = 'data/S2GLC_T32TMS_2017/S2GLC_T32TMS_2017.tif'
da = xr.open_rasterio(fn).sel(band=1, drop=True)
da
<xarray.DataArray (y: 10980, x: 10980)>
[120560400 values with dtype=uint8]
Coordinates:
  * y        (y) float64 5.2e+06 5.2e+06 5.2e+06 ... 5.09e+06 5.09e+06 5.09e+06
  * x        (x) float64 4e+05 4e+05 4e+05 ... 5.097e+05 5.097e+05 5.098e+05
Attributes:
    transform:      (10.0, 0.0, 399960.0, 0.0, -10.0, 5200020.0)
    crs:            +init=epsg:32632
    res:            (10.0, 10.0)
    is_tiled:       0
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Area
    INTERLEAVE:     BAND
My usual workflow was to transform all the point coordinates, create my destination grid and interpolate using nearest neighbors. Something that looks like this:
import numpy as np
import xarray as xr
import pyproj
from scipy.interpolate import griddata
y = da.y.values
x = da.x.values
xx, yy = np.meshgrid(x, y)
# (n,2) point coordinates in the original CRS
src_coords = np.column_stack([xx.flatten(), yy.flatten()])
transformer = pyproj.transformer.Transformer.from_crs('epsg:32632', 'epsg:21781')
xx, yy = transformer.transform(src_coords[:, 0], src_coords[:, 1])
# (n,2) point coordinates in the destination CRS, which are not on a regular grid
dst_coords = np.column_stack([xx.flatten(), yy.flatten()])
# I define my destination **regular** grid coordinates
x = np.linspace(620005, 719995, 10)
y = np.linspace(199995, 100005, 10)
xx, yy = np.meshgrid(x, y)
dst_shape = xx.shape
dst_grid = np.column_stack([xx.flatten(), yy.flatten()])
# I interpolate the reprojected source points onto the regular destination grid
reprojected_array = griddata(
    dst_coords, da.values.flatten(), dst_grid, method='nearest'
).reshape(dst_shape)
Although this method is fairly transparent and (apparently) error-free, it can take a very long time when dealing with billions of points. Recently, I discovered rasterio's reproject function, and I was blown away by how fast it is. This is how I implemented it:
from rasterio.warp import reproject, Resampling
source = da.values
destination = np.zeros(dst_shape, np.int16)
res, aff = reproject(
    source,
    destination,
    src_transform=src_transform,  # affine transformation from the original data
    src_crs=src_crs,
    dst_transform=dst_transform,  # affine transformation corresponding to the grid defined in the other approach
    dst_crs=dst_crs,
    resampling=Resampling.nearest)  # using nearest neighbors, just like with scipy's griddata
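For reference, dst_transform for a regular 10 m grid like the one above can be built with rasterio; a minimal sketch, where the upper-left corner coordinates are illustrative rather than taken from my data:
from rasterio.transform import from_origin
# from_origin(west, north, xsize, ysize): upper-left corner and pixel sizes
dst_transform = from_origin(620000.0, 200000.0, 10.0, 10.0)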
Naturally I wanted to compare the results expecting them to be the same, but they were not, as you can see in the figure.
The resolution is 10 meters so the differences are not large, but after careful comparison with precise satellite data in the 'epsg:21781' coordinates, it looks like the old approach yields better results.
So my questions are:
- why do these results differ?
- is one approach better than the other? Are there specific conditions where one should be preferred?
griddata finds nearest points in Euclidean distance, on whatever map projection you give it. Thus the nearest neighbours from a pipeline like
  4326 data points --> reproject --> nearest-Euclidean griddata query points
depend on the "reproject". Could you try +proj=sinu +lon_0=<middle lon> for both data and query points?
What one really wants is a nearest-neighbour engine with great-circle distance, not Euclidean distance. The difference may be insignificant for small grids, or near the equator, but less so in Finland: cos 61° / cos 60° is ~97 %.
TL;DR: Is pyproj.transformer.Transformer.from_crs('epsg:32632', 'epsg:21781') "correct"? I don't know. I see no test suite, and a couple of open issues ("warp.reproject() generates the wrong result", a round-trip test).
"Nearest neighbour" is ill-defined / sensitive halfway between data points, e.g. along the lines x or y = int + 0.5 on an integer grid; this is easy to test with a KDTree.
xarray makes regular (Cartesian) grids easy, but AFAIK does not handle curvilinear (2D) grids.
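A quick sketch of such a great-circle nearest-neighbour search, using the standard trick of converting lat/lon to 3D unit vectors (chord distance on the unit sphere is monotone in great-circle distance, so a Euclidean KDTree finds the same nearest neighbours); the data and query arrays are placeholders:
import numpy as np
from scipy.spatial import cKDTree
def to_unit_xyz(lat_deg, lon_deg):
    # lat/lon in degrees -> points on the unit sphere
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack([np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)])
# Placeholder data and query points
data_lat = np.array([60.0, 60.5, 61.0])
data_lon = np.array([24.0, 25.0, 26.0])
query_lat, query_lon = np.array([60.2]), np.array([24.8])
tree = cKDTree(to_unit_xyz(data_lat, data_lon))
chord_dist, idx = tree.query(to_unit_xyz(query_lat, query_lon))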

Working with netCDF in Python with matplotlib

So I am pretty new to programming, currently doing some research on netCDF .nc files to work with in Python. I have the following code, which I am sure will not work. The objective here is to plot a simple line graph of the 10m u-component of wind over time.
The problem, I think, is that the 10m u-component of wind has a 4D shape of (time=840, expver=2, latitude=19, longitude=27) while time has only (time=840).
Any replies or ideas will be much appreciated! The code is as shown:
from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
nc = Dataset(r'C:\WAIG\Python\ERA5_py01\Downloaded_nc_file/test_era5.nc','r')
for i in nc.variables:
    print(i)
lat = nc.variables['latitude'][:]
lon = nc.variables['longitude'][:]
time = nc.variables['time'][:]
u = nc.variables['u10'][:]
plt.plot(np.squeeze(u), np.squeeze(time))
plt.show()
Right: you have winds that represent the 10m wind at every location of the model grid (lat, lon), and the variable is also dimensioned on expver--not sure what that one is. You need to select a subset of u. For example, let's pick expver index 1, and lat/lon indexes of lat=8, lon=12 (not sure where those are going to be):
expver_index = 1
lat_index = 8
lon_index = 12
# ':' below means "full slice"--take everything along that dimension
plt.plot(time, u[:, expver_index, lat_index, lon_index])
plt.title(f'U at latitude {lat[lat_index]}, longitude {lon[lon_index]}')
plt.show()
Have you tried using xarray?
I think that it will be easier for you to read the netCDF4 file and plot it using matplotlib.
This is straightforward:
import xarray as xr
ds = xr.open_dataset(r'C:\WAIG\Python\ERA5_py01\Downloaded_nc_file/test_era5.nc')
This line will plot the timeseries of the horizontal mean U10
ds['u10'].mean(['longitude','latitude']).plot()
You can also select by the value or index in a specific dimension using sel and isel methods:
This line selects the 10th latitude and 5th longitude and plots it. In this case I am interested in specific indexes for latitude and longitude, not in the real units:
ds['u10'].isel(latitude=10,longitude=5).plot()
This line selects the nearest values of latitude and longitude to the given ones and plots it. In this case, I am interested in the values of latitude and longitude in real units:
ds['u10'].sel(latitude=-15,longitude=40,method='nearest').plot()
See their documentation to learn more about xarray.
I hope this solution is better for your case, and that it also introduces you to this great tool. Please let me know if you still need some help with this.

Create 3D triangulated mesh from scratch

What I am trying to do is to create a 3D triangulated mesh that can be parsed into a .vtk or .stl file for use in 3D printing applications. Right now I am stuck on the creation of the triangle mesh. The geometry I want to create is basically three-dimensional sine waves that have a certain thickness and intersect each other. So far I have got one sine wave. Here's a MWE:
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage
import scipy.spatial
# create empty 3d array
array = np.zeros((100, 100, 100))
# create 3D sine wave in empty array
strut = np.sin(np.linspace(1, 10, 100)) * 12
for i, val in enumerate(strut):
    y_shift = int(np.round(val))
    array[i, 50 + y_shift, 50] = 1
pattern = np.ones((4, 4, 4))
# convolve the array with the pattern / apply thickness
conv_array = ndimage.convolve(array, pattern)
# create list with data coordinates from convolved array
data = list()
for j in range(conv_array.shape[0]):
    for k in range(conv_array.shape[1]):
        for l in range(conv_array.shape[2]):
            if conv_array[j, k, l] != 0:
                data.append([j, k, l])
data = np.asarray(data)
tri = scipy.spatial.Delaunay(data)
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_trisurf(data[:, 0], data[:, 1], data[:, 2], triangles=tri.simplices)
plt.show()
What it does: I create an empty array which I fill with a sine wave represented by ones. I convolve that array with a rectangular array of a defined size, which gives me a thicker sine wave in space. Then the array gets converted into coordinate form so that it can be triangulated using Delaunay triangulation. What I get is this:
[plot of the triangulated result]
As you can see, the triangulation kind of worked, but it fills the space between the sine wave's amplitudes. Is there a way to remove the filled space, or prevent it from being created in the first place? The sine wave also looks wrong at the ends and I am not sure why. Is this even the best method to achieve what I am trying to do?
The parsing to a .vtk file should not present a problem, but I need a clean structure first. Thanks in advance for any kind of help!
I would not reinvent the wheel and do all that on my own. Instead, use python-vtk and ParaView (a post-processing application for 3D data) to do the triangulation for you. "Just" create the points and do the rest in that application.
I don't know much about 3D printing, but I know my fair share about STL and VTK. It is a pain to do manually, and the VTK library has some nice Python examples and a dedicated STLWriter. You just need to wrap your head around the workflow of VTK and how it manages things internally. This is where ParaView comes in quite handy: it enables you to record the actions you take in the GUI and display them as Python code. This is great for learning how it works internally.
Finally I got something very close to what I want. In case someone is interested in the answer: instead of going with the point-cloud approach, I dug into VTK (which is a pain to learn, but has a lot of functionality) with Python.
My algorithm is basically this (a sketch follows the list):
1. Approximate the sine wave as a simple triangular wave first.
2. Feed the x, y and z coordinates of the wave into a vtkPoints object.
3. Use vtkParametricSpline to get a smooth wave.
4. Use vtkSplineFilter to control the smoothness of the wave.
5. Use vtkTubeFilter to create a volume from the line.
6. Use vtkTriangleFilter for meshing.
7. Write the result with vtkSTLWriter.
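A minimal sketch of that pipeline, assuming the vtk Python package; the point coordinates, resolutions and tube radius below are placeholders, not the values from my actual script:
import vtk
# coarse triangular-wave approximation of the strut (placeholder coordinates)
points = vtk.vtkPoints()
for i in range(11):
    points.InsertNextPoint(10.0 * i, 12.0 * (i % 2), 0.0)
# smooth parametric spline through the points
spline = vtk.vtkParametricSpline()
spline.SetPoints(points)
spline_source = vtk.vtkParametricFunctionSource()
spline_source.SetParametricFunction(spline)
spline_source.SetUResolution(200)
# resample the line to control smoothness
spline_filter = vtk.vtkSplineFilter()
spline_filter.SetInputConnection(spline_source.GetOutputPort())
spline_filter.SetSubdivideToSpecified()
spline_filter.SetNumberOfSubdivisions(200)
# give the line a thickness
tube = vtk.vtkTubeFilter()
tube.SetInputConnection(spline_filter.GetOutputPort())
tube.SetRadius(2.0)
tube.SetNumberOfSides(24)
tube.CappingOn()
# triangulate and write out an STL file
tri_filter = vtk.vtkTriangleFilter()
tri_filter.SetInputConnection(tube.GetOutputPort())
writer = vtk.vtkSTLWriter()
writer.SetFileName("strut.stl")
writer.SetInputConnection(tri_filter.GetOutputPort())
writer.Write()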

Create 2D projection of 3D matrix in python

Short version: I have a NxNxN matrix full of different values. I want to create a 2D projection of it looking exactly like this: http://tinyurl.com/bellfkn (3D if possible too!)
Long version: I have made a density matrix of dimension NxNxN with the following loop:
ndim = 512
massmat = np.zeros((ndim, ndim, ndim))
for i in range(0, npoints):
    massmat[int(x1[i]), int(y1[i]), int(z1[i])] += mpart
densemat = massmat / volumeofcell
massmat is a numpy array.
So basically I now have a NxNxN matrix with certain cells containing in this case, a density (units of g/cm^3). Is there a way to turn this into a 2D projection - a side-on view of the densities with a colorbar indicating dense areas and less dense areas?
In Matlab I would just do:
imageArray2Dmesh = mean(densemat, 3);
figure
sc(imageArray2Dmesh, 'pink')
And it gives me a density projection - I'd like to do the same but in Python. Is there a way to view the whole NxNxN matrix in a 3D projection too? Just like the link but in 3D. That would be great.
You can use very similar code in numpy and matplotlib:
import numpy as np
import matplotlib.pyplot as plt
# averaging over the third axis collapses the cube to a 2D side-on projection
imageArray2Dmesh = np.mean(densemat, axis=2)
plt.figure()
plt.pcolor(imageArray2Dmesh, cmap=plt.cm.pink)
plt.colorbar()
plt.show()
You need a couple more commands, but that is just down to the different approaches to graphics in Matlab and matplotlib (hint: in the long run, the matplotlib way is better).
If you want the projection along another direction, just change the axis parameter (remember that Python indexes from 0, not from 1 like Matlab).
A projection along a generic direction is quite a bit more difficult.
By the way, if you need to look at some 3D data, I strongly suggest you spend some time exploring mayavi. It's still a Python library, and it's really powerful for 3D imaging:
http://docs.enthought.com/mayavi/mayavi/auto/examples.html
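For instance, a minimal sketch (with a random cube standing in for the question's densemat):
import numpy as np
from mayavi import mlab
# random stand-in for the density matrix from the question
densemat = np.random.rand(64, 64, 64)
mlab.contour3d(densemat)  # isosurfaces of the 3D scalar field
mlab.show()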

Interpolation over an irregular grid

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO answer inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local, and the k = number of nearest neighbours can be varied to trade off speed against accuracy.
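A minimal sketch of that combination, with placeholder data (the helper name and parameter values are mine, not from the linked answer):
import numpy as np
from scipy.spatial import cKDTree
def idw_kdtree(points, values, queries, k=8, power=2, eps=1e-12):
    # inverse-distance weighting over the k nearest neighbours of each query
    tree = cKDTree(points)
    dist, idx = tree.query(queries, k=k)
    weights = 1.0 / (dist ** power + eps)  # eps avoids division by zero on exact hits
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * values[idx]).sum(axis=1)
# placeholder scattered data and ~10,000 query points
points = np.random.rand(400 * 400, 2)
values = np.random.rand(400 * 400)
queries = np.random.rand(10000, 2)
estimates = idw_kdtree(points, values, queries)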
There is a nice inverse-distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
This is of course for a regular grid, but assuming you project the data first to a pixel grid with pyproj or something, all the while being careful what projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i])*(x - xv[i]) + (y - yv[i])*(y - yv[i]) + smoothing*smoothing)
        # If the point is really close to one of the data points, return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value
def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid
if __name__ == "__main__":
    power = 1
    smoothing = 20
    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    # Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    # Plotting the result
    n = plt.Normalize(0.0, 100.0)  # color normalization for a 0-100 range
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data... However, I don't know of an out-of-the-box solution for you.
You say your input data comes from a tripolar grid. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2: instead of interpolating in LAT, LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point (you can use a BSP or grid-type structure to speed up this search), pick one of the cells, and interpolate inside it.
Finally, there is a heap of unstructured interpolation options... but they tend to be slow. A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally, if my mesh were case 1, I'd use an unstructured strategy, as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
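For the Delaunay route, scipy has this packaged: LinearNDInterpolator triangulates the scattered points internally and interpolates linearly on the mesh. A sketch with random stand-ins for the question's LAT/LON/T arrays:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
# random stand-ins for the question's 400x400 curvilinear grid
LAT = np.random.uniform(-80., 80., (400, 400))
LON = np.random.uniform(-180., 180., (400, 400))
T = np.random.rand(400, 400)
# ~10,000 scattered target points
lat1 = np.random.uniform(-80., 80., 10000)
lon1 = np.random.uniform(-180., 180., 10000)
interp = LinearNDInterpolator(np.column_stack([LON.ravel(), LAT.ravel()]), T.ravel())
t1 = interp(np.column_stack([lon1, lat1]))  # NaN outside the convex hull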
I suggest taking a look at GRASS (an open-source GIS package) and its interpolation features (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR which is very suitable for this problem. With a few modifications you can make it usable in Python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
