Running out of memory: np.meshgrid - python

I'm struggling with an issue relating to Matplotlib and Numpy.
I am trying to create hillshading on my surface plots.
My input data is an irregular spacing of XYZ points derived from LiDAR.
I can generate a trisurf3D plot or a 3D scatter with no problem: save it, change the camera angles, colour it based on Z and animate it. But for the life of me I can't get any sort of shading in there at all.
I'm getting stuck at Matplotlib requiring 2D arrays for X and Y and Z. My input data is honestly tiny: 376704 points, each with an XYZ value. I have converted the points to a euclidean coordinate system starting at 0:
from laspy.file import File as LAS
import numpy as np
def lasToNumpy(lasFile):
    f = LAS(lasFile,mode='r')
    ## Establish min values
    xmin = min(f.x)
    ymin = min(f.y)
    zmin = min(f.z)
    ## Arrays now in meters from 0 to max
    x = np.array(f.x-xmin)
    y = np.array(f.y-ymin)
    z = np.array(f.z-zmin)
    ## Assign a max of each x and y
    xmax = max(x)
    ymax = max(y)
The issue is my next step is to create a meshgrid (as is seemingly required to generate a 2D array).
This eats about 50GB of RAM:
X, Y = np.meshgrid(x,y)
And rightfully so.
All I want to do is add hillshading to my surface, but the whole 2D array seems so illogically unnecessary! What are my options here? Is this just not going to happen? For reference, my trisurf3D works fine:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(60.0,60.0))
ax = fig.add_subplot(111, projection='3d')
ax.plot_trisurf(x,y,z, cmap='plasma', edgecolor='black', alpha=0.5)
Really want to throw some hill shading in there as well.

This question may be obsolete now, but for other users, the problem here is that you are trying to make a mesh of 376704 points in each direction using np.meshgrid. The purpose of np.meshgrid is to take the x and y ranges and create a grid. For example:
x=np.arange(0,100) #1D array
y=np.linspace(-50,50,1111) # 1D array
xgrid,ygrid=np.meshgrid(x,y) #Outputs 2D arrays
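Only use np.meshgrid if you want to grid your data. One way to solve your problem is to resample your points onto a lower-resolution regular grid with an interpolator for scattered data such as scipy.interpolate.griddata (RegularGridInterpolator only accepts input that already lies on a regular grid), and then create your hill from that grid.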
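To see why the original call runs out of memory, the size of the meshgrid output can be estimated without allocating it. This is a minimal sketch that assumes float64 coordinates; `meshgrid_bytes` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def meshgrid_bytes(nx, ny, itemsize=8):
    """Estimate the memory used by np.meshgrid(x, y) for 1D inputs.

    meshgrid returns two arrays of shape (ny, nx), so the total is
    2 * nx * ny * itemsize bytes (itemsize=8 assumes float64).
    """
    return 2 * nx * ny * itemsize

# Small case from the answer: check the estimate against real arrays.
x = np.arange(0, 100, dtype=np.float64)
y = np.linspace(-50, 50, 1111)
xgrid, ygrid = np.meshgrid(x, y)
small_estimate = meshgrid_bytes(x.size, y.size)

# The point cloud from the question: 376704 values in each direction.
n_points = 376704
huge_estimate_tb = meshgrid_bytes(n_points, n_points) / 1e12
```

Passing the raw point coordinates therefore asks for n_points squared cells per output array, which is far beyond what an ordinary workstation can hold. A grid built from a few hundred values per axis stays small.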
A quicker and better option, in my opinion, is tricontourf. The function takes the 1D arrays that you already have and draws filled contours from them directly, so you can create the figure without building a grid first. If you can't get this to work, update your question with some data.
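For the gridding route mentioned above, here is a rough sketch of how scattered XYZ points could be resampled to a coarse grid and then shaded. It uses scipy.interpolate.griddata (which accepts scattered input) and matplotlib.colors.LightSource for the hillshade; both are my choices rather than something the answer specifies. The synthetic point cloud, the grid size and the function name `grid_and_hillshade` are assumptions for illustration only.

```python
import numpy as np
from scipy.interpolate import griddata
from matplotlib.colors import LightSource

def grid_and_hillshade(x, y, z, nx=200, ny=200, azdeg=315, altdeg=45):
    """Resample scattered points to an (ny, nx) grid and hillshade it."""
    xi = np.linspace(x.min(), x.max(), nx)
    yi = np.linspace(y.min(), y.max(), ny)
    X, Y = np.meshgrid(xi, yi)  # only nx * ny cells, not len(x) * len(y)
    Z = griddata((x, y), z, (X, Y), method="linear")
    # Cells outside the convex hull of the points come back as NaN;
    # fill them with the lowest elevation so the gradient is defined.
    Z = np.where(np.isnan(Z), np.nanmin(Z), Z)
    light = LightSource(azdeg=azdeg, altdeg=altdeg)
    shade = light.hillshade(Z, vert_exag=1.0,
                            dx=xi[1] - xi[0], dy=yi[1] - yi[0])
    return X, Y, Z, shade

# Synthetic stand-in for the LiDAR points (assumption, not real data).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, 5000)
y = rng.uniform(0.0, 100.0, 5000)
z = 10.0 * np.sin(x / 15.0) * np.cos(y / 20.0)

X, Y, Z, shade = grid_and_hillshade(x, y, z, nx=120, ny=80)
```

The `shade` array can be shown with `plt.imshow(shade, cmap="gray", origin="lower")`, or blended with a colormap of `Z`. How fine the grid can be depends on the point density, so the 120 by 80 size here is only a placeholder.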


From Matplotlib Raster to Geoviews/ Holoviews / hvplot: How to transform x, y and z

I understand that Geoviews and Holoviews share common attributes, and Hvplot is meant to be a high level plotting API to all three.
Now, coming from Matplotlib, I still have difficulty adapting to the parameters required to display raster images in Geoviews or Holoviews.
Here's an example, where I am doing a Kernel Density Estimation for spatial data:
# add coordinates of observations
xy_train = np.vstack([y, x]).T
print(xy_train)
# [[5654810.66920637 413645.79802685]
# [5654712.51814666 412629.87266155]
# [5656120.03682466 411642.74943511]
# ...
# [5656316.96943554 411795.80163676]
# [5656299.73356505 411795.50717494]
# [5655756.85624901 411734.34680852]]
# create mesh
xbins=100j
ybins=100j
xx, yy = np.mgrid[left_bound:right_bound:xbins,
                  bottom_bound:top_bound:ybins]
xy_sample = np.vstack([yy.ravel(), xx.ravel()]).T
# compute Kernel Density here
# ..
kde = KernelDensity(kernel='gaussian', bandwidth=100, algorithm='ball_tree')
kde.fit(xy_train)
# get results
z = np.exp(kde.score_samples(xy_sample))
# reshape results to mesh
zz = z.reshape(xx.shape)
# plot in matplotlib
fig, axis = plt.subplots()
levels = np.linspace(zz.min(), zz.max(), 25)
axis.contourf(xx, yy, zz, levels=levels, cmap='viridis')
axis.plot()
plt.show()
Shows my Image:
Now I want to use the pyviz environment for interactive display and map overlay, e.g. using Geoviews.
This somehow works, but gives me an error:
xr_dataset = gv.Dataset(hv.Image((xx, yy, zz.T), datatype=['grid']), crs=ccrs.UTM(zone='33N'))
Image02195: Image dimension(s) x and y are not evenly sampled to relative tolerance of 0.001. Please use the QuadMesh element for
irregularly sampled data or set a higher tolerance on
hv.config.image_rtol or the rtol parameter in the Image constructor.
I can still display the image (somehow low resolution).
gv.tile_sources.CartoLight.opts(width=800, height=480) * xr_dataset.to.image(['x', 'y']).opts(cmap='viridis', alpha=0.5)
.. but when I try to create FilledContours in Geoviews, it doesn't seem to work like in matplotlib:
gv.FilledContours(xx, yy, zz, levels=levels, cmap='viridis')
ValueError: kdims argument expects a Dimension or list of dimensions,
specified as tuples, strings, dictionaries or Dimension instances, not
a ndarray type. Ensure you passed the data as the first argument.
The documentation doesn't provide much info on how I should format dimensions (hv.help(gv.FilledContours)). I think I get lost somewhere when I need to create a Raster from the numpy xx/yy coordinate mesh (hv.Image((xx, yy, zz.T), datatype=['grid'])).
Can someone explain the difference in syntax that is required for matplotlib Contourf and Holoviews/Geoviews/Hvplot FilledContours?
[edit]
I found a way to create contours, but the dimensions problem persists:
# get xarray dataset, suited for handling raster data in pyviz
xr_dataset = gv.Dataset(hv.Image((xx.T, yy.T, zz.T), bounds=(left_bound,bottom_bound,right_bound,top_bound),
                                 kdims=[hv.Dimension('x'), hv.Dimension('y')], datatype=['grid']), crs=ccrs.UTM(zone='33N'))
# Error: Image06593: Image dimension(s) x and y are not evenly sampled to relative tolerance of 0.001
# create contours from image
gv.FilledContours(xr_dataset)
# plot
gv.tile_sources.EsriImagery.opts(width=800, height=480) * gv.FilledContours(xr_dataset).opts(cmap='viridis', alpha=0.5)
The main thing to know about HoloViews/GeoViews elements is that the data is almost always specified as the first argument, which is unlike matplotlib where the data is often specified using multiple arguments. In your case you already had the correct syntax for an Image but didn't carry that over to other elements. So to make this concrete, to construct an Image you would do:
img = gv.Image((xx, yy, zz.T), crs=ccrs.UTM(zone='33N'))
However since you have 2D coordinate arrays rather than the 1D coordinates that an Image expects (in the next release this will error), you actually have a QuadMesh, which gets constructed in the same way:
qmesh = gv.QuadMesh((xx, yy, zz.T), crs=ccrs.UTM(zone='33N'))
And the same is also true for the geoviews FilledContours:
contours = gv.FilledContours((xx, yy, zz.T), crs=ccrs.UTM(zone='33N'))
So to summarize, the difference between HoloViews elements and matplotlib calls is that HoloViews is a lightweight wrapper around your data, which lets you give each coordinate and value array semantic meaning by assigning them to a key or value dimension, while matplotlib makes that mapping more explicit.
HoloViews understands a number of formats for defining gridded data like yours. The simplest is a tuple of x/y-coordinate arrays and the value array; it also understands xarray objects and dictionaries of the different arrays, which would look like this:
contours = gv.FilledContours({'x': xx, 'y': yy, 'z': zz.T}, ['x', 'y'], 'z', crs=ccrs.UTM(zone='33N'))
In this format we can explicitly see how the 'x' and 'y' coordinate arrays are mapped to the key dimensions and the 'z' value array to the value dimensions.
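Since the answer distinguishes 1D coordinates (what an Image expects) from the 2D coordinate arrays that np.mgrid produces, it may help to see how the two relate. The sketch below uses only NumPy; the bounds and the stand-in for the KDE result are made-up values. Whether passing `(xs, ys, zz.T)` to `gv.Image` removes the sampling warning in your version is an assumption I have not verified.

```python
import numpy as np

# Hypothetical UTM bounds, standing in for the question's variables.
left_bound, right_bound = 411000.0, 414000.0
bottom_bound, top_bound = 5654000.0, 5657000.0

# np.mgrid uses 'ij' indexing: xx varies along axis 0, yy along axis 1.
xx, yy = np.mgrid[left_bound:right_bound:100j,
                  bottom_bound:top_bound:50j]
zz = np.sin(xx / 500.0) + np.cos(yy / 700.0)  # stand-in for the KDE values

# A regular mesh is fully described by one 1D coordinate per axis.
xs = xx[:, 0]
ys = yy[0, :]
is_regular = (np.allclose(xx, xs[:, None]) and
              np.allclose(yy, ys[None, :]))

# Image-like containers usually want rows = y and columns = x,
# which is why the examples above pass zz.T rather than zz.
image_array = zz.T
```

Here `xs` and `ys` carry the same information as `xx` and `yy`, and `image_array` has shape `(len(ys), len(xs))`.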

Regrid 2D data onto larger 2D grid at given coordinates in Python

I have a square 2D array data that I would like to add to a larger 2D array frame at some given set of non-integer coordinates coords. The idea is that data will be interpolated onto frame with its center at the new coordinates.
Some toy data:
# A gaussian to add to the frame
x, y = np.meshgrid(np.linspace(-1,1,10), np.linspace(-1,1,10))
data = 50*np.exp(-np.sqrt(x**2+y**2)**2)
# The frame to add the gaussian to
frame = np.random.normal(size=(100,50))
# The desired (x,y) location of the gaussian center on the new frame
coords = 23.4, 22.6
Here's the idea. I want to add this:
to this:
to get this:
If the coordinates were integers (indexes), of course I could simply add them like this:
frame[23:33,22:32] += data
But I want to be able to specify non-integer coordinates so that data is regridded and added to frame.
I've looked into PIL.Image methods but my use case is just for 2D data, not images. Is there a way to do this with just scipy? Can this be done with interp2d or a similar function? Any guidance would be greatly appreciated!
SciPy's shift function from scipy.ndimage is what you are looking for, as long as data and frame share the same grid spacing. If not, look to the other answer. The shift function can take floating point numbers as input and will do a spline interpolation. First, I put the data into an array as large as frame, then shift it, and then add it. Make sure to reverse the coordinate list, as x is the rightmost dimension in numpy arrays. One of the nice features of shift is that it sets to zero those values that go out of bounds.
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import shift
# A gaussian to add to the frame.
x, y = np.meshgrid(np.linspace(-1,1,10), np.linspace(-1,1,10))
data = 50*np.exp(-np.sqrt(x**2+y**2)**2)
# The frame to add the gaussian to
frame = np.random.normal(size=(100,50))
x_frame = np.arange(50)
y_frame = np.arange(100)
# The desired (x,y) location of the gaussian center on the new frame.
coords = np.array([23.4, 22.6])
# First, create an array as large as the frame.
data_large = np.zeros(frame.shape)
data_large[:data.shape[0], :data.shape[1]] = data[:,:]
# Subtract half the distance as the bottom left is at 0,0 instead of the center.
# The shift of 4.5 is because data is 10 points wide.
# Reverse the coords array as x is the last coordinate.
coords_shift = -4.5
data_large = shift(data_large, coords[::-1] + coords_shift)
frame += data_large
# Plot the result and add lines to indicate the coordinates
plt.figure()
plt.pcolormesh(x_frame, y_frame, frame, cmap=plt.cm.jet)
plt.axhline(coords[1], color='w')
plt.axvline(coords[0], color='w')
plt.colorbar()
plt.gca().invert_yaxis()
plt.show()
The script gives you the following figure, which has the desired coordinates indicated with white dotted lines.
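As a quick sanity check of the sub-pixel placement, the centre of mass of the shifted array can be compared with the requested coordinates. This is a sketch without the random background, so that the centroid is meaningful; `scipy.ndimage.center_of_mass` is my addition and not part of the answer.

```python
import numpy as np
from scipy.ndimage import shift, center_of_mass

# Same toy Gaussian as above, placed in the corner of an empty canvas.
x, y = np.meshgrid(np.linspace(-1, 1, 10), np.linspace(-1, 1, 10))
data = 50 * np.exp(-(x**2 + y**2))
canvas = np.zeros((100, 50))
canvas[:data.shape[0], :data.shape[1]] = data

coords = np.array([23.4, 22.6])   # desired (x, y) centre
# The centre of the 10-point block sits at index 4.5, and array axes
# are ordered (row, column) = (y, x), hence the reversal.
offset = coords[::-1] - 4.5
moved = shift(canvas, offset, order=3)   # cubic spline interpolation

row_c, col_c = center_of_mass(moved)
```

`row_c` and `col_c` should land close to 22.6 and 23.4. Small deviations are expected because the block is cut off sharply at its edges and the spline rings slightly there.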
One possible solution is to use scipy.interpolate.RectBivariateSpline. In the code below, x_0 and y_0 are the coordinates of a feature from data (i.e., the position of the center of the Gaussian in your example) that need to be mapped to the coordinates given by coords. There are a couple of advantages to this approach:
If you need to "place" the same object into multiple locations in the output frame, the spline needs to be computed only once (but evaluated multiple times).
In case you actually need to compute integrated flux of the model over a pixel, you can use the integral method of scipy.interpolate.RectBivariateSpline.
Resample using spline interpolation:
from scipy.interpolate import RectBivariateSpline
x = np.arange(data.shape[1], dtype=np.float64)
y = np.arange(data.shape[0], dtype=np.float64)
kx = 3; ky = 3; # spline degree
spline = RectBivariateSpline(
    x, y, data.T, kx=kx, ky=ky, s=0
)
# Define coordinates of a feature in the data array.
# This can be the center of the Gaussian:
x_0 = (data.shape[1] - 1.0) / 2.0
y_0 = (data.shape[0] - 1.0) / 2.0
# create output grid, shifted as necessary:
yg, xg = np.indices(frame.shape, dtype=np.float64)
xg += x_0 - coords[0] # see below how to account for pixel scale change
yg += y_0 - coords[1] # see below how to account for pixel scale change
# resample and fill extrapolated points with 0:
resampled_data = spline.ev(xg, yg)
extrapol = (((xg < -0.5) | (xg >= data.shape[1] - 0.5)) |
            ((yg < -0.5) | (yg >= data.shape[0] - 0.5)))
resampled_data[extrapol] = 0
Now plot the frame and resampled data:
plt.figure(figsize=(14, 14));
plt.imshow(frame+resampled_data, cmap=plt.cm.jet,
           origin='upper', interpolation='none', aspect='equal')
plt.show()
If you also want to allow for scale changes, then replace code for computing xg and yg above with:
coords = 20, 80 # change coords to easily identifiable (in plot) values
zoom_x = 2 # example scale change along X axis
zoom_y = 3 # example scale change along Y axis
yg, xg = np.indices(frame.shape, dtype=np.float64)
xg = (xg - coords[0]) / zoom_x + x_0
yg = (yg - coords[1]) / zoom_y + y_0
Most likely this is what you actually want based on your example. Specifically, the coordinates of pixels in data are "spaced" by 0.222(2) distance units. Therefore it actually seems that for your particular example (whether accidental or intentional), you have a zoom factor of 0.222(2). In that case your data image would shrink to almost 2 pixels in the output frame.
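The 0.222(2) figure comes straight from the toy data, where 10 samples span the interval from -1 to 1. A small sketch of that arithmetic, assuming the frame has one pixel per distance unit (the variable names are mine):

```python
import numpy as np

axis = np.linspace(-1, 1, 10)      # coordinates used to build data
spacing = axis[1] - axis[0]        # distance between neighbouring samples

# If one frame pixel corresponds to one distance unit, the 10 samples
# of data cover roughly this many frame pixels:
width_in_frame_pixels = axis.size * spacing
```

With `zoom_x = zoom_y = spacing` in the snippet above, data would therefore occupy only a little over two pixels per axis in frame.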
Comparison to @Chiel's answer
In the image below, I compare the results from my method (left), @Chiel's method (center) and the difference (right panel):
Fundamentally, the two methods are quite similar and possibly even use the same algorithm (I did not look at the code for shift, but based on the description it also uses splines). From the comparison image it is visible that the biggest differences are at the edges and, for reasons unknown to me, shift seems to truncate the shifted image slightly too soon.
I think the biggest difference is that my method allows for pixel scale changes and also allows re-use of the same interpolator to place the original image at different locations in the output frame. @Chiel's method is somewhat simpler, but what I did not like about it is that it requires creating a larger array (data_large) into which the original image is placed in the corner.
The other answers have gone into detail, but here's my lazy solution:
xc,yc = 23.4, 22.6
x, y = np.meshgrid(np.linspace(-1,1,10)-xc%1, np.linspace(-1,1,10)-yc%1)
data = 50*np.exp(-np.sqrt(x**2+y**2)**2)
frame = np.random.normal(size=(100,50))
frame[23:33,22:32] += data
And it's the way you liked it. As you mentioned, the coordinates of both are the same, so the origin of data is somewhere between the indices. Now just simply shift it by the amount you want it to be off a grid point (remainder to one) in the second line and you're good to go (you might need to flip the sign, but I think this is correct).

Contouring non-uniform 2d data in python/matplotlib above terrain

I am having trouble contouring some data in matplotlib. I am trying to plot a vertical cross-section of temperature that I sliced from a 3d field of temperature.
My temperature array (T) is of size 50*300 where 300 is the number of horizontal levels which are evenly spaced. However, 50 is the number of vertical levels that are: a) non-uniformly spaced; and b) have a different starting level for each vertical column. As in there are always 50 vertical levels, but sometimes they span from 100 - 15000 m, and sometimes from 300 - 20000 m (due to terrain differences).
I also have a 2d array of height (Z; same shape as T), a 1d array of horizontal location (LAT), and a 1d array of terrain height (TER).
I am trying to get a similar plot to one like here in which you can see the terrain blacked out and the data is contoured around it.
My first attempt to plot this was to create a meshgrid of horizontal distance and height, and then contourf temperature with those arguments as well. However numpy.meshgrid requires 1d inputs, and my height is a 2d variable. Doing something like this only begins contouring upwards from the first column:
ax1 = plt.gca()
z1, x1 = np.meshgrid(LAT, Z[:,0])
plt.contourf(z1, x1, T)
ax1.fill_between(z1[0,:], 0, TER, facecolor='black')
Which produces this. If I use Z[:,-1] in the meshgrid, it contours underground for columns to the left, which obviously I don't want. What I really would like is to use some 2d array for Z in the meshgrid but I'm not sure how to go about that.
I've also looked into the griddata function but that requires 1D inputs as well. Anyone have any ideas on how to approach this? Any help is appreciated!
From what I understand, your data is structured, so you can use contourf or contour in matplotlib directly. The code you present has the right idea, but you should use
x1, z1 = np.meshgrid(LAT, Z[:,0])
plt.contourf(x1, Z, T)
for the contours. I have an example below
import numpy as np
import matplotlib.pyplot as plt
L, H = np.pi*np.mgrid[-1:1:100j, -1:1:100j]
T = np.cos(L)*np.cos(2*H)
H = np.cos(L) + H
plt.contourf(L, H, T, cmap="hot")
plt.show()
Note that the grid is generated with the original bounding box, but the plot is made with the transformed height and not the initial one. You can also use tricontour for unstructured data (or in general), but then you will need to generate the triangulation (which in your case is straightforward).
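Applied to the terrain cross-section from the question, the idea is that contourf accepts 2D X and Y arrays of the same shape as T, so the 2D height array can be used as the vertical coordinate directly. Below is a sketch with synthetic terrain and temperature; the helper names and all numbers are assumptions, and LAT, Z, T and TER simply mimic the shapes described in the question.

```python
import numpy as np

def cross_section_coords(lat, z2d):
    """Repeat the 1D horizontal coordinate to match the 2D height array."""
    lat = np.asarray(lat, dtype=float)
    z2d = np.asarray(z2d, dtype=float)
    if z2d.ndim != 2 or z2d.shape[1] != lat.size:
        raise ValueError("z2d must have shape (n_levels, len(lat))")
    x2d = np.tile(lat, (z2d.shape[0], 1))
    return x2d, z2d

def plot_cross_section(lat, z2d, t2d, ter):
    """Contour t2d on its terrain-following levels and black out the terrain."""
    import matplotlib.pyplot as plt
    x2d, z2d = cross_section_coords(lat, z2d)
    fig, ax = plt.subplots()
    cs = ax.contourf(x2d, z2d, t2d, levels=20)
    ax.fill_between(lat, 0, ter, facecolor="black")
    fig.colorbar(cs, ax=ax)
    return fig, ax

# Synthetic stand-ins: 50 levels, 300 columns, levels start at the terrain.
n_lev, n_col = 50, 300
LAT = np.linspace(30.0, 40.0, n_col)
TER = 300.0 + 1500.0 * np.sin(np.linspace(0.0, np.pi, n_col))**2
frac = np.linspace(0.0, 1.0, n_lev)**2            # non-uniform spacing
TOP = 15000.0 + 3.0 * TER                         # column top varies too
Z = TER[None, :] + frac[:, None] * (TOP - TER)[None, :]
T = 288.0 - 0.0065 * Z                            # simple lapse-rate field

X2D, Z2D = cross_section_coords(LAT, Z)
```

Calling `plot_cross_section(LAT, Z, T, TER)` followed by `plt.show()` draws the figure. Because every column keeps its own heights, nothing is contoured below the lowest level of that column.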

scipy interp2d/bisplrep unexpected output when given 1D input

I've been getting invalid input errors when working with the scipy interp2d function. It turns out the problem comes from the bisplrep function, as shown here:
import numpy as np
from scipy import interpolate
# Case 1
x = np.linspace(0,1)
y = np.zeros_like(x)
z = np.ones_like(x)
tck = interpolate.bisplrep(x,y,z) # or interp2d
Returns: ValueError: Invalid inputs
It turned out the test data I was giving interp2d contained only one distinct value for the 2nd axis, as in the test sample above. The bisplrep function inside interp2d considers this invalid input.
This may be considered as an acceptable behaviour: interp2d & bisplrep expect a 2D grid, and I'm only giving them values along one line.
On a side note, I find the error message quite unclear. One could include a test in interp2d to deal with such cases: something along the lines of
if len(np.unique(x))==1 or len(np.unique(y))==1:
    raise ValueError("Can't build 2D splines if x or y values are all the same")
may be enough to detect this kind of invalid input, and raise a more explicit error message, or even directly call the more appropriate interp1d function (which works perfectly here)
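Written out as a standalone helper, the proposed check could look like the sketch below. The function name is hypothetical and this is not part of SciPy; it only illustrates which inputs the test would catch.

```python
import numpy as np

def check_2d_spline_input(x, y):
    """Raise a descriptive error if all x or all y values are identical."""
    x = np.ravel(x)
    y = np.ravel(y)
    if np.unique(x).size == 1 or np.unique(y).size == 1:
        raise ValueError(
            "Can't build 2D splines if x or y values are all the same")

x = np.linspace(0, 1)

# Data along the x axis only (y constant) is rejected.
try:
    check_2d_spline_input(x, np.zeros_like(x))
    axis_aligned_rejected = False
except ValueError:
    axis_aligned_rejected = True

# Data along the diagonal y = x is still geometrically 1D,
# but this simple test lets it through.
check_2d_spline_input(x, x)
```

So a check on unique values alone catches points on an axis-aligned line but not points on an arbitrary line.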
I thought I had correctly understood the problem. However, consider the following code sample:
# Case 2
x = np.linspace(0,1)
y = x
z = np.ones_like(x)
tck = interpolate.bisplrep(x,y,z)
In that case, y being proportional to x, I'm also feeding bisplrep with data along one line. But, surprisingly, bisplrep is able to compute a 2D spline interpolation in that case. I plotted it:
# Plot
def plot_0to1(tck):
    import matplotlib.pyplot as plt
    from matplotlib import cm
    X = np.linspace(0,1,10)
    Y = np.linspace(0,1,10)
    Z = interpolate.bisplev(X,Y,tck)
    X,Y = np.meshgrid(X,Y)
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.plot_surface(X, Y, Z,rstride=1, cstride=1, cmap=cm.coolwarm,
                    linewidth=0, antialiased=False)
    plt.show()
plot_0to1(tck)
The result is the following:
where bisplrep seems to fill the gaps with 0's, as is better shown when I extend the plot below:
Regardless of whether adding 0's is expected, my real question is: why does bisplrep work in Case 2 but not in Case 1?
Or, in other words: do we want it to return an error when 2D interpolation is fed with input along one direction only (Case 1 & 2 fail), or not? (Case 1 & 2 should return something, even if unpredicted).
I was originally going to show you how much of a difference it makes for 2d interpolation if your input data are oriented along the coordinate axes rather than in some general direction, but it turns out that the result would be even messier than I had anticipated. I tried using a random dataset over an interpolated rectangular mesh, and comparing that to a case where the same x and y coordinates were rotated by 45 degrees for interpolation. The result was abysmal.
I then tried doing a comparison with a smoother dataset: turns out scipy.interpolate.interp2d has quite a few issues. So my bottom line will be "use scipy.interpolate.griddata".
For instructive purposes, here's my (quite messy) code:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
import scipy.interpolate as interp
n = 10 # rough number of points
dom = np.linspace(-2,2,n+1) # 1d input grid
x1,y1 = np.meshgrid(dom,dom) # 2d input grid
z = np.random.rand(*x1.shape) # ill-conditioned sample
#z = np.cos(x1)*np.sin(y1) # smooth sample
# first interpolator with interp2d:
fun1 = interp.interp2d(x1,y1,z,kind='linear')
# construct twice finer plotting and interpolating mesh
plotdom = np.linspace(-1,1,2*n+1) # for interpolation and plotting
plotx1,ploty1 = np.meshgrid(plotdom,plotdom)
plotz1 = fun1(plotdom,plotdom) # interpolated points
# construct 45-degree rotated input and interpolating meshes
rotmat = np.array([[1,-1],[1,1]])/np.sqrt(2) # 45-degree rotation
x2,y2 = rotmat.dot(np.vstack([x1.ravel(),y1.ravel()])) # rotate input mesh
plotx2,ploty2 = rotmat.dot(np.vstack([plotx1.ravel(),ploty1.ravel()])) # rotate plotting/interp mesh
# interpolate on rotated mesh with interp2d
# (reverse rotate by using plotx1, ploty1 later!)
fun2 = interp.interp2d(x2,y2,z.ravel(),kind='linear')
# I had to generate the rotated points element-by-element
# since fun2() accepts only rectangular meshes as input
plotz2 = np.array([fun2(xx,yy) for (xx,yy) in zip(plotx2.ravel(),ploty2.ravel())])
# try interpolating with griddata
plotz3 = interp.griddata(np.array([x1.ravel(),y1.ravel()]).T,z.ravel(),np.array([plotx1.ravel(),ploty1.ravel()]).T,method='linear')
plotz4 = interp.griddata(np.array([x2,y2]).T,z.ravel(),np.array([plotx2,ploty2]).T,method='linear')
# function to plot a surface
def myplot(X,Y,Z):
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    ax.plot_surface(X, Y, Z,rstride=1, cstride=1,
                    linewidth=0, antialiased=False,cmap=cm.coolwarm)
    plt.show()
# plot interp2d versions
myplot(plotx1,ploty1,plotz1) # Cartesian meshes
myplot(plotx1,ploty1,plotz2.reshape(2*n+1,-1)) # rotated meshes
# plot griddata versions
myplot(plotx1,ploty1,plotz3.reshape(2*n+1,-1)) # Cartesian meshes
myplot(plotx1,ploty1,plotz4.reshape(2*n+1,-1)) # rotated meshes
So here's a gallery of the results. Using random input z data, and interp2d, Cartesian (left) vs rotated interpolation (right):
Note the horrible scale on the right side, noting that the input points are between 0 and 1. Even its mother wouldn't recognize the data set. Note that there are runtime warnings during the evaluation of the rotated data set, so we're being warned that it's all crap.
Now let's do the same with griddata:
We should note that these figures are much closer to each other, and they seem to make way more sense than the output of interp2d. For instance, note the overshoot in the scale of the very first figure.
These artifacts always arise between input data points. Since it's still interpolation, the input points have to be reproduced by the interpolating function, but it's pretty weird that a linear interpolating function overshoots between data points. It's clear that griddata doesn't suffer from this issue.
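The absence of overshoot can also be checked numerically rather than by eye. Linear griddata interpolates inside Delaunay triangles, so each result is a weighted average of three input values and cannot leave the input range. A short sketch that mirrors the meshes above, using a seeded random sample of my own:

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
n = 10
dom = np.linspace(-2, 2, n + 1)
x1, y1 = np.meshgrid(dom, dom)
z = rng.random(x1.shape)                      # values between 0 and 1

plotdom = np.linspace(-1, 1, 2 * n + 1)
px, py = np.meshgrid(plotdom, plotdom)

points = np.column_stack([x1.ravel(), y1.ravel()])
targets = np.column_stack([px.ravel(), py.ravel()])

# Interpolate on the Cartesian mesh.
zi = griddata(points, z.ravel(), targets, method="linear")

# Interpolate again with everything rotated by 45 degrees.
rotmat = np.array([[1, -1], [1, 1]]) / np.sqrt(2)
zi_rot = griddata(points @ rotmat.T, z.ravel(),
                  targets @ rotmat.T, method="linear")
```

Both `zi` and `zi_rot` stay within the minimum and maximum of `z`. The two results need not be identical, since the triangulation of a regular mesh is not unique and may pick different diagonals after rotation.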
Consider an even more clear case: the other set of z values, which are smooth and deterministic. The surfaces with interp2d:
HELP! Call the interpolation police! Already the Cartesian input case has inexplicable (well, at least by me) spurious features in it, and the rotated input case poses the threat of s͔̖̰͕̞͖͇ͣ́̈̒ͦ̀̀ü͇̹̞̳ͭ̊̓̎̈m̥̠͈̣̆̐ͦ̚m̻͑͒̔̓ͦ̇oͣ̐ͣṉ̟͖͙̆͋i͉̓̓ͭ̒͛n̹̙̥̩̥̯̭ͤͤͤ̄g͈͇̼͖͖̭̙ ̐z̻̉ͬͪ̑ͭͨ͊ä̼̣̬̗̖́̄ͥl̫̣͔͓̟͛͊̏ͨ͗̎g̻͇͈͚̟̻͛ͫ͛̅͋͒o͈͓̱̥̙̫͚̾͂.
So let's do the same with griddata:
The day is saved, thanks to The Powerpuff Girls scipy.interpolate.griddata. Homework: check the same with cubic interpolation.
By the way, a very short answer to your original question is in help(interp.interp2d):
| Notes
| -----
| The minimum number of data points required along the interpolation
| axis is ``(k+1)**2``, with k=1 for linear, k=3 for cubic and k=5 for
| quintic interpolation.
For linear interpolation you need at least 4 points along the interpolation axis, i.e. at least 4 unique x and y values have to be present to get a meaningful result. Check these:
nvals = 3 # -> RuntimeWarning
x = np.linspace(0,1,10)
y = np.random.randint(low=0,high=nvals,size=x.shape)
z = x
interp.interp2d(x,y,z)
nvals = 4 # -> no problem here
x = np.linspace(0,1,10)
y = np.random.randint(low=0,high=nvals,size=x.shape)
z = x
interp.interp2d(x,y,z)
And of course this all ties in to your question like this: it makes a huge difference whether your geometrically 1d data set lies along one of the Cartesian axes or along a general direction, so that the coordinates take on many different values. It's probably meaningless (or at least very ill-defined) to try 2d interpolation from a geometrically 1d data set, but at least the algorithm shouldn't break if your data are along a general direction of the x,y plane.

Interpolation over an irregular grid

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO question inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local, and k, the number of nearest neighbours, can be varied to trade off speed against accuracy.
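A minimal sketch of that combination follows. It is not the code from the linked question; the function name, the default of 8 neighbours and the synthetic points are my assumptions. Distances are plain Euclidean distances in the coordinate plane, which for real lat/lon data is only reasonable after projecting or over small regions.

```python
import numpy as np
from scipy.spatial import KDTree

def idw_interpolate(xy_known, values, xy_query, k=8, power=2.0, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest known points."""
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    k = min(k, len(xy_known))
    tree = KDTree(xy_known)
    dist, idx = tree.query(xy_query, k=k)
    if k == 1:                      # query drops the last axis when k == 1
        dist = dist[:, None]
        idx = idx[:, None]
    # Clamp tiny distances so a query on top of a data point does not
    # divide by zero; its weight then dominates all the others.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return np.sum(weights * values[idx], axis=1)

# Synthetic stand-in for flattened LON/LAT/T grids and the target points.
rng = np.random.default_rng(1)
known = rng.uniform(0.0, 10.0, size=(2000, 2))
vals = np.sin(known[:, 0]) + np.cos(known[:, 1])
query = rng.uniform(1.0, 9.0, size=(10000, 2))

est = idw_interpolate(known, vals, query)
```

For the gridded arrays in the question, the known points would be something like `np.column_stack([LON.ravel(), LAT.ravel()])` with `T.ravel()` as the values. The tree is built once, and all 10,000 queries are answered in a single vectorised call.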
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of course for a regular grid, but it should work if you first project the data to a pixel grid with pyproj or something similar, while being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator=0
    denominator=0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if(dist<0.0000000001):
            return values[i]
        nominator=nominator+(values[i]/pow(dist,power))
        denominator=denominator+(1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value
def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid
if __name__ == "__main__":
    power=1
    smoothing=20
    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There's a bunch of options here, and which one is best will depend on your data. However, I don't know of an out-of-the-box solution for you.
You say your input data is from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
3. Unstructured data in tripolar space projected into 2d LAT LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point (you can use a BSP or grid type structure to speed up this search). Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
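The Delaunay option mentioned above is available in SciPy as scipy.interpolate.LinearNDInterpolator, which triangulates the points once and can then be evaluated at many locations. Here is a sketch on a made-up curvilinear grid; the warped LON/LAT arrays and the linear test field are assumptions chosen so that the result can be verified exactly.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# A mildly warped (non-regular) grid standing in for LAT(y, x), LON(y, x).
ny, nx = 60, 80
j, i = np.mgrid[0:ny, 0:nx]
LON = i * 0.5 + 0.1 * np.sin(j / 10.0)
LAT = j * 0.4 + 0.1 * np.cos(i / 12.0)
T = 2.0 * LON - 3.0 * LAT + 1.0     # linear field, so the answer is known

# Flatten the grid: the interpolator treats it as scattered points.
points = np.column_stack([LON.ravel(), LAT.ravel()])
tri_interp = LinearNDInterpolator(points, T.ravel())  # triangulates once

# About 10,000 off-grid targets, kept inside the covered area.
rng = np.random.default_rng(2)
lon1 = rng.uniform(5.0, 35.0, 10000)
lat1 = rng.uniform(5.0, 19.0, 10000)
t1 = tri_interp(lon1, lat1)
```

Targets outside the convex hull of the grid come back as NaN. Near a pole or across the date line, triangulating raw lat/lon values is questionable, so projecting first would be the safer assumption.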
I suggest taking a look at the interpolation features of GRASS, an open source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
