Numpy griddata interpolation up to certain radius - python

I'm using griddata() to interpolate my (irregular) 2-dimensional depth measurements: x, y, depth. The method does a great job, but it interpolates over the entire grid wherever it can find two opposing points. I don't want that behaviour. I'd like the interpolation to stay around the existing measurements, say up to a certain radius.
Is it possible to tell numpy/scipy: don't interpolate if you're too far from an existing measurement, and return a NODATA value instead? Ideally something like griddata(.., .., .., radius=5.0).
edit example:
In the image below, black dots are the measurements. Shades of blue are the cells interpolated by numpy. The area marked in green is in fact part of the picture but is considered NODATA by numpy (because there are no points in between). Now, the red areas are interpolated, but I want to get rid of them. Any ideas?

Ok cool. I don't think there is a built-in option for griddata() that does what you want, so you will need to write it yourself.
This comes down to calculating the distances between N input data points and M interpolation points. This is simple enough to do, but if you have a lot of points it can be slow at ~O(M*N). Here's an example that calculates the distances to all N data points for each interpolation point. If the number of data points within the radius is at least neighbors, it keeps the value. Otherwise it writes the value of NODATA.
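neighbors is set to 4 here as a rule of thumb, so that an interpolation point is likely to be bounded by data in each dimension (2*2 = 4). Note that griddata()'s linear method actually interpolates on Delaunay triangles, i.e. from three data points at a time in 2D, so tune this threshold to your data.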
import numpy as np

# invec - input points, Nx2 numpy array
# mvec - interpolation points, Mx2 numpy array
# just some random points for the example
N = 100
invec = 10 * np.random.random([N, 2])
M = 50
mvec = 10 * np.random.random([M, 2])
# --- here you would put your griddata() call, returning interpolated_values
interpolated_values = np.zeros(M)
NODATA = np.nan
radius = 5.0
neighbors = 4
for m in range(M):
    # boolean mask of the data points within radius of interpolation point m
    data_in_radius = np.sqrt(np.sum((invec - mvec[m])**2, axis=1)) <= radius
    # too few nearby measurements: discard the interpolated value
    if np.sum(data_in_radius) < neighbors:
        interpolated_values[m] = NODATA
Edit:
Ok re-read and noticed the input is really 2D. Example modified.
Just as an additional comment, this could be greatly accelerated if you first build a coarse mapping from each point mvec[m] to a subset of the relevant data points.
The costliest step in the loop would change from
np.sqrt(np.sum( (invec - mvec[m])**2, axis=1))
to something like
np.sqrt(np.sum( (invec[subset[m]] - mvec[m])**2, axis=1))
There are plenty of ways to do this, for example using a quadtree, a hashing function, or a 2D index. But whether this gives a performance advantage depends on the application, how your data is structured, etc.
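One way to get that coarse mapping without writing it yourself is a k-d tree. The following is only a sketch: griddata_within_radius is a hypothetical helper name, and it assumes that "at least neighbors points within radius" can be tested by checking whether the distance to the neighbors-th nearest data point is within radius.

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.spatial import cKDTree


def griddata_within_radius(points, values, xi, radius, neighbors=1, nodata=np.nan):
    """Linear griddata, blanked wherever fewer than `neighbors` data points
    lie within `radius` of the interpolation point (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    xi = np.asarray(xi, dtype=float)
    result = griddata(points, values, xi, method="linear")
    # Distance from each interpolation point to its k nearest data points
    dist, _ = cKDTree(points).query(xi, k=neighbors)
    # Keep only the distance to the k-th nearest one
    kth = dist if neighbors == 1 else dist[:, -1]
    result[kth > radius] = nodata
    return result


# Four measurements on the corners of a 10 x 10 square
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
vals = np.array([1.0, 1.0, 1.0, 1.0])
# One target close to a measurement, one far from all of them
targets = np.array([[1.0, 2.0], [5.0, 4.0]])
out = griddata_within_radius(pts, vals, targets, radius=3.0)
```

The first target keeps its interpolated value, while the second is blanked because no measurement lies within the radius. The tree query replaces the explicit loop over all N data points.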

Related

Creating and offsetting points outside polygon on a discrete grid

I am working in a discrete 2D grid of points in which there are "shapes" that I would like to create points outside of. I have been able to identify the vertices of these points and take convex hulls. So far, this leads to this and all is good and well. The purple here is the shape in question and the red line is the convex contour I have computed.
What I would like to do now is create two neighborhoods of points outside this shape. The first one is a set of points directly outside (as close as the grid size will allow), the second is another set of points but offset some distance away (the distance is not fixed, but rather an input).
I have attempted to write this in Python and get okay results. Here is an example of my current output. The problem is I notice the offsets are not perfect, for example look at the bottom most point in the image I attached. It kinks downwards whereas the original shape does not. It's not too bad in this example, but in other cases where the shape is smaller or if I take a smaller offset it gets worse. I also have an issue where the offsets sometimes overlap, even if they are supposed to be some distance away. I would also like there to be one line in each section of the contour, not two lines (for example in the top left).
My current attempt uses the Shapely package to handle most of the computational geometry. Once I have found the vertices of the convex contour, I offset these vertices by some amount and interpolate along each pair of vertices to obtain many points along these lines. Afterwards I use a coordinate transform to snap every point to the nearest grid point. This is how I obtain my final set of points. Below is the actual code I have written.
How can I improve this so I don't run into the issues I described?
Function #1 - Computes the offset points
import numpy as np
import shapely.geometry as sg
from shapely.geometry import LinearRing

def OutsidePoints(vertices, dist):
    # offset the closed contour outwards by dist, with mitred corners
    poly_line = LinearRing(vertices)
    poly_line_offset = poly_line.buffer(dist, resolution=1, join_style=2, mitre_limit=1).exterior
    new_vertices = list(poly_line_offset.coords)
    new_vertices = np.asarray(new_vertices)
    shape = sg.Polygon(new_vertices)
    points = []
    # sample the offset contour every step_size (not defined in this snippet)
    for t in np.arange(0, shape.length, step_size):
        temp_points = np.transpose(shape.exterior.interpolate(t).xy)
        points.append(temp_points[0])
    points = np.array(points)
    points = np.unique(points, axis=0)
    return points
Function #2 - Transforming these points into points that are on my grid
import math

def IndexFinder(points):
    # invCoordinateTransform is defined elsewhere (not shown here)
    index_points = invCoordinateTransform(points)
    # snap each coordinate down to its grid index
    for i in range(len(index_points)):
        for j in range(2):
            index_points[i][j] = math.floor(index_points[i][j])
    index_points = np.unique(index_points, axis=0)
    return index_points
Many thanks!

trouble with scipy interpolation

I'm having trouble using the scipy interpolation methods to generate a nice smooth curve from the data points given. I've tried using the standard 1D interpolation and the Rbf interpolation with all options (cubic, gaussian, multiquadric etc.).
In the image provided, the blue line is the original data. I'm looking to first smooth the sharp edges, and then have dynamically editable points from which to recalculate the curve. Each time a single point is edited, it should automatically calculate a new spline of some sort to smoothly transition between each point.
It kind of works when the points are within a particular range of each other as below.
But if the points end up too far apart, or too close together, I end up with issues like the following.
Key points are:
The curve MUST be flat between the first two points
The curve must NOT go below point 1 or 2 (i.e. derivative can't be negative)
~15 points (not shown) between points 2 and 3 are also editable and the line between is not necessarily linear. Full control over each of these points is a must, as is the curve going through each of them.
I'm happy to break it down into smaller curves that I then join/convolve, but I just need to ensure a >0 gradient.
sample data:
x=[0, 37, 50, 105, 115,120]
y=[0.00965, 0.00965, 0.047850827205882, 0.35600416666667, 0.38074375, 0.38074375]
As an example, try moving point 2 (x=37) to an extreme value, say 10 (keep y the same). Just ensure that all points from x=0 to x=10 (or any other variation) have identical y values of 0.00965.
any assistance is greatly appreciated.
UPDATE
Attempted pchip method suggested in comments with the results below:
pchip method, better and worse...
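For reference, a minimal sketch of what the pchip attempt might look like on the sample data above, using scipy.interpolate.PchipInterpolator. It passes through every point and should not overshoot monotonic data, so the first segment stays flat, though the resulting shape may still not be the one wanted. The sampling density (241 points) is an arbitrary choice.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Sample data from the question
x = np.array([0, 37, 50, 105, 115, 120], dtype=float)
y = np.array([0.00965, 0.00965, 0.047850827205882,
              0.35600416666667, 0.38074375, 0.38074375])

# Shape-preserving piecewise cubic through all points
curve = PchipInterpolator(x, y)

xs = np.linspace(x[0], x[-1], 241)
ys = curve(xs)

# Values on the first segment, which is required to be flat
flat = ys[xs <= 37]
```

Moving point 2 (for example from x=37 to x=10) only requires rebuilding the interpolator with the edited x array.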
Solved!
While I'm not sure that this is exactly true, it is as if the spline tools for creating Bezier curves treat the control points as points the calculated curve must go through - which is not true in my case. I couldn't figure out how to turn this feature off, so I found the cubic formula for a Bezier curve (cubic is what I need) and calculated my own points. I only then had to do a little adjustment to make the points fit the required integer x values - in my case, near enough is good enough. I would otherwise have needed to interpolate linearly between two points either side of the desired x value and determine the exact value.
For those interested, cubic needs 4 points - start, end, and 2 control points. The rule is:
B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3,  with 0 <= t <= 1
Calculate for x and y separately, using a list of values for t. If you need to gradient match, just make sure that the control points P1 and P2 are only moved along the same gradient as the preceding/following sections.
Perfect result
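The cubic formula can be evaluated directly with numpy. This is a sketch only: cubic_bezier is a hypothetical helper, and the two control points in the example are made-up values placed on horizontal tangents so the segment leaves and arrives flat.

```python
import numpy as np


def cubic_bezier(p0, p1, p2, p3, n=50):
    """Evaluate a cubic Bezier curve at n values of t in [0, 1].
    Each p is an (x, y) pair; returns an array of shape (n, 2)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    return ((1 - t)**3 * p0 + 3 * (1 - t)**2 * t * p1
            + 3 * (1 - t) * t**2 * p2 + t**3 * p3)


# Start and end taken from the sample data; P1 and P2 are hypothetical
start, end = (37.0, 0.00965), (115.0, 0.38074375)
curve = cubic_bezier(start, (45.0, 0.00965), (95.0, 0.38074375), end)
```

The curve passes through the start and end points only; P1 and P2 pull it towards them without lying on it, which matches the behaviour described above.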

Interpolating Scattered Data from a Volume that has Empty Space

I have 3d data produced from mesh points. The structure that was meshed is complex enough that interpolation using griddata is lacking. Specifically, there are regions without data points which are being given values by griddata that are not the fill_value. I need these hollow regions to have the value of 0.0, which I set fill_value to.
A simplified version of this is illustrated below:
The area occupied by the cylinder has no data points but the rest of the cube volume does. There will be data points from interpolation inside the cylinder but I need them to be zero.
Below is a slice parallel to the xy plane of the actual interpolated data, with a black oval that approximates the edge of the 'cylinder'. The red and blue 'bleed' into the void after interpolation. The fill value of 0.0 can be seen in the upper left corner:
Any ideas on how I can achieve the goal of setting those values to 0.0? Note that the 'cylinder' is not of constant shape.
I thought about going z layer by z layer and finding a polygon that gives the cylinder shape and then setting points inside the polygon to zero.
I also thought about partitioning the volume so a portion of the cylinder ends up in the corners of the partition (for each z layer) and hoping that the interpolator would not try to extrapolate into the void region.
The first option seems better, but I would like to know if Python provides some sort of functionality which would work better.
EDIT: Here are some actual points from the data set:
The z scale is much smaller than x or y. You can see that the regions I'm interested in are pretty well defined. But, again, how do I identify them for the purposes of setting grid points to 0.0?

Interpolation and Extrapolation of Randomly Scattered data to Uniform Grid in 3D

I have a 256 x 256 x 32 grid of regularly spaced points ranging over x, y, and z and with an associated variable "a". I also have a group of randomly scattered points in a more confined x, y, z space, with an associated variable "b". What I essentially want to do is interpolate and extrapolate my random data to a regularly spaced grid that matches the "a" cube, as shown below:
I have used scipy's griddata so far to achieve the interpolation, which seems to work fine, but it cannot handle the extrapolation (as far as I know) and the output sharply truncates to 'nan' values. Whilst researching this problem I came across a couple of people using griddata a second time with 'nearest' as the interpolation method to fill in the 'nan' values. I tried this but the results don't seem reliable. More appropriate looking results are obtained if I use a fill_value with 'linear' mode, but at the moment it's more of a fudge because fill_value has to be a constant.
I noticed that MATLAB has a ScatteredInterpolant class which seems to do what I want, but I am unable to find an equivalent class in Python, nor figure out how to implement such a routine efficiently in 3D. Any help is greatly appreciated.
The code I am using for the interpolation is below:
x, y, z, b = np.loadtxt(scatteredfile, unpack = True)
# Create cube to match aCube dimensions
xi = np.linspace(-xmax_aCube, xmax_aCube, 256)
yi = np.linspace(-ymax_aCube, ymax_aCube, 256)
zi = np.linspace(zmin_aCube, zmax_aCube, 32)
# Interpolate scattered points
X, Y, Z = np.meshgrid(xi, yi, zi)
bCube = griddata((x, y, z), b, (X, Y, Z), method = 'linear')
This discussion applies in any dimensionality. For your 3D case, let's talk about computational geometry first, to understand why part of the region gives NaN from griddata.
The scattered points in your volume make up a convex hull; a geometric shape with the following properties:
The surface is always convex (as the name suggests)
The volume of the shape is the lowest possible without violating convexity
The surface (in 3d) is triangulated and closed
Less formally, the convex hull (which you can compute easily with scipy) is like stretching a balloon over a frame, where the frame corners are the outermost points of your scattered cluster.
At regular grid locations inside the balloon you're surrounded by known points, so you can interpolate to these locations. Outside it, you have to extrapolate.
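To see which grid locations fall inside the balloon, one option is to triangulate the scattered points and ask which simplex each query point lands in. A small sketch with made-up random points (the seed, point count and query locations are arbitrary):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((200, 3))          # scattered (x, y, z) samples in the unit cube
tri = Delaunay(points)

queries = np.array([[0.5, 0.5, 0.5],   # well inside the point cloud
                    [2.0, 2.0, 2.0]])  # far outside it

# find_simplex returns -1 for points outside the triangulation (the hull)
inside = tri.find_simplex(queries) >= 0
```

Points flagged False here are the ones for which linear griddata would return NaN.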
Extrapolation is hard. There's no general rule for how to do it... it's problem-specific. In that region, algorithms like griddata choose to return NaN - this is the safest way of informing the scientist that s/he must choose a sensible way of extrapolating.
Let's go through some ways of doing that.
1. [WORST] Botch it
Assign some scalar value outside the hull. In the scipy docs you'll see this is done with the fill_value argument:
s = np.mean(b)
bCube = griddata((x, y, z), b, (X, Y, Z), method='linear', fill_value=s)
Cons: This produces a sharp discontinuity in the interpolated field at the hull boundary, heavily biases the mean scalar field value and doesn't respect the functional form of the data.
2. [NEXT WORST] "Blended botching it"
Assume that at the corners of your domain, you apply some value. This might be the average value of the scalar field associated with your scattered points.
Sorry, this is only a rough sketch as I don't use numpy much, but it should be fairly clear
# With a unit cube, and selected scalar value
x, y, z, b = np.loadtxt(scatteredfile, unpack=True)
s = np.mean(b)
# append the eight cube corners, each carrying the value s
x = np.append(x, [0, 0, 0, 0, 1, 1, 1, 1])
y = np.append(y, [0, 0, 1, 1, 0, 0, 1, 1])
z = np.append(z, [0, 1, 0, 1, 0, 1, 0, 1])
b = np.append(b, np.full(8, s))
# drop in the rest of your code
Cons: This produces a sharp discontinuity in gradient of the interpolated field at the hull boundary, fairly heavily biases the mean scalar field value and doesn't respect the functional form of the data.
3. [STILL PRETTY BAD] Nearest neighbour
For each of the regular NaN points, find the nearest non-NaN point and assign that value. This is effective and stable, but crude, because your field can end up with patterned features (like stripes or beams radiating out from the hull) that are often visually unappealing or, worse, unacceptable in terms of data smoothness.
Depending on the density of data, you could use the nearest scattered datapoint instead of the nearest non-NaN regular point. This can be done simply by (again, a rough sketch):
bCube = griddata((x, y, z), b, (X, Y, Z), method='linear', fill_value=np.nan)
bCubeNearest = griddata((x, y, z), b, (X, Y, Z), method='nearest')
indicesMask = np.isnan(bCube)
# Use nearest interpolation outside the hull, keeping linear interpolation inside.
bCube[indicesMask] = bCubeNearest[indicesMask]
MATLAB's Delaunay-based tools offer more powerful ways of achieving something similar in a one-liner, but numpy/scipy looks a bit limited here.
4. [NOT ALWAYS TERRIBLE] Naturally weighted
apologies for poor explanation in this section, I've never written the algorithm but I'm sure some research on the natural neighbour technique will get you far
Use a distance weighting function with some parameter D, which might be similar to, or twice (say) the length of your box. You can adjust. For each NaN location, figure out the distance to each of the scattered points.
# Don't do it this way for anything but small matrices - this is O(NM)
# and it can be done much more effectively (e.g. MATLAB has a quick
# natural weighting option), but for illustrative purposes:
for each NaN point 1:N
    for each scattered point 1:M
        calculate a basis function using inverse distance from NaN to point, normalised on D, and store in a [1 x M] vector of weights
    Multiply weights by the b values, sum, and divide by the sum of the weights
You basically want to end up with a function that smoothly goes to the average intensity of B at a distance D away from the hull, but coincides with the hull at the boundary. Away from the boundary it is weighted most strongly on its nearest points.
Pros: nicely stable and reasonably continuous. Because of the weighting, is more resilient to noise at single data points than nearest neighbour.
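A dense numpy sketch of the weighting loop, for small problems only. Note the assumptions: this is plain inverse-distance (Shepard) weighting rather than a true natural neighbour scheme, it does not include the blend towards the mean at distance D, and idw_fill and its power parameter are hypothetical names.

```python
import numpy as np


def idw_fill(points, b, targets, power=2.0):
    """Inverse-distance-weighted estimate of b at each target location.
    points: (M, 3) scattered locations, b: (M,) values, targets: (N, 3)."""
    # Pairwise distances, shape (N, M): O(N*M) time and memory
    d = np.linalg.norm(targets[:, None, :] - points[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)           # avoid dividing by zero at coincident points
    w = 1.0 / d**power
    # Weighted average of b, normalised by the sum of the weights
    return (w @ b) / w.sum(axis=1)


# Two data points on the x axis; one target midway, one on top of a data point
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
vals = np.array([0.0, 2.0])
est = idw_fill(pts, vals, np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

With the arrays from the question this would be used roughly as: mask = np.isnan(bCube), then bCube[mask] = idw_fill(np.column_stack([x, y, z]), b, np.column_stack([X[mask], Y[mask], Z[mask]])).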
5. [HEROIC ROCKSTAR] Functional form assumption
What do you know about the physics? Assume a functional form that represents what you expect the physics to do, then do a least squares (or some equivalent) fit of that form to the scattered data. Use the function to stabilise the extrapolation.
Some good ideas which can help you construct a function:
Do you expect symmetry or periodicity?
Is b a component of a vector field which has some property like zero divergence?
Directionality: do you expect all corners to be the same? Or maybe a linear variation in one direction?
Is field b a snapshot at a point in time? Perhaps a smoothed timeseries of measurements can be used to come up with a basic function.
Is there already a known form like a gaussian or quadratic?
Some examples:
b represents intensity of a laser beam passing thru a volume. You expect the entry side to be nominally identical to the outlet, with the other four boundaries of zero intensity. The intensity will have a concentric gaussian profile.
b is one component of a velocity field in an incompressible fluid. The fluid must be divergence free, so any field produced in the NaN zone must also be divergence free so you apply this condition.
b represents temperature in a room. You expect higher temperature at the top, because hot air rises.
b represents lift on an aerofoil, tested over three independent variables. You can look up the lift at stall easily, so know exactly what it'll be in some parts of the space.
Pros/Cons: Get this right and it'll be awesome. Get it wrong, especially with nonlinear functional forms, and it will go very wrong and can lead to very unstable results.
Health warning: you can't assume a functional form, get pretty results, then use them to prove that the functional form is correct. That's just bad science. The form needs to be something well behaved and known independently of your data analysis.
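As the simplest possible illustration of fitting a functional form, here is a least-squares fit of a linear trend b = c0 + c1*x + c2*y + c3*z. The linear form, the helper names and the synthetic data are all assumptions for the sake of the example; your physics should dictate the actual form.

```python
import numpy as np


def fit_linear_trend(x, y, z, b):
    """Least-squares coefficients (c0, c1, c2, c3) of b = c0 + c1*x + c2*y + c3*z."""
    A = np.column_stack([np.ones_like(x), x, y, z])
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


def eval_linear_trend(coeffs, x, y, z):
    """Evaluate the fitted trend anywhere, including outside the hull."""
    return coeffs[0] + coeffs[1] * x + coeffs[2] * y + coeffs[3] * z


# Synthetic scattered data that follows the assumed form exactly
rng = np.random.default_rng(1)
x, y, z = rng.random((3, 100))
b = 2.0 + 0.5 * x - 1.0 * y + 3.0 * z

coeffs = fit_linear_trend(x, y, z, b)
far_value = eval_linear_trend(coeffs, 2.0, 2.0, 2.0)   # well outside the data
```

The fitted function can then fill the NaN region, for example bCube[mask] = eval_linear_trend(coeffs, X[mask], Y[mask], Z[mask]) with mask = np.isnan(bCube).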
If your scatter of points conforms fairly well to a cube shape, one approach could be to use griddata to interpolate onto a regular grid of data that fits within your point cloud (therefore avoiding nans) and then use this regular grid of values as the input to interpn which does facilitate linear extrapolation (but requires a regular grid as input).
This way you can use griddata as before for all the points within the convex hull of your scatter of points and you can use interpn to estimate the points that are returned as nans.
This is far from perfect, but I think it comes closer to achieving what you are looking for.
Pros:
Avoids sharp discontinuities.
Captures the basic linear trends at the edge of your dataset without having to know the functional form.
Respects asymmetries in your data (e.g. doesn't tend to the population mean at large distances, so one side of your dataset can have larger values than the other at large distances.)
Cons:
The effectiveness of this approach will depend a lot on how large a cube you can fit within the convex hull of your initial scatter of points. If your data is spikey/patchy and irregular then even points on the edge of the convex hull may have been extrapolated significant distances from the edge of the nested cube, incurring errors as the extrapolation won't be taking into account nearer data points that lie outside the cube.
The linear extrapolation will be heavily influenced by noise in the data at the edges of the point cloud.
Computational cost of doing two sets of interpolations.
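A sketch of that two-step approach on synthetic data. The nested grid limits (0.3 to 0.7), the point count and the linear test field are assumptions picked so the nested cube sits inside the point cloud; interpn extrapolates when called with bounds_error=False and fill_value=None.

```python
import numpy as np
from scipy.interpolate import griddata, interpn

rng = np.random.default_rng(2)
pts = rng.random((500, 3))                      # scattered points in the unit cube
b = pts[:, 0] + 2 * pts[:, 1] + 3 * pts[:, 2]   # synthetic linear field

# Step 1: griddata onto a regular grid nested inside the point cloud
axis = np.linspace(0.3, 0.7, 5)
GX, GY, GZ = np.meshgrid(axis, axis, axis, indexing="ij")
inner = griddata(pts, b, (GX, GY, GZ), method="linear")

# Step 2: linear extrapolation from the nested grid to points outside it
outside = np.array([[1.2, 0.5, 0.5], [-0.1, 0.9, 0.2]])
extrap = interpn((axis, axis, axis), inner, outside,
                 method="linear", bounds_error=False, fill_value=None)
```

Note the indexing="ij" in meshgrid: interpn expects the value array axes to follow the order of the coordinate vectors, which is not what the default 'xy' indexing produces.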

Interpolation over an irregular grid

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO question inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d ..., inverse-distance weighting is smooth and local, and k, the number of nearest neighbours, can be varied to trade off speed against accuracy.
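A minimal sketch of that combination. idw_knn is a hypothetical helper (not the code from the linked question), and it treats the coordinates as planar, which is only an approximation for lon/lat data, especially near the poles.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_knn(xy_known, values, xy_query, k=8, power=2.0, eps=1e-12):
    """Inverse-distance weighting over the k nearest known points."""
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    k = min(k, len(xy_known))
    dist, idx = cKDTree(xy_known).query(np.asarray(xy_query, dtype=float), k=k)
    if k == 1:
        # query() drops the neighbour axis when k == 1; restore it
        dist, idx = dist[:, None], idx[:, None]
    w = 1.0 / np.maximum(dist, eps)**power
    return np.sum(w * values[idx], axis=1) / np.sum(w, axis=1)


# Four known points on a unit square, values equal to x + y
known = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([0.0, 1.0, 1.0, 2.0])
est = idw_knn(known, vals, np.array([[0.5, 0.5], [0.0, 0.0]]), k=4)
```

For the grids in the question, the known points would be something like np.column_stack([LON.ravel(), LAT.ravel()]) with values T.ravel(), and the queries np.column_stack([lon1, lat1]).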
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of course onto a regular grid, so it assumes you first project the data to a pixel grid with pyproj or something, all the while being careful what projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator=0
    denominator=0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if(dist<0.0000000001):
            return values[i]
        nominator=nominator+(values[i]/pow(dist,power))
        denominator=denominator+(1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":
    power=1
    smoothing=20
    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There's a bunch of options here, which one is best will depend on your data...
However I don't know of an out-of-the-box solution for you
You say your input data is from tripolar data. There are three main cases for how this data could be structured.
Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
Unstructured data in tripolar space projected into 2d LAT LON data
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid-type structure to speed up this search.) Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
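For the Delaunay option, scipy's LinearNDInterpolator triangulates the points once and then interpolates linearly on the triangles. A sketch with synthetic data (the lon/lat ranges and the temperature field are made up, and lon/lat are treated as planar coordinates here):

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Synthetic unstructured samples of a linear "temperature" field
rng = np.random.default_rng(3)
lon = rng.uniform(0.0, 10.0, 400)
lat = rng.uniform(0.0, 10.0, 400)
T = 15.0 + 0.5 * lon - 0.2 * lat

# Triangulate once, then evaluate at as many points as needed
interp = LinearNDInterpolator(np.column_stack([lon, lat]), T)

lon1 = np.array([2.5, 5.0, 7.5])
lat1 = np.array([5.0, 5.0, 2.5])
T1 = interp(lon1, lat1)
```

Query points outside the convex hull of the samples come back as NaN, so those would need separate handling.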
I suggest taking a look at the interpolation features of GRASS, an open source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
