I'm referencing this question and this documentation in trying to turn a set of points (the purple dots in the image below) into an interpolated grid.
As you can see, the image has missing spots where dots should be. I'd like to figure out where those are.
import numpy as np
from scipy import interpolate
CIRCLES_X = 25 # There should be 25 circles going across
CIRCLES_Y = 10 # There should be 10 circles going down
points = []
values = []
# Points range from 0-800 ish X, 0-300 ish Y
for point in points:
points.append([points.x, points.y])
values.append(1) # Not sure what this should be
grid_x, grid_y = np.mgrid[0:CIRCLES_Y, 0:CIRCLES_X]
grid = interpolate.griddata(points, values, (grid_x, grid_y), method='linear')
print(grid)
Whenever I print out the result of the grid, I get nan for all of my values.
Where am I going wrong? Is my problem even the correct use case for interpolate.grid?
First, your uncertain points are mainly at an edge, so it's actually extrapolation. Second, interpolation methods built into scipy deal with continuous functions defined on the entire plane and approximate it as a polynomial. While yours is discrete (1 or 0), somewhat periodic rather than polynomial and only defined in a discrete "grid" of points.
So you have to invent some algorithm to inter/extrapolate your specific kind of function. Whether you'll be able to reuse an existing one - from scipy or elsewhere - is up to you.
One possible way is to replace it with some function (continuous or not) defined everywhere, then calculate that approximation in the missing points - whether as one step as scipy.interpolate non-class functions do or as two separate steps.
e.g. you can use a 3-D parabola with peaks in your dots and troughs exactly between them. Or just with ones in the dots and 0's in the blanks and hope the resulting approximation in the grid's points is good enough to give a meaningful result (random overswings are likely). Then you can use scipy.interpolate.RegularGridInterpolator for both inter- and extrapolation.
or as a harmonic function - then what you're seeking is Fourier transformation
Another possible way is to go straight for a discrete solution rather than try to shoehorn the continual mathanalysis' methods into your case: design a (probably entirely custom) algorithm that'll try to figure out the "shape" and "dimensions" of your "grids of dots" and then simply fill in the blanks. I'm not sure if it is possible to add it into the scipy.interpolate's harness as a selectable algorithm in addition to the built-in ones.
And last but not the least. You didn't specify whether the "missing" points are points where the value is unknown or are actual part of the data - i.e. are incorrect data. If it's the latter, simple interpolation is not applicable at all as it assumes that all the data are strictly correct. Then it's a related but different problem: you can approximate the data but then have to somehow "throw away irregularities" (higher order of smallness entities after some point).
Related
I am plotting the result of an interpolation in a periodic domain, namely, the earth mercator projection map, [0,2*pi] or [0,360] is the domain for longitude. As you can see on the picture below, I'm plotting a groundtrack.
I am getting first r, i.e. position, and then I'm projecting that right onto earth. Since the coordinate transformations involves trigonometric functions, the results that I obtain are certainly restricted to a domain, where the inverse is bijective. To obtain this plot I've used atan2 in order to obtain a non bijective inverse function, as well as manipulating arccos in order to extend the domain of the inverse function.
All good up to now. The fact is that when I interpolate the resulting points, naturally, the function that returns does not interpret the domain folding property.
I just wanted to know if there is any way around this, apart from manipulating my data and representing it in a non periodic domain, interpolate it, and after that applying %(2*np.pi). These option, even if is doable, implies touching even more those inverse functions. The other option I thought was interpolating in chunks of only increasing values, i.e. and concatenating them.
Nothing found on the scipy documentation.
Solved the issue implementing something like the following. Notice that I am using astropy units module.
adder = 2*np.pi*u.rad
for i in range(1,len(lons)):
if lons[i].value-lons[i-1].value > 1:
sgn=np.sign(lons[i].value-lons[i-1].value)
lons[i:] -= sgn*adder
after doing this, apply the %
f_lons = interp1d(t,lons)
lons = f_lons(new_t) % (2*np.pi)
I've been tasked to develop an algorithm that, given a set of sparse points representing measurements of an existing surface, would allow us to compute the z coordinate of any point on the surface. The challenge is to find a suitable interpolation method that can recreate the 3D surface given only a few points and extrapolate values also outside of the range containing the initial measurements (a notorious problem for many interpolation methods).
After trying to fit many analytic curves to the points I've decided to use RBF interpolation as I thought this will better reproduce the surface given that the points should all lie on it (I'm assuming the measurements have a negligible error).
The first results are quite impressive considering the few points that I'm using.
Interpolation results
In the picture that I'm showing the blue points are the ones used for the RBF interpolation which produces the shape represented in gray scale. The red points are instead additional measurements of the same shape that I'm trying to reproduce with my interpolation algorithm.
Unfortunately there are some outliers, especially when I'm trying to extrapolate points outside of the area where the initial measurements were taken (you can see this in the upper right and lower center insets in the picture). This is to be expected, especially in RBF methods, as I'm trying to extract information from an area that initially does not have any.
Apparently the RBF interpolation is trying to flatten out the surface while I would just need to continue with the curvature of the shape. Of course the method does not know anything about that given how it is defined. However this causes a large discrepancy from the measurements that I'm trying to fit.
That's why I'm asking if there is any way to constrain the interpolation method to keep the curvature or use a different radial basis function that doesn't smooth out so quickly only on the border of the interpolation range. I've tried different combination of the epsilon parameters and distance functions without luck. This is what I'm using right now:
from scipy import interpolate
import numpy as np
spline = interpolate.Rbf(df.X.values, df.Y.values, df.Z.values,
function='thin_plate')
X,Y = np.meshgrid(np.linspace(xmin.round(), xmax.round(), precision),
np.linspace(ymin.round(), ymax.round(), precision))
Z = spline(X, Y)
I was also thinking of creating some additional dummy points outside of the interpolation range to constrain the model even more, but that would be quite complicated.
I'm also attaching an animation to give a better idea of the surface.
Animation
Just wanted to post my solution in case someone has the same problem. The issue was indeed with scipy implementation of the RBF interpolation. I tried instead to adopt a more flexible library, https://rbf.readthedocs.io/en/latest/index.html#.
The results are pretty cool! Using the following options
from rbf.interpolate import RBFInterpolant
spline = RBFInterpolant(X_obs, U_obs, phi='phs5', order=1, sigma=0.0, eps=1.)
I was able to get the right shape even at the edge.
Surface interpolation
I've played around with the different phi functions and here is the boxplot of the spread between the interpolated surface and the points that I'm testing the interpolation against (the red points in the picture).
Boxplot
With phs5 I get the best result with an average spread of about 0.5 mm on the upper surface and 0.8 on the lower surface. Before I was getting a similar average but with many outliers > 15 mm. Definitely a success :)
I have two coordinate systems for each record in my dataset. Lat-lon coordinates and what I suppose is utm x-y coordinates.
50% of my dataset only has x-y data without lat-lon, viceversa is 6%.
There is a good portion of the dataset that has both (33%) for each single record.
I wanted to know if there is a way to take advantage of the intersection (and maybe the x-y only part, since it's the biggest) to obtain a full dataset with only one coordinate system that makes sense. The problem is that after a little bit of preprocessing, they look "relaxed" in a different way and the intersection doesn't really match. The scatter plot shows what I believe to be a non linear, warped relationship from one system of coordinates to another. With this, I mean that normalizing them both to [0;1] and centering them to (0,0) (by subtracting the mean), gives two slightly different point distributions, and apparently a coefficient multiplication to scale one down to look like the other is not enough to get them to match completely. Looks like some more complicated relationship between the two.
I also tried to use an external library called utm to convert the lat-long coordinates to x-y to have a third pair of attributes (let's call it my_xy), only to find out that is not matching even one of the first two systems, instead it shows another slight warp.
Notes: When I say I do not have data from one coordinate system, assume NaN.
Furthermore, I know the warping could be a result of the fundamental geometrical differences between latlon and xy systems, but I still do not know what else I could try, if the utm conversion and the scaling did not work.
Blue: latlon, Red: original xy, Green: my_xy calculated from latlon
I'm working on a heatmap generation program which hopefully will fill in the colors based on value samples provided from a building layout (this is not GPS based).
If I have only a few known data points such as these in a large matrix of unknowns, how do I get the values in between interpolated in Python?:
0,0,0,0,1,0,0,0,0,0,5,0,0,0,0,9
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0,0,0,8,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,8,0,0,0,0,0,0,0,6,0,0,0,0,0,0
0,0,0,0,0,3,0,0,0,0,0,0,0,0,7,0
I understand that bilinear won't do it, and Gaussian will bring all the peaks down to low values due to the sheer number of surrounding zeros. This is obviously a matrix handling proposition, and I don't need it to be Bezier curve smooth, just close enough to be a graphic representation would be fine. My matrix will end up being about 1500×900 cells in size, with approximately 100 known points.
Once the values are interpolated, I have written code to convert it all to colors, no problem. It's just that right now I'm getting single colored pixels sprinkled over a black background.
Proposing a naive solution:
Step 1: interpolate and extrapolate existing data points onto surroundings.
This can be done using "wave propagation" type algorithm.
The known points "spread out" their values onto surroundings until all the grid is "flooded" with some known values. At the end of this stage you have a number of intersected "disks", and no zeroes left.
Step 2: smoothen the result (using bilinear filtering or some other filtering).
If you are able to use ScyPy, then interp2d does exactly what you want. A possible problem with is that it seems to not extrapolate smoothly according to this issue. This means that all values near the walls are going to be the same as closest their neighbour points. This can be solved by putting thermometers in all 4 corners :)
I split de world in X random polygons.
Then I am given a coordinate C1, for instance (-21.45, 7.10), and I want to attribute the right polygon to this coordinate.
The first solution is to apply my ‘point_in_polygon’ algorithm (given a set of coordinates that defines a polygon and a coordinate that defines a point, tell me if the point is inside or not) on each polygon until I find the right one.
But that is very expensive if I have a lot of points to put in a lot of polygons.
An improvement on that relies on the following idea:
To optimise the search, I create a grid (a collection) with a step n, k where I already attribute each pair of coordinates such that:
for i=-180 to 180 step n
for j = -90 to 90 step k
grid.add(i,j)
Then I create a dictionary, and for each pair in the collection I find the corresponding polygon
For each g in grid
For each p in polygons
If point_in_polygon(g,p) == True
my_dict(g) = p
Then, when I receive C1, I look for the closest coordinate in my grid, let’s say g1.
Thanks to my_dict, I can get quickly p1 = my_dict(g1)
Then I compute point_in_polygon(C1, p1) which is likely to be true. If it’s not, I find the closest g which is assigned to a different polygon, and I redo a test. Etc. until I have found the right polygon.
Now, the question is: what is the optimal n, k to create the grid?
So that I can find the right polygon in the minimum number of steps.
I don’t want it too low, because the search of the closest g which is assigned to a different polygon might be expensive.
I don’t want it too high as well, because then I might be missing some polygons and then the search never converges.
My intuition is that the smallest polygon is going to give the steps.
I am not sure if this is a programming problem, a maths problem, or just something I can find empirically, that's why I ask it here.
Any inputs appreciated!
Let me suggest a slight modification to your grid. Currently, you store for each cell the polygon that the cell's center belongs to. Instead, store all the polygons that overlap the cell. Then, whenever you see that a cell has only a single overlapping polygon, you don't need to do any inclusion testing. The grid can be built by methods of conservative rasterization (note that the referenced article is not focused on conservative but rather general rasterization).
The efficiency of your grid correlates with the ratio of single-polygon cells and total cells (because this is the probability of not having to perform polygon-inclusion tests). The storage itself is pretty cheap. You can use a dense array and get constant access to the cells. Hence, from a theoretical point of view, you should have as many cells as possible (because as you have more cells, the single-polygon cell ratio increases). In practice, you might find that cache and other memory effects might make large grids impractical. However, there is no good way to know other than test. So, just try with a couple of sizes on a few different machines and try to find a good fit.
If I had to guess, I would say that your cells should be square and have an area of about 1% - 5% of the average polygon area. Also, more compact polygons can be handled more efficiently than many long and thin polygons.
Pick any point and draw a line straight down from that point. The first polygon edge you hit tells you what polygon the point is in.
So, if you don't want to do polygon tests, then instead of dividing the space into a regular grid, first cut it into strips with vertical cuts that go through all polygon intersections.
Now, within each strip none of the polygon edges cross or end, so you can make an ordered list of all those edges from bottom to top.
If you want to find the polygon that contains a point, then, do a binary search using the x coordinate to find the proper strip. Then in the list of edges that span the strip, you can do a binary search using the y coordinate to find the closest one underneath the point, and that tells you what polygon the point is in.
Google 'trapezoidal decomposition' to find lots of information about similar techniques.