Are there any Python options for 3D linear piecewise/segmented regression - python

I'm looking for a solution to fit a number of piecewise planes to linearly approximate a surface. Ideally the user could define the number of planes and the code would determine the "optimal" pieces of the data to fit them to.
There seem to be a number of 2D options discussed (e.g., "How to apply piecewise linear fit in Python?"), but nothing in 3D.
Thanks!
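One possible approach, sketched here under stated assumptions rather than taken from any library, is a "k-planes" style loop: fit a least-squares plane to each group of points, reassign every point to the plane with the smallest residual, and repeat. The function names (fit_plane, fit_piecewise_planes) and the initialisation (contiguous chunks along x) are hypothetical choices for illustration. The loop can settle in a local optimum, so a different start may give a different partition.

```python
import numpy as np

def fit_plane(x, y, z):
    """Least-squares plane z = a*x + b*y + c; returns (a, b, c)."""
    A = np.column_stack([x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coef

def fit_piecewise_planes(x, y, z, n_planes, n_iter=50):
    """Alternate between refitting planes and reassigning points."""
    # Assumed initialisation: contiguous chunks along x (a heuristic only).
    labels = np.empty(len(x), dtype=int)
    for k, idx in enumerate(np.array_split(np.argsort(x), n_planes)):
        labels[idx] = k
    coefs = np.zeros((n_planes, 3))
    for _ in range(n_iter):
        for k in range(n_planes):
            mask = labels == k
            if mask.sum() >= 3:  # a plane needs at least three points
                coefs[k] = fit_plane(x[mask], y[mask], z[mask])
        # Prediction of every plane at every point, shape (n_points, n_planes).
        pred = np.outer(x, coefs[:, 0]) + np.outer(y, coefs[:, 1]) + coefs[:, 2]
        new_labels = np.argmin(np.abs(pred - z[:, None]), axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return coefs, labels

# Synthetic "roof" surface made of two planes: z = -|x|.
gx, gy = np.meshgrid(np.linspace(-1, 1, 21), np.linspace(-1, 1, 21))
x, y = gx.ravel(), gy.ravel()
z = -np.abs(x)

coefs, labels = fit_piecewise_planes(x, y, z, n_planes=2)
z_fit = coefs[labels, 0] * x + coefs[labels, 1] * y + coefs[labels, 2]
rmse = np.sqrt(np.mean((z_fit - z) ** 2))
```

Here the user picks n_planes and the loop decides which points belong to which plane. Nothing forces the planes to join continuously, which may or may not matter for a given surface.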

Related

explanation of sklearn optics plot

I am currently learning how to use OPTICS in sklearn. My input is a numpy array of shape (205, 22). I am able to get plots out of it, but I do not understand how I am getting a 2D plot out of multiple dimensions or how I am supposed to read it. I more or less understand the reachability plot, but the rest makes no sense to me. Can someone please explain what is happening? Is the function just simplifying the data to two dimensions somehow? Thank you
From the sklearn user guide:
The reachability distances generated by OPTICS allow for variable density extraction of clusters within a single data set. As shown in the above plot, combining reachability distances and data set ordering_ produces a reachability plot, where point density is represented on the Y-axis, and points are ordered such that nearby points are adjacent. ‘Cutting’ the reachability plot at a single value produces DBSCAN like results; all points above the ‘cut’ are classified as noise, and each time that there is a break when reading from left to right signifies a new cluster.
The other three plots are visual representations of the actual clusters found by three different clusterings.
As you can see in the OPTICS clustering plot, there are two high-density clusters (blue and cyan). The gray crosses are classified as noise according to the reachability plot because of the low xi value.
In the DBSCAN clustering with eps = 0.5, everything is considered noise, since the epsilon value is too low and the algorithm cannot find any dense regions.
In the third plot, the algorithm finds just a single cluster because of the adjusted epsilon value, and everything above the 2.0 line is considered noise.
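To make the point about dimensionality concrete, here is a minimal sketch showing that the reachability plot is one value per sample whatever the number of features. The make_blobs data is a hypothetical stand-in for the (205, 22) array, and the min_samples, xi and eps values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the (205, 22) array from the question.
X, _ = make_blobs(n_samples=205, n_features=22, centers=3, random_state=0)

clust = OPTICS(min_samples=10, xi=0.05).fit(X)

# The reachability plot is one value per sample, taken in cluster order,
# so it is 1D however many features X has.
reachability = clust.reachability_[clust.ordering_]
labels_xi = clust.labels_[clust.ordering_]

# "Cutting" the plot at a fixed eps gives DBSCAN-like labels (-1 is noise).
labels_cut = cluster_optics_dbscan(
    reachability=clust.reachability_,
    core_distances=clust.core_distances_,
    ordering=clust.ordering_,
    eps=2.0,  # arbitrary cut height, tune against the reachability plot
)
```

Plotting reachability against its index gives the reachability plot. OPTICS itself does not reduce the data to two dimensions: a scatter plot of 22-feature data has to select or project two coordinates separately.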
please refer to the user guide:

Calculate gradient over different spacing than prescribed latitude/longitude grid in python

I want to use the numpy.gradient function to calculate gradient components of .nc4 variables like soil moisture/temperature. The grid spacing/resolution of my data is extremely small (around ~9km) and I was interested in calculating the gradient across a larger delta (like 100km). Is this possible to do using the gradient function alone or do I have to regrid my data to do this?
numpy.gradient is doing a 2-point centered difference approximation for the first derivative. If your data are 9km and you want a 100km estimate, you need to decide how you'd want that calculated. Fit a line to the data and take the slope? Fit some higher order curve? Essentially gradient is using the fewest points it can, but if you want it across 100km you have many more points and need to decide how best to use/reduce them.
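Two of those choices can be sketched without regridding, assuming a uniform grid spacing (9 km here) and a 1D profile for the line fit. Both helper names are hypothetical: wide_centered_diff takes a centered difference across 2*k grid cells, and window_slope returns the least-squares slope of a line fitted to each window of 2*k + 1 points.

```python
import numpy as np

def wide_centered_diff(field, dx, half_width, axis=-1):
    """(f[i+k] - f[i-k]) / (2*k*dx) along one axis; the k edge cells are NaN."""
    k = half_width
    f = np.moveaxis(np.asarray(field, dtype=float), axis, -1)
    out = np.full(f.shape, np.nan)
    out[..., k:-k] = (f[..., 2 * k:] - f[..., :-2 * k]) / (2 * k * dx)
    return np.moveaxis(out, -1, axis)

def window_slope(profile, dx, half_width):
    """Least-squares slope over sliding windows of 2*k + 1 points (1D only)."""
    j = np.arange(-half_width, half_width + 1)
    weights = j / (dx * np.sum(j ** 2))
    # mode="valid" drops the edges, so the result has len(profile) - 2*k entries.
    return np.correlate(np.asarray(profile, dtype=float), weights, mode="valid")

dx_km = 9.0
k = 6  # 2 * 6 * 9 km = 108 km, the closest span to 100 km on this grid
x_km = np.arange(50) * dx_km
profile = 0.02 * x_km + 3.0  # synthetic linear profile, slope 0.02 per km

diff_108km = wide_centered_diff(profile, dx_km, k)
slope_108km = window_slope(profile, dx_km, k)
```

The wide difference uses only the two end points of each span, so it is sensitive to noise at those points. The windowed fit uses every point in the span and averages noise out, at the cost of smoothing real small-scale structure.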

Constraining RBF interpolation of 3D surface to keep curvature

I've been tasked to develop an algorithm that, given a set of sparse points representing measurements of an existing surface, would allow us to compute the z coordinate of any point on the surface. The challenge is to find a suitable interpolation method that can recreate the 3D surface given only a few points and extrapolate values also outside of the range containing the initial measurements (a notorious problem for many interpolation methods).
After trying to fit many analytic curves to the points I've decided to use RBF interpolation as I thought this will better reproduce the surface given that the points should all lie on it (I'm assuming the measurements have a negligible error).
The first results are quite impressive considering the few points that I'm using.
Interpolation results
In the picture that I'm showing the blue points are the ones used for the RBF interpolation which produces the shape represented in gray scale. The red points are instead additional measurements of the same shape that I'm trying to reproduce with my interpolation algorithm.
Unfortunately there are some outliers, especially when I'm trying to extrapolate points outside of the area where the initial measurements were taken (you can see this in the upper right and lower center insets in the picture). This is to be expected, especially in RBF methods, as I'm trying to extract information from an area that initially does not have any.
Apparently the RBF interpolation is trying to flatten out the surface while I would just need to continue with the curvature of the shape. Of course the method does not know anything about that given how it is defined. However this causes a large discrepancy from the measurements that I'm trying to fit.
That's why I'm asking if there is any way to constrain the interpolation method to keep the curvature or use a different radial basis function that doesn't smooth out so quickly only on the border of the interpolation range. I've tried different combination of the epsilon parameters and distance functions without luck. This is what I'm using right now:
from scipy import interpolate
import numpy as np
spline = interpolate.Rbf(df.X.values, df.Y.values, df.Z.values,
                         function='thin_plate')
X, Y = np.meshgrid(np.linspace(xmin.round(), xmax.round(), precision),
                   np.linspace(ymin.round(), ymax.round(), precision))
Z = spline(X, Y)
I was also thinking of creating some additional dummy points outside of the interpolation range to constrain the model even more, but that would be quite complicated.
I'm also attaching an animation to give a better idea of the surface.
Animation
Just wanted to post my solution in case someone has the same problem. The issue was indeed with the scipy implementation of RBF interpolation. I switched to a more flexible library, https://rbf.readthedocs.io/en/latest/index.html#.
The results are pretty cool! Using the following options
from rbf.interpolate import RBFInterpolant
spline = RBFInterpolant(X_obs, U_obs, phi='phs5', order=1, sigma=0.0, eps=1.)
I was able to get the right shape even at the edge.
Surface interpolation
I've played around with the different phi functions and here is the boxplot of the spread between the interpolated surface and the points that I'm testing the interpolation against (the red points in the picture).
Boxplot
With phs5 I get the best result with an average spread of about 0.5 mm on the upper surface and 0.8 on the lower surface. Before I was getting a similar average but with many outliers > 15 mm. Definitely a success :)
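For anyone who wants to repeat this kind of kernel comparison without installing the rbf package, here is a rough sketch using SciPy's newer RBFInterpolator class. This is a swapped-in library, not what I used above. The test surface, point counts and degree=2 are assumptions for illustration, and its 'quintic' kernel is only loosely analogous to phs5.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

def surface(xy):
    # Hypothetical smooth surface standing in for the real measurements.
    return np.sin(xy[:, 0]) + np.cos(xy[:, 1])

xy_obs = rng.uniform(-1.0, 1.0, size=(40, 2))    # sparse points used for the fit
xy_test = rng.uniform(-1.5, 1.5, size=(200, 2))  # check points, some outside the fit area
z_obs = surface(xy_obs)

interpolants = {}
spread = {}
for kernel in ("thin_plate_spline", "cubic", "quintic"):
    # degree=2 appends a quadratic polynomial, which governs the far-field behaviour.
    interpolants[kernel] = RBFInterpolator(xy_obs, z_obs, kernel=kernel, degree=2)
    spread[kernel] = np.abs(interpolants[kernel](xy_test) - surface(xy_test))
```

Each entry of spread can go straight into a boxplot to compare kernels, in the same spirit as the comparison above.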

Define a 2D Gaussian probability with five peaks

I have 2D data containing five peaks. Could I fit five 2D Gaussian functions to obtain the peaks? In my problem, the peaks do not correspond to a clustering problem, for which I think EM would be an appropriate answer.
In my case, I measure a variable in x-y space and it shows maxima at more than one position. Is fitting a Fourier series or using the Expectation-Maximization method still an applicable solution to my problem?
To build my likelihood, do I just need to add up the five 2D Gaussian distributions, with the x and y position and the height of each peak as variables?
If I understand what you're asking, check out Gaussian Mixture Models and Expectation Maximization. I don't know of any pre-implemented versions of these in Python, although I haven't looked too hard.
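If the data are values measured on an x-y grid rather than a cloud of sample points, a direct least-squares fit of a sum of Gaussians is one option. scikit-learn's GaussianMixture does implement EM, but it models the density of samples and not a measured field. The sketch below uses scipy.optimize.curve_fit and assumes isotropic peaks (one width per peak) to keep it short. The peak parameters and starting guesses are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_sum(xy, *params):
    """Sum of isotropic 2D Gaussians; params = (height, x0, y0, sigma) per peak."""
    x, y = xy
    total = np.zeros_like(x, dtype=float)
    for height, x0, y0, sigma in np.reshape(params, (-1, 4)):
        total += height * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
    return total

# Synthetic, noise-free measurements on a grid (hypothetical peak parameters).
true_params = np.array([
    [3.0, 2.0, 2.0, 0.8],
    [2.0, 8.0, 2.0, 0.7],
    [4.0, 5.0, 5.0, 0.9],
    [2.5, 2.0, 8.0, 0.6],
    [3.5, 8.0, 8.0, 0.8],
])
gx, gy = np.meshgrid(np.linspace(0, 10, 60), np.linspace(0, 10, 60))
xy = np.vstack([gx.ravel(), gy.ravel()])
values = gaussian_sum(xy, *true_params.ravel())

# Rough starting guesses, e.g. read off a plot of the data.
p0 = true_params + np.array([-0.3, 0.2, -0.2, 0.1])
popt, pcov = curve_fit(gaussian_sum, xy, values, p0=p0.ravel())
fitted = popt.reshape(-1, 4)  # one row per peak: height, x0, y0, sigma
```

So yes, the model is just the sum of the five Gaussians, with each peak's height, position and width as free parameters. The fit depends heavily on the starting guesses, so it helps to seed them from the visible maxima.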

Higher order interpolation for contour plots in python

Is anybody aware of a higher-order interpolation method (Catmull-Rom splines, cubic interpolation, etc.) for 2D contouring in Python?
Skimage, Matplotlib, and OpenCV provide the functions measure.find_contours(), contour(), and findContours(), respectively, but all are based on linear interpolation (also known as marching squares). I'm looking for something with higher accuracy, preferably in Python. Any pointers would be highly appreciated.
https://www.dropbox.com/s/orgr2yqhbbk2xnr/test.PNG
In the image above I'm trying to extract iso-value 25 from the scalar field of f(x,y)=x^3+y^3. I'm looking for 6 points with better accuracy than the 6 red points given by linear interpolation.
For unstructured 2D data (or triangulated data), you might be interested in the following class:
http://matplotlib.org/api/tri_api.html?highlight=cubictriinterpolator#matplotlib.tri.CubicTriInterpolator
which provides a Clough-Tocher (cubic) interpolator from a user-defined Triangulation and field defined at triangulation nodes. It can also be used through the helper class UniformTriRefiner:
http://matplotlib.org/api/tri_api.html?highlight=refine_field#matplotlib.tri.UniformTriRefiner.refine_field
http://matplotlib.org/mpl_examples/pylab_examples/tricontour_smooth_user.png
Nevertheless, the choice of a suitable interpolation depends, of course, on your data set.
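A minimal sketch of the UniformTriRefiner route, using the field from the question, f(x, y) = x^3 + y^3, sampled on a coarse grid. The grid size and subdiv=3 are arbitrary choices, and the Agg backend is selected only so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.tri as mtri

# Coarse samples of the field from the question, f(x, y) = x^3 + y^3.
gx, gy = np.meshgrid(np.linspace(0, 4, 5), np.linspace(0, 4, 5))
x, y = gx.ravel(), gy.ravel()
z = x ** 3 + y ** 3

triang = mtri.Triangulation(x, y)

# refine_field subdivides each triangle and, by default, evaluates a
# CubicTriInterpolator at the new nodes.
refiner = mtri.UniformTriRefiner(triang)
refi_triang, refi_z = refiner.refine_field(z, subdiv=3)

fig, ax = plt.subplots()
coarse = ax.tricontour(triang, z, levels=[25])            # linear on the raw samples
smooth = ax.tricontour(refi_triang, refi_z, levels=[25])  # linear on the refined field
smooth_vertices = np.vstack(smooth.allsegs[0])            # (n, 2) array of contour points
plt.close(fig)
```

The contouring step is still linear, but it runs on a much finer mesh whose values come from the cubic interpolant, so the extracted iso-line should follow the underlying field more closely than the one from the raw samples.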
