Fitting a curve through points with associated normals (with a spline) - Python

I have faced a situation similar to this question, which was asked 8 years ago but in MATLAB rather than Python: Fit a curve in MATLAB where points have specified normals
Like that question, I have 3 points (in 3D space) with their associated normals. I need to find a polynomial curve (or maybe a surface) that passes through the 3 points and agrees with their normals.
I want to use a spline for this, and I'm searching for an answer in Python, but it's quite rare for normals to be taken into account.
So, how can I implement this, or translate the MATLAB code from the link above into Python?
My points and their associated normals: A, B, C are the points and N_A, N_B, N_C are the normals.
A = np.array([ 348.92065834, -1402.3305998, 32.69313966])
N_A = np.array([-0.86925426, 0.02836434, -0.49355091])
B = np.array([282.19332067, 82.52027998, -5.92595371])
N_B = np.array([-0.82339849, 0.43041935, 0.3698028])
C = np.array([247.37475615, -3.70129865, -22.10494737])
N_C = np.array([-0.83989222, 0.23796899, 0.48780305])
First, I tried to use plain fitting, but it seems a spline is much better in this case.
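One possible sketch (my own assumption, not something stated in the question): treat each normal as a constraint that the curve's tangent must be perpendicular to it, estimate tangents by projecting chord directions onto the plane perpendicular to each normal, and build a vector-valued cubic Hermite spline with SciPy's CubicHermiteSpline over a chord-length parameter:

import numpy as np
from scipy.interpolate import CubicHermiteSpline

pts = np.vstack([A, B, C])
nrm = np.vstack([N_A, N_B, N_C])

# Chord-length parameter along the three points.
t = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))])

# Tangent estimate at each point: chord direction with its component
# along the normal removed, so each tangent is perpendicular to its normal.
chords = np.gradient(pts, t, axis=0)
tangents = chords - np.sum(chords * nrm, axis=1, keepdims=True) * nrm

curve = CubicHermiteSpline(t, pts, tangents)      # vector-valued spline in 3D
samples = curve(np.linspace(t[0], t[-1], 200))    # 200 points along the fitted curve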

Related

trouble with scipy interpolation

I'm having trouble using the SciPy interpolation methods to generate a nice smooth curve from the data points given. I've tried the standard 1D interpolation and Rbf interpolation with all of its options (cubic, gaussian, multiquadric, etc.).
In the image provided, the blue line is the original data. I'm looking to first smooth the sharp edges, and then have dynamically editable points from which to recalculate the curve. Each time a single point is edited, a new spline of some sort should be recalculated automatically so the curve transitions smoothly between the points.
It kind of works when the points are within a particular range of each other as below.
But if the points end up too far apart, or too close together, I end up with issues like the following.
Key points are:
The curve MUST be flat between the first two points
The curve must NOT go below point 1 or 2 (i.e. derivative can't be negative)
~15 points (not shown) between points 2 and 3 are also editable, and the line between them is not necessarily linear. Full control over each of these points is a must, as is the curve passing through each of them.
I'm happy to break it down into smaller curves that I then join/convolve, but I just need to ensure a >0 gradient.
sample data:
x = [0, 37, 50, 105, 115, 120]
y = [0.00965, 0.00965, 0.047850827205882, 0.35600416666667, 0.38074375, 0.38074375]
As an example, try moving point 2 (x=37) to an extreme value, say 10 (keep y the same). Just ensure that all points from x=0 to x=10 (or any other variation) have identical y values of 0.00965.
Any assistance is greatly appreciated.
UPDATE
Attempted pchip method suggested in comments with the results below:
pchip method, better and worse...
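For reference, a minimal sketch of the pchip attempt using SciPy's PchipInterpolator (an assumption about which implementation was meant); it is shape-preserving, so it stays flat between the first two points and never dips below them:

import numpy as np
from scipy.interpolate import PchipInterpolator

x = [0, 37, 50, 105, 115, 120]
y = [0.00965, 0.00965, 0.047850827205882, 0.35600416666667, 0.38074375, 0.38074375]

pchip = PchipInterpolator(x, y)       # monotonicity-preserving cubic interpolation
xs = np.linspace(0, 120, 500)
ys = pchip(xs)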
Solved!
While I'm not sure this is exactly true, it is as if the spline tools for creating Bezier curves treat the control points as points the calculated curve must pass through - which is not what I want. I couldn't figure out how to turn that behaviour off, so I looked up the cubic Bezier formula (cubic is what I need) and calculated the points myself. I then only had to make a small adjustment so the points land on the required integer x values - in my case, near enough is good enough. Otherwise I would have needed to interpolate linearly between the two points either side of the desired x value to determine the exact value.
For those interested, cubic needs 4 points - start, end, and 2 control points. The rule is:
B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3, for t in [0, 1]
Calculate for x and y separately, using a list of values for t. If you need to match gradients, just make sure that the control points P1 and P2 are only moved along the same gradient as the preceding/following sections.
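As a minimal sketch of evaluating that formula (the control-point values here are illustrative, not the ones actually used):

import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=100):
    # Evaluate the cubic Bezier curve B(t) at n parameter values in [0, 1].
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t)**3 * p0 + 3 * (1 - t)**2 * t * p1
            + 3 * (1 - t) * t**2 * p2 + t**3 * p3)

# Start/end points and two control points as (x, y) pairs - illustrative values only.
p0, p3 = np.array([37.0, 0.00965]), np.array([50.0, 0.047850827205882])
p1 = np.array([41.0, 0.00965])           # kept on the flat gradient of the preceding section
p2 = np.array([46.0, 0.03])              # hypothetical second control point
curve = cubic_bezier(p0, p1, p2, p3)     # column 0: x values, column 1: y values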
Perfect result

Use NumPy for 3D vectors

I must admit, I'm not so bad in Python, but not so good in math. There, I said it. I'm planning on building a game with a coordinate system in 3D. Classic, really simple. Like my first room would be 0, 0, 0, and the one on the east would be 1, 0, 0.
What would be a bit more difficult is that I would need to search through these coordinates. Find, for instance, all rooms within a 3-room radius of an (X, Y, Z) coordinate. I may use it for pathfinding as well. So I was thinking of using NumPy for performance (since I have no idea how many coordinates there will be in the end) and something quite simple:
import numpy as np
a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 0.0, 0.0])
But that's where my meager skills reach a dead end. In theory I could subtract one from the other to get the distance, but even for that I'm stuck, it seems. So I'll put my needs here... and hope someone can help me figure things out:
Return the distance between two vectors (as an int, or better, a float).
Look for vectors close to another: the notion of close would be a distance, so in theory, that would mean browsing through all vectors and getting their distance to another one. I don't know if it's great in terms of performance though.
Obtaining both the 2D direction (in degrees or radians, between A and B) and the vertical direction (same thing, but using the Z coordinate).
"Turning" a vector, keeping its distance (norm) but in a different direction, which would imply pivoting around the Z coordinate, if that makes sense. The same thing to pivot around X or Y would be great.
Normalize this vector, so it would be in the same "direction" but with a distance (norm) of 1 from (0, 0, 0).
I'm sorry if that doesn't make much sense. My use case is pretty clear in my head, but not knowing vectors very much, perhaps I'm missing on one or more simple concepts.
Thanks for your help!
A little bit of linear algebra will go a long way to do most of what you want.
Distance between two vectors. You can define c = a - b and then find the magnitude of this difference vector. Finding the magnitude of a vector is simple: mag = np.sqrt(np.dot(c, c))
Now that you have a way to calculate a distance between two points, you can do what you suggested, though checking every possible vector pair will be O(N^2).
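As a sketch (not part of the original answer), the radius search can be vectorised over an (N, 3) array of room coordinates instead of looping over pairs:

import numpy as np

rooms = np.array([[0, 0, 0], [1, 0, 0], [3, 2, 1], [5, 5, 5]], dtype=float)  # example coordinates
center = np.array([0.0, 0.0, 0.0])
radius = 3.0

dists = np.linalg.norm(rooms - center, axis=1)   # distance from every room to the centre
nearby = rooms[dists <= radius]                  # rooms within the radius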
I'm not entirely sure what you mean by 2D direction and vertical direction. But finding the angle between two vectors can be done using the fact that A dot B = |A|*|B|*cos(theta), where |A| is the magnitude of A, and theta is the angle. So you could do something like:
magA = np.sqrt(np.dot(A,A))
magB = np.sqrt(np.dot(B,B))
angle = np.arccos(np.dot(A,B)/(magA*magB))
This is what rotation matrices are for. Given an angle, you can define a rotation matrix, M, and simply take np.dot(M, A) to get your rotated vector.
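For example, a rotation about the Z axis could look like this (a sketch, not from the original answer):

import numpy as np

def rotate_z(v, angle):
    # Rotate a 3D vector by `angle` radians about the Z axis.
    c, s = np.cos(angle), np.sin(angle)
    M = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return np.dot(M, v)

rotate_z(np.array([1.0, 0.0, 0.0]), np.pi / 2)   # approximately [0, 1, 0]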
To normalize a vector, you just divide each component by the magnitude. So normA = A / np.sqrt(np.dot(A, A))
This isn't a complete answer, but hopefully it starts you in the right direction.

Efficiently and accurately interpolate between finite-element node stress points in python

I'd like to interpolate some 3D finite-element stress field data from a bunch of known nodes at points where nodes don't exist. I realise that node stresses are already extrapolated from gauss points, but it is the best I can do with the data I have available. The image below gives a 2D representation. The red and pink points would represent locations where I'd like to interpolate the value.
Initially I thought I could find the smallest bounding box (hull) or simplex that contained the point of interest and no other known points. Visualising this in 2D, I realised that this might incorrectly ignore data from a nearby point. I was planning on using SciPy's LinearNDInterpolator, but I notice there is some unexpected behaviour, and I'm worried it will exclude nearby points in the way I just described. Notice how the pink point would not reference from the green triangle but would ignore the point outside the orange triangle, although it is probably more relevant.
As far as I can tell, the best approach is to take the nearest surrounding nodes and interpolate by distance-weighted averaging. I'm not sure whether something is readily available or whether it needs to be written. I'd imagine this is a fairly common problem, so I presume the wheel has already been invented...
Actually my final goal is to interpolate/regress values for a 3D line through the set of points.
You can try Inverse distance weighting. Here is an example in 1D (easily generalizable to 3D):
from pylab import *

# imaginary samples
xmax = 10
Npoints = 10
x = 0.1 * randint(0, 10 * xmax, Npoints)
y = sin(2 * x) + x
plot(x, y, ls="", marker="x", color="red", label="samples", ms=9, mew=2)

# interpolation
x2 = linspace(0, xmax, 150)  # new sampling

def weight(x, x0, p):  # modify this function in 3D
    return 1 / (((x - x0)**2)**(p / 2) + 0.00001)  # 0.00001 to avoid infinity

y2 = zeros_like(x2)
for p in range(1, 4):
    for i in range(len(y2)):
        y2[i] = sum(y * weight(x, x2[i], p)) / sum(weight(x, x2[i], p))
    plot(x2, y2, label="Interpolation p=" + str(p))

legend(loc=2)
show()
Here is the result
As you can see, it's not really fantastic. The best results are, I think, for p=2, but it will be different in 3D. I have obtained better curves with a gaussian weight, but have no theoretical background for such a choice.
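For reference, a possible 3D generalisation of the weighting above (a sketch, not from the original answer; the function name and eps value are my own):

import numpy as np

def idw_3d(points, values, targets, p=2, eps=1e-5):
    # Inverse-distance-weighted interpolation of scattered 3D data.
    # points: (N, 3) known locations, values: (N,), targets: (M, 3).
    d = np.linalg.norm(targets[:, None, :] - points[None, :, :], axis=2)  # (M, N) distances
    w = 1.0 / (d**p + eps)                 # eps avoids division by zero at exact hits
    return (w * values).sum(axis=1) / w.sum(axis=1)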
https://stackoverflow.com/a/36337428/2372254
The first answer here was helpful, but the 1-D example shows that the approach does some strange things with p=1 (wildly different from the data), and with p=3 we get some odd plateaux.
I took a look at Radial Basis Functions which are implemented in SciPy, and modified JPG's code as follows.
Modified Code
from pylab import *
from scipy.interpolate import Rbf, InterpolatedUnivariateSpline

# imaginary samples
xmax = 10
Npoints = 10
x = 0.1 * randint(0, 10 * xmax, Npoints)
x.sort()  # InterpolatedUnivariateSpline requires increasing x values
y = sin(2 * x) + x
plot(x, y, ls="", marker="x", color="red", label="samples", ms=9, mew=2)

# interpolation
x2 = linspace(0, xmax, 150)  # new sampling

def weight(x, x0, p):  # modify this function in 3D
    return 1 / (((x - x0)**2)**(p / 2) + 0.00001)  # 0.00001 to avoid infinity

y2 = zeros_like(x2)
for p in range(1, 4):
    for i in range(len(y2)):
        y2[i] = sum(y * weight(x, x2[i], p)) / sum(weight(x, x2[i], p))
    plot(x2, y2, label="Interpolation p=" + str(p))

yrbf = Rbf(x, y)
fi = yrbf(x2)
plot(x2, fi, label="Radial Basis Function")

ius = InterpolatedUnivariateSpline(x, y)
yius = ius(x2)
plot(x2, yius, label="Univariate Spline")

legend(loc=2)
show()
The results are interesting and probably more suitable to my intended usage. The following figure was produced.
But the RBF implementation in SciPy (google for alternatives) has a major problem when points are repeated - not likely in a real scenario - and goes completely ballistic:
When smoothing is applied (smooth=0.1 was used) it behaves normally again. This might point to some implementation weirdness.

Finding best fit boxes of a scatter plot using python?

I'm looking for the best python library to solve this problem:
I have a scatter plot with clumps over data points. This is just a series of x,y coordinate pairs.
I want a tool that will look at the data points I have, then suggest N 'boxes' that encompass the different groups.
Presumably I could go with higher or lower granularity by choosing how many boxes I wanted to use.
Are there any python libraries out there best suited to solve this type of problem?
The way I understand your question, you want to find boxes that enclose clouds of data points.
You define your granularity criterion as the number of boxes used to describe your data set.
I think what you are looking for is agglomerative hierarchical clustering. The algorithm is quite straightforward. Let n be the number of data points in your set. The algorithm starts by considering n groups, each populated by a single point. Then it proceeds iteratively:
Merge the two closest groups according to a distance criterion
Since the groups set has changed, update the distances between the groups
Return to the merge step until you reach either a target number of clusters or a distance threshold
You can also build the dendrogram. It is a tree-like structure that stores the history of the whole merging process, allowing you to retrieve any level of granularity between 1 cluster and n clusters.
There is a set of functions in Scipy that are dedicated to this algorithm. It is covered by the question Tutorial for scipy.cluster.hierarchy.
Getting the clusters is the first step; now you can build your boxes. Let's put it in mathematical terms. Let C be a cluster and P1, ..., Pn the points of that cluster. If an axis-aligned rectangular box is fine, then it is defined by the two corner points (xmin, ymin) and (xmax, ymax), with:
xmin = min(P.x : P ∈ C)
ymin = min(P.y : P ∈ C)
xmax = max(P.x : P ∈ C)
ymax = max(P.y : P ∈ C)
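Putting the two steps together, a minimal sketch with scipy.cluster.hierarchy (the random data and the choice of 5 clusters are illustrative):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

pts = np.random.rand(200, 2)                      # example scatter of (x, y) points
Z = linkage(pts, method='single')                 # agglomerative hierarchical clustering
labels = fcluster(Z, t=5, criterion='maxclust')   # cut the tree into at most 5 clusters

# Axis-aligned bounding box of each cluster: (xmin, ymin, xmax, ymax).
boxes = [(pts[labels == k, 0].min(), pts[labels == k, 1].min(),
          pts[labels == k, 0].max(), pts[labels == k, 1].max())
         for k in np.unique(labels)]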
EDIT:
This way of building the boxes is the dumbest possible. If you want something that really fits, you'll have to look into building the convex hull of each cluster.

Interpolation and Extrapolation of Randomly Scattered data to Uniform Grid in 3D

I have a 256 x 256 x 32 grid of regularly spaced points ranging over x, y, and z and with an associated variable "a". I also have a group of randomly scattered points in a more confined x, y, z space, with an associated variable "b". What I essentially want to do is interpolate and extrapolate my random data to a regularly spaced grid that matches the "a" cube, as shown below:
I have used scipy's griddata so far to achieve the interpolation, which seems to work fine, but it cannot handle the extrapolation (as far as I know) and the output sharply truncates to NaN values. While researching this problem I came across a couple of people using griddata a second time with 'nearest' as the interpolation method to fill in the NaN values. I tried this, but the results don't seem reliable. More appropriate-looking results are obtained if I use a fill_value with the 'linear' method, but at the moment it's more of a fudge because fill_value has to be a constant.
I noticed that MATLAB has a ScatteredInterpolant class which seems to do what I want, but I am unable to find an equivalent class in Python, nor figure out how to implement such a routine efficiently in 3D. Any help is greatly appreciated.
The code I am using for the interpolation is below:
x, y, z, b = np.loadtxt(scatteredfile, unpack = True)
# Create cube to match aCube dimensions
xi = np.linspace(-xmax_aCube, xmax_aCube, 256)
yi = np.linspace(-ymax_aCube, ymax_aCube, 256)
zi = np.linspace(zmin_aCube, zmax_aCube, 32)
# Interpolate scattered points
X, Y, Z = np.meshgrid(xi, yi, zi)
bCube = griddata((x, y, z), b, (X, Y, Z), method = 'linear')
This discussion applies in any dimensionality. For your 3D case, let's talk about computational geometry first, to understand why part of the region gives NaN from griddata.
The scattered points in your volume define a convex hull: a geometric shape with the following properties:
The surface is always convex (as the name suggests)
The volume of the shape is the lowest possible without violating convexity
The surface (in 3d) is triangulated and closed
Less formally, the convex hull (which you can compute easily with scipy) is like stretching a balloon over a frame, where the frame corners are the outermost points of your scattered cluster.
At the regular grid location inside the balloon you're surrounded by known points. You can interpolate to these locations. Outside it, you have to extrapolate.
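As a sketch of that inside/outside test with SciPy (not part of the original answer; variable names follow the question's code):

import numpy as np
from scipy.spatial import Delaunay

cloud = np.column_stack([x, y, z])     # the scattered points from the question
tri = Delaunay(cloud)                  # triangulation whose boundary is the convex hull

# A regular grid point lies inside the hull iff it falls in some simplex.
grid_pts = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
inside = tri.find_simplex(grid_pts) >= 0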
Extrapolation is hard. There's no general rule for how to do it... it's problem-specific. In that region, algorithms like griddata choose to return NaN - this is the safest way of informing the scientist that s/he must choose a sensible way of extrapolating.
Let's go through some ways of doing that.
1. [WORST] Botch it
Assign some scalar value outside the hull. In the griddata docs you'll see this is done with:
s = np.mean(b)
bCube = griddata((x, y, z), b, (X, Y, Z), method='linear', fill_value=s)
Cons: This produces a sharp discontinuity in the interpolated field at the hull boundary, heavily biases the mean scalar field value and doesn't respect the functional form of the data.
2. [NEXT WORST] "Blended botching it"
Assume that at the corners of your domain, you apply some value. This might be the average value of the scalar field associated with your scattered points.
In NumPy this would look roughly like the following:
# With a unit cube, and selected scalar value
x, y, z, b = np.loadtxt(scatteredfile, unpack=True)
s = np.mean(b)
x = np.append(x, [0, 0, 0, 0, 1, 1, 1, 1])
y = np.append(y, [0, 0, 1, 1, 0, 0, 1, 1])
z = np.append(z, [0, 1, 0, 1, 0, 1, 0, 1])
b = np.append(b, [s] * 8)
# drop in the rest of your code
Cons: This produces a sharp discontinuity in gradient of the interpolated field at the hull boundary, fairly heavily biases the mean scalar field value and doesn't respect the functional form of the data.
3. [STILL PRETTY BAD] Nearest neighbour
For each of the regular NaN points, find the nearest non-NaN point and assign its value. This is effective and stable, but crude, because your field can end up with patterned features (like stripes or beams radiating out from the hull) that are often visually unappealing or, worse, unacceptable in terms of data smoothness.
Depending on the density of data, you could use the nearest scattered datapoint instead of the nearest non-NaN regular point. This can be done simply with:
bCube = griddata((x, y, z), b, (X, Y, Z), method='linear', fill_value=np.nan)
bCubeNearest = griddata((x, y, z), b, (X, Y, Z), method='nearest')
indicesMask = np.isnan(bCube)
# Use nearest interpolation outside the hull, keeping linear interpolation inside.
bCube[indicesMask] = bCubeNearest[indicesMask]
Using MATLAB's delaunay-based approaches will reveal more powerful methods for achieving something similar in a one-liner, but numpy looks a bit limited here.
4. [NOT ALWAYS TERRIBLE] Naturally weighted
Apologies for the poor explanation in this section - I've never written the algorithm, but I'm sure some research on the natural neighbour technique will get you far.
Use a distance weighting function with some parameter D, which might be similar to, or twice (say) the length of your box. You can adjust. For each NaN location, figure out the distance to each of the scattered points.
# Don't do it this way for anything but small matrices - this is O(N*M)
# and it can be done much more effectively (e.g. MATLAB has a quick
# natural weighting option), but for illustrative purposes:
nanIdx = np.argwhere(np.isnan(bCube))          # the N regular grid points to fill
scattered = np.column_stack([x, y, z])         # the M scattered points
for i, j, k in nanIdx:
    target = np.array([X[i, j, k], Y[i, j, k], Z[i, j, k]])
    d = np.linalg.norm(scattered - target, axis=1)   # distance from this NaN point to every scattered point
    w = 1.0 / (d / D + 1e-12)                        # inverse-distance basis function, normalised on D
    bCube[i, j, k] = np.sum(w * b) / np.sum(w)       # weighted average of b
You basically want to end up with a function that smoothly goes to the average intensity of B at a distance D away from the hull, but coincides with the hull at the boundary. Away from the boundary it is weighted most strongly on its nearest points.
Pros: nicely stable and reasonably continuous. Because of the weighting, it is more resilient to noise at single data points than nearest neighbour.
5. [HEROIC ROCKSTAR] Functional form assumption
What do you know about the physics? Assume a functional form that represents what you expect the physics to do, then do a least squares (or some equivalent) fit of that form to the scattered data. Use the function to stabilise the extrapolation.
Some good ideas which can help you construct a function:
Do you expect symmetry or periodicity?
Is b a component of a vector field which has some property like zero divergence?
Directionality: do you expect all corners to be the same? Or maybe a linear variation in one direction?
Is field b a snapshot at a point in time - perhaps a smoothed time series of measurements can be used to come up with a basic functional form?
Is there already a known form like a gaussian or quadratic?
Some examples:
b represents the intensity of a laser beam passing through a volume. You expect the entry side to be nominally identical to the outlet, with the other four boundaries at zero intensity. The intensity will have a concentric gaussian profile.
b is one component of a velocity field in an incompressible fluid. The fluid must be divergence free, so any field produced in the NaN zone must also be divergence free so you apply this condition.
b represents temperature in a room. You expect higher temperature at the top, because hot air rises.
b represents lift on an aerofoil, tested over three independent variables. You can look up the lift at stall easily, so you know exactly what it'll be in some parts of the space.
Pros/Cons: Get this right and it'll be awesome. Get it wrong, especially with nonlinear functional forms, and it will go very wrong and can lead to very unstable results.
Health warning: you can't assume a functional form, get pretty results, and then use them to prove that the functional form is correct. That's just bad science. The form needs to be something well behaved and known independently of your data analysis.
If your scatter of points conforms fairly well to a cube shape, one approach could be to use griddata to interpolate onto a regular grid that fits within your point cloud (therefore avoiding NaNs) and then use that regular grid of values as the input to interpn, which does facilitate linear extrapolation (but requires a regular grid as input).
This way you can use griddata as before for all the points within the convex hull of your scatter of points, and use interpn to estimate the points that are returned as NaN.
This is far from perfect, but I think it comes closer to achieving what you are looking for.
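A rough sketch of that two-stage approach (the inner-grid bounds below are simply taken from the extent of the scattered data, which is an assumption - in practice they may need shrinking so the inner grid lies fully inside the convex hull and griddata returns no NaNs):

import numpy as np
from scipy.interpolate import griddata, interpn

# Inner regular grid intended to sit inside the point cloud (bounds may need shrinking).
xi_in = np.linspace(x.min(), x.max(), 64)
yi_in = np.linspace(y.min(), y.max(), 64)
zi_in = np.linspace(z.min(), z.max(), 16)
Xi, Yi, Zi = np.meshgrid(xi_in, yi_in, zi_in, indexing='ij')
b_inner = griddata((x, y, z), b, (Xi, Yi, Zi), method='linear')

# interpn extrapolates linearly outside the inner grid when fill_value=None.
targets = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
bCube = interpn((xi_in, yi_in, zi_in), b_inner, targets,
                method='linear', bounds_error=False, fill_value=None).reshape(X.shape)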
Pros:
Avoids sharp discontinuities.
Captures the basic linear trends at the edge of your dataset without having to know the functional form.
Respects asymmetries in your data (e.g. doesn't tend to the population mean at large distances, so one side of your dataset can have larger values than the other at large distances.)
Cons:
The effectiveness of this approach will depend a lot on how large a cube you can fit within the convex hull of your initial scatter of points. If your data is spiky/patchy and irregular, then even points on the edge of the convex hull may have been extrapolated a significant distance from the edge of the nested cube, incurring errors because the extrapolation won't take into account nearer data points that lie outside the cube.
The linear extrapolation will be heavily influenced by noise in the data at the edges of the point cloud.
Computational cost of doing two sets of interpolations.
