I'm using SciPy to build a parametric spline through a closed 2D curve (similar to an ellipse), with splprep and splev. The purpose is to smooth the points.
The problem is that the points I'm trying to smooth are not evenly distributed along the path, and when I evaluate the spline I get the same uneven distribution, whereas I would like to have uniformly distributed points on the spline.
Here's an example showing what my data looks like and a similar result (in my case the effect is, in fact, much more pronounced):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splprep, splev

t = np.r_[0:2*np.pi:100j, 0.142:np.pi+0.1:100j, 0.07+np.pi/2:0.23+np.pi:200j]
t = np.random.normal(t, 0.01)
t = np.unique(t)
# plt.plot(t)
r = np.asarray([1.0, 1.01] * (len(t) // 2))  # alternatively: r = np.random.normal(1, 0.005, size=len(t))
xy = np.asarray([np.cos(t) * r, np.sin(t) * r]).T
# plt.plot(*xy.T, '.')
# plt.axis('equal')

tck, _ = splprep(xy.T, s=0, per=True)
xi, yi = splev(np.linspace(0, 1, 200), tck)
plt.subplots(figsize=(10, 10))
plt.plot(xi, yi, '.')
plt.axis('equal')
As you can see from the plot below, there is one area that is denser in points: I would like to avoid this effect and have evenly spaced points (even better if they are spaced at a fixed angle relative to the centroid, e.g. one point every 0.5 degrees).
I think the reason for this is that the points form a "jagged" pattern in the dense area: see for example this plot showing how the points change in frequency at the top of the circle.
I think this is related to how u is computed in splprep (see the docs), and I suspect I could fix the problem by tweaking the u parameter, but I don't know how: the way it is calculated now seems reasonable, and I can't come up with a better strategy:
v = [0]
for i in range(1, len(xy)):
    vi = v[i - 1] + sum((xy[i] - xy[i - 1]) ** 2) ** 0.5
    v.append(vi)
u = [v[i] / v[-1] for i in range(len(xy))]
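For reference, the same chord-length parametrisation can be written in vectorised NumPy (an equivalent sketch, not part of the original code):
d = np.linalg.norm(np.diff(xy, axis=0), axis=1)  # distances between consecutive points
v = np.concatenate(([0.0], np.cumsum(d)))        # cumulative chord length
u = v / v[-1]                                    # normalized to [0, 1]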
Considering that the spline is the method I'm trying to use to remove extra points from the dataset (xy), the only idea I had is to recompute u in some way to achieve the desired effect, but I don't know how.
How can I smooth my data while making sure that the evaluated points on the spline are roughly equally spaced?
Edit
I realized that I basically have to set u to the angle of each point (divided by 2*pi to normalize it to [0, 1]). I tried it, and the points look evenly spaced, but for some reason I get some outliers.
uu = t / (2 * np.pi)
tck, _ = splprep(xy.T, u=uu, s=0, per=True)
xi, yi = splev(np.linspace(0, 1, 200), tck)
plt.subplots(figsize=(10, 10))
plt.plot(xi, yi)
plt.axis('equal')
The problem is, I can't understand where these outliers come from. I suspect it depends on how the spline is calculated, but I can't figure out how to solve this issue. The only solution I can use right now is smoothing, but it's a very trial-and-error approach that I'd rather not adopt.
Forcing u=t makes life too hard for the interpolator, because some of the t values are very close to each other while the corresponding points are not so close, due to the varying r. This results in large deviations of the interpolating curve from the data, i.e., the outliers in your second plot.
Instead, compute the spline with the default u, and then reparametrize it proportionally to the polar angle. To this end, I first evaluate the spline at equally spaced values in the parameter domain (as in your first attempt), find the polar angle of each resulting point with unwrap(arctan2), and then find the inverse of the u -> angle function with linear interpolation. This inverse function, inserted into the spline, yields points that are uniformly spaced in polar angle.
xx, yy = splev(np.linspace(0, 1, 200), tck)
s = np.unwrap(np.arctan2(yy, xx))
s_inv = np.interp(np.linspace(s[0], s[-1], len(s)), s, np.linspace(0, 1, len(s)))
xi, yi = splev(s_inv, tck)
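If, as mentioned in the question, a fixed angular step is preferred over a fixed number of points (e.g. one point every 0.5 degrees), the same inverse mapping can be evaluated at the desired angles. A minimal sketch reusing s and tck from the snippet above (the 0.5-degree step is just an example):
step = np.deg2rad(0.5)                                        # target angular spacing
angles = np.arange(s[0], s[-1], step)                         # target polar angles
u_of_angle = np.interp(angles, s, np.linspace(0, 1, len(s)))  # invert the u -> angle map at those angles
xi, yi = splev(u_of_angle, tck)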
I'm running a model of particles, and I want the initial conditions for the particle locations to mimic a Gaussian distribution.
If I have N particles on a 1D grid from -10 to 10, I want them to be distributed on the grid according to a Gaussian with a known mean and standard deviation. It's basically creating a histogram where each bin has width 1 (the resolution of the location grid is 1), and the frequency of each bin is how many particles are in it; the frequencies should all add up to N.
My strategy was to evaluate a Gaussian function on the x-axis grid and then just convert each value into a number of particles:
import numpy as np

def gaussian(x, mu, sig):
    return 1. / (np.sqrt(2. * np.pi) * sig) * np.exp(-np.power((x - mu) / sig, 2.) / 2)

mean = 0
sigma = 1
x_values = np.arange(-10, 10, 1)
y = gaussian(x_values, mean, sigma)
However, I have normalization issues (the sum doesn't add up to N), and the number of particles at each point should be an integer (I thought about converting the y array to integers, but because of the normalization issue I just get a flat line).
Usually the problem is fitting a Gaussian to a histogram, but in my case I need to do the reverse, and I couldn't find a solution for it yet. I will appreciate any help!
Thank you!!!
You can use numpy.random.normal to sample this distribution. The following code gives you N points inside the range (-10, 10) that follow a Gaussian distribution.
import numpy as np
import matplotlib.pyplot as plt
N = 10000
mean = 5
sigma = 3
bin_edges = np.arange(-10, 11, 1)
x_values = (bin_edges[1:] + bin_edges[:-1]) / 2
points = np.random.normal(mean, sigma, N * 10)
mask = np.logical_and(points < 10, points > -10)
points = points[mask] # drop points outside range
points = points[:N] # only use the first N points
y, _ = np.histogram(points, bins=bin_edges)
plt.scatter(x_values, y)
plt.show()
The idea is to generate many random numbers (10 * N in the code) and ignore the points outside your desired range.
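If you need integer per-bin counts that add up to exactly N (as asked in the question), a deterministic alternative is to integrate the Gaussian over each bin and round. A rough sketch, assuming SciPy is available and reusing N, mean, sigma and bin_edges from the snippet above:
from scipy.stats import norm

# Expected fraction of particles in each unit-wide bin, from the Gaussian CDF.
probs = norm.cdf(bin_edges[1:], mean, sigma) - norm.cdf(bin_edges[:-1], mean, sigma)
probs /= probs.sum()                          # renormalize to the truncated range (-10, 10)
counts = np.floor(probs * N).astype(int)      # integer particle count per bin
counts[np.argmax(probs)] += N - counts.sum()  # assign the rounding remainder to the fullest bin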
I am trying to sample around 1000 points from a 3-D ellipsoid, uniformly. Is there some way to code it such that we can get points starting from the equation of the ellipsoid?
I want points on the surface of the ellipsoid.
Theory
Using this excellent answer to the MSE question "How to generate points uniformly distributed on the surface of an ellipsoid?", we can generate a point uniformly on the sphere, apply the mapping f: (x, y, z) -> (x' = ax, y' = by, z' = cz), and then correct the distortion created by the map by discarding the point randomly with some probability p(x, y, z).
Assuming that the 3 axes of the ellipsoid are named such that
0 < a < b < c
We discard a generated point with probability
p(x,y,z) = 1 - mu(x,y,z)/mu_max,
i.e. we keep it with probability mu(x,y,z)/mu_max, where
mu(x,y,z) = ((acy)^2 + (abz)^2 + (bcx)^2)^0.5
and
mu_max = bc
Implementation
import numpy as np

np.random.seed(42)

# Function to generate a random point on a uniform sphere
# (relying on https://stackoverflow.com/a/33977530/8565438)
def randompoint(ndim=3):
    vec = np.random.randn(ndim, 1)
    vec /= np.linalg.norm(vec, axis=0)
    return vec

# Give the length of each axis (example values):
a, b, c = 1, 2, 4

# Function to scale up generated points using the function `f` mentioned above:
f = lambda x, y, z: np.multiply(np.array([a, b, c]), np.array([x, y, z]))

# Keep the point with probability `mu(x,y,z)/mu_max`, ie
def keep(x, y, z, a=a, b=b, c=c):
    mu_xyz = ((a * c * y) ** 2 + (a * b * z) ** 2 + (b * c * x) ** 2) ** 0.5
    return mu_xyz / (b * c) > np.random.uniform(low=0.0, high=1.0)

# Generate points until we have, let's say, 1000 points:
n = 1000
points = []
while len(points) < n:
    [x], [y], [z] = randompoint()
    if keep(x, y, z):
        points.append(f(x, y, z))
Checks
Check if all points generated satisfy the ellipsoid condition (ie that x^2/a^2 + y^2/b^2 + z^2/c^2 = 1):
for p in points:
    pscaled = np.multiply(p, np.array([1/a, 1/b, 1/c]))
    assert np.allclose(np.sum(np.dot(pscaled, pscaled)), 1)
Runs without raising any errors. Visualize the points:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
points = np.array(points)
ax.scatter(points[:, 0], points[:, 1], points[:, 2])
# set aspect ratio for the axes using https://stackoverflow.com/a/64453375/8565438
ax.set_box_aspect((np.ptp(points[:, 0]), np.ptp(points[:, 1]), np.ptp(points[:, 2])))
plt.show()
These points seem evenly distributed.
Problem with currently accepted answer
Generating a point on a sphere and then just reprojecting it onto the ellipsoid, without any further correction, will result in a distorted distribution. This is essentially the same as setting this post's p(x,y,z) to 0. Imagine an ellipsoid where one axis is orders of magnitude bigger than another; this makes it easy to see that the naive reprojection is not going to work.
Consider using Monte-Carlo simulation: generate a random 3D point; check if the point is inside the ellipsoid; if it is, keep it. Repeat until you get 1,000 points.
P.S. Since the OP changed their question, this answer is no longer valid.
J.F. Williamson, "Random selection of points distributed on curved surfaces", Physics in Medicine & Biology 32(10), 1987, describes a general method of choosing a uniformly random point on a parametric surface. It is an acceptance/rejection method that accepts or rejects each candidate point depending on its stretch factor (norm-of-gradient). To use this method for a parametric surface, several things have to be known about the surface, namely—
x(u, v), y(u, v) and z(u, v), which are functions that generate 3-dimensional coordinates from two dimensional coordinates u and v,
The ranges of u and v,
g(point), the norm of the gradient ("stretch factor") at each point on the surface, and
gmax, the maximum value of g for the entire surface.
The algorithm is then:
Generate a point on the surface, xyz.
If g(xyz) >= RNDU01()*gmax, where RNDU01() is a uniform random variate in [0, 1), accept the point. Otherwise, repeat this process.
Chen and Glotzer (2007) apply the method to the surface of a prolate spheroid (one form of ellipsoid) in "Simulation studies of a phenomenological model for elongated virus capsid formation", Physical Review E 75(5), 051504 (preprint).
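A generic, hedged sketch of this acceptance/rejection scheme (the callables xyz and g, the bound gmax and the parameter ranges are placeholders supplied by the caller; this is not code from the paper):
import numpy as np

def sample_surface(xyz, g, gmax, u_range, v_range, n, rng=None):
    """Accept/reject sampling of a parametric surface, following the steps above."""
    rng = np.random.default_rng() if rng is None else rng
    pts = []
    while len(pts) < n:
        u = rng.uniform(*u_range)            # candidate parameters
        v = rng.uniform(*v_range)
        if g(u, v) >= rng.random() * gmax:   # accept with probability g/gmax
            pts.append(xyz(u, v))
    return np.array(pts)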
Here is a generic function to pick a random point on the surface of a sphere, spheroid or any triaxial ellipsoid with semi-axes a, b and c. Note that generating the angles directly will not provide a uniform distribution and will cause an excessive population of points along the z direction. Instead, phi is obtained as the inverse of a uniformly generated cos(phi).
import numpy as np

def random_point_ellipsoid(a, b, c):
    u = np.random.rand()
    v = np.random.rand()
    theta = u * 2.0 * np.pi
    phi = np.arccos(2.0 * v - 1.0)
    sinTheta = np.sin(theta)
    cosTheta = np.cos(theta)
    sinPhi = np.sin(phi)
    cosPhi = np.cos(phi)
    rx = a * sinPhi * cosTheta
    ry = b * sinPhi * sinTheta
    rz = c * cosPhi
    return rx, ry, rz
This function is adapted from this post: https://karthikkaranth.me/blog/generating-random-points-in-a-sphere/
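For example (an illustrative usage sketch, not from the original answer), drawing 1000 points with semi-axes 1, 2, 4:
# Draw 1000 surface points, one at a time, and stack them into an (1000, 3) array.
points = np.array([random_point_ellipsoid(1, 2, 4) for _ in range(1000)])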
One way of doing this, which generalises for any shape or surface, is to convert the surface to a voxel representation at arbitrarily high resolution (the higher the resolution the better, but also the slower). Then you can easily select the voxels randomly however you want, and then select a point on the surface within the voxel using the parametric equation. The voxel selection should be completely unbiased, and the selection of the point within the voxel will suffer the same biases that come from using the parametric equation, but if there are enough voxels the size of these biases will be very small.
You need a high-quality cube intersection code, but with something like an ellipsoid that can be optimised quite easily. I'd suggest stepping through the bounding box subdivided into voxels. A quick distance check will eliminate most cubes, and you can do a proper intersection check for the ones where an intersection is possible. For the point within the cube, I'd be tempted to do something simple like picking a random XYZ offset from the centre and then casting a ray from the centre of the ellipsoid; the selected point is where the ray intersects the surface. As I said above, it will be biased, but with small voxels the bias will probably be small enough.
There are libraries that do convex shape intersection very efficiently, and cube/ellipsoid will be one of the options. They will be highly optimised, but I think the distance culling would probably be worth doing by hand whatever. And you will need a library that differentiates between a surface intersection and one object being totally inside the other.
And if you know your ellipsoid is aligned to an axis, then you can do the voxel/edge intersection very easily as a stack of 2D square-intersects-ellipse problems, with the set of squares to be tested defined as those adjacent to those in the layer above. That might be quicker.
One of the things that makes this approach more manageable is that you do not need to write all the code for edge cases (it is a lot of work to get around issues with floating point inaccuracies that can lead to missing or doubled voxels at the intersection). That's because these will be very rare, so they won't affect your sampling.
It might even be quicker to simply find all the voxels inside the ellipsoid and then throw away all the voxels with 6 neighbours... Lots of options. It all depends how important performance is. This will be much slower than the other suggestions, but if you want ~1000 points then ~100,000 voxels feels about the minimum for the surface, so you probably need ~1,000,000 voxels in your bounding box. However, even testing 1,000,000 intersections is pretty fast on modern computers.
Depending on what "uniformly" refers to, different methods are applicable. In any case, we can use the parametric equations in spherical coordinates (from Wikipedia): x = a*s*sin(theta)*cos(phi), y = b*s*sin(theta)*sin(phi), z = c*s*cos(theta), where s = 1 refers to the surface of the ellipsoid given by the semi-axes a > b > c. From these equations we can derive the relevant volume/area element and generate points such that their probability of being generated is proportional to that volume/area element. This will provide constant volume/area density across the surface of the ellipsoid.
1. Constant volume density
This method generates points on the surface of an ellipsoid such that their volume density across the surface of the ellipsoid is constant. A consequence of this is that the one-dimensional projections (i.e. the x, y, z coordinates) are uniformly distributed; for details see the plot below.
The volume element for a triaxial ellipsoid in these coordinates is dV = a*b*c*s^2*sin(theta) ds dtheta dphi (see here), and is thus proportional to sin(theta) (for 0 <= theta <= pi). We can use this as the basis for a probability distribution that indicates "how many" points should be generated for a given value of theta: where the density is low/high, the probability of generating a corresponding value of theta should be low/high, too.
Hence, we can use the function f(theta) = sin(theta)/2 as our probability distribution on the interval [0, pi]. The corresponding cumulative distribution function is F(theta) = (1 - cos(theta))/2. Now we can use Inverse transform sampling to generate values of theta according to f(theta) from a uniform random distribution. The values of phi can be obtained directly from a uniform distribution on [0, 2*pi].
Example code:
import matplotlib.pyplot as plt
import numpy as np
from numpy import sin, cos, pi
rng = np.random.default_rng(seed=0)
a, b, c = 10, 3, 1
N = 5000
phi = rng.uniform(0, 2*pi, size=N)
theta = np.arccos(1 - 2*rng.random(size=N))
x = a*sin(theta)*cos(phi)
y = b*sin(theta)*sin(phi)
z = c*cos(theta)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, s=2)
plt.show()
which produces the following plot:
The following plot shows the one-dimensional projections (i.e. density plots of x, y, z):
import seaborn as sns
sns.kdeplot(data=dict(x=x, y=y, z=z))
plt.show()
2. Constant area density
This method generates points on the surface of an ellipsoid such that their area density is constant across the surface of the ellipsoid.
Again, we start by calculating the corresponding area element. For simplicity we can use SymPy:
from sympy import cos, sin, symbols, Matrix

a, b, c, t, p = symbols('a b c t p')
x = a*sin(t)*cos(p)
y = b*sin(t)*sin(p)
z = c*cos(t)
J = Matrix([
    [x.diff(t), x.diff(p)],
    [y.diff(t), y.diff(p)],
    [z.diff(t), z.diff(p)],
])
print((J.T @ J).det().simplify())
This yields
-a**2*b**2*sin(t)**4 + a**2*b**2*sin(t)**2 + a**2*c**2*sin(p)**2*sin(t)**4 - b**2*c**2*sin(p)**2*sin(t)**4 + b**2*c**2*sin(t)**4
and further simplifies to (dividing by (a*b)**2 and taking the sqrt):
sin(t)*np.sqrt(1 + ((c/b)**2*sin(p)**2 + (c/a)**2*cos(p)**2 - 1)*sin(t)**2)
Since for this case the area element is more complex, we can use rejection sampling:
import matplotlib.pyplot as plt
import numpy as np
from numpy import cos, sin

def f_redo(t, p):
    return (
        sin(t)*np.sqrt(1 + ((c/b)**2*sin(p)**2 + (c/a)**2*cos(p)**2 - 1)*sin(t)**2)
        < rng.random(size=t.size)
    )

rng = np.random.default_rng(seed=0)
N = 5000
a, b, c = 10, 3, 1

t = rng.uniform(0, np.pi, size=N)
p = rng.uniform(0, 2*np.pi, size=N)
redo = f_redo(t, p)
while redo.any():
    t[redo] = rng.uniform(0, np.pi, size=redo.sum())
    p[redo] = rng.uniform(0, 2*np.pi, size=redo.sum())
    redo[redo] = f_redo(t[redo], p[redo])

x = a*np.sin(t)*np.cos(p)
y = b*np.sin(t)*np.sin(p)
z = c*np.cos(t)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x, y, z, s=2)
plt.show()
which yields the following distribution:
The following plot shows the corresponding one-dimensional projections (x, y, z):
I am trying to come up with a generalised way in Python to identify pitch rotations occurring during a set of planned spacecraft manoeuvres. You could think of it as a particular case of a shift detection problem.
Let's consider the solar_elevation_angle variable in my set of measurements, identifying the elevation angle of the sun measured from the spacecraft's instrument. For those who might want to play with the data, I saved the solar_elevation_angle.txt file here.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy.signal import argrelmax
from scipy.ndimage import gaussian_filter1d
solar_elevation_angle = np.loadtxt("solar_elevation_angle.txt", dtype=np.float32)
fig, ax = plt.subplots()
ax.set_title('Solar elevation angle')
ax.set_xlabel('Scanline')
ax.set_ylabel('Solar elevation angle [deg]')
ax.plot(solar_elevation_angle)
plt.show()
The scanline is my time dimension. The four points where the slope changes identify the spacecraft pitch rotations.
As you can see, the solar elevation angle evolution outside the spacecraft manoeuvres regions is pretty much linear as a function of time, and that should always be the case for this particular spacecraft (except for major failures).
Note that during each spacecraft manoeuvre, the slope change is obviously continuous, although discretised in my set of angle values. That means: for each manoeuvre, it does not really make sense to try to locate a single scanline where a manoeuvre has taken place. My goal is rather to identify, for each manoeuvre, a "representative" scanline in the range of scanlines defining the interval of time where the manoeuvre occurred (e.g. middle value, or left boundary).
Once I get a set of "representative" scanline indexes where all manoeuvres have taken place, I could then use those indexes for rough estimations of manoeuvres durations, or to automatically place labels on the plot.
My solution so far has been to:
1. Compute the 2nd derivative of the solar elevation angle using np.gradient.
2. Compute the absolute value of the resulting curve and clip it. The clipping is necessary because of what I assume to be discretisation noise in the linear segments, which would otherwise severely affect the identification of the "real" local maxima in step 4.
3. Apply smoothing to the resulting curve, to get rid of multiple peaks. I'm using scipy's 1d gaussian filter with a trial-and-error sigma value for that.
4. Identify the local maxima.
Here's my code:
fig = plt.figure(figsize=(8,12))
gs = gridspec.GridSpec(5, 1)
ax0 = plt.subplot(gs[0])
ax0.set_title('Solar elevation angle')
ax0.plot(solar_elevation_angle)
solar_elevation_angle_1stdev = np.gradient(solar_elevation_angle)
ax1 = plt.subplot(gs[1])
ax1.set_title('1st derivative')
ax1.plot(solar_elevation_angle_1stdev)
solar_elevation_angle_2nddev = np.gradient(solar_elevation_angle_1stdev)
ax2 = plt.subplot(gs[2])
ax2.set_title('2nd derivative')
ax2.plot(solar_elevation_angle_2nddev)
solar_elevation_angle_2nddev_clipped = np.clip(np.abs(np.gradient(solar_elevation_angle_2nddev)), 0.0001, 2)
ax3 = plt.subplot(gs[3])
ax3.set_title('absolute value + clipping')
ax3.plot(solar_elevation_angle_2nddev_clipped)
smoothed_signal = gaussian_filter1d(solar_elevation_angle_2nddev_clipped, 20)
ax4 = plt.subplot(gs[4])
ax4.set_title('Smoothing applied')
ax4.plot(smoothed_signal)
plt.tight_layout()
plt.show()
I can then easily identify the local maxima by using scipy's argrelmax function:
max_idx = argrelmax(smoothed_signal)[0]
print(max_idx)
# [ 689 1019 2356 2685]
Which correctly identifies the scanline indexes I was looking for:
fig, ax = plt.subplots()
ax.set_title('Solar elevation angle')
ax.set_xlabel('Scanline')
ax.set_ylabel('Solar elevation angle [deg]')
ax.plot(solar_elevation_angle)
ax.scatter(max_idx, solar_elevation_angle[max_idx], marker='x', color='red')
plt.show()
My question is: Is there a better way to approach this problem?
I find that having to manually specify the clipping threshold values to get rid of the noise and the sigma of the Gaussian filter weakens this approach considerably, preventing it from being applied to other similar cases.
First improvement would be to use a Savitzky-Golay filter to find the derivative in a less noisy way. For example, it can fit a parabola (in the sense of least squares) to each data slice of certain size, and then take the second derivative of that parabola. The result is much nicer than just taking 2nd order difference with gradient. Here it is with window size 101:
savgol_filter(solar_elevation_angle, window_length=window, polyorder=2, deriv=2)
Second, instead of looking for points of maximum with argrelmax it is better to look for places where the second derivative is large; for example, at least half its maximal size. This will of course return many indexes, but we can then look at the gaps between those indexes to identify where each peak begins and ends. The midpoint of the peak is then easily found.
Here is the complete code. The only parameter is window size, which is set to 101. The approach is robust; the size 21 or 201 gives essentially the same outcome (it must be odd).
from scipy.signal import savgol_filter
window = 101
der2 = savgol_filter(solar_elevation_angle, window_length=window, polyorder=2, deriv=2)
max_der2 = np.max(np.abs(der2))
large = np.where(np.abs(der2) > max_der2/2)[0]
gaps = np.diff(large) > window
begins = np.insert(large[1:][gaps], 0, large[0])
ends = np.append(large[:-1][gaps], large[-1])
changes = ((begins + ends) / 2).astype(int)
plt.plot(solar_elevation_angle)
plt.plot(changes, solar_elevation_angle[changes], 'ro')
plt.show()
The fuss with insert and append is because the first index with large derivative should qualify as "peak begins" and the last such index should qualify as "peak ends", even though they don't have a suitable gap next to them (the gap is infinite).
Piecewise linear fit
This is an alternative (not necessarily better) approach, which does not use derivatives: fit a smoothing spline of degree 1 (i.e., a piecewise linear curve), and notice where its knots are.
First, normalize the data (which I call y instead of solar_elevation_angle) to have standard deviation 1.
y = solar_elevation_angle / np.std(solar_elevation_angle)
The first step is to build a piecewise linear curve that deviates from the data by at most the given threshold, arbitrarily set to 0.1 (no units here because y was normalized). This is done by calling UnivariateSpline repeatedly, starting with a large smoothing parameter and gradually reducing it until the curve fits. (Unfortunately, one can't simply pass in the desired uniform error bound).
from scipy.interpolate import UnivariateSpline

threshold = 0.1
m = y.size
x = np.arange(m)
s = m
max_error = 1
while max_error > threshold:
    spl = UnivariateSpline(x, y, k=1, s=s)
    interp_y = spl(x)
    max_error = np.max(np.abs(interp_y - y))
    s /= 2
knots = spl.get_knots()
values = spl(knots)
So far we found the knots, and noted the values of the spline at those knots. But not all of these knots are really important. To test the importance of each knot, I remove it and interpolate without it. If the new interpolant is substantially different from the old (doubling the error), the knot is considered important and is added to the list of found slope changes.
ts = knots.size
idx = np.arange(ts)
changes = []
for j in range(1, ts - 1):
    spl = UnivariateSpline(knots[idx != j], values[idx != j], k=1, s=0)
    if np.max(np.abs(spl(x) - interp_y)) > 2 * threshold:
        changes.append(knots[j])

plt.plot(y)
plt.plot(changes, y[np.array(changes, dtype=int)], 'ro')
plt.show()
Ideally, one would fit piecewise linear functions to given data, increasing the number of knots until adding one more does not bring "substantial" improvement. The above is a crude approximation of that with SciPy tools, but far from best possible. I don't know of any off-the-shelf piecewise linear model selection tool in Python.
Given a set of points describing some trajectory in the 2D plane, I would like to provide a smooth representation of this trajectory with local high order interpolation.
For instance, say we define a circle in 2D with 11 points in the figure below. I would like to add points in between each consecutive pair of points in order to produce a smooth trace. Adding points on every segment is easy enough, but it produces slope discontinuities typical of a "local linear interpolation". Of course it is not an interpolation in the classical sense, because
the function can have multiple y values for a given x
simply adding more points on the trajectory would be fine (no continuous representation is needed).
so I'm not sure what would be the proper vocabulary for this.
The code to produce this figure can be found below. The linear interpolation is performed with the lin_refine_implicit function. I'm looking for a higher order solution to produce a smooth trace and I was wondering if there is a way of achieving it with classical functions in Scipy? I have tried to use various 1D interpolations from scipy.interpolate without much success (again because of multiple y values for a given x).
The end goal is to use this method to provide a smooth GPS trajectory from discrete measurements, so I would think this should have a classical solution somewhere.
import numpy as np
import matplotlib.pyplot as plt

def lin_refine_implicit(x, n):
    """
    Given a 2D ndarray (npt, m) of npt coordinates in m dimensions,
    insert 2**(n-1) additional points on each trajectory segment.
    Returns an (npt*2**(n-1), m) ndarray.
    """
    if n > 1:
        m = 0.5 * (x[:-1] + x[1:])
        if x.ndim == 2:
            msize = (x.shape[0] + m.shape[0], x.shape[1])
        else:
            raise NotImplementedError
        x_new = np.empty(msize, dtype=x.dtype)
        x_new[0::2] = x
        x_new[1::2] = m
        return lin_refine_implicit(x_new, n - 1)
    elif n == 1:
        return x
    else:
        raise ValueError

n = 11
r = np.arange(0, 2*np.pi, 2*np.pi/n)
x = 0.9*np.cos(r)
y = 0.9*np.sin(r)
xy = np.vstack((x, y)).T
xy_highres_lin = lin_refine_implicit(xy, n=3)

plt.plot(xy[:, 0], xy[:, 1], 'ob', ms=15.0, label='original data')
plt.plot(xy_highres_lin[:, 0], xy_highres_lin[:, 1], 'dr', ms=10.0, label='linear local interpolation')
plt.legend(loc='best')
plt.plot(x, y, '--k')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('GPS trajectory')
plt.show()
This is called parametric interpolation.
scipy.interpolate.splprep provides spline approximations for such curves. This assumes you know the order in which the points are on the curve.
If you don't know which point comes after which on the curve, the problem becomes more difficult. I think in this case, the problem is called manifold learning, and some of the algorithms in scikit-learn may be helpful in that.
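A minimal sketch of the splprep suggestion, reusing x, y and plt from the question's code (closing the curve explicitly and the choice of 400 evaluation points are my own):
from scipy.interpolate import splprep, splev

x_closed = np.append(x, x[0])                            # repeat the first point to close the curve
y_closed = np.append(y, y[0])
tck, u = splprep([x_closed, y_closed], s=0, per=True)    # periodic cubic spline through the points
x_smooth, y_smooth = splev(np.linspace(0, 1, 400), tck)
plt.plot(x_smooth, y_smooth, '-g', label='cubic spline')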
I would suggest you try to transform your Cartesian coordinates into polar coordinates; that should allow you to use the standard scipy.interpolate routines without issues, as you won't have the ambiguity of the x -> y mapping anymore.
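A rough sketch of that polar-coordinate idea for the same example data (it assumes the curve winds once around the origin and does not close the loop; the variable names are mine):
from scipy.interpolate import interp1d

theta = np.unwrap(np.arctan2(y, x))       # polar angle of each point
rad = np.hypot(x, y)                      # radius of each point
f_r = interp1d(theta, rad, kind='cubic')  # interpolate r as a single-valued function of theta
theta_fine = np.linspace(theta.min(), theta.max(), 400)
x_smooth = f_r(theta_fine) * np.cos(theta_fine)
y_smooth = f_r(theta_fine) * np.sin(theta_fine)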
I can't quite wrap my head around how to extrapolate from a dataset where the points are not ordered, i.e. x is not monotonically increasing or decreasing. Like so:
http://www.pic-host.org/images/2014/07/21/0b5ad6a11266f549.png
I understand that I need to treat the x and y values separately. So here's the code that gets me this (the points are ordered):
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt

x = bananax
y = bananay
t = np.arange(x.shape[0], dtype=float)
t /= t[-1]
nt = np.linspace(0, 1, 100)
x1 = scipy.interpolate.spline(t, x, nt)  # note: scipy.interpolate.spline is from older SciPy versions; make_interp_spline is the modern replacement
y1 = scipy.interpolate.spline(t, y, nt)
plt.plot(nt, x1, label='data x')
plt.plot(nt, y1, label='data y')
Now I have the interpolated splines. I guess I have to do the extrapolation for f(nt) = x1 and f(nt) = y1 respectively. I get how to interpolate from the data with a simple linear regression, but I'm missing how to get a more complex spline(?) extrapolated from it.
The aim is to let the extrapolated function follow the curvature of the datapoints. (At one end at least)
Cheers, and thanks!
I believe that you're on the right track in that you're creating a parametric curve (creating x(t) and y(t)) because the points are ordered. Part of the issue seems to be that the spline function gives you back discrete values rather than the form and parameters of the spline. scipy.optimize has some nice tools that will help you find functions rather than calculating points.
If you've got any insight into the underlying process generating the data I suggest that you use that to help select a functional form for fitting. These more free-form methods will give you a degree of flexibility to do so.
Fit x(t) and y(t) and hold onto the resulting fitting functions. They'll be generated with data from t=0 to t=1 but nothing* will stop you from evaluating them outside that range.
I can recommend the following links for guidance on curve fitting procedure:
short: http://glowingpython.blogspot.com/2011/05/curve-fitting-using-fmin.html
long: http://nbviewer.ipython.org/gist/keflavich/4042018
*almost nothing
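As a rough illustration of that suggestion (a sketch, assuming a cubic polynomial is an adequate functional form for x(t) and y(t); the names are mine), one could fit with scipy.optimize.curve_fit and evaluate the fitted functions outside the original t range:
from scipy.optimize import curve_fit

def cubic(t, a, b, c, d):
    return a*t**3 + b*t**2 + c*t + d

# t, x and y as defined in the question's snippet
px, _ = curve_fit(cubic, t, x)
py, _ = curve_fit(cubic, t, y)
t_ext = np.linspace(-0.5, 1.0, 200)   # t < 0 extrapolates beyond the first data point
x_ext, y_ext = cubic(t_ext, *px), cubic(t_ext, *py)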
Thanks, this got me on the right track. What worked for me was:
x = bananax
y = bananay

# ------ fit a spline to the coordinates; the x and y values are interpolated towards t
t = np.arange(x.shape[0], dtype=float)   # t is the index of each value
t /= t[-1]                               # t is now scaled from 0 to 1
nt = np.linspace(0, 1, 100)              # nt is an array from 0 to 1 with 100 intermediate values
x1 = scipy.interpolate.spline(t, x, nt)  # the parameter values where the spline estimates the data
y1 = scipy.interpolate.spline(t, y, nt)

# ------ create a new linear space nnt on which an extrapolation of the interpolated spline will be made
nnt = np.linspace(-1, 1, 100)  # values < 0 are extrapolated (the interpolation started at the tip, t = 0)
x1fit = np.polyfit(nt, x1, 3)  # fits a polynomial of the given order to the spline values; output are the polynomial coefficients
y1fit = np.polyfit(nt, y1, 3)
xpoly = np.poly1d(x1fit)       # generates the polynomial function from the coefficients obtained by polyfit
ypoly = np.poly1d(y1fit)