Extrapolation from curved datapoints - python

I can't quite wrap my head around how to extrapolate from a dataset where the points are not ordered in x, i.e. x can decrease along the curve, like so:
I understand that I need to treat the x and y values separately. This is the code that gets me there (the points are ordered along the curve):
x = bananax
y = bananay
t = np.arange(x.shape[0], dtype=float)
t /= t[-1]
nt = np.linspace(0, 1, 100)
x1 = scipy.interpolate.spline(t, x, nt)
y1 = scipy.interpolate.spline(t, y, nt)
plt.plot(nt, x1, label='data x')
plt.plot(nt, y1, label='data y')
Now I have the interpolated splines. I guess I have to do the extrapolation for f(nt)=x1 and f(nt)=y1 respectively. I get how to interpolate from the data with a simple linear regression, but I'm missing how to get a more complex spline(?) extrapolated from it.
The aim is to let the extrapolated function follow the curvature of the datapoints. (At one end at least)
Cheers, and thanks!

I believe that you're on the right track in that you're creating a parametric curve (creating x(t) and y(t)) because the points are ordered. Part of the issue seems to be that the spline function is giving you back discrete values rather than the form and parameters of the spline. scipy.optimize has some nice tools that will help you find functions rather than calculating points.
If you've got any insight into the underlying process generating the data I suggest that you use that to help select a functional form for fitting. These more free-form methods will give you a degree of flexibility to do so.
Fit x(t) and y(t) and hold onto the resulting fitting functions. They'll be generated with data from t=0 to t=1 but nothing* will stop you from evaluating them outside that range.
I can recommend the following links for guidance on curve fitting procedure:
short: http://glowingpython.blogspot.com/2011/05/curve-fitting-using-fmin.html
long: http://nbviewer.ipython.org/gist/keflavich/4042018
*almost nothing
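As a rough sketch of that idea (not taken from either link): fit x(t) and y(t) with scipy.optimize.curve_fit, keep the fitted parameters, and evaluate the model outside 0..1. The sample points and the quadratic model below are assumptions made only for illustration; pick a functional form that matches your own data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical ordered points along a curve (stand-ins for bananax, bananay)
t = np.linspace(0.0, 1.0, 20)
x = 1.0 + 2.0 * t - 0.5 * t**2
y = 0.5 + 3.0 * t**2

def quadratic(t, a, b, c):
    """Assumed functional form; replace it with one motivated by your data."""
    return a + b * t + c * t**2

# Fit x(t) and y(t) separately and keep the fitted parameters
px, _ = curve_fit(quadratic, t, x)
py, _ = curve_fit(quadratic, t, y)

# Evaluate the fitted functions outside the fitted range 0..1 (extrapolation)
t_ext = np.linspace(-0.5, 1.5, 50)
x_ext = quadratic(t_ext, *px)
y_ext = quadratic(t_ext, *py)
```

The further t_ext goes beyond 0..1, the less the result can be trusted, which is the "almost nothing" caveat above.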

Thanks, this got me on the right track. What worked for me was:
x = bananax
y = bananay
#------ fit a spline to the coordinates; x and y are each interpolated as functions of t
t = np.arange(x.shape[0], dtype=float) # t is the index of each point
t /= t[-1] # t now runs from 0 to 1
nt = np.linspace(0, 1, 100) # 100 evenly spaced parameter values from 0 to 1
x1 = scipy.interpolate.spline(t, x, nt) # x values of the spline evaluated at nt
y1 = scipy.interpolate.spline(t, y, nt) # y values of the spline evaluated at nt
#------ create a new parameter range nnt over which the interpolated spline is extrapolated
nnt = np.linspace(-1, 1, 100) # values < 0 are extrapolated (the interpolation started at the tip, t = 0)
x1fit = np.polyfit(nt,x1,3) # fits a 3rd-order polynomial to the spline output; returns the coefficients
y1fit = np.polyfit(nt,y1,3)
xpoly = np.poly1d(x1fit) # generates the polynomial function from the coefficients obtained by polyfit
ypoly = np.poly1d(y1fit)
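For anyone trying this on a recent SciPy: scipy.interpolate.spline was removed in SciPy 1.3, so the snippet above no longer runs as written. Below is a sketch of the same steps with make_interp_spline swapped in (its boundary handling may differ slightly from the old function), followed by the evaluation on nnt. The arc-shaped sample points are made up and only stand in for bananax and bananay.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical ordered points on a curved path (stand-ins for bananax, bananay)
theta = np.linspace(0.2, 1.2, 15)
x = np.cos(theta)
y = np.sin(theta)

# Parameter t runs from 0 to 1 along the ordered points
t = np.arange(x.shape[0], dtype=float)
t /= t[-1]
nt = np.linspace(0, 1, 100)

# Cubic interpolating splines for x(t) and y(t), evaluated at nt
x1 = make_interp_spline(t, x, k=3)(nt)
y1 = make_interp_spline(t, y, k=3)(nt)

# Fit 3rd-order polynomials to the spline output and wrap them as functions
xpoly = np.poly1d(np.polyfit(nt, x1, 3))
ypoly = np.poly1d(np.polyfit(nt, y1, 3))

# Evaluate on nnt; values of nnt below 0 are extrapolated
nnt = np.linspace(-1, 1, 100)
x_ext = xpoly(nnt)
y_ext = ypoly(nnt)
```

Plotting x_ext against y_ext then shows the extrapolated curve next to the original points.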


Find two points/derivatives on curves between which the line is straight/constant

I'm plotting x and y points. The result is a curve that first bends, then becomes straight after some point, and later bends again. I want to retrieve the two points where the straight section starts and ends. x is linearly spaced and y is plotted against x, but y does not depend linearly on x.
I tried matplotlib for plotting and numpy polynomial functions, and am currently looking into splines, but it seems that for these y needs to be directly dependent on x.
Your data is noisy, so you can't use a simple numerical derivative. Instead, as you may have found already, you should fit it with a spline and then check the curvature of the spline.
Keying off this answer, you can fit a spline and calculate the second derivative (curvature) like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = file['n']
y = file['Ds/2']
y_spline = UnivariateSpline(x, y)
x_range = np.linspace(x[0], x[-1], 1000) # or could use x_range = x
y_spline_deriv = y_spline.derivative(n=2) # second derivative of the fitted spline
curvature = y_spline_deriv(x_range)
Then you can find the start and end of the straight region like this:
straight_points = np.where(np.abs(curvature) <= 0.1)[0] # pick your threshold
start_idx = straight_points[0]
end_idx = straight_points[-1]
start_x = x_range[start_idx]
end_x = x_range[end_idx]
Alternatively, if you're mainly interested in finding the flattest part of the curve (as shown in your graphic), you could try calculating the first derivative and then finding regions where the slope is within some small amount of the minimum slope anywhere in the data. In that case, just substitute y_spline_deriv = y_spline.derivative(n=1) in the code above.
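To make this easy to try without the original file, here is a sketch on made-up, noise-free data that bends, runs straight between x = 3 and x = 7, and bends again. Because there is no noise, s=0 (pure interpolation) is used. For real, noisy data you would need a positive smoothing factor s and probably a different threshold, so treat both values as assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in data: curved for x < 3, straight on [3, 7], curved for x > 7
x = np.linspace(0, 10, 201)
y = (np.where(x < 3, (3 - x) ** 3, 0.0)
     + np.where(x > 7, (x - 7) ** 3, 0.0)
     + 0.5 * x)

# s=0 interpolates exactly; noisy data would need s > 0
y_spline = UnivariateSpline(x, y, s=0)
x_range = np.linspace(x[0], x[-1], 1000)
curvature = y_spline.derivative(n=2)(x_range)

# First and last sample where the second derivative is close to zero
straight_points = np.where(np.abs(curvature) <= 0.1)[0]
start_x = x_range[straight_points[0]]
end_x = x_range[straight_points[-1]]
print(start_x, end_x)
```

Note that taking the first and last index assumes there is a single straight region; with several, you would have to split straight_points into contiguous runs.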

Equally spaced spline evaluation in scipy

I'm using scipy for building a bivariate spline of a curve (similar to an ellipse), with splprep and splev. The purpose is to smooth the points.
The problem is that the points I'm trying to smooth are not evenly distributed along the path, and when I try to evaluate the spline I will get uneven distribution, but I would like to have uniformly distributed points on the spline.
Here's an example, showing what my data looks like and a similar result (in my case this effect is, in fact, much more evident):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import splprep, splev
t = np.r_[0:2*np.pi:100.j, 0.142:np.pi+0.1:100j, 0.07+np.pi/2:0.23+np.pi:200j]
t = np.random.normal(t, 0.01)
t = np.unique(t)
# plt.plot(t)
r = np.asarray([1.0, 1.01] * (len(t) // 2)) # np.random.normal() # 1, 0.005, size=len(t))
xy = np.asarray([np.cos(t) * r, np.sin(t) * r]).T
# plt.plot(*xy.T, '.')
# plt.axis('equal')
tck, _ = splprep(xy.T, s=0, per=True)
xi, yi = splev(np.linspace(0, 1, 200), tck)
plt.subplots(figsize=(10, 10))
plt.plot(xi, yi, '.')
As you can see from the plot below, there is one area where the points are denser. I would like to avoid this effect and have evenly spaced points (even better if they are spaced with a fixed angle relative to the centroid, e.g. 1 point every 0.5 degrees).
I think the reason for this is that points result in a "jagged" pattern in the dense area: see for example this plot showing how the points change in frequency at the top of the circle.
I think this is related to how u is computed in splprep (see doc) and I think I could fix the problem by tweaking the u parameter, but I don't know how: the way it is calculated is apparently fine right now, and I can't come up with a better strategy:
v = [0]
for i in range(1, len(xy)):
    # cumulative chord length up to point i
    v.append(v[i - 1] + sum((xy[i] - xy[i - 1]) ** 2) ** 0.5)
u = [vi / v[-1] for vi in v] # normalized to the range 0..1
Considering that using the spline is the method I'm trying to use to remove extra points from the dataset (xy), the only idea I had is to recompute u in some way to get the desired effect, but I don't know how.
How can I smooth my data making sure that evaluated points on spline are roughly at the same distance one from the other?
I realized that I basically have to set u to be the angle of each point (divided by 2pi, to normalize it to between 0 and 1). I tried that and the points look evenly spaced, but for some reason I get some outliers:
uu = t / (2 * np.pi) # u1# 2
tck, _ = splprep(xy.T, u=uu, s=0, per=True)
xi, yi = splev(np.linspace(0, 1, 200), tck)
plt.subplots(figsize=(10, 10))
plt.plot(xi, yi)#, '.')
Problem is, I can't understand where these come from. I suspect it depends on how the spline is calculated, but can't figure out how to solve this issue. The only solution I can use right now is to use smoothing, but it's a very trial-and-error method that I'd rather not adopt.
Forcing u=t makes life too hard for the interpolator, because some of the t values are very close to each other while the corresponding points are not so close due to the varying r. This results in large deviations of the interpolating curve from the data, i.e., outliers on your second plot.
Instead, compute the spline with the default u, and then reparametrize proportional to polar angle. To this end, I first evaluate the spline at equally spaced values in the parameter domain (as in your first attempt), find the polar angle of each resulting point with unwrap(arctan2), and then find the inverse of the u->angle function with linear interpolation. This inverse function, inserted in the spline, results in uniformly spaced points according to their polar angle.
xx, yy = splev(np.linspace(0, 1, 200), tck)
s = np.unwrap(np.arctan2(yy, xx))
s_inv = np.interp(np.linspace(s[0], s[-1], len(s)), s, np.linspace(0, 1, len(s)))
xi, yi = splev(s_inv, tck)
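Here is the same recipe as a self-contained sketch you can run and check. The unevenly sampled ellipse is invented test data, and the code assumes the curve winds exactly once around the origin so that the polar angle increases monotonically along the spline; for a curve centred elsewhere, subtract the centroid first.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Invented test data: an ellipse sampled more densely on its upper half
t = np.concatenate([np.linspace(0, np.pi, 60, endpoint=False),
                    np.linspace(np.pi, 2 * np.pi, 20, endpoint=False)])
pts = np.column_stack([2.0 * np.cos(t), np.sin(t)])
pts = np.vstack([pts, pts[:1]])            # repeat the first point to close the curve

# Periodic interpolating spline with the default (chord-length) parameter u
tck, _ = splprep(pts.T, s=0, per=True)

# Sample densely in u and record the unwrapped polar angle of each sample
u_dense = np.linspace(0, 1, 2000)
xx, yy = splev(u_dense, tck)
ang = np.unwrap(np.arctan2(yy, xx))

# Invert u -> angle by linear interpolation, then evaluate at uniform angles
target = np.linspace(ang[0], ang[-1], 200)
u_uniform = np.interp(target, ang, u_dense)
xi, yi = splev(u_uniform, tck)
```

The denser u_dense is, the more accurate the inverted mapping becomes, at the cost of one extra splev call.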

Higher order local interpolation of implicit curves in Python

Given a set of points describing some trajectory in the 2D plane, I would like to provide a smooth representation of this trajectory with local high order interpolation.
For instance, say we define a circle in 2D with 11 points in the figure below. I would like to add points in between each consecutive pair of points in order to produce a smooth trace. Adding points on every segment is easy enough, but it produces slope discontinuities typical for a "local linear interpolation". Of course it is not an interpolation in the classical sense, because:
the function can have multiple y values for a given x
simply adding more points on the trajectory would be fine (no continuous representation is needed).
so I'm not sure what would be the proper vocabulary for this.
The code to produce this figure can be found below. The linear interpolation is performed with the lin_refine_implicit function. I'm looking for a higher order solution to produce a smooth trace and I was wondering if there is a way of achieving it with classical functions in Scipy? I have tried to use various 1D interpolations from scipy.interpolate without much success (again because of multiple y values for a given x).
The end goal is to use this method to provide a smooth GPS trajectory from discrete measurements, so I would think this should have a classical solution somewhere.
import numpy as np
import matplotlib.pyplot as plt
def lin_refine_implicit(x, n):
    """
    Given a 2D ndarray (npt, m) of npt coordinates in m dimensions, insert 2**(n-1) - 1 additional points on each trajectory segment
    Returns an ((npt - 1)*2**(n-1) + 1, m) ndarray
    """
    if n > 1:
        m = 0.5*(x[:-1] + x[1:])  # midpoints of all segments
        if x.ndim == 2:
            msize = (x.shape[0] + m.shape[0], x.shape[1])
        else:
            raise NotImplementedError
        x_new = np.empty(msize, dtype=x.dtype)
        x_new[0::2] = x
        x_new[1::2] = m
        return lin_refine_implicit(x_new, n-1)
    elif n == 1:
        return x
    else:
        raise ValueError
n = 11
r = np.arange(0, 2*np.pi, 2*np.pi/n)
x = 0.9*np.cos(r)
y = 0.9*np.sin(r)
xy = np.vstack((x, y)).T
xy_highres_lin = lin_refine_implicit(xy, n=3)
plt.plot(xy[:,0], xy[:,1], 'ob', ms=15.0, label='original data')
plt.plot(xy_highres_lin[:,0], xy_highres_lin[:,1], 'dr', ms=10.0, label='linear local interpolation')
plt.plot(x, y, '--k')
plt.title('GPS trajectory')
This is called parametric interpolation.
scipy.interpolate.splprep provides spline approximations for such curves. This assumes you know the order in which the points are on the curve.
If you don't know which point comes after which on the curve, the problem becomes more difficult. I think in this case, the problem is called manifold learning, and some of the algorithms in scikit-learn may be helpful in that.
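For the ordered case, a minimal sketch with splprep and splev on the 11-point circle from the question could look like the following. Repeating the first point and using per=True are my assumptions for a closed trajectory; for an open GPS track you would drop both.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# The 11 ordered points on a circle of radius 0.9, as in the question
n = 11
r = np.linspace(0, 2 * np.pi, n, endpoint=False)
xy = np.column_stack((0.9 * np.cos(r), 0.9 * np.sin(r)))

# Close the curve by repeating the first point, then fit a periodic cubic spline.
# s=0 makes the spline pass through every data point.
closed = np.vstack((xy, xy[:1]))
tck, u = splprep(closed.T, s=0, per=True)

# Evaluate the smooth curve at 200 parameter values between 0 and 1
x_s, y_s = splev(np.linspace(0, 1, 200), tck)
```

With noisy measurements, a positive s trades exact interpolation for smoothness.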
I would suggest you try to transform your Cartesian coordinates into polar coordinates. That should allow you to use the standard scipy.interpolate functions without issues, as you won't have the ambiguity of the x->y mapping anymore.
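A sketch of the polar approach, under the assumption that every ray from the centroid crosses the trajectory only once (otherwise the radius is not a function of the angle and this breaks down). The sample curve and the choice of a periodic CubicSpline are mine, for illustration only.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Invented closed trajectory: 24 points on a slightly squashed circle
theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
radius = 1.0 + 0.2 * np.cos(2 * theta)
x, y = radius * np.cos(theta), radius * np.sin(theta)

# Cartesian -> polar about the centroid, sorted by angle
cx, cy = x.mean(), y.mean()
ang = np.arctan2(y - cy, x - cx)
rad = np.hypot(x - cx, y - cy)
order = np.argsort(ang)
ang, rad = ang[order], rad[order]

# Close the curve (first point repeated one full turn later) and fit r(angle)
ang_c = np.append(ang, ang[0] + 2 * np.pi)
rad_c = np.append(rad, rad[0])
spline = CubicSpline(ang_c, rad_c, bc_type="periodic")

# Evaluate on a fine angular grid and convert back to Cartesian coordinates
ang_new = np.linspace(ang_c[0], ang_c[-1], 200)
rad_new = spline(ang_new)
x_new = cx + rad_new * np.cos(ang_new)
y_new = cy + rad_new * np.sin(ang_new)
```

For a general GPS track that doubles back on itself, the parametric spline approach is the safer choice.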

Gradient calculation with python

I would like to know how numpy.gradient works.
I used gradient to try to calculate group velocity (the group velocity of a wave packet is the derivative of the frequencies with respect to the wavenumbers, not a group of velocities). I fed it a 3-column array: the first 2 columns are the x and y coordinates, and the third column is the frequency at that point (x,y). I need to calculate the gradient, and I expected a 2D vector, the definition of the gradient being ∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z)
and, my function being a function of x and y only, I expected something like ∇f = (∂f/∂x, ∂f/∂y).
But I got 2 arrays with 3 columns each, i.e. two 3D vectors. At first I thought that the sum of the two would give me the vector I was searching for, but the z component doesn't vanish. I hope I've been sufficiently clear in my explanation. I would like to know how numpy.gradient works and if it's the right choice for my problem. Otherwise I would like to know if there's any other Python function I can use.
What i mean is: I want to calculate gradient of an array of values:
where x1,x2 are point coordinates on an uniform grid (my points on the brillouin zone) and x3 is the value of frequency for that point. I give in input also steps for derivation for the 2 directions:
the same for y direction.
I didn't build my data on a grid, i already have a grid and this is why kind examples given here in answers do not help me.
A more fitting example should have a grid of points and values like the one i have:
for i in range(10):
for j in range(10):
another thing i can add is that my grid is not a square one but has the shape of a polygon being the brillouin zone of a 2d crystal.
I've understood that numpy.gradient works properly only on a rectangular grid of values, which is not what I'm looking for. Even if I arrange my data as a grid, it would have lots of zeroes outside the polygon of my original data, and those would add really large vectors to my gradient, hurting the precision of the calculation. This module seems to me more a toy than a tool; it has severe limitations imho.
Problem solved using dictionaries.
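The post doesn't show what the dictionary solution looked like, so the following is only a guess at the idea: store the values in a dict keyed by integer grid indices and take finite differences only between neighbours that actually exist. The function name gradient_on_points, the disc-shaped region, and the test function are all made up for this sketch.

```python
import numpy as np

def gradient_on_points(values, hx, hy):
    """Finite-difference gradient for samples stored as {(i, j): value}.

    (i, j) are integer grid indices with spacings hx and hy; neighbours
    that are missing from the dict are simply not used.
    """
    def diff(p, step, h):
        lo = (p[0] - step[0], p[1] - step[1])
        hi = (p[0] + step[0], p[1] + step[1])
        if lo in values and hi in values:   # central difference
            return (values[hi] - values[lo]) / (2 * h)
        if hi in values:                    # forward difference at an edge
            return (values[hi] - values[p]) / h
        if lo in values:                    # backward difference at an edge
            return (values[p] - values[lo]) / h
        return float("nan")                 # no neighbour in this direction
    return {p: (diff(p, (1, 0), hx), diff(p, (0, 1), hy)) for p in values}

# Hypothetical non-rectangular region: the grid points inside a disc
h = 0.1
freq = {(i, j): 2.0 * (i * h) + 3.0 * (j * h)
        for i in range(-10, 11) for j in range(-10, 11)
        if i * i + j * j <= 100}
grad = gradient_on_points(freq, h, h)
```

Since no padding values are needed outside the region, the artificial jumps at the polygon boundary never enter the differences.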
You need to give gradient a matrix that describes your angular frequency values for your (x,y) points. e.g.
def f(x,y):
    return np.sin((x + y))
x = y = np.arange(-5, 5, 0.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
# np.gradient returns the derivative along axis 0 (rows, y) first, then along axis 1 (columns, x)
gy,gx = np.gradient(Z,0.05,0.05)
You can see that plotting Z as a surface gives:
Here is how to interpret your gradient:
gx is a matrix that gives the change dz/dx at all points. e.g. gx[0][0] is dz/dx at (x0,y0). Visualizing gx helps in understanding:
Since my data was generated from f(x,y) = sin(x+y) gy looks the same.
Here is a more obvious example using f(x,y) = sin(x)...
and the gradients
update Let's take a look at the xy pairs.
This is the code I used:
def f(x,y):
    return np.sin(x)
x = y = np.arange(-3,3,.05)
X, Y = np.meshgrid(x, y)
zs = np.array([f(x,y) for x,y in zip(np.ravel(X), np.ravel(Y))])
xy_pairs = np.array([str(x)+','+str(y) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)
xy_pairs = xy_pairs.reshape(X.shape)
gy,gx = np.gradient(Z,.05,.05)
Now we can look and see exactly what is happening. Say we wanted to know what point was associated with the value at Z[20][30]. Then...
>>> Z[20][30]
And the point is
>>> xy_pairs[20][30]
Is that right? Let's check.
>>> np.sin(-1.5)
And what are our gradient components at that point?
>>> gy[20][30]
>>> gx[20][30]
Do those check out?
dz/dy always 0 check.
dz/dx = cos(x) and...
>>> np.cos(-1.5)
Looks good.
You'll notice they aren't exactly correct. That is because my Z data isn't continuous: there is a step size of 0.05, and gradient can only approximate the rate of change.
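If you want to repeat this check over the whole array rather than at a single index, a compact sketch is below. It assumes the same f(x,y) = sin(x) and step size as above, and compares only interior columns, because np.gradient falls back to less accurate one-sided differences at the array edges.

```python
import numpy as np

h = 0.05
x = y = np.arange(-3, 3, h)
X, Y = np.meshgrid(x, y)      # X varies along columns (axis 1), Y along rows (axis 0)
Z = np.sin(X)

# Derivative along axis 0 (y) comes first, then along axis 1 (x)
gy, gx = np.gradient(Z, h, h)

# Largest deviation of dz/dx from the exact cos(x), ignoring the edge columns
max_err = np.max(np.abs(gx[:, 1:-1] - np.cos(X)[:, 1:-1]))
print(max_err)
```

Halving the step size should shrink max_err by roughly a factor of four, since the interior differences are second-order accurate.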

2d interpolation in python with random spot

I checked the available interpolation method in scipy, but could not get the proper solution for my case.
assume i have 100 points whose coordinates are random,
e.g., their x and y positions are:
z = f(x,y) #the point value calculated by certain function
now i want to get the point value z at new, evenly sampled coordinates (xnew and ynew)
xnew = range(100)
ynew = range(100)
how should i do this using bilinear sampling?
i know it is possible to do it point by point, e.g., find the 4 nearest random points, and do the interpolation, but there got to be some easier existing functions to do this
thanks alot!
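Use scipy.interpolate.griddata. It interpolates scattered data onto arbitrary query points (the default linear method works on a triangulation of the input points rather than on the 4 nearest neighbours).
# griddata expects the query points as an (M, 2) array or as a tuple of broadcastable arrays
grid_x, grid_y = numpy.meshgrid(xnew, ynew)
# defaults to linear interpolation; points outside the convex hull of the data come back as NaN
znew = scipy.interpolate.griddata((x, y), z, (grid_x, grid_y))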
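A complete sketch you can run, with invented random points and a made-up linear f(x, y) so the result is easy to verify (linear interpolation reproduces a linear function exactly inside the convex hull of the data):

```python
import numpy as np
from scipy.interpolate import griddata

# 100 random sample positions and their values (f is a stand-in function)
rng = np.random.default_rng(0)
x = rng.uniform(0, 99, 100)
y = rng.uniform(0, 99, 100)
z = 2.0 * x + 3.0 * y

# Evenly sampled target coordinates, expanded to a full 100 x 100 grid
xnew = np.arange(100)
ynew = np.arange(100)
grid_x, grid_y = np.meshgrid(xnew, ynew)

# Linear interpolation on a triangulation of the scattered points
znew = griddata((x, y), z, (grid_x, grid_y), method="linear")

# Grid nodes outside the convex hull of the samples are NaN
inside = ~np.isnan(znew)
```

Passing fill_value to griddata, or running a second pass with method="nearest", are two ways to deal with the NaN nodes outside the hull.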

