Local linear approximation in numpy - python

I have some x and y data, where for every entry in the x vector there's a corresponding entry in the y vector. Furthermore, the x data are not evenly spaced.
I'd like to interpolate between the x samples to obtain an even spacing in the x dimension, and to approximate the corresponding y value. SciPy's interp1d seems like a natural solution, but my problem has a caveat: the x values are not monotonically increasing (because both x and y are functions of time). The interp1d function, and the other functions from the scipy.interpolate module, thus give weird results at those points where x reverses direction.
What I'd really like to do is simply fit a straight line between every set of two adjacent x points and then interpolate based on this very local approximation. Is there a function to do this in numpy or do I have to rig something up myself?

Could you sort your xy pairs and then use interp1d? Something like this?
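import numpy as np
# Indices that put x in ascending order
order = np.argsort(x)
# Apply the same ordering to both arrays so the (x, y) pairs stay matched
x = np.asarray(x)[order]
y = np.asarray(y)[order]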
Now your x's are monotonically increasing and the relationships have been preserved.
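A minimal sketch of the sort-then-interpolate idea, resampled onto an evenly spaced grid. It uses np.interp in place of interp1d (both do piecewise-linear interpolation here), and the sample arrays are made up for illustration. Note that sorting discards the time ordering of the samples, so this only suits cases where y can be treated as a function of x alone.

```python
import numpy as np

# Made-up samples: x reverses direction once (3.0 comes before 2.0)
x = np.array([0.0, 1.0, 3.0, 2.0, 4.0])
y = 2.0 * x + 1.0

# Sort both arrays by x so that x is monotonically increasing
order = np.argsort(x)
x_sorted, y_sorted = x[order], y[order]

# Evenly spaced grid over the sampled range, then piecewise-linear interpolation
x_even = np.linspace(x_sorted[0], x_sorted[-1], num=9)
y_even = np.interp(x_even, x_sorted, y_sorted)
```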

Related

Interpolate rows simultaneously in Python

I am trying to vectorize my code and have reached a roadblock. I have :
nxd array of x values [[x1],[...],[xn]] (where each row [x1] has many points [x11, ..., x1d])
nxd array of y values [[y1],[...],[yn]] (where each row [y1] has many points [y11, ..., y1d])
nx1 array of x' values [[x'1],[...],[x'n]] for which I would like to interpolate a y value based on the corresponding row of x and y
The only thing I can think to use is a list comprehension like [np.interp(x'[i,:], x[i,:], y[i,:]) for i in range(n)]. I'd like a faster vectorized option if one exists. Thanks for the help!
This is hardly an answer, but I guess it may still be useful for someone (if not, feel free to delete this); and by the way,
I think I misunderstood your question at first. What you have is a collection of n different one-dimensional datasets or functions y(x) that you want to interpolate (correct me otherwise).
As such, it turns out doing this by multidimensional interpolation is a terrible approach.
The idea I had is to add a new dimension to the data so your datasets are mapped into one single dataset, in which this new dimension is what distinguishes between the different xi, where i=1,2,...,n. In other words, you assign a value in this new dimension, let's say z, to every row of x; this way, the different functions are correctly mapped to this higher-dimensional space.
However, this approach is slower than the np.interp list comprehension solution, by at least one order of magnitude on my computer. I guess it has to do with two-dimensional interpolation algorithms being at best of order O(nlog(n)) (this is a guess); in this sense, it would seem more efficient to perform multiple interpolations on different datasets rather than one big interpolation.
Anyways, the approach is shown in the following snippet:
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def vectorized_interpolation(x, y, xq):
    """
    Vectorized option using LinearNDInterpolator
    """
    # Dummy new data points in added dimension
    z = np.arange(x.shape[0])
    # We must repeat every z value for every row of x
    interpolant = LinearNDInterpolator(list(zip(x.ravel(), np.repeat(z, x.shape[1]))), y.ravel())
    return interpolant(xq, z)

def non_vectorized_interpolation(x, y, xq):
    """
    Your non-vectorized solution
    """
    return np.array([np.interp(xq[i], x[i], y[i]) for i in range(x.shape[0])])

if __name__ == "__main__":
    n, d = 100, 500
    x = np.linspace(0, 2*np.pi, n*d).reshape((n, d))
    y = np.sin(x)
    xq = np.linspace(0, 2*np.pi, n)
    yq1 = vectorized_interpolation(x, y, xq)
    yq2 = non_vectorized_interpolation(x, y, xq)
The only advantage of the vectorized solution is that LinearNDInterpolator (and some of the other scipy.interpolate functions) explicitly calculates the interpolant, so you can reuse it if you plan on interpolating the same datasets several times and avoid repetitive calculations. Another thing you could try is using multiprocessing if you have several cores in your machine, but this is not vectorizing which is what you asked for. Sorry I can't be of more help.
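For comparison, here is a rough sketch of a row-wise linear interpolation written with plain NumPy indexing instead of a Python loop. It is not taken from the answer above: the function name interp_rows is hypothetical, and it assumes every row of x is strictly increasing and that xq has shape (n,), one query per row. Queries outside a row's range are clamped to the end values, as np.interp does. The comparison x < xq[:, None] builds an n-by-d boolean array, so memory use grows with n*d.

```python
import numpy as np

def interp_rows(x, y, xq):
    """Row-wise linear interpolation (hypothetical helper).

    Assumes each row of x is strictly increasing and xq has shape (n,).
    """
    n, d = x.shape
    # Number of samples strictly below the query = index of the right neighbour
    hi = np.sum(x < xq[:, None], axis=1)
    hi = np.clip(hi, 1, d - 1)
    lo = hi - 1
    rows = np.arange(n)
    x0, x1 = x[rows, lo], x[rows, hi]
    y0, y1 = y[rows, lo], y[rows, hi]
    # Clamp queries to each row's sampled range, like np.interp
    xc = np.clip(xq, x[:, 0], x[:, -1])
    return y0 + (xc - x0) * (y1 - y0) / (x1 - x0)

# Made-up data: 5 datasets with 8 samples each
rng = np.random.default_rng(0)
x_demo = np.sort(rng.random((5, 8)), axis=1)
y_demo = rng.random((5, 8))
xq_demo = rng.random(5)

fast = interp_rows(x_demo, y_demo, xq_demo)
slow = np.array([np.interp(xq_demo[i], x_demo[i], y_demo[i]) for i in range(5)])
```

Whether this is actually faster than the list comprehension depends on n and d, so it is worth timing on your own data.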

How to implement linear interpolation in python?

I want to implement a function interpolate(x, y, x_new) that computes the linear interpolation of the unknown function f at a new point x_new. The sample is given in the form of two sequences x and y. Both sequences have the same length, and their elements are numbers. The x sequence contains the points where the function has been sampled, and the y sequence contains the function value at the corresponding point. (This should be done without using any import statement.)
As I understand your question, you want to write some function y = interpolate(x_values, y_values, x), which will give you the y value at some x? The basic idea then follows these steps:
Find the indices of the values in x_values which define an interval containing x. For instance, for x=3 with your example lists, the containing interval would be [x1,x2]=[2.5,3.4], and the indices would be i1=1, i2=2
Calculate the slope on this interval by (y_values[i2]-y_values[i1])/(x_values[i2]-x_values[i1]) (ie dy/dx).
The value at x is now the value at x1 plus the slope multiplied by the distance from x1.
You will additionally need to decide what happens if x is outside the interval of x_values, either it's an error, or you could interpolate "backwards", assuming the slope is the same as the first/last interval.
Did this help, or did you need more specific advice?
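The steps above can be sketched as follows, without any imports. This is one possible reading of them, assuming x_values is sorted in strictly increasing order; treating an out-of-range x as an error is just one of the two options mentioned.

```python
def interpolate(x_values, y_values, x):
    """Linear interpolation at x; x_values is assumed strictly increasing."""
    if len(x_values) != len(y_values) or len(x_values) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if x < x_values[0] or x > x_values[-1]:
        raise ValueError("x is outside the sampled range")
    # Step 1: find the interval [x_values[i1], x_values[i2]] containing x
    i2 = 1
    while x_values[i2] < x:
        i2 += 1
    i1 = i2 - 1
    # Step 2: slope on that interval (dy/dx)
    slope = (y_values[i2] - y_values[i1]) / (x_values[i2] - x_values[i1])
    # Step 3: value at x1 plus slope times the distance from x1
    return y_values[i1] + slope * (x - x_values[i1])

# Made-up sample data for illustration
result = interpolate([0, 1, 2], [2, 3, 4], 0.56)
```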

Why is the output of linspace and interp1d always the same?

So I was doing my assignment and we are required to use interpolation (linear interpolation) for it. We have been asked to use interp1d from scipy.interpolate to generate new y values given new x values and old coordinates (x1,y1) and (x2,y2).
To get new x coordinates (let's call this x_new) I used np.linspace between (x1,x2), and I found the new y coordinates (let's call this y_new) by using the interp1d function on x_new.
However, I also noticed that applying np.linspace on (y1,y2) generates the exact same values of y_new which we got from interp1d on x_new.
Can anyone please explain to me why this is so? And if this is true, is it always true?
And if this is always true, why do we need the interp1d function at all when we can use np.linspace in its place?
Here is the code I wrote:
import scipy.interpolate as ip
import numpy as np
x = [-1.5, 2.23]
y = [0.1, -11]
x_new = np.linspace(start=x[0], stop=x[-1], num=10)
print(x_new)
y_new = np.linspace(start=y[0], stop=y[-1], num=10)
print(y_new)
f = ip.interp1d(x, y)
y_new2 = f(x_new)
print(y_new2) # y_new2 values always the same as y_new
You stumbled upon this because you are interpolating between only two points. Your input is two different x values with corresponding y values. With kind='linear' (the default), interp1d connects neighbouring points with straight lines; with only two points there is exactly one such line, f(x)=m*x+b, because a linear function is uniquely defined by two points. To see this, take a piece of paper, draw two dots, and then think about how many straight lines you can draw to connect them.
The linear function that you get from two input points is defined by the parameters m=(y1-y2)/(x1-x2) and b=y1-m*x1, where (x1,y1) and (x2,y2) are your two input points (the elements of your x and y arrays in your code snippet).
So, what does np.linspace(start, stop, num, ...) do? It gives you num evenly spaced points between start and stop: start, start + delta, ..., stop. The step width is delta=(stop-start)/(num-1). The -1 comes from the fact that the endpoint is included. So the nth point (counting from n=0) lies at xn=x1+n*(x2-x1)/(num-1). At what y values do these points end up after we apply the linear function from interp1d? Let's plug it in:
f(xn)=m*xn+b=(y1-y2)/(x1-x2)*(x1+n/(num-1)*(x2-x1)) + y1-(y1-y2)/(x1-x2)*x1. Simplifying this results in f(xn)=(y2-y1)*n/(num-1) + y1. And this is exactly what you get from np.linspace(y1,y2,num), i.e. f(xn)=yn!
Now, does this always work? No! We made use of the fact that our linear function is defined by the two endpoints of the intervals we use in np.linspace. So this will not work in general. Try to add one more x value and one more y value in your input list and then compare the results.
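A quick check of both cases, as a sketch. It uses np.interp in place of interp1d (both are piecewise-linear here), and the middle point (0.0, 5.0) is made up for illustration.

```python
import numpy as np

# Two points: linspace on y matches linear interpolation on x_new
x2, y2 = [-1.5, 2.23], [0.1, -11.0]
x_new = np.linspace(x2[0], x2[-1], num=10)
same_two = np.allclose(np.linspace(y2[0], y2[-1], num=10),
                       np.interp(x_new, x2, y2))

# Three points: the interpolant is no longer a single straight line
x3, y3 = [-1.5, 0.0, 2.23], [0.1, 5.0, -11.0]
y_interp = np.interp(x_new, x3, y3)
same_three = np.allclose(np.linspace(y3[0], y3[-1], num=10), y_interp)

print(same_two, same_three)
```

The endpoints still agree in the three-point case; only the values in between differ.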

What are other methods to find curvature of a polynomial apart from np.polyfit()

I am trying to find the curvature of a polynomial. X and Y are python lists of X and Y coordinates respectively. I use scipy.interpolate because I am able to see better curves in my image. But once I find the coefficients of a 2D polynomial and re-plot them back into the image, the replotted curve looks way too off.
How do I find accurate coefficients of a polynomial curve
interpolate = interpolate.interp1d(X, Y)
z = np.polyfit(X, interpolate(X),2) #coefficients
poly_y = [z[0]*x*x + z[1]*x + z[2] for x in X] #Recompute Y coordinates
plt.plot(X, poly_y)
Use np.polyval instead of list comprehension to calculate the polynomial values at the given coordinates. It's usually faster, and less error-prone than typing out the terms by hand. The result will be ndarray instead of Python list.
poly_y = np.polyval(z, X)
If the curve looks far off, the chosen degree (2) may be insufficient to reproduce your interpolated points. You should also plot the interpolation interpolate(X) to check whether something went wrong in interp1d.
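One way to judge the fit without relying on the plot alone is to look at the residuals. A minimal sketch with made-up data that is exactly quadratic, so a degree-2 fit should recover the coefficients:

```python
import numpy as np

# Made-up data lying exactly on y = 2x^2 - 3x + 1
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X**2 - 3.0 * X + 1.0

z = np.polyfit(X, Y, 2)        # coefficients, highest power first
poly_y = np.polyval(z, X)      # fitted values at the original X
max_residual = np.max(np.abs(poly_y - Y))
```

With real data, a large max_residual relative to the spread of Y suggests that degree 2 is too low for the shape of the curve.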

Interpolation without specifying indices in Python

I have two arrays of the same length, say array x and array y. I want to find the value of y corresponding to x=0.56. This is not a value present in array x.
I would like Python to find by itself the closest value larger than 0.56 (and its corresponding y value) and the closest value smaller than 0.56 (and its corresponding y value), and then simply interpolate to find the value of y when x = 0.56.
This is easily done when I find the indices of the two x values and corresponding y values by myself and input them into Python (see following bit of code).
But is there any way for python to find the indices by itself?
#interpolation:
def effective_height(h1,h2,g1,g2):
    return (h1 + (((0.56-g1)/(g2-g1))*(h2-h1)))
eff_alt1 = effective_height(x[12],x[13],y[12],y[13])
In this bit of code, I had to find the indices [12] and [13] corresponding to the closest smaller value to 0.56 and the closest larger value to 0.56.
Now I am looking for a similar technique where I would just tell python to interpolate between the two values of x for x=0.56 and print the corresponding value of y when x=0.56.
I have looked at scipy's interpolate but don't think it would help in this case, although further clarification on how I can use it in my case would be helpful too.
Does NumPy's interp do what you want?
import numpy as np
x = [0,1,2]
y = [2,3,4]
np.interp(0.56, x, y)
Out[81]: 2.56
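If you do want the bracketing indices themselves, np.searchsorted can find them. A small sketch, assuming x is sorted in ascending order and the target lies strictly inside its range; the arrays are made up for illustration.

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5])
y = np.array([10.0, 20.0, 40.0, 80.0])
target = 0.56

# First index whose x value is >= target; the one before it is the lower neighbour
i2 = np.searchsorted(x, target)
i1 = i2 - 1

# Same formula as the hand-written interpolation, with the indices found automatically
manual = y[i1] + (target - x[i1]) / (x[i2] - x[i1]) * (y[i2] - y[i1])
```

The result agrees with np.interp(target, x, y), which does the index search internally.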
Given your two arrays, x and y, you can do something like the following using SciPy.
from scipy.interpolate import InterpolatedUnivariateSpline
spline = InterpolatedUnivariateSpline(x, y, k=5)
spline(0.56)
The keyword k must be between 1 and 5, and controls the degree of the spline.
Example:
>>> x = range(10)
>>> y = range(0, 100, 10)
>>> spline = InterpolatedUnivariateSpline(x, y, k=5)
>>> spline(0.56)
array(5.6000000000000017)
