Matlab fit NonlinearLeastSquares for Python

I am rewriting Matlab code in Python and I can't reproduce the fit function correctly in Python.
Code in Matlab:
y = *Array 361x1*;
x = *Array 361x1*;
y1 = *Single value 1x1*;
x1 = *Single value 1x1*;
fo = fitoptions('Method','NonlinearLeastSquares','Lower',[-Inf,-Inf],'Upper',[Inf, Inf],'Startpoint',[0.0,0.0]);
ft = fittype( @(A,B,x) A.*x.^2 + B.*x + y1 - A.*x1.^2 - B.*x1, 'independent','x','dependent','y','options',fo);
[fitobject,gof,out] = fit(x,y,ft);
A = fitobject.A;
B = fitobject.B;
I tried to solve it in Python with SciPy's least_squares, based on this article, and wrote the following code:
ft = least_squares(lambda coeffs: coeffs[0]*x**2 + coeffs[1]*x + y1 - coeffs[0]*x1**2 - coeffs[1]*x1, [0, 0], bounds=([-np.inf, -np.inf], [np.inf, np.inf]))
print(ft('x'))
Obviously it is not correct (the array y is not considered in the Python code) and I get different values for the coefficients A and B. I've already tried different functions like curve_fit, etc., but with no result...

You could use scipy.optimize's curve_fit. Assuming x and y are numpy.ndarrays containing the data:
from scipy.optimize import curve_fit
def fitme(x, *coeffs_and_args):
    A, B, x1, y1 = coeffs_and_args
    return A*x**2 + B*x + y1 - A*x1**2 - B*x1

popt, pcov = curve_fit(lambda x, A, B: fitme(x, A, B, x1, y1), x, y)
Then the array popt contains the optimal values for the parameters and pcov the estimated covariance of popt.
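For completeness, here is a minimal sketch of pulling A and B out of the result, mirroring the Matlab fitobject.A / fitobject.B access; it assumes x, y, x1, y1 are defined as in the question and uses p0=[0.0, 0.0] to match the Matlab Startpoint:
import numpy as np
from scipy.optimize import curve_fit

def model(x, A, B):
    # same constrained parabola as the Matlab fittype
    return A*x**2 + B*x + y1 - A*x1**2 - B*x1

popt, pcov = curve_fit(model, x, y, p0=[0.0, 0.0])
A, B = popt                      # counterparts of fitobject.A and fitobject.B
perr = np.sqrt(np.diag(pcov))    # one-sigma uncertainties on A and B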

Related

Piecewise Fitting for multivariate problem

I am trying to fit 3-dimensional data (that is, 2 independent and 1 dependent variable) using multivariate fitting in scipy curve_fit. I wish to do piecewise fitting for the same problem. I have tried to proceed on the basis of this without any success. The problem is defined below:
import numpy as np
from scipy.optimize import curve_fit
#..........................................................................................................
def F0(X, a, b, c, c0, y0):
    x, y = X
    value = []
    for i in range(0, len(x)):
        if y[i] < y0:
            lnZ = x[i] + c0*y[i]
        else:
            lnZ = x[i] + c*y[i]
        val = a + (b*lnZ)
        value.append(val)
    return value
#..........................................................................................................
def F1(X, a, b, c):
    x, y = X
    lnZ = x + c*y
    value = a + (b*lnZ)
    return value
#..........................................................................................................
x = [-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
-2.302585093,
0,
0,
0,
0,
0,
0,
0,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093,
2.302585093
]
y = [7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04,
7.55E-04,
7.85E-04,
8.17E-04,
8.52E-04,
8.90E-04,
9.32E-04,
9.77E-04
]
z = [4.077424497,
4.358253892,
4.610475878,
4.881769469,
5.153063061,
5.323277142,
5.462023074,
4.610475878,
4.840765517,
5.04864602,
5.235070966,
5.351407761,
5.440090728,
5.540693448,
4.960439843,
5.118257381,
5.266539115,
5.370479367,
5.440090728,
5.528296904,
5.5816974,
]
popt, pcov = curve_fit(F0, (x, y), z, method = 'lm')
print(popt)
popt, pcov = curve_fit(F1, (x, y), z, method = 'lm')
print(popt)
The output is:
[1.34957781e+00 1.05456428e-01 1.00000000e+00 4.14879613e+04
1.00000000e+00]
[1.34957771e+00 1.05456434e-01 4.14879603e+04]
You can see that the parameter values in the piecewise fit remain at their initial values. I know I am not doing it in the correct way. Please correct me.
The main source of the problem is the insensitivity of this approach to the value of the variable that defines the switch from one function to another (see this response for a similar explanation). Moreover, the choice of starting parameters isn't good.
Since no starting values are provided, curve_fit chooses a value of 1 for all the fitting parameters (see here for the default value of p0). Since the fitting algorithm works by making small variations on the parameters, y0 is varied in small steps around 1, which produces no change in the output of the function (all y values are much smaller than 1). Because y[i] < y0 is then always True, only the first branch is ever evaluated, and the output of the function does not depend on the value of c. That explains why y0 and c stay at their initial values.
One might expect that setting the initial value of y0 inside the range of the y data (i.e. around 8E-4) would solve the problem. Indeed, since the second branch is then evaluated, the value of c is optimized. Nevertheless, the value of y0 will stay unchanged. Because the fitting algorithm tests only very small changes to the parameter values, those changes are never large enough to move y0 from the interval between two experimental y values into another one. In this particular case, if one chooses 8E-4, the small variations will never be enough to push it above 8.17E-4 or below 7.85E-4, the values that bracket the initial choice of y0.
One can usually circumvent this problem by making the function depend explicitly on the value of y0. A smart choice is to redefine the function so that its value at y0 is the same no matter which branch is taken (i.e. to make the function continuous). The original definition does not ensure this. A reasonable change would be:
def F2(X, a, b, c, c0, y0):
    x, y = X
    value = []
    for i in range(0, len(x)):
        lnZ = x[i] + c0 * y[i]
        if y[i] >= y0:
            lnZ += c * (y[i]-y0)
        val = a + (b*lnZ)
        value.append(val)
    return value
which changes the meaning of the parameter c, and limits the results to only continuous functions. In this case, the value of y0 is indeed the function turning point. Nevertheless, it yields the desired results:
popt2, pcov = curve_fit(F2, (x, y), z, p0=(1, 1, 1E4, 1E4, 9.1E-4), method = 'lm')
print(popt2)
results in:
[-1.93417968e-01 1.05456433e-01 -3.65740192e+04 5.97890809e+04
8.64354057e-04]
A better (pythonic) definition for the function avoids the for loop:
def F3(X, a, b, c, c0, y0):
    x, y = X
    lnZ = x + c0 * y
    idx = np.where(y >= y0)
    lnZ[idx] += c * (y[idx] - y0)
    rv = a + (b * lnZ)
    return rv
which will probably be much faster for larger datasets.
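As a quick check, a possible call for F3, assuming the x, y, z lists from the question (converted to arrays, since F3 uses array arithmetic) and reusing the same starting values as for F2 above:
X = (np.asarray(x), np.asarray(y))
popt3, pcov3 = curve_fit(F3, X, z, p0=(1, 1, 1e4, 1e4, 9.1e-4), method='lm')
print(popt3)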

How to evaluate this line integral (Python-Sympy)

The idea is to compute the line integral of the following vector field and curve:
This is the code I have tried:
import numpy as np
from sympy import *
from sympy import Curve, line_integrate
from sympy.abc import x, y, t
C = Curve([cos(t) + 1, sin(t) + 1, 1 - cos(t) - sin(t)], (t, 0, 2*np.pi))
line_integrate(y * exp(x) + x**2 + exp(x) + z**2 * exp(z), C, [x, y, z])
But the ValueError: Function argument should be (x(t), y(t)) but got [cos(t) + 1, sin(t) + 1, -sin(t) - cos(t) + 1] comes up.
How can I compute this line integral then?
I think that maybe this line integral contains integrals that don't have an exact solution. It is also fine if you provide a numerical approximation method.
Thanks
In this case you can compute the integral using line_integrate because we can reduce the 3d integral to a 2d one. I'm sorry to say I don't know python well enough to write the code, but here's the drill:
If we write
C(t) = x(t),y(t),z(t)
then the thing to notice is that
z(t) = 3 - x(t) - y(t)
and so
dz = -dx - dy
So, we can write
F.dr = Fx*dx + Fy*dy + Fz*dz
= (Fx-Fz)*dx + (Fy-Fz)*dy
So we have reduced the problem to a 2d problem: we integrate
G = (Fx-Fz)*i + (Fy-Fz)*j
round
t -> x(t), y(t)
Note that in G we need to get rid of z by substituting
z = 3 - x - y
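A minimal numerical sketch of this reduction (not the answerer's code): since the symbolic integral probably has no closed form, the reduced 2D integrand can be evaluated with scipy.integrate.quad, taking the field components Fx = y*exp(x), Fy = x**2 + exp(x), Fz = z**2*exp(z) from the question and eliminating z via z = 3 - x - y:
import numpy as np
from scipy.integrate import quad

def integrand(t):
    x, y = np.cos(t) + 1, np.sin(t) + 1
    z = 3 - x - y
    Fx = y * np.exp(x)
    Fy = x**2 + np.exp(x)
    Fz = z**2 * np.exp(z)
    dxdt, dydt = -np.sin(t), np.cos(t)
    # (Fx - Fz)*dx + (Fy - Fz)*dy
    return (Fx - Fz) * dxdt + (Fy - Fz) * dydt

I, err = quad(integrand, 0, 2 * np.pi)
print(I)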
The value error you receive does not come from your call to the line_integrate function; it is raised by the Curve constructor, which, according to the source code for the Curve class, only supports functions in 2D Euclidean space. This integral can still be computed without using sympy, according to this research blog that I found by simply searching for a workable method on Google.
The code you need looks like this:
import autograd.numpy as np
from autograd import jacobian
from scipy.integrate import quad
def F(X):
    x, y, z = X
    return [y * np.exp(x), x**2 + np.exp(x), z**2 * np.exp(z)]

def C(t):
    return np.array([np.cos(t) + 1, np.sin(t) + 1, 1 - np.cos(t) - np.sin(t)])

dCdt = jacobian(C, 0)

def integrand(t):
    return F(C(t)) @ dCdt(t)   # dot product F(C(t)) . C'(t)

I, e = quad(integrand, 0, 2 * np.pi)
The variable I then stores the numerical solution to your question.
You can define a function:
import sympy as sp
from sympy import *
from sympy.abc import x, y, z, t

def linea3(f, C):
    P = f[0].subs([(x, C[0]), (y, C[1]), (z, C[2])])
    Q = f[1].subs([(x, C[0]), (y, C[1]), (z, C[2])])
    R = f[2].subs([(x, C[0]), (y, C[1]), (z, C[2])])
    dx = diff(C[0], t)
    dy = diff(C[1], t)
    dz = diff(C[2], t)
    m = integrate(P*dx + Q*dy + R*dz, (t, C[3], C[4]))
    return m
Then use the example:
f = [x**2*z**2,y**2*z**2,x*y*z]
C = [2*cos(t),2*sin(t),4,0,2*sp.pi]
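The snippet above only defines the field and the curve; assuming the symbols x, y, z, t are imported from sympy.abc as in the function definition, the integral itself would be obtained with:
print(linea3(f, C))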

Scipy Least Squares 2 squares desired // Error: Result from function call is not a proper array of floats

Currently I'm attempting to use scipy's least squares, or any of its minimization functions, to minimize a function with 5 parameters.
What I would like scipy to do is minimize some function using a standard least squares.
My code is below:
fitfunc1 = lambda p, xx, yy, zz: -(50000*(xx + (p[0] + p[1])*yy + p[3]))/(1.67*(-p[2]*yy + zz + p[4]))
errfunc1 = lambda p, x11, xx, yy, zz: fitfunc1(p, xx, yy, zz) - x11
x0 = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1], dtype=float)
res3 = leastsq(errfunc1, x0[:], args=(x1, x, y, z))
where x1, x, y, z are all column numpy arrays of the same length, about 90x1.
I'm currently getting an error that says 'Error: Result from function call is not a proper array of floats'. I've attempted many possibilities and tried to rewrite this the way it is described in examples, but it doesn't seem to work.
In addition: I actually would like to solve the problem:
min sum (f - x1)**2 + (g - x2)**2
where f = f(p, x, y, z) and g = g(p, x, y, z) and x, y, z, x1, y1 are all data, but attempting to find the parameters, p (6 of them).
Is this currently possible in least squares? I have attempted using scipy.minimize, but when this is done with the Nelder-Mead method, it doesn't seem to work either.
Here is my current code:
def f(phi, psi, theta, xnot, ynot, znot):
    return sum(abs((-50000*(x[:] + (psi + phi)*y[:] + xnot)/(1.67*(-theta*y[:] + z[:] + znot))) - x1[:])
               + abs((-50000*(-x[:]*(psi + phi) + y[:] + theta*(z[:]) + ynot)/(1.67*(-theta*y[:] + z[:] + znot))) - y1[:]))
x0 = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1], dtype = float)
res3 = leastsq(f, x0[:], args=(x1, y1, x, y, z))
I feel as if I am making some mistake that may be obvious to someone more familiar, but this is my first time using scipy. All help would be much appreciated.
I believe your problem is with the shape of the variables:
where x1, x, y, z are all column numpy arrays of the same length about 90x1
It causes your fitfunc1 and errfunc1 functions to return 2d arrays (of shape (90,1)), where the scipy optimization function expects a 1d array.
Try reshaping your arrays, e.g.,
x1 = x1.reshape((90,)),
and similarly for the rest of your input variables.
This should fix your problem.
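A quick sketch of that fix, assuming x1, x, y, z are the (90, 1) column arrays described in the question (ravel() is equivalent to reshape((90,)) here):
import numpy as np
from scipy.optimize import leastsq

# Flatten every (90, 1) column vector into a 1-d array of shape (90,)
x1, x, y, z = (np.asarray(a).ravel() for a in (x1, x, y, z))
res3 = leastsq(errfunc1, x0, args=(x1, x, y, z))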

Fit a curve for data made up of two distinct regimes

I'm looking for a way to plot a curve through some experimental data. The data shows a small linear regime with a shallow gradient, followed by a steep linear regime after a threshold value.
My data is here: http://pastebin.com/H4NSbxqr
I could fit the data with two lines relatively easily, but I'd like to fit with a continuous line ideally - which should look like two lines with a smooth curve joining them around the threshold (~5000 in the data, shown above).
I attempted this using scipy.optimize curve_fit and trying a function which included the sum of a straight line and an exponential:
y = a*x + b + c*np.exp((x-d)/e)
but despite numerous attempts, it didn't find a solution.
If anyone has any suggestions please, either on the choice of fitting distribution / method or the curve_fit implementation, they would be greatly appreciated.
If you don't have a particular reason to believe that linear + exponential is the true underlying cause of your data, then I think a fit to two lines makes the most sense. You can do this by making your fitting function the maximum of two lines, for example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def two_lines(x, a, b, c, d):
    one = a*x + b
    two = c*x + d
    return np.maximum(one, two)
Then,
x, y = np.genfromtxt('tmp.txt', unpack=True, delimiter=',')
pw0 = (.02, 30, .2, -2000) # a guess for slope, intercept, slope, intercept
pw, cov = curve_fit(two_lines, x, y, pw0)
crossover = (pw[3] - pw[1]) / (pw[0] - pw[2])
plt.plot(x, y, 'o', x, two_lines(x, *pw), '-')
If you really want a continuous and differentiable solution, it occurred to me that a hyperbola has a sharp bend to it, but it has to be rotated. It was a bit difficult to implement (maybe there's an easier way), but here's a go:
def hyperbola(x, a, b, c, d, e):
    """ hyperbola(x) with parameters
        a/b = asymptotic slope
         c  = curvature at vertex
         d  = offset to vertex
         e  = vertical offset
    """
    return a*np.sqrt((b*c)**2 + (x-d)**2)/b + e

def rot_hyperbola(x, a, b, c, d, e, th):
    pars = a, b, c, 0, 0  # do the shifting after rotation
    xd = x - d
    hsin = hyperbola(xd, *pars)*np.sin(th)
    xcos = xd*np.cos(th)
    return e + hyperbola(xcos - hsin, *pars)*np.cos(th) + xcos - hsin
Run it as
h0 = 1.1, 1, 0, 5000, 100, .5
h, hcov = curve_fit(rot_hyperbola, x, y, h0)
plt.plot(x, y, 'o', x, two_lines(x, *pw), '-', x, rot_hyperbola(x, *h), '-')
plt.legend(['data', 'piecewise linear', 'rotated hyperbola'], loc='upper left')
plt.show()
I was also able to get the line + exponential to converge, but it looks terrible. This is because it's not a good descriptor of your data: the data is essentially linear in each regime, and an exponential is very far from linear!
def line_exp(x, a, b, c, d, e):
    return a*x + b + c*np.exp((x-d)/e)

e0 = .1, 20., .01, 1000., 2000.
e, ecov = curve_fit(line_exp, x, y, e0)
If you want to keep it simple, there's always a polynomial or spline (piecewise polynomials)
from scipy.interpolate import UnivariateSpline
s = UnivariateSpline(x, y, s=x.size) #larger s-value has fewer "knots"
plt.plot(x, s(x))
I researched this a little; Applied Linear Regression by Sanford and the Correlation and Regression lecture by Steiger had some good info on it. They all, however, lack the right model: the piecewise function should be of the form C = th0 + th1*Temp + th2*max(0, Temp - gamma), which is what the code below fits.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import lmfit
dfseg = pd.read_csv('segreg.csv')
def err(w):
    th0 = w['th0'].value
    th1 = w['th1'].value
    th2 = w['th2'].value
    gamma = w['gamma'].value
    fit = th0 + th1*dfseg.Temp + th2*np.maximum(0, dfseg.Temp - gamma)
    return fit - dfseg.C
p = lmfit.Parameters()
p.add_many(('th0', 0.), ('th1', 0.0),('th2', 0.0),('gamma', 40.))
mi = lmfit.minimize(err, p)
lmfit.printfuncs.report_fit(mi.params)
b0 = mi.params['th0']; b1=mi.params['th1'];b2=mi.params['th2']
gamma = int(mi.params['gamma'].value)
import statsmodels.formula.api as smf
reslin = smf.ols('C ~ 1 + Temp + I((Temp-%d)*(Temp>%d))' % (gamma,gamma), data=dfseg).fit()
print(reslin.summary())
x0 = np.array(range(0,gamma,1))
x1 = np.array(range(0,80-gamma,1))
y0 = b0 + b1*x0
y1 = (b0 + b1 * float(gamma) + (b1 + b2)* x1)
plt.scatter(dfseg.Temp, dfseg.C)
# plt.hold(True)  # not needed with modern matplotlib; hold is the default behaviour
plt.plot(x0,y0)
plt.plot(x1+gamma,y1)
plt.show()
Result
[[Variables]]
th0: 78.6554456 +/- 3.966238 (5.04%) (init= 0)
th1: -0.15728297 +/- 0.148250 (94.26%) (init= 0)
th2: 0.72471237 +/- 0.179052 (24.71%) (init= 0)
gamma: 38.3110177 +/- 4.845767 (12.65%) (init= 40)
The data
"","Temp","C"
"1",8.5536,86.2143
"2",10.6613,72.3871
"3",12.4516,74.0968
"4",16.9032,68.2258
"5",20.5161,72.3548
"6",21.1613,76.4839
"7",24.3929,83.6429
"8",26.4839,74.1935
"9",26.5645,71.2581
"10",27.9828,78.2069
"11",32.6833,79.0667
"12",33.0806,71.0968
"13",33.7097,76.6452
"14",34.2903,74.4516
"15",36,56.9677
"16",37.4167,79.8333
"17",43.9516,79.7097
"18",45.2667,76.9667
"19",47,76
"20",47.1129,78.0323
"21",47.3833,79.8333
"22",48.0968,73.9032
"23",49.05,78.1667
"24",57.5,81.7097
"25",59.2,80.3
"26",61.3226,75
"27",61.9194,87.0323
"28",62.3833,89.8
"29",64.3667,96.4
"30",65.371,88.9677
"31",68.35,91.3333
"32",70.7581,91.8387
"33",71.129,90.9355
"34",72.2419,93.4516
"35",72.85,97.8333
"36",73.9194,92.4839
"37",74.4167,96.1333
"38",76.3871,89.8387
"39",78.0484,89.4516
I used @user423805 's answer (found via google groups thread: https://groups.google.com/forum/#!topic/lmfit-py/7I2zv2WwFLU ) but noticed it had some limitations when trying to use three or more segments.
Instead of applying np.maximum in the minimizer error function or adding (b1 + b2) as in @user423805 's answer, I used the same linear spline calculation for both the minimizer and end-usage:
# least_splines_calc works like this for an example with three segments
# (four threshold params, three gamma params):
#
# for 0 < x < gamma0 : y = th0 + (th1 * x)
# for gamma0 < x < gamma1 : y = th0 + (th1 * x) + (th2 * (x - gamma0))
# for gamma1 < x : y = th0 + (th1 * x) + (th2 * (x - gamma0)) + (th3 * (x - gamma1))
#
def least_splines_calc(x, thresholds, gammas):
    if len(thresholds) < 2:
        print("Error: expected at least two thresholds")
        return None
    applicable_gammas = list(filter(lambda gamma: x > gamma, gammas))
    # base result
    y = thresholds[0] + (thresholds[1] * x)
    # additional factors calculated depending on x value
    for i in range(0, len(applicable_gammas)):
        y = y + (thresholds[i + 2] * (x - applicable_gammas[i]))
    return y

def least_splines_calc_array(x_array, thresholds, gammas):
    y_array = list(map(lambda x: least_splines_calc(x, thresholds, gammas), x_array))
    return y_array
def err(params, x, data):
    th0 = params['th0'].value
    th1 = params['th1'].value
    th2 = params['th2'].value
    th3 = params['th3'].value
    gamma1 = params['gamma1'].value
    gamma2 = params['gamma2'].value
    thresholds = np.array([th0, th1, th2, th3])
    gammas = np.array([gamma1, gamma2])
    fit = least_splines_calc_array(x, thresholds, gammas)
    return np.array(fit) - np.array(data)
p = lmfit.Parameters()
p.add_many(('th0', 0.), ('th1', 0.0),('th2', 0.0),('th3', 0.0),('gamma1', 9.),('gamma2', 9.3)) #NOTE: the 9. / 9.3 were guesses specific to my data, you will need to change these
mi = lmfit.minimize(err, p, args=(np.array(dfseg.Temp), np.array(dfseg.C)))
After minimization, convert the params found by the minimizer into an array of thresholds and gammas, so least_splines_calc can be re-used to plot the linear splines regression.
Reference: While there are various places that explain least splines (I think @user423805 used http://www.statpower.net/Content/313/Lecture%20Notes/Splines.pdf , which has the (b1 + b2) addition I disagree with in its sample code despite similar equations), the one that made the most sense to me was this one (by Rob Schapire / Zia Khan at Princeton): https://www.cs.princeton.edu/courses/archive/spring07/cos424/scribe_notes/0403.pdf - section 2.2 goes into linear splines.
If you're looking to join what appears to be two straight lines with a hyperbola having a variable radius at/near the intersection of the two lines (which are its asymptotes), I urge you to look hard at Using an Hyperbola as a Transition Model to Fit Two-Regime Straight-Line Data, by Donald G. Watts and David W. Bacon, Technometrics, Vol. 16, No. 3 (Aug., 1974), pp. 369-373.
The formula is drop dead simple, nicely adjustable, and works like a charm. From their paper (in case you can't access it):
As a more useful alternative form we consider an hyperbola for which:
(i) the dependent variable y is a single valued function of the independent variable x,
(ii) the left asymptote has slope theta_1,
(iii) the right asymptote has slope theta_2,
(iv) the asymptotes intersect at the point (x_o, beta_o),
(v) the radius of curvature at x = x_o is proportional to a quantity delta. Such an hyperbola can be written y = beta_o + beta_1*(x - x_o) + beta_2* SQRT[(x - x_o)^2 + delta^2/4], where beta_1 = (theta_1 + theta_2)/2 and beta_2 = (theta_2 - theta_1)/2.
delta is the adjustable parameter that allows you to either closely follow the lines right to the intersection point or smoothly merge from one line to the other.
Just solve for the intersection point (x_o, beta_o), and plug into the formula above.
BTW, in general, if line 1 is y_1 = b_1 + m_1 *x and line 2 is y_2 = b_2 + m_2 * x, then they intersect at x* = (b_2 - b_1) / (m_1 - m_2) and y* = b_1 + m_1 * x*. So, to connect with the formalism above, x_o = x*, beta_o = y* and the two m_*'s are the two thetas.
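A minimal curve_fit sketch of that hyperbola (not taken from the paper's code), assuming x and y are the data arrays from the question and treating beta_o, x_o and delta as free parameters; the starting values are rough guesses near the visible kink around 5000:
import numpy as np
from scipy.optimize import curve_fit

def bent_hyperbola(x, beta0, beta1, beta2, x0, delta):
    # y = beta_o + beta_1*(x - x_o) + beta_2*sqrt((x - x_o)**2 + delta**2/4)
    return beta0 + beta1*(x - x0) + beta2*np.sqrt((x - x0)**2 + delta**2/4)

p0 = (100.0, 0.1, 0.1, 5000.0, 100.0)
popt, pcov = curve_fit(bent_hyperbola, x, y, p0=p0)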
There is a straightforward method (not iterative, no initial guess required) on pp. 12-13 of https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The data comes from scanning the figure published by IanRoberts in his question. Scanning the pixel coordinates is not accurate, so don't be surprised by some additional deviation.
Note that the abscissa and ordinate scales have been divided by 1000.
The equations of the two segments, together with the approximate values of the five parameters, are shown in the figure accompanying the original answer (not reproduced here).

How to make scipy.interpolate give an extrapolated result beyond the input range?

I'm trying to port a program which uses a hand-rolled interpolator (developed by a mathematician colleague) over to use the interpolators provided by scipy. I'd like to use or wrap the scipy interpolator so that it behaves as closely as possible to the old interpolator.
A key difference between the two functions is that in our original interpolator - if the input value is above or below the input range, our original interpolator will extrapolate the result. If you try this with the scipy interpolator it raises a ValueError. Consider this program as an example:
import numpy as np
from scipy import interpolate
x = np.arange(0,10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y)
print(f(9))
print(f(11))  # Causes ValueError, because it's greater than max(x)
Is there a sensible way to make it so that instead of crashing, the final line will simply extrapolate linearly, continuing the gradients defined by the first and last two points to infinity?
Note, that in the real software I'm not actually using the exp function - that's here for illustration only!
As of SciPy version 0.17.0, there is a new option for scipy.interpolate.interp1d that allows extrapolation. Simply set fill_value='extrapolate' in the call. Modifying your code in this way gives:
import numpy as np
from scipy import interpolate
x = np.arange(0,10)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y, fill_value='extrapolate')
print(f(9))
print(f(11))
and the output is:
0.0497870683679
0.010394302658
You can take a look at InterpolatedUnivariateSpline.
Here is an example using it:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
# given values
xi = np.array([0.2, 0.5, 0.7, 0.9])
yi = np.array([0.3, -0.1, 0.2, 0.1])
# positions to inter/extrapolate
x = np.linspace(0, 1, 50)
# spline order: 1 linear, 2 quadratic, 3 cubic ...
order = 1
# do inter/extrapolation
s = InterpolatedUnivariateSpline(xi, yi, k=order)
y = s(x)
# example showing the interpolation for linear, quadratic and cubic interpolation
plt.figure()
plt.plot(xi, yi)
for order in range(1, 4):
s = InterpolatedUnivariateSpline(xi, yi, k=order)
y = s(x)
plt.plot(x, y)
plt.show()
1. Constant extrapolation
You can use interp function from scipy, it extrapolates left and right values as constant beyond the range:
>>> from scipy import interp, arange, exp
>>> x = arange(0,10)
>>> y = exp(-x/3.0)
>>> interp([9,10], x, y)
array([ 0.04978707, 0.04978707])
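Note that scipy.interp was just a re-export of numpy's interp and has been deprecated and removed from recent SciPy releases, so with current versions the same constant extrapolation can be obtained with numpy directly:
>>> import numpy as np
>>> x = np.arange(0, 10)
>>> y = np.exp(-x/3.0)
>>> np.interp([9, 10], x, y)
array([ 0.04978707,  0.04978707])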
2. Linear (or other custom) extrapolation
You can write a wrapper around an interpolation function which takes care of linear extrapolation. For example:
from scipy.interpolate import interp1d
from scipy import arange, array, exp

def extrap1d(interpolator):
    xs = interpolator.x
    ys = interpolator.y

    def pointwise(x):
        if x < xs[0]:
            return ys[0] + (x-xs[0])*(ys[1]-ys[0])/(xs[1]-xs[0])
        elif x > xs[-1]:
            return ys[-1] + (x-xs[-1])*(ys[-1]-ys[-2])/(xs[-1]-xs[-2])
        else:
            return interpolator(x)

    def ufunclike(xs):
        return array(list(map(pointwise, array(xs))))

    return ufunclike
extrap1d takes an interpolation function and returns a function which can also extrapolate. And you can use it like this:
x = arange(0,10)
y = exp(-x/3.0)
f_i = interp1d(x, y)
f_x = extrap1d(f_i)
print(f_x([9, 10]))
Output:
[ 0.04978707 0.03009069]
What about scipy.interpolate.splrep (with degree 1 and no smoothing):
>> tck = scipy.interpolate.splrep([1, 2, 3, 4, 5], [1, 4, 9, 16, 25], k=1, s=0)
>> scipy.interpolate.splev(6, tck)
34.0
It seems to do what you want, since 34 = 25 + (25 - 16).
Here's an alternative method that uses only the numpy package. It takes advantage of numpy's array functions, so may be faster when interpolating/extrapolating large arrays:
import numpy as np
def extrap(x, xp, yp):
    """np.interp function with linear extrapolation"""
    y = np.interp(x, xp, yp)
    y = np.where(x < xp[0], yp[0]+(x-xp[0])*(yp[0]-yp[1])/(xp[0]-xp[1]), y)
    y = np.where(x > xp[-1], yp[-1]+(x-xp[-1])*(yp[-1]-yp[-2])/(xp[-1]-xp[-2]), y)
    return y
x = np.arange(0,10)
y = np.exp(-x/3.0)
xtest = np.array((8.5,9.5))
print(np.exp(-xtest/3.0))
print(np.interp(xtest, x, y))
print(extrap(xtest, x, y))
Edit: Mark Mikofski's suggested modification of the "extrap" function:
def extrap(x, xp, yp):
    """np.interp function with linear extrapolation"""
    y = np.interp(x, xp, yp)
    y[x < xp[0]] = yp[0] + (x[x < xp[0]]-xp[0]) * (yp[0]-yp[1]) / (xp[0]-xp[1])
    y[x > xp[-1]] = yp[-1] + (x[x > xp[-1]]-xp[-1]) * (yp[-1]-yp[-2]) / (xp[-1]-xp[-2])
    return y
It may be faster to use boolean indexing with large datasets, since the original algorithm checks whether every point is outside the interval, whereas boolean indexing allows an easier and faster comparison.
For example:
# Necessary modules
import numpy as np
from scipy.interpolate import interp1d
# Original data
x = np.arange(0,10)
y = np.exp(-x/3.0)
# Interpolator class
f = interp1d(x, y)
# Output range (quite large)
xo = np.arange(0, 10, 0.001)
# Boolean indexing approach
# Generate an empty output array for "y" values
yo = np.empty_like(xo)
# Values lower than the minimum "x" are extrapolated at the same time
low = xo < f.x[0]
yo[low] = f.y[0] + (xo[low]-f.x[0])*(f.y[1]-f.y[0])/(f.x[1]-f.x[0])
# Values higher than the maximum "x" are extrapolated at same time
high = xo > f.x[-1]
yo[high] = f.y[-1] + (xo[high]-f.x[-1])*(f.y[-1]-f.y[-2])/(f.x[-1]-f.x[-2])
# Values inside the interpolation range are interpolated directly
inside = np.logical_and(xo >= f.x[0], xo <= f.x[-1])
yo[inside] = f(xo[inside])
In my case, with a data set of 300000 points, this means a speed-up from 25.8 to 0.094 seconds, which is more than 250 times faster.
I did it by adding a point to my initial arrays. In this way I avoid defining self-made functions, and the linear extrapolation (in the example below: right extrapolation) looks ok.
import numpy as np
from scipy import interp as itp
xnew = np.linspace(0,1,51)
x1=xold[-2]
x2=xold[-1]
y1=yold[-2]
y2=yold[-1]
right_val=y1+(xnew[-1]-x1)*(y2-y1)/(x2-x1)
x=np.append(xold,xnew[-1])
y=np.append(yold,right_val)
f = itp(xnew,x,y)
I don't have enough reputation to comment, but in case somebody is looking for an extrapolation wrapper for a linear 2d-interpolation with scipy, I have adapted the answer that was given here for the 1d interpolation.
def extrap2d(interpolator):
    xs = interpolator.x
    ys = interpolator.y
    zs = interpolator.z
    zs = np.reshape(zs, (-1, len(xs)))

    def pointwise(x, y):
        if x < xs[0] or y < ys[0]:
            x1_index = np.argmin(np.abs(xs - x))
            x2_index = x1_index + 1
            y1_index = np.argmin(np.abs(ys - y))
            y2_index = y1_index + 1
            x1 = xs[x1_index]
            x2 = xs[x2_index]
            y1 = ys[y1_index]
            y2 = ys[y2_index]
            z11 = zs[x1_index, y1_index]
            z12 = zs[x1_index, y2_index]
            z21 = zs[x2_index, y1_index]
            z22 = zs[x2_index, y2_index]
            return (z11 * (x2 - x) * (y2 - y) +
                    z21 * (x - x1) * (y2 - y) +
                    z12 * (x2 - x) * (y - y1) +
                    z22 * (x - x1) * (y - y1)
                    ) / ((x2 - x1) * (y2 - y1) + 0.0)
        elif x > xs[-1] or y > ys[-1]:
            x1_index = np.argmin(np.abs(xs - x))
            x2_index = x1_index - 1
            y1_index = np.argmin(np.abs(ys - y))
            y2_index = y1_index - 1
            x1 = xs[x1_index]
            x2 = xs[x2_index]
            y1 = ys[y1_index]
            y2 = ys[y2_index]
            z11 = zs[x1_index, y1_index]
            z12 = zs[x1_index, y2_index]
            z21 = zs[x2_index, y1_index]
            z22 = zs[x2_index, y2_index]
            return (z11 * (x2 - x) * (y2 - y) +
                    z21 * (x - x1) * (y2 - y) +
                    z12 * (x2 - x) * (y - y1) +
                    z22 * (x - x1) * (y - y1)
                    ) / ((x2 - x1) * (y2 - y1) + 0.0)
        else:
            return interpolator(x, y)

    def ufunclike(xs, ys):
        if isinstance(xs, int) or isinstance(ys, int) or isinstance(xs, np.int32) or isinstance(ys, np.int32):
            res_array = pointwise(xs, ys)
        else:
            res_array = np.zeros((len(xs), len(ys)))
            for x_c in range(len(xs)):
                res_array[x_c, :] = np.array([pointwise(xs[x_c], ys[y_c]) for y_c in range(len(ys))]).T
        return res_array

    return ufunclike
I haven't commented a lot and I am aware that the code isn't super clean. If anybody sees any errors, please let me know. In my current use-case it is working without a problem :)
I'm afraid that there is no easy way to do this in SciPy, to my knowledge. You can, as I'm fairly sure you are aware, turn off the bounds errors and fill all function values beyond the range with a constant, but that doesn't really help. See this question on the mailing list for some more ideas. Maybe you could use some kind of piecewise function, but that seems like a major pain.
The below code gives you the simple extrapolation module. k is the value to which the data set y has to be extrapolated based on the data set x. The numpy module is required.
def extrapol(k, x, y):
    xm = np.mean(x)
    ym = np.mean(y)
    sumnr = 0
    sumdr = 0
    length = len(x)
    for i in range(0, length):
        sumnr = sumnr + ((x[i]-xm)*(y[i]-ym))
        sumdr = sumdr + ((x[i]-xm)*(x[i]-xm))
    m = sumnr/sumdr
    c = ym - (m*xm)
    return (m*k) + c
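For example, with the data used in the other answers (note that this fits a single least-squares line through all the points and evaluates it at k, so it is only sensible for roughly linear data):
import numpy as np
x = np.arange(0, 10)
y = np.exp(-x/3.0)
print(extrapol(11, x, y))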
Standard interpolate + linear extrapolate:
from scipy.interpolate import interp1d

def interpola(v, x, y):
    if v <= x[0]:
        return y[0] + (y[1]-y[0])/(x[1]-x[0])*(v-x[0])
    elif v >= x[-1]:
        return y[-2] + (y[-1]-y[-2])/(x[-1]-x[-2])*(v-x[-2])
    else:
        f = interp1d(x, y, kind='cubic')
        return f(v)
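Usage, assuming x and y are the sorted data arrays from the question:
import numpy as np
x = np.arange(0, 10)
y = np.exp(-x/3.0)
print(interpola(11, x, y))    # linear extrapolation beyond x[-1]
print(interpola(4.5, x, y))   # cubic interpolation inside the range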
