Calculating the derivative of points in python


I want to calculate the derivative of a set of points, and a few posts suggested using np.diff. To check it, I picked a polynomial, Y = X^3 + X^2 + 7, differentiated it by hand, and compared the result with np.diff. The two results differ. Any ideas why? Is there another method to calculate the derivative?
In the problem I am actually trying to solve, I have received the data points of a fitted spline function (not the original data to be fitted, but points on the already fitted spline). The x-values are at equal intervals. I only have the points and no equation, and I need the first, second and third derivatives, i.e. dy/dx, d2y/dx2 and d3y/dx3. Any ideas on how to do this? Thanks in advance.
import numpy as np

xval = [1, 2, 3, 4, 5]
yval = []
yval_dashList = []

# selected a polynomial equation
def calc_Y(X):
    Y = (X**3) + (X**2) + 7
    return Y

# calculate y values using the equation
for i in xval:
    yval.append(calc_Y(i))
# output: yval = [9, 19, 43, 87, 157]

# manually differentiated the equation (or use sympy: sym.diff(x**3 + x**2 + 7))
def calc_diffY(X):
    yval_dash = 3*(X**2) + 2**X
    return yval_dash

# store differentiated y-values in a list
for i in xval:
    yval_dashList.append(calc_diffY(i))
# output: yval_dashList = [5, 16, 35, 64, 107]

# use numpy diff method on the y values (yval)
numpyDiff = np.diff(yval)
# output: [10, 24, 44, 70]
The values from np.diff, [10, 24, 44, 70], are different from yval_dashList = [5, 16, 35, 64, 107].

The idea behind what you are trying to do is correct, but there are a couple of points to make it work as intended:
There is a typo in calc_diffY(X), the derivative of X**2 is 2*X, not 2**X:
def calc_diffY(X):
    yval_dash = 3*(X**2) + 2*X
    return yval_dash
Fixing this alone does not bring the two results much closer:
yval_dash = [5, 16, 33, 56, 85]
numpyDiff = [10. 24. 44. 70.]
To calculate the numerical derivative you should use a difference quotient, which is an approximation of the derivative:
numpyDiff = np.diff(yval)/np.diff(xval)
The approximation becomes better and better if the values of the points are more dense.
The difference between your points on the x axis is 1, so the numerical derivative (red in the original plot) deviates visibly from the analytical one (blue).
If you reduce the difference between your x points to 0.1, the two curves almost coincide, which is much better.
Just to add something to this: Wikipedia has an image showing the effect of reducing the distance between the points on which the derivative is numerically calculated (not reproduced here).
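A minimal sketch of the point above, assuming the same polynomial as in the question. The helper name max_forward_error is made up for this illustration. It compares the forward difference quotient at spacings 1 and 0.1, then applies np.gradient repeatedly to equally spaced samples to estimate higher derivatives.

```python
import numpy as np

def f(x):
    return x**3 + x**2 + 7

def f_prime(x):
    return 3 * x**2 + 2 * x

def max_forward_error(h):
    """Largest error of the forward difference quotient on [1, 5] with spacing h."""
    x = np.arange(1.0, 5.0 + h / 2, h)
    slope = np.diff(f(x)) / np.diff(x)      # slope between neighbouring points
    return np.max(np.abs(slope - f_prime(x[:-1])))

err_coarse = max_forward_error(1.0)   # spacing used in the question
err_fine = max_forward_error(0.1)     # ten times denser
print(err_coarse, err_fine)

# With only equally spaced samples, np.gradient can be applied repeatedly.
h = 0.1
x = np.arange(1.0, 5.0 + h / 2, h)
dy = np.gradient(f(x), h)     # first derivative estimate
d2y = np.gradient(dy, h)      # second derivative estimate
d3y = np.gradient(d2y, h)     # third derivative estimate
print(d3y[3:-3])              # interior values; the edges are less accurate
```

np.gradient uses central differences in the interior and one-sided differences at the two ends, so each repeated application degrades a few more values near the edges.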

I like @lgsp's answer. I will add that, if you can evaluate the function itself, you can estimate the derivative directly without having to worry about the spacing between the values. This just uses the symmetric formula for calculating finite differences, described on Wikipedia.
Take note, though, of the way delta is specified. I found that when it is too small, the higher-order estimates fail. There's probably no single value that will always work well!
Also, I simplified your code by taking advantage of numpy broadcasting over arrays to eliminate the for loops.
import numpy as np

# select a polynomial equation
def f(x):
    y = x**3 + x**2 + 7
    return y

# manually differentiate the equation
def f_prime(x):
    return 3*x**2 + 2*x

# numerically estimate the first three derivatives
def d1(f, x, delta=1e-10):
    return (f(x + delta) - f(x - delta)) / (2 * delta)

def d2(f, x, delta=1e-5):
    return (d1(f, x + delta, delta) - d1(f, x - delta, delta)) / (2 * delta)

def d3(f, x, delta=1e-2):
    return (d2(f, x + delta, delta) - d2(f, x - delta, delta)) / (2 * delta)

# demo output
# note that functions operate in parallel on numpy arrays -- no for loops!
xval = np.array([1, 2, 3, 4, 5])
print('y = ', f(xval))
print('y\' = ', f_prime(xval))
print('d1 = ', d1(f, xval))
print('d2 = ', d2(f, xval))
print('d3 = ', d3(f, xval))
And the outputs:
y = [ 9 19 43 87 157]
y' = [ 5 16 33 56 85]
d1 = [ 5.00000041 16.00000132 33.00002049 56.00000463 84.99995374]
d2 = [ 8.0000051 14.00000116 20.00000165 25.99996662 32.00000265]
d3 = [6. 6. 6. 6. 5.99999999]
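When only the points are available, as in the original question, one option (a sketch, not something the answers above propose) is to interpolate them with scipy.interpolate.CubicSpline and evaluate the derivatives of the interpolant. The sample values below are the yval list from the question. Because the underlying curve is itself a cubic, the default not-a-knot spline reproduces it; that will not hold for arbitrary data.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.0, 19.0, 43.0, 87.0, 157.0])   # y = x**3 + x**2 + 7

spline = CubicSpline(x, y)    # default boundary condition: not-a-knot

d1 = spline(x, 1)   # first derivative at the sample points
d2 = spline(x, 2)   # second derivative
d3 = spline(x, 3)   # third derivative (piecewise constant for a cubic spline)

print(d1)
print(d2)
print(d3)
```

For points that already come from a fitted cubic spline, the third derivative is constant on each interval and jumps at the knots, so it is only meaningful piece by piece.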

Related

Smooth a curve in Python while preserving the value and slope at the end points

I actually have two solutions to this problem; both are applied below to a test case. Neither of them is perfect: the first one only takes the two end points into account, and the other one can't be made "arbitrarily smooth": there is a limit to the amount of smoothness one can achieve (the one I am showing).
I am sure there is a better solution, one that goes from the first solution to the other and all the way to no smoothing at all. It may already be implemented somewhere. Maybe by solving a minimization problem with an arbitrary number of equidistributed splines?
Thank you very much for your help.
PS: the seed used is a challenging one.
import matplotlib.pyplot as plt
from scipy import interpolate
from scipy.signal import savgol_filter
import numpy as np

def scipy_bspline(cv, n=100, degree=3):
    """ Calculate n samples on a bspline
        cv :      Array of control vertices
        n  :      Number of samples to return
        degree:   Curve degree
    """
    cv = np.asarray(cv)
    count = cv.shape[0]
    degree = np.clip(degree, 1, count-1)
    kv = np.clip(np.arange(count+degree+1)-degree, 0, count-degree)
    # Return samples (open, non-periodic curve)
    max_param = count - degree
    spl = interpolate.BSpline(kv, cv, degree)
    return spl(np.linspace(0, max_param, n))

def round_up_to_odd(f):
    return int(np.ceil(f / 2.) * 2 + 1)

def generateRandomSignal(n=1000, seed=None):
    """
    Parameters
    ----------
    n : integer, optional
        Number of points in the signal. The default is 1000.
    Returns
    -------
    sig : numpy array
    """
    np.random.seed(seed)
    print("Seed was:", seed)
    steps = np.random.choice(a=[-1, 0, 1], size=(n-1))
    roughSig = np.concatenate([np.array([0]), steps]).cumsum(0)
    sig = savgol_filter(roughSig, round_up_to_odd(n/10), 6)
    return sig

# Generate a random signal to illustrate my point
n = 1000
t = np.linspace(0, 10, n)
seed = 45136  # Challenging seed (must be an integer)
sig = generateRandomSignal(n=1000, seed=seed)
sigInit = np.copy(sig)

# Add noise to the signal
mean = 0
std = sig.max()/3.0
num_samples = n // 5      # integer division: these values are used as indices
idxMin = n // 2 - 100
idxMax = idxMin + num_samples
tCut = t[idxMin+1:idxMax]
noise = np.random.normal(mean, std, size=num_samples-1) + 2*std*np.sin(2.0*np.pi*tCut/0.4)
sig[idxMin+1:idxMax] += noise

# Define filtering range enclosing the noisy area of the signal
idxMin -= 20
idxMax += 20

# Extreme filtering solution
# Spline between first and last points, the points in between have no influence
sigTrim = np.delete(sig, np.arange(idxMin, idxMax))
tTrim = np.delete(t, np.arange(idxMin, idxMax))
f = interpolate.interp1d(tTrim, sigTrim, kind='quadratic')
sigSmooth1 = f(t)

# My attempt. Not bad but not perfect because there is a limit in the maximum
# amount of smoothing we can add (degree=len(tSlice) is the maximum)
# If I could do degree=10*len(tSlice) and converging to the first solution
# I would be done!
sigSlice = sig[idxMin:idxMax]
tSlice = t[idxMin:idxMax]
cv = np.stack((tSlice, sigSlice)).T
p = scipy_bspline(cv, n=len(tSlice), degree=len(tSlice))
tSlice = p.T[0]
sigSliceSmooth = p.T[1]
sigSmooth2 = np.copy(sig)
sigSmooth2[idxMin:idxMax] = sigSliceSmooth

# Plot
plt.figure()
plt.plot(t, sig, label="Signal")
plt.plot(t, sigSmooth1, label="Solution 1")
plt.plot(t, sigSmooth2, label="Solution 2")
plt.plot(t[idxMin:idxMax], sigInit[idxMin:idxMax], label="What I'd want (kind of, smoother will be even better actually)")
plt.plot([t[idxMin], t[idxMax]], [sig[idxMin], sig[idxMax]], "o")
plt.legend()
plt.show()

Yes, a minimization is a good way to approach this smoothing problem.
Least squares problem
Here is a suggestion for a least squares formulation: let s[0], ..., s[N] denote the N+1 samples of the given signal to smooth, and let L and R be the desired slopes to preserve at the left and right endpoints. Find the smoothed signal u[0], ..., u[N] as the minimizer of
min_u (1/2) sum_n (u[n] - s[n])² + (λ/2) sum_n (u[n+1] - 2 u[n] + u[n-1])²
subject to
s[0] = u[0], s[N] = u[N] (value constraints),
L = u[1] - u[0], R = u[N] - u[N-1] (slope constraints),
where in the minimization objective, the sums are over n = 1, ..., N-1 and λ is a positive parameter controlling the smoothing strength. The first term tries to keep the solution close to the original signal, and the second term penalizes u for bending to encourage a smooth solution.
The slope constraints require that
u[1] = L + u[0] = L + s[0] and u[N-1] = u[N] - R = s[N] - R. So we can consider the minimization as over only the interior samples u[2], ..., u[N-2].
Finding the minimizer
The minimizer satisfies the Euler–Lagrange equations
(u[n] - s[n]) / λ + (u[n+2] - 4 u[n+1] + 6 u[n] - 4 u[n-1] + u[n-2]) = 0
for n = 2, ..., N-2.
An easy way to find an approximate solution is by gradient descent: initialize u = np.copy(s), set u[1] = L + s[0] and u[N-1] = s[N] - R, and do 100 iterations or so of
u[2:-2] -= 0.05 * ((u - s)[2:-2] / λ + np.convolve(u, [1, -4, 6, -4, 1])[4:-4])
where the step size 0.05 multiplies the whole left-hand side of the Euler–Lagrange equation. But with some more work, it is possible to do better than this by solving the E–L equations directly. For each n, move the known quantities to the right-hand side: s[n] and also the endpoints u[0] = s[0], u[1] = L + s[0], u[N-1] = s[N] - R, u[N] = s[N]. Then you will have a linear system "A u = b", and matrix A has rows like
0, ..., 0, 1, -4, (6 + 1/λ), -4, 1, 0, ..., 0.
Finally, solve the linear system to find the smoothed signal u. You could use numpy.linalg.solve to do this if N is not too large, or if N is large, try an iterative method like conjugate gradients.
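The direct solve described above can be sketched as follows. This is an illustration under assumptions rather than a tuned implementation: the function name smooth_fixed_ends and the test signal are made up, and the matrix is built densely, which is only reasonable for moderate N.

```python
import numpy as np

def smooth_fixed_ends(s, L, R, lam):
    """Solve the Euler-Lagrange equations for the interior samples u[2..N-2]."""
    s = np.asarray(s, dtype=float)
    N = len(s) - 1
    # Second-difference operator: row n-1 computes u[n+1] - 2 u[n] + u[n-1].
    D = np.zeros((N - 1, N + 1))
    for n in range(1, N):
        D[n - 1, n - 1:n + 2] = [1.0, -2.0, 1.0]
    # Interior rows of H look like 0, ..., 1, -4, (6 + 1/lam), -4, 1, ..., 0.
    H = D.T @ D + np.eye(N + 1) / lam
    u = s.copy()
    u[1] = s[0] + L           # left slope constraint
    u[N - 1] = s[N] - R       # right slope constraint
    inner = np.arange(2, N - 1)            # unknown samples
    known = np.array([0, 1, N - 1, N])     # samples fixed by the constraints
    A = H[np.ix_(inner, inner)]
    b = s[inner] / lam - H[np.ix_(inner, known)] @ u[known]
    u[inner] = np.linalg.solve(A, b)
    return u

# Made-up noisy test signal; the slopes to preserve are taken from the signal itself.
rng = np.random.default_rng(0)
s = np.sin(np.linspace(0.0, 3.0, 50)) + 0.1 * rng.normal(size=50)
L = s[1] - s[0]
R = s[-1] - s[-2]
u = smooth_fixed_ends(s, L, R, lam=50.0)
print(np.sum(np.diff(s, 2) ** 2), np.sum(np.diff(u, 2) ** 2))
```

Larger values of lam give a smoother result; the end values and end slopes stay fixed for any lam.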

You can apply a simple smoothing method and plot the smoothed curves for different smoothness values to see which one works best.
def smoothing(data, smoothness=0.5):
    # blend each sample with the previous raw sample
    last = data[0]
    new_data = [data[0]]
    for datum in data[1:]:
        new_value = smoothness * last + (1 - smoothness) * datum
        new_data.append(new_value)
        last = datum
    return new_data
You can plot this curve for multiple values of smoothness and pick the one that suits your needs. You can also apply this method to only a range of values in the actual curve by defining start and end indices.

equivalent to numpy.linalg.lstsq that allows weighting

I am fitting a 2d polynomial with the numpy function linalg.lstsq:
coeffs = np.array([y*0+1, y, x, x**2, y**2]).T
coeff_r, r, rank, s =np.linalg.lstsq(coeffs, values)
Some points that I am trying to fit are more reliable than others.
Is there a way to weigh the points differently?
Thanks

lstsq is enough for this; the weights can be applied to the equations. That is, if in an overdetermined system
3*a + 2*b = 9
2*a + 3*b = 4
5*a - 4*b = 2
you care about the first equation more than about the others, multiply it by some number greater than 1. For example, by 5:
15*a + 10*b = 45
2*a + 3*b = 4
5*a - 4*b = 2
Mathematically, the system is the same, but the least squares solution will be different because it minimizes the sum of squares of the residuals, and the residual of the 1st equation got multiplied by 5.
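As a small worked check of this idea, the sketch below solves the three-equation system above with and without the factor of 5 on the first equation and compares the residuals of the original, unscaled equations. The variable names are chosen for this illustration only.

```python
import numpy as np

# The overdetermined system from the text: rows are equations in (a, b).
A = np.array([[3.0, 2.0],
              [2.0, 3.0],
              [5.0, -4.0]])
rhs = np.array([9.0, 4.0, 2.0])
w = np.array([5.0, 1.0, 1.0])     # weight on each equation

plain = np.linalg.lstsq(A, rhs, rcond=None)[0]
weighted = np.linalg.lstsq(A * w[:, None], rhs * w, rcond=None)[0]

# Residuals are always measured against the original equations.
res_plain = A @ plain - rhs
res_weighted = A @ weighted - rhs
print(res_plain)
print(res_weighted)
```

Because lstsq minimizes the sum of squared residuals of the scaled system, multiplying an equation by w weights its squared residual by w**2.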
Here is an example based on your code (with small adjustments to make it more NumPythonic). First, unweighted fit:
import numpy as np
x, y = np.meshgrid(np.arange(0, 3), np.arange(0, 3))
x = x.ravel()
y = y.ravel()
values = np.sqrt(x+y+2) # some values to fit
functions = np.stack([np.ones_like(y), y, x, x**2, y**2], axis=1)
coeff_r = np.linalg.lstsq(functions, values, rcond=None)[0]
values_r = functions.dot(coeff_r)
print(values_r - values)
This displays the residuals as
[ 0.03885814 -0.00502763 -0.03383051 -0.00502763 0.00097465 0.00405298
-0.03383051 0.00405298 0.02977753]
Now I give the 1st data point greater weight.
weights = np.ones_like(x)
weights[0] = 5
coeff_r = np.linalg.lstsq(functions*weights[:, None], values*weights, rcond=None)[0]
values_r = functions.dot(coeff_r)
print(values_r - values)
Residuals:
[ 0.00271103 -0.01948647 -0.04828936 -0.01948647 0.00820407 0.0112824
-0.04828936 0.0112824 0.03700695]
The first residual is now an order of magnitude smaller, of course at the expense of the other residuals.

Finite difference approximations in python

I am trying to calculate the derivative of a function at x = 0, but I keep getting odd answers with all functions I have tried. For example with f(x)=x**2 I get the derivative to be 2 at all points. My finite difference coefficients are correct, it is second order accurate for the second derivative with respect to x.
from numpy import *
from matplotlib.pyplot import *

def f1(x):
    return x**2

n = 100                       # grid points
x = zeros(n+1, dtype=float)   # array to store values of x
step = 0.02/float(n)          # step size
f = zeros(n+1, dtype=float)   # array to store values of f
df = zeros(n+1, dtype=float)  # array to store values of calculated derivative
for i in range(0, n+1):       # adds values to arrays for x and f(x)
    x[i] = -0.01 + float(i)*step
    f[i] = f1(x[i])
# have to calculate end points separately using one sided form
df[0] = (f[2]-2*f[1]+f[0])/step**2
df[1] = (f[3]-2*f[2]+f[1])/step**2
df[n-1] = (f[n-1]-2*f[n-2]+f[n-3])/step**2
df[n] = (f[n]-2*f[n-1]+f[n-2])/step**2
for i in range(2, n-1):       # add values to array for derivative
    df[i] = (f[i+1]-2*f[i]+f[i-1])/step**2
print(df)  # returns an array full of 2...

The second derivative of x^2 is the constant 2, and you use the central difference quotient for the second derivative, as you can also see by the square in the denominator. Your result is absolutely correct, your code does exactly what you told it to do.
To get the first derivative with a symmetric difference quotient, use
df[i] = ( f[i+1] - f[i-1] ) / ( 2*step )

The first-order derivative of a function f1 at a point x (here f1(x) = x^2) can be estimated with a forward difference:
def f1(x):
    return x**2

def derivative(f, x, step=0.0000000000001):
    return (f(x+step) - f(x)) / step
Hope that helps.
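The choice of step matters a great deal in floating point. The sketch below is an illustration, not part of the answers above: it compares the forward and symmetric difference quotients for f1(x) = x**2 at x = 1 over several step sizes. The helper names forward and central are assumptions made for this example.

```python
def f1(x):
    return x**2

def forward(f, x, step):
    # one-sided difference quotient
    return (f(x + step) - f(x)) / step

def central(f, x, step):
    # symmetric difference quotient
    return (f(x + step) - f(x - step)) / (2 * step)

x0 = 1.0
exact = 2 * x0   # derivative of x**2 at x0

for step in (1e-2, 1e-5, 1e-8, 1e-13):
    print(step,
          abs(forward(f1, x0, step) - exact),
          abs(central(f1, x0, step) - exact))

err_tiny = abs(forward(f1, x0, 1e-13) - exact)
err_moderate = abs(forward(f1, x0, 1e-8) - exact)
```

A very small step such as 1e-13 loses accuracy because x + step can no longer be represented precisely, while a very large step suffers from truncation error. A moderate step is usually the safer choice.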

Solving an ordinary differential equation on a fixed grid (preferably in python)

I have a differential equation of the form
dy(x)/dx = f(y,x)
that I would like to solve for y.
I have an array xs containing all of the values of x for which I need ys.
For only those values of x, I can evaluate f(y,x) for any y.
How can I solve for ys, preferably in python?
MWE
import numpy as np

# these are the only x values that are legal
xs = np.array([0.15, 0.383, 0.99, 1.0001])

# some made up function --- I don't actually have an analytic form like this
def f(y, x):
    if not np.any(np.isclose(x, xs)):
        return np.nan
    return np.sin(y + x**2)

# now I want to know which array of ys satisfies dy(x)/dx = f(y,x)

Assuming you can use something simple like Forward Euler...
Numerical solutions will rely on approximate solutions at previous times. So if you want a solution at t = 1 it is likely you will need the approximate solution at t<1.
My advice is to figure out what step size will allow you to hit the times you need, and then find the approximate solution on an interval containing those times.
import numpy as np
# from your example, the smallest step size required to hit all points would be 0.0001
a = 0       # start point
b = 1.5     # possible end point
h = 0.0001
n = int(round((b - a) / h)) + 1  # number of grid points, so that the spacing is exactly h
y = np.zeros(n)
t = np.linspace(a, b, n)
y[0] = 0.1  # initial condition here
for i in range(1, n):
    y[i] = y[i-1] + h*f(y[i-1], t[i-1])  # f(y, x) as defined in the question
Alternatively, you could use an adaptive step method (which I am not prepared to explain right now) to take larger steps between the times you need.
Or, you could find an approximate solution over an interval using a coarser mesh and interpolate the solution.
Any of these should work.
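If f really can only be evaluated at the given x values, a variant worth noting is forward Euler with a variable step that goes directly from one legal x to the next. This is a rough sketch under assumptions: the function name euler_on_grid is made up, the initial value is assumed to be known at xs[0], and the accuracy is limited by the largest gap in xs.

```python
import numpy as np

# these are the only x values that are legal (from the question)
xs = np.array([0.15, 0.383, 0.99, 1.0001])

def f(y, x):
    if not np.any(np.isclose(x, xs)):
        return np.nan
    return np.sin(y + x**2)

def euler_on_grid(f, xs, y0):
    """Forward Euler that only evaluates f at the grid points themselves."""
    ys = np.empty(len(xs))
    ys[0] = y0                        # assumed: initial value given at xs[0]
    for i in range(len(xs) - 1):
        h = xs[i + 1] - xs[i]         # step to the next legal x
        ys[i + 1] = ys[i] + h * f(ys[i], xs[i])
    return ys

ys = euler_on_grid(f, xs, y0=0.1)
print(ys)
```

Each step uses the slope at the left end of the interval only, so f is never called at an x outside xs.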

I think you should first solve ODE on a regular grid, and then interpolate solution on your fixed grid. The approximate code for your problem
import numpy as np
from scipy.integrate import odeint
from scipy import interpolate
xs = np.array([0.15, 0.383, 0.99, 1.0001])
# dy/dx = f(x,y)
def dy_dx(y, x):
return np.sin(y + x ** 2)
y0 = 0.0 # init condition
x = np.linspace(0, 10, 200)# here you can control an accuracy
sol = odeint(dy_dx, y0, x)
f = interpolate.interp1d(x, np.ravel(sol))
ys = f(xs)
But dy_dx(y, x) should always return something reasonable (not np.nan).
Here is the drawing for this case

Fit a curve for data made up of two distinct regimes

I'm looking for a way to plot a curve through some experimental data. The data shows a small linear regime with a shallow gradient, followed by a steep linear regime after a threshold value.
My data is here: http://pastebin.com/H4NSbxqr
I could fit the data with two lines relatively easily, but I'd like to fit with a continuous line ideally - which should look like two lines with a smooth curve joining them around the threshold (~5000 in the data, shown above).
I attempted this using scipy.optimize curve_fit and trying a function which included the sum of a straight line and an exponential:
y = a*x + b + c*np.exp((x-d)/e)
but despite numerous attempts, curve_fit did not find a solution.
If anyone has any suggestions please, either on the choice of fitting distribution / method or the curve_fit implementation, they would be greatly appreciated.

If you don't have a particular reason to believe that linear + exponential is the true underlying cause of your data, then I think a fit to two lines makes the most sense. You can do this by making your fitting function the maximum of two lines, for example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def two_lines(x, a, b, c, d):
    one = a*x + b
    two = c*x + d
    return np.maximum(one, two)
Then,
x, y = np.genfromtxt('tmp.txt', unpack=True, delimiter=',')
pw0 = (.02, 30, .2, -2000) # a guess for slope, intercept, slope, intercept
pw, cov = curve_fit(two_lines, x, y, pw0)
crossover = (pw[3] - pw[1]) / (pw[0] - pw[2])
plt.plot(x, y, 'o', x, two_lines(x, *pw), '-')
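Since the original data file is not included here, the same fit can be tried on synthetic data. The numbers below are made up for illustration (a shallow line and a steep line crossing at x = 5000, plus a little noise); they are not the poster's measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_lines(x, a, b, c, d):
    return np.maximum(a * x + b, c * x + d)

# Synthetic stand-in for the experimental data (made-up numbers).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10000.0, 201)
y = two_lines(x, 0.02, 30.0, 0.2, -870.0) + rng.normal(0.0, 5.0, x.size)

# The initial guess should put the crossover inside the data range; otherwise
# one of the lines never touches the data and its parameters cannot be fitted.
p0 = (0.01, 0.0, 0.3, -1500.0)
pw, cov = curve_fit(two_lines, x, y, p0)
crossover = (pw[3] - pw[1]) / (pw[0] - pw[2])
print(pw, crossover)
```

The model is not differentiable at the crossover, so the fit can be sensitive to the starting values; trying a few initial guesses is a reasonable precaution.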
If you really want a continuous and differentiable solution, it occurred to me that a hyperbola has a sharp bend to it, but it has to be rotated. It was a bit difficult to implement (maybe there's an easier way), but here's a go:
def hyperbola(x, a, b, c, d, e):
    """ hyperbola(x) with parameters
        a/b = asymptotic slope
         c  = curvature at vertex
         d  = offset to vertex
         e  = vertical offset
    """
    return a*np.sqrt((b*c)**2 + (x-d)**2)/b + e

def rot_hyperbola(x, a, b, c, d, e, th):
    pars = a, b, c, 0, 0  # do the shifting after rotation
    xd = x - d
    hsin = hyperbola(xd, *pars)*np.sin(th)
    xcos = xd*np.cos(th)
    return e + hyperbola(xcos - hsin, *pars)*np.cos(th) + xcos - hsin
Run it as
h0 = 1.1, 1, 0, 5000, 100, .5
h, hcov = curve_fit(rot_hyperbola, x, y, h0)
plt.plot(x, y, 'o', x, two_lines(x, *pw), '-', x, rot_hyperbola(x, *h), '-')
plt.legend(['data', 'piecewise linear', 'rotated hyperbola'], loc='upper left')
plt.show()
I was also able to get the line + exponential to converge, but it looks terrible. This is because it's not a good descriptor of your data: both regimes are linear, and an exponential is very far from linear!
def line_exp(x, a, b, c, d, e):
    return a*x + b + c*np.exp((x-d)/e)

e0 = .1, 20., .01, 1000., 2000.
e, ecov = curve_fit(line_exp, x, y, e0)
If you want to keep it simple, there's always a polynomial or spline (piecewise polynomials)
from scipy.interpolate import UnivariateSpline
s = UnivariateSpline(x, y, s=x.size) #larger s-value has fewer "knots"
plt.plot(x, s(x))

I researched this a little; Applied Linear Regression by Sanford and the Correlation and Regression lecture by Steiger had some good info on it. However, they all lack the right model. The piecewise function should be C = th0 + th1*Temp + th2*max(0, Temp - gamma), which is what the code below fits:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import lmfit
import statsmodels.formula.api as smf

dfseg = pd.read_csv('segreg.csv')

def err(w):
    th0 = w['th0'].value
    th1 = w['th1'].value
    th2 = w['th2'].value
    gamma = w['gamma'].value
    fit = th0 + th1*dfseg.Temp + th2*np.maximum(0, dfseg.Temp-gamma)
    return fit - dfseg.C

p = lmfit.Parameters()
p.add_many(('th0', 0.), ('th1', 0.0), ('th2', 0.0), ('gamma', 40.))
mi = lmfit.minimize(err, p)
lmfit.printfuncs.report_fit(mi.params)
b0 = mi.params['th0'].value
b1 = mi.params['th1'].value
b2 = mi.params['th2'].value
gamma = int(mi.params['gamma'].value)

reslin = smf.ols('C ~ 1 + Temp + I((Temp-%d)*(Temp>%d))' % (gamma, gamma), data=dfseg).fit()
print(reslin.summary())
x0 = np.array(range(0, gamma, 1))
x1 = np.array(range(0, 80-gamma, 1))
y0 = b0 + b1*x0
y1 = (b0 + b1*float(gamma) + (b1 + b2)*x1)
plt.scatter(dfseg.Temp, dfseg.C)
plt.plot(x0, y0)
plt.plot(x1+gamma, y1)
plt.show()
Result
[[Variables]]
th0: 78.6554456 +/- 3.966238 (5.04%) (init= 0)
th1: -0.15728297 +/- 0.148250 (94.26%) (init= 0)
th2: 0.72471237 +/- 0.179052 (24.71%) (init= 0)
gamma: 38.3110177 +/- 4.845767 (12.65%) (init= 40)
The data
"","Temp","C"
"1",8.5536,86.2143
"2",10.6613,72.3871
"3",12.4516,74.0968
"4",16.9032,68.2258
"5",20.5161,72.3548
"6",21.1613,76.4839
"7",24.3929,83.6429
"8",26.4839,74.1935
"9",26.5645,71.2581
"10",27.9828,78.2069
"11",32.6833,79.0667
"12",33.0806,71.0968
"13",33.7097,76.6452
"14",34.2903,74.4516
"15",36,56.9677
"16",37.4167,79.8333
"17",43.9516,79.7097
"18",45.2667,76.9667
"19",47,76
"20",47.1129,78.0323
"21",47.3833,79.8333
"22",48.0968,73.9032
"23",49.05,78.1667
"24",57.5,81.7097
"25",59.2,80.3
"26",61.3226,75
"27",61.9194,87.0323
"28",62.3833,89.8
"29",64.3667,96.4
"30",65.371,88.9677
"31",68.35,91.3333
"32",70.7581,91.8387
"33",71.129,90.9355
"34",72.2419,93.4516
"35",72.85,97.8333
"36",73.9194,92.4839
"37",74.4167,96.1333
"38",76.3871,89.8387
"39",78.0484,89.4516
Graph

I used @user423805's answer (found via a Google Groups thread: https://groups.google.com/forum/#!topic/lmfit-py/7I2zv2WwFLU ) but noticed it had some limitations when trying to use three or more segments.
Instead of applying np.maximum in the minimizer error function or adding (b1 + b2) as in @user423805's answer, I used the same linear spline calculation for both the minimizer and end usage:
# least_splines_calc works like this for an example with three segments
# (four threshold params, two gamma params):
#
# for 0 < x < gamma0      : y = th0 + (th1 * x)
# for gamma0 < x < gamma1 : y = th0 + (th1 * x) + (th2 * (x - gamma0))
# for gamma1 < x          : y = th0 + (th1 * x) + (th2 * (x - gamma0)) + (th3 * (x - gamma1))
#
def least_splines_calc(x, thresholds, gammas):
    if len(thresholds) < 2:
        print("Error: expected at least two thresholds")
        return None
    # the pairing below relies on gammas being sorted in increasing order
    applicable_gammas = [gamma for gamma in gammas if x > gamma]
    # base result
    y = thresholds[0] + (thresholds[1] * x)
    # additional terms, depending on the x value
    for i in range(len(applicable_gammas)):
        y = y + (thresholds[i + 2] * (x - applicable_gammas[i]))
    return y

def least_splines_calc_array(x_array, thresholds, gammas):
    return [least_splines_calc(x, thresholds, gammas) for x in x_array]

def err(params, x, data):
    th0 = params['th0'].value
    th1 = params['th1'].value
    th2 = params['th2'].value
    th3 = params['th3'].value
    gamma1 = params['gamma1'].value
    gamma2 = params['gamma2'].value
    thresholds = np.array([th0, th1, th2, th3])
    gammas = np.array([gamma1, gamma2])
    fit = least_splines_calc_array(x, thresholds, gammas)
    return np.array(fit) - np.array(data)

p = lmfit.Parameters()
# NOTE: the 9. / 9.3 were guesses specific to my data, you will need to change these
p.add_many(('th0', 0.), ('th1', 0.0), ('th2', 0.0), ('th3', 0.0), ('gamma1', 9.), ('gamma2', 9.3))
mi = lmfit.minimize(err, p, args=(np.array(dfseg.Temp), np.array(dfseg.C)))
After minimization, convert the params found by the minimizer into arrays of thresholds and gammas to re-use least_splines_calc to plot the linear spline regression.
Reference: While there are various places that explain least splines (I think @user423805 used http://www.statpower.net/Content/313/Lecture%20Notes/Splines.pdf , which has the (b1 + b2) addition I disagree with in its sample code despite similar equations), the one that made the most sense to me was this one (by Rob Schapire / Zia Khan at Princeton): https://www.cs.princeton.edu/courses/archive/spring07/cos424/scribe_notes/0403.pdf . Section 2.2 covers linear splines.

If you're looking to join what appears to be two straight lines with a hyperbola having a variable radius at/near the intersection of the two lines (which are its asymptotes), I urge you to look hard at Using an Hyperbola as a Transition Model to Fit Two-Regime Straight-Line Data, by Donald G. Watts and David W. Bacon, Technometrics, Vol. 16, No. 3 (Aug., 1974), pp. 369-373.
The formula is drop dead simple, nicely adjustable, and works like a charm. From their paper (in case you can't access it):
As a more useful alternative form we consider an hyperbola for which:
(i) the dependent variable y is a single valued function of the independent variable x,
(ii) the left asymptote has slope theta_1,
(iii) the right asymptote has slope theta_2,
(iv) the asymptotes intersect at the point (x_o, beta_o),
(v) the radius of curvature at x = x_o is proportional to a quantity delta. Such an hyperbola can be written y = beta_o + beta_1*(x - x_o) + beta_2* SQRT[(x - x_o)^2 + delta^2/4], where beta_1 = (theta_1 + theta_2)/2 and beta_2 = (theta_2 - theta_1)/2.
delta is the adjustable parameter that allows you to either closely follow the lines right to the intersection point or smoothly merge from one line to the other.
Just solve for the intersection point (x_o, beta_o), and plug into the formula above.
BTW, in general, if line 1 is y_1 = b_1 + m_1 *x and line 2 is y_2 = b_2 + m_2 * x, then they intersect at x* = (b_2 - b_1) / (m_1 - m_2) and y* = b_1 + m_1 * x*. So, to connect with the formalism above, x_o = x*, beta_o = y* and the two m_*'s are the two thetas.
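The quoted formula is short enough to try directly. The sketch below evaluates it for two made-up lines; the function name hyperbola_transition and all numbers are assumptions for illustration, not values from the paper or from the question's data.

```python
import numpy as np

def hyperbola_transition(x, x_o, beta_o, theta_1, theta_2, delta):
    """Transition curve in the form quoted above from Watts and Bacon."""
    beta_1 = (theta_1 + theta_2) / 2.0
    beta_2 = (theta_2 - theta_1) / 2.0
    return beta_o + beta_1 * (x - x_o) + beta_2 * np.sqrt((x - x_o) ** 2 + delta ** 2 / 4.0)

# Made-up example lines: y_1 = 30 + 0.02 x and y_2 = -870 + 0.2 x
b_1, m_1 = 30.0, 0.02
b_2, m_2 = -870.0, 0.2
x_star = (b_2 - b_1) / (m_1 - m_2)     # intersection of the two lines
y_star = b_1 + m_1 * x_star

x = np.linspace(0.0, 10000.0, 11)
sharp = hyperbola_transition(x, x_star, y_star, m_1, m_2, delta=0.0)
smooth = hyperbola_transition(x, x_star, y_star, m_1, m_2, delta=1000.0)
two_lines = np.maximum(b_1 + m_1 * x, b_2 + m_2 * x)
print(np.column_stack([x, two_lines, sharp, smooth]))
```

With delta = 0 the curve follows the two lines exactly; a larger delta rounds off the corner while the curve still approaches both lines far from the intersection. In a fit, delta can be left as a free parameter alongside the slopes and the intersection point.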

There is a straightforward method (not iterative, no initial guess) pp.12-13 in https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
The data come from scanning the figure published by IanRoberts in his question. Reading coordinates off the pixels is not accurate, so don't be surprised by some additional deviation.
Note that the abscissa and ordinate scales have been divided by 1000.
The equations of the two segments and the approximate values of the five parameters were given in a figure that is not reproduced here.
