Using parameters as bounds for scipy.optimize.curve_fit - python

I was wondering if it is possible to set bounds for the parameters in curve_fit() such that the bounds depend on another parameter. For example, say I wanted to constrain the slope of a line to be greater than the intercept.
def linear(x, m, b):
    return (m * x) + b

def plot_linear(x, y):
    B = ([b, -np.inf], [np.inf, np.inf])  # invalid: b is not defined here
    p, v = curve_fit(linear, x, y, bounds=B)
    xs = np.linspace(min(x), max(x), 1000)
    plt.plot(x, y, '.')
    plt.plot(xs, linear(xs, *p), '-')
I know this doesn't work because the parameter b is not defined at the point where the bounds are built, but is there a way to make this work?

We can always re-parameterize with respect to the specific curve-fitting problem. For example, to fit y = m*x + b subject to m >= b, rewrite m = b + k**2 with a new parameter k, and optimize over the parameters b, k instead, as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def linear(x, m, b):
    return m*x + b

def linear2(x, k, b):  # constrained fit, m = b + k**2 >= b
    return (b + k**2)*x + b

def plot_linear(x, y):
    p, v = curve_fit(linear, x, y)
    print(p)
    # [3.1675609 6.01025041]
    p2, v2 = curve_fit(linear2, x, y)
    print(p2)
    # [2.13980283e-05 4.99368661e+00]
    xs = np.linspace(min(x), max(x), 1000)
    plt.plot(x, y, '.')
    plt.plot(xs, linear(xs, *p), 'r-', label='unconstrained fit')
    plt.plot(xs, linear2(xs, *p2), 'b-', label='constrained (m>b) fit')
    plt.legend()
Now let's fit the curves to the following data, using both the constrained and unconstrained fit functions (note that the unconstrained optimal fit will have a slope smaller than the intercept):
x = np.linspace(0,1,100)
y = 3*x + 5 + 2*np.random.rand(len(x))
plot_linear(x, y)
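To recover the constrained slope from p2 (a usage note of mine, not part of the original answer), apply the re-parameterization m = b + k**2 to the fitted values:

k_fit, b_fit = p2
m_fit = b_fit + k_fit**2  # constrained slope; m >= b by construction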

Related

Parabolic fit with fixed peak

I have a set of data and want to put a parabolic fit over it. This already works with the polyfit function from numpy like this:
fit = np.polyfit(X, y, 2)
formula = np.poly1d(fit)
Now I want the parabola to have its peak at a fixed x value, with the fit still carried out as well as possible given this fixed peak. Is there a way to accomplish that?
From my data I know that the parabola will always open downwards.
I think this is quite a difficult problem, since the x coordinate of the peak of a second-order polynomial (ax^2 + bx + c) always lies at x = -b/(2a).
One thing you can do is drop the b term and shift x by the desired peak value when fitting the polynomial, as in the code below. Note that I used scipy.optimize.curve_fit to fit the custom function func.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# generating a parabola with noise
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - 2)**2 + np.random.normal(0, 5, x.shape)

# function to fit
def func(x, a, c):
    return a*x**2 + c

# desired x peak value
x_peak = 2
popt, pcov = curve_fit(func, x - x_peak, y)
y_fit = func(x - x_peak, *popt)

# plotting
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.axvline(x_peak)
plt.show()
The output plot shows the noisy data with the fitted parabola and a vertical line marking the fixed peak at x_peak = 2.
Fixing a point on your parabola simplifies the problem, since you can rewrite your equation slightly in terms of a constant now:
y = A(x - B)**2 + C
Given the coefficients a, b, c in your original unconstrained fit, you have the relationships
a = A
b = -2AB
c = AB**2 + C
The only difference is that, since B is a fixed constant, the model is no longer a standard polynomial in x, so you need to set up the least-squares problem yourself. Given arrays x, y and the constant B, the problem looks like this:
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)  # columns: (x - B)**2 and a constant term
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
You can then recover the usual coefficients a, b, c from the formulas above.
Here is a complete example, just like the one in the other answer:
B = 2
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - B)**2 + np.random.normal(0, 5, x.shape)
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
a = A
b = -2 * A * B
c = A * B**2 + C
y_fit = a * x**2 + b * x + c
You can drop a, b, c entirely and do
y_fit = A * (x - B)**2 + C
The result will be identical.
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
Without the condition on the location of the peak, the function to be fitted would be:
y = a x^2 + b x + c
With the condition that the peak is located at x = p, given p:
-b/(2a) = p
b = -2 a p
y = a x^2 - 2 a p x + c
y = a (x^2 - 2 p x) + c
Knowing p, make one change of variable:
X = x^2 - 2 p x
So from the data (x, y) one first computes the new data (X, y). Then a and c are obtained by linear regression:
y = a X + c
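A minimal sketch of that change of variable (my illustration; the test data and variable names are assumptions, not from the original answer):

import numpy as np

# test data assumed given, with known peak location p
p = 2.0
x = np.linspace(-10, 10, 100)
y = 10 - (x - p)**2 + np.random.normal(0, 5, x.shape)

X = x**2 - 2 * p * x        # change of variable
a, c = np.polyfit(X, y, 1)  # linear regression y = a*X + c
b = -2 * a * p              # recover b from the peak condition
y_fit = a * x**2 + b * x + c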

How to use curve_fit from scipy.optimize with a shared fit parameter across multiple datasets?

Assume I have a fit function f with multiple parameters, for example a and b. Now I want to fit multiple datasets to this function and use the same a for all of them (the shared parameter), while b can be individual for each fit.
Example:
import numpy as np
# Fit function
def f(x, a, b):
    return a * x + b
# Datasets
x = np.arange(4)
y = np.array([x + a + np.random.normal(0, 0.5, len(x)) for a in range(3)])
So we have 4 x values and 3 datasets of 4 y values each.
One way to do this is to concatenate the datasets and use an adjusted fit function.
In the following example, this happens inside the new fit function g with np.concatenate. The individual fits for each dataset are also done so we can compare their graph to the concatenated fit with the shared parameter.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
# Create example datasets
x = np.arange(4)
y = np.array([x + a + np.random.normal(0, 0.5, len(x)) for a in range(3)])
print("x =", x)
print("y =", y)
# Individual fits to each dataset
def f(x, a, b):
    return a * x + b

for y_i in y:
    (a, b), _ = curve_fit(f, x, y_i)
    plt.plot(x, f(x, a, b), label=f"{a:.1f}x{b:+.1f}")
    plt.plot(x, y_i, linestyle="", marker="x", color=plt.gca().lines[-1].get_color())
plt.legend()
plt.show()

# Fit to concatenated dataset with shared parameter
def g(x, a, b_1, b_2, b_3):
    return np.concatenate((f(x, a, b_1), f(x, a, b_2), f(x, a, b_3)))

(a, *b), _ = curve_fit(g, x, y.ravel())
for b_i, y_i in zip(b, y):
    plt.plot(x, f(x, a, b_i), label=f"{a:.1f}x{b_i:+.1f}")
    plt.plot(x, y_i, linestyle="", marker="x", color=plt.gca().lines[-1].get_color())
plt.legend()
plt.show()
Output:
x = [0 1 2 3]
y = [[0.40162683 0.65320576 1.92549698 2.9759299 ]
[1.15804251 1.69973973 3.24986941 3.25735249]
[1.97214167 2.60206217 3.93789235 6.04590999]]
Two plots are produced: the individual fits, with three different values for a, and the fit with the shared parameter a.
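As a possible generalisation (my sketch, not from the original answer), g can be built for any number of datasets instead of hard-coding b_1, b_2, b_3; p0 is then required so curve_fit can determine the number of parameters:

def g_any(x, a, *bs):
    # one shared slope a, one intercept per dataset
    return np.concatenate([f(x, a, b_i) for b_i in bs])

p0 = [1.0] + [0.0] * len(y)  # shared a plus one b per dataset
(a, *b), _ = curve_fit(g_any, x, y.ravel(), p0=p0)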

Curve-fitting with uncertainties in fixed parameters of function to fit (Python)

I have a function that looks like f(x, m, E, I) = m * (x - x ** 2) / (E * I), where I want to get a value for E. I have some data, which I call X and Y, and some uncertainty in the y data which I call yerr. Additionally the parameters m and I are physical quantities, and they have been measured with some uncertainty.
I want to fit the function f to my data X, Y, taking into account the uncertainties of the quantities m and I. Right now this is the command I am using to do the fit:
m = some value
I = some other value
popt, pcov = curve_fit(lambda x, E: f(x, m, E, I), X, Y, p0=[1e9], sigma=yerr)
Of course this doesn't take into account the uncertainty in m and I. Is there any way to do the fit taking these uncertainties into account?
For instance, here they solve an ODE using the uncertainties module. I tried to copy the procedure, but it didn't work:
import uncertainties as u

def f(x, m, E, I):
    return m * (x - x ** 2) / (E * I)

m = u.ufloat(3e-4, 0.1e-6)
I = u.ufloat(1e-10, 0.2e-12)

@u.wrap
def fit():
    popt, pcov = curve_fit(lambda x, E: f(x, m, E, I), X, Y, p0=[1e9], sigma=yerr)
    return popt, pcov
where X, Y, yerr are the data and the error in Y as mentioned before.
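One commonly suggested workaround (a sketch of mine, not from the original question) is Monte Carlo propagation: sample m and I from their assumed normal uncertainties, refit E each time, and take the spread of the fitted values as the propagated uncertainty in E.

import numpy as np
from scipy.optimize import curve_fit

# X, Y, yerr are assumed to be the data arrays described in the question
rng = np.random.default_rng(0)
E_samples = []
for _ in range(1000):
    m_i = rng.normal(3e-4, 0.1e-6)    # sampled m
    I_i = rng.normal(1e-10, 0.2e-12)  # sampled I
    popt, _ = curve_fit(lambda x, E: f(x, m_i, E, I_i), X, Y, p0=[1e9], sigma=yerr)
    E_samples.append(popt[0])

E_mean, E_std = np.mean(E_samples), np.std(E_samples)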

How do I put a constraint on SciPy curve fit?

I'm trying to fit the distribution of some experimental values with a custom probability density function. Obviously, the integral of the resulting function should always be equal to 1, but the results of simple scipy.optimize.curve_fit(function, dataBincenters, dataCounts) never satisfy this condition.
What is the best way to solve this problem?
You can define your own residuals function, including a penalization parameter, as detailed in the code below, where it is known beforehand that the integral over the interval must be 2. If you test without the penalization, you will see that what you are getting is the conventional curve_fit result:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit, leastsq
from scipy.integrate import quad

x = np.linspace(0, np.pi, 100)
y = np.sin(x) + np.random.rand(len(x)) * 0.4

def func1(x, a0, a1, a2, a3):
    return a0 + a1*x + a2*x**2 + a3*x**3

# here you include the penalization factor
def residuals(p, x, y):
    integral = quad(func1, 0, np.pi, args=(p[0], p[1], p[2], p[3]))[0]
    penalization = abs(2. - integral) * 10000
    return y - func1(x, p[0], p[1], p[2], p[3]) - penalization

popt1, pcov1 = curve_fit(func1, x, y)
popt2, pcov2 = leastsq(func=residuals, x0=(1., 1., 1., 1.), args=(x, y))
y_fit1 = func1(x, *popt1)
y_fit2 = func1(x, *popt2)
plt.scatter(x, y, marker='.')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.legend()
plt.xlim(-0.1, 3.5)
plt.ylim(0, 1.4)
print('Exact integral:', quad(np.sin, 0, np.pi)[0])
print('Approx integral1:', quad(func1, 0, np.pi, args=(popt1[0], popt1[1], popt1[2], popt1[3]))[0])
print('Approx integral2:', quad(func1, 0, np.pi, args=(popt2[0], popt2[1], popt2[2], popt2[3]))[0])
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.60068579748
#Approx integral2: 2.00001911981
Other related questions:
SciPy LeastSq Goodness of Fit Estimator
Here is an almost-identical snippet which makes use only of curve_fit.
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
import scipy.integrate as integr

x = np.linspace(0, np.pi, 100)
y = np.sin(x) + np.random.rand(len(x)) * 0.4

def Func(x, a0, a1, a2, a3):
    return a0 + a1*x + a2*x**2 + a3*x**3

# modified function definition with penalization
def FuncPen(x, a0, a1, a2, a3):
    integral = integr.quad(Func, 0, np.pi, args=(a0, a1, a2, a3))[0]
    penalization = abs(2. - integral) * 10000
    return a0 + a1*x + a2*x**2 + a3*x**3 + penalization

popt1, pcov1 = opt.curve_fit(Func, x, y)
popt2, pcov2 = opt.curve_fit(FuncPen, x, y)
y_fit1 = Func(x, *popt1)
y_fit2 = Func(x, *popt2)
plt.scatter(x, y, marker='.')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.legend()
plt.xlim(-0.1, 3.5)
plt.ylim(0, 1.4)
print('Exact integral:', integr.quad(np.sin, 0, np.pi)[0])
print('Approx integral1:', integr.quad(Func, 0, np.pi, args=(popt1[0], popt1[1], popt1[2], popt1[3]))[0])
print('Approx integral2:', integr.quad(Func, 0, np.pi, args=(popt2[0], popt2[1], popt2[2], popt2[3]))[0])
plt.show()
#Exact integral: 2.0
#Approx integral1: 2.66485028754
#Approx integral2: 2.00002116217
Following the example above, here is a more general way to add any constraints:
from scipy.optimize import minimize
from scipy.integrate import quad
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, np.pi, 100)
y = np.sin(x) + np.random.rand(len(x)) * 0.4

def func_to_fit(x, params):
    return params[0] + params[1] * x + params[2] * x ** 2 + params[3] * x ** 3

def constr_fun(params):
    intgrl, _ = quad(func_to_fit, 0, np.pi, args=(params,))
    return intgrl - 2

def func_to_minimise(params, x, y):
    y_pred = func_to_fit(x, params)
    return np.sum((y_pred - y) ** 2)

# Do the parameter fitting
# without constraints
res1 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y))
params1 = res1.x
# with constraints
cons = {'type': 'eq', 'fun': constr_fun}
res2 = minimize(func_to_minimise, x0=np.random.rand(4), args=(x, y), constraints=cons)
params2 = res2.x

y_fit1 = func_to_fit(x, params1)
y_fit2 = func_to_fit(x, params2)
plt.scatter(x, y, marker='.')
plt.plot(x, y_fit2, color='y', label='constrained')
plt.plot(x, y_fit1, color='g', label='curve_fit')
plt.legend()
plt.xlim(-0.1, 3.5)
plt.ylim(0, 1.4)
plt.show()
print(f"Constraint violation: {constr_fun(params2)}")
Constraint violation: -2.9179325622408214e-10
If you are able to normalise your probability fitting function in advance, then you can use this information to constrain your fit. A very simple example of this would be fitting a Gaussian to data. If one were to fit the following three-parameter (A, mu, sigma) Gaussian, then it would be unnormalised in general:
f(x) = A * exp(-(x - mu)**2 / (2 * sigma**2))
However, if one instead enforces the normalisation condition on A:
A = 1 / (sigma * sqrt(2 * pi))
then the Gaussian has only two parameters (mu, sigma) and is automatically normalised.
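As a concrete sketch of that reparameterisation (my illustration; the function name and the binned data are assumptions, not from the original answer):

import numpy as np
from scipy.optimize import curve_fit

# two-parameter Gaussian with A fixed by normalisation: A = 1/(sigma*sqrt(2*pi))
def normalised_gaussian(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# x_centres and density would be your binned data:
# popt, _ = curve_fit(normalised_gaussian, x_centres, density, p0=[0.0, 1.0])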
You could ensure that your fitted probability distribution is normalised via a numerical integration. For example, assuming that you have data x and y and that you have defined an unnormalised_function(x, a, b) with parameters a and b for your probability distribution, which is defined on the interval x1 to x2 (which could be infinite):
from scipy.optimize import curve_fit
from scipy.integrate import quad

# Define a numerically normalised function
def normalised_function(x, a, b):
    normalisation, _ = quad(lambda x: unnormalised_function(x, a, b), x1, x2)
    return unnormalised_function(x, a, b) / normalisation

# Do the parameter fitting
fitted_parameters, _ = curve_fit(normalised_function, x, y)

Line fitting below points

I have a set of x, y points and I'd like to find the line of best fit such that the line is below all points using SciPy. I'm trying to use leastsq for this, but I'm unsure how to adjust the line to be below all points instead of the line of best fit. The coefficients for the line of best fit can be produced via:
import numpy as np
from scipy.optimize import leastsq

def linreg(x, y):
    fit = lambda params, x: params[0] * x - params[1]
    err = lambda p, x, y: y - fit(p, x)  # leastsq squares and sums the residuals itself
    # initial slope/intercept
    init_p = np.array((1, 0))
    p, _ = leastsq(err, init_p.copy(), args=(x, y))
    return p

xs = np.array([1, 2, 3, 4, 5])
ys = np.array([10, 20, 30, 40, 50])
print(linreg(xs, ys))
The output is the coefficients for the line of best fit:
array([ 9.99999997e+00, -1.68071668e-15])
How can I get the coefficients of the line of best fit that is below all points?
A possible algorithm is as follows:
1. Move the axes so that all the data lies on the positive half of the x axis.
2. If the fit is of the form y = a * x + b, then for a given b the best fit for a will be the minimum of the slopes joining the point (0, b) with each of the (x, y) points.
3. You can then calculate a fit error, which is a function of only b, and use scipy.optimize.minimize to find the best value for b.
4. All that's left is computing a for that b and translating b back to the original position of the axes.
The following does that most of the time, except when the minimization fails with some mysterious error:
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt

def fit_below(x, y):
    idx = np.argsort(x)
    x = x[idx]
    y = y[idx]
    x0, y0 = x[0] - 1, y[0]
    x -= x0
    y -= y0
    def error_function_2(b, x, y):
        a = np.min((y - b) / x)
        return np.sum((y - a * x - b)**2)
    b = scipy.optimize.minimize(error_function_2, [0], args=(x, y)).x[0]
    a = np.min((y - b) / x)
    return a, b - a * x0 + y0
x = np.arange(10).astype(float)
y = x * 2 + 3 + 3 * np.random.rand(len(x))
a, b = fit_below(x, y)
plt.plot(x, y, 'o')
plt.plot(x, a*x + b, '-')
plt.show()
And as TheodrosZelleke wisely predicted, the fitted line goes through two points that are part of the convex hull of the data, as the resulting plot shows.
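An alternative formulation (my addition, not from the original answers): minimising the total vertical gap sum(y - (a*x + b)) subject to a*x_i + b <= y_i for every point is a linear program, which avoids the minimisation failures mentioned above.

import numpy as np
from scipy.optimize import linprog

def fit_below_lp(x, y):
    # minimise sum(y - a*x - b) = const - (sum x)*a - n*b,
    # i.e. maximise (sum x)*a + n*b subject to a*x_i + b <= y_i
    c = [-np.sum(x), -len(x)]
    A_ub = np.column_stack([x, np.ones_like(x)])
    res = linprog(c, A_ub=A_ub, b_ub=y, bounds=[(None, None), (None, None)])
    return res.x  # (a, b)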
