Curve-fitting with uncertainties in fixed parameters of function to fit (Python)

I have a function that looks like f(x, m, E, I) = m * (x - x ** 2) / (E * I), where I want to obtain a value for E. I have some data, which I call X and Y, and some uncertainty in the y data, which I call yerr. Additionally, the parameters m and I are physical quantities that have been measured with some uncertainty.
I want to fit the function f to my data X, Y, taking into account the uncertainties in m and I. Right now this is the command I am using to do the fit:
m = some value
I = some other value
popt, pcov = curve_fit(lambda x, E: f(x, m, E, I), X, Y, p0=[1e9], sigma=yerr)
Of course this doesn't take into account the uncertainty in m and I. Is there any way to fit a curve taking these uncertainties into account?
For instance, here they solve an ODE using the uncertainties module; I tried to copy the procedure but it didn't work:
import uncertainties as u
from scipy.optimize import curve_fit

def f(x, m, E, I):
    return m * (x - x ** 2) / (E * I)

m = u.ufloat(3e-4, 0.1e-6)
I = u.ufloat(1e-10, 0.2e-12)

@u.wrap
def fit():
    popt, pcov = curve_fit(lambda x, E: f(x, m, E, I), X, Y, p0=[1e9], sigma=yerr)
    return popt, pcov
where X, Y, yerr are the data and the error in Y as mentioned before.
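For reference, one common way to account for the measured uncertainty of fixed parameters like m and I (not from the original thread) is a Monte Carlo approach: repeat the fit many times with m and I drawn from their measured distributions, then combine the spread of the fitted E values with the single-fit uncertainty from pcov. A minimal sketch of this idea; the synthetic X, Y, yerr and the number of draws are assumptions, not part of the question:

import numpy as np
from scipy.optimize import curve_fit

def f(x, m, E, I):
    return m * (x - x ** 2) / (E * I)

# synthetic data standing in for the measured X, Y, yerr (assumed values)
rng = np.random.default_rng(0)
X = np.linspace(0.1, 0.9, 30)
Y = f(X, 3e-4, 2e9, 1e-10) + rng.normal(0, 1e-5, X.size)
yerr = np.full_like(X, 1e-5)

n_draws = 500
E_samples = np.empty(n_draws)
for i in range(n_draws):
    m_i = rng.normal(3e-4, 0.1e-6)    # draw m from its measured distribution
    I_i = rng.normal(1e-10, 0.2e-12)  # draw I from its measured distribution
    popt, _ = curve_fit(lambda x, E: f(x, m_i, E, I_i), X, Y,
                        p0=[1e9], sigma=yerr, absolute_sigma=True)
    E_samples[i] = popt[0]

# one fit at the nominal m, I gives the statistical (data) uncertainty;
# the spread of E_samples gives the contribution from m and I
popt, pcov = curve_fit(lambda x, E: f(x, 3e-4, E, 1e-10), X, Y,
                       p0=[1e9], sigma=yerr, absolute_sigma=True)
E_err_total = np.hypot(np.sqrt(pcov[0, 0]), E_samples.std())
print(popt[0], E_err_total)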

Related

Regression with a multi-variable function

I have this code to fit a function with only one variable (x):
from scipy.optimize import curve_fit

def func(x, s, k, L, A):
    return A + (L * (1 / (1 + ((x / k) ** (-s)))))

init_vals = [0.4, 4, 100, 50]
# fit the data and get the fit parameters
popt, pcov = curve_fit(func, xdata, ydata, p0=init_vals, bounds=([0, 0.1, 1, 0], [10, 10, 1000, 1000]))
But now I need to fit this one:
def func(x, s, k, L, A):
    return A + (L * (1 / (1 + (((b1*x1 + b2*x2 + b3*x3) / k) ** (-s)))))
where x is now a function of x1, x2, and x3.
Should it be like this?
def func(x, s, k, L, A):
    return A + (L * (1 / (1 + (((b1*x[0] + b2*x[1] + b3*x[2]) / k) ** (-s)))))
and in this case xdata would have to be a (3, n)-shaped array?
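For what it's worth, curve_fit passes xdata through to the model function unchanged, so the (3, n) pattern above does work. A minimal sketch under assumed values for b1, b2, b3 and synthetic data (both are illustrations, not from the question):

import numpy as np
from scipy.optimize import curve_fit

b1, b2, b3 = 0.5, 1.0, 2.0  # assumed known constants

def func(x, s, k, L, A):
    # x is a (3, n) array; unpack its rows inside the model
    u = b1 * x[0] + b2 * x[1] + b3 * x[2]
    return A + (L * (1 / (1 + ((u / k) ** (-s)))))

# synthetic (3, n) independent data, kept positive so (u/k)**(-s) is well-defined
rng = np.random.default_rng(0)
xdata = rng.uniform(0.1, 10.0, size=(3, 50))
ydata = func(xdata, 0.4, 4, 100, 50) + rng.normal(0, 1.0, 50)

popt, pcov = curve_fit(func, xdata, ydata, p0=[0.4, 4, 100, 50],
                       bounds=([0, 0.1, 1, 0], [10, 10, 1000, 1000]))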

Parabolic fit with fixed peak

I have a set of data and want to put a parabolic fit over it. This already works with the polyfit function from numpy like this:
fit = np.polyfit(X, y, 2)
formula = np.poly1d(fit)
Now I want the parabola to have its peak at a fixed x value, while the fit is still carried out as well as possible under this constraint. Is there a way to accomplish that?
From my data I know that the parabola will always be open downwards.
I think this is quite a difficult problem, since the x coordinate of the peak of a second-order polynomial (ax^2 + bx + c) always lies at x = -b/(2a).
One thing you could do is drop the b term and shift x by the desired peak location when fitting the polynomial, as in the code below. Note that I used scipy.optimize.curve_fit to fit the custom function func.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# generating a parabola with noise
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - 2)**2 + np.random.normal(0, 5, x.shape)

# function to fit
def func(x, a, c):
    return a*x**2 + c

# desired x peak value
x_peak = 2
popt, pcov = curve_fit(func, x - x_peak, y)
y_fit = func(x - x_peak, *popt)
# plotting
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
plt.axvline(x_peak)
plt.show()
[Plot: noisy data with the fitted parabola, peak fixed at x = 2]
Fixing the peak of your parabola simplifies the problem, since you can rewrite your equation with the peak position B as a known constant:
y = A(x - B)**2 + C
Given the coefficients a, b, c in your original unconstrained fit, you have the relationships
a = A
b = -2AB
c = AB**2 + C
The only difference is that, since B is a known constant and the basis is (x - B)**2 and 1 rather than plain powers of x, np.polyfit no longer applies and you need to set up the least-squares problem yourself. Given arrays x, y and constant B, the problem looks like this:
m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)
You can then recover the usual polynomial coefficients from the formulas for a, b, c above.
Here is a complete example, just like the one in the other answer:
import numpy as np
import matplotlib.pyplot as plt

B = 2
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - B)**2 + np.random.normal(0, 5, x.shape)

m = np.stack(((x - B)**2, np.ones_like(x)), axis=-1)
(A, C), *_ = np.linalg.lstsq(m, y, rcond=None)

a = A
b = -2 * A * B
c = A * B**2 + C
y_fit = a * x**2 + b * x + c
You can drop a, b, c entirely and do
y_fit = A * (x - B)**2 + C
The result will be identical.
plt.plot(x, y, 'k.')
plt.plot(x, y_fit)
Without the condition on the location of the peak, the function to be fitted would be:
y = a x^2 + b x + c
With the peak constrained to lie at x = p, for a given p:
-b/(2a) = p
b = -2 a p
y = a x^2 - 2 a p x + c
y = a (x^2 - 2 p x) + c
Knowing p, one change of variable:
X = x^2 - 2 p x
So, from the data (x, y) one first computes the new data (X, y). Then a and c are obtained by linear regression:
y = a X + c
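In code, this change of variable reduces to a degree-1 np.polyfit. A minimal sketch, reusing the noisy parabola from the first answer as data:

import numpy as np

p = 2  # known x coordinate of the peak
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 10 - (x - p)**2 + np.random.normal(0, 5, x.shape)

X = x**2 - 2 * p * x           # change of variable
a, c = np.polyfit(X, y, 1)     # linear regression y = a*X + c
y_fit = a * (x**2 - 2 * p * x) + c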

Using parameters as bounds for scipy.optimize.curve_fit

I was wondering if it is possible to set bounds for the parameters in curve_fit() such that the bounds depend on another parameter. For example, say I wanted to set the slope of a line to be greater than the intercept.
def linear(x, m, b):
    return (m * x) + b

def plot_linear(x, y):
    B = ([b, -np.inf], [np.inf, np.inf])  # b is not defined at this point
    p, v = curve_fit(linear, x, y, bounds=B)
    xs = np.linspace(min(x), max(x), 1000)
    plt.plot(x, y, '.')
    plt.plot(xs, linear(xs, *p), '-')
I know this doesn't work because the parameter b is not defined before it is used in the bounds, but is there a way to make this work?
We can always re-parameterize with respect to the specific curve-fitting problem. For example, to fit y = mx + b subject to m >= b, rewrite m = b + k**2 with a new parameter k and optimize over the parameters b and k instead:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def linear(x, m, b):
    return m*x + b

def linear2(x, k, b):  # constrained fit, m = b + k**2 >= b
    return (b + k**2)*x + b

def plot_linear(x, y):
    p, v = curve_fit(linear, x, y)
    print(p)
    # [3.1675609  6.01025041]
    p2, v2 = curve_fit(linear2, x, y)
    print(p2)
    # [2.13980283e-05 4.99368661e+00]
    xs = np.linspace(min(x), max(x), 1000)
    plt.plot(x, y, '.')
    plt.plot(xs, linear(xs, *p), 'r-', label='unconstrained fit')
    plt.plot(xs, linear2(xs, *p2), 'b-', label='constrained (m>b) fit')
    plt.legend()
Now let's fit the curves on the following data, using both the constrained and unconstrained fit functions (note that the unconstrained optimal fit has a slope smaller than its intercept):
x = np.linspace(0,1,100)
y = 3*x + 5 + 2*np.random.rand(len(x))
plot_linear(x, y)

Fitting of a 3D polynomial / volume in Python

I'd like to find the best fit (least-squares solution) for the a coefficients in an equation similar to this one:
b = f(x,y,z) = (a0 + a1*x + a2*y + a3*z + a4*x*y + a5*x*z + a6*y*z + a7*x*y*z)
x, y, and z are small arrays, each about 20 elements long. The example shown is for x**k with k=1; I'm looking for a solution up to k=3.
I have found this solution for a 2D fit: Equivalent of `polyfit` for a 2D polynomial in Python.
Now I'm looking for a similar solution but in 3d.
You're right, a similar technique works:
import numpy as np

# sample coordinates along each axis
x, y, z = np.random.randn(3, 20)
# expand to the full 3D grid and flatten to 1D arrays
grid = np.meshgrid(x, y, z, indexing='ij')
x, y, z = np.stack(grid).reshape(3, -1)
# placeholder values of b at each grid point
b = np.random.randn(*x.shape).reshape(-1)

# design matrix: one column per term of the polynomial
A = np.stack([np.ones_like(x), x, y, z, x * y, x * z, y * z, x * y * z], axis=1)
coeff, r, rank, s = np.linalg.lstsq(A, b, rcond=None)
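The question also asks for powers up to k=3; the same design-matrix approach extends by adding one column per monomial. A sketch assuming all monomials x**i * y**j * z**l with exponents up to 3 are wanted (the exact term set is an assumption):

import itertools
import numpy as np

# same grid setup as above
x, y, z = np.random.randn(3, 20)
grid = np.meshgrid(x, y, z, indexing='ij')
x, y, z = np.stack(grid).reshape(3, -1)
b = np.random.randn(x.shape[0])

k = 3
# one column per monomial x**i * y**j * z**l with 0 <= i, j, l <= k
powers = list(itertools.product(range(k + 1), repeat=3))
A = np.stack([x**i * y**j * z**l for i, j, l in powers], axis=1)
coeff, *_ = np.linalg.lstsq(A, b, rcond=None)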

How to use curve_fit from scipy.optimize with a shared fit parameter across multiple datasets?

Assume I have a fit function f with multiple parameters, for example a and b. I want to fit multiple datasets to this function using the same a for all of them (the shared parameter), while b can be individual to each fit.
Example:
import numpy as np
# Fit function
def f(x, a, b):
    return a * x + b
# Datasets
x = np.arange(4)
y = np.array([x + a + np.random.normal(0, 0.5, len(x)) for a in range(3)])
So we have 4 x values and 3 datasets of 4 y values each.
One way to do this is to concatenate the datasets and use an adjusted fit function.
In the following example, this happens inside the new fit function g via np.concatenate. The individual fits for each dataset are also done, so we can compare their graphs to the concatenated fit with the shared parameter.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# Create example datasets
x = np.arange(4)
y = np.array([x + a + np.random.normal(0, 0.5, len(x)) for a in range(3)])
print("x =", x)
print("y =", y)

# Individual fits to each dataset
def f(x, a, b):
    return a * x + b

for y_i in y:
    (a, b), _ = curve_fit(f, x, y_i)
    plt.plot(x, f(x, a, b), label=f"{a:.1f}x{b:+.1f}")
    plt.plot(x, y_i, linestyle="", marker="x", color=plt.gca().lines[-1].get_color())
plt.legend()
plt.show()

# Fit to concatenated dataset with shared parameter
def g(x, a, b_1, b_2, b_3):
    return np.concatenate((f(x, a, b_1), f(x, a, b_2), f(x, a, b_3)))

(a, *b), _ = curve_fit(g, x, y.ravel())
for b_i, y_i in zip(b, y):
    plt.plot(x, f(x, a, b_i), label=f"{a:.1f}x{b_i:+.1f}")
    plt.plot(x, y_i, linestyle="", marker="x", color=plt.gca().lines[-1].get_color())
plt.legend()
plt.show()
Output:
x = [0 1 2 3]
y = [[0.40162683 0.65320576 1.92549698 2.9759299 ]
[1.15804251 1.69973973 3.24986941 3.25735249]
[1.97214167 2.60206217 3.93789235 6.04590999]]
[Plot: individual fits with three different values for a]
[Plot: fit with shared parameter a]
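If the number of datasets is not fixed at three, the same concatenation trick generalizes with a variadic model; a small sketch reusing f, x, y from the example above (this extension is mine, not part of the original answer):

def g(x, a, *bs):
    return np.concatenate([f(x, a, b_i) for b_i in bs])

# with *args, curve_fit needs p0 to know how many parameters there are
p0 = np.ones(1 + len(y))  # one shared a plus one b per dataset
(a, *b), _ = curve_fit(g, x, y.ravel(), p0=p0)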
