Numdifftools to calculate lmfit fit uncertainties

Numdifftools to calculate lmfit fit uncertainties - python

I'm using lmfit to estimate the parameters of a coupled ODE system, based on thus example: https://people.duke.edu/~ccc14/sta-663/CalibratingODEs.html.
In order to obtain a global minimal of the residual I switched to use either the "basinhopping" or "ampgo" methods, but I get such warning when displaying the results:
Warning: uncertainties could not be estimated:
this fitting method does not natively calculate uncertainties
and numdifftools is not installed for lmfit to do this. Use
`pip install numdifftools` for lmfit to estimate uncertainties
with this fitting method.
I have installed "numdifftools" via conda, but the warning (and the lack of uncertainties) persists.
How can I solve this?
Here is the code with minimal data:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from lmfit import minimize, Parameters, Parameter, report_fit
from scipy.integrate import odeint
def f(xs, t, ps):
Ksor = ps['Ksor'].value
Kdes = ps['Kdes'].value
Cw, Cs, SS = xs
return [-Ksor*SS*Cw+Kdes*Cs, Ksor*SS*Cw-Kdes*Cs,0]
def g(t, x0, ps):
"""
Solution to the ODE x'(t) = f(t,x,k) with initial condition x(0) = x0
"""
x = odeint(f, x0, t, args=(ps,))
return x
def residual(ps, ts, data):
x0 = ps['Cw0'].value, ps['Cs0'].value, ps['SS0'].value
model = g(ts, x0, ps)
return (model - data).ravel()
data1=np.array([[100. , 0. , 1. ],
[ 66.5507, 33.4493, 1. ],
[ 44.4018, 55.5982, 1. ],
[ 29.7357, 70.2643, 1. ]])
t = pd.Series([0.408,0.816,1.224,1.632])
x0 = np.array([100,0,1])
# set parameters incluing bounds
params = Parameters()
params.add('Cw0', value=100, vary=False)
params.add('Cs0', value=0, vary=False)
params.add('SS0', value=1, vary=False)
params.add('Ksor', value=2.0, min=0, max=100)
params.add('Kdes', value=1.0, min=0, max=100)
# fit model and find predicted values
result = minimize(residual, params, args=(t, data1), method='basinhopping')
final = data1 + result.residual.reshape(data1.shape)
# plot data and fitted curves
plt.plot(t, data1, 'o')
plt.plot(t, final, '-', linewidth=2);
# display fitted statistics
report_fit(result)
EDIT: the code works. I think that the installation of numdifftools wasn't detected and a restart of the PC solved the issue.

It is always helpful to provide a minimal but complete example that shows the problem and the result (including fit report and/or any exceptions) that you get. Please modify the question to include these and give the version of lmfit and numdifftools you are using.
Also: The use of numdifftools to calculate the uncertainties requires the length of the residual array to be larger than the number of variables.

Related

Displaying chi squared as the uncertainty of fit parameters in scipy.optimize

I am doing some curve fitting in python with the aid of scipy.optimize curve_fit. Normally I am satisfied with the scipy's default results.
However this time I would like to display the function with chi_squared as the uncertainty of my fit parameters and I don't know how to deal with this.
I have tried to use the parameter absolute_sigma=True instead of the default absolute_sigma=False. However according to my separate calculations, for absolute_sigma=False the uncertainty of the parameters, is equal to reduced_chi_squared. But it isn't chi_squared for absolute_sigma=True.
Within the documentation itself: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html chi_squared is mentioned few time, however it's not written explicitly how to display it within the plot and how to use it as the uncertainties for of the fit parameters.
My code is as follows:
# Necessary libraries
# Numpy will be used for the actual function (in this example it's not necessary)
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from uncertainties import *
#Some shortened data
Wave_lengths_500_est = [380, 447.1, 486, 492.2, 656, 700, 706.5]
Avraged_deflection_angles_500 = [11.965, 12.8, 13.93, 14.325, 19.37, 18.26, 18.335]
#The function to which the data are to be fitted
def lin_function(x,a,b):
return a*x + b
def line_fit_lamp_1_500():
#Niepewność kątowa na poziomie 2 minut kątowych. Wyrażona w stopniach
angle_error = 0.03333
#Dodanie punktów danych na wykresie
plt.scatter(Wave_lengths_500_est, Avraged_deflection_angles_500, color='b')
plt.errorbar(Wave_lengths_500_est, Avraged_deflection_angles_500, yerr = angle_error, fmt = 'o')
#Fitting of the function "function_lamp1_500" to the data points
Wave_length = np.array(Wave_lengths_500_est)
Defletion_angle = np.array(Avraged_deflection_angles_500)
popt, pcov = curve_fit(lin_function, Wave_length, Defletion_angle, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
y = lin_function(Wave_length, *popt)
# Graph looks
plt.plot(Wave_length, y, '--', color = 'g', label="fit with: $a={:.3f}\pm{:.5f}$, $b={:.3f}\pm{:.5f}$" .format(popt[0], perr[0], popt[1], perr[1]))
plt.legend()
plt.xlabel('Długość fali [nm]')
plt.ylabel('Kąt załamania [Stopnie]')
plt.show()
# Function call
line_fit_lamp_1_500()
Toggling between absolute_sigma=True/False I get a change of uncertainty of the parameter a\pm0.00308 and b\pm1.74571 for absolute_sigma=True to a\pm0.022 and b\pm1.41679 for absolute_sigma=False. Versus an expected value of a\pm0.0001027 and b\pm0.058132 for chi_squared and a\pm0.002503 and b\pm1.4167 for reduced_chi_squared
Additionally could you please elaborate what does the expression .format(popt[0], perr[0], popt[1], perr[1]) do?
All help is appreciated, thanks in advance.

Solving coupled differential equations with sympy

I am trying to solve the following system of first order coupled differential equations:
- dR/dt=k2*Y(t)-k1*R(t)*L(t)+k4*X(t)-k3*R(t)*I(t)
- dL/dt=k2*Y(t)-k1*R(t)*L(t)
- dI/dt=k4*X(t)-k3*R(t)*I(t)
- dX/dt=k1*R(t)*L(t)-k2*Y(t)
- dY/dt=k3*R(t)*I(t)-k4*X(t)
The knowed initial conditions are: X(0)=0, Y(0)=0
The knowed constants values are: k1=6500, k2=0.9
This equations defines a kinetick model and I need to solve them to get the Y(t) function to fit my data and find k3 and k4 values. In order to that, I have tried to solve the system simbologically with sympy. There is my code:
import matplotlib.pyplot as plt
import numpy as np
import sympy
from sympy.solvers.ode.systems import dsolve_system
from scipy.integrate import solve_ivp
from scipy.integrate import odeint
k1 = sympy.Symbol('k1', real=True, positive=True)
k2 = sympy.Symbol('k2', real=True, positive=True)
k3 = sympy.Symbol('k3', real=True, positive=True)
k4 = sympy.Symbol('k4', real=True, positive=True)
t = sympy.Symbol('t',real=True, positive=True)
L = sympy.Function('L')
R = sympy.Function('R')
I = sympy.Function('I')
Y = sympy.Function('Y')
X = sympy.Function('X')
f1=k2*Y(t)-k1*R(t)*L(t)+k4*X(t)-k3*R(t)*I(t)
f2=k2*Y(t)-k1*R(t)*L(t)
f3=k4*X(t)-k3*R(t)*I(t)
f4=-f2
f5=-f3
eq1=sympy.Eq(sympy.Derivative(R(t),t),f1)
eq2=sympy.Eq(sympy.Derivative(L(t),t),f2)
eq3=sympy.Eq(sympy.Derivative(I(t),t),f3)
eq4=sympy.Eq(sympy.Derivative(Y(t),t),f4)
eq5=sympy.Eq(sympy.Derivative(X(t),t),f5)
Sys=(eq1,eq2,eq3,eq4,eq5])
solsys=dsolve_system(eqs=Sys,funcs=[X(t),Y(t),R(t),L(t),I(t)], t=t, ics={Y(0):0, X(0):0})
There is the answer:
NotImplementedError:
The system of ODEs passed cannot be solved by dsolve_system.
I have tried with dsolve too, but I get the same.
Is there any other solver I can use or some way of doing this that will allow me to get the function for the fitting? I'm using python 3.8 in Spider with Anaconda in windows64.
Thank you!
# Update
Following
You are saying "experiment". So you have data and want to fit the model to them, find appropriate values for k3 and k4 at least, and perhaps for all coefficients and the initial conditions (the first measured data point might not be the initial condition for the best fit)? See stackoverflow.com/questions/71722061/… for a recent attempt on such a task. –
Lutz Lehmann
23 hours
There is my new code:
t=[0,0.25,0.5,0.75,1.5,2.27,3.05,3.82,4.6,5.37,6.15,6.92,7.7,8.47,13.42,18.42,23.42,28.42,33.42,38.42,43.42,48.42,53.42,58.42,63.42,68.42,83.4,98.4,113.4,128.4,143.4,158.4,173.4,188.4,203.4,218.4,233.4,248.4]
yexp=[832.49,1028.01,1098.12,1190.08,1188.97,1377.09,1407.47,1529.35,1431.72,1556.66,1634.59,1679.09,1692.05,1681.89,1621.88,1716.77,1717.91,1686.7,1753.5,1722.98,1630.14,1724.16,1670.45,1677.16,1614.98,1671.16,1654.03,1661.84,1675.31,1626.76,1638.29,1614.41,1594.31,1584.73,1599.22,1587.85,1567.74,1602.69]
def derivative(S, t, k3, k4):
k1=1798931
k2=0.2629
x, y,r,l,i = S
doty = k1*r*l+k2*y
dotx = k3*r*i-k4*x
dotr = k2*y-k1*r*l+k4*x-k3*r*i
dotl = k2*y-k1*r*l
doti = k4*x-k3*r*i
return np.array([doty, dotx, dotr, dotl, doti])
def solver(XY,t,para):
return odeint(derivative, XY, t, args = para, atol=1e-8, rtol=1e-11)
def integration(XY_arr,*para):
XY0 = para[:5]
para = para[5:]
T = np.arange(len(XY_arr))
res0 = solver(XY0,T, para)
res1 = [ solver(XY0,[t,t+1],para)[-1]
for t,XY in enumerate(XY_arr[:-1]) ]
return np.concatenate([res0,res1]).flatten()
XData =yexp
YData = np.concatenate([ yexp,yexp,yexp,yexp,yexp,yexp[1:],yexp[1:],yexp[1:],yexp[1:],yexp[1:]]).flatten()
p0 =[0,0,100,10,10,1e8,0.01]
params, info = curve_fit(integration,XData,YData,p0=p0, maxfev=5000)
XY0, para = params[:5], params[5:]
print(XY0,tuple(para))
t_plot = np.linspace(0,len(t),500)
x_plot = solver(XY0, t_plot, tuple(para))
But the output are not correct, as are the same as initial condition p0:
[ 0. 0. 100. 10. 10.] (100000000.0, 0.01)
Graph
I understand that the function 'integration' gives packed values of y for each function at each instant of time, but I don't know how to unpack them to make the curve_fitt separately. Maybe I don't quite understand how it works.
Thank you!

As you observed, sympy is not able to solve this system. This might mean that
the procedure to classify ODE in sympy is not complete enough, or
some trick/method is needed above the standard set of methods that is implemented in sympy, or
that there just is no symbolic solution.
The last case is the generic one, take a symbolically solvable ODE, add some random term, and almost certainly the resulting ODE is no longer symbolically solvable.
As I understand with the comments, you have an model via ODE system with state space (cX,cY,cR,cL,cI) with equations with 4 parameters k1,k2,k3,k4 and, by the structure of a reaction system R+I <-> X, R+L <-> Y, the sums cR+cX+cY, cL+cY, cI+cX are all constant.
For some other process that is approximately represented by the model, you have time series data t[k],y[k] for the Y component. Also you have partial information on the initial state and the parameter set. If there are sufficiently many data points one could also forget about these, fit for all parameters, and compare how far away the given parameters are to the computed ones.
There are several modules and packages that solve this fitting task in a more or less abstract fashion. I think pyomo and gekko can both be used. More directly one can use the facilities of scipy.odr or scipy.optimize.
Define the forward function that transforms time and parameters
def model(t,u,k1,k2,k3,k4):
X,Y,R,L,I = u
dL = k2*Y - k1*R*L
dI = k4*X - k3*R*I
dR = dL+dI
dX = -dI
dY = -dL
return dX,dY,dR,dL,dI
def solver(t,u0,k):
res = solve_ivp(model, [0, t[-1]], u0, args=tuple(k), t_eval=t,
method="DOP853", atol=1e-7, rtol=1e-10)
return res.y
Prepare some data plus noise
k1o = 6.500; k2o=0.9
T = np.linspace(0,0.05,21)
U = solver(T, [0,0,50,40,25], [k1o, k2o, 5.400, 0.7])
Y = U[1] # equilibrium slightly above 30
Y += np.random.uniform(high=0.05, size=Y.shape)
Prepare the function that splits the combined parameter vector in initial state and coefficients, call the curve fitting function
from scipy.optimize import curve_fit
def partial(t,R,L,I,k3,k4):
print(R,L,I,k3,k4)
U = solver(t,[0,0,R,L,I],[k1o,k2o,k3,k4])
return U[1]
params, info = curve_fit(partial,T,Y, p0=[30,20,10, 0.3,3.000])
R,L,I, k3,k4 = params
print(R,L,I, k3,k4)
It turns out that curve_fit goes into strange regions with large negative values. A likely reason is that the Y component is, in the end, not coupled strongly enough to all the other components, meaning that large changes in some of the parameters have minimal influence on Y, so that minimal noise in Y can lead to large deviations in these parameters. Here this apparently happens (first) to k3.

How do you get scipy curve_fit to find a reasonable result if you don't have a good initial parameter guess?

I am trying to get a simple fit to my data of an exponential decay of the form a*(x-x0)**b, where I know a and b must be negative. So if you plot it on a log-log plot, I should see a linear trend for the obtained data.
As such, I'm giving scipy.optimize initial guesses where a and b are negative, but it keeps ignoring them and giving me the error,
OptimizeWarning: Covariance of the parameters could not be estimated
.. and giving me values for a and b that are positive. It then also does not give me an exponential decay, but a parabola that bottoms out and begins to increase.
I have tried many different guesses as to the initial parameters over a large range of values (one such is in the code below), but none worked without giving me the nonsensical return and the error. This has made me start to wonder if my code is wrong, or if there's just some obvious way to get good initial guesses into the code that won't be rejected.
import math
import numpy as np
import sys
import matplotlib.pyplot as plt
import scipy as sp
import scipy.optimize
from scipy.optimize import curve_fit
import numpy.polynomial.polynomial as poly
x= [1987, 1993.85, 2003, 2010.45, 2009.3, 2019.4]
t= [31, 8.6, 4.84, 1.96, 3.9, 1.875]
def model_func(x, a, b, x0):
return (a*(x-x0)**b)
# curve fit
p0 = (-.0005,-.0005,100)
opt, pcov = curve_fit(model_func, x, t,p0)
a, b, x0 = opt
# test result
x2 = np.linspace(1980, 2020, 100)
y2 = model_func(x2, a, b,x0)
coefs, cov = poly.polyfit(x, t, 2,full=True)
ffit = poly.polyval(x2, coefs)
plt.loglog(x,t,'.')
plt.loglog(x2, ffit,'--', color="#1f77b4")
print('S = (',coefs[0],'*(t-',coefs[2],')^',coefs[1])

Lognorm distribution fitting

I am trying to do a lognorm distribution fit but the resulting paramter seem a bit odd. Could you please show me my mistake or explain to me if I am misinterpreting the parameters.
import numpy as np
import scipy.stats as st
data = np.array([1050000, 1100000, 1230000, 1300000, 1450000, 1459785, 1654000, 1888000])
s, loc, scale = st.lognorm.fit(data)
#calculating the mean
lognorm_mean = st.lognorm.mean(s = s, loc = loc, scale = scale)
The resulting mean is: 945853602904015.8.
But this doesn't make any sense.
The mean should be:
data_ln = np.log(data)
ln_mean = np.mean(data_ln)
ln_std = np.std(data_ln)
mean = np.exp(ln_mean + np.power(ln_std, 2)/2)
Here the resulting mean is 1391226.31. This should be correct.
Can you please help me with this topic?
Best regards
Norbi

I think you can tune the parameters of the minimizer to get an acceptable result:
import numpy as np
import scipy.stats as st
from scipy.optimize import minimize
data = np.array([1050000, 1100000, 1230000, 1300000,
1450000, 1459785, 1654000, 1888000])
def opti_wrap(fun, x0, args, disp=0, **kwargs):
return minimize(fun, x0, args=args, method='SLSQP',
tol=1e-12, options={'maxiter': 1000}).x
s, loc, scale = st.lognorm.fit(data, optimizer=opti_wrap)
lognorm_mean = st.lognorm.mean(s=s, loc=loc, scale=scale)
print(lognorm_mean) # should give 1392684.4350
The reason you are seeing a strange result is due to the default minimizer failing to converge on the maximum likelihood result. This could be due to a mis-behaving cost function with so few data points (you are trying to fit 3 params but only have 8 data points...). Note: I'm using scipy version 1.1.0.

Non-linear fit in Python 3.3

I would like to know how to do a non-linear fit in Python 3.3. I am not finding any easy examples online. I am not well aware of these fitting techniques.
Any help will be welcome!
Thanks in advance.

To follow an easy example, visit http://www.walkingrandomly.com/?p=5215
Here is the code with explanations!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
xdata = np.array([-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9])
ydata = np.array([0.69,0.70,0.69,1.0,1.9,2.4,1.9,0.9,-0.7,-1.4])
def func(x, p1,p2):
return p1*np.cos(p2*x) + p2*np.sin(p1*x)
# Here you give the initial parameters for p0 which Python then iterates over
# to find the best fit
popt, pcov = curve_fit(func,xdata,ydata,p0=(1.0,0.3))
print(popt) # This contains your two best fit parameters
# Performing sum of squares
p1 = popt[0]
p2 = popt[1]
residuals = ydata - func(xdata,p1,p2)
fres = sum(residuals**2)
print(fres)
xaxis = np.linspace(-2,3,100) # we can plot with xdata, but fit will not look good
curve_y = func(xaxis,p1,p2)
plt.plot(xdata,ydata,'*')
plt.plot(xaxis,curve_y,'-')
plt.show()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Numdifftools to calculate lmfit fit uncertainties - python

Related

Displaying chi squared as the uncertainty of fit parameters in scipy.optimize

Solving coupled differential equations with sympy

How do you get scipy curve_fit to find a reasonable result if you don't have a good initial parameter guess?

Lognorm distribution fitting

Non-linear fit in Python 3.3

Categories

Resources