Can't get the fit with lmfit - python

I want to do a fit using lmfit but I am having some issues. Here is my code:
from lmfit import Model
import numpy as np
def fit_func(x,a,b,c):
    return a*(b-x)**(5/8)+c
x = np.array([ 131.871 , 218.825 , 305.046 , 390.533 ,
475.128 , 558.959 , 642.001 , 724.307 ,
805.794 , 886.422 , 966.20900001, 1045.19300001,
1123.39300001, 1200.75800001, 1277.23700001, 1352.83300001,
1427.57800001, 1501.49800001, 1574.55300001, 1646.69500001,
1717.90800001, 1788.22100001, 1857.65100001, 1926.18300001,
1993.76400001, 2060.37000001, 2126.00900001, 2190.70600001,
2254.44800001, 2317.20000001, 2378.92000001, 2439.60300001,
2499.25800001, 2557.89000001, 2615.46600001, 2671.95000001,
2727.30900001, 2781.54300001, 2834.64700001, 2886.60600001,
2937.38000001, 2986.92900001])
y = np.array([ 0. , 3.14159265, 6.28318531, 9.42477796,
12.56637061, 15.70796327, 18.84955592, 21.99114858,
25.13274123, 28.27433388, 31.41592654, 34.55751919,
37.69911184, 40.8407045 , 43.98229715, 47.1238898 ,
50.26548246, 53.40707511, 56.54866776, 59.69026042,
62.83185307, 65.97344573, 69.11503838, 72.25663103,
75.39822369, 78.53981634, 81.68140899, 84.82300165,
87.9645943 , 91.10618695, 94.24777961, 97.38937226,
100.53096491, 103.67255757, 106.81415022, 109.95574288,
113.09733553, 116.23892818, 119.38052084, 122.52211349,
125.66370614, 128.8052988 ])
fit_model = Model(fit_func)
params = fit_model.make_params()
params['b'].set(5000, min=3500)
result = fit_model.fit(y, x=x)
But I am getting this error:
ValueError: The model function generated NaN values and the fit aborted! Please check your model function and/or set boundaries on parameters where applicable. In cases like this, using "nan_policy='omit'" will probably not work.
What am I doing wrong? I tried to adjust the a, b, c parameters by hand and a=-1.2, b=3600, c=196 give a pretty good fit, so the program should be able to find something similar to that.

Two things are missing:
a) You need to pass params to fit_model.fit(), as in
result = fit_model.fit(y, params, x=x)
b) You need to give initial values for all parameters. Uninitialized parameters have a value of -np.inf, which is deliberately chosen because it will trigger such errors.
You say you know reasonable values for a, b, and c. Use that knowledge! Something like
fit_model = Model(fit_func)
params = fit_model.make_params(a=-1, b=4000, c=200)
params['b'].min = x.max() * (1.000001) # prevent (negative number)**fraction
result = fit_model.fit(y, params, x=x)
print(result.fit_report())
should work.
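To see why the lower bound on b matters, here is a minimal sketch using only NumPy. It evaluates the model from the question at three of the posted x values; the trial value b=2000 is an assumption picked purely to trigger the problem, while a=-1.2, b=3600, c=196 are the hand-tuned values quoted in the question.

```python
import numpy as np

def fit_func(x, a, b, c):
    return a * (b - x) ** (5 / 8) + c

# Three of the x values from the question (first, middle, last).
x = np.array([131.871, 1501.498, 2986.929])

# b = 2000 is a made-up trial value below x.max(): the base (b - x)
# becomes negative for the last point, and a negative float raised to a
# fractional power is NaN in NumPy.
with np.errstate(invalid="ignore"):
    bad = fit_func(x, -1.2, 2000.0, 196.0)

# With b above x.max(), every base is positive and the model is finite.
good = fit_func(x, -1.2, 3600.0, 196.0)

print(np.isnan(bad))           # which points turned into NaN
print(np.isfinite(good).all())
```

Any trial value of b that the optimizer picks below x.max() produces NaN in this way, which is why the bound is set just above the largest x.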

Related

Simultaneous Fit of Two ODE's to data in Scipy

NOTE: Fit differential equation with scipy has been tried and a few other answers also. None seem to work.
I have two data sets d1 & d2 which I am trying to fit with two coupled ODE's (solver given below). d1 and d2 correspond to lab data for an experiment control & treatment.
When I do the fit for the first ODE (only control case), it works fine and I get the desired parameters. When I do the same for the treatment case, using the fit parameters obtained from the control case, my code doesn't seem to optimize anything whatsoever.
import numpy as np
from scipy.integrate import odeint
import scipy.optimize
d1 = [113.75981939, 224.732254 , 437.00727486, 533.3249591 ,900.19498288, 1460.34662166, 2276.34857406, 3288.90246842,3888.70188293, 5102.45452895]
d2 = [118.69478959, 201.30146742, 287.50835473, 437.70461121,511.9610845, 982.88626039, 1115.37610645, 1235.95872766,1622.57717685, 1776.95184626]
time = [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
'''
#ODE's
dS/dt = r*S*(1-(S/K))-(kappa*S*T)
dT/dt = a*S-d*T
'''
#Curve Fit Section
#First get r,K from control data, i.e, case where kappa = 0
def func(y,t,r,K):
    S = y
    dydt = r*S*(1-(S/K))
    return dydt
y0 = [100]
t = time
guess = [0.3,5000] #[guess_r,guess_K]
def fit(params):
    r,K = params
    test_solve = odeint(func,y0,t,args=(r,K))
    return np.linalg.norm(test_solve[:,0]-d1)
res = scipy.optimize.minimize(fit,np.array(guess))
r,K = res.x #Returns the r, K parameter values that fit the control data perfectly.
#Curve fit for treatment case
#ODE Solver for Control and Treatment Model
def func(y,t,r,K,a,kappa,d):
    S,T = y
    dydt = [r*S*(1-(S/K))-(kappa*S*T),a*S-d*T]
    return dydt
y0 = [100,0]
t = time
guess_t = [r,K,2,3,0.01]
#Fitting the experimental data set
def fit(params):
    r,K,a,d,kappa = params
    test_solve = odeint(func,y0,t,args=(r,K,a,d,kappa))
    return np.linalg.norm(test_solve[:,0]-d2)
res2 = scipy.optimize.minimize(fit,np.array(guess_t))
Result: res2.x is the same as the guess_t array, i.e. no change and no fit.
When I try the second fit after using the parameters obtained from the first, I get no meaningful result. It doesn't work. What am I doing wrong here?
EDIT: The parameter values one gets (from a Matlab fit) are r=0.2629, K=7625.2, a=7.845, d=189.49, kappa=0.0026. The code above returns very similar values for r and K, but not for the other three parameters (a, kappa, d). I am not sure what is happening.
EDIT 2: I keep getting this error. Any idea what is going wrong?
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/scipy/integrate/odepack.py:247: ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
warnings.warn(warning_msg, ODEintWarning)
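One detail in the listing as posted may be worth checking (this is an observation from the code, not a confirmed diagnosis): the treatment model is declared as func(y,t,r,K,a,kappa,d), but the solver is called with args=(r,K,a,d,kappa). Positional arguments bind by position, so the value meant for d ends up in the kappa slot and vice versa. The sketch below uses made-up parameter values and a hypothetical name rhs for the right-hand side, only to show that the two orderings evaluate differently.

```python
def rhs(y, t, r, K, a, kappa, d):
    # Same right-hand side and argument order as the treatment model in the question.
    S, T = y
    return [r * S * (1 - S / K) - kappa * S * T, a * S - d * T]

# Hypothetical numbers, chosen only to make the difference visible.
r, K, a, kappa, d = 0.26, 7600.0, 2.0, 0.01, 3.0
state = [100.0, 10.0]

# Order used in the question's odeint call: args=(r, K, a, d, kappa).
as_posted = rhs(state, 0.0, r, K, a, d, kappa)

# Order that matches the function signature: (r, K, a, kappa, d).
as_declared = rhs(state, 0.0, r, K, a, kappa, d)

print(as_posted)
print(as_declared)
```

If the swap is unintended, unpacking and passing the parameters in the same order as the signature would remove one source of confusion before looking at the optimizer itself.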

scipy.odr fails in fitting exponential function

I'm working on an astrophysics project where I need to measure the density (ne) of the gas in the center of the galaxy by two methods (A and S). I made a plot of ne_s vs. ne_a and I want to try an exponential fit to this plot. The problems are the following:
The errors in the data are asymmetrical and, apparently, scipy.odr does not accept this type of error. When the errors are included, 'ValueError: could not convert we to a suitable array' is raised.
Even if I do not include the errors, the fit still does not work.
The code used (errors in the fit not included):
import numpy as np
import matplotlib.pyplot as plt
ne_s = np.array([ 134.70722125, 316.27850769, 403.37221974, 579.91067991,
1103.06258335, 1147.23685549, 115.00820933, 476.42659337,
667.61690967, 403.30988606, 282.08007264, 479.98058352,
897.64247885, 214.75999934, 213.22512064, 491.81749573,
743.68513419, 374.37957281, 362.136037 , 893.88595455])
dne_s_max = np.array([23.6619623 , 5.85802097, 12.02456923, 1.50211648, 5.15987014,
10.3830146 , 10.5274528 , 0.82928872, 2.18586603, 31.95014727,
6.53134179, 2.38392559, 32.2838402 , 5.43629034, 1.02316579,
6.60281602, 14.53943481, 9.16809221, 6.84052648, 12.87655997])
dne_s_min = np.array([21.94513608, 5.80578938, 11.8303456 , 1.49856527, 5.1265976 ,
10.2523836 , 10.12663739, 0.82824884, 2.17914616, 30.55846643,
6.45691351, 2.37446669, 30.87025015, 5.37271061, 1.02087355,
6.5358395 , 14.21332643, 9.0523711 , 6.77187898, 12.64596461])
ne_a = np.array([ 890.61498788, 2872.03715706, 10222.33463389, 1946.48193766,
6695.25304235, 2107.36471192, 891.72010662, 3988.87511761,
11328.9670489 , 1097.38904905, 2896.62668843, 4849.57809801,
5615.96780935, 1415.18564794, 1204.00022768, 3616.05423907,
15638.52683391, 3300.6039601 , 775.28841051, 12325.54379524])
dne_a_max = np.array([1082.33639266, 571.57094375, 2396.39839075, 458.32058555,
796.79916236, 665.95370946, 2262.73423374, 1006.65192577,
1761.9251987 , 1718.78400914, 579.65477159, 245.54811362,
1652.50314639, 401.37677822, 178.03620792, 725.26490794,
6625.62353545, 908.21490446, 719.01117673, 2098.24809312])
dne_a_min = np.array([ 865.33019015, 518.08880981, 1877.85283954, 412.91242092,
724.38681574, 582.52644162, 870.14392196, 866.63643893,
1478.1792513 , 1076.64135559, 521.08794554, 236.2457763 ,
1349.36104495, 362.72343267, 169.23314057, 646.39803115,
4139.5768453 , 789.04878324, 620.55523654, 1720.06369942])
dne_a = [dne_a_min, dne_a_max]
dne_s = [dne_s_min, dne_s_max]
fig, ax = plt.subplots(1,1)
ax.errorbar(ne_s, ne_a, xerr = dne_s, yerr = dne_a,
linestyle = 'none', linewidth = 0.7, capsize = 5, color = 'crimson')
ax.scatter(ne_s, ne_a, s = 15, color = 'black')
ax.set_ylabel('$n_e(A)$'), ax.set_xlabel('$n_e(S)$')
from scipy.odr import Data, RealData, Model, ODR
def f(B, x):
    return B[0] + B[1] * np.exp(B[2] * x)
exponential = Model(f)
data = RealData(ne_s, ne_a)
odr = ODR(data, exponential, beta0=[1, 200, 3e-3])
out = odr.run()
ax.plot(ne_s, f(out.beta, ne_s), linewidth = 0.7)
Which results in:
And the actual plot is:
So what am I missing here? Did I apply the odr routine incorrectly? What should I do to make the fit work properly? And how can I make scipy.odr accept asymmetrical errors?
Important to add that I don't know too much about scipy.odr, I just adapted the documentation example to an exponential function.
Appreciate the help. Let me know if more information is necessary.
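One possible workaround, sketched here under stated assumptions rather than as the definitive answer: RealData takes one standard deviation per point through sx and sy, so asymmetric bars can be collapsed to a single width, for example by averaging the lower and upper half-widths. The data below are made up for illustration (they are not the values from the question), and averaging is only one of several ways to symmetrize. The sketch also evaluates the fitted curve on sorted x values, since the question plots against ne_s in its original, unsorted order, which makes a line plot zigzag.

```python
import numpy as np
from scipy.odr import RealData, Model, ODR

def f(B, x):
    return B[0] + B[1] * np.exp(B[2] * x)

# Made-up illustration data (NOT the values from the question).
rng = np.random.default_rng(0)
x = np.linspace(100.0, 1100.0, 20)
y = f([500.0, 200.0, 3e-3], x) * (1 + 0.05 * rng.standard_normal(x.size))

# Hypothetical asymmetric error bars: lower and upper half-widths differ.
dx_min, dx_max = 0.02 * x, 0.03 * x
dy_min, dy_max = 0.04 * y, 0.06 * y

# Symmetrize by averaging, so that sx and sy are 1-D arrays of the same
# length as x and y.
sx = 0.5 * (dx_min + dx_max)
sy = 0.5 * (dy_min + dy_max)

data = RealData(x, y, sx=sx, sy=sy)
out = ODR(data, Model(f), beta0=[1.0, 200.0, 3e-3]).run()

# Evaluate on sorted x so that a line plot of the fit does not zigzag.
xs = np.sort(x)
curve = f(out.beta, xs)
print(out.beta)
```

Whether the fit converges to sensible values still depends on the starting guess beta0, so it is worth inspecting out.stopreason and the fitted curve before trusting the parameters.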

curve_fit making called func raise an IndexError

I'm trying to fit a parameter eta_H in function TGp_xx to some data (x_data, data_num_xx) using curve_fit. The code below is a reduced version of what I'm using and it won't work by itself, but I hope the issue is conceptual enough to be understandable even from this:
import numpy as np
from scipy.optimize import curve_fit
Lx = 150
y_cut = 20
data = np.loadtxt("../dump/results.dat")
ux = data[:,3]
ux = np.reshape(ux , (Ly, Lx))
def Par_x(x,y,vec):
    fdx = vec[(x+1)%Lx , y]
    fsx = vec[(x-1+Lx)%Lx , y]
    return (fdx - fsx) / 2.0
def TGp_xx(x, eta_H): return 2*eta_H*Par_x(x,y_cut,ux)
x_data = np.arange(Lx, dtype=int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
data_num_xx = np.empty(Lx, dtype='float64') #this is just a placeholder
popt_xx, pcov_xx = curve_fit(TGp_xx, x_data, data_num_xx)
I get an IndexError raised within Par_x:
fdx = vec[(x+1)%Lx , y]
IndexError: arrays used as indices must be of integer (or boolean) type
I tried something simpler, like calling TGp_xx(x_data, some_constant) outside curve_fit, and it works. I don't really get why I get the IndexError inside curve_fit, as if I were passing a float value (or an array of floats) as x, which can't be used as an index.
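A likely explanation, hedged because it depends on SciPy internals: curve_fit converts xdata to a floating-point array before handing it to the model function, so x arrives as float64 even though x_data was created as integers. Casting back to int inside the function that does the indexing avoids the IndexError. The sketch below is self-contained and uses assumptions that are not in the question: a small random ux of shape (Lx, Ly), lower-case function names, and synthetic target data generated with eta_H = 0.7.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical sizes and field; the question loads ux from a file instead.
Lx, Ly, y_cut = 16, 8, 3
rng = np.random.default_rng(0)
ux = rng.normal(size=(Lx, Ly))  # indexed as vec[x, y] in this sketch

def par_x(x, y, vec):
    # x may arrive as float64 from curve_fit, so cast it back to integers
    # before using it as an index.
    x = np.asarray(x).astype(int)
    fdx = vec[(x + 1) % Lx, y]
    fsx = vec[(x - 1 + Lx) % Lx, y]
    return (fdx - fsx) / 2.0

def tgp_xx(x, eta_H):
    return 2 * eta_H * par_x(x, y_cut, ux)

x_data = np.arange(Lx)
y_data = tgp_xx(x_data, 0.7)  # synthetic target with a known eta_H

popt, pcov = curve_fit(tgp_xx, x_data, y_data)
print(popt)
```

The cast is exact here because the x values are whole numbers stored as floats; for genuinely non-integer x, indexing would not be meaningful in the first place.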

Lognormal Curve Fit

I have lognormal distributed data in x0 and y0 arrays:
x0.ravel() = array([19.8815 , 19.0141 , 18.1857 , 17.3943 , 16.6382 , 15.9158 ,
15.2254 , 14.5657 , 13.9352 , 13.3325 , 12.7564 , 12.2056 ,
11.679 , 11.1755 , 10.6941 , 10.2338 , 9.79353, 9.37249,
8.96979, 8.58462, 8.21619, 7.86376, 7.52662, 7.20409,
6.89552, 6.6003 , 6.31784, 6.04757, 5.78897, 5.54151,
5.30472, 5.07812, 4.86127, 4.65375, 4.45514, 4.26506,
4.08314, 3.90903, 3.74238, 3.58288, 3.4302 , 3.28407,
3.14419, 3.01029, 2.88212, 2.75943, 2.64198, 2.52955,
2.42192, 2.31889, 2.22026, 2.12583, 2.03543, 1.94889,
1.86604, 1.78671, 1.71077, 1.63807, 1.56845, 1.50181,
1.43801, 1.37691, 1.31842, 1.26242, 1.2088 , 1.15746,
1.10832, 1.06126, 1.01619])
y0.ravel() =array([1.01567e+03, 8.18397e+02, 7.31992e+02, 1.11397e+03, 2.39987e+03,
2.73762e+03, 4.65722e+03, 7.06308e+03, 9.67945e+03, 1.38983e+04,
1.98178e+04, 1.97461e+04, 3.28070e+04, 4.48814e+04, 5.80853e+04,
7.35511e+04, 8.94090e+04, 1.08274e+05, 1.28276e+05, 1.50281e+05,
1.69258e+05, 1.91944e+05, 2.16416e+05, 2.37259e+05, 2.57426e+05,
2.74818e+05, 2.90343e+05, 3.01369e+05, 3.09232e+05, 3.13713e+05,
3.17225e+05, 3.19177e+05, 3.17471e+05, 3.14415e+05, 3.08396e+05,
2.95692e+05, 2.76097e+05, 2.52075e+05, 2.29330e+05, 1.97843e+05,
1.74262e+05, 1.46360e+05, 1.20599e+05, 9.82223e+04, 7.80995e+04,
6.34618e+04, 4.77460e+04, 3.88737e+04, 3.23715e+04, 2.58129e+04,
2.15724e+04, 1.58737e+04, 1.13006e+04, 7.64983e+03, 4.64590e+03,
3.31463e+03, 2.40929e+03, 3.02183e+03, 1.47422e+03, 1.06046e+03,
1.34875e+03, 8.26674e+02, 9.53167e+02, 6.47428e+02, 9.83651e+02,
8.93673e+02, 1.23637e+03, 0.00000e+00, 8.36573e+01])
I want to use curve_fit to get a function that fits my data points, so that I can obtain the mu (and then exp(mu) for the median) and the sigma of this distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def f(x, mu, sigma):
    return 1/(np.sqrt(2*np.pi)*sigma*x)*np.exp(-((np.log(x)-mu)**2)/(2*sigma**2))
params, extras = curve_fit(f, x0.ravel(), y0.ravel())
print("mu=%g, sigma=%g" % (params[0], params[1]))
plt.plot(x0, y0, "o")
plt.plot(x0, f(x0 ,params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
The result is the following:
mu=1.47897, sigma=0.0315236
[Image: Curve_fit]
Obviously the function does not fit the data by any means.
When I multiply the fitting function by, let's say, 1.3*10^5 in the code:
plt.plot(x0, 1.3*10**5*f(x0 ,params[0], params[1]))
This is the result:
[Image: manually changed fitting curve]
The calculated mu value, which is the mean value of the related normal distribution, seems right, because when I use:
np.mean(np.log(x))
I get 1.4968838412183132, which is quite similar to the mu I obtain from curve_fit.
Calculating the median with
np.exp(np.mean(np.log(x)))
gives 4.4677451525990675, which seems to be OK.
But unless I see the fitting function going through my data points, I do not really trust these numbers. My problem is obviously that the fitting function has no information about the (big) y0 values. How can I change that?
Any help appreciated!

The problem is that your data do not(!) show a lognormal pdf, since they are not normalized properly. Note that the integral over a pdf has to be 1. If you numerically integrate your data and normalize by that, e.g.
y1 = y0/np.trapz(x0, y0)
your approach works fine.
params, extras = curve_fit(f, x0, y1)
plt.plot(x0, y1, "o")
plt.plot(x0, f(x0 ,params[0], params[1]))
plt.legend(["data", "fit"], loc="best")
plt.show()
and
print("mu=%g, sigma=%g" % (params[0], params[1]))
resulting in
mu=1.80045, sigma=0.372185
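A note on the normalization step, with a sketch on synthetic data: trapezoidal integration routines such as np.trapz (renamed np.trapezoid in NumPy 2.0) and scipy.integrate.trapezoid take the sample values first and the sample points second, and the x0 array in the question runs from large to small, so the sign of the integral depends on the ordering. Sorting by x and integrating y over x keeps the area positive. The data below are an assumption made for illustration: a lognormal pdf with mu=1.8 and sigma=0.37 scaled by an arbitrary amplitude, not the values from the question.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import curve_fit

def f(x, mu, sigma):
    return (1 / (np.sqrt(2 * np.pi) * sigma * x)
            * np.exp(-((np.log(x) - mu) ** 2) / (2 * sigma ** 2)))

# Synthetic stand-in for the question's data: a lognormal pdf scaled by a
# made-up amplitude, sampled on a decreasing x grid.
x_raw = np.geomspace(20.0, 1.0, 70)
y_raw = 1.0e6 * f(x_raw, 1.8, 0.37)

# Sort so that x increases, then integrate y over x (values first, points second).
order = np.argsort(x_raw)
xs, ys = x_raw[order], y_raw[order]
area = trapezoid(ys, xs)
y_norm = ys / area  # now integrates to 1 over the sampled range

# p0 is a rough starting guess, not a value taken from the question.
(mu_fit, sigma_fit), _ = curve_fit(f, xs, y_norm, p0=[1.5, 0.5])
print("mu=%g, sigma=%g" % (mu_fit, sigma_fit))
```

An alternative design is to leave the data unnormalized and add an amplitude as a third fit parameter; that avoids the numerical integration altogether at the cost of one more parameter.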

using python to do 3-D surface fitting

I can use scipy.optimize.least_squares to do 1-D curve fitting (of course, I could also use curve_fit directly), like this:
import numpy as np
from scipy.optimize import least_squares
def f(par,data,obs):
    return par[0]*data+par[1]-obs
def get_f(x,a,b):
    return x*a+b
data = np.linspace(0, 50, 100)
obs = get_f(data,3.2,2.3)
par = np.array([1.0, 1.0])
res_lsq = least_squares(f, par, args=(data, obs))
print(res_lsq.x)
This gives the right fitting parameters (3.2, 2.3), but when I generalize this method to multiple dimensions, like this:
def f(par,data,obs):
    return par[0]*data[0,:]+par[1]*data[1,:]-obs
def get_f(x,a,b):
    return x[0]*a+b*x[1]
data = np.asarray((np.linspace(0, 50, 100),(np.linspace(0, 50, 100)) ) )
obs = get_f(data,1.,1.)
par = np.array([3.0, 5.0])
res_lsq = least_squares(f, par, args=(data, obs))
print(res_lsq.x)
I find that I cannot get the right answer, i.e. (1., 1.). I have no idea whether I have made a mistake.
The way you generate data and observations in the "multi-dimensional" case effectively results in get_f returning (a+b)*x[0] (the input values x[0] and x[1] are always the same) and, similarly, f returning (par[0]+par[1])*data[0]-obs. Of course, with a=1 and b=1, the exact same obs would be generated by any other values a, b such that a+b=2. Scipy correctly returns one of the (infinitely many) possible values satisfying this constraint, depending on the initial estimate.
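The degeneracy can be checked directly. The following sketch repeats the fit with identical rows, where only the sum of the two parameters is determined, and then with a second row sampled independently, where both parameters are recovered. The function names residuals and model and the uniformly sampled second variable are assumptions introduced for this illustration.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(par, data, obs):
    return par[0] * data[0] + par[1] * data[1] - obs

def model(x, a, b):
    return a * x[0] + b * x[1]

start = np.array([3.0, 5.0])
same = np.linspace(0, 50, 100)

# Degenerate design as in the question: both rows are identical, so only
# the sum par[0] + par[1] is constrained by the data.
data_bad = np.vstack([same, same])
res_bad = least_squares(residuals, start, args=(data_bad, model(data_bad, 1.0, 1.0)))

# Hypothetical alternative: sample the second variable independently, so
# that the two columns of the design are no longer proportional.
rng = np.random.default_rng(0)
data_good = np.vstack([same, rng.uniform(0, 50, 100)])
res_good = least_squares(residuals, start, args=(data_good, model(data_good, 1.0, 1.0)))

print(res_bad.x, res_bad.x.sum())
print(res_good.x)
```

For a real surface fit, sampling the two inputs on a grid (for example with np.meshgrid and then flattening) serves the same purpose as the random second row used here.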
