Fitting a Gaussian curve - python

What is the problem in my code for fitting the curve?
I've written some code to fit my data to a Gaussian distribution. However, I get wrong values for a, b, and c, which are defined at the beginning of the code. Could you give me some advice on how to fix that problem?
from numpy import *
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def func(x, a, b, c):
    return a*exp(-(x-b)**2/(2*c**2))

file = loadtxt("angdist1.xvg", skiprows=18, dtype=float)
x = []
y = []
for i in range(shape(file)[0]):
    x.append(file[i,0])
    y.append(file[i,1])
# convert to arrays so func(x, *popt) works below
x = array(x)
y = array(y)

plt.plot(x, y)
popt, pcov = curve_fit(func, x, y)
plt.plot(x, func(x, *popt), color='red', linewidth=2)
plt.legend(['Original', 'fitting'], loc=0)
plt.show()

You did not provide initial guesses for your variables a, b, and c. scipy.optimize.curve_fit() will make the indefensible choice of silently assuming that you wanted initial values of a=b=c=1. Depending on your data, that could be so far off as to prevent the method from finding any solution at all.
The solution is to give initial values for the variables that are close. They don't have to be perfect. For example,
ainit = y.sum() # amplitude is within 10x of integral
binit = x.mean() # centroid is near mean x value
cinit = x.std() # standard deviation is near range of data
popt, pcov = curve_fit(func, x, y, [ainit, binit, cinit])
might give you a better result.
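As an aside, not part of the original answer: a minimal self-contained sketch of the whole workflow, using synthetic data in place of angdist1.xvg and the data-driven guesses suggested above.

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-(x - b)**2 / (2 * c**2))

# synthetic stand-in for the angdist1.xvg data
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = func(x, 2.0, 0.5, 0.8) + rng.normal(0, 0.05, x.size)

# rough, data-driven starting values; they only need to be in the right ballpark
p0 = [y.sum(), x.mean(), x.std()]
popt, pcov = curve_fit(func, x, y, p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
print(popt, perr)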

Related

exponential curve fit parameters in python do not make sense--fit itself looks great

I'm doing a curve fit in Python using scipy.optimize.curve_fit, and the fit itself looks great; however, the parameters that are generated don't make sense.
The equation is (ax)^b + cx, but with the params Python finds, a = -c and b = 1, so the whole equation just equals 0 for every value of x.
Here is the plot: https://i.stack.imgur.com/fBfg7.png
Here is the experimental raw data I used: https://pastebin.com/CR2BCJji
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# cfu_u and OD_u hold the experimental data from the pastebin link above
xdata = cfu_u
ydata = OD_u

min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10, np.arange(min_cfu, max_cfu, 0.1))

def func(x, a, b, c):
    return (a*x)**b + c*x

popt, pcov = curve_fit(func, xdata, ydata)

plt.plot(x_vec, func(x_vec, *popt), label='curve fit', color='slateblue', linewidth=2.2)
plt.plot(cfu_u, OD_u, '-', label='experimental data', marker='.', markersize=8, color='deepskyblue', linewidth=1.4)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel("Y", fontsize=12)
plt.xlabel("X", fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
I used the curve_fit function from scipy to fit an exponential curve to some data. The fit looks very good, so that part was a success.
However, the parameters output by the curve_fit function do not make sense, and solving f(x) with them results in f(x)=0 for every value of x, which is clearly not what is happening in the curve.
Modify your model to show what's actually happening:
def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    return (a*x)**(1 - b) + (c - a)*x
producing optimized parameters
[3.49003332e-04 6.60420171e-06 3.13366557e-08]
This is likely to be numerically unstable. Try optimizing in the log domain instead.
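The answer does not show what optimizing in the log domain would look like. One possible reading, sketched here on synthetic stand-in data (the parameter values, p0, and bounds below are illustrative assumptions, not taken from the question's dataset): fit the log of the model to the log of the data, so that every decade of x and y carries comparable weight.

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return (a*x)**b + c*x

def log_func(x, a, b, c):
    # compare model and data in log space
    return np.log(func(x, a, b, c))

# synthetic positive data spanning many decades
rng = np.random.default_rng(1)
x = np.geomspace(1.0, 1e8, 60)
y = func(x, 2e-4, 1.3, 1e-3) * rng.lognormal(0.0, 0.05, x.size)

# keep all parameters positive so the log of the model is always defined
popt, pcov = curve_fit(log_func, x, np.log(y),
                       p0=[1e-4, 1.0, 1e-4], bounds=(0, np.inf))
print(popt)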
When I run your example (after adding imports, etc.), I get NaNs for popt, and I eventually realized you were allowing a general, real b with negative x. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in Scipy (I assume you need mixed integer-real optimization, and I haven't investigated whether Scipy supports that).
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T

# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]

def func(x, a, b, c):
    return (a*x)**b + c*x

popt, pcov = curve_fit(func, xdata, ydata, p0=[1000, 2, 1])

x_vec = np.geomspace(xdata.min(), xdata.max())
plt.plot(x_vec, func(x_vec, *popt), label='curve fit', color='slateblue', linewidth=2.2)
plt.plot(cfu_u, OD_u, '-', label='experimental data', marker='x', markersize=8, color='deepskyblue', linewidth=1.4)
plt.legend(loc='upper left', fontsize=12)
plt.ylabel("Y", fontsize=12)
plt.xlabel("X", fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()
print(popt)
# [1.89176133e+01 5.66689997e+00 1.29380532e+08]
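Not something this answer attempts, but one low-tech way to restrict b to integer values without a mixed-integer solver is to scan a few integer exponents, fit only a and c with b frozen at each one, and keep whichever gives the smallest residual sum of squares. A sketch; the p0 values and the exponent range are guesses, not derived from the data:

import numpy as np
from scipy.optimize import curve_fit

# cfu_u, OD_u: the full data set loaded as above (negative x is fine once b is an integer)
best = None
for b_int in range(1, 5):                    # candidate integer exponents
    def model(x, a, c, b=b_int):             # freeze b for this pass
        return (a*x)**b + c*x
    try:
        popt, _ = curve_fit(model, cfu_u, OD_u, p0=[1e-9, 1e-9], maxfev=20000)
    except (RuntimeError, ValueError):
        continue                             # skip exponents that fail to converge
    rss = np.sum((OD_u - model(cfu_u, *popt))**2)
    if best is None or rss < best[0]:
        best = (rss, b_int, popt)

if best is not None:
    rss, b_int, (a, c) = best
    print("best integer b =", b_int, "a =", a, "c =", c, "RSS =", rss)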

Curve fit in a log-log plot in matplotlib and getting the equation of line

I am trying to draw a best-fit curve for my data. It is a terribly bad sample of data, but for simplicity's sake let's say I expect the best fit to be a straight line in log-log scale.
I think I have already done that with regression, and it returns a reasonable fit line. But I want to double-check it with the curve fit function in scipy, and I also want to extract the equation of the fit line.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
from scipy.optimize import curve_fit
x = np.array([ 1.72724547e-08, 1.81960233e-08, 1.68093027e-08, 2.22839973e-08,
2.23090589e-08, 4.28020801e-08, 2.30004711e-08, 2.48543008e-08,
1.08633065e-07, 3.24417303e-08, 3.22946248e-08, 3.82328031e-08,
3.97713860e-08, 3.44080732e-08, 3.81526816e-08, 3.30756706e-08
])
y = np.array([ 4.18793565e+12, 4.40554864e+12, 4.48745390e+12, 4.50816705e+12,
4.57088190e+12, 4.60256574e+12, 4.66659380e+12, 4.79733449e+12, 7.31139083e+12, 7.53355564e+12, 8.03526122e+12, 8.14704284e+12,
8.47227414e+12, 8.62978548e+12, 8.81048873e+12, 9.46237161e+12
])
# Regression function
def regress(x, y):
    """Return a tuple of predicted y values and parameters for linear regression."""
    p = scipy.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)
    return y_pred, p

# plotting
allx, ally = x, y  # data, non-transformed
y_pred, _ = regress(np.log(allx), np.log(ally))  # transformed input
plt.loglog(allx, ally, marker='$\\star$', color='g', markersize=5, linestyle='None')
plt.loglog(allx, np.exp(y_pred), "c--", label="regression")  # transformed output

# Let's fit a power-law function a*x**b.
# This looks like a line on a log-log plot.
def myExpFunc(x, a, b):
    return a * np.power(x, b)

popt, pcov = curve_fit(myExpFunc, x, y, maxfev=1000)
plt.plot(x, myExpFunc(x, *popt), 'r:',
         label="({0:.3f}*x**{1:.3f})".format(*popt))

print("Power-law fit: y = a*(x**b)")
print("\ta = popt[0] = {0}\n\tb = popt[1] = {1}".format(*popt))
plt.show()
Again, I apologize for the bad dataset. Your help will be very much appreciated.
My plot looks like this:
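As an aside, not part of the question: the equation of that straight line in log-log space can also be read off directly. A power law y = a*x**b becomes log(y) = log(a) + b*log(x), so a degree-1 polynomial fit on the logs recovers a and b. A small sketch, using the x and y arrays defined above:

import numpy as np

# fit a straight line to (log x, log y); the slope is b and the intercept is log(a)
b, log_a = np.polyfit(np.log(x), np.log(y), 1)
a = np.exp(log_a)
print("y = {0:.3g} * x**{1:.3f}".format(a, b))

Note that this minimizes residuals in log space (like the linregress call above), while curve_fit on the raw data minimizes residuals in linear space, so the two sets of a and b can differ for noisy data.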

Optimal parameters not found for my curve fitting

Hello, I have a problem fitting some data with Python. I have only just begun fitting data with Python, so I have some problems... This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import *
from numpy import linalg as LA
def f(x, a, b, c):
    return a*np.power(x, b) + c
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79])
y = np.array([7200,7925,8050,8200,8000,7550,7500,6800,6400,8150,6566,6280,6105,5963,5673,5495,5395,4800,4550,4558,4228,4087,3951,3817,3721,3612,3498,3416,3359,3269,3163,3241,2984,4475,2757,2644,2555,2600,3163,2720,2630,2543,2454,2441,2389,2339,2293,2261,2212,2180,2143,2450,2065,2032,1994,1960,1930,1897,1870,1838,1821,1785,1763,1741,1718,1689,1676,1662,1635,1635,1667,1633,1617,1615,1599,1581,1565,1547,1547])
params, extras = curve_fit(f, x, y)
plt.plot(x,y, 'o')
plt.plot(x, f(x, params[0], params[1], params[2]))
plt.title('Fit')
plt.legend(['data','fit'],loc='best')
plt.show()
I want to fit my data with the function f(x) = a*x^b + c, where I am looking for the best values of a, b, and c to fit my data.
Do you know what is going wrong?
Thank you for your help.
Three caveats:
your model is not very good;
it diverges at x=0, so don't use the first points;
you must give initial parameter estimates.
An example:
p0 = [50000, -1, 0]
x = x[10:]
y = y[10:]
params, cov = curve_fit(f, x, y, p0)  # params = [3.16e+04 -5.83e-01 -1.00e+03]

plt.plot(x, y, 'o')
plt.plot(x, f(x, *params))
plt.title('Fit')
plt.legend(['data', 'fit'], loc='best')
plt.show()
You can estimate the quality of the model by
In [178]: np.sqrt(np.diag(cov))/params
Out[178]: array([ 0.12066005, -0.12537714, -0.53450057])
which shows that the estimation of error on parameters is greater than 10%.
The problem is the function you use for fitting. Consider using something like
def f(x, a, b, c):
    return a*x + b*np.power(x, 2) + c
EDIT: accidentally posted the original function instead of the one I wanted to suggest.
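A side note on that suggestion, not from the original answer: because a*x + b*np.power(x, 2) + c is linear in a, b, and c, it can be fit without any initial guess at all, for example with np.polyfit on the x and y arrays from the question.

import numpy as np

# np.polyfit returns coefficients from the highest power down:
# [coefficient of x**2, coefficient of x, constant]
b, a, c = np.polyfit(x, y, 2)
print(a, b, c)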

scipy.optimize.curve_fit doesn't fit properly to the data

I am trying to fit my data with a Gaussian curve. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# The independent variable where the data is measured
x_coord = np.array([-0.1216 , -0.11692308, -0.11224615, -0.10756923, -0.10289231,
-0.09821538, -0.09353846, -0.08886154, -0.08418462, -0.07950769,
-0.07483077, -0.07015385, -0.06547692, -0.0608 , -0.05612308,
-0.05144615, -0.04676923, -0.04209231, -0.03741538, -0.03273846,
-0.02806154, -0.02338462, -0.01870769, -0.01403077, -0.00935385,
-0.00467692, 0. , 0.00467692, 0.00935385, 0.01403077,
0.01870769, 0.02338462, 0.02806154, 0.03273846, 0.03741538,
0.04209231, 0.04676923, 0.05144615, 0.05612308, 0.0608 ,
0.06547692, 0.07015385, 0.07483077, 0.07950769, 0.08418462,
0.08886154, 0.09353846, 0.09821538, 0.10289231, 0.10756923,
0.11224615, 0.11692308])
# The dependent data — nominally f(x_coord)
y = np.array([-0.0221931 , -0.02323915, -0.02414913, -0.0255389 , -0.02652465,
-0.02888672, -0.03075954, -0.03355392, -0.03543005, -0.03839526,
-0.040933 , -0.0456585 , -0.04849097, -0.05038776, -0.0466699 ,
-0.04202133, -0.034239 , -0.02667525, -0.01404582, -0.00122683,
0.01703862, 0.03992694, 0.06704549, 0.11362071, 0.28149172,
0.6649422 , 1. , 0.6649422 , 0.28149172, 0.11362071,
0.06704549, 0.03992694, 0.01703862, -0.00122683, -0.01404582,
-0.02667525, -0.034239 , -0.04202133, -0.0466699 , -0.05038776,
-0.04849097, -0.0456585 , -0.040933 , -0.03839526, -0.03543005,
-0.03355392, -0.03075954, -0.02888672, -0.02652465, -0.0255389 ,
-0.02414913, -0.02323915])
# define a gaussian function to fit the data
def gaussian(x, a, b, c):
    val = a * np.exp(-(x - b)**2 / c**2)
    return val
# fit the data
popt, pcov = optimize.curve_fit(gaussian, x_coord, y, sigma = np.array([0.01] * len(x_coord)))
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, gaussian(x_coord, popt[0], popt[1], popt[2]), 'r:')
The figure shows that the fitted curve is completely wrong:
What should I do to obtain a well-fitting curve?
This is actually a very nice question that illustrates that finding the right (local) optimum can be very difficult.
Via the p0 argument you can give the optimization routine a hint, where approximately you would expect the optimum.
If you start with the initial guess of [1,0,0.1]:
# fit the data
sigma = np.array([0.01] * len(x_coord))
popt, pcov = optimize.curve_fit(gaussian, x_coord, y, sigma=sigma, p0=[1,0,0.1])
You get the following result:
A couple of notes: You forced curve_fit to fit a bell curve without a constant term. This made things a little awkward.
If you allow an offset d, you get:
# define a gaussian function to fit the data
def gaussian(x, a, b, c, d):
    val = a * np.exp(-(x - b)**2 / c**2) + d
    return val
And obtain the following result:
# fit the data
popt, pcov = optimize.curve_fit(gaussian, x_coord, y)
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, gaussian(x_coord, *popt), 'r:')
Which looks much more like a reasonable fit, although it seems that the Gaussian does not fit the data well.
The very peaked shape suggests that a Laplacian might fit better:
# define a laplacian function to fit the data
def laplacian(x, a, b, c, d):
    val = a * np.exp(-np.abs(x - b) / c) + d
    return val
# fit the data
popt, pcov = optimize.curve_fit(laplacian, x_coord, y, p0=[1,0,0.01,-0.1])
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, laplacian(x_coord, *popt), 'r:')
This is the result:

How to correctly include uncertainties in fitting with python

I am trying to fit some data points with y uncertainties in Python. The data are labeled in Python as x, y, and yerr.
I need to do a linear fit on that data in log-log scale. As a check that the fit results are correct, I compare the Python results with the ones from SciDAVis.
I tried curve_fit with
def func(x, a, b):
    return np.exp(a*np.log(x) + np.log(b))

popt, pcov = curve_fit(func, x, y, sigma=yerr)
as well as kmpfit with
def funcL(p, x):
    a, b = p
    return np.exp(a*np.log(x) + np.log(b))

def residualsL(p, data):
    a, b = p
    x, y, errorfit = data
    return (y - funcL(p, x)) / errorfit

a0 = 1
b0 = 0.1
p0 = [a0, b0]
fitterL = kmpfit.Fitter(residuals=residualsL, data=(x, y, yerr))
fitterL.parinfo = [{}, {}]
fitterL.fit(params0=p0)
When I fit the data with either of these without uncertainties (i.e. setting yerr=1), everything works just fine and the results are identical to the ones from SciDAVis. But if I set yerr to the uncertainties from the data file, I get some disturbing results.
In Python I get e.g. a=0.86, while in SciDAVis I get a=0.14. I read something about the errors being included as weights. Do I have to change anything in order to calculate the fit correctly? Or what am I doing wrong?
Edit: here is an example of a data file (x, y, yerr):
3.942387e-02 1.987800e+00 5.513165e-01
6.623142e-02 7.126161e+00 1.425232e+00
9.348280e-02 1.238530e+01 1.536208e+00
1.353088e-01 1.090471e+01 7.829126e-01
2.028446e-01 1.023087e+01 3.839575e-01
3.058446e-01 8.403626e+00 1.756866e-01
4.584524e-01 7.345275e+00 8.442288e-02
6.879677e-01 6.128521e+00 3.847194e-02
1.032592e+00 5.359025e+00 1.837428e-02
1.549152e+00 5.380514e+00 1.007010e-02
2.323985e+00 6.404229e+00 6.534108e-03
3.355974e+00 9.489101e+00 6.342546e-03
4.384128e+00 1.497998e+01 2.273233e-02
And the results:
In Python:
without uncertainties: a=0.06216 +/- 0.00650 ; b=8.53594 +/- 1.13985
with uncertainties: a=0.86051 +/- 0.01640 ; b=3.38081 +/- 0.22667
In SciDAVis:
without uncertainties: a = 0.06216 +/- 0.08060; b = 8.53594 +/- 1.06763
with uncertainties: a = 0.14154 +/- 0.005731; b = 7.38213 +/- 2.13653
I must be misunderstanding something. Your posted data does not look anything like
f(x,a,b) = np.exp(a*np.log(x)+np.log(b))
The red line is the result of scipy.optimize.curve_fit,
the green line is the result of scidavis.
My guess is that neither algorithm is converging toward a good fit, so it is not surprising that the results do not match.
I can't explain how scidavis finds its parameters, but according to the definitions as I understand them, scipy is finding parameters with lower least squares residuals than scidavis:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as optimize

def func(x, a, b):
    return np.exp(a*np.log(x) + np.log(b))

def sum_square(residuals):
    return (residuals**2).sum()

def residuals(p, x, y, sigma):
    return 1.0/sigma*(y - func(x, *p))

data = np.loadtxt('test.dat').reshape((-1, 3))
x, y, yerr = np.rollaxis(data, axis=1)
sigma = yerr

popt, pcov = optimize.curve_fit(func, x, y, sigma=sigma, maxfev=10000)
print('popt: {p}'.format(p=popt))
scidavis = (0.14154, 7.38213)
print('scidavis: {p}'.format(p=scidavis))
print('''\
sum of squares for scipy: {sp}
sum of squares for scidavis: {d}
'''.format(
    sp=sum_square(residuals(popt, x=x, y=y, sigma=sigma)),
    d=sum_square(residuals(scidavis, x=x, y=y, sigma=sigma))
))

plt.plot(x, y, 'bo', x, func(x, *popt), 'r-', x, func(x, *scidavis), 'g-')
plt.errorbar(x, y, yerr)
plt.show()
yields
popt: [ 0.86051258 3.38081125]
scidavis: (0.14154, 7.38213)
sum of squares for scipy: 53249.9915654
sum of squares for scidavis: 239654.84276
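One further detail, as an aside not covered in the answer: curve_fit treats sigma as per-point standard deviations and minimizes the weighted residuals (y - func(x, *p)) / sigma, but by default it rescales the returned covariance matrix. If yerr are meant as absolute measurement uncertainties, pass absolute_sigma=True so that the parameter errors read off pcov use the given sigma values directly, for example:

popt, pcov = optimize.curve_fit(func, x, y, sigma=yerr, absolute_sigma=True,
                                maxfev=10000)
perr = np.sqrt(np.diag(pcov))  # parameter uncertainties with yerr taken as absolute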
