I'm trying to fit data with a Gaussian.
The raw data itself displays a very obvious peak.
When I attempt the fit with curve_fit, it identifies the peak, but the fitted curve does not have a curved top. I am now also trying to fit the data with spinmob's fitter, but that fit just gives a straight line.
I've tried changing several parameters of the fitter, the Gaussian function definition, and the initial parameters for the fit but nothing seems to work.
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import spinmob as s
x = x30
y = ydata
def gaussian(x, A, mu, sig):  # see http://mathworld.wolfram.com/GaussianFunction.html
    return A/(sig * np.sqrt(2*np.pi)) * np.exp(-np.power(x-mu, 2) / (2 * np.power(sig, 2)))
popt,pcov = curve_fit(gaussian,x,y,p0=[1,7.688,0.005])
FWHM = 2*np.sqrt(2*np.log(2))*popt[2]
print("FWHM: {}".format(FWHM))
plt.plot(x,y,'bo',label='data')
plt.plot(x,gaussian(x,*popt),'r+-',label='fit')
plt.legend()
fitter = s.data.fitter()
fitter.set(subtract_bg=True, plot_guess_zoom=True)
fitter.set_functions(f=gaussian, p='A=1,mu=8.688,sig=0.001')
fitter.set_data(x, y, eydata = 0.03)
fitter.fit()
The curve_fit returns this plot:
Curve_fit plot
The spinmob fitter plot gives this:
Spinmob Fitter Plot
Assuming that spinmob actually uses scipy.curve_fit under the hood, I would guess (sorry) that the problem is that the initial values you give to it are so far off that it cannot possibly find a solution.
For sure, A=1 is not a very good guess for either scipy.curve_fit() or spinmob.fitter(). The peak is definitely negative, and you should be guessing a value more like -0.1 than +1. In fact you could probably assert that A must be < 0.
The initial value of 7.688 for mu that you give to curve_fit() is pretty good, and will allow a solution. I do not know whether it is a typo or not, but the initial value of 8.688 for mu that you give to spinmob.fitter() is very far off (that is, way outside the data range), and the fit will never be able to refine its way to the correct solution from there.
Initial values matter for curve-fitting and poor initial values can lead to bad results.
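A minimal adjustment along those lines might look like this (a sketch reusing the function and variable names from the question; the starting values are just the illustrative guesses discussed above, not fitted results):

# negative amplitude, mu inside the data range
popt, pcov = curve_fit(gaussian, x, y, p0=[-0.1, 7.688, 0.005])

fitter = s.data.fitter()
fitter.set_functions(f=gaussian, p='A=-0.1,mu=7.688,sig=0.005')
fitter.set_data(x, y, eydata=0.03)
fitter.fit()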
It might be viewed by some as a shameless plug, but allow me to encourage you to try lmfit (https://lmfit.github.io/lmfit-py/) (I am a lead author) for this kind of problem. Lmfit replaces the array of parameter values with named Parameter objects for better organization of fits. It also has a built-in Gaussian model (which also calculates FWHM, including an uncertainty). That is, with Lmfit, your script might look like:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel
from lmfit.lineshapes import gaussian
# create fake data that looks like yours
xdata = 7.670 + np.arange(41)*0.0010
ydata = gaussian(xdata, amplitude=-0.196, center=7.6881, sigma=0.001)
ydata += np.random.normal(size=41, scale=10.0)
# create gaussian model
gmodel = GaussianModel()
# fit data, giving initial values for amplitude, center, and sigma
result = gmodel.fit(ydata, x=xdata, amplitude=-0.1, center=7.688, sigma=0.005)
# show results
print(result.fit_report())
plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, result.best_fit, 'r+-', label='fit')
plt.legend()
plt.show()
This will print out a report like
[[Model]]
Model(gaussian)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 21
# data points = 41
# variables = 3
chi-square = 5114.87632
reduced chi-square = 134.602009
Akaike info crit = 203.879794
Bayesian info crit = 209.020510
[[Variables]]
sigma: 9.7713e-04 +/- 1.5456e-04 (15.82%) (init = 0.005)
center: 7.68822727 +/- 1.5484e-04 (0.00%) (init = 7.688)
amplitude: -0.19273945 +/- 0.02643400 (13.71%) (init = -0.1)
fwhm: 0.00230096 +/- 3.6396e-04 (15.82%) == '2.3548200*sigma'
height: -78.6917624 +/- 10.7894236 (13.71%) == '0.3989423*amplitude/max(1.e-15, sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(sigma, amplitude) = -0.577
and produce a plot of the data and best fit, which should be close to what you are trying to do.
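Since your original script computed the FWHM by hand, note that with lmfit the fitted FWHM and its uncertainty can also be read off the result programmatically, for example:

fwhm = result.params['fwhm']
print("FWHM: {} +/- {}".format(fwhm.value, fwhm.stderr))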
I have to study the laser beam profile. To this end, I need to find a Super Gaussian curve fit for my data.
Super Gaussian equation:
I * exp(- 2 * ((x - x0) /sigma)^P)
where P takes into account the flat-top laser beam curve characteristics.
I started doing a simple Gaussian fit of my curve, in Python. The fit returns a Gaussian curve where the values of I, x0 and sigma are optimized. (I used the function curve_fit)
Gaussian curve equation:
I * exp(-(x - x0)^2 / (2 * sigma^2))
Now I would like to take a step forward and do the Super Gaussian curve fit, because I need to account for the flat-top characteristics of the beam. Thus, I need a fit which also optimizes the P parameter.
Does someone know how to do a Super Gaussian curve fit with Python?
I know that there is a way to do a Super Gaussian fit with Wolfram Mathematica, but it is not open source and I do not have it. So I would also like to know of open-source software that can do a Super Gaussian curve fit (or run Wolfram Mathematica).
Thank you
Well, you would need to write a function that calculates a parameterized super-Gaussian and use that to model data, say with scipy.optimize.curve_fit. As a lead author of LMFIT (https://lmfit.github.io/lmfit-py/), which provides a high-level interface to fitting and curve-fitting, I would recommend trying that library. With that approach, your model function for a super-Gaussian, and using it to fit data, might look like this:
import numpy as np
from lmfit import Model
def super_gaussian(x, amplitude=1.0, center=0.0, sigma=1.0, expon=2.0):
    """super-Gaussian distribution
    super_gaussian(x, amplitude, center, sigma, expon) =
        (amplitude/(sqrt(2*pi)*sigma)) * exp(-abs(x-center)**expon / (2*sigma**expon))
    """
    sigma = max(1.e-15, sigma)
    return ((amplitude/(np.sqrt(2*np.pi)*sigma))
            * np.exp(-abs(x-center)**expon / 2*sigma**expon))
# generate some test data
x = np.linspace(0, 10, 101)
y = super_gaussian(x, amplitude=7.1, center=4.5, sigma=2.5, expon=1.5)
y += np.random.normal(size=len(x), scale=0.015)
# make Model from the super_gaussian function
model = Model(super_gaussian)
# build a set of Parameters to be adjusted in fit, named from the arguments
# of the model function (super_gaussian), and providing initial values
params = model.make_params(amplitude=1, center=5, sigma=2., expon=2)
# you can place min/max bounds on parameters
params['amplitude'].min = 0
params['sigma'].min = 0
params['expon'].min = 0
params['expon'].max = 100
# note: if you wanted to make this strictly Gaussian, you could set
# expon=2 and prevent it from varying in the fit:
### params['expon'].value = 2.0
### params['expon'].vary = False
# now do the fit
result = model.fit(y, params, x=x)
# print out the fit statistics, best-fit parameter values and uncertainties
print(result.fit_report())
# plot results
import matplotlib.pyplot as plt
plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.legend()
plt.show()
This will print a report like
[[Model]]
Model(super_gaussian)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 53
# data points = 101
# variables = 4
chi-square = 0.02110713
reduced chi-square = 2.1760e-04
Akaike info crit = -847.799755
Bayesian info crit = -837.339273
[[Variables]]
amplitude: 6.96892162 +/- 0.09939812 (1.43%) (init = 1)
center: 4.50181661 +/- 0.00217719 (0.05%) (init = 5)
sigma: 2.48339218 +/- 0.02134446 (0.86%) (init = 2)
expon: 3.25148164 +/- 0.08379431 (2.58%) (init = 2)
[[Correlations]] (unreported correlations are < 0.100)
C(amplitude, sigma) = 0.939
C(sigma, expon) = -0.774
C(amplitude, expon) = -0.745
and generate a corresponding plot of the data and best fit.
This is the function for the super gaussian
import numpy as np

def super_gaussian(x, amp, x0, sigma):
    rank = 2
    return amp * ((np.exp(-(2 ** (2 * rank - 1)) * np.log(2) * (((x - x0) ** 2) / (sigma ** 2)) ** rank)) ** 2)
Then you need to call it with scipy.optimize.curve_fit like this:
from scipy import optimize
opt, _ = optimize.curve_fit(super_gaussian, x, y)
vals = super_gaussian(x, *opt)
'vals' is what you need to plot, that is the fitted super gaussian function.
This is what you get with rank=1, rank=2, and rank=3 (one plot for each rank).
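If you also want the exponent optimized rather than fixed (as the question asks), rank can be promoted to a fit parameter. A sketch along the same lines (it assumes the x, y data arrays and the imports above; the starting values are only illustrative):

def super_gaussian_free(x, amp, x0, sigma, rank):
    # same form as above, but with rank left free for curve_fit to adjust
    return amp * ((np.exp(-(2 ** (2 * rank - 1)) * np.log(2) * (((x - x0) ** 2) / (sigma ** 2)) ** rank)) ** 2)

# illustrative starting values: peak height, peak position, width, exponent
opt, _ = optimize.curve_fit(super_gaussian_free, x, y, p0=[1.0, np.mean(x), np.std(x), 1.0])
vals = super_gaussian_free(x, *opt)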
The answer of @M Newville works perfectly for me. But be careful! Parentheses have been forgotten in the denominator of the exponential in the definition of the super_gaussian function:
def super_gaussian(x, amplitude=1.0, center=0.0, sigma=1.0, expon=2.0):
    ...
    return ((amplitude/(np.sqrt(2*np.pi)*sigma))
            * np.exp(-abs(x-center)**expon / 2*sigma**expon))
should be replaced by
def super_gaussian(x, amplitude=1.0, center=0.0, sigma=1.0, expon=2.0):
    ...
    return ((amplitude/(np.sqrt(2*np.pi)*sigma))
            * np.exp(-abs(x-center)**expon / (2*sigma**expon)))
Then the FWHM of the super-Gaussian function, which is
FWHM = 2.*sigma*(2.*np.log(2.))**(1/expon)
is correctly calculated and in excellent agreement with the plot.
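As a quick numerical sanity check of that formula (a sketch; it assumes the corrected super_gaussian definition above, with the full function body from M Newville's answer):

import numpy as np

sigma, expon = 2.5, 1.5
x = np.linspace(-20, 20, 200001)
y = super_gaussian(x, amplitude=1.0, center=0.0, sigma=sigma, expon=expon)
above_half = x[y >= y.max() / 2]
print(above_half[-1] - above_half[0])         # FWHM measured from the curve
print(2.*sigma*(2.*np.log(2.))**(1/expon))    # FWHM from the formula; should agree closely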
I am sorry to write this as an answer, but my reputation score is too low to add a comment to M Newville's post.
Fitting of y(x) = a*exp(-b*(x-c)**p) to data for the parameters a, b, c, p.
The example of numerical calculation below shows a non-iterative method which doesn't require an initial guess of the parameters.
This is an application of the general principle explained in the paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
In the present version of the paper the Super-Gaussian case isn't explicitly treated. It is not necessary to read the paper, since the screen copy below shows the calculation in full detail.
Note that the numerical results a, b, c, p can be used as initial values for classical iterative methods of regression.
Note:
The linear equation considered is :
A, B, C, D are the parameters to be computed via linear regression. The numerical values S(k) of the integral are computed directly by numerical integration from the given data (as shown in the above example).
I have to calculate a non-linear least-squares regression for my ~30 data points, following the formula y = A*x / (1 - x/C).
I tried the curve_fit function out of scipy.optimize using the following code
def func(x, p1, p2):
    return p1*x/(1-x/p2)
popt, pcov = curve_fit(func, CSV[:,1], CSV[:,0])
p1 = popt[0]
p2 = popt[1]
with p1 and p2 being equivalent to A and C, respectively, and CSV being my data array. The function runs without an error message, but the outcome is not as expected. I've plotted the outcome of the function together with the original data points. I was not looking to get this nearly straight line (red line in the plot), but something closer to the green line, which is simply a second-order polynomial fit from Excel. The green dashed line shows just a quick manual attempt to get closer to the polynomial fit.
Plot of the incorrect fit function together with the original data points:
Does anyone have an idea how to make the calculation run as I want it to?
Your code is fine. The data though is not easy to fit to. There are too few points on the right side of the chart and too much noise on the left hand side. This is why curve_fit fails.
Some ways to improve the solution could be:
raising the maxfev parameter for curve_fit()
giving starting values to curve_fit()
adding more data points
using more parameters in the function, or a different function.
curve_fit() may not be the strongest tool. See if you can get better results with other regression-type tools.
Below is the best I could get with your initial data and formula:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

df = pd.read_csv("c:\\temp\\data.csv", header=None, dtype='float')
df.columns = ('x', 'y')

def func(x, p1, p2):
    return p1*x/(1-x/p2)

popt, pcov = curve_fit(func, df.x, df.y, maxfev=3000)
print('p1,p2:', popt)
p1, p2 = popt
y_pred = [p1*x/(1-x/p2) for x in range(0, 140, 5)]

plt.scatter(df.x, df.y)
plt.scatter(range(0, 140, 5), y_pred)
plt.show()
p1,p2: [-8.60771432e+02 1.08755430e-05]
I think I've figured out the best way to solve this problem, by using the lmfit package (https://lmfit.github.io/lmfit-py/). It worked best when I fit the non-linear least-squares regression not to the original data but to the fitting function provided by Excel (not very elegant, though).
from lmfit import Model
import matplotlib.pyplot as plt
import numpy as np
def func(x, o1, o2):
    return o1*x/(1-x/o2)
xt = np.arange(0, 0.12, 0.005)
yt = 2.2268*np.exp(40.755*xt)
model = Model(func)
result = model.fit(yt, x=xt, o1=210, o2=0.118)
print(result.fit_report())
plt.plot(xt, yt, 'bo')
plt.plot(xt, result.init_fit, 'k--', label='initial fit')
plt.plot(xt, result.best_fit, 'r-', label='best fit')
plt.legend(loc='best')
plt.show()
The results look pretty nice and the package is really easy to use (I've left out the final plot).
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 25
# data points = 24
# variables = 2
chi-square = 862.285318
reduced chi-square = 39.1947872
Akaike info crit = 89.9567771
Bayesian info crit = 92.3128848
[[Variables]]
o1: 310.243771 +/- 12.7126811 (4.10%) (init = 210)
o2: 0.13403974 +/- 0.00120453 (0.90%) (init = 0.118)
[[Correlations]] (unreported correlations are < 0.100)
C(o1, o2) = 0.930
I am having a hard time trying to understand why my Gaussian fit to a set of data (ydata) does not work well if I shift the interval of x-values corresponding to that data (xdata1 to xdata2). The Gaussian is written as:
A * exp(-(x - mean)**2 / (2 * sigma**2)) / (sigma * sqrt(2*pi))
where A is just an amplitude factor. Changing some of the values of the data, it is easy to make it work for both cases, but one can also easily find cases in which it does not work well for xdata1 and in which the covariance of the parameters is not estimated.
I am using scipy.optimize.curve_fit in Spyder with Python 3.7.1 on Windows 7.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
xdata1 = np.linspace(-9,4,20, endpoint=True) # works fine
xdata2 = xdata1+2
ydata = np.array([8,9,15,12,14,20,24,40,54,94,160,290,400,420,300,130,40,10,8,4])
def gaussian(x, amp, mean, sigma):
    return amp*np.exp(-(((x-mean)**2)/(2*sigma**2)))/(sigma*np.sqrt(2*np.pi))
popt1, pcov1 = curve_fit(gaussian, xdata1, ydata)
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata)
fig, ([ax1, ax2]) = plt.subplots(nrows=1, ncols=2,figsize=(9, 4))
ax1.plot(xdata1, ydata, 'b+:', label='xdata1')
ax1.plot(xdata1, gaussian(xdata1, *popt1), 'r-', label='fit')
ax1.legend()
ax2.plot(xdata2, ydata, 'b+:', label='xdata2')
ax2.plot(xdata2, gaussian(xdata2, *popt2), 'r-', label='fit')
ax2.legend()
The problem is that your second attempt at fitting a Gaussian is getting stuck in a local minimum while searching parameter space: curve_fit is a wrapper for least_squares, which uses local gradient-based minimization of the cost function, and this is liable to get stuck in local minima.
You should try providing reasonable starting parameters (by using the p0 argument of curve_fit) to avoid this:
# ... your code as before
y_max = np.max(ydata)
x_of_max = xdata2[ydata == y_max][0]  # x position of the peak
initial_guess = [y_max, x_of_max, 1]  # amplitude, mean, std
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=initial_guess)
Which as you can see provides a reasonable fit:
You should write a function which can provide reasonable estimates of the starting parameters. Here I just found the maximum y value and its x position and used those as the initial parameters. I've found this works well for fitting normal distributions, but you could consider other methods.
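For example, a small helper along these lines (a sketch; the gaussian, xdata2 and ydata names are from the question, and the width guess is just a crude heuristic):

def estimate_p0(xdata, ydata):
    # x position of the largest y value -> initial mean
    mean = xdata[np.argmax(ydata)]
    # crude width guess: a tenth of the x range
    sigma = (xdata.max() - xdata.min()) / 10.0
    # amp in this model is an area-like factor, so scale the peak height by sigma*sqrt(2*pi)
    amp = ydata.max() * sigma * np.sqrt(2 * np.pi)
    return [amp, mean, sigma]

popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=estimate_p0(xdata2, ydata))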
Edit:
You can also solve the problem by scaling the amplitude: the amplitude is so large the parameter space is distorted and the gradient descent simply follows the direction of greatest change in the amplitude and effectively ignores the sigma. Consider the following plot in parameter space (Colour is the sum of the squared residuals of the fit for given parameters and the white cross shows the optimal solution):
Make sure to make note of the different scales for the x and y axis.
One needs to make a large number of 'unit'-sized steps in y (amplitude) to get to the minimum from the point x, y = (0, 0), whereas you need less than one 'unit'-sized step to get to the minimum in x (sigma). The algorithm simply takes steps in amplitude, as this is the steepest gradient. When it gets to the amplitude which minimises the cost function, the algorithm simply stops, as it appears to have converged, and it makes little or no change to the sigma parameter.
One way to fix this is to scale your ydata to un-distort the parameter space: divide your ydata by 100 and you will see your fit works without providing any starting parameters!
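A sketch of that rescaling (the factor of 100 is just the one suggested above):

scale = 100.0
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata / scale)  # converges without p0 once rescaled
popt2[0] *= scale                                          # undo the scaling on the fitted amplitude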
I am trying to fit a power-law function in order to find the best-fit parameters. However, I find that if the initial guess of the parameters is different, the "best fit" output is different. Unless I find the right initial guess, I get a local optimum instead of the global optimum. Is there any way to find the appropriate initial guess? My code is listed below. Please feel free to make any input. Thanks!
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
%matplotlib inline
# power law function
def func_powerlaw(x, a, b, c):
    return a*(x**b) + c

test_X = [1.0,2,3,4,5,6,7,8,9,10]
test_Y = [3.0,1.5,1.2222222222222223,1.125,1.08,1.0555555555555556,1.0408163265306123,1.03125,1.0246913580246915,1.02]

predict_Y = []
for x in test_X:
    predict_Y.append(2*x**-2+1)
If I go with the default initial guess, which is p0 = [1,1,1]:
popt, pcov = curve_fit(func_powerlaw, test_X[1:], test_Y[1:], maxfev=2000)
plt.figure(figsize=(10, 5))
plt.plot(test_X, func_powerlaw(test_X, *popt),'r',linewidth=4, label='fit: a=%.4f, b=%.4f, c=%.4f' % tuple(popt))
plt.plot(test_X[1:], test_Y[1:], '--bo')
plt.plot(test_X[1:], predict_Y[1:], '-b')
plt.legend()
plt.show()
The fit is like below, which is not the best fit.
If I change the initial guess to p0 = [0.5,0.5,0.5]
popt, pcov = curve_fit(func_powerlaw, test_X[1:], test_Y[1:], p0=np.asarray([0.5,0.5,0.5]), maxfev=2000)
I can get the best fit
Update (10.7.2018):
As I need to run thousands or even millions of power-law fits, using @James Phillips's method is too expensive. So what other methods are appropriate besides curve_fit, such as sklearn, np.linalg.lstsq, etc.?
Here is example code using the scipy.optimize.differential_evolution genetic algorithm, with your data and equation. This scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space and so requires bounds within which to search - in this example, those bounds are based on the data maximum and minimum values. For other problems you might need to supply different search bounds if you know what range of parameter values to expect.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
# power law function
def func_power_law(x, a, b, c):
    return a*(x**b) + c

# data as numpy arrays so the vectorized math below works
test_X = numpy.array([1.0,2,3,4,5,6,7,8,9,10])
test_Y = numpy.array([3.0,1.5,1.2222222222222223,1.125,1.08,1.0555555555555556,1.0408163265306123,1.03125,1.0246913580246915,1.02])
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = func_power_law(test_X, *parameterTuple)
    return numpy.sum((test_Y - val) ** 2.0)
def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(test_X)
    minX = min(test_X)
    maxY = max(test_Y)
    minY = min(test_Y)
    maxXY = max(maxX, maxY)

    parameterBounds = []
    parameterBounds.append([-maxXY, maxXY])  # search bounds for a
    parameterBounds.append([-maxXY, maxXY])  # search bounds for b
    parameterBounds.append([-maxXY, maxXY])  # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# generate initial parameter values
geneticParameters = generate_Initial_Parameters()
# curve fit the test data
fittedParameters, pcov = curve_fit(func_power_law, test_X, test_Y, geneticParameters)
print('Parameters', fittedParameters)
modelPredictions = func_power_law(test_X, *fittedParameters)
absError = modelPredictions - test_Y
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(test_Y))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(test_X, test_Y, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(test_X), max(test_X))
    yModel = func_power_law(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
There is no simple answer: if there was, it would be implemented in curve_fit and then it would not have to ask you for the starting point. One reasonable approach is to fit the homogeneous model y = a*x**b first. Assuming positive y (which is usually the case when you work with power law), this can be done in a rough and quick way: on the log-log scale, log(y) = log(a) + b*log(x) which is linear regression which can be solved with np.linalg.lstsq. This gives candidates for log(a) and for b; the candidate for c with this approach is 0.
test_X = np.array([1.0,2,3,4,5,6,7,8,9,10])
test_Y = np.array([3.0,1.5,1.2222222222222223,1.125,1.08,1.0555555555555556,1.0408163265306123,1.03125, 1.0246913580246915,1.02])
rough_fit = np.linalg.lstsq(np.stack((np.ones_like(test_X), np.log(test_X)), axis=1), np.log(test_Y))[0]
p0 = [np.exp(rough_fit[0]), rough_fit[1], 0]
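Feeding those candidates back into curve_fit is then straightforward (func_powerlaw and the sliced data are the ones from the question):

popt, pcov = curve_fit(func_powerlaw, test_X[1:], test_Y[1:], p0=p0)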
The result is the good fit you see in the second picture.
By the way, it's better to make test_X a NumPy array at once. Otherwise, you are slicing X[1:] first, this gets NumPy-fied as an array of integers, and then an error is thrown with negative exponents. (And I suppose the purpose of 1.0 was to make it a float array? This is what dtype=np.float parameter should be used for.)
In addition to the very fine answers from "Welcome to Stack Overflow" ("there is no easy, universal approach") and from James Phillips ("differential evolution often helps find good starting points (or even good solutions!) if somewhat slower than curve_fit()"), allow me to give a separate answer that you may find helpful.
First, the fact that curve_fit() defaults to any parameter values is a soul-crushingly bad idea. There is no possible justification for this behavior, and you and everyone else should treat the fact that there are default values for parameters as a serious error in the implementation of curve_fit() and pretend this bug does not exist. NEVER believe these defaults are reasonable.
From a simple plot of data, it should be obvious that a=1, b=1, c=1 are very, very bad starting values. The function decays, so b < 0. In fact, if you had started with a=1, b=-1, c=1 you would have found the correct solution.
It may have also helped to place sensible bounds on the parameters. Even setting bounds on c of (-100, 100) may have helped. As with the sign of b, I think you could have seen that boundary from a simple plot of the data. When I try this for your problem, bounds on c do not help if the initial value is b=1, but they do help for b=0 or b=-5.
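With scipy alone, that might look like this (a sketch; the p0 and the bounds on c are just the illustrative values discussed above):

popt, pcov = curve_fit(func_powerlaw, test_X[1:], test_Y[1:],
                       p0=[1, -1, 1],
                       bounds=([-np.inf, -np.inf, -100], [np.inf, np.inf, 100]))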
More importantly, although you print the best-fit parameters popt in the plot, you do not print the uncertainties or correlations between variables held in pcov, and thus your interpretation of the results is incomplete. If you had looked at these values, you would have seen that starting with b=1 leads not only to bad values but also to huge uncertainties in the parameters and very, very high correlation. This is the fit telling you that it found a poor solution. Unfortunately, the pcov returned from curve_fit is not very easy to unpack.
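If you do want to unpack pcov yourself, the one-sigma uncertainties and the correlation matrix can be recovered from it like this:

perr = np.sqrt(np.diag(pcov))        # 1-sigma uncertainties for a, b, c
corr = pcov / np.outer(perr, perr)   # correlation matrix
print(perr)
print(corr)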
Allow me to recommend lmfit (https://lmfit.github.io/lmfit-py/) (disclaimer: I'm a lead developer). Among other features, this module forces you to give non-default starting values and provides a more complete report of the results. For your problem, even starting with a=1, b=1, c=1 would have given a more meaningful indication that something was wrong:
from lmfit import Model
mod = Model(func_powerlaw)
params = mod.make_params(a=1, b=1, c=1)
ret = mod.fit(test_Y[1:], params, x=test_X[1:])
print(ret.fit_report())
which would print out:
[[Model]]
Model(func_powerlaw)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 1318
# data points = 9
# variables = 3
chi-square = 0.03300395
reduced chi-square = 0.00550066
Akaike info crit = -44.4751740
Bayesian info crit = -43.8835003
[[Variables]]
a: -1319.16780 +/- 6892109.87 (522458.92%) (init = 1)
b: 2.0034e-04 +/- 1.04592341 (522076.12%) (init = 1)
c: 1320.73359 +/- 6892110.20 (521839.55%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
C(a, c) = -1.000
C(b, c) = -1.000
C(a, b) = 1.000
That is a = -1.3e3 +/- 6.8e6 -- not very well defined! In addition all parameters are completely correlated.
Changing the initial value for b to -0.5:
params = mod.make_params(a=1, b=-0.5, c=1) ## Note !
ret = mod.fit(test_Y[1:], params, x=test_X[1:])
print(ret.fit_report())
gives
[[Model]]
Model(func_powerlaw)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 31
# data points = 9
# variables = 3
chi-square = 4.9304e-32
reduced chi-square = 8.2173e-33
Akaike info crit = -662.560782
Bayesian info crit = -661.969108
[[Variables]]
a: 2.00000000 +/- 1.5579e-15 (0.00%) (init = 1)
b: -2.00000000 +/- 1.1989e-15 (0.00%) (init = -0.5)
c: 1.00000000 +/- 8.2926e-17 (0.00%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
C(a, b) = -0.964
C(b, c) = -0.880
C(a, c) = 0.769
which is somewhat better.
In short, initial values always matter, and the result is not only the best-fit values, but includes the uncertainties and correlations.
I have a set of points in the first quadrant that looks like a Gaussian, and I am trying to fit it using a Gaussian in Python. My code is as follows:
import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import asarray as ar, exp
import math
x=ar([37,69,157,238,274,319,391,495,533,626,1366,1855,2821,3615,4130,4374,6453,6863,7021,
7951,8646,9656,10464,11400])
y=ar([1.77,1.67,1.65,1.17,1.34,1.46,0.75,1,0.8,1.02,0.65,0.69,0.44,0.44,0.55,0.43,0.75,0.27,0.26,
0.44,0.04,0.44,0.26,0.04])
n = 24 #the number of data
mean = sum(x*y)/n #note this correction
sigma = math.sqrt(sum(y*(x-mean)**2)/n) #note this correction
def gaus(x, a, x0, sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))
popt, pcov = curve_fit(gaus, x, y, p0=None, sigma=None)  # p0=[1, mean, sigma]
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
And the output is this figure: http://s2.postimg.org/wevggkc95/Workspace_1_022.png
Why are all the red points coming out below the data? Also note that I am interested in a half-Gaussian, as my data is like that: my y values are big at first and then decrease like one side of the Gaussian bell. Can anyone tell me how to fit this curve in Python (in case it cannot be fit to a Gaussian)? In other words, I want code to fit the half (left side) Gaussian of my points (in the first quadrant only). Note that my points cannot be fit as an exponentially decreasing curve; I tried that earlier, and it does not fit well at lower x values.
Apparently your data do not fit well or easily to a Gaussian function. You use the default initial guesses for p0 = [1,1,1] which is so far away from any kind of optimal choice that curve_fit gives up before it gets started (check the values of popt=[1,1,1] and pcov=[inf, inf, inf]). You could try with better guesses (e.g. p0 = [2,0, 2000]), but on my system it won't converge: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
To fit a "half-Gaussian", don't float the centre position x0 (just leave it equal to 0):
def gaus(x, a, sigma):
    return a*exp(-(x)**2/(2*sigma**2))

p0 = [1.2, 4000]
popt, pcov = curve_fit(gaus, x, y, p0=p0)
Unless you have a particular reason for wanting to fit a Gaussian, why not do a more robust linear least squares fit to a polynomial, e.g.:
pfit = np.polyfit(x, y, 3)
poly = np.poly1d(pfit)
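To compare that polynomial against the data (a sketch; plt, np, and the x, y arrays from the question are assumed):

xs = np.linspace(x.min(), x.max(), 200)
plt.plot(x, y, 'b+', label='data')
plt.plot(xs, poly(xs), 'g-', label='cubic polyfit')
plt.legend()
plt.show()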