I am trying to fit my data with a Gaussian curve. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# The independent variable where the data is measured
x_coord = np.array([-0.1216 , -0.11692308, -0.11224615, -0.10756923, -0.10289231,
-0.09821538, -0.09353846, -0.08886154, -0.08418462, -0.07950769,
-0.07483077, -0.07015385, -0.06547692, -0.0608 , -0.05612308,
-0.05144615, -0.04676923, -0.04209231, -0.03741538, -0.03273846,
-0.02806154, -0.02338462, -0.01870769, -0.01403077, -0.00935385,
-0.00467692, 0. , 0.00467692, 0.00935385, 0.01403077,
0.01870769, 0.02338462, 0.02806154, 0.03273846, 0.03741538,
0.04209231, 0.04676923, 0.05144615, 0.05612308, 0.0608 ,
0.06547692, 0.07015385, 0.07483077, 0.07950769, 0.08418462,
0.08886154, 0.09353846, 0.09821538, 0.10289231, 0.10756923,
0.11224615, 0.11692308])
# The dependent data — nominally f(x_coord)
y = np.array([-0.0221931 , -0.02323915, -0.02414913, -0.0255389 , -0.02652465,
-0.02888672, -0.03075954, -0.03355392, -0.03543005, -0.03839526,
-0.040933 , -0.0456585 , -0.04849097, -0.05038776, -0.0466699 ,
-0.04202133, -0.034239 , -0.02667525, -0.01404582, -0.00122683,
0.01703862, 0.03992694, 0.06704549, 0.11362071, 0.28149172,
0.6649422 , 1. , 0.6649422 , 0.28149172, 0.11362071,
0.06704549, 0.03992694, 0.01703862, -0.00122683, -0.01404582,
-0.02667525, -0.034239 , -0.04202133, -0.0466699 , -0.05038776,
-0.04849097, -0.0456585 , -0.040933 , -0.03839526, -0.03543005,
-0.03355392, -0.03075954, -0.02888672, -0.02652465, -0.0255389 ,
-0.02414913, -0.02323915])
# define a gaussian function to fit the data
def gaussian(x, a, b, c):
    val = a * np.exp(-(x - b)**2 / c**2)
    return val
# fit the data
popt, pcov = optimize.curve_fit(gaussian, x_coord, y, sigma = np.array([0.01] * len(x_coord)))
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, gaussian(x_coord, popt[0], popt[1], popt[2]), 'r:')
The resulting plot shows that the fitting curve is completely wrong.
What should I do to obtain a well fitting curve?
This is actually a very nice question that illustrates that finding the right (local) optimum can be very difficult.
Via the p0 argument, you can give the optimization routine a hint about where you would approximately expect the optimum.
If you start with the initial guess of [1,0,0.1]:
# fit the data
sigma = np.array([0.01] * len(x_coord))
popt, pcov = optimize.curve_fit(gaussian, x_coord, y, sigma=sigma, p0=[1,0,0.1])
You get the following result:
A couple of notes: you forced curve_fit to fit a bell curve without a constant term. This made things a little awkward, because the data dip below zero away from the peak, which a Gaussian with a > 0 can never do.
If you allow an offset d, the model becomes:
# define a gaussian function to fit the data
def gaussian(x, a, b, c, d):
    val = a * np.exp(-(x - b)**2 / c**2) + d
    return val
Fitting this model and plotting the result:
# fit the data
popt, pcov = optimize.curve_fit(gaussian, x_coord, y)
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, gaussian(x_coord, *popt), 'r:')
This looks much more like a reasonable fit, although the Gaussian still does not seem to fit the data well.
The very peaked shape suggests that a Laplacian might fit better:
# define a laplacian function to fit the data
def laplacian(x, a, b, c, d):
    val = a * np.exp(-np.abs(x - b) / c) + d
    return val
# fit the data
popt, pcov = optimize.curve_fit(laplacian, x_coord, y, p0=[1,0,0.01,-0.1])
# plot the data and the fitting curve
plt.plot(x_coord, y, 'b-', x_coord, laplacian(x_coord, *popt), 'r:')
This is the result:
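To put a number on "the Laplacian fits better", one can compare the residual sum of squares (SSE) of the two offset models on the question's data. This is a sketch: the helper name `fit_and_sse` and the Gaussian's starting guess are my own choices, and since the samples are symmetric about x = 0, y is rebuilt from its right half to keep the block short.

```python
import numpy as np
from scipy import optimize

# Same grid as x_coord in the question: 52 points, step 0.1216/26, zero at index 26
x_coord = (np.arange(52) - 26) * (0.1216 / 26)
# Right half of y, starting at the peak; the left half mirrors it
half = np.array([1.0, 0.6649422, 0.28149172, 0.11362071, 0.06704549,
                 0.03992694, 0.01703862, -0.00122683, -0.01404582,
                 -0.02667525, -0.034239, -0.04202133, -0.0466699,
                 -0.05038776, -0.04849097, -0.0456585, -0.040933,
                 -0.03839526, -0.03543005, -0.03355392, -0.03075954,
                 -0.02888672, -0.02652465, -0.0255389, -0.02414913,
                 -0.02323915, -0.0221931])
y = np.concatenate([half[::-1], half[1:26]])

def gaussian(x, a, b, c, d):
    return a * np.exp(-(x - b)**2 / c**2) + d

def laplacian(x, a, b, c, d):
    return a * np.exp(-np.abs(x - b) / c) + d

def fit_and_sse(model, p0):
    """Fit `model` to the data and return (parameters, sum of squared residuals)."""
    popt, _ = optimize.curve_fit(model, x_coord, y, p0=p0)
    return popt, np.sum((y - model(x_coord, *popt))**2)

popt_g, sse_g = fit_and_sse(gaussian, [1, 0, 0.01, -0.03])
popt_l, sse_l = fit_and_sse(laplacian, [1, 0, 0.01, -0.1])
print(f"Gaussian  SSE: {sse_g:.4f}")
print(f"Laplacian SSE: {sse_l:.4f}")
```

Both models have four parameters, so comparing raw SSE is fair here; a smaller value means the curve passes closer to the data points.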
I'm doing a curve fit in Python using scipy.optimize.curve_fit, and the fit itself looks great; however, the parameters it returns don't make sense.
The equation is (ax)^b + cx, but the parameters found are a = -c and b = 1, so the whole equation reduces to ax - ax = 0 for every value of x.
Here is the plot: https://i.stack.imgur.com/fBfg7.png
Here is the experimental raw data I used: https://pastebin.com/CR2BCJji
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# cfu_u and OD_u hold the experimental data from the pastebin link above
xdata = cfu_u
ydata = OD_u
min_cfu = 0.1
max_cfu = 9.1
x_vec = pow(10, np.arange(min_cfu, max_cfu, 0.1))
def func(x, a, b, c):
    return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data',marker='.',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.xscale("log")
plt.gcf().set_size_inches(7, 5)
plt.show()
print(popt)
[ 1.44930871e+03 1.00000000e+00 -1.44930871e+03]
I used the curve_fit function from scipy to fit an exponential curve to some data. The fit looks very good, so that part was a success.
However, the parameters output by the curve_fit function do not make sense: evaluating f(x) with them gives f(x) = 0 for every value of x, which is clearly not what the curve shows.
Modify your model to show what's actually happening:
def func(x: np.ndarray, a: float, b: float, c: float) -> np.ndarray:
    return (a*x)**(1 - b) + (c - a)*x
producing optimized parameters
[3.49003332e-04 6.60420171e-06 3.13366557e-08]
This is likely to be numerically unstable. Try optimizing in the log domain instead.
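One possible reading of "optimizing in the log domain" (my interpretation, not code from the answer): fit log(y) instead of y, and parameterize the coefficients through their logarithms, a = exp(log_a) and c = exp(log_c). With c forced positive, the two terms can no longer cancel for x > 0, and every decade of x carries similar weight. The sketch below uses synthetic, strictly positive data with made-up values a = 2, b = 1.5, c = 50; it does not handle x <= 0 or y <= 0.

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, log_a, b, log_c):
    """log((a*x)**b + c*x) with a = exp(log_a) and c = exp(log_c); needs x > 0."""
    log_x = np.log(x)
    # logaddexp(u, v) = log(exp(u) + exp(v)), computed without overflow
    return np.logaddexp(b * (log_a + log_x), log_c + log_x)

# Synthetic positive data (hypothetical parameters, 2% multiplicative noise)
rng = np.random.default_rng(0)
x = np.geomspace(1, 1e8, 60)
a_true, b_true, c_true = 2.0, 1.5, 50.0
y = ((a_true * x)**b_true + c_true * x) * np.exp(rng.normal(0, 0.02, x.size))

# Residuals are now log(y) - log(model), i.e. relative rather than absolute errors
popt, pcov = curve_fit(log_model, x, np.log(y), p0=[0.0, 2.0, 0.0])
log_a, b, log_c = popt
print(f"a = {np.exp(log_a):.3g}, b = {b:.3g}, c = {np.exp(log_c):.3g}")
```

Note that this changes the loss being minimized (relative instead of absolute errors), which is usually what you want when the data span many orders of magnitude.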
When I run your example (after adding imports, etc.), I get NaNs for popt, and I eventually realized you were allowing a general, real b with negative x. If I fit to the positive x only, I get a popt of [1.89176133e+01 5.66689997e+00 1.29380532e+08]. The fit isn't too bad (see below), but perhaps you need to restrict b to be an integer to fit the whole set. I'm not sure how to do that in SciPy (I assume you would need mixed integer-real optimization, and I haven't investigated whether SciPy supports that).
Code:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
cfu_u, OD_u = np.loadtxt('data.txt', skiprows=1).T
# fit to positive x only
posmask = cfu_u > 0
xdata = cfu_u[posmask]
ydata = OD_u[posmask]
def func(x, a, b, c):
    return (a*x)**b + c*x
popt, pcov = curve_fit(func, xdata, ydata, p0=[1000,2,1])
x_vec = np.geomspace(xdata.min(), xdata.max())
plt.plot(x_vec, func(x_vec, *popt), label = 'curve fit',color='slateblue',linewidth = 2.2)
plt.plot(cfu_u,OD_u,'-',label = 'experimental data', marker='x',markersize=8,color='deepskyblue',linewidth = 1.4)
plt.legend(loc='upper left',fontsize=12)
plt.ylabel("Y",fontsize=12)
plt.xlabel("X",fontsize=12)
plt.yscale("log")
plt.xscale("symlog")
plt.show()
print(popt)
#[1.89176133e+01 5.66689997e+00 1.29380532e+08] (positive-x fit, as reported above)
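The NaNs mentioned in this answer come from raising negative numbers to a non-integer power, which has no real-valued result. A small illustration, independent of the asker's data:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.5, 2.0])

# A non-integer exponent, which curve_fit is free to choose for b
with np.errstate(invalid="ignore"):  # silence the "invalid value" RuntimeWarning
    frac = x ** 1.5
print(frac)  # NaN for the two negative entries

# An integer-valued exponent is well defined for negative x
whole = x ** 2.0
print(whole)

# Masking to x > 0, as in the answer, keeps the fractional power finite
pos = x[x > 0] ** 1.5
print(pos)
```

This is why restricting to positive x, or restricting b to integers, removes the NaNs.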
I am trying to fit a Gaussian to a discrete potential using the astropy.modeling package. Although I assign a negative amplitude to the Gaussian, it returns a null Gaussian, i.e., one with zero amplitude everywhere:
plot(c[0], pot_x, label='Discrete potential')
plot(c[0], g(c[0]), label='Gaussian fit')
legend()
I have the following code lines to perform the fitting:
from astropy.modeling import models, fitting
g_init = models.Gaussian1D(amplitude=-1., mean=0, stddev=1.)
fit_g = fitting.LevMarLSQFitter()
g = fit_g(g_init, c[0], pot_x)
Where
c[0] = array([13.31381488, 13.31944489, 13.32507491, 13.33070493, 13.33633494,
13.34196496, 13.34759498, 13.35322499, 13.35885501, 13.36448503,
13.37011504, 13.37574506, 13.38137507, 13.38700509, 13.39263511,
13.39826512, 13.40389514, 13.40952516, 13.41515517, 13.42078519])
pot_x = array([ -1.72620157, -3.71811187, -6.01282809, -6.98874144,
-8.36645166, -14.31787771, -23.3688849 , -26.14679496,
-18.85970983, -10.73888697, -7.10763373, -5.81176637,
-5.44146953, -5.37165105, -4.6454408 , -2.90307138,
-1.66250349, -1.66096343, -1.8188269 , -1.41980552])
Does anyone have an idea what the problem might be?
Solved: I just had to assign a mean that is in the range of the domain, like 13.35.
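A plausible explanation of why mean=0 fails: with mean 0 and stddev 1, the initial Gaussian evaluated at x of about 13.3 is roughly exp(-88), so the model and all of its parameter derivatives are numerically zero over the data, and the fitter gets no useful direction to move in. A quick check with plain NumPy; the helper `gauss` is hypothetical and only mirrors the Gaussian1D formula, it is not the astropy object itself:

```python
import numpy as np

# Same x range and number of points as c[0] in the question
x = np.linspace(13.31381488, 13.42078519, 20)

def gauss(x, amplitude, mean, stddev):
    # Same functional form as astropy's Gaussian1D
    return amplitude * np.exp(-0.5 * ((x - mean) / stddev)**2)

far = gauss(x, -1.0, 0.0, 1.0)     # initial guess from the question
near = gauss(x, -1.0, 13.35, 1.0)  # mean inside the data range

print(np.abs(far).max())   # below 1e-38: effectively zero everywhere
print(np.abs(near).max())  # close to 1
```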
As I am not familiar with Astropy, I used SciPy. The code below produces the following output:
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
x = np.asarray([13.31381488, 13.31944489, 13.32507491, 13.33070493, 13.33633494,
13.34196496, 13.34759498, 13.35322499, 13.35885501, 13.36448503,
13.37011504, 13.37574506, 13.38137507, 13.38700509, 13.39263511,
13.39826512, 13.40389514, 13.40952516, 13.41515517, 13.42078519])
y = -np.asarray([ -1.72620157, -3.71811187, -6.01282809, -6.98874144,
-8.36645166, -14.31787771, -23.3688849 , -26.14679496,
-18.85970983, -10.73888697, -7.10763373, -5.81176637,
-5.44146953, -5.37165105, -4.6454408 , -2.90307138,
-1.66250349, -1.66096343, -1.8188269 , -1.41980552])
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))
def Gauss(x, a, x0, sigma):
    return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma])
plt.plot(x, y, 'b+:', label='data')
plt.plot(x, Gauss(x, *popt), 'r-', label='fit')
plt.legend()
For simplicity, I reused this answer. I am not entirely certain about the mean and sigma definitions, as I am not used to fitting a Gaussian on a 2D dataset. However, it doesn't really matter, as they are only used to build an approximation that starts the curve_fit algorithm.
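The mean and sigma lines are the y-weighted first and second moments of x: they treat y like an unnormalized probability density. That is presumably why y is negated first, since the weights must be mostly positive, with a baseline near zero, for these estimates to mean anything. A quick check on a synthetic Gaussian with made-up parameters:

```python
import numpy as np

# A known, nonnegative Gaussian sampled well beyond its tails
x = np.linspace(-10, 10, 2001)
true_a, true_x0, true_sigma = 3.0, 1.5, 0.8
y = true_a * np.exp(-(x - true_x0)**2 / (2 * true_sigma**2))

# y-weighted moments, the same formulas used for p0 in the answer
mean = np.sum(x * y) / np.sum(y)
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))
print(mean, sigma)  # close to 1.5 and 0.8
```

On real data with noise, a nonzero baseline, or a truncated peak, these moments are only rough, which is fine for a starting guess.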
I recently got a script running, with help from SO, that fits a Gaussian to my absorption profile. My hope was that things would work fine if I simply replaced the Gauss function with a Voigt one, but this does not seem to be the case, I think mainly because it is a shifted Voigt.
Edit: The profiles are absorption lines that vary in optical thickness. In practice they will be a mix of optically thick and thin features, like the bottom part of this diagram. The current data will be more like the top image, though the bottom may already be flattened a bit. (Also, we only see the left side of the profile, a bit beyond the center.)
For a Gaussian it looks like this; as predicted, the bottom seems less deep than the fit wants it to be, but still quite close. The profile itself should still be a Voigt, though. But now I realize that the central points might throw off the fit, so maybe a weight based on wing position should be added?
I'm mostly wondering whether the shifted function is mis-defined or whether it's my starting values.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
from scipy.special import wofz
x = np.arange(13)
xx = np.linspace(0, 13, 100)
y = np.array([19699.959 , 21679.445 , 21143.195 , 20602.875 , 16246.769 ,
11635.25 , 8602.465 , 7035.493 , 6697.0337, 6510.092 ,
7717.772 , 12270.446 , 16807.81 ])
# weighted arithmetic mean (corrected - check the section below)
#mean = 2.4
sigma = 2.4
gamma = 2.4
def Gauss(x, y0, a, x0, sigma):
    return y0 + a * np.exp(-(x - x0)**2 / (2 * sigma**2))
def Voigt(x, x0, y0, a, sigma, gamma):
    #sigma = alpha / np.sqrt(2 * np.log(2))
    return y0 + a * np.real(wofz((x - x0 + 1j*gamma)/sigma/np.sqrt(2))) / sigma /np.sqrt(2*np.pi)
popt, pcov = curve_fit(Voigt, x, y, p0=[8, np.max(y), -(np.max(y)-np.min(y)), sigma, gamma])
#p0=[8, np.max(y), -(np.max(y)-np.min(y)), mean, sigma])
plt.plot(x, y, 'b+:', label='data')
plt.plot(xx, Voigt(xx, *popt), 'r-', label='fit')
plt.legend()
plt.show()
I may be misunderstanding the model you're using, but I think you would need to include some sort of constant or linear background.
To do that with lmfit (which has Voigt, Gaussian, and many other models built in, and tries very hard to make these interchangeable), I would suggest starting with something like this:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel, VoigtModel, LinearModel, ConstantModel
x = np.arange(13)
xx = np.linspace(0, 13, 100)
y = np.array([19699.959 , 21679.445 , 21143.195 , 20602.875 , 16246.769 ,
11635.25 , 8602.465 , 7035.493 , 6697.0337, 6510.092 ,
7717.772 , 12270.446 , 16807.81 ])
# build model as Voigt + Constant
## model = GaussianModel() + ConstantModel()
model = VoigtModel() + ConstantModel()
# create parameters with initial values
params = model.make_params(amplitude=-1e5, center=8,
sigma=2, gamma=2, c=25000)
# maybe place bounds on some parameters
params['center'].min = 2
params['center'].max = 12
params['amplitude'].max = 0.
# do the fit, print out report with results
result = model.fit(y, params, x=x)
print(result.fit_report())
# plot data, best fit, fit interpolated to `xx`
plt.plot(x, y, 'b+:', label='data')
plt.plot(x, result.best_fit, 'ko', label='fitted points')
plt.plot(xx, result.eval(x=xx), 'r-', label='interpolated fit')
plt.legend()
plt.show()
And, yes, you can simply replace VoigtModel() with GaussianModel() or LorentzianModel() and redo the fit and compare the fit statistics to see which model is better.
For the Voigt model fit, the printed report would be
[[Model]]
(Model(voigt) + Model(constant))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 41
# data points = 13
# variables = 4
chi-square = 17548672.8
reduced chi-square = 1949852.54
Akaike info crit = 191.502014
Bayesian info crit = 193.761811
[[Variables]]
amplitude: -173004.338 +/- 30031.4068 (17.36%) (init = -100000)
center: 8.06574198 +/- 0.16209266 (2.01%) (init = 8)
sigma: 1.96247322 +/- 0.23522096 (11.99%) (init = 2)
c: 23800.6655 +/- 1474.58991 (6.20%) (init = 25000)
gamma: 1.96247322 +/- 0.23522096 (11.99%) == 'sigma'
fwhm: 7.06743644 +/- 0.51511574 (7.29%) == '1.0692*gamma+sqrt(0.8664*gamma**2+5.545083*sigma**2)'
height: -18399.0337 +/- 2273.61672 (12.36%) == '(amplitude/(max(2.220446049250313e-16, sigma*sqrt(2*pi))))*wofz((1j*gamma)/(max(2.220446049250313e-16, sigma*sqrt(2)))).real'
[[Correlations]] (unreported correlations are < 0.100)
C(amplitude, c) = -0.957
C(amplitude, sigma) = -0.916
C(sigma, c) = 0.831
C(center, c) = -0.151
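The fit statistics in this report are what you would compare when swapping VoigtModel for GaussianModel or LorentzianModel: lower AIC and BIC indicate a better trade-off between misfit and number of parameters. A small sketch that recomputes them from chi-square, the number of data points and the number of varied parameters, using the definitions given in the lmfit documentation (the helper name `fit_statistics` is my own):

```python
import numpy as np

def fit_statistics(chisqr, ndata, nvarys):
    """Reduced chi-square, AIC and BIC as defined in lmfit's fit report."""
    neg2_log_like = ndata * np.log(chisqr / ndata)
    redchi = chisqr / (ndata - nvarys)
    aic = neg2_log_like + 2 * nvarys
    bic = neg2_log_like + np.log(ndata) * nvarys
    return redchi, aic, bic

# Numbers taken from the Voigt + constant report above
redchi, aic, bic = fit_statistics(17548672.8, ndata=13, nvarys=4)
print(f"reduced chi-square = {redchi:.1f}, AIC = {aic:.3f}, BIC = {bic:.3f}")
```

These reproduce the reduced chi-square, Akaike and Bayesian lines of the report, so the same arithmetic applies to the Gaussian and Lorentzian reports as well.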
Note that by default gamma is constrained to be the same value as sigma. This constraint can be lifted and gamma made to vary independently with params['gamma'].set(expr=None, vary=True, min=1.e-9). I think that you may not have enough data points in this data set to robustly and independently determine gamma.
The plot for that fit would look like this:
I managed to get something, but it is not very satisfying. If you remove the offset as a parameter and add 20000 directly in the Voigt function, with starting values [8, 126000, 0.71, 2] (the particular values don't matter much), you'll get something like this:
Now, the fit produces a negative value for gamma, which I cannot really justify. I would expect gamma to always be positive, but maybe I'm wrong and it's completely fine.
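In this wofz-based Voigt, gamma is the half-width at half-maximum of the Lorentzian component, so a negative value has no physical meaning. A sketch of one way to forbid it with plain SciPy is to pass `bounds` to `curve_fit`. The synthetic line and its parameters below are made up for illustration; on the real 13-point profile, gamma may simply end up at the bound.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import wofz

def voigt(x, x0, y0, a, sigma, gamma):
    # Same shifted Voigt as in the question: offset y0 plus a scaled profile
    z = (x - x0 + 1j * gamma) / (sigma * np.sqrt(2))
    return y0 + a * np.real(wofz(z)) / (sigma * np.sqrt(2 * np.pi))

# Synthetic, noise-free absorption line (hypothetical parameters)
x = np.linspace(0, 12, 49)
true = [8.0, 24000.0, -1e5, 1.5, 1.0]
y = voigt(x, *true)

# Bounds: amplitude <= 0 (absorption), sigma > 0, gamma >= 0.
# Passing bounds makes curve_fit use the 'trf' method of least_squares.
lower = [-np.inf, -np.inf, -np.inf, 1e-6, 0.0]
upper = [np.inf, np.inf, 0.0, np.inf, np.inf]
p0 = [7.5, 25000.0, -1.5e5, 2.0, 2.0]
# x_scale='jac' is forwarded to least_squares and rescales the very
# different parameter magnitudes (about 1e5 for a versus about 1 for sigma)
popt, pcov = curve_fit(voigt, x, y, p0=p0, bounds=(lower, upper), x_scale='jac')
print(popt)
```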
One thing you could try is to mirror your data so that it's a "positive" peak (removing the background while you're at it) and/or to normalize the values. That might help convergence.
I have no idea why the solver has problems finding an optimum when the offset is a parameter. Maybe you need a different optimizer routine.
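The mirroring and normalization suggested above can be sketched on the question's data as follows. Using the maximum as the background is a crude assumption (a flat continuum); a fitted or sloped baseline would be more careful.

```python
import numpy as np

y = np.array([19699.959, 21679.445, 21143.195, 20602.875, 16246.769,
              11635.25, 8602.465, 7035.493, 6697.0337, 6510.092,
              7717.772, 12270.446, 16807.81])

# Mirror the absorption dip into a positive peak, taking the maximum
# as a stand-in for the background
flipped = y.max() - y
# Normalize to a peak height of 1 so all fit parameters are of order 1
scale = flipped.max()
y_norm = flipped / scale
print(y_norm.round(3))
print(np.argmax(y_norm))  # 9: the deepest point of the original dip
```

After fitting y_norm, an amplitude fitted in normalized units can be converted back by multiplying by `scale` and flipping the sign.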
It may be a better option to use the lmfit package, which is a wrapper over SciPy for fitting nonlinear functions and comes with many prebuilt lineshapes. There is even an example of fitting a Voigt profile.
I'd like to fit a Gaussian to some data that has a roughly Gaussian shape. I'd like to obtain the peak height (A), center position (mu), and standard deviation (sigma), along with 95% confidence intervals for these values.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.stats import norm
# gaussian function
def gaussian_func(x, A, mu, sigma):
    return A * np.exp( - (x - mu)**2 / (2 * sigma**2))
# generate toy data
x = np.arange(50)
y = [ 97.04421053, 96.53052632, 96.85684211, 96.33894737, 96.85052632,
96.30526316, 96.87789474, 96.75157895, 97.05052632, 96.73473684,
96.46736842, 96.23368421, 96.22526316, 96.11789474, 96.41263158,
96.32631579, 96.33684211, 96.44421053, 96.48421053, 96.49894737,
97.30105263, 98.58315789, 100.07368421, 101.43578947, 101.92210526,
102.26736842, 101.80421053, 101.91157895, 102.07368421, 102.02105263,
101.35578947, 99.83578947, 98.28, 96.98315789, 96.61473684,
96.82947368, 97.09263158, 96.82105263, 96.24210526, 95.95578947,
95.84210526, 95.67157895, 95.83157895, 95.37894737, 95.25473684,
95.32842105, 95.45684211, 95.31578947, 95.42526316, 95.30526316]
plt.scatter(x,y)
# initial_guess_of_parameters
# Let's determine these values with a solver or similar.
parameter_initial = np.array([652, 2.9, 1.3])
# estimate optimal parameter & parameter covariance
popt, pcov = curve_fit(gaussian_func, x, y, p0=parameter_initial)
# plot result
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_func(xd, popt[0], popt[1], popt[2])
plt.plot(xd, estimated_curve, label="Estimated curve", color="r")
plt.legend()
plt.savefig("gaussian_fitting.png")
plt.show()
# estimate standard Error
StdE = np.sqrt(np.diag(pcov))
# estimate 95% confidence interval
alpha=0.025
lwCI = popt + norm.ppf(q=alpha)*StdE
upCI = popt + norm.ppf(q=1-alpha)*StdE
# print result
mat = np.vstack((popt,StdE, lwCI, upCI)).T
df=pd.DataFrame(mat,index=("A", "mu", "sigma"),
columns=("Estimate", "Std. Error", "lwCI", "upCI"))
print(df)
Data Plot with Fitted Curve
The data peak and center position seem correct, but the standard deviation is off. Any input is greatly appreciated.
Your scatter does look similar to a Gaussian distribution, but it is not centered around zero. Given the specifics of the Gaussian function, it will be hard to fit a Gaussian nicely to the data in the form you gave. I would therefore propose starting by demeaning the x series:
x = np.arange(0, 50) - 24.5
Next, I would add one additional parameter, the offset, to your Gaussian function. Since the regular Gaussian function always has its tails close to zero, it cannot otherwise fit your scatterplot nicely, whose baseline sits around 96:
def gaussian_function(x, A, mu, sigma, offset):
    return A * np.exp(-np.power((x - mu)/sigma, 2.)/2.) + offset
Next you should define an error_loss_function to minimise:
def error_loss_function(params):
    gaussian = gaussian_function(x, params[0], params[1], params[2], params[3])
    errors = gaussian - y
    return sum(np.power(errors, 2)) # You can also pick a different error loss function!
All that remains is fitting our curve now:
import scipy.optimize
fit = scipy.optimize.minimize(fun=error_loss_function, x0=[2, 0, 0.2, 97])
params = fit.x # A: 6.57592661, mu: 1.95248855, sigma: 3.93230503, offset: 96.12570778
xd = np.arange(x.min(), x.max(), 0.01)
estimated_curve = gaussian_function(xd, params[0], params[1], params[2], params[3])
plt.plot(xd, estimated_curve, label="Estimated curve", color="b")
plt.legend()
plt.show(block=False)
Hopefully this helps. Looks like a fun project, let me know if my answer is not clear.
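Since the question also asks for 95% confidence intervals, which `minimize` does not report directly, here is a sketch of the same four-parameter model fitted with `curve_fit`, whose covariance matrix provides standard errors. It keeps x = 0..49 instead of demeaning (mu absorbs the shift) and uses a Student t quantile with n - p degrees of freedom instead of the normal quantile from the question; both are my choices, and the intervals assume roughly independent, equal-variance errors.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t

def gaussian_offset(x, A, mu, sigma, offset):
    return A * np.exp(-(x - mu)**2 / (2 * sigma**2)) + offset

x = np.arange(50)
y = np.array([97.04421053, 96.53052632, 96.85684211, 96.33894737, 96.85052632,
              96.30526316, 96.87789474, 96.75157895, 97.05052632, 96.73473684,
              96.46736842, 96.23368421, 96.22526316, 96.11789474, 96.41263158,
              96.32631579, 96.33684211, 96.44421053, 96.48421053, 96.49894737,
              97.30105263, 98.58315789, 100.07368421, 101.43578947, 101.92210526,
              102.26736842, 101.80421053, 101.91157895, 102.07368421, 102.02105263,
              101.35578947, 99.83578947, 98.28, 96.98315789, 96.61473684,
              96.82947368, 97.09263158, 96.82105263, 96.24210526, 95.95578947,
              95.84210526, 95.67157895, 95.83157895, 95.37894737, 95.25473684,
              95.32842105, 95.45684211, 95.31578947, 95.42526316, 95.30526316])

# Starting values read off the scatter: a bump of ~6 above a ~96 baseline near x ~ 26
p0 = [6.0, 26.0, 4.0, 96.0]
popt, pcov = curve_fit(gaussian_offset, x, y, p0=p0)
std_err = np.sqrt(np.diag(pcov))

# 95% intervals: estimate +/- t quantile * standard error
dof = len(x) - len(popt)
q = t.ppf(0.975, dof)
for name, est, se in zip(("A", "mu", "sigma", "offset"), popt, std_err):
    print(f"{name:>6}: {est:9.4f}  95% CI [{est - q*se:.4f}, {est + q*se:.4f}]")
```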
What is the problem with my code for fitting the curve?
I've written some code to fit my data with a Gaussian distribution. However, I got wrong values for a, b, and c, defined at the beginning of the code. Could you give me some advice to fix that problem?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a*np.exp(-(x-b)**2/(2*c**2))
# skip the 18 header lines of the .xvg file; column 0 is x, column 1 is y
data = np.loadtxt("angdist1.xvg", skiprows = 18, dtype = float)
x = data[:, 0]
y = data[:, 1]
popt, pcov = curve_fit(func, x, y)
plt.plot(x, y, 'b.')  # original data, so the legend has two entries
plt.plot(x, func(x, *popt), color = 'red', linewidth=2)
plt.legend(['Original','fitting'], loc=0)
plt.show()
You did not provide initial guesses for your variables a, b, and c. scipy.optimize.curve_fit() will make the indefensible choice of silently assuming that you wanted initial values of a=b=c=1. Depending on your data, that could be so far off as to prevent the method from finding any solution at all.
The solution is to give initial values for the variables that are close. They don't have to be perfect. For example,
ainit = y.sum()   # amplitude is within 10x of the integral
binit = x.mean()  # centroid is near the mean x value
cinit = x.std()   # width is of the order of the spread of x
popt, pcov = curve_fit(func, x, y, p0=[ainit, binit, cinit])
might give you a better result.
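A sketch of that advice on a synthetic, noise-free peak (made-up parameters a = 100, b = 40, c = 5) placed far from the implicit starting point a = b = c = 1. The starting values are the rough ones suggested above; the second fit, without p0, is only there for comparison, and its outcome is not guaranteed either way.

```python
import warnings
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-(x - b)**2 / (2 * c**2))

# Synthetic peak far from the default guess
x = np.linspace(0, 100, 201)
y = func(x, 100.0, 40.0, 5.0)

# Rough starting values, as suggested above
p0 = [y.sum(), x.mean(), x.std()]
popt, pcov = curve_fit(func, x, y, p0=p0)
print(popt)  # close to [100, 40, 5] (c may come out with either sign)

# Without p0, curve_fit starts from [1, 1, 1]; this may fail outright
# or converge to a poor solution
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    try:
        popt_default, _ = curve_fit(func, x, y)
        print(popt_default)
    except (RuntimeError, ValueError) as err:
        print("default start failed:", err)
```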