I have a set of points in the first quadrant that look like a Gaussian, and I am trying to fit them using a Gaussian in Python. My code is as follows:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
x = np.asarray([37,69,157,238,274,319,391,495,533,626,1366,1855,2821,3615,4130,4374,6453,6863,7021,
                7951,8646,9656,10464,11400])
y = np.asarray([1.77,1.67,1.65,1.17,1.34,1.46,0.75,1,0.8,1.02,0.65,0.69,0.44,0.44,0.55,0.43,0.75,0.27,0.26,
                0.44,0.04,0.44,0.26,0.04])
n = 24  # the number of data points
mean = sum(x*y)/n  # note this correction
sigma = np.sqrt(sum(y*(x-mean)**2)/n)  # note this correction
def gaus(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt, pcov = curve_fit(gaus, x, y, p0=None, sigma=None)  # alternative: p0=[1,mean,sigma]
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
And the output is this figure:
http://s2.postimg.org/wevggkc95/Workspace_1_022.png
Why are all the red points coming out below the data? Also note that I am interested in a half-Gaussian, as my y values are large at first and then decrease like one side of the Gaussian bell. Can anyone tell me how to fit this curve in Python, in case it cannot be fit to a Gaussian? In other words, I want code to fit the half (left-side) Gaussian to my points (in the first quadrant only). Note that my points cannot be fit with an exponentially decreasing curve; I tried that earlier, and it does not fit well at lower x values.
Apparently your data do not fit well or easily to a Gaussian function. You use the default initial guess p0 = [1,1,1], which is so far from any kind of optimal choice that curve_fit gives up before it gets started (check the values of popt=[1,1,1] and pcov=[inf, inf, inf]). You could try better guesses (e.g. p0 = [2, 0, 2000]), but on my system it won't converge: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
To fit a "half-Gaussian", don't float the centre position x0 (just leave it equal to 0):
def gaus(x,a,sigma):
return a*exp(-(x)**2/(2*sigma**2))
p0 = [1.2, 4000]
popt,pcov = curve_fit(gaus,x,y,p0=p0)
Unless you have a particular reason for wanting to fit a Gaussian, why not do a more robust linear least squares fit to a polynomial, e.g.:
pfit = np.polyfit(x, y, 3)
poly = np.poly1d(pfit)
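For instance, you might evaluate and plot the polynomial on a dense grid; a sketch reusing the x and y arrays from the question:
xs = np.linspace(x.min(), x.max(), 200)  # dense grid for a smooth curve
plt.plot(x, y, 'b+', label='data')
plt.plot(xs, poly(xs), 'g-', label='cubic fit')
plt.legend()
plt.show()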
So I've fitted a Gaussian curve to some very noisy data. I was wondering how I'd go about finding the coordinates of the peak of the Gaussian line?
import numpy as np
import scipy.optimize
def fit_func(x, a, mu, sig, m, c):
    gauss = a*np.exp(-(x-mu)**2/(2*sig**2))
    line = m*x + c
    return gauss + line
initial_guess = [160, mean, sd, 2, 100]  # mean and sd are estimated from the data beforehand
po, po_cov = scipy.optimize.curve_fit(fit_func, x, y, initial_guess)
This is the code I've used to fit that Gaussian. Would I have to add more to this? Or just something else?
For a plain Gaussian, the peak is located at the mean, so the peak coordinates would be (mu, a) for an expression like gauss = a*exp(-(x-mu)**2/(2*sig**2)). In your case you have also added a linear term, so the function is unbounded (it grows without limit towards plus or minus infinity, depending on the sign of m) and has no global maximum. There will, however, be a local maximum near the mean of the Gaussian; you can find an analytical expression for it by taking the derivative of the whole expression and setting it equal to 0.
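A numerical version of that route, sketched under the assumption that po holds the fitted parameters [a, mu, sig, m, c] from the curve_fit call above (np is numpy):
from scipy.optimize import fsolve
a, mu, sig, m, c = po
def d_fit_func(x):
    # derivative of gauss + line; its root near mu locates the local maximum
    return -a*(x-mu)/sig**2 * np.exp(-(x-mu)**2/(2*sig**2)) + m
x_peak = fsolve(d_fit_func, x0=mu)[0]  # start the search at the Gaussian centre
y_peak = fit_func(x_peak, a, mu, sig, m, c)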
As you have a vector of the fitted values, you can use np.argmax to do that. Include import numpy as np at the beginning of your code and use:
fitted = fit_func(x,po[0],po[1],po[2],po[3],po[4]) # fitted curve
max_at = x[np.argmax(fitted)] # 125.0
plt.plot(x, fitted, label='Fit results')
plt.axvline(x = max_at, color='red')
plt.show()
Apparently the fitted curve achieves its (local) maximum at 125.0.
I have tried to implement a Gaussian fit in Python with the given data. However, I am unable to obtain the desired fit. Any suggestions would help.
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
xData = np.asarray([-7.66E-06,-7.60E-06,-7.53E-06,-7.46E-06,-7.40E-06,-7.33E-06,-7.26E-06,-7.19E-06,-7.13E-06,-7.06E-06,-6.99E-06,
                    -6.93E-06,-6.86E-06,-6.79E-06,-6.73E-06,-6.66E-06,-6.59E-06,-6.52E-06,-6.46E-06,-6.39E-06,-6.32E-06,-6.26E-06,-6.19E-06,
                    -6.12E-06,-6.06E-06,-5.99E-06,-5.92E-06,-5.85E-06,-5.79E-06,-5.72E-06])
yData = np.asarray([17763,2853,3694,4203,4614,4984,5080,7038,6905,8729,11687,13339,14667,16175,15953,15342,14340,15707,13001,10982,8867,6827,5262,4760,3869,3232,2835,2746,2552,2576])
# plot the data points
plt.plot(xData, yData, 'bo', label='experimental_data')
plt.show()
# define the Gaussian function we want to fit to the data
n = len(xData)
mean = sum(xData*yData)/n
sigma = np.sqrt(sum(yData*(xData-mean)**2)/n)
def Gauss(x, I0, x0, sigma, Background):
    return I0*np.exp(-(x-x0)**2/(2*sigma**2)) + Background
popt, pcov = curve_fit(Gauss, xData, yData, p0=[1, mean, sigma, 0.0])
print(popt)
plt.plot(xData,yData,'b+:',label='data')
plt.plot(xData,Gauss(xData,*popt),'ro:',label='fit')
plt.legend()
plt.title('Gaussian_Fit')
plt.xlabel('x-axis')
plt.ylabel('PL Intensity')
plt.show()
When computing mean and sigma, divide by sum(yData), not n.
mean = sum(xData*yData)/sum(yData)
sigma = np.sqrt(sum(yData*(xData-mean)**2)/sum(yData))
The reason is that, say for mean, you need to compute the average of xData weighted by yData. For this, you need to normalize yData to sum to 1; i.e., you multiply xData by yData/sum(yData) and take the sum.
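Equivalently, numpy's weighted average does this in one step (a sketch using the xData and yData arrays from the question):
mean = np.average(xData, weights=yData)
sigma = np.sqrt(np.average((xData - mean)**2, weights=yData))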
With the correction by j1-lee, and with the first point removed (it clearly doesn't agree with the Gaussian model), the fit looks like this:
Removing the bin that clearly doesn't belong in the fit reduces the fitted width by some 20% and the (fitted) noise-to-background ratio by some 30%. The mean is only marginally affected.
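A minimal sketch of that refit, assuming the arrays, the Gauss function, and np.average-based moments described above:
xFit, yFit = xData[1:], yData[1:]  # drop the outlying first point
mean = np.average(xFit, weights=yFit)
sigma = np.sqrt(np.average((xFit - mean)**2, weights=yFit))
popt, pcov = curve_fit(Gauss, xFit, yFit, p0=[1, mean, sigma, 0.0])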
I am having a hard time trying to understand why my Gaussian fit to a set of data (ydata) does not work well if I shift the interval of x-values corresponding to that data (xdata1 to xdata2). The Gaussian is written as:
$$f(x) = \frac{A}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where A is just an amplitude factor. Changing some of the values of the data, it is easy to make it work for both cases, but one can also easily find cases in which it does not work well for xdata1 and in which the covariance of the parameters is not estimated.
I am using scipy.optimize.curve_fit in Spyder with Python 3.7.1 on Windows 7.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
xdata1 = np.linspace(-9,4,20, endpoint=True) # works fine
xdata2 = xdata1+2
ydata = np.array([8,9,15,12,14,20,24,40,54,94,160,290,400,420,300,130,40,10,8,4])
def gaussian(x, amp, mean, sigma):
    return amp*np.exp(-(((x-mean)**2)/(2*sigma**2)))/(sigma*np.sqrt(2*np.pi))
popt1, pcov1 = curve_fit(gaussian, xdata1, ydata)
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata)
fig, ([ax1, ax2]) = plt.subplots(nrows=1, ncols=2,figsize=(9, 4))
ax1.plot(xdata1, ydata, 'b+:', label='xdata1')
ax1.plot(xdata1, gaussian(xdata1, *popt1), 'r-', label='fit')
ax1.legend()
ax2.plot(xdata2, ydata, 'b+:', label='xdata2')
ax2.plot(xdata2, gaussian(xdata2, *popt2), 'r-', label='fit')
ax2.legend()
The problem is that your second attempt at fitting a Gaussian is getting stuck in a local minimum while searching parameter space: curve_fit is a wrapper around scipy's least-squares routines (Levenberg-Marquardt by default), which are local, gradient-based optimizers and are therefore liable to get stuck in local minima.
You should try providing reasonable starting parameters (by using the p0 argument of curve_fit) to avoid this:
# ... your code
y_max = np.max(ydata)
max_pos = xdata2[np.argmax(ydata)]  # x position of the data maximum
initial_guess = [y_max, max_pos, 1]  # amplitude, mean, std
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=initial_guess)
As you can see, this provides a reasonable fit:
You should write a function which can provide reasonable estimates of the starting parameters. Here I just found the maximum y value and its position and used these to seed the initial parameters. I've found this works well for fitting normal distributions, but you could consider other methods.
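One possible shape for such an estimator, sketched under the assumption that the peak dominates the data:
def estimate_p0(x, y):
    amp = np.max(y)                     # peak height as the amplitude guess
    mu = x[np.argmax(y)]                # x position of the peak as the mean guess
    sig = (np.max(x) - np.min(x))/6.0   # crude width guess: one sixth of the x range
    return [amp, mu, sig]
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata, p0=estimate_p0(xdata2, ydata))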
Edit:
You can also solve the problem by scaling the amplitude: the amplitude is so large that the parameter space is distorted, and the optimizer simply follows the direction of greatest change in the amplitude while effectively ignoring the sigma. Consider the following plot in parameter space (colour is the sum of the squared residuals of the fit for the given parameters, and the white cross shows the optimal solution):
Make sure to take note of the different scales for the x and y axes.
One needs to make a large number of 'unit'-sized steps in y (amplitude) to reach the minimum from the point (x, y) = (0, 0), whereas less than one 'unit'-sized step suffices to reach the minimum in x (sigma). The algorithm simply takes steps in amplitude, as this is the steepest gradient. When it reaches the amplitude which minimises the cost function, it stops, since it appears to have converged, and makes little or no change to the sigma parameter.
One way to fix this is to scale your ydata to un-distort the parameter space: divide your ydata by 100 and you will see your fit works without providing any starting parameters!
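A sketch of that rescaling trick; because the amplitude enters the model linearly, only the first fitted parameter needs to be scaled back afterwards:
scale = 100.0
popt2, pcov2 = curve_fit(gaussian, xdata2, ydata/scale)
popt2[0] *= scale  # undo the scaling on the amplitude; mean and sigma are unaffected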
I'm trying to fit data with a Gaussian.
The raw data itself displays a very obvious peak.
When I attempt fitting using curve_fit, the fit identifies the peak but it does not have a curved top.
I am trying to fit the data now with spinmob's fitter as well. However, this fitting just gives a straight line.
I've tried changing several parameters of the fitter, the Gaussian function definition, and the initial parameters for the fit but nothing seems to work.
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import spinmob as s
x = x30    # x30 and ydata are defined earlier in my script
y = ydata
def gaussian(x, A, mu, sig):  # see http://mathworld.wolfram.com/GaussianFunction.html
    return A/(sig*np.sqrt(2*np.pi)) * np.exp(-np.power(x-mu, 2)/(2*np.power(sig, 2)))
popt, pcov = curve_fit(gaussian, x, y, p0=[1, 7.688, 0.005])
FWHM = 2*np.sqrt(2*np.log(2))*popt[2]
print("FWHM: {}".format(FWHM))
plt.plot(x, y, 'bo', label='data')
plt.plot(x, gaussian(x, *popt), 'r+-', label='fit')
plt.legend()
fitter = s.data.fitter()
fitter.set(subtract_bg=True, plot_guess_zoom=True)
fitter.set_functions(f=gaussian, p='A=1,mu=8.688,sig=0.001')
fitter.set_data(x, y, eydata=0.03)
fitter.fit()
The curve_fit returns this plot:
[Curve_fit plot]
The spinmob fitter gives this:
[Spinmob Fitter Plot]
Assuming that spinmob actually uses scipy.optimize.curve_fit under the hood, I would guess (sorry) that the problem is that the initial values you give it are so far off that it cannot possibly find a solution.
For sure, A=1 is not a very good guess for either scipy.optimize.curve_fit() or spinmob.fitter(). The peak is definitely negative, and you should be guessing a value more like -0.1 than +1. In fact, you could probably assert that A must be < 0.
The initial value of 7.688 for mu that you give to curve_fit() is pretty good, and will allow a solution. I do not know whether it is a typo or not, but the initial value of 8.688 for mu that you give to spinmob.fitter() is very far off (that is, way outside the data range), and the fit will never be able to refine its way to the correct solution from there.
Initial values matter for curve-fitting and poor initial values can lead to bad results.
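Concretely, applying those suggested guesses to the curve_fit call from the question might look like this sketch (negative amplitude, mu kept inside the data range):
popt, pcov = curve_fit(gaussian, x, y, p0=[-0.1, 7.688, 0.005])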
It might be viewed by some as a shameless plug, but allow me to encourage you to try lmfit (https://lmfit.github.io/lmfit-py/) (I am a lead author) for this kind of problem. Lmfit replaces the array of parameter values with named Parameter objects for better organization of fits. It also has a built-in Gaussian model (which also calculates FWHM, including an uncertainty). That is, with Lmfit, your script might look like:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel
from lmfit.lineshapes import gaussian
# create fake data that looks like yours
xdata = 7.670 + np.arange(41)*0.0010
ydata = gaussian(xdata, amplitude=-0.196, center=7.6881, sigma=0.001)
ydata += np.random.normal(size=41, scale=10.0)
# create gaussian model
gmodel = GaussianModel()
# fit data, giving initial values for amplitude, center, and sigma
result = gmodel.fit(ydata, x=xdata, amplitude=-0.1, center=7.688, sigma=0.005)
# show results
print(result.fit_report())
plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, result.best_fit, 'r+-', label='fit')
plt.legend()
plt.show()
This will print out a report like
[[Model]]
Model(gaussian)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 21
# data points = 41
# variables = 3
chi-square = 5114.87632
reduced chi-square = 134.602009
Akaike info crit = 203.879794
Bayesian info crit = 209.020510
[[Variables]]
sigma: 9.7713e-04 +/- 1.5456e-04 (15.82%) (init = 0.005)
center: 7.68822727 +/- 1.5484e-04 (0.00%) (init = 7.688)
amplitude: -0.19273945 +/- 0.02643400 (13.71%) (init = -0.1)
fwhm: 0.00230096 +/- 3.6396e-04 (15.82%) == '2.3548200*sigma'
height: -78.6917624 +/- 10.7894236 (13.71%) == '0.3989423*amplitude/max(1.e-15, sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(sigma, amplitude) = -0.577
and produce a plot of the data and best fit which should be close to what you are trying to do.
I am trying to smooth a noisy one-dimensional physical signal, y, while retaining correspondence between the signal's amplitude and its units. I'm applying a Gaussian kernel and normalizing the Gaussian itself based on its own integral.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
# Gaussian kernel (not normalized here)
def gaussian(x, sigma):
    return np.exp(-(x/sigma)**2/2)
# convolution
def smooth(y, box_pts):
    x = np.linspace(-box_pts/2., box_pts/2., box_pts + 1)  # Gaussian centred on 0
    std_norm = 3.  # 3. is an arbitrary value for normalizing the sigma
    sigma = box_pts/std_norm
    integral = quad(gaussian, x[0], x[-1], args=(sigma,))[0]
    box = gaussian(x, sigma)/integral
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth
box_size = 10
length = 100
y = np.random.randn(length)
y_smooth = smooth(y, box_size)
plt.plot(y)
plt.show()
plt.plot(y_smooth)
plt.show()
In this example, y is the signal I want to smooth and box_pts is the width of the region over which the Gaussian kernel is applied as it is translated. To normalize my convolution, I've simply divided the kernel by its own integral. Since the Gaussian has a finite width (given by box_pts) and is effectively zero outside this interval compared to the physical axis of y, I am normalizing the Gaussian over the region where it is non-zero, as opposed to $-\infty$ to $\infty$. This method appears simple enough and the normalization appears to work, but is dividing by the kernel's own integral the "right" or suggested approach when normalizing convolutions/kernels?
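For reference, a common discrete alternative is to normalize the sampled kernel by its sum rather than its continuous integral, so the weights sum to exactly 1; a sketch reusing the names from the smooth function above:
box = gaussian(x, sigma)
box = box/box.sum()  # discrete weights now sum to exactly 1
y_smooth = np.convolve(y, box, mode='same')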
Based on a simple example, my integral-based normalization appears to keep the integral about the same, but the amplitudes are no longer consistent, as visualized below. This is a bit problematic, as I want to retain a smoothed signal while removing the noise:
Original:
Smoothed:
I get better amplitudes when I decrease box_size relative to length, but this keeps the signal noisy. Decreasing std_norm also appears to help reduce noise, but it influences the amplitude. My question boils down to how one chooses optimal std_norm and box_size values for a given noisy signal y of size length, such that the noise is reduced while the output stays physically sound (i.e., the smoothed signal's amplitude is reasonably consistent with the original signal, and the integral under the smoothed curve is fairly similar too). I have applied a Gaussian kernel here, but I am interested in a general approach for any arbitrary kernel.