I have tried to implement a Gaussian fit in Python to the data below, but I am unable to obtain the desired fit. Any suggestions would help.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
from numpy import asarray as ar, exp  # scipy.asarray/scipy.exp were removed in newer SciPy; use the NumPy versions
xData=ar([-7.66E-06,-7.60E-06,-7.53E-06,-7.46E-06,-7.40E-06,-7.33E-06,-7.26E-06,-7.19E-06,-7.13E-06,-7.06E-06,-6.99E-06,
-6.93E-06,-6.86E-06,-6.79E-06,-6.73E-06,-6.66E-06,-6.59E-06,-6.52E-06,-6.46E-06,-6.39E-06,-6.32E-06,-6.26E-06,-6.19E-06,
-6.12E-06,-6.06E-06,-5.99E-06,-5.92E-06,-5.85E-06,-5.79E-06,-5.72E-06])
yData=ar([17763,2853,3694,4203,4614,4984,5080,7038,6905,8729,11687,13339,14667,16175,15953,15342,14340,15707,13001,10982,8867,6827,5262,4760,3869,3232,2835,2746,2552,2576])
#plot the data points
plt.plot(xData,yData,'bo',label='experimental_data')
plt.show()
# define the Gaussian function we want to fit
n = len(xData)
mean = sum(xData*yData)/n
sigma = np.sqrt(sum(yData*(xData-mean)**2)/n)
def Gauss(x, I0, x0, sigma, Background):
    return I0*exp(-(x-x0)**2/(2*sigma**2)) + Background
popt,pcov = curve_fit(Gauss,xData,yData,p0=[1,mean,sigma, 0.0])
print(popt)
plt.plot(xData,yData,'b+:',label='data')
plt.plot(xData,Gauss(xData,*popt),'ro:',label='fit')
plt.legend()
plt.title('Gaussian_Fit')
plt.xlabel('x-axis')
plt.ylabel('PL Intensity')
plt.show()
When computing mean and sigma, divide by sum(yData), not n.
mean = sum(xData*yData)/sum(yData)
sigma = np.sqrt(sum(yData*(xData-mean)**2)/sum(yData))
The reason is that, say for the mean, you need to compute the average of xData weighted by yData. For this, you need to normalize yData so it sums to 1, i.e. multiply xData by yData/sum(yData) and take the sum.
With the correction by j1-lee and removing the first point, which clearly doesn't agree with the Gaussian model, the fit looks like this:
Removing the bin that clearly doesn't belong in the fit reduces the fitted width by some 20% and the (fitted) noise-to-background ratio by some 30%. The mean is only marginally affected.
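A minimal sketch of the corrected setup (dropping the first point as suggested; the amplitude and background guesses taken from the data are my own choice):

xData, yData = xData[1:], yData[1:]  # drop the first, outlying point
mean = sum(xData*yData)/sum(yData)
sigma = np.sqrt(sum(yData*(xData-mean)**2)/sum(yData))
popt, pcov = curve_fit(Gauss, xData, yData, p0=[max(yData), mean, sigma, min(yData)])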
I am trying to fit a curve over the histogram of a Poisson distribution that looks like this.
I have modified the fit function so that it resembles a Poisson distribution, with the parameter t as a variable, but the result returned by curve_fit cannot be plotted and I am not sure why.
def histo(bsize):
    N = bsize
    # bin width (dt is the array of measured time intervals, defined elsewhere)
    bw = (dt.max()-dt.min())/(N-1.)
    bin1 = dt.min() + bw*np.arange(N)
    # define the array to hold the occurrence count
    bincount = np.array([])
    for bin in bin1:
        count = np.where((dt>=bin)&(dt<bin+bw))[0].size
        bincount = np.append(bincount, count)
    # bin centers
    binc = bin1 + 0.5*bw
    plt.figure()
    plt.plot(binc, bincount, drawstyle='steps-mid')
    plt.xlabel("Interval [ticks]")
    plt.ylabel("Frequency")
histo(30)
plt.xlim(0,.5e8)
plt.ylim(0,25000)
import numpy as np
from scipy.optimize import curve_fit
delta_t = 1.42e7
def func(x, t):
    return t * np.exp(-delta_t/t)
popt, pcov = curve_fit(func, np.arange(0, .5e8), histo(30))
plt.plot(popt)
The problem with your code is that you do not know what the return values of curve_fit are: they are the fitted parameters and their covariance matrix, not something you can plot directly.
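What you would plot instead is the model evaluated with the fitted parameters. A minimal sketch (with two caveats about the question's code: histo would have to end with return binc, bincount, which the posted version does not, and func as posted never actually uses x, so it only fits a constant):

binc, bincount = histo(30)            # requires histo to return binc, bincount
popt, pcov = curve_fit(func, binc, bincount)
plt.plot(binc, func(binc, *popt))     # plot the fitted model, not popt itself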
Binned Least-Squares Fit
In general you can get everything much, much more easily:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.special import factorial
from scipy.stats import poisson
# get poisson deviated random numbers
data = np.random.poisson(2, 1000)
# the bins should be of integer width, because poisson is an integer distribution
bins = np.arange(11) - 0.5
entries, bin_edges, patches = plt.hist(data, bins=bins, density=True, label='Data')
# calculate bin centers
bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1])
def fit_function(k, lamb):
    '''poisson function, parameter lamb is the fit parameter'''
    return poisson.pmf(k, lamb)
# fit with curve_fit
parameters, cov_matrix = curve_fit(fit_function, bin_centers, entries)
# plot poisson-deviation with fitted parameter
x_plot = np.arange(0, 15)
plt.plot(
    x_plot,
    fit_function(x_plot, *parameters),
    marker='o', linestyle='',
    label='Fit result',
)
plt.legend()
plt.show()
This is the result:
Unbinned Maximum-Likelihood Fit
An even better possibility would be to not use a histogram at all
and instead to carry out a maximum-likelihood fit.
But on closer examination even this is unnecessary, because the
maximum-likelihood estimator for the parameter of the Poisson distribution is the arithmetic mean.
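For a sample stored in data, that is just one line (shown here only as a cross-check against the fits above):

lamb = data.mean()  # maximum-likelihood estimate of the Poisson parameter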
However, if you have other, more complicated PDFs, you can use this as example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.special import factorial
from scipy import stats
def poisson(k, lamb):
    """poisson pmf, parameter lamb is the fit parameter"""
    return (lamb**k / factorial(k)) * np.exp(-lamb)
def negative_log_likelihood(params, data):
    """
    The negative log-likelihood function
    """
    lnl = -np.sum(np.log(poisson(data, params[0])))
    return lnl
def negative_log_likelihood(params, data):
    '''better alternative using scipy'''
    return -stats.poisson.logpmf(data, params[0]).sum()
# get poisson deviated random numbers
data = np.random.poisson(2, 1000)
# minimize the negative log-Likelihood
result = minimize(negative_log_likelihood,  # function to minimize
                  x0=np.ones(1),            # start value
                  args=(data,),             # additional arguments for function
                  method='Powell',          # minimization method, see docs
                  )
# result is a scipy optimize result object, the fit parameters
# are stored in result.x
print(result)
# plot poisson-distribution with fitted parameter
x_plot = np.arange(0, 15)
plt.plot(
    x_plot,
    stats.poisson.pmf(x_plot, result.x),
    marker='o', linestyle='',
    label='Fit result',
)
plt.legend()
plt.show()
Thank you for the wonderful discussion!
You might want to consider the following:
1) Instead of computing "poisson", compute "log poisson", for better numerical behavior.
2) Instead of fitting "lamb" directly, fit its logarithm (call it "log_mu"), to keep the fit from wandering into negative values of "mu".
So:
def log_poisson(k, log_mu): return k*log_mu - loggamma(k+1) - math.exp(log_mu)
where loggamma is the scipy.special.loggamma function.
Actually, in the above fit, the loggamma term only adds a constant offset to the function being minimized, so one can just do:
def log_poisson_(k, log_mu): return k*log_mu - math.exp(log_mu)
NOTE: log_poisson_() is not the same as log_poisson(), but when used for minimization in the manner above it will give the same fitted minimum (the same value of mu, up to numerical issues). The value of the function being minimized will be offset, but one doesn't usually care about that anyway.
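Putting both suggestions together, a minimal sketch of the reparameterized fit (the variable names are mine; it reuses the data and minimize setup from the answer above):

import numpy as np
from scipy.optimize import minimize
from scipy.special import loggamma

def negative_log_likelihood(params, data):
    log_mu = params[0]  # fit the log of mu, so mu = exp(log_mu) stays positive
    return -np.sum(data*log_mu - loggamma(data + 1) - np.exp(log_mu))

data = np.random.poisson(2, 1000)
result = minimize(negative_log_likelihood, x0=np.zeros(1), args=(data,), method='Powell')
print(np.exp(result.x))  # back-transform to get the fitted mu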
I want to fit the Laplace distribution to specific data. But I want the mean to be equal to 0.
I believe that fitting with scipy.stats and then simply overriding the mean (loc) parameter with zero afterwards is not the logical solution.
Is there any better solution? Thanks in advance.
from scipy.stats import laplace
import numpy as np
import matplotlib.pyplot as plt
def fit_laplace(arr, axes):
    params = laplace.fit(arr)
    x = np.linspace(min(arr), max(arr), 100)
    print("PARAMS: ", params)
    pdf_fitted = laplace.pdf(x, 0, params[1])
    axes.plot(x, pdf_fitted, color='red')
One approach is to first transform your data with a z-transform and then fit the distribution.
To constrain the mean to be zero, use floc=0 in the call of the fit method:
params = laplace.fit(arr, floc=0)
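Adapted to the fit_laplace helper from the question, that looks like this (a minimal sketch, keeping the question's names):

def fit_laplace(arr, axes):
    loc, scale = laplace.fit(arr, floc=0)  # loc is fixed at 0; only the scale is estimated
    x = np.linspace(min(arr), max(arr), 100)
    axes.plot(x, laplace.pdf(x, loc, scale), color='red')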
I want a normal curve to fit the histogram I already have.
navf2 is a list of normalized random numbers and the histogram is based on those, and I want a curve to show the general trend of the histogram.
while len(navf2) < 252:
    number = np.random.normal(0, 1, None)
    navf2.append(number)
bin_edges=np.arange(70,130,1)
plt.style.use(["dark_background",'ggplot'])
plt.hist(navf2, bins=bin_edges, alpha=1)
plt.ylabel("Frequency of final NAV")
plt.xlabel("Ranges")
ymin=0
ymax=100
plt.ylim([ymin,ymax])
plt.show()
Here you go:
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
# create raw data
data = np.random.uniform(size=252)
# distribution fitting
mu, sigma = norm.fit(data)
# evaluate the fitted pdf on a grid
x = np.linspace(-0.5,1.5,100)
y = norm.pdf(x, loc=mu, scale=sigma)
# plot data
plt.plot(x, y,'r-')
plt.hist(data, density=1, alpha=1)
plt.show()
Output:
Here is another solution, using your code from the question as a starting point. We can achieve the expected result without the scipy library. We have to do three things: compute the mean of the data set, compute its standard deviation, and create a function that generates the normal (Gaussian) curve.
To compute the mean we can use the function within the numpy library, i.e. mu = np.mean(your_data_set_here).
The standard deviation of the set is the square root of the average of the squared deviations from the mean (https://en.wikipedia.org/wiki/Standard_deviation). We can express it in code as follows, using the numpy library again:
data_set = np.asarray([])  # some data set, as a NumPy array so the arithmetic below is vectorized
sigma = np.sqrt(1/len(data_set) * sum((data_set - mu)**2))
Finally, we have to build the function for the normal (Gaussian) curve (https://en.wikipedia.org/wiki/Gaussian_function); it relies on both the mean (mu) and the standard deviation (sigma), so we will use those as parameters in our function:
def Gaussian(x, sigma, mu):  # sigma is the standard deviation and mu is the mean
    return (1/(np.sqrt(2*np.pi)*sigma)) * np.exp(-(x-mu)**2/(2*sigma**2))
Putting it all together looks like this:
import numpy as np
import matplotlib.pyplot as plt
navf2 = []
while len(navf2) < 252:
    number = np.random.normal(0, 1, None)  # values cluster around 0, so the question's 70-130 bin range won't work
    navf2.append(number)
navf2 = np.asarray(navf2)  # convert to a NumPy array so the vectorized arithmetic below works
mu = np.mean(navf2) #the avg of all values in navf2
sigma = np.sqrt(1/(len(navf2))*sum((navf2-mu)**2)) # standard deviation of navf2
x_vals = np.arange(min(navf2), max(navf2), 0.001)  # a fine x grid spanning the data, used to build the curve
gauss = []  # store values for the normal curve here
def Gaussian(x, sigma, mu):  # defining the normal curve
    return (1/(np.sqrt(2*np.pi)*sigma)) * np.exp(-(x-mu)**2/(2*sigma**2))
for val in x_vals:
    gauss.append(Gaussian(val, sigma, mu))
plt.style.use(["dark_background",'ggplot'])
plt.hist(navf2, density = 1, alpha=1) # add density = 1 to fix the scaling issues
plt.ylabel("Frequency of final NAV")
plt.xlabel("Ranges")
plt.plot(x_vals,gauss)
plt.show()
Here is a picture of an output:
Hope this helps; I tried to keep it as close to your original code as possible!
I have a set of points in the first quadrant that look like a Gaussian, and I am trying to fit them using a Gaussian in Python. My code is as follows:
import pylab as plb
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import asarray as ar, exp  # scipy.asarray/scipy.exp were removed in newer SciPy; use the NumPy versions
import math
x=ar([37,69,157,238,274,319,391,495,533,626,1366,1855,2821,3615,4130,4374,6453,6863,7021,
7951,8646,9656,10464,11400])
y=ar([1.77,1.67,1.65,1.17,1.34,1.46,0.75,1,0.8,1.02,0.65,0.69,0.44,0.44,0.55,0.43,0.75,0.27,0.26,
0.44,0.04,0.44,0.26,0.04])
n = 24 #the number of data
mean = sum(x*y)/n #note this correction
sigma = math.sqrt(sum(y*(x-mean)**2)/n) #note this correction
def gaus(x, a, x0, sigma):
    return a*exp(-(x-x0)**2/(2*sigma**2))
popt, pcov = curve_fit(gaus, x, y, p0=None, sigma=None)  # p0=[1,mean,sigma]
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit for Time Constant')
plt.xlabel('Time (s)')
plt.ylabel('Voltage (V)')
plt.show()
And the output is this figure:
http://s2.postimg.org/wevggkc95/Workspace_1_022.png
Why do all the red points come out below the data? Also note that I am interested in a half-Gaussian, since my y values are large at first and then decrease like one side of the Gaussian bell. Can anyone tell me how to fit this curve in Python (in case it cannot be fit to a Gaussian)? In other words, I want code to fit the left half of a Gaussian to my points (in the first quadrant only). Note that my points cannot be fit with an exponentially decreasing curve; I tried that earlier, and it does not fit well at lower x values.
Apparently your data do not fit well or easily to a Gaussian function. You use the default initial guesses, p0 = [1,1,1], which are so far from any kind of optimal choice that curve_fit gives up before it gets started (check the values of popt = [1,1,1] and pcov = [inf, inf, inf]). You could try better guesses (e.g. p0 = [2, 0, 2000]), but on my system the fit still won't converge: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
To fit a "half-Gaussian", don't float the centre position x0 (just leave it equal to 0):
def gaus(x, a, sigma):
    return a*exp(-x**2/(2*sigma**2))
p0 = [1.2, 4000]
popt,pcov = curve_fit(gaus,x,y,p0=p0)
Unless you have a particular reason for wanting to fit a Gaussian, why not do a more robust linear least squares fit to a polynomial, e.g.:
pfit = np.polyfit(x, y, 3)
poly = np.poly1d(pfit)
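The returned poly object is callable, so you can evaluate it directly for plotting; a short sketch (assuming numpy is imported as np, as in the snippet above):

xs = np.linspace(x.min(), x.max(), 200)  # a smooth grid over the data range
plt.plot(x, y, 'b+', xs, poly(xs), 'g-')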
Hi, I want to calculate the errors in the slope and intercept that are calculated by the scipy polyfit function. I have a (+/-) uncertainty for each ydata value, so how can I propagate it into uncertainties on the slope and intercept? My code is:
from scipy import polyfit
import pylab as plt
from numpy import *
data = loadtxt("data.txt")
xdata,ydata = data[:,0],data[:,1]
x_d,y_d = log10(xdata),log10(ydata)
polycoef = polyfit(x_d, y_d, 1)
yfit = 10**( polycoef[0]*x_d+polycoef[1] )
plt.subplot(111)
plt.loglog(xdata,ydata,'.k',xdata,yfit,'-r')
plt.show()
Thanks a lot
You could use scipy.optimize.curve_fit instead of polyfit. It has a parameter sigma for the errors of ydata, and its second return value is the covariance matrix of the fitted parameters. If you have an error for every y value in a sequence yerror (so that yerror has the same length as your y_d sequence) you can do:
polycoef, pcov = scipy.optimize.curve_fit(lambda x, a, b: a*x + b, x_d, y_d, sigma=yerror)
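The one-sigma uncertainties on the fitted parameters are then the square roots of the diagonal of that covariance matrix; a short sketch (pass absolute_sigma=True to curve_fit if yerror contains absolute rather than relative errors):

perr = np.sqrt(np.diag(pcov))  # [slope_err, intercept_err]
print("slope: %g +/- %g" % (polycoef[0], perr[0]))
print("intercept: %g +/- %g" % (polycoef[1], perr[1]))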
For an alternative see the paragraph Fitting a power-law to data with errors in the Scipy Cookbook.