I am trying to fit a Gaussian distribution to some data I have; the data describes the variation of density with height. Here is the code I have so far:
import matplotlib.pyplot as plt
from astropy.modeling import models, fitting
x = heights
y = densities
#calculate fit parameters
n = len(x) #no. of obs
mean = sum(x*y)/n #average
sigma = sum(y*(x-mean)**2)/n #std dev
amplitude = max(y)
g_init = models.Gaussian1D(amplitude, mean, sigma)
fit_g = fitting.LevMarLSQFitter()
g = fit_g(g_init, x, y)
plt.plot(heights, densities)
plt.plot(x, g(x), label='Gaussian')
#plot labels
plt.xlabel("Height[km]")
plt.ylabel("Density")
plt.show()
However, the plot of the Gaussian is just a straight line. Please help me figure out how to correct this. I searched, and the problem seems to be that the fit is not converging, so I supplied the amplitude as max(y), but it still doesn't work. Thanks in advance.
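For reference, the moment estimates above are a likely culprit: the mean should be weighted by y and normalised by sum(y) rather than n, and the sigma line computes an (unnormalised) variance rather than a standard deviation. A minimal sketch of corrected starting values, assuming heights and densities are NumPy arrays:
import numpy as np
# y-weighted moment estimates as starting values for the fitter
mean = np.sum(x * y) / np.sum(y)                         # weighted mean, not sum(x*y)/n
sigma = np.sqrt(np.sum(y * (x - mean)**2) / np.sum(y))   # weighted standard deviation
amplitude = np.max(y)
g_init = models.Gaussian1D(amplitude=amplitude, mean=mean, stddev=sigma)
g = fitting.LevMarLSQFitter()(g_init, x, y)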
Related
For a physics lab project, I am measuring emission lines from various elements; high-intensity peaks occur at certain wavelengths. My goal is to fit a Gaussian function in Python in order to find the wavelength at which the intensity peaks.
I have already tried using the norm function from the scipy.stats library. Below is the code and the graph that is produced.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
mean, std = norm.fit(he3888_1[:,0])
plt.plot(he3888_1[:,0], he3888_1[:,1], color='r')
x = np.linspace(min(he3888_1[:,0]), max(he3888_1[:,0]), 100)
y = norm.pdf(x, mean, std)
plt.plot(x, y)
plt.xlabel("Wavelength (Angstroms)")
plt.ylabel("Intensity")
plt.show()
Could this be because the intensity is low over a relatively long stretch of wavelengths before the peak?
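Note that norm.fit treats its argument as a sample of random draws, so fitting it to the wavelength column alone ignores the intensities entirely; a Gaussian curve through the (wavelength, intensity) pairs needs a least-squares fit instead. A minimal sketch with scipy.optimize.curve_fit, assuming he3888_1 is the same two-column array as above:
import numpy as np
from scipy.optimize import curve_fit
def gaussian(x, amp, mu, sigma, offset):
    return amp * np.exp(-(x - mu)**2 / (2 * sigma**2)) + offset
wl, intensity = he3888_1[:, 0], he3888_1[:, 1]
p0 = [intensity.max(), wl[np.argmax(intensity)], 1.0, intensity.min()]  # rough initial guesses
popt, pcov = curve_fit(gaussian, wl, intensity, p0=p0)
print("peak wavelength:", popt[1])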
Lmfit seems like a good option for your case. The code below simulates a Gaussian peak with a linear background added and shows how you can extract the parameters with lmfit. The latter has a number of other built-in models (Lorentzian, Voigt, etc.) that can easily be combined with each other.
import numpy as np
from lmfit.models import GaussianModel, LinearModel
import matplotlib.pyplot as plt
def generate_gaussian(amp, mu, sigma_sq, slope=0, const=0):
    x = np.linspace(mu-10*sigma_sq, mu+10*sigma_sq, num=200)
    y_gauss = (amp/np.sqrt(2*np.pi*sigma_sq))*np.exp(-0.5*(x-mu)**2/sigma_sq)
    y_linear = slope*x + const
    y = y_gauss + y_linear
    return x, y
# Gaussian peak generation
amplitude = 6
center = 3884
variance = 4
slope = 0
intercept = 0.05
x, y = generate_gaussian(amplitude, center, variance, slope, intercept)
#Create a lmfit model: Gaussian peak + linear background
gaussian = GaussianModel()
background = LinearModel()
model = gaussian + background
#Find what model parameters you need to specify
print('parameter names: {}'.format(model.param_names))
print('independent variables: {}'.format(model.independent_vars))
#Model fit
result = model.fit(y, x=x, amplitude=3, center=3880,
                   sigma=3, slope=0, intercept=0.1)
y_fit = result.best_fit #the simulated intensity
result.best_values #the extracted peak parameters
# Comparison of the fitted spectrum with the original one
plt.plot(x, y, label='model spectrum')
plt.plot(x, y_fit, label='fitted spectrum')
plt.xlabel('wavelength (angstroms)')
plt.ylabel('intensity')
plt.legend()
plt.show()
Output:
parameter names: ['amplitude', 'center', 'sigma', 'slope', 'intercept']
independent variables: ['x']
result.best_values
Out[139]:
{'slope': 2.261379140543626e-13,
'intercept': 0.04999999912168238,
'amplitude': 6.000000000000174,
'center': 3883.9999999999977,
'sigma': 2.0000000000013993}
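As a usage note, lmfit can also print the best-fit values together with their estimated uncertainties and correlations, which helps in judging the fit:
print(result.fit_report())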
I need to fit a curve to my histogram in python. I did this before with normal histograms; this time I am trying to do the same with a plot that is logarithmic in x.
This is my code:
import numpy as np
import matplotlib.pyplot as plt
# radius is my np.array
Rmin = min(radius)
Rmax = max(radius)
logmin = np.log(Rmin)
logmax = np.log(Rmax)
bins = 10**(np.arange(logmin,logmax,0.1))
plt.figure()
plt.xscale("log")
plt.hist(radius, bins, color = 'red')
plt.show()
This shows a Gaussian-like distribution. I am trying to fit a curve to it, and what I did is compute the following before the show() command:
(mu, sigma) = np.log(norm.fit((radius)))
y = (mlab.normpdf(np.log(bins), mu, sigma))
plt.plot(bins, y, 'b--', linewidth=2)
My result is a very flattened curve with respect to my distribution.
Can someone help me?
I cannot attach the whole array r (50000 points), so I have added a picture showing my result. See image.
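Two things are worth noting here: the bin edges mix np.log (natural log) with a base-10 power, so np.log10 is probably what was intended, and norm.pdf integrates to 1 while the histogram shows raw counts, so the curve has to be rescaled before overlaying it. A rough sketch of fitting a normal distribution to log(radius) (i.e. a lognormal in radius) and overlaying it on the count histogram, assuming radius is a NumPy array:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
log_r = np.log(radius)
mu, sigma = norm.fit(log_r)                       # normal fit in log-space
bins = 10**np.arange(np.log10(radius.min()), np.log10(radius.max()), 0.1)
counts, edges, _ = plt.hist(radius, bins, color='red')
centers = np.sqrt(edges[:-1] * edges[1:])         # geometric bin centres
pdf = norm.pdf(np.log(centers), mu, sigma)
plt.plot(centers, pdf * counts.max() / pdf.max(), 'b--', linewidth=2)  # rescaled to the counts
plt.xscale('log')
plt.show()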
What I am trying to produce is something similar to this plot:
That is, a contour plot representing the 68%, 95%, and 99.7% levels of the particles contained in two data sets.
So far, I have tried to implement a Gaussian KDE estimate and to plot the resulting density as contours.
Files are added here https://www.dropbox.com/sh/86r9hf61wlzitvy/AABG2mbmmeokIiqXsZ8P76Swa?dl=0
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
# My data
x = RelDist
y = RadVel
# Peform the kernel density estimate
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j,y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
fig = plt.figure()
ax = fig.gca()
CS = ax.contour(xi, yi, zi.reshape(xi.shape), colors='darkslateblue')
plt.clabel(CS, inline=1, fontsize=10)
ax.set_xlim(20, 800)
ax.set_ylim(-450, 450)
ax.set_xscale('log')
plt.show()
Producing this:
Where 1) I do not know how to control the bin number in gaussian_kde, 2) the contour labels are all zero, and 3) I have no clue how to determine the percentiles.
Any help is appreciated.
Taken from this example in the matplotlib documentation, you can transform your data zi to a percentage scale (0-1) and then make the contour plot.
You can also manually set the levels of the contour plot when you call plt.contour().
Below is an example with 2 randomly generated normal bivariate distributions:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
delta = 0.025
x = y = np.arange(-3.0, 3.01, delta)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))
# mlab.bivariate_normal was removed from matplotlib, so build the two
# bivariate normal densities with scipy instead
Z1 = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0**2, 0], [0, 1.0**2]]).pdf(pos)
Z2 = multivariate_normal(mean=[1.0, 1.0], cov=[[1.5**2, 0], [0, 0.5**2]]).pdf(pos)
Z = 10 * (Z1 - Z2)
# transform Z to a 0-1 range
Z = (Z - Z.min()) / (Z.max() - Z.min())
levels = [0.68, 0.95, 0.997]
origin = 'lower'
CS = plt.contour(X, Y, Z, levels,
                 colors=('k',),
                 linewidths=(3,),
                 origin=origin)
plt.clabel(CS, fmt='%2.3f', colors='b', fontsize=14)
plt.show()
Using the data you provided, the code works just as well:
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
RadVel = np.loadtxt('RadVel.txt')
RelDist = np.loadtxt('RelDist.txt')
x = RelDist
y = RadVel
k = gaussian_kde(np.vstack([RelDist, RadVel]))
xi, yi = np.mgrid[x.min():x.max():x.size**0.5*1j,y.min():y.max():y.size**0.5*1j]
zi = k(np.vstack([xi.flatten(), yi.flatten()]))
#set zi to 0-1 scale
zi = (zi-zi.min())/(zi.max() - zi.min())
zi =zi.reshape(xi.shape)
#set up plot
origin = 'lower'
levels = [0, 0.1, 0.25, 0.5, 0.68, 0.95, 0.975, 1]
CS = plt.contour(xi, yi, zi, levels=levels,
                 colors=('k',),
                 linewidths=(1,),
                 origin=origin)
plt.clabel(CS, fmt='%.3f', colors='b', fontsize=8)
plt.xlim(10, 1000)
plt.xscale('log')
plt.ylim(-200, 200)
plt.show()
The answer from #Tkanno is programmatically correct but does not do exactly what was asked in the question.
The KDE returns the likelihood of a sample according to the modelled distribution, so the contours are limits on the probability of a sample: the 0.1 contour would mark the boundary beyond which samples have less than a 10% chance of appearing under the modelled distribution. By normalising the z values as Tkanno proposes, it is relative probabilities that get plotted instead, so in Tkanno's answer the 0.1 contour is the boundary beyond which samples are ten times less likely to appear than the most likely sample.
You could get very similar contour plots to those proposed by Tkanno (though not smoothed) by doing a 2D histogram, normalising by the most frequent bin and plotting contours at the same levels.
This should not be confused with a limit containing 90% of the data.
I think contour plots that encompass a given fraction of the data are a bit more complicated to get (cf. https://stats.stackexchange.com/questions/68105/contours-containing-a-given-fraction-of-x-y-points and the solution with bag plots).
Apparently there is an implementation of bag plots in R; maybe someone has made, or will make, one for Python.
To illustrate the difficulty of the question, think of a dataset with 100 points: any volume containing 95 of them and excluding the other 5 would technically answer it. What is probably implicitly asked for is the smallest volume containing 95 points (hence representing the highest likelihood or density), and that is a combinatorial optimisation problem.
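If contours that enclose a given fraction of the points are really what is wanted, a common approximation (a sketch under that assumption, not a definitive implementation) is to evaluate the KDE at the data points themselves and use quantiles of those densities as the contour levels: the level for a fraction f is the (1 - f) quantile, so roughly a fraction f of the samples lies inside the corresponding contour.
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
# assuming RelDist and RadVel are the 1-D arrays from the question
data = np.vstack([RelDist, RadVel])
k = gaussian_kde(data)
dens = k(data)                                            # density at every sample
fractions = [0.997, 0.95, 0.68]
levels = np.quantile(dens, [1 - f for f in fractions])    # increasing density levels
xi, yi = np.mgrid[RelDist.min():RelDist.max():200j, RadVel.min():RadVel.max():200j]
zi = k(np.vstack([xi.ravel(), yi.ravel()])).reshape(xi.shape)
CS = plt.contour(xi, yi, zi, levels=levels, colors='k')
plt.clabel(CS, fmt='%.2e', fontsize=8)
plt.show()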
I am trying to fit a Gaussian curve to my data, which is a list of density variations with height; however, the plot of the fitted curve is always off (the peak doesn't align and the width is overestimated). Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
#Gaussian function
def gauss_function(x, a, x0, sigma):
    return a*np.exp(-(x-x0)**2/float((2*sigma**2)))
x = heights5
y = demeans5 #density values at each height
amp = max(y)
center = x[np.argmax(y)]
width = 20 #eye-balled estimate
#p0 = amp, width, center
popt, pcov = curve_fit(gauss_function, x, y, p0 = [amp, width, center])
#plot
dataplot = plt.scatter(x, y, marker = '.', label = 'Observations')
gausplot = plt.plot(x,gauss_function(x, *popt), color='red', label ='Gaussian fit')
string = 'fwhm = ' + str(2.355*popt[2]) + '\npeak = ' + str(popt[0]) + '\nmean = ' + str(popt[1]) + '\nsigma = ' + str(popt[2])
#plot labels etc.
plt.xlabel("Height[km]")
plt.ylabel("Density")
plt.legend([dataplot, gausplot], labels = ['fit', 'Observations'])
plt.text(130, 2000, string)
plt.show()
This is the plot it generates:
How do I fit the curve more accurately? Also, is there a way to estimate the width from the data?
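One thing worth checking in the code above: gauss_function is defined with the parameter order (a, x0, sigma), but p0 is passed as [amp, width, center], so the initial centre and width end up swapped. A minimal sketch of the corrected call, with the width estimated from the data as a y-weighted standard deviation instead of eye-balled (assuming x and y are NumPy arrays):
# initial guesses in the same order as gauss_function(x, a, x0, sigma)
amp = np.max(y)
center = x[np.argmax(y)]
width = np.sqrt(np.sum(y * (x - center)**2) / np.sum(y))   # width estimated from the data
popt, pcov = curve_fit(gauss_function, x, y, p0=[amp, center, width])
fwhm = 2.355 * abs(popt[2])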
For a more accurate fit, you could look into the scipy.interpolate module. The functions there do a good job of interpolating and fitting.
Other fitting techniques which could do a good job are:
a) CSTs
b) BSplines
c) Polynomial interpolation
Scipy also has an implementation of B-splines (a rough sketch follows below); the other two you may have to implement yourself.
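As a rough illustration of the B-spline route (a sketch only, assuming x and y are the 1-D arrays from the question and that x is sorted in increasing order):
import numpy as np
from scipy.interpolate import splrep, splev
tck = splrep(x, y, k=3, s=len(x))            # s controls the smoothing strength
x_fine = np.linspace(x.min(), x.max(), 500)
y_smooth = splev(x_fine, tck)                # smoothed curve through the data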
A very similar question about using Python to fit double peaks is answered here:
How to guess the actual lorentzian function without relaxation behavior with Least square curve fitting
I am trying to cluster data according to the density of the data points.
I want to draw contours around these regions according to the density, like so:
I am trying to adapt the following code from here to get to this point:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
img = ax.scatter(x, y, c=z, edgecolor='none')  # 'none' is the explicit way to request no edges
plt.show()
To cluster by density, try an algorithm like DBSCAN (see the sketch below). However, it looks like you actually want to estimate the density itself rather than cluster points together, because you want to color your output by density. In that case, use a simple kernel density estimate (the density function in R), or an adaptive kernel density estimate if you have broad and very sharp peaks at the same time. Example for Matlab: Adaptive Kernel density estimate on Matlab File Exchange
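For the clustering route, here is a minimal sketch with scikit-learn's DBSCAN (the eps and min_samples values are placeholders that would need tuning to the real data):
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
X = np.column_stack([x, y])                  # x and y as in the question
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
# DBSCAN labels noise points as -1; every other label is a density-based cluster
plt.scatter(X[:, 0], X[:, 1], c=labels, s=8, cmap='viridis')
plt.show()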