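I'm trying to fit some data with a Poisson distribution, but it doesn't work.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
from scipy.optimize import curve_fit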
x = [46,71,106,126,40,27,19,103,46,89,31,70,35,43,82,128,185,47,18,36,96,30,135,36,40,72,32,86,76,116,51,23,40,121,22,107,65,93,25,74,73,73,111,56,34,28,87,14,70,54,63,50,89,62,35,59,71,39,23,46,32,56,15,68,30,69,37,41,43,106,20,35,63,44,40,32,102,28,54,32,42,19,69,31,36,86,41,57,39,53,48,121,35,51,10,68,14,140,57,50,178,37,121,35,206,26,54,5,53,17,139,49,122,110,62,81,43,83,47,62,2,50,36,190,32,124,89,60,39,156,89,26,57,34,58,29,22,96,132,59,34,43,50,58,48,56,43,54,22,26,60,43,69,58,100,122,48,55,29,55,57,36,42,51,24,81,66,73,112,34,54,45,29,53,43,60,72,13,72,85,49,80,47,40,28,43,37,48,31,60,33,75,53,71,49,142,47,28,51,80,50,33,67,28,101,80,60,80,98,39,69,27,32,11,32,62,32,77,110,45,61,22,23,73,25,27,41,42,65,23,127,128,42,44,10,50,56,73,42,63,70,148,18,109,111,54,34,18,32,50,100,41,39,58,93,42,86,70,41,27,24,57,77,81,101,48,52,146,59,87,86,120,28,23,76,52,59,31,60,32,65,49,27,106,136,23,15,77,44,96,62,66,26,41,70,13,64,124,49,44,55,68,54,58,72,41,21,80,3,49,54,35,48,38,83,59,36,80,47,32,38,16,43,196,19,80,28,56,23,81,103,45,25,42,44,34,106,23,47,53,119,56,54,108,35,20,34,39,70,61,40,35,51,104,63,55,93,22,32,48,20,121,55,76,36,32,121,58,42,101,32,49,77,23,95,32,75,53,106,194,54,31,104,69,58,66,29,66,37,28,59,60,70,95,63,103,173,47,59,27] #geiger count
bins = np.histogram_bin_edges(x)
n, bins_edges, patches = plt.hist(x,bins, density=1, facecolor='darkblue',ec='white', log=0)
print(n)
bin_middles = 0.5*(bins_edges[1:] + bins_edges[:-1])
def fit_function(k, lamb):
    return poisson.pmf(k, lamb)
parameters, cov_matrix = curve_fit(fit_function, bin_middles,n)
x_plot = np.arange(0,max(x))
plt.plot(x_plot,fit_function(x_plot, *parameters),label='Poisson')
plt.show()
This is the result I get, but as you can see, the fit is not right.
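You are using functions such as np.histogram_bin_edges that are meant for continuous distributions, while the Poisson distribution is discrete. In particular, poisson.pmf returns 0 for non-integer k, so evaluating it at the non-integer bin centers gives curve_fit nothing to fit.
According to Wikipedia, lambda can be estimated by simply taking the mean of the samples: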
from scipy.stats import poisson
import numpy as np
from matplotlib import pyplot as plt
x = [46,71,106,126,40,27,19,103,46,89,31,70,35,43,82,128,185,47,18,36,96,30,135,36,40,72,32,86,76,116,51,23,40,121,22,107,65,93,25,74,73,73,111,56,34,28,87,14,70,54,63,50,89,62,35,59,71,39,23,46,32,56,15,68,30,69,37,41,43,106,20,35,63,44,40,32,102,28,54,32,42,19,69,31,36,86,41,57,39,53,48,121,35,51,10,68,14,140,57,50,178,37,121,35,206,26,54,5,53,17,139,49,122,110,62,81,43,83,47,62,2,50,36,190,32,124,89,60,39,156,89,26,57,34,58,29,22,96,132,59,34,43,50,58,48,56,43,54,22,26,60,43,69,58,100,122,48,55,29,55,57,36,42,51,24,81,66,73,112,34,54,45,29,53,43,60,72,13,72,85,49,80,47,40,28,43,37,48,31,60,33,75,53,71,49,142,47,28,51,80,50,33,67,28,101,80,60,80,98,39,69,27,32,11,32,62,32,77,110,45,61,22,23,73,25,27,41,42,65,23,127,128,42,44,10,50,56,73,42,63,70,148,18,109,111,54,34,18,32,50,100,41,39,58,93,42,86,70,41,27,24,57,77,81,101,48,52,146,59,87,86,120,28,23,76,52,59,31,60,32,65,49,27,106,136,23,15,77,44,96,62,66,26,41,70,13,64,124,49,44,55,68,54,58,72,41,21,80,3,49,54,35,48,38,83,59,36,80,47,32,38,16,43,196,19,80,28,56,23,81,103,45,25,42,44,34,106,23,47,53,119,56,54,108,35,20,34,39,70,61,40,35,51,104,63,55,93,22,32,48,20,121,55,76,36,32,121,58,42,101,32,49,77,23,95,32,75,53,106,194,54,31,104,69,58,66,29,66,37,28,59,60,70,95,63,103,173,47,59,27]
bins = np.histogram_bin_edges(x)
n, bins_edges, patches = plt.hist(x, bins, density=1, facecolor='darkblue', ec='white', log=0)
lamd = np.mean(x)
x_plot = np.arange(0, max(x) + 1)
plt.plot(x_plot, poisson.pmf(x_plot, lamd), label='Poisson')
plt.show()
The calculated lambda is about 60. The plot seems to indicate that the Poisson distribution isn't a very close fit for the given samples.
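One quick way to check this, offered as a rough sketch rather than a formal test: for a Poisson distribution the variance equals the mean, so a variance-to-mean ratio far above 1 suggests the counts are overdispersed. The two samples below are synthetic stand-ins (not the Geiger data above), and the helper name `dispersion_ratio` is made up for illustration.

```python
import numpy as np

def dispersion_ratio(samples):
    """Return sample variance divided by sample mean (about 1 for Poisson data)."""
    samples = np.asarray(samples, dtype=float)
    return samples.var(ddof=1) / samples.mean()

rng = np.random.default_rng(0)

# Synthetic Poisson counts with lambda = 60: the ratio should be close to 1.
poisson_counts = rng.poisson(lam=60, size=5000)

# Synthetic overdispersed counts (negative binomial, mean 57): ratio well above 1.
overdispersed_counts = rng.negative_binomial(n=3, p=0.05, size=5000)

ratio_poisson = dispersion_ratio(poisson_counts)
ratio_overdispersed = dispersion_ratio(overdispersed_counts)
print(ratio_poisson, ratio_overdispersed)
```

Calling `dispersion_ratio(x)` on the measured counts would show whether a single-parameter Poisson model is plausible at all before trying to fit it.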
Related
I have a set of data that follows a normal distribution in which I can fit the histogram and obtain the mean and sigma.
For the sake of example, I will approximate it by generating a random normal distribution as follows:
from scipy.stats import maxwell
import math
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.optimize import curve_fit
from IPython import embed # put embed() where you want to stop
import matplotlib.ticker as ticker
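data = np.random.normal(307, 16, size=10000) # random.gauss(307, 16) would return a single float; the sample size here is arbitrary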
N, bins, patches = plt.hist(data, bins=40, density=True, alpha=0.5, histtype='bar', ec='black')
mu, std = norm.fit(data)
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
plt.plot(x, p, 'k', linewidth=2, label= r'$\mu$ = '+'{:0.1f}'.format(mu)+r' $\pm$ '+'{:0.1f}'.format(std))
What I would like to do next is to generate a Maxwell distribution from this "normal" distribution and be able to fit it.
I have read the scipy.stats.maxwell webpage and several other related questions, but was not able to generate such a distribution from "a gauss distribution" and fit it. Any help would be much appreciated.
Well, a Maxwell distribution is the distribution of the magnitude of a molecule's velocity when each velocity component is normally distributed, so you could sample it like the code below:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import maxwell
def maxw(size = None):
    """Generates size samples of maxwell"""
    vx = np.random.normal(size=size)
    vy = np.random.normal(size=size)
    vz = np.random.normal(size=size)
    return np.sqrt(vx*vx + vy*vy + vz*vz)
mdata = maxw(100000)
h, bins = np.histogram(mdata, bins = 101, range=(0.0, 10.0))
x = np.linspace(0.0, 10.0, 100)
rv = maxwell()
fig, ax = plt.subplots(1, 1)
ax.hist(mdata, bins = bins, density=True)
ax.plot(x, rv.pdf(x), 'k-', lw=2, label='Maxwell pdf')
plt.title("Maxwell")
plt.show()
And here is the picture with the sampled histogram and the Maxwell PDF overlaid.
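The question also asked about fitting. A minimal sketch of that step, assuming the velocity components are zero-mean normals with a common standard deviation (the value 16 is borrowed from the question; the nonzero mean of 307 is deliberately not used, since a Maxwell distribution requires zero-mean components):

```python
import numpy as np
from scipy.stats import maxwell

rng = np.random.default_rng(42)
sigma = 16.0  # assumed standard deviation of each velocity component

# Three independent zero-mean normal components per sample.
components = rng.normal(loc=0.0, scale=sigma, size=(100_000, 3))

# The speed (vector magnitude) follows a Maxwell distribution with scale = sigma.
speeds = np.linalg.norm(components, axis=1)

# Fit with the location fixed at 0, so only the scale is estimated.
loc, scale = maxwell.fit(speeds, floc=0)
print(loc, scale)
```

The fitted `scale` should come out close to the component standard deviation, and `maxwell.pdf(x, loc=loc, scale=scale)` can then be plotted over a density-normalized histogram of `speeds`.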
For a physics lab project, I am measuring various emission lines from various elements. High intensity peaks occur at certain wavelengths. My goal is to fit a Gaussian function in python in order to find at which wavelength the intensity is peaking.
I have already tried using the norm function from the scipy.stats library. Below is the code and the graph that is produced.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
mean, std = norm.fit(he3888_1[:,0])
plt.plot(he3888_1[:,0], he3888_1[:,1], color='r')
x = np.linspace(min(he3888_1[:,0]), max(he3888_1[:,0]), 100)
y = norm.pdf(x, mean, std)
plt.plot(x, y)
plt.xlabel("Wavelength (Angstroms)")
plt.ylabel("Intensity")
plt.show()
Could this be because the intensity is low for a relatively long period prior to it?
Lmfit seems like a good option for your case. The code below simulates a Gaussian peak with a linear background added and shows how you can extract the parameters with lmfit. Lmfit also has a number of other built-in models (Lorentzian, Voigt, etc.) that can be easily combined with each other.
import numpy as np
from lmfit.models import GaussianModel, LinearModel
import matplotlib.pyplot as plt
def generate_gaussian(amp, mu, sigma_sq, slope=0, const=0):
    x = np.linspace(mu-10*sigma_sq, mu+10*sigma_sq, num=200)
    y_gauss = (amp/np.sqrt(2*np.pi*sigma_sq))*np.exp(-0.5*(x-mu)**2/sigma_sq)
    y_linear = slope*x + const
    y = y_gauss + y_linear
    return x, y
# Gaussian peak generation
amplitude = 6
center = 3884
variance = 4
slope = 0
intercept = 0.05
x, y = generate_gaussian(amplitude, center, variance, slope, intercept)
#Create a lmfit model: Gaussian peak + linear background
gaussian = GaussianModel()
background = LinearModel()
model = gaussian + background
#Find what model parameters you need to specify
print('parameter names: {}'.format(model.param_names))
print('independent variables: {}'.format(model.independent_vars))
#Model fit
result = model.fit(y, x=x, amplitude=3, center=3880,
                   sigma=3, slope=0, intercept=0.1)
y_fit = result.best_fit #the simulated intensity
result.best_values #the extracted peak parameters
# Comparison of the fitted spectrum with the original one
plt.plot(x, y, label='model spectrum')
plt.plot(x, y_fit, label='fitted spectrum')
plt.xlabel('wavelength (angstroms)')
plt.ylabel('intensity')
plt.legend()
Output:
parameter names: ['amplitude', 'center', 'sigma', 'slope', 'intercept']
independent variables: ['x']
result.best_values
Out[139]:
{'slope': 2.261379140543626e-13,
'intercept': 0.04999999912168238,
'amplitude': 6.000000000000174,
'center': 3883.9999999999977,
'sigma': 2.0000000000013993}
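If lmfit is not available, roughly the same fit can be sketched with scipy.optimize.curve_fit. This is a swapped-in alternative, not what the answer above uses; the function name `gaussian_with_background` and the wavelength grid are assumptions made for illustration, and the simulated peak reuses the parameter values from the example above (amplitude 6, center 3884, sigma 2, intercept 0.05).

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_with_background(x, amplitude, center, sigma, slope, intercept):
    """Area-normalized Gaussian peak plus a linear background."""
    peak = amplitude / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - center) / sigma) ** 2)
    return peak + slope * x + intercept

# Simulated, noise-free spectrum (stand-in for the measured wavelengths and intensities).
x = np.linspace(3864, 3904, 200)
y = gaussian_with_background(x, 6, 3884, 2, 0, 0.05)

# Initial guesses: start the center at the wavelength of maximum intensity.
p0 = [3, x[np.argmax(y)], 3, 0, 0.1]
popt, pcov = curve_fit(gaussian_with_background, x, y, p0=p0)
amplitude, center, sigma, slope, intercept = popt
print(center, sigma)
```

The fitted `center` is the wavelength at which the intensity peaks, which is the quantity the question asks for.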
I want to fit a lognormal distribution to a histogram plotted on a logarithmic x-axis. However, I can't scale the probability density function correctly. I found a related post on the forum (Scaling and fitting to a log-normal distribution using a logarithmic axis in python), but it did not contain a complete answer. The curve changes shape when I change the parameter "s". Can an unambiguously correct distribution shape be obtained? Thank you for your help.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import pandas as pd
from math import log
# Import data
data = pd.read_excel('data2018.xlsx')
# Create bins (log)
classes = 16
s=10
bins_log10 = np.logspace(np.log10(data['diff'].min()), np.log10(data['diff'].max()), classes + 1)
bins_log10_s = np.logspace(np.log10(data['diff'].min()), np.log10(data['diff'].max()), (classes + 1) * s)
# Plot histogram
plt.style.use('ggplot')
counts, bins, _ = plt.hist(data['diff'], bins=bins_log10, edgecolor='black', linewidth=1,
                           label="Histogram")
# Calculation of bin centers and multiplied them
restored = [[d] * int(counts[n]) for n, d in enumerate((bins[1:] + bins[:-1]) / 2)]
# Flatten the result
restored = [item for sublist in restored for item in sublist]
# Calculate of fitting parameters. shape = sigma, log(scale) = mu
shape, loc, scale = stats.lognorm.fit(restored, floc=0)
# Calculate centers and length log_bins
cen_log_bins = (bins_log10[1:] + bins_log10[:-1]) / 2
len_log_bins = (bins_log10[1:] - bins_log10[:-1])
samples_fit_log_cntr = stats.lognorm.pdf(cen_log_bins, shape, loc=loc, scale=scale)
plt.plot(cen_log_bins,samples_fit_log_cntr * len_log_bins * counts.sum(), ls='dashed',label='PDF with centers', linewidth=2)
# Smooth pdf
bins_log10_cntr_s = (bins_log10_s[1:] + bins_log10_s[:-1]) / 2
samples_fit_log_cntr = stats.lognorm.pdf(bins_log10_cntr_s, shape, loc=loc, scale=scale)
bins_log_cntr = bins_log10_s[1:] - bins_log10_s[:-1]
plt.plot(bins_log10_cntr_s, samples_fit_log_cntr * bins_log_cntr * counts.sum() * s,color='blue',label='Smooth PDF with centers', linewidth=2)
plt.title(r"Fit results: $\mu$ = %.2f, $\sigma$ = %.2f" % (log(scale), shape)) # raw string so \m and \s are not treated as escapes
plt.xscale('log')
plt.legend()
plt.tight_layout()
plt.show()
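One possible way to make the scaling independent of "s", sketched here under the assumption that the histogram shows raw counts in log-spaced bins: a bin whose geometric center is x and whose edge ratio is q has width x*(sqrt(q) - 1/sqrt(q)), so the expected count at x is approximately N * pdf(x) * x * (sqrt(q) - 1/sqrt(q)), however finely x is sampled. The data below are synthetic stand-ins for data['diff'].

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=1.0, sigma=0.5, size=5000)  # synthetic stand-in data

# Log-spaced histogram bins, as in the question.
classes = 16
edges = np.logspace(np.log10(samples.min()), np.log10(samples.max()), classes + 1)
counts, _ = np.histogram(samples, bins=edges)

# Fit on the raw samples (no need to rebuild them from bin centers).
shape, loc, scale = stats.lognorm.fit(samples, floc=0)

# Constant ratio between consecutive edges of the log-spaced bins.
q = edges[1] / edges[0]
width_factor = np.sqrt(q) - 1.0 / np.sqrt(q)

# Smooth curve: expected count of a histogram bin centered (geometrically) at each x.
x_smooth = np.logspace(np.log10(edges[0]), np.log10(edges[-1]), 400)
pdf_smooth = stats.lognorm.pdf(x_smooth, shape, loc=loc, scale=scale)
expected_counts = counts.sum() * pdf_smooth * x_smooth * width_factor
```

Plotting `expected_counts` against `x_smooth` on the log axis should give a curve at the height of the histogram bars, and changing the number of points in `x_smooth` only changes its smoothness, not its height.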
In Python, I have estimated the parameters of a density model for my data, and I would like to plot the density function over the histogram of the data. In R this is similar to using the option prob=TRUE.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
# initialization of the list "data"
# estimation of the parameter, in my case, mean and variance of a normal distribution
plt.hist(data, bins="auto") # data is the list of data
# here I would like to draw the density above the histogram
plt.show()
I guess the trickiest part is to make it fit.
Edit: I have tried this according to the first answer:
from scipy.stats import norm # mlab.normpdf was removed in Matplotlib 3.1; norm.pdf takes the same arguments
mean = np.mean(logdata)
var = np.var(logdata)
std = np.sqrt(var) # standard deviation; the pdf function expects it rather than the variance
plt.hist(logdata, bins="auto", alpha=0.5, label="empirical data")
x = np.linspace(min(logdata), max(logdata), 100)
plt.plot(x, norm.pdf(x, mean, std))
plt.xlabel("log(file size)")
plt.ylabel("number of files")
plt.legend(loc='upper right')
plt.grid(True)
plt.show()
But it doesn't fit the graph, here is how it looks:
Edit 2: It works with the option normed=True in the histogram function (density=True in current Matplotlib versions, which no longer accept normed).
If I understand you correctly, you have the mean and standard deviation of some data. You have plotted a histogram of it and would like to plot the normal distribution curve over the histogram. This curve used to be generated with matplotlib.mlab.normpdf(); that function was removed in Matplotlib 3.1, and scipy.stats.norm.pdf() takes the same arguments.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
mean = 100
sigma = 5
data = np.random.normal(mean,sigma,1000) # generate fake data
x = np.linspace(min(data), max(data), 100)
plt.hist(data, bins="auto", density=True) # density=True replaces the removed normed=True
plt.plot(x, norm.pdf(x, mean, sigma))
plt.show()
Which gives the following figure:
Edit: The above only works with density=True (normed=True in older Matplotlib versions). If this is not an option, we can define our own function and scale it to the height of the tallest bin:
def gauss_function(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
mean = 100
sigma = 5
data = np.random.normal(mean,sigma,1000) # generate fake data
x = np.linspace(min(data), max(data), 1000)
counts, bin_edges, _ = plt.hist(data, bins="auto")
test = gauss_function(x, counts.max(), mean, sigma) # amplitude = tallest bin count, not max(data)
plt.plot(x, test)
plt.show()
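Another way to overlay a density on a histogram of raw counts, sketched here as an alternative rather than as the answer's own method: for equal-width bins, the expected count per bin is roughly N * bin_width * pdf(x), so the density can be rescaled instead of the histogram. The data are synthetic, and the plotting calls are left as comments so the snippet runs without a display.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(100, 5, size=1000)  # synthetic stand-in data

# Same binning rule as plt.hist(data, bins="auto"); the bins have equal width.
counts, edges = np.histogram(data, bins="auto")
bin_width = edges[1] - edges[0]

# Estimate the normal parameters from the data.
mu, sigma = norm.fit(data)

# Rescale the density to counts: N * bin_width * pdf(x).
x = np.linspace(edges[0], edges[-1], 200)
expected_counts = len(data) * bin_width * norm.pdf(x, mu, sigma)

# To draw it:
# import matplotlib.pyplot as plt
# plt.hist(data, bins=edges)
# plt.plot(x, expected_counts)
# plt.show()
```

Because the curve is derived from the fitted density and the bin width, it stays aligned with the bars when the number of bins changes.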
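Everything you are looking for is already in seaborn.
You just have to use histplot with kde=True (distplot, the older function for this, has been deprecated since seaborn 0.11):
import seaborn as sns
import numpy as np
data = np.random.normal(5, 2, size=1000)
sns.histplot(data, kde=True, stat="density")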
I am trying to fit a Gaussian to some data I have; the data describe the variation of density with height. Here is the code I have so far:
import matplotlib.pyplot as plt
from astropy.modeling import models, fitting
x = heights
y = densities
#calculate fit parameters
n = len(x) #no. of obs
mean = sum(x*y)/n #average
sigma = sum(y*(x-mean)**2)/n #std dev
amplitude = max(y)
g_init = models.Gaussian1D(amplitude, mean, sigma)
fit_g = fitting.LevMarLSQFitter()
g = fit_g(g_init, x, y)
plt.plot(heights, densities)
plt.plot(x, g(x), label='Gaussian')
#plot labels
plt.xlabel("Height[km]")
plt.ylabel("Density")
plt.show()
However, the plot of the Gaussian is just a straight line. Please help me figure out how to correct this. I searched, and the problem seems to be that the fit is not converging, so I supplied the amplitude as max(y), but it still doesn't work. Thanks in advance.
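A likely cause, offered as an assumption since the data are not shown: the initial guesses for the mean and width are divided by the number of points rather than by the sum of the densities, and the width is left as a variance instead of a standard deviation. A sketch of the weighted-moment estimates on synthetic data (the heights and densities below are made up):

```python
import numpy as np

# Synthetic stand-in for the heights/densities arrays (hypothetical values).
heights = np.linspace(0.0, 100.0, 201)
densities = 3.0 * np.exp(-0.5 * ((heights - 50.0) / 8.0) ** 2)

# Treat the densities as weights that sum to 1.
weights = densities / densities.sum()

# Weighted mean and weighted standard deviation of the heights.
mean0 = np.sum(heights * weights)
sigma0 = np.sqrt(np.sum(weights * (heights - mean0) ** 2))
amplitude0 = densities.max()
print(amplitude0, mean0, sigma0)
```

These values could then be used as the starting point, for example `models.Gaussian1D(amplitude=amplitude0, mean=mean0, stddev=sigma0)`, before calling the fitter as in the code above.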