PDF of a Lognormal Distribution - python

I am trying to plot a lognormal probability density function (PDF) with a given mean and standard deviation. However, the plot only shows the histogram, not the PDF curve, and I do not know why the curve is not drawn:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import lognorm
mean = 15.14
stdev = 0.3738
phi = (stdev ** 2 + mean ** 2) ** 0.5
mu = np.log(mean ** 2 / phi)
sigma = (np.log(phi ** 2 / mean ** 2)) ** 0.5
data=np.random.lognormal(mu, sigma , 1000)
mu, sigma, n= lognorm.fit(data)
plt.hist(data, bins=30, density=True, alpha=0.5, color='b')
# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = lognorm.pdf(x, mu, sigma)
plt.plot(x, p, 'k', linewidth=2)
title = "LogNormal Distribution: Mean: {:.2f}, Std. Dev.: {:.2f}".format(mean, stdev)
plt.title(title)
plt.show()
The result I obtained shows the histogram but no PDF curve.

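Pay attention to this line:
mu, sigma, n = lognorm.fit(data)
lognorm.fit returns the tuple (shape, loc, scale), so this line overwrites mu and sigma with the fitted shape and location, which are not the parameters of the underlying normal distribution that you use later.
As a result, lognorm.pdf(x, mu, sigma) returns (practically) zeros: with those values, and the default scale of 1, you are evaluating the PDF far away from where its mass lies, where it is effectively zero.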
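To place the PDF correctly, drop the fit (or store its result under different names) and replace this line of your code:
p = lognorm.pdf(x, mu, sigma)
with:
p = lognorm.pdf(x, s=sigma, scale=np.exp(mu))
scipy's lognorm uses the shape s = sigma and scale = exp(mu). Because the standard deviation here is small compared with the mean, exp(mu) is almost equal to mean (about 15.135 vs 15.14), so scale=mean gives practically the same curve.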
Complete Code
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import lognorm
mean = 15.14
stdev = 0.3738
phi = (stdev ** 2 + mean ** 2) ** 0.5
mu = np.log(mean ** 2 / phi)
sigma = (np.log(phi ** 2 / mean ** 2)) ** 0.5
data = np.random.lognormal(mu, sigma, 1000)
# mu, sigma, n = lognorm.fit(data)  # removed: it overwrote mu and sigma
plt.hist(data, bins=30, density=True, alpha=0.5, color='b')
# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 1000)
p = lognorm.pdf(x, s=sigma, scale=np.exp(mu))  # scipy: shape s = sigma, scale = exp(mu)
plt.plot(x, p, 'k', linewidth=2)
title = "LogNormal Distribution: Mean: {:.2f}, Std. Dev.: {:.2f}".format(mean, stdev)
plt.title(title)
plt.show()
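If you do want to estimate the parameters from the data, here is a sketch (not part of the original answer) of how numpy's mu and sigma map onto scipy's lognorm, and how to call lognorm.fit without clobbering them. fit() returns (shape, loc, scale); passing floc=0 fixes the location at 0, so the result is the usual two-parameter lognormal with sigma = shape and mu = log(scale). The use of np.random.default_rng and a fixed seed are choices of this sketch, not of the question.

```python
import numpy as np
from scipy.stats import lognorm

mean, stdev = 15.14, 0.3738
phi = np.sqrt(stdev ** 2 + mean ** 2)
mu = np.log(mean ** 2 / phi)                   # mean of log(X)
sigma = np.sqrt(np.log(phi ** 2 / mean ** 2))  # standard deviation of log(X)

# scipy's lognorm: shape s = sigma, scale = exp(mu), loc = 0
dist = lognorm(s=sigma, scale=np.exp(mu))
print(dist.mean(), dist.std())  # should reproduce mean and stdev

# Draw a sample, then fit it with loc fixed at 0 (two-parameter lognormal)
rng = np.random.default_rng(0)
data = rng.lognormal(mu, sigma, 1000)
s_hat, loc_hat, scale_hat = lognorm.fit(data, floc=0)
mu_hat, sigma_hat = np.log(scale_hat), s_hat
print(mu_hat, sigma_hat)  # estimates of mu and sigma
```

With loc fixed, mu_hat and sigma_hat land close to mu and sigma and can be passed straight back as lognorm.pdf(x, s=sigma_hat, scale=np.exp(mu_hat)).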

Related

Tuning the percentiles of a probability density function with values of mu and sigma

I am trying to create a probability density function shaped like my sketch, so that the 50th percentile of the distribution has the highest PDF value, at 400, and the 2.5th and 97.5th percentiles are at 320 and 480 respectively. Here is the code I tried:
import numpy as np
mu, sigma = 3, 1. # mean and standard deviation
s = np.random.lognormal(mu, sigma, 480)
import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(s, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
The issue I am having is defining the range of the function I want. Putting 480 in the line s = np.random.lognormal(mu, sigma, 480) does not change the shape of the distribution. Secondly, changing the values of mu and sigma merely changes the scale of the x axis. Where is my methodology going wrong?
I might misunderstand the question, but if you are looking for a normal distribution with a mean of 400 and a standard deviation of 40 (which puts the 2.5th and 97.5th percentiles at about 400 ± 1.96 × 40, i.e. roughly 322 and 478), you could use this:
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 400, 40.
s = np.random.normal(mu, sigma, 100000)
count, bins, ignored = plt.hist(s, 1000, density=True)
plt.plot(bins, 1 / (sigma * np.sqrt(2 * np.pi)) *
         np.exp(-(bins - mu)**2 / (2 * sigma**2)), linewidth=3, color='r')
plt.show()
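To derive the parameters from the target percentiles instead of guessing them, here is a sketch (not from the original answer). First, the third argument of np.random.lognormal is only the number of samples, which is why changing 480 did not move the distribution. For a normal distribution the 2.5th and 97.5th percentiles sit at mean ± z·sigma with z = norm.ppf(0.975), so sigma follows directly. For a lognormal, the percentiles are symmetric on a log scale around the median exp(mu) and the PDF peaks below the median, so a median of 400 with a 97.5th percentile of 480 forces the 2.5th percentile to 400^2 / 480 ≈ 333 rather than 320; the stated requirements therefore point to a normal distribution.

```python
import numpy as np
from scipy.stats import norm

median, lo, hi = 400.0, 320.0, 480.0
z = norm.ppf(0.975)  # about 1.96

# Normal: percentiles are symmetric about the mean, so sigma is the half-width / z
sigma_normal = (hi - lo) / (2 * z)
normal = norm(loc=median, scale=sigma_normal)
print(sigma_normal, normal.ppf([0.025, 0.5, 0.975]))

# Lognormal: median = exp(mu) and percentiles are symmetric on a log scale,
# so matching the 97.5th percentile also fixes the 2.5th one
mu_log = np.log(median)
sigma_log = np.log(hi / median) / z
lognormal_lo = np.exp(mu_log - z * sigma_log)
print(lognormal_lo)  # 400**2 / 480, about 333, not 320

# The last argument of the sampling functions is the number of samples
rng = np.random.default_rng(0)
s = rng.normal(median, sigma_normal, 100_000)
```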

How can I plot Gaussian pseudo-random noise for N = 2?

How would one plot the Gaussian pseudo-random noise when N = 2 from the given code below? I don't know how to incorporate N into the formula in the code.
I need plots for N = 1, N = 2, and N = 10.
Here is the requirement:
Create "size=1000" "N"-sample averages of uniformly distributed random variables. This requires that you generate N * size pseudo-random numbers.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import math
# For reproducibility
np.random.seed(42)
y = np.random.uniform(size=1000)
# Plot the normalized histogram (i.e., the "sample" probability distribution)
facetGrid = sns.displot(y, stat="density", bins=25)
# Note: sns.displot returns a FacetGrid instance -- adding a title is a pain but it can be done
# See the API docs -- https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
facetGrid.fig.suptitle("Gaussian pseudo-random noise")
# Plot a Gaussian PDF using sample mean and variance
sigma = np.std(y)
mu = np.mean(y)
x = np.linspace(np.min(y), np.max(y), 100)
a = 1 / math.sqrt(2 * math.pi) / sigma
x2 = -.5 * ((x - mu) / sigma)**2
y = a * np.exp(x2)
plt.plot(x, y, 'r-', lw=5, alpha=0.6, label='norm pdf');
plt.show()
Here is the plot from the above:
Could you do something like this instead?
# Plot a Gaussian PDF using sample mean and variance
mu = np.mean(y)
sigma = np.std(y)
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)
plt.plot(x, np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma), "r-")
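The snippet above still does not use N. A sketch (not from the original thread) of the requirement itself: draw a (size, N) array of uniform numbers and average each row, which consumes exactly N * size pseudo-random numbers. By the central limit theorem the averages have mean 1/2 and variance 1/(12N), so the histogram looks increasingly Gaussian as N grows, while for N = 1 it stays flat. The helper sample_averages and its seed argument are hypothetical, and it uses numpy's Generator API rather than np.random.seed, so the numbers differ from the question's.

```python
import numpy as np
import matplotlib.pyplot as plt

def sample_averages(n, size=1000, seed=42):
    """Return `size` averages of `n` uniform(0, 1) draws, i.e. n * size numbers in total."""
    rng = np.random.default_rng(seed)
    return rng.uniform(size=(size, n)).mean(axis=1)  # one row per average

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
for ax, n in zip(axes, (1, 2, 10)):
    y = sample_averages(n)
    ax.hist(y, bins=25, density=True, alpha=0.5)
    # Gaussian PDF from the sample mean and standard deviation
    mu, sigma = y.mean(), y.std()
    x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)
    ax.plot(x, np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma), "r-")
    ax.set_title(f"N = {n}")
fig.tight_layout()
plt.show()
```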

The scaling of my plots makes the curves appear the same when they are not. How can I shape them so that the difference can be seen?

I have placed my two plots side by side.
However, I have noticed that the plots have been shaped to be the same size, and this has caused the distribution curves to appear the same when I know they are not. The Cobalt curve should be shorter and fatter than the Rhodium curve.
fig, (ax1, ax2) = plt.subplots(1, 2)
mu = Mean_Sd(rhodium_data, "Mean all Angles")[2]
sigma = Mean_Sd(rhodium_data, "Mean all Angles")[3]
x = mu + sigma * np.random.randn(437)
num_bins = 50
n, bins, patches = ax1.hist(x, num_bins, density=True)  # creates histogram
# line of best fit
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
#Creating the plot graphic
ax1.plot(bins, y, '-')
ax1.tick_params(top=True, right=True)
ax1.tick_params(direction='in', length=6, width=1, colors='0')
ax1.grid()
ax1.set_xlabel("Mean of the Four Angles")
ax1.set_ylabel("Probability density")
ax1.set_title(r"Rhodium Distribution")
#####-----------------------------------------------------------------------------------####
mu = Mean_Sd(cobalt_data, "Mean all Angles")[2]
sigma = Mean_Sd(cobalt_data, "Mean all Angles")[3]
x = mu + sigma * np.random.randn(437)
num_bins = 50
n, bins, patches = ax2.hist(x, num_bins, density=True)  # creates histogram
# line of best fit
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
#Creating the plot graphic
ax2.plot(bins, y, '-')
ax2.tick_params(top=True, right=True)
ax2.tick_params(direction='in', length=6, width=1, colors='0')
ax2.grid()
ax2.set_xlabel("Mean of the Four Angles")
ax2.set_ylabel("Probability density")
ax2.set_title(r"Cobalt Distribution")
####----------------------------------------------------------------------------------####
fig.tight_layout()
plt.show()
Here is my code. I'm working with Python 3 on Jupyter Notebooks.
Edit
For 'Mean all Angles' in the cobalt data, the mean is 105.1 degrees and the standard deviation is 7.866 degrees.
For 'Mean all Angles' in the rhodium data, the mean is 90.19 degrees and the standard deviation is 1.35 degrees.
mu is the mean and sigma is the standard deviation:
Rhodium: mu = 90.19. sigma = 1.35
Cobalt: mu = 105.1. sigma = 7.866
As you have pointed out, the difference in range between the two distributions is substantial. You could set the limits with ax1.set_xlim, ax1.set_ylim, ax2.set_xlim and ax2.set_ylim, but in my opinion at least one subplot would end up hardly legible.
What if you combine the two subplots into one?
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1)
mu = 105.1
sigma = 7.866
x1 = mu + sigma * np.random.randn(437)
num_bins = 50
n, bins1, patches = ax.hist(x1, num_bins, density=1, color="tab:blue", alpha=0.4) # creates histogram
# line of best fit
y1 = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
np.exp(-0.5 * (1 / sigma * (bins1 - mu))**2))
#####-----------------------------------------------------------------------------------####
mu = 90.19
sigma = 1.35
x2 = mu + sigma * np.random.randn(437)
num_bins = 50
n, bins2, patches = ax.hist(x2, num_bins, density=1, color="tab:orange", alpha=0.4) # creates histogram
# line of best fit
y2 = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
np.exp(-0.5 * (1 / sigma * (bins2 - mu))**2))
#Creating the plot graphic
ax.plot(bins1, y1, '-', label="Cobalt Distribution", color="tab:blue")
ax.plot(bins2, y2, '-', label="Rhodium Distribution", color="tab:orange")
ax.set_xlabel("Mean of the Four Angles")
ax.grid()
ax.set_ylabel("Probability density")
ax.tick_params(top=True, right=True)
ax.tick_params(direction='in', length=6, width=1, colors='0')
ax.legend()
ax.grid(which='major', axis='x', linewidth=0.75, linestyle='-', color='0.85')
ax.grid(which='minor', axis='x', linewidth=0.25, linestyle='--', color='0.80')
ax.grid(which='major', axis='y', linewidth=0.75, linestyle='-', color='0.85')
ax.grid(which='minor', axis='y', linewidth=0.25, linestyle='--', color='0.80')
ax.minorticks_on()
####----------------------------------------------------------------------------------####
fig.tight_layout()
plt.show()
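If you prefer to keep two panels, another option is to create the subplots with sharex=True and sharey=True, so both panels use identical axis limits and the cobalt curve visibly comes out lower and wider than the rhodium one. This is a sketch that assumes the mu and sigma values from the question's edit and uses synthetic samples in place of the question's Mean_Sd helper; normal_pdf is a hypothetical helper. As mentioned above, the narrow rhodium peak then sets the shared y range.

```python
import numpy as np
import matplotlib.pyplot as plt

# (mu, sigma) in degrees, taken from the question's edit
params = {"Rhodium": (90.19, 1.35), "Cobalt": (105.1, 7.866)}

def normal_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# sharex/sharey give both panels the same axis limits
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(9, 3.5))
x = np.linspace(60, 135, 500)
rng = np.random.default_rng(0)
for ax, (name, (mu, sigma)) in zip(axes, params.items()):
    ax.hist(mu + sigma * rng.standard_normal(437), bins=50, density=True, alpha=0.4)
    ax.plot(x, normal_pdf(x, mu, sigma), "-")
    ax.set_title(f"{name} Distribution")
    ax.set_xlabel("Mean of the Four Angles")
axes[0].set_ylabel("Probability density")
fig.tight_layout()
plt.show()
```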

Fit more than one lognormal population with Python

I am not very proficient at maths. What I'm trying to do is fit several populations that are supposed to be lognormally distributed. Here is my code:
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
# Generation of 3 populations:
s1 = np.random.lognormal(2, 0.6, 1000) #mu and sigma
s2 = np.random.lognormal(1.6, 0.3, 1000) #mu and sigma
s3 = np.random.lognormal(1.8, 0.5, 1000) #mu and sigma
mb = np.max([s1,s2,s3])
X = np.arange(1,mb,1)
#histogram population 1
Y11, bins1 = np.histogram(s1, X)
Y1 = Y11/Y11.sum()
X1 = bins1[:-1]
#histogram population 2
Y22, bins2 = np.histogram(s2, X)
Y2 = Y22/Y22.sum()
X2 = bins2[:-1]
#histogram population 3
Y33, bins3 = np.histogram(s3, X)
Y3 = Y33/Y33.sum()
X3 = bins3[:-1]
#universe, with all mixed populations
S = np.concatenate((s1, s2, s3), axis=None)
Yi, bins = np.histogram(S, X)
Y = Yi/Yi.sum()
X = bins[:-1]
def logN(x, mu, sigma):
    # lognormal probability density function
    return np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))
params, pcov = curve_fit(logN, X,Y, method="lm")
print(params)
plt.plot(X1, Y1, 'o')
plt.plot(X2, Y2, 'o')
plt.plot(X3, Y3, 'o')
plt.plot(X, Y, 'r-o')
plt.plot(X, logN(X ,params[0], params[1]))
plt.show()
This code produces a graph from which I get the global parameters mu and sigma. However, I am confused about how to recover the parameters of each population from the mixed population data. Any ideas on how to handle this problem are welcome.
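One standard approach, sketched here as an assumption rather than a definitive method: fit the individual samples (not the histogram) with a mixture model. Because the log of lognormal data is normally distributed, a mixture of lognormals in S is a mixture of Gaussians in log(S), which expectation-maximization (EM) can fit; each component's mean and standard deviation are then that population's mu and sigma, and the weights estimate each population's share. fit_lognormal_mixture is a hypothetical helper written in plain numpy; scikit-learn's GaussianMixture applied to np.log(S).reshape(-1, 1) is an off-the-shelf alternative.

```python
import numpy as np

def fit_lognormal_mixture(samples, n_components, n_iter=1000, tol=1e-9):
    """EM for a 1-D Gaussian mixture on log(samples).
    Returns (weights, mus, sigmas), sorted by mu."""
    z = np.log(np.asarray(samples, dtype=float))  # log of lognormal data is normal
    # Start: means at evenly spaced quantiles, common spread, equal weights
    mus = np.quantile(z, (np.arange(n_components) + 0.5) / n_components)
    sigmas = np.full(n_components, z.std())
    weights = np.full(n_components, 1.0 / n_components)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted normal density of every point under every component
        dens = weights * np.exp(-0.5 * ((z[:, None] - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total                # responsibilities; each row sums to 1
        ll = np.log(total).sum()           # log-likelihood of the current parameters
        # M-step: re-estimate weights, means and standard deviations
        nk = resp.sum(axis=0)
        weights = nk / len(z)
        mus = (resp * z[:, None]).sum(axis=0) / nk
        sigmas = np.sqrt((resp * (z[:, None] - mus) ** 2).sum(axis=0) / nk)
        sigmas = np.maximum(sigmas, 1e-6)  # guard against a collapsing component
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mus)
    return weights[order], mus[order], sigmas[order]

# Usage with populations like the ones in the question
rng = np.random.default_rng(0)
S = np.concatenate([rng.lognormal(2, 0.6, 1000),
                    rng.lognormal(1.6, 0.3, 1000),
                    rng.lognormal(1.8, 0.5, 1000)])
weights, mus, sigmas = fit_lognormal_mixture(S, 3)
print(weights, mus, sigmas)
```

With three populations as strongly overlapping as the ones in the question, the recovered values can differ noticeably from (2, 0.6), (1.6, 0.3) and (1.8, 0.5), and EM can settle in a local optimum, so it is worth trying several starting points and checking the fit against the histogram.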

non-random sampling versions of np.random.normal

I'm trying to generate a single array that follows an exact Gaussian distribution. np.random.normal sort of does this by randomly sampling from a Gaussian, but how can I reproduce an exact Gaussian given some mean and sigma? The array should produce a histogram that follows an exact Gaussian, not just an approximate one as shown below.
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 10, 1
s = np.random.normal(mu, sigma, 1000)
fig = plt.figure()
ax = plt.axes()
totaln, bbins, patches = ax.hist(s, 10, density=True, histtype='stepfilled', linewidth=1.2)
plt.show()
If you'd like an exact Gaussian histogram, don't generate points. You can never get an "exact" Gaussian distribution from observed points, simply because you can't have a fraction of a point within a histogram bin.
Instead, plot the curve in the form of a bar graph.
import numpy as np
import matplotlib.pyplot as plt
def gaussian(x, mean, std):
    scale = 1.0 / (std * np.sqrt(2 * np.pi))
    return scale * np.exp(-(x - mean)**2 / (2 * std**2))
mean, std = 2.0, 5.0
nbins = 30
npoints = 1000
x = np.linspace(mean - 3 * std, mean + 3 * std, nbins + 1)  # bin edges
centers = np.vstack([x[:-1], x[1:]]).mean(axis=0)
# Expected count per bin: npoints * pdf(center) * bin width
y = npoints * gaussian(centers, mean, std) * np.diff(x)
fig, ax = plt.subplots()
# align='edge' so each bar starts at its left bin edge
ax.bar(x[:-1], y, width=np.diff(x), align='edge', color='lightblue')
# Optional...
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
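If you need an actual array rather than a drawn curve, one deterministic option (not from the original answer) is to place the points at evenly spaced quantiles of the normal distribution through its inverse CDF, scipy.stats.norm.ppf. The array is identical on every run, and each histogram bin holds within about one point of its exact expected count npoints * (F(b) - F(a)), where F is the normal CDF. Because the extreme points stop at the 0.5/n and 1 - 0.5/n quantiles, the array's standard deviation comes out slightly below sigma. quantile_normal_sample is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import norm

def quantile_normal_sample(mean, std, n):
    """Deterministic 'sample' of size n: normal quantiles at the
    evenly spaced probabilities (i + 0.5) / n, i = 0..n-1."""
    probs = (np.arange(n) + 0.5) / n
    return norm.ppf(probs, loc=mean, scale=std)

mu, sigma, npoints = 10, 1, 1000
s = quantile_normal_sample(mu, sigma, npoints)

# Compare histogram counts with the exact expected counts n * (F(b) - F(a))
edges = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 11)
counts, _ = np.histogram(s, bins=edges)
expected = npoints * np.diff(norm.cdf(edges, loc=mu, scale=sigma))
print(np.round(expected, 1))
print(counts)  # each count is within about one point of its expected value
```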
