Fitting a log-normal function on array values using Python - python

I am trying to fit a log-normal PDF to a matrix of values generated with NumPy's built-in log-normal function, but the curve does not match the histogram. Why is it off? The plot is attached for reference.
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 0.2, 0.5 # mean and standard deviation
A=np.random.lognormal(mean=0.2, sigma=0.5, size=(10, 10))
count, bins, ignored = plt.hist(A, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
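One likely cause, offered as a sketch rather than a confirmed diagnosis: plt.hist reads a 2D array as one dataset per column, so the (10, 10) matrix is drawn as ten separate histograms of ten values each. Flattening the matrix first gives a single histogram that the PDF can be compared against. The seed and the bin count of 20 are assumptions chosen for illustration; with only 100 samples the bars will still look ragged.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
mu, sigma = 0.2, 0.5  # mean and standard deviation of log(x)
A = np.random.lognormal(mean=mu, sigma=sigma, size=(10, 10))

# Flatten the matrix so plt.hist sees one dataset instead of ten columns.
count, bins, _ = plt.hist(A.ravel(), bins=20, density=True)

# Log-normal PDF evaluated over the range covered by the histogram.
x = np.linspace(bins[0], bins[-1], 1000)
pdf = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))
plt.plot(x, pdf, linewidth=2, color="r")
```

Call plt.show() afterwards to display the figure. Increasing size (for example to (100, 100)) should bring the histogram much closer to the curve.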

Related

Tuning the percentiles of a probability density function with values of mu and sigma

I am trying to produce a probability density function something like the sketch below,
so that the 50th percentile of the distribution has the highest PDF value, and it is at 400. The 2.5th and 97.5th percentiles should be 320 and 480 respectively. Here is the code I tried:
import numpy as np
mu, sigma = 3, 1. # mean and standard deviation
s = np.random.lognormal(mu, sigma, 480)
import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(s, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
The issue I am having is defining the range for the function I want. Placing 480 in the line s = np.random.lognormal(mu, sigma, 480) does not change the shape of the distribution. Secondly, changing the values of mu and sigma merely changes the scale on the x axis. Where is my methodology going wrong?
I might misunderstand the question, but in case you are looking for a normal distribution with a mean of 400 and standard deviation of 40, you could use this:
mu, sigma = 400, 40.
s = np.random.normal(mu, sigma, 100000)
count, bins, ignored = plt.hist(s, 1000, density=True)
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
np.exp( - (bins - mu)**2 / (2 * sigma**2) ),linewidth=3, color='r')
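As a rough check on where the value 40 comes from: for a normal distribution the 2.5th and 97.5th percentiles sit about 1.96 standard deviations either side of the mean, so sigma can be derived from the two target percentiles. The sketch below uses only the standard library; the variable names are illustrative. Note also that the third argument of np.random.lognormal is the number of samples, not an upper bound, and that a log-normal cannot hit these targets exactly, since its quantiles are symmetric only on a log scale (320 * 480 is not 400**2) and its mode lies below its median.

```python
from statistics import NormalDist

median, p_low, p_high = 400.0, 320.0, 480.0

# z-score of the 97.5th percentile of a standard normal (about 1.96)
z = NormalDist().inv_cdf(0.975)

# Half the distance between the two percentiles equals z * sigma
sigma = (p_high - p_low) / (2 * z)

# Verify by reading the percentiles back from the fitted distribution
dist = NormalDist(mu=median, sigma=sigma)
check_low = dist.inv_cdf(0.025)
check_high = dist.inv_cdf(0.975)
print(sigma, check_low, check_high)
```

The resulting sigma is a little under 41, which is why 40 is a reasonable round value in the snippet above.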

How can I plot Gaussian pseudo-random noise for N = 2?

How would one plot the Gaussian pseudo-random noise when N = 2 from the given code below? I don't know how to incorporate N into the formula in the code.
I need to plot for N = 1, N = 2, and N = 10.
Here is the requirement:
Create "size=1000" "N"-sample averages of uniformly distributed random variables. This requires that you generate N * size pseudo-random numbers.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import math
# For reproducibility
np.random.seed(42)
y = np.random.uniform(size=1000)
# Plot the normalized histogram (i.e., the "sample" probability distribution)
facetGrid = sns.displot(y, stat="density", bins=25)
# Note: sns.displot returns a FacetGrid instance -- adding a title is a pain but it can be done
# See the API docs -- https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
facetGrid.fig.suptitle("Gaussian pseudo-random noise")
# Plot a Gaussian PDF using sample mean and variance
sigma = np.std(y)
mu = np.mean(y)
x = np.linspace(np.min(y), np.max(y), 100)
a = 1 / math.sqrt(2 * math.pi) / sigma
x2 = -.5 * ((x - mu) / sigma)**2
y = a * np.exp(x2)
plt.plot(x, y, 'r-', lw=5, alpha=0.6, label='norm pdf');
plt.show()
Here is the plot from the above:
Could you do something like this instead?
# Plot a Gaussian PDF using sample mean and variance
mu = np.mean(y)
sigma = np.std(y)
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)
plt.plot(x, np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma), "r-")
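To bring N into it, one possible reading of the requirement is to draw a (size, N) block of uniform numbers and average along each row, which yields size values that are each the mean of N draws. This is a minimal sketch under that assumption; sample_averages is a hypothetical helper name, not something from the question.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility
size = 1000

def sample_averages(n, size=size):
    """Return `size` averages, each taken over `n` uniform(0, 1) draws."""
    # n * size random numbers in total, averaged row by row
    return rng.uniform(size=(size, n)).mean(axis=1)

averages = {n: sample_averages(n) for n in (1, 2, 10)}
for n, y in averages.items():
    # The spread shrinks roughly like 1 / sqrt(n) as n grows
    print(n, y.mean(), y.std())
```

Each array in averages can then be passed to sns.displot(y, stat="density", bins=25) and overlaid with the Gaussian curve exactly as in the snippet above. For N = 1 the histogram stays flat, and it looks progressively more bell-shaped for N = 2 and N = 10.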

Clarification of log-normal distribution using Python

What is the meaning of count, bins, ignored in the code below, which I found on the numpy website (https://numpy.org/doc/stable/reference/random/generated/numpy.random.lognormal.html)?
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 0.2, 0.5 # mean and standard deviation
s = np.random.lognormal(mu, sigma, 1000)
count, bins, ignored = plt.hist(s, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
count holds the density value for each bin (100 values in this example). bins holds the bin edges (101 in this example); bin i spans from bins[i] to bins[i+1]. ignored is not needed for this plot. According to the documentation of plt.hist, it is a "Container of individual artists used to create the histogram or list of such containers if there are multiple input datasets".
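The first two return values can be inspected without drawing anything. The sketch below uses np.histogram, which yields the counts and bin edges but no artists; the seed is an assumption added for reproducibility.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
mu, sigma = 0.2, 0.5
s = np.random.lognormal(mu, sigma, 1000)

# Same binning as plt.hist(s, 100, density=True), minus the drawing step
count, bins = np.histogram(s, bins=100, density=True)

print(len(count), len(bins))  # 100 density values, 101 bin edges

# With density=True the bar areas (height * width) sum to 1
total_area = np.sum(count * np.diff(bins))
print(total_area)
```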

Draw the density curve exactly on the Histogram without normalizing

I need to draw the density curve on the Histogram with the actual height of the bars (actual frequency) as the y-axis.
Try1:
I found a related answer here, but it normalizes the histogram to the range of the curve.
Below is my code and the output.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy.stats import norm
data = [125.36, 126.66, 130.28, 133.74, 126.92, 120.85, 119.42, 128.61, 123.53, 130.15, 126.02, 116.65, 125.24, 126.84,
125.95, 114.41, 138.62, 127.4, 127.59, 123.57, 133.76, 124.6, 113.48, 128.6, 121.04, 119.42, 120.83, 136.53, 120.4,
136.58, 121.73, 132.72, 109.25, 125.42, 117.67, 124.01, 118.74, 128.99, 131.11, 112.27, 118.76, 119.15, 122.42,
122.22, 134.71, 126.22, 130.33, 120.52, 126.88, 117.4]
(mu, sigma) = norm.fit(data)
x = np.linspace(min(data), max(data), 100)
# normed=True and mlab.normpdf were removed from matplotlib; use density=True and scipy's norm.pdf
plt.hist(data, bins=12, density=True)
plt.plot(x, norm.pdf(x, mu, sigma))
plt.show()
Try2:
There DavidG suggested a user-defined function, but even that does not follow the histogram accurately.
def gauss_function(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
test = gauss_function(x, max(data), mu, sigma)
plt.hist(data, bins=12)
plt.plot(x, test)
plt.show()
The result for this was,
But the actual Histogram is below, where Y-axis ranges from 0 to 8,
I want to draw the density curve exactly on that. Any help in this regard would be really appreciated.
Is this what you're looking for? I'm multiplying the pdf by the area of the histogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
data = [125.36, 126.66, 130.28, 133.74, 126.92, 120.85, 119.42, 128.61, 123.53, 130.15, 126.02, 116.65, 125.24, 126.84,
125.95, 114.41, 138.62, 127.4, 127.59, 123.57, 133.76, 124.6, 113.48, 128.6, 121.04, 119.42, 120.83, 136.53, 120.4,
136.58, 121.73, 132.72, 109.25, 125.42, 117.67, 124.01, 118.74, 128.99, 131.11, 112.27, 118.76, 119.15, 122.42,
122.22, 134.71, 126.22, 130.33, 120.52, 126.88, 117.4]
(mu, sigma) = norm.fit(data)
x = np.linspace(min(data), max(data), 100)
values, bins, _ = plt.hist(data, bins=12)
area = sum(np.diff(bins) * values)
plt.plot(x, norm.pdf(x, mu, sigma) * area, 'r')
plt.show()
Result:
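The scaling works because a density integrates to 1 while a frequency histogram has total area equal to the number of points times the bin width (when the bins are equal and every point falls inside them). A small check of that identity, using randomly generated stand-in data rather than the values from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in sample; any 1D data set behaves the same way
sample = rng.normal(loc=125.0, scale=6.0, size=50)

values, bins = np.histogram(sample, bins=12)

# Area of the frequency histogram: sum of (bar height * bar width)
area = np.sum(np.diff(bins) * values)

# Equivalent closed form for equal-width bins
bin_width = bins[1] - bins[0]
expected = len(sample) * bin_width
print(area, expected)
```

Multiplying norm.pdf by this area therefore puts the curve on the same scale as the raw counts.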

non-random sampling versions of np.random.normal

I'm trying to generate a single array that follows an exact Gaussian distribution. np.random.normal sort of does this by randomly sampling from a Gaussian, but how can I reproduce an exact Gaussian given some mean and sigma? The array should produce a histogram that follows an exact Gaussian, not just an approximate one as shown below.
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 10, 1
s = np.random.normal(mu, sigma, 1000)
fig, ax = plt.subplots()
# density=True replaces the removed normed argument
totaln, bbins, patches = ax.hist(s, 10, density=True, histtype='stepfilled', linewidth=1.2)
plt.show()
If you'd like an exact gaussian histogram, don't generate points. You can never get an "exact" gaussian distribution from observed points, simply because you can't have a fraction of a point within a histogram bin.
Instead, plot the curve in the form of a bar graph.
import numpy as np
import matplotlib.pyplot as plt
def gaussian(x, mean, std):
    scale = 1.0 / (std * np.sqrt(2 * np.pi))
    return scale * np.exp(-(x - mean)**2 / (2 * std**2))
mean, std = 2.0, 5.0
nbins = 30
npoints = 1000
x = np.linspace(mean - 3 * std, mean + 3 * std, nbins + 1)
centers = np.vstack([x[:-1], x[1:]]).mean(axis=0)
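# Expected count per bin = npoints * pdf * bin width (the width is 1.0 here)
y = npoints * gaussian(centers, mean, std) * np.diff(x)
fig, ax = plt.subplots()
# Centre each bar on its bin so it spans the bin's two edges
ax.bar(centers, y, width=np.diff(x), color='lightblue')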
# Optional...
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
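If an actual array of values is still needed, one alternative (a different technique from the bar plot above, sketched here as an assumption about what would be acceptable) is to take evenly spaced probabilities and map them through the inverse CDF. The result is deterministic and its histogram tracks the Gaussian closely, although it is still limited by whole counts per bin.

```python
import numpy as np
from statistics import NormalDist

mu, sigma, n = 10.0, 1.0, 1000

# Midpoints of n equal-probability slices, strictly inside (0, 1)
probs = (np.arange(n) + 0.5) / n

# Map each probability to the matching quantile of N(mu, sigma)
dist = NormalDist(mu, sigma)
s = np.array([dist.inv_cdf(p) for p in probs])

print(s.mean(), s.std())
```

The array s is sorted and contains no randomness, so repeated runs give identical histograms. Shuffle it with np.random.shuffle(s) if the order matters.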
