What is the meaning of count, bins, and ignored in the code below, which I found on the NumPy website (https://numpy.org/doc/stable/reference/random/generated/numpy.random.lognormal.html)?
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 0.2, 0.5 # mean and standard deviation
s = np.random.lognormal(mu, sigma, 1000)
count, bins, ignored = plt.hist(s, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
count holds the density value for each bin (100 values in this example). bins contains the bin edges (101 in this example); each consecutive pair bins[i], bins[i+1] delimits bin i. ignored is not needed for that plot: according to the documentation of plt.hist, it is a "Container of individual artists used to create the histogram or list of such containers if there are multiple input datasets".
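To see these return values concretely, here is a minimal sketch using np.histogram, which computes the same counts and bin edges as plt.hist without drawing anything (so there is no third return value). The seed and the use of np.random.default_rng are my own choices for reproducibility, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
mu, sigma = 0.2, 0.5
s = rng.lognormal(mu, sigma, 1000)

# Same arguments as plt.hist(s, 100, density=True), minus the plotting
count, bins = np.histogram(s, bins=100, density=True)

print(count.shape)  # one density value per bin
print(bins.shape)   # one more edge than there are bins

# Bin i spans bins[i] to bins[i + 1]; with density=True the bar areas sum to 1
widths = np.diff(bins)
total_area = np.sum(count * widths)
print(total_area)
```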
Related
I am trying to create probability density functions like the sketch below,
so that the 50th percentile of the distribution has the highest PDF value and sits at 400, with the 2.5th and 97.5th percentiles at 320 and 480 respectively. Here is the code I tried:
import numpy as np
mu, sigma = 3, 1. # mean and standard deviation
s = np.random.lognormal(mu, sigma, 480)
import matplotlib.pyplot as plt
count, bins, ignored = plt.hist(s, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
The issue I am having is defining the range for the function I want. Placing 480 in the line s = np.random.lognormal(mu, sigma, 480) does not change the shape of the distribution. Secondly, changing the values of mu and sigma merely changes the scale on the x axis. Where is my methodology going wrong?
I might misunderstand the question, but in case you are looking for a normal distribution with a mean of 400 and standard deviation of 40, you could use this:
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 400, 40.
s = np.random.normal(mu, sigma, 100000)
count, bins, ignored = plt.hist(s, 1000, density=True)
plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
         np.exp( - (bins - mu)**2 / (2 * sigma**2) ), linewidth=3, color='r')
plt.show()
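If the target really is symmetric about 400 (2.5th percentile at 320, 97.5th at 480), the standard deviation can be derived from the percentiles rather than guessed. The sketch below assumes a normal distribution and uses the standard normal quantile z ≈ 1.959964 for the 97.5th percentile; the sample size and seed are arbitrary choices of mine. Note that the third argument of np.random.normal (and np.random.lognormal) is the number of samples, so it cannot set an upper limit such as 480.

```python
import numpy as np

median, p_low, p_high = 400.0, 320.0, 480.0
z_975 = 1.959964  # standard normal quantile for the 97.5th percentile

# For a normal distribution, the 2.5th and 97.5th percentiles sit at mu -/+ z * sigma
mu = median
sigma = (p_high - p_low) / (2 * z_975)

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
s = rng.normal(mu, sigma, 200_000)  # third argument = number of samples

check = np.percentile(s, [2.5, 50, 97.5])
print(sigma)
print(check)
```

A lognormal with median 400 cannot match both percentiles exactly, since ln(400/320) ≈ 0.223 differs from ln(480/400) ≈ 0.182; the log-distances on either side of the median would have to be equal.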
I am trying to fit a log-normal pdf to a matrix generated using the built-in log-normal function, but it doesn't fit. I was wondering why it is off. The plot is attached for reference.
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 0.2, 0.5 # mean and standard deviation
A=np.random.lognormal(mean=0.2, sigma=0.5, size=(10, 10))
count, bins, ignored = plt.hist(A, 100, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 10000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))/ (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.axis('tight')
plt.show()
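One likely cause, as far as I can tell: plt.hist treats each column of a 2D array as a separate dataset, so passing the 10x10 matrix A draws ten overlaid histograms of ten values each rather than one histogram of 100 values. A minimal sketch of the fix is to flatten the matrix first with A.ravel(). I use np.histogram here to show the numbers without plotting; the bin count of 15 and the seed are my own assumptions. With only 100 samples, the histogram will still be noisy around the curve.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
mu, sigma = 0.2, 0.5
A = rng.lognormal(mean=mu, sigma=sigma, size=(10, 10))

flat = A.ravel()  # one dataset of 100 values instead of 10 columns of 10

# Fewer bins suit a small sample; plt.hist(flat, 15, density=True) plots the same thing
count, bins = np.histogram(flat, bins=15, density=True)

centers = 0.5 * (bins[:-1] + bins[1:])
pdf = (np.exp(-(np.log(centers) - mu)**2 / (2 * sigma**2))
       / (centers * sigma * np.sqrt(2 * np.pi)))

# Compare the histogram density with the analytic pdf at each bin center
for c, h, p in zip(centers, count, pdf):
    print(f"{c:6.2f}  hist={h:5.2f}  pdf={p:5.2f}")
```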
I need to draw the density curve on the Histogram with the actual height of the bars (actual frequency) as the y-axis.
Try1:
I found a related answer here, but it normalizes the histogram to the range of the curve.
Below is my code and the output.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy.stats import norm
data = [125.36, 126.66, 130.28, 133.74, 126.92, 120.85, 119.42, 128.61, 123.53, 130.15, 126.02, 116.65, 125.24, 126.84,
125.95, 114.41, 138.62, 127.4, 127.59, 123.57, 133.76, 124.6, 113.48, 128.6, 121.04, 119.42, 120.83, 136.53, 120.4,
136.58, 121.73, 132.72, 109.25, 125.42, 117.67, 124.01, 118.74, 128.99, 131.11, 112.27, 118.76, 119.15, 122.42,
122.22, 134.71, 126.22, 130.33, 120.52, 126.88, 117.4]
(mu, sigma) = norm.fit(data)
x = np.linspace(min(data), max(data), 100)
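plt.hist(data, bins=12, density=True)  # normed was removed from Matplotlib; density replaces it
plt.plot(x, norm.pdf(x, mu, sigma))  # mlab.normpdf was removed; scipy.stats.norm.pdf is equivalent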
plt.show()
Try2:
There, @DavidG suggested a user-defined function, but even that doesn't match the density of the histogram accurately.
def gauss_function(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
test = gauss_function(x, max(data), mu, sigma)
plt.hist(data, bins=12)
plt.plot(x, test)
plt.show()
The result for this was,
But the actual histogram is below, where the y-axis ranges from 0 to 8,
and I want to draw the density curve exactly on that. Any help in this regard would be really appreciated.
Is this what you're looking for? I'm multiplying the pdf by the area of the histogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
data = [125.36, 126.66, 130.28, 133.74, 126.92, 120.85, 119.42, 128.61, 123.53, 130.15, 126.02, 116.65, 125.24, 126.84,
125.95, 114.41, 138.62, 127.4, 127.59, 123.57, 133.76, 124.6, 113.48, 128.6, 121.04, 119.42, 120.83, 136.53, 120.4,
136.58, 121.73, 132.72, 109.25, 125.42, 117.67, 124.01, 118.74, 128.99, 131.11, 112.27, 118.76, 119.15, 122.42,
122.22, 134.71, 126.22, 130.33, 120.52, 126.88, 117.4]
(mu, sigma) = norm.fit(data)
x = np.linspace(min(data), max(data), 100)
values, bins, _ = plt.hist(data, bins=12)
area = sum(np.diff(bins) * values)
plt.plot(x, norm.pdf(x, mu, sigma) * area, 'r')
plt.show()
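To make the scaling explicit: for equal-width bins, the histogram area is just the number of samples times the bin width, so multiplying the pdf by that area converts density into expected counts per bin. The sketch below checks this with np.histogram (which returns the same counts and edges as plt.hist); the synthetic data, seed, and variable names are mine, used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
data = rng.normal(125.0, 7.0, 50)  # synthetic stand-in for the measured values

mu, sigma = data.mean(), data.std()  # maximum-likelihood estimates for a normal

values, bins = np.histogram(data, bins=12)  # raw counts, not densities
area = np.sum(np.diff(bins) * values)
bin_width = bins[1] - bins[0]

x = np.linspace(data.min(), data.max(), 100)
pdf = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
scaled = pdf * area  # curve in the same units as the bar heights (counts)

print(area, len(data) * bin_width)
```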
I have a range of particle size distribution data arranged by percentage volume fraction, like so:
size %
6.68 0.05
9.92 1.15
etc.
I need to fit this data to a lognormal distribution, which I planned to do using python's stats.lognorm.fit function, but this seems to expect the input as an array of variates rather than binned data, judging by what I've read.
I was planning to use a for loop to iterate through the data and .extend each size entry to a placeholder array the required number of times to create an array with a list of variates that corresponds to the binned data.
This seems really ugly and inefficient though, and the kind of thing that there's probably an easy way to do. Is there a way to input binned data into the stats.lognorm.fit function?
I guess one possible workaround is to fit a pdf to your binned data manually, taking the x values as the midpoints of each interval and the y values as the corresponding bin frequencies, and then fitting a curve to those x and y values using scipy.optimize.curve_fit. I think the accuracy of the results will depend on the number of bins you have. An example is shown below:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
def pdf(x, mu, sigma):
    """pdf of lognormal distribution"""
    return (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
mu, sigma = 3., 1. # actual parameter value
data = np.random.lognormal(mu, sigma, size=1000) # data generation
h = plt.hist(data, bins=30, density=True) # normed was removed from Matplotlib; density replaces it
y = h[0] # density for each bin, this is y value to fit
xs = h[1] # boundaries for each bin
delta = xs[1] - xs[0] # width of bins
x = xs[:-1] + delta / 2 # midpoints of bins, this is x value to fit
popt, pcov = curve_fit(pdf, x, y, p0=[1, 1]) # data fitting, popt contains the fitted parameters
print(popt)
# e.g. [ 3.13048122 1.01360758] fitting results (values vary from run to run)
fig, ax = plt.subplots()
ax.hist(data, bins=30, density=True, align='mid', label='Histogram')
xr = np.linspace(min(xs), max(xs), 10000)
yr = pdf(xr, mu, sigma)
yf = pdf(xr, *popt)
ax.plot(xr, yr, label="Actual")
ax.plot(xr, yf, linestyle = 'dashed', label="Fitted")
ax.legend()
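If the binned data are already given as (size, fraction) pairs, another option is to skip curve fitting and estimate the parameters from weighted moments of log(size), using the fractions as weights. This is a sketch under the assumption that each bin can be represented by a single size value; the data below are synthetic, generated only so the recovered parameters can be compared with known ones.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
true_mu, true_sigma = 3.0, 1.0
samples = rng.lognormal(true_mu, true_sigma, 100_000)

# Build synthetic binned data: log-spaced bin edges and the fraction in each bin
edges = np.logspace(np.log10(samples.min()), np.log10(samples.max()), 61)
counts, _ = np.histogram(samples, bins=edges)
fraction = counts / counts.sum()         # plays the role of the "%" column
size = np.sqrt(edges[:-1] * edges[1:])   # geometric midpoint of each bin

# Weighted mean and standard deviation of log(size) estimate mu and sigma
log_size = np.log(size)
mu_hat = np.sum(fraction * log_size)
sigma_hat = np.sqrt(np.sum(fraction * (log_size - mu_hat)**2))

print(mu_hat, sigma_hat)
```

For real data, size and fraction would come straight from the two columns. If integer counts are available, np.repeat(size, counts) is a vectorized version of the loop-and-extend idea from the question.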
I'm trying to generate a single array that follows an exact Gaussian distribution. np.random.normal sort of does this by randomly sampling from a Gaussian, but how can I reproduce an exact Gaussian given some mean and sigma? The array should produce a histogram that follows an exact Gaussian, not just an approximate Gaussian as shown below.
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 10, 1
s = np.random.normal(mu, sigma, 1000)
fig = plt.figure()
ax = plt.axes()
totaln, bbins, patches = ax.hist(s, 10, density=True, histtype='stepfilled', linewidth=1.2)
plt.show()
If you'd like an exact gaussian histogram, don't generate points. You can never get an "exact" gaussian distribution from observed points, simply because you can't have a fraction of a point within a histogram bin.
Instead, plot the curve in the form of a bar graph.
import numpy as np
import matplotlib.pyplot as plt
def gaussian(x, mean, std):
    scale = 1.0 / (std * np.sqrt(2 * np.pi))
    return scale * np.exp(-(x - mean)**2 / (2 * std**2))
mean, std = 2.0, 5.0
nbins = 30
npoints = 1000
x = np.linspace(mean - 3 * std, mean + 3 * std, nbins + 1)
centers = np.vstack([x[:-1], x[1:]]).mean(axis=0)
y = npoints * gaussian(centers, mean, std)
fig, ax = plt.subplots()
# align='edge' places each bar between its bin edges (the default centers bars on x)
ax.bar(x[:-1], y, width=np.diff(x), align='edge', color='lightblue')
# Optional...
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
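If an actual array is needed rather than a bar chart, one option is to take evenly spaced quantiles of the target distribution instead of random draws. This is my own sketch, not part of the answer above: it uses statistics.NormalDist from the Python standard library (3.8+) for the inverse CDF, and the midpoint probabilities (i + 0.5) / n are one of several reasonable conventions. The resulting histogram follows the Gaussian as closely as n points allow, though it is still not exact.

```python
import numpy as np
from statistics import NormalDist

mu, sigma = 10, 1
n = 1000

# Probabilities at the midpoints of n equal slices of (0, 1)
p = (np.arange(n) + 0.5) / n

# Inverse CDF maps each probability to the matching quantile of N(mu, sigma)
dist = NormalDist(mu, sigma)
s = np.array([dist.inv_cdf(pi) for pi in p])

print(s.mean(), s.std())

# Density histogram of the deterministic sample
density, edges = np.histogram(s, bins=10, density=True)
print(density)
```

Because no random numbers are involved, the array is identical on every run; shuffle it with np.random.permutation if the order matters.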