Random number generator from a given distribution function - python

I am completely new to programming. I have a density function defined over two ranges. How can I generate random numbers according to this function?
The probability density function for the last return time is piecewise:
f(x) = (1/sqrt(2*pi*std**2)) * exp(-(x+24-µ2)**2 / (2*std**2)),   0 < x ≤ µ2 − 12
f(x) = (1/sqrt(2*pi*std**2)) * exp(-(x-µ2)**2 / (2*std**2)),      µ2 − 12 < x ≤ 24
with std = 3.4 and µ2 = 17.6.
After searching for a couple of hours I found this recipe:
1. Get a random number between 0 and 1.
2. Calculate the CDF.
3. Calculate the inverse CDF.
4. Apply the inverse CDF to the random number to get a sample.
But I don't know how to implement this in Python.

You can create your own distribution using scipy.stats.rv_continuous as the base class. This class has default implementations of the CDF, random number generation, SF, ISF, etc., given the PDF of the distribution. You can implement your own distribution using something like:
import numpy as np
from numpy import exp
from scipy.stats import rv_continuous

class my_distribution_gen(rv_continuous):
    def _logpdf(self, x, mu, std):
        # code the log of your pdf function here
        result = ...  # here goes your equation
        return result

    def _pdf(self, x, mu, std):
        return exp(self._logpdf(x, mu, std))

my_distribution = my_distribution_gen(name='my_distribution')
Once you have the above class ready, you can enjoy the default implementations by calling methods like rvs, cdf, etc.
mu, std = 0, 1
rvs = my_distribution.rvs(mu, std)
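For the specific piecewise PDF in the question, a minimal sketch (class and variable names are my own; it assumes the two pieces together are, to a very good approximation, normalized over the support 0 < x ≤ 24) could look like this:
import numpy as np
from scipy.stats import rv_continuous

class return_time_gen(rv_continuous):
    def _pdf(self, x, mu2, std):
        # first branch (0 < x <= mu2 - 12) uses x + 24, the second branch uses x
        shifted = np.where(x <= mu2 - 12, x + 24.0, x)
        return np.exp(-(shifted - mu2)**2 / (2 * std**2)) / np.sqrt(2 * np.pi * std**2)

return_time = return_time_gen(a=0.0, b=24.0, name='return_time')
samples = return_time.rvs(17.6, 3.4, size=1000)   # shape parameters: mu2, std
The generic rvs here numerically inverts the CDF, so it is not fast, but it returns samples that follow the piecewise density without any extra work.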

Related

Scipy: sample lognormal distribution from tail only

I'm building a simulation which requires random draws from the tail of a lognormal distribution. A threshold τ (tau) is chosen, and the resulting conditional distribution is given by:
F(x | x > τ) = (F(x) − F(τ)) / (1 − F(τ)), for x > τ
I need to randomly sample from that conditional distribution, where F(x) is lognormal with a chosen µ (mu) and σ (sigma), and τ (tau) is set by the user.
My inelegant solution right now is simply to sample from the lognormal, tossing out any values under τ (tau), until I have the sample size I need. But I'm sure this can be improved.
Thanks for the help!
The easiest way is probably to leverage the truncated normal distribution as provided by Scipy.
This gives the following code, with ν (nu) as the variable of the standard Gaussian distribution, and τ (tau) mapping to ν0 on that distribution. This function returns a Numpy array containing ranCount lognormal variates:
import math
import numpy as np
from scipy.stats import truncnorm

def getMySamplesScipy(ranCount, mu, sigma, tau):
    nu0 = (math.log(tau) - mu) / sigma              # position of tau on the unit Gaussian
    xs = truncnorm.rvs(nu0, np.inf, size=ranCount)  # truncated unit normal samples
    ys = np.exp(mu + sigma * xs)                    # go back to x space
    return ys
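A quick usage sketch, with illustrative parameter values of my own choosing:
samples = getMySamplesScipy(10000, mu=0.0, sigma=1.0, tau=2.0)
print(samples.min())   # every sample should be at least tau = 2.0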
If for some reason this is not suitable, note that some of the tricks commonly used for Gaussian variates, such as Box-Muller, do not work for a truncated distribution; but we can always resort to a general principle: inverse transform sampling.
So we generate cumulative probabilities for our variates by transforming uniform variates, and we trust Scipy, using its inverse of the erf error function, to go back from those probabilities to the x-space values.
This gives something like the following Python code (without any attempt at optimization):
import math
import random
import numpy as np
import numpy.random as nprd
import scipy.special as spfn

# using the "Inverse Method":
def getMySamples(ranCount, mu, sigma, tau):
    nu0 = (math.log(tau) - mu) / sigma            # position of tau on the standard Gaussian curve
    headCP = (1/2) * (1 + spfn.erf(nu0/math.sqrt(2)))
    tailCP = 1.0 - headCP                         # probability of being in the "tail"
    uvs = np.random.uniform(0.0, 1.0, ranCount)   # uniform variates
    cps = headCP + uvs * tailCP                   # cumulative probabilities
    nus = math.sqrt(2) * spfn.erfinv(2*cps - 1)   # positions on the standard Gaussian
    xs = np.exp(mu + sigma * nus)                 # go back to x space
    return xs
Alternatives:
We can leverage the significant amount of material related to the Truncated Gaussian distribution.
There is a relatively recent (2016) review paper on the subject by Zdravko Botev and Pierre L'Ecuyer. This paper provides a pointer to publicly available R source code. Some material is seriously old, for example the 1986 book by Luc Devroye: Non-Uniform Random Variate Generation.
For example, a possible rejection-based method: if τ (tau) maps to ν0 on the standard Gaussian curve, the unit Gaussian density behaves like exp(-ν²/2). If we write ν = ν0 + δ, this is proportional to: exp(-δ²/2) * exp(-ν0*δ).
The idea is to approximate the exact distribution beyond ν0 by an exponential one with parameter ν0. Note that the exact distribution is always below the approximate one. Then we can accept the relatively cheap exponential variates with a probability of exp(-δ²/2).
We can just pick an equivalent algorithm in the literature. In the Devroye book, chapter IX page 382, there is some pseudo-code:
REPEAT
    generate independent exponential random variates X and Y
UNTIL X² <= 2*ν0²*Y
RETURN R <- ν0 + X/ν0
for which a Numpy rendition could be written like this:
def getMySamplesXpRj(rawRanCount, mu, sigma, tau):
    nu0 = (math.log(tau) - mu) / sigma        # position of tau on the standard Gaussian
    if nu0 <= 0:
        print("Error: τ (tau) too small in getMySamplesXpRj")
    rnu0 = 1.0 / nu0
    xs = nprd.exponential(1.0, rawRanCount)   # exponential "raw" variates
    ys = nprd.exponential(1.0, rawRanCount)
    allSamples = nu0 + (rnu0 * xs)
    boolArray = (xs*xs - 2*nu0*nu0*ys) <= 0.0
    samples = allSamples[boolArray]
    ys = np.exp(mu + sigma * samples)         # go back to x space
    return ys
According to Table 3 in the Botev-L'Ecuyer paper, the rejection rate of this algorithm is nicely low.
Besides, if you are willing to allow for some sophistication, there is also some literature about the Ziggurat algorithm as used for truncated Gaussian distributions, for example the 2012 arXiv 1201.6140 paper by Nicolas Chopin at ENSAE-CREST.
Side note: with recent versions of Python, it seems that you can use Greek letters for your variable names directly, σ instead of sigma, τ instead of tau, just as in the statistics books:
$ python3
Python 3.9.6 (default, Jun 29 2021, 00:00:00)
>>>
>>> σ = 2
>>> τ = 7
>>>
>>> στ = σ * τ
>>>
>>> στ + 1
15
>>>
A clean way is to define a subclass of rv_continuous with an implementation of _cdf. To draw variates you may want to also define _ppf or _rvs methods.
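A minimal sketch along those lines for the truncated lognormal from the question, treating mu, sigma and tau as shape parameters (names chosen here for illustration):
import numpy as np
from scipy.stats import rv_continuous, norm

class tail_lognorm_gen(rv_continuous):
    # lognormal conditioned on x > tau; only rvs/ppf are exercised here
    def _argcheck(self, mu, sigma, tau):
        # allow any real mu; sigma and tau must be positive
        return (sigma > 0) & (tau > 0)
    def _cdf(self, x, mu, sigma, tau):
        Ftau = norm.cdf((np.log(tau) - mu) / sigma)
        Fx = norm.cdf((np.log(x) - mu) / sigma)
        return (Fx - Ftau) / (1.0 - Ftau)
    def _ppf(self, q, mu, sigma, tau):
        # explicit inverse of _cdf, so rvs() avoids numerical inversion
        Ftau = norm.cdf((np.log(tau) - mu) / sigma)
        return np.exp(mu + sigma * norm.ppf(Ftau + q * (1.0 - Ftau)))

tail_lognorm = tail_lognorm_gen(a=0.0, name='tail_lognorm')
samples = tail_lognorm.rvs(0.0, 1.0, 2.0, size=1000)   # mu=0, sigma=1, tau=2
Defining _ppf explicitly keeps rvs from falling back on slow numerical inversion of the CDF.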

Moment matching - simulating a discrete distribution with specified moments (mean, standard deviation, skewness, kurtosis) in Python

Is there any library/function in Python which allows us to generate discrete data that matches given target moments (mean, standard deviation, skewness, kurtosis)? I do not wish to necessarily enforce any specific underlying continuous distribution.
That is, I want to generate, say, 10000 numbers, such that when we calculate their first four moments using standard formulae we get something close to the target moments given as input.
Is there any known library in Python that implements such a method? Here is an example of a paper in which this specific problem is solved (as part of a larger problem):
https://link.springer.com/article/10.1023/A:1021853807313
Thanks!
Yes, although not with 100% accuracy, this is possible.
import statsmodels.sandbox.distributions.extras as extras
import scipy.interpolate as interpolate
import scipy.stats as ss
import matplotlib.pyplot as plt
import numpy as np

def generate_normal_four_moments(mu, sigma, skew, kurt, size=10000, sd_wide=10):
    f = extras.pdf_mvsk([mu, sigma, skew, kurt])
    x = np.linspace(mu - sd_wide * sigma, mu + sd_wide * sigma, num=500)
    y = [f(i) for i in x]
    yy = np.cumsum(y) / np.sum(y)
    inv_cdf = interpolate.interp1d(yy, x, fill_value="extrapolate")
    rr = np.random.rand(size)
    return inv_cdf(rr)
Next, we generate the data by using
data = generate_normal_four_moments(mu=0, sigma=1, skew=-1, kurt=3)
Let's check the moments:
np.mean(data)
np.var(data)
ss.skew(data)
ss.kurtosis(data)
gives
-0.039986656405454374
1.051375501684874
-1.071149838792561
2.9813805363255472

Fast arbitrary distribution random sampling (inverse transform sampling)

The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example, random.gauss will sample a random point from a normal distribution with given mean and sigma values.
I'm looking for a way to extract a number N of random samples within a given interval using my own distribution, as fast as possible, in Python. This is what I mean:
def my_dist(x):
    # Some distribution; assume c1, c2, c3 and c4 are known.
    f = c1*exp(-((x-c2)**c3)/c4)
    return f

# Draw N random samples from my distribution between given limits a, b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in Python?
You need to use the inverse transform sampling method to get random values distributed according to the law you want. Using this method you just apply the inverted CDF to random numbers having a standard uniform distribution on the interval [0, 1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution in this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
http://en.wikipedia.org/wiki/Inverse_transform_sampling
Also, there is a good question on StackOverflow related to the topic:
Pythonic way to select list elements with different probability
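As a concrete illustration (my own example, not from the linked material): for an exponential distribution with rate lam, the CDF is F(x) = 1 - exp(-lam*x), so the inverted function is F⁻¹(u) = -ln(1 - u)/lam:
import math
import random

def inverted_exponential_cdf(u, lam=1.5):   # lam = 1.5 chosen arbitrarily
    return -math.log(1.0 - u) / lam

samples = [inverted_exponential_cdf(random.random()) for _ in range(1000)]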
This code implements sampling of n-dimensional discrete probability distributions. By setting a flag on the object, it can also be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdfs. Well, arbitrary pdfs with compact support; if you want to sample extremely long tails efficiently, a non-uniform description of the pdf would be required. But this is still efficient even for things like Airy point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but without sorting they will get drowned out in floating-point accuracy.
import numpy as np

class Distribution(object):
    """
    Draws samples from a one-dimensional probability distribution,
    by means of inversion of a discrete inversion of a cumulative density function.

    The pdf can be sorted first to prevent numerical error in the cumulative sum;
    this is set as the default. For big density functions with high contrast it is
    absolutely necessary, and for small density functions the overhead is minimal.

    A call to this distribution object returns indices into the density array.
    """
    def __init__(self, pdf, sort=True, interpolation=True, transform=lambda x: x):
        self.shape = pdf.shape
        self.pdf = pdf.ravel()
        self.sort = sort
        self.interpolation = interpolation
        self.transform = transform
        # a pdf cannot be negative
        assert np.all(pdf >= 0)
        # sort the pdf by magnitude
        if self.sort:
            self.sortindex = np.argsort(self.pdf, axis=None)
            self.pdf = self.pdf[self.sortindex]
        # construct the cumulative distribution function
        self.cdf = np.cumsum(self.pdf)

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def sum(self):
        """cached sum of all pdf values; the pdf need not sum to one and is implicitly normalized"""
        return self.cdf[-1]

    def __call__(self, N):
        """draw N samples"""
        # pick numbers which are uniformly random over the cumulative distribution function
        choice = np.random.uniform(high=self.sum, size=N)
        # find the indices corresponding to this point on the CDF
        index = np.searchsorted(self.cdf, choice)
        # if necessary, map the indices back to their original ordering
        if self.sort:
            index = self.sortindex[index]
        # map back to multi-dimensional indexing
        index = np.unravel_index(index, self.shape)
        index = np.vstack(index)
        # is this a discrete or piecewise continuous distribution?
        if self.interpolation:
            index = index + np.random.uniform(size=index.shape)
        return self.transform(index)

if __name__ == '__main__':
    shape = 3, 3
    pdf = np.ones(shape)
    pdf[1] = 0
    dist = Distribution(pdf, transform=lambda i: i - 1.5)
    print(dist(10))
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()
And as a more realistic example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:, None] * p[None, :]   # 2d gaussian
dist = Distribution(pdf, transform=lambda i: i - 256)
print(dist(1000000).mean(axis=1))   # should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
Here is a rather nice way of performing inverse transform sampling with a decorator.
import numpy as np
from scipy.interpolate import interp1d

def inverse_sample_decorator(dist):
    def wrapper(pnts, x_min=-100, x_max=100, n=1e5, **kwargs):
        x = np.linspace(x_min, x_max, int(n))
        cumulative = np.cumsum(dist(x, **kwargs))
        cumulative -= cumulative.min()
        f = interp1d(cumulative/cumulative.max(), x)
        return f(np.random.random(pnts))
    return wrapper
Using this decorator on a Gaussian distribution, for example:
@inverse_sample_decorator
def gauss(x, amp=1.0, mean=0.0, std=0.2):
    return amp*np.exp(-(x-mean)**2/std**2/2.0)
You can then generate sample points from the distribution by calling the function. The keyword arguments x_min and x_max are the limits of the original distribution and can be passed as arguments to gauss along with the other keyword arguments that parameterise the distribution.
samples = gauss(5000, mean=20, std=0.8, x_min=19, x_max=21)
Alternatively, this can be done as a function that takes the distribution as an argument (as in your original question),
def inverse_sample_function(dist, pnts, x_min=-100, x_max=100, n=1e5,
**kwargs):
x = np.linspace(x_min, x_max, int(n))
cumulative = np.cumsum(dist(x, **kwargs))
cumulative -= cumulative.min()
f = interp1d(cumulative/cumulative.max(), x)
return f(np.random.random(pnts))
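A possible usage sketch for this second form (my own example; the undecorated pdf is passed in directly):
def gauss_pdf(x, amp=1.0, mean=0.0, std=0.2):
    return amp*np.exp(-(x-mean)**2/std**2/2.0)

samples = inverse_sample_function(gauss_pdf, 5000, mean=20, std=0.8,
                                  x_min=19, x_max=21)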
I was in a similar situation, but I wanted to sample from a multivariate distribution, so I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
import numpy as np

def metropolis_hastings(target_density, size=500000):
    burnin_size = 10000
    size += burnin_size
    x0 = np.array([[0, 0]])
    xt = x0
    samples = []
    for i in range(size):
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = target_density(xt_candidate) / target_density(xt)
        if np.random.uniform(0, 1) < accept_prob:
            xt = xt_candidate
        samples.append(xt)
    samples = np.array(samples[burnin_size:])
    samples = np.reshape(samples, [samples.shape[0], 2])
    return samples
This function requires a function target_density which takes in a data point and computes its probability.
For details, check out this detailed answer of mine.
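A quick usage sketch (the 2-D Gaussian target below is my own illustrative choice):
from scipy.stats import multivariate_normal

def target_density(x):
    # x has shape (1, 2); evaluate a 2-D Gaussian density at x[0]
    return multivariate_normal.pdf(x[0], mean=[1.0, -1.0],
                                   cov=[[1.0, 0.3], [0.3, 0.5]])

samples = metropolis_hastings(target_density, size=50000)
print(samples.mean(axis=0))   # should be close to [1.0, -1.0]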
import numpy as np
import scipy.interpolate as interpolate

def inverse_transform_sampling(data, n_bins, n_samples):
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)
    r = np.random.rand(n_samples)
    return inv_cdf(r)
So if we pass in a data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with approximately the same distribution (as captured by the histogram). The advantage here is that we can choose our own sample size by specifying it in the n_samples variable.
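A small usage sketch (the gamma-distributed input data below is an arbitrary choice of mine):
data = np.random.gamma(shape=2.0, scale=1.5, size=10000)
new_samples = inverse_transform_sampling(data, n_bins=50, n_samples=5000)
print(data.mean(), new_samples.mean())   # the two means should be close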

How to weigh a function with 2 variables with a Gaussian distribution in python?

I've been working on this for the last few days and I still can't see where the problem is.
I'm trying to weight a function of two variables, f(q,r), with a Gaussian distribution g(r) with a specific mean value (R0) and deviation (sigma). This is needed because the theoretical function f(q) has a certain dispersity in its r variable when analyzed experimentally. Therefore, we use a probability density function to weight our function in the r variable.
I include the code, which runs but doesn't give the expected result: the weighted curve should get smoother as the polydispersity grows (higher sigma). As you can see, I integrated the product of the two functions f(r,q)*g(r) from r = 0 to r = +inf.
The result is plotted to compare the weighted result with the plain function:
from scipy.integrate import quad, quadrature
import numpy as np
import math as m
import matplotlib.pyplot as plt

# function weighted with a probability density function (gaussian)
def integrand(r, q):
    # gaussian function normalized
    def gauss_nor(r):
        # gaussian function
        def gauss(r):
            return m.exp(-((r-R0)**2)/(2*sigma**2))
        return (m.exp(-((r-R0)**2)/(2*sigma**2)))/(quad(gauss, 0, np.inf)[0])
    # function f(r,q)
    def f(r, q):
        return 3*(np.sin(q*r)-q*r*np.cos(q*r))/((r*q)**3)
    return gauss_nor(r)*f(r, q)

# quadrature integration of the integrand (from 0 to +inf)
# integrand is function * density_function (gauss)
def function(q):
    return quad(integrand, 0, np.inf, args=(q))[0]

# parameters used in the function
R0 = 20
sigma = 5

# range to plot q
q = np.arange(0.001, 2.0, 0.005)

# vectorized version of the integral
function_vec = np.vectorize(function)

# squared power of the integral
I = (function_vec(q))**2

# function without density function
I0 = (3*(np.sin(q*R0)-q*R0*np.cos(q*R0))/((R0*q)**3))**2

# plot of weighted and non-weighted functions
p1, = plt.plot(q, I, 'b')
p3, = plt.plot(q, I0, 'r')
plt.legend([p1, p3], ('Weighted', 'Not weighted'))
plt.yscale('log')
plt.xscale('log')
plt.show()
Thank you very much. I've been stuck on this problem for some days already and I haven't found the mistake.
Maybe somebody knows how to weight a function with a PDF in an easier way.
I simplified your code; the output is the same as yours. I think it's already very smooth. There are some very sharp peaks in the log-log graph simply because the curve has zeros, so it's not smooth in a log-log graph, but it is smooth in a normal X-Y graph.
import numpy as np
import matplotlib.pyplot as plt

def gauss(r):
    return np.exp(-((r-R0)**2)/(2*sigma**2))

def f(r, q):
    return 3*(np.sin(q*r)-q*r*np.cos(q*r))/((r*q)**3)

R0 = 20
sigma = 5

qm, rm = np.ogrid[0.001:2.0:0.005, 0.001:40:1000j]
gr = gauss(rm)
gr /= np.sum(gr)
fm = f(rm, qm)
fm *= gr

plt.plot(qm.ravel(), fm.sum(axis=1)**2)
plt.yscale('log')
plt.xscale('log')

How do I get a lognormal distribution in Python with Mu and Sigma?

I have been trying to get the result of a lognormal distribution using Scipy. I already have the Mu and Sigma, so I don't need to do any other prep work. If I need to be more specific (and I am trying to be with my limited knowledge of stats), I would say that I am looking for the cumulative function (cdf under Scipy). The problem is that I can't figure out how to do this with just the mean and standard deviation on a scale of 0-1 (i.e. the answer returned should be something from 0-1). I'm also not sure which method of dist I should be using to get the answer. I've tried reading the documentation and looking through SO, but the relevant questions (like this and this) didn't seem to provide the answers I was looking for.
Here is a code sample of what I am working with. Thanks.
from scipy.stats import lognorm
stddev = 0.859455801705594
mean = 0.418749176686875
total = 37
dist = lognorm.cdf(total,mean,stddev)
UPDATE:
So after a bit of work and a little research, I got a little further. But I still am getting the wrong answer. The new code is below. According to R and Excel, the result should be .7434, but that's clearly not what is happening. Is there a logic flaw I am missing?
dist = lognorm([1.744],loc=2.0785)
dist.cdf(25) # yields=0.96374596, expected=0.7434
UPDATE 2:
Working lognorm implementation which yields the correct 0.7434 result.
import math

def lognorm(x, mu=0, sigma=1):
    a = (math.log(x) - mu) / math.sqrt(2 * sigma**2)
    p = 0.5 + 0.5 * math.erf(a)
    return p

lognorm(25, 2.0785, 1.744)
# 0.7434
I know this is a bit late (almost one year!) but I've been doing some research on the lognorm function in scipy.stats. A lot of folks seem confused about the input parameters, so I hope to help these people out. The example above is almost correct, but I found it strange to set the mean to the location ("loc") parameter - this signals that the cdf or pdf doesn't 'take off' until the value is greater than the mean. Also, the scale and shape arguments should be exp(μ) (the exponential of the mean of the log of the variate) and the standard deviation of the log of the variate, respectively.
Simply put, the arguments are (x, shape, loc, scale), with the parameter definitions below:
loc - No equivalent, this gets subtracted from your data so that 0 becomes the infimum of the range of the data.
scale - exp μ, where μ is the mean of the log of the variate. (When fitting, typically you'd use the sample mean of the log of the data.)
shape - the standard deviation of the log of the variate.
I went through the same frustration as most people with this function, so I'm sharing my solution. Just be careful because the explanations aren't very clear without a compendium of resources.
For more information, I found these sources helpful:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html#scipy.stats.lognorm
https://stats.stackexchange.com/questions/33036/fitting-log-normal-distribution-in-r-vs-scipy
And here is an example, taken from @serv-inc's answer posted on this page:
import math
from scipy import stats
# standard deviation of normal distribution
sigma = 0.859455801705594
# mean of normal distribution
mu = 0.418749176686875
# hopefully, total is the value where you need the cdf
total = 37
frozen_lognorm = stats.lognorm(s=sigma, scale=math.exp(mu))
frozen_lognorm.cdf(total) # use whatever function and value you need here
It sounds like you want to instantiate a "frozen" distribution from known parameters. In your example, you could do something like:
from scipy.stats import lognorm
stddev = 0.859455801705594
mean = 0.418749176686875
dist=lognorm([stddev],loc=mean)
which will give you a lognorm distribution object with the mean and standard deviation you specify. You can then get the pdf or cdf like this:
import numpy as np
import pylab as pl
x=np.linspace(0,6,200)
pl.plot(x,dist.pdf(x))
pl.plot(x,dist.cdf(x))
Is this what you had in mind?
from math import exp
from scipy import stats

def lognorm_cdf(x, mu, sigma):
    shape = sigma
    loc = 0
    scale = exp(mu)
    return stats.lognorm.cdf(x, shape, loc, scale)

x = 25
mu = 2.0785
sigma = 1.744
p = lognorm_cdf(x, mu, sigma)  # yields the expected 0.74341
Similar to Excel and R, the lognorm_cdf function above parameterizes the CDF of the log-normal distribution using mu and sigma.
Although SciPy uses shape, loc and scale parameters to characterize its probability distributions, for the log-normal distribution I find it slightly easier to think of these parameters at the variable level rather than at the distribution level. Here's what I mean...
A log-normal variable X is related to a normal variable Z as follows:
X = exp(mu + sigma * Z) #Equation 1
which is the same as:
X = exp(mu) * exp(Z)**sigma #Equation 2
This can be sneakily re-written as follows:
X = exp(mu) * exp(Z-Z0)**sigma #Equation 3
where Z0 = 0. This equation is of the form:
f(x) = a * ( (x-x0) ** b ) #Equation 4
If you can visualize equations in your head, it should be clear that the scale, shape and location parameters in Equation 4 are a, b and x0, respectively. This means that in Equation 3 the scale, shape and location parameters are exp(mu), sigma and zero, respectively.
If you can't visualize that very clearly, let's rewrite Equation 2 as a function:
f(Z) = exp(mu) * exp(Z)**sigma #(same as Equation 2)
and then look at the effects of mu and sigma on f(Z). The figure below holds sigma constant and varies mu. You should see that mu vertically scales f(Z). However, it does so in a nonlinear manner; the effect of changing mu from 0 to 1 is smaller than the effect of changing mu from 1 to 2. From Equation 2 we see that exp(mu) is actually the linear scaling factor. Hence SciPy's "scale" is exp(mu).
The next figure holds mu constant and varies sigma. You should see that the shape of f(Z) changes. That is, f(Z) has a constant value when Z=0 and sigma affects how quickly f(Z) curves away from the horizontal axis. Hence SciPy's "shape" is sigma.
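As a quick sanity check, using the x = 25, mu = 2.0785, sigma = 1.744 example from above, SciPy's parameterization with shape = sigma and scale = exp(mu) reproduces the erf-based formula:
import math
from scipy.stats import lognorm

x, mu, sigma = 25, 2.0785, 1.744
scipy_val = lognorm.cdf(x, s=sigma, scale=math.exp(mu))
manual_val = 0.5 + 0.5 * math.erf((math.log(x) - mu) / (sigma * math.sqrt(2)))
print(scipy_val, manual_val)   # both approximately 0.7434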
Even later, but in case it's helpful to anyone else: I found that Excel's
LOGNORM.DIST(x,Ln(mean),standard_dev,TRUE)
provides the same results as python's
from scipy.stats import lognorm
lognorm.cdf(x,sigma,0,mean)
Likewise, Excel's
LOGNORM.DIST(x,Ln(mean),standard_dev,FALSE)
seems equivalent to Python's
from scipy.stats import lognorm
lognorm.pdf(x,sigma,0,mean)
@lucas' answer has the usage down pat. As a code example, you could use
import math
from scipy import stats
# standard deviation of normal distribution
sigma = 0.859455801705594
# mean of normal distribution
mu = 0.418749176686875
# hopefully, total is the value where you need the cdf
total = 37
frozen_lognorm = stats.lognorm(s=sigma, scale=math.exp(mu))
frozen_lognorm.cdf(total) # use whatever function and value you need here
Known mean and stddev of the lognormal distribution
In case someone is looking for it, here is a solution for getting the scipy.stats.lognorm distribution if the mean mu and standard deviation sigma of the lognormal distribution are known. In this case we have to calculate the stats.lognorm parameters from the known mu and sigma like so:
import numpy as np
from scipy import stats
mu = 10
sigma = 3
a = 1 + (sigma / mu) ** 2
s = np.sqrt(np.log(a))
scale = mu / np.sqrt(a)
This was obtained by looking into the implementation of the variance and mean calculations in the stats.lognorm.stats method and essentially reversing it (solving for the input).
Then we can initialize the frozen distribution instance
distr = stats.lognorm(s, 0, scale)
# generate some randomvals
randomvals = distr.rvs(1_000_000)
# calculate mean and variance using the dedicated method
mu_stats, var_stats = distr.stats("mv")
Compare means and stddevs from input, randomvals and analytical solution from distr.stats:
print(f"""
Mean Std
----------------------------
Input: {mu:6.2f} {sigma:6.2f}
Randomvals: {randomvals.mean():6.2f} {randomvals.std():6.2f}
lognorm.stats: {mu_stats:6.2f} {np.sqrt(var_stats):6.2f}
""")
Mean Std
----------------------------
Input: 10.00 3.00
Randomvals: 10.00 3.00
lognorm.stats: 10.00 3.00
Plot PDF from stats.lognorm and histogram of the random values:
import holoviews as hv
hv.extension('bokeh')
x = np.linspace(0, 30, 301)
counts, _ = np.histogram(randomvals, bins=x)
counts = counts / counts.sum() / (x[1] - x[0])
(hv.Histogram((counts, x))
* hv.Curve((x, distr.pdf(x))).opts(color="r").opts(width=900))
If you read this and just want a function with behaviour similar to lnorm in R, then relieve yourself from violent anger and use numpy's numpy.random.lognormal.
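For reference, a rough numpy counterpart of R's rlnorm(n, meanlog, sdlog), with example values of my own choosing (both take the log-mean and log-standard-deviation):
import numpy as np

meanlog, sdlog, n = 0.5, 0.8, 1000   # arbitrary example values
samples = np.random.lognormal(mean=meanlog, sigma=sdlog, size=n)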
