Python lognorm.cdf vs. formula-based implementation not matching

Okay, I am converting the scipy.stats.lognorm.cdf function over to a Cython function, using the formula here: http://www.cs.unitn.it/~taufer/SR/P-LN.pdf, i.e. 1/2 + 1/2*erf((ln(x) - mu)/(sigma*sqrt(2))). The results don't match, despite many other references to the same formula online. EDIT: just fixed it; I only had to apply np.log(mu) twice. Fixed code:
import numpy as np
from scipy.stats import lognorm
from scipy.special import erf
def lognormcdf(x, mu, sigma):
    return 0.5 + 0.5*erf((np.log(x)-np.log(mu))/(np.sqrt(2.0)*sigma))
mu = 3.85
sigma = 0.346
x = [-9.997137267734412802e-01,-9.984919506395958377e-01,-9.962951347331251428e-01,-9.931249370374434227e-01,-9.889843952429917540e-01,-9.838775407060570410e-01,-9.778093584869183008e-01,-9.707857757637063933e-01,-9.628136542558155542e-01,-9.539007829254917414e-01,-9.440558701362560257e-01,-9.332885350430795146e-01,-9.216092981453339883e-01,-9.090295709825296777e-01,-8.955616449707269888e-01,-8.812186793850184108e-01,-8.660146884971646752e-01,-8.499645278795913139e-01,-8.330838798884008245e-01,-8.153892383391762033e-01,-7.968978923903144995e-01,-7.776279096494954635e-01,-7.575981185197071532e-01,-7.368280898020207470e-01,-7.153381175730564312e-01,-6.931491993558019926e-01,-6.702830156031409636e-01,-6.467619085141292912e-01,-6.226088602037077591e-01,-5.978474702471787694e-01,-5.725019326213811599e-01,-5.465970120650941455e-01,-5.201580198817630230e-01,-4.932107892081909473e-01,-4.657816497733580086e-01,-4.378974021720314913e-01,-4.095852916783015440e-01,-3.808729816246299582e-01,-3.517885263724216949e-01,-3.223603439005291449e-01,-2.926171880384719759e-01,-2.625881203715034751e-01,-2.323024818449739570e-01,-2.017898640957360157e-01,-1.710800805386032686e-01,-1.402031372361139672e-01,-1.091892035800611088e-01,-7.806858281343663497e-02,-4.687168242159163445e-02,-1.562898442154308370e-02,1.562898442154308370e-02,4.687168242159163445e-02,7.806858281343663497e-02,1.091892035800611088e-01,1.402031372361139672e-01,1.710800805386032686e-01,2.017898640957360157e-01,2.323024818449739570e-01,2.625881203715034751e-01,2.926171880384719759e-01,3.223603439005291449e-01,3.517885263724216949e-01,3.808729816246299582e-01,4.095852916783015440e-01,4.378974021720314913e-01,4.657816497733580086e-01,4.932107892081909473e-01,5.201580198817630230e-01,5.465970120650941455e-01,5.725019326213811599e-01,5.978474702471787694e-01,6.226088602037077591e-01,6.467619085141292912e-01,6.702830156031409636e-01,6.931491993558019926e-01,7.153381175730564312e-01,7.368280898020207470e-01,7.575981185197071532e-01,7.776279096494954635e-01,7.968978923903144995e-01,8.153892383391762033e-01,8.330838798884008245e-01,8.499645278795913139e-01,8.660146884971646752e-01,8.812186793850184108e-01,8.955616449707269888e-01,9.090295709825296777e-01,9.216092981453339883e-01,9.332885350430795146e-01,9.440558701362560257e-01,9.539007829254917414e-01,9.628136542558155542e-01,9.707857757637063933e-01,9.778093584869183008e-01,9.838775407060570410e-01,9.889843952429917540e-01,9.931249370374434227e-01,9.962951347331251428e-01,9.984919506395958377e-01,9.997137267734412802e-01]
mycdf = lognormcdf(x, np.log(mu), sigma)
scipycdf = lognorm.cdf(x, scale=np.log(mu), s=sigma)
# Comparing the SciPy function and mine gives the result below
np.sum(np.nan_to_num(mycdf)-scipycdf)
Results:
1.2011928779531548e-15

The original post was edited to reflect the correct formula.
def lognormcdf(x, mu, sigma):
    return 0.5 + 0.5*erf((np.log(x)-np.log(mu))/(np.sqrt(2.0)*sigma))
Pass np.log(mu) in for mu and it works.
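For reference, SciPy's own parameterization of lognorm uses s = sigma and scale = exp(mu), where mu and sigma are the mean and standard deviation of the underlying normal. Below is a minimal sketch checking the closed-form CDF against that convention (note that mu is used directly here, without the extra np.log from the code above; the test grid is made up for illustration):
import numpy as np
from scipy.stats import lognorm
from scipy.special import erf

def lognormcdf(x, mu, sigma):
    # closed-form lognormal CDF: 0.5 + 0.5*erf((ln(x) - mu) / (sigma*sqrt(2)))
    return 0.5 + 0.5*erf((np.log(x) - mu)/(np.sqrt(2.0)*sigma))

x = np.linspace(0.1, 200.0, 50)
mu, sigma = 3.85, 0.346
# SciPy convention: s is sigma, scale is exp(mu)
print(np.allclose(lognormcdf(x, mu, sigma), lognorm.cdf(x, s=sigma, scale=np.exp(mu))))  # True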

Related

parameterization of the negative binomial in scipy via mean and std

I am trying to fit my data to a Negative Binomial Distribution with the package scipy in Python. However, my validation seems to fail.
These are my steps:
I have some demand data which is described by the statistics:
mu = 1.4
std = 1.59
print(mu, std)
I use the parameterization function below, taken from this post to compute the two NB parameters.
def convert_params(mu, theta):
    """
    Convert mean/dispersion parameterization of a negative binomial to the ones scipy supports
    See https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations
    """
    r = theta
    var = mu + 1 / r * mu ** 2
    p = (var - mu) / var
    return r, 1 - p
I pass my two statistics in (hopefully correctly; the naming conventions across different sources, p, r, k, are rather confusing at this point):
firstParam, secondParam = convert_params(mu, std)
I would then use these two parameters to fit the distribution:
from scipy.stats import nbinom
rv = nbinom(firstParam, secondParam)
Then I calculate a value R with the Percent Point Function .ppf(0.95). The value R in the context of my problem is a Reorder Point.
R = rv.ppf(0.95)
This is where I expect to validate the previous steps, but I cannot recover my original statistics mu and std from mean and math.sqrt(var) respectively.
import math
mean, var = nbinom.stats(firstParam, secondParam, moments='mv')
print(mean, math.sqrt(var))
What am I missing? Any feedback about the parameterization implemented in Scipy?
The conversion code is wrong, I believe: SciPy is NOT using the Wikipedia convention, but the Mathematica convention.
#%%
import numpy as np
from scipy.stats import nbinom

def convert_params(mean, std):
    """
    Convert mean/std parameterization of a negative binomial to the ones scipy supports
    See https://mathworld.wolfram.com/NegativeBinomialDistribution.html
    """
    p = mean/std**2
    n = mean*p/(1.0 - p)
    return n, p

mean = 1.4
std = 1.59
n, p = convert_params(mean, std)
print((n, p))
#%%
m, v = nbinom.stats(n, p, moments='mv')
print(m, np.sqrt(v))
The code prints back the 1.4, 1.59 pair.
And the reorder point computed as
rv = nbinom(n, p)
print("reorder point:", rv.ppf(0.95))
outputs 5
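This conversion follows directly from SciPy's moments for nbinom(n, p): mean = n*(1-p)/p and var = n*(1-p)/p**2, so dividing the two gives p = mean/var, and solving the mean equation gives n = mean*p/(1-p). A quick check of the algebra (a sketch using the values from the question):
mean, std = 1.4, 1.59
var = std**2
p = mean / var              # from mean/var = p
n = mean * p / (1.0 - p)    # from mean = n*(1-p)/p
# recover both moments from (n, p) to confirm the algebra
print(n * (1 - p) / p, (n * (1 - p) / p**2) ** 0.5)   # ~1.4, ~1.59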
It looks like you are using a different conversion. The last bullet in the cited Wikipedia section gives the formulas shown below. With these formulas you get back the exact same mu and std:
import numpy as np
from scipy.stats import nbinom
def convert_mu_std_to_r_p(mu, std):
    r = mu ** 2 / (std ** 2 - mu)
    p = 1 - mu / std ** 2
    return r, 1 - p
mu = 1.4
std = 1.59
print("mu, std:", mu, std)
firstParam, secondParam = convert_mu_std_to_r_p(mu, std)
mean, var = nbinom.stats(firstParam, secondParam, moments='mv')
print("mean, sqrt(var):", mean, np.sqrt(var))
rv = nbinom(firstParam, secondParam)
print("reorder point:", rv.ppf(0.95))
Output:
mu, std: 1.4 1.59
mean, sqrt(var): 1.4 1.59
reorder point: 5.0
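Since nbinom is discrete, ppf(0.95) returns the smallest integer k whose CDF reaches 0.95, which is what makes it usable as a reorder point. A small check (a sketch reusing firstParam and secondParam from above):
# CDF is just below the target at k=4 and at or above it at k=5, hence ppf(0.95) == 5
print(nbinom.cdf(4, firstParam, secondParam))   # < 0.95
print(nbinom.cdf(5, firstParam, secondParam))   # >= 0.95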

Comparing convolutions in Mathematica and Python

I'm comparing the results of convolution in Python (using sympy's symbolic variables) and Mathematica with its Convolve function.
In Python, my MWE is
from numpy import linspace, pi
from numpy.random import randn
from scipy.signal import fftconvolve
import matplotlib.pyplot as plt
from sympy import symbols
from sympy.utilities.lambdify import lambdify
a = 0.43
b = 0.41
c = 0.65
d = 0.71
x = symbols('x')
f = 2*b / ((x-a)**2 + b**2)
g = 2*d / ((x-c)**2 + d**2)
fog = fftconvolve(f,g,mode='same')
fog_fun = lambdify(x,fog,'numpy') # returns a numpy-ready function
x = linspace(-20,20,int(1e3))
dx = x[1]-x[0]
fogS = fog_fun(x)
fogA = 4*pi*(b+d)/((x-a-c)**2+(b+d)**2) # correct analytic solution
plt.figure()
plt.plot(x,fogA,lw=2,label='analytic')
plt.plot(x,fogS,lw=2,label='sympy')
plt.grid()
plt.legend(loc='best')
plt.show()
which calculates a convolution using symbolic variable x. The resulting function (before lambdifying) is
fog = 1.1644/(((x - 0.65)**2 + 0.5041)*((x - 0.43)**2 + 0.1681))
There is no agreement between the analytic (fogA, Mathematica) and sympy (fogS, Python) results.
My Mathematica code is:
a = 0.43; b = 0.41; c = 0.65; d = 0.71;
fogA = FullSimplify[Convolve[2*b/((t-a)^2+b^2),2*d/((t-c)^2+d^2), t, x]];
fogS = 1.1644/(((x - 0.65)^2 + 0.5041)*((x - 0.43)^2 + 0.1681));
where
fogA = (17.683+x*(-30.4006+14.0743*x))/(3.04149+x*(-7.9428+x*(8.3428+x*(-4.32+1.*x))))
and graphs for fogS and fogA are the same as for Python.
Why is there such a large disagreement between the analytic and sympy solutions? I suspect the problem lies with sympy. Another Pythonic method is to convolve two arrays, which seems to agree with the analytic solution:
f = 2*b / ((x-a)**2 + b**2)
g = 2*d / ((x-c)**2 + d**2)
fogN = fftconvolve(f,g,mode='same')*dx # numeric
(Note: this is a MWE. The actual f and g I want to convolve are much more complicated than the Lorentzians defined in this post.)
I do not think this is a reasonable way of using scipy + sympy.
I am actually quite surprised that you get a result from lambdify at all.
What you should be doing, instead of using scipy.signal.fftconvolve(), is to use a symbolic definition of the convolution, e.g.:
from sympy import oo, Symbol, integrate

def convolve(f, g, t, lower=-oo, upper=oo):
    tau = Symbol('__very_unlikely_name__', real=True)
    return integrate(
        f.subs(t, tau) * g.subs(t, t - tau), (tau, lower, upper))
adapted from here.
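As a quick sanity check of the symbolic route, here is a sketch that reuses the convolve() helper above on Gaussians rather than the Lorentzians from the question, since the Gaussian self-convolution has a simple closed form:
from sympy import exp, sqrt, pi, simplify, Symbol

t = Symbol('t', real=True)
f = exp(-t**2)
# convolving a unit-width Gaussian with itself should give sqrt(pi/2)*exp(-t**2/2)
result = convolve(f, f, t)
print(simplify(result - sqrt(pi/2)*exp(-t**2/2)))   # expected: 0
For the Lorentzians with floating-point coefficients, the symbolic integral may be slow or come back unevaluated; converting the constants to exact rationals first (e.g. with sympy.Rational or nsimplify) usually helps.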

Simultaneous numerical fit of two equations using Numpy least square method

I am trying to fit the two equations below using Python's leastsq method, but I am not sure whether this is the right approach. The first equation contains an incomplete gamma function, while the second one is slightly more complex and, along with an exponential, contains a term that is obtained from a separate fitting formula.
J_mg = T_incomplete(hw/T_mag)
J_nmg = e^(-hw/T) * g(w, T)
Here g is a function of w and T and is calculated using a given fitting formula.
I am following the steps outlined in this question.
Here is what I have done
import numpy as np
from scipy.optimize import leastsq
from scipy.special import gammaincc
from scipy.special import gamma
from matplotlib.pyplot import plot

# generating data
NPTS = 10
hw = np.linspace(0.5, 10, NPTS)
j1 = np.linspace(0.001, 10, NPTS)
j2 = np.linspace(0.003, 10, NPTS)
T_mag = np.linspace(0.3, 0.5, NPTS)

# defining functions
def calc_gaunt_factor(hw, T):
    fitting_coeff = np.loadtxt('fitting_coeff.txt', skiprows=1)
    # T is in keV
    # K_b = 8.6173303(50)e-5 eV/K
    g = 0
    gamma = 0.0136/T
    theta = hw/T
    A = (np.log10(gamma**2) + 0.5)*0.4
    B = (np.log10(theta) + 1.5)*0.4
    for i in range(11):
        for j in range(11):
            g_ij = fitting_coeff[i][j]*(A**i)*(B**j)
            g = g_ij + g
    return g

def j_w_mag(hw, T_mag):
    order = 0.001
    return np.sqrt(1/T_mag)*gamma(order)*gammaincc(order, hw/T_mag)

def j_w_nonmag(hw, T):
    gamma = 0.0136/T
    theta = hw/T
    return np.sqrt(1/T)*np.exp((-hw)/T)*calc_gaunt_factor(hw, T)

def residual_func(T, T_mag, hw, j1, j2):
    err_unmag = np.nan_to_num(j1 - j_w_nonmag(hw, T))
    err_mag = np.nan_to_num(j2 - j_w_mag(hw, T_mag))
    err = np.concatenate((err_unmag, err_mag))
    return err

par_init = np.array([.35])
best, cov, info, message, ler = leastsq(residual_func, par_init, args=(T_mag, hw, j1, j2), full_output=True)
print("Best-Fit Parameters:")
print("T=%s" % (best[0]))
I am getting weird value for my fitting parameter, T. Is this the right approach? Thanks.
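For reference, here is a minimal sketch of the concatenated-residual pattern that leastsq expects when a single parameter is shared across two datasets; the model functions here are deliberately simple, hypothetical stand-ins, not the ones from the question:
import numpy as np
from scipy.optimize import leastsq

# hypothetical toy models sharing the parameter T
def model_a(hw, T):
    return np.exp(-hw / T)

def model_b(hw, T):
    return np.sqrt(1.0 / T) * np.exp(-hw / T)

hw = np.linspace(0.5, 10, 50)
T_true = 0.42
rng = np.random.default_rng(0)
y1 = model_a(hw, T_true) + 0.01 * rng.standard_normal(hw.size)
y2 = model_b(hw, T_true) + 0.01 * rng.standard_normal(hw.size)

def residuals(params, hw, y1, y2):
    T = params[0]
    # stack both residual vectors so leastsq minimizes them jointly
    return np.concatenate((y1 - model_a(hw, T), y2 - model_b(hw, T)))

best, cov, info, msg, ier = leastsq(residuals, [0.35], args=(hw, y1, y2), full_output=True)
print("fitted T:", best[0])   # should land near T_true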

skew normal distribution in scipy

Does anyone know how to plot a skew normal distribution with scipy?
I suppose the stats.norm class can be used, but I just can't figure out how.
Furthermore, how can I estimate the parameters describing the skew normal distribution of a unidimensional dataset?
From the Wikipedia description,
from numpy import linspace, pi, sqrt, exp
from scipy.special import erf
from pylab import plot, show

def pdf(x):
    return 1/sqrt(2*pi) * exp(-x**2/2)

def cdf(x):
    return (1 + erf(x/sqrt(2))) / 2

def skew(x, e=0, w=1, a=0):
    t = (x-e) / w
    return 2 / w * pdf(t) * cdf(a*t)
    # You can of course use the scipy.stats.norm versions
    # return 2 * norm.pdf(t) * norm.cdf(a*t)

n = 2**10
e = 1.0  # location
w = 2.0  # scale
x = linspace(-10, 10, n)
for a in range(-3, 4):
    p = skew(x, e, w, a)
    plot(x, p)
show()
If you want to find the scale, location, and shape parameters from a dataset, use scipy.optimize.leastsq; for example, using e=1.0, w=2.0 and a=1.0:
from scipy.stats import norm
from scipy.optimize import leastsq
a = 1.0
fzz = skew(x, e, w, a) + norm.rvs(0, 0.04, size=n)  # fuzzy data
def optm(l, x):
    return skew(x, l[0], l[1], l[2]) - fzz
print(leastsq(optm, [0.5, 0.5, 0.5], (x,)))
should give you something like,
(array([ 1.05206154, 1.96929465, 0.94590444]), 1)
The accepted answer is more or less outdated, because a skewnorm function is now implemented in scipy. So the code can be written a lot shorter:
from scipy.stats import skewnorm
import numpy as np
from matplotlib import pyplot as plt
X = np.linspace(min(your_data), max(your_data))
plt.plot(X, skewnorm.pdf(X, *skewnorm.fit(your_data)))
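To get the estimated parameters themselves rather than only the plot, skewnorm.fit returns the shape, location and scale directly. A minimal sketch with synthetic data standing in for your_data:
from scipy.stats import skewnorm

# synthetic 1-D dataset standing in for your_data
your_data = skewnorm.rvs(a=4, loc=1.0, scale=2.0, size=5000, random_state=0)

# maximum-likelihood estimates of shape (a), location and scale
a_hat, loc_hat, scale_hat = skewnorm.fit(your_data)
print(a_hat, loc_hat, scale_hat)   # should be close to 4, 1.0, 2.0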

How to calculate cumulative normal distribution?

I am looking for a function in Numpy or Scipy (or any rigorous Python library) that will give me the cumulative normal distribution function in Python.
Here's an example:
>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435
In other words, approximately 95% of the standard normal distribution lies within roughly two standard deviations of the mean (zero).
If you need the inverse CDF:
>>> norm.ppf(norm.cdf(1.96))
array(1.9599999999999991)
It may be too late to answer the question, but since Google still leads people here, I decided to write my solution here.
Since Python 2.7, the math library has included the error function math.erf(x).
The erf() function can be used to compute traditional statistical functions such as the cumulative standard normal distribution:
from math import *

def phi(x):
    """Cumulative distribution function for the standard normal distribution."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0
Ref:
https://docs.python.org/2/library/math.html
https://docs.python.org/3/library/math.html
How are the Error Function and Standard Normal distribution function related?
Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.
It can be used to get the cumulative distribution function (cdf - probability that a random sample X will be less than or equal to x) for a given mean (mu) and standard deviation (sigma):
from statistics import NormalDist
NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796
Which can be simplified for the standard normal distribution (mu = 0 and sigma = 1):
NormalDist().cdf(1.96)
# 0.9750021048517796
NormalDist().cdf(-1.96)
# 0.024997895148220428
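NormalDist also exposes the inverse CDF via inv_cdf, so the ppf round trip from the scipy example above can be reproduced without scipy (a short sketch):
from statistics import NormalDist

# quantile function (inverse CDF), analogous to scipy's norm.ppf
print(NormalDist().inv_cdf(0.975))                   # ≈ 1.96
print(NormalDist().inv_cdf(NormalDist().cdf(1.96)))  # ≈ 1.96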
Adapted from here http://mail.python.org/pipermail/python-list/2000-June/039873.html
from math import *

def erfcc(x):
    """Complementary error function."""
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

def ncdf(x):
    return 1. - 0.5*erfcc(x/(2**0.5))
To build upon Unknown's example, the Python equivalent of the function normdist() implemented in a lot of libraries would be:
def normcdf(x, mu, sigma):
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    if y > 1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    u = (x - mu) / abs(sigma)
    y = (1 / (sqrt(2 * pi) * abs(sigma))) * exp(-u * u / 2)
    return y

def normdist(x, mu, sigma, f):
    if f:
        y = normcdf(x, mu, sigma)
    else:
        y = normpdf(x, mu, sigma)
    return y
Alex's answer shows you a solution for the standard normal distribution (mean = 0, standard deviation = 1). If you have a normal distribution with mean and std (which is sqrt(var)) and you want to calculate:
from scipy.stats import norm
# cdf(x < val)
print(norm.cdf(val, m, s))
# cdf(x > val)
print(1 - norm.cdf(val, m, s))
# cdf(v1 < x < v2)
print(norm.cdf(v2, m, s) - norm.cdf(v1, m, s))
Read more about cdf here and scipy implementation of normal distribution with many formulas here.
Taken from above:
>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435
For a two-tailed test:
import numpy as np
z = 1.96
p_value = 2 * norm.cdf(-np.abs(z))
print(p_value)   # 0.04999579029644087
Simple like this:
import math
def my_cdf(x):
    return 0.5*(1 + math.erf(x/math.sqrt(2)))
I found the formula on this page: https://www.danielsoper.com/statcalc/formulas.aspx?id=55
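Since math.erf is accurate to double precision, this one-liner reproduces the scipy values quoted at the top of the question:
print(my_cdf(1.96))    # ≈ 0.9750021048517795
print(my_cdf(-1.96))   # ≈ 0.0249978951482204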
