I am trying to compute the Kullback Leibler divergence between two probability distributions. To do this I need to perform this integral.
Here is my simplified code, which currently fails:
import numpy as np
from scipy.integrate import quad
from scipy.stats import weibull_min

def f(x):
    # mixture of exponential densities with weights ps and rates lambdas
    return sum([ps[idx]*lambdas[idx]*np.exp(- lambdas[idx] * x) for idx in range(len(ps))])

def g(x):
    return weibull_min.pdf(x, c=c)

c = 0.9
ps = [1]
lambdas = [1]
eps = 0.001 # weibull_min is only defined for x > 0
print(quad(lambda x: f(x) * np.log(f(x) / g(x)), eps, np.inf)) # Output should be greater than 0
This gives:
(nan, nan)
/home/user/.local/lib/python3.5/site-packages/ipykernel_launcher.py:11: RuntimeWarning: divide by zero encountered in log
# This is added back by InteractiveShellApp.init_path()
/home/user/.local/lib/python3.5/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in double_scalars
# This is added back by InteractiveShellApp.init_path()
/home/user/.local/lib/python3.5/site-packages/ipykernel_launcher.py:11: IntegrationWarning: The occurrence of roundoff error is detected, which prevents
the requested tolerance from being achieved. The error may be
underestimated.
# This is added back by InteractiveShellApp.init_path()
Why doesn't it work and how can I get it to work?
The problem is that f(x)/g(x) tends towards zero for large x, and once both densities underflow the ratio and its logarithm can no longer be evaluated, which causes the numerical errors. Since the whole integrand tends towards zero quite fast, you can simply integrate over a finite range (say [0.001, 30]) and still get a precise estimate of the integral:
from scipy.stats import weibull_min
from scipy.integrate import quad
import numpy as np

c = 0.9
ps = [1]
lambdas = [1]

def f(x):
    return sum([ps[idx]*lambdas[idx]*np.exp(- lambdas[idx] * x) for idx in range(len(ps))])

def g(x):
    return weibull_min.pdf(x, c=c)

print(quad(lambda x: f(x) * np.log(f(x) / g(x)), 0.001, 30))
I did not do a numerical analysis of the precision, but according to the comparison with the result from Mathematica, it is precise to the 9th decimal. Here is the test code in Mathematica (simplified for your parameters):
f[x_] := Exp[-x];
c = 0.9;
g[x_] := c*x^(c - 1)*Exp[-x^c];
SetPrecision[Integrate[f[x]*Log[f[x]/g[x]], {x, 0.001, \[Infinity]}],20]
Mathematica result: 0.010089328699390866240
Scipy result: 0.01008932870010536
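If you would rather keep the upper limit at infinity, one option is to work with log-densities so that the ratio f(x)/g(x) is never formed. The following is only a sketch under that assumption: the helper names log_f, log_g and integrand are made up for this illustration, log_g is the Weibull log-density written out by hand from the same formula as in the Mathematica code, and scipy.special.logsumexp handles the mixture. The value of kl can then be compared against the Mathematica result quoted above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import logsumexp

c = 0.9
ps = np.array([1.0])        # mixture weights
lambdas = np.array([1.0])   # exponential rates

def log_f(x):
    # log of sum_i ps[i] * lambdas[i] * exp(-lambdas[i] * x), computed stably
    return logsumexp(np.log(ps) + np.log(lambdas) - lambdas * x)

def log_g(x):
    # log of the Weibull density c * x**(c - 1) * exp(-x**c)
    return np.log(c) + (c - 1) * np.log(x) - x**c

def integrand(x):
    lf = log_f(x)
    # f(x) * (log f(x) - log g(x)); stays finite even when f(x) underflows to 0
    return np.exp(lf) * (lf - log_g(x))

kl, abserr = quad(integrand, 0.001, np.inf)
print(kl, abserr)
```

Because the logarithm of the ratio is computed as a difference of two finite numbers, the integrand evaluates to zero rather than nan once f(x) underflows.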
Related
I'm trying to subclass rv_continuous from scipy.stats in order to implement the 3 parameter log-normal distribution. I've only re-implemented the _pdf method so far just to see if I can get a minimal example running:
from scipy import stats
from math import sqrt, pi, exp, log, e

class LogNormal3P(stats.rv_continuous):
    def _pdf(self, x, alpha, m, sigma):
        return 1 / (sigma * (x - alpha) * sqrt(2 * pi)) * exp(-(log(x - alpha, e) - m)**2 / (2 * sigma**2))

lognorm3p = LogNormal3P(name='lognorm3p')

if __name__ == "__main__":
    import numpy as np
    data = np.array([16.66, 28.0, 14.3, 15.99, 16.26, 22.69, 19.1, 14.82, 13.91, 11.1])
    ln3p = lognorm3p.fit(data)
    print('Done')
When running this code I get the error ValueError: math domain error. This is because on the first iteration of the fitting process a point is reached where x = -11 and the other parameters are all equal to 1. This produces a negative argument to the log function, resulting in the error.
What is the typical work-around for this? None of the data values are negative so I'm assuming this happens during a normalization step. Am I missing any methods of rv_continuous that should be re-implemented prior to trying to fit the data? The documentation mentions that either _pdf or _cdf should be re-implemented at a minimum.
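One possible work-around, offered here only as a sketch and not as the canonical scipy recipe, is to write _pdf with NumPy functions and return a density of zero wherever x <= alpha, so that the logarithm never sees a non-positive argument. The _argcheck override is an assumption of this sketch: it relaxes the default requirement that all shape parameters be positive, because alpha and m may be negative.

```python
import numpy as np
from scipy import stats

class LogNormal3P(stats.rv_continuous):
    def _argcheck(self, alpha, m, sigma):
        # only sigma has to be positive; alpha and m may take any sign
        return sigma > 0

    def _pdf(self, x, alpha, m, sigma):
        z = x - alpha
        # replace non-positive shifts by 1.0 so that log() stays defined,
        # then mask those positions out with a density of zero
        safe = np.where(z > 0, z, 1.0)
        dens = np.exp(-(np.log(safe) - m)**2 / (2 * sigma**2)) / (sigma * safe * np.sqrt(2 * np.pi))
        return np.where(z > 0, dens, 0.0)

lognorm3p = LogNormal3P(name='lognorm3p')

print(lognorm3p.pdf(-11.0, 1.0, 1.0, 1.0))  # point outside the support
print(lognorm3p.pdf(3.0, 2.0, 0.0, 1.0))    # point inside the support
```

This only demonstrates that the density can be evaluated safely at points like x = -11; whether fit() then converges well for a given data set still depends on the starting values.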
I'm trying to numerically solve the equation x=a*sin(x), where a is some constant, in python. I already tried first solving the equation symbolically, but it seems this particular shape of expression isn't implemented in sympy. I also tried using sympy.nsolve(), but it only gives me the first solution it encounters.
My plan looks something like this:
x=0
a=1
rje=[]
while(x<a):
    if (x-numpy.sin(x))<=error_sin:
        rje.append(x)
    x+=increment
print(rje)
I don't want to waste time or risk missing solutions, so I want to know how to find out how precise numpy's sine is on my device (that would become error_sin).
edit: I tried making both error_sin and increment equal to the machine epsilon of my device, but a) it takes too much time, and b) sin(x) is less precise than x, so I get a lot of non-solutions (or rather repeated solutions, because sin(x) grows much slower than x). Hence the question.
edit2: Could you please just help me answer the question about precision of numpy.sin(x)? I provided information about the purpose purely for context.
The answer
np.sin will in general be as precise as possible, given the precision of the double (ie 64-bit float) variables in which the input, output, and intermediate values are stored. You can get a reasonable measure of the precision of np.sin by comparing it to the arbitrary precision version of sin from mpmath:
import matplotlib.pyplot as plt
import mpmath
import numpy as np
from mpmath import mp
# set mpmath to an extremely high precision
mp.dps = 100
x = np.linspace(-np.pi, np.pi, num=int(1e3))
# numpy sine values
y = np.sin(x)
# extremely high precision sine values
realy = np.array([mpmath.sin(a) for a in x])
# the end results are arrays of arbitrary precision mpf values (ie abserr.dtype=='O')
diff = realy - y
abserr = np.abs(diff)
relerr = np.abs(diff/realy)
plt.plot(x, abserr, lw=.5, label='Absolute error')
plt.plot(x, relerr, lw=.5, label='Relative error')
plt.axhline(2e-16, c='k', ls='--', lw=.5, label=r'$2 \cdot 10^{-16}$')
plt.yscale('log')
plt.xlim(-np.pi, np.pi)
plt.ylim(1e-20, 1e-15)
plt.xlabel('x')
plt.ylabel('Error in np.sin(x)')
plt.legend()
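Output: (plot not preserved; it showed the absolute and relative error of np.sin(x) on a log scale over [-pi, pi], with a dashed reference line at 2e-16)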
Thus, it is reasonable to say that both the relative and absolute errors of np.sin have an upper bound of 2e-16.
A better answer
There's an excellent chance that if you make increment small enough for your approach to be accurate, your algorithm will be too slow for practical use. The standard equation solving approaches won't work for you, since you don't have a standard function. Instead, you have an implicit, multi-valued function. Here's a stab at a general purpose approach for getting all solutions to this kind of equation:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as spo

eps = 1e-4

def func(x, a):
    return a*np.sin(x) - x

def uniqueflt(arr):
    b = arr.copy()
    b.sort()
    d = np.append(True, np.diff(b))
    return b[d>eps]

initial_guess = np.arange(-9, 9) + eps
# uniqueflt removes any repeated roots
roots = uniqueflt(spo.fsolve(func, initial_guess, args=(10,)))
# roots is an array with the 7 unique roots of 10*np.sin(x) - x == 0:
# array([-8.42320394e+00, -7.06817437e+00, -2.85234190e+00, -8.13413225e-09,
# 2.85234189e+00, 7.06817436e+00, 8.42320394e+00])
x = np.linspace(-20, 20, num=int(1e3))
plt.plot(x, x, label=r'$y = x$')
plt.plot(x, 10*np.sin(x), label=r'$y = 10 \cdot sin(x)$')
plt.plot(roots, 10*np.sin(roots), '.', c='k', ms=7, label='Solutions')
plt.ylim(-10.5, 20)
plt.gca().set_aspect('equal', adjustable='box')
plt.legend()
Output: (plot not preserved; it showed y = x and y = 10*sin(x) with the solutions marked as black dots)
You'll have to tweak the initial_guess depending on your value of a. initial_guess has to contain at least as many starting points as there are actual solutions, spread over the range in which the solutions lie.
The accuracy of the sine function is not so relevant here, you'd better perform the study of the equation.
If you write it in the form sin x / x = sinc x = 1 / a, you immediately see that the number of solutions is the number of intersections of the cardinal sine with a horizontal line. This number depends on the ordinates of the extrema of the cardinal sine.
The extrema are found where x cos x - sin x = 0 or x = tan x, and the corresponding values are cos x. This is again a transcendental equation, but it is parameterless and you can solve it once for all. Also note that for increasing values of x, the solutions get closer and closer to (k+1/2)π.
Now for a given value of 1 / a, you can find all the extrema below and above and this will give you starting intervals where to look for the roots. The secant method will be handy.
A simple way to estimate the accuracy of sin() AND cos() for a given argument x would be:
eps_trig = np.abs(1 - (np.sin(x)**2 + np.cos(x)**2)) / 2
You may want to drop the final division by 2 just to be on the "safe side" (there are values of x for which this approximation does not hold very well, in particular for x close to -45 deg, where sin(x) + cos(x) vanishes). I would suggest testing at around x=pi/4.
Explanation:
The basic idea behind this approach is as follows... Let's say our sin(x) and cos(x) deviates from exact values by a single "error value" eps. That is, exact_sin(x) = sin(x) + eps (same for cos(x)). Also, let's call delta to be the measured deviation from the Pythagorean trigonometric identity:
delta = 1 - sin(x)**2 - cos(x)**2
For exact functions, delta should be zero:
1 - exact_sin(x)**2 - exact_cos(x)**2 == 0
or, going to inexact functions:
1 - (sin(x) + eps)**2 - (cos(x) + eps)**2 == 0 =>
1 - sin(x)**2 - cos(x)**2 = delta = 2*eps*(sin(x) + cos(x)) + 2*eps**2
Neglecting last term 2*eps**2 (assume small errors):
2*eps*(sin(x)+cos(x)) = 1 - sin(x)**2 - cos(x)**2
If we choose x such that sin(x)+cos(x) hovers around 1 (or, somewhere in the range 0.5-2), we can roughly estimate that eps = |1 - sin(x)**2 - cos(x)**2|/2.
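As a small illustration of the estimate derived above, the sketch below evaluates it at x = pi/4 and prints it next to the machine epsilon of a double. The function name eps_trig is taken from the formula above; everything else is chosen only for this example.

```python
import numpy as np

def eps_trig(x):
    # deviation from the Pythagorean identity, halved as in the derivation
    s, c = np.sin(x), np.cos(x)
    return abs(1.0 - (s * s + c * c)) / 2.0

estimate = eps_trig(np.pi / 4)
machine_eps = np.finfo(float).eps

print("estimated error:", estimate)
print("machine epsilon:", machine_eps)
```

The estimate can come out as exactly zero when the rounding errors of sin and cos happen to cancel, so it is better read as an order-of-magnitude indication than as a strict bound.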
To the precision you already got good answers. To the task itself, you can be faster by investing some calculus.
First, from the bounds of the sine you know that any solution must be in the interval [-abs(a),abs(a)]. If abs(a) <= 1 then the only root in [-1,1] is x=0.
Apart from the interval containing zero, you also know that there is exactly one root in any of the intervals between the roots of cos(x)=1/a which are the extrema of a*sin(x)-x. Set phi=arccos(1/a) in [0,pi], then these roots are -phi+2*k*pi and phi+2*k*pi.
The interval for k=0 might contain 3 roots if 1<a<0.5*pi. For the positive root one knows x/a=sin(x)>x-x^3/6 so that x^2>6-6/a.
And lastly, the problem is symmetric, if x is a root, so is -x so all you have to do is find the positive roots.
So to compute the roots:
1. Start the root list with the root 0.
2. In the case abs(a)<=1, there are no further roots, return. One could also use -pi/2<=a<=1.
3. In the case 1<a<pi/2, apply the chosen bracketing method to the interval [sqrt(6-6/a), pi/2], add the root to the list, and return.
4. In the remaining cases where abs(a)>=0.5*pi:
   a. Compute phi=arccos(1/a).
   b. Then for any positive integer k apply the bracketing method to the intervals [2*(k-1)*pi+phi, 2*k*pi-phi] and [2*k*pi-phi, 2*k*pi+phi], as long as the lower interval boundary is smaller than abs(a) and the function has a sign change over the interval.
   c. Add each root found to the list. Return with the list after the loop ends.
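The bracketing idea can be sketched in Python as follows. This is a simplified variant and not a translation of the snippet below: it assumes a > 0, uses scipy.optimize.brentq instead of the Illinois method, and walks over consecutive extrema of x - a*sin(x) (the points where cos(x) = 1/a), keeping every interval that shows a sign change. The names positive_roots and extremum are made up for this illustration.

```python
import numpy as np
from scipy.optimize import brentq

def extremum(n, phi):
    # n-th positive extremum of x - a*sin(x): phi, 2*pi - phi, 2*pi + phi, 4*pi - phi, ...
    return 2 * ((n + 1) // 2) * np.pi + (phi if n % 2 == 0 else -phi)

def positive_roots(a):
    """Positive roots of x = a*sin(x), assuming a > 0."""
    if a <= 1.0:
        return []  # only the trivial root x = 0 exists
    f = lambda x: x - a * np.sin(x)
    phi = np.arccos(1.0 / a)
    roots = []
    n = 0
    # any root satisfies |x| <= a, so stop once the left bracket end exceeds a
    while extremum(n, phi) < a:
        left, right = extremum(n, phi), extremum(n + 1, phi)
        # the function is monotonic between consecutive extrema,
        # so a sign change means exactly one root in the interval
        if f(left) * f(right) < 0:
            roots.append(brentq(f, left, right))
        n += 1
    return roots

pos = positive_roots(10.0)
# by symmetry, -x is a root whenever x is
all_roots = sorted([-r for r in pos] + [0.0] + pos)
print(all_roots)
```

A root that sits exactly on an extremum (a tangency of the line and the sine curve) produces no sign change and would be missed by this sketch.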
let a=10;
function f(x) { return x - a * Math.sin(x); }
findRoots();
//-------------------------------------------------
function findRoots() {
log.innerHTML = `<p>roots for parameter a=${a}`;
rootList.innerHTML = "<tr><th>root <i>x</i></th><th><i>x-a*sin(x)</i></th><th>numSteps</th></tr>";
rootList.innerHTML += "<tr><td>0.0<td>0.0<td>0</tr>";
if( Math.abs(a)<=1) return;
if( (1.0<a) && (a < 0.5*Math.PI) ) {
illinois(Math.sqrt(6-6/a), 0.5*Math.PI);
return;
}
const phi = Math.acos(1.0/a);
log.innerHTML += `phi=${phi}<br>`;
let right = 2*Math.PI-phi;
for (let k=1; right<Math.abs(a); k++) {
let left = right;
right = (k+2)*Math.PI + ((0==k%2)?(-phi):(phi-Math.PI));
illinois(left, right);
}
}
function illinois(a, b) {
log.innerHTML += `<p>regula falsi variant illinois called for interval [a,b]=[${a}, ${b}]`;
let fa = f(a);
let fb = f(b);
let numSteps=2;
log.innerHTML += ` values f(a)=${fa}, f(b)=${fb}</p>`;
if (fa*fb > 0) return;
if (Math.abs(fa) < Math.abs(fb)) { var h=a; a=b; b=h; h=fa; fa=fb; fb=h;}
while(Math.abs(b-a) > 1e-15*Math.abs(b)) {
let c = b - fb*(b-a)/(fb-fa);
let fc = f(c); numSteps++;
log.innerHTML += `step ${numSteps}: midpoint c=${c}, f(c)=${fc}<br>`;
if ( fa*fc < 0 ) {
fa *= 0.5;
} else {
a = b; fa = fb;
}
b = c; fb = fc;
}
rootList.innerHTML += `<tr><td>${b}<td>${fb}<td>${numSteps}</tr>`;
}
aInput.addEventListener('change', () => {
let a_new = Number.parseFloat(aInput.value);
if( isNaN(a_new) ) {
alert('Not a number '+aInput.value);
} else if(a!=a_new) {
a = a_new;
findRoots();
}
});
<p>Factor <i>a</i>: <input id="aInput" value="10" /></p>
<h3>Root list</h3>
<table id="rootList" border = 1>
</table>
<h3>Computation log</h3>
<div id="log"/>
The solution should be precise up to machine epsilon
>>> from numpy import sin as sin_np
>>> from math import sin as sin_math
>>> x = 0.0
>>> sin_np(x) - x
0.0
>>> sin_math(x) - x
0.0
>>>
You could consider using scipy.optimize for this problem:
>>> from scipy.optimize import minimize
>>> from math import sin
>>> a = 1.0
Then define your objective as so:
>>> def obj(x):
... return abs(x - a*sin(x))
...
And you can go ahead and solve this problem numerically by:
>>> sol = minimize(obj, 0.0)
>>> sol
fun: array([ 0.])
hess_inv: array([[1]])
jac: array([ 0.])
message: 'Optimization terminated successfully.'
nfev: 3
nit: 0
njev: 1
status: 0
success: True
x: array([ 0.])
Now let's try with a new value of a
>>> a = .5
>>> sol = minimize(obj, 0.0)
>>> sol
fun: array([ 0.])
hess_inv: array([[1]])
jac: array([ 0.5])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 315
nit: 0
njev: 101
status: 2
success: False
x: array([ 0.])
>>>
In case you want to find a non-trivial solution to this problem, you need to change x0 iteratively to values both greater than and less than zero. Also, by setting the bounds argument of scipy.optimize.minimize you can restrict x to an interval, which lets you walk from -infty to +infty (or very large numbers).
Motivation for the question
I'm trying to integrate a function f(x,y,z) over all space.
I have tried using scipy.integrate.tplquad & scipy.integrate.nquad for the integration, but both methods return the integral as 0 (when the integral should be finite). This is because, as the volume of integration increases, the region where the integrand is non-zero gets sampled less and less. The integral 'misses' this region of space. However, scipy.integrate.quad does seem to be able to cope with integrals from [-infinity, infinity] by performing a change of variables...
Question
Is it possible to use scipy.integrate.quad 3 times to perform a triple integral? The code I have in mind would look something like the following:
x_integral = quad(f, -np.inf, np.inf)
y_integral = quad(x_integral, -np.inf, np.inf)
z_integral = quad(y_integral, -np.inf, np.inf)
where f is the function f(x, y, z), x_integral should integrate from x = [- infinity, infinity], y_integral should integrate from y = [- infinity, infinity], and z_integral should integrate from z = [- infinity, infinity]. I am aware that quad wants to return a float, and so does not like integrating a function f(x, y, z) over x to return a function of y and z (as the x_integral = ... line from the code above is attempting to do). Is there a way of implementing the code above?
Thanks
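Here is an example with nested calls to quad performing the integration, giving 1/8th of the volume of the unit sphere:
import numpy as np
from scipy.integrate import quad

def fz(y, x):
    # innermost integral over z, from 0 up to the sphere surface;
    # max() guards against tiny negative values caused by rounding
    return quad(lambda z: 1, 0, np.sqrt(max(0.0, 1 - x**2 - y**2)))[0]

def fy(x):
    # quad passes the integration variable y first, then args=(x,)
    return quad(fz, 0, np.sqrt(1 - x**2), args=(x,))[0]

def fx():
    return quad(fy, 0, 1)[0]

fx()  # approximately 0.5235988, i.e. pi/6
4/3*np.pi/8
>>> 0.5235987755982988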
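The same nesting pattern carries over to infinite limits, which is what the question asks about. The sketch below uses a Gaussian as a stand-in integrand (an assumption made purely for illustration, since the actual f(x, y, z) is not given); its integral over all space is known to be pi**1.5, which makes the result easy to check. The helper names integrate_x and integrate_xy are hypothetical.

```python
import math
import numpy as np
from scipy.integrate import quad

def f(x, y, z):
    # stand-in integrand; x*x is used instead of x**2 so that very large
    # arguments give inf (and exp(-inf) = 0.0) instead of raising OverflowError
    return math.exp(-(x * x + y * y + z * z))

def integrate_x(y, z):
    # integrate over x for fixed y and z
    return quad(f, -np.inf, np.inf, args=(y, z))[0]

def integrate_xy(z):
    # integrate the x-integral over y for fixed z
    return quad(integrate_x, -np.inf, np.inf, args=(z,))[0]

total = quad(integrate_xy, -np.inf, np.inf)[0]
print(total, math.pi ** 1.5)
```

Each level multiplies the number of function evaluations, so this takes noticeably longer than a single quad call. If the integrand is concentrated in a small region far from the origin, the "missed region" problem described in the question can still occur, and shifting the variables so that the peak sits near zero is worth trying first.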
I'm trying to integrate a function f(x,y,z) over all space.
First of all you'll have to ask yourself why the integral should converge at all. Does it have a factor exp(-r) or exp(-r^2)? In both of these cases, quadpy (a project of mine) has something for you, e.g.,
import quadpy
scheme = quadpy.e3r2.stroud_secrest_10a()
val = scheme.integrate(lambda x: x[0]**2)
print(val)
2.784163998415853
I would like to fit a sinc function to a bunch of datalines.
Using a gauss the fit itself does work but the data does not seem to be sufficiently gaussian, so I figured I could just switch to sinc..
I just tried to put together a short piece of self running code but realized, that I probably do not fully understand, how arrays are handled if handed over to a function, which could be part of the reason, why I get error messages calling my program
So my code currently looks as follows:
from numpy import exp
from scipy.optimize import curve_fit
from math import sin, pi

def gauss(x,*p):
    print(p)
    A, mu, sigma = p
    return A*exp(-1*(x[:]-mu)*(x[:]-mu)/sigma/sigma)

def sincSquare_mod(x,*p):
    A, mu, sigma = p
    return A * (sin(pi*(x[:]-mu)*sigma) / (pi*(x[:]-mu)*sigma))**2

p0 = [1., 30., 5.]
xpos = range(100)
fitdata = gauss(xpos,p0)
p1, var_matrix = curve_fit(sincSquare_mod, xpos, fitdata, p0)
What I get is:
Traceback (most recent call last):
File "orthogonal_fit_test.py", line 18, in <module>
fitdata = gauss(xpos,p0)
File "orthogonal_fit_test.py", line 7, in gauss
A, mu, sigma = p
ValueError: need more than 1 value to unpack
From my understanding p is not handed over correctly, which is odd, because it is in my actual code. I then get a similar message from the sincSquare function, when fitted, which could probably be the same type of error. I am fairly new to the star operator, so there might be a glitch hidden...
Anybody some ideas? :)
Thanks!
You need to make three changes,
import numpy as np
from numpy import exp, pi

def gauss(x, A, mu, sigma):
    x = np.asarray(x)  # accept lists and ranges as well as arrays
    return A*exp(-1*(x-mu)*(x-mu)/sigma/sigma)

def sincSquare_mod(x, A, mu, sigma):
    x = np.asarray(x)
    return A * (np.sin(pi*(x-mu)*sigma) / (pi*(x-mu)*sigma))**2

fitdata = gauss(xpos,*p0)
1, See Documentation
2, replace sin by the numpy version for array broadcasting
3, straight forward right? :P
Note, I think you are looking for p1, var_matrix = curve_fit(gauss,... rather than the one in the OP, which does not appear to have a solution.
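Put together, a self-contained version of the Gaussian fit might look like the sketch below. The noise-free synthetic data and the deliberately offset starting values are assumptions made for this illustration, so that the fit has something to recover.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, A, mu, sigma):
    # explicit parameters instead of *p, which is the signature curve_fit expects
    return A * np.exp(-((x - mu) / sigma) ** 2)

xpos = np.arange(100, dtype=float)
true_params = (1.0, 30.0, 5.0)
fitdata = gauss(xpos, *true_params)      # note the * to unpack the parameters

start = (0.8, 28.0, 4.0)                 # deliberately not the true values
p1, var_matrix = curve_fit(gauss, xpos, fitdata, p0=start)
print(p1)
```

Calling gauss(xpos, true_params) without the star would pass the whole tuple as A and leave mu and sigma missing, which is the same kind of mistake that caused the unpacking error in the question.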
Also worth noting is that you will get rounding errors as x*Pi gets close to zero that might get magnified. You can approximate as demonstrated below for better results (VB.NET, sorry):
Private Function sinc(x As Double) As Double
    x = (x * Math.PI)
    'The Taylor Series expansion of Sin(x)/x is used to limit rounding errors for small values of x
    If x < 0.01 And x > -0.01 Then
        Return 1.0 - x ^ 2 / 6.0 + x ^ 4 / 120.0
    End If
    Return Math.Sin(x) / x
End Function
http://www.wolframalpha.com/input/?i=taylor+series+sin+%28x%29+%2F+x&dataset=&equal=Submit
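For readers who prefer Python, a direct translation of the VB.NET function could look like this. It is a sketch of the same idea (Taylor series near zero, plain quotient elsewhere), keeping the 0.01 threshold from the code above. For array input, NumPy's own np.sinc uses the same normalized definition sin(pi*x)/(pi*x).

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with a series expansion near zero."""
    t = math.pi * x
    if abs(t) < 0.01:
        # first terms of the Taylor series of sin(t)/t; also covers t == 0
        return 1.0 - t**2 / 6.0 + t**4 / 120.0
    return math.sin(t) / t

print(sinc(0.0), sinc(0.5), sinc(1.0))
```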
I am looking for a function in Numpy or Scipy (or any rigorous Python library) that will give me the cumulative normal distribution function in Python.
Here's an example:
>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435
In other words, approximately 95% of the standard normal distribution lies within about two (more precisely, 1.96) standard deviations of the mean of zero.
If you need the inverse CDF:
>>> norm.ppf(norm.cdf(1.96))
array(1.9599999999999991)
It may be too late to answer the question but since Google still leads people here, I decide to write my solution here.
That is, since Python 2.7, the math library has integrated the error function math.erf(x)
The erf() function can be used to compute traditional statistical functions such as the cumulative standard normal distribution:
from math import erf, sqrt

def phi(x):
    """Cumulative distribution function for the standard normal distribution."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0
Ref:
https://docs.python.org/2/library/math.html
https://docs.python.org/3/library/math.html
How are the Error Function and Standard Normal distribution function related?
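One caveat with the erf-based formula is that 1.0 + erf(...) loses all significant digits far out in the lower tail. A variant based on math.erfc (also in the standard library since Python 2.7) avoids that cancellation. The sketch below compares the two; the function names phi_erf and phi_erfc are made up for this illustration.

```python
from math import erf, erfc, sqrt

def phi_erf(x):
    # formula from the answer above
    return (1.0 + erf(x / sqrt(2.0))) / 2.0

def phi_erfc(x):
    # mathematically identical, but computed without subtracting nearly equal numbers
    return 0.5 * erfc(-x / sqrt(2.0))

print(phi_erf(1.96), phi_erfc(1.96))   # agree in the central region
print(phi_erf(-30.0), phi_erfc(-30.0)) # differ in the far lower tail
```

For everyday z-values the two versions agree, so the difference only matters when very small tail probabilities are needed.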
Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.
It can be used to get the cumulative distribution function (cdf - probability that a random sample X will be less than or equal to x) for a given mean (mu) and standard deviation (sigma):
from statistics import NormalDist
NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796
Which can be simplified for the standard normal distribution (mu = 0 and sigma = 1):
NormalDist().cdf(1.96)
# 0.9750021048517796
NormalDist().cdf(-1.96)
# 0.024997895148220428
Adapted from here http://mail.python.org/pipermail/python-list/2000-June/039873.html
from math import *

def erfcc(x):
    """Complementary error function."""
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

def ncdf(x):
    return 1. - 0.5*erfcc(x/(2**0.5))
To build upon Unknown's example, the Python equivalent of the function normdist() implemented in a lot of libraries would be:
from math import sqrt, pi, exp

def normcdf(x, mu, sigma):
    # relies on erfcc() from the answer above
    t = x - mu
    y = 0.5*erfcc(-t/(sigma*sqrt(2.0)))
    if y > 1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y

def normdist(x, mu, sigma, f):
    # f truthy: cumulative distribution; f falsy: density
    if f:
        y = normcdf(x,mu,sigma)
    else:
        y = normpdf(x,mu,sigma)
    return y
Alex's answer shows you a solution for the standard normal distribution (mean = 0, standard deviation = 1). If you have a normal distribution with mean m and standard deviation s (which is sqrt(var)) and you want to calculate probabilities:
from scipy.stats import norm
# P(x < val)
print(norm.cdf(val, m, s))
# P(x > val)
print(1 - norm.cdf(val, m, s))
# P(v1 < x < v2)
print(norm.cdf(v2, m, s) - norm.cdf(v1, m, s))
Read more about cdf here and scipy implementation of normal distribution with many formulas here.
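The same three probabilities can also be computed without scipy, using statistics.NormalDist from the standard library (Python 3.8 or later). The mean of 100, standard deviation of 15 and the cut-off values below are hypothetical numbers chosen only to make the sketch concrete.

```python
from statistics import NormalDist

# hypothetical distribution: mean 100, standard deviation 15
dist = NormalDist(mu=100, sigma=15)

p_below = dist.cdf(115)                   # P(x < 115)
p_above = 1 - dist.cdf(115)               # P(x > 115)
p_between = dist.cdf(115) - dist.cdf(85)  # P(85 < x < 115), one sigma either side

print(p_below, p_above, p_between)
```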
Taken from above:
from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435
For a two-tailed test:
import numpy as np
z = 1.96
p_value = 2 * norm.cdf(-np.abs(z))
0.04999579029644087
Simple like this:
import math

def my_cdf(x):
    return 0.5*(1+math.erf(x/math.sqrt(2)))
I found the formula in this page https://www.danielsoper.com/statcalc/formulas.aspx?id=55