Integrating a Gaussian over a very long interval - Python

I want to integrate a Gaussian function over a very large interval. I chose the scipy.integrate.quad function for the integration. It seems to work only when I select a small enough interval. When I run the code below,
from scipy.integrate import quad
from math import pi, exp, sqrt
def func(x, mean, sigma):
    return 1/(sqrt(2*pi)*sigma) * exp(-1/2*((x-mean)/sigma)**2)
print(quad(func, 0, 1e+31, args=(1e+29, 1e+28))[0]) # case 1
print(quad(func, 0, 1e+32, args=(1e+29, 1e+28))[0]) # case 2
print(quad(func, 0, 1e+33, args=(1e+29, 1e+28))[0]) # case 3
print(quad(func, 1e+25, 1e+33, args=(1e+29, 1e+28))[0]) # case 4
then the following is printed:
1.0
1.0000000000000004
0.0
0.0
To obtain a reasonable result, I had to try different lower/upper bounds several times and settled on [0, 1e+32] empirically. This seems risky to me: whenever the mean and sigma of the Gaussian change, I have to hunt for new bounds all over again.
Is there a reliable way to integrate the function from 0 to 1e+50 without fiddling with the bounds? If not, how can you predict in advance which bounds will give a non-zero value?

In short, you can't.
On an interval this long, the region where the Gaussian is non-zero is tiny, and the adaptive procedure working under the hood of integrate.quad fails to see it, as would pretty much any adaptive routine, except by chance.
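That said, if you already know roughly where the peak sits, you can sometimes rescue quad by handing it explicit break points. This is a workaround that depends on knowing the peak location, not a general fix; a minimal sketch:
from scipy.integrate import quad
from math import pi, exp, sqrt

def func(x, mean, sigma):
    return 1/(sqrt(2*pi)*sigma) * exp(-1/2*((x-mean)/sigma)**2)

mean, sigma = 1e+29, 1e+28
# bracket the peak; note that points= only works with finite bounds
bracket = [mean - 5*sigma, mean, mean + 5*sigma]
print(quad(func, 0, 1e+33, args=(mean, sigma), points=bracket)[0])  # ≈ 1.0
The moment you no longer know where the peak is, this fails again, which is why the closed-form approach below is preferable.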

Notice that your integral is exactly the probability that a normal random variable with mean m and standard deviation s lands in [a, b]. The CDF of a normal random variable is given its own symbol, ϕ(x), precisely because it cannot be expressed with elementary functions; the probability you want is ϕ((b-m)/s) - ϕ((a-m)/s). Also note that ϕ(x) = 1/2*(1 + erf(x/sqrt(2))), so you need not call quad to actually perform an integration and may have better luck with erf from scipy:
import numpy as np
from scipy.special import erf

def prob(mu, sigma, a, b):
    phi = lambda x: 1/2*(1 + erf((x - mu)/(sigma*np.sqrt(2))))
    return phi(b) - phi(a)
This may give more accurate results (here it does, compared with quad above):
>>> print(prob(0, 1e+31, 0, 1e+50))
0.5
>>> print(prob(0, 1e+32, 1e+28, 1e+29))
0.000359047985937333
>>> print(prob(0, 1e+33, 1e+28, 1e+29))
3.5904805169684195e-05
>>> print(prob(1e+25, 1e+33, 1e+28, 1e+29))
3.590480516979522e-05
and avoids the severe floating-point error you are experiencing. However, the regions you integrate over may hold so little probability mass that you still see 0.
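Equivalently, scipy.stats already packages the normal CDF, so you don't even have to write phi yourself; a sketch of the same computation:
from scipy.stats import norm

def prob_stats(mu, sigma, a, b):
    # difference of normal CDFs, the same quantity as prob() above
    return norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)

print(prob_stats(1e+29, 1e+28, 0, 1e+50))  # ≈ 1.0
For probabilities deep in a tail, norm.sf (the survival function) avoids the cancellation that 1 - cdf suffers.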

Related

How to deal with numerical integration in python with small results?

I am running into an issue with integration in Python returning incorrect values for an integral with a known analytical solution. The integral in question is
∫_0^∞ x^2 · exp(-x^2/(2σ^2)) dx
For the value of sigma I am using (1e-15), the solution to this integral has a value of ~1.25e-45. However, when I use the scipy integrate package to calculate it, I get zero, which I believe has to do with the precision required by the calculation.
#scipy method
import numpy as np
from scipy.integrate import quad
sigma = 1e-15
f = lambda x: (x**2) * np.exp(-x**2/(2*sigma**2))
#perform the integral and print the result
solution = quad(f,0,np.inf)[0]
print(solution)
0.0
Since precision was an issue, I also tried another recommended package, mpmath, which did not return 0 but was off by ~7 orders of magnitude from the correct answer. Testing larger values of sigma gives solutions very close to the corresponding exact values, but the result becomes increasingly incorrect as sigma gets smaller.
#mpmath method
import numpy as np
import mpmath as mp
sigma = 1e-15
f = lambda x: (x**2) * mp.exp(-x**2/(2*sigma**2))
#perform the integral and print the result
solution = mp.quad(f,[0,np.inf])
print(solution)
2.01359486678988e-52
From here I could use some advice on getting a more accurate answer, as I would like to have some confidence applying python integration methods to integrals that cannot be solved analytically.
You should add extra break points for the integrator as 'mid points'; I added 100 points from 1e-100 to 1 to increase accuracy:
#mpmath method
import numpy as np
import mpmath as mp
sigma = 1e-15
f = lambda x: (x**2) * mp.exp(-x**2/(2*sigma**2))
#perform the integral and print the result
solution = mp.quad(f,[0,*np.logspace(-100,0,100),np.inf])
print(solution)
1.25286197427129e-45
Edit: it turns out you need 10000 points instead of 100 to get a more accurate result, 1.25331413731554e-45, but then it takes a few seconds to calculate.
Most numerical integrators will run into issues with numbers this small due to floating-point precision. One solution is to rescale the integral before calculating it. Substituting q = x/sigma (so that x^2 dx becomes sigma^3 · q^2 dq), the integral becomes:
import numpy as np
from scipy.integrate import quad
sigma = 1e-15
f = lambda q: sigma**3 * q**2 * np.exp(-q**2/2)
solution = quad(f, 0, np.inf)[0]
# solution: 1.2533156529417088e-45
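For reference, this particular integral has a closed form, ∫_0^∞ x^2 exp(-x^2/(2σ^2)) dx = sqrt(π/2)·σ^3, which makes checking either numerical route trivial:
import numpy as np
sigma = 1e-15
print(sigma**3 * np.sqrt(np.pi/2))  # ≈ 1.2533141373155e-45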

Are there any inherent limitations to the scipy.integrate.quad function?

I am currently attempting to perform a definite integral of a Gaussian function and I am receiving an answer of 0 when I am convinced that should not be the case.
This leads me to ask: are there limitations on what the quad function can do when performing definite integrals? Am I using quad in the correct application? How exactly does quad find an integral, anyway?
import math
from scipy.integrate import quad
def g(λ,a,u,o):
    return a*math.exp((λ-u)**2/(-2*o**2))
exc = quad(g, 4000, 8000, args=(1,6700,2.125))[0]
print(exc)
I have plotted this Gaussian, so I know that it is not zero within the range I have set. I have also plugged the integral into my scientific calculator, which spits out the answer 5.33. So I conclude that I have either made some mistake I could not find, or I am using quad in the wrong situation.
Any and all help is appreciated :)
Your function is basically 0 everywhere except in a small range, relative to the interval you are trying to integrate over.
You can add some break points to help quad split the integration into smaller parts. From the scipy documentation:
points (sequence of floats, ints), optional: A sequence of break points in the bounded integration interval where local difficulties of the integrand may occur (e.g., singularities, discontinuities). The sequence does not have to be sorted. Note that this option cannot be used in conjunction with weight.
import math
from scipy.integrate import quad
def g(λ,a,u,o):
    return a*math.exp((λ-u)**2/(-2*o**2))
exc = quad(g, 4000, 8000, args=(1,6700,2.125), full_output=1, points=[6500, 7000])[0]
print(exc)
5.3265850835908095
There seems to be no way around this problem.
As mentioned by Tom, the region where your function is significantly greater than 0 is too far out to be detected by the integration process. Theoretically, your u could just as well be 1e12; it is asking a bit much of an integration scheme to find that on its own.
One easy remedy is to increase the quadrature domain to [-inf, +inf], and shift the function such that the "interesting" part is around 0.
import math
from scipy.integrate import quad
import numpy as np
a = 1.0
u = 0.0
o = 2.125
def g(x):
    return a * math.exp(-(x - u) ** 2 / (2 * o ** 2))
exc = quad(g, -np.inf, +np.inf)[0]
print(exc)
5.326585083590876
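As a sanity check, a Gaussian a·exp(-(x-u)^2/(2·o^2)) integrates to a·o·sqrt(2π) over the whole real line, which matches both results above:
import math
print(1.0 * 2.125 * math.sqrt(2*math.pi))  # ≈ 5.3265850835909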

How to optimise the numerical evaluation of a SymPy integral?

I'm somewhat of a newbie to SymPy and was hoping someone could point out ways to optimise my code.
I need to numerically evaluate a somewhat involved expression to very high precision (150-300 decimal places), and it is taking 30 seconds or longer per parameter set, which is very long given the size of the parameter space to be covered.
I have used lambdify with the mpmath backend and meijerg=True in the integral handling and it brought down run-times significantly. Are there any other methods that could be used? Ideally it would be great to push evaluation times below 1 second. My code is:
import mpmath
from mpmath import mpf, mp
mp.dps = 150 # ideally would like to have this set to 300
import numpy as np
from sympy import besselj, symbols, hankel2, legendre, sin, cos, tan, summation, I
from sympy import lambdify, expand, Integral
import time
x, alpha, k, m,n, r1, R, theta = symbols('x alpha k m n r1 R theta')
r1 = (R*cos(alpha))/cos(theta)
Imn_part1 = (n*hankel2(n-1,k*r1)-(n+1)*hankel2(n+1,k*r1))*legendre(n, cos(theta))*cos(theta)
Imn_part2 = n*(n+1)*hankel2(n, k*r1)*(legendre(n-1, cos(theta))-legendre(n+1, cos(theta)))/(k*r1)
Imn_parts = expand(Imn_part1+Imn_part2)
Imn_expr = expand(Imn_parts*legendre(m,cos(theta))*(r1**2/R**2)*tan(theta))
Imn = Integral(Imn_expr, (theta, 0, alpha)).doit(meijerg=True)
# the lambdified expression
Imn_lambdify = lambdify([m,n,k,R,alpha], Imn,'mpmath')
When given numerical inputs, the function takes a long time (30-40 s) to evaluate.
substitute_dict = {'alpha':mpf(np.radians(10)), 'k':5,'R':mpf(0.1), 'm':20,'n':10}
print('starting calculation...')
start = time.time()
output = Imn_lambdify(substitute_dict['m'],
                      substitute_dict['n'],
                      substitute_dict['k'],
                      substitute_dict['R'],
                      substitute_dict['alpha'])
print(time.time()-start)
OS/package versions used:
Linux Mint 19.2
Python 3.8.5
SymPy 1.7.1
MPMath 1.2.1
Setting meijerg=True has just caused SymPy to not try as hard in evaluating the integral. It still can't evaluate it, but it has split it into 5 sub-integrals, which you can see if you print Imn. You might as well just leave it as one integral (leave off the doit()):
Imn = Integral(Imn_expr, (theta, 0, alpha))
For me, the split integral evaluates a little faster, but simplifying the integrand first is about the same speed:
from sympy import simplify
Imn = Integral(simplify(Imn_expr), (theta, 0, alpha))
Ultimately, the thing that makes this slow is the number of digits you are using. If you don't actually need that many digits, you shouldn't use them. Note that mpmath automatically increases the precision internally to avoid cancellation, so it is unnecessary to do so yourself. I get the same value (to fewer digits) with the default dps of 15 as with 150.
You can try substituting your values directly into your expression, if they do not change, and seeing if SymPy can simplify Imn_expr further with them.
As an aside, you are using np.radians(10), which is a machine float, since that is what NumPy uses. This largely defeats the purpose of computing the final answer to 150 digits, since this input parameter is only accurate to about 15. Consider using mpmath.pi/18 instead to get a value that is correct to the number of digits you specified.
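A minimal sketch of that last point (assuming the surrounding setup from the question):
import mpmath
mpmath.mp.dps = 150
alpha = mpmath.pi/18   # 10 degrees in radians, correct to all 150 digits
# versus mpf(np.radians(10)), which is only correct to ~16 digits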

Calculating 1/r*d/dr(r*f) numerically in python when r=0. f is a function of r

Usually when you do this by hand there's no problem, as the 1/r usually cancels with another r. Doing it numerically with scipy.misc.derivative works like a charm for r different from zero, but of course, as soon as I ask for r = 0, I get a division by zero, which I expected. How else could I calculate this numerically? I insist on doing everything numerically, as my functions are now so complicated that I won't be able to find the derivative by hand. Thank you!
My code:
rAtheta = lambda _r: _r*Atheta(_r,theta,z,t)
if r != 0:
    return derivative(rAtheta,r,dx=1e-10,order=3)/r
else:
    # What should go here so that it doesn't blow up when calculating the gradient?
tl;dr: use symbolic differentiation, or complex step differentiation if that fails
If you insist on using numerical methods, you really have to approximate the limit of the derivative as r->0 one way or the other.
I suggest trying complex step differentiation. The idea is to use complex arguments inside the function you're trying to differentiate, but it usually gets rid of the numerical instability that is imposed by standard finite difference schemes. The result is a procedure that needs complex arithmetic (hooray numpy, and python in general!) but in turn can be much more stable at small dx values.
Here's another point: complex step differentiation uses
F′(x0) = Im(F(x0+ih))/h + O(h^2)
Let's apply this to your r=0 case:
F′(0) = Im(F(ih))/h + O(h^2)
There are no singularities even for r=0! Choose a small h, possibly the same dx you're passing to your function, and use that:
def rAtheta(_r):
    # note that named lambdas are usually frowned upon
    return _r*Atheta(_r,theta,z,t)

tol = 1e-10
dr = 1e-12
if np.abs(r) > tol:  # or the built-in abs, or your favourite other abs
    return derivative(rAtheta,r,dx=dr,order=3)/r
else:
    return rAtheta(r + 1j*dr).imag/dr/r
Here is the above in action for f = r*ln(r):
The result is straightforwardly smooth, even though the points below r=1e-10 were computed with complex step differentiation.
Very important note: notice the separation between tol and dr in the code. The former is used to determine when to switch between methods, and the latter is used as a step in complex step differentiation. Look what happens when tol=dr=1e-10:
the result is a smoothly wrong function below r=1e-10! That's why you always have to be careful with numerical differentiation. And I wouldn't advise going too much below that in dr, as machine precision will bite you sooner or later.
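A quick self-contained illustration of why the complex step tolerates tiny steps where finite differences do not (using f(x) = exp(x), whose derivative at 1 is e; the error magnitudes in the comments are indicative):
import numpy as np

f = lambda x: np.exp(x)
x0, h = 1.0, 1e-12
fd = (f(x0 + h) - f(x0)) / h     # forward difference: catastrophic cancellation
cs = np.imag(f(x0 + 1j*h)) / h   # complex step: no subtraction of nearby numbers
print(abs(fd - np.e))  # roughly 1e-4
print(abs(cs - np.e))  # near machine epsilon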
But why stop here? I'm fairly certain that your functions could be written in a vectorized way, i.e. they could accept an array of radial points. Using complex step differentiation you don't have to loop over the radial points, which you would have to do with scipy.misc.derivative. Example:
import numpy as np
import matplotlib.pyplot as plt
def Atheta(r,*args):
    return r*np.log(r)  # <-- vectorized expression

def rAtheta(r):
    return r*Atheta(r)  # ,theta,z,t <-- vectorized as much as Atheta is

def vectorized_difffun(rlist):
    r = np.asarray(rlist)
    dr = 1e-12
    return (rAtheta(r + 1j*dr)).imag/dr/r
rarr = np.logspace(-12,-2,20)
darr = vectorized_difffun(rarr)
plt.figure()
plt.loglog(rarr,np.abs(darr),'.-')
plt.xlabel(r'$r$')
plt.ylabel(r'$|\frac{1}{r} \frac{d}{dr}(r^2 \ln r)|$')
plt.tight_layout()
plt.show()
The result should be familiar:
Having cleared up the fun weirdness that is complex step differentiation, I should note that you should strongly consider using symbolic math. In cases like this, when the 1/r factor disappears exactly, it wouldn't hurt to reach that conclusion exactly. After all, double precision is still just double precision.
For this you'll need the sympy module, define your function symbolically once, differentiate it symbolically once, turn your simplified result into a numpy function using sympy.lambdify, and use this numerical function as much as you need (assuming that this whole process runs in finite time and the resulting function is not too slow to use). Example:
import sympy as sym
# only needed for the example:
import numpy as np
import matplotlib.pyplot as plt
r = sym.symbols('r')
f = r*sym.ln(r)
df = sym.diff(r*f,r)
res_sym = sym.simplify(df/r)
res_num = sym.lambdify(r,res_sym,'numpy')
rarr = np.logspace(-12,-2,20)
darr = res_num(rarr)
plt.figure()
plt.loglog(rarr,np.abs(darr),'.-')
plt.xlabel(r'$r$')
plt.ylabel(r'$|\frac{1}{r} \frac{d}{dr}(r^2 \ln r)|$')
plt.tight_layout()
plt.show()
resulting in the same curve as before.
As you see, the resulting function was vectorized thanks to lambdify using numpy during the conversion from symbolic to numeric function. Obviously, the best solution is the symbolic one as long as the resulting function is not so complicated to make its practical use impossible. I urge you to first try the symbolic version, and if for some reason it's not applicable, switch to complex step differentiation, with due caution.

Integral of Intensity function in python

There is a function which determines the intensity of the Fraunhofer diffraction pattern of a circular aperture... (more information)
The integral of the function over the distance x = [-3.8317, 3.8317] should be about 83.8% (if we assume I0 is 100), and when you increase the distance to [-13.33, 13.33] it should be about 95%.
But when I compute the integral in Python, the answer is wrong. I don't know what's going wrong in my code :(
from scipy.integrate import quad
from scipy import special as sp
I0 = 100.0
dist = 3.8317
I = quad(lambda x: I0*((2*sp.j1(x)/x)**2), -dist, dist)[0]
print(I)
The result of the integral can't be bigger than 100 (I0), because this is the diffraction of I0... I don't know, maybe scaling... maybe the method! :(
The problem seems to be in the function's behaviour near zero. If the function is plotted, it looks smooth:
However, scipy.integrate.quad complains about round-off errors, which is very strange for such a beautiful curve. The reason is that the function is not defined at 0 (of course: you are dividing by zero!), so the integration does not go well.
You may use a simpler integration method or do something about your function. You may also be able to integrate to very close to zero from both sides, as sketched below. However, with these numbers the integral does not look right when compared against your expected results.
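A sketch of that splitting idea (the eps value is chosen ad hoc):
from scipy.integrate import quad
from scipy import special as sp

I0, dist, eps = 100.0, 3.8317, 1e-9
g = lambda x: I0*((2*sp.j1(x)/x)**2)
# skip the removable singularity at x = 0 by integrating on both sides of it
I = quad(g, -dist, -eps)[0] + quad(g, eps, dist)[0]
print(I)  # ≈ 331.5, in line with the points=[0] answer at the end of this page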
However, I have a hunch of what your problem actually is. As far as I remember, the expression you have shown is the intensity (power/area) of Fraunhofer diffraction as a function of distance from the center. If you want the total power within some radius, you have to integrate over the circular area, i.e. in two dimensions.
By simple area-integration rules you should multiply your function by 2 pi r before integrating (or by x instead of r in your case). Then it becomes:
f = lambda r: r*(sp.j1(r)/r)**2
or
f = lambda r: sp.j1(r)**2/r
or even better:
f = lambda r: r * (sp.j0(r) + sp.jn(2,r))**2
The last form is best as it does not suffer from any singularities. It is based on Jaime's comment to the original answer (see the comment below this answer!).
(Note that I omitted a couple of constants.) Now you can integrate it from zero to infinity (no negative radii):
import numpy as np
fullpower = quad(f, 1e-9, np.inf)[0]
Then you can integrate from some other radius and normalize by the full power:
pwr = quad(f, 1e-9, 3.8317)[0] / fullpower
And you get 0.839 (which is quite close to 84 %). If you try the farther radius (13.33):
pwr = quad(f, 1e-9, 13.33)[0] / fullpower
which gives 0.954.
It should be noted that we introduce a small error by starting the integration from 1e-9 instead of 0. The magnitude of the error can be estimated by trying different values for the starting point. The integration result changes very little between 1e-9 and 1e-12, so they seem to be safe. Of course, you could use, e.g., 1e-30, but then there may be numerical instability in the division. (In this case there isn't, but in general singularities are numerically evil.)
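For instance (a throwaway check; the printed values should agree to many digits):
import numpy as np
from scipy.integrate import quad
from scipy import special as sp

f = lambda r: r * (sp.j0(r) + sp.jn(2,r))**2
for eps in (1e-6, 1e-9, 1e-12):
    print(eps, quad(f, eps, 3.8317)[0])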
Let us do one more thing:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0.01, 20, 1000)
intg = np.array([ quad(f, 1e-9, xx)[0] for xx in x])
plt.plot(x, intg/fullpower)
plt.grid(True)
plt.show()
And this is what we get:
At least this looks right, the dark fringes of the Airy disk are clearly visible.
As for the last part of the question: I0 defines the maximum intensity (the units may be, e.g., W/m2), whereas the integral gives the total power (if the intensity is in W/m2, the total power is in W). Setting the maximum intensity to 100 does not guarantee anything about the total power. That is why it is important to calculate the total power.
There actually exists a closed-form equation for the total power radiated onto a circular area:
P(x) = P0 ( 1 - J0(x)^2 - J1(x)^2 ),
where P0 is the total power.
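That closed form is easy to evaluate directly and reproduces the numbers above (a quick check, with P0 normalised to 1):
from scipy import special as sp

def enclosed_fraction(x):
    # P(x)/P0 = 1 - J0(x)^2 - J1(x)^2
    return 1.0 - sp.j0(x)**2 - sp.j1(x)**2

print(enclosed_fraction(3.8317))  # ≈ 0.838
print(enclosed_fraction(13.33))   # ≈ 0.952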
Note that you can also get a closed-form solution for your integration using SymPy:
import sympy as sy
sy.init_printing() # LaTeX like pretty printing in IPython
x,d = sy.symbols("x,d", real=True)
I0=100
dist=3.8317
f = I0*((2*sy.besselj(1,x)/x)**2) # the integrand
F = f.integrate((x, -d, d)) # symbolic integration
print(F.evalf(subs={d:dist})) # numeric evalution
F evaluates to:
1600*d*besselj(0, Abs(d))**2/3 + 1600*d*besselj(1, Abs(d))**2/3 - 800*besselj(1, Abs(d))**2/(3*d)
with besselj(0,r) corresponding to sp.j0(r).
There might be a singularity in the integration algorithm when the integrand is evaluated at x = 0. You can exclude this point from the integration with points:
f = lambda x: I0*((2*sp.j1(x)/x)**2)
I = quad(f, -dist, dist, points=[0])[0]
I then get the following result (is this your desired result?):
331.4990321315221
