I am trying to compute integrals more precisely by specifying the epsabs parameter of scipy.integrate.quad. Say we are integrating the function sin(x) / x^2 from 1e-16 to 1.0:
from scipy.integrate import quad
import numpy
integrand = lambda x: numpy.sin(x) / x ** 2
integral = quad(integrand, 1e-16, 1.0)
This gives us
(36.760078801255595, 0.01091187908038005)
To make the result more precise, we specify the absolute error tolerance with epsabs:
from scipy.integrate import quad
import numpy
integrand = lambda x: numpy.sin(x) / x ** 2
integral = quad(integrand, 1e-16, 1.0, epsabs = 1e-4)
The result is exactly the same and the error is still as large as 0.0109! Am I misunderstanding the epsabs parameter? What should I do differently to increase the precision of the integral?
According to the scipy manual, the quad function has a limit argument that specifies
An upper bound on the number of subintervals used in the adaptive algorithm.
By default the value of limit is 50.
Your code returns a warning message:
quadpack.py:364: IntegrationWarning: The maximum number of
subdivisions (50) has been achieved. If increasing the limit yields
no improvement it is advised to analyze the integrand in order to
determine the difficulties. If the position of a local difficulty
can be determined (singularity, discontinuity) one will probably
gain from splitting up the interval and calling the integrator on
the subranges. Perhaps a special-purpose integrator should be used.
warnings.warn(msg, IntegrationWarning)
You have to increase the limit argument, e.g.:
from scipy.integrate import quad
import numpy
integrand = lambda x: numpy.sin(x) / x ** 2
print(quad(integrand, 1e-16, 1.0, epsabs = 1e-4, limit=100))
Output:
(36.7600787611414, 3.635057215414274e-05)
There is no warning message in the output: the number of subdivisions stayed under 100 and quad reached the required accuracy.
I am running into an issue with integration in Python returning incorrect values for an integral with a known analytical solution. The integral in question is
the integral from 0 to infinity of x**2 * exp(-x**2 / (2*sigma**2)) dx
For the value of sigma I am using (1e-15), the solution to this integral is ~1.25e-45. However, when I use the scipy integrate package to calculate this I get zero, which I believe has to do with the precision required in the calculation.
#scipy method
import numpy as np
from scipy.integrate import quad
sigma = 1e-15
f = lambda x: (x**2) * np.exp(-x**2/(2*sigma**2))
#perform the integral and print the result
solution = quad(f,0,np.inf)[0]
print(solution)
0.0
And since precision was an issue, I also tried another recommended package, mpmath, which did not return 0 but was off by ~7 orders of magnitude from the correct answer. Testing larger values of sigma results in solutions very close to the corresponding exact values, but it gets increasingly incorrect as sigma gets smaller.
#mpmath method
import mpmath as mp
sigma = 1e-15
f = lambda x: (x**2) * mp.exp(-x**2/(2*sigma**2))
#perform the integral and print the result
solution = mp.quad(f, [0, mp.inf])
print(solution)
2.01359486678988e-52
From here I could use some advice on getting a more accurate answer, as I would like to have some confidence applying python integration methods to integrals that cannot be solved analytically.
You should add extra points for the function as 'mid points'; I added 100 points from 1e-100 to 1 to increase accuracy.
#mpmath method
import numpy as np
import mpmath as mp
sigma = 1e-15
f = lambda x: (x**2) * mp.exp(-x**2/(2*sigma**2))
#perform the integral and print the result
solution = mp.quad(f,[0,*np.logspace(-100,0,100),np.inf])
print(solution)
1.25286197427129e-45
Edit: it turns out you need 10000 points instead of 100 to get a more accurate result, 1.25331413731554e-45, but it takes a few seconds to calculate.
Most numerical integrators will run into issues with numbers that small due to floating-point precision. One solution is to rescale the integral before computing it. Substituting q = x/sigma, the integral becomes:
import numpy as np
from scipy.integrate import quad
sigma = 1e-15
f = lambda q: sigma**3 * (q**2) * np.exp(-q**2 / 2)
solution = quad(f, 0, np.inf)[0]
# solution: 1.2533156529417088e-45
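As a sanity check (just a sketch), this matches the closed form sigma**3 * sqrt(pi/2) of the original integral:
import numpy as np
sigma = 1e-15
exact = sigma**3 * np.sqrt(np.pi / 2)  # analytic value of the integral
print(exact)  # ~1.2533e-45, matching the scaled quad result above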
I am asked to write an implementation of gradient descent in Python with the signature gradient(f, P0, gamma, epsilon), where f is an unknown and possibly multivariate function, P0 is the starting point for the gradient descent, gamma is the constant step size and epsilon is the stopping criterion.
What I find tricky is how to evaluate the gradient of f at the point P0 without knowing anything about f. I know there is numpy.gradient, but I don't know how to use it in the case where I don't know the dimensions of f. Also, numpy.gradient works with samples of the function, so how do I choose the right samples to compute the gradient at a point without any information on the function and the point?
I'm assuming here that "So how can I choose a generic set of samples each time I need to compute the gradient at a given point?" means that the dimension of the function is fixed and can be deduced from your start point.
Consider this a demo using scipy's approx_fprime, an easy-to-use wrapper for numerical differentiation that is also used in scipy's optimizers when a Jacobian is needed but not given.
Of course you can't ignore the parameter epsilon, which can make a difference depending on the data.
(This code also ignores optimize's args parameter, which is usually a good idea to use; I'm relying on the fact that A and b are in scope here, which is surely not best practice.)
import numpy as np
from scipy.optimize import approx_fprime, minimize
np.random.seed(1)
# Synthetic data
A = np.random.random(size=(1000, 20))
noiseless_x = np.random.random(size=20)
b = A.dot(noiseless_x) + np.random.random(size=1000) * 0.01
# Loss function
def fun(x):
    return np.linalg.norm(A.dot(x) - b, 2)
# Optimize without any explicit jacobian
x0 = np.zeros(len(noiseless_x))
res = minimize(fun, x0)
print(res.message)
print(res.fun)
# Get numerical-gradient function
eps = np.sqrt(np.finfo(float).eps)
my_gradient = lambda x: approx_fprime(x, fun, eps)
# Optimize with our gradient
res = minimize(fun, x0, jac=my_gradient)
print(res.message)
print(res.fun)
# Eval gradient at some point
print(my_gradient(np.ones(len(noiseless_x))))
Output:
Optimization terminated successfully.
0.09272331925776327
Optimization terminated successfully.
0.09272331925776327
[15.77418041 16.43476772 15.40369129 15.79804516 15.61699104 15.52977276
15.60408688 16.29286766 16.13469887 16.29916573 15.57258797 15.75262356
16.3483305 15.40844536 16.8921814 15.18487358 15.95994091 15.45903492
16.2035532 16.68831635]
Using:
# Get numerical-gradient function with a way too big eps-value
eps = 1e-3
my_gradient = lambda x: approx_fprime(x, fun, eps)
shows that eps is a critical parameter resulting in:
Desired error not necessarily achieved due to precision loss.
0.09323354898565098
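Since the original question asked for a gradient(f, P0, gamma, epsilon) routine rather than a call to minimize, here is a minimal sketch of constant-step gradient descent built on approx_fprime (the max_iter guard and the example step size below are assumptions of mine, not part of the question):
import numpy as np
from scipy.optimize import approx_fprime

def gradient_descent(f, P0, gamma, epsilon, max_iter=10000):
    fd_step = np.sqrt(np.finfo(float).eps)   # finite-difference step for approx_fprime
    P = np.asarray(P0, dtype=float)
    for _ in range(max_iter):
        g = approx_fprime(P, f, fd_step)     # numerical gradient at the current point
        if np.linalg.norm(g) < epsilon:      # stopping criterion on the gradient norm
            break
        P = P - gamma * g                    # constant-step descent update
    return P

# e.g. with the loss above (assumes fun and noiseless_x are defined):
# x_hat = gradient_descent(fun, np.zeros(len(noiseless_x)), gamma=1e-4, epsilon=1e-6)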
Scipy's quad function can be used to numerically evaluate improper integrals. However, some functions have a rather narrow range where most of their area is (for example, likelihood functions) and quad sometimes misses it. It returns that the integral is approximately 0 when it really just missed the part of the function that isn't 0.
For example, the area under the curve for a log normal distribution from 0 to inf should be 1. Here it succeeds with a geometric mean of 1 but not 2:
from scipy.integrate import quad
from scipy.stats import lognorm
import numpy as np
quad(lambda x: lognorm.pdf(x, 0.01, scale=1), 0, np.inf)
# (1.0000000000000002, 1.6886909404731594e-09)
quad(lambda x: lognorm.pdf(x, 0.01, scale=2), 0, np.inf)
# (6.920637959567767e-14, 1.2523928482954713e-13)
I often know beforehand approximately where the bulk of the mass is. How do I tell quad to start there? If this isn't possible, I'll accept a different tool.
The points parameter of the quad method can be used to tell it where (approximately) it should look. It can't be used with an improper integral, so the range of integration needs to be split into the finite interval up to the last point, plus an infinite tail.
import numpy as np
from scipy.integrate import quad
from scipy.stats import lognorm
points = (0.1, 1, 10, 100)
func = lambda x: lognorm.pdf(x, 0.01, scale=2)  # works for other scales too
integral = quad(func, 0, points[-1], points=points)[0] + quad(func, points[-1], np.inf)[0]
A geometric sequence of points, like in this example, is good enough for a wide range of scales.
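For instance (a sketch; the exponent range is something you would adapt to your own problem), np.logspace produces such a geometric sequence directly:
import numpy as np
points = tuple(np.logspace(-1, 2, 4))  # (0.1, 1.0, 10.0, 100.0), ratio 10 between consecutive points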
If the boundaries are 0, -inf, or inf, then the first guess made by quad is always 1. This can be exploited by shifting the integral so that the waypoint is at 1. For example, shifting the log normal distribution so that its mode is at 1 doesn't change the area, but guarantees that quad finds the bulk of the distribution:
from scipy.integrate import quad
from scipy.stats import lognorm
import numpy as np
mode = np.exp(np.log(2) - 0.01**2)
quad(lambda x: lognorm.pdf(x + mode - 1, 0.01, scale=2), -np.inf, np.inf)
# (0.9999999999999984, 2.2700129642154882e-09)
This only works if there is only one point of interest and the bounds are -inf to inf (otherwise, shifting the function shifts the bounds, which changes the first guess). If so, then this allows the integral to be computed with a single call to quad.
There is a function which determines the intensity of the Fraunhofer diffraction pattern of a circular aperture... (more information)
The integral of the function over the distance x = [-3.8317, 3.8317] should be about 83.8% (if we assume that I0 is 100), and when you increase the distance to [-13.33, 13.33] it should be about 95%.
But when I compute the integral in Python, the answer is wrong. I don't know what's going wrong in my code :(
from scipy.integrate import quad
from scipy import special as sp
I0=100.0
dist=3.8317
I= quad(lambda x:( I0*((2*sp.j1(x)/x)**2)) , -dist, dist)[0]
print(I)
The result of the integral can't be bigger than 100 (I0), because this is the diffraction of I0... I don't know, maybe it's the scaling... maybe it's the method! :(
The problem seems to be in the function's behaviour near zero. If the function is plotted, it looks smooth:
However, scipy.integrate.quad complains about round-off errors, which seems strange for such a smooth-looking curve. The reason is that the function is not defined at 0 (of course, you are dividing by zero!), hence the integration does not go well.
You may use a simpler integration method or do something about your function. You may also be able to integrate it to very close to zero from both sides. However, with these numbers the integral does not look right when looking at your results.
However, I have a hunch about what your problem is. As far as I remember, the integral you have shown is actually the intensity (power/area) of Fraunhofer diffraction as a function of distance from the center. If you want to integrate the total power within some radius, you will have to do it in two dimensions.
By simple area integration rules you should multiply your function by 2 pi r before integrating (or x instead of r in your case). Then it becomes:
f = lambda r: r * (sp.j1(r) / r)**2
or
f = lambda r: sp.j1(r)**2 / r
or even better:
f = lambda r: r * (sp.j0(r) + sp.jn(2, r))**2
The last form is best as it does not suffer from any singularities. It is based on Jaime's comment to the original answer (see the comment below this answer!).
(Note that I omitted a couple of constants.) Now you can integrate it from zero to infinity (no negative radii):
fullpower = quad(f, 1e-9, np.inf)[0]
Then you can integrate from some other radius and normalize by the full intensity:
pwr = quad(f, 1e-9, 3.8317)[0] / fullpower
And you get 0.839 (which is quite close to 84 %). If you try the farther radius (13.33):
pwr = quad(f, 1e-9, 13.33)[0] / fullpower
which gives 0.954.
It should be noted that we introduce a small error by starting the integration from 1e-9 instead of 0. The magnitude of the error can be estimated by trying different values for the starting point. The integration result changes very little between 1e-9 and 1e-12, so they seem to be safe. Of course, you could use, e.g., 1e-30, but then there may be numerical instability in the division. (In this case there isn't, but in general singularities are numerically evil.)
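For example, a quick sketch of that check, using the sp.j1(r)**2 / r form from above and varying only the lower limit:
import numpy as np
from scipy import special as sp
from scipy.integrate import quad

f = lambda r: sp.j1(r)**2 / r
for start in (1e-9, 1e-12, 1e-15):
    print(start, quad(f, start, 3.8317)[0])  # the three values should agree to many digits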
Let us do one more thing:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0.01, 20, 1000)
intg = np.array([ quad(f, 1e-9, xx)[0] for xx in x])
plt.plot(x, intg/fullpower)
plt.grid(True)
plt.show()
And this is what we get:
At least this looks right; the dark fringes of the Airy pattern are clearly visible.
What comes to the last part of the question: I0 defines the maximum intensity (the units may be, e.g. W/m2), whereas the integral gives total power (if the intensity is in W/m2, the total power is in W). Setting the maximum intensity to 100 does not guarantee anything about the total power. That is why it is important to calculate the total power.
There actually exists a closed form equation for the total power radiated onto a circular area:
P(x) = P0 ( 1 - J0(x)^2 - J1(x)^2 ),
where P0 is the total power.
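A quick sketch to check that closed form against the numerical ratios computed above (using the same scipy.special Bessel functions):
from scipy import special as sp
for x in (3.8317, 13.33):
    print(x, 1 - sp.j0(x)**2 - sp.j1(x)**2)  # roughly 0.84 and 0.95, matching the quad results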
Note that you can also get a closed-form solution for your integral using SymPy:
import sympy as sy
sy.init_printing() # LaTeX like pretty printing in IPython
x,d = sy.symbols("x,d", real=True)
I0=100
dist=3.8317
f = I0*((2*sy.besselj(1,x)/x)**2) # the integrand
F = f.integrate((x, -d, d)) # symbolic integration
print(F.evalf(subs={d:dist})) # numeric evalution
F evaluates to:
1600*d*besselj(0, Abs(d))**2/3 + 1600*d*besselj(1, Abs(d))**2/3 - 800*besselj(1, Abs(d))**2/(3*d)
with besselj(0,r) corresponding to sp.j0(r).
There might be a singularity in the integration algorithm when evaluating the integrand at x = 0. You can exclude this point from the integration with the points argument:
f = lambda x:( I0*((2*sp.j1(x)/x)**2))
I = quad(f, -dist, dist, points = [0])
I then get the following result (is this your desired result?):
331.4990321315221
I've discovered a strange behavior when using scipy.integrate.quad. This behavior also shows up in Octave's quad function, which leads me to believe that it may have something to do with QUADPACK itself. Interestingly enough, using the exact same Octave code, this behavior does not show up in MATLAB.
On to the question. I'm numerically integrating a function of the lognormal distribution over various bounds. With F as defined below (x times the lognormal pdf), a the lower bound and b the upper bound, I find that under some conditions,
integral(F, a, b) = 0 when b is a "very large number," while
integral(F, a, b) = the correct limit when b is np.inf. (or just Inf for Octave.)
Here's some example code to show it in action:
from __future__ import division
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad
# Set up the probability space:
sigma = 0.1
mu = -0.5*(sigma**2) # To get E[X] = 1
N = 7
z = stats.lognorm(sigma, 0, np.exp(mu))
# Set up F for integration:
F = lambda x: x*z.pdf(x)
# An example that appears to work correctly:
a, b = 1.0, 10
quad(F, a, b)
# (0.5199388..., 5.0097567e-11)
# But if we push it higher, we get a value which drops to 0:
quad(F, 1.0, 1000)
# (1.54400e-11, 3.0699e-11)
# HOWEVER, if we shove np.inf in there, we get correct answer again:
quad(F, 1.0, np.inf)
# (0.5199388..., 3.00668e-09)
# If we play around we can see where it "breaks:"
quad(F, 1.0, 500) # Ok
quad(F, 1.0, 831) # Ok
quad(F, 1.0, 832) # Here we suddenly hit close to zero.
quad(F, 1.0, np.inf) # Ok again
What is going on here? Why does quad(F, 1.0, 500) evaluate to approximately the correct thing, but quad(F, 1.0, b) goes to zero for all values 832 <= b < np.inf?
While I'm not exactly familiar with QUADPACK, adaptive integration generally works by increasing resolution until the answer no longer improves. Your function is so close to 0 for most of the interval (with F(10)==9.356e-116) that the improvement is negligible for the initial grid points that quad chooses, and it decides that the integral must be close to 0. Basically, if your data hides in a very narrow subinterval of the range of integration, quad may never find it.
For integration from 0 to inf, the interval obviously cannot be subdivided into a finite number of intervals, so quad will need some preprocessing before computing the integral. For example, a change of variables like y=1/(1+x) would map the interval 0..inf to 0..1. Subdividing that interval will sample more points near zero from the original function, enabling quad to find your data.
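To illustrate (a sketch only, reusing F from the question), the same substitution can be done by hand for the failing finite range: with y = 1/(1+x) we have x = (1 - y)/y and dx = dy/y**2, so the range 1..1000 maps to 1/1001..1/2:
G = lambda y: F((1.0 - y) / y) / y**2  # transformed integrand
print(quad(G, 1.0 / 1001.0, 0.5))      # should be close to the np.inf result, ~0.5199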
Try lowering the error tolerance:
>>> quad(F, a, 1000, epsabs=1.49e-11)
(0.5199388058383727, 2.6133800952484582e-11)
I guess numerical integration is just sensitive to its configuration. You can try to debug it by calling quad(..., full_output=1) and analyzing the verbose output carefully. Sorry if this answer is not fully satisfactory.
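For example, a sketch of what that inspection might look like for the failing call (F as defined in the question):
result, abserr, info = quad(F, 1.0, 1000, full_output=1)
print(result, abserr)
print(info['neval'], info['last'])   # number of function evaluations and subintervals used
print(info['alist'][:info['last']])  # left endpoints of the subintervals that were examined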