I am trying to use scipy.integrate.quad to integrate a function over a very large range (0..10,000). The function is zero over most of its range but has a spike in a very small range (e.g. 1,602..1,618).
When integrating, I would expect the output to be positive, but I guess that somehow quad's guessing algorithm is getting confused and outputting zero. What I would like to know is, is there a way to overcome this (e.g. by using a different algorithm, some other parameter, etc.)? I don't usually know where the spike is going to be, so I can't just split the integration range and sum the parts (unless somebody has a good idea on how to do that).
Thanks!
Sample output:
>>>scipy.integrate.quad(weighted_ftag_2, 0, 10000)
(0.0, 0.0)
>>>scipy.integrate.quad(weighted_ftag_2, 0, 1602)
(0.0, 0.0)
>>>scipy.integrate.quad(weighted_ftag_2, 1602, 1618)
(3.2710994652983256, 3.6297354011338712e-014)
>>>scipy.integrate.quad(weighted_ftag_2, 1618, 10000)
(0.0, 0.0)
You might want to try other integration methods, such as the integrate.romberg() method.
Alternatively, you can get the location of the point where your function is large, with weighted_ftag_2(x_samples).argmax(), and then use some heuristics to cut the integration interval around the maximum of your function (which is located at x_samples[….argmax()]. You must taylor the list of sampled abscissas (x_samples) to your problem: it must always contain points that are in the region where your function is maximum.
More generally, any specific information about the function to be integrated can help you get a good value for its integral. I would combine a method that works well for your function (one of the many methods offered by Scipy) with a reasonable splitting of the integration interval (for instance along the lines suggested above).
How about evaluating your function f() over each integer range [x, x+1),
and adding up e.g. romb(), as EOL suggests, where it's > 0:
from __future__ import division
import numpy as np
from scipy.integrate import romb
def romb_non0( f, a=0, b=10000, nromb=2**6+1, verbose=1 ):
""" sum romb() over the [x, x+1) where f != 0 """
sum_romb = 0
for x in xrange( a, b ):
y = f( np.arange( x, x+1, 1./nromb ))
if y.any():
r = romb( y, 1./nromb )
sum_romb += r
if verbose:
print "info romb_non0: %d %.3g" % (x, r) # , y
return sum_romb
#...........................................................................
if __name__ == "__main__":
np.set_printoptions( 2, threshold=100, suppress=True ) # .2f
def f(x):
return x if (10 <= x).all() and (x <= 12).all() \
else np.zeros_like(x)
romb_non0( f, verbose=1 )
Related
I have following function which I need to minimize utilizing least square method (I am using lmfit).
y = a * exp(-x/b) + c
I have for example following data:
profitlist = [-10000, 100.00, 1000.00, 100000.00, 1000000.00]
utilitylist = [0, 0.2, 0.4, 0.6, 1]
App returns the following error:
ValueError: NaN values detected in your input data or the output of your objective/model function - fitting algorithms cannot handle this! Please read https://lmfit.github.io/lmfit-py/faq.html#i-get-errors-from-nan-in-my-fit-what-can-i-do for more information.
Problem seems to be that: exp(-x/b) returns inf or -inf if profitList contains any bigger negative number (-1000 worked, -100000 not). So it overflows probably.
The values in the profitList can be very large float numbers and they are not always the same. So how can I optimize it with these huge numbers? It seems that lmfit does not support decimal numbers which would fix the issue... What can I do to make it work?
class LeastSquares:
def __init__(self, profitList, utilityList):
self.profitList = np.asarray(profitList)
self.utilityList = np.asanyarray(utilityList)
def function(self, params, x):
a = params["a"]
b = params["b"]
c = params["c"]
return a * np.exp(-x/b) + c
def residual(self, params, x, y):
return (y - self.function(params, x))**2
def setParameters(self, a_start, b_start, c_start):
parameters = Parameters()
parameters.add(name="a", value=a_start, min=None, max=0, vary=True)
parameters.add(name="b", value=b_start, vary=True, min=0.1, max=None)
parameters.add(name="c", value=c_start, vary=True)
return parameters
def startOptimalization(self):
parameters = self.setParameters(-1, 1, 1)
result = minimize(self.residual, parameters, args=(self.profitList, self.utilityList), method="leastsq")
result.params.pretty_print()
print(fit_report(result))
print("SSE")
print(np.sum(result.residual))
As you see, numpy.exp(arg) gives Infinity for any argument greater than ~709, and you will need to avoid such extreme values. The underlying solvers simply cannot solve them. Since your argument for arg is -x/b, you need to make sure that b is not so small as to blow up the argument to numpy.exp().
In fact, your code shows that you do set a lower bound on b of 0.1.
But with values of profitlist extending to 1e7, that lower bound is too small to prevent Infinity - your lower limit on b would have to be around 14,000.
If your values for profitlist are changing for each optimization run, you may need to do something like this (in your startOptimization):
parameters = self.setParameters(-1, 1, 1)
parameters['b'].min = max(abs(self.profitList))/700.0
result = minimize(self.residual, parameters, args=(self.profitList, self.utilityList), method="leastsq")
result.params.pretty_print()
Also, when fitting exponential changes, it is often helpful to compute your exponential model function, and then take the residual as the logarithm of your data and the logarithm of your model, effectively doing the fit in log-space, as you would likely plot the data.
And, finally, don't take the square or the sum of squares of the difference yourself, just return the residual array with sign in tact. That is, you will probably be better off using something like:
def residual(self, params, x, y):
return np.log(y) - np.log(self.function(params, x))
The only example/docs I can find are on the Scipy docs page.
To test, I'm looking at a time-independent Schrod eq in a 1d infinite potential well. This has a neat analytic solution found by solving the DE, and inserting boundary conditions of ψ(0) = 0, ψ(L) = 0, and that the func soln to 1, but this question applies to solving any DE where the BCs we know aren't for the initial value.
You can solve it numerically with Scipy's solve_ivp by starting with ψ(0) = 0, and cheating to place ψ'(0) appropriately using the analytic soln. Can use shooting method to find an appropriate E value, eg the normalization condition above.
These are two sets of BCs: ψ(0) = 0 for both, normalization for both, and a second value of ψ for the analytic approach, and an initial value of ψ' for the ivp approach. Scipy's solve_bvp seems to offer a solution using the first set of BCs numerically (since we're cheating by inserting ψ'), but i can't get it working. This pseudocode describes the problem, and is how I expect the API to behave:
bcs = {0: (0, None), L: (0, None)} # Two BCs on ψ; no BCs on derivative
x_span = (0, L)
sol = solve_bvp(rhs, bcs, x_span)
In reality, the code looks something like this, and I can't get it to work:
def bc(ψ_a, ψ_b):
return np.array([ψ_a[0], ψ_b[0]])
x_span = (0, L)
x_eval = np.linspace(x_span[0], x_span[1], int(1e5))
x_guess = np.array([0, L])
ψ_guess = np.array([[0, 1], [0, -1]])
res = solve_bvp(rhs_1d, bc, x_guess, ψ_guess)
I've no idea how to build the bc function, and don't know why the guesses are set up the way they are. And unsure how I can guess for the value of ψ without also inserting a guess for ψ'. (The docs imply you can) Also of note, the docs shows an example implying you can use solve_bvp for a normalization BC as well, but not sure how to approach. (Example is too sparse)
The equivalent and working ivp code, for ref: (Compare to my solve_bvp pseudocode)
Python code:
ψ_0 = (0, sqrt(2/L) * n*π/L)
x_span = (0, L)
sol = solve_ivp(rhs_1d, x_span, ψ_0)
For the eigenvalue problem
-u''+V(x)u = c*u
with boundary conditions
u(0)=0=u(L)
and normalization
int(u(x)^2, x=0 to L)=1
set up the integral as third component. With the eigenvalue as parameter these are 4 dimensions allowing for 4 boundary conditions, the additional 2 are that the integral at 0 is zero and that the integral at L has value 1.
# some length
L = 10;
# some potential function
def V(x): return 1+(2*x-L)**2;
# the ODE function
def odesys(x,y,p):
u,v,S = y; c=p[0]
return [v, (V(x)-c)*u , u**2 ]
# the boundary conditions
def boundary(y0, yL, c):
return [ y0[0], yL[0], y0[2], yL[2]-1 ]
With the initial guess you select approximately what eigenfunction/eigenvalue you will get, more or less.
n=11;
w = (np.pi*n)/L
x_init = np.linspace(0,L,4*n+1);
u_init = np.sin(w*x_init);
v_init = np.cos(w*x_init)*w;
y_init = [ u_init, v_init, x_init/L ]
There is no need to put too many points into the guess, just enough that the structure of the first component is faithfully represented.
Then call the solver with the prepared data, take notice that the default tolerance is 1e-3, if you want better you have to allow for a finer subdivision. If everything runs fine, plot the solution.
res = solve_bvp(odesys, boundary, x_init, y_init, p=[w**2], max_nodes=10000, tol=1e-6)
print res.message
if res.success:
x_disp = np.linspace(0,L,3001)
y_disp = res.sol(x_disp)
plt.plot(x_disp, y_disp[0])
plt.title("eigenfunction to eigenvalue $\lambda=%.6f$"%res.p[0]);
plt.grid(); plt.show()
For a program, I need an algorithm to very quickly compute the volume of a solid. This shape is specified by a function that, given a point P(x,y,z), returns 1 if P is a point of the solid and 0 if P is not a point of the solid.
I have tried using numpy using the following test:
import numpy
from scipy.integrate import *
def integrand(x,y,z):
if x**2. + y**2. + z**2. <=1.:
return 1.
else:
return 0.
g=lambda x: -2.
f=lambda x: 2.
q=lambda x,y: -2.
r=lambda x,y: 2.
I=tplquad(integrand,-2.,2.,g,f,q,r)
print I
but it fails giving me the following errors:
Warning (from warnings module):
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 321
warnings.warn(msg, IntegrationWarning)
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
Warning (from warnings module):
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 321
warnings.warn(msg, IntegrationWarning)
IntegrationWarning: The algorithm does not converge. Roundoff error is detected
in the extrapolation table. It is assumed that the requested tolerance
cannot be achieved, and that the returned result (if full_output = 1) is
the best which can be obtained.
Warning (from warnings module):
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 321
warnings.warn(msg, IntegrationWarning)
IntegrationWarning: The occurrence of roundoff error is detected, which prevents
the requested tolerance from being achieved. The error may be
underestimated.
Warning (from warnings module):
File "C:\Python27\lib\site-packages\scipy\integrate\quadpack.py", line 321
warnings.warn(msg, IntegrationWarning)
IntegrationWarning: The integral is probably divergent, or slowly convergent.
So, naturally, I looked for "special-purpose integrators", but could not find any that would do what I needed.
Then, I tried writing my own integration using the Monte Carlo method and tested it with the same shape:
import random
# Monte Carlo Method
def get_volume(f,(x0,x1),(y0,y1),(z0,z1),prec=0.001,init_sample=5000):
xr=(x0,x1)
yr=(y0,y1)
zr=(z0,z1)
vdomain=(x1-x0)*(y1-y0)*(z1-z0)
def rand((p0,p1)):
return p0+random.random()*(p1-p0)
vol=0.
points=0.
s=0. # sum part of variance of f
err=0.
percent=0
while err>prec or points<init_sample:
p=(rand(xr),rand(yr),rand(zr))
rpoint=f(p)
vol+=rpoint
points+=1
s+=(rpoint-vol/points)**2
if points>1:
err=vdomain*(((1./(points-1.))*s)**0.5)/(points**0.5)
if err>0:
if int(100.*prec/err)>=percent+1:
percent=int(100.*prec/err)
print percent,'% complete\n error:',err
print int(points),'points used.'
return vdomain*vol/points
f=lambda (x,y,z): ((x**2)+(y**2)<=4.) and ((z**2)<=9.) and ((x**2)+(y**2)>=0.25)
print get_volume(f,(-2.,2.),(-2.,2.),(-2.,2.))
but this works too slowly. For this program I will be using this numerical integration about 100 times or so, and I will also be doing it on larger shapes, which will take minutes if not an hour or two at the rate it goes now, not to mention that I want a better precision than 2 decimal places.
I have tried implementing a MISER Monte Carlo method, but was having some difficulties and I'm still unsure how much faster it would be.
So, I am asking if there are any libraries that can do what I am asking, or if there are any better algorithms which work several times faster (for the same accuracy). Any suggestions are welcome, as I've been working on this for quite a while now.
EDIT:
If I cannot get this working in Python, I am open to switching to any other language that is both compilable and has relatively easy GUI functionality. Any suggestions are welcome.
As the others have already noted, finding the volume of domain that's given by a Boolean function is hard. You could used pygalmesh (a small project of mine) which sits on top of CGAL and gives you back a tetrahedral mesh. This
import numpy
import pygalmesh
import meshplex
class Custom(pygalmesh.DomainBase):
def __init__(self):
super(Custom, self).__init__()
return
def eval(self, x):
return (x[0]**2 + x[1]**2 + x[2]**2) - 1.0
def get_bounding_sphere_squared_radius(self):
return 2.0
mesh = pygalmesh.generate_mesh(Custom(), cell_size=1.0e-1)
gives you
From there on out, you can use a variety of packages for extracting the volume. One possibility: meshplex (another one out of my zoo):
import meshplex
mp = meshplex.MeshTetra(mesh.points, mesh.cells["tetra"])
print(numpy.sum(mp.cell_volumes))
gives
4.161777
which is close enough to the true value 4/3 pi = 4.18879020478.... If want more precision, decrease the cell_size in the mesh generation above.
Your function is not a continuous function, I think it's difficult to do the integration.
How about:
import numpy as np
def sphere(x,y,z):
return x**2 + y**2 + z**2 <= 1
x, y, z = np.random.uniform(-2, 2, (3, 2000000))
sphere(x, y, z).mean() * (4**3), 4/3.0*np.pi
output:
(4.1930560000000003, 4.1887902047863905)
Or VTK:
from tvtk.api import tvtk
n = 151
r = 2.0
x0, x1 = -r, r
y0, y1 = -r, r
z0, z1 = -r, r
X,Y,Z = np.mgrid[x0:x1:n*1j, y0:y1:n*1j, z0:z1:n*1j]
s = sphere(X, Y, Z)
img = tvtk.ImageData(spacing=((x1-x0)/(n-1), (y1-y0)/(n-1), (z1-z0)/(n-1)),
origin=(x0, y0, z0), dimensions=(n, n, n))
img.point_data.scalars = s.astype(float).ravel()
blur = tvtk.ImageGaussianSmooth(input=img)
blur.set_standard_deviation(1)
contours = tvtk.ContourFilter(input = blur.output)
contours.set_value(0, 0.5)
mp = tvtk.MassProperties(input = contours.output)
mp.volume, mp.surface_area
output:
4.186006622559839, 12.621690438955586
It seems to be very difficult without giving a little hint on the boundary:
import numpy as np
from scipy.integrate import *
def integrand(z,y,x):
return 1. if x**2 + y**2 + z**2 <= 1. else 0.
g=lambda x: -2
h=lambda x: 2
q=lambda x,y: -np.sqrt(max(0, 1-x**2-y**2))
r=lambda x,y: np.sqrt(max(0, 1-x**2-y**2))
I=tplquad(integrand,-2.,2.,g,h,q,r)
print I
I want to integrate the product of two time- and frequency-shifted Hermite functions using scipy.integrate.quad.
However, since large order-polynomials are included, there are numerical errors occuring. Here's my Code:
import numpy as np
import scipy.integrate
import scipy.special as sp
from math import pi
def makeFuncs():
# Create the 0th, 4th, 8th, 12th and 16th order hermite function
return [lambda t, n=n: np.exp(-0.5*t**2)*sp.hermite(n)(t) for n in np.arange(5)*4]
def ambgfun(funcs, i, k, tau, f):
# Integrate f1(t)*f2(t+tau)*exp(-j2pift) over t from -inf to inf
f1 = funcs[i]
f2 = funcs[k]
func = lambda t: np.real(f1(t) * f2(t+tau) * np.exp(-1j*(2*pi)*f*t))
return scipy.integrate.quad(func, -np.inf, np.inf)
def main():
f = makeFuncs()
print "A00(0,0):", ambgfun(f, 0, 0, 0, 0)
print "A01(0,0):", ambgfun(f, 0, 1, 0, 0)
print "A34(0,0):", ambgfun(f, 3, 4, 0, 0)
if __name__ == '__main__':
main()
The hermite functions are orthogonal, thus all integrals should be equal to zero. However, they are not, as the output shows:
A00(0,0): (1.7724538509055159, 1.4202636805184462e-08)
A01(0,0): (8.465450562766819e-16, 8.862237123626351e-09)
A34(0,0): (-10.1875, 26.317246925873935)
How can I make this calculation more accurate? The hermite-function from scipy contain a weights variable which should be used for Gaussian Quadrature, as given in the documentation (http://docs.scipy.org/doc/scipy/reference/special.html#orthogonal-polynomials). However, I have not found a hint in the docs how to use these weights.
I hope you can help :)
Thanks, Max
The answer is that the result you get is numerically as close to zero as it gets. I don' think it's really possible to get much better results if you work with floating point numbers --- you are facing a general problem in numerical integration.
Consider this:
import numpy as np
from scipy import integrate, special
f = lambda t: np.exp(-t**2) * special.eval_hermite(12, t) * special.eval_hermite(16, t)
abs_ig, abs_err = integrate.quad(lambda t: abs(f(t)), -np.inf, np.inf)
ig, err = integrate.quad(f, -np.inf, np.inf)
print ig
# -10.203125
print abs_ig
# 2.22488114805e+15
print ig / abs_ig, err / abs_ig
# -4.58591912155e-15 1.18053770382e-14
The value of the integrand has therefore been computed to an accuracy comparable to the floating point epsilon. Because of the rounding error in subtracting values of a large-magnitude oscillating integrand, it's not really possible to get better results.
So how to proceed? In my experience, what you'd need to do now is to approach the problem not numerically, but analytically. Importantly, the Fourier transform of Hermite polynomials times the weight function is known, so you can work in the Fourier space all the time here.
I'm using right now the scipy.integrate.quad to successfully integrate some real integrands. Now a situation appeared that I need to integrate a complex integrand. quad seems not be able to do it, as the other scipy.integrate routines, so I ask: is there any way to integrate a complex integrand using scipy.integrate, without having to separate the integral in the real and the imaginary parts?
What's wrong with just separating it out into real and imaginary parts? scipy.integrate.quad requires the integrated function return floats (aka real numbers) for the algorithm it uses.
import scipy
from scipy.integrate import quad
def complex_quadrature(func, a, b, **kwargs):
def real_func(x):
return scipy.real(func(x))
def imag_func(x):
return scipy.imag(func(x))
real_integral = quad(real_func, a, b, **kwargs)
imag_integral = quad(imag_func, a, b, **kwargs)
return (real_integral[0] + 1j*imag_integral[0], real_integral[1:], imag_integral[1:])
E.g.,
>>> complex_quadrature(lambda x: (scipy.exp(1j*x)), 0,scipy.pi/2)
((0.99999999999999989+0.99999999999999989j),
(1.1102230246251564e-14,),
(1.1102230246251564e-14,))
which is what you expect to rounding error - integral of exp(i x) from 0, pi/2 is (1/i)(e^i pi/2 - e^0) = -i(i - 1) = 1 + i ~ (0.99999999999999989+0.99999999999999989j).
And for the record in case it isn't 100% clear to everyone, integration is a linear functional, meaning that ∫ { f(x) + k g(x) } dx = ∫ f(x) dx + k ∫ g(x) dx (where k is a constant with respect to x). Or for our specific case ∫ z(x) dx = ∫ Re z(x) dx + i ∫ Im z(x) dx as z(x) = Re z(x) + i Im z(x).
If you are trying to do a integration over a path in the complex plane (other than along the real axis) or region in the complex plane, you'll need a more sophisticated algorithm.
Note: Scipy.integrate will not directly handle complex integration. Why? It does the heavy lifting in the FORTRAN QUADPACK library, specifically in qagse.f which explicitly requires the functions/variables to be real before doing its "global adaptive quadrature based on 21-point Gauss–Kronrod quadrature within each subinterval, with acceleration by Peter Wynn's epsilon algorithm." So unless you want to try and modify the underlying FORTRAN to get it to handle complex numbers, compile it into a new library, you aren't going to get it to work.
If you really want to do the Gauss-Kronrod method with complex numbers in exactly one integration, look at wikipedias page and implement directly as done below (using 15-pt, 7-pt rule). Note, I memoize'd function to repeat common calls to the common variables (assuming function calls are slow as if the function is very complex). Also only did 7-pt and 15-pt rule, since I didn't feel like calculating the nodes/weights myself and those were the ones listed on wikipedia, but getting reasonable errors for test cases (~1e-14)
import scipy
from scipy import array
def quad_routine(func, a, b, x_list, w_list):
c_1 = (b-a)/2.0
c_2 = (b+a)/2.0
eval_points = map(lambda x: c_1*x+c_2, x_list)
func_evals = map(func, eval_points)
return c_1 * sum(array(func_evals) * array(w_list))
def quad_gauss_7(func, a, b):
x_gauss = [-0.949107912342759, -0.741531185599394, -0.405845151377397, 0, 0.405845151377397, 0.741531185599394, 0.949107912342759]
w_gauss = array([0.129484966168870, 0.279705391489277, 0.381830050505119, 0.417959183673469, 0.381830050505119, 0.279705391489277,0.129484966168870])
return quad_routine(func,a,b,x_gauss, w_gauss)
def quad_kronrod_15(func, a, b):
x_kr = [-0.991455371120813,-0.949107912342759, -0.864864423359769, -0.741531185599394, -0.586087235467691,-0.405845151377397, -0.207784955007898, 0.0, 0.207784955007898,0.405845151377397, 0.586087235467691, 0.741531185599394, 0.864864423359769, 0.949107912342759, 0.991455371120813]
w_kr = [0.022935322010529, 0.063092092629979, 0.104790010322250, 0.140653259715525, 0.169004726639267, 0.190350578064785, 0.204432940075298, 0.209482141084728, 0.204432940075298, 0.190350578064785, 0.169004726639267, 0.140653259715525, 0.104790010322250, 0.063092092629979, 0.022935322010529]
return quad_routine(func,a,b,x_kr, w_kr)
class Memoize(object):
def __init__(self, func):
self.func = func
self.eval_points = {}
def __call__(self, *args):
if args not in self.eval_points:
self.eval_points[args] = self.func(*args)
return self.eval_points[args]
def quad(func,a,b):
''' Output is the 15 point estimate; and the estimated error '''
func = Memoize(func) # Memoize function to skip repeated function calls.
g7 = quad_gauss_7(func,a,b)
k15 = quad_kronrod_15(func,a,b)
# I don't have much faith in this error estimate taken from wikipedia
# without incorporating how it should scale with changing limits
return [k15, (200*scipy.absolute(g7-k15))**1.5]
Test case:
>>> quad(lambda x: scipy.exp(1j*x), 0,scipy.pi/2.0)
[(0.99999999999999711+0.99999999999999689j), 9.6120083407040365e-19]
I don't trust the error estimate -- I took something from wiki for recommended error estimate when integrating from [-1 to 1] and the values don't seem reasonable to me. E.g., the error above compared with truth is ~5e-15 not ~1e-19. I'm sure if someone consulted num recipes, you could get a more accurate estimate. (Probably have to multiple by (a-b)/2 to some power or something similar).
Recall, the python version is less accurate than just calling scipy's QUADPACK based integration twice. (You could improve upon it if desired).
I realize I'm late to the party, but perhaps quadpy (a project of mine) can help. This
import quadpy
import numpy
val, err = quadpy.quad(lambda x: numpy.exp(1j * x), 0, 1)
print(val)
correctly gives
(0.8414709848078964+0.4596976941318605j)