How to improve this adaptive trapezoidal rule? (Python)

I have attempted an exercise from Computational Physics by Newman and written the following code for an adaptive trapezoidal rule. When the error estimate of a slice is larger than the permitted value, the code divides that slice into two halves. I am just wondering what else I can do to make the algorithm more efficient.
points = []

def trap_adapt(f, a, b, epsilon=1.0e-8):
    def step(x1, x2, f1, f2):
        xm = (x1 + x2) / 2.0
        fm = f(xm)
        h1 = x2 - x1
        h2 = h1 / 2.0
        I1 = (f1 + f2) * h1 / 2.0           # trapezoid over the whole slice
        I2 = (f1 + 2 * fm + f2) * h2 / 2.0  # trapezoid over the two halves
        error = abs((I2 - I1) / 3.0)  # leading term in the error expression
        if error <= h2 * delta:
            # record accepted midpoints to check that more points land in rapidly varying regions
            points.append(xm)
            return h2 / 3 * (f1 + 4 * fm + f2)
        else:
            return step(x1, xm, f1, fm) + step(xm, x2, fm, f2)

    delta = epsilon / (b - a)
    fa, fb = f(a), f(b)
    return step(a, b, fa, fb)
Besides, I used a few simple formulas to compare this to Romberg integration, and found that for the same accuracy this adaptive method uses many more points to calculate the integral.
Is this just an inherent limitation of the method? Are there any advantages to using this adaptive algorithm instead of the Romberg method? Any ways to make it faster and more accurate?

Your code is refining to meet an error tolerance in each individual subinterval. It's also using a low-order integration rule. Improvements in both of these can significantly reduce the number of function evaluations.
Rather than considering the error in each subinterval separately, more advanced codes compute the total error over all the subintervals and refine until the total error is below the desired threshold. Subintervals are chosen for refinement according to their contribution to the total error, with larger errors being refined first. Typically a priority queue is used to quickly choose the subinterval to refine, as in the sketch below.
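Here is a minimal sketch of that globally adaptive strategy (the function names are my own, not from any particular library). It pairs a trapezoid estimate with a Simpson estimate to get a per-interval error, and a heap always surfaces the interval contributing most to the total error:
import heapq

def quad_global(f, a, b, epsilon=1.0e-8, max_iter=10000):
    def estimates(x1, x2):
        # value via Simpson's rule, error via the (Simpson - trapezoid) difference
        xm = 0.5 * (x1 + x2)
        h = x2 - x1
        coarse = h / 2.0 * (f(x1) + f(x2))              # trapezoid
        fine = h / 6.0 * (f(x1) + 4.0 * f(xm) + f(x2))  # Simpson
        return fine, abs(fine - coarse)

    value, err = estimates(a, b)
    heap = [(-err, a, b, value)]  # negate the error to simulate a max-heap
    total_err = err
    for _ in range(max_iter):
        if total_err <= epsilon:
            break
        neg_err, x1, x2, _old = heapq.heappop(heap)
        total_err += neg_err  # subtract this interval's error from the total
        xm = 0.5 * (x1 + x2)
        for lo, hi in ((x1, xm), (xm, x2)):
            v, e = estimates(lo, hi)
            heapq.heappush(heap, (-e, lo, hi, v))
            total_err += e
    return sum(item[3] for item in heap)
A real implementation would also cache the endpoint evaluations instead of recomputing them for each half, but this shows the refine-the-worst-interval-first idea.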
Higher-order integration rules can integrate more complicated functions exactly. For example, your code is based on Simpson's rule, which is exact for polynomials of degree up to 3. A more advanced code will probably use a rule that's exact for polynomials of much higher degree (say 10-15).
From a practical point of view, the simplest thing is to use a canned routine that implements the above ideas, e.g., scipy.integrate.quad. Unless you have particular knowledge of what you want to integrate, you're unlikely to do better.
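For instance (the integrand here is just an example of mine, not from the question):
import numpy as np
from scipy.integrate import quad

# quad returns the integral value together with an error estimate
result, error_estimate = quad(lambda x: np.exp(-x**2), 0.0, 2.0, epsabs=1e-8)
print(result, error_estimate)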
Romberg integration requires evaluation at equally-spaced points. If you can evaluate the function at any point, then other methods are generally more accurate for "smooth" (polynomial-like) functions. And if your function is not smooth everywhere, then an adaptive code will do much better because it can focus on beating down the error in the non-smooth regions.

Related

How does automatic differentiation with respect to the input work?

I've been trying to understand how automatic differentiation (autodiff) works. There are several implementations of this that can be found in Tensorflow, PyTorch and other programs.
There are three aspects of automatic differentiation that currently seem vague to me.
The exact process used to calculate the gradients
How autodiff works with respect to inputs
How autodiff works with respect to a singular value as input
So far, it seems to roughly follow the following steps:
Break up the original function into elementary operations (individual arithmetic operations, composition, and function calls).
The elementary operations are combined to form a computational graph in such a way that the original function can be calculated using the computational graph.
The computational graph is executed for a certain input, and each operation is recorded.
Walking through the recorded operations in reverse using the chain rule gives us the gradient.
First of all, is this a correct overview of the steps that are taken in automatic differentiation?
Secondly, how would the above process work for a derivative with respect to the inputs? For instance, computing a derivative would seem to require a difference in the x value. Does that mean the derivative can only be calculated after at least two different x values have been provided as input? Or does it require multiple inputs at once (i.e. a vector input) over which it can calculate a difference? And how does this compare to calculating the gradient with respect to the model weights (i.e. as done in backpropagation)?
Thirdly, how can we take the derivative of a singular value? Take, for instance, the following Python code, where the derivative of y = x**2 at x = 3 is calculated:
x = tf.constant(3.0)
with tf.GradientTape() as tape:
  tape.watch(x)
  y = x**2
# dy = 2x * dx
dy_dx = tape.gradient(y, x)
print(dy_dx.numpy()) # prints: '6.0'
Since dx is the difference between several x inputs, would that not mean that dx = 0?
I found that this paper has a pretty good overview of the various modes of autodiff, as well as the differences compared to numerical and symbolic differentiation. However, it did not bring a full understanding, and I would still like to understand the autodiff process in the context of these traditional differentiation techniques.
Rather than applying it practically, I would love to get a more theoretical understanding.
I had similar questions in my mind a few weeks ago, until I started to code my own automatic differentiation package tensortrax in Python. It uses forward-mode AD with a hyper-dual number approach. I wrote a README (the landing page of the repository, section Theory) with an example which may be of interest to you.
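To give the flavor of the idea, here is my own toy sketch (not the tensortrax API): a dual number carries a value together with its derivative through every elementary operation, so no finite difference is ever taken.
class Dual:
    """Toy dual number: value and derivative propagated together."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def _coerce(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._coerce(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = self._coerce(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __radd__, __rmul__ = __add__, __mul__

x = Dual(3.0, 1.0)       # seed with dx/dx = 1
y = x * x                # y = x**2
print(y.value, y.deriv)  # 9.0 6.0 -- the derivative at a single point, no dx needed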
I think what you need to understand first is what a derivative is; many math textbooks can help you with that. The notation dx means an infinitesimal variation, so you do not actually compute any difference. Instead, you perform a symbolic operation on your function f that transforms it into a function f', also written df/dx, which you can then evaluate at any point where it is defined.
Regarding the algorithm used for automatic differentiation, you understood it correctly. The part that you seem to be missing is how the derivatives of elementary operations are computed and what they mean, but it would be hard to give a crash course on that in a Stack Overflow answer.
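As a tiny hand-rolled illustration (my own example, not any library's internals) of the chain rule being applied to recorded elementary operations:
import math

def backward_tape_example(x):
    # forward pass: record the intermediates of f(x) = sin(x) * x
    a = math.sin(x)  # local rule: da/dx = cos(x)
    y = a * x        # local rules: dy/da = x, and a direct dy/dx contribution of a
    # reverse pass: walk the recorded operations backwards with the chain rule
    dy_da = x
    dy_dx = dy_da * math.cos(x) + a  # accumulate both paths from y back to x
    return y, dy_dx

print(backward_tape_example(1.0))  # f(1) = sin(1), f'(1) = cos(1) + sin(1)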

Checkgradient without solving optimization problem in MATLAB

I have a relatively complicated function and I have calculated the analytical form of the Jacobian of this function. However, sometimes, I mess up this Jacobian.
MATLAB has a nice way to check for the accuracy of the Jacobian when using some optimization technique as described here.
The problem, though, is that it looks like MATLAB solves the optimization problem and then reports whether the Jacobian was correct. This is extremely time consuming, especially considering that some of my optimization problems take hours or even days to compute.
Python has a somewhat similar function in scipy as described here which just compares the analytical gradient with a finite difference approximation of the gradient for some user provided input.
Is there anything I can do to check the accuracy of the Jacobian in MATLAB without having to solve the entire optimization problem?
A laborious but useful method I've used for this sort of thing is to check that the (numerical) integral of the purported derivative equals the difference of the function at the endpoints. I have found this more convenient than comparing fractions like (f(x+h)-f(x))/h with f'(x), because of the difficulty of choosing h: on the one hand, h must not be so small that the fraction is dominated by rounding error; on the other, h must be small enough that the fraction is close to f'(x).
In the case of a function F of a single variable, the assumption is that you have code f to evaluate F and code fd, say, to evaluate F'. The test is then, for various intervals [a,b], to look at the difference, which the fundamental theorem of calculus says should be 0:
Integral{ a<=x<=b | fd(x) } - (f(b) - f(a))
with the integral being computed numerically. There is no need for the intervals to be small.
Part of the error will, of course, be due to the error in the numerical approximation of the integral. For this reason I tend to use, for example, an order-40 Gauss-Legendre integrator.
For functions of several variables, you can test one variable at a time. For several functions, these can be tested one at a time.
I've found that these tests, which are of course by no means exhaustive, show up the kinds of mistakes that occur in computing derivatives quite readily.
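A sketch of this test in Python/NumPy, since the thread already touches on scipy (the same idea ports directly to MATLAB); f and fd are assumed to be your function and its purported derivative:
import numpy as np

def check_derivative(f, fd, a, b, order=40):
    # order-40 Gauss-Legendre nodes/weights on [-1, 1], mapped onto [a, b]
    nodes, weights = np.polynomial.legendre.leggauss(order)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    integral = 0.5 * (b - a) * np.sum(weights * fd(x))
    # fundamental theorem of calculus: this residual should be ~0 if fd is correct
    return integral - (f(b) - f(a))

print(check_derivative(np.sin, np.cos, 0.0, 2.0))  # ~1e-16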
Have you considered using complex-step differentiation to check your gradient? See this description.
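For reference, a minimal sketch of the complex-step check (it assumes your code for f accepts complex input); because there is no subtractive cancellation, h can be made extremely small:
import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    # f'(x) ~= Im(f(x + i*h)) / h, with no difference of nearly equal numbers
    return np.imag(f(x + 1j * h)) / h

print(complex_step_derivative(np.sin, 1.0))  # ~cos(1) = 0.5403...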

How to effectively solve a compound cost function optimisation problem?

I want to solve the following optimization problem with Python:
I have a black box function f with multiple variables as input.
The execution of the black box function is quite time consuming, therefore I would like to avoid a brute force approach.
I would like to find the optimum input parameters for that black box function f.
In the following, for simplicity I just write the dependency for one dimension x.
An optimum parameter x is defined as the one that maximizes the cost function cost(x), which is the weighted sum of
the value f(x), and
the maximum standard deviation of f(x):
cost(x) = A * f(x) + B * max(standardDeviation(f(x)))
The parameters A and B are fix.
E.g., for the picture below, the value of x at the position 'U' would be preferred over the value of x at the position 'V'.
My question is:
Is there any easily adaptable framework or process that I could utilize (similar to, e.g., simulated annealing or Bayesian optimization)?
As mentioned, I would like to avoid a brute force approach.
I’m still not 100% sure of your approach, but does this formula ring true to you:
A * max(f(x)) + B * max(standardDeviation(f(x)))
?
If it does, then I guess you may want to consider that maximizing f(x) may (or may not) be compatible with maximizing the standard deviation of f(x), which means you may be facing a multi-objective optimization problem.
Again, you haven't specified what f(x) returns: is it a vector? I hope it is, otherwise I'm unclear on what you can calculate the standard deviation of.
The picture you posted is not so obvious to me. f(x) is the entire black curve; it has a maximum at the point V, but what can you say about the standard deviation? To calculate the standard deviation you have to take into account the entire f(x) curve (including the point U), not just the neighborhoods of U and V. If you only want the standard deviation in an interval around a maximum of f(x), then I think you're out of luck when it comes to frameworks. The best thing that comes to my mind is to use a local (or, better, global) optimization algorithm to hunt for the maximum of f(x) - simulated annealing, differential evolution, tunnelling, and so on - and then, once you have found a maximum of f(x), sample a few points to the left and right of your optimum and calculate the standard deviation of these evaluations. Then you'll have to decide whether the combination of the maximum of f(x) and this standard deviation is good enough compared to any previous "optimal" point found; see the sketch below.
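A rough sketch of this two-stage idea (all names, the weights, and the stand-in f are mine; in practice f is your expensive black box):
import numpy as np
from scipy.optimize import differential_evolution

def local_std(f, x, radius=0.1, n=7):
    # estimate the spread of f in a small neighborhood of x
    samples = [f(x + dx) for dx in np.linspace(-radius, radius, n)]
    return np.std(samples)

A, B = 1.0, 1.0
f = lambda x: np.exp(-(x - 2.0) ** 2)  # stand-in for the black box

# stage 1: global search for a maximum of f (minimize -f)
res = differential_evolution(lambda x: -f(x[0]), bounds=[(0.0, 5.0)])
x_opt = res.x[0]

# stage 2: combine the maximum with the local standard deviation
cost = A * f(x_opt) + B * local_std(f, x_opt)
print(x_opt, cost)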
This is all speculation, as I’m unsure that your problem is really an optimization one or simply a “peak finding” exercise, for which there are many different - and more powerful and adequate- methods.
Andrea.

How does scipy.integrate.quad know when to stop?

I have a piece of code that uses scipy.integrate.quad. The limits of integration are minus infinity to infinity. It runs OK, but I would like it to be faster.
The nature of the problem is that the function being integrated is the product of three functions: (1) one that is narrow (between zero and some small value), (2) one that is wide (between, say, 200,000 and 500,000), and (3) one that falls off as 1/abs(x).
I only need accuracy to .1%, if that.
I could do a lot of work and determine finite integration limits so that no excess computation gets done; outside the regions of functions 1 and 2 they are both zero, so the 1/x factor doesn't even come into play there. But that would involve a fair amount of error-prone calculation.
How does this function know how to optimize, and is it pretty good at it, with infinite bounds?
Can I tune it through passing in guidance (like error tolerance)?
Or, would it be worthwhile to try to give it limited integration bounds?
quad uses different algorithms for finite and infinite intervals, but the general idea is the same: the integral is computed using two related methods (for example, 7-point Gauss rule and 15-point Kronrod rule), and the difference between those results provides an estimate for how accurate they are. If the accuracy is low, the interval is bisected and the process repeats for subintervals. A detailed explanation is beyond the scope of a Stack Overflow answer; numerical integration is complicated.
For large or infinite integration bounds, the accuracy and efficiency depend on the algorithm being able to locate the main features of the function. Passing the bounds as -np.inf, np.inf is risky. For example,
quad(lambda x: np.exp(-(x-20)**2), -np.inf, np.inf)
returns a wrong result (essentially zero instead of 1.77) because it does not notice the bump of the Gaussian function near 20.
On the other hand, arbitrarily imposing a finite interval is questionable in that you give up any control over the error (there is no estimate of what was contained in the infinite tails that you cut off). I suggest the following:
Split the integral into three: (-np.inf, A), (A, B), and (B, np.inf) where, say, A is -1e6 and B is 1e6.
For the integral over (A, B), provide the points parameter, which locates the features (the "narrow parts") of the function. For example,
quad(lambda x: np.exp(-(x-20)**2), -1e6, 1e6, points=[10, 30])
returns 1.77 as it should.
Adjust epsabs (absolute error) and epsrel (relative error) to within desired accuracy, if you find that the default accuracy is too demanding.
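Putting these suggestions together in one hedged sketch (the integrand, A, B, and the points values are placeholders of mine; substitute your own feature locations and tolerances):
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-(x - 20) ** 2)  # stand-in integrand
A, B = -1e6, 1e6

tail_lo, _ = quad(f, -np.inf, A)                          # left infinite tail
middle, _ = quad(f, A, B, points=[10, 30], epsrel=1e-3)   # 0.1% accuracy suffices here
tail_hi, _ = quad(f, B, np.inf)                           # right infinite tail

print(tail_lo + middle + tail_hi)  # ~1.7725, i.e. sqrt(pi)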

Least Squares: Python

I am trying to implement least squares:
I have: $y = \theta \omega$
The least squares solution is $\omega = (\theta^{T}\theta)^{-1}\theta^{T}y$
I tried:
import numpy as np

def least_squares1(y, tx):
    """Calculate the least squares solution."""
    w = np.dot(np.linalg.inv(np.dot(tx.T, tx)), np.dot(tx.T, y))
    return w
The problem is that this method quickly becomes unstable (for small problems it is okay).
I realized this when I compared the result to this least squares calculation:
import numpy as np

def least_squares2(y, tx):
    """Calculate the least squares solution."""
    a = tx.T.dot(tx)
    b = tx.T.dot(y)
    return np.linalg.solve(a, b)
Compare both methods:
I tried to fit data with a polynomial of degree 12: [1, x, x^2, x^3, x^4, ..., x^12].
First method: [plot omitted]
Second method: [plot omitted]
Do you know why the first method diverges for large polynomials ?
P.S. I only added "import numpy as np" for your convenience, in case you want to test the functions.
There are three points here:
One is that it is generally better (faster, more accurate) to solve linear equations rather than to compute inverses.
The second is that it is always a good idea to use what you know about a system of equations (e.g. that the coefficient matrix is positive definite) when computing a solution; in this case you should use numpy.linalg.lstsq.
The third is more specific to polynomials. When using monomials as a basis, you can end up with a very poorly conditioned coefficient matrix, which means that numerical errors tend to be large. This is because, for example, the vectors x -> pow(x,11) and x -> pow(x,12) are very nearly parallel. You would get a more accurate fit, and be able to use higher degrees, if you were to use a basis of orthogonal polynomials, for example Chebyshev polynomials (https://en.wikipedia.org/wiki/Chebyshev_polynomials) or Legendre polynomials (https://en.wikipedia.org/wiki/Legendre_polynomials).
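A short sketch contrasting these points (the data here is my own toy example): np.linalg.lstsq solves the system via an orthogonal factorization instead of forming the inverse, and a Chebyshev basis keeps the design matrix far better conditioned at the same degree.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = np.cos(4 * x) + 0.01 * rng.standard_normal(x.size)

# monomial basis [1, x, ..., x^12], solved stably with lstsq (no explicit inverse)
X = np.vander(x, 13, increasing=True)
w_mono, *_ = np.linalg.lstsq(X, y, rcond=None)

# Chebyshev basis at the same degree: much better conditioned
cheb_coeffs = np.polynomial.chebyshev.chebfit(x, y, 12)

print(np.linalg.cond(X))  # large for the monomial design matrix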
I am going to build on what was said before; I answered a similar question yesterday.
The problem with high-order polynomials is something called Runge's phenomenon. The reason the previous answer resorts to orthogonal polynomials (Chebyshev, Legendre, or also Hermite polynomials) is to suppress this adverse oscillatory effect, much as one fights the Gibbs phenomenon that appears when Fourier series methods are applied to non-periodic signals.
You can sometimes improve the conditioning by resorting to regularization methods if the matrix is low rank, as I did in the other post. Other issues may be due to smoothness properties of the vector.
