I am trying to find a root of an equation using Newton-Raphson provided by SciPy (scipy.optimize.newton).
At the moment I don't have the fprime values that the documentation advises to use, and as far as I am aware this means that the Secant method is being used to find the roots.
Since the Newton-Raphson method has faster convergence than the Secant method, my gut thinks that maybe I should numerically approximate fprime and provide it so that Newton's method is used.
Which one would generally lead to faster convergence / faster actual computing of my roots?
Just using scipy.optimize.newton without providing fprime (i.e. the secant method), or
Using numerical differentiation to compute fprime (e.g. with numpy.diff) and providing it to scipy.optimize.newton so that the Newton-Raphson method is used. (Both calls are sketched below.)
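For concreteness, both variants would look roughly like this (a minimal sketch; the toy function f(x) = x**3 - 2 and the starting point are only for illustration):

from scipy.optimize import newton

def f(x):
    return x**3 - 2.0

def fprime(x):
    return 3.0 * x**2

# Option 1: no fprime supplied, so scipy.optimize.newton falls back to the secant method.
root_secant = newton(f, x0=1.5)

# Option 2: a derivative is supplied (analytic here; a finite-difference callable would
# also work), so the classic Newton-Raphson iteration is used.
root_newton = newton(f, x0=1.5, fprime=fprime)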
The book Numerical Recipes in C, 2nd edition, in section "9.4 Newton-Raphson Method Using Derivative" on page 365, states:
The Newton-Raphson formula can also be applied using a numerical
difference to approximate the true local derivative,
f'(x) ≈ (f(x + dx) - f(x)) / dx .
This is not, however, a recommended procedure for the following
reasons: (i) You are doing two function evaluations per step, so at
best the superlinear order of convergence will be only sqrt(2). (ii)
If you take dx too small you will be wiped out by roundoff, while if
you take it too large your order of convergence will be only linear,
no better than using the initial evaluation f'(x_0) for all
subsequent steps. Therefore, Newton-Raphson with numerical derivatives
is (in one dimension) always dominated by the secant method of section
9.2.
(That was edited to fit the limitations of this site.) Using a higher-order finite-difference formula to improve the accuracy of the numeric derivative would require even more function evaluations per step and would thus decrease the order of convergence further. Therefore you should go with your first option, which ends up using the secant method to find the root.
I've written a Python script to solve the Time Difference of Arrival (TDoA) angular reconstruction problem in 3-dimensions. To do so, I'm using SciPy's scipy.optimize.root root finding algorithm to solve a system of nonlinear equations. I find that the Levenberg-Marquardt method is the only supported method capable of reliably producing accurate results (most others simply fail).
I'd like to assess the uncertainty in the resulting solution. For most methods (including the default hybr method), SciPy returns the inverse Hessian of the objective function (i.e. the covariance matrix), from which one may begin to calculate the uncertainty(ies) in the found roots. Unfortunately this is not the case for the Levenberg-Marquardt method (which I'm admittedly much less familiar with on a mathematical level than the other methods... it just seems to work).
How (in general) can I estimate the uncertainties in the solution returned by scipy.optimize.root when using the lm method?
I have a relatively complicated function and I have calculated the analytical form of the Jacobian of this function. However, sometimes, I mess up this Jacobian.
MATLAB has a nice way to check the accuracy of the Jacobian when using some optimization technique, as described here.
The problem though is that it looks like MATLAB solves the optimization problem and then returns if the Jacobian was correct or not. This is extremely time consuming, especially considering that some of my optimization problems take hours or even days to compute.
Python has a somewhat similar function in SciPy, as described here, which just compares the analytical gradient with a finite-difference approximation of the gradient for some user-provided input.
Is there anything I can do to check the accuracy of the Jacobian in MATLAB without having to solve the entire optimization problem?
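For reference, the SciPy check mentioned above (scipy.optimize.check_grad) can be run on its own, without solving any optimization problem; a minimal sketch with a made-up objective and gradient:

import numpy as np
from scipy.optimize import check_grad

def func(x):
    # Toy objective: sum of squares.
    return np.sum(x**2)

def grad(x):
    # Analytical gradient to be verified.
    return 2.0 * x

x0 = np.array([1.0, -2.0, 3.0])
# Returns the 2-norm of the difference between the analytical gradient and a
# finite-difference approximation at x0; it should be close to zero if grad is correct.
print(check_grad(func, grad, x0))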
A laborious but useful method I've used for this sort of thing is to check that the (numerical) integral of the purported derivative equals the difference of the function values at the end points. I have found this more convenient than comparing fractions like (f(x+h)-f(x))/h with f'(x), because of the difficulty of choosing h: on the one hand, h must not be so small that the fraction is dominated by rounding error, and on the other hand, h must be small enough that the fraction is close to f'(x).
In the case of a function F of a single variable, the assumption is that you have code f to evaluate F and, say, fd to evaluate F'. Then the test is, for various intervals [a,b], to look at the difference, which the fundamental theorem of calculus says should be 0,
Integral{ a<=x<=b | fd(x) } - (f(b) - f(a))
with the integral being computed numerically. There is no need for the intervals to be small.
Part of the error will, of course, be due to the error in the numerical approximation to the integral. For this reason I tend to use, for example, an order-40 Gauss-Legendre integrator.
For functions of several variables, you can test one variable at a time. For several functions, these can be tested one at a time.
I've found that these tests, which are of course by no means exhaustive, show up the kinds of mistakes that occur in computing derivatives quite readily.
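A minimal sketch of this integral test for a single-variable function, assuming f and its purported derivative fd are vectorized Python callables (the example function is made up):

import numpy as np
from scipy.integrate import fixed_quad

def f(x):
    # Toy function, for illustration only.
    return np.sin(x) * np.exp(-x)

def fd(x):
    # Purported derivative of f, to be checked.
    return (np.cos(x) - np.sin(x)) * np.exp(-x)

a, b = 0.3, 2.7   # any interval; it need not be small
# Order-40 Gauss-Legendre quadrature of the purported derivative over [a, b].
integral, _ = fixed_quad(fd, a, b, n=40)
# By the fundamental theorem of calculus this residual should be ~0 (up to quadrature error).
print(integral - (f(b) - f(a)))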
Have you considered using complex-step differentiation to check your gradient? See this description.
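In case it helps, the complex-step estimate f'(x) ≈ Im f(x + ih) / h avoids the subtractive cancellation of ordinary finite differences, so h can be taken extremely small; a minimal sketch (the test function is made up and must be written with operations that accept complex arguments):

import numpy as np

def f(x):
    # Any function whose operations accept complex input.
    return np.exp(x) / np.sqrt(x**2 + 1.0)

def complex_step_derivative(f, x, h=1e-20):
    # No subtraction of nearly equal numbers, so roundoff is not an issue.
    return np.imag(f(x + 1j * h)) / h

print(complex_step_derivative(f, 1.5))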
I want to find the minimum of a function in python y = f(x)
Problem: the solver tries to compute the gradient with very close x values (delta x around 1e-8), and my function f is not sensitive to such a small step (i.e. we only see y vary when delta x is around 1e-1).
Hence the gradient is 0 as far as the solver is concerned, and it cannot find the proper solution.
I've tried the following solvers from scipy, but I can't find the option I'm looking for:
scipy.optimize.minimize
scipy.optimize.fmin
In MATLAB's fmincon, there is an option that does the job: 'DiffMinChange', the minimum change in variables for finite-difference gradients (a positive scalar).
You may want to try and use L-BFGS-B from scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
And set the “epsilon” parameter to around 0.1 or 0.05 and see if that does better. I am of course assuming that you will let the solver compute the gradient for you by numerical differentiation (i.e., you pass fprime=None and approx_grad=True to the routine).
I personally despise the “minimize” interface to various solvers so I prefer to deal with the actual solvers themselves.
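Roughly, that call might look like the following sketch (the objective and starting point are placeholders for your own insensitive function; epsilon is the finite-difference step suggested above):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    # Stand-in objective; replace with your own function.
    return np.sum((x - 3.0)**2)

x0 = np.zeros(2)

x_opt, f_opt, info = fmin_l_bfgs_b(
    f, x0,
    fprime=None,       # no analytic gradient supplied
    approx_grad=True,  # the solver builds the gradient by finite differences
    epsilon=0.05,      # finite-difference step large enough for f to respond
)
print(x_opt, f_opt)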
I'm trying to solve a first-order ODE in Python, dGamma/dt = u(t) Gamma(t),
where Gamma and u are square matrices.
I don't explicitly know u(t) at all times, but I do know it at discrete timesteps from doing an earlier calculation.
Every example of Python's solvers that I found online (e.g. this one for scipy.integrate.odeint and scipy.integrate.ode) knows the expression for the derivative analytically as a function of time.
Is there a way to call these (or other differential equation solvers) without knowing an analytic expression for the derivative?
For now, I've written my own Runge-Kutta solver and jitted it with numba.
You can use any of the SciPy interpolation methods, such as interp1d, to create a callable function based on your discrete data, and pass it to odeint. Cubic spline interpolation,
f = interp1d(x, y, kind='cubic')
should be good enough.
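Put together for the matrix-valued u(t) in this question, that might look like the following sketch (t_data, the array shapes, and the random u_data are placeholders for your precomputed values):

import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d

dim = 3
t_data = np.linspace(0.0, 1.0, 101)             # times from the earlier calculation
u_data = np.random.rand(len(t_data), dim, dim)  # placeholder for the known u at those times

# Cubic interpolation along the time axis gives a callable returning a dim x dim matrix;
# extrapolation guards against the adaptive stepper peeking slightly past the data range.
u_of_t = interp1d(t_data, u_data, axis=0, kind='cubic',
                  bounds_error=False, fill_value='extrapolate')

def rhs(flat_gamma, t):
    Gamma = flat_gamma.reshape(dim, dim)
    return (u_of_t(t) @ Gamma).flatten()

Gamma0 = np.eye(dim).flatten()
t_eval = np.linspace(0.0, 1.0, 51)
solution = odeint(rhs, Gamma0, t_eval)          # each row is a flattened Gamma(t)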
Is there a way to call these (or other differential equation solvers) without knowing an analytic expression for the derivative?
Yes, none of the solvers you mentioned (nor most other solvers) require an analytic expression for the derivative. Instead they call a function you supply that has to evaluate the derivative for a given time and state. So, your code would roughly look something like:
from scipy.integrate import ode

def my_derivative(time, flat_Gamma):
    # The integrator works with flat state vectors, so reshape into a matrix first.
    Gamma = flat_Gamma.reshape(dim_1, dim_2)
    # Look up (e.g. interpolate) u at the requested time from your precomputed data.
    u = get_u_from_time(time)
    dGamma_dt = u.dot(Gamma)
    return dGamma_dt.flatten()

my_integrator = ode(my_derivative)
…
The difficulty in your situation is rather that you have to ensure that get_u_from_time provides an appropriate result for every time with which it is called. Probably the most robust and easy solution is to use interpolation (see the other answer).
You can also try to match your integration steps to the data you have, but at least for scipy.integrate.odeint and scipy.integrate.ode this will be very tedious, as all the integrators use internal steps that are inconvenient for this purpose. For example, the fifth-order Dormand–Prince method (DoPri5) uses internal steps of 1/5, 3/10, 4/5, 8/9, and 1. This means that if you have temporally equidistant data for u, you would need 90 data points for each integration step (as 1/90 is the greatest common divisor of the internal steps). The only integrator that could make this remotely feasible is the Bogacki–Shampine integrator (RK23) from scipy.integrate.solve_ivp with internal steps of 1/2, 3/4, and 1.
I am familiar with some of the functions in scipy.optimize.optimize and have in the past used fmin_cg to minimize a function where I knew the derivative. However, I now have a formula which is not easily differentiated.
Several of the functions in that module (fmin_cg, for instance) do not actually require the derivative to be provided. I assume that they then calculate a quasi-derivative by adding a small value to each of the parameters in turn - is that correct?
My main question is this: Which of the functions (or one from elsewhere) is the best to use when minimising a function over multiple parameters with no given derivative?
Yes, calling fmin_bfgs or fmin_cg as
fmin_xx( func, x0, fprime=None, epsilon=.001 ... )
estimates the gradient at x one coordinate at a time, as (func(x + epsilon*e_i) - func(x)) / epsilon. (fmin_powell is derivative-free and takes neither fprime nor epsilon.)
Which is "best" for your application, though,
depends strongly on how smooth your function is, and how many variables.
Plain Nelder-Mead, fmin, is a good first choice -- slow but sure;
unfortunately the scipy Nelder-Mead starts off with a fixed-size simplex, .05 / .00025 regardless of the scale of x.
I've heard that fmin_tnc in scipy.optimize.tnc is good:
fmin_tnc( func, x0, approx_grad=True, epsilon=.001 ... ) or
fmin_tnc( func_and_grad, x0 ... ) # func, your own estimated gradient
(fmin_tnc is ~ fmin_ncg with bound constraints, nice messages to see what's happening, somewhat different args.)
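As a concrete (illustrative) sketch of the approx_grad variant, with a made-up objective:

import numpy as np
from scipy.optimize import fmin_tnc

def func(x):
    # Toy objective standing in for your own formula.
    return np.sum((x - 1.0)**2) + abs(x[0])

x0 = np.array([2.0, -3.0])

# approx_grad=True: the gradient is estimated internally by finite differences with
# step epsilon; bounds are optional.
x_opt, nfeval, rc = fmin_tnc(func, x0, approx_grad=True, epsilon=1e-3,
                             bounds=[(-10, 10)] * len(x0))
print(x_opt, nfeval, rc)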
I'm not too familiar with what's available in SciPy, but the Downhill Simplex method (aka Nelder-Mead or the Amoeba method) frequently works well for multidimensional optimization.
Looking now at the scipy documentation, it looks like it is available as an option in the minimize() function using the method='Nelder-Mead' argument.
Don't confuse it with the Simplex (Dantzig) algorithm for Linear Programming...
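A minimal sketch of that minimize() call (the objective and starting point are made-up placeholders):

import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Rosenbrock-style toy function standing in for a derivative-free objective.
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

result = minimize(objective, x0=np.array([-1.2, 1.0]), method='Nelder-Mead')
print(result.x, result.fun)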