I am familiar with some of the functions in scipy.optimize.optimize and have in the past used fmin_cg to minimize a function where I knew the derivative. However, I now have a formula which is not easily differentiated.
Several of the functions in that module (fmin_cg, for instance) do not actually require the derivative to be provided. I assume that they then calculate a quasi-derivative by adding a small value to each of the parameters in turn - is that correct?
My main question is this: Which of the functions (or one from elsewhere) is the best to use when minimising a function over multiple parameters with no given derivative?
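Yes, calling fmin_bfgs or fmin_cg as
fmin_xx( func, x0, fprime=None, epsilon=.001 ... )
estimates the gradient at x by forward differences: component i is (func( x + epsilon e_i ) - func(x)) / epsilon, where e_i is the i-th unit vector. (fmin_powell is derivative-free, so it takes neither fprime nor epsilon.)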
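To make the finite-difference formula concrete, here is a minimal sketch that computes the forward-difference gradient by hand and compares it with scipy.optimize.approx_fprime. The test function func and the helper forward_difference_gradient are made up for illustration; they are not part of SciPy.

```python
import numpy as np
from scipy.optimize import approx_fprime


def func(x):
    # Hypothetical smooth test function: f(x) = x0^2 + 3*x0*x1
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def forward_difference_gradient(f, x, epsilon):
    # Perturb one coordinate at a time and divide the change in f by epsilon.
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = epsilon
        grad[i] = (f(x + step) - f0) / epsilon
    return grad


x = np.array([1.0, 2.0])
manual = forward_difference_gradient(func, x, 1e-6)
library = approx_fprime(x, func, 1e-6)
exact = np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])  # analytic gradient

print(manual, library, exact)
```

Each gradient estimate costs one extra function evaluation per parameter, which is why these methods slow down as the number of variables grows.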
Which is "best" for your application, though,
depends strongly on how smooth your function is, and how many variables.
Plain Nelder-Mead, fmin, is a good first choice -- slow but sure;
unfortunately the scipy Nelder-Mead builds its starting simplex with fixed factors (each nonzero coordinate of x0 is stretched by .05, each zero coordinate is set to .00025), regardless of the scale of your problem.
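One possible workaround, sketched here under assumptions, is to pass your own starting simplex through the initial_simplex argument that fmin accepts in reasonably recent SciPy versions. The objective and the typical_scale values below are invented for illustration; you would substitute step sizes that make sense for your own parameters.

```python
import numpy as np
from scipy.optimize import fmin


def func(x):
    # Hypothetical objective whose two parameters live on very different scales.
    return ((x[0] - 1000.0) / 1000.0) ** 2 + ((x[1] - 0.002) / 0.001) ** 2


x0 = np.array([0.0, 0.0])
typical_scale = np.array([500.0, 0.001])  # assumed, problem-specific step sizes

# Build an (n + 1) x n simplex: x0 itself plus one vertex displaced along each axis.
initial_simplex = np.vstack([x0, x0 + np.diag(typical_scale)])

xopt = fmin(
    func,
    x0,
    initial_simplex=initial_simplex,
    xtol=1e-8,
    ftol=1e-12,
    maxiter=5000,
    maxfun=10000,
    disp=False,
)
print(xopt)
```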
I've heard that fmin_tnc in scipy.optimize.tnc is good:
fmin_tnc( func, x0, approx_grad=True, epsilon=.001 ... ) or
fmin_tnc( func_and_grad, x0 ... ) # func, your own estimated gradient
(fmin_tnc is ~ fmin_ncg with bound constraints, nice messages to see what's happening, somewhat different args.)
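As a rough sketch of the two calling styles above, the following runs fmin_tnc once with a finite-difference gradient and once with a function that returns both the value and the gradient. The quadratic objective and the name target are assumptions chosen only so that the answer is known.

```python
import numpy as np
from scipy.optimize import fmin_tnc

target = np.array([1.0, 2.0])  # hypothetical location of the minimum


def func(x):
    # Returns only the function value.
    return float(np.sum((x - target) ** 2))


def func_and_grad(x):
    # Returns the function value and the analytic gradient together.
    diff = x - target
    return float(np.sum(diff ** 2)), 2.0 * diff


x0 = np.array([0.0, 0.0])

# Style 1: let fmin_tnc estimate the gradient numerically.
x_fd, nfeval_fd, rc_fd = fmin_tnc(func, x0, approx_grad=True, messages=0)

# Style 2: supply the gradient through the same callable.
x_an, nfeval_an, rc_an = fmin_tnc(func_and_grad, x0, messages=0)

print(x_fd, nfeval_fd)
print(x_an, nfeval_an)
```

messages=0 silences the progress output; leave it at the default if you want to see what the solver is doing.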
I'm not too familiar with what's available in SciPy, but the Downhill Simplex method (aka Nelder-Mead or the Amoeba method) frequently works well for multidimensional optimization.
Looking now at the scipy documentation, it looks like it is available as an option in the minimize() function using the method='Nelder-Mead' argument.
Don't confuse it with the Simplex (Dantzig) algorithm for Linear Programming...
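For reference, a minimal sketch of the minimize() call with method='Nelder-Mead', using the Rosenbrock test function that ships with SciPy. The starting point is arbitrary and no derivative is supplied.

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])  # arbitrary starting point

# Nelder-Mead only needs function values, never a gradient.
result = minimize(rosen, x0, method="Nelder-Mead", options={"xatol": 1e-8})

print(result.x)
print(result.nfev)
```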
I'm relatively new to using SciPy; I'm currently using it to minimize a cost function for a multi-layer-perceptron model. I can't use scikit-learn because I need to be able to set the coefficients (they are read-only in the MLPClassifier) and add random permutations and noise to any and all parameters. I haven't finished the implementation quite yet, but I am confused about the parameters required for the minimize function.
For example, I have a function that I have written to calculate the "cost" (energy to minimize) of the function, and it calculates the gradient at the same time. That's nothing special as it's common practice. However, when calling scipy.optimize.minimize, it asks for two different functions: one that returns the scalar that is to be minimized (i.e., the cost in my case) and one that calculates the gradient of the current state. Example:
j,grad = myCostFunction(X,y)
Unless I am mistaken, it seems that it would need to call my function twice, with each call needing to be specified to return either the cost or the gradient, like so:
opt = scipy.optimize.minimize(fun=myJFunction, jac=myGradFunction, args = args,...)
Isn't this a waste of computation time? My data set will be > 1 million samples and 10ish features, so reducing redundant computation would be preferred since I will be training and retraining this thing tens of thousands of times for my project.
Another point of confusion is with the args input. Are the arguments passed like this:
# This is what I expect happens
myJFunction(x0,*args)
myGradFunction(x0,*args)
or like this:
# This is what I wish it did
myJFunction(x0,arg0,arg1,arg2)
myGradFunction(x0,arg3,arg4,arg5)
Thanks in advance!
After doing some experimentation and searching, I found the answers to my own questions.
While I can't say for sure about the scipy.optimize.minimize function, the documentation of other optimization functions (for example, scipy.optimize.fmin_tnc) explicitly states that the callable func can either (1) return both the energy and the gradient, (2) return only the energy, with the gradient function supplied through the fprime parameter (slower), or (3) return only the energy and have the optimizer estimate the gradient through perturbation (much slower).
See the docs here: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fmin_tnc.html
I was very happy to see that I could use only one function to return both parameters. I assume it is the same case for the minimize function, but I have not tested it to be sure (See Edit 1)
As for my second question, if you specify two different functions, the same args tuple is passed to both of them; you cannot specify separate arguments for each.
EDIT 1: Reading through the minimize documentation more, if the parameter jac is set to True, then the optimizer assumes that fun returns both the energy and the gradient. Reading the docs thoroughly is helpful, it seems.
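Here is a minimal sketch of the jac=True pattern. The least-squares cost, the synthetic data, and the names cost_and_grad and call_count are stand-ins invented for illustration, not the perceptron cost from the question. The extra args are unpacked after the parameter vector, i.e. the function is called as cost_and_grad(theta, X, y).

```python
import numpy as np
from scipy.optimize import minimize

call_count = {"n": 0}  # counts how often the combined function is evaluated


def cost_and_grad(theta, X, y):
    # Compute cost and gradient in one pass so the residual is shared.
    call_count["n"] += 1
    residual = X @ theta - y
    cost = 0.5 * float(residual @ residual)
    grad = X.T @ residual
    return cost, grad


# Synthetic, noise-free data (assumed for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta

result = minimize(
    cost_and_grad,
    x0=np.zeros(3),
    args=(X, y),   # the same tuple is forwarded on every call
    jac=True,      # tells minimize that fun returns (cost, gradient)
    method="BFGS",
)

print(result.x)
print(call_count["n"])
```

If the cost and gradient really need different extra arguments, one option is to bundle everything into a single args tuple (or a closure) and let the function pick out what it needs.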
I want to find the minimum of a function in python y = f(x)
Problem: the solver tries to compute the gradient using very close x values (delta x around 1e-8), and my function f is not sensitive to such a small step (i.e., y only varies visibly when delta x is around 1e-1).
Hence the gradient is 0 as far as the solver is concerned, and it cannot find the proper solution.
I've tried the following solvers from scipy, but I can't find the option I'm looking for:
scipy.optimize.minimize
scipy.optimize.fmin
In Matlab's fmincon, there is an option that does the job, 'DiffMinChange': Minimum change in variables for finite-difference gradients (a positive scalar).
You may want to try and use L-BFGS-B from scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
And set the “epsilon” parameter to around 0.1 or 0.05 and see if that improves things. I am of course assuming that you will let the solver compute the gradient for you by numerical differentiation (i.e., you pass fprime=None and approx_grad=True to the routine).
I personally despise the “minimize” interface to various solvers so I prefer to deal with the actual solvers themselves.
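To illustrate the effect of epsilon, here is a small sketch with a made-up staircase objective that only changes when x moves by roughly 0.1. With a tiny step the finite-difference gradient is exactly zero and the solver stops at the starting point; with epsilon=0.1 it sees a slope and makes progress. The function staircase and the starting point are assumptions for the demo only.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b


def staircase(x):
    # Hypothetical objective that is flat on intervals of width 0.1.
    return float((np.floor(x[0] * 10.0) / 10.0 - 2.0) ** 2)


x0 = np.array([0.05])

# Tiny finite-difference step: the estimated gradient is zero, so nothing happens.
x_small, f_small, info_small = fmin_l_bfgs_b(
    staircase, x0, approx_grad=True, epsilon=1e-8
)

# Step comparable to the scale on which the function actually changes.
x_large, f_large, info_large = fmin_l_bfgs_b(
    staircase, x0, approx_grad=True, epsilon=0.1
)

print(x_small, f_small)
print(x_large, f_large)
```

A piecewise-constant function is still hard on a gradient-based line search, so the second run may stop with a warning before reaching the exact minimum; a derivative-free method may be a better fit for such functions.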
I am trying to find a root of an equation using Newton-Raphson provided by SciPy (scipy.optimize.newton).
At the moment I don't have the fprime values that the documentation advises to use, and as far as I am aware this means that the Secant method is being used to find the roots.
Since the Newton-Raphson method has faster convergence than the Secant method, my gut thinks that maybe I should numerically approximate fprime and provide it so that Newton's method is used.
Which one would generally lead to faster convergence / faster actual computing of my roots?
Just using scipy.optimize.newton without providing fprime (i.e. the Secant Method), or
Using numerical differentiation to compute fprime (e.g. with numpy.diff) and providing it to scipy.optimize.newton so that the Newton-Raphson method is used.
The book Numerical Recipes in C, 2nd edition, in section "9.4 Newton-Raphson Method Using Derivative" on page 365, states:
The Newton-Raphson formula can also be applied using a numerical
difference to approximate the true local derivative,
f'(x) ≈ (f(x + dx) - f(x)) / dx .
This is not, however, a recommended procedure for the following
reasons: (i) You are doing two function evaluations per step, so at
best the superlinear order of convergence will be only sqrt(2). (ii)
If you take dx too small you will be wiped out by roundoff, while if
you take it too large your order of convergence will be only linear,
no better than using the initial evaluation f'(x_0) for all
subsequent steps. Therefore, Newton-Raphson with numerical derivatives
is (in one dimension) always dominated by the secant method of section
9.2.
(That was edited to fit the limitations of this site.) Choosing another method to improve the accuracy of the numeric derivative would increase the number of function evaluations and would thus decrease the order of convergence even more. Therefore you should choose your first method, which ends up using the secant method to find a root.
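If you want to check this on your own problem, a rough sketch along the following lines counts the evaluations of f for both variants. The cubic test function, the starting point, and the step dx are assumptions chosen for illustration; no particular counts are claimed here, so run it and compare.

```python
from scipy.optimize import newton

calls = {"n": 0}  # counts evaluations of f


def f(x):
    calls["n"] += 1
    return x ** 3 - 2.0 * x - 5.0


def fprime_numeric(x, dx=1e-6):
    # Forward difference: every derivative estimate costs two more calls to f.
    return (f(x + dx) - f(x)) / dx


# Variant 1: no fprime, so scipy.optimize.newton uses the secant method.
calls["n"] = 0
root_secant = newton(f, x0=2.0)
secant_calls = calls["n"]

# Variant 2: Newton-Raphson driven by a numerical derivative.
calls["n"] = 0
root_newton = newton(f, x0=2.0, fprime=fprime_numeric)
newton_calls = calls["n"]

print(root_secant, secant_calls)
print(root_newton, newton_calls)
```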
I'm trying to solve a first-order ODE in Python:
where Gamma and u are square matrices.
I don't explicitly know u(t) at all times, but I do know it at discrete timesteps from doing an earlier calculation.
Every example I found online of Python's solvers (e.g. for scipy.integrate.odeint and scipy.integrate.ode) assumes that the expression for the derivative is known analytically as a function of time.
Is there a way to call these (or other differential equation solvers) without knowing an analytic expression for the derivative?
For now, I've written my own Runge-Kutta solver and jitted it with numba.
You can use any of the SciPy interpolation methods, such as interp1d, to create a callable function based on your discrete data, and pass it to odeint. Cubic spline interpolation,
f = interp1d(x, y, kind='cubic')
should be good enough.
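A minimal scalar sketch of this idea, assuming the sampled quantity is u(t) = cos(t) and the equation is dy/dt = u(t) * y (both invented so that the exact solution exp(sin(t)) is available for comparison). Note the fill_value="extrapolate" argument: adaptive solvers may evaluate the right-hand side slightly outside the sampled time range, and without it interp1d would raise an error there.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.interpolate import interp1d

# Pretend u is only known at discrete times (synthetic samples of cos(t)).
t_data = np.linspace(0.0, 2.0, 21)
u_data = np.cos(t_data)

# Callable approximation of u(t); extrapolation guards against small overshoots.
u = interp1d(t_data, u_data, kind="cubic", fill_value="extrapolate")


def rhs(y, t):
    # odeint expects the signature f(y, t).
    return u(t) * y


t_eval = np.linspace(0.0, 2.0, 11)
y = odeint(rhs, 1.0, t_eval)[:, 0]

exact = np.exp(np.sin(t_eval))
print(np.max(np.abs(y - exact)))
```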
Is there a way to call these (or other differential equation solvers) without knowing an analytic expression for the derivative?
Yes, none of the solvers you mentioned (nor most other solvers) require an analytic expression for the derivative. Instead they call a function you supply that has to evaluate the derivative for a given time and state. So, your code would roughly look something like:
def my_derivative(time,flat_Gamma):
    Gamma = flat_Gamma.reshape(dim_1,dim_2)
    u = get_u_from_time(time)
    dGamma_dt = u.dot(Gamma)
    return dGamma_dt.flatten()

from scipy.integrate import ode
my_integrator = ode(my_derivative)
…
The difficulty in your situation is rather that you have to ensure that get_u_from_time provides an appropriate result for every time with which it is called. Probably the most robust and easy solution is to use interpolation (see the other answer).
You can also try to match your integration steps to the data you have, but at least for scipy.integrate.odeint and scipy.integrate.ode this will be very tedious as all the integrators use internal steps that are inconvenient for this purpose. For example, the fifth-order Dormand–Prince method (DoPri5) uses internal steps of 1/5, 3/10, 4/5, 8/9, and 1. This means that if you have temporally equidistant data for u, you would need 90 data points for each integration step (as 1/90 is the greatest common divisor of the internal steps). The only integrator that could make this remotely feasible is the Bogacki–Shampine integrator (RK23) from scipy.integrate.solve_ivp with internal steps of 1/2, 3/4, and 1.
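Putting the interpolation approach together for a matrix-valued problem, here is a sketch under stated assumptions: the equation is taken to be dGamma/dt = u(t) Gamma (as in the snippet above), u is sampled as cos(t) times a fixed 2x2 matrix J so that a closed-form solution exists, and scipy.integrate.solve_ivp is used as the solver. The sample data and the choice of J are made up for the check.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

dim = 2
J = np.array([[0.0, 1.0], [-1.0, 0.0]])

# Synthetic samples of u at discrete times, shape (n_times, dim, dim).
t_data = np.linspace(0.0, 2.0, 41)
u_data = np.cos(t_data)[:, None, None] * J

# Interpolate every matrix entry along the time axis.
get_u_from_time = interp1d(
    t_data, u_data, axis=0, kind="cubic", fill_value="extrapolate"
)


def my_derivative(time, flat_Gamma):
    # The solver works on flat vectors, so reshape, multiply, and flatten again.
    Gamma = flat_Gamma.reshape(dim, dim)
    u = get_u_from_time(time)
    return (u @ Gamma).ravel()


Gamma0 = np.eye(dim)
sol = solve_ivp(my_derivative, (0.0, 2.0), Gamma0.ravel(), rtol=1e-8, atol=1e-10)
Gamma_final = sol.y[:, -1].reshape(dim, dim)

# Because u(t) commutes with itself at all times, Gamma(t) = exp(sin(t) * J).
s = np.sin(2.0)
expected = np.array([[np.cos(s), np.sin(s)], [-np.sin(s), np.cos(s)]])
print(np.max(np.abs(Gamma_final - expected)))
```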
I'm having trouble minimizing a complex nonlinear function in Python. This function is actually the chi-square of a model used to fit experimental data. To get the global minimum, I'm using the basinhopping function in scipy. This function is a wrapper around minimize() that adds some perturbation to look for different local minima. Right now my problem is that it has trouble finding the local minima.
There are a bunch of solvers that can be used in minimize(), and since I'm using bounds I chose between 'L-BFGS-B', 'SLSQP' and 'TNC'. None of them really finds the local minima. Is there a method based on the popular Levenberg-Marquardt algorithm that can be used with minimize()? Maybe this does not make sense, otherwise it would already be implemented, but I can't understand why.
My original idea was actually to use the leastsqbound function (https://pypi.python.org/pypi/leastsqbound), which I know is very good at providing an accurate covariance matrix despite the bounds, and include it in a larger algorithm that would look for global minima (like the basinhopping function). Do you know if something like this already exists?
Thanks a lot for your advice!
Scipy has a Levenberg-Marquardt implementation: scipy.optimize.leastsq. It does not have the right return type to use with minimize (and therefore basinhopping). However, it appears this could be remedied fairly straightforwardly.
Though I have not run it, this should do the trick:
import scipy.optimize
from scipy.optimize import leastsq

def leastsq_for_minimize( *args, **kwargs ):
    results = leastsq( *args, **kwargs )
    optimize_results = scipy.optimize.OptimizeResult()
    # Some code here to correctly copy results to optimize results
    return optimize_results

scipy.optimize.basinhopping(
    # your arguments here
    minimizer_kwargs=dict(method=leastsq_for_minimize),
)
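One way the copying step might be filled in is sketched below, with several assumptions. minimize hands a custom method the scalar objective, while leastsq needs the vector of residuals, so this version closes over a separate residuals function instead of forwarding the arguments unchanged. The exponential model, the synthetic data, and the names residuals and chi_square are all hypothetical.

```python
import numpy as np
from scipy.optimize import OptimizeResult, basinhopping, leastsq

# Synthetic, noise-free data for a hypothetical model a * exp(-b * t).
t = np.linspace(0.0, 10.0, 50)
y_obs = 3.0 * np.exp(-0.5 * t)


def residuals(p):
    # Vector of residuals, which is what leastsq works on.
    return p[0] * np.exp(-p[1] * t) - y_obs


def chi_square(p):
    # Scalar objective, which is what basinhopping and minimize work on.
    return float(np.sum(residuals(p) ** 2))


def leastsq_for_minimize(fun, x0, args=(), **unused_options):
    # minimize passes extra keywords (jac, bounds, ...) that are ignored here.
    x, cov_x, info, message, ier = leastsq(residuals, x0, full_output=True)
    return OptimizeResult(
        x=x,
        fun=fun(x, *args),
        success=ier in (1, 2, 3, 4),  # leastsq reports success with these codes
        status=ier,
        message=message,
        nfev=info["nfev"],
    )


result = basinhopping(
    chi_square,
    x0=np.array([2.0, 0.8]),
    niter=5,
    minimizer_kwargs={"method": leastsq_for_minimize},
)
print(result.x, result.fun)
```

Plain leastsq does not handle bounds, so this sketch drops them; a bounded least-squares routine would need the same kind of wrapper.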
minimize documentation
basinhopping documentation
OptimizeResult documentation