Custom minimizer based on Levenberg-Marquardt in scipy.optimize.basinhopping - python

I'm having trouble minimizing a complex nonlinear function in Python. This function is actually the chi-square of a fitting model used to fit experimental data. In order to get the global minimum, I'm using the basinhopping function in scipy. This function is a wrapper around minimize() that adds some perturbation to look for different local minima. Right now my problem is that it has trouble finding the local minima.
There are a bunch of solvers that can be used in minimize(), and since I'm using bounds I chose between 'L-BFGS-B', 'SLSQP' and 'TNC'. None of them really finds the local minima. Is there a method based on the popular Levenberg-Marquardt algorithm that can be used with minimize? Maybe this does not make sense, otherwise it would already be implemented, but I can't understand why.
My original idea was actually to use the leastsqbound function (https://pypi.python.org/pypi/leastsqbound), which I know is very good at providing an accurate covariance matrix despite the bounds, and include it in a larger algorithm that would look for global minima (like the basinhopping function). Do you know if something like this already exists?
Thanks a lot for your advice!

Scipy has a Levenberg-Marquardt implementation: scipy.optimize.leastsq. It does not have the right return type to use with minimize (and therefore with basinhopping). However, it appears this could be remedied fairly straightforwardly.
Though I have not run it, this should do the trick:
import scipy.optimize
from scipy.optimize import leastsq

def leastsq_for_minimize(*args, **kwargs):
    # Run the Levenberg-Marquardt solver, then repackage its output
    results = leastsq(*args, **kwargs)
    optimize_results = scipy.optimize.OptimizeResult()
    # Some code here to correctly copy results to optimize_results
    return optimize_results

scipy.optimize.basinhopping(
    # your arguments here
    minimizer_kwargs=dict(method=leastsq_for_minimize),
)
minimize documentation
basinhopping documentation
OptimizeResult documentation
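For completeness, here is a rough, untested sketch of how the repackaging could be done end to end. The toy model, data, and the lm_minimizer mapping are assumptions for illustration, not part of the original answer: leastsq is given a residual-vector function, basinhopping only ever sees the scalar chi-square, and only the OptimizeResult fields that basinhopping actually uses (x, fun, success, nfev) are filled in.

import numpy as np
from scipy.optimize import OptimizeResult, basinhopping, leastsq

# Toy data and model (placeholders for the real fitting problem)
xdata = np.linspace(0, 10, 50)
ydata = 3.0 * np.exp(-0.5 * xdata) + 0.01 * np.random.randn(50)

def model(x, a, b):
    return a * np.exp(-b * x)

def residuals(p):
    return ydata - model(xdata, *p)

def chisq(p):
    # Scalar objective that basinhopping perturbs and compares
    return float(np.sum(residuals(p) ** 2))

def lm_minimizer(fun, x0, args=(), **unused_kwargs):
    # leastsq needs the residual vector, not the scalar chi-square,
    # so it is called on 'residuals'; 'fun' is only used to fill in .fun.
    p_opt, ier = leastsq(residuals, x0)
    return OptimizeResult(x=p_opt, fun=fun(p_opt), nfev=1,
                          success=(ier in (1, 2, 3, 4)))

sol = basinhopping(chisq, x0=[1.0, 1.0],
                   minimizer_kwargs=dict(method=lm_minimizer))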

Related

Scipy.Optimize.Minimize inefficient? Double calls to cost/gradient function

I'm relatively new to using SciPy; I'm currently using it to minimize a cost function for a multi-layer perceptron model. I can't use scikit-learn because I need the ability to set the coefficients (they are read-only in MLPClassifier) and to add random permutations and noise to any and all parameters. I haven't finished the implementation quite yet, but I am confused about the parameters required for the minimize function.
For example, I have a function that I have written to calculate the "cost" (energy to minimize) of the function, and it calculates the gradient at the same time. That's nothing special as it's common practice. However, when calling scipy.optimize.minimize, it asks for two different functions: one that returns the scalar that is to be minimized (i.e., the cost in my case) and one that calculates the gradient of the current state. Example:
j,grad = myCostFunction(X,y)
Unless I am mistaken, it seems that it would need to call my function twice, with each call needing to be specified to return either the cost or the gradient, like so:
opt = scipy.optimize.minimize(fun=myJFunction, jac=myGradFunction, args = args,...)
Isn't this a waste of computation time? My data set will be > 1 million samples and 10ish features, so reducing redundant computation would be preferred since I will be training and retraining this thing tens of thousands of times for my project.
Another point of confusion is with the args input. Are the arguments passed like this:
# This is what I expect happens
myJFunction(x0,*args)
myGradFunction(x0,*args)
or like this:
# This is what I wish it did
myJFunction(x0,arg0,arg1,arg2)
myGradFunction(x0,arg3,arg4,arg5)
Thanks in advance!
After doing some experimentation and searching, I found the answers to my own questions.
While I can't say for sure about the scipy.optimize.minimize function, the documentation for other optimization functions (for example, scipy.optimize.fmin_tnc) explicitly states that the callable func can either (1) return both the energy and the gradient, (2) return only the energy and have a separate gradient function supplied via the fprime parameter (slower), or (3) return only the energy and have the function estimate the gradient through perturbation (much slower).
See the docs here: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fmin_tnc.html
I was very happy to see that I could use only one function to return both parameters. I assume it is the same case for the minimize function, but I have not tested it to be sure (See Edit 1)
As for my second question: if you specify two different functions, the *args parameters are passed to both functions in the same way; you cannot specify individual parameters for each.
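To illustrate, here is a minimal sketch using the asker's function names with a toy least-squares cost (the toy cost and data are assumptions for illustration); the same args tuple is forwarded to both callables:

import numpy as np
from scipy.optimize import minimize

def myJFunction(theta, X, y):
    # Scalar cost
    return float(np.sum((X @ theta - y) ** 2))

def myGradFunction(theta, X, y):
    # Gradient of the cost
    return 2.0 * X.T @ (X @ theta - y)

X = np.random.randn(20, 3)
y = np.random.randn(20)
# The same (X, y) tuple is passed to both fun and jac
opt = minimize(myJFunction, np.zeros(3), jac=myGradFunction, args=(X, y))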
EDIT 1: Reading through the minimize documentation more carefully, I found that if the parameter jac is set to True, the optimizer assumes that func returns both the energy and the gradient. Reading the docs thoroughly is helpful, it seems.
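For example, a short sketch of the jac=True pattern with a toy quadratic cost (the toy cost and data are placeholders):

import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y):
    r = X @ theta - y
    cost = float(r @ r)
    grad = 2.0 * X.T @ r
    return cost, grad          # a single call yields both values

X = np.random.randn(20, 3)
y = np.random.randn(20)
res = minimize(cost_and_grad, np.zeros(3), args=(X, y), jac=True)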

How to disable the local minimization process in scipy.optimize.basinhopping?

I am using scipy.optimize.basinhopping to find the minima of a scalar function. I wonder whether it is possible to disable the local minimization part of scipy.optimize.basinhopping? As we can see from the output message below, minimization_failures and nit are nearly the same, indicating that the local minimization part may be useless for the global optimization process of basinhopping, which is why I would like to disable it, for the sake of efficiency.
You can avoid running the minimizer by using a custom minimizer that does nothing.
See the discussion on "Custom minimizers" in the documentation of minimize():
**Custom minimizers**
It may be useful to pass a custom minimization method, for example
when using a frontend to this method such as `scipy.optimize.basinhopping`
or a different library. You can simply pass a callable as the ``method``
parameter.
The callable is called as ``method(fun, x0, args, **kwargs, **options)``
where ``kwargs`` corresponds to any other parameters passed to `minimize`
(such as `callback`, `hess`, etc.), except the `options` dict, which has
its contents also passed as `method` parameters pair by pair. Also, if
`jac` has been passed as a bool type, `jac` and `fun` are mangled so that
`fun` returns just the function values and `jac` is converted to a function
returning the Jacobian. The method shall return an ``OptimizeResult``
object.
The provided `method` callable must be able to accept (and possibly ignore)
arbitrary parameters; the set of parameters accepted by `minimize` may
expand in future versions and then these parameters will be passed to
the method. You can find an example in the scipy.optimize tutorial.
Basically, you need to write a custom function that returns an OptimizeResult and pass it to basinhopping via the method entry of minimizer_kwargs, for example:
from scipy.optimize import OptimizeResult

def noop_min(fun, x0, args, **options):
    return OptimizeResult(x=x0, fun=fun(x0), success=True, nfev=1)

...

sol = basinhopping(..., minimizer_kwargs=dict(method=noop_min))
Note: I don't know how skipping local minimization affects the convergence properties of the basinhopping algorithm.
You can use minimizer_kwargs to specify to minimize() which options you prefer for the local minimization step. See the dedicated part of the docs.
What happens then depends on which solver you ask minimize for. You can try setting a larger tol to make the local minimization step terminate earlier.
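For example, a sketch with a toy one-dimensional objective (the objective, starting point, and tolerance value are placeholders):

import numpy as np
from scipy.optimize import basinhopping

def f(x):
    # Toy objective (placeholder)
    return float(np.cos(14.5 * x[0] - 0.3) + (x[0] + 0.2) * x[0])

# A loose tolerance makes each local minimization terminate earlier
sol = basinhopping(f, x0=[1.0],
                   minimizer_kwargs=dict(method="L-BFGS-B", tol=1e-2))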
EDIT, in reply to the comment "What if I want to disable the local minimization part completely?"
The basinhopping algorithm, as described in the docs, works like this:
The algorithm is iterative with each cycle composed of the following features:
- random perturbation of the coordinates
- local minimization
- accept or reject the new coordinates based on the minimized function value
If the above is accurate, there is no way to skip the local minimization step entirely, because its output is required by the algorithm to proceed further, i.e. to keep or discard the new coordinates. However, I am not an expert on this algorithm.

parameter within an interval while optimizing

Usually I use Mathematica, but now I am trying to shift to Python, so this question might be a trivial one; I am sorry about that.
Anyway, is there any built-in function in Python which is similar to the function named Interval[{min,max}] in Mathematica? The link is: http://reference.wolfram.com/language/ref/Interval.html
What I am trying to do is this: I have a function and I am trying to minimize it, but it is a constrained minimization; by that I mean the parameters of the function are only allowed within some particular interval.
For a very simple example, let's say f(x) is a function with parameter x and I am looking for the value of x which minimizes the function, but x is constrained within an interval (min, max). [Obviously the actual problem is not one-dimensional but rather a multi-dimensional optimization, so different parameters may have different intervals.]
Since it is an optimization problem, of course I do not want to pick the parameter randomly from an interval.
Any help will be highly appreciated, thanks!
If it's a highly non-linear problem, you'll need to use an algorithm such as the Generalized Reduced Gradient (GRG) Method.
The idea of the generalized reduced gradient algorithm (GRG) is to solve a sequence of subproblems, each of which uses a linear approximation of the constraints. (Ref)
You'll need to ensure that certain conditions known as the KKT conditions are met, etc. but for most continuous problems with reasonable constraints, you'll be able to apply this algorithm.
This is a good reference for such problems with a few examples provided. Ref. pg. 104.
Regarding implementation:
While I am not familiar with Python, I have built solver libraries in C++ using templates as well as function pointers, so that you can pass functions (for the objective as well as the constraints) as arguments to the solver and get your result, hopefully in polynomial time for convex problems or in cases where the initial values are reasonable.
If an ability to do that exists in Python, it shouldn't be difficult to build a generalized GRG solver.
The Python Solution:
Edit: Here is the Python solution to your problem: Python constrained non-linear optimization
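As a concrete illustration of bounded minimization in SciPy (a sketch that is not part of the original answer; the objective and the bounds are placeholders), scipy.optimize.minimize accepts a bounds argument with one (min, max) pair per parameter:

import numpy as np
from scipy.optimize import minimize

def f(p):
    # Toy objective (placeholder)
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

# One (min, max) pair per parameter; different parameters may have
# different intervals, as in the question.
bounds = [(0.0, 5.0), (-1.0, 1.0)]
res = minimize(f, x0=[2.0, 0.0], method="L-BFGS-B", bounds=bounds)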

Python function minimisation without derivative

I am familiar with some of the functions in scipy.optimize.optimize and have in the past used fmin_cg to minimize a function where I knew the derivative. However, I now have a formula which is not easily differentiated.
Several of the functions in that module (fmin_cg, for instance) do not actually require the derivative to be provided. I assume that they then calculate a quasi-derivative by adding a small value to each of the parameters in turn - is that correct?
My main question is this: Which of the functions (or one from elsewhere) is the best to use when minimising a function over multiple parameters with no given derivative?
Yes, calling any of fmin_bfgs, fmin_cg, or fmin_powell as
fmin_xx( func, x0, fprime=None, epsilon=.001 ... )
estimates the gradient at x by (func( x + epsilon I ) - func(x)) / epsilon.
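That forward-difference estimate looks roughly like the following standalone sketch (hypothetical code for illustration, not SciPy's actual internal implementation):

import numpy as np

def approx_gradient(func, x, epsilon=1e-3):
    # Forward differences: perturb one coordinate at a time
    x = np.asarray(x, dtype=float)
    f0 = func(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = epsilon
        grad[i] = (func(x + step) - f0) / epsilon
    return grad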
Which is "best" for your application, though,
depends strongly on how smooth your function is, and how many variables.
Plain Nelder-Mead, fmin, is a good first choice -- slow but sure;
unfortunately the scipy Nelder-Mead starts off with a fixed-size simplex, .05 / .00025 regardless of the scale of x.
I've heard that fmin_tnc in scipy.optimize.tnc is good:
fmin_tnc( func, x0, approx_grad=True, epsilon=.001 ... ) or
fmin_tnc( func_and_grad, x0 ... ) # func, your own estimated gradient
(fmin_tnc is ~ fmin_ncg with bound constraints, nice messages to see what's happening, somewhat different args.)
I'm not too familiar with what's available in SciPy, but the Downhill Simplex method (aka Nelder-Mead or the Amoeba method) frequently works well for multidimensional optimization.
Looking now at the scipy documentation, it looks like it is available as an option in the minimize() function using the method='Nelder-Mead' argument.
Don't confuse it with the Simplex (Dantzig) algorithm for Linear Programming...
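For example, a minimal derivative-free call (the toy objective is a placeholder):

import numpy as np
from scipy.optimize import minimize

def rosen(p):
    # Toy objective (placeholder): 2-D Rosenbrock function
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

# Nelder-Mead needs only function values, no gradient
res = minimize(rosen, x0=[-1.0, 2.0], method="Nelder-Mead")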

SciPy global minimum curve fit

I'm using scipy.optimize.curve_fit, but I suspect it is converging to a local minimum and not the global minimum.
I tried using simulated annealing in the following way:
def fit(params):
    return np.sum((ydata - specf(xdata, *params))**2)

p = scipy.optimize.anneal(fit, [1000, 1E-10])
where specf is the curve I am trying to fit. The results in p though are clearly worse than the minimum returned by curve_fit even when the return value indicates the global minimum was reached (see anneal).
How can I improve the results? Is there a global curve fitter in SciPy?
You're right, it only converges towards a local minimum (when it converges at all), since it uses the Levenberg-Marquardt algorithm. There is no global curve fitter in SciPy; you have to write your own using the existing global optimizers. But be aware that this still does not have to converge to the value you want; that is impossible to guarantee in most cases.
The only way to improve your result is to guess the starting parameters quite well.
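In practice that means passing a sensible initial guess to curve_fit via p0. A sketch (the model, data, and starting values below are placeholders, not from the original question):

import numpy as np
from scipy.optimize import curve_fit

def specf(x, a, tau):
    # Placeholder model
    return a * np.exp(-x / tau)

xdata = np.linspace(0, 5, 100)
ydata = specf(xdata, 1000.0, 1.5) + np.random.normal(0, 5, xdata.size)

# A good starting guess keeps Levenberg-Marquardt near the right basin
popt, pcov = curve_fit(specf, xdata, ydata, p0=[1000.0, 1.0])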
You might want to try using leastsq() (curve_fit actually uses this, but you don't get the full output) or the ODR package instead of curve_fit.
The full output of leastsq() gives you a lot more information, such as the chisquared value (if you want to use that as a quick and dirty goodness of fit test).
If you need to weight the fit you can do it this way:
fitfunc = lambda p, x: p[0] + p[1]*exp(-x)
errfunc = lambda p, x, y, xerr: (y - fitfunc(p, x))/xerr
out = leastsq(errfunc, pinit, args=(x, y, xerr), full_output=1)
infodict = out[2]
chisq = sum(infodict['fvec']*infodict['fvec'])
This is a nontrivial problem. Have you considered using Evolutionary Strategies? I have had great success with ecspy (see http://code.google.com/p/ecspy/) and the community is small but very helpful.
