I want to solve the following optimization problem with Python:
I have a black box function f with multiple variables as input.
The execution of the black box function is quite time consuming, therefore I would like to avoid a brute force approach.
I would like to find the optimum input parameters for that black box function f.
In the following, for simplicity I just write the dependency for one dimension x.
An optimum parameter x is defined as:
the cost function cost(x) is maximized, where cost(x) is the weighted sum of
the value of f(x) and
the maximum standard deviation of f(x).
cost(x) = A * f(x) + B * max(standardDeviation(f(x)))
The parameters A and B are fix.
E.g., for the picture below, the value of x at the position 'U' would be preferred over the value of x at the position of 'V'.
My question is:
Is there any easily adaptable framework or process that I could utilize (similar to, e.g., simulated annealing or Bayesian optimisation)?
As mentioned, I would like to avoid a brute force approach.
I’m still not 100% sure of your approach, but does this formula ring true to you:
A * max(f(x)) + B * max(standardDeviation(f(x)))
?
If it does, then I guess you may want to consider that maximizing f(x) may (or may not) be compatible with maximizing the standard deviation of f(x), which means you may be facing a multi-objective optimization problem.
Again, you haven't specified what f(x) returns - is it a vector? I hope it is, otherwise I'm unclear what you would calculate the standard deviation of.
The picture you posted is not so obvious to me. f(x) is the entire black curve; it has a maximum at the point V, but what can you say about the standard deviation? To calculate the standard deviation of f(x) you have to take into account the entire f(x) curve (including the point U), not just the neighborhood of U and V. If you only want to get the standard deviation in an interval around a maximum of f(x), then I think you're out of luck when it comes to frameworks. The best thing that comes to my mind is to use a local (or maybe better, global) optimization algorithm to hunt for the maximum of f(x) - simulated annealing, differential evolution, tunnelling, and so on - and then, when you have found a maximum for f(x), sample a few points on the left and right of your optimum and calculate the standard deviation of these evaluations. Then you'll have to decide whether the combination of the maximum of f(x) and this standard deviation is good enough compared to any previous "optimal" point found.
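As a rough illustration of that last idea, here is a minimal sketch in Python; the names black_box, A, B and delta are my own placeholder assumptions, not anything from your post:

import numpy as np
from scipy.optimize import differential_evolution

A, B, delta = 1.0, 1.0, 0.1            # hypothetical weights and sampling half-width

def black_box(x):
    # stand-in for the expensive black-box function f
    return float(np.sin(3.0 * x) * np.exp(-0.1 * x ** 2))

# Step 1: hunt for a maximum of f(x) with a global optimizer
# (differential evolution minimizes, hence the sign flip).
result = differential_evolution(lambda v: -black_box(v[0]), bounds=[(-5.0, 5.0)])
x_opt = result.x[0]

# Step 2: sample a few points left and right of the optimum and
# compute the standard deviation of those evaluations.
neighbourhood = np.linspace(x_opt - delta, x_opt + delta, 7)
samples = np.array([black_box(x) for x in neighbourhood])

# Step 3: combine the two into the cost from the question and compare it
# against any previously found "optimal" point.
cost = A * black_box(x_opt) + B * samples.std()
print(x_opt, cost)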
This is all speculation, as I'm unsure whether your problem is really an optimization one or simply a "peak finding" exercise, for which there are many different - and more powerful and adequate - methods.
Andrea.
Related
I have a relatively complicated function and I have calculated the analytical form of the Jacobian of this function. However, sometimes, I mess up this Jacobian.
MATLAB has a nice way to check for the accuracy of the Jacobian when using some optimization technique as described here.
The problem though is that it looks like MATLAB solves the optimization problem and then returns if the Jacobian was correct or not. This is extremely time consuming, especially considering that some of my optimization problems take hours or even days to compute.
Python has a somewhat similar function in scipy as described here which just compares the analytical gradient with a finite difference approximation of the gradient for some user provided input.
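For reference, a minimal sketch of that scipy helper (presumably scipy.optimize.check_grad) on a toy function:

import numpy as np
from scipy.optimize import check_grad

def func(x):
    return np.sum(x ** 2)

def grad(x):
    return 2 * x

# check_grad returns the 2-norm of the difference between `grad` and
# a finite-difference approximation of the gradient at the given point.
err = check_grad(func, grad, np.array([1.5, -0.3, 2.0]))
print(err)   # should be tiny (around 1e-7) if the gradient is correct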
Is there anything I can do to check the accuracy of the Jacobian in MATLAB without having to solve the entire optimization problem?
A laborious but useful method I've used for this sort of thing is to check that the (numerical) integral of the purported derivative is the difference of the function at the end points. I have found this more convenient than comparing fractions like (f(x+h)-f(x))/h with f'(x) because of the difficulty of choosing h so that, on the one hand, h is not so small that the fraction is dominated by rounding error and, on the other, h is small enough that the fraction is close to f'(x).
In the case of a function F of a single variable, the assumption is that you have code f to evaluate F and, say, fd to evaluate F'. Then the test is, for various intervals [a,b], to look at the differences, which the fundamental theorem of calculus says should be 0:
Integral{ a<=x<=b | fd(x) } - (f(b)-f(a))
with the integral being computed numerically. There is no need for the intervals to be small.
Part of the error will, of course, be due to the error in the numerical approximation to the integral. For this reason I tend to use, for example, an order-40 Gauss-Legendre integrator.
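For concreteness, here is a small sketch of this test in Python for a single-variable example; f and fd are toy placeholders for your own function and purported derivative:

import numpy as np

def f(x):
    return np.sin(x) * x ** 2                            # function under test

def fd(x):
    return np.cos(x) * x ** 2 + 2 * x * np.sin(x)        # purported derivative

def gauss_legendre_integral(g, a, b, order=40):
    # nodes and weights on [-1, 1], mapped to [a, b]
    nodes, weights = np.polynomial.legendre.leggauss(order)
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(weights * g(x))

for a, b in [(0.0, 1.0), (-2.0, 3.0), (1.0, 10.0)]:
    residual = gauss_legendre_integral(fd, a, b) - (f(b) - f(a))
    print(a, b, residual)    # should be close to zero if fd is correct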
For functions of several variables, you can test one variable at a time. For several functions, these can be tested one at a time.
I've found that these tests, which are of course by no means exhaustive, show up the kinds of mistakes that occur in computing derivatives quite readily.
Have you considered using complex-step differentiation to check your gradient? See this description.
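For example, a brief sketch of such a complex-step check; it only requires that your code accepts complex arguments (g and g_prime below are illustrative placeholders):

import numpy as np

def g(x):
    return x ** 3 * np.sin(x)                            # example function

def g_prime(x):
    return 3 * x ** 2 * np.sin(x) + x ** 3 * np.cos(x)   # analytic derivative

h = 1e-20       # the step can be tiny: there is no subtractive cancellation
x0 = 1.7
complex_step = np.imag(g(x0 + 1j * h)) / h
print(complex_step, g_prime(x0))   # the two values should agree to machine precision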
So, I have to solve numerically the following ODE: y'' + f(y)*(y')^2 = 0,
originally between [y_i,y_f] where y = y(t) and y(0) = y_i. The main problem is that the function f(y) has a (regular) singular point (couldn't post the image).
Also, f(y) is obtained from long numerical calculations, so I have no analytical expression for it. The issue is that the singular point lies between y_i and y_f. So far, I could not find any method which helps me to solve this kind of problem. It doesn't matter if it can be solved as a BVP o IVP, but I need to cross the singularity.
What I have tried:
I approximated f(y) as -2/(y-y*), where y* is the singularity, and tried to solve the problem as an IVP using Runge-Kutta, odeint and the Runge-Kutta-Fehlberg method. But I cannot cross it.
I tried to cheat using RKF: as y tends to become constant, I manually increased it, forcing it to cross the singular point, but then the solution increases to infinity, so it is not a valid solution.
Same as before, but using the numerically defined function f(y), not its approximation.
The same, but as a BVP, using the shooting method and solve_bvp from Python.
You can immediately integrate after dividing by y' to get ln(y')+F(y)=c or y'*exp(F(y))=C, F'=f, so that in principle you could compute t(y) via simple quadrature, and then get the solution y(t) via inversion of the function value table.
Your example integrates to y' = C*(y-y*)^2 and, taking y* = 0 for simplicity, y(t) = y(0)/(1 - C*y(0)*t), which could also lead to a singularity in the solution formula due to the denominator. This means that even if a solution exists, you will have to start quite close to it, else the solution process might have to cross a surface of singular Jacobians or other impossibilities, and will fail to continue at this point.
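A minimal sketch of that quadrature-and-inversion idea, assuming a tabulated, regular f(y); the grid, the initial data and the stand-in cosine are my own assumptions:

import numpy as np
from scipy.integrate import cumulative_trapezoid

y0, yp0 = 0.0, 1.0                       # initial conditions y(0), y'(0)
y_grid = np.linspace(y0, 2.0, 2001)      # grid on which f(y) is known
f_vals = np.cos(y_grid)                  # stand-in for the tabulated f(y)

# F(y) = integral of f from y0 to y, so F(y0) = 0 and C = y'(0)
F_vals = cumulative_trapezoid(f_vals, y_grid, initial=0.0)

# dt/dy = exp(F(y)) / C, hence t(y) by one more quadrature
t_vals = cumulative_trapezoid(np.exp(F_vals) / yp0, y_grid, initial=0.0)

# invert the monotone table t(y) to get y(t) on a uniform time grid
t_uniform = np.linspace(0.0, t_vals[-1], 500)
y_of_t = np.interp(t_uniform, t_vals, y_grid)
print(y_of_t[-1])                        # y at the final time t_vals[-1]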
I want to find the minimum of a function y = f(x) in Python.
Problem: the solver tries to compute the gradient with super close x values (delta x around 1e-8), and my function f is not sensitive to such a small step (i.e., we only see y vary when delta x is around 1e-1).
Hence the gradient is 0 to the solver, and it cannot find the proper solution.
I've tried the following solvers from scipy, but I can't find the option I'm looking for:
scipy.optimize.minimize
scipy.optimize.fmin
In MATLAB's fmincon, there is an option that does the job, 'DiffMinChange': the minimum change in variables for finite-difference gradients (a positive scalar).
You may want to try and use L-BFGS-B from scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
And set the "epsilon" parameter to around 0.1 or 0.05 and see if it makes things better. I am of course assuming that you will let the solver compute the gradient for you by numerical differentiation (i.e., you pass fprime=None and approx_grad=True to the routine).
I personally despise the “minimize” interface to various solvers so I prefer to deal with the actual solvers themselves.
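A short sketch of such a call, with a placeholder objective noisy_f that, like your function, ignores very small steps:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def noisy_f(x):
    # insensitive to steps much smaller than ~0.1, like the function in the question
    return float(np.round(x[0], 1) ** 2)

x_opt, f_opt, info = fmin_l_bfgs_b(
    noisy_f,
    x0=np.array([3.0]),
    fprime=None,          # let the solver build the gradient itself...
    approx_grad=True,     # ...by finite differences
    epsilon=0.1,          # finite-difference step, cf. MATLAB's 'DiffMinChange'
    bounds=[(-10.0, 10.0)],
)
print(x_opt, f_opt)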
I'm very new to quantitative and scientific programming and I've run into the minimizer function of scipy, scipy.optimize.fmin. Can someone explain the basic intuition of this function to a non-engineering student?
Let's say I want to minimize the following function:
def f(x): return x**2
1) What does the minimizer actually minimize? The dependent or independent variable?
2) What's the difference between scipy.optimize.fmin and scipy.optimize.minimize?
Given a function which contains some unknown parameters (so it is actually a family of functions) and data, a minimizer tries to find parameters which minimize the distance of the function values to the data. This is done, colloquially speaking, by iteratively adjusting the parameters until further changes do not seem to improve the result.
This is equivalent to the ball running down a hill, mentioned by @pylang in the comment. The "hill" is the distance to the data, given all possible parameter values. The rolling ball is the minimizer which "moves" over that landscape, trying out parameters until it is in a position where every move would lead to an increasing distance to the data, or at least no notable decrease.
Note, however, that with this method you are searching for a local minimum of the distance of the function values to the data, given a set of parameters to the function. For a simple function like the one you posted, the local minimum is the only one and therefore also the global one, but for complex functions involving many parameters this problem can quickly get quite tricky.
People then often use multiple runs of the minimizer to see if it stops at the same positions. If that is not the case, people say the minimizer fails to converge, which means the function is so complex that a single minimum is not easily found. There are a lot of algorithms to counter that; simulated annealing or Monte Carlo methods come to my mind.
To your function: the function f which is mentioned in the example in the help of the fmin function is the distance function. It tells you how far a certain set of parameters puts you with respect to your target. Now you must define what distance means for you. Commonly, the sum of squared residuals (the squared Euclidean norm of the residual vector) is used:
sum((function values - data points)^2)
Say you have a function
def f(x, a, b): return a*x**2 + b
You want to find values for a and b such that your function comes as closely as possible to the data points given below with their respective x and y values:
datax = [ 0, 1, 2, 3, 4]
datay = [ 2, 3, 5, 9, 15]
Then, if you use this sum of squared residuals, your distance function is (this is the function f in the fmin help):
def dist(params):
    a, b = params
    return sum((f(x, a, b) - y)**2 for x, y in zip(datax, datay))
You should be able (sorry, I have no scipy on my current machine, will test it tonight) to minimize to get fitting values of a and b using
import scipy.optimize
res = scipy.optimize.fmin(dist, x0 = (0,0))
Note that you need starting values x0 for your parameters a and b. These are the values which you choose randomly if you run the minimizer multiple times to see whether it converges.
Imagine a ball rolling down a hill. This is an optimization problem. The idea of the "minimizer" is to find the value that reduces the gradient/slope or derivative. In other words, you're finding where the ball eventually rests. Optimization problems can get more interesting and complicated, particularly if the ball rolls into a saddle point or a local minimum (not the global minimum), or along a level ridge or plane, where a minimum is nearly impossible to find. For example, consider the famous Rosenbrock function (see image), a 2D surface where finding the valley is simple, but locating the minimum is difficult.
The fmin and minimize functions are nearly equivalent for the simplex algorithm. However, fmin is less robust for complex functions. minimize is generalized for other algorithms, e.g. "Nelder-Mead" (simplex), "Powell", "CG", etc. These algorithms are just different approaches for "jostling" or helping the ball down the hill faster towards the minimum. Moreover, supplying a Jacobian and Hessian matrix as parameters to the minimize function improves computation efficiency. See the Scipy docs for more on how these functions are used.
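For instance, a small sketch comparing the two interfaces on the Rosenbrock function mentioned above (scipy ships rosen and its derivatives):

import numpy as np
from scipy.optimize import fmin, minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])

x_fmin = fmin(rosen, x0, disp=False)                    # plain simplex
res_nm = minimize(rosen, x0, method="Nelder-Mead")      # same algorithm via minimize
res_newton = minimize(rosen, x0, method="Newton-CG",    # derivative-based method
                      jac=rosen_der, hess=rosen_hess)

print(x_fmin, res_nm.x, res_newton.x)   # all should approach [1, 1]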
Let us assume I have a smooth, nonlinear function f: R^n -> R with a (known) maximum number of roots N. How can I find the roots efficiently? Right now I have calculated the function on a grid over a preselected area, refined the grid where the function is below a predefined threshold, and continued that routine, but this does not seem very efficient, because I have noticed that it is difficult to select the area correctly beforehand and to define the threshold accordingly.
There are several ways to go about this, of course. scipy is known to contain one of the safest and most efficient methods for finding a single root, provided you know an interval that brackets it:
scipy.optimize.brentq
To find more roots using some estimate you can use:
scipy.optimize.fsolve
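A brief sketch of both calls on a one-dimensional toy function g; the bracketing interval and initial estimates are assumptions for the example:

import numpy as np
from scipy.optimize import brentq, fsolve

def g(x):
    return np.cos(x) - x ** 2            # illustrative function with two real roots

# brentq: safest choice when an interval [a, b] with a sign change is known
root_bracketed = brentq(g, 0.0, 2.0)

# fsolve: Newton-like search from initial estimates, also usable in R^n
roots_estimated = fsolve(g, x0=[-1.0, 1.0])

print(root_bracketed, roots_estimated)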
de Moivre's formula can also be used for root finding and is fairly quick in comparison to others (in case you would rather build your own method): given a complex number
z = r * (cos(theta) + i*sin(theta)),
its n roots are given by
z_k = r^(1/n) * (cos((theta + 2*pi*k)/n) + i*sin((theta + 2*pi*k)/n)),
where k varies over the integer values from 0 to n − 1.
You can square the function and use global optimization software to locate all the minima inside a domain and pick those with a zero value.
Stochastic multistart-based global optimization methods with clustering are well suited to this task.
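One way to act on this suggestion with scipy might look like the sketch below; shgo is not a clustering multistart method, but it does report every local minimum it finds (the toy function g, the bounds and the tolerance are assumptions for the example):

import numpy as np
from scipy.optimize import shgo

def g(x):
    return np.cos(x[0]) - x[0] ** 2      # same toy function as above, vector input

def squared(x):
    return g(x) ** 2                     # roots of g become zero-valued minima

res = shgo(squared, bounds=[(-2.0, 2.0)], n=256, iters=3, sampling_method="sobol")
roots = [xm for xm, fm in zip(res.xl, res.funl) if fm < 1e-8]
print(roots)                             # should recover the two roots of g near ±0.824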