I need to use Particle Swarm Optimization for a problem with a variable number of parameters (up to 10 parameters to optimize).
f(p1, p2, ..., pn)
where p1, p2, ..., pn are the parameters to optimize.
The analytic form of the objective function is unknown, which is why I plan to apply PSO.
For each parameter to optimize we have a set of boundaries - e.g. for p1:
p1_lb <= p1 <= p1_ub,
the same for all the rest.
We also have an initial guess for each parameter, e.g. for p1:
p1_lb <= p1_ig <= p1_ub
where p1_ig is the initial guess.
Please suggest a Python library that can handle the task. It should:
- work with lists of input parameters of various sizes (e.g. we can have 2 parameters to optimize, or 10),
- accept lists of lower and upper boundaries for each parameter, so that the search space is always limited,
- accept initial guesses as well, because providing the closest values we have is quite important to speed up the optimization.
I have tried to work with pyswarm, but it seems that it cannot take initial guesses; at least I could not find any examples.
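For reference, here is the kind of usage I am after; a minimal sketch assuming the pyswarms library (a different project from pyswarm), whose GlobalBestPSO appears to accept both bounds and initial particle positions. The objective below is just a placeholder:

import numpy as np
import pyswarms as ps

# placeholder black box: pyswarms expects a function that takes an
# array of shape (n_particles, n_dims) and returns one cost per particle
def objective(x):
    return np.sum((x - 1.0) ** 2, axis=1)

lb = np.array([0.0, 0.0])   # lower bounds, one per parameter
ub = np.array([5.0, 5.0])   # upper bounds, one per parameter
ig = np.array([1.2, 0.8])   # initial guesses

n_particles = 20
# seed the swarm near the initial guess with a small jitter, clipped to the bounds
init_pos = np.clip(ig + 0.1 * np.random.randn(n_particles, ig.size), lb, ub)

options = {"c1": 0.5, "c2": 0.3, "w": 0.9}
opt = ps.single.GlobalBestPSO(n_particles=n_particles, dimensions=ig.size,
                              options=options, bounds=(lb, ub),
                              init_pos=init_pos)
best_cost, best_pos = opt.optimize(objective, iters=100)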
Related
I have two related fitting parameters. They have the same fitting range. Let's call them r1 and r2. I know I can limit the fitting range using minuit.limits, but I have an additional criterion that r2 has to be smaller than r1. Can I do that in iminuit?
I've found this, I hope this can help you!
Extracted from: https://iminuit.readthedocs.io/en/stable/faq.html
**Can I have parameter limits that depend on each other (e.g. x^2 + y^2 < 3)?**
MINUIT was only designed to handle box constraints, meaning that the limits on the parameters are independent of each other and constant during the minimisation. If you want limits that depend on each other, you have three options (all with caveats), which are listed in increasing order of difficulty:
Change the variables so that the limits become independent. For example, transform from cartesian coordinates to polar coordinates for a circle. This is not always possible, of course.
Use another minimiser to locate the minimum which supports complex boundaries. The nlopt library and scipy.optimize have such minimisers. Once the minimum is found and if it is not near the boundary, place box constraints around the minimum and run iminuit to get the uncertainties (make sure that the box constraints are not too tight around the minimum). Neither nlopt nor scipy can give you the uncertainties.
Artificially increase the negative log-likelihood in the forbidden region. This is not as easy as it sounds.
The third method done properly is known as the interior point or barrier method. A glance at the Wikipedia article shows that one has to either run a series of minimisations with iminuit (and find a clever way of knowing when to stop) or implement this properly at the level of a Newton step, which would require changes to the complex and convoluted internals of MINUIT2.
Warning: you cannot just add a large value to the likelihood when the parameter boundary is violated. MIGRAD expects the likelihood function to be differentiable everywhere, because it uses the gradient of the likelihood to go downhill. The derivative at a discontinuous step is infinite, and it is zero in the forbidden region. MIGRAD does not like this at all.
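To make the first option concrete for your r1/r2 case: since both share the same range, you can fit r1 and the difference d = r1 - r2 instead, so that the constraint r2 < r1 becomes the simple box limit d > 0. A sketch, assuming the iminuit v2 API and a placeholder cost function:

from iminuit import Minuit

# placeholder least-squares cost in the transformed variables;
# r2 = r1 - d, so d >= 0 enforces r2 <= r1 as an ordinary box limit
def cost(r1, d):
    r2 = r1 - d
    return (r1 - 3.0) ** 2 + (r2 - 1.0) ** 2

m = Minuit(cost, r1=2.0, d=0.5)
m.errordef = Minuit.LEAST_SQUARES
m.limits["r1"] = (0, 10)   # the common fitting range
m.limits["d"] = (0, 10)    # d >= 0  <=>  r2 <= r1
# note: keeping r2 above its own lower bound may need a tighter upper limit on d
m.migrad()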
The fsolve function of the scipy library has an optional input argument, diag, which reads as follows:
diag: sequence, optional
N positive entries that serve as scale factors for the variables.
My question is, what are these scale variables?
It appears that it determines the step taken by the algorithm in each iteration during the search for the roots. Is that correct? If yes, it can be very useful for better convergence of the algorithm. Then, how do I determine the scale? I know the lower and upper bounds for my variables.
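To make my question concrete, this is how I would pass it; the scaling heuristic below (the reciprocal of each variable's typical magnitude, taken from my bounds) is just my guess at what diag is for:

import numpy as np
from scipy.optimize import fsolve

# toy system with two unknowns, just to illustrate passing diag
def equations(p):
    x, y = p
    return [x + y - 3.0, x * y - 2.0]

lb = np.array([0.0, 0.0])     # known lower bounds
ub = np.array([10.0, 100.0])  # known upper bounds (different magnitudes)

# guess: scale each variable by the reciprocal of its typical magnitude
# so that the scaled variables are all of order one
diag = 1.0 / np.maximum(np.abs(lb), np.abs(ub))

root = fsolve(equations, x0=[1.0, 1.0], diag=diag)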
I want to solve the following optimization problem with Python:
I have a black box function f with multiple variables as input.
The execution of the black box function is quite time consuming, therefore I would like to avoid a brute force approach.
I would like to find the optimum input parameters for that black box function f.
In the following, for simplicity I just write the dependency for one dimension x.
An optimum parameter x is defined as one that maximizes the cost function cost(x), a weighted sum of the value of f(x) and the maximum standard deviation of f(x):
cost(x) = A * f(x) + B * max(standardDeviation(f(x)))
The weights A and B are fixed.
E.g., for the picture below, the value of x at the position of 'U' would be preferred over the value of x at the position of 'V'.
My question is:
Is there any easily adaptable framework or process that I could utilize (similar to, e.g., simulated annealing or Bayesian optimisation)?
As mentioned, I would like to avoid a brute force approach.
I’m still not 100% sure of your approach, but does this formula ring true to you:
A * max(f(x)) + B * max(standardDeviation(f(x)))
?
If it does, then I guess you may want to consider that maximizing f(x) may (or may not) be compatible with maximizing the standard deviation of f(x), which means you may be facing a multi-objective optimization problem.
Again, you haven’t specified what f(x) returns - is it a vector? I hope it is, otherwise I’m unclear on what you can calculate the standard deviation on.
The picture you posted is not so obvious to me. f(x) is the entire black curve; it has a maximum at the point v, but what can you say about the standard deviation? To calculate the standard deviation of f(x), you have to take into account the entire f(x) curve (including the point u), not just the neighbourhood of u and v.

If you only want the standard deviation in an interval around a maximum of f(x), then I think you’re out of luck when it comes to frameworks. The best thing that comes to my mind is to use a local (or better, global) optimization algorithm to hunt for the maximum of f(x) - simulated annealing, differential evolution, tunnelling, and so on - and then, once you have found a maximum of f(x), sample a few points to the left and right of your optimum and calculate the standard deviation of these evaluations. Then you’ll have to decide whether the combination of the maximum of f(x) and this standard deviation is good enough compared to any previous “optimal” point found.
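Just to sketch that two-stage idea in code (the toy f, the window width and the weights below are placeholders):

import numpy as np
from scipy.optimize import differential_evolution

def f(x):
    # stand-in for the expensive black box
    return -(x - 2.0) ** 2 + np.sin(5.0 * x)

# stage 1: global search for a maximum of f (minimize -f within bounds)
res = differential_evolution(lambda p: -f(p[0]), bounds=[(-5.0, 5.0)], seed=1)
x_opt = res.x[0]

# stage 2: sample a few points around the optimum and take their spread
neighbours = x_opt + np.linspace(-0.5, 0.5, 11)
local_std = np.std([f(x) for x in neighbours])

A, B = 1.0, 1.0
cost = A * f(x_opt) + B * local_std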
This is all speculation, as I’m unsure whether your problem is really an optimization one or simply a “peak finding” exercise, for which there are many different - and more powerful and adequate - methods.
Andrea.
My problem at hand: I am using scipy curve_fit (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html) to fit a curve, but on many occasions the estimated parameters correspond to one of the many "local" minima rather than the "global" minimum. This is to be expected given how curve_fit was designed. Still, I really need my global minimum.
In order to find it, my initial hunch would be to try multiple initial starting points, run multiple curve_fit instances, and choose the one with the lowest fit error. However, I would suffer from a number of biases in my personal initial-guess estimates (and the number of combinations could be considerable, which would be detrimental to performance).
Do you happen to know of better, faster and/or methodologically sounder methods for how to proceed? (They do not need to be least-squares based; I can build ad hoc stuff if necessary.)
There are a couple of possible approaches. One would be to do a "brute force" search through your parameter space to find candidate starting points for the local solver in curve_fit. Another would be to use a global solver such as differential evolution. For sure, both of these can be much slower than a single curve_fit, but they do aim at finding "global minima". Within scipy.optimize, these methods are brute and differential_evolution, respectively. It should be noted that neither of these is actually a global optimizer, as both require upper and lower bounds for the search space of all parameters. Still, within those boundaries, they do attempt to find the best result, not just a local minimum close to your starting values.
A common approach is to use brute with medium-sized steps for each parameter, then take the best ten of those results and use Levenberg-Marquardt (via leastsq, as used in curve_fit) starting from each of them. Similarly, you can use differential_evolution and then refine.
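For example, the differential_evolution-then-refine route might look like this; the model, the synthetic data and the search box are only for illustration:

import numpy as np
from scipy.optimize import differential_evolution, curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)

rng = np.random.default_rng(0)
xdata = np.linspace(0.0, 4.0, 50)
ydata = model(xdata, 2.5, 1.3) + 0.05 * rng.normal(size=xdata.size)

# global stage: minimize the sum of squared residuals inside a search box
def sse(params):
    return np.sum((model(xdata, *params) - ydata) ** 2)

res = differential_evolution(sse, bounds=[(0, 10), (0, 5)], seed=0)

# local stage: refine with Levenberg-Marquardt, starting from the global result
popt, pcov = curve_fit(model, xdata, ydata, p0=res.x)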
You might find lmfit (https://lmfit.github.io/lmfit-py) helpful, as it allows you to set up the model once and switch between solvers, including brute, differential_evolution, and leastsq. Lmfit also makes it easy to fix some parameters or place upper/lower bounds on some parameters. It also provides a higher-level interface to model building and curve fitting, and methods to explore the confidence intervals of parameters.
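A sketch of that solver-switching pattern with lmfit, reusing a toy exponential model (the names and values are illustrative only):

import numpy as np
from lmfit import Parameters, minimize

def residual(params, x, y):
    a = params["a"].value
    b = params["b"].value
    return a * np.exp(-b * x) - y

params = Parameters()
params.add("a", value=1.0, min=0.0, max=10.0)  # bounds per parameter
params.add("b", value=1.0, min=0.0, max=5.0)

x = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * x)

# same model, different solvers: global first, then a local refinement
coarse = minimize(residual, params, args=(x, y), method="differential_evolution")
refined = minimize(residual, coarse.params, args=(x, y), method="leastsq")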
I'm very new to quantitative and scientific programming and I've run into the minimizer function of scipy, scipy.optimize.fmin. Can someone explain the basic intuition behind this function to a non-engineering student?
Let's say I want to minimize the following function:
def f(x): return x**2
1) What does the minimizer actually minimize? The dependent or independent variable?
2) What's the difference between scipy.optimize.fmin and scipy.optimize.minimize?
Given a function which contains some unknown parameters (so it is actually a family of functions) and data, a minimizer tries to find parameters which minimize the distance of the function values to the data. Colloquially speaking, this is done by iteratively adjusting the parameters until further changes do not seem to improve the result.
This is the equivalent of the ball running down a hill, mentioned by @pylang in the comments. The "hill" is the distance to the data, given all possible parameter values. The rolling ball is the minimizer which "moves" over that landscape, trying out parameters until it is in a position where every move would lead to an increasing distance to the data, or at least no notable decrease.
Note, however, that with this method you are searching for a local minimum of the distance between the function values and the data, given a set of parameters to the function. For a simple function like the one you posted, the local minimum is the only one and therefore also the global one, but for complex functions involving many parameters the problem can quickly get quite tricky.
People then often use multiple runs of the minimizer to see if it stops at the same position. If that is not the case, people say the minimizer fails to converge, which means the function is so complex that a single minimum is not easily found. There are a lot of algorithms to counter that; simulated annealing and Monte Carlo methods come to my mind.
To your function: the function f mentioned in the example in the help of the fmin function is the distance function. It tells you how far a certain set of parameters puts you from your target. Now you must define what distance means for you. Commonly, the sum of squared residuals (also called the Euclidean norm) is used:
sum((function values - data points)^2)
Say you have a function
def f(x, a, b): return a*x**2 + b
You want to find values for a and b such that your function comes as closely as possible to the data points given below with their respective x and y values:
datax = [ 0, 1, 2, 3, 4]
datay = [ 2, 3, 5, 9, 15]
Then if you use the euclidean norm, your distance function is (this is the function f in the fmin help)
def dist(params):
    a, b = params
    return sum((f(x, a, b) - y)**2 for x, y in zip(datax, datay))
You should be able (sorry, I have no scipy on my current machine, will test it tonight) to minimize to get fitting values of a and b using
import scipy.optimize
res = scipy.optimize.fmin(dist, x0 = (0,0))
Note that you need starting values x0 for your parameters a and b. These are the values which you would choose at random if you ran the minimizer multiple times to see whether it converges to the same result.
Imagine a ball rolling down a hill. This is an optimization problem. The idea of the "minimizer" is to find the value at which the gradient/slope (the derivative) becomes zero; in other words, you're finding where the ball eventually rests. Optimization problems can get more interesting and complicated, particularly if the ball rolls into a saddle point or a local minimum (not the global minimum), or along a level ridge or plane, where a minimum is nearly impossible to find. For example, consider the famous Rosenbrock function (see image), a 2D surface where finding the valley is simple, but locating the minimum is difficult.
The fmin and minimize functions are nearly equivalent for the simplex algorithm. However, fmin is less robust for complex functions. minimize is generalized to other algorithms, e.g. "Nelder-Mead" (simplex), "Powell", "CG", etc. These algorithms are just different approaches for "jostling" or helping the ball down the hill faster towards the minimum. Moreover, supplying a Jacobian and a Hessian matrix as parameters to the minimize function improves computation efficiency. See the Scipy docs for more on how these functions are used.
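To see the near-equivalence concretely, here is a small sketch comparing the two on the Rosenbrock function mentioned above; both calls use the Nelder-Mead simplex method:

from scipy.optimize import fmin, minimize

def rosen(p):
    x, y = p
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2

# legacy simplex interface
xopt_old = fmin(rosen, x0=[-1.2, 1.0])

# generalized interface; the method argument selects the same simplex algorithm
res = minimize(rosen, x0=[-1.2, 1.0], method="Nelder-Mead")
print(xopt_old, res.x)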