I'm very new to quantitative and scientific programming and I've run into scipy's minimizer function, scipy.optimize.fmin. Can someone explain the basic intuition of this function to a non-engineering student?
Let's say I want to minimize the following function:
def f(x): return x**2
1) What does the minimizer actually minimize? The dependent or independent variable?
2) What's the difference between scipy.optimize.fmin and scipy.optimize.minimize?
Given a function which contains some unknown parameters (so it is actually a family of functions) and some data, a minimizer tries to find parameter values which minimize the distance of the function values to the data. Colloquially speaking, this is done by iteratively adjusting the parameters until further changes do not seem to improve the result.
This is equivalent to the ball running down a hill, mentioned by @pylang in the comment. The "hill" is the distance to the data over all possible parameter values. The rolling ball is the minimizer, which "moves" over that landscape, trying out parameters until it reaches a position where every move would increase the distance to the data, or at least not decrease it notably.
Note, however, that with this method you are searching for a local minimum of the distance between the function values and the data, given a set of parameters to the function. For a simple function like the one you posted, the local minimum is the only one and therefore also the global one, but for complex functions involving many parameters this problem can quickly get quite tricky.
People then often run the minimizer multiple times to see whether it stops at the same position. If that is not the case, people say the minimizer fails to converge, which means the function is too complex for a single minimum to be found easily. There are many algorithms to counter that; simulated annealing and Monte Carlo methods come to my mind.
To your function: the function f mentioned in the example in the help of the fmin function is the distance function. It tells you how far a given set of parameters puts you from your target. Now you must define what distance means for you. Commonly, the sum of squared residuals (also called the Euclidean norm) is used:
sum((function values - data points)^2)
Say you have a function
def f(x, a, b): return a*x**2 + b
You want to find values for a and b such that your function comes as close as possible to the data points given below, with their respective x and y values:
datax = [ 0, 1, 2, 3, 4]
datay = [ 2, 3, 5, 9, 15]
Then, if you use the Euclidean norm, your distance function (this is the function f in the fmin help) is:
def dist(params):
    a, b = params
    return sum((f(x, a, b) - y)**2 for x, y in zip(datax, datay))
You should be able (sorry, I have no scipy on my current machine, will test it tonight) to minimize this and get fitting values of a and b using
import scipy.optimize
res = scipy.optimize.fmin(dist, x0 = (0,0))
Note that you need starting values x0 for your parameters a and b. These are the values you would choose at random if you run the minimizer multiple times to check whether it converges to the same result.
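For completeness, here is how the whole thing fits together as a single runnable snippet (untested here, for the reason above, but it should be close):

import numpy as np
import scipy.optimize

datax = np.array([0, 1, 2, 3, 4])
datay = np.array([2, 3, 5, 9, 15])

def f(x, a, b):
    return a * x**2 + b

def dist(params):
    # sum of squared residuals between the model and the data
    a, b = params
    return sum((f(x, a, b) - y)**2 for x, y in zip(datax, datay))

# x0 is the starting guess for (a, b)
res = scipy.optimize.fmin(dist, x0=(0, 0))
print(res)  # fitted values of a and b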
Imagine a ball rolling down a hill. This is an optimization problem. The idea of the "minimizer" is to find the value at which the gradient/slope (derivative) becomes zero; in other words, you're finding where the ball eventually comes to rest. Optimization problems get more interesting and complicated, particularly if the ball rolls into a saddle point or a local minimum (not the global minimum), or along a level ridge or plane, where a minimum is nearly impossible to find. For example, consider the famous Rosenbrock function (see image), a 2D surface where finding the valley is simple but locating the minimum is difficult.
The fmin and minimize functions are nearly equivalent for the simplex algorithm. However, fmin is less robust for complex functions. minimize is generalized to other algorithms, e.g. "Nelder-Mead" (simplex), "Powell", "CG", etc. These algorithms are just different approaches for "jostling" or helping the ball down the hill faster towards the minimum. Moreover, supplying a Jacobian or Hessian matrix to the minimize function can improve computational efficiency. See the SciPy docs for more on how these functions are used.
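As a rough sketch (assuming SciPy is installed; scipy.optimize ships the Rosenbrock function and its gradient as rosen and rosen_der), the two interfaces can be compared like this:

import numpy as np
from scipy.optimize import fmin, minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# fmin always uses the Nelder-Mead simplex algorithm
xopt = fmin(rosen, x0, disp=False)

# minimize exposes many methods; Nelder-Mead here is equivalent to fmin,
# while BFGS with an analytic gradient (jac) typically needs fewer evaluations
res_nm = minimize(rosen, x0, method="Nelder-Mead")
res_bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print(xopt, res_nm.x, res_bfgs.x)  # all should land near [1, 1]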
I'm attempting to write a simple implementation of the Newton-Raphson method in Python. I've already done so using the SymPy library; however, what I'm working on now will (ultimately) end up running in an environment where only Numpy is available.
For those unfamiliar with the algorithm, it works (in my case) as follows:
I have a system of symbolic equations, which I "stack" to form a matrix F. The unknowns are X, Y, Z, T (which I wish to determine). Some additional values are initially unknown until they are passed to my solver, which substitutes these now-known values for variables in the symbolic expressions.
Now, the Jacobian matrix (J) of F is computed. This, too, is a matrix of symbolic expressions.
Now I iterate over some range (max_iter). With each iteration, I form a matrix A by substituting the current estimates (starting with some initial values) for the unknowns X, Y, Z, T in F. Similarly, I form a matrix b by substituting the current estimates for X, Y, Z, T.
I then obtain new estimates by solving the matrix equation Ax = b for x. This vector x holds dT, dX, dY, dZ. I then add these to the current estimates for T, X, Y, Z and iterate again.
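For reference, a bare-bones NumPy sketch of that loop (F and jacobian are hypothetical callables returning arrays, not anything I already have) might look like:

import numpy as np

def newton_raphson(F, jacobian, x0, max_iter=50, tol=1e-10):
    # x holds the current estimates of (X, Y, Z, T)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        A = jacobian(x)              # numeric Jacobian evaluated at x
        b = -np.asarray(F(x))        # negative residual vector
        dx = np.linalg.solve(A, b)   # solve A * dx = b for the update
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x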
Thus far, I've found my largest issue to be computing the Jacobian matrix. I only need to do this once; however, it will differ depending upon the coefficients fed to the solver (they are not unknowns, but they are only known once fed to the solver, so I can't simply hard-code the Jacobian).
While I'm not terribly familiar with Numpy, I know that it offers numpy.gradient. I'm not sure, however, that this is the same as SymPy's .jacobian.
How can the Jacobian matrix be found, either in "pure" Python, or with Numpy?
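One possibility, sketched here under the assumption that a forward finite-difference approximation is acceptable (the function name is illustrative, not a Numpy routine):

import numpy as np

def numerical_jacobian(F, x, eps=1e-6):
    # F maps an n-vector to an m-vector; returns the m x n matrix of dF_i/dx_j
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(F(x), dtype=float)
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += eps
        J[:, j] = (np.asarray(F(x_step), dtype=float) - f0) / eps
    return J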
EDIT:
Should it be useful to you, more information on the problem can be found [here]. It can be formulated in a few different ways; however, (as of now) I'm writing it as 4 equations of the form:
$\sqrt{(X - x_i)^2 + (Y - y_i)^2 + (Z - z_i)^2} = c \cdot (t_i - T)$
Where X,Y,Z and T are unknown.
This describes the solution to a localization problem, where we know (a) the location of n >= 4 observers in a 3-dimensional space, (b) the time at which each observer "saw" some signal, and (c) the velocity of the signal. The goal is to determine the coordinates of the signal source X,Y,Z (and, as a side effect, the time of emission, T).
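Purely for illustration (all names are hypothetical), the residual vector corresponding to these equations, which the Newton iteration would drive to zero, could be written as:

import numpy as np

def residuals(params, observers, times, c):
    # observers: (n, 3) array of positions, times: (n,) arrival times, c: signal speed
    X, Y, Z, T = params
    dists = np.sqrt(((observers - np.array([X, Y, Z])) ** 2).sum(axis=1))
    return dists - c * (times - T)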
Notice that I've tried (many) other approaches to solving this problem, and all leads point toward a combination of Newton-Raphson with regression.
I want to find the minimum of a function y = f(x) in Python.
Problem: the solver tries to compute the gradient with extremely close x values (delta x around 1e-8), and my function f is not sensitive to such a small step (i.e., y only visibly varies for delta x around 1e-1).
Hence the gradient is 0 as far as the solver is concerned, and it cannot find the proper solution.
I've tried the following solvers from scipy, but I can't find the option I'm looking for:
scipy.optimize.minimize
scipy.optimize.fmin
In Matlab's fmincon there is an option that does the job, 'DiffMinChange': the minimum change in variables for finite-difference gradients (a positive scalar).
You may want to try and use L-BFGS-B from scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
And provide the “epsilon” parameter to be around 0.1 or 0.05 and see if that improves things. I am of course assuming that you will let the solver compute the gradient for you by numerical differentiation (i.e., you pass fprime=None and approx_grad=True to the routine).
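A minimal sketch of that call, with a placeholder objective standing in for your real, step-insensitive function:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    # placeholder: a function that only responds to coarse changes in x
    return float(np.sum((np.round(x, 1) - 3.0) ** 2))

x0 = np.array([0.0])
# approx_grad=True lets the solver build the gradient by finite differences;
# epsilon sets the finite-difference step (much larger than the 1e-8 default)
x_opt, f_opt, info = fmin_l_bfgs_b(f, x0, approx_grad=True, epsilon=0.05)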
I personally despise the “minimize” interface to various solvers so I prefer to deal with the actual solvers themselves.
I want to solve the following optimization problem with Python:
I have a black box function f with multiple variables as input.
The execution of the black box function is quite time consuming, therefore I would like to avoid a brute force approach.
I would like to find the optimum input parameters for that black box function f.
In the following, for simplicity I just write the dependency for one dimension x.
An optimum parameter x is defined as one that maximizes the cost function cost(x), which is the weighted sum of
- the value of f(x), and
- a maximum of the standard deviation of f(x):
cost(x) = A * f(x) + B * max(standardDeviation(f(x)))
The parameters A and B are fixed.
E.g., for the picture below, the value of x at position 'U' would be preferred over the value of x at position 'V'.
My question is:
Is there any easily adaptable framework or process that I could utilize (similar to, e.g., simulated annealing or Bayesian optimisation)?
As mentioned, I would like to avoid a brute force approach.
I’m still not 100% sure of your approach, but does this formula ring true to you:
A * max(f(x)) + B * max(standardDeviation(f(x)))
?
If it does, then I guess you may want to consider that maximizing f(x) may (or may not) be compatible with maximizing the standard deviation of f(x), which means you may be facing a multi-objective optimization problem.
Again, you haven’t specified what f(x) returns - is it a vector? I hope it is, otherwise I’m unclear on what you would calculate the standard deviation of.
The picture you posted is not so obvious to me. f(x) is the entire black curve; it has a maximum at the point v, but what can you say about the standard deviation? To calculate the standard deviation you have to take into account the entire f(x) curve (including the point u), not just the neighborhood of u and v.

If you only want the standard deviation in an interval around a maximum of f(x), then I think you’re out of luck when it comes to frameworks. The best thing that comes to my mind is to use a local (or, better, global) optimization algorithm to hunt for the maximum of f(x) - simulated annealing, differential evolution, tunnelling, and so on - and then, once you have found a maximum of f(x), sample a few points to the left and right of your optimum and calculate the standard deviation of those evaluations. Then you’ll have to decide whether the combination of the maximum of f(x) and this standard deviation is good enough compared to any previous “optimal” point found.
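A rough sketch of that two-step idea, with a throwaway placeholder for the black-box f and an assumed 1-D search interval:

import numpy as np
from scipy.optimize import differential_evolution

def f(x):
    # placeholder for the expensive black-box function (scalar input here)
    return float(np.exp(-(x[0] - 2.0) ** 2) + 0.5 * np.exp(-(x[0] + 1.0) ** 2))

A, B = 1.0, 1.0
bounds = [(-5.0, 5.0)]

# step 1: global hunt for a maximum of f (negate because the solver minimizes)
res = differential_evolution(lambda x: -f(x), bounds)
x_star = res.x

# step 2: sample a small neighbourhood around the optimum and measure the spread
neighbours = [f(x_star + np.array([dx])) for dx in np.linspace(-0.2, 0.2, 9)]
cost = A * f(x_star) + B * np.std(neighbours)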
This is all speculation, as I’m unsure whether your problem is really an optimization one or simply a “peak finding” exercise, for which there are many different - and more powerful and adequate - methods.
Andrea.
I am trying to fit a simulation to empirical data. Given the number of parameters of the model, brute force is impossible. What resources are available to fit a simulation?
The simulation is a Python function (not to be mistaken for a mathematical function) which outputs a list. I want this list to be as close as possible to another list (the empirical data).
I don't think scipy.optimize works well here because the objective is not a mathematical function but a simulation (it is impossible to write it down as a formula). Brute force would require about 5000 simulations, which is impractical.
def sim(conta=0.2, recov=0.01, D=600, sig=3, risk_aversion=0.05, over_conf=0.05,
        power_narr=5, length=125, n_k=0.997,
        shocks=[8]+[0]*5+[8]+[0]*5+[15]+[0]*5+[40]+[0]*5+[40]+[0]*5,
        no_len=25, u=[0.35, 0.35, 0.15, 0.15], w=[1, 1, 0.1, 0.1], ø=0.9):
    """Those are the parameters of the simulation; some are floats, others lists."""
    """
    simulation going on here
    """
    return my_list
I want to make this list fit another list by varying the parameters.
I expect the output of this fit to be the list of the best parameters for the simulation.
Of course you can use scipy optimize, and actually many other much more robust libraries than the ones mentioned in other answers, such as Mystic (https://github.com/uqfoundation/mystic) or lmfit (https://github.com/lmfit/lmfit-py), just to mention a couple.
Whether your objective function is a mathematical function, a result from a simulation, the output of an external program or the outcome of a French-fries making machine is irrelevant. The only questions you should ask yourself are:
How expensive is it to evaluate the objective function? If it takes half a second, then even brute force is OK. I have run scipy and Mystic against external programs (reservoir simulators), and each function evaluation can easily require hours.
How likely is it that your objective function has many, many local optima?
The answers to these two questions may steer you towards a specific set of solvers, i.e., local optimization (faster, but you run the risk of getting stuck in a local minimum) or global optimization (takes more function evaluations to explore the search space, but may give you a better fit).
That said, your objective function can easily be rewritten to make it a target for an optimization algorithm:
import numpy

def sim(x, my_target_array):
    # unpack the simulation parameters from x and run the calculation here,
    # producing my_list (the simulated output)
    return ((numpy.array(my_list) - my_target_array)**2).sum()
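For instance, a toy end-to-end version of that idea (the two-parameter simulation and the data below are purely illustrative stand-ins) could look like:

import numpy as np
import scipy.optimize

empirical = np.array([1.0, 2.0, 3.0, 4.0])   # stand-in for the empirical data list

def run_simulation(conta, recov):
    # stand-in for the real, expensive simulation returning a list
    return [conta * i + recov for i in range(len(empirical))]

def objective(params):
    conta, recov = params
    my_list = run_simulation(conta, recov)
    return ((np.array(my_list) - empirical) ** 2).sum()

# Nelder-Mead is derivative-free, which suits noisy simulation output
res = scipy.optimize.minimize(objective, x0=[0.2, 0.01], method="Nelder-Mead")
print(res.x)   # best-fitting (conta, recov)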
Andrea.
You can use stochastic methods:
Stochastic gradient descent (an example)
MCMC Markov Chain Monte Carlo with PyMC3 (an example)
In any case you need a parametric model. It will surely help if you add it to your question.
I am trying to implement least squares:
I have: $y = \theta \omega$
The least squares solution is $\omega = (\theta^{T}\theta)^{-1}\theta^{T}y$
I tried:
import numpy as np
def least_squares1(y, tx):
    """calculate the least squares solution."""
    w = np.dot(np.linalg.inv(np.dot(tx.T, tx)), np.dot(tx.T, y))
    return w
The problem is that this method quickly becomes unstable (for small problems it's okay).
I realized this when I compared the result to this least squares calculation:
import numpy as np
def least_squares2(y, tx):
    """calculate the least squares solution."""
    a = tx.T.dot(tx)
    b = tx.T.dot(y)
    return np.linalg.solve(a, b)
Compare both methods:
I tried to fit data with a polynomial of degree 12 [1, x,x^2,x^3,x^4...,x^12]
First method: (plot not shown)
Second method: (plot not shown)
Do you know why the first method diverges for large polynomials?
P.S. I only added "import numpy as np" for your convenience, in case you want to test the functions.
There are three points here:
One is that it is generally better (faster, more accurate) to solve linear equations rather than to compute inverses.
The second is that it's always a good idea to use what you know about a system of equations (e.g. that the coefficient matrix is positive definite) when computing a solution; in this case you should use numpy.linalg.lstsq.
The third is more specifically about polynomials. When using monomials as a basis, you can end up with a very poorly conditioned coefficient matrix, and this will mean that numerical errors tend to be large. This is because, for example, the vectors x->pow(x,11) and x->pow(x,12) are very nearly parallel. You would get a more accurate fit, and be able to use higher degrees, if you were to use a basis of orthogonal polynomials, for example https://en.wikipedia.org/wiki/Chebyshev_polynomials or https://en.wikipedia.org/wiki/Legendre_polynomials
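To illustrate the first and third points, here is a small sketch with synthetic data (since the original data isn't given): solve the least-squares problem directly instead of inverting, and do the same fit in a Chebyshev basis:

import numpy as np

x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.01 * np.random.randn(x.size)

# monomial (Vandermonde) basis of degree 12, solved without an explicit inverse
tx = np.vander(x, 13, increasing=True)
w, *_ = np.linalg.lstsq(tx, y, rcond=None)

# the same degree-12 fit in a Chebyshev basis, which is far better conditioned
cheb = np.polynomial.chebyshev.Chebyshev.fit(x, y, deg=12)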
I am going to improve on what was said before; I answered this yesterday.
The problem with higher-order polynomials is something called Runge's phenomenon. The reason the previous answer resorted to orthogonal polynomials (such as Hermite polynomials) is that they attempt to get rid of the Gibbs phenomenon, an adverse oscillatory effect that appears when Fourier series methods are applied to non-periodic signals.
You can sometimes improve the conditioning by resorting to regularization methods if the matrix is low rank, as I did in the other post. Other issues may be due to the smoothness properties of the vector.
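As one concrete (and purely illustrative) form of regularization, ridge/Tikhonov damping can be added to the normal equations from the question:

import numpy as np

def ridge_least_squares(y, tx, lam=1e-6):
    """Solve (tx^T tx + lam * I) w = tx^T y; lam damps the ill-conditioned directions."""
    a = tx.T.dot(tx) + lam * np.eye(tx.shape[1])
    b = tx.T.dot(y)
    return np.linalg.solve(a, b)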