I am trying to fit a simulation to empirical data, given the number of parameters of the model brute force is impossible. What are the ressources available to fit a simulation?
The simulations is a python fonction (not to be be mistaken with math function) which outputs a list. I want this list to be as close as possible to an other list (empirical data).
I don't think scipy.optimize works well because it is not a mathematical function but a simulation (impossible to give a function of it). Brute force will require about 5000 simulations which is impractical.
def sim(conta = 0.2, recov = 0.01, D = 600, sig = 3, risk_aversion = 0.05, over_conf = 0.05, power_narr = 5, length = 125, n_k = 0.997, shocks = [8]+[0]*5+[8]+[0]*5+[15]+[0]*5+[40]+[0]*5+[40]+[0]*5, no_len = 25, u = [0.35,0.35,0.15,0.15], w = [1,1,0.1,0.1], ΓΈ = 0.9 ):
"""those are the parameters of the simulation, some are floats, others lists"""
"""
simulation going on here
"""
return my_list
i want to make this list fit another list by varying the parameters
I expect the output of this fit to be the list of the best parameters of the simulation.
Of course you can use scipy optimize, and actually many other much more robust libraries than the ones mentioned in other answers, such as Mystic (https://github.com/uqfoundation/mystic) or lmfit (https://github.com/lmfit/lmfit-py), just to mention a couple.
Whether your objective function is a mathematical function, a result from a simulation, an output of an external program or the outcome of the French-fries making machine is irrelevant. The only questions you should ask yourself are:
Is expensive is it to evaluate the objective function? If it takes half a second, then even brute force is ok. I have run scipy and Mystic against external programs (reservoir simulators) and each function evaluation can easily require hours.
How likely is it that your objective function has many, many local optima?
The answers to these two questions may steer you towards a specific set of solvers: I.e., local optimization (faster but you run the risk of getting a local minimum) or global optimization (takes more function evaluations to explore the search space but may give you a better fit).
That said, your objective function can easily be rewritten to make it a target for an optimization algorithm:
def sim(x, my_target_array):
# calculation stuff here
return ((numpy.array(my_list) - my_target_array)**2).sum()
Andrea.
You can use stochastic methods:
Stochastic gradient descent (an example)
MCMC Markov Chain Monte Carlo with PyMC3 (an example)
In any case you need a parametric model. It will be helpful shurely when you write it into your question.
Related
I'm relatively new to using SciPy; I'm currently using it to minimize a cost function for a multi-layer-perceptron model. I can't use scikit-learn because I need to have the ability to set the coefficients (they are read-only in the MLPClassifer) and add random permutations and noise to any and all parameters. I haven't finished the implementation quite yet, but I am confused about the parameters required for the minimize function.
For example, I have a function that I have written to calculate the "cost" (energy to minimize) of the function, and it calculates the gradient at the same time. That's nothing special as it's common practice. However, when calling scipy.optimize.minimize, it asks for two different functions: one that returns the scalar that is to be minimized (i.e., the cost in my case) and one that calculates the gradient of the current state. Example:
j,grad = myCostFunction(X,y)
Unless I am mistaken, it seems that it would need to call my function twice, with each call needing to be specified to return either the cost or the gradient, like so:
opt = scipy.optimize.minimize(fun=myJFunction, jac=myGradFunction, args = args,...)
Isn't this a waste of computation time? My data set will be > 1 million samples and 10ish features, so reducing redundant computation would be preferred since I will be training and retraining this thing tens of thousands of times for my project.
Another point of confusion is with the args input. Are the arguments passed like this:
# This is what I expect happens
myJFunction(x0,*args)
myGradFunction(x0,*args)
or like this:
# This is what I wish it did
myJFunction(x0,arg0,arg1,arg2)
myGradFunction(x0,arg3,arg4,arg5)
Thanks in advance!
After doing some experimentation and searching, I found the answers to my own questions.
While I can't say for sure about the scipy.optimize.minimize function, using other optimization functions (for example, scipy.optimize.fmin_tnc) explicitly states that the callable function func can either (1) return both the energy and the gradient, (2) return the energy and specify the gradient function for that parameter fprime (slower), or (3) return only the energy and have the function estimate the gradient through perturbation (much slower).
See the docs here: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fmin_tnc.html
I was very happy to see that I could use only one function to return both parameters. I assume it is the same case for the minimize function, but I have not tested it to be sure (See Edit 1)
As for my second question, if you specify two different functions, the *args parameters are passed to both functions the same; you cannot specify individual parameters for both.
EDIT 1: Reading through the minimize documentation more, if the parameter jac is set to True, then the optimizer assumes that the func returns energy and gradient. Reading the docs thoroughly is helpful, it seems.
I'm very new in quantitative and scientifical programming and I've run into the minimizer function of scipy scipy.optimize.fmin. Can someone explain me the basic intuition of this function for a non-engineering student?
Letz say I want to minimize following function:
def f(x): x**2
1) What does the minimizer actually minimize? The dependent or independent variable?
2) What's the difference between scipy.optimize.fmin and scipy.optimize.minimize?
Given a function which contains some unknown parameters (so actually it is a family of functions) and data, a minimizer tries to find parameters which minimize the distance of the function values to data. This is done colloquially speaking by iteratively adjusting the parameters until further change does not seem to improve the result.
This is the equivalent to the ball running down a hill, mentioned by #pylang in the comment. The "hill" is the distance to the data, given all possible parameter values. The rolling ball is the minimizer which "moves" over that landscape, trying out parameters until it is in a position where every move would lead to increasing distance to the data or at least no notable decrease.
Note however, that by this method you are searching for a local minimum of the function values to the data, given a set of parameters to the function. For a simple function like you posted, the local minimum is the only one and therefore the global one, but for complex functions involving many parameters this problem quickly can get quite tricky.
People then often use multiple runs of the minimizer to see if it stops at the same positions. If that is not the case, people say the mimimizer fails to converge, which means the function is too complex that one minimum is easily found. There are a lot of algorithms to counter that, simulated annealing or Monte Carlo methods come to my mind.
To your function: The function f which is mentioned in the example in the help of the fmin function is the distance function. It tells you how far a certain set of parameters puts you with respect to your target. Now you must define what distance means for you. Commonly, the sum of squared residuals (also called euclidean norm) is used:
sum((function values - data points)^2)
Say you have a function
def f(x, a, b): return a*x**2 + b
You want to find values for a and b such that your function comes as closely as possible to the data points given below with their respective x and y values:
datax = [ 0, 1, 2, 3, 4]
datay = [ 2, 3, 5, 9, 15]
Then if you use the euclidean norm, your distance function is (this is the function f in the fmin help)
def dist(params):
a, b = params
return sum((f(x,a,b) - y)**2 for x,y in zip(datax, datay))
You should be able (sorry, I have no scipy on my current machine, will test it tonight) to minimize to get fitting values of a and b using
import scipy.optimize
res = scipy.optimize.fmin(dist, x0 = (0,0))
Note that you need starting values x0 for your parameters a and b. These are the values which you choose randomly if you run the minimizer multiple times to see whether it converges.
Imagine a ball rolling down a hill. This is an optimization problem. The idea of the "minimizer" is to find the value that reduces the gradient/slope or derivative. In other words, you're finding where the ball eventually rests. Optimization problems can get more interesting and complicated, particularly if the ball rolls into a saddlepoint or local minimum (not the global minimum) or along a level ridge or plane, for which a minimum is nearly impossible to find. For example, consider the famous Rosenbrock function (see image), a 2D surface where finding the valley is simple, but locating the minimum is difficult.
The fmin and minimize functions are nearly equivalent for the simplex algorithm. However, fmin is less robust for complex functions. minimize is generalized for other algorithms, e.g. "Nelder-Mead" (simplex), "Powell", "CG", etc. These algorithms are just different approaches for "jostling" or helping the ball down the hill faster towards the minimum. Moreover, supplying a Jacobian and Hessian matrix as parameters to the minimum function improves computation efficiency. See the Scipy docs for more on how these functions are used.
I'm using scipy.optimize.minimize to find the minimum of a 4D function that is rather sensitive to the initial guess used. If I vary it a little bit, the solution will change considerably.
There are many questions similar to this one already in SO (e.g.: 1, 2, 3), but no real answer.
In an old question of mine, one of the developers of the zunzun.com site (apparently no longer online) explained how they managed this:
Zunzun.com uses the Differential Evolution genetic algorithm (DE) to find initial parameter estimates which are then passed to the Levenberg-Marquardt solver in scipy. DE is not actually used as a global optimizer per se, but rather as an "initial parameter guesser".
The closest I've found to this algorithm is this answer where a for block is used to call the minimizing function many times with random initial guesses. This generates multiple minimized solutions, and finally the best (smallest value) one is picked.
Is there something like what the zunzun dev described already implemented in Python?
There is no general answer for such question, as a problem of minimizing arbitrary function is impossible to solve. You can do better or worse on particular classes of functions, thus it is rather a domain for mathematician, to analyze how your function probably looks like.
Obviously you can also work with dozens of so called "meta optimizers", which are just bunch of heuristics, which might (or not) work for you particular application. Those include random sampling starting point in a loop, using genetic algorithms, or - which is as far as I know most mathematically justified approach - using Bayesian optimization. In general the idea is to model your function in the same time when you try to minimize it, this way you can make informed guess where to start next time (which is level of abstraction higher than random guessing or using genetic algorithms/differential evolution). Thus, I would order these methods in following way
grid search / random sampling - uses no information from previous runs, thus - worst results
genetic approach, evolutionary, basin-hooping, annealing - use information from previous runs as a (x, f(x)) pairs, for limited period of time (generations) - thus average results
Bayesian optimization (and similar methods) - use information from all previous experiences through modeling of the underlying function and performing sampling selection based on expected improvement - best results (at the cost of most complex methods)
I'm looking to minimize a function with potentially random outputs. Traditionally, I would use something from the scipy.optimize library, but I'm not sure if it'll still work if the outputs are not deterministic.
Here's a minimal example of the problem I'm working with:
def myfunction(self, a):
noise = random.gauss(0, 1)
return abs(a + noise)
Any thoughts on how to algorithmicly minimizes its expected (or average) value?
A numerical approximation would be fine, as long as it can get "relatively" close to the actual value.
We already reduced noise by averaging over many possible runs, but the function is a bit computationally expensive and we don't want to do more averaging if we can help it.
It turns out that for our application using scipy.optimize anneal algorithm provided a good enough estimate of the local maximum.
For more complex problems, pjs points out that Waeber, Frazier and Henderson (2011) link provides a better solution.
I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using the numpy powell optimization. The psuedo code looks something like this.
def value(param):
run_program(param)
#Parse output
return value
scipy.optimize.fmin_powell(value,param)
This works great; however, it is incredibly slow as each iteration of the program can take days to run. What I would like to do is coarse grain parallelize this. So instead of running a single iteration at a time it would run (number of parameters)*2 at a time. For example:
Initial guess: param=[1,2,3,4,5]
#Modify guess by plus minus another matrix that is changeable at each iteration
jump=[1,1,1,1,1]
#Modify each variable plus/minus jump.
for num,a in enumerate(param):
new_param1=param[:]
new_param1[num]=new_param1[num]+jump[num]
run_program(new_param1)
new_param2=param[:]
new_param2[num]=new_param2[num]-jump[num]
run_program(new_param2)
#Wait until all programs are complete -> Parse Output
Output=[[value,param],...]
#Create new guess
#Repeat
Number of variable can range from 3-12 so something such as this could potentially speed up the code from taking a year down to a week. All variables are dependent on each other and I am only looking for local minima from the initial guess. I have started an implementation using hessian matrices; however, that is quite involved. Is there anything out there that either does this, is there a simpler way, or any suggestions to get started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is found. Only analytic derivatives are available. What is a good way of going about this, is there something built already that does this, is there other options?
Thank you for your time.
As a small update I do have this working by calculating simple parabolas through the three points of each dimension and then using the minima as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
Current best implementation is parallelizing the inner loop of powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply not a concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important or the need of results pressing I will likely be content letting it take up a node for awhile.
I had the same problem while I was in the university, we had a fortran algorithm to calculate the efficiency of an engine based on a group of variables. At the time we use modeFRONTIER and if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE and there where some algorithms to generate the DOE to best fit your problem. After that we would run the single DOE entries parallely and an algorithm would "watch" the development of the optimizations showing the current best design.
Side note: If you don't have a cluster and needs more computing power HTCondor may help you.
Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
A more modern method is SPSA which is a stochastic method and doesn't require derivatives. SPSA requires much fewer evaluations of the goal function for the same rate of convergence than the finite difference approximation to conjugate gradients, for somewhat well-behaved problems.
There are two ways of estimating gradients, one easily parallelizable, one not:
around a single point, e.g. (f( x + h directioni ) - f(x)) / h;
this is easily parallelizable up to Ndim
"walking" gradient: walk from x0 in direction e0 to x1,
then from x1 in direction e1 to x2 ...;
this is sequential.
Minimizers that use gradients are highly developed, powerful, converge quadratically (on smooth enough functions).
The user-supplied gradient function
can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method,
see Numerical Recipes p. 509.
So I'm confused: how do you parallelize its inner loop ?
I'd suggest scipy fmin_tnc
with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw,
this
compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
I think what you want to do is use the threading capabilities built-in python.
Provided you your working function has more or less the same run-time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 result, run your optimisation algo to change the params with 8 results, repeat.... profit ?
If I haven't gotten wrong what you are asking, you are trying to minimize your function one parameter at the time.
you can obtain it by creating a set of function of a single argument, where for each function you freeze all the arguments except one.
Then you go on a loop optimizing each variable and updating the partial solution.
This method can speed up by a great deal function of many parameters where the energy landscape is not too complex (the dependency between the parameters is not too strong).
given a function
energy(*args) -> value
you create the guess and the function:
guess = [1,1,1,1]
funcs = [ lambda x,i=i: energy( guess[:i]+[x]+guess[i+1:] ) for i in range(len(guess)) ]
than you put them in a while cycle for the optimization
while convergence_condition:
for func in funcs:
optimize fot func
update the guess
check for convergence
This is a very simple yet effective method of simplify your minimization task. I can't really recall how this method is called, but A close look to the wikipedia entry on minimization should do the trick.
You could do parallel at two parts: 1) parallel the calculation of single iteration or 2) parallel start N initial guessing.
On 2) you need a job controller to control the N initial guess discovery threads.
Please add an extra output on your program: "lower bound" that indicates the output values of current input parameter's decents wont lower than this lower bound.
The initial N guessing thread can compete with each other; if any one thread's lower bound is higher than existing thread's current value, then this thread can be dropped by your job controller.
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires about half the square of the number of dimensions of your matrix, and all can be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.