Optimization in scipy with shared computation in constraints - python

I am trying to do non-linear constrained optimization of an objective function in scipy.
My problem is that I have many constraints that share intermediate results. Something like:
def constraint1_i(x):
    T = f_i(x)
    return g(T)

def constraint2_j(x):
    T = f_j(x)
    return h(T)
where i and j run from 1 to n.
In other words, each constraint runs f on x to get an intermediate value needed to compute the constraint. The same f_i(x) gets computed twice unnecessarily (once in constraint1_i and once in constraint2_i, for every i)!
Is there a way to somehow share computations across constraints to avoid the double calculation?
Note: this question is somewhat similar, but is more specific (and also has no answer).
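One common workaround (not from the original post; a minimal sketch assuming the intermediate result depends only on x) is to memoize f on the last x seen, so every constraint evaluated at the same point reuses one cached value:

import numpy as np

class CachedIntermediate:
    """Wraps an expensive function f so repeated calls with the
    same x (e.g. from several constraints) reuse one evaluation."""
    def __init__(self, f):
        self.f = f
        self._last_x = None
        self._last_T = None

    def __call__(self, x):
        x = np.asarray(x)
        if self._last_x is None or not np.array_equal(x, self._last_x):
            self._last_x = x.copy()
            self._last_T = self.f(x)   # the shared computation
        return self._last_T

# Hypothetical usage with the question's placeholder names f_i, g, h:
# cached_f_i = CachedIntermediate(f_i)
# constraint1_i = lambda x: g(cached_f_i(x))
# constraint2_i = lambda x: h(cached_f_i(x))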

Related

How to obtain multiple solutions of a binary LP problem using Google's OR-Tools in Python?

I am new to integer optimization. I am trying to solve the following large (although not that large) binary linear optimization problem:
max_{x} x_1+x_2+...+x_n
subject to: A*x <= b ; x_i is binary for all i=1,...,n
As you can see:

- the control variable is a vector x of length, say, n=150; x_i is binary for all i=1,...,n
- I want to maximize the sum of the x_i's
- in the constraint, A is an n x n matrix and b is an n x 1 vector, so I have n=150 linear inequality constraints.
I want to obtain a certain number of solutions, NS. Say, NS=100. (I know there is more than one solution, and there are potentially millions of them.)
I am using Google's OR-Tools for Python. I was able to write the problem and to obtain one solution. I have tried many different ways to obtain more solutions after that, but I just couldn't. For example:
I tried using the SCIP solver, and then I used the value of the objective function at the optimum, call it V, to add another constraint, x_1+x_2+...+x_n >= V, on top of the original "Ax<=b," and then used the CP-SAT solver to find NS feasible vectors (I followed the instructions in this guide). There is no optimization in this second step, just a search for feasibility. This didn't work: the solver produced NS replicas of the same vector. Still, when asked for the number of solutions found, it misleadingly reports that solution_printer.solution_count() is equal to NS. Here's a snippet of the code that I used:
# Define the constraints (A and b are lists)
for j in range(n):
    constraint_expr = [int(A[j][l]) * x[l] for l in range(n)]
    model.Add(sum(constraint_expr) <= int(b[j][0]))

V = 112
constraint_obj_val = [-x[l] for l in range(n)]
model.Add(sum(constraint_obj_val) <= -V)  # equivalent to x_1+...+x_n >= V

# Call the solver:
solver = cp_model.CpSolver()
solution_printer = VarArraySolutionPrinterWithLimit(x, NS)
solver.parameters.enumerate_all_solutions = True
status = solver.Solve(model, solution_printer)
I tried using the SCIP solver and then calling solver.NextSolution(), but each call produced a progressively less optimal vector: the first corresponded to a value of, say, V=112 (the optimal one!); the second vector corresponded to a value of 111; the third, to 108; the fourth through sixth, to 103; etc.
My question is, unfortunately, a bit vague, but here it goes: what's the best way to obtain more than one solution to my optimization problem?
Please let me know if I'm not being clear enough or if you need more/other chunks of the code, etc. This is my first time posting a question here :)
Thanks in advance.
Is your matrix A integral? If not, you are not solving the same problem with SCIP and CP-SAT.
Furthermore, why use SCIP at all? You should solve both parts with the same solver.
Also, I believe the default solution-pool implementation in SCIP will return all solutions found, in reverse order, thus in decreasing quality order.
In Gurobi, you can do something like this to get more than one optimal solution:
solver->SetSolverSpecificParametersAsString("PoolSearchMode=2"); // or-tools [Gurobi]
From the Gurobi Reference [Section 20.1]:

By default, the Gurobi MIP solver will try to find one proven optimal solution to your model. You can use the PoolSearchMode parameter to control the approach used to find solutions. In its default setting (0), the MIP search simply aims to find one optimal solution. Setting the parameter to 1 causes the MIP search to expend additional effort to find more solutions, but in a non-systematic way. You will get more solutions, but not necessarily the best solutions. Setting the parameter to 2 causes the MIP to do a systematic search for the n best solutions. For both non-default settings, the PoolSolutions parameter sets the target for the number of solutions to find.
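The same parameter can also be set from Python; a sketch, assuming a Gurobi-backed pywraplp solver is available (a Gurobi licence is required), with the parameter string mirroring the C++ line above:

from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("GUROBI")
if solver is not None:
    # Search systematically for the n best solutions and
    # keep up to 100 of them in the solution pool.
    solver.SetSolverSpecificParametersAsString("PoolSearchMode=2 PoolSolutions=100")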
Another way to find multiple optimal solutions could be to first solve the original problem to optimality and then add the objective function as a constraint with lower and upper bound as the optimal objective value.
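A minimal CP-SAT sketch of that last idea (not the poster's code): pin the objective to the known optimum V with an equality, then enumerate distinct feasible vectors. The tiny A, b, n, V, NS below are toy stand-ins for the question's data:

from ortools.sat.python import cp_model

# Toy stand-ins for the question's A, b, n, V, NS:
n, NS = 4, 5
A = [[1, 1, 0, 0], [0, 0, 1, 1]]
b = [[1], [2]]
V = 3   # known optimal objective value

class SolutionCollector(cp_model.CpSolverSolutionCallback):
    def __init__(self, variables, limit):
        super().__init__()
        self._vars = variables
        self._limit = limit
        self.solutions = []

    def on_solution_callback(self):
        self.solutions.append([self.Value(v) for v in self._vars])
        if len(self.solutions) >= self._limit:
            self.StopSearch()

model = cp_model.CpModel()
x = [model.NewBoolVar(f"x{i}") for i in range(n)]
for j in range(len(A)):
    model.Add(sum(int(A[j][l]) * x[l] for l in range(n)) <= int(b[j][0]))
model.Add(sum(x) == V)   # pin the objective; no Maximize() call

solver = cp_model.CpSolver()
solver.parameters.enumerate_all_solutions = True
collector = SolutionCollector(x, NS)
solver.Solve(model, collector)
print(collector.solutions)   # distinct 0/1 vectors, up to NS of them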

Retrieve symbolic function from cost or constraint in pydrake

I apparently have a complicated enough constraint function that it takes over 1 minute just to call prog.AddConstraint(), presumably because it's spending a long time constructing the symbolic function (although if you have other insights why it takes so long I would appreciate it). What I would like to do is pull out the symbolic constraint function and cache it so that once it's created, I don't need to wait 1 minute and it can just load it from disk.
My issue is I don't know how to access that function. I can clearly see the expression when I print it, for example in this simplified example:
from pydrake.all import MathematicalProgram
prog = MathematicalProgram()
x = prog.NewContinuousVariables(2, "x")
c = prog.AddConstraint(x[0] * x[1] == 1)
print(c)
I get the output:
ExpressionConstraint
0 <= (-1 + (x(0) * x(1))) <= 0
I realize I could parse that string and pull out the part that represents the function, but it seems like there should be a better way to do it? I'm thinking I could pull out the expression function and upper/lower bounds and use that subsequently in my AddConstraint calls.
In my real use-case, I need to apply the same constraint to multiple timesteps in a trajectory, so I think it would be helpful to create the symbolic function once when I call AddConstraint for the first timestep and then all subsequent timesteps shouldn't have to re-create the symbolic function, I can just use the cached version and apply it to the relevant variables. The expression involves a Cholesky decomposition of a covariance matrix so there are a lot of constraints being applied.
Any help is greatly appreciated.
I would strongly encourage you to write this constraint using a function evaluation instead of a symbolic expression. For example, if you want to impose the constraint lb <= my_evaluator(x) <= ub, you can call it this way:

def my_evaluator(x):
    # Do Cholesky decomposition and other things to evaluate the
    # constraint; return the evaluation result.
    return result

prog.AddConstraint(my_evaluator, lb, ub, x)
Adding a constraint as a symbolic expression is convenient when your constraint is linear in the decision variables; otherwise it is better to avoid symbolic expressions when adding constraints. (Evaluating a symbolic expression, especially one involving a Cholesky decomposition, is really time-consuming.)
For more details on adding generic nonlinear constraints, you could refer to our tutorial.
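To illustrate the advice for the multi-timestep case (a sketch; my_evaluator, the bounds, and the number of timesteps are illustrative), the same Python callable can be bound to each timestep's variables without rebuilding anything symbolic:

import numpy as np
from pydrake.all import MathematicalProgram

def my_evaluator(x):
    # Illustrative stand-in for the expensive constraint evaluation.
    return np.array([x[0] * x[1]])

prog = MathematicalProgram()
N = 10   # number of timesteps (illustrative)
xs = [prog.NewContinuousVariables(2, f"x{k}") for k in range(N)]
for k in range(N):
    # The callable is reused as-is; AddConstraint stays cheap because
    # no symbolic expression tree is ever constructed.
    prog.AddConstraint(my_evaluator, lb=[1.0], ub=[1.0], vars=xs[k])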

Numerical issue with SCIP: same problem, different optimal values

I'm having the following numerical issue while using the SCIP solver as a callable library via PySCIPOpt: two equivalent and almost identical models for the same problem yield different optimal values and optimal solutions, with the optimal values having a relative difference of order 1e-6.
An independent piece of software verified that the solutions yielded by both models are feasible for the original problem and that their true values agree with the optimal values reported by SCIP for each model. Until the appearance of this instance a bunch of instances of the same problem had been solved, with both models always agreeing on their optimal solutions and values.
Is it possible to modify the numerical precision of SCIP for comparing the values of primal solutions among them and against dual bounds? Which parameters should be modified to overcome this numerical difficulty?
What I tried
I've tried the following things and the problem persisted:
- Turning presolving off with model.setPresolve(SCIP_PARAMSETTING.OFF).
- Setting model.setParam('numerics/feastol', epsilon) with different values of epsilon.
- Since the feasible sets and objective functions agree (see description below), I've checked that the actual coefficients of the objective functions agree by calling model.getObjective() and comparing coefficients for equality for each variable appearing in the objective function.
The only thing that seemed to help was to add some noise (multiplying by numbers of the form 1+eps with small eps) to the coefficients in the objective function of the model yielding the worse solution. This makes SCIP yield the same (and the better) solution for both models if eps is within a certain range.
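For reference, the attempts above look roughly like this in PySCIPOpt; the epsilon values are illustrative, and numerics/epsilon and numerics/sumepsilon are further SCIP tolerance parameters that could be worth experimenting with:

from pyscipopt import Model, SCIP_PARAMSETTING

model = Model()
model.setPresolve(SCIP_PARAMSETTING.OFF)      # turn presolving off
model.setParam('numerics/feastol', 1e-9)      # primal feasibility tolerance
model.setParam('numerics/epsilon', 1e-12)     # absolute comparison tolerance
model.setParam('numerics/sumepsilon', 1e-9)   # tolerance for sums of values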
Just in case, this is what I get with scip -v in the terminal:
SCIP version 6.0.2 [precision: 8 byte] [memory: block] [mode:
optimized] [LP solver: SoPlex 4.0.2] [GitHash: e639a0059d]
Description of the models
Model (I) has approximately 30K binary variables, say X[i] for i in some index set I. It's feasible and not a hard MIP.
Model (II) has the same variables as model (I) plus ~100 continuous variables, say Y[j] for j in some index set J. Model (II) also has some constraints like X[i_1] + X[i_2] + ... + X[i_n] <= Y[j].
Both objective functions agree and depend only on X[i] variables, the sense is minimization. Note that variables Y[j] are essentially free in model (II), since they are continuous and they do not appear in the objective function. Obviously, there is no point in including the Y[j] variables, but the optimal values shouldn't be different.
Model (II) is the one yielding the better (i.e. smaller) value.
Sorry for the late answer.
So, in general, it can definitely happen that any MIP solver reports different optimal solutions for formulations that are slightly perturbed (even if they are mathematically the same), due to the use of floating-point arithmetic.
Your problem does seem a bit weird though. You say the Y-variables are free, i.e. they do not have any variable bounds attached to them? If this is the case, I would be very surprised if they don't get presolved away.
If you are still interested in this problem, could you provide your instance files to me and I will look at them?

Fit a simulation - python

I am trying to fit a simulation to empirical data. Given the number of parameters of the model, brute force is impossible. What resources are available to fit a simulation?
The simulation is a Python function (not to be confused with a mathematical function) which outputs a list. I want this list to be as close as possible to another list (the empirical data).
I don't think scipy.optimize works well here because the objective is not a closed-form mathematical function but a simulation (it is impossible to write it as one). Brute force would require about 5000 simulations, which is impractical.
def sim(conta=0.2, recov=0.01, D=600, sig=3, risk_aversion=0.05,
        over_conf=0.05, power_narr=5, length=125, n_k=0.997,
        shocks=[8]+[0]*5+[8]+[0]*5+[15]+[0]*5+[40]+[0]*5+[40]+[0]*5,
        no_len=25, u=[0.35, 0.35, 0.15, 0.15], w=[1, 1, 0.1, 0.1], ø=0.9):
    """Those are the parameters of the simulation; some are floats, others lists."""
    # simulation going on here
    return my_list
I want to make this list fit another list by varying the parameters.
I expect the output of this fit to be the list of the best parameters of the simulation.
Of course you can use scipy.optimize, and actually many other, much more robust libraries than the ones mentioned in other answers, such as Mystic (https://github.com/uqfoundation/mystic) or lmfit (https://github.com/lmfit/lmfit-py), just to mention a couple.
Whether your objective function is a mathematical function, a result from a simulation, an output of an external program or the outcome of the French-fries making machine is irrelevant. The only questions you should ask yourself are:
1. How expensive is it to evaluate the objective function? If it takes half a second, then even brute force is OK. I have run scipy and Mystic against external programs (reservoir simulators), and each function evaluation can easily require hours.
2. How likely is it that your objective function has many, many local optima?
The answers to these two questions may steer you towards a specific set of solvers: local optimization (faster, but you run the risk of getting only a local minimum) or global optimization (takes more function evaluations to explore the search space but may give you a better fit).
That said, your objective function can easily be rewritten to make it a target for an optimization algorithm:
import numpy

def sim(x, my_target_array):
    # run the simulation here to produce my_list
    return ((numpy.array(my_list) - my_target_array) ** 2).sum()
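As a usage sketch of that rewrite, the sum-of-squares objective can be handed to one of scipy's global optimizers (the toy simulation, the bounds, and the choice of differential_evolution are all illustrative):

import numpy as np
from scipy.optimize import differential_evolution

empirical = np.array([1.0, 4.0, 9.0, 16.0])   # stand-in empirical data

def toy_sim(a, b):
    # Stand-in for the real simulation: returns a list.
    return [a * t**2 + b for t in range(1, 5)]

def objective(params):
    a, b = params
    return ((np.array(toy_sim(a, b)) - empirical) ** 2).sum()

result = differential_evolution(objective, bounds=[(0, 5), (-2, 2)])
print(result.x)   # best-fitting (a, b), close to (1, 0)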
Andrea.
You can use stochastic methods:

- Stochastic gradient descent (an example)
- MCMC (Markov chain Monte Carlo) with PyMC3 (an example)

In any case you need a parametric model; it will surely help if you add it to your question.

Parallel many dimensional optimization

I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using scipy's Powell optimization. The pseudocode looks something like this:
def value(param):
    run_program(param)
    # Parse the program's output into a scalar objective
    return parsed_value

scipy.optimize.fmin_powell(value, param)
This works great; however, it is incredibly slow, as each iteration of the program can take days to run. What I would like to do is coarse-grained parallelization: instead of running a single iteration at a time, it would run (number of parameters)*2 at a time. For example:
# Initial guess
param = [1, 2, 3, 4, 5]
# Modify the guess by plus/minus a step vector that can change each iteration
jump = [1, 1, 1, 1, 1]
# Modify each variable plus/minus jump.
for num, a in enumerate(param):
    new_param1 = param[:]
    new_param1[num] = new_param1[num] + jump[num]
    run_program(new_param1)
    new_param2 = param[:]
    new_param2[num] = new_param2[num] - jump[num]
    run_program(new_param2)
# Wait until all programs are complete -> parse output
# Output = [[value, param], ...]
# Create a new guess
# Repeat
The number of variables can range from 3 to 12, so something like this could potentially speed up the code from taking a year down to a week. All variables are dependent on each other, and I am only looking for local minima from the initial guess. I have started an implementation using Hessian matrices; however, that is quite involved. Is there anything out there that already does this, is there a simpler way, or do you have any suggestions for getting started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is found? No analytic derivatives are available. What is a good way of going about this, is there something already built that does this, and are there other options?
Thank you for your time.
As a small update, I do have this working by fitting simple parabolas through the three points along each dimension and then using the minimum as the next guess. This seems to work decently, but it is not optimal. I am still looking for additional options.
The current best implementation parallelizes the inner loop of Powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply no concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important nor the need for results pressing, I will likely be content letting it take up a node for a while.
I had the same problem while I was at university; we had a Fortran algorithm to calculate the efficiency of an engine based on a group of variables. At the time we used modeFRONTIER and, if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE (design of experiments), and there were some algorithms to generate the DOE to best fit your problem. After that we would run the single DOE entries in parallel, and an algorithm would "watch" the development of the optimizations showing the current best design.
Side note: if you don't have a cluster and need more computing power, HTCondor may help you.
Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow, but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think that in general, if you are using finite-difference approximations to the derivatives, you are much better off using conjugate gradients than Newton's method.
A more modern method is SPSA, which is a stochastic method and doesn't require derivatives. For somewhat well-behaved problems, SPSA requires far fewer evaluations of the goal function than a finite-difference approximation to conjugate gradients for the same rate of convergence.
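For illustration, a bare-bones SPSA iteration looks like this (a sketch; the gain-sequence constants follow common recommendations but would need tuning for a real problem):

import numpy as np

def spsa_minimize(f, x0, n_iter=500, a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """Each iteration estimates the gradient from just two evaluations
    of f, regardless of the number of dimensions."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        ak = a / k ** alpha                 # step-size gain
        ck = c / k ** gamma                 # perturbation gain
        delta = np.random.choice([-1.0, 1.0], size=x.shape)
        ghat = (f(x + ck * delta) - f(x - ck * delta)) / (2 * ck * delta)
        x -= ak * ghat
    return x

print(spsa_minimize(lambda x: ((x - 3.0) ** 2).sum(), np.zeros(4)))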
There are two ways of estimating gradients, one easily parallelizable, one not:

- around a single point, e.g. (f(x + h*direction_i) - f(x)) / h; this is easily parallelizable up to N_dim
- "walking" gradient: walk from x0 in direction e0 to x1, then from x1 in direction e1 to x2, ...; this is sequential.

Minimizers that use gradients are highly developed, powerful, and converge quadratically (on smooth enough functions). The user-supplied gradient function can of course be a parallel gradient estimator.
A few minimizers use "walking" gradients, among them Powell's method; see Numerical Recipes p. 509. So I'm confused: how do you parallelize its inner loop?
I'd suggest scipy's fmin_tnc with a parallel gradient estimator, maybe using central, not one-sided, differences.
(Fwiw, this compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
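A sketch of such a parallel gradient estimator (the process pool, the step size, and the toy objective are illustrative stand-ins for the poster's external program):

import numpy as np
from multiprocessing import Pool
from scipy.optimize import fmin_tnc

def f(x):
    # Toy objective; stands in for running and parsing the external program.
    return ((np.asarray(x) - 1.0) ** 2).sum()

def parallel_gradient(f, x, pool, h=1e-6):
    """Central-difference gradient: the 2 * N_dim evaluations are
    independent, so they can all run at once."""
    x = np.asarray(x, dtype=float)
    points = []
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        points.extend([x + e, x - e])
    vals = pool.map(f, points)
    return np.array([(vals[2 * i] - vals[2 * i + 1]) / (2 * h)
                     for i in range(len(x))])

if __name__ == "__main__":
    with Pool() as pool:
        xopt, nfeval, rc = fmin_tnc(
            f, np.zeros(5),
            fprime=lambda x: parallel_gradient(f, x, pool))
        print(xopt)   # ~[1, 1, 1, 1, 1]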
I think what you want to do is use the threading capabilities built into Python. Provided your working function has more or less the same run-time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 results, run your optimization algo to change the params using those 8 results, repeat... profit?
If I haven't misunderstood what you are asking, you are trying to minimize your function one parameter at a time.
You can obtain this by creating a set of functions of a single argument, where for each function you freeze all the arguments except one.
Then you loop, optimizing each variable and updating the partial solution.
This method can greatly speed up the minimization of functions of many parameters where the energy landscape is not too complex (i.e., the dependency between the parameters is not too strong).
Given a function

energy(*args) -> value

you create the guess and the functions:

guess = [1, 1, 1, 1]
funcs = [lambda x, i=i: energy(*(guess[:i] + [x] + guess[i+1:]))
         for i in range(len(guess))]
Then you put them in a while loop for the optimization:

while not converged:
    for func in funcs:
        optimize func
        update the guess
    check for convergence
This is a very simple yet effective method of simplifying your minimization task. I can't really recall what this method is called (it is essentially coordinate descent), but a close look at the Wikipedia entry on minimization should do the trick.
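A runnable version of that scheme, using scipy's scalar minimizer for each frozen coordinate (the quadratic energy and the fixed sweep count are toy stand-ins):

from scipy.optimize import minimize_scalar

def energy(*args):
    # Toy energy with its minimum at (1, 2, 3, 4).
    return sum((a - t) ** 2 for a, t in zip(args, (1, 2, 3, 4)))

guess = [0.0, 0.0, 0.0, 0.0]
for sweep in range(10):               # simple stand-in for a convergence test
    for i in range(len(guess)):
        f_i = lambda x: energy(*(guess[:i] + [x] + guess[i + 1:]))
        guess[i] = minimize_scalar(f_i).x   # optimize one coordinate
print(guess)   # ~[1, 2, 3, 4]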
You could parallelize at two levels: 1) parallelize the calculation of a single iteration, or 2) start N initial guesses in parallel.
For 2) you need a job controller to manage the N initial-guess discovery threads.
Please add an extra output to your program: a "lower bound" indicating that descents from the current input parameters won't produce an output value below it.
The N initial-guess threads can then compete with each other; if any one thread's lower bound is higher than another thread's current value, that thread can be dropped by your job controller.
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires a number of evaluations roughly equal to half the square of the problem's dimension, and they can all be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.
