I am solving a minimization linear program using COIN-OR's CLP solver through PuLP in Python.
The variables included in the problem are a subset of all the possible variables, and sometimes my pricing heuristic picks a subset that results in an infeasible problem, after which I use shadow prices to price new variables in.
My question is: if the problem is infeasible, I still get values from calling prob.constraints[c].pi, but those values don't always seem to be valid or meaningful.
Now, a solver like Gurobi won't even let me call the shadow prices after an infeasible solve.
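To illustrate, here is a stripped-down sketch of what I am doing (toy model; my real constraints come from the pricing step, and I see the same pi behavior with my CLP setup):

```python
import pulp

# Toy infeasible LP standing in for my restricted master problem
prob = pulp.LpProblem("rmp", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0)
prob += x >= 4, "demand"
prob += x <= 2, "capacity"   # contradicts "demand", so the LP is infeasible
prob += x                    # objective

prob.solve()
print(pulp.LpStatus[prob.status])   # reports "Infeasible"

# These calls still return numbers, but are they meaningful here?
for name, c in prob.constraints.items():
    print(name, c.pi)
```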
Actually Stu, this might work! The "dummy var" in my case could be the source/sink node, on which I can loosen the flow constraints, allowing infinite flow in/out but at a large cost. This makes the solution feasible, with a very bad (high) optimal cost; the pricing of the new variables should then work and show me which variables to add to the problem on the next iteration. I'll give it a try and report back. My only concern is that the big-M cost coefficient on the source/sink node may skew the pricing of the variables, making all of them look relatively attractive. This would be counterproductive, because adding most of the variables back into the problem would defeat the purpose of my column generation in the first place. I'll test it...
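For the record, a rough sketch of the shape I have in mind (toy data and names; the real network and costs are much larger):

```python
import pulp

# Toy network; in column generation only a subset of arcs is present so far.
nodes = ["s", "a", "t"]
arcs = [("s", "a")]                   # arc (a, t) has not been priced in yet
cost = {("s", "a"): 1.0}
demand = {"s": -10, "a": 0, "t": 10}  # net demand (>0) / supply (<0) per node

BIG_M = 1e4  # penalty; too large and every column may price out as attractive

prob = pulp.LpProblem("restricted_master", pulp.LpMinimize)
flow = pulp.LpVariable.dicts("flow", arcs, lowBound=0)
# Artificial in/out flow at each node: keeps the restricted problem feasible
# even when the current columns cannot satisfy all balance constraints.
spill = pulp.LpVariable.dicts("spill", nodes, lowBound=0)

for n in nodes:
    inflow = pulp.lpSum(flow[a] for a in arcs if a[1] == n)
    outflow = pulp.lpSum(flow[a] for a in arcs if a[0] == n)
    prob += inflow - outflow + spill[n] >= demand[n], f"balance_{n}"

prob += pulp.lpSum(cost[a] * flow[a] for a in arcs) \
        + BIG_M * pulp.lpSum(spill[n] for n in nodes)

prob.solve()
# Duals are now well defined and can drive the pricing step.
duals = {n: prob.constraints[f"balance_{n}"].pi for n in nodes}
print(pulp.LpStatus[prob.status], duals)
```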
I am learning Pyomo abstract modeling from a book.
I have an example with an objective function [equation shown in the original post] that minimizes the cost of establishing warehouses at optimal locations to meet delivery demands.
The authors modeled the objective with this script [shown in the original post].
In the script, "model.d" is a Param and "model.x" is a Var.
Why did the authors use Param for "model.d" and Var for "model.x"?
Please spare some time to help me understand this.
Not only in Pyomo but in general, in operations research and optimization, a parameter is a given value that you know prior to solving the problem. A variable, on the other hand, is a value that you find by solving the problem in order to get the best solution.
Suppose that in your problem model.d is the cost of constructing warehouse model.x. This means that for each potential warehouse x, constructing it costs d. This assumes that if you are building a warehouse, you know the capital cost of constructing it, so it is known before solving the problem; therefore model.d is a parameter. model.x is a variable since you don't know whether to construct it or not; you want the model to tell you that, therefore it is a variable.
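To make that concrete, here is a stripped-down sketch of a warehouse model in that spirit (my illustration, not the book's exact script):

```python
import pyomo.environ as pyo

model = pyo.AbstractModel()
model.W = pyo.Set()                              # candidate warehouse sites
model.d = pyo.Param(model.W)                     # data: known building cost per site
model.x = pyo.Var(model.W, within=pyo.Binary)    # decision: build the site or not

# The objective multiplies known data (d) by unknown decisions (x)
def obj_rule(m):
    return sum(m.d[w] * m.x[w] for w in m.W)

model.obj = pyo.Objective(rule=obj_rule, sense=pyo.minimize)
```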
I am using Google's OR-Tools SCIP (Solving Constraint Integer Programs) solver to solve a mixed-integer programming problem in Python. The problem is a variant of the standard scheduling problem, with constraints ensuring that each worker works at most once per day and that every shift is covered by exactly one worker. The problem is modeled as follows [formulation shown as an image in the original post],
where n represents the worker, d the day, and i the specific shift within a day.
The problem comes when I change the objective function that I want to minimize from the first form to the second [both objectives shown as images in the original post].
In the first case an optimal solution is found within 5 seconds. In the second case, the optimal solution had still not been reached after 20 minutes of running. Any ideas as to why this happens?
How can I change the objective function without impacting performance this much?
Here is a sample of the values taken by the variables tier and acceptance used in the objective function [shown in the original post].
You should ask the SCIP team.
Have you tried using the SAT backend with 8 threads?
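For reference, switching backends looks roughly like this with the MPSolver wrapper (a sketch, assuming a pure-integer model, which the SAT backend requires):

```python
from ortools.linear_solver import pywraplp

# CP-SAT backend instead of SCIP; only valid if all variables are integer.
solver = pywraplp.Solver.CreateSolver("SAT")
solver.SetNumThreads(8)

# ... declare the same variables and constraints as with the SCIP backend ...
x = solver.IntVar(0, 1, "x")
solver.Minimize(x)
status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    print(x.solution_value())
```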
The only thing that I can spot from reading your post is that the objective function is no longer pure integer after adding acceptance. If you know that your objective is always integral, that helps during the solve, since you can also round up all your dual bounds. This might be critical for your problem class.
Maybe you could also post SCIP logs (preferably with statistics) of the two runs?
I tried to solve a MILP problem using Python PuLP and the solution is infeasible, so I want to find where the infeasibility comes from and relax or remove it to obtain a feasible solution. It is difficult to check manually in the LP file because a large number of constraints are present. How can I handle this issue?
I went through some articles; they suggest checking the LP file manually, but that is very difficult to do for a huge number of variables/constraints.
The solver just reports infeasibility.
In general, this is not so easy. Some pointers:
If you can construct a feasible but not necessarily optimal solution for your problem, plug this in and you will find the culprits very easily.
Some advanced solvers have tools that can help (an IIS finder, a conflict refiner). They may or may not point to the real problem.
Note that the model can be LP infeasible or just integer infeasible.
In some cases it is possible just to relax a suspect block of constraints and see what happens.
A more structural approach I often use is to formulate an elastic model: allow constraints to be violated, but at a cost. This often makes some economic sense: hire temp workers, rent extra capacity, buy from third parties, etc. A sketch follows below.
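A minimal PuLP sketch of the elastic idea (numbers made up; the slack variable absorbs any violation at a penalty):

```python
import pulp

prob = pulp.LpProblem("elastic_demo", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0, upBound=5)   # capacity makes x >= 10 impossible

# Elastic version of the hard constraint x >= 10
viol = pulp.LpVariable("viol", lowBound=0)
prob += x + viol >= 10, "demand"

PENALTY = 100   # e.g. the cost of a temp worker or rented capacity
prob += 2 * x + PENALTY * viol

prob.solve()
print(pulp.value(viol))   # a positive value flags the constraint that had to give
```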
I use some rules of thumb to check infeasibility.
Always start with a small data set that you can inspect by hand.
Then relax all integer variables. If this relaxation is infeasible, your problem is linear infeasible; you might have constraints saying things like x > 3 and x < 2.
If the linear relaxation is feasible, then deactivate each constraint in turn (a sketch follows below). Frequently you will find some obvious constraint being infeasible, such as sum(i, x_i) = 1. But by deactivating them one by one, you may also find that another, more complex constraint set is causing the infeasibility, and there you can investigate further.
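A rough PuLP sketch of that deactivation loop (toy model; popping and re-inserting a constraint is one way to "deactivate" it):

```python
import pulp

# Tiny infeasible model: the two constraints contradict each other.
prob = pulp.LpProblem("demo", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0)
prob += x >= 3, "lower"
prob += x <= 2, "upper"
prob += x   # objective

for name in list(prob.constraints):
    dropped = prob.constraints.pop(name)    # deactivate one constraint
    prob.solve()
    if pulp.LpStatus[prob.status] == "Optimal":
        print(f"dropping '{name}' restores feasibility")
    prob.constraints[name] = dropped        # restore it before the next trial
```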
I'm using scipy.optimize.minimize to find the minimum of a 4D function that is rather sensitive to the initial guess used. If I vary it a little bit, the solution will change considerably.
There are many questions similar to this one already in SO (e.g.: 1, 2, 3), but no real answer.
In an old question of mine, one of the developers of the zunzun.com site (apparently no longer online) explained how they managed this:
Zunzun.com uses the Differential Evolution genetic algorithm (DE) to find initial parameter estimates which are then passed to the Levenberg-Marquardt solver in scipy. DE is not actually used as a global optimizer per se, but rather as an "initial parameter guesser".
The closest thing I've found to this algorithm is this answer, where a for loop is used to call the minimization function many times with random initial guesses. This generates multiple minimized solutions, and finally the best (smallest-value) one is picked.
Is there something like what the zunzun dev described already implemented in Python?
There is no general answer to such a question, as the problem of minimizing an arbitrary function is impossible to solve in general. You can do better or worse on particular classes of functions, so it is rather a domain for a mathematician to analyze how your function probably looks.
Obviously you can also work with dozens of so-called "meta-optimizers", which are just bunches of heuristics that might (or might not) work for your particular application. Those include randomly sampling starting points in a loop, using genetic algorithms, or (which is, as far as I know, the most mathematically justified approach) using Bayesian optimization. In general, the idea is to model your function at the same time as you try to minimize it; this way you can make an informed guess about where to start next time (which is a level of abstraction higher than random guessing or using genetic algorithms/differential evolution). Thus, I would order these methods in the following way:
grid search / random sampling - uses no information from previous runs, thus the worst results
genetic approaches, evolutionary methods, basin hopping, annealing - use information from previous runs as (x, f(x)) pairs over a limited horizon (generations), thus average results
Bayesian optimization (and similar methods) - uses information from all previous evaluations by modeling the underlying function and selecting samples based on expected improvement, thus the best results (at the cost of being the most complex method)
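As a concrete starting point, scipy itself now ships scipy.optimize.differential_evolution, so the DE-then-local-polish pattern from the quote can be sketched as below (the 4-D objective is a made-up test function; substitute your own):

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def f(p):
    # Made-up 4-D test function with a somewhat sensitive landscape
    target = np.array([1.0, -2.0, 0.5, 3.0])
    return np.sum((p - target) ** 2 * (1.0 + np.sin(p) ** 2))

bounds = [(-5.0, 5.0)] * 4

# Global stage: differential evolution finds a good region ...
coarse = differential_evolution(f, bounds, seed=0)

# ... local stage: a gradient-based solver polishes the estimate.
result = minimize(f, coarse.x, method="L-BFGS-B", bounds=bounds)
print(result.x, result.fun)
```

Note that differential_evolution already runs an L-BFGS-B polishing step by default (polish=True), so the explicit second stage above mainly makes the two-phase pattern visible.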
I'm a data analysis student and I'm starting to explore genetic algorithms at the moment. I'm trying to solve a problem with a GA, but I'm not sure about the formulation of the problem.
Basically I have the state of a variable being 0 or 1 (0 means it's in the normal range of values, 1 means it's in a critical state). When the state is 1 I can apply 3 solutions (let's call them solutions A, B and C), and for each solution I know the time when the solution was applied and the time when the state of the variable went back to 0.
So for the problem I have a data set with critical events (state 1), the solution applied, the time interval (in minutes) from the critical event to the application of the solution, and the time interval (in minutes) from the application of the solution until the state went back to 0.
I want to use a genetic algorithm to find which solution is best and fastest for a critical event, and if possible to rank the solutions obtained, so that if in the future one solution can't be applied I can always apply the second best, for example.
I'm thinking of developing the solution in Python since I'm new to GAs.
Edit: Specifying the problem (responding to AMack)
Yes, it's more or less that, but with some nuances. For example, function A can be more suitable for making the variable go to F, but because other problems exist with the variable, more than one solution gets applied. So in the data I receive for an event of V, sometimes 3 or 4 functions are applied, but only 1 or 2 of them are specialized for the problem I want to analyze. My objective is to build decision support for which solution to use when a given problem appears. But the optimal solution can be more than one, because for some events function A acts very fast, while in another case of the same event function A doesn't produce a fast response and function C is better. So in the end I intend a solution that indicates the best solutions for the problem, not only the fastest, because the fastest in the majority of cases is sometimes not the fastest for the same issue with a different background.
I'm unsure of what your question is, but here are the elements you need for any GA:
A population of initial "genomes"
A ranking function
Some form of mutation and crossover within the genome
Reproduction
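A toy skeleton wiring those elements together (the bit-string genome and the count-the-ones fitness are stand-ins; yours would encode solutions and score response time):

```python
import random

GENOME_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.02

def fitness(genome):                      # the ranking function
    return sum(genome)

def mutate(genome):                       # point mutation
    return [1 - g if random.random() < MUT_RATE else g for g in genome]

def crossover(a, b):                      # single-point crossover
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

# Population of initial genomes
pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP_SIZE // 2]        # selection of the fitter half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children              # reproduction

print(max(pop, key=fitness))
```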
If a critical event is always the same, your GA should work very well. That said, if you have a different critical event but the same genome, you will run into trouble. GAs evolve functions towards the best possible solution for a set of conditions. If you constantly re-run the GA so that it can adapt to each unique situation, you will get a greater degree of adaptability, but you will have a speed issue.
You have a distinct advantage using Python because string manipulation (which you'll probably use for the genome) is easy. However...
Python is slow.
If the genome is short, the initial population is small, and there are very few generations, this shouldn't be a problem. You possibly lose better solutions that way, but it will be significantly faster.
have fun...
You should take a look at the GARAGe group at Michigan State. They are a GA research group with a fair number of resources in terms of theory, papers, and software that should provide inspiration.
To start, let's make sure I understand your problem.
You have a set of sample data, each element containing a time series of a binary variable (we'll call it V). When V is set to True, a function (A, B, or C) is applied which returns V to its False state. You would like to apply a genetic algorithm to determine which function (or solution) will return V to False in the least amount of time.
If this is the case, I would stay away from GAs. GAs are typically used for some kind of function optimization or tuning. In general, the underlying assumption is that what you permute is under your control during the algorithm's application (i.e., you are modifying parameters used by the algorithm that are independent of the input data). In your case, my impression is that you just want to find out which of your (I assume) static functions performs best in a wide variety of cases. If you don't feel your current data set provides a decent approximation of your true input distribution, you can always sample from it and permute the values to see what happens; however, this would not be a GA.
Having said all of this, I could be wrong. If anyone has used GAs in verification like this, please let me know. I'd certainly be interested in learning about it.