Minimize the number of outputs - python

For a linear optimization problem, I would like to include a penalty. The penalty for every option (penalties[i]) should be 1 if the sum is larger than 0, and 0 if the sum is zero. Is there a way to do this?
The penalty is defined as:
penalties = {}
for i in A:
    penalties[i] = lpSum(choices[i][k] for k in B) / len(C)
prob += objective_function + lpSum(penalties.values())  # objective_function is my primary objective
For example:
penalties[0] = 0
penalties[1] = 2
penalties[3] = 6
penalties[4] = 0
The sum of the penalties should then be:
sum(penalties) = 0 + 1 + 1 + 0 = 2

Yes. What you need to do is create binary variables use_ith_row[i]. The interpretation of this variable will be == 1 if any of the choices[i][k] are > 0 for row i (and 0 otherwise).
The penalty term in your objective function simply needs to be sum(use_ith_row[i] for i in A).
The last thing you need is the set of constraints which enforce the rule described above:
for i in A:
    prob += lpSum(choices[i][k] for k in B) <= use_ith_row[i] * M
Finally, you need to choose M large enough so that the constraint above has no limiting effect when use_ith_row[i] is 1 (you can normally work out this bound quite easily). Choosing an M which is far too large will also work, but will tend to make your problem solve more slowly.
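Putting the pieces together, here is a minimal, self-contained PuLP sketch of this pattern. It assumes binary choices variables (so M = len(B) is a safe bound) and small illustrative index sets; all names besides choices and use_ith_row are made up:

from pulp import LpProblem, LpMinimize, LpVariable, lpSum

A, B = range(4), range(3)   # illustrative index sets
M = len(B)                  # safe bound when each choices[i][k] is binary

prob = LpProblem("row_penalty_example", LpMinimize)
choices = [[LpVariable(f"choice_{i}_{k}", cat="Binary") for k in B] for i in A]
use_ith_row = [LpVariable(f"use_row_{i}", cat="Binary") for i in A]

# Force use_ith_row[i] to 1 whenever any choices[i][k] is nonzero.
for i in A:
    prob += lpSum(choices[i][k] for k in B) <= use_ith_row[i] * M

# Objective: your primary objective would be added here as well;
# this is just the row-usage penalty term.
prob += lpSum(use_ith_row[i] for i in A)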
P.S. I don't know what C is or why you divide by its length, but typically, if this penalty is secondary to your other/primary objective, you would weight it so that improvement in your primary objective is always given greater weight.

Related

PuLP Conditional Sum based on key of the loop

I am trying to use this conditional sum in PuLP's objective function. For the second lpSum, I am trying to calculate the cost incurred when we don't have enough chassis to cover the demand and need pool chassis at a higher cost. Of course, I only want to calculate this when we don't have enough dedicated chassis (dedicated_chassis_needed) to cover the demand (chassis_needed) for each day.
The problem is a cost-minimizing one. The last "if" part doesn't seem to be working: the lpSum sums up every date's pool cost and ignores the if condition, and the solver just sets the decision variable dedicated_chassis_needed to 0 (its lower bound), so the objective value is a negative number, which should not be allowed.
prob += lpSum(dedicated_chassis_needed * dedicated_rate for date in chassis_needed.keys()) + \
        lpSum((chassis_needed[date] - dedicated_chassis_needed) * pool_rate_day
              for date in chassis_needed.keys()
              if (chassis_needed[date] - dedicated_chassis_needed) >= 0)
In general, in LP, you cannot use a conditional statement that depends on the value of a variable in any of the constraints or in the objective function, because the value of the variable is unknown when the model is built (before solving), so you will have to reformulate.
You haven't given much information about what the variables and constants are, so it isn't possible to give good suggestions. However, a well-designed objective function should be able to handle the extra cost of excess demand without a condition, as the model will select the cheaper items first.
For example, if:
demand[day 5] = 20
and
cheap_units[day 5] = 15 # $100 (availability)
and
reserve_units = 100 # $150 (availability from some pool of reserves)
and you have some constraint to meet demand via both of those sources and an objective function like:
min(cost) s.t. cost[day] = cheap_units[day] * 100 + reserve_units * 150
it should work out fine...
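A hedged PuLP sketch of that idea, with made-up numbers and names (days, demand, cheap_cap and the two rates are all illustrative):

from pulp import LpProblem, LpMinimize, LpVariable, lpSum

days = range(1, 8)
demand = {d: 20 for d in days}       # illustrative demand per day
cheap_cap = {d: 15 for d in days}    # cheap units available per day

prob = LpProblem("meet_demand_cheaply", LpMinimize)
cheap = {d: LpVariable(f"cheap_{d}", lowBound=0, upBound=cheap_cap[d]) for d in days}
reserve = {d: LpVariable(f"reserve_{d}", lowBound=0) for d in days}

# Demand must be met from the two sources; no if-condition is needed.
for d in days:
    prob += cheap[d] + reserve[d] >= demand[d]

# The solver fills demand with the $100 units first because they are cheaper.
prob += lpSum(100 * cheap[d] + 150 * reserve[d] for d in days)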

How to constrain optimization based on number of negative values of variable in pyomo

I'm working with a pyomo model (mostly written by someone else, to be updated by me) that optimizes electric vehicle charging (i.e., how much power a vehicle will import or export at a given timestep). The optimization variable (u) is power, and the objective is to minimize the total charging cost given the charging cost at each timestep.
I'm trying to write a new optimization function to limit the number of times the model allows each vehicle to export power (i.e., to set u < 0). I've written a constraint called max_call_rule that counts the number of times u < 0 and constrains it to be less than a given value (max_calls) for each vehicle. (max_calls is a dictionary pairing a label for each vehicle with an integer number of allowed calls.)
The code is very long, but I've put the core pieces below:
model.u = Var(model.t, model.v, domain=Integers, doc='Power used')
model.max_calls = Param(model.v, initialize=max_calls)

def max_call_rule(model, v):
    return len([x for x in [model.u[t, v] for t in model.t] if x < 0]) <= model.max_calls[v]

model.max_call_rule = Constraint(model.v, rule=max_call_rule, doc='Max call rule')
This approach doesn't work: I get the following error when I try to run the code.
ERROR: Rule failed when generating expression for constraint max_call_rule
with index 16: ValueError: Cannot create an InequalityExpression with more
than 3 terms.
ERROR: Constructing component 'max_call_rule' from data=None failed:
ValueError: Cannot create an InequalityExpression with more than 3 terms.
I'm new to working with pyomo and suspect that this error means that I'm trying to do something that fundamentally won't work with an optimization program. So--is there a better way for me to constrain the number of times that my variable u can be less than 0?
If what you're trying to do is minimize the number of times vehicles are exporting power, you can introduce a binary variable that allows/disallows vehicles discharging. You want this variable to be indexed over time and vehicles.
Note that if the rest of your model is an LP (linear, without any integer variables), this will turn it into a MIP/MILP. There's a significant difference in the computational effort required to solve it and in the types of solvers you can use, and the larger the problem, the bigger the difference this will make. I'm also not sure why u is currently declared as Integers; that seems quite strange given that it represents power.
model.allowed_to_discharge = Var(model.t, model.v, within=Boolean)

def enforce_vehicle_discharging_logic_rule(model, t, v):
    """
    When `allowed_to_discharge[t,v]` is 1,
    this constraint doesn't have any effect.
    When `allowed_to_discharge[t,v]` is 0, u[t,v] >= 0.
    Note that 1e9 is just a "big M", i.e. any big number
    that you're sure exceeds the maximum value of `model.u`.
    """
    return model.u[t, v] >= 0 - model.allowed_to_discharge[t, v] * 1e9

model.enforce_vehicle_discharging_logic = Constraint(
    model.t, model.v, rule=enforce_vehicle_discharging_logic_rule
)
Now that you have the binary variable, you can count the events, and specifically you can assign a cost to such events and add it to your objective function (just in case, you can only have one objective function, so you're just adding a "component" to it, not adding a second objective function).
def objective_rule(model):
    return (
        ...  # the same objective function as before
        + sum(model.allowed_to_discharge[t, v] for t in model.t for v in model.v)
        * model.cost_of_discharge_event
    )

model.objective = Objective(rule=objective_rule)
If what you want to add to your objective function is instead a cost associated with the total energy discharged by the vehicles (rather than the number of discharge events), you want to introduce two separate non-negative variables for charging and discharging, and then define the "net discharge" (which you currently call u) as an Expression which is the difference between discharge and charge.
You can then add a cost component to your objective function that is the sum of all the discharge power, multiplied by the cost associated with it.
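A minimal pyomo sketch of that split, as a self-contained toy model (the sets, the cost value and all names here are illustrative assumptions, not part of the original model):

from pyomo.environ import (ConcreteModel, Set, Var, Expression,
                           NonNegativeReals)

model = ConcreteModel()
model.t = Set(initialize=[1, 2, 3])     # illustrative timesteps
model.v = Set(initialize=['car_a'])     # illustrative vehicles

# Two separate non-negative variables instead of one signed u.
model.charge = Var(model.t, model.v, within=NonNegativeReals)
model.discharge = Var(model.t, model.v, within=NonNegativeReals)

# The net quantity that u used to represent, as an Expression.
def net_discharge_rule(model, t, v):
    return model.discharge[t, v] - model.charge[t, v]
model.net_discharge = Expression(model.t, model.v, rule=net_discharge_rule)

# Cost component for the objective: total discharged energy times a
# hypothetical per-unit cost.
cost_per_unit_discharged = 0.15
discharge_cost = cost_per_unit_discharged * sum(
    model.discharge[t, v] for t in model.t for v in model.v)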

Sparsity reduction

I have to factorize a big sparse matrix (6.5 million rows representing users by 6.5 million columns representing items) to find the latent vectors of users and items. I chose the ALS algorithm in the Spark framework (PySpark).
To boost the quality I have to reduce the sparsity of my matrix to 98% (the current value is 99.99%, because I have only 356 million filled entries).
I can do it by dropping rows or columns, but I must find the optimal solution maximizing the number of rows (users).
The main problem is that I must find subsets of the user and item sets, and dropping some rows can drop some columns and vice versa; the second problem is that the function that evaluates sparsity is not linear.
How can I solve this problem? Which libraries in Python can help me with it?
Thank you.
This is a combinatorial problem. There is no easy way to drop an optimal set of columns to achieve the maximum number of users while reducing sparsity. A formal approach would be to formulate it as a mixed-integer program. Consider the following 0-1 matrix A, derived from your original matrix C:
A(i,j) = 1 if C(i,j) is nonzero,
A(i,j) = 0 if C(i,j) is zero
Parameters:
M : a sufficiently big numeric value, e.g. the number of columns of A (NCOLS)
N : total number of nonzeros in C (equivalently, in A)
Decision variables:
x(j) : 0 or 1, indicating whether column j is kept (1) or dropped (0)
nr(i): number of nonzeros covered in row i
y(i) : 0 or 1, indicating whether row i is kept (1) or dropped (0)
Constraints:
A(i,:) x = nr(i)            for i = 1..NROWS
nr(i) <= y(i) * M           for i = 1..NROWS
sum(nr(i)) + e = 0.98 * N   (explicit slack 'e' to be penalized in the objective)
y(i) and x(j) are 0-1 (binary) variables for all i, j
Objective:
maximize sum(y(i)) - N * e
Such a model would be extremely difficult to solve as an integer model. However, barrier methods should be able to solve the linear programming (LP) relaxation. Possible solvers are Coin/CLP (open-source), Lindo (commercial), etc. It may then be possible to use the LP solution to compute approximate integer solutions by simple rounding.
In the end, you will definitely require an iterative approach: solve the matrix-factorization problem several times, each time factoring a different submatrix of C computed with the above approach, until you are satisfied with the solution.
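For concreteness, here is a hedged sketch of the LP relaxation of that formulation in PuLP, on a tiny made-up matrix (the variable names follow the formulation above; PuLP is just one possible modeling layer):

from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Tiny illustrative 0-1 matrix A (1 where C is nonzero).
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
NROWS, NCOLS = len(A), len(A[0])
N = sum(map(sum, A))    # total number of nonzeros
M = NCOLS               # big-M, as suggested above

prob = LpProblem("sparsity_reduction", LpMaximize)
# LP relaxation: x and y relaxed from binary to the interval [0, 1].
x = [LpVariable(f"x_{j}", 0, 1) for j in range(NCOLS)]
y = [LpVariable(f"y_{i}", 0, 1) for i in range(NROWS)]
nr = [LpVariable(f"nr_{i}", lowBound=0) for i in range(NROWS)]
e = LpVariable("e", lowBound=0)     # slack on the nonzero budget

for i in range(NROWS):
    prob += lpSum(A[i][j] * x[j] for j in range(NCOLS)) == nr[i]
    prob += nr[i] <= y[i] * M

prob += lpSum(nr) + e == 0.98 * N
prob += lpSum(y) - N * e            # objective: rows kept minus slack penalty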

PuLP - How to specify the solver's accuracy

I will try to keep my question short and simple. If you need any further information, please let me know.
I have an MIP, implemented in Python with the package PuLP. (Roughly 100 variables and constraints) The mathematical formulation of the problem is from a research paper. This paper also includes a numerical study. However, my results differ from the results of the authors.
My problem variable is called prob
prob = LpProblem("Replenishment_Policy", LpMinimize)
I solve the problem with prob.solve()
LpStatus returns Optimal
When I add some of the optimal (paper) results as constraints, I get a slightly better objective value. The same goes for constraining the objective function to a slightly lower value. The LpStatus remains Optimal.
original objective value: total = 1704.20
decision variable: stock[1] = 370
adding constraints: prob += stock[1] == 379
new objective value: 1704.09
adding constraints: prob += prob.objective <= 1704
new objective value: 1702.81
My assumption is that PuLP's solver approximates the solution. The calculation is very fast, but apparently not very accurate. Is there a way I can improve the accuracy of the solver PuLP is using? I am looking for something like prob.solve(accuracy=100%). I had a look at the documentation but couldn't figure out what to do. Does anyone have thoughts on what the problem could be?
Any help is appreciated. Thanks.
The answer to my question was given by ayhan: to specify the accuracy of the solver, you can use the fracGap argument of the selected solver (a relative MIP gap; for example, fracGap=0.01 lets the solver stop once it is within 1% of the optimum).
prob.solve(solvers.PULP_CBC_CMD(fracGap=0.01))
However, the question I asked was not aligned with the problem I had. The deviation in the results was indeed not a matter of solver accuracy (as sascha pointed out in the comments).
The cause of my problem:
The algorithm I implemented optimizes the order policy parameters for an (Rn, Sn) policy under non-stationary, stochastic demand. The above-mentioned paper is:
Tarim, S. A., & Kingsman, B. G. (2006). Modelling and computing (Rn, Sn) policies for inventory systems with non-stationary stochastic demand. European Journal of Operational Research, 174(1), 581-599.
The algorithm has two binary variables, delta[t] and P[t][j]. The following two constraints only allow values of 0 and 1 for P[t][j], as long as delta[t] is defined as binary:
for t in range(1, T+1):
    prob += sum([P[t][j] for j in range(1, t+1)]) == 1
    for j in range(1, t+1):
        prob += P[t][j] >= delta[t-j+1] - sum([delta[k] for k in range(t-j+2, t+1)])
Since P[t][j] can only take values of 0 or 1, i.e. it is a binary variable, I declared it as follows:

for t in range(1, T+1):
    for j in range(1, T+1):
        P[t][j] = LpVariable(name="P_"+str(t)+"_"+str(j), lowBound=0, upBound=1, cat="Integer")
The minimization returns an objective value of 1704.20.
After searching for a solution for quite a while, I noticed a part of the paper that says:
... it follows that P_tj must still take a binary value even if it is
declared as a continuous variable. Therefore, the total number of
binary variables reduces to the total number of periods, N.
Therefore I changed the cat argument of the P[t][j] variable to cat="Continuous". Without changing anything else, I got the lower objective value of 1702.81. The status of the result shows Optimal in both cases.
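For reference, the only change was the cat argument in the declaration shown earlier:

for t in range(1, T+1):
    for j in range(1, T+1):
        P[t][j] = LpVariable(name="P_"+str(t)+"_"+str(j), lowBound=0, upBound=1, cat="Continuous")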
I am still not sure how all these aspects are interrelated, but this tweak worked for me. Everyone else who is directed to this question will probably find the necessary help in the answer given at the top of this post.

Optimization on a set of data using python

The following data sets are available: x, y, f(x), f(y).
Function to be optimized (maximize):
f(x,y) = f(x)*y - f(y)*x
subject to the following constraints:
V >= sqrt(f(x)^2 + f(y)^2)
I >= sqrt(x^2 + y^2)
where V and I are constants.
Can anyone please let me know which optimization module I need to use? From what I understand, I need to perform a discrete optimization, as I have fixed sets of values for x, y, f(x) and f(y).
Using complex optimizers (http://docs.scipy.org/doc/scipy/reference/optimize.html) for such a problem is rather a bad idea.
It looks like a problem which can be solved in under O(n^2), where n = max(|x|, |y|), simply:
1. Sort the data, creating sorted(x), sorted(y), sorted(f(x)), sorted(f(y)).
2. For each x, find the positions in sorted(y) for which I^2 >= x^2 + y^2 holds, and similarly for f(x) and sorted(f(y)) with V^2 >= f(x)^2 + f(y)^2. These are two binary searches: since I^2 >= x^2 + y^2 <=> |y| <= sqrt(I^2 - x^2), you can find the "barrier" in constant time and then binary-search for the actual data points closest to it on the feasible side of the inequality.
3. Iterate through sorted(x), and for each x:
- iterate simultaneously through the elements of y and f(y), discarding (in this loop) points which are not in both intervals found in step 2 (linear complexity);
- record the argument pair x_max, y_max for which f(x_max, y_max) is maximized.
4. Return x_max, y_max.
Total complexity is under quadratic: step 1 takes O(n lg n); each iteration of the loop in step 2 is O(lg n), so the whole of step 2 takes O(n lg n); the loop in step 3 is O(n), and the loop in its first substep is also O(n) (though in practice it should be almost constant due to the constraints). This makes the whole algorithm O(n^2), and in most cases it will behave as O(n lg n). It also does not depend on the definition of f(x,y) (which is used as a black box), so you can optimize an arbitrary function in this way.
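A rough Python sketch of this procedure, assuming the data arrives as two paired lists, (xs[i], fxs[i]) and (ys[j], fys[j]), with constraint constants V and I (all names here are illustrative):

import math
from bisect import bisect_right

def maximize(xs, fxs, ys, fys, V, I):
    # Step 1: sort the (y, f(y)) pairs by |y| so that the feasibility
    # barrier |y| <= sqrt(I^2 - x^2) can be located by binary search.
    y_data = sorted(zip(ys, fys), key=lambda p: abs(p[0]))
    abs_ys = [abs(y) for y, _ in y_data]

    best, best_pair = -math.inf, None
    for x, fx in zip(xs, fxs):
        if abs(x) > I:
            continue  # no y can satisfy I >= sqrt(x^2 + y^2)
        y_limit = math.sqrt(I * I - x * x)
        # Step 2: binary search for the last feasible |y|.
        hi = bisect_right(abs_ys, y_limit)
        # Step 3: scan the feasible candidates, discard those violating
        # the V constraint, and track the maximum.
        for y, fy in y_data[:hi]:
            if fx * fx + fy * fy <= V * V:  # V >= sqrt(f(x)^2 + f(y)^2)
                val = fx * y - fy * x
                if val > best:
                    best, best_pair = val, (x, y)
    return best, best_pair

# Example usage with made-up data:
# best_val, (x_max, y_max) = maximize([1, 2], [3, 1], [2, 4], [1, 2], V=5, I=5)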
