How to find the maximum of a prob in PuLP - python

I am trying to solve a linear problem in PuLP that minimizes a cost function. The cost function is itself a function of the maximum value of the daily cost: I have a daily cost, and I am trying to minimize the monthly cost, which is the sum of the daily costs plus the maximum daily cost in the month. I don't think I'm capturing the maximum value in the final solution, and I'm not sure how to troubleshoot this issue. The basic outline of the code is below:
# Initialize the problem to be solved
prob = LpProblem("monthly_cost", LpMinimize)
# The number of time steps
# price is a pre-existing array of variable prices
tmax = len(price)
# Time range
time = list(range(tmax))
# Price reduction at every time step
d = LpVariable.dict("d", time, 0, 5)
# Price increase at every time step
c = LpVariable.dict("c", time, 0, 5)
# Define revenue = price increase - price reduction + initial price
revenue = [c[t] - d[t] + price[t] for t in time]
# Find maximum revenue
max_revenue = max(revenue)
# Set the objective
prob += sum([revenue[t]*0.0245 for t in time]) + max_revenue
# Solve the problem
prob.solve()
The variable max_revenue always equals c_0 - d_0 + price[0] even though price[0] is not the maximum of price and c_0 and d_0 both equal 0. Does anyone know how to ensure the dynamic maximum is being inserted into the problem? Thanks!

I don't think you can do the following in PuLP or any other standard LP solvers:
max_revenue = max(revenue)
This is because determining the maximum would require the solver to evaluate the revenue expressions, so you cannot extract a standard LP model this way. Models containing a max() of decision variables are in fact non-smooth (piecewise linear).
In such situations, you can easily reformulate the problem as follows:
max_revenue >= c[t] - d[t] + price[t]    for every t in time
This works because max_revenue bounds every revenue term from above: max_revenue >= revenue[t] for all t. Crucially, since max_revenue appears with a positive coefficient in the minimized objective, the solver pushes it down until it equals the largest revenue term, so the bound is tight at the optimum. This in turn lets you extract a standard LP model from the equations. Hence, the original problem formulation gets extended with additional inequality constraints (the equality constraints and the objective function stay the same as before). It could look something like this (word of caution: I have not tested this):
# Define the variable (avoid spaces in names; they end up in the LP file)
max_revenue = LpVariable("max_revenue", 0)
# Define other variables, revenues, etc.
# Add the inequality constraints
for item in revenue:
    prob += max_revenue >= item
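For completeness, here is a minimal, untested sketch of the whole reformulated model. The price values here are placeholders standing in for your real data:
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

price = [2.0, 3.5, 1.0, 4.2]           # example data; use your real price array
time = list(range(len(price)))

prob = LpProblem("monthly_cost", LpMinimize)
d = LpVariable.dicts("d", time, 0, 5)  # price reduction at every time step
c = LpVariable.dicts("c", time, 0, 5)  # price increase at every time step
max_revenue = LpVariable("max_revenue", 0)

revenue = [c[t] - d[t] + price[t] for t in time]

# Linearized max: max_revenue must dominate every revenue term...
for r in revenue:
    prob += max_revenue >= r

# ...and because it is part of a minimized objective, it settles at the
# true maximum at the optimum.
prob += lpSum(0.0245 * r for r in revenue) + max_revenue
prob.solve()
print(max_revenue.value())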
I would also suggest that you have a look at scipy.optimize.linprog. PuLP writes the model to an intermediary file and then calls the installed solver to solve it, whereas scipy.optimize.linprog does everything in Python and can be faster for small models. However, if your problem cannot be solved using the simplex algorithm, or you require a professional solver (e.g. CPLEX, Gurobi, etc.), then PuLP is a good choice.
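For reference, a toy scipy.optimize.linprog call looks like this (the numbers are purely illustrative):
from scipy.optimize import linprog

# maximize x0 + 2*x1 (negate c, since linprog minimizes)
# subject to -x0 + x1 <= 1, x0 + x1 <= 4, and x0, x1 >= 0 (the default bounds)
res = linprog(c=[-1, -2], A_ub=[[-1, 1], [1, 1]], b_ub=[1, 4])
print(res.x, -res.fun)  # optimal point and maximized objective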
Also, see the discussion on Data Fitting (page 19) in Introduction to Linear Optimization by Bertsimas and Tsitsiklis.
Hope this helps. Cheers.

Conditional average of variables in Pulp optimization problem

Suppose I have 2 pulp variables: x1 and x2. These two variables represent water temperatures inside two different water pipes. These two pipes, at a certain point, merge into a single pipe, and the two water flows mix together. The water temperature after the mixing is equal to the average of the two temperatures, because the flow rate is the same.
If the flow of one water pipe is zero, there is no mixing, and the output temperature is equal to the temperature of the non-zero flow.
This final water temperature is then used in the objective function of the pulp problem to calculate some cost.
This means that I have to calculate the average of these two variables, but each variable has to be considered in the calculation of the average only if it is greater than 0.
Here is an example you can reproduce that calculates the average without the >0 condition.
from pulp import *
# Define the variables
x1 = LpVariable("x1", 0, None)
x2 = LpVariable("x2", 0, None)
avg = LpVariable("avg", 0, None)
# Define the problem
prob = LpProblem("average_problem", LpMinimize)
# Placeholder objective (replaced below with setObjective)
prob += 0, "objective function"
# Calculate avg value
prob += avg == (x1 + x2) / 2, "average_constraint"
# Set x1 and x2 values just as an example
prob += x1 == 100
prob += x2 == 50
cost_of_engine = (105 - avg) * 3 / 0.2
total_production_cost = lpSum(cost_of_engine + 10)
prob.setObjective(total_production_cost)
# Solve the problem
prob.solve()
This example works if x1 and x2 are both higher than zero.
However, if for instance x1=0 and x2=100, then avg=50.
What I need, instead, is to discard the x1 variable from the calculation of the average so that avg=100.
This is clearly a non-linear problem, because the denominator in the average is dynamic: it depends on the values of the variables x1 and x2.
Do you have any idea how to solve this problem? Maybe using the Big M technique?
There are several approaches that might be reasonable, depending on characteristics of your problem that you haven't described. As noted, if you are trying to optimize an average where both the numerator and the denominator contain variables, the resulting expression is non-linear, and you'll need to consider a substitute objective or move outside of pulp and look at non-linear formulations and non-linear solvers.
Idea #1: Use a penalty for the number of items used.
You can introduce (and properly constrain with a big-M constraint) a new binary variable y_i ∈ {0, 1} that is 1 if x_i is used, plus a suitable weight w, and use an objective like:
obj = ∑ x_i + w * ∑ y_i ; minimized
which might work OK if the x_i are in a range such that a sensible w can be chosen.
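A minimal PuLP sketch of this penalty idea (M, w, and the three x variables are illustrative assumptions):
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

M = 1000      # assumed upper bound on any x_i (big-M)
w = 10        # assumed penalty weight, scaled to the x_i
idx = range(3)
x = LpVariable.dicts("x", idx, lowBound=0)
y = LpVariable.dicts("y", idx, cat=LpBinary)

prob = LpProblem("penalized_usage", LpMinimize)
for i in idx:
    prob += x[i] <= M * y[i]    # forces y_i = 1 whenever x_i > 0
prob += lpSum(x[i] for i in idx) + w * lpSum(y[i] for i in idx)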
Or...
Idea #2: Use a mini-max or maxi-min constraint
If you are seeking an aggregate total of the x_i used, while minimizing the number that are used, and there is some "trade space" in the model, you can set the objective to "maximize the minimum used x_i value", which might work, again, depending on the other characteristics of your model. This should have a similar effect by encouraging the model to pick larger x_i to make the target value. In pseudocode:
Introduce y and z:
y_i ∈ {0, 1}               # y_i = 1 if x_i is used
x_i <= y_i * M
z ∈ Reals
z <= x_i + (1 - y_i) * M   # constrain z to the lowest x_i used...
objective: maximize z
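A rough PuLP translation of that pseudocode (M and the three x variables are again illustrative assumptions):
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary

M = 1000      # assumed upper bound on any x_i
idx = range(3)
x = LpVariable.dicts("x", idx, lowBound=0)
y = LpVariable.dicts("y", idx, cat=LpBinary)
z = LpVariable("z")

prob = LpProblem("maximin_used", LpMaximize)
for i in idx:
    prob += x[i] <= M * y[i]             # x_i can only be positive if used
    prob += z <= x[i] + (1 - y[i]) * M   # z is bounded only by the used x_i
prob += z                                # objective: maximize the minimum used x_i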

Maximise a group's probability of reaching a score within PuLP

Using python, I have a linear programming solution in PuLP which selects 6 players within a budget constraint while maximising a specified parameter.
However, I want to be able to maximise a probability parameter for each team of 6 players.
Namely, I want to be able to input a mean and standard deviation for each player, and then maximise the percentage chance of each team reaching a predetermined score. This requires summing the means and standard deviations of the 6 players in each team, then calculating the percentage chance of them surpassing this score (I have been using scipy.stats.norm to do this).
The problem I am having is that I am not sure it is possible to maximise this parameter within a linear programming module like pulp. I cannot get it to maximise the probability after summing each team's mean and standard deviation.
I have tried estimating this value by multiplying each individual's mean and standard deviation by 6, thus creating a dummy team, calculating the probability of reaching the predetermined score, then scaling back down and maximising the sum of these values in each team. This gets close but is not as accurate as I want. This is the code I have so far:
lineup dataframe:
index              mu         std        Salary
Rory McIlroy       73.450198  10.455766  11100.
Scottie Scheffler  72.652175   9.477475  11000.
Jon Rahm           73.033862  10.293721  10800.
Justin Thomas      73.886648  10.426305  10500.
Collin Morikawa    68.409628  10.588617  10300.
from pulp import *
from scipy.stats import norm

target_score = 600
limit = 50000
lineup_im = lineup2.set_index('index')
w = lineup_im.Salary
v = lineup_im.mu
z = lineup_im['std']
items = list(sorted(lineup_im.index))
# Create model
m = LpProblem("Knapsack", LpMaximize)
# Variables
x = LpVariable.dicts('p', items, lowBound=0, upBound=1, cat=LpInteger)
# Objective: approximate each team's P(score > target) via per-player dummy teams
m += lpSum((1 - norm(loc=v[i]*6, scale=z[i]*6).cdf(target_score)) * x[i] for i in items) / 6
# Constraints
m += lpSum(w[i]*x[i] for i in items) <= limit
m += lpSum(x[i] for i in items) == 6
# Optimize
m.solve()
Is there a way to do this within Pulp or another LP module in python?
Welcome to the site and nice post w/ data!
You have a chicken vs. egg problem here... Let me explain...
The parameter that you want to get to is the CDF of the team score which, if you assume the team score is normally distributed, has a mean equal to the sum of the players' means and a variance equal to the sum of the players' variances... That's how it works for the normal distribution (with independent players), right?
So all of those things are known values (parameters) in your problem, based on the player data. You just haven't calculated the team CDF for all of the possible teams. The problem is you cannot do that as some kind of callback after using the optimizer to pick team membership; it must be done in advance. The pulp solver does not have the ability/linkages to make calls to numpy/scipy to get the CDF "on the fly". So you have a couple of options...
You could reformulate your problem in terms of the teams, expand your data set, and just have a binary variable for which team is selected. But that seems like a waste, because you are essentially having the solver pick the single best team with only one constraint (total salary), which makes me think you should just brute force this (see below).
You could just brute force this. If you are considering 100 players and you are choosing 6, that is combin(100, 6) ≈ 1.2 billion combinations. I would put the data into dictionaries for fast lookup, use itertools to run through the combinations, first screen for the total salary cap, and if that passes, compute the team CDF/p-value for the target score, keeping track of the max value.
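A sketch of that brute-force loop, assuming a dict `players` mapping each name to a (mu, sd, salary) tuple built from the dataframe above:
from itertools import combinations
from math import sqrt
from scipy.stats import norm

target_score = 600
salary_cap = 50000

# players = {"Rory McIlroy": (73.450198, 10.455766, 11100), ...}  # assumed
best_team, best_p = None, -1.0
for team in combinations(players, 6):
    if sum(players[name][2] for name in team) > salary_cap:
        continue                     # cheap salary screen first
    mu = sum(players[name][0] for name in team)
    sd = sqrt(sum(players[name][1] ** 2 for name in team))
    p = 1 - norm(loc=mu, scale=sd).cdf(target_score)  # P(team score > target)
    if p > best_p:
        best_team, best_p = team, p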
Define decision variable x_i to indicate whether player i is selected for the team. From the basics of independent random variables, if we define mu_i to be the mean for each player i and sd_i to be their standard deviation, then:
mu_team = \sum_i mu_i*x_i
var_team = \sum_i sd_i^2*x_i
sd_team = sqrt(var_team)
You seek to maximize the probability that a normal random variable with mean mu_team and standard deviation sd_team exceeds some target score S. Conveniently, this is equivalent to minimizing the Z-score of the value S for that random variable:
z_team = (S-mu_team) / sd_team
It's now clear that you could reformulate your optimization model as minimizing z_team subject to your budget and team size constraints. However, z_team is non-linear: it is a linear function of the decision variables divided by the square root of another linear function of the decision variables. In general, mixed integer optimization problems with non-linear objective functions are not trivial to solve; you won't be able to do it "out of the box" with pulp.
Not all is lost, though! Notice that we're basically balancing the quantity S - mu_team against the quantity sd_team. If we can construct teams with mu_team > S, then we'd ideally like teams with large mu_team and small sd_team, which enables as negative a z_team value as possible. If we could build a tradeoff curve between achievable mu_team and sd_team values, we could quickly identify the best achievable z_team value. Similarly, if all teams have mu_team < S, then we'd ideally like teams with large mu_team and large sd_team to get a z_team value as close as possible to 0; again, a tradeoff curve would be helpful.
This leads us to a simple solution:
Maximize mu_team subject to budget and team size constraints. Call the best achievable mu_team value M. In the special case of M=S, the best achievable z_team value is 0, and you are done. Otherwise, continue.
Build an efficient frontier trading off mu_team and sd_team:
If M > S, then maximize mu_team - alpha*var_team for various constants alpha >= 0
If M < S, then maximize mu_team + alpha*var_team for various constants alpha >= 0
Compute z_team for each solution in your efficient frontier, and select the one with the smallest z_team value.
Note that each of the optimization problems in steps 1 and 2 now have a linear objective value (both mu_team and var_team are linear in the decision variables), so they will be easily solvable with pulp.
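A hedged sketch of that procedure in pulp (the containers items, mu, sd, sal, the target S, the budget limit, and the alpha grid are all assumptions standing in for your data):
from math import sqrt
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, value

# Assumed inputs: items (player ids), mu[i], sd[i], sal[i], target S, budget limit.
def solve_weighted(alpha, sign):
    # Maximize mu_team + sign * alpha * var_team under the original constraints.
    m = LpProblem("frontier_point", LpMaximize)
    x = LpVariable.dicts("x", items, cat=LpBinary)
    mu_team = lpSum(mu[i] * x[i] for i in items)
    var_team = lpSum(sd[i] ** 2 * x[i] for i in items)
    m += mu_team + sign * alpha * var_team
    m += lpSum(sal[i] * x[i] for i in items) <= limit
    m += lpSum(x[i] for i in items) == 6
    m.solve()
    return value(mu_team), value(var_team)

# Step 1: the best achievable mu_team decides which side of S we are on.
M_best, _ = solve_weighted(0, 1)
sign = -1 if M_best > S else 1   # M > S: prefer small variance; M < S: large

# Step 2: trace the frontier and keep the smallest z_team.
best_z = float("inf")
for alpha in (0.0, 0.01, 0.05, 0.1, 0.5, 1.0):   # illustrative grid
    mu_t, var_t = solve_weighted(alpha, sign)
    best_z = min(best_z, (S - mu_t) / sqrt(var_t))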

PV Overproduction within a linear cost factor optimization

So I am currently trying to optimize the costs for energy in a household. The optimization is based on a cost factor function which I am trying to minimize.
model = ConcreteModel()
model.t = RangeSet(0, 8759)

def costs(model):
    # the rule takes only the model, since it already sums over model.t
    return sum(model.cost_factor[t] * model.elec_grid[t] for t in model.t)
model.costs = Objective(rule=costs, sense=minimize)
PV overproduction can occur, and I try to handle it with these constraints:
model.elec_consumption = Param(model.t, initialize=df['Consumption'])
model.pv = Param(model.t, initialize=df['PV'])
model.excess_pv = Var(model.t, within=NonNegativeReals, initialize=0)
model.demand = Var(model.t, initialize=0, within=NonNegativeReals)

def pv_overproduction(model, t):
    return model.excess_pv[t] >= model.pv[t] - model.demand[t]
model.pv_overproduction = Constraint(model.t, rule=pv_overproduction)

def lastdeckung(model, t):
    return (model.pv[t] - model.excess_pv[t]) + model.elec_grid[t] == model.demand[t]
model.lastdeckung = Constraint(model.t, rule=lastdeckung)
The problem is that when the cost factor is negative, the optimizer pushes model.excess_pv very high so it can crank up the model.elec_grid variable in an effort to minimize the cost.
That is obviously not the intention, but so far I wasn't able to find a better way to calculate the excess PV. An easy fix would technically be to use a cost factor that is always positive, but sadly that's not an option.
I'd appreciate it if someone had an idea how to fix this.
The basics are that I want to maximize the usage of the PV electricity in order to reduce costs. At some points there is too much PV in the system, so for the optimization to still work I need to get rid of the excess.
def demand_rule(model, t):
    return model.demand[t] == model.elec_consumption[t]
model.demand_rule = Constraint(model.t, rule=demand_rule)
This is the demand. Technically there are more functions, but they are irrelevant to this problem. The main issue is that this constraint doesn't work when the cost factor is sometimes negative:
model.excess_pv[t] >= model.pv[t] - model.demand[t]
excess_pv as well as model.demand are variables, whereas model.pv is a parameter.
As far as I got in searching for the problem, I need to change my overproduction constraint so that it uses the value of pv - demand when that value is > 0, and zero when the value is <= 0.
I think the easiest way to do this is to probably just penalize excess production to a greater extent than the maximally negative cost factor.
Why can't you...
excess_penalty = max(-min(cost_factor) + epsilon, 0)  # the max(..., 0) prevents a negative penalty if no cost factor is negative
# build the objective from components, so we can inspect the true cost (w/o penalty) later...
cost = sum(model.cost_factor[t] * model.elec_grid[t] for t in model.t)
overproduction_penalty = sum(excess_penalty * model.excess_pv[t] for t in model.t)
model.obj = Objective(expr=cost + overproduction_penalty, sense=minimize)
and later if you want the cost independently, you can just check the value of cost, which is a legal pyomo expression.
value(cost)
I think you could also add the expression as a model component, if that is important...
model.cost = ...
model.overproduction_penalty = ...
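Pieced together, a minimal runnable sketch of this penalty approach might look like the following (the data, the tiny horizon, and epsilon are made-up placeholders, and a CBC install is assumed):
from pyomo.environ import (ConcreteModel, RangeSet, Param, Var, Constraint,
                           Objective, NonNegativeReals, minimize,
                           SolverFactory, value)

# Toy data standing in for the real 8760-hour profiles
cost_factor = {0: 0.30, 1: -0.05, 2: 0.25, 3: -0.10}
consumption = {0: 2.0, 1: 2.5, 2: 1.5, 3: 2.0}
pv = {0: 1.0, 1: 4.0, 2: 0.5, 3: 3.0}

model = ConcreteModel()
model.t = RangeSet(0, 3)
model.cost_factor = Param(model.t, initialize=cost_factor)
model.consumption = Param(model.t, initialize=consumption)
model.pv = Param(model.t, initialize=pv)
model.elec_grid = Var(model.t, within=NonNegativeReals)
model.excess_pv = Var(model.t, within=NonNegativeReals)

def balance(model, t):
    # usable PV plus grid electricity must cover consumption
    return (model.pv[t] - model.excess_pv[t]) + model.elec_grid[t] == model.consumption[t]
model.balance = Constraint(model.t, rule=balance)

epsilon = 1e-3
excess_penalty = max(-min(cost_factor.values()) + epsilon, 0)

cost = sum(model.cost_factor[t] * model.elec_grid[t] for t in model.t)
penalty = sum(excess_penalty * model.excess_pv[t] for t in model.t)
model.obj = Objective(expr=cost + penalty, sense=minimize)

SolverFactory("cbc").solve(model)
print(value(cost))  # true cost, without the penalty term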
So the idea of a piecewise function is definitely an option for the problem mentioned in this post, but it is quite a fancy and complicated solution. The idea of penalties is much easier, and it also exposed a few more flaws in my code. Due to the negative cost factor, the optimizer tries to maximize grid input, which is not wrong, but when some variables are not capped the optimizer uses electricity with no efficiency whatsoever. So the easiest way, as mentioned earlier, is to penalize the grid import from the beginning so there is no negative cost factor during the optimization.

Pyomo - Objective function as average value of param

I am trying to use Pyomo for an LP problem and I would like the objective function to be the mean value of a particular parameter in my dataframe (which I'll call obj_param).
I had previously set this up like so:
model = ConcreteModel()
model.decision_var = Var(list(idx for idx in df.index), domain=NonNegativeReals)
model.obj = Objective(
    expr=-1 *  # because I want to maximize, not minimize
    sum(model.decision_var[idx] * df.loc[idx, 'obj_param'] for idx in df.index)
)
The decision_var here is a column of counts (like "acres of this crop" in the classic farmer problem) and obj_param is the value of this "crop", so my objective (as written) multiplies the acres of each crop by its value to maximize the total value.
This makes sense in the farmer problem, but what I'm actually trying to do in my case is to maximize the mean value of each acre. (Forgive the farmer metaphor; it becomes a bit strained here.)
To do this, I change my objective as follows:
model.obj = Objective(
    expr=-1 *  # because I want to maximize, not minimize
    sum(model.decision_var[idx] * df.loc[idx, 'obj_param'] for idx in df.index) /
    sum(model.decision_var[idx] for idx in df.index)
)
Conceptually this looks right to me, but now when I run it I get RuntimeError: Cannot write legal LP file. Objective 'obj' has nonlinear terms that are not quadratic.
I can vaguely understand what this error is saying, but I don't totally see how this equation is non-linear. Either way, more generally I'm asking: is it possible in pyomo to define the objective as an average in the way that I'm trying to do?
Thanks for any help!

How to make linear programming optimization faster

I am trying to allocate customers C_i to financial advisers P_j. Each customer has a policy value x_i. I'm assuming that the number of customers (n) allocated to each adviser is the same, and that the same customer cannot be assigned to multiple advisers. Therefore each adviser will have an allocation of policy values like so:
P1=[x1,x2,x3] , P2=[x4,x5,x6], P3=[x7,x8,x9]
I am trying to find the optimal allocation to minimise dispersion in fund value between the advisers. I am defining dispersion as the difference between the adviser with the highest fund value (z_max) and the lowest fund value (z_min).
The formulation for this problem is therefore:
minimize    z_max - z_min
subject to  \sum_i x_i * y_ij <= z_max    for all j
            \sum_i x_i * y_ij >= z_min    for all j
            \sum_j y_ij = 1               for all i
            \sum_i y_ij = n               for all j
            y_ij ∈ {0, 1}
where y_ij = 1 if we allocate customer C_i to adviser P_j, 0 otherwise.
The first constraint says that z_max has to be greater than or equal to each adviser's total fund value; since the objective function encourages smaller values of z_max, this means that z_max will equal the largest fund value. Similarly, the second constraint sets z_min equal to the smallest fund value. The third constraint says that each customer must be assigned to exactly one adviser. The fourth says that each adviser must have n customers assigned to him/her.
I have a working solution using the optimization package: PuLP that finds the optimal allocation.
import random
import pulp
import time
# DATA
n = 5 # number of customers for each financial adviser
c = 25 # number of customers
p = 5 # number of financial adviser
policy_values = random.sample(range(1, 1000000), c) # random generated policy values
# INDEXES
set_I = range(c)
set_J = range(p)
set_N = range(n)
x = {i: policy_values[i] for i in set_I} #customer policy values
y = {(i,j): random.randint(0, 1) for i in set_I for j in set_J} # allocation dummies
# DECISION VARIABLES
model = pulp.LpProblem("Allocation Model", pulp.LpMinimize)
y_sum = {}
y_vars = pulp.LpVariable.dicts('y_vars',((i,j) for i in set_I for j in set_J), lowBound=0, upBound = 1, cat=pulp.LpInteger)
z_max = pulp.LpVariable("Max Policy Value")
z_min = pulp.LpVariable("Min Policy Value")
for j in set_J:
    y_sum[j] = pulp.lpSum([y_vars[i,j] * x[i] for i in set_I])
# OBJECTIVE FUNCTION
model += z_max - z_min
# CONSTRAINTS
for j in set_J:
    model += pulp.lpSum([y_vars[i,j] for i in set_I]) == n
    model += y_sum[j] <= z_max
    model += y_sum[j] >= z_min
for i in set_I:
    model += pulp.lpSum([y_vars[i,j] for j in set_J]) == 1
# SOLVE MODEL
start = time.perf_counter()  # time.clock() was removed in Python 3.8
model.solve()
print('Optimised model status: ' + str(pulp.LpStatus[model.status]))
print('Time elapsed: ' + str(time.perf_counter() - start))
Note that I have implemented constraints 1 and 2 slightly differently, by introducing the additional expressions y_sum[j] so that an expression with a large number of nonzero elements is not duplicated.
The problem
The issue is that for larger values of n, p and c the model takes far too long to optimise. Is it possible to change how I've implemented the objective function/constraints to make the solution faster?
Try using a commercial solver like Gurobi with pulp. You should get a substantial decrease in solve time.
Also check your computer's memory: if any solver runs out of memory and starts paging to disk, the solve time will be very long.
You should monitor the time needed for each part of the program (model declaration and solving).
If the solving is too long, you can use a different solver as suggested above (here is some guidance on how to do it: https://coin-or.github.io/pulp/guides/how_to_configure_solvers.html).
If the model declaration is too long, you may have to optimise your code (for example, use the pulp-enabled functions such as pulp.lpSum rather than Python's built-in sum). You can also find some tricks here https://groups.google.com/g/pulp-or-discuss/c/p1N2fkVtYyM and here https://github.com/IBMDecisionOptimization/docplex-examples/blob/master/examples/mp/jupyter/efficient.ipynb
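For example, switching solvers in PuLP is a one-line change at solve time. A short sketch, assuming the commercial solvers are installed and licensed (the default CBC ships with pulp):
import pulp

# default bundled CBC, with the solver log enabled
model.solve(pulp.PULP_CBC_CMD(msg=True))

# commercial alternatives, if available:
# model.solve(pulp.GUROBI_CMD())
# model.solve(pulp.CPLEX_CMD())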
