optimization comparison between cvxpy and gurobi - python

A number of jobs must be assigned to a number of resources; each (resource, job) pair has a score (performance indicator) and a cost. The objective of the resource assignment problem (RAP) is to maximize the total assignment score within the budget. Constraints: each resource can handle at most one job, and each job, if it is filled, is done by exactly one resource. There is also a limited budget to spend.
I have tackled the problem in two ways: with CVXPY using the Gurobi solver, and with the gurobi package directly. My challenge is that I can't program it in a memory-efficient way with cvxpy. There are hundreds of constraints built one by one in Python loops! How can I improve the efficiency of my cvxpy code? For example, is there a better way to define dictionary variables in cvxpy, similar to gurobi?
ms is a dictionary of the format {('firstName lastName', 'job'): score_value}
cst is a dictionary of the format {('firstName lastName', 'job'): cost_value}
job is the set of jobs
res is the set of resources {'firstName lastName'}
G (or g in the gurobi implementation) is a dictionary with jobs as keys and values of 0 or 1 indicating whether that job is filled given the budget limit (0 if filled and 1 if not)
thanks
github link including codes and memory profiling comparison
gurobi implementation:
m = gp.Model("RAP")
assign = m.addVars(ms.keys(), vtype=GRB.BINARY, name="assign")
g = m.addVars(job, name="gap")
m.addConstrs((assign.sum("*", j) + g[j] == 1 for j in job), name="demand")
m.addConstrs((assign.sum(r, "*") <= 1 for r in res), name="supply")
m.addConstr(assign.prod(cst) <= budget, name="Budget")
job_gap_penalty = 101  # penalty for not filling a job
m.setObjective(assign.prod(ms) -job_gap_penalty*g.sum(), GRB.MAXIMIZE)
m.optimize()
cvxpy implementation:
X = {}
for a in ms.keys():
    X[a] = cp.Variable(boolean=True, name="assign")
G = {}
for g in job:
    G[g] = cp.Variable(boolean=True, name="gap")
constraints = []
# each job is either done by exactly one resource or left unfilled
for j in job:
    X_r = 0
    for r in res:
        X_r += X[r, j]
    constraints += [
        X_r + G[j] == 1
    ]
# each resource handles at most one job
for r in res:
    X_j = 0
    for j in job:
        X_j += X[r, j]
    constraints += [
        X_j <= 1
    ]
# budget cap
constraints += [
    np.array(list(cst.values())) @ np.array(list(X.values())) <= budget,
]
obj = cp.Maximize(np.array(list(ms.values())) @ np.array(list(X.values()))
                  - job_gap_penalty * cp.sum(list(G.values())))
prob = cp.Problem(obj, constraints)
prob.solve(solver=cp.GUROBI, verbose=False)
Here is the memory profiling comparison:
[Image: memory profiling for cvxpy]
[Image: memory profiling for gurobi]

Previously, I tried to solve this by defining dictionary variables as in gurobi, but since that is not available in cvxpy, the code was not efficient when scaling up. I have now solved it with matrix variables, converting between matrices and dictionaries, which is much faster!
assign_scores = np.array(list(ms.values())).reshape(len(res), len(job))
assign_cost = np.array(list(cst.values())).reshape(len(res), len(job))
# make a bool matrix variable with the shape of number of resources and jobs
x = cp.Variable(shape=(len(res), len(job)), boolean=True, name="assign")
# make a bool vector variable with the shape of number of jobs
g = cp.Variable(shape=(len(job), ), boolean=True, name="gap")
constraints = []
# each job can be assigned to at most one resource or remains unfilled due to budget cap
constraints += [cp.sum(x[:, j]) + g[j] == 1 for j in range(len(job))]
# each resource can be assigned to at most one job
constraints += [cp.sum(x[r, :]) <= 1 for r in range(len(res))]
# budget cap
constraints += [cp.sum(cp.multiply(assign_cost, x)) <= budget]
# penalty if a job is not filled
job_gap_penalty = 101
# objective is to maximize the performance score minus the penalty for unfilled jobs
obj = cp.Maximize(cp.sum(cp.multiply(assign_scores, x)) - job_gap_penalty * cp.sum(g))
prob = cp.Problem(obj, constraints)
prob.solve(solver=cp.GUROBI, verbose=True)
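One caveat with the reshape above: it relies on ms and cst iterating in resource-major order, with every (resource, job) pair present. Below is a minimal sketch, using only the standard library, that builds the rows by explicit index instead, so entry [r][j] always lines up with x[r, j]. The helper names dict_to_matrix and matrix_to_dict and the toy data are assumptions for illustration, not part of the original code; the nested list can be passed to np.array.

```python
def dict_to_matrix(values, res, job):
    """Build rows[r][j] = values[(resource_r, job_j)] in a fixed, explicit order.

    Raises KeyError if a (resource, job) pair is missing, instead of silently
    shifting entries the way a blind reshape would.
    """
    res_order = sorted(res)
    job_order = sorted(job)
    rows = [[values[(r, j)] for j in job_order] for r in res_order]
    return rows, res_order, job_order


def matrix_to_dict(rows, res_order, job_order):
    """Map a matrix (e.g. the solved assignment) back to {(resource, job): value}."""
    return {
        (r, j): rows[ri][ji]
        for ri, r in enumerate(res_order)
        for ji, j in enumerate(job_order)
    }


# Toy data (hypothetical), deliberately inserted in a non-row-major order.
ms = {
    ("Ann Lee", "welder"): 7,
    ("Bob Ray", "painter"): 5,
    ("Ann Lee", "painter"): 9,
    ("Bob Ray", "welder"): 4,
}
res = {"Ann Lee", "Bob Ray"}
job = {"painter", "welder"}

scores, res_order, job_order = dict_to_matrix(ms, res, job)
round_trip = matrix_to_dict(scores, res_order, job_order)
```

After solving, something like matrix_to_dict(x.value.tolist(), res_order, job_order) should give the assignment keyed the same way as ms, assuming x was built with the same row and column order.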

Related

PuLP takes too much time to run

I am trying to use PuLP to solve a route optimization problem, but it took around 1 hour to finish. I also monitored resource usage, and it seems to use only one processor. Is it possible to run it with multiple threads or processes? Or is there any other way to improve efficiency?
Here is some source code.
Variables & Objective function
# DECISION VARIABLE X
x_vars = LpVariable.dicts("route",[(i,j,k) for i in job_id for j in job_id for k in truck_id],lowBound=0,upBound=1,cat=LpBinary)
# DECISION VARIABLE Y
y_vars = LpVariable.dicts("work",[(j,k) for j in job_id for k in truck_id],lowBound=0,upBound=1,cat=LpBinary)
# OBJECTIVE FUNCTION
opt_model += lpSum(x_vars[(i,j,k)]*travel_cost[i+'-'+j+'-'+k] for i in job_id for j in job_id for k in truck_id)
Constraints
#CONSTRAINTS x[i,j,k] = 0 for all i!=k & j!=k
for k in truck_id:
    opt_model += lpSum(x_vars[(i,j,k)] for j in job_id for i in yard_id if i!=truck_yard[k]) == 0
#CONSTRAINTS
#2
t2 = time.time()
print(t2 - t1)
for j in job_id:
    for k in truck_id:
        opt_model += lpSum(x_vars[(i,j,k)] for i in job_id) == y_vars[(j,k)]
Solver
Solver_name = 'PULP_CBC_CMD'
solver = pl.getSolver(Solver_name)
results = opt_model.solve(solver)
At least in PuLP version 2.6.0, I can run the CBC solver with multiple threads by simply adding the threads parameter to getSolver.
Solver_name = 'PULP_CBC_CMD'
solver = pl.getSolver(Solver_name, threads=4)
results = opt_model.solve(solver)
https://coin-or.github.io/pulp/technical/solvers.html?highlight=getsolver#pulp.apis.PULP_CBC_CMD

Cannot reduce time taken to add constraints to pulp

I have the following code(python 3) for adding constraints to pulp(v 2.3). It needs to add up to 400000 constraints(100^2 S, 4 A).
def linearProgram(self, error = 1e-12):
    lp_problem = p.LpProblem('best-Vpi', p.LpMinimize)
    #create problem variables
    V = p.LpVariable.dicts("V",range(self.S), cat = "Continuous")
    #objective function
    for i in range(self.S):
        self.v.append(V[i])
    lp_problem += p.lpSum(self.v)
    #constraints
    for s in range(self.S):
        for a in range(self.A):
            pv = p.LpAffineExpression([(V[x],self.T[s][a][x]) for x in range(self.S)])
            constraint = p.lpSum([self.PR[s][a], self.gamma*pv ])
            lp_problem += V[s] >= constraint
    status = lp_problem.solve(p.PULP_CBC_CMD(msg = 0)) #solve
I can't seem to optimise it any further.
I even tried multiprocessing, but it gave a lot of errors:
def __addconstraints(self, S0, S1, lp_problem):
    for s in range(S0, S1):
        for a in range(self.A):
            pv= p.lpDot(self.T[s][a],self.v)
            lp_problem += self.v[s] >= p.lpSum([self.PR[s][a], self.gamma*pv])
..................
#in linearProgram
if self.S%4:
    s0, s1 = 0, self.S//3
else:
    s0, s1 = 0, self.S//4
incr = s1
processes = []
for x in range(4):
    proc = multiprocessing.Process(target=self.__addconstraints, args=(s0, s1, lp_problem))
    processes.append(proc)
    proc.start()
    s0 = s1
    s1 = min(s1+incr, self.S)
for proc in processes:
    proc.join()
# hard code for episodic? no need (due to initialization of mdp)
if self.mdptype=="episodic":
    for state in self.end:
        lp_problem += V[state] == 0
I am new to both pulp and multiprocessing, so I don't really have an idea what I'm doing :p
Any kind of help is appreciated.
In your code, you first build a p.LpAffineExpression, then you apply p.lpSum to it, and finally you do a third operation on the result: V[s] >= constraint. The last two operations may increase the time because the expression is copied each time.
From my experience, the fastest times I've gotten are doing the following:
# _vars_tup is a list of (key, value) pairs where each key is a variable and each value is a coefficient.
# it's like initializing a dictionary.
# CONSTANT is a python number (not a pulp variable)
model += p.LpAffineExpression(_vars_tup, constant=CONSTANT) >= 0
The idea is to reduce the number of times you do operations with p.LpAffineExpression objects, because a copy is done at each operation. So, build the list of variables and coefficients (_vars_tup) for ALL the variables present in the constraint and then at the last step create the p.LpAffineExpression and compare it with a constant.
An equivalent way would be (although I haven't tried it):
const = p.LpConstraint(e=p.LpAffineExpression(_vars_tup, constant=_constant), sense = p.LpConstraintGE, rhs = -CONSTANT)
model.addConstraint(const)
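For the question's constraint V[s] >= PR[s][a] + gamma * sum_x T[s][a][x] * V[x], the "collect all coefficients first" idea can be sketched in plain Python. This is a minimal sketch under assumptions: the helper name bellman_terms and the toy T, PR and gamma values are hypothetical, and the PuLP call is shown only as a comment, following the pattern from the answer above.

```python
def bellman_terms(s, a, T, PR, gamma):
    """Return (pairs, constant) such that
        sum(coeff * V[x] for x, coeff in pairs) + constant >= 0
    is equivalent to
        V[s] >= PR[s][a] + gamma * sum_x T[s][a][x] * V[x].
    """
    coeffs = {}
    for x, prob in enumerate(T[s][a]):
        if prob != 0.0:  # skip zero transitions so the expression stays sparse
            coeffs[x] = -gamma * prob
    # V[s] itself appears on the left-hand side with coefficient +1.
    coeffs[s] = coeffs.get(s, 0.0) + 1.0
    return sorted(coeffs.items()), -PR[s][a]


# Toy MDP (hypothetical): 2 states, 1 action.
T = [[[0.5, 0.5]], [[0.0, 1.0]]]
PR = [[1.0], [0.0]]
gamma = 0.9

pairs0, const0 = bellman_terms(0, 0, T, PR, gamma)
pairs1, const1 = bellman_terms(1, 0, T, PR, gamma)

# With PuLP, each result would be turned into a single expression, e.g.:
#   lp_problem += p.LpAffineExpression([(V[x], c) for x, c in pairs0],
#                                      constant=const0) >= 0
```

Each constraint is then created from one list of (variable, coefficient) pairs, so no intermediate expression objects are built and copied along the way.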

Solver CBC_MIXED_INTEGER_PROGRAMMING is not reaching the optimal result

Problem
I'm implementing a generalized assignment problem using LINGO (with which I have experience modeling mathematical problems) and OR-Tools, but the results were different.
Brief explanation of my assignment problem
I have a set of houses (called 'object' in the model) that need to be built. Each house needs a set of resources. To supply these resources, there are 3 suppliers. The resource cost varies by supplier.
The model should assign those suppliers to the houses in order to minimize the total cost of assignments.
Model
Parameters
resource_cost_per_supplier[i,j]: cost of resource i of supplier j.
resource_cost_factor_per_object[i,j]: matrix that signals the resources demanded by the objects (cost factor > 0). In addition, it contains the cost factor of resource i demanded by object j. This factor is calculated based on the duration of use of the resource during the construction of the object and also in others contractual factors.
supplier_budget_limit[j]: budget limit of supplier j. Each supplier has a budget limit that should not be exceeded (it's in the contract).
supplier_budget_tolerance_margin_limit[j]: budget tolerance margin of supplier j. For the model to work, I had to create this tolerance margin, which is applied to the supplier budget limit to create an acceptable range of supplier cost.
object_demand_attended_per_supplier[i,j]: binary matrix that signals if the supplier i has all the resources required by object j.
Variables
x[i,j]: binary variable that indicate if the supplier i will be (1) or not (0) assigned to the object j.
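supplier_cost[j]: variable that represents the cost of supplier j in the market share. Its value is given by: supplier_cost[j] = sum over resources i and objects k of resource_cost_per_supplier[i,j] * resource_cost_factor_per_object[i,k] * x[j,k]
total_cost: variable that represents the total cost of market share. Its value is given by: total_cost = sum over suppliers j of supplier_cost[j]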
Objective function
min Z = total_cost
Constraints
1 - Ensure that each object j will have only one supplier i.
2 - For each supplier j, the sum of the cost of all its assignments must be greater than or equal to its budget limit minus the tolerance margin.
3 - For each supplier j, the sum of the cost of all its assignments must be less than or equal to its budget limit plus the tolerance margin.
4 - Ensure that a supplier i will not be assigned to an object j if supplier i cannot provide all the resources of object j.
5 - Ensure that variable x is binary for every supplier i and object j.
Code
Or-tools (Python)
from __future__ import print_function
from ortools.linear_solver import pywraplp
import pandas as pd
import numpy
###### [START] parameters ######
num_objects = 252 #Number of objects
num_resources = 35 #Number of resources (not every object will use all resources. It depends of the type of the object and other things)
num_suppliers = 3 #Number of suppliers
resource_cost_per_supplier = pd.read_csv('https://raw.githubusercontent.com/hrassis/divisao-mercado/master/input_prototype/resource_cost_per_supplier.csv', index_col = 0).to_numpy()
resource_cost_factor_per_object = pd.read_csv('https://raw.githubusercontent.com/hrassis/divisao-mercado/master/input_prototype/resource_cost_factor_per_object.csv', index_col = 0).to_numpy()
object_demand_attended_per_supplier = pd.read_csv('https://raw.githubusercontent.com/hrassis/divisao-mercado/master/input_prototype/object_demand_attended_per_supplier.csv', index_col = 0).to_numpy()
supplier_budget_limit = pd.read_csv('https://raw.githubusercontent.com/hrassis/divisao-mercado/master/input_prototype/supplier_budget_limit.csv', index_col = 0)['budget_limit'].values
supplier_budget_tolerance_margin_limit = pd.read_csv('https://raw.githubusercontent.com/hrassis/divisao-mercado/master/input_prototype/supplier_budget_tolerance_margin_limit.csv', index_col = 0)['tolerance_margin'].values
###### [END] parameters ######
###### [START] variables ######
#Assignment variable
x = {}
supplier_cost = []
#Total cost of market share
total_cost = 0
###### [END] variables ######
def main():
    #Declare the solver
    solver = pywraplp.Solver('GeneralizedAssignmentProblem', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    #Assignment variable
    #x = {}
    #Ensure that the assignment variable is binary
    for i in range(num_suppliers):
        for j in range(num_objects):
            x[i, j] = solver.BoolVar('x[%i,%i]' % (i,j))
    #Assigning an expression to each supplier_cost element
    for j in range(num_suppliers):
        supplier_cost.append(solver.Sum(solver.Sum(resource_cost_per_supplier[i,j] * resource_cost_factor_per_object[i,k] * x[j,k] for k in range(num_objects)) for i in range(num_resources)))
    #Total cost of market share
    total_cost = solver.Sum(supplier_cost[j] for j in range(num_suppliers))
    #Objective function
    solver.Minimize(total_cost)
    ###### [START] constraints ######
    # 1 - Ensure that each object will have only one supplier
    for j in range(num_objects):
        solver.Add(solver.Sum([x[i,j] for i in range(num_suppliers)]) == 1)
    # 2 - For each supplier j, the sum of the cost of all its allocations must be greater than or equal to its budget limit minus the tolerance margin
    for j in range(num_suppliers):
        solver.Add(supplier_cost[j] >= total_cost * (supplier_budget_limit[j] - supplier_budget_tolerance_margin_limit[j]))
    # 3 - For each supplier j, the sum of the cost of all its allocations must be less than or equal to its budget limit plus the tolerance margin
    for j in range(num_suppliers):
        solver.Add(supplier_cost[j] <= total_cost * (supplier_budget_limit[j] + supplier_budget_tolerance_margin_limit[j]))
    # 4 - Ensure that a supplier i will not be assigned to an object j if supplier i cannot supply all resources demanded by object j
    for i in range(num_suppliers):
        for j in range(num_objects):
            solver.Add(x[i,j] - object_demand_attended_per_supplier[i,j] <= 0)
    ###### [END] constraints ######
    solution = solver.Solve()
    #Print the result
    if solution == pywraplp.Solver.OPTIMAL:
        print('------- Solution -------')
        print('Total cost =', round(total_cost.solution_value(), 2))
        for i in range(num_suppliers):
            print('-----')
            print('Supplier', i)
            print('-> cost:', round(supplier_cost[i].solution_value(), 2))
            print('-> cost percentage:', format(supplier_cost[i].solution_value()/total_cost.solution_value(),'.2%'))
            print('-> supplier budget limit:', format(supplier_budget_limit[i], '.0%'))
            print('-> supplier budget tolerance margin limit:', format(supplier_budget_tolerance_margin_limit[i], '.0%'))
            print('-> acceptable range: {0} <= cost percentage <= {1}'.format(format(supplier_budget_limit[i] - supplier_budget_tolerance_margin_limit[i], '.0%'), format(supplier_budget_limit[i] + supplier_budget_tolerance_margin_limit[i], '.0%')))
            # print('-> objects: {0}'.format(i))
    else:
        print('The problem does not have an optimal solution.')
    #Generate a result to consult
    assignment_result = pd.DataFrame(columns=['object','supplier','cost','assigned'])
    for i in range(num_suppliers):
        for j in range(num_objects):
            assignment_result = assignment_result.append({'object': j, 'supplier': i, 'cost': get_object_cost(j, i), 'assigned': x[i, j].solution_value()}, ignore_index=True)
    assignment_result.to_excel('assignment_result.xlsx')
def get_object_cost(object_index, supplier_index):
    object_cost = 0.0
    for i in range(num_resources):
        object_cost = object_cost + resource_cost_factor_per_object[i,object_index] * resource_cost_per_supplier[i,supplier_index]
    return object_cost
#Run main
main()
LINGO
model:
title: LINGO;
data:
!Number of objects;
num_objects = @OLE('LINGO_input.xlsx',num_objects);
!Number of resources (not every object will use all resources. It depends of the type of the object and other things);
num_resources = @OLE('LINGO_input.xlsx',num_resources);
!Number of suppliers;
num_suppliers = @OLE('LINGO_input.xlsx',num_suppliers);
enddata
sets:
suppliers/1..num_suppliers/:supplier_budget_limit,supplier_tolerance_margin_limit,supplier_cost;
resources/1..num_resources/:;
objects/1..num_objects/:;
resources_suppliers(resources,suppliers):resource_cost_per_supplier;
resources_objects(resources,objects):resource_cost_factor_per_object;
suppliers_objects(suppliers,objects):x,object_demand_attended_supplier;
endsets
data:
resource_cost_per_supplier = @OLE('LINGO_input.xlsx',resource_cost_per_supplier[cost]);
resource_cost_factor_per_object = @OLE('LINGO_input.xlsx',resource_cost_factor_per_object[cost_factor]);
supplier_budget_limit = @OLE('LINGO_input.xlsx',supplier_budget_limit[budget_limit_percentage]);
supplier_tolerance_margin_limit = @OLE('LINGO_input.xlsx',supplier_budget_tolerance_margin_limit[budget_tolerance_percentage]);
object_demand_attended_supplier = @OLE('LINGO_input.xlsx',object_demand_attended_per_supplier[supply_all_resources]);
enddata
!The array 'supplier_cost' was created to store the total cost of each supplier;
@FOR(suppliers(j):supplier_cost(j)= @SUM(resources(i):@SUM(objects(k):resource_cost_per_supplier(i,j)*resource_cost_factor_per_object(i,k)*x(j,k))));
!Total cost of market share;
total_cost = @SUM(suppliers(i):supplier_cost(i));
!Objective function;
min = total_cost;
!Ensure that each object will have only one supplier;
@FOR(objects(j):@SUM(suppliers(i):x(i,j))=1);
!For each supplier j, the sum of the cost of all your assignments must be greater than or equal to your budget limit minus the tolerance margin;
@FOR(suppliers(j):supplier_cost(j) >= total_cost*(supplier_budget_limit(j)-supplier_tolerance_margin_limit(j)));
!For each supplier j, the sum of the cost of all your assignments must be less than or equal to your budget limit plus the tolerance margin;
@FOR(suppliers(j):supplier_cost(j) <= total_cost*(supplier_budget_limit(j)+supplier_tolerance_margin_limit(j)));
!Ensure that a supplier j will not assigned to an object k if the supplier j can not supply all resources demanded by object k;
@FOR(suppliers(j):@FOR(objects(k):x(j,k)-object_demand_attended_supplier(j,k)<=0));
!Ensure that the assignment variable is binary;
@FOR(suppliers(i):@FOR(objects(j):@BIN(x(i,j))));
data:
@OLE('LINGO_input.xlsx',output[assigned])=x;
@OLE('LINGO_input.xlsx',objective_function_value)=total_cost;
@OLE('LINGO_input.xlsx',supplier_cost)=supplier_cost;
enddata
Results
The picture below shows the comparative result between OR-Tools and LINGO. I emphasize that the data used by the two implementations was exactly the same, and I checked all the data several times.
Note that there is a difference of 1.876,20 between the two implementations. LINGO, which uses a branch-and-bound algorithm, found a better solution than OR-Tools. The difference is caused by the assignment inconsistencies shown below.
Regarding the processing time of the algorithms, LINGO took around 14 min and OR-Tools less than 1 min.
All the data used in the two implementations are in this repository: https://github.com/hrassis/divisao-mercado. Data used by LINGO is in folder input_lingo and used by Or-Tools is in the folder input_prototype. In addition I uploaded the validation report.
After "cheating" a bit:
solver.Add(x[1, 177] == 1)
solver.Add(x[0, 186] == 1)
solver.Add(x[0, 205] == 1)
solver.Add(x[2, 206] == 1)
solver.Add(x[2, 217] == 1)
solver.Add(x[2, 66] == 1)
solver.Add(x[2, 115] == 1)
solver.Add(x[1, 237] == 1)
The solver returns a better objective, so I believe there is a bug either on the CBC binary or the OR-Tools interface to it (sounds like the former).
Can you try using the CP-SAT solver?
There have been quite a few problems with CBC
https://github.com/google/or-tools/issues/1450
https://github.com/google/or-tools/issues/1525
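When two solvers disagree like this, it can help to recompute the objective and the share constraints for each reported assignment outside of both tools. The following is a minimal sketch, assuming a hypothetical helper evaluate_assignment and toy numbers (not the data from the repository); object_cost[i][j] plays the role of get_object_cost(j, i) in the question.

```python
def evaluate_assignment(assign, object_cost, share, tol, allowed=None, eps=1e-9):
    """Recompute total cost and check constraints 2-4 for one assignment.

    assign[j]         -> index of the supplier chosen for object j
    object_cost[i][j] -> cost of object j when built by supplier i
    share[i], tol[i]  -> budget share and tolerance margin of supplier i
    allowed[i][j]     -> truthy if supplier i can serve object j (optional)
    """
    n_suppliers = len(object_cost)
    supplier_cost = [0.0] * n_suppliers
    eligible = True
    for j, i in enumerate(assign):
        supplier_cost[i] += object_cost[i][j]
        if allowed is not None and not allowed[i][j]:
            eligible = False
    total = sum(supplier_cost)
    # Each supplier's cost must stay within (share +/- tol) * total cost.
    in_range = all(
        (share[i] - tol[i]) * total - eps
        <= supplier_cost[i]
        <= (share[i] + tol[i]) * total + eps
        for i in range(n_suppliers)
    )
    return total, supplier_cost, eligible and in_range


# Toy instance (hypothetical): 2 suppliers, 4 objects.
object_cost = [[10, 20, 30, 40], [12, 18, 33, 35]]
share = [0.5, 0.5]
tol = [0.1, 0.1]

total_a, cost_a, ok_a = evaluate_assignment([0, 1, 0, 1], object_cost, share, tol)
total_b, cost_b, ok_b = evaluate_assignment([0, 1, 1, 1], object_cost, share, tol)
```

If the cheaper assignment passes such a check, it is a feasible solution that the other solver missed; if it fails, the two models or their tolerances probably differ.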

Select the same item several times in the knapsack problem [pulp]

I'm taking a Coursera discrete optimization course, in which a tool called MiniZinc is used to solve the problems.
I want to translate the class examples to Python, starting with this one:
I'm using this example code to reproduce the results:
v = {'hammer':6, 'wrench':10, 'screwdriver':8, 'towel':40}
w = {'hammer':13, 'wrench':21, 'screwdriver':17, 'towel':100}
q = {'hammer':1000, 'wrench':400, 'screwdriver':500, 'towel':150}
limit = 1000
items = list(sorted(v.keys()))
# Create model
m = LpProblem("Knapsack", LpMaximize)
# Variables
x = LpVariable.dicts('x', items, lowBound=0, upBound=1, cat=LpInteger)
# Objective
m += sum(v[i]*x[i] for i in items)
# Constraint
m += sum(w[i]*x[i] for i in items) <= limit
# Optimize
m.solve()
# Print the status of the solved LP
print("Status = %s" % LpStatus[m.status])
# Print the value of the variables at the optimum
for i in items:
    print("%s = %f" % (x[i].name, x[i].varValue))
# Print the value of the objective
print("Objective = %f" % value(m.objective))
But this gives a wrong answer, since only one item of each kind is taken.
How can I add the amount available for each item (dict q) to the constraints?
You need to make two very small changes to your code. First, you need to remove the upper bound you have set on your x variables. At the moment you have binary variables x[i], which can only be one or zero.
Second, you need to add constraints that effectively set a custom upper bound for each of the items. Working code and the resulting solution are below. As you can see, multiple wrenches (the highest v/w ratio) are chosen, with a single hammer to fill up the small amount of space left.
from pulp import *
v = {'hammer':6, 'wrench':10, 'screwdriver':8, 'towel':40}
w = {'hammer':13, 'wrench':21, 'screwdriver':17, 'towel':100}
q = {'hammer':1000, 'wrench':400, 'screwdriver':500, 'towel':150}
limit = 1000
items = list(sorted(v.keys()))
# Create model
m = LpProblem("Knapsack", LpMaximize)
# Variables
x = LpVariable.dicts('x', items, lowBound=0, cat=LpInteger)
# Objective
m += sum(v[i]*x[i] for i in items)
# Constraint
m += sum(w[i]*x[i] for i in items) <= limit
# Quantity constraint for each item:
for i in items:
    m += x[i] <= q[i]
# Optimize
m.solve()
# Print the status of the solved LP
print("Status = %s" % LpStatus[m.status])
# Print the value of the variables at the optimum
for i in items:
    print("%s = %f" % (x[i].name, x[i].varValue))
# Print the value of the objective
print("Objective = %f" % value(m.objective))
print("Total weight = %f" % sum([x[i].varValue*w[i] for i in items]))
Which returns:
Status = Optimal
x_hammer = 1.000000
x_screwdriver = 0.000000
x_towel = 0.000000
x_wrench = 47.000000
Objective = 476.000000
Total weight = 1000.000000
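As a solver-independent cross-check of this result, the same bounded knapsack can be solved with dynamic programming. This swaps in a different technique (a DP with binary splitting of the quantities) for the integer program above; the function name bounded_knapsack is an assumption for illustration.

```python
def bounded_knapsack(values, weights, quantities, limit):
    """Maximum total value with total weight <= limit and at most
    quantities[item] copies of each item (integer weights assumed)."""
    best = [0] * (limit + 1)  # best[c] = best value achievable with capacity c
    for item in values:
        v, w = values[item], weights[item]
        remaining = min(quantities[item], limit // w)
        bundle = 1
        # Binary splitting: bundles of 1, 2, 4, ... copies (plus a remainder)
        # can represent any count from 0 to `remaining`; each bundle is then
        # handled as an ordinary 0/1 item.
        while remaining > 0:
            take = min(bundle, remaining)
            bv, bw = take * v, take * w
            for cap in range(limit, bw - 1, -1):
                best[cap] = max(best[cap], best[cap - bw] + bv)
            remaining -= take
            bundle *= 2
    return best[limit]


v = {'hammer': 6, 'wrench': 10, 'screwdriver': 8, 'towel': 40}
w = {'hammer': 13, 'wrench': 21, 'screwdriver': 17, 'towel': 100}
q = {'hammer': 1000, 'wrench': 400, 'screwdriver': 500, 'towel': 150}
limit = 1000

with_quantities = bounded_knapsack(v, w, q, limit)
one_of_each = bounded_knapsack(v, w, {i: 1 for i in v}, limit)
print(with_quantities, one_of_each)  # 476 64
```

The first number matches the objective reported by the solver, and the second shows what the original model with upBound=1 was limited to: one of each item, for a value of 64.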

Can I parallelize this nested for loop? (Python 3.5)

I recently found out that the bottleneck of my code is the following block. N is of order 10,000, and L (10,000)^2. RQ_func is just a function that takes indices (tuples) and returns float V and dictionary sp_dist of {index : probability} format.
Is there a way I can parallelize this code? I have access to cluster computing from which I can use up to 20 cores at a time and would like to use the option.
R = np.empty((L,))
Q = scipy.sparse.lil_matrix((L, N))
traverser = 0  # Populate R and Q by traversing the array
for s_index in state_indices:
    for a_index in action_indices:
        V, sp_dist = RQ_func(s_index, a_index)
        R[traverser] = V
        for sp_index, prob in sp_dist.items():
            Q[traverser, sp_index] = prob
        traverser += 1
