I have a list of asset returns.
Re=[0.5346,0.5064,1.0838,0.7665,0.9463,0.7047,0.6735,0.5294,0.7697,0.7299,0.99,1.0856,0.9052,0.3827,0.3804,1.0271,0.9431,0.538,0.9313,0.9423]
I want to maximize the following objective:
$Re(w)=\sum_{i=1}^{n}w_{i}Re_{i}$
and constraints are:
(i) Full utilization of capital:
$\sum_{k=1}^{n}w_{k}v_{k}=1$
where $v_{k}$ is a binary variable that is 1 if asset k is held and 0 otherwise.
(ii) Cardinality constraint:
$\sum_{k=1}^{n}v_{k}=q$
where q ∈ [q1, q2] is the desired number of assets in the portfolio.
(iii) No short selling constraint:
$w_{k}\ge 0$ for k=1,2,...,n.
(iv) Lower and upper bounds on the proportion of capital that can be invested in a single asset. To avoid very small investments in several assets while maintaining sufficient diversification of the funds, the bounds on investment in individual assets are specified as:
$l_{k}v_{k} \le w_{k} \le u_{k}v_{k}$ for k=1,2,...,n.
In this example, it is assumed that the desired number of assets in the portfolio specified by investors is between 7 and 10, i.e.
7 ≤ q ≤ 10. The lower and upper bounds on the proportion invested in each asset k are set as l_{k} = 0.01 and u_{k} = 0.3, respectively.
I tried to solve this problem as follows:
import pyomo.environ as pyo
from pyomo.opt import SolverFactory
from pyomo.environ import Var, NonNegativeReals
# Defining the model
model=pyo.ConcreteModel()
# set
model.i=pyo.Set(initialize=['a1','a2','a3','a4','a5','a6','a7','a8','a9','a10','a11','a12','a13','a14','a15','a16','a17','a18','a19','a20'])
#parameters
model.Re=pyo.Param(model.i, initialize={'a1':0.5346,'a2':0.5064,'a3':1.0838,'a4':0.7665,'a5':0.9463,'a6':0.7047,'a7':0.6735,'a8':0.5294,'a9':0.7697,'a10':0.7299,'a11':0.99,'a12':1.0856,'a13':0.9052,'a14':0.3827,'a15':0.3804,'a16':1.0271,'a17':0.9431,'a18':0.538,'a19':0.9313,'a20':0.9423})
re=model.Re
# Decision variable
model.w=pyo.Var(model.i, within=NonNegativeReals)
w=model.w
model.v=pyo.Var(model.i, domain=pyo.Binary)
v=model.v
model.q=pyo.Var(model.i, domain=pyo.Integers, bounds=(7,10))
q=model.q
# Objective Function
def Objective_rule(model,i):
    return sum(re[i]*w[i] for i in model.i)
model.Obj=pyo.Objective(rule=Objective_rule, sense=pyo.maximize)
# Constraints
def Constraint1(model,i):
    return sum(w[i]*v[i] for i in model.i)==1
model.Const1=pyo.Constraint(model.i, rule=Constraint1)
def Constraint2(model,i):
    return sum(v[i] for i in model.i)==7
model.Const2=pyo.Constraint(model.i, rule=Constraint2)
def Constraint3(model,i):
    return (.01<=w[i]<=.3 for i in model.i)
model.Const3=pyo.Constraint(model.i, rule=Constraint3)
#results
Solver=SolverFactory('cplex_direct')
results=Solver.solve(model)
print(results)
But this code doesn't work! I have tried hard to fix it, but unfortunately I could not. Can anyone help me?
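For reference, the main problems in the attempt above are that the objective and the summation constraints are indexed over model.i even though they are scalar, Constraint3 returns a generator rather than an expression, q is declared as an indexed variable, and w[i]*v[i] makes the budget constraint bilinear. One possible repair is sketched below (untested; it assumes the data and bounds stated above, replaces the bilinear budget constraint with its linear equivalent, and makes q a scalar integer variable):

import pyomo.environ as pyo

# Sketch of a corrected formulation (assumptions: asset names/returns as in the
# question, l_k = 0.01, u_k = 0.3, 7 <= q <= 10). Not tested against a solver.
Re = [0.5346,0.5064,1.0838,0.7665,0.9463,0.7047,0.6735,0.5294,0.7697,0.7299,
      0.99,1.0856,0.9052,0.3827,0.3804,1.0271,0.9431,0.538,0.9313,0.9423]

model = pyo.ConcreteModel()
model.i = pyo.Set(initialize=[f'a{k}' for k in range(1, 21)])
model.Re = pyo.Param(model.i, initialize={f'a{k+1}': Re[k] for k in range(20)})

model.w = pyo.Var(model.i, within=pyo.NonNegativeReals)  # portfolio weights
model.v = pyo.Var(model.i, domain=pyo.Binary)            # 1 if asset k is held
model.q = pyo.Var(domain=pyo.Integers, bounds=(7, 10))   # scalar, not indexed

# Objective: maximize expected portfolio return
model.Obj = pyo.Objective(expr=sum(model.Re[i]*model.w[i] for i in model.i),
                          sense=pyo.maximize)

# (i) Full capital utilization. Because the bound constraints below force w[i] = 0
# whenever v[i] = 0, sum(w) == 1 is equivalent to sum(w*v) == 1 and stays linear.
model.budget = pyo.Constraint(expr=sum(model.w[i] for i in model.i) == 1)

# (ii) Cardinality: the number of held assets equals q, with 7 <= q <= 10 from its bounds.
model.cardinality = pyo.Constraint(expr=sum(model.v[i] for i in model.i) == model.q)

# (iv) Lower/upper bounds linked to the binaries: l_k*v_k <= w_k <= u_k*v_k.
model.lower = pyo.Constraint(model.i, rule=lambda m, i: 0.01*m.v[i] <= m.w[i])
model.upper = pyo.Constraint(model.i, rule=lambda m, i: m.w[i] <= 0.3*m.v[i])

solver = pyo.SolverFactory('cplex_direct')  # any MILP solver should work, e.g. 'glpk'
results = solver.solve(model)
model.display()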
Suppose I have 2 pulp variables: x1 and x2. These two variables represent water temperatures inside two different water pipes. At a certain point, these two pipes merge into a single pipe and the two water flows mix together. The water temperature after the mixing is equal to the average of the two temperatures because the flow rates are the same.
If the flow of one water pipe is zero, there is no mixing and the output temperature is equal to the temperature of the non-zero flow water temperature.
This final water temperature is then used in the objective function of the pulp problem to calculate some cost.
This means that I have to calculate the average of these two variables but each variable has to be considered in the calculation of the average only if it is greater than 0.
Here is an example you can reproduce to calculate the average without the condition of >0.
from pulp import *
# Define the variables
x1 = LpVariable("x1", 0, None)
x2 = LpVariable("x2", 0, None)
avg = LpVariable("avg",0,None)
# Define the problem
prob = LpProblem("average_problem", LpMinimize)
# Define the objective function
prob += 0, "objective function"
# Calculate avg value
prob += avg==(x1+x2)/2, "average_constraint"
# Set x1 and x2 value just as example
prob += x1==100
prob += x2==50
cost_of_engine = (105-avg)*3/0.2
total_production_cost = lpSum(cost_of_engine+10)
prob.setObjective(total_production_cost)
# Solve the problem
prob.solve()
This example works if x1 and x2 are both higher than zero.
However, if for instance x1=0 and x2=100, then avg=50.
What I need, instead, is to discard the x1 variable from the calculation of the average so that avg=100.
This is clearly a non-linear problem, because the denominator in the calculation of the average is dynamic and depends on the values of the variables x1 and x2.
Do you have any idea how to solve this problem? Maybe using the Big M technique?
There are several approaches that might be reasonable, depending on characteristics of your problem that are not described. As noted, if you are trying to minimize an average in the objective and both the numerator and the denominator are variables, the resulting expression is non-linear and you'll need to consider a substitute objective or move outside of pulp and look at non-linear formulations and non-linear solvers.
Idea #1: Use a penalty for the number of items used.
You can introduce (and properly constrain with a big-M constraint) a new binary variable y_i ∈ {0, 1} that is 1 if x_i is used, choose a sensible weight w, and use an objective like:
obj = ∑ x_i + w * ∑ y_i ; minimized
which might work OK if the x_i are in a range such that a sensible w can be chosen.
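A minimal PuLP sketch of this idea (the variable names, bounds, big-M value, and weight are illustrative assumptions, and any problem-specific constraints are omitted):

from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum

n, M, w = 3, 1000, 50              # number of items, big-M, penalty weight (assumed)
prob = LpProblem("penalty_example", LpMinimize)
x = [LpVariable(f"x{i}", 0, M) for i in range(n)]
y = [LpVariable(f"y{i}", cat=LpBinary) for i in range(n)]

for i in range(n):
    prob += x[i] <= M * y[i]       # y_i is forced to 1 whenever x_i > 0

prob += lpSum(x) + w * lpSum(y)    # objective: totals plus a penalty on items used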
Or...
Idea #2: Use a mini-max or maxi-min constraint
If you are seeking an aggregate total of the x_i used while minimizing the number that are used, and there is some "trade space" in the model, you can set the objective to "maximize the minimum used x_i value", which might work, again depending on the other characteristics of your model. This should have a similar effect by encouraging the model to pick larger x_i to reach the target value. In pseudocode:
Introduce y and z ...
y_i ∈ {0, 1}
x_i <= y_i * M
z ∈ Reals
z <= x_i + (1-y_i)*M # constrain z to the lowest x_i used...
obj = max(z)
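A rough PuLP translation of that pseudocode might look like this (again, the names, bounds, and big-M value are assumptions, and problem-specific constraints are omitted):

from pulp import LpProblem, LpVariable, LpMaximize, LpBinary

n, M = 3, 1000
prob = LpProblem("maximin_example", LpMaximize)
x = [LpVariable(f"x{i}", 0, M) for i in range(n)]
y = [LpVariable(f"y{i}", cat=LpBinary) for i in range(n)]
z = LpVariable("z")                        # lower bound on the smallest *used* x_i

for i in range(n):
    prob += x[i] <= M * y[i]               # x_i can be positive only if it is "used"
    prob += z <= x[i] + (1 - y[i]) * M     # z is only constrained by the used x_i

prob += z                                  # objective: maximize the minimum used value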
I'm looking to write code that lets me set risk budget constraints on individual positions in a portfolio, i.e. each position contributes a set amount of risk to the portfolio. I want to do it specifically in CVXPY, as I have noticed that SciPy sometimes breaks the constraints.
I have the code below. I was wondering if you could give me some direction, as I have encountered the cvxpy "The objective is not DCP" error and I'm not sure how to correct it.
Please also let me know if I should rephrase my question to make it clearer.
import cvxpy as cp
import numpy as np
# covmat is a (31,31) numpy array, covariance matrix calculated from monthly returns
risk_budget = np.repeat(1/31, 31).reshape(-1,1)
def risk_budget_objective(risk_budget, covmat):
    n = covmat.shape[0]
    # set equal weight
    equal_wts = np.repeat(1 / n, n)
    # weights vertical
    wts = cp.Variable((n, 1))
    constraints = [cp.sum(wts) == 1.0]  # weight constraints
    port_variance = cp.square(cp.quad_form(wts, covmat))  # portfolio variance, not volatility
    mrc = covmat @ wts * 12  # vector: marginal risk contribution, annualised
    risk_contrib = cp.multiply(mrc, wts) / port_variance  # calculate risk contribution
    mean_square_diff = cp.sum(cp.square(risk_contrib - risk_budget))  # squared difference and summed
    prob = cp.Problem(cp.Minimize(mean_square_diff), constraints)  # minimise squared difference
    prob.solve(solver=cp.SCS)
    if prob.status not in ["infeasible", "unbounded"]:
        solution = wts.value
        return solution
    else:
        print('Problem not feasible... resorting to equal weight...')
        return equal_wts
Looking into the objects, it seems that the problem lies in the following code:
risk_contrib = cp.multiply(mrc, wts) / port_variance
since its curvature is reported as "UNKNOWN" rather than convex.
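That is expected: dividing an expression in the variable by another expression in the variable (and squaring quad_form) produces a curvature CVXPY cannot classify, so the problem is rejected as non-DCP. One commonly used convex formulation of risk budgeting replaces the squared-difference objective with a log-barrier form whose minimizer has risk contributions proportional to the budgets; the weights are then recovered by normalization. A sketch follows (this is not the original objective; the function name and the random test data are illustrative assumptions):

import cvxpy as cp
import numpy as np

def risk_budget_weights(covmat, risk_budget):
    """Convex risk-budgeting sketch: minimize 0.5*w'Σw - b'log(w), then normalize."""
    n = covmat.shape[0]
    w = cp.Variable(n, pos=True)
    objective = 0.5 * cp.quad_form(w, covmat) - risk_budget @ cp.log(w)
    prob = cp.Problem(cp.Minimize(objective))
    prob.solve(solver=cp.SCS)
    return w.value / np.sum(w.value)   # rescale so the weights sum to 1

# Illustrative random data only (a PSD covariance matrix and equal risk budgets)
rng = np.random.default_rng(0)
A = rng.normal(size=(60, 31))
covmat = A.T @ A / 60
risk_budget = np.repeat(1 / 31, 31)
wts = risk_budget_weights(covmat, risk_budget)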
I am trying to allocate customers Ci to financial advisers Pj. Each customer has a policy value xi. I'm assuming that the number of customers (n) allocated to each adviser is the same, and that the same customer cannot be assigned to multiple advisers. Therefore each adviser will have an allocation of policy values like so:
P1=[x1,x2,x3] , P2=[x4,x5,x6], P3=[x7,x8,x9]
I am trying to find the optimal allocation to minimise dispersion in fund value between the advisers. I am defining dispersion as the difference between the adviser with the highest fund value (z_max) and the lowest fund value (z_min).
The formulation for this problem is therefore:
minimize $z_{max}-z_{min}$
subject to
$z_{max} \ge \sum_{i} x_{i} y_{ij}$ for all j
$z_{min} \le \sum_{i} x_{i} y_{ij}$ for all j
$\sum_{j} y_{ij} = 1$ for all i
$\sum_{i} y_{ij} = n$ for all j
where $y_{ij}=1$ if we allocate customer $C_{i}$ to adviser $P_{j}$, and 0 otherwise.
The first constraint says that $z_{max}$ has to be greater than or equal to each adviser's total fund value; since the objective function encourages smaller values of $z_{max}$, this means that $z_{max}$ will equal the largest fund value. Similarly, the second constraint sets $z_{min}$ equal to the smallest fund value. The third constraint says that each customer must be assigned to exactly one adviser. The fourth says that each adviser must have n customers assigned to him/her.
I have a working solution using the optimization package: PuLP that finds the optimal allocation.
import random
import pulp
import time
# DATA
n = 5 # number of customers for each financial adviser
c = 25 # number of customers
p = 5 # number of financial adviser
policy_values = random.sample(range(1, 1000000), c) # random generated policy values
# INDEXES
set_I = range(c)
set_J = range(p)
set_N = range(n)
x = {i: policy_values[i] for i in set_I} #customer policy values
y = {(i,j): random.randint(0, 1) for i in set_I for j in set_J} # allocation dummies
# DECISION VARIABLES
model = pulp.LpProblem("Allocation Model", pulp.LpMinimize)
y_sum = {}
y_vars = pulp.LpVariable.dicts('y_vars',((i,j) for i in set_I for j in set_J), lowBound=0, upBound = 1, cat=pulp.LpInteger)
z_max = pulp.LpVariable("Max Policy Value")
z_min = pulp.LpVariable("Min Policy Value")
for j in set_J:
    y_sum[j] = pulp.lpSum([y_vars[i,j] * x[i] for i in set_I])
# OBJECTIVE FUNCTION
model += z_max - z_min
# CONSTRAINTS
for j in set_J:
    model += pulp.lpSum([y_vars[i,j] for i in set_I]) == n
    model += y_sum[j] <= z_max
    model += y_sum[j] >= z_min
for i in set_I:
    model += pulp.lpSum([y_vars[i,j] for j in set_J]) == 1
# SOLVE MODEL
start = time.perf_counter()
model.solve()
print('Optimised model status: '+str(pulp.LpStatus[model.status]))
print('Time elapsed: '+str(time.perf_counter() - start))
Note that I have implemented constraints 1 and 2 slightly differently by including an additional variable y_sum, to avoid duplicating an expression with a large number of nonzero elements.
The problem
The issue is that for larger values of n,p and c the model takes far too long to optimise. Is it possible to make any changes to how I've implemented the objective function/constraints to make the solution faster?
Try using a commercial solver like Gurobi with pulp. You should get a substantial decrease in solve time.
Also check your computers memory, if any solver runs out of memory and starts paging to disk the solve time will be very long.
You should monitor the time needed for each part of the program (model declaration and solving).
If the solving takes too long, you can use a different solver as suggested above (here is some guidance on how to do it: https://coin-or.github.io/pulp/guides/how_to_configure_solvers.html).
If the model declaration takes too long, you may have to optimise your code (for example, use the pulp-enabled functions such as pulp.lpSum rather than Python's sum). You can also find some tricks here https://groups.google.com/g/pulp-or-discuss/c/p1N2fkVtYyM and here https://github.com/IBMDecisionOptimization/docplex-examples/blob/master/examples/mp/jupyter/efficient.ipynb
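As a sketch of switching solvers in PuLP (assuming the commercial solver is installed and licensed; the CBC keyword arguments shown depend on your PuLP version):

import pulp

# Use Gurobi through its command-line interface.
solver = pulp.GUROBI_CMD(msg=True)

# Or keep the bundled CBC solver but cap the runtime / accept a small optimality gap:
# solver = pulp.PULP_CBC_CMD(msg=True, timeLimit=300, gapRel=0.01)

model.solve(solver)   # `model` is the LpProblem built above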
I am trying to solve a linear problem in PuLP that minimizes a cost function. The cost function itself depends on its own maximum value: for example, I have a daily cost, and I am trying to minimize the monthly cost, which is the sum of the daily costs plus the maximum daily cost in the month. I don't think I'm capturing the maximum value in the final solution, and I'm not sure how to troubleshoot this. The basic outline of the code is below:
from pulp import LpProblem, LpVariable, LpMinimize

# Initialize the problem to be solved
prob = LpProblem("monthly_cost", LpMinimize)
# The number of time steps
# price is a pre-existing array of variable prices
tmax = len(price)
# Time range
time = list(range(tmax))
# Price reduction at every time step
d = LpVariable.dict("d", (time), 0, 5)
# Price increase at every time step
c = LpVariable.dict("c", (time), 0, 5)
# Define revenues = price increase - price reduction + initial price
revenue = ([(c[t] - d[t] + price[t]) for t in time])
# Find maximum revenue
max_revenue = max(revenue)
# Initialize the problem
prob += sum([revenue[t]*0.0245 for t in time]) + max_revenue
# Solve the problem
prob.solve()
The variable max_revenue always equals c_0 - d_0 + price[0] even though price[0] is not the maximum of price and c_0 and d_0 both equal 0. Does anyone know how to ensure the dynamic maximum is being inserted into the problem? Thanks!
I don't think you can do the following in PuLP or any other standard LP solvers:
max_revenue = max(revenue)
This is because determining the maximum would require the solver to evaluate the revenue expressions, so in this case you cannot extract a standard LP model. Such models are in fact non-smooth.
In such situations, you can easily reformulate the problem as follows:
max_revenue >= c[t] - d[t] + price[t]   for every t in time
This works because max_revenue is then at least as large as every revenue term, and since the objective is minimized it is pushed down to the actual maximum. This in turn lets a standard LP model be extracted from the equations. Hence, the original problem formulation gets extended with additional inequality constraints (the other constraints and the objective stay the same as before). So it could look something like this (word of caution: I have not tested this):
# Define variable
max_revenue = LpVariable("Max Revenue", 0)
# Define other variables, revenues, etc.
# Add the inequality constraints
for item in revenue:
    prob += max_revenue >= item
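For completeness, a sketch of how the objective from the question would then pick up the new variable (continuing the snippet above, untested like the rest):

# Objective: scaled daily revenues plus the (now properly modelled) maximum revenue.
# Because the problem is a minimization, max_revenue settles at the largest revenue term.
prob += sum([revenue[t] * 0.0245 for t in time]) + max_revenue
prob.solve()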
I would also suggest that you have a look at scipy.optimize.linprog. PuLP writes the model to an intermediary file and then calls the installed solver to solve the model. In scipy.optimize.linprog, on the other hand, it's all done in Python and should be faster. However, if your problem cannot be solved using the simplex algorithm, or you require other professional solvers (e.g. CPLEX, Gurobi), then PuLP is a good choice.
Also, see the discussion on Data Fitting (page 19) in Introduction to Linear Optimisation by Bertsimas.
Hope this helps. Cheers.
I am looking for a Python function (or to write my own if there is not one) to get the t-statistic in order to use in a confidence interval calculation.
I have found tables that give answers for various probabilities / degrees of freedom like this one, but I would like to be able to calculate this for any given probability. For anyone not already familiar with this: degrees of freedom is the number of data points (n) in your sample minus 1, and the numbers in the column headings at the top are probabilities (p), e.g. a 2-tailed significance level of 0.05 is used if you are looking up the t-score for the calculation that, with 95% confidence, if you repeated n tests the result would fall within the mean +/- the confidence interval.
I have looked into using various functions within scipy.stats, but none that I can see seem to allow for the simple inputs I described above.
Excel has a simple implementation of this, e.g. to get the t-score for a sample of 1000 where I need to be 95% confident, I would use =TINV(0.05,999) and get the score ~1.96.
Here is the code that I have used to implement confidence intervals so far, as you can see I am using a very crude way of getting the t-score at present (just allowing a few values for perc_conf and warning that it is not accurate for samples < 1000):
# -*- coding: utf-8 -*-
import math

def mean(lst):
    # μ = 1/N Σ(xi)
    return sum(lst) / float(len(lst))

def variance(lst):
    """
    Uses standard variance formula (sum of each (data point - mean) squared)
    all divided by number of data points
    """
    # σ² = 1/N Σ((xi-μ)²)
    mu = mean(lst)
    return 1.0/len(lst) * sum([(i-mu)**2 for i in lst])

def conf_int(lst, perc_conf=95):
    """
    Confidence interval - given a list of values compute the square root of
    the variance of the list (v) divided by the number of entries (n)
    multiplied by a constant factor of (c). This means that I can
    be confident of a result +/- this amount from the mean.
    The constant factor can be looked up from a table, for 95% confidence
    on a reasonable size sample (>=500) 1.96 is used.
    """
    if perc_conf == 95:
        c = 1.96
    elif perc_conf == 90:
        c = 1.64
    elif perc_conf == 99:
        c = 2.58
    else:
        c = 1.96
        print('Only 90, 95 or 99 % are allowed for, using default 95%')
    n, v = len(lst), variance(lst)
    if n < 1000:
        print('WARNING: constant factor may not be accurate for n < ~1000')
    return math.sqrt(v/n) * c
Here is an example call for the above code:
# Example: 1000 coin tosses on a fair coin. What is the range that I can be 95%
# confident the result will fall within?
# list of 1000 perfectly distributed...
perc_conf_req = 95
n, p = 1000, 0.5  # sample_size, probability of heads for each coin
l = [0 for i in range(int(n*(1-p)))] + [1 for j in range(int(n*p))]
exp_heads = mean(l) * len(l)
c_int = conf_int(l, perc_conf_req)
print('I can be '+str(perc_conf_req)+'% confident that the result of '+str(n)+
      ' coin flips will be within +/- '+str(round(c_int*100,2))+'% of '+
      str(int(exp_heads)))
x = round(n*c_int,0)
print('i.e. between '+str(int(exp_heads-x))+' and '+str(int(exp_heads+x))+
      ' heads (assuming a probability of '+str(p)+' for each flip).')
The output for this is:
I can be 95% confident that the result of 1000 coin flips will be
within +/- 3.1% of 500 i.e. between 469 and 531 heads (assuming a
probability of 0.5 for each flip).
I also looked into calculating the t-distribution for a range and then returning the t-score that got the probability closest to that required, but I had issues implementing the formula. Let me know if this is relevant and you want to see the code, but I have assumed not as there is probably an easier way.
Have you tried scipy?
You will need to install the scipy library... more about installing it here: http://www.scipy.org/install.html
Once installed, you can replicate the Excel functionality like so:
from scipy import stats
# Student's t, n=999, p<0.05, 2-tail
# equivalent to Excel TINV(0.05,999)
print(stats.t.ppf(1-0.025, 999))
# Student's t, n=999, p<0.05, single tail
# equivalent to Excel TINV(2*0.05,999)
print(stats.t.ppf(1-0.05, 999))
You can also read about installing the library here: how to install scipy for python?
Try the following code:
from scipy import stats
# Student's t, n=22, 2-tail
# stats.t.ppf(1-0.025, df)
# df = n-1 = 22-1 = 21
print (stats.t.ppf(1-0.025, 21))
scipy.stats.t has another method isf that directly returns the quantile that corresponds to the upper tail probability alpha. This is an implementation of the inverse survival function and returns the exact same value as t.ppf(1-alpha, dof).
from scipy import stats
alpha, dof = 0.05, 999
stats.t.isf(alpha, dof)
# 1.6463803454275356
For two-tailed, halve alpha:
stats.t.isf(alpha/2, dof)
# 1.962341461133449
You can try this code:
# for small samples (<50) we use t-statistics
# n = 9, degree of freedom = 9-1 = 8
# for 99% confidence interval, alpha = 1% = 0.01 and alpha/2 = 0.005
from scipy import stats
ci = 99
n = 9
t = stats.t.ppf(1- ((100-ci)/2/100), n-1) # 99% CI, t8,0.005
print(t) # 3.36
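Tying this back to the original conf_int function: the hard-coded constant can be replaced by a scipy lookup so that any confidence level and sample size work. A sketch (the function mirrors the one in the question and uses the population variance as there):

import math
from scipy import stats

def conf_int(lst, perc_conf=95):
    """Confidence interval half-width using the exact t-score instead of a
    hard-coded constant (sketch based on the conf_int function in the question)."""
    n = len(lst)
    mu = sum(lst) / n
    v = sum((x - mu) ** 2 for x in lst) / n
    # two-tailed t-score for the requested confidence level, n-1 degrees of freedom
    alpha = 1 - perc_conf / 100
    c = stats.t.ppf(1 - alpha / 2, n - 1)
    return math.sqrt(v / n) * c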