Cvxpy portfolio optimization with constraint on the maximum number of assets - python

I'm using the cvxpy library to perform portfolio optimization.
However, instead of using the plain Markowitz covariance model, I would like to introduce new variables: y_i is a binary variable that equals 1 if asset i is included in the portfolio and 0 otherwise; m is the maximum number of assets I want to include in the portfolio; r is the return I want to get.
The Markowitz model, with a constraint on the return, is the following:
import numpy as np
import pandas as pd
from cvxpy import *
# assets names
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
# return matrix
ret = pd.DataFrame(np.random.rand(1,6), columns = tickers)
# variance-covariance matrix
covm = pd.DataFrame(np.random.rand(6,6), columns = tickers, index = tickers)
# problem setting
x = Variable(len(tickers)) # xi variables
er = np.asarray(ret.T) * x # expected return
min_ret = 0.2 # minimum return
risk = quad_form(x, np.asmatrix(covm)) # risk
prob = Problem(Minimize(risk), # problem setting function
[sum(x) == 1, er >= min_ret, x >= 0])
The solution of this problem gives the percentage to invest in each asset. But what if I want to invest in a limited number of assets m?
To do that I need to introduce the y_i variables and make sure that their sum is equal to m.
Hence, it should be something like this:
x = Variable(n)
er = np.asarray(ret.T) * x
risk = quad_form(x, np.asmatrix(covm))
y = Variable(n, boolean=True) #adding boolean variables
prob = Problem(Minimize(risk), [sum(x) == 1, er >= min_ret, x >= 0, sum(y) == k, sum(x) <= sum(y)])
Unfortunately, this last chunk of code doesn't produce any result. Do you know why? Is there another method to solve this problem?

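In short, you have to link the variables x and y: the constraint sum(x) <= sum(y) only compares the totals, so it does not tie each x_i to its own y_i.
For a long-only portfolio (x >= 0):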
eps = 1e-5
[-1 + eps <= x - y, x - y <= 0]
This will set y to 1 if x > 0 and y to 0 if x == 0.
To make it work properly, and to avoid assets whose weight is only marginally above 0, you should also introduce a buy-in threshold:
[x - y >= buy_in_threshold - 1]
Note that this is a mixed-integer problem.
The ECOS_BB solver can deal with that if the problem remains small. Otherwise, you will need a commercial-grade optimizer.
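To see what the linking constraints do for a single asset, here is a minimal pure-Python check. It is only a sketch: the helper names `linked` and `feasible_y` are made up for illustration and are not part of cvxpy. It enumerates both binary values of y_i for a given weight x_i and keeps the ones that satisfy the three inequalities above.

```python
def linked(x, y, eps=1e-5, buy_in_threshold=0.0):
    """Return True if one (x_i, y_i) pair satisfies the linking constraints."""
    upper = x - y <= 0                      # x_i > 0 forces y_i = 1
    lower = x - y >= -1 + eps               # x_i == 0 forces y_i = 0
    buy_in = x - y >= buy_in_threshold - 1  # y_i = 1 forces x_i >= threshold
    return upper and lower and buy_in


def feasible_y(x, **kwargs):
    """List the binary values of y_i that are compatible with weight x_i."""
    return [y for y in (0, 1) if linked(x, y, **kwargs)]


print(feasible_y(0.0))                          # [0]
print(feasible_y(0.3))                          # [1]
print(feasible_y(0.02, buy_in_threshold=0.05))  # []
```

The last call returns an empty list: a weight of 0.02 is positive, so y_i would have to be 1, but it is below the 0.05 buy-in threshold, so the solver has to move that weight either to 0 or up to at least 0.05.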


Problem while looping variables for Scipy Optimization (SLSQP, COBYLA)

I have a constrained optimization problem where I am trying to minimize an objective function of 100+ variables which is of the form
Min F(x) = f(x1) + f(x2) + ... + f(xn)
Subject to functional constraint
(g(x1) + g(x2) + ... + g(xn))/(f(x1) + f(x2) + ... + f(xn)) - constant >= 0
I also have individual bounds for each variable x1, x2, x3...xn
a <= x1 <= b
c <= x2 <= d
For this, I wrote a Python script using scipy.optimize.minimize with constraints and bounds, but the solutions it returns do not satisfy my bounds and constraints, even in cases where the optimizer reports convergence (message: success).
Here is a sample of my code:
df is my pandas dataset
B(x) is LogNorm transform based on x and other constants
Values U, c, lb, ub are pre-calculated constant dictionaries for each index in df
import pandas as pd
import scipy.optimize
df = pd.DataFrame(..)
k = set(df.index.values) ## set of indexes to iterate on
val = 0.25 ## Arbitrary
def obj(x):
    fn = 0
    for n,i in enumerate(k):
        x0 = x[n]
        fn1 = U[i] * B(x0) * x0
        fn += fn1
    return fn
def cons(x):
    cn = 1
    c1 = 0
    c2 = 0
    for n,i in enumerate(k):
        x0 = x[n]
        c1 += U[i] * B(x0) * (x0 - c[i])
        c2 += U[i] * B(x0) * x0
    cn = c1/c2
    return cn - val
const = [{'type':'ineq', 'fun':cons}]
bnds = tuple((lb[i], ub[i]) for i in k) ## Lower, Upper for each element ((lb1, ub1), (lb2, ub2)...)
x_init = [lb[i] for i in k] ## for eg. starting from lower bound
## Solution
sol = scipy.optimize.minimize(obj, x_init, method = 'COBYLA', bounds = bnds, constraints = const)
I have more pointed questions if that helps:
Is there a way to construct the same equations concisely, without loops (given that the number of variables depends on the input data and I have no control over it)?
Is there any noticeable issue in my application of bounds? I can't seem to get the final values of all variables to respect their individual bounds.
Similarly, is there a visible flaw in the construction of the constraint equation? My results often DO NOT satisfy the constraint in repeated runs with different inputs.
Any help with either of the questions can help me progress further at work.
I have also looked into a Lagrangian solution of the same but so far I am unable to solve it for undefined number of (n) variables.
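On the first question (avoiding the loops), one possible sketch is to store the per-index constants as NumPy arrays in the same order as x, so that both sums become array expressions. Everything below is an assumption made for illustration: the arrays `U` and `c` hold made-up numbers, and `B` is a placeholder for the LogNorm transform, which is taken to work elementwise on arrays.

```python
import numpy as np

# Hypothetical stand-ins for the precomputed constants, one entry per index.
# With dictionaries they could be built as np.array([U[i] for i in k]).
U = np.array([1.0, 2.0, 0.5])
c = np.array([0.1, 0.2, 0.3])
val = 0.25


def B(x):
    """Placeholder for the LogNorm transform; assumed to be elementwise."""
    return np.exp(-x)


def obj(x):
    # sum_i U[i] * B(x[i]) * x[i], without a Python-level loop
    return np.sum(U * B(x) * x)


def cons(x):
    # (sum_i U[i] * B(x[i]) * (x[i] - c[i])) / (sum_i U[i] * B(x[i]) * x[i]) - val
    w = U * B(x)
    return np.sum(w * (x - c)) / np.sum(w * x) - val


x = np.array([1.0, 2.0, 3.0])
print(obj(x), cons(x))
```

The number of variables is then just the length of the arrays, so nothing in `obj` or `cons` depends on how many rows the input data has.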

GLPK (python swiglpk) "Problem has no primal feasible solution" but ok with CVXPY

I'm trying to solve a simple optimization problem:
max x+y
s.t. -x <= -1
x,y in {0,1}^2
using the following code:
import swiglpk
import numpy as np
def solve_boolean_lp_swig(obj: np.ndarray, aub: np.ndarray, bub: np.ndarray, minimize: bool) -> tuple:
    """
    Solves following optimization problem
        min/max obj^T x
        s.t. aub x <= bub
        x in {0, 1}
    obj : m vector
    aub : nxm matrix
    bub : n vector
    """
    # init problem
    ia = swiglpk.intArray(1+aub.size); ja = swiglpk.intArray(1+aub.size)
    ar = swiglpk.doubleArray(1+aub.size)
    lp = swiglpk.glp_create_prob()
    # set obj to minimize if minimize==True else maximize
    swiglpk.glp_set_obj_dir(lp, swiglpk.GLP_MIN if minimize else swiglpk.GLP_MAX)
    # number of rows and columns as n, m
    swiglpk.glp_add_rows(lp, int(aub.shape[0]))
    swiglpk.glp_add_cols(lp, int(aub.shape[1]))
    # setting row constraints (-inf < x <= bub[i])
    for i, v in enumerate(bub):
        swiglpk.glp_set_row_bnds(lp, i+1, swiglpk.GLP_UP, 0.0, float(v))
    # setting column constraints (x in {0, 1})
    for i in range(aub.shape[1]):
        # not sure if this is needed but perhaps for presolving
        swiglpk.glp_set_col_bnds(lp, i+1, swiglpk.GLP_FR, 0.0, 0.0)
        # setting x in {0,1}
        swiglpk.glp_set_col_kind(lp, i+1, swiglpk.GLP_BV)
    # setting aub
    for r, (i,j) in enumerate(np.argwhere(aub != 0)):
        ia[r+1] = int(i)+1; ja[r+1] = int(j)+1; ar[r+1] = float(aub[i,j])
    # solver settings
    iocp = swiglpk.glp_iocp()
    iocp.msg_lev = swiglpk.GLP_MSG_ALL
    iocp.presolve = swiglpk.GLP_ON
    iocp.binarize = swiglpk.GLP_ON
    # setting objective
    for i,v in enumerate(obj):
        swiglpk.glp_set_obj_coef(lp, i+1, float(v))
    swiglpk.glp_load_matrix(lp, r, ia, ja, ar)
    info = swiglpk.glp_intopt(lp, iocp)
    # use later
    #status = swiglpk.glp_mip_status(lp)
    x = np.array([swiglpk.glp_mip_col_val(lp, int(i+1)) for i in range(obj.shape[0])])
    # for now, keep it simple. info == 0 means optimal
    # solution (there are others telling feasible solution)
    return (info == 0), x
and the following instance (as given on top)
obj = np.array([ 1, 1]),
aub = np.array([[-1, 0]]),
bub = np.array([-1]),
minimize = False
In my mind x=[1,0] should be a valid solution, since dot([-1, 0], x) <= -1 holds (and [1,0] is boolean), but the solver says PROBLEM HAS NO PRIMAL FEASIBLE SOLUTION. However, if I run the same problem instance with CVXOPT instead, using cvxopt.glpk.ilp, the solver finds an optimal solution. I've looked at the C code underneath cvxopt and it appears to do the same thing, so I suspect I'm missing something small.
Write the model out to an LP file from your code (GLPK provides glp_write_lp for this).
Then you'll see immediately what the problem is:
\* Problem: Unknown *\
obj: + z_1 + z_2
Subject To
r_1: 0 z_1 <= -1
0 <= z_1 <= 1
0 <= z_2 <= 1
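I noticed that r=0 after the loop, so the ne argument (the number of nonzero coefficients) passed to the load call is already wrong: no coefficient gets loaded, which is why the row reads 0 z_1 <= -1. If you pass 1 instead (that is, r+1), things look better.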
The constraints
x <= -1
x,y in {0,1}^2
are obviously infeasible. I suspect your code does not reflect the model.
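The off-by-one in the ne argument can be reproduced without GLPK. The sketch below reuses only the loop from the question (the names `entries`, `ne_passed` and `ne_needed` are made up for illustration) and compares the value that ends up in `r` with the number of coefficients that were actually stored.

```python
import numpy as np

aub = np.array([[-1, 0]])

# Same loop shape as in the question: r is the index of the last nonzero entry.
entries = []
for r, (i, j) in enumerate(np.argwhere(aub != 0)):
    entries.append((int(i) + 1, int(j) + 1, float(aub[i, j])))  # 1-based, as GLPK expects

ne_passed = r             # what the question passes to glp_load_matrix
ne_needed = len(entries)  # number of nonzero coefficients actually stored

print(ne_passed, ne_needed)  # 0 1
```

Since `enumerate` starts at 0, `r` is always one less than the number of nonzeros; counting them directly (for instance with `np.count_nonzero(aub)`) also avoids `r` being undefined when the matrix has no nonzero entries.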

How to vectorize hinge loss gradient computation

I'm computing thousands of gradients and would like to vectorize the computations in Python. The context is SVM and the loss function is Hinge Loss. Y is Mx1, X is MxN and w is Nx1.
L(w) = lam/2 * ||w||^2 + 1/m Sum i=1:m ( max(0, 1-y[i]X[i]w) )
The gradient of this is
grad = lam*w + 1/m Sum i=1:m {-y[i]X[i].T if y[i]*X[i]*w < 1, else 0}
Instead of looping through each element of the sum and evaluating the max function, is it possible to vectorize this? I want to use something like np.where like the following
grad = np.where(y*X.dot(w) < 1, -X.T*y, 0)
This does not work because where the condition is true, -X.T*y is the wrong dimension.
edit: list comprehension version, would like to know if there's a cleaner or more optimal way
def grad(X,y,w,lam):
    # cache y[i]*X[i].dot(w), each row of Xw is multiplied by a single element of y
    yXw = y*X.dot(w)
    # cache y[i]*X[i], note each row of X is multiplied by a single element of y
    yX = X*y[:,np.newaxis]
    # return the average of this max function (averaged over samples, so axis=0)
    return lam*w + np.mean( [-yX[i] if yXw[i] < 1 else np.zeros_like(w) for i in range(len(y))], axis=0 )
You have two vectors A and B, and you want to return an array C such that C[i] = A[i] if B[i] < 1 and 0 otherwise. Consequently, all you need to do is
C := A * sign(max(0, 1-B)) # surprisingly similar to the original hinge loss, right? :)
if B < 1 then 1-B > 0, thus max(0, 1-B) > 0 and sign(max(0, 1-B)) == 1
if B >= 1 then 1-B <= 0, thus max(0, 1-B) = 0 and sign(max(0, 1-B)) == 0
so in your code it will be something like
B = y*X.dot(w)              # margins y[i]*X[i].dot(w), shape (M,)
A = -X*y[:,np.newaxis]      # per-sample terms -y[i]*X[i], shape (M, N)
C = A * np.sign(np.maximum(0, 1-B))[:,np.newaxis]
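Putting the pieces together, a fully vectorized gradient could look like the sketch below. It assumes y has shape (M,) and w has shape (N,), as in the list-comprehension version of the question; the function names are made up for illustration, and the loop version is included only to check the result.

```python
import numpy as np


def hinge_grad(X, y, w, lam):
    """Subgradient of lam/2 * ||w||^2 + mean(max(0, 1 - y[i] * X[i].w))."""
    margins = y * X.dot(w)                  # shape (M,)
    active = (margins < 1).astype(X.dtype)  # 1.0 where the hinge is active, else 0.0
    # Sum of -y[i] * X[i] over the active samples, divided by M.
    return lam * w - X.T.dot(active * y) / len(y)


def hinge_grad_loop(X, y, w, lam):
    """Reference implementation with an explicit loop, for comparison."""
    total = np.zeros_like(w)
    for i in range(len(y)):
        if y[i] * X[i].dot(w) < 1:
            total -= y[i] * X[i]
    return lam * w + total / len(y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.choice([-1.0, 1.0], size=50)
w = rng.normal(size=4)
print(np.allclose(hinge_grad(X, y, w, 0.1), hinge_grad_loop(X, y, w, 0.1)))  # True
```

Using a 0/1 mask and a single matrix-vector product avoids building the intermediate M x N array of per-sample terms.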

Rows and columns restrictions on python

I have a list of lists m which I need to modify.
I need the sum of each row to be greater than A and the sum of each column to be less than B.
I have something like this:
from random import randint
x = 5 #or other number, not relevant
rows = len(m)
cols = len(m[0])
for r in range(rows):
    while sum(m[r]) < A:
        c = randint(0, cols-1)
        m[r][c] += x
for c in range(cols):
    cant = sum([m[r][c] for r in range(rows)])
    while cant > B:
        r = randint(0, rows-1)
        if m[r][c] >= x: #I don't want negatives
            m[r][c] -= x
            cant -= x # keep the running column sum in step with m
My problem is: I need to satisfy both conditions and, done this way, after the second for loop I can't be sure that the first condition is still met.
Any suggestions on how to satisfy both conditions and, of course, with the best execution time? I could definitely consider using numpy.
Edit (an example)
m = [[0,0,0],
A = 20
B = 25
# one desired output (since it chooses random positions)
m = [[10,0,15],
I may need to add
This is for the generation of the random initial population of a genetic algorithm, the restrictions are to make them a possible solution, and I would need to run this like 80 times to get different possible solutions
Something like this should do the trick:
import numpy
from scipy.optimize import linprog
A = 10
B = 20
m = 2
n = m * m
# the coefficients of a linear function to minimize.
# setting this to all ones minimizes the sum of all variable
# values in the matrix, which solves the problem, but see below.
c = numpy.ones(n)
# the constraint matrix.
# This is matrix-multiplied with the current solution candidate
# to form the left hand side of a set of normalized
# linear inequality constraint equations, i.e.
# x_0 * A_ub[0][0] + x_1 * A_ub[0][1] <= b_0
# x_0 * A_ub[1][0] + x_1 * A_ub[1][1] <= b_1
# ...
A_ub = numpy.zeros((2 * m, n))
# row sums. Since the <= inequality is a fixed component,
# we just multiply everything by (-1), i.e. we demand that
# the negative sums are smaller than the negative limit -A.
# Assign row ranges all at once, because numpy can do this.
for r in range(0, m):
    A_ub[r][r * m:(r + 1) * m] = -1
# We want that the sum of the x in each (flattened)
# column is smaller than B
# The manual stepping for the column sums in row-major encoding
# is a little bit annoying here.
for r in range(0, m):
    for j in range(0, m):
        A_ub[r + m][r + m * j] = 1
# the actual upper limits for the normalized inequalities.
b_ub = [-A] * m + [B] * m
# hand the linear program to scipy
solution = linprog(c, A_ub=A_ub, b_ub=b_ub)
# bring the solution into the desired matrix form
print(numpy.reshape(solution.x, (m, m)))
I use <=, not < as indicated in your question, because that's what linprog supports.
This minimizes the total sum of all values in the target vector.
For your use case, you probably want to minimize the distance to the original sample, which this linear program cannot handle, since neither the squared error nor the absolute difference can be expressed using a linear combination (which is what c stands for). For that, you will probably need to go to the full scipy.optimize.minimize().
Still, this should give you a rough idea.
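As a side note on the "manual stepping" for the constraint matrix: the same A_ub can be assembled from Kronecker products. This is only an alternative sketch of the construction, checked against the loop version for the same m; the names `A_loop`, `row_block`, `col_block` and `A_kron` are made up for illustration.

```python
import numpy as np

m = 2  # the matrix is m x m, flattened row-major into m * m variables

# Loop construction, as in the answer above.
A_loop = np.zeros((2 * m, m * m))
for r in range(m):
    A_loop[r, r * m:(r + 1) * m] = -1      # negated row sums
    for j in range(m):
        A_loop[r + m, r + m * j] = 1       # column sums

# Same matrix from Kronecker products.
row_block = -np.kron(np.eye(m), np.ones((1, m)))  # one block of -1s per row
col_block = np.kron(np.ones((1, m)), np.eye(m))   # identity repeated once per row
A_kron = np.vstack([row_block, col_block])

print(np.array_equal(A_loop, A_kron))  # True
```

For a non-square matrix with `rows` rows and `cols` columns, the same idea would use `np.kron(np.eye(rows), np.ones((1, cols)))` for the row sums and `np.kron(np.ones((1, rows)), np.eye(cols))` for the column sums.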
A NumPy solution:
import numpy as np
val = B / len(m) # column sums <= B
assert val * len(m[0]) >= A # row sums >= A
# create array shaped like m, filled with val
arr = np.empty_like(m)
arr[:] = val
I chose to ignore the original content of m - it's all zero in your example anyway.
from random import *
m = [[0,0,0],
A = 20
B = 25
x = 1 #or other number, not relevant
rows = len(m)
cols = len(m[0])
def runner(list1, a1, b1, x1):
    list1_backup = [row[:] for row in list1] # copy each row so a retry starts from the original
    rows = len(list1)
    cols = len(list1[0])
    for r in range(rows):
        while sum(list1[r]) <= a1:
            c = randint(0, cols-1)
            list1[r][c] += x1
    for c in range(cols):
        cant = sum([list1[r][c] for r in range(rows)])
        while cant >= b1:
            r = randint(0, rows-1)
            if list1[r][c] >= x1: #I don't want negatives
                list1[r][c] -= x1
                cant -= x1 # update the running column sum
    # count the rows that no longer satisfy the row condition
    good_a_int = 0
    for r in range(rows):
        test1 = sum(list1[r]) > a1
        good_a_int += 0 if test1 else 1
    if good_a_int == 0:
        return list1
    # at least one row was broken by the column pass: retry from the original
    return runner(list1=list1_backup, a1=a1, b1=b1, x1=x1)
m2 = runner(m, A, B, x)
for row in m2:
    print(','.join(map(lambda x: "{:>3}".format(x), row)))

Python Pulp using with Matrices

I am still very new to Python, after years and years of Matlab. I am trying to use Pulp to set up an integer linear program.
Given an array of numbers:
I want to maximize:
sum( x_i P_i )
subject to the constraints
A x <= b
A_eq x = b_eq
and with bounds (vector based bounds)
LB <= x <= UB
In pulp however, I don't see how to do vector declarations properly. I was using:
RANGE = range(numpy.size(P))
x = pulp.LpVariable.dicts("x", LB_ind, UB_ind, "Integer")
where I can only enter individual bounds (so only 1 number).
prob = pulp.LpProblem("Test", pulp.LpMaximize)
prob += pulp.lpSum([Prices[i]*Dispatch[i] for i in RANGE])
and for the constraints, do I really have to do this line per line? It seems that I am missing something. I would appreciate some help. The documentation discusses a short example. The number of variables in my case is a few thousand.
You can set the lowBound and upBound on variables after the initialization.
You can create an array of variables with
LB[i] <= x[i] <= UB[i]
with the following code.
x = pulp.LpVariable.dicts("x", RANGE, cat="Integer")
for i in x:
    x[i].lowBound = LB_ind[i]
    x[i].upBound = UB_ind[i]
The second parameter to LpVariable.dicts is the index set of the decision variables, not their lower bounds.
For the first question, this is how I declared indexed variables in a different problem:
students = range(96)
group = range(24)
var = pulp.LpVariable.dicts("if_i_in_group_j", ((i, j) for i in students for j in group), cat='Binary')

