I want to use GEKKO to solve the following optimization problem:
Minimize x'Qx + 1e-10 * sum_{i=1}^n x_i^0.1
subject to 1' x = 1 and x >= 0
However, the following code returns sol = [0., 0., 0., 0., 1.] and Objective: 1.99419 as a solution, which is far from optimal; I'll explain why below.
import numpy as np
from gekko import GEKKO
n = 5
m = GEKKO(remote=False)
m.options.SOLVER = 1
m.options.IMODE = 3
x = [m.Var(lb=0, ub=1) for _ in range(n)]
m.Equation(m.sum(x) == 1)
np.random.seed(0)
Q = np.random.uniform(-1, 1, size=(n, n))
Q = np.dot(Q.T, Q)
## Add h_i^p
c, p = 1e-10, 0.1
for i in range(n):
    m.Obj(c * x[i] ** p)
    for j in range(n):
        m.Obj(x[i] * Q[i, j] * x[j])
m.solve(disp=True)
sol = np.array(x).flatten()
This is clearly wrong: if we optimize only the quadratic part (x'Qx) using the code below and plug that solution into the original objective, we get a much smaller objective value (Objective: 0.02489503). The 1e-10 * sum_{i=1}^n x_i^p term is essentially ignored since it is very small.
m1 = GEKKO(remote=False)
m1.options.SOLVER = 1
m1.options.OTOL = 1e-10
x1 = [m1.Var(lb=0, ub=1) for _ in range(n)]
m1.Equation(m1.sum(x1) == 1)
m1.qobj(b=np.zeros(n), A=2 * Q, x=x1, otype='min')
m1.solve(disp=True)
sol = np.array(x1).flatten()
Is there any way to resolve this? Thank you!
Gekko solves nonlinear programming optimization problems with gradient-based methods: interior-point and active-set SQP. It looks like there is a problem with the objective function definition. Use matrix operations in NumPy to simplify the objective definition.
## Create Objective
c, p = 1e-10, 0.1
obj = np.dot(np.dot(x,Q),x) + c*m.sum([xi**p for xi in x])
m.Minimize(obj)
Here is the modified script that solves with Gekko. Increase MAX_ITER if the default limit of 250 is reached.
import numpy as np
from gekko import GEKKO
n = 5
m = GEKKO(remote=False)
m.options.SOLVER = 3
m.options.IMODE = 3
x = m.Array(m.Var,n,value=0.1, lb=1e-6, ub=1)
m.Equation(m.sum(x) == 1)
np.random.seed(0)
Q = np.random.uniform(-1, 1, size=(n, n))
Q = np.dot(Q.T, Q)
print(Q)
## Create Objective
c, p = 1e-10, 0.1
obj = np.dot(np.dot(x,Q),x) + c*m.sum([xi**p for xi in x])
m.Minimize(obj)
# adjust solver tolerance
m.options.RTOL=1e-10
m.options.OTOL=1e-10
m.options.MAX_ITER = 1000
m.solve(disp=True)
sol = np.array(x).flatten()
print('x: ', sol)
print('obj: ', m.options.OBJFCNVAL)
This gives an optimal solution that is also global because it is a convex Quadratic Programming (QP) problem. Solving it with the nonlinear programming solver IPOPT gives:
x: [[0.36315827507] [0.081993130341] [1e-06] [0.086231281612] [0.46861632269]]
obj: 0.024895918696
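As a quick sanity check (an addition here, using plain NumPy rather than the solver), evaluating the full objective at this reported point gives approximately the same value:
import numpy as np

# Rebuild the same Q as in the question and evaluate the objective at the reported solution.
np.random.seed(0)
Q = np.random.uniform(-1, 1, size=(5, 5))
Q = np.dot(Q.T, Q)
c, p = 1e-10, 0.1
xs = np.array([0.36315827507, 0.081993130341, 1e-06, 0.086231281612, 0.46861632269])
print(xs @ Q @ xs + c * np.sum(xs ** p))   # approximately 0.0249, matching OBJFCNVAL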
As far as I could see, Gekko looks like it's built for machine learning, which focuses on local optimization as opposed to global optimization, and typically most such libraries will not be able to guarantee optimal solutions.
If you really want guaranteed optimal solutions, then for this case I would suggest looking into interval arithmetic. There are packages such as mpmath which can offer this, though I have yet to see optimizers using it in my brief time searching.
The TL;DR on how interval arithmetic works: you feed in a range of inputs and get back a range of outputs. For example, you can test whether 1 is in the range of possible outputs for x1 + x2 + x3 + x4, and you can see the minimum/maximum potential values of your objective function. You can then progressively split your intervals in half, keeping only intervals for which your constraints can still be satisfied and whose objective bounds are not already beaten by the best bound found elsewhere. This allows you to achieve guaranteed convergence to global optima at the cost of a lot more computation.
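Here is a minimal sketch of that idea using mpmath's interval type (my own example; it is not a branch-and-bound optimizer, just the range-propagation step):
from mpmath import iv

# Suppose each x_i is only known to lie in the interval [0, 0.5].
x1 = iv.mpf([0, 0.5])
x2 = iv.mpf([0, 0.5])
x3 = iv.mpf([0, 0.5])
x4 = iv.mpf([0, 0.5])

s = x1 + x2 + x3 + x4      # interval enclosing every possible value of the sum
print(s)                   # [0.0, 2.0]
print(s.a <= 1 <= s.b)     # True: the constraint sum(x) == 1 may still be satisfiable in this box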
I am trying to work on an optimization problem using Python and I started working with GEKKO since it solves nonlinear programs. I have written a simple model in LINGO to check the answer, which didn't give the same value as the answer I got from GEKKO (same model).
Python Code:
from gekko import GEKKO
# Initialize Model
smplmdl = GEKKO()
# Create Variables
x = smplmdl.Array(smplmdl.Var, 3, lb = 0)
a = smplmdl.Array(smplmdl.Var, 3, lb = 0)
Constant_Val = [10, 15, 20]
for i in range(3):
    smplmdl.Equation(x[i]*(sum(a[j] for j in range(3))) == Constant_Val[i])
# Objective Function
smplmdl.Obj(sum(x[i] for i in range(3)))
smplmdl.options.IMODE = 3
smplmdl.solve()
smplmdl.options.OBJFCNVAL
print('x:', x)
print('a:', a)
print(smplmdl.options.OBJFCNVAL)
LINGO Code:
Min = x1 + x2 + x3;
x1*(a1 + a2 + a3) = 10;
x2*(a1 + a2 + a3) = 15;
x3*(a1 + a2 + a3) = 20;
It is possible to simplify the model.
from gekko import GEKKO
# Initialize Model
smplmdl = GEKKO(remote=False)
# Create Variables
x = smplmdl.Array(smplmdl.Var, 3, lb = 0)
a = smplmdl.Array(smplmdl.Var, 3, lb = 0)
Constant_Val = [10, 15, 20]
for i in range(3):
    smplmdl.Equation(x[i]*sum(a) == Constant_Val[i])
# Objective Function
smplmdl.Minimize(sum(x))
# Solve and print solution
smplmdl.solve()
print('x:', x)
print('a:', a)
print(smplmdl.options.OBJFCNVAL)
This gives the solution:
x: [[5.7995291433e-05] [8.699293715e-05] [0.00011599058287]]
a: [[57475.93038] [57475.930412] [57475.930377]]
Objective: 0.00026097881145
The objective is to minimize the summation of x, and the objective function value obtained by IPOPT is 2.6e-4. If LINGO gave a different solution, there are likely solver tolerances that can be adjusted to achieve better agreement; try adjusting m.options.RTOL and m.options.OTOL for the residual and objective function tolerances. For non-convex problems, a multi-start method or a global solver may be better suited.
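As a rough illustration of the multi-start idea (my own sketch, not a built-in GEKKO feature), one can re-solve the same model from several random starting points and keep the best local solution:
import numpy as np
from gekko import GEKKO

Constant_Val = [10, 15, 20]
best_obj, best_x = None, None
for trial in range(10):
    m = GEKKO(remote=False)
    x = m.Array(m.Var, 3, lb=0)
    a = m.Array(m.Var, 3, lb=0)
    # random initial guess for this restart
    for v in list(x) + list(a):
        v.value = float(np.random.uniform(0.1, 10))
    for i in range(3):
        m.Equation(x[i] * sum(a) == Constant_Val[i])
    m.Minimize(sum(x))
    try:
        m.solve(disp=False)
    except Exception:
        continue  # skip starts where the solver fails
    if best_obj is None or m.options.OBJFCNVAL < best_obj:
        best_obj, best_x = m.options.OBJFCNVAL, [xi.value[0] for xi in x]
print('best objective:', best_obj)
print('best x:', best_x)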
If the purpose is to compare LINGO and GEKKO, perhaps try a simple convex optimization problem such as:
from gekko import GEKKO
import numpy as np
m = GEKKO()
x = m.Array(m.Var,4,value=1,lb=1,ub=5)
x1,x2,x3,x4 = x
# change initial values
x2.value = 5; x3.value = 5
m.Equation(x1*x2*x3*x4>=25)
m.Equation(x1**2+x2**2+x3**2+x4**2==40)
m.Minimize(x1*x4*(x1+x2+x3)+x3)
m.solve()
print(x,m.options.OBJFCNVAL)
I am doing a LassoCV with 1000 coefs. Statsmodels did not seem able to handle this many coefs, so I am using scikit-learn. Statsmodels allowed for .fit_constrained("coef1 + coef2...=1"), which constrained the sum of the coefs to equal 1. I need to do this in scikit-learn. I am also keeping the intercept at zero.
from sklearn.linear_model import LassoCV
LassoCVmodel = LassoCV(fit_intercept=False)
LassoCVmodel.fit(x,y)
Any help would be appreciated.
As mentioned in the comments: the docs and the sources do not indicate that this is supported within sklearn!
I just tried the alternative of using off-the-shelf convex-optimization solvers. It's just a simple prototype-like approach and it might not be a good fit for your (incompletely defined) task (sample size?).
Some comments:
- implementation/model formulation is easy
- the problem is harder to solve than I thought
- solver ECOS has general trouble
- solver SCS reaches good accuracy (though worse compared to sklearn)
- but: tuning iterations to improve accuracy breaks the solver; the problem will become infeasible for SCS!
- SCS + a bigM-based formulation (the constraint is posted as a penalization term within the objective) looks usable, but might need tuning
- only open-source solvers were tested; commercial ones might be much better
Further things to try:
For tackling huge problems (where performance becomes more important than robustness and accuracy), an (Accelerated) Projected Stochastic Gradient approach looks promising.
Code
""" data """
from time import perf_counter as pc
import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
A = diabetes.data
y = diabetes.target
alpha=0.1
print('Problem-size: ', A.shape)
def obj(x):  # following sklearn's definition from user-guide!
    return (1. / (2*A.shape[0])) * np.square(np.linalg.norm(A.dot(x) - y, 2)) + alpha * np.linalg.norm(x, 1)
""" sklearn """
print('\nsklearn classic l1')
from sklearn import linear_model
clf = linear_model.Lasso(alpha=alpha, fit_intercept=False)
t0 = pc()
clf.fit(A, y)
print('used (secs): ', pc() - t0)
print(obj(clf.coef_))
print('sum x: ', np.sum(clf.coef_))
""" cvxpy """
print('\ncvxpy + scs classic l1')
from cvxpy import *
x = Variable(A.shape[1])
objective = Minimize((1. / (2*A.shape[0])) * sum_squares(A*x - y) + alpha * norm(x, 1))
problem = Problem(objective, [])
t0 = pc()
problem.solve(solver=SCS, use_indirect=False, max_iters=10000, verbose=False)
print('used (secs): ', pc() - t0)
print(obj(x.value.flat))
print('sum x: ', np.sum(x.value.flat))
""" cvxpy -> sum x == 1 """
print('\ncvxpy + scs sum == 1 / 1st approach')
objective = Minimize((1. / (2*A.shape[0])) * sum_squares(A*x - y))
constraints = [sum(x) == 1]
problem = Problem(objective, constraints)
t0 = pc()
problem.solve(solver=SCS, use_indirect=False, max_iters=10000, verbose=False)
print('used (secs): ', pc() - t0)
print(obj(x.value.flat))
print('sum x: ', np.sum(x.value.flat))
""" cvxpy approach 2 -> sum x == 1 """
print('\ncvxpy + scs sum == 1 / 2nd approach')
M = 1e6
objective = Minimize((1. / (2*A.shape[0])) * sum_squares(A*x - y) + M*(sum(x) - 1))
constraints = [sum(x) == 1]
problem = Problem(objective, constraints)
t0 = pc()
problem.solve(solver=SCS, use_indirect=False, max_iters=10000, verbose=False)
print('used (secs): ', pc() - t0)
print(obj(x.value.flat))
print('sum x: ', np.sum(x.value.flat))
Output
Problem-size: (442, 10)
sklearn classic l1
used (secs): 0.001451024380348898
13201.3508496
sum x: 891.78869298
cvxpy + scs classic l1
used (secs): 0.011165673357417458
13203.6549995
sum x: 872.520510561
cvxpy + scs sum == 1 / 1st approach
used (secs): 0.15350853891775978
13400.1272148
sum x: -8.43795102327
cvxpy + scs sum == 1 / 2nd approach
used (secs): 0.012579569383536493
13397.2932976
sum x: 1.01207061047
Edit
Just for fun I implemented a slow, non-optimized prototype solver using the approach of accelerated projected gradient (remarks in the code!).
This one should scale much better for huge problems (as it's a first-order method), despite the slow behaviour here (because it's not optimized). There should be a lot of potential!
Warning: might be seen as advanced numerical-optimization to some people :-)
Edit 2: I forgot to add the nonnegativity constraint on the projection (sum(x) == 1 does not make much sense if x can be negative!). This makes the solving much harder (numerical trouble), and it's obvious that one of the fast special-purpose projections should be used (I'm too lazy right now; I think n*log n algorithms are available). Again: this APG solver is a prototype, not ready for real tasks.
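For reference, a sketch of the standard sort-based O(n log n) Euclidean projection onto the probability simplex (my addition, following e.g. Duchi et al. 2008) that could replace the slow cvxpy-based projection:
import numpy as np

def project_simplex(v):
    # project v onto {x : x >= 0, sum(x) == 1} using the sort-based algorithm
    n = v.shape[0]
    u = np.sort(v)[::-1]                       # sort in descending order
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, n + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

print(project_simplex(np.array([0.5, 1.2, -0.3])))   # [0.15 0.85 0.  ] -> nonnegative, sums to 1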
Code
""" accelerated pg -> sum x == 1 """
def solve_pg(A, b, momentum=0.9, maxiter=1000):
    """ remarks:
            algorithm: accelerated projected gradient
            projection: proj on probability-simplex
                -> naive and slow using cvxpy + ecos
            line-search: armijo-rule along projection-arc (Bertsekas book)
                -> suffers from slow projection
            stopping-criterion: naive
            gradient-calculation: precomputes AtA
                -> not needed and not recommended for huge sparse data!
    """
    M, N = A.shape
    x = np.zeros(N)

    AtA = A.T.dot(A)
    Atb = A.T.dot(b)

    stop_count = 0

    # projection helper
    x_ = Variable(N)
    v_ = Parameter(N)
    objective_ = Minimize(0.5 * square(norm(x_ - v_, 2)))
    constraints_ = [sum(x_) == 1]
    problem_ = Problem(objective_, constraints_)

    def gradient(x):
        return AtA.dot(x) - Atb

    def obj(x):
        return 0.5 * np.linalg.norm(A.dot(x) - b)**2

    it = 0
    while True:
        grad = gradient(x)

        # line search
        alpha = 1
        beta = 0.5
        sigma = 1e-2
        old_obj = obj(x)
        while True:
            new_x = x - alpha * grad
            new_obj = obj(new_x)
            if old_obj - new_obj >= sigma * grad.dot(x - new_x):
                break
            else:
                alpha *= beta

        x_old = x[:]
        x = x - alpha*grad

        # projection
        v_.value = x
        problem_.solve()
        x = np.array(x_.value.flat)

        y = x + momentum * (x - x_old)

        if np.abs(old_obj - obj(x)) < 1e-2:
            stop_count += 1
        else:
            stop_count = 0

        if stop_count == 3:
            print('early-stopping # it: ', it)
            return x

        it += 1
        if it == maxiter:
            return x
print('\n acc pg')
t0 = pc()
x = solve_pg(A, y)
print('used (secs): ', pc() - t0)
print(obj(x))
print('sum x: ', np.sum(x))
Output
acc pg
early-stopping # it: 367
used (secs): 0.7714511330487027
13396.8642379
sum x: 1.00000000002
I am surprised nobody has stated this before in the comments, but I think there is a conceptual misunderstanding in your question statement.
Let us start with the definition of the Lasso Estimator, for example as given in Statistical Learning with Sparsity: The Lasso and Generalizations by Hastie, Tibshirani and Wainwright:
Given a collection of N predictor-response pairs {(xi, yi)}, the lasso finds the fit coefficients (β0, βi) to the least-square optimization problem with the additional constraint that the L1-norm of the vector of coefficients βi is less than or equal to t.
Where the L1-norm of the coefficient vector is the sum of the magnitudes of all coefficients. In the case where your coefficients are all positive, this is precisely tackling your question.
Now, what is the relationship between this t and the alpha parameter used in scikit-learn? Well, it turns out that by Lagrangian duality, there is a one-to-one correspondence between every value of t and a value for alpha.
This means that when you use LassoCV, since you are using a range of values for alpha, you are using by definition a range of allowable values for the sum of all your coefficients!
To sum up, the condition of the sum of all your coefficients being equal to one is equivalent to using Lasso for a particular value of alpha.
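For concreteness (written here in scikit-learn's notation as an illustration, not quoted from the book), the two formulations being related are:
minimize over beta:   ||y - X*beta||_2^2                                   subject to   ||beta||_1 <= t     (constrained form)
minimize over beta:   (1/(2*N)) * ||y - X*beta||_2^2 + alpha * ||beta||_1                                   (penalized form used by scikit-learn's Lasso)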
I have a generic question on how to solve optimization problems of the Min-Max type, using the PICOS package in Python. I found little information in this context while searching the PICOS documentation and on the web as well.
I can imagine a simple example of the below form.
Given a matrix M, find x* = argmin_x [ max_y x^T M y ], where x > 0, y > 0, sum(x) = 1 and sum(y) = 1.
I have tried a few methods, starting with the most straightforward idea of having minimax or minmax keywords in the objective function of the PICOS Problem class. It turns out that none of these keywords is valid; see the package documentation for objective functions. Furthermore, nesting objective functions also turns out to be invalid.
In the last of my naive attempts, I have two functions, Max() and Min(), which each solve a linear optimization problem. The outer function, Min(), should minimize the inner function Max(), so I have used Max() in the objective function of the outer optimization problem.
import numpy as np
import picos as pic
import cvxopt as cvx
def MinMax(mat):
    ## Perform a simple min-max SDP formulated as:
    ## Given a matrix M, find x* = argmin_x [ max_y x^T M y ], where x > 0, y > 0, sum(x) = sum(y) = 1.
    prob = pic.Problem()

    ## Constant parameters
    M = pic.new_param('M', cvx.matrix(mat))
    v1 = pic.new_param('v1', cvx.matrix(np.ones((mat.shape[0], 1))))

    ## Variables
    x = prob.add_variable('x', (mat.shape[0], 1), 'nonnegative')

    ## Setting the objective function
    prob.set_objective('min', Max(x, M))

    ## Constraints
    prob.add_constraint(x > 0)
    prob.add_constraint((v1 | x) == 1)

    ## Print the problem
    print("The optimization problem is formulated as follows.")
    print prob

    ## Solve the problem
    prob.solve(verbose = 0)
    objVal = prob.obj_value()
    solution = np.array(x.value)
    return (objVal, solution)

def Max(xVar, M):
    ## Given a vector l, find y* such that l y* = max_y l y, where y > 0, sum(y) = 1.
    prob = pic.Problem()

    # Variables
    y = prob.add_variable('y', (M.size[1], 1), 'nonnegative')
    v2 = pic.new_param('v1', cvx.matrix(np.ones((M.size[1], 1))))

    # Setting the objective function
    prob.set_objective('max', ((xVar.H * M) * y))

    # Constraints
    prob.add_constraint(y > 0)
    prob.add_constraint((v2 | y) == 1)

    # Solve the problem
    prob.solve(verbose = 0)
    sol = prob.obj_value()
    return sol

def print2Darray(arr):
    # print a 2D array in a readable (matrix like) format on the standard output
    for ridx in range(arr.shape[0]):
        for cidx in range(arr.shape[1]):
            print("%.2e \t" % arr[ridx,cidx]),
        print("")
    print("========")
    return None

if __name__ == '__main__':
    ## Testing the Simple min-max SDP
    mat = np.random.rand(4,4)
    print("## Given a matrix M, find x* = argmin_x [ max_y x^T M y ], where x > 0, y > 0, sum(x) = sum(y) = 1.")
    print("M = ")
    print2Darray(mat)
    (optval, solution) = MinMax(mat)
    print("Optimal value of the function is %.2e and it is attained by x = %s and that of y = %.2e." % (optval, np.array_str(solution)))
When I run the above code, it gives me the following error message.
10:stackoverflow pavithran$ python minmaxSDP.py
## Given a matrix M, find x* = argmin_x [ max_y x^T M y ], where x > 0, y > 0, sum(x) = sum(y) = 1.
M =
1.46e-01 9.23e-01 6.50e-01 7.30e-01
6.13e-01 6.80e-01 8.35e-01 4.32e-02
5.19e-01 5.99e-01 1.45e-01 6.91e-01
6.68e-01 8.46e-01 3.67e-01 3.43e-01
========
Traceback (most recent call last):
File "minmaxSDP.py", line 80, in <module>
(optval, solution) = MinMax(mat)
File "minmaxSDP.py", line 19, in MinMax
prob.set_objective('min', Max(x, M))
File "minmaxSDP.py", line 54, in Max
prob.solve(verbose = 0)
File "/Library/Python/2.7/site-packages/picos/problem.py", line 4135, in solve
self.solver_selection()
File "/Library/Python/2.7/site-packages/picos/problem.py", line 6102, in solver_selection
raise NotAppropriateSolverError('no solver available for problem of type {0}'.format(tp))
picos.tools.NotAppropriateSolverError: no solver available for problem of type MIQP
10:stackoverflow pavithran$
At this point, I am stuck and unable to fix this problem.
Is it just that PICOS does not natively support min-max problems, or is my way of encoding the problem incorrect?
Please note: The reason I am insisting on using PICOS is that ideally, I would like to know the answer to my question in the context of solving a min-max semidefinite program (SDP). But I think the addition of semidefinite constraints is not hard, once I can figure out how to do a simple min-max problem using PICOS.
The first answer is that min-max problems are not natively supported in PICOS. However, whenever the inner maximization problem is a convex optimization problem, you can reformulate it as a minimization problem (by taking the Lagrangian dual), and so you get a min-min problem.
Your particular problem is a standard zero-sum game, and can be reformulated as: (assuming M is of dimension n x m):
min_x max_{i=1...m} [M^T x]_i  =  min_{x,t} t   s.t.   [M^T x]_i <= t   (for i = 1...m)
In Picos:
import picos as pic
import cvxopt as cvx
n=3
m=4
M = cvx.normal(n,m) #generate a random matrix
P = pic.Problem()
x = P.add_variable('x',n,lower=0)
t = P.add_variable('t',1)
P.add_constraint(M.T*x <= t)
P.add_constraint( (1|x) == 1)
P.minimize(t)
print 'the solution is x='
print x
If you also need the optimal y, then you can show that it corresponds to the optimal value of the constraint M'x <= t:
print 'the solution of the inner max-problem is y='
print P.constraints[0].dual
Best,
Guillaume.
I am very new to scipy and to doing data analysis in Python. I am trying to solve the following regularized optimization problem, and unfortunately I haven't been able to make much sense of the scipy documentation. I am looking to solve it using scipy.optimize.
Here is the function I am looking to minimize:
(1/2) * ||A - A*W||_F^2 + (beta/2) * ||W||_F^2 + lambda * ||W||_1
Here A is an m x n matrix; the first term in the minimization is the residual sum of squares, the second term is the Frobenius norm (L2 norm) of a sparse n x n matrix W, and the third is an L1 norm of the same matrix W.
I would like to know how to minimize this function subject to the constraints that:
w_j >= 0
w_{j,j} = 0
I would like to use coordinate descent (or any other method that scipy.optimize provides) to solve the above problem. I would like some direction on how to achieve this, as I have no idea how to take the Frobenius norm or how to tune the parameters beta and lambda, or whether scipy.optimize will tune and return the parameters for me. Any help regarding these questions would be much appreciated.
Thanks in advance!
How large are m and n?
Here is a basic example for how to use fmin:
from scipy import optimize
import numpy as np
m = 5
n = 3
a = np.random.rand(m, n)
idx = np.arange(n)
def func(w, beta, lam):
    w = w.reshape(n, n)
    w2 = np.abs(w)
    w2[idx, idx] = 0
    return 0.5*((a - np.dot(a, w2))**2).sum() + lam*w2.sum() + 0.5*beta*(w2**2).sum()
w = optimize.fmin(func, np.random.rand(n*n), args=(0.1, 0.2))
w = w.reshape(n, n)
w[idx, idx] = 0
w = np.abs(w)
print w
If you want to use coordinate descent, you can implement it with Theano.
http://deeplearning.net/software/theano/
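As an alternative to Theano, here is a rough plain-NumPy sketch of coordinate descent (my own sketch, assuming the objective 0.5*||A - A*W||_F^2 + 0.5*beta*||W||_F^2 + lam*||W||_1 with W >= 0 and a zero diagonal); each coordinate update has a closed-form nonnegative soft-threshold solution:
import numpy as np

def coord_descent(A, beta=0.1, lam=0.2, n_sweeps=50):
    m, n = A.shape
    W = np.zeros((n, n))
    R = A - A.dot(W)                  # residual, updated incrementally
    col_sq = (A ** 2).sum(axis=0)     # ||A[:, i]||^2 for each column i
    for _ in range(n_sweeps):
        for j in range(n):            # column of W being updated
            for i in range(n):        # row of W (coefficient on column A[:, i])
                if i == j:
                    continue          # diagonal entries are fixed at zero
                w_old = W[i, j]
                r_j = R[:, j] + A[:, i] * w_old      # residual of column j without this entry
                w_new = max(0.0, (A[:, i].dot(r_j) - lam) / (col_sq[i] + beta))
                W[i, j] = w_new
                R[:, j] = r_j - A[:, i] * w_new
    return W

W = coord_descent(np.random.rand(5, 3))
print(W)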
Your problem seems tailor-made for cvxopt - http://cvxopt.org/
and in particular
http://cvxopt.org/userguide/solvers.html#problems-with-nonlinear-objectives
Using fmin would likely be slower, since it does not take advantage of gradient / Hessian information.
The code in HYRY's answer also has the drawback that, as far as fmin is concerned, the diagonal of W is a variable, and fmin will try to move the diagonal values around until it realizes that they don't do anything (since the objective function resets them to zero). Here is the cvxopt implementation of HYRY's code that explicitly enforces the zero constraints and uses gradient info. WARNING: I couldn't derive the Hessian for your objective... and you might double-check the gradient as well:
'''CVXOPT version:'''
from numpy import *
from cvxopt import matrix, mul
''' warning: CVXOPT uses column-major order (Fortran) '''
m = 5
n = 3
n_active = (n)*(n-1)
A = matrix(random.rand(m*n),(m,n))
ids = arange(n)
beta = 0.1;
lam = 0.2;
W = matrix(zeros(n*n), (n,n));
def cvx_objective_func(w=None, z=None):
    if w is None:
        num_nonlinear_constraints = 0;
        w_0 = matrix(1, (n_active,1), 'd');
        return num_nonlinear_constraints, w_0
    # main call:
    'calculate objective:'
    'form W matrix, warning _w is column-major order (Fortran)'
    '''column-major order!'''
    _w = matrix(w, (n, n-1))
    for k in xrange(n):
        W[k, 0:k] = _w[k, 0:k]
        W[k, k+1:n] = _w[k, k:n-1]
    squared_error = A - A*W
    objective_value = .5 * sum( mul(squared_error,squared_error)) +\
                      .5* beta*sum(mul(W,W)) +\
                      lam * sum(abs(W));
    'not sure if i calculated this right...'
    _Df = -A.T*(squared_error) + beta*W + lam;
    '''column-major order!'''
    Df = matrix(0., (1, n*(n-1)))
    for jdx in arange(n):
        for idx in list(arange(0,jdx)) + list(arange(jdx+1,n)):
            idx = int(idx);
            jdx = int(jdx)
            Df[0, jdx*(n-1) + idx] = _Df[idx, jdx]
    if z is None:
        return objective_value, Df
    '''Also form hessian of objective+non-linear constraints
       (but there are no nonlinear constraints):
       This is the trickiest part...
       WARNING: H is for sure coded wrong'''
    H = matrix(1., (n_active, n_active))
    return objective_value, Df, H
m, w_0 = cvx_objective_func()
print cvx_objective_func(w_0)
G = -matrix(diag(ones(n_active),), (n_active,n_active))
h = matrix(0., (n_active,1), 'd')
from cvxopt import solvers
print solvers.cp(cvx_objective_func, G=G, h=h)
Having said that, the tricks used to eliminate the equality/inequality constraints in HYRY's code are quite cute.