I have used Python to perform optimization in the past; however, I am now trying to use a matrix as the input for the objective function as well as set bounds on the individual element values and the sum of the value of each row in the matrix, and I am encountering problems.
Specifically, I would like to pass the objective function objFunc three parameters (w, p, ret) and then minimize its value (technically I am maximizing the function by minimizing -1*objFunc) by adjusting w, subject to the bound that every element of w falls within the range [0, 1] and the constraint that each row of w sums to 1.
I have included a simplified piece of example code below to demonstrate the issue. As you can see, I am using the minimize function from scipy.optimize. The problems begin in the first line of the objective function, x = np.dot(p, w), where the optimization procedure flattens the matrix into a one-dimensional vector; this does not happen when the function is called directly, outside the optimizer. The arguments bounds = b and constraints = c both produce errors as well.
I know that I am making an elementary mistake in how I am approaching this optimization and would appreciate any insight that can be offered.
import numpy as np
from scipy.optimize import minimize
def objFunc(w, p, ret):
    x = np.dot(p, w)
    y = np.multiply(x, ret)
    z = np.sum(y, axis=1)
    r = z.mean()
    s = z.std()
    ratio = r/s
    return -1 * ratio
# CREATE MATRICES
# returns, ret, of each of the three assets in the 5 periods
ret = np.matrix([[0.10, 0.05, -0.03], [0.05, 0.05, 0.50], [0.01, 0.05, -0.10], [0.01, 0.05, 0.40], [1.00, 0.05, -0.20]])
# probability, p, of being in each state {X, Y, Z} in each of the 5 periods
p = np.matrix([[0,0.5,0.5], [0,0.6,0.4], [0.2,0.4,0.4], [0.3,0.3,0.4], [1,0,0]])
# initial equal weights, w
w = np.matrix([[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333]])
# OPTIMIZATION
b = [(0, 1)]
c = ({'type': 'eq', 'fun': lambda w_: np.sum(w, 1) - 1})
result = minimize(objFunc, w, (p, ret), method = 'SLSQP', bounds = b, constraints = c)
Digging into the code a bit: minimize calls optimize._minimize._minimize_slsqp. One of the first things it does is:
x = asfarray(x0).flatten()
So you need to design your objFunc to work with the flattened version of w. It may be enough to reshape it at the start of that function.
I read the code from an IPython session, but you can also find it in your scipy directory:
/usr/local/lib/python3.5/dist-packages/scipy/optimize/_minimize.py
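A minimal sketch of that reshape idea, using the data from the question as plain arrays. The names obj_func, row_sums_minus_one and eps are mine, not from the question. The small eps in the denominator is also my addition: the second asset's return is constant in the sample data, so the standard deviation can reach zero. I have not checked that the weights it returns are meaningful; it only shows how the flat vector, the per-element bounds and the row-sum constraint fit together.

```python
import numpy as np
from scipy.optimize import minimize

ret = np.array([[0.10, 0.05, -0.03], [0.05, 0.05, 0.50], [0.01, 0.05, -0.10],
                [0.01, 0.05, 0.40], [1.00, 0.05, -0.20]])
p = np.array([[0, 0.5, 0.5], [0, 0.6, 0.4], [0.2, 0.4, 0.4],
              [0.3, 0.3, 0.4], [1, 0, 0]])
n = 3          # w is an n x n matrix
eps = 1e-8     # assumption: keeps the ratio finite when z.std() is zero

def obj_func(w_flat, p, ret):
    # minimize passes a flat vector, so rebuild the matrix first
    w = w_flat.reshape(n, n)
    z = np.sum((p @ w) * ret, axis=1)
    return -z.mean() / (z.std() + eps)

def row_sums_minus_one(w_flat):
    # one equality constraint per row: sum of the row minus 1 must be 0
    return w_flat.reshape(n, n).sum(axis=1) - 1.0

w0 = np.full(n * n, 1.0 / n)          # flat starting point, equal weights
bounds = [(0.0, 1.0)] * (n * n)       # one (min, max) pair per element
constraints = {'type': 'eq', 'fun': row_sums_minus_one}

result = minimize(obj_func, w0, args=(p, ret), method='SLSQP',
                  bounds=bounds, constraints=constraints)
w_opt = result.x.reshape(n, n)        # back to matrix form
```

Note that bounds needs one pair per element of the flattened w (nine here), and that the constraint function has to use its own argument rather than the global w.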
I'm trying to solve a minimization problem using the minimize function of SciPy. The objective function is simply the ratio of two multivariate normal distributions with different means and variances. I'm hoping to find the maximum of the function g_func, which is equivalent to finding the minimum of the function g_optimization. I also added the constraint x[0] = 0. Here, x is a vector with 8 elements. The objective function g_optimization is as follows:
import numpy as np
from scipy.optimize import minimize
# Set up mean and variance for two MVN distributions
n_trait = 8
sigma = np.full((n_trait, n_trait),0.0005)
np.fill_diagonal(sigma,0.005)
omega = np.full((n_trait, n_trait),0.0000236)
np.fill_diagonal(omega,0.0486)
sigma_pos = np.linalg.inv(np.linalg.inv(sigma)+np.linalg.inv(omega))
mu_pos = np.array([-0.01288244,0.08732091,0.01049617,0.0860966,0.10055626,0.07952922,0.04363669,-0.0061975])
mu_pri = 0
sigma_pri = omega
#objective function
def g_func(beta,mu_sim_pos):
    g1 = ((np.linalg.det(sigma_pri))**(1/2))/((np.linalg.det(sigma_pos))**(1/2))
    g2 = (-1/2)*np.linalg.multi_dot([np.transpose(beta-mu_sim_pos),np.linalg.inv(sigma_pos),beta-mu_sim_pos])
    g3 = (1/2)*np.linalg.multi_dot([np.transpose(beta-mu_pri),np.linalg.inv(sigma_pri),beta-mu_pri])
    g = g1*np.exp(g2+g3)
    return g
def g_optimization(beta,mu_sim_pos):
    return -1*g_func(beta,mu_sim_pos)
#optimization
start_point = np.full(8,0)
cons = ({'type': 'eq',
         'fun' : lambda x: np.array([x[0]])})
anws = minimize (g_optimization, [start_point], args=(mu_pos),
                 constraints=cons, options={'maxiter': 50}, tol=0.001)
anws
The optimization stops after two iterations, and the minimum value it reports is 0, at the point np.array([0,10.32837891,-1.62396508,10.13790152,12.38752653,9.11615259,3.53201544,-4.22115517]). This cannot be right: even if we plug the starting point np.zeros(8) into g_optimization, the result is -657.0041125829354, which is smaller than 0. So the solution provided is definitely not the minimum.
g_optimization(np.zeros(8),mu_pos) #gives solution of -657.0041125829354
I'm not sure where I went wrong.
I would try a different solver. For example L-BFGS-B works well.
You can look at all options here.
anws = minimize (g_optimization, [start_point], args=(mu_pos), method='L-BFGS-B',
                 constraints=cons, options={'maxiter': 50}, tol=0.001)
print(anws)
# success: True
# message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
# fun: -21688.00879938617
# x: array([-0.0101048, 0.09937778, 0.01543875, 0.0980401, 0.11383878, 0.09086455, 0.05164822, -0.00280081])
EDIT:
L-BFGS-B cannot handle general constraints h(x)=0, only bounding boxes on the variables:
Bounds on variables for L-BFGS-B, TNC, SLSQP, Powell, and trust-constr methods. There are two ways to specify the bounds:
Instance of Bounds class.
Sequence of (min, max) pairs for each element in x. None is used to specify no bound.
In your case you have to define 8 pairs of lower and upper limits.
For x[0] you have to use a tight bound, as the method cannot handle x_low == x_high.
bounds = [(None, None)] * 8
bounds[0] = (0, 0.00001)
anws = minimize (g_optimization, [start_point], args=(mu_pos), method='L-BFGS-B', bounds=bounds,
                 options={'maxiter': 50}, tol=0.001)
# fun: -21467.48153792194
# x: array([0., 0.10039832, 0.01641271, 0.0990599, 0.11486735, 0.09188037, 0.05264228, -0.00183697])
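The same kind of box can also be written with the Bounds class mentioned in the quoted documentation. The sketch below uses a toy quadratic instead of g_optimization so that it runs on its own; toy_objective and target are hypothetical names, and the tight interval on x[0] mirrors the list-of-pairs version above.

```python
import numpy as np
from scipy.optimize import Bounds, minimize

n = 8
target = np.linspace(0.5, 4.0, n)   # hypothetical unconstrained optimum

def toy_objective(x):
    # simple convex stand-in for g_optimization
    return np.sum((x - target) ** 2)

# -inf / +inf play the role of None in the list-of-pairs form
lower = np.full(n, -np.inf)
upper = np.full(n, np.inf)
lower[0], upper[0] = 0.0, 1e-5      # tight box around x[0] = 0
bounds = Bounds(lower, upper)

res = minimize(toy_objective, np.zeros(n), method='L-BFGS-B', bounds=bounds)
```

Here res.x[0] stays inside the tight interval while the remaining elements are free to move to their unconstrained values.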
Another alternative is to exclude the value x[0] from your optimisation problem:
def g_optimization(beta,mu_sim_pos):
    beta2 = np.empty(8)
    beta2[0] = 0
    beta2[1:] = beta
    return -1*g_func(beta2, mu_sim_pos)
start_point = np.zeros(7) # exclude x[0]
anws = minimize(g_optimization, [start_point], args=(mu_pos), method='L-BFGS-B',
                options={'maxiter': 50}, tol=0.001)
# fun: -21467.47686079844
# x: array([0.10041797, 0.01648995, 0.09908046, 0.11487707, 0.09190585, 0.05269467, -0.00174722])
# ^ missing x[0]
I have a least squares problem to solve without any known estimates of a parameter. I impose the constraint that my desired solution be smooth (the model parameters vary slowly), so I minimize the difference between adjacent parameters (a traditional remedy used for this geological problem).
The constraints are implemented by arranging the constraining equations as rows in the original data equation d = Gm. The auxiliary parameter w is chosen by trial and error (some textbooks call w a Lagrange multiplier).
I have the following:
G = np.array([[1,0,1,0,0,6],
              [1,0,0,1,0,6.708],
              [1,0,0,0,1,8.485],
              [0,1,1,0,0,7.616],
              [0,1,0,1,0,7],
              [0,1,0,0,1,7.616]])
d = np.array([[2.323],
              [2.543],
              [2.857],
              [2.64],
              [2.529],
              [2.553]])
Now adding a constraint of an arbitrary w-weighted smoothness (w = 0.01):
w = 0.01
G = np.array([[1,0,1,0,0,6],
              [1,0,0,1,0,6.708],
              [1,0,0,0,1,8.485],
              [0,1,1,0,0,7.616],
              [0,1,0,1,0,7],
              [0,1,0,0,1,7.616],
              [w,-w,0,0,0,0],
              [0,w,-w,0,0,0],
              [0,0,w,-w,0,0],
              [0,0,0,w,-w,0],
              [0,0,0,0,w,-w]])
d = np.array([[2.323],
              [2.543],
              [2.857],
              [2.64],
              [2.529],
              [2.553],
              [0],
              [0],
              [0],
              [0],
              [0]])
However, choosing a proper value for w seems to be a key step in obtaining a good solution for the model parameters.
So my question is: with Python, is there a way I can loop over many calculated solutions with different values for w and choose the value that was used to achieve the solution with the best quality?
In the solution below, G_0 refers to G without the additional constraint rows and, similarly, d_0 is d without the additional zeros. I'm also assuming you read G_0 and d_0 from somewhere, so I treat them as known.
import numpy as np
def create_W(n_rows, w):
    # (n_rows + 1) x (n_rows + 1) matrix with 1 on the diagonal and -1 just above it
    W = -np.diagflat(np.ones(n_rows), 1)
    np.fill_diagonal(W, 1)
    # drop the last row and scale, so the rows read [w, -w, 0, ...], [0, w, -w, ...]
    return w * W[:-1]
def solution_quality_metric(m):
    # this needs to be implemented to determine what you mean by "best"
    raise NotImplementedError
n_rows = 5
d_w = np.zeros((n_rows, 1))  # column shape, to match d_0
# choose a range for the w values, for example:
w_min, w_max, dw = 0, 1, 0.01
best_m = -np.inf
best_w = w_min
for w in np.arange(w_min, w_max, dw):
    W = create_W(n_rows, w)
    G = np.concatenate([G_0, W], axis=0)
    d = np.concatenate([d_0, d_w])
    m = np.linalg.lstsq(G, d, rcond=None)[0]  # lstsq lives in np.linalg
    quality = solution_quality_metric(m)
    if quality > best_m:
        best_m = quality
        best_w = w
This code will not run as is, since you didn't specify what you mean by "solution with the best quality". For that you'll need to implement the solution_quality_metric function.
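To make the loop concrete, here is a runnable sketch with one possible quality measure filled in. The choice of leave-one-out prediction error is purely my assumption (the question does not say what "best" means), and the helper names smoothness_rows, solve and loo_error are hypothetical. The data are the G and d from the question, with d stored as a 1-D array.

```python
import numpy as np

G_0 = np.array([[1, 0, 1, 0, 0, 6],
                [1, 0, 0, 1, 0, 6.708],
                [1, 0, 0, 0, 1, 8.485],
                [0, 1, 1, 0, 0, 7.616],
                [0, 1, 0, 1, 0, 7],
                [0, 1, 0, 0, 1, 7.616]])
d_0 = np.array([2.323, 2.543, 2.857, 2.64, 2.529, 2.553])

def smoothness_rows(n_params, w):
    # rows [w, -w, 0, ...], [0, w, -w, ...]: differences of adjacent parameters
    eye = np.eye(n_params)
    return w * (eye[:-1] - eye[1:])

def solve(G_data, d_data, w):
    # append the weighted smoothness rows and solve in the least-squares sense
    W = smoothness_rows(G_data.shape[1], w)
    G = np.vstack([G_data, W])
    d = np.concatenate([d_data, np.zeros(W.shape[0])])
    return np.linalg.lstsq(G, d, rcond=None)[0]

def loo_error(w):
    # assumed quality measure: squared error when each datum is predicted
    # from a fit to the remaining data (smaller is better)
    err = 0.0
    for i in range(len(d_0)):
        keep = np.arange(len(d_0)) != i
        m = solve(G_0[keep], d_0[keep], w)
        err += (G_0[i] @ m - d_0[i]) ** 2
    return err

ws = np.logspace(-3, 1, 30)                     # candidate values of w
errors = np.array([loo_error(w) for w in ws])
best_w = ws[np.argmin(errors)]
best_m = solve(G_0, d_0, best_w)
```

Any other criterion (an L-curve corner, agreement with independent measurements, and so on) can be swapped in for loo_error without changing the rest of the loop.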
I am a newbie at Python and I was writing a code to compute, then fit, magnetization data.
Firstly, I am writing the function for the energy to be minimized with respect to the parameter "theta".
def E_uniaxial(H, phi, theta, Keff, Ms):
    e = Keff*(np.cos(theta))**2 - ((4*np.pi)**2*mu0)*Ms*H*np.cos(theta - phi)
    return e
Then, as the magnetization depends strongly on the previous equilibrium position of the system, I write a function for the "next equilibrium position"; the parameter H is the one that changes between the previous and the new equilibrium position.
def next_theta(Ms, phi, Keff, H, lasttheta, fctE):
    E = lambda x : fctE(H, phi, x, Keff, Ms)[0]
    result = scipy.optimize.minimize(E, lasttheta)
    return result.x
After this, I write a function that computes a whole hysteresis cycle. Given a known starting point, the function increases H and computes all the equilibrium positions, each of which depends on the previous one (then H is decreased and the same process is performed).
def cycle_theta(Ms, desfield, Keff, Hmax, theta_init_1, theta_init_2, fctE):
    # forward sweep
    H1 = np.linspace(-Hmax, Hmax, 2000)
    sol1 = np.zeros(np.shape(H1))
    sol1[0] = theta_init_1
    for i in range(len(H1)-1):
        sol1[i+1] = next_theta(Ms, desfield, Keff, H1[i+1], sol1[i], fctE)
    # return sweep
    H2 = np.linspace(Hmax, -Hmax, 2000)
    sol2 = np.zeros(np.shape(H2))
    sol2[0] = theta_init_2
    for i in range(len(H2) -1):
        sol2[i+1] = next_theta(Ms, desfield, Keff, H2[i+1], sol2[i], fctE)
    return H1, sol1, np.flip(sol2)
Then, I have to fit data in order to find the Ms and Keff parameters. I defined this function :
def test_fit(H, Ms, Keff):
    a = cycle_theta(Ms, 1., Keff, 20, np.pi, 0., E_uniaxial)[1]
    idx = 0
    if isinstance(H, float):
        idx = find_nearest(a, H)
        print('float')
        return np.sin(a[idx])
    if isinstance(H, np.ndarray):
        c = np.zeros(np.shape(H))
        for i in range(len(H)):
            idx = find_nearest(a, H[i])
            c[i] = a[idx]
        print('array')
        return np.sin(c)
The condition on the type seemed to be required for the function to work with curve_fit.
I finally call popt = curve_fit(test_fit, b, sig) where "b" and "sig" are my experimental data.
But I got this error several times coming from the scipy.optimize.minimize, not the curve_fit:
ValueError: setting an array element with a sequence.
I read that this message can come from my energy function E_uniaxial returning an array rather than a scalar, but it's actually a quite regular function: if you input a scalar, you get a scalar, and if you input an array, you get an array.
So I really don't understand: am I not supposed to nest scipy.optimize.minimize inside scipy.optimize.curve_fit?
Thank you a lot for your help !!
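One thing that may be worth checking, sketched here with a simplified stand-in for the energy: minimize hands the objective a 1-D array and returns result.x as a 1-D array, even when there is only one parameter. The energy function and its parameter values below are hypothetical, and this is not a diagnosis of the error above, only a way to see which shapes are involved.

```python
import numpy as np
from scipy.optimize import minimize

seen_shapes = []   # records the shape of every theta the optimizer passes in

def energy(theta, H=0.5, phi=1.0, Keff=1.0):
    # hypothetical stand-in for E_uniaxial with made-up parameter values
    seen_shapes.append(np.shape(theta))
    return Keff * np.cos(theta) ** 2 - H * np.cos(theta - phi)

# theta arrives as an array of shape (1,), so [0] turns the energy into a scalar
result = minimize(lambda x: energy(x)[0], x0=0.3)

# result.x is also an array of shape (1,); take element 0 for a plain float
theta_eq = float(result.x[0])
```

Storing result.x[0] rather than result.x in arrays such as sol1 keeps every stored value a plain scalar.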
I have a set of equations of the form: Y=aA+bB
where Y is a known vector of floats (only this one is known!); a, b are unknown scalars (floats) and A, B are unknown vectors of floats. Each equation has its own Y, a, b, whereas all equations share the same unknown vectors A and B.
I have a set of such equations, so my problem is to minimize the function:
(Y-aA-bB)+(Y'-a'A-b'B)+....
I also have many inequality constraints of the type: Ai>Aj (Ai is the i-th element of vector A), Bi>=Bk, Bi>0, a>a', ...
Is there any software or library (ideally for Python) which can handle this problem?
General remarks
This is a linear problem (at least in the linear least-squares sense, continue reading)!
It's also incompletely specified, as it's not clear whether there is always a feasible solution in your case or whether you want to minimize some given loss in general. Your text sounds like the latter, but in that case one has to choose the loss (which makes a difference with regard to possible algorithms). Let's take the Euclidean norm (probably the best pick here)!
Ignoring constraints for a moment, we can view this problem as a basic least-squares solution to a linear matrix equation (Euclidean norm vs. squared Euclidean norm does not make a difference!).
min || b - Ax ||^2
Here:
M = number of Y's
N = size of Y
b = (Y0,
     Y1,
     ...) -> shape: M*N (flattened: Y_x = (y_x_0, y_x_1).T)
A = ((a0, 0, 0, ..., b0, 0, 0, ...),
     (0, a0, 0, ..., 0, b0, 0, ...),
     (0, 0, a0, ..., 0, 0, b0, ...),
     ...
     (a1, 0, 0, ..., b1, 0, 0, ...)) -> shape: (M*N, N*2)
x = (A0, A1, A2, ... B0, B1, B2, ...) -> shape: N*2 (one for A, one for B)
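The layout above can be sketched in a few lines of NumPy for the unconstrained case. This assumes, as in the matrix above, that the scalars a_i and b_i enter as known coefficients; the random data and the names A_mat, rhs, A_true and B_true are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 3                          # number of Y's, size of each Y

# made-up problem: known scalars a_i, b_i and hidden vectors A, B
a = rng.uniform(size=M)
b = rng.uniform(size=M)
A_true = rng.uniform(size=N)
B_true = rng.uniform(size=N)
Y = np.outer(a, A_true) + np.outer(b, B_true)   # row i is Y_i = a_i*A + b_i*B

# block i of the system matrix is (a_i * I, b_i * I), as in the layout above
I = np.eye(N)
A_mat = np.hstack([np.kron(a[:, None], I), np.kron(b[:, None], I)])  # (M*N, 2*N)
rhs = Y.ravel()                                                      # (M*N,)

x, *_ = np.linalg.lstsq(A_mat, rhs, rcond=None)
A_est, B_est = x[:N], x[N:]
```

With noise-free data and at least two equations whose (a_i, b_i) pairs are not proportional, A_est and B_est reproduce the hidden vectors.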
What you should do
If unconstrained:
  Convert to standard-form and use numpy's lstsq
If constrained:
  Either use customized optimization algorithms, or:
  Linear-programming (if minimizing absolute-differences / l1-norm)
    I'm too lazy to formulate it for scipy's linprog
    Not that hard, but l1-norm is non-trivial using scipy's API
    Much easier to formulate with cvxpy (obj=cvxpy.norm(X, 1))
  Quadratic-programming / Second-order-cone-programming (if minimizing euclidean norm / l2-norm)
    Again, too lazy to formulate it; no special solver available at scipy yet
    Could be easily formulated with cvxpy (obj=cvxpy.norm(X, 2))
  Emergency: use general-purpose constrained nonlinear-optimization algorithms like SLSQP -> see code
Some hacky code (not the best approach!)
This code:
  Is just a demo!
  Uses general nonlinear optimization algorithms from scipy
    Therefore:
      easier to formulate
      Less fast & robust than LP, QP, SOCP
      But will achieve approximately the same result as convergence on convex optimization problems is guaranteed
  Uses numerical differentiation (finite differences) whenever needed
    (author too lazy to add gradients)
    this can really hurt if performance is important
  Is really ugly in terms of np.repeat vs. broadcasting!
Code:
import numpy as np
from scipy.optimize import minimize
np.random.seed(1)

""" Fake-problem (usually the job of the question-author!) """
def get_partial(N=10):
    Y = np.random.uniform(size=N)
    a, b = np.random.uniform(size=2)
    return Y, a, b

""" Optimization """
def optimize(list_partials, N, M):
    """ General approach:
        This is a linear system of equations (with constraints)
        Basic (unconstrained) form: min || b - Ax ||^2
    """
    # lists rather than map objects: recent NumPy versions need a sequence here
    Y_all = np.vstack([x[0] for x in list_partials]).ravel()        # flat 1d
    a_all = np.hstack([np.repeat(x[1], N) for x in list_partials])  # repeat to be of same shape
    b_all = np.hstack([np.repeat(x[2], N) for x in list_partials])  # same for b

    def func(x):
        A = x[:N]
        B = x[N:]
        # NOTE: np.repeat(A, M) yields A0 M times, then A1, ...; Y_all is laid out
        # as one whole Y after another, so np.tile(A, M) is likely what is intended
        return np.linalg.norm(Y_all - a_all * np.repeat(A, M) - b_all * np.repeat(B, M))

    """ Example constraints: A >= B element-wise """
    cons = ({'type': 'ineq',
             'fun' : lambda x: x[:N] - x[N:]})
    res = minimize(func, np.zeros(N*2), constraints=cons, method='SLSQP', options={'disp': True})
    print(res)
    print(Y_all - a_all * np.repeat(res.x[:N], M) - b_all * np.repeat(res.x[N:], M))

""" Test """
M = 4
N = 3
list_partials = [get_partial(N) for i in range(M)]
optimize(list_partials, N, M)
Output:
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.9019356096498999
Iterations: 12
Function evaluations: 96
Gradient evaluations: 12
fun: 0.9019356096498999
jac: array([ 1.03786588e-04, 4.84041870e-04, 2.08129734e-01,
1.57609582e-04, 2.87599862e-04, -2.07959406e-01])
message: 'Optimization terminated successfully.'
nfev: 96
nit: 12
njev: 12
status: 0
success: True
x: array([ 1.82177105, 0.62803449, 0.63815278, -1.16960281, 0.03147683,
0.63815278])
[ 3.78873785e-02 3.41189867e-01 -3.79020251e-01 -2.79338679e-04
-7.98836875e-02 7.94168282e-02 -1.33155595e-01 1.32869391e-01
-3.73398306e-01 4.54460178e-01 2.01297470e-01 3.42682496e-01]
I did not check the result! If there is an error it's an implementation error, not a conceptual one (my opinion)!
I agree with sascha that this is a linear problem. As I do not like constraints very much, I actually prefer to make it a non-linear problem without constraints. I do so by setting the vector A=(a1**2, a1**2+a2**2, a1**2+a2**2+a3**2, ...), which ensures that all elements are positive and that A_i > A_j for i>j. That makes errors a bit problematic, as you now have to consider error propagation, including correlation, to get A1, A2, etc., but I will make an important point on that at the end. The "simple" solution would look as follows:
import numpy as np
from scipy.optimize import leastsq
from random import random
np.set_printoptions(linewidth=190)

def generate_random_vector(n, sortIt=True):
    # np.float was removed from NumPy; the builtin float is equivalent here
    out = np.fromiter((random() for x in range(n)), float)
    if sortIt:
        out.sort()
    return out

def residuals(parameters, dataVec, dataLength, vecDims):
    aParams = parameters[:dataLength]
    bParams = parameters[dataLength:2*dataLength]
    AParams = parameters[-2*vecDims:-vecDims]
    BParams = parameters[-vecDims:]
    YList = dataVec
    AVec = [a**2 for a in AParams]  ## assures A_i > 0
    BVec = [b**2 for b in BParams]
    AAVec = np.cumsum(AVec)  ## assures A_i>A_j for i>j
    BBVec = np.cumsum(BVec)
    dist = [np.array(Y) - a*np.array(AAVec) - b*np.array(BBVec) for Y, a, b in zip(YList, aParams, bParams)]
    dist = np.ravel(dist)
    return dist

if __name__ == "__main__":
    aList = generate_random_vector(20, sortIt=False)
    bList = generate_random_vector(20, sortIt=False)
    AVec = generate_random_vector(5)
    BVec = generate_random_vector(5)
    YList = [a*AVec + b*BVec for a, b in zip(aList, bList)]
    aGuess = 20*[.2]
    bGuess = 20*[.3]
    AGuess = 5*[.4]
    BGuess = 5*[.5]
    bestFitValues, covMX, infoDict, messages, ier = leastsq(residuals, aGuess+bGuess+AGuess+BGuess, args=(YList, 20, 5), full_output=True)
    print("a")
    print(aList)
    besta = bestFitValues[:20]
    print(besta)
    print("b")
    print(bList)
    bestb = bestFitValues[20:40]
    print(bestb)
    print("A")
    print(AVec)
    bestA = bestFitValues[-2*5:-5]
    realBestA = np.cumsum([x**2 for x in bestA])
    print(realBestA)
    print("B")
    print(BVec)
    bestB = bestFitValues[-5:]
    realBestB = np.cumsum([x**2 for x in bestB])
    print(realBestB)
    print(covMX)
The problem on errors and correlation is that the solution to the problem is not unique. If Y = a A + b B is a solution and we, e.g., rotate such that A = c E + s F and B = -s E + c F then also Y = (ac-bs) E + (as+bc) F =e E + f F is a solution. The parameter space is, hence, completely flat at "the solution" resulting in huge errors and apocalyptic correlations.
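The rotation argument is easy to check numerically. In this sketch the vectors, the scalars and the angle are arbitrary made-up values; E and F are obtained from A and B by the inverse rotation, so that A = c E + s F and B = -s E + c F hold.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(5)
B = rng.random(5)
a, b = 0.7, 0.2                     # arbitrary scalars
angle = 0.3                         # arbitrary rotation angle
c, s = np.cos(angle), np.sin(angle)

# inverse rotation: chosen so that A = c*E + s*F and B = -s*E + c*F
E = c * A - s * B
F = s * A + c * B

# rotated coefficients from the text
e = a * c - b * s
f = a * s + b * c

Y_original = a * A + b * B
Y_rotated = e * E + f * F           # same Y from a different (e, f, E, F)
```

Both expressions give the same Y for every angle, which is why a fit cannot tell the two parameter sets apart.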
I am trying to better understand how various parts of the MOSEK optimizer work and cannot quite understand the logic of the following constraints.
If I have the following code:
import sys
from mosek.fusion import *

def flattenBook(n, x0, t):
    M = Model("Simple Portfolio")
    M.setLogHandler(sys.stdout)
    ## can be long and short
    x = M.variable("x", n, Domain.unbounded())
    ## helper variable for buy/sell positions
    z = M.variable("z", n, Domain.unbounded())
    ## find long positions
    l = M.variable("l", n, Domain.greaterThan(0.0))
    M.constraint('long1', Expr.sub(l,x0), Domain.greaterThan(0.0))
    M.constraint('buy', Expr.sub(z,Expr.sub(x,x0)), Domain.greaterThan(0.0))
    M.constraint('sell', Expr.sub(z,Expr.sub(x0,x)), Domain.greaterThan(0.0))
    M.constraint("longeqshort", Expr.sum(x), Domain.equalsTo(0.0))
    M.objective('obj', ObjectiveSense.Minimize, Expr.dot(z, t))
    M.solve()
    if True:
        print("x:")
        print(x.level())

# call the function only after it has been defined
n = 3
x0 = [-20.0, -50.0, -10.0]
t = [0.01, 0.01, 0.01]
TC = flattenBook(n, x0, t)
The results are as follows:
[60.0, -50.0, -10.0]
These are correct, but can someone confirm the logic for the l variable? My understanding is that the long1 constraint forces l to be only the positive values from the x0 array; is that correct? And if so, why?
I took this logic from the buy/sell constraints for trading costs in the examples on the MOSEK website.
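As a plain NumPy sketch (not MOSEK code) of what the two conditions on l state: the variable's domain gives l >= 0 and long1 gives l - x0 >= 0, so together they bound l from below by max(x0, 0) element-wise. The helper names implied_lower_bound and is_feasible are mine and only illustrate that reading of the constraints.

```python
import numpy as np

def implied_lower_bound(x0):
    # l >= 0 and l - x0 >= 0 together mean l >= max(x0, 0), element by element
    x0 = np.asarray(x0, dtype=float)
    return np.maximum(x0, 0.0)

def is_feasible(l, x0):
    # checks the two conditions exactly as they are written in the model
    l = np.asarray(l, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    return bool(np.all(l >= 0.0) and np.all(l - x0 >= 0.0))

x0 = [-20.0, -50.0, -10.0]
lower = implied_lower_bound(x0)       # all zeros here, since every x0 entry is negative
mixed = implied_lower_bound([5.0, -3.0])
```

In the model as posted, l appears only in long1 and not in the objective, so nothing pushes it down to this bound; the constraints alone only say that l is at least the positive part of x0.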