Optimization using scipy - python

I am trying to build an efficient frontier as in the Markowitz problem.
I have written the code below, but I get the error "ValueError: Objective function must return a scalar". I have tested 'fun' with some values, for example, I input to the console:
W = np.ones([n])/n # start optimization with equal weights
cov_matrix = returns.cov()
fun = 0.5*np.dot(np.dot(W, cov_matrix), W) # variance of the portfolio
fun
The output is 0.00015337622774133828, which is a scalar.
I don't know what might be wrong. Any help is appreciated.
Code:
from scipy.optimize import minimize
import pandas as pd
import numpy as np
from openpyxl import load_workbook
wb = load_workbook('path/Assets_3.xlsx') # in this workbook there is data for returns.
# The next lines clean unnecessary first column and first row.
ws = wb.active
df = pd.DataFrame(ws.values)
df1 = df.drop(0,axis=1)
df1 = df1.drop(0)
df1 = df1.astype(float)
rf = 0.05
r_bar = 0.05
returns = df1.copy()
def efficient_frontier(rf, r_bar, returns):
n = len(returns.transpose())
W = np.ones([n])/n # start optimization with equal weights
exp_ret = returns.mean()
cov_matrix = returns.cov()
fun = 0.5*np.dot(np.dot(W, cov_matrix), W) # variance of the portfolio
cons = ({'type': 'eq', 'fun': lambda W: sum(W) - 1. },
{'type': 'ineq', 'fun': lambda W: np.dot(exp_ret,W) - r_bar })
bnds = [(0.,1.) for i in range(n)] # weights between 0..1.
res = minimize(fun, W, (returns, cov_matrix, rf),
method='SLSQP', bounds = bnds, constraints = cons)
return res
x= efficient_frontier(rf,r_bar,returns)
x
Some Data
1 2 3
1 0.060206 0.005781 0.001117
2 0.006463 -0.007390 0.001133
3 -0.003211 -0.015730 0.001167
4 0.044227 -0.006250 0.001225
5 -0.040571 -0.006910 0.001292
6 -0.007900 -0.006160 0.001208
7 0.068702 0.013836 0.001300
8 0.039286 0.009854 0.001350
9 0.012457 -0.007950 0.001358
10 -0.013758 0.001021 0.001283
11 -0.002616 -0.013600 0.001300
12 0.059004 -0.006090 0.001442
13 0.015566 0.002818 0.001308
14 -0.036454 0.001395 0.001283
15 0.058899 0.011072 0.001325
16 -0.043086 0.017070 0.001308
17 0.023156 -0.003350 0.001392
18 0.063705 0.000301 0.001417
19 0.017628 -0.001960 0.001508
20 -0.014567 -0.006990 0.001525
21 -0.007191 -0.013000 0.001425
22 -0.000815 0.014773 0.001450
23 0.046493 -0.001540 0.001542
24 0.051832 -0.008580 0.001742
25 -0.007151 0.001177 0.001633
26 -0.018196 -0.008680 0.001642
27 -0.013513 -0.008810 0.001675
28 -0.026493 -0.010510 0.001825
29 -0.003249 -0.014750 0.001800
30 0.001222 0.022258 0.001758

This code is a mess and while i can show you something which runs, that does not mean anything.
You will see convergence to your starting-point, whatever that means in your task! It's a strong indicator that something is still very wrong (might be the underlying theory)!
Some additional remarks:
scipy's optimizers are build to work with numpy-arrays, not pandas Dataframes or Series objects!
the only things in your original question which hinted pandas-usage was a var-name df and returns.cov() which does not exist for numpy-arrays!
rf is never used anywhere!
there are multiple things in optimize's args, which are not used!
it does not feel like a problem one should use scipy's optimizers for! (but it's possible; here we are paying for numerical-differentiation for example)
cvxpy would probably a much much better approach (more clean, faster, more accurate) if interpret the problem correctly (did not analyze much)
but the same rules apply: some python-knowledge is needed!
Code:
from scipy.optimize import minimize
import numpy as np
import pandas as pd
rf = 0.05
r_bar = 0.05
returns = pd.DataFrame(np.random.randn(30, 3), columns=list('ABC')) # PANDAS DF
cov_matrix = returns.cov().as_matrix() # use PANDAS one last time
# but result = np.array!
returns = returns.as_matrix() # From now on: np-only!
def fun(x, returns, cov_matrix, rf):
return 0.5*np.dot(np.dot(x, cov_matrix), x)
def efficient_frontier(rf, r_bar, returns):
n = len(returns.transpose())
W = np.ones([n])/n # start optimization with equal weights
exp_ret = returns.mean()
cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1. }, # let's use numpy here
{'type': 'ineq', 'fun': lambda x: np.dot(exp_ret, x) - r_bar })
bnds = [(0.,1.) for i in range(n)] # weights between 0..1.
res = minimize(fun, W, (returns, cov_matrix, rf),
method='SLSQP', bounds = bnds, constraints = cons)
return res
x= efficient_frontier(rf,r_bar,returns)
print(x)
Output:
A B C
A 0.813375 -0.001370 0.173901
B -0.001370 1.482756 0.380514
C 0.173901 0.380514 1.285936
fun: 0.2604530793556774
jac: array([ 0.32863522, 0.62063321, 0.61345008])
message: 'Optimization terminated successfully.'
nfev: 35
nit: 7
njev: 3
status: 0
success: True
x: array([ 0.33333333, 0.33333333, 0.33333333])

Related

Minimizing function using scipy with constraints

I have the following constraints for the problem,
I want to minimize the sum of squared differences of w_i, uw_i divided by SUM(uw) following these restrictions:
1. w_i is at maximum, ul
2. w_i is, at least, 0.05
3. The sum of all w for a sector code can not be bigger than 0.50
So basically I want to generate all w_i for each row, however, I dont know how to implement the third restriction with scipy.
With scipy.optimize.lsq_linear I can force the first two conditions with bound = (0.05, ul), but I don't know how to force the third one.
import pandas as pd
import scipy
import numpy as np
df = pd.read_csv("https://raw.githubusercontent.com/norhther/datasets/main/data(1).csv")
df = df.drop("Unnamed: 0", axis = 1)
df
I think you are trying to do something like this:
import pandas as pd
import scipy
import numpy as np
'''
Minimize the sum of squared differences of w_i, uw_i divided by SUM(uw) following these restrictions:
Constraints:
1. w_i is at maximum, ul [NOTE: I think this should say 'minimum']
2. w_i is, at least, 0.05 [NOTE: I think this should say 'at most']
3. The sum of all w for a sector code can not be bigger than 0.50
'''
df = pd.read_csv("https://raw.githubusercontent.com/norhther/datasets/main/data(1).csv")
df = df.drop("Unnamed: 0", axis = 1)
print(df)
gb = df.groupby('Sector Code')['ul']
codeCounts = gb.count().to_list()
cumCounts = [0] + [sum(codeCounts[:i + 1]) for i in range(len(codeCounts))]
newIdx = []
for code, dfGp in gb:
newIdx += list(dfGp.index)
df = df.reindex(newIdx)
# For each unique Sector Code, create constraint that 0.50 minus the sum of w for that code must be non-negative:
def foo(i, c):
# return a closure that acts as a constraint for the i'th interval defined by c[i-1]:c[i]
def bar(x):
return 0.50 - sum(x[c[i-1]:c[i]])
return bar
cons = [{'type': 'ineq', 'fun': foo(i, cumCounts)} for i in range(1, len(cumCounts))]
# Value of bounds argument to enforce ul <= w_i <= 0.05
bnds = tuple((ul_i, 0.05) for code, ul_group in gb for ul_i in ul_group)
# Initial guess
n = len(df.index)
w_i = np.ones(n) * (1 / n)
# The objective function to be minimized
uw_sum = df.uw.sum()
def fun(w):
return (pd.Series(w) - df.uw).pow(2).sum() / uw_sum
# Optimize using scipy minimize() function
from scipy.optimize import minimize
res = minimize(fun, w_i, method='SLSQP', bounds=bnds, constraints=cons)
print(res)
df['w'] = res.x
df = df.reindex(range(len(df.index)))
print(df)
Explanation:
Use groupby() to get the row count for each unique Sector Code value and also to construct an index ordered by Sector Code, which we use to re-order the original input df
create a list of constraint dictionaries to be passed to the optimizer, one for each Sector Code, which will use python closures to constrain the sum of the corresponding solution elements to be <= 0.50
create a sequence of bounds to constrain solution elements w_i to be between ul and 0.05
create the objective function to return the sum of squared differences of w_i, uw_i divided by sum(uw)
call minimize() from scipy.optimize with the above constraints, bounds, objective function and an initial guess
add a column to the dataframe with the result and call reindex() to restore the original row order.
Output:
uw ul Sector Code
0 0.006822 0.050000 40
1 0.017949 0.050000 40
2 0.001906 0.031289 40
3 0.000904 0.040318 20
4 0.001147 0.046904 15
... ... ... ...
1226 0.003653 0.033553 10
1227 0.002556 0.031094 10
1228 0.002816 0.041031 10
1229 0.010216 0.050000 40
1230 0.001559 0.033480 55
[1231 rows x 3 columns]
fun: 0.4487707682194904
jac: array([0.02089997, 0.00466947, 0.01358654, ..., 0.02070332, nan,
0.02188896])
message: 'Positive directional derivative for linesearch'
nfev: 919
nit: 5
njev: 1
status: 8
success: False
x: array([0.03730054, 0.0247585 , 0.02171931, ..., 0.03300862, 0.05 ,
0.03348039])
uw ul Sector Code w
0 0.006822 0.050000 40 0.050000
1 0.017949 0.050000 40 0.050000
2 0.001906 0.031289 40 0.031289
3 0.000904 0.040318 20 0.040318
4 0.001147 0.046904 15 0.046904
... ... ... ... ...
1226 0.003653 0.033553 10 0.033553
1227 0.002556 0.031094 10 0.031094
1228 0.002816 0.041031 10 0.041031
1229 0.010216 0.050000 40 0.050000
1230 0.001559 0.033480 55 0.033480
[1231 rows x 4 columns]
Note that success is False, so perhaps some work remains. Hopefully the dataframe related manipulations are helpful in addressing your question.
You already got a working answer from #constantstranger. IMO, there's just one problem: it's quite slow. More precisely, it took more than a minute to solve the problem on my machine.
Therefore, some notes on what could be done in order to speed up the solver in the following:
Since Python has a noticeable overhead when calling functions, it's a good idea to implement all functions as fast as possible. For instance, evaluating one vectorial constraint function is faster than evaluating multiple scalar constraint functions.
At the moment, all derivatives (the objective gradient and the constraint Jacobian) are approximated by finite differences. This is a real bottleneck because each evaluation of the approximated derivative goes in hand with multiple objective/constraint function evaluations. Instead, it's highly recommended to provide the exact derivatives or use algorithmic differentiation.
Last but not least, scipy.optimize.minimize is only suited for small to mid-sized problems at least. If you are willing to use another package, you could use IPOPT, the state-of-the-art NLP solver. The cyipopt package provides a scipy-like interface, so it isn't hard switching from scipy.optimize.minimize.
Besides from that, your problem is a (convex) quadratic optimization problem and can be formulated as follows:
min f(w) s.t. A*w <= 0.5, u_l <= w <= 0.05
with
f(w) = (1/sum(u_w)) * ||w - u_w||^2_2 = (1/sum(u_w)) * (w'Iw - 2u_w'*w + u_w'u_w)
where A[i,j] = 1 if w[j] belongs to sector i and 0 otherwise.
Then, solving the problem with IPOPT (note that we pass the exact derivatives) looks like this:
import numpy as np
import pandas as pd
from cyipopt import minimize_ipopt
# dataframe
df = pd.read_csv("https://raw.githubusercontent.com/norhther/datasets/main/data(1).csv")
df = df.drop("Unnamed: 0", axis = 1)
# sectors
sectors = df["Sector Code"].unique()
# building the matrix A
A = np.zeros((sectors.size, len(df)))
for i, sec in enumerate(sectors):
indices = df[df["Sector Code"] == sec].index.values
A[i, indices] = 1
uw = df['uw'].values
uw_sum = uw.sum()
# objective
def obj(w):
return np.sum((w - uw)**2) / uw_sum
# objective gradient
def grad(w):
return (2*w - 2*uw) / uw_sum
# Linear Constraint A # w <= 0.5 <=> 0.5 - A # w >= 0
cons = [{'type': 'ineq', 'fun': lambda w: 0.5 - A # w, 'jac': lambda w: -A}]
# variable bounds
bounds = [(u_i, 0.05) for u_i in df.ul.values]
# feasible initial guess
w0 = np.ones(len(df)) / len(df)
# solve the problem
res = minimize_ipopt(obj, x0=w0, jac=grad, bounds=bounds, constraints=cons)
print(res)
On my machine, this terminates in less than 2 seconds and yields
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
fun: 0.4306218505716169
info: {'x': array([0.05 , 0.05 , 0.03128946, ..., 0.04103131, 0.05 ,
0.03348038]), 'g': array([-3.51687688, -9.45217602, -7.88799127, -1.78825803, -1.86650095,
-5.09092925, -2.11181422, -1.35485327, -1.15847276, 0.35 ]), 'obj_val': 0.4306218505716169, 'mult_g': array([-1.000000e+03, -1.000000e+03, -1.000000e+03, -1.000000e+03,
-1.000000e+03, -1.000000e+03, -1.000000e+03, -1.000000e+03,
-1.000000e+03, -2.857166e-09]), 'mult_x_L': array([1000.02960821, 1000.02197802, 1000.00000005, ..., 1000.00000011,
1000.02728049, 1000.00000006]), 'mult_x_U': array([0.00000000e+00, 0.00000000e+00, 5.34457820e-08, ...,
1.11498931e-07, 0.00000000e+00, 6.05340266e-08]), 'status': 2, 'status_msg': b'Algorithm converged to a point of local infeasibility. Problem may be infeasible.'}
message: b'Algorithm converged to a point of local infeasibility. Problem may be infeasible.'
nfev: 13
nit: 9
njev: 7
status: 2
success: False
x: array([0.05 , 0.05 , 0.03128946, ..., 0.04103131, 0.05 ,
0.03348038])
[Finished in 1.9s]

How to constrain the weight of characteristic variables in regression

Now I faced a problem that for a data sample(lets‘s say 10 continuous variables and one dependent variable),  I need fit a model for the prediction. I would like constrain the weights of all the variables between a particular number, like abs(0.2). Which means the variables should no more than 0.2 or less than -0.2. However, I tried lasso and ridge regression in sklearn.linear_model(Also tried ElasticNet) to control the weights of variables, it's not quite good because there always be one or two extreme large weights or sometimes when I gave a large alpha the r square shows the model was really bad. I tried to write my own methods, but I could only constrain the sum of weights nor the every weight of variables. SVR would provide a pretty close answer, however I still wanna ask if there are some good choices for muti-regression with self define constrains?
import numpy as np
from scipy.optimize import shgo
def my_general_linear_model_func(A1,b1):
num_x = np.shape(A1)[1]
def my_func(x):
ls = 0.5*(b1-np.dot(A1,x))**2
result = np.sum(ls)
return result
def g1(x):
return np.sum(x) #sum of X >= 0
def g2(x):
return 1-np.sum(x) #sum of X <= 1
cons = ({'type': 'ineq', 'fun': g1}
,{'type': 'ineq', 'fun': g2})
x0 = np.zeros(num_x)
bnds = [(0,1)]
for i in range(num_x-1):
bnds.append((0,1))
res1 = shgo(my_func,
bounds = bnds,
constraints=cons)
return res1
A1 = np.array([[0.12,5.96,3.14],[0.68,7.89,4.56]])
b1 = np.array([3,5])
my_general_linear_model_func(A1,b1)
The result:
fun: 0.07651391974288956
funl: array([0.07651392, 0.11079534, 0.2564125 ])
message: 'Optimization terminated successfully.'
nfev: 53
nit: 2
nlfev: 49
nlhev: 0
nljev: 12
success: True
x: array([1.12339358e-16, 5.62146099e-02, 9.43785390e-01])
xl: array([[1.12339358e-16, 5.62146099e-02, 9.43785390e-01],
[3.90241087e-01, 5.00000000e-01, 1.09758913e-01],
[5.00000000e-01, 5.00000000e-01, 0.00000000e+00]])

dimensions are wrong when function called through lambda

I have issues with the following code snippet, where I optimize a function (minimizing the volatility).
from scipy import optimize as sco
import numpy as np
def risk_measure(covMatrix, weights):
risk = np.dot(weights, np.dot(covMatrix, weights))
return risk
prescribed_esg = 6 # esg score between 0 and 10 used as threshold in the esg_constraint
# Covariance and return matrix
V = np.matrix([[84.76695659, 20.8854772, 20.62182415, 74.73652696, 14.35995947],
[20.8854772, 35.22429277, 12.95439707, 32.22912903, 12.96449085],
[20.62182415, 12.95439707, 44.02079739, 38.73627316, 9.46608475],
[74.73652696, 32.22912903, 38.73627316, 178.86640813, 33.40281336],
[14.35995947, 12.96449085, 9.46608475, 33.40281336, 32.38514103]])
R = np.matrix([[-0.32264539, -0.08469428, 1.27628749, -0.23207085, 0.21012106]]).T
# Mean ESG score of each company
esgarr = np.matrix([[8.24336898, 4.6373262, 8.30657754, 4.65406417, 3.43620321]]).T
# Bounds and constraints
N = len(R) # number of instruments
bounds = ((-10,10),)*N # allow shorting, bounds of stocks
constraints = {'type': 'eq', 'fun': lambda weights: weights.sum() - 1}
esg_constraint = {'type': 'eq', 'fun': lambda weights: np.dot(weights, esgarr) - prescribed_esg}
esgmvp = sco.minimize(lambda x: risk_measure(V, x), # function to be minimized
N * [1 / N], # initial guess
bounds=bounds, # boundary conditions
constraints =[constraints, esg_constraint], # equality constraints)
)
esgmvp_weights = list(esgmvp['x'])
esgmvp_risk = esgmvp['fun']
esgmvp_esg = np.dot(esgmvp_weights, esgarr)
With the error message
<ipython-input-252-0d6bf5d30ccf> in risk_measure(covMatrix, weights)
3
4 def risk_measure(covMatrix, weights):
----> 5 risk = np.dot(weights, np.dot(covMatrix, weights))
6 return risk
7
<__array_function__ internals> in dot(*args, **kwargs)
ValueError: shapes (5,) and (1,5) not aligned: 5 (dim 0) != 1 (dim 0)
I am able to get results If I create a standalone matrix of weights such as
weights = np.matrix([[1, 1, 1, 1, 1]])
risk = np.dot(weights, np.dot(V, weights.T))
but this does not work when tranposing in my original function.
The following solved it
V = np.squeeze(np.asarray(V))
esg_constraint = {'type': 'eq', 'fun': lambda weights: np.dot(weights, esgarr).sum() - prescribed_esg}
I also edited the function
def risk_measure(covMatrix, weights):
risk = np.dot(weights.T, np.dot(covMatrix, weights))
return risk

Maximising a function

experts. I'm trying to maximize a function my_obj with the Nelder-Mead algorithm to fit my data. For this i have taken help from the scipy's optimize.fmin . I think i am very close to the solutions but missing something and getting an error like:
As explained in the scipy.optimize.minimize documentation, you should be using a 1-D array (or a 1-D list because it is compatible) as input for your objective function instead of multiple parameters:
#!/usr/bin/env python
import numpy as np
from scipy.optimize import minimize
d1 = np.array([ 5.0, 10.0, 15.0, 20.0, 25.0])
h = np.array([10000720600.0, 10011506200.0, 10057741200.0, 10178305100.0,10415318500.0])
b = 2.0
cx = 2.0
#objective function
def obj_function(x): # EDIT: Input is a list
m,n,r= x
pw = 1/cx
c = b*cx
x1 = 1+(d1/n)**c
x2 = 1+(d1/m)**c
x3 = (x1/x2)**pw
dcal = (r)*x3
dobs = (h)
deld=((np.log10(dcal)-np.log10(dobs)))**2
return np.sum(deld)
print(obj_function([5.0,10.0,15.0])) # EDIT: Input is a list
x0 = [5.0,10.0,15.0]
print(obj_function(x0))
res = minimize(obj_function, x0, method='nelder-mead')
print(res)
Output:
% python3 script.py
432.6485766651165
432.6485766651165
final_simplex: (array([[7.76285924e+00, 3.02470699e-04, 1.93396980e+01],
[7.76286507e+00, 3.02555020e-04, 1.93397231e+01],
[7.76285178e+00, 3.01100639e-04, 1.93397381e+01],
[7.76286445e+00, 3.01025402e-04, 1.93397169e+01]]), array([0.12196442, 0.12196914, 0.12197448, 0.12198028]))
fun: 0.12196441986340725
message: 'Optimization terminated successfully.'
nfev: 130
nit: 67
status: 0
success: True
x: array([7.76285924e+00, 3.02470699e-04, 1.93396980e+01])

Having trouble with scipy Minimize function, it is giving me odd results

Created an objective function
Added constraints
The problem is no matter what initial guess I use, the minimize functions just keeps on using that number. for example: If I use 15 for the initial guess, the solver will not try any other number and say the answer is 15. I'm sure the ere is an issue with the code but I am not sure where.
CODE BELOW:
from scipy.optimize import minimize
import numpy as np
from pandas import *
#----------------------------------------------------
#-------- Create Function ------------
#----------------------------------------------------
def MovingAverage(Input,N,test=0):
# Create data frame
df = DataFrame(Input, columns=['Revenue'])
# Add columns
df['CummSum'] = df['Revenue'].cumsum()
df['Mavg'] = rolling_mean(df['Revenue'], N)
df['Error'] = df['Revenue'] - df['Mavg']
df['MFE'] = (df['Error']).mean()
df['MAD'] = np.fabs(df['Error']).mean()
df['MSE'] = np.sqrt(np.square(df['Error']).mean())
df['TS'] = np.sum(df['Error'])/df['MAD']
print N, df.MAD[0]
if test == 0:
return df.MAD[0]
else: return df
#----------------------------------------------------
#-------- Input ------------
#----------------------------------------------------
data = [1,2,3,4,5,5,5,5,5,5,5,5,5,5,5]
#----------------------------------------------------
#-------- SOLVER ------------
#----------------------------------------------------
## Objective Function
fun = lambda x: MovingAverage(data, x[0])
## Contraints
cons = ({'type': 'ineq', 'fun': lambda x: x[0] - 2}, # N>=2
{'type': 'ineq', 'fun': lambda x: len(data) - x[0]}) # N<=len(data)
## Bounds (note sure what this is yet)
bnds = (None,None)
## Solver
res = minimize(fun, 15, method='SLSQP', bounds=bnds, constraints=cons)
##print res
##print res.status
##print res.success
##print res.njev
##print res.nfev
##print res.fun
##for i in res.x:
## print i
##print res.message
##for i in res.jac:
## print i
##print res.nit
# print final results
result = MovingAverage(data,res.x,1)
print result
List of possible values:
2 = 0.142857142857,
3 = 0.25641025641,
4 = 0.333333333333,
5 = 0.363636363636,
6 = 0.333333333333,
7 = 0.31746031746,
8 = 0.3125,
9 = 0.31746031746,
10 = 0.333333333333,
11 = 0.363636363636,
12 = 0.416666666667,
13 = 0.487179487179,
14 = 0.571428571429,
15 = 0.666666666667
Your function is piecewise constant between integer input values, as seen in the plot below (plotted in steps of 0.1 on the x axis):
So the derivative is zero at almost all points, and that's why a gradient based minimization method will return any given initial point as a local minimum.
To rescue the situation, you could think about using interpolation in the objective function to get intermediate function values for non-integer input values. If you combine this with a gradient-based minimization, it might find some point around 8 as a local minimum when starting at 15.

Categories

Resources