Scipy: Minimize violates given bounds - python

I am trying to use scipy.optimize.minimize with simple a <= x <= b bounds. However, it often happens that my target function is evaluated just outside the bounds. To my understanding, this happens when minimize tries to determine the gradient of the target function at the boundary.
Minimal example:
import math
import numpy as np
from scipy.optimize import Bounds, minimize
constraint = Bounds([-1, -1], [1, 1], True)
def fun(x):
print(x)
return -math.exp(-np.dot(x,x))
result = minimize(fun, [-1, -1], bounds=constraint)
The output shows that the minimizer jumps to the point [1,1] and then tries to evaluate at [1.00000001, 1]:
[-1. -1.]
[-0.99999999 -1. ]
[-1. -0.99999999]
[-0.72932943 -0.72932943]
[-0.72932942 -0.72932943]
[-0.72932943 -0.72932942]
[-0.22590689 -0.22590689]
[-0.22590688 -0.22590689]
[-0.22590689 -0.22590688]
[1. 1.]
[1.00000001 1. ]
[1. 1.00000001]
[-0.03437328 -0.03437328]
...
Of course, there is no problem in this example, as fun can be evaluated also there. But that might not always be the case...
In my actual problem, the minimum can not be on the boundary and I have the easy workaround of adding an epsilon to the bounds.
But one would expect that there should be an easy solution to this issue which also works if the minimum can be at a boundary?
PS: It would be strange if I were the first to have this problem -- sorry if this question has been asked before somewhere, but I didn't find it anywhere.

As discussed here (thanks #"Welcome to Stack Overflow" for the comment directing me there), the problem is indeed that the gradient routine doesn't respect the bounds.
I wrote a new one that does the job:
import math
import numpy as np
from scipy.optimize import minimize
def gradient_respecting_bounds(bounds, fun, eps=1e-8):
"""bounds: list of tuples (lower, upper)"""
def gradient(x):
fx = fun(x)
grad = np.zeros(len(x))
for k in range(len(x)):
d = np.zeros(len(x))
d[k] = eps if x[k] + eps <= bounds[k][1] else -eps
grad[k] = (fun(x + d) - fx) / d[k]
return grad
return gradient
bounds = ((-1, 1), (-1, 1))
def fun(x):
print(x)
return -math.exp(-np.dot(x,x))
result = minimize(fun, [-1, -1], bounds=bounds,
jac=gradient_respecting_bounds(bounds, fun))
Note that this can be a bit less efficient, because fun(x) now gets evaluated twice at each point.
This seems to be unavoidable, relevant snippet from _minimize_lbfgsb in lbfgsb.py:
if jac is None:
def func_and_grad(x):
f = fun(x, *args)
g = _approx_fprime_helper(x, fun, epsilon, args=args, f0=f)
return f, g
else:
def func_and_grad(x):
f = fun(x, *args)
g = jac(x, *args)
return f, g
As you can see, the value of f can only be reused by the internal _approx_fprime_helper function.

Related

Improve performance of autograd jacobian

I'm wondering how the following code could be faster. At the moment, it seems unreasonably slow, and I suspect I may be using the autograd API wrong. The output I expect is each element of timeline evaluated at the jacobian of f, which I do get, but it takes a long time:
import numpy as np
from autograd import jacobian
def f(params):
mu_, log_sigma_ = params
Z = timeline * mu_ / log_sigma_
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]))
I would expect the following:
jacobian(f) returns an function that represents the gradient vector w.r.t. the parameters.
jacobian(f)(np.array([1.0, 1.0])) is the Jacobian evaluated at the point (1, 1). To me, this should be like a vectorized numpy function, so it should execute very fast, even for 40k length arrays. However, this is not what is happening.
Even something like the following has the same poor performance:
import numpy as np
from autograd import jacobian
def f(params, t):
mu_, log_sigma_ = params
Z = t * mu_ / log_sigma_
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = jacobian(f)(np.array([1.0, 1.0]), timeline)
From https://github.com/HIPS/autograd/issues/439 I gathered that there is an undocumented function autograd.make_jvp which calculates the jacobian with a fast forward mode.
The link states:
Given a function f, vectors x and v in the domain of f, make_jvp(f)(x)(v) computes both f(x) and the Jacobian of f evaluated at x, right multiplied by the vector v.
To get the full Jacobian of f you just need to write a loop to evaluate make_jvp(f)(x)(v) for each v in the standard basis of f's domain. Our reverse mode Jacobian operator works in the same way.
From your example:
import autograd.numpy as np
from autograd import make_jvp
def f(params):
mu_, log_sigma_ = params
Z = timeline * mu_ / log_sigma_
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp(f)(np.array([1.0, 1.0]))
# loop through each basis
# [1, 0] evaluates (f(0), first column of jacobian)
# [0, 1] evaluates (f(0), second column of jacobian)
for basis in (np.array([1, 0]), np.array([0, 1])):
val_of_f, col_of_jacobian = gradient_at_mle(basis)
print(col_of_jacobian)
Output:
[ 1. 1.00247506 1.00495012 ... 99.99504988 99.99752494
100. ]
[ -1. -1.00247506 -1.00495012 ... -99.99504988 -99.99752494
-100. ]
This runs in ~ 0.005 seconds on google collab.
Edit:
Functions like cdf aren't defined for the regular jvp yet but you can use another undocumented function make_jvp_reversemode where it is defined. Usage is similar except that the output is only the column and not the value of the function:
import autograd.numpy as np
from autograd.scipy.stats.norm import cdf
from autograd.differential_operators import make_jvp_reversemode
def f(params):
mu_, log_sigma_ = params
Z = timeline * cdf(mu_ / log_sigma_)
return Z
timeline = np.linspace(1, 100, 40000)
gradient_at_mle = make_jvp_reversemode(f)(np.array([1.0, 1.0]))
# loop through each basis
# [1, 0] evaluates first column of jacobian
# [0, 1] evaluates second column of jacobian
for basis in (np.array([1, 0]), np.array([0, 1])):
col_of_jacobian = gradient_at_mle(basis)
print(col_of_jacobian)
Output:
[0.05399097 0.0541246 0.05425823 ... 5.39882939 5.39896302 5.39909665]
[-0.05399097 -0.0541246 -0.05425823 ... -5.39882939 -5.39896302 -5.39909665]
Note that make_jvp_reversemode will be slightly faster than make_jvp by a constant factor due to it's use of caching.

Scipy: How can I use Bounds with trust-constr?

for my constrained Problem I want to use the Scipy-Trusted-Constr algorithm as I have a multivariable, constraint problem. I dont want /can't calculate the Jacobi/Hessian analytically, and compute it.
However, when setting the bounds, the computation of the Jacobian crashes:
File "C:\Python27\lib\site-packages\scipy\optimize\_trustregion_constr\tr_interior_point.py", line 56, in __init__
self.jac0 = self._compute_jacobian(jac_eq0, jac_ineq0, s0)
File "C:\Python27\lib\site-packages\scipy\optimize\_trustregion_constr\tr_interior_point.py", line 164, in _compute_jacobian
[J_ineq, S]]))
File "C:\Python27\lib\site-packages\numpy\matrixlib\defmatrix.py", line 1237, in bmat
arr_rows.append(concatenate(row, axis=-1))
ValueError: all the input array dimensions except for the concatenation axis must match exactly
The error occurs both when using old style bounds and the newest Bounds object. I could reproduce the error with this code:
import numpy as np
import scipy.optimize as scopt
def RosenbrockN(x):
result = 0
for i in range(len(x)-1):
result += 100*(x[i+1]-x[i]**2)**2+(1-x[i])**2
return result
x0 = [0.0, 0.0, 0.0]
#bounds = scopt.Bounds([-2.0,-0.5,-2.0],[2.0,0.8,0.7])
bounds = [(-2.0,2.0),(-0.5,0.8),(-2.0,0.7)]
Res = scopt.minimize(RosenbrockN, x0, \
method = 'trust-constr', bounds = bounds, \
jac = '2-point', hess = scopt.SR1())
I take it that I just misunderstand how bounds are set, but cant find my mistake. Advice is appreciated.
EDIT: I also tried the code example from the documentation which gave the same result. Other methods as SLSQP work well with bounds.
SciPy Version 1.1.0, Python Version 2.7.4, OS Win 7 Ent.
I tried several times with the method "trust-constr" and the boundary constraints fails to be incorporated. I solved this issue by using linear constraints for the boundary conditions. Following the example
from scipy.optimize import minimize, LinearConstraint, Bounds
def RosenbrockN(x):
result = 0
for i in range(len(x)-1):
result += 100*(x[i+1]-x[i]**2)**2+(1-x[i])**2
return result
x0 = [0.0, 0.0, 0.0]
# This will not work:
#bounds = Bounds([-2.0,-0.5,-2.0],[2.0,0.8,0.7])
# This works
lb = [-2.0,-0.5,-2.0]
ub = [2.0,0.8,0.7]
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
lcons = LinearConstraint(A, lb=lb, ub=ub, keep_feasible=True)
Res = minimize(RosenbrockN, x0, method = 'trust-constr', constraints=lcons)
I removed your jac and hess arguments and got it to work; perhaps the problem lies there?
import numpy as np
import scipy.optimize as scopt
def RosenbrockN(x):
result = 0
for i in range(len(x)-1):
result += 100*(x[i+1]-x[i]**2)**2+(1-x[i])**2
return result
x0 = [0.0, 0.0, 0.0]
#bounds = scopt.Bounds([-2.0,-0.5,-2.0],[2.0,0.8,0.7])
bounds = [(-2.0,2.0),(-0.5,0.8),(-2.0,0.7)]
Res = scopt.minimize(RosenbrockN, x0, \
method = 'SLSQP', bounds = bounds)
print(Res)
Result is
fun: 0.051111012543332675
jac: array([-0.00297706, -0.50601892, -0.00621008])
message: 'Optimization terminated successfully.'
nfev: 95
nit: 18
njev: 18
status: 0
success: True
x: array([0.89475126, 0.8 , 0.63996894])

nonlinear optimization with vectors, scalars and inequality constraints

I have set of equation in form: Y=aA+bB
where Y-is know vector of floats (only this one is known!); a, b are unkown scalar (float) and A, B are unknown vectors of floats. Each equation have it own Y, a, b, whereas all equation share the same unknow vectors A and B.
I have set of such equation so my problem is to minimize function:
(Y-aA-bB)+(Y'-a'A-b'B)+....
I have also many inequality constrains of type: Ai>Aj (Ai i-th element of vector A), Bi>= Bk, Bi>0, a>a', ...
Is there any software or library (ideally for python) which can handle this problem?
General remarks
This is a linear problem (at least in the linear least-squares sense, continue reading)!
It's also incompletely specified as it's not clear if there should be always a feasible solution in your case or if you want to minimize some given loss in general. Your text sounds like the latter, but in this case one has to chose the loss (which makes a difference in regards to possible algorithms). Let's take the euclidean-norm (probably the best pick here)!
Ignoring constraints for a moment, we can view this problem as basic least-squares solution to a linear matrix equation problem (euclidean-norm vs. squared euclidean-norm does not make a difference!).
min || b - Ax ||^2
Here:
M = number of Y's
N = size of Y
b = (Y0,
Y1,
...) -> shape: M*N (flattened: Y_x = (y_x_0, y_x_1).T)
A = ((a0, 0, 0, ..., b0, 0, 0, ...),
(0, a0, 0, ..., 0, b0, 0, ...),
(0, 0, a0, ..., 0, 0, b0, ...),
...
(a1, 0, 0, ..., b1, 0, 0, ...)) -> shape: (M*N, N*2)
x = (A0, A1, A2, ... B0, B1, B2, ...) -> shape: N*2 (one for A, one for B)
What you should do
If unconstrained:
Convert to standard-form and use numpy's lstsq
If constrained:
Either use customized optimization algorithms, or:
Linear-programming (if minimizing absolute-differences / l1-norm)
I'm too lazy to formulate it for scipy's linprog
Not that hard, but l1-norm is non-trivial using scipy's API
Much easier to formulate with cvxpy (obj=cvxpy.norm(X, 1))
Quadratic-programming / Second-order-cone-programming (if minimizing euclidean norm / l2-norm)
Again, too lazy to formuate it; no special solver available at scipy yet
Could be easily formulated with cvxpy (obj=cvxpy.norm(X, 2))
Emergency: use general-purpose constrained nonlinear-optimization algorithms like SLSQP -> see code
Some hacky code (not the best approach!)
This code:
Is just a demo!
Uses general nonlinear optimization algorithms from scipy
Therefore:
easier to formulate
Less fast & robust than LP, QP, SOCP
But will achieve approximately the same result as convergence on convex optimization problems is guaranteed
Uses automatic-differentiation whenever needed
(author too lazy to add gradients)
this can really hurt if performance is important
Is really ugly in terms of np.repeat vs. broadcasting!
Code:
import numpy as np
from scipy.optimize import minimize
np.random.seed(1)
""" Fake-problem (usually the job of the question-author!) """
def get_partial(N=10):
Y = np.random.uniform(size=N)
a, b = np.random.uniform(size=2)
return Y, a, b
""" Optimization """
def optimize(list_partials, N, M):
""" General approach:
This is a linear system of equations (with constraints)
Basic (unconstrained) form: min || b - Ax ||^2
"""
Y_all = np.vstack(map(lambda x: x[0], list_partials)).ravel() # flat 1d
a_all = np.hstack(map(lambda x: np.repeat(x[1], N), list_partials)) # repeat to be of same shape
b_all = np.hstack(map(lambda x: np.repeat(x[2], N), list_partials)) # """
def func(x):
A = x[:N]
B = x[N:]
return np.linalg.norm(Y_all - a_all * np.repeat(A, M) - b_all * np.repeat(B, M))
""" Example constraints: A >= B element-wise """
cons = ({'type': 'ineq',
'fun' : lambda x: x[:N] - x[N:]})
res = minimize(func, np.zeros(N*2), constraints=cons, method='SLSQP', options={'disp': True})
print(res)
print(Y_all - a_all * np.repeat(res.x[:N], M) - b_all * np.repeat(res.x[N:], M))
""" Test """
M = 4
N = 3
list_partials = [get_partial(N) for i in range(M)]
optimize(list_partials, N, M)
Output:
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.9019356096498999
Iterations: 12
Function evaluations: 96
Gradient evaluations: 12
fun: 0.9019356096498999
jac: array([ 1.03786588e-04, 4.84041870e-04, 2.08129734e-01,
1.57609582e-04, 2.87599862e-04, -2.07959406e-01])
message: 'Optimization terminated successfully.'
nfev: 96
nit: 12
njev: 12
status: 0
success: True
x: array([ 1.82177105, 0.62803449, 0.63815278, -1.16960281, 0.03147683,
0.63815278])
[ 3.78873785e-02 3.41189867e-01 -3.79020251e-01 -2.79338679e-04
-7.98836875e-02 7.94168282e-02 -1.33155595e-01 1.32869391e-01
-3.73398306e-01 4.54460178e-01 2.01297470e-01 3.42682496e-01]
I did not check the result! If there is an error it's an implementation-error, not a conceptional one (my opinion)!
I agree with sascha that this is a linear problem. As I do not like constrains very much, I prefer, actually, to make it a non-linear without constrains. I do so by setting the vector A=(a1**2, a1**2+a2**2, a1**2+a2**2+a3**2, ...) like this it is ensured that it is all positive and A_i > A_j for i>j. That makes errors a bit problematic, as you now have to consider error propagation to get A1, A2, etc. including correlation, but I will have an important point on that at the end. The "simple" solution would look as follows:
import numpy as np
from scipy.optimize import leastsq
from random import random
np.set_printoptions(linewidth=190)
def generate_random_vector(n, sortIt=True):
out=np.fromiter( (random() for x in range(n) ),np.float)
if sortIt:
out.sort()
return out
def residuals(parameters,dataVec,dataLength,vecDims):
aParams=parameters[:dataLength]
bParams=parameters[dataLength:2*dataLength]
AParams=parameters[-2*vecDims:-vecDims]
BParams=parameters[-vecDims:]
YList=dataVec
AVec=[a**2 for a in AParams]##assures A_i > 0
BVec=[b**2 for b in BParams]
AAVec=np.cumsum(AVec)##assures A_i>A_j for i>j
BBVec=np.cumsum(BVec)
dist=[ np.array(Y)-a*np.array(AAVec)-b*np.array(BBVec) for Y,a,b in zip(YList,aParams,bParams) ]
dist=np.ravel(dist)
return dist
if __name__=="__main__":
aList=generate_random_vector(20, sortIt=False)
bList=generate_random_vector(20, sortIt=False)
AVec=generate_random_vector(5)
BVec=generate_random_vector(5)
YList=[a*AVec+b*BVec for a,b in zip(aList,bList)]
aGuess=20*[.2]
bGuess=20*[.3]
AGuess=5*[.4]
BGuess=5*[.5]
bestFitValues, covMX, infoDict, messages ,ier = leastsq(residuals, aGuess+bGuess+AGuess+BGuess ,args=(YList,20,5) ,full_output=True)
print "a"
print aList
besta = bestFitValues[:20]
print besta
print "b"
print bList
bestb = bestFitValues[20:40]
print bestb
print "A"
print AVec
bestA = bestFitValues[-2*5:-5]
realBestA = np.cumsum([x**2 for x in bestA])
print realBestA
print "B"
print BVec
bestB = bestFitValues[-5:]
realBestB = np.cumsum([x**2 for x in bestB])
print realBestB
print covMX
The problem on errors and correlation is that the solution to the problem is not unique. If Y = a A + b B is a solution and we, e.g., rotate such that A = c E + s F and B = -s E + c F then also Y = (ac-bs) E + (as+bc) F =e E + f F is a solution. The parameter space is, hence, completely flat at "the solution" resulting in huge errors and apocalyptic correlations.

Problems with scipy.optimize using matrix as input, bounds, constraints

I have used Python to perform optimization in the past; however, I am now trying to use a matrix as the input for the objective function as well as set bounds on the individual element values and the sum of the value of each row in the matrix, and I am encountering problems.
Specifically, I would like to pass the objective function ObjFunc three parameters - w, p, ret - and then minimize the value of this function (technically I am trying to maximize the function by minimizing the value of -1*ObjFunc) by adjusting the value of w subject to the bound that all elements of w should fall within the range [0, 1] and the constraint that sum of each row in w should sum to 1.
I have included a simplified piece of example code below to demonstrate the issue I'm encountering. As you can see, I am using the minimize function from scipy.opimize. The problems begin in the first line of objective function x = np.dot(p, w) in which the optimization procedure attempts to flatten the matrix into a one-dimensional vector - a problem that does not occur when the function is called without performing optimization. The bounds = b and constraints = c are both producing errors as well.
I know that I am making an elementary mistake in how I am approaching this optimization and would appreciate any insight that can be offered.
import numpy as np
from scipy.optimize import minimize
def objFunc(w, p, ret):
x = np.dot(p, w)
y = np.multiply(x, ret)
z = np.sum(y, axis=1)
r = z.mean()
s = z.std()
ratio = r/s
return -1 * ratio
# CREATE MATRICES
# returns, ret, of each of the three assets in the 5 periods
ret = np.matrix([[0.10, 0.05, -0.03], [0.05, 0.05, 0.50], [0.01, 0.05, -0.10], [0.01, 0.05, 0.40], [1.00, 0.05, -0.20]])
# probability, p, of being in each stae {X, Y, Z} in each of the 5 periods
p = np.matrix([[0,0.5,0.5], [0,0.6,0.4], [0.2,0.4,0.4], [0.3,0.3,0.4], [1,0,0]])
# initial equal weights, w
w = np.matrix([[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333],[0.33333,0.33333,0.33333]])
# OPTIMIZATION
b = [(0, 1)]
c = ({'type': 'eq', 'fun': lambda w_: np.sum(w, 1) - 1})
result = minimize(objFunc, w, (p, ret), method = 'SLSQP', bounds = b, constraints = c)
Digging into the code a bit. minimize calls optimize._minimize._minimize_slsqp. One of the first things it does is:
x = asfarray(x0).flatten()
So you need to design your objFunc to work with the flattened version of w. It may be enough to reshape it at the start of that function.
I read the code from a IPython session, but you can also find it in your scipy directory:
/usr/local/lib/python3.5/dist-packages/scipy/optimize/_minimize.py

Problems and questions regarding scipy.optimize.minimize in Python

I have some questions and problems regarding scipy's optimize.minimize routine. I would like to minimize the function:
f(eta) = sum_i |eta*x_i - y_i|
with regard to eta. Since I am not familiar with the minimize routine and the corresponding methods, I tried some out. However, using method BFGS raises the following error:
File "/usr/local/lib/python3.4/dist-packages/scipy/optimize/_minimize.py", line 441, in minimize return _minimize_bfgs(fun, x0, args, jac, callback, **options)
File "/usr/local/lib/python3.4/dist-packages/scipy/optimize/optimize.py", line 904, in _minimize_bfgs
A1 = I - sk[:, numpy.newaxis] * yk[numpy.newaxis, :] * rhok
IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index
which I was not able to solve. Please find code, which causes the error below. I am using Python3 with scipy 0.17.0 and numpy 1.8.2 on Ubuntu 14.04.3 LTS.
Furthermore, the method conjugate gradient seems to perform worse than other methods.
Last but not least, I favour estimating the minimum by finding the zero of the first derivative via scipy.optimize.brentq. Is this fine or do you recommend another approach? I prefer robustness over speed.
Here is some code illustrating the problems and questions:
from scipy import optimize
import numpy as np
def function(x, bs, cs):
sum = 0.
for b, c in zip(bs, cs):
sum += np.abs(x*b - c)
return sum
def derivativeFunction(x, bs, cs):
sum = 0.
for b, c in zip(bs, cs):
if x*b > c:
sum += b
else:
sum -= b
return sum
np.random.seed(1000)
bs = np.random.rand(10)
cs = np.random.rand(10)
eta0 = 0.5
res = optimize.minimize(fun=function, x0=eta0, args=(bs, cs), method='Nelder-Mead', tol=1e-6)
print('Nelder-Mead:\t', res.x[0], function(res.x[0], bs, cs))
res = optimize.minimize(fun=function, x0=eta0, args=(bs, cs,), method='CG', jac=derivativeFunction, tol=1e-6)
print('CG:\t', res.x[0], function(res.x[0], bs, cs))
x = optimize.brentq(f=derivativeFunction, a=0, b=2., args=(bs, cs), xtol=1e-6, maxiter=100)
print('Brentq:\t', x, function(x, bs, cs))
#Throwing the error
res = optimize.minimize(fun=function, x0=eta0, args=(bs, cs), method='BFGS', jac=derivativeFunction, tol=1e-6)
print('BFGS:\t', res.x[0], function(res.x[0], bs, cs))
Its output is:
Nelder-Mead: 0.493537902832 3.71986334101
CG: 0.460178525461 3.72659733011
Brentq: 0.49353725172947666 3.71986347245
where the first value is the position of the minimum and the second value the minimum itself. The output misses the error message from above.
Thank you for your help!

Categories

Resources