No iterations with scipy.optimize when trying to minimize a function - python

I'm using scipy to maximize a likelihood function by using 'minimize' from scipy.optimize to minimize the negative of the function value. I'm using the BFGS method and have written functions for the likelihood and its first derivative.
I have been able to minimize the function by estimating the gradient numerically (not providing an argument for the Jacobian). However, when I pass my gradient function as an argument, no iterations are performed to improve my initial guess of the function input values.
EDIT: Using check_grad from scipy I have figured out that my gradient function is flawed. This causes the line search step of the first iteration to fail so no iterations are carried out.
Here are the function and gradient:
```python
import math
import numpy as np

# N, Q, D, Y, kern and kern2 are defined elsewhere
def f(X):
    X = X.reshape((N, Q))
    cov = kern2.compute_noisy(X, X)
    inv_cov = np.linalg.inv(cov)
    YYt = np.dot(Y, Y.T)
    log_l = ((-0.5*D*N*np.log(2*math.pi))
             - (0.5*D*np.log(np.linalg.det(cov)))
             - (0.5*np.trace(np.dot(inv_cov, YYt))))
    return -log_l

def grad(X):
    X = X.reshape(N, -1)
    cov = kern2.compute_noisy(X, X)
    inv_cov = np.linalg.inv(cov)
    YYt = np.dot(Y, Y.T)
    dlogl_dK = np.dot(np.dot(inv_cov, YYt), inv_cov) - D*inv_cov
    dK_dX = np.empty((X.shape[0], X.shape[0], X.shape[1]))
    Q = int(X.shape[1])
    for j in range(0, X.shape[0]):
        for i in range(0, X.shape[0]):
            for k in range(0, X.shape[1]):
                # note: uses kern here, while f() uses kern2
                dK_dX[i, j, k] = (X[i][k] - X[j][k]) * kern.K(X[i, :][None], X[j, :][None])
    dK_dX = np.sum(dK_dX, axis=1)
    dlogl_dX = np.dot(dlogl_dK, dK_dX)
    # the original flatten(1) meant column-major order ('F') in old NumPy,
    # whereas f() reshapes X in row-major order
    return -dlogl_dX.flatten('F')
```
Checking the initial function value:
```
>>> print(f(X))
6597.80198798
```
Estimating the gradient numerically seems to work (the function is not fully minimized, but at least something happens). X is my initial guess at the input:
```python
from scipy.optimize import minimize
test = minimize(f, X, method='BFGS', options={'disp': True})
```
```
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 6215.446492
         Iterations: 289
         Function evaluations: 67671
         Gradient evaluations: 335
```
This is what happens when I try to include the gradient function. No iterations are performed and the function value doesn't change:
```python
test2 = minimize(f, X, method='BFGS', jac=grad, options={'disp': True})
```
```
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 6597.801988
         Iterations: 0
         Function evaluations: 43
         Gradient evaluations: 32
```
I have looked at the documentation and can't work out why no iterations are being performed. I think I am using minimize correctly and I don't think my initial guess is at a minimum already as I have the same problem with different sets of values. Help would be much appreciated!
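kern and kern2 are not shown, so the following is only a self-contained sketch under stated assumptions. It uses an RBF kernel with unit signal variance, lengthscale ELL and noise variance NOISE (all hypothetical names and values), with random toy data Y. It illustrates the workflow of checking an analytic gradient with check_grad before passing it to minimize as jac. Compared with grad above, it includes the factor 0.5 in dlogL/dK, the minus sign and 1/ELL**2 that come from differentiating an RBF kernel, and the factor 2 that arises because K is symmetric. It also flattens in the same (row-major) order as the reshape. Whether these terms match the asker's actual kernel is an assumption.

```python
import numpy as np
from scipy.optimize import check_grad, minimize

# Hypothetical toy setup (not the asker's data or kernel)
rng = np.random.default_rng(0)
N, Q, D = 10, 2, 3
ELL, NOISE = 1.0, 0.1
Y = rng.standard_normal((N, D))


def rbf(X):
    # Noise-free kernel: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * ELL^2))
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-0.5 * np.sum(diff**2, axis=-1) / ELL**2)


def neg_log_lik(x_flat):
    X = x_flat.reshape(N, Q)  # row-major, same order as in the gradient
    K = rbf(X) + NOISE * np.eye(N)
    _, logdet = np.linalg.slogdet(K)  # more stable than log(det(K))
    trace_term = np.sum(Y * np.linalg.solve(K, Y))  # tr(K^-1 Y Y^T)
    log_l = -0.5 * D * N * np.log(2 * np.pi) - 0.5 * D * logdet - 0.5 * trace_term
    return -log_l


def neg_log_lik_grad(x_flat):
    X = x_flat.reshape(N, Q)
    Kf = rbf(X)
    K_inv = np.linalg.inv(Kf + NOISE * np.eye(N))
    alpha = K_inv @ Y
    # dlogL/dK = 0.5 * (K^-1 Y Y^T K^-1 - D K^-1)
    G = 0.5 * (alpha @ alpha.T - D * K_inv)
    # dK_mj/dX_mk = -(X_mk - X_jk) / ELL^2 * Kf_mj; row m and column m of K
    # both depend on X_m and contribute equally (hence the factor 2)
    W = G * Kf
    dlogl_dX = -2.0 / ELL**2 * (W.sum(axis=1)[:, None] * X - W @ X)
    return -dlogl_dX.ravel()  # row-major, matching the reshape


x0 = rng.standard_normal(N * Q)
# check_grad returns the 2-norm of (analytic - finite-difference) gradient
print(check_grad(neg_log_lik, neg_log_lik_grad, x0))

res = minimize(neg_log_lik, x0, jac=neg_log_lik_grad, method="BFGS")
print(res.nit, res.fun)
```

If check_grad returns a value of the same order as the gradient norm itself, the analytic gradient is wrong. In that case the first BFGS line search can fail, as in the question.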

Related

Nonlinear constraints with scipy

The problem at hand is the optimization of a multivariate function with nonlinear constraints.
There is a differential equation (in its oversimplified form):
dy/dx = y(x)*t(x) + g(x)
I need to minimize the solution of the DE, y(x), by varying t(x).
Since it is physics under the hood, there are constraints on t(x). I successfully implemented all of them except one:
0 < t(x) < 1 for any x in range [a,b]
To be specific, t(x) is a general polynomial:
t(x) = a0 + a1*x + a2*x**2 + a3*x**3 + a4*x**4 + a5*x**5
x is a fixed numpy.ndarray of floats, and the optimization is over the coefficients a. I use scipy.optimize with the trust-constr method.
What I have tried so far:
1. Root finding at each step: I determined the minimal/maximal value of the function using optimize.root and checked for sign changes. The constraint function returned 0.5 if the constraints were satisfied, and numpy.inf, -1 or anything else outside the [0; 1] range if they were not. The optimizer stops early and the function is not minimized properly.
2. Since x is fixed-length and known, I tried defining a constraint for each point, giving N constraints where N = len(x). This works (or at least looks like it does) but takes forever even for moderately large N. Also, since x is discrete and non-uniform, I can't be sure that the constraint holds for every x in [a, b].
EDIT #1: the minimal reproducible example
```python
import scipy.optimize as optimize
from scipy.optimize import Bounds
import numpy as np

# some function y(x)
x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)

# polynomial t(z)
def t(a, z):
    v = 0.0
    for ii in range(len(a)):
        v += a[ii]*z**ii
    return v

# let's minimize the sum
def targetFn(a):
    return np.sum(y*t(a, x))

# polynomial order
polyord = 3

# simple bounds to have reliable results,
# otherwise the solution will grow toward +-infinity
bnd = 10.0
bounds = Bounds([-bnd for i in range(polyord+1)],
                [bnd for i in range(polyord+1)])

res = optimize.minimize(targetFn, [1.0 for i in range(polyord+1)],
                        bounds=bounds)

if np.max(t(res.x, x)) > 200:
    print('max constraint violated!')
if np.min(t(res.x, x)) < -100:
    print('min constraint violated!')
```
In the reproducible example above, let the constraint be that the value of the polynomial t(a, x) lies in the range [-100; 200] for the given x.
So the question is: how does one properly define a constraint to tell the optimizer that the function's values must be constrained for the given range of arguments?
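One way to avoid N separate constraint objects is a single vector-valued constraint. Because t(a, x) is linear in the coefficients, it can be written as a matrix product V @ a with a Vandermonde matrix V (np.vander with increasing=True). That matrix can then be passed to trust-constr as one LinearConstraint with lower bound -100 and upper bound 200. This is only a sketch on the reproducible example above. For a t that is nonlinear in its parameters, NonlinearConstraint(fun, lb, ub) accepts a vector-valued fun in the same way. The constraint is still enforced only at the grid points, so the last lines check the extrema of the fitted polynomial on the whole interval using the roots of its derivative.

```python
import numpy as np
import scipy.optimize as optimize
from scipy.optimize import Bounds, LinearConstraint
from numpy.polynomial import Polynomial

x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)
polyord = 3
n_coef = polyord + 1

# Row i of V is [1, x_i, x_i**2, ...], so V @ a equals t(a, x) at every grid point
V = np.vander(x, n_coef, increasing=True)


def targetFn(a):
    return np.sum(y * (V @ a))


bnd = 10.0
bounds = Bounds(-bnd * np.ones(n_coef), bnd * np.ones(n_coef))
# One vector-valued constraint: -100 <= t(a, x_i) <= 200 for all i
poly_range = LinearConstraint(V, -100.0, 200.0)

res = optimize.minimize(
    targetFn, np.ones(n_coef), method="trust-constr",
    jac=lambda a: V.T @ y,                          # objective is linear in a
    hess=lambda a: np.zeros((n_coef, n_coef)),      # so its Hessian is zero
    bounds=bounds, constraints=[poly_range],
)
vals = V @ res.x
print(vals.min(), vals.max())

# Extrema of t on the whole interval: endpoints plus real critical points inside it
p = Polynomial(res.x)
crit = p.deriv().roots()
crit = crit[np.isreal(crit)].real
crit = crit[(crit >= x.min()) & (crit <= x.max())]
cand = np.concatenate(([x.min(), x.max()], crit))
print(p(cand).min(), p(cand).max())
```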

Can Tensorflow work out gradients for integral approximations?

I am trying to use Hamiltonian Monte Carlo (HMC, from TensorFlow Probability), but my target distribution contains an intractable 1-D integral, which I approximate with the trapezoidal rule. My understanding of HMC is that it calculates gradients of the target distribution to build a more efficient transition kernel. My question is: can TensorFlow work out gradients of this function with respect to its parameters, and are they meaningful?
For example, this is the log-probability of the target distribution, where 'A' is a model parameter:
```python
import tensorflow as tf
tfm = tf.math

# target_log_prob is a hypothetical name; observed_data, variance and PI are defined elsewhere
def target_log_prob(A):
    # integrate e^At * f[t] with respect to t between 0 and t, for all t
    t = tf.linspace(0., 10., 100)
    f = tf.ones(100)
    delta = t[1]-t[0]
    sum_term = tfm.multiply(tfm.exp(A*t), f)
    integrals = 0.5*delta*tfm.cumsum(sum_term[:-1] + sum_term[1:], axis=0)
    pred = integrals
    sq_diff = tfm.square(observed_data - pred)
    sq_diff = tf.reduce_sum(sq_diff, axis=0)
    log_lik = -0.5*tfm.log(2*PI*variance) - 0.5*sq_diff/variance
    return log_lik
```
Are the gradients of this function in terms of A meaningful?
Yes, you can use TensorFlow's GradientTape to work out the gradients. I assume you have a mathematical function that outputs log_lik from many inputs, one of which is A.
Using GradientTape to get the gradient with respect to A
To get the gradients of log_lik with respect to A, you can use tf.GradientTape in TensorFlow.
For example:
```python
import tensorflow as tf
tfm = tf.math

# A, observed_data, variance and PI are assumed to be tensors defined earlier
with tf.GradientTape(persistent=True) as g:
    g.watch(A)
    t = tf.linspace(0., 10., 100)
    f = tf.ones(100)
    delta = t[1]-t[0]
    sum_term = tfm.multiply(tfm.exp(A*t), f)
    integrals = 0.5*delta*tfm.cumsum(sum_term[:-1] + sum_term[1:], axis=0)
    pred = integrals
    sq_diff = tfm.square(observed_data - pred)
    sq_diff = tf.reduce_sum(sq_diff, axis=0)
    log_lik = -0.5*tfm.log(2*PI*variance) - 0.5*sq_diff/variance
    z = log_lik

## then, you can get the gradients of log_lik with respect to A like this
dz_dA = g.gradient(z, A)
```
dz_dA contains the partial derivatives of z with respect to every element of A.
The code above only illustrates the idea. To make it work, the whole calculation must be done with TensorFlow tensor operations, so modify your function to use tensors throughout.
Another example, using tensor operations (nested tapes give a second derivative):
```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    with tf.GradientTape() as gg:
        gg.watch(x)
        y = x * x
    dy_dx = gg.gradient(y, x)  # Will compute to 6.0
d2y_dx2 = g.gradient(dy_dx, x)  # Will compute to 2.0
```
You can find more examples in the documentation: https://www.tensorflow.org/api_docs/python/tf/GradientTape
Further discussion on "meaningfulness"
Let me first translate the Python code into mathematics, starting from the comment in your code:
# integrate e^At * f[t] with respect to t between 0 and t, for all t
This means you have a function, which I call $g(t, A) = e^{At} f(t)$.
Then you are computing a definite integral of it, which I call $G(t, A) = \int_0^{t} g(\tau, A)\,d\tau$.
In your code, t is no longer a variable: it is a fixed grid of 100 points on [0, 10], so each entry of integrals depends only on A. We can therefore reduce G to a function of a single variable, $h(A)$ (one for each grid point).
Up to here, h contains a definite integral. But since you are approximating it with the trapezoidal rule, we should not think of it as a real integral (dt -> 0). It is just another chain of simple mathematical operations. No mystery here.
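Concretely, write $t_i$ and $f_i$ for the grid values and $\Delta$ for delta. The k-th trapezoidal sum and its derivative with respect to A are given below. The derivative follows from the chain rule through exp and cumsum, and it is the quantity that reverse-mode differentiation evaluates numerically.

```latex
h_k(A) = \frac{\Delta}{2} \sum_{i=0}^{k-1} \left( e^{A t_i} f_i + e^{A t_{i+1}} f_{i+1} \right),
\qquad
\frac{d h_k}{d A} = \frac{\Delta}{2} \sum_{i=0}^{k-1} \left( t_i e^{A t_i} f_i + t_{i+1} e^{A t_{i+1}} f_{i+1} \right)
```

Here $h_k$ corresponds to integrals[k-1], the approximation of the integral from 0 to $t_k$. For a smooth f, as $\Delta \to 0$ this derivative tends to $\int_0^{t_k} \tau e^{A\tau} f(\tau)\,d\tau$, the derivative of the exact integral.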
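The final output, log_lik, is obtained from h through a few more simple operations involving one new input, observed_data. I call that input y, with $\sigma^2$ = variance and $h_k(A)$ the k-th entry of integrals.
The function z that computes log_lik is then:
$z(A) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_k \left(y_k - h_k(A)\right)^2$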
z is no different from any other chain of mathematical operations in TensorFlow. Therefore, dz_dA is meaningful in the sense that it is the gradient of z with respect to A for the approximated model (trapezoidal rule included), which tells you how to update A to minimize z.
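To illustrate "meaningful" numerically without TensorFlow, here is a NumPy sketch. It assumes a scalar A and f(t) = 1, as in the question's example. It shows two things. First, the term-by-term derivative of the trapezoidal sums (the same quantity autodiff computes) matches central finite differences. Second, it stays close to the derivative of the exact integral $(e^{AT} - 1)/A$. The value A = 0.3 and the step h are illustrative choices.

```python
import numpy as np

# Same grid as in the question; f(t) = 1 and a scalar A are assumptions
t = np.linspace(0.0, 10.0, 100)
f = np.ones_like(t)
delta = t[1] - t[0]


def trapz_integrals(A):
    # NumPy version of the question's cumulative trapezoidal rule
    s = np.exp(A * t) * f
    return 0.5 * delta * np.cumsum(s[:-1] + s[1:])


def trapz_integrals_dA(A):
    # Term-by-term derivative, using d/dA e^{A t} = t e^{A t}
    ds = t * np.exp(A * t) * f
    return 0.5 * delta * np.cumsum(ds[:-1] + ds[1:])


def exact_integrals_dA(A):
    # d/dA of the exact integral (e^{A T} - 1) / A for each upper limit T (A != 0)
    T = t[1:]
    return T * np.exp(A * T) / A - (np.exp(A * T) - 1.0) / A**2


A = 0.3
h = 1e-6
fd = (trapz_integrals(A + h) - trapz_integrals(A - h)) / (2 * h)
analytic = trapz_integrals_dA(A)
exact = exact_integrals_dA(A)

print(np.max(np.abs(fd - analytic)))            # agreement with finite differences
print(np.max(np.abs(analytic - exact) / exact))  # relative gap to the exact derivative
```

The largest relative gap to the exact derivative occurs at the first grid points, where the trapezoidal rule is coarsest relative to the integral's size. It shrinks as the grid is refined.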

Finding gradient of an unknown function at a given point in Python

I am asked to write an implementation of gradient descent in Python with the signature gradient(f, P0, gamma, epsilon), where f is an unknown and possibly multivariate function, P0 is the starting point for the gradient descent, gamma is the constant step size and epsilon is the stopping criterion.
What I find tricky is how to evaluate the gradient of f at the point P0 without knowing anything about f. I know there is numpy.gradient, but I don't know how to use it when I don't know the dimensions of f. Also, numpy.gradient works with samples of the function, so how can I choose the right samples to compute the gradient at a point without any information about the function or the point?
I'm assuming here that "So how can I choose a generic set of samples each time I need to compute the gradient at a given point?" means that the dimension of the function is fixed and can be deduced from your starting point.
Consider this a demo using scipy's approx_fprime, which is an easier-to-use wrapper for numerical differentiation and is also used in scipy's optimizers when a Jacobian is needed but not given.
Of course, you can't ignore the parameter epsilon (the finite-difference step), which can make a difference depending on the data.
(This code also ignores minimize's args parameter, which would usually be a good idea to use; I'm relying on A and b being in scope here, which is surely not best practice.)
```python
import numpy as np
from scipy.optimize import approx_fprime, minimize

np.random.seed(1)

# Synthetic data
A = np.random.random(size=(1000, 20))
noiseless_x = np.random.random(size=20)
b = A.dot(noiseless_x) + np.random.random(size=1000) * 0.01

# Loss function
def fun(x):
    return np.linalg.norm(A.dot(x) - b, 2)

# Optimize without any explicit jacobian
x0 = np.zeros(len(noiseless_x))
res = minimize(fun, x0)
print(res.message)
print(res.fun)

# Get numerical-gradient function
eps = np.sqrt(np.finfo(float).eps)
my_gradient = lambda x: approx_fprime(x, fun, eps)

# Optimize with our gradient
res = minimize(fun, x0, jac=my_gradient)
print(res.message)
print(res.fun)

# Eval gradient at some point
print(my_gradient(np.ones(len(noiseless_x))))
```
Output:
```
Optimization terminated successfully.
0.09272331925776327
Optimization terminated successfully.
0.09272331925776327
[15.77418041 16.43476772 15.40369129 15.79804516 15.61699104 15.52977276
 15.60408688 16.29286766 16.13469887 16.29916573 15.57258797 15.75262356
 16.3483305  15.40844536 16.8921814  15.18487358 15.95994091 15.45903492
 16.2035532  16.68831635]
```
Using:
```python
# Get numerical-gradient function with a way too big eps-value
eps = 1e-3
my_gradient = lambda x: approx_fprime(x, fun, eps)
```
shows that eps is a critical parameter, resulting in:
```
Desired error not necessarily achieved due to precision loss.
0.09323354898565098
```
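For completeness, here is a minimal sketch of the requested gradient(f, P0, gamma, epsilon) itself. It uses hand-rolled central differences instead of approx_fprime and deduces the dimension from P0. It assumes that epsilon is a threshold on the norm of the gradient. numerical_gradient, its step h and the max_iter safeguard are hypothetical additions, not part of the assignment's signature.

```python
import numpy as np


def numerical_gradient(f, p, h=None):
    # Central differences along each coordinate; the dimension comes from p itself
    p = np.atleast_1d(np.asarray(p, dtype=float))
    if h is None:
        # step scaled to the magnitude of each coordinate
        h = np.cbrt(np.finfo(float).eps) * np.maximum(1.0, np.abs(p))
    else:
        h = np.full_like(p, h)
    g = np.empty_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h[i]
        g[i] = (f(p + e) - f(p - e)) / (2.0 * h[i])
    return g


def gradient(f, P0, gamma, epsilon, max_iter=10_000):
    # Fixed-step gradient descent; stops when ||grad f|| < epsilon (assumed criterion)
    p = np.atleast_1d(np.asarray(P0, dtype=float)).copy()
    for _ in range(max_iter):
        g = numerical_gradient(f, p)
        if np.linalg.norm(g) < epsilon:
            break
        p = p - gamma * g
    return p


c = np.array([1.0, 2.0, 3.0])
print(gradient(lambda p: np.sum((p - c) ** 2), np.zeros(3), 0.1, 1e-6))
```

A fixed gamma only converges if it is small enough for the curvature of f. The max_iter guard keeps a badly chosen gamma from looping forever.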

How does scipy.optimize.fmin (Simplex) deal with parameters associated with different magnitudes?

I want to fit a 4-parameter (a, g, N and k) model to data by minimizing a chi-square loss function with a Python implementation of the Simplex algorithm (scipy.optimize.fmin).
Preliminary simulations suggest the following range for each parameter: a = [5, 50], g = [0.05, 1.5], N = [5, 200], and k = [0, 0.05].
It looks like scipy.optimize.fmin treats the parameters as if they were all in the same range (presumably [0, 1]). Should I rescale them? Below is my code:
```python
import numpy as np
from scipy.optimize import fmin

# chis (the chi-square loss) is defined elsewhere

# determine starting point (x0) for each parameter
a = np.random.uniform(5, 50)
g = np.random.uniform(0.05, 1.5)
N = np.random.uniform(5, 200)
k = np.random.uniform(0, 0.05)
x0 = np.array([a, g, N, k])  # initial guess for SIMPLEX
xopt = fmin(chis, x0, maxiter=1000)  # call Simplex
```
Imagine that you want to minimize the following bivariate function:
```python
def to_min1(p):
    x, y = p
    return abs(1e-15 - x) + abs(1e15 - y)
```
Even if this example is not realistic, it highlights the main point. Clearly, fmin may not move in x (if x0 = 0), because it is already very close to zero.
To give the objectives equal weight in the optimization, one can express them as relative variations rather than absolute differences (with the arguments in the numerators to avoid ZeroDivisionError):
```python
def to_min2(p):
    x, y = p
    return abs(-1 + x/1e-15) + abs(-1 + y/1e15)
```
Note that this is an ftol concern: by doing so, one wants the iterative recomputation to weigh all arguments equally.
What follows does not exactly answer your question, but rather this one:
Does scipy.optimize.fmin (Simplex) deal with parameters associated with different magnitudes?
Apparently not, since
```
>>> fmin(to_min1, (0,0))
Optimization terminated successfully.
         Current function value: 1000000000000000.000000
         Iterations: 3
         Function evaluations: 11
array([ 0.,  0.])
```
while
```
>>> fmin(to_min2, (0,0))
Optimization terminated successfully.
         Current function value: 1.000000
         Iterations: 118
         Function evaluations: 213
array([  1.00000000e-15,   8.98437500e-05])
```
Admittedly, the second run did not really "terminate successfully" either (y is still nowhere near 1e15). That could be improved by increasing fmin's maxiter argument, etc., but the two cases are clearly not handled the same way.
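A common way to act on this is to let fmin search over parameters rescaled to [0, 1] using the ranges from the question, and to map them back inside the objective. Because fmin's xtol is an absolute tolerance on the parameters, this makes the stopping test correspond to roughly the same relative precision for a, g, N and k. The map is affine, so the minimizer itself is unchanged. The sketch below uses a hypothetical stand-in for the asker's chis with a known minimum p_true. Note that fmin is unconstrained, so the rescaled values may leave [0, 1].

```python
import numpy as np
from scipy.optimize import fmin

# Parameter ranges from the question, in the order (a, g, N, k)
lo = np.array([5.0, 0.05, 5.0, 0.0])
hi = np.array([50.0, 1.5, 200.0, 0.05])


def to_unit(p):
    return (p - lo) / (hi - lo)


def from_unit(u):
    return lo + u * (hi - lo)


# Hypothetical stand-in for the real chi-square loss, minimal at p_true
p_true = np.array([20.0, 0.5, 100.0, 0.02])


def chis(p):
    return np.sum(((p - p_true) / p_true) ** 2)


rng = np.random.default_rng(1)
p0 = rng.uniform(lo, hi)  # random start inside the ranges, as in the question

# fmin works on u; the objective maps u back to the physical parameters
u_opt = fmin(lambda u: chis(from_unit(u)), to_unit(p0),
             xtol=1e-8, ftol=1e-12, maxiter=5000, maxfun=10000, disp=False)
xopt = from_unit(u_opt)
print(xopt)
```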

python curve_fit doesn't work with stiff model

I am trying to find the x0 parameter that makes the blue model fit the green curve as closely as possible (x0 controls the width of the crenel).
Here is my attempt:
```python
from pylab import *
from scipy.optimize import curve_fit

x = linspace(0, 2*pi, 1000)

def crenel(x):
    return sign(sin(x))

def inverter(x, x0):
    return (crenel(x - x0) + crenel(x + x0)) / 2

p, e = curve_fit(inverter, x, sin(x), 1)
plot(x, inverter(x, *p), x, sin(x))
ylim(-1.5, 1.5)
```
By hand, the optimal value is x0 = arcsin(1/2) (about 0.523598), but curve_fit doesn't estimate any value ("OptimizeWarning: Covariance of the parameters could not be estimated"). I suspect the stiffness of the model. The docs say:
The algorithm uses the Levenberg-Marquardt algorithm through leastsq. Additional keyword arguments are passed directly to that algorithm.
So my question is: are there keyword arguments that can help curve_fit estimate the parameter in this case, or is there another approach?
Thanks for any advice.
The problem is that the objective function that curve_fit tries to minimize is not continuous. x0 controls the location of the discontinuities in the inverter function. When a discontinuity crosses one of the grid points in x, there is a jump in the objective function. Between these points, the objective function is constant. curve_fit (actually, leastsq, the function used by curve_fit) is not designed to handle such a function.
The following function sse is (in effect) the function that curve_fit tries to minimize, with x being the same x defined in your example, and y = sin(x):
```python
def sse(x0, x, y):
    f = inverter(x, x0)
    diff = y - f
    s = (diff**2).sum()
    return s
```
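Here is a quick numerical check of the claim that the objective is constant between grid crossings, written self-contained with NumPy instead of pylab. Two values of x0 that differ by a tiny step give exactly the same sse, so a forward-difference derivative (which is how leastsq builds its Jacobian when none is supplied) is zero. The value 1.0 mirrors the initial guess in the question. The step 1e-8 is an illustrative choice, small enough that no discontinuity crosses a grid point between the two values.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x)


def crenel(x):
    return np.sign(np.sin(x))


def inverter(x, x0):
    return (crenel(x - x0) + crenel(x + x0)) / 2


def sse(x0, x, y):
    diff = y - inverter(x, x0)
    return (diff**2).sum()


x0 = 1.0     # the initial guess passed to curve_fit in the question
step = 1e-8  # tiny step, of the order of a finite-difference step
slope = (sse(x0 + step, x, y) - sse(x0, x, y)) / step
print(slope)  # the forward-difference slope is zero here
```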
If you plot this function on a fine grid with code such as
```python
xx = np.linspace(0, 1, 10000)
yy = [sse(x0, x, y) for x0 in xx]
plot(xx, yy)
```
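and zoom in, you'll see a staircase: the function is piecewise constant, with a jump wherever a discontinuity of inverter crosses a grid point.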
To use scipy to find your optimal value, you can use fmin with a smooth objective function. For example, here's the continuous objective function, using only the interval [0, pi/2] (quad is scipy.integrate.quad):
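```python
import numpy as np
from scipy.integrate import quad

def func(x0):
    # squared error on [0, x0], where the model is 0
    s0, e0 = quad(lambda x: np.sin(x)**2, 0, x0)
    # squared error on [x0, pi/2], where the model is 1
    s1, e1 = quad(lambda x: (1 - np.sin(x))**2, x0, 0.5*np.pi)
    return s0 + s1
```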
scipy.optimize.fmin can be used to find the minimum of that function, as in this snippet from an ipython session:
```
In [202]: fmin(func, 0.3, xtol=1e-8)
Optimization terminated successfully.
         Current function value: 0.100545
         Iterations: 28
         Function evaluations: 56
Out[202]: array([ 0.52359878])

In [203]: np.arcsin(0.5)
Out[203]: 0.52359877559829882
```
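Since func depends on a single variable and the answer already restricts x0 to [0, pi/2], a bounded scalar minimizer is another option. This is a sketch, not what the answer used. minimize_scalar with method="bounded" only evaluates func inside the given interval. The tighter xatol is an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar


def func(x0):
    # squared error on [0, x0] (model = 0) plus on [x0, pi/2] (model = 1)
    s0, _ = quad(lambda x: np.sin(x)**2, 0, x0)
    s1, _ = quad(lambda x: (1 - np.sin(x))**2, x0, 0.5 * np.pi)
    return s0 + s1


res = minimize_scalar(func, bounds=(0.0, 0.5 * np.pi), method="bounded",
                      options={"xatol": 1e-10})
print(res.x, np.arcsin(0.5))
```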
