I am attempting to perform a multivariate optimization using scipy.optimize.minimize with a constraint, but the constraint is not on each individual variable; rather, it is on the summation of variables.
Here is the quadratic objective:
where A is a symmetric m by m matrix (m is the dimensionality of the points x and y).
The derivative of this function is very nice; A vanishes completely, making the gradient a constant that I can precompute. This is the gradient:
Here's the Python code I'm using to perform the optimization:
retval = scipy.optimize.minimize(f, A.flatten(),
args = (S, dAi.flatten(), A.shape[0]),
jac = True, method = 'SLSQP')
where A is the matrix (flattened), S is the set containing pairs of points x and y, and dAi is the precomputed gradient matrix (also flattened). The objective function f looks like this:
def f(A, S, dfA, k):
A = A.reshape((k, k))
return [np.sum([np.dot(x - y, A).dot(x - y) for x, y in S]), dfA]
However, this implementation spins off into infinity and never completes. I haven't been able to specify the summation constraint anywhere because the optimization method expects either bounds or inequality constraints on each variable, rather than on an aggregation.
Is there a way to do this that I'm missing? This question seemed close but never got a solution. This question involves multivariate optimization but was just an issue of an incorrect derivation, and this question seems analogous to my problem but involves Pandas, which I'm not using.
Related
I am trying to minimize a function of a vector of length 20, but I want to constrain the solution to be monotonic, i.e.
x[1] <= x[2]... <= x[20]
I have tried to implement this in the following way using "constraints" for this routine:
cons = tuple([{'type':'ineq', 'fun': lambda x: x[i]- x[i-1]} for i in range(1, len(node_vals))])
res = sp.optimize.minimize(localisation, b, args=(d), constraints = cons) #optimize
However, the results I get are not monotonic, even when the initial guess b is, it seems that the optimizer is completely ignoring the constraints. What could be going wrong? I have also tried changing the constraint to x[i]**3 - x[i+1]**3 to make it "smoother", but it didn't help at all. My objective function, localisation is the integral of solution to an eigenvalue problem whose parameters are defined beforehand:
def localisation(node_vals, domain): #calculate localisation for solutions with piecewise linear grading
f = piecewise(node_vals, domain) #create piecewise linear function using given values at nodes
#plt.plot(domain, f(domain))
M = diff_matrix(f(domain)) #differentiation matrix created from piecewise linear function
m = np.concatenate(([0], get_solutions(M)[1][:, 0], [0]))
integral = num_int(domain, m)
return integral
You didn’t post a minimum reproducible example that we can run. However, did you try to specify which optimization algorithm to use in SciPy? Something like this:
res = sp.optimize.minimize(localisation, b, args=(d), constraints = cons, method=‘SLSQP’)
I'm having a very similar problem but with additional upper and lower bounds on the monotonicity property. I'm tackling the problem like this (maybe it helps you):
Using the Trust-Region Constrained Algorithm given by scipy. This provides us a way of dealing with linear constraints in a matrix-manner:
lb <= A.dot(x) <= ub
where lb & and ub are the lower (upper) bounds of this constraint problem and A is the matrix, representing the linear constraint problem.
every row of matrix A is a linear term which defines a constraint
If, for example, x[0] <= x[1], then this can be transformed into x[0] - x[1] <= 0 which in terms of the linear constraint matrix A looks like this [1, -1,...], provided that the upper bound vector has a 0 value on this level of course (vice versa is also possible but either way, having at least one of both, lower or upper bound, makes this easy)
Setting up enough of these inequalities and at the same time merging a couple of those into a single inequality may create a sufficient matrix to solve this.
Hope this helps a bit, It did the job for my problem.
I know the library curve_fit of scipy and its power to fitting curves. I have read many examples here and in the documentation, but I cannot solve my problem.
For example, I have 10 files (chemical structers but it does not matter) and ten experimental energy values. I have a function inside a class that calculates for each structure the theoretical energy for some parameters and it returns a numpy array with the theoretical energy values.
I want to find the best parameters to have the theoretical values nearest to the experimental ones. I will furnish here the minimum exemple of my code
This is the class function that reads the experimental energy files, extracts the correct substring and returns the values as a numpy array. The self.path is just the directory and self.nPoints = 10. It is not so important, but I furnish for the sake of completeness
def experimentalValues(self):
os.chdir(self.path)
energy = np.zeros(self.nPoints)
for i in range(1, self.nPoints):
f = open("p_" + str(i + 1) + ".xyz", "r")
energy[i] = float(f.readlines()[1].split()[1])
f.close()
os.chdir('..')
return energy
I calculate the theoretical value with this class function that takes two numpy arrays as arguments, lets say
sigma = np.full(nSubstrate, 2.)
epsilon = np.full(nSubstrate, 0.15)
where nSubstrate = 9
Here there is the class function. It reads files and does two nested loops to calculate for each file the theoretical value and return it to a numpy array.
def theoreticalEnergy(self, epsilon, sigma):
os.chdir(self.path)
cE = np.zeros(self.nPoints)
for n in range(0, self.nPoints):
filenameXYZ = "p_" + str(n + 1) + "_extended.xyz"
allCoordinates = np.loadtxt(filenameXYZ, skiprows = 0, usecols = (1, 2, 3))
substrate = allCoordinates[0:self.nSubstrate]
surface = allCoordinates[self.nSubstrate:]
for i in range(0, substrate.shape[0]):
positionAtomI = np.array(substrate[i][:])
for j in range(0, surface.shape[0]):
positionAtomJ = np.array(surface[j][:])
distanceIJ = self.distance(positionAtomI, positionAtomJ)
cE[n] += self.LennardJones(distanceIJ, epsilon[i], sigma[i])
os.chdir('..')
return cE
Again, for the sake of completeness the Lennard Jones class function is defined as
def LennardJones(self, distance, epsilon, sigma):
repulsive = (sigma/distance) ** 12.
attractive = (sigma/distance) ** 6.
potential = 4. * epsilon* (repulsive - attractive)
return potential
where in this case all the arguments are scalar as the return value.
To conclude the problem presentation I have 3 ingredients:
a numpy array with the experimental data
two numpy arrays with a guess for the parameters sigma and epsilon
a function that takes the last parameters and returns a numpy vector with the values to be fitted.
How can I solve this problem like the approach described in the documentation https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html?
Curve fitting
The curve_fit fits a function f(w, x[i]) to points y[i] by finding w that minimizes sum((f(w, x[i] - y[i])**2 for i in range(n)). As you will read in the first line after the function definition
[It uses] non-linear least squares to fit a function, f, to data.
It refers to least_squares where it states
Given the residuals f(x) (an m-D real function of n real variables) and the loss function rho(s) (a scalar function), least_squares finds a local minimum of the cost function F(x):
Curve fitting is a kind of convex-cost multi-objective optimization. Since the each individual cost is convex, you can add all of them and that will still be a convex function. Notice that the decision variables (the parameters to be optimized) are the same in every point.
Your problem
In my understanding for each energy level you have a different set of parameters, if you write it as a curve fitting problem, the objective function could be expressed as sum((f(w[i], x[i]) - y[i])**2 ...), where y[i]is determined by the energy level. Since each of the terms in the sum is independent on the other terms, this is equivalent to finding each group of parametersw[i]separately minimizing(f(w[i], x[i]) - y[i])**2`.
Convexity
Convexity is a very convenient property for optimization because it ensures that you will have only one minimum in the parameter space. I am not doing a detailed analysis but have reasonable doubts about the convexity of your energy function.
The Lennard Jones function has the difference of a repulsive and an attractive force both with negative even exponent on the distance this alone is very unlikely to be convex.
The sum of multiple local functions centered at different positions has no defined convexity.
Molecular energy, or crystal energy, or protein folding are well known to be non-convex.
A few days ago (on a bike ride) I was thinking about this, how the molecules will be configured in a global minimum energy, and I was wondering if it finds that configuration so rapidly because of quantum tunneling effects.
Non-convex optimization
The non-convex (global) optimization is different from (non-linear) least-squares, in the sense that when a local minimum is found the process don't return immediately, it start making new attempts in different regions of the search spaces. If the function is smooth you can still take advantage of a gradient based local optimization method, but the complexity is still NP.
A classic global optimization method is the Simulated annenaling, if you have a chemical background I think you will have some insights reading about it. Once upon a time, simulated annealing was provided in scipy.optimize.
You will find a few global optimization methods in scipy.optimize. I would encourage you to try Basin hopping, since it was successfully applied to similar problems, as you can read in the references.
I hope this drop you on the right way to your solution. But, be aware that you will probably need to spend, learning how to use the function and will need to make some decisions. You will need to find a balance of accuracy, simplicity, efficiency.
If you want better solution take the time to derive the gradient of the cost function (you can return two values f, and df, where df is the gradient of f with respect to the decision variables).
I'm currently working through some exercises on multivariable function calculus and thought I would have a go at making my own function to determine gradient and hessian at a defined point for any function. I'm currently having issues when attempting to substitute the resulting matrices with coordinate values for an arbitrary function. I've already managed to solve specific examples, but my attempt to make a function to solve a user defined function isn't working correctly.
def multivariable_function(function, variables, substitute=(0,0)):
"""Determines Gradient and Hessian vectors for multivariable function.
Args:
function: Enter the multivariable function
variables: Enter list of variable names
substitute: Default = (0,0)
Returns:
gradient/hessian matrices for given coordinate
To do:
Include sympy symbol() generation within function
"""
#derive_by_array returns a gradient matrix for multivariable function
Gradient = simplify(derive_by_array(function, variables))
#derive_by_array returns a Hessian matrix for multivariable function
Hessian = simplify(derive_by_array(derive_by_array(function, variables), variables))
#Line currently isn't doing anything
Gradient.subs(zip(variables, substitute))
return Gradient, Hessian
This is the basic function so far in operation.
multivariable_function((x**2)*(y**3) + exp(2*x + x*y - 1) - (x**3 + 3*y**2)**2, (x,y))`
which yields the following result, I am however aiming to substitute the desired values into the gradient and hessian matrices to achieve the following desired result. I managed to achieved the desired result using the following.
from sympy import *
x, y, z, K, T, r, σ, h, a, f, μ, c, t, m, x1, x2, x3 = symbols('x, y, z, K, T, r, σ, h, a, f, μ, c, t, m, x1, x2, x3') # Variables used must be defined in sympy.
init_printing(use_unicode=False) #Print the answers in unicode characters
function = (x**2)*(y**3) + exp(2*x + x*y - 1) - (x**3 + 3*y**2)**2
Gradient_1 = simplify(derive_by_array(function, (x, y)))
Hessian_1 = simplify(derive_by_array(derive_by_array(function, (x, y)), (x, y)))
Gradient_1.subs(x, 0).subs(y,0), Hessian_1.subs(x,0).subs(y,0)
After viewing the issue raised here, it seems zipping the two lists should enable the subs() function to work, but it currently isn't for me. I attempted to loop through 'variables and 'substitute' to sequentially apply .subs(), however I'm finding the function only works if the method is chained for all replacement variables, as in the example above.
Does anyone know how I can apply the .subs() n times for a given coordinate to yield the relevant gradient/hessian matrices?
The variable Gradient is of type
sympy.tensor.array.dense_ndim_array.ImmutableDenseNDimArray
Like almost all SymPy objects, with exception of mutable matrices, it is immutable. The method subs does not modify it in place; it returns a new object, which needs to be assigned.
Gradient = Gradient.subs(zip(variables, substitute))
Hessian = Hessian.subs(zip(variables, substitute))
Then the function works as expected, returning
([2*exp(-1), 0], [[4*exp(-1), exp(-1)], [exp(-1), 0]])
But I suggest not passing generators to subs; there are outstanding issues involving that. Convert to a list or a dict first, to be safe. (There is also a difference there: should substitutions be consecutive or simultaneous, although this does not matter when substituting numbers for symbols.)
subs_dict = dict(zip(variables, substitute))
Gradient = Gradient.subs(subs_dict)
Hessian = Hessian.subs(subs_dict)
I'm starting in python and I'm trying to solve a problem that Fmincon solves in Matlab.
Basically, my problem has 12 variables, and a list of 2000 values is created (linearly) and my objective is to maximize the largest value of this list.
In addition, the problem has a linear constraint.
I have tried, without success, using scipy but in all the free-gradient solvers or approximate gradient that I tried, it is not possible to insert linear constraints.
I've also tried using cvxopt, but I did not find any free-gradient solver or approximate gradient.
In addition, I would not like to use tools like Genetic Algorithm, PSO, Harmony Search, etc.
I would like to use a free-gradient solver or approximate gradient that is possible to insert linear constraints, like Fmincon in Matlab.
This is my objective function:
import numpy as np
def max_receita(X, f, CONSTANTE_CVAR):
# f is a matrix with 2000 rows and 12 columns
# CONSTANTE_CVAR is a matrix with 2000 rows and 12 columns
NSERIES = len(f)
REC = np.zeros(NSERIES)
X = np.transpose(X)
CONST_ANUAL = np.sum(CONSTANTE_CVAR,1)
for i in range(NSERIES):
REC[i] = np.dot(f[i],X) + CONST_ANUAL[i]
return -max(REC)
There are 12 variables represented by the vector X with 12 elements and this vector must have sum equal 1. That's my constraint.
Beside this, each variable has a different bound. So the solver must allow to insert bounds.
The vector X (all the 12 variables), with the inputs f and CONST_CVAR, creates the vector REC (1x2000) and my objective is to maximize the biggest value of the vector REC.
In this way, I need a solver that allows:
Free-gradient or approximate gradient
Linear constraints
Non linear function
Bounds
Python
Could anyone suggest any solver?
I am asked to write an implementation of the gradient descent in python with the signature gradient(f, P0, gamma, epsilon) where f is an unknown and possibly multivariate function, P0 is the starting point for the gradient descent, gamma is the constant step and epsilon the stopping criteria.
What I find tricky is how to evaluate the gradient of f at the point P0 without knowing anything on f. I know there is numpy.gradient but I don't know how to use it in the case where I don't know the dimensions of f. Also, numpy.gradient works with samples of the function, so how to choose the right samples to compute the gradient at a point without any information on the function and the point?
I'm assuming here, So how can i choose a generic set of samples each time I need to compute the gradient at a given point? means, that the dimension of the function is fixed and can be deduced from your start point.
Consider this a demo, using scipy's approx_fprime, which is an easier to use wrapper-method for numerical-differentiation and also used in scipy's optimizers when a jacobian is needed, but not given.
Of course you can't ignore the parameter epsilon, which can make a difference depending on the data.
(This code is also ignoring optimize's args-parameter which is usually a good idea; i'm using the fact that A and b are inside the scope here; surely not best-practice)
import numpy as np
from scipy.optimize import approx_fprime, minimize
np.random.seed(1)
# Synthetic data
A = np.random.random(size=(1000, 20))
noiseless_x = np.random.random(size=20)
b = A.dot(noiseless_x) + np.random.random(size=1000) * 0.01
# Loss function
def fun(x):
return np.linalg.norm(A.dot(x) - b, 2)
# Optimize without any explicit jacobian
x0 = np.zeros(len(noiseless_x))
res = minimize(fun, x0)
print(res.message)
print(res.fun)
# Get numerical-gradient function
eps = np.sqrt(np.finfo(float).eps)
my_gradient = lambda x: approx_fprime(x, fun, eps)
# Optimize with our gradient
res = res = minimize(fun, x0, jac=my_gradient)
print(res.message)
print(res.fun)
# Eval gradient at some point
print(my_gradient(np.ones(len(noiseless_x))))
Output:
Optimization terminated successfully.
0.09272331925776327
Optimization terminated successfully.
0.09272331925776327
[15.77418041 16.43476772 15.40369129 15.79804516 15.61699104 15.52977276
15.60408688 16.29286766 16.13469887 16.29916573 15.57258797 15.75262356
16.3483305 15.40844536 16.8921814 15.18487358 15.95994091 15.45903492
16.2035532 16.68831635]
Using:
# Get numerical-gradient function with a way too big eps-value
eps = 1e-3
my_gradient = lambda x: approx_fprime(x, fun, eps)
shows that eps is a critical parameter resulting in:
Desired error not necessarily achieved due to precision loss.
0.09323354898565098