I have a Python script where I compute the value of a normal log-likelihood function for a sample of bivariate data using scipy's multivariate_normal.logpdf. I am assuming the values of the sample means and variances, leaving only the correlation between the variables as the unknown parameter:
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

VAR_X = 0.4
VAR_Y = 0.32
MEAN_X = 1
MEAN_Y = 1.2

def log_likelihood_function(x, data):
    log_likelihood = 0
    sigma = [[VAR_X, x[0]], [x[0], VAR_Y]]
    mu = [MEAN_X, MEAN_Y]
    for point in data:
        log_likelihood += multivariate_normal.logpdf(x=point, mean=mu, cov=sigma)
    return log_likelihood

if __name__ == "__main__":
    some_data = [[1.1, 2.0], [1.2, 1.9], [0.8, 0.2], [0.7, 1.3]]
    guess = [0]
    # maximize the log-likelihood by minimizing its negative
    likelihood = lambda x: (-1) * log_likelihood_function(x, some_data)
    result = minimize(fun=likelihood, x0=guess, options={'disp': True}, method="SLSQP")
    print(result)
No matter what I set as my guess, this script reliably throws a ValueError:
ValueError: the input matrix must be positive semidefinite
The problem, by my estimation, is that scipy.optimize.minimize tries values that create a covariance matrix that is not positive definite. So I need a way to make sure the minimization algorithm discards values that are outside the domain of the problem. I thought to add a constraint to the minimize call:
# make the determinant always positive
def positive_definite_constraint(x):
    return VAR_X*VAR_Y - x[0]*x[0]
This is basically Sylvester's criterion for the covariance matrix and would ensure the matrix is positive definite (since we know the variances are always positive, that condition doesn't need to be checked). But it seems like scipy.optimize.minimize evaluates the objective function before it determines whether the constraints are satisfied (which seems like a design flaw; wouldn't it be faster to search for a solution in a restricted domain, instead of searching all possible solutions and then checking the constraints? I might be mistaken about the order of evaluation, though.)
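For concreteness, the constrained call I have in mind looks roughly like this (a sketch; it still runs into the same ValueError, presumably because the objective is evaluated at infeasible points):

# sketch of the constrained call, using the names defined above
constraint = {'type': 'ineq', 'fun': positive_definite_constraint}
result = minimize(fun=likelihood, x0=guess, constraints=[constraint],
                  options={'disp': True}, method="SLSQP")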
I am not sure how to proceed. I realize I am stretching the purpose of scipy.optimize here a bit by parameterizing the covariance matrix and then minimizing with respect to that parameterization, and I know there are better ways to calculate the correlation for a normal sample, but I am interested in this problem because of its generalization to distributions that are not normal.
Any suggestions? Is there a better way to solve this problem?
You are on the right track. Note that your definiteness constraint reduces to a simple bound on the optimization variable: VAR_X*VAR_Y - x[0]**2 >= 0 is the same as -sqrt(VAR_X*VAR_Y) <= x[0] <= sqrt(VAR_X*VAR_Y). Variable bounds are handled better internally than the more general constraints, so I'd recommend something like this:
max_cov = (VAR_X * VAR_Y) ** 0.5
bounds = [(-max_cov, max_cov)]
res = minimize(fun=likelihood, x0=guess, bounds=bounds, options={'disp': True}, method="SLSQP")
This gives me:
fun: 6.610504611834715
jac: array([-0.0063166])
message: 'Optimization terminated successfully'
nfev: 9
nit: 4
njev: 4
status: 0
success: True
x: array([0.12090069])
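If what you actually want is the correlation rather than the covariance, you can rescale the estimate afterwards, e.g. (a one-liner using the res object above):

# correlation implied by the estimated covariance; roughly 0.34 for the result above
rho = res.x[0] / (VAR_X * VAR_Y) ** 0.5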
To determine trends over time, I use scipy curve_fit with X values from time.time(), for example 1663847528.7147126 (1.6 billion).
Doing a linear fit sometimes creates erroneous results, and providing approximate initial p0 values doesn't help. I found the magnitude of X to be a crucial element in this error, and I wonder why.
Here is a simple snippet that shows a working and a non-working X offset:
import scipy.optimize
def fit_func(x, a, b):
    return a + b * x
y = list(range(5))
x = [1e8 + a for a in range(5)]
print(scipy.optimize.curve_fit(fit_func, x, y, p0=[-x[0], 0]))
# Result is correct:
# (array([-1.e+08, 1.e+00]), array([[ 0., -0.],
# [-0., 0.]]))
x = [1e9 + a for a in range(5)]
print(scipy.optimize.curve_fit(fit_func, x, y, p0=[-x[0], 0.0]))
# Result is not correct:
# OptimizeWarning: Covariance of the parameters could not be estimated
# warnings.warn('Covariance of the parameters could not be estimated',
# (array([-4.53788811e+08, 4.53788812e-01]), array([[inf, inf],
# [inf, inf]]))
An almost perfect p0 for b removes the warning, but curve_fit still doesn't work:
print(scipy.optimize.curve_fit(fit_func, x, y, p0=[-x[0], 0.99]))
# Result is not correct:
# (array([-7.60846335e+10, 7.60846334e+01]), array([[-1.97051972e+19, 1.97051970e+10],
# [ 1.97051970e+10, -1.97051968e+01]]))
# ...but perfect p0 works
print(scipy.optimize.curve_fit(fit_func, x, y, p0=[-x[0], 1.0]))
#(array([-1.e+09, 1.e+00]), array([[inf, inf],
# [inf, inf]]))
As a side question, perhaps there's a more efficient method for a linear fit? Sometimes I want to find the second-order polynomial fit, though.
Tested with Python 3.9.6 and SciPy 1.7.1 under Windows 10.
Root cause
You are facing two problems:
Fitting procedures are scale sensitive. This means the units chosen for a specific variable (e.g. µA instead of kA) can artificially prevent an algorithm from converging properly (e.g. one variable is several orders of magnitude bigger than another and dominates the regression);
Floating-point arithmetic error. When switching from 1e8 to 1e9 you hit the magnitude at which this kind of error becomes predominant.
The second one is important to realize. Say you are limited to 8 significant digits: then 1 000 000 000 and 1 000 000 001 are the same number, since both collapse to 1.0000000e9, and distinguishing them would require one more significant digit. This is why your second example fails.
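A small demonstration of the effect (a sketch; float32 is used here to make the digit limit easy to see, and np.spacing shows the resolution once values reach the squared-loss scale):

import numpy as np

# float32 keeps only ~7 significant digits, so adding 1 to 1e9 is lost entirely
x32 = np.float32(1e9)
print(x32 + np.float32(1.0) == x32)   # True: the increment vanishes

# float64 keeps ~15-16 significant digits, so the same addition survives...
x64 = np.float64(1e9)
print(x64 + 1.0 == x64)               # False: 1e9 and 1e9 + 1 are distinct

# ...but a squared loss pushes 1e9-scale numbers to ~1e18, where the gap
# between adjacent representable floats is already larger than 100
print(np.spacing(1e18))               # 128.0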
Additionally, you are using a non-linear least squares algorithm to solve a linear least squares problem, which is also related to your problem.
You have three solutions:
Normalize;
Normalize and change the methodology/algorithm;
Increase the machine precision.
I'll go with the first one as it is the most generic; the second one has been proposed by @blunova and totally makes sense; the last one is probably bounded by an inherent machine limitation.
Normalization
To mitigate both problems, a common solution is normalization. In your case a simple standardization is enough:
import numpy as np
import scipy.optimize

y = np.arange(5)
x = 1e9 + y

def fit_func(x, a, b):
    return a + b * x

xm = np.mean(x)  # 1000000002.0
xs = np.std(x)   # 1.4142135623730951

result = scipy.optimize.curve_fit(fit_func, (x - xm)/xs, y)
# (array([2.        , 1.41421356]),
#  array([[0., 0.],
#         [0., 0.]]))

# Back transformation to the original scale:
b = result[0][1]/xs                    # slope: 1.0
a = result[0][0] - xm*result[0][1]/xs  # intercept: -1000000000.0
Or the same result using the sklearn interface:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression())
])
pipe.fit(x.reshape(-1, 1), y)

pipe.named_steps["scaler"].mean_          # array([1.e+09])
pipe.named_steps["scaler"].scale_         # array([1.41421356])
pipe.named_steps["regressor"].coef_       # array([1.41421356])
pipe.named_steps["regressor"].intercept_  # 2.0
Back transformation
When normalizing, the fit result is expressed in terms of the normalized variable. To get the required fit parameters, you just need a bit of math to convert the regressed parameters back to the original variable scale.
Simply write down and solve the transformation:
y = a' + b'*x'
x' = (x - m)/s
y = a + b*x
This gives you the following solution:
b = b'/s
a = a' - (m/s)*b'
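As a quick numeric check of these formulas with the values from the example above (b' ≈ 1.41421356, a' = 2.0, m = 1000000002.0, s ≈ 1.41421356):

import numpy as np

a_prime, b_prime = 2.0, np.sqrt(2.0)   # fit on the standardized variable
m, s = 1e9 + 2.0, np.sqrt(2.0)         # mean and std of the original x

b = b_prime / s                   # 1.0            (slope on the original scale)
a = a_prime - (m / s) * b_prime   # -1000000000.0  (intercept on the original scale)
print(a, b)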
Precision addendum
NumPy's default float type is float64, as you expected, and it has about 15 significant digits:
x.dtype # dtype('float64')
np.finfo(np.float64).precision # 15
But scipy.optimize.curve_fit relies on scipy.optimize.least_squares, which uses a squared metric to drive the optimization.
Without digging into the details, I suspect this is where the problem happens: when dealing with values that are all close to 1e9, you reach the threshold where floating-point arithmetic error becomes predominant.
So the 1e9 threshold you hit is not about distinguishing the numbers in your variable x (float64 has more than enough precision for that) but about how those values are used when solving:
minimize F(x) = 0.5 * sum(rho(f_i(x)**2), i = 0, ..., m - 1)
subject to lb <= x <= ub
You can also check in its signature that the default tolerances span only about 8 decades (1e-8):
scipy.optimize.least_squares(fun, x0, jac='2-point', bounds=(- inf, inf),
method='trf', ftol=1e-08, xtol=1e-08, gtol=1e-08, x_scale=1.0,
loss='linear', f_scale=1.0, diff_step=None, tr_solver=None,
tr_options={}, jac_sparsity=None, max_nfev=None, verbose=0,
args=(), kwargs={})
This may let you tweak the algorithm to take extra steps before convergence is declared (if it converges at all), but it will not replace or beat the usefulness of normalization.
Methods comparison
What is interesting about the scipy.stats.linregress method is that scale tolerance is handled by design: it uses variable normalization, pure linear algebra, and a numerical-stability trick (see the TINY variable in its source) to solve the LS problem even in problematic conditions.
This of course contrasts with scipy.optimize.curve_fit, which is an NLLS solver built on an iterative damped least-squares scheme (see the Levenberg–Marquardt algorithm).
If you stick with linear least squares problems (linear in the parameters, not the variables, so a second-order polynomial is still LLS), then a dedicated LLS solver is probably the simpler option to choose, since routines like linregress handle the scaling for you.
If you just need to compute a linear fit, I believe curve_fit is not necessary and I would just use SciPy's linregress function instead:
>>> from scipy import stats
>>> y = list(range(5))
>>> x = [1e8 + a for a in range(5)]
>>> stats.linregress(x, y)
LinregressResult(slope=1.0, intercept=-100000000.0, rvalue=1.0, pvalue=1.2004217548761408e-30, stderr=0.0, intercept_stderr=0.0)
>>> x2 = [1e9 + a for a in range(5)]
>>> stats.linregress(x2, y)
LinregressResult(slope=1.0, intercept=-1000000000.0, rvalue=1.0, pvalue=1.2004217548761408e-30, stderr=0.0, intercept_stderr=0.0)
In general, if you need a polynomial fit I would use NumPy polyfit.
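For example, here is a brief sketch of a second-order polyfit on the same 1e9-scale data; centering x first (the same idea as the standardization above) keeps the Vandermonde matrix well conditioned:

import numpy as np

y = np.arange(5, dtype=float)
x = 1e9 + y

# Center x before building the Vandermonde matrix; fitting directly on
# 1e9-scale values would be poorly conditioned for a degree-2 fit.
xm = x.mean()
coeffs = np.polyfit(x - xm, y, 2)   # highest power first, ~[0., 1., 2.]
print(coeffs)
print(np.polyval(coeffs, x - xm))   # recovers y at the original points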
How can I optimize a function with fixed steps? I have developed a function that takes five thresholds as input, and I want to optimize them. I tried different solvers, but the steps they take are so tiny that the function never converges to a good solution.
The thresholds vary from 0 to 1, and I want them to change in steps of 0.01. For example, in the case of threshold_0, I want it to move from the initial guess of 0.6 to 0.61 or 0.59, etc., depending on the error result.
from scipy import optimize

initial_guess = [0.6, 0.3, 0.6, 0.5, 0.5]

def get_sobel3d_accuracy_from_thresholds(thresholds, array_dicts, ponderation_dict):
    ...
    return error

result = optimize.minimize(
    get_sobel3d_accuracy_from_thresholds,  # function to optimize
    initial_guess,
    args=(array_dicts, ponderation_dict),  # extra fixed args
    method='nelder-mead',
    options={'xatol': 1e-8, 'disp': True})
What I want to get is a solution that minimizes the error returned by the function get_sobel3d_accuracy_from_thresholds, for example:
optimized_thresholds = [0.61, 0.3, 0.81, 0.52, 0.44]
I would also like to fix bounds for the thresholds from 0 to 1, but I think that can be done only with some solvers, right?
bounds = [(0, 1) for n in range(0,5)]
Thank you all.
I want to fit a 4-parameter model (a, g, N and k) to data by minimizing a chi-square loss function with a Python implementation of the simplex algorithm (scipy.optimize.fmin).
Preliminary simulations suggest the following range for each parameter: a = [5, 50], g = [0.05, 1.5], N = [5, 200], and k = [0, 0.05].
It looks like scipy.optimize.fmin treats the parameters as if they were all in the same range (presumably [0, 1]). Should I rescale them? Below is my code:
import numpy as np
from scipy.optimize import fmin

# determine the starting point (x0) for each parameter
a = np.random.uniform(5, 50)
g = np.random.uniform(0.05, 1.5)
N = np.random.uniform(5, 200)
k = np.random.uniform(0, 0.05)

x0 = np.array([a, g, N, k])          # initial guess for SIMPLEX
xopt = fmin(chis, x0, maxiter=1000)  # call the simplex routine
Imagine that you want to minimize the following bivariate function:
def to_min1(p):
    x, y = p
    return abs(1e-15 - x) + abs(1e15 - y)
Even if this example is not realistic, it highlights the main point: fmin may not move in x at all (with x0 = 0), because that term of the objective is already very close to zero.
To give the terms of the objective equal weight within the optimization, write them as relative variations rather than absolute differences (with the arguments in the numerators to avoid a ZeroDivisionError):
def to_min2(p):
    x, y = p
    return abs(-1 + x/1e-15) + abs(-1 + y/1e15)
Note that this is an ftol concern: by doing so, one wants the iterative recomputation of the objective to be weighted equally over all arguments.
What follows does not exactly answer your question, but rather this one:
Does scipy.optimize.fmin (Simplex) deal with parameters associated with different magnitudes?
Apparently not, since
>>> fmin(to_min1, (0,0))
Optimization terminated successfully.
Current function value: 1000000000000000.000000
Iterations: 3
Function evaluations: 11
array([ 0., 0.])
while
>>> fmin(to_min2, (0,0))
Optimization terminated successfully.
Current function value: 1.000000
Iterations: 118
Function evaluations: 213
array([ 1.00000000e-15, 8.98437500e-05])
Of course the second optimization did not really terminate successfully (despite the message), and that could be addressed by increasing fmin's maxiter argument, etc., but the two cases are clearly not handled the same way.
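Following the same idea, here is a minimal sketch of rescaling the four parameters from your question onto the unit cube before handing them to fmin; chis below is only a stand-in placeholder, not your actual chi-square loss:

import numpy as np
from scipy.optimize import fmin

# Parameter ranges from the question: a, g, N, k
lo = np.array([5.0, 0.05, 5.0, 0.0])
hi = np.array([50.0, 1.5, 200.0, 0.05])

def chis(params):
    """Placeholder chi-square loss; substitute the real model and data here."""
    a, g, N, k = params
    return (a - 20.0)**2 + (g - 0.5)**2 + (N - 100.0)**2 + (k - 0.01)**2

def chis_scaled(u):
    # u lives in the unit cube [0, 1]^4; map it back to the natural ranges
    return chis(lo + u * (hi - lo))

u0 = np.random.uniform(0, 1, size=4)          # initial guess on the unit cube
u_opt = fmin(chis_scaled, u0, maxiter=1000)   # simplex now sees comparable scales
params_opt = lo + u_opt * (hi - lo)           # convert back to natural units
print(params_opt)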
I am trying to fit a polynomial to a set of data. Sometimes the covariance matrix returned by numpy.polyfit consists only of inf, although the fit seems to be useful. There are no numpy.inf or numpy.nan values in the data!
Example:
import numpy as np

# sample data; it does not really contain x**2-like behaviour,
# but that should be visible in the fit results
x = [-449., -454., -459., -464., -469.]
y = [0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]

fit, cov = np.polyfit(x, y, 2, cov=True)
print('fit: ', fit)
print('cov: ', cov)
Result:
fit: [ 1.67867158e-06 5.69199547e-04 8.85146009e-01]
cov: [[ inf inf inf]
[ inf inf inf]
[ inf inf inf]]
np.cov(x,y) gives
[[ 6.25000000e+01 -6.07388099e-02]
[ -6.07388099e-02 5.92268942e-05]]
So np.cov is not the same as the covariance returned by np.polyfit. Does anybody have an idea what's going on?
EDIT:
I now get that numpy.cov is not what I want. I need the variances of the polynomial coefficients, but I don't get them if (len(x) - order - 2.0) == 0. Is there another way to get the variances of the fitted polynomial coefficients?
As rustil's answer says, this is caused by the bias correction applied to the denominator of the covariance equation, which results in a zero divide for this input. The reasoning behind this correction is similar to that behind Bessel's Correction. This is really a sign that there are too few datapoints to estimate covariance in a well-defined way.
How to skirt this problem? Well, this version of polyfit accepts weights. You could add another datapoint but weight it at epsilon. This is equivalent to reducing the 2.0 in this formula to a 1.0.
import sys
import numpy as np

x = [-449., -454., -459., -464., -469.]
y = [0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]

# duplicate the last point, but give it a near-zero weight
x_extra = x + x[-1:]
y_extra = y + y[-1:]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, sys.float_info.epsilon]

fit, cov = np.polyfit(x, y, 2, cov=True)
fit_extra, cov_extra = np.polyfit(x_extra, y_extra, 2, w=weights, cov=True)

print(fit == fit_extra)
print(cov_extra)
The output. Note that the fit values are identical:
>>> print(fit == fit_extra)
[ True True True]
>>> print(cov_extra)
[[ 8.84481850e-11 8.11954338e-08 1.86299297e-05]
[ 8.11954338e-08 7.45405039e-05 1.71036963e-02]
[ 1.86299297e-05 1.71036963e-02 3.92469307e+00]]
I am very uncertain that this will be especially meaningful, but it's a way to work around the problem. It's a bit of a kludge though. For something more robust, you could modify polyfit to accept its own ddof parameter, perhaps in lieu of the boolean that cov currently accepts. (I just opened an issue to suggest as much.)
A quick final note about the calculation of cov: If you look at the wikipedia page on least squares regression, you'll see that the simplified formula for the covariance of the coefficients is inv(dot(dot(X, W), X)), which has a corresponding line in the numpy code -- at least roughly speaking. In this case, X is the Vandermonde matrix, and the weights have already been multiplied in. The numpy code also does some scaling (which I understand; it's part of a strategy to minimize numerical error) and multiplies the result by the norm of the residuals (which I don't understand; I can only guess that it's part of another version of the covariance formula).
The difference should be in the degrees of freedom. The polyfit method already takes into account that your degree is 2, thus causing:
RuntimeWarning: divide by zero encountered in true_divide
fac = resids / (len(x) - order - 2.0)
You can pass np.cov a ddof= keyword (ddof = delta degrees of freedom) and you'll run into the same problem.
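For instance, something along these lines reproduces the effect on the same five points (with 5 observations, ddof=5 zeroes the N - ddof denominator, just as len(x) - order - 2.0 hits zero in polyfit):

import numpy as np

x = [-449., -454., -459., -464., -469.]
y = [0.9677024, 0.97341953, 0.97724978, 0.98215678, 0.9876293]

# With 5 observations and ddof=5, the normalisation factor N - ddof is 0,
# so np.cov warns about the degrees of freedom and the same divide-by-zero
# blows the entries up to non-finite values.
print(np.cov(x, y, ddof=5))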
I'm using the algorithm 'COBYLA' in scipy's optimize.minimize function (v0.11, built for cygwin). I observed that the bounds parameter seems not to be used in this case. For instance, this simple example:
from scipy.optimize import minimize

def f(x):
    return -sum(x)

minimize(f, x0=1, method='COBYLA', bounds=(-2, 2))
returns:
status: 2.0
nfev: 1000
maxcv: 0.0
success: False
fun: -1000.0
x: array(1000.0)
message: 'Maximum number of function evaluations has been exceeded.'
instead of the expected 2 for x.
Has anyone seen the same problem? Is there a known bug or documentation error? In the SciPy 0.11 documentation, this option is not excluded for the COBYLA algorithm. In fact, the function fmin_cobyla doesn't have a bounds parameter.
Thanks for any hint.
You can formulate the bounds in the form of constraints
import scipy.optimize

# function to minimize
def f(x):
    return -sum(x)

# initial values
initial_point = [1., 1., 1.]

# lower and upper bound for each variable
bounds = [[-2, 2], [-1, 1], [-3, 3]]

# construct the bounds in the form of constraints
cons = []
for factor in range(len(bounds)):
    lower, upper = bounds[factor]
    l = {'type': 'ineq',
         'fun': lambda x, lb=lower, i=factor: x[i] - lb}
    u = {'type': 'ineq',
         'fun': lambda x, ub=upper, i=factor: ub - x[i]}
    cons.append(l)
    cons.append(u)
# similarly, additional constraints can be added

# run the optimization
res = scipy.optimize.minimize(f, initial_point, constraints=cons, method='COBYLA')

# print the result
print(res)
Note that minimize passes the design variables to the objective function. In this case 3 input variables are given, each with lower and upper bounds. The result is:
fun: -6.0
maxcv: -0.0
message: 'Optimization terminated successfully.'
nfev: 21
status: 1
success: True
x: array([ 2., 1., 3.])
The original COBYLA(2) FORTRAN algorithm does not support variable bounds explicitly; you have to formulate the bounds in the context of the general constraints.
Looking at the current source code for the SciPy minimize interface here, it is apparent that no measures have yet been taken in SciPy to handle this limitation.
Thus, in order to apply bounds for the COBYLA algorithm in the SciPy minimize function, you will need to formulate the variable bounds as inequality constraints and include them in the associated constraints parameter.
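For the single-variable example from the question, a minimal sketch of that reformulation could look like this (the (-2, 2) bound written as two inequality constraints):

from scipy.optimize import minimize

def f(x):
    return -sum(x)

# The (-2, 2) bound from the question, expressed as two 'ineq' constraints
cons = [
    {'type': 'ineq', 'fun': lambda x: x[0] + 2.0},   # x >= -2
    {'type': 'ineq', 'fun': lambda x: 2.0 - x[0]},   # x <=  2
]

res = minimize(f, x0=[1.0], method='COBYLA', constraints=cons)
print(res.x)   # converges to the upper bound, close to 2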
(source code excerpt)
# bounds set to anything else than None yields warning
if meth is 'cobyla' and bounds is not None:
    warn('Method %s cannot handle bounds.' % method,
         RuntimeWarning)
...
# No bounds argument in the internal call to the COBYLA function
elif meth == 'cobyla':
    return _minimize_cobyla(fun, x0, args, constraints, **options)