scipy.optimize.root lets you find a root of a vector function, while scipy.optimize.root_scalar lets you find a root of a scalar function. What I need to solve is somewhere in between. I have a bunch of functions f_i, each depending only on index i and its own variable x_i, and I want to solve f_1(x_1)=0, f_2(x_2)=0, ..., f_n(x_n)=0. But instead of solving them in a for loop, I want to solve them in a vectorized style. The reason is that querying the values of f_1, ..., f_n one at a time in a loop is expensive, while querying them in a batch (f_1, ..., f_n) is relatively cheap.
Let f=(f_1,...,f_n) and x=(x_1,...,x_n). We want to solve f(x)=(f_1(x_1),f_2(x_2),...,f_n(x_n))=0. Directly calling scipy.optimize.root is not ideal, since the solver has no idea that each dimension is independent.
A toy example:
from scipy import optimize
import numpy as np
coef = np.arange(10)
def f(x):
    return x ** 2 + 2 * coef * x + coef ** 2
optimize.root(f, np.zeros(10))
How can we let the solver know that each dimension is independent, to speed it up?
The above is just a toy example to illustrate my problem. In the real case, the function f is a black box and there is no analytical derivative for the components f_1, f_2, ..., f_n, so I can't simply pass a diagonal Jacobian to the solver. I tried to find out whether the solver can be told that the Jacobian matrix should be diagonal, but I had no luck along that path. Any suggestions?
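One possibility worth noting (my suggestion, not part of the original question): scipy.optimize.newton accepts an array-valued x0 and, when given a vectorized function and no derivative, runs an elementwise secant iteration. That is exactly the structure here: n independent scalar root problems with batched function evaluations. A minimal sketch on the toy example:

from scipy import optimize
import numpy as np

coef = np.arange(10)

def f(x):
    # one batched call returns all f_i(x_i) at once
    return x ** 2 + 2 * coef * x + coef ** 2

# array-valued x0 triggers the vectorized, per-component secant method;
# maxiter is raised because the toy's double roots converge slowly
roots = optimize.newton(f, np.zeros(10), maxiter=200)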
I want to solve the following (convex) minimization problem:
min ||x||_1 subject to sgn(A[x,R]) = y and ||x||_2 = 1
where A is an m x (N+1) matrix, x in R^N a vector, and [x,R] the vector created by appending a given number R to x. The objective is to find the optimal value for x.
A is a Fourier matrix and there are fast matrix-vector, inversion, etc. algorithms available. Since this matrix is really big, I need to use an optimization algorithm that utilizes this.
Currently, I use the following implementation in cvxpy, which is way too slow:
import cvxpy as cvx

# rewrite the problem in the form x = x^+ - x^- with both parts nonnegative
n = A.shape[1] - 1
vx = cvx.Variable(2 * n)
objective = cvx.Minimize(cvx.pnorm(vx, 1))  # min ||x||_1
constraints = [vx >= 0,
               cvx.multiply(A[:, :n] @ vx[:n] - A[:, :n] @ vx[n:] + A[:, n] * R, y) >= 0,
               cvx.norm(vx, 2) <= R]  # sgn(A[x,R]) = y, ||x||_2 <= R
prob = cvx.Problem(objective, constraints)
prob.solve()
solution = vx.value[:n] - vx.value[n:]
Is there a way to use fast matrix computations in cvxpy? Or is there a better library? I found a few implementations that can do this for one special algorithm but not in the general case, so I was not able to implement my problem.
No. The solver will not call your matrix multiplication code. Solvers do their own linear algebra, which is very different in many ways. In a sense, your matrix multiplication is just notation for the problem statement.
Regarding performance, it depends heavily on where the bottleneck is: in generating the model (in cvxpy itself) or in the solver? What solver are you using? Consider trying a different solver. Obviously, we don't have enough information (and no reproducible example) to say more.
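As a hedged illustration of that diagnosis step (my addition; it assumes cvxpy's Problem.solver_stats.solve_time attribute, which recent cvxpy versions expose), one can compare the end-to-end time against the time spent inside the solver; the difference is roughly the model-generation cost:

import time
import cvxpy as cvx

# ... build objective and constraints as above ...
prob = cvx.Problem(objective, constraints)

t0 = time.perf_counter()
prob.solve(verbose=True)  # verbose output also shows compilation progress
t_total = time.perf_counter() - t0

print("solver time:", prob.solver_stats.solve_time)
print("total time: ", t_total)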
What method should I use?
a and b are n-dimensional vectors (numpy arrays) and X is an (n x n) matrix. I'm using numpy for this.
I have a matrix-vector equation
X^T X a = X^T b
where X, X^T, and b are known, and the unknown is a.
I have tried computing z = X.T @ X, inverting it to get z^{-1}, then forming g = z^{-1} @ X.T and calling np.linalg.solve(g, b). Is there some basic linear algebra I'm doing wrong here?
Is there specific Python code for these types of equations?
"Is there specific Python code for these types of equations?"
Yes. The problem that you are solving is ordinary least squares (see also linear least squares).
NumPy has the function numpy.linalg.lstsq for solving such problems. In your case, to compute a, given X and b, you would use
a, residuals, rank, singvals = np.linalg.lstsq(X, b, rcond=None)
residuals, rank and singvals are additional information returned by lstsq, as explained in the docstring.
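A small self-contained sketch (my own example data, not from the question) contrasting lstsq with the normal-equations approach the question attempted; both recover the same a, but lstsq is the more numerically stable route:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
b = rng.normal(size=50)

# least-squares solution of X a ~= b
a, residuals, rank, singvals = np.linalg.lstsq(X, b, rcond=None)

# the normal equations X^T X a = X^T b give the same answer:
# note solve(X.T @ X, X.T @ b), not solve(z^{-1} @ X.T, b)
a_normal = np.linalg.solve(X.T @ X, X.T @ b)
assert np.allclose(a, a_normal)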
I have an objective function from a paper that I would like to minimize with gradient descent. I have not yet had to do this "from scratch" and would like some advice as to how to code it up manually. The objective function is:
T(L) = tr(X.T L^s X) - beta * ||L||.
where L is an N x N positive semidefinite matrix to be estimated, X is an N x M matrix, beta is a regularization constant, X.T = X transpose, and ||.|| is the Frobenius norm.
Also, L^s is the matrix power computed via the eigendecomposition, L^s = F Λ^s F.T, where F is the matrix of eigenvectors of L and Λ is the diagonal matrix of its eigenvalues.
The derivative of the objective function is:
dT/dL = sum_{r=0}^{s-1} L^r (X X.T) L^(s-r-1) - 2 * beta * L
I have done very rudimentary gradient descent problems (such as matrix factorization) where optimization is done over every element of the matrix, or by using packages/libraries. This kind of problem is more complex than I am used to, and I was hoping that some of you who are much more experienced with this sort of thing could help me out.
Any general advice is much appreciated as well as specific recommendations of how to code this up in python or R.
Here is the link for the paper with this function:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128136#sec016
Thank you very much for your help!
Paul
In general, it would probably be advisable to use a machine learning library such as TensorFlow or PyTorch. If you go down this route you get several advantages: 1) efficient C++ implementations of the tensor operations, 2) automatic differentiation, and 3) easy access to more sophisticated optimizers (e.g. Adam).
If you prefer to do the gradient computation yourself, you can do that by setting the gradient L.grad manually before the optimization step. A simple implementation would look like this:
import torch

n = 10      # size of L (N x N)
m = 20      # number of columns of X
s = 3       # matrix power
b = 1e-3    # regularization constant beta
n_it = 40   # number of gradient steps

# parameterize L by its eigendecomposition, L = F diag(D) F.T,
# so that L^s = F diag(D**s) F.T is cheap to form
F = torch.nn.Parameter(torch.rand(n, n))
D = torch.nn.Parameter(torch.rand(n))
X = torch.rand((n, m))

opt = torch.optim.SGD([F, D], lr=1e-4)
for i in range(n_it):
    Ls = F.matmul((D ** s).unsqueeze(1) * F.T)  # L^s
    loss = (X.T.matmul(Ls).matmul(X)).trace() - b * Ls.norm(2)
    print(loss)
    opt.zero_grad()
    loss.backward()
    opt.step()
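If you would rather avoid autograd entirely, here is a minimal sketch of plain gradient descent that uses the paper's derivative formula directly; the eigenvalue-clipping step that keeps L positive semidefinite is my own assumption, not something the question specifies:

import torch

n, m, s, beta, lr, n_it = 10, 20, 3, 1e-3, 1e-4, 40
X = torch.rand(n, m)
L = torch.eye(n)  # any PSD starting point

for _ in range(n_it):
    XXt = X @ X.T
    # dT/dL = sum_{r=0}^{s-1} L^r (X X.T) L^(s-r-1) - 2 * beta * L
    grad = sum(torch.linalg.matrix_power(L, r) @ XXt
               @ torch.linalg.matrix_power(L, s - r - 1)
               for r in range(s)) - 2 * beta * L
    L = L - lr * grad
    # project back onto the PSD cone by clipping negative eigenvalues
    evals, evecs = torch.linalg.eigh(L)
    L = evecs @ torch.diag(evals.clamp(min=0.0)) @ evecs.T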
I am trying to use the "brute" method to minimize a function of 20 variables. It is failing with a mysterious error. Here is the complete code:
import random
import numpy as np
import lmfit
def progress_update(params, iter, resid, *args, **kws):
    pass
    # print(resid)

def score(params, data=None):
    parvals = params.valuesdict()
    M = data
    X_params = []
    Y_params = []
    for i in range(M.shape[0]):
        X_params.append(parvals['x' + str(i)])
    for j in range(M.shape[1]):
        Y_params.append(parvals['y' + str(j)])
    return diff(M, X_params, Y_params)

def diff(M, X_params, Y_params):
    total = 0
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            total += abs(M[i, j] - (X_params[i] - Y_params[j]) ** 2)
    return total

dim = 10
random.seed(0)
M = np.empty((dim, dim))
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        M[i, j] = i * random.random() + j ** 2

params = lmfit.Parameters()
for i in range(M.shape[0]):
    params.add('x' + str(i), value=random.random() * 10, min=0, max=10)
for j in range(M.shape[1]):
    params.add('y' + str(j), value=random.random() * 10, min=0, max=10)

result = lmfit.minimize(score, params, method='brute', kws={'data': M}, iter_cb=progress_update)
However, this fails with:
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
What is causing this problem?
"What is causing this problem"
Math
You can't brute-force a high-dimensional problem, because brute-force methods require exponential work (time, and memory if implemented naively).
More directly, lmfit uses numpy (*) under the hood, which has a maximum size for the data it can allocate. Your initial data structure isn't too big (10x10); it's the combinatorial grid required for the brute-force search that causes the problem.
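To see the scale, here is a back-of-the-envelope calculation (my numbers, assuming the default of Ns = 20 grid points per parameter used by scipy.optimize.brute, which lmfit delegates to):

Ns, nparams = 20, 20
points = Ns ** nparams           # about 1.05e26 grid points
print(points)
print(points * 8 / 1e18, "EB")   # about 8.4e8 exabytes at 8 bytes per float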
If you're willing to hack the implementation, you could switch to a sparse memory structure, but that doesn't solve the math problem.
On High Dimensional Optimization
Try a different minimizer, but be warned: it's very difficult to minimize globally in high-dimensional spaces. "Local minima" methods like fixed-point iteration or gradient descent might be more productive.
I hate to be pessimistic, but high-dimensional optimization is very hard in general, and I'm afraid it is beyond the scope of an SO question. Here is a survey.
Practical Alternatives
Gradient descent is supported a little in sklearn, but more for machine learning than for general optimization; scipy has pretty good optimization coverage and great documentation. I'd start there. It's possible to do gradient descent there too, but it isn't necessary.
From scipy's docs on unconstrained minimization, you have many options:
Method Nelder-Mead uses the Simplex algorithm. This algorithm is robust in many applications. However, if numerical computation of derivative can be trusted, other algorithms using the first and/or second derivatives information might be preferred for their better performance in general.
Method Powell is a modification of Powell's method, which is a conjugate direction method. It performs sequential one-dimensional minimizations along each vector of the directions set (direc field in options and info), which is updated at each iteration of the main minimization loop. The function need not be differentiable, and no derivatives are taken.
and many more derivative-based methods are available. (In general, you do better when you have derivative information available.)
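For concreteness, here is a hedged sketch (my code, not the answerer's) of attacking the question's 20-parameter objective with scipy.optimize.minimize instead of a brute-force grid; M is the matrix built in the question:

import numpy as np
from scipy import optimize

def score_flat(p, M):
    # same objective as the question's diff(), vectorized:
    # sum over i, j of |M[i, j] - (x_i - y_j)^2|
    x, y = p[:M.shape[0]], p[M.shape[0]:]
    return np.abs(M - (x[:, None] - y[None, :]) ** 2).sum()

p0 = np.random.rand(20) * 10  # random start in [0, 10)
res = optimize.minimize(score_flat, p0, args=(M,), method='Nelder-Mead')
print(res.fun, res.x)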
Footnotes/Looking at the Source Code
(*) the actual error is thrown here, based on your numpy implementation. Quoted:
if (npy_mul_with_overflow_intp(&nbytes, nbytes, dim)) {
    PyErr_SetString(PyExc_ValueError,
                    "array is too big; `arr.size * arr.dtype.itemsize` "
                    "is larger than the maximum possible size.");
    Py_DECREF(descr);
    return NULL;
}
I have been doing some Monte Carlo physics simulations with Python and I am unable to determine the standard error for the coefficients of a non-linear least-squares fit.
Initially, I was using SciPy's scipy.stats.linregress for my model since I thought it would be linear, but noticed it is actually some sort of power function. I then used NumPy's polyfit with degree 2, but I can't find any way to determine the standard error of the coefficients.
I know gnuplot can determine the errors for me, but I need to do fits for over 30 different cases. I was wondering if anyone knows of a way for Python to read the standard error from gnuplot, or is there some other library I can use?
Finally found the answer to this long-asked question! I'm hoping this can save someone a few hours of hopeless research on this topic. SciPy has a function called curve_fit in its optimize module. It uses the least-squares method to determine the coefficients and, best of all, it gives you the covariance matrix. The diagonal of that matrix holds the variance of each coefficient, and taking the square root of those values gives the standard error of each coefficient! SciPy doesn't have much documentation for this, so here's sample code for a better understanding:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plot

def func(x, a, b, c):
    return a * x**2 + b * x + c  # Refer [1]

x = np.linspace(0, 4, 50)
y = func(x, 2.6, 2, 3) + 4 * np.random.normal(size=len(x))  # Refer [2]

coeff, var_matrix = curve_fit(func, x, y)
variance = np.diagonal(var_matrix)  # Refer [3]
SE = np.sqrt(variance)  # Refer [4]

# ====== Making a dictionary to print results ========
results = {'a': [coeff[0], SE[0]], 'b': [coeff[1], SE[1]], 'c': [coeff[2], SE[2]]}
print("Coeff\tValue\t\tError")
for v, c in results.items():
    print(v, "\t", c[0], "\t", c[1])
# ====== End Results Printing ========

y2 = func(x, coeff[0], coeff[1], coeff[2])  # y values for the fitted model
plot.plot(x, y)
plot.plot(x, y2)
plot.show()
[1] What this function returns is critical because it defines the model that will be fitted.
[2] Using the function to create some arbitrary data plus some noise.
[3] Saves the covariance matrix's diagonal to a 1D array.
[4] Square-rooting the variance gives the standard error (SE).
It looks like gnuplot uses Levenberg-Marquardt, and there's a Python implementation available; you can get the error estimates from the mpfit.covar attribute. (Incidentally, you should worry about what the error estimates "mean": are other parameters allowed to adjust to compensate, for example?)