I am trying to use scipy.optimize.newton_krylov() to solve a least-squares optimization problem, i.e. finding x such that (Ax - b)**2 = 0. My understanding is that A has to be mxn with m>n, b has to be mx1, and x will be nx1. When I try to run the optimization, I get an error:
ValueError: expected square matrix, but got shape=(40, 6)
Presumably this error concerns the computation of the Jacobian and not my input matrix A? But if so, how can I change the values I am providing to the functions to resolve this problem? Any advice would be appreciated.
The following code reproduces the error:
import numpy as np
from scipy.optimize import newton_krylov
A = np.random.uniform(0, 1, (40,6))
b = np.arange(40)
x0 = np.ones(6)
def F(x):
    return (A.dot(x) - b)**2
x = newton_krylov(F, x0)
As the docstring of newton_krylov explains, it finds a root of a function F(x). The function F must accept a one-dimensional array, and return a one-dimensional array of the same size as the input. If, for example, x has length 3, F(x) must return an array with length 3. In that case, newton_krylov attempts to solve F(x) = [0, 0, 0].
The error that you got is the result of newton_krylov attempting to use the numerically computed Jacobian matrix of F with a function that expects the matrix to be square. Your function F has a Jacobian matrix with shape (40, 6), because the input has length 6 and the output has length 40.
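To make the mismatch concrete (a small check of my own, not part of the original post), you can simply inspect the shapes with the data from the question:
import numpy as np
A = np.random.uniform(0, 1, (40, 6))
b = np.arange(40)
def F(x):
    return (A.dot(x) - b)**2
print(F(np.ones(6)).shape)  # (40,): a length-6 input gives a length-40 output, so the Jacobian of F is 40 x 6, not square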
By itself, newton_krylov is not the right function to use for solving a least-squares problem. A least-squares problem is a minimization problem, not a root-finding problem. (A solver such as newton_krylov might be used to implement a minimization algorithm, but I assume you are interested in using an existing solution rather than writing your own.)
You say you want to solve a least-squares problem, but then you say "i.e. finding x such that (Ax - b)**2 = 0." I assume that was just a bit of sloppiness in your description, because that is not the least-squares problem. The least-squares problem is to find x such that sum((Ax - b)**2) is minimized. (In general, there won't be an x that makes the sum of squares equal to zero.)
So, assuming you really want to find x such that sum((Ax - b)**2) is minimized, you can use scipy.linalg.lstsq.
For example:
In [54]: from scipy.linalg import lstsq
In [55]: A = np.random.uniform(0, 1, (40,6))
In [56]: b = np.arange(40)
In [57]: x, res, rank, s = lstsq(A, b)
In [58]: x
Out[58]:
array([ 5.07513787, 1.83858547, 18.07818853, 9.28805475,
6.13019155, -0.7045539 ])
The Krylov method requires the Jacobian of the first argument (your function F(x)) to be a square matrix.
This seems like a homework question, but the answer will involve adjusting the matrix A to make it square. Examples: https://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/optimize.html#kk
scipy.optimize.root lets you find a root of a vector function, while scipy.optimize.root_scalar lets you find a root of a scalar function. What I need to solve is somewhat in between. I have a bunch of complex functions f_i, each depending on the index i and on x_i, and I want to solve f_1(x_1)=0, f_2(x_2)=0, ..., f_n(x_n)=0. But instead of solving them in a for loop, I want to solve them in a vectorized style. The reason is that querying the values of f_1, ..., f_n one at a time in a for loop is expensive, whereas querying them in a batch (f_1, ..., f_n) is relatively cheap.
Let f = (f_1, ..., f_n) and x = (x_1, ..., x_n). We want to solve f(x) = (f_1(x_1), f_2(x_2), ..., f_n(x_n)) = 0. Directly calling scipy.optimize.root is not ideal, since the solver has no idea that each dimension is independent.
A toy example:
from scipy import optimize
import numpy as np
coef = np.arange(10)
def f(x):
    return x ** 2 + 2 * coef * x + coef ** 2
optimize.root(f, np.zeros(10))
How can we let the solver know each dimension is independent to speed it up?
The above is just a toy example to illustrate my problem. In the real case, the function f is like a black box and there is no analytical derivative for each component f_1, f_2, ..., f_n, so I can't just supply a diagonal Jacobian to the solver. I tried to find a way to tell the solver that the Jacobian matrix should be diagonal, but I had no luck down this path. Any suggestions?
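One way to exploit this structure (a minimal sketch of my own, not from the original post; the helper name vectorized_newton is made up) is a hand-rolled, fully vectorized Newton iteration: each step makes only two batched calls to f and estimates the per-component derivative with a finite difference, which is valid precisely because each f_i depends only on x_i:
import numpy as np
coef = np.arange(10)
def f(x):
    return x ** 2 + 2 * coef * x + coef ** 2
def vectorized_newton(f, x0, eps=1e-8, tol=1e-12, maxiter=100):
    # component-wise Newton: the Jacobian is diagonal, so no linear solve is needed
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(maxiter):
        fx = f(x)                      # one batched evaluation
        if np.max(np.abs(fx)) < tol:
            break
        dfx = (f(x + eps) - fx) / eps  # diagonal of the Jacobian via finite differences
        x = x - fx / dfx
    return x
print(vectorized_newton(f, np.zeros(10)))  # roots are approximately -coef
Alternatively, scipy.optimize.root offers quasi-Newton methods such as method='diagbroyden' that maintain only a diagonal Jacobian approximation, which may also be worth trying.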
I'm hoping to just get some assistance conceptually about how to solve a linear system of equations with penalty functions. Example code is at the bottom.
Let's say I'm trying to do a fit of this equation:
Ie=IaX+IbY+IcZ
where Ie, Ia, Ib, and Ic are constants, and X,Y,Z are variables
I could easily solve this system of equations using scipy.optimize.least_squares, but I want to constrain the system with 2 constraints:
1. X+Y+Z=1
2. X,Y,Z > 0 and X,Y,Z < 1
To do this, I then modified the above function.
Using X+Y+Z=1, I solved for Z and substituted, which gives
Ie - Ic = X(Ia - Ic) + Y(Ib - Ic)
therefore
Ax = B, where A = [Ia-Ic, Ib-Ic] and B = [Ie-Ic], with bounds (0, 1)
The bounds handle the 2nd criterion for X and Y, but nothing prevents X+Y from exceeding 1, in which case Z = 1 - X - Y falls outside (0, 1).
To resolve this, an additional constraint X + Y < 1 is needed, and this I don't quite know how to do.
So I presume least_squares has some built-in penalty function for its bounds, i.e.
chi^2 = ||Ax - B||^2 + P
where P is a penalty: if X, Y, or Z falls outside the bounds, then P = 10000,
thus giving high chi-squared values and thereby enforcing the constraints.
I don't know how I can add a further condition of that kind, so that if X+Y > 1 then P = 10000, or something similar along those lines.
In short, least_squares enables you to set bounds on the individual values, but I'd like to set some further constraints, and I don't quite know how to do this with least_squares. I've seen additional inequality constraint options in scipy.minimize, but I don't quite know how to apply that to a linear system of equations with the format Ax=B.
So as an example, let's say I've already done the calculations and obtained my A matrix of constants and my B vector. I use least squares, get my values for X and Y, and can calculate Z since X+Y+Z=1. The issue here is that in my minimization I did not set a constraint that X+Y<1, so in some cases you can actually get values where X+Y>1. So I'd like to find a method where I can set that additional constraint, in addition to the bounds constraint on the individual variables:
import numpy as np
from scipy.optimize import lsq_linear

Ax = np.array([[1, 2], [2, 4], [3, 4]])
B = np.array([0, 1, 2])
solution = lsq_linear(Ax, B, lsq_solver='lsmr', bounds=(0, 1))
X = solution.x[0]
Y = solution.x[1]
Z = 1 - sum(solution.x)
If minimize is the solution here, can you please show me how to set it up given the above matrix of A and array of B?
Any advice, tips, or help to point me in the right direction is greatly appreciated!
Edit:
So I found something similar on here: Minimizing Least Squares with Algebraic Constraints and Bounds
So I thought I'd apply it to my case, but I don't think I've been able to apply it properly.
import numpy as np
from scipy.optimize import minimize, lsq_linear

Ax = np.array([[1, 2], [2, 4], [3, 4]])
B = np.array([0, 1, 2])

def fun(x, a1, a2, y):
    fun_output = x[0]*a1 + x[1]*a2
    return np.sum((fun_output - y)**2)

cons = [{"type": "eq", "fun": lambda x: x[0] + x[1] - 1}]
bnds = [(0, 1), (0, 1)]
xinit = np.array([1, 1])
solution = minimize(fun, args=(Ax[:, 0], Ax[:, 1], B), x0=xinit, bounds=bnds, constraints=cons)
solution_2 = lsq_linear(Ax, B, bounds=(0, 1))
print(solution.x)
print(solution_2.x)
The issue is that the output of this differs from lsq_linear, and I almost always get a value for Z very close to zero regardless of what the input arrays are. I don't know if I'm setting this up / understanding this correctly.
Your initial guess xinit is not feasible and doesn't satisfy your constraint.
IMO, solving the initial problem directly as a constrained nonlinear optimization problem (NLP) instead of rewriting it is the easier approach. Assuming you have all the data points Ia, Ib, Ic and Ie (you didn't provide all of them), you can use the following code snippet which is in the same vein as the linked answer of mine in your question.
from scipy.optimize import minimize
import numpy as np
def fun_to_fit(coeffs, *args):
    x, y, z = coeffs
    Ia, Ib, Ic = args
    return Ia*x + Ib*y + Ic*z

def objective(coeffs, *args):
    Ia, Ib, Ic, Ie = args
    residual = Ie - fun_to_fit(coeffs, Ia, Ib, Ic)
    return np.sum(residual**2)
# Constraint: x + y + z == 1
con = [{'type': 'eq', 'fun': lambda coeffs: np.sum(coeffs) - 1}]
# bounds
bounds = [(0, 1), (0, 1), (0, 1)]
# initial guess (fulfils the constraint and lies within the bounds)
x0 = np.array([0.25, 0.5, 0.25])
# your given data points
#Ia = np.array(...)
#Ib = np.array(...)
#Ic = np.array(...)
#Ie = np.array(...)
# solve the NLP
res = minimize(lambda coeffs: objective(coeffs, Ia, Ib, Ic, Ie), x0=x0, bounds=bounds, constraints=con)
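For a quick sanity check (a hedged example with made-up synthetic data, since the actual Ia, Ib, Ic and Ie were not provided; it reuses objective, x0, bounds and con from the snippet above):
rng = np.random.default_rng(0)
Ia, Ib, Ic = rng.random(50), rng.random(50), rng.random(50)
Ie = 0.2*Ia + 0.3*Ib + 0.5*Ic + 0.01*rng.normal(size=50)  # "true" coefficients 0.2, 0.3, 0.5
res = minimize(lambda coeffs: objective(coeffs, Ia, Ib, Ic, Ie),
               x0=x0, bounds=bounds, constraints=con)
print(res.x)        # should come out close to [0.2, 0.3, 0.5]
print(res.x.sum())  # ~1.0, so the equality constraint is satisfied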
What method should I use?
a and b are vectors (n-dimensional arrays), and X is (n x n). I'm using NumPy for this.
I have a matrix-vector equation
X^T X a = X^T b
where X (and hence X^T) and b are known, and the unknown is a.
I have tried computing X^T X as X.T @ X = z, taking z^-1, then z^-1 @ X.T = g, and calling np.linalg.solve(g, b). Is there some basic linear algebra I'm doing wrong here?
Is there a specific python code for these types of equations?
"Is there a specific python code for these types of equations?"
Yes. The problem that you are solving is ordinary least squares (see also linear least squares).
NumPy has the function numpy.linalg.lstsq for solving such problems. In your case, to compute the vector a given X and b, you would use
a, residuals, rank, singvals = np.linalg.lstsq(X, b)
residuals, rank and singvals are additional information returned by lstsq, as explained in the docstring.
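As a quick self-contained illustration (synthetic data of my own, not from the original question), here lstsq is applied to a random square X and compared with forming and solving the normal equations directly:
import numpy as np
rng = np.random.default_rng(0)
X = rng.random((4, 4))  # square X, as in the question; lstsq also handles non-square X
b = rng.random(4)
# least-squares solution of X a = b, which also solves X^T X a = X^T b
a, residuals, rank, singvals = np.linalg.lstsq(X, b, rcond=None)
# for comparison: forming and solving the normal equations explicitly (less numerically stable)
a_normal = np.linalg.solve(X.T @ X, X.T @ b)
print(np.allclose(a, a_normal))  # True in this example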
The scipy.linalg.eigh function can take two matrices as arguments: first the matrix a, whose eigenvalues and eigenvectors we want, and also the matrix b, which is optional and defaults to the identity matrix if it is omitted.
In what scenario would someone like to use this b matrix?
Some more context: I am trying to use xdawn covariances from the pyRiemann package. This uses the scipy.linalg.eigh function with a covariance matrix a and a baseline covariance matrix b. You can find the implementation here. This yields an error, as the b matrix in my case is not positive definite and thus not usable in the scipy.linalg.eigh function. Removing this matrix and just using the identity matrix, however, solves this problem and yields relatively nice results... The problem is that I do not really understand what I changed, and maybe I am doing something I should not be doing.
This is the code from the pyRiemann package I am using (modified to avoid using functions defined in other parts of the package):
# X are samples (EEG data), y are labels
# shape of X is (1000, 64, 2459)
# shape of y is (1000,)
import numpy as np
import sklearn.covariance
from scipy.linalg import eigh

Ne, Ns, Nt = X.shape
tmp = X.transpose((1, 2, 0))
b = np.matrix(sklearn.covariance.empirical_covariance(tmp.reshape(Ne, Ns * Nt).T))

for c in self.classes_:
    # Prototyped response for each class
    P = np.mean(X[y == c, :, :], axis=0)
    # Covariance matrix of the prototyped response & signal
    a = np.matrix(sklearn.covariance.empirical_covariance(P.T))
    # Spatial filters
    evals, evecs = eigh(a, b)
    # and I am now using the following, disregarding the b matrix:
    # evals, evecs = eigh(a)
If A and B are both symmetric matrices, that does not necessarily imply that inv(A)*B is a symmetric matrix. So, if I had to solve a generalised eigenvalue problem Ax = lambda Bx, I would use eig(A, B) rather than eig(inv(A)*B), so that the symmetry isn't lost.
One practical application is finding the natural frequencies of a dynamic mechanical system from differential equations of the form M (d²x/dt²) = -Kx, where M is a positive definite matrix known as the mass matrix, K is the stiffness matrix, x is the displacement vector, and d²x/dt² is the acceleration vector, the second derivative of the displacement. To find the natural frequencies, x can be substituted with x0 sin(ωt), where ω is a natural frequency, and the equation reduces to Kx = ω²Mx. One could now use eig(inv(K)*M), but that might break the symmetry of the resulting matrix, so I would use eig(K, M) instead.
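As a small hedged sketch of that use case (made-up 2-degree-of-freedom matrices, not from the original answer), scipy.linalg.eigh(K, M) solves K x = ω² M x while preserving the symmetry:
import numpy as np
from scipy.linalg import eigh
M = np.diag([2.0, 1.0])       # positive definite mass matrix (illustrative values)
K = np.array([[ 3.0, -1.0],
              [-1.0,  1.0]])  # symmetric stiffness matrix (illustrative values)
w2, modes = eigh(K, M)        # generalized symmetric eigenproblem K x = w^2 M x
print(np.sqrt(w2))            # the natural frequencies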
When you solve (A - lambda B) x = 0, it means that x is not expressed in the same basis as the covariance matrix.
If the matrix B is not positive definite, it means that there are vectors whose sign can be flipped by B.
I hope it was helpful.
I'm trying to solve an overdetermined system with QR decomposition and linalg.solve but the error I get is
LinAlgError: Last 2 dimensions of the array must be square.
This happens when the R array is not square, right? The code looks like this
import numpy as np
import math as ma
A = np.random.rand(2,3)
b = np.random.rand(2,1)
Q, R = np.linalg.qr(A)
Qb = np.matmul(Q.T,b)
x_qr = np.linalg.solve(R,Qb)
Is there a way to write this in a more efficient way for arbitrary A dimensions? If not, how do I make this code snippet work?
The reason is indeed that the matrix R is not square, probably because the system is overdetermined. You can try np.linalg.lstsq instead, finding the solution which minimizes the squared error (which should yield the exact solution if it exists).
import numpy as np
A = np.random.rand(2, 3)
b = np.random.rand(2, 1)
x_qr = np.linalg.lstsq(A, b, rcond=None)[0]
You need QR in reduced (economic) mode, mode='reduced'. With mode='complete', the Q and R matrices are returned as M x M and M x N, so if M is greater than N your matrix R will be nonsquare. With reduced mode (the default for np.linalg.qr in recent versions), the matrices are M x N and N x N, in which case the solve routine will work fine.
However, you also have equations/unknowns backwards for an overdetermined system. Your code snippet should be
import numpy as np
A = np.random.rand(3,2)
b = np.random.rand(3,1)
Q, R = np.linalg.qr(A, mode='reduced')
#print(Q.shape, R.shape)
Qb = np.matmul(Q.T,b)
x_qr = np.linalg.solve(R,Qb)
As noted by other contributors, you could also call lstsq directly, but sometimes it is more convenient to have Q and R directly (e.g. if you are also planning on computing the projection matrix).
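As a small aside (my own sketch, not part of the original answer, reusing Q, A, b and x_qr from the snippet above), with the reduced Q in hand the orthogonal projector onto the column space of A is simply Q @ Q.T:
P = Q @ Q.T                              # projection matrix onto the column space of A
b_fitted = P @ b                         # the least-squares fit of b
print(np.allclose(b_fitted, A @ x_qr))   # True: both give the projection of b onto col(A)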
As shown in the documentation of numpy.linalg.solve:
Computes the “exact” solution, x, of the well-determined, i.e., full rank, linear matrix equation ax = b.
Your system of equations is underdetermined, not overdetermined. Notice that you have 3 variables and only 2 equations, thus fewer equations than unknowns.
Also notice that it mentions that in numpy.linalg.solve(a, b), a must be an M x M matrix. The reason behind this is that solving the system of equations Ax = b involves computing the inverse of A, and only square matrices are invertible.
In these cases a common approach is to take the Moore-Penrose pseudoinverse, which will compute a best fit (least squares) solution of the system. So instead of trying to solve for the exact solution use numpy.linalg.lstsq:
x_qr = np.linalg.lstsq(R, Qb, rcond=None)[0]