I have a matrix A of size MxN, where M > N, and a vector b of size M. I want to solve Ax = b as closely as possible, given that I know the system of equations is not exactly solvable. In other words, I want to find the x that gives me a vector closest to b. Looking online, it seems like I could reduce A down to its basis (linearly independent vectors) and then find the projection of b onto that basis. However, I am not sure how to do this in Python. I understand it would have something to do with QR decomposition, but I am not sure what the next step would be, and how it would be possible to recover x.
You can compute a least squares solution via np.linalg.lstsq:
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
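np.linalg.lstsq returns a 4-tuple rather than just the solution, and it already handles the projection onto the column space of A internally (via an SVD-based routine), so there is no need to reduce A to a basis yourself. A minimal sketch with made-up data:

import numpy as np

# hypothetical overdetermined system: M = 6 equations, N = 3 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
b_hat = A @ x  # the closest vector to b in the column space of A
print(np.linalg.norm(b - b_hat))  # the minimized residual norm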
Consider a case where, given an MxM matrix A and a vector b, I want to solve something of the form inv(A @ A.T) @ b (where I know A is invertible).
As far as I know, it is always faster to use solve_* rather than inv. There are also variants for more efficient solving of PSD matrices (which A @ A.T must be), using Cholesky factorization.
My question: since I'm constructing the matrix A @ A.T just to immediately throw it away, is there a more specialized procedure for solving linear equations with the Gram matrix of A, without having to construct it?
You can compute the factorization of A and then use that to solve your system.
Assume we want to solve
A A^T x = b
for x.
Compute the factorization of A=LU.
Then solve Ay=b for y.
Then solve A^T x = y for x.
This way you don't have to compute the matrix A A^T.
Note that if one has a factorization of A=LU then one can solve Ax=b as well as A^T x=b efficiently for x.
This is because A^T=U^T L^T which is again a factorization of a lower times an upper triangular matrix.
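A rough sketch of this in Python using scipy.linalg.lu_factor and lu_solve (the data below is made up; lu_solve's trans=1 option reuses the same factorization to solve the transposed system):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

# made-up invertible A and right-hand side b
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

lu, piv = lu_factor(A)               # factor A = P L U once
y = lu_solve((lu, piv), b)           # solve A y = b
x = lu_solve((lu, piv), y, trans=1)  # solve A^T x = y
print(np.allclose(A @ A.T @ x, b))   # True: x solves (A A^T) x = b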
I have been trying to find an answer to this problem for a couple of hours now, but I can't find anything so far...
So I have two vectors, let's call them b and x, of which I know all the values. They add up to the same amount, so sum(b) = sum(x).
I also have a matrix, let's call it A, of which I know which values are 0; all the other values are unknown (but different from 0).
Furthermore, the elements of each column of A sum to 1 (I think that makes it a left stochastic matrix).
Generally the equation can be written in the form A*x = b.
Now I'm trying to find the missing values of A.
I have found one answer to the general problem here: https://math.stackexchange.com/questions/1170843/solving-ax-b-when-x-and-b-are-given
Furthermore, I looked at the documentation of numpy.linalg: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html, but I just can't figure out how to do it.
It looks similar to a multiple linear regression problem, but I couldn't find anything for it in sklearn either: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression
Not a complete answer, but a bit of a more formal statement of the problem.
I think this can be solved as just a system of linear equations. Let
NZ = {(i,j)|a(i,j) is not fixed to zero}
Then write:
sum( j | (i,j) ∈ NZ, a(i,j) * x(j) ) = b(i) ∀i
sum( i | (i,j) ∈ NZ, a(i,j)) = 1 ∀j
This is just a system of linear equations in the unknowns a(i,j). It may be under- (or over-) determined, and it may be sparse; how best to solve it depends a bit on that. It is also possible to treat these equations as constraints in a linear (or quadratic) programming problem. That would allow you to add an objective (useful for an underdetermined system, or, for an overdetermined one, to minimize the sum of squared deviations or the 1-norm of the deviations). In addition, we can add bounds on a(i,j) (e.g. lower bounds of zero and upper bounds of one). So a linear programming approach may be what you are looking for.
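A small sketch of the bounded least-squares variant of this idea, using scipy.optimize.lsq_linear (the mask, x and b below are made-up illustration data; each unknown a(i,j) becomes one variable, and the row and column equations are stacked into one linear system):

import numpy as np
from scipy.optimize import lsq_linear

# made-up instance: mask marks where a(i,j) is allowed to be nonzero (unknown)
mask = np.array([[True, False, True],
                 [False, True, True],
                 [True, True, False]])
x = np.array([2.0, 1.0, 3.0])
b = np.array([3.0, 2.5, 0.5])  # note sum(b) == sum(x)

nz = np.argwhere(mask)         # (i, j) index pairs of the unknowns
m, n = mask.shape
M = np.zeros((m + n, len(nz)))
rhs = np.concatenate([b, np.ones(n)])
for col, (i, j) in enumerate(nz):
    M[i, col] = x[j]           # row equation: sum_j a(i,j) * x(j) = b(i)
    M[m + j, col] = 1.0        # column equation: sum_i a(i,j) = 1

# bounded least squares keeps every unknown entry of A in [0, 1]
res = lsq_linear(M, rhs, bounds=(0.0, 1.0))
A_hat = np.zeros((m, n))
A_hat[mask] = res.x
print(A_hat @ x, b)            # should agree (approximately) if the system is feasible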
This problem looks a bit like matrix balancing. This is used a lot for economic data sets that come from different sources and where we want to reconcile the data to get a consistent data set usable for subsequent modeling.
I've got a set of 3D-points in a projective space and I want to transform them into a metric 3D space so that I could measure distances in meters.
In order to do so, I need a 3D to 3D homography, which is a 4x4 matrix with 15 degrees of freedom (so I need 5 3D-points to get 15 equations).
I have a set of these 5 3D-points from the projective space and their corresponding 5 3D-points aligned in the metric space (which I expect the 5 projective points to be transformed to).
I can't figure out how to estimate the homography matrix. At first I tried:
A=np.vstack([p1101.T, p1111.T, p0101.T, p0001.T, p0011.T])
b=np.array([[1,1,0,1], [1,1,1,1], [0,1,0,1], [0,0,0,1], [0,0,1,1]])
x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
H = x.T
where p1101 is a [X,Y,Z,1] point which corresponds to [1,1,0,1] in the 3D metric space, etc..
However, this is not correct, since I'm in projective space, so I somehow need to create an equation set where the result of applying H is divided by its last coordinate, or something like that.
I thought maybe there is an implemented method that would do it for me, for example in OpenCV, but I didn't find one. Any help would be appreciated.
I finally solved this question with a friend, and would like to share the solution.
Since we are in projective space, one needs to solve an equation set where the homogeneous coordinate of the outcome is the denominator of each of the other coordinates. That is, if you want to find a 4x4 homography matrix H, and you have matching 3D points x and b (b is in the metric space), you'll need to optimize the search for the parameters of H such that applying H to x gives a vector v with 4 coordinates, where each of the first three coordinates of v divided by the last coordinate equals the corresponding coordinate of b. Written in numpy:
v = H.dot(x)
v = v[:3]/v[3]
np.allclose(v, b[:3])  # True (b's last, homogeneous coordinate is 1)
Mathematically, the optimization is based on the following (written for the first coordinate only, for simplicity; the other coordinates are handled the same way): the projective constraint
(H[0,:] . x) / (H[3,:] . x) = b[0]
is multiplied out and, with H[3,3] fixed to 1 (which is exactly what leaves 15 degrees of freedom), becomes the linear equation
H[0,:] . x - b[0] * (H[3,0]*x[0] + H[3,1]*x[1] + H[3,2]*x[2]) = b[0] * x[3]
in the 15 unknown entries of H.
So in Python one needs to arrange the equations for the solver in the manner explained above, with 5 matching points. The approach that was proposed in the question is good (it just didn't solve the right problem), and in these terms it becomes an Ax = b least-squares problem where A is a 15x15 matrix and b is a 15-dimensional vector.
Each matching point generates 3 equations, so 5 matching points generate the 15 equations built into the matrix A, thus solving for the 15 DOF of the 3D homography H.
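A sketch of how these 15 equations could be assembled and solved with numpy (the function and variable names here are mine; pts_src holds the 5 projective points [X, Y, Z, 1] and pts_dst the 5 matching metric points, also with a trailing 1):

import numpy as np

def estimate_homography_3d(pts_src, pts_dst):
    # 15 unknowns: the first 3 rows of H (12 entries) plus H[3, 0:3]; H[3, 3] is fixed to 1
    A = np.zeros((15, 15))
    rhs = np.zeros(15)
    for p, (x, b) in enumerate(zip(pts_src, pts_dst)):
        for k in range(3):                   # one equation per output coordinate
            row = 3 * p + k
            A[row, 4 * k:4 * k + 4] = x      # H[k, :] . x
            A[row, 12:15] = -b[k] * x[:3]    # -b[k] * (H[3, :3] . x[:3])
            rhs[row] = b[k] * x[3]           # b[k] * x[3], from H[3, 3] = 1
    h = np.linalg.solve(A, rhs)              # exactly determined with 5 points
    H = np.empty((4, 4))
    H[:3, :] = h[:12].reshape(3, 4)
    H[3, :3] = h[12:]
    H[3, 3] = 1.0
    return H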
I have a very large absorbing Markov chain (scales to problem size -- from 10 states to millions) that is very sparse (most states can react to only 4 or 5 other states).
I need to calculate one row of the fundamental matrix of this chain (the average frequency of each state given one starting state).
Normally, I'd do this by calculating (I - Q)^(-1), but I haven't been able to find a good library that implements a sparse matrix inverse algorithm! I've seen a few papers on it, most of them Ph.D.-level work.
Most of my Google results point me to posts talking about how one shouldn't use a matrix inverse when solving linear (or non-linear) systems of equations... I don't find that particularly helpful. Is the calculation of the fundamental matrix similar to solving a system of equations, and I simply don't know how to express one in the form of the other?
So, I pose two specific questions:
What's the best way to calculate a row (or all the rows) of the inverse of a sparse matrix?
OR
What's the best way to calculate a row of the fundamental matrix of a large absorbing Markov chain?
A Python solution would be wonderful (as my project is still currently a proof-of-concept), but if I have to get my hands dirty with some good ol' Fortran or C, that's not a problem.
Edit: I just realized that the inverse B of matrix A can be defined as AB=I, where I is the identity matrix. That may allow me to use some standard sparse matrix solvers to calculate the inverse... I've got to run off, so feel free to complete my train of thought, which I'm starting to think might only require a really elementary matrix property...
Assuming that what you're trying to work out is the expected number of steps before absorption, the equation from "Finite Markov Chains" (Kemeny and Snell), which is reproduced on Wikipedia, is:
t = N 1
(where 1 is a column vector of ones). Or, expanding the fundamental matrix N = (I - Q)^(-1):
t = (I - Q)^(-1) 1
Rearranging:
(I - Q) t = 1
which is in the standard format for using functions that solve systems of linear equations.
Putting this into practice to demonstrate the difference in performance (even for much smaller systems than those you're describing).
import networkx as nx
import numpy

def example(n):
    """Generate a very simple transition matrix from a directed graph"""
    g = nx.DiGraph()
    for i in range(n - 1):
        g.add_edge(i + 1, i)
        g.add_edge(i, i + 1)
    g.add_edge(n - 1, n)
    g.add_edge(n, n)
    m = nx.to_numpy_array(g)
    # normalize rows to ensure m is a valid right stochastic matrix
    m = m / m.sum(axis=1, keepdims=True)
    return m
Presenting the two alternative approaches for calculating the number of expected steps.
def expected_steps_fundamental(Q):
    I = numpy.identity(Q.shape[0])
    N = numpy.linalg.inv(I - Q)
    o = numpy.ones(Q.shape[0])
    return numpy.dot(N, o)

def expected_steps_fast(Q):
    I = numpy.identity(Q.shape[0])
    o = numpy.ones(Q.shape[0])
    return numpy.linalg.solve(I - Q, o)
Picking an example that's big enough to demonstrate the types of problems that occur when calculating the fundamental matrix:
P = example(2000)
# drop the absorbing state
Q = P[:-1,:-1]
Produces the following timings:
%timeit expected_steps_fundamental(Q)
1 loops, best of 3: 7.27 s per loop
And:
%timeit expected_steps_fast(Q)
10 loops, best of 3: 83.6 ms per loop
Further experimentation is required to test the performance implications for sparse matrices, but it's clear that calculating the inverse is much, much slower than you might expect.
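For the sparse matrices mentioned above, a rough sketch of the same direct solve with scipy.sparse (assuming Q is already stored as a scipy sparse matrix) might look like:

from scipy import sparse
from scipy.sparse.linalg import spsolve
import numpy

def expected_steps_sparse(Q):
    # Q: sparse transition matrix restricted to the transient states
    I = sparse.identity(Q.shape[0], format='csr')
    o = numpy.ones(Q.shape[0])
    return spsolve((I - Q).tocsr(), o)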
A similar approach to the one presented here can also be used for the variance of the number of steps
The reason you're getting the advice not to use matrix inverses for solving equations is numerical stability. When your matrix has eigenvalues that are zero or near zero, you have problems either from the lack of an inverse (if zero) or from numerical instability (if near zero). The way to approach the problem, then, is to use an algorithm that doesn't require that an inverse exist. The solution is to use Gaussian elimination. This doesn't provide a full inverse, but rather gets you to row-echelon form, a generalization of upper-triangular form. If the matrix is invertible, then the last row of the result matrix contains a row of the inverse, so just arrange that the last row you eliminate on is the row you want.
I'll leave it to you to understand why I-Q is always invertible.
I am working with data from neuroimaging and because of the large amount of data, I would like to use sparse matrices for my code (scipy.sparse.lil_matrix or csr_matrix).
In particular, I will need to compute the pseudo-inverse of my matrix to solve a least-squares problem.
I have found the method sparse.lsqr, but it is not very efficient. Is there a method to compute the Moore-Penrose pseudo-inverse (the equivalent of pinv for dense matrices)?
The size of my matrix A is about 600'000x2000, and in every row of the matrix I'll have from 0 up to 4 nonzero values. The size of A is given by voxels x fiber bundles (white matter fiber tracts), and we expect a maximum of 4 tracts to cross in a voxel. In most of the white matter voxels we expect to have at least 1 tract, but around 20% of the rows could be all zeros.
The vector b should not be sparse; actually, b contains the measure for each voxel, which is in general not zero.
I would need to minimize the error, but there are also some conditions on the vector x. As I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0
Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?
Thanks
Update 1st June:
Thanks again for the help.
I can't really show you anything about my data, because the code in Python gives me some problems. However, in order to understand how I could choose a good k, I've tried to create a test function in Matlab.
The code is as follows:
F = zeros(100000, 1000);
for k = 1:150000
    p = rand(1);
    a = 0;
    b = 0;
    while a <= 0 || b <= 0
        a = random('Binomial', 100000, p);
        b = random('Binomial', 1000, p);
    end
    F(a, b) = rand(1);
end
solution = repmat([0.5, 0.5, 0.8, 0.7, 0.9, 0.4, 0.7, 0.7, 0.9, 0.6], 1, 100);
size(solution)
solution = solution';
measure = F * solution;
%check = pinvF * measure;
k = 250;
F = sparse(F);
[U, S, V] = svds(F, k);
s = svds(F, k);
plot(s)
max(max(U*S*V' - F))
for s = 1:k
    if S(s,s) ~= 0
        S(s,s) = 1 / S(s,s);
    end
end
inv = V * S' * U';
inv * measure
max(inv*measure - solution)
Do you have any idea of what k should be, compared to the size of F? I've taken 250 (out of 1000) and the results are not satisfactory (the waiting time is acceptable, but not short).
Also, now I can compare the results with the known solution, but how could one choose k in general?
I also attached the plot of the 250 singular values that I get and their squares, normalized. I don't know exactly how to make a better scree plot in Matlab. I'm now proceeding with bigger k to see if suddenly the values will become much smaller.
Thanks again,
Jennifer
You could study more of the alternatives offered in scipy.sparse.linalg.
Anyway, please note that the pseudo-inverse of a sparse matrix is most likely a (very) dense one, so it's not really a fruitful avenue (in general) to follow when solving sparse linear systems.
You may like to describe your particular problem (dot(A, x) = b + e) in a slightly more detailed manner. At least specify:
'typical' size of A
'typical' percentage of nonzero entries in A
least-squares implies that norm(e) is minimized, but please indicate whether your main interest is in x_hat or in b_hat, where e = b - b_hat and b_hat = dot(A, x_hat)
Update: If you have some idea of the rank of A (and it's much smaller than the number of columns), you could try the total least squares method. Here is a simple implementation, where k is the number of leading singular values and vectors to use (i.e. the 'effective' rank).
from scipy.sparse import hstack
from scipy.sparse.linalg import svds

def tls(A, b, k=6):
    """A TLS solution of Ax = b, for sparse A."""
    u, s, v = svds(hstack([A, b]), k)
    return v[-1, :-1] / -v[-1, -1]
Regardless of the answer to my comment, I would think you could accomplish this fairly easily using the Moore-Penrose SVD representation. Find the SVD with scipy.sparse.linalg.svds, replace Sigma by its pseudoinverse, and then multiply V*Sigma_pi*U' to find the pseudoinverse of your original matrix.
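A rough sketch of that suggestion (the function name and the singular-value cutoff are mine; k is the number of singular triplets to keep, which you'd need to choose based on the effective rank of A):

import numpy as np
from scipy.sparse.linalg import svds

def sparse_pinv(A, k):
    # truncated SVD of the sparse matrix A: u is (m, k), s is (k,), vt is (k, n)
    u, s, vt = svds(A, k=k)
    s_inv = np.where(s > 1e-10, 1.0 / s, 0.0)  # invert only the nonzero singular values
    return (vt.T * s_inv) @ u.T                # V * Sigma^+ * U^T, with shape (n, m)

# hypothetical usage for the least-squares problem: x_hat = sparse_pinv(A, k) @ b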