Large sparse matrix inversion on Python

Large sparse matrix inversion on Python - python

I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations.
I chose Python (which is not the fastest) and it works pretty well. However, in my code, I have inversions of large sparse symmetric (non-positive definite, so can't use Cholesky) matrix to execute (image below). I currenty use np.linalg.inv() which is using the LU decomposition method.
I pretty sure there would be some optimization to do in terms of rapidity.
I thought about Cuthill-McKee algotihm to rearange the matrix and take its inverse. Do you have any ideas or advice ?
Thank you very much for your answers !

Good news is that if you're using any of the popular python libraries for linear algebra (namely, numpy), the speed of python really doesn't matter for the math – it's all done natively inside the library.
For example, when you write matrix_prod = matrix_a # matrix_b, that's not triggering a bunch of Python loops to multiply the two matrices, but using numpy's internal implementation (which I think uses the FORTRAN LAPACK library).
The scipy.sparse.linalg module has your back covered when it comes to solving sparsely stored matrices specifying sparse systems of equations. (which is what you do with the inverse of a matrix). If you want to use sparse matrices, that's your way to go – notice that there's matrices that are sparse in mathematical terms (i.e., most entries are 0), and matrices which are stored as sparse matrix, which means you avoid storing millions of zeros. Numpy itself doesn't have sparsely stored matrices, but scipy does.
If your matrix is densely stored, but mathematically sparse, i.e. you're using standard numpy ndarrays to store it, then you won't get any more rapid by implementing anything in Python. The theoretical complexity gains will be outweighed by the practical slowness of Python compared to highly optimized inversion.
Inverting a sparse matrix usually loses the sparsity. Also, you never invert a matrix if you can avoid it at all! For a sparse matrix, solving the linear equation system Ax = b, with A your matrix and b a known vector, for x, is so much faster done forward than computing A⁻¹! So,
I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations.
since LS says you don't need the inverse matrix, simply don't calculate it, ever. The point of LS is finding a solution that's as close as it gets, even if your matrix isn't invertible. Which can very well be the case for sparse matrices!

Related

Python correspondent for MATLAB matrix operation

I have a vector of indices (let's call it peo), a sparse matrix P and a matrix W. In MATLAB I can do an operation of this kind:
P(peo, peo) = W(peo, peo)
Is there a way to do the same in Python maintaining the same computational and time complexity?

Choosing library
There is a very similar way of expressing that in python if you use dense matrices. Using a sparse matrix is a little more complex. In general, if your code is not slowed by dense matrices too much and memory is not a problem I would stick to dense matrices with numpy since it is very convenient. (As they say premature optimization is the root of all evil... or something like that). However if you really need sparse matrices scipy will offer you an option for that.
Dense matrices
If you want to use dense matrices, you can use numpy to define matrices and peo should be defined as a list. Then you should note that the indexing with list (or vectors) doesn't work the same way in matlab as it does in python. Check this for details. (thank you Cris Lunego for pointing that out). To circumvent this and obtain the same behaviour as matlab, we will be using numpy.ix_. This will allows us to reproduce the matlab behavior of indexing with minimal alteration to our code.
Here is an example:
import numpy as np
# Dummy matrices definition
peo = [1, 3, 4]
P = np.zeros((5, 5))
W = np.ones((5, 5))
# Assignment (using np.ix_ to reproduce matlab behavior)
P[np.ix_(peo, peo)] = W[np.ix_(peo, peo)]
print(P)
Sparse matrices
For sparse matrices, scipy has a package called sparse that allows you to use sparse matrices along the way of matlab does. It gives you an actual choice on how the matrix should be represented where matlab doesn't. With great power comes great responsabiliy. Taking the time to read the pros and cons of each representation will help you choose the right one for your application.
In general it's hard to guarantee the exact same complexity because the languages are different and I don't know the intricate details of each. But the concept of sparse matrices is the same in scipy and matlab so you can expect the complexity to be comparable. (You might even be faster in python since you can choose a representation tailored to your needs).
Note that in this case if you want to keep working the same way as you describe in matlab you should choose a dok or lil representation. Those are the only two formats that allow efficient index access and sparsity change.
Here is an example of what you want to archive using dok representation:
from scipy.sparse import dok_matrix
import numpy as np
# Dummy matrices definition
peo = [1, 2, 4]
P = dok_matrix((5, 5))
W = np.ones((5, 5))
# Assignment (using np.ix_ to reproduce matlab behavior)
P[np.ix_(peo, peo)] = W[np.ix_(peo, peo)]
print(P.toarray())
If you are interested in the pros and cons of sparse matrix representation and algebra in Python here is a post that explores this a bit as well as performances. This is to take with a grain of salt since it is a little old, but the ideas behind it are still mostly correct.

Fast matrix inversion without a package

Assume that I have a square matrix M. Assume that I would like to invert the matrix M.
I am trying to use the the fractions mpq class within gmpy2 as members of my matrix M. If you are not familiar with these fractions, they are functionally similar to python's built-in package fractions. The only problem is, there are no packages that will invert my matrix unless I take them out of fraction form. I require the numbers and the answers in fraction form. So I will have to write my own function to invert M.
There are known algorithms that I could program, such as gaussian elimination. However, performance is an issue, so my question is as follows:
Is there a computationally fast algorithm that I could use to calculate the inverse of a matrix M?

Is there anything else you know about these matrices? For example, for symmetric positive definite matrices, Cholesky decomposition allows you to invert faster than the standard Gauss-Jordan method you mentioned.
For general matrix inversions, the Strassen algorithm will give you a faster result than Gauss-Jordan but slower than Cholesky.
It seems like you want exact results, but if you're fine with approximate inversions, then there are algorithms which approximate the inverse much faster than the previously mentioned algorithms.
However, you might want to ask yourself if you need the entire matrix inverse for your specific application. Depending on what you are doing it might be faster to use another matrix property. In my experience computing the matrix inverse is an unnecessary step.
I hope that helps!

Limits on Complex Sparse Linear Algebra in Python

I am prototyping numerical algorithms for linear programming and matrix manipulation with very large (100,000 x 100,000) very sparse (0.01% fill) complex (a+b*i) matrices with symmetric structure and asymmetric values. I have been happily using MATLAB for seven years, but have been receiving suggestions to switch to Python since it is open source.
I understand that there are many different Python numeric packages available, but does Python have any limits for handling these types of matrices and solving linear optimization problems in real time at high speed? Does Python have a sparse complex matrix solver comparable in speed to MATLAB's backslash A\b operator? (I have written Gaussian and LU codes, but A\B is always at least 5 times faster than anything else that I have tried and scales linearly with matrix size.)

Probably your sparse solvers were slower than A\b at least in part due to the interpreter overhead of MATLAB scripts. Internally MATLAB uses UMFPACK's multifrontal solver for LU() function and A\b operator (see UMFPACK manual).
You should try scipy.sparse package with scipy.sparse.linalg for the assortment of solvers available. In particular, spsolve() function has an option to call UMFPACK routine instead of the builtin SuperLU solver.
... solving linear optimization problems in real time at high speed?
Since you have time constraints you might want to consider iterative solvers instead of direct ones.
You can get an idea of the performance of SuperLU implementation in spsolve and iterative solvers available in SciPy from another post on this site.

Scipy: Linear programming with sparse matrices

I want to solve a linear program in python. The number of variables (I will call it N from now on) is very large (~50000) and in order to formulate the problem in the way scipy.optimize.linprog requires it, I have to construct two N x N matrices (A and B below). The LP can be written as
minimize: c.x
subject to:
A.x <= a
B.x = b
x_i >= 0 for all i in {0, ..., n}
whereby . denotes the dot product and a, b, and c are vectors with length N.
My experience is that constructing such large matrices (A and B have both approx. 50000x50000 = 25*10^8 entries) comes with some issues: If the hardware is not very strong, NumPy may refuse to construct such big matrices at all (see for example Very large matrices using Python and NumPy) and even if NumPy creates the matrix without problems, there is a huge performance issue. This is natural regarding the huge amount of data NumPy has to deal with.
However, even though my linear program comes with N variables, the matrices I work with are very sparse. One of them has only entries in the very first row, the other one only in the first M rows, with M < N/2. Of course I would like to exploit this fact.
As far as I have read (e.g. Trying to solve a Scipy optimization problem using sparse matrices and failing), scipy.optimize.linprog does not work with sparse matrices. Therefore, I have the following questions:
Is it actually true that SciPy does not offer any possibility to solve a linear program with sparse matrices? (If not, how can I do it?)
Do you know any alternative library that will solve the problem more effectively than SciPy with non-sparse matrices? (The library suggested in the thread above seems to be not flexible enough for my purposes - as far as I understand its documentation)
Can it be expected that a new implementation of the simplex algorithm (using plain Python, no C) that exploits the sparsity of the matrices will be more efficient than SciPy with non-sparse matrices?

I would say forming a dense matrix (or two) to solve a large sparse LP is probably not the right thing to do. When solving a large sparse LP it is important to use a solver that has facilities to handle such problems and also to generate the model in a way that does not create explicitly any of these zero elements.
Writing a stable, fast, sparse Simplex LP solver in Python as a replacement for the SciPy dense solver is not a trivial exercise. Moreover a solver written in pure Python may not perform as well.
For the size you indicate, although not very, very large (may be large medium sized model would be a good classification) you may want to consider a commercial solver like Cplex, Gurobi or Mosek. These solvers are very fast and very reliable (they solve basically any LP problem you throw at them). They all have Python APIs. The solvers are free or very cheap for academics.
If you want to use an Open Source solver, you may want to look at the COIN CLP solver. It also has a Python interface.
If your model is more complex then you also may want to consider to use a Python modeling tool such as Pulp or Pyomo (Gurobi also has good modeling support in Python).

I can't believe nobody has pointed you in the direction of PuLP! You will be able to create your problem efficiently, like so:
import pulp
prob = pulp.LpProblem("test problem",pulp.LpMaximize)
x = pulp.LpVariable.dicts('x', range(5), lowBound=0.0)
prob += pulp.lpSum([(ix+1)*x[ix] for ix in range(5)]), "objective"
prob += pulp.lpSum(x)<=3, "capacity"
prob.solve()
for k, v in prob.variablesDict().iteritems():
print k, v.value()
PuLP is fantastic, comes with a very decent solver (CBC) and can be hooked up to open source and commercial solvers. I am currently using it in production for a forestry company and exploring Dippy for the hardest (integer) problems we have. Best of luck!

Computing generalized eigen values for sparse matrices in python

I am using scipy.sparse.linalg.eigsh to solve the generalized eigen value problem for a very sparse matrix and running into memory problems. The matrix is a square matrix with 1 million rows/columns, but each row has only about 25 non-zero entries. Is there a way to solve the problem without reading the entire matrix into memory, i.e. working with only blocks of the matrix in memory at a time?
It's ok if the solution involves using a different library in python or in java.

For ARPACK, you only need to code up a routine that computes certain matrix-vector products. This can be implemented in any way you like, for instance reading the matrix from the disk.
from scipy.sparse.linalg import LinearOperator
def my_matvec(x):
y = compute matrix-vector product A x
return y
A = LinearOperator(matvec=my_matvec, shape=(1000000, 1000000))
scipy.sparse.linalg.eigsh(A)
Check the scipy.sparse.linalg.eigsh documentation for what is needed in the generalized eigenvalue problem case.
The Scipy ARPACK interface exposes more or less the complete ARPACK interface, so I doubt you will gain much by switching to FORTRAN or some other way to access Arpack.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.