I am working on an FEM project using Scipy. Now my problem is, that
the assembly of the sparse matrices is too slow. I compute the
contribution of every element in dense small matrices (one for each
element). For the assembly of the global matrices I loop over all
small dense matrices and set the matrice entries the following way:
[i,j] = someList[k][l]
Mglobal[i,j] = Mglobal[i,j] + Mlocal[k,l]
Mglobal is a lil_matrice of appropriate size, someList maps the
indexing variables.
Of course this is rather slow and consumes most of the matrice
assembly time. Is there a better way to assemble a large sparse matrix
from many small dense matrices? I tried scipy.weave but it doesn't
seem to work with sparse matrices
I posted my response to the scipy mailing list; stack overflow is a bit easier
to access so I will post it here as well, albeit a slightly improved version.
The trick is to use the IJV storage format. This is a trio of three arrays
where the first one contains row indicies, the second has column indicies, and
the third has the values of the matrix at that location. This is the best way
to build finite element matricies (or any sparse matrix in my opinion) as access
to this format is really fast (just filling an an array).
In scipy this is called coo_matrix; the class takes the three arrays as an
argument. It is really only useful for converting to another format (CSR os
CSC) for fast linear algebra.
For finite elements, you can estimate the size of the three arrays by something
like
size = number_of_elements * number_of_basis_functions**2
so if you have 2D quadratics you would do number_of_elements * 36, for example.
This approach is convenient because if you have local matricies you definitely
have the global numbers and entry values: exactly what you need for building
the three IJV arrays. Scipy is smart enough to throw out zero entries, so
overestimating is fine.
Related
I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations.
I chose Python (which is not the fastest) and it works pretty well. However, in my code, I have inversions of large sparse symmetric (non-positive definite, so can't use Cholesky) matrix to execute (image below). I currenty use np.linalg.inv() which is using the LU decomposition method.
I pretty sure there would be some optimization to do in terms of rapidity.
I thought about Cuthill-McKee algotihm to rearange the matrix and take its inverse. Do you have any ideas or advice ?
Thank you very much for your answers !
Good news is that if you're using any of the popular python libraries for linear algebra (namely, numpy), the speed of python really doesn't matter for the math – it's all done natively inside the library.
For example, when you write matrix_prod = matrix_a # matrix_b, that's not triggering a bunch of Python loops to multiply the two matrices, but using numpy's internal implementation (which I think uses the FORTRAN LAPACK library).
The scipy.sparse.linalg module has your back covered when it comes to solving sparsely stored matrices specifying sparse systems of equations. (which is what you do with the inverse of a matrix). If you want to use sparse matrices, that's your way to go – notice that there's matrices that are sparse in mathematical terms (i.e., most entries are 0), and matrices which are stored as sparse matrix, which means you avoid storing millions of zeros. Numpy itself doesn't have sparsely stored matrices, but scipy does.
If your matrix is densely stored, but mathematically sparse, i.e. you're using standard numpy ndarrays to store it, then you won't get any more rapid by implementing anything in Python. The theoretical complexity gains will be outweighed by the practical slowness of Python compared to highly optimized inversion.
Inverting a sparse matrix usually loses the sparsity. Also, you never invert a matrix if you can avoid it at all! For a sparse matrix, solving the linear equation system Ax = b, with A your matrix and b a known vector, for x, is so much faster done forward than computing A⁻¹! So,
I'm currently working with a least-square algorithm on Python, regarding some geodetic calculations.
since LS says you don't need the inverse matrix, simply don't calculate it, ever. The point of LS is finding a solution that's as close as it gets, even if your matrix isn't invertible. Which can very well be the case for sparse matrices!
I'm comparing in python the reading time of a row of a matrix, taken first in dense and then in sparse format.
The "extraction" of a row from a dense matrix costs around 3.6e-05 seconds
For the sparse format I tried both csr_mtrix and lil_matrix, but both for the row-reading spent around 1-e04 seconds
I would expect the sparse format to give the best performance, can anyone help me understand this ?
arr[i,:] for a dense array produces a view, so its execution time is independent of arr.shape. If you don't understand the distinction between view and copy, you need to do more reading about numpy basics.
csr and lil formats allow indexing that looks a lot like ndarray's, but there are key differences. For the most part the concept of a view does not apply. There is one exception. M.getrowview(i) takes advantage of the unique data structure of a lil to produce a view. (Read its doc and code)
Some indexing of a csr format actually uses matrix multiplication, using a specially constructed 'extractor' matrix.
In all cases where sparse indexing produces sparse matrix, actually constructing the new matrix from the data takes time. Sparse does not use compiled code nearly as much as numpy. It's strong point, relative to numpy is matrix multiplication of matrices that are 10% sparse (or smaller).
In the simplest format (to understand), coo, each nonzero element is represented by 3 values - data, row, col. Those are stored in 3 1d arrays. So it has to have a sparsity of less than 30% to even break even with respect to memory use. coo does not implement indexing.
I am currently working on a Project where I have to calculate the extremal Eigenvalues using the Lanczos-Algorithm. I replaced the MVM so that the Matrix-Elements are calculated on-the-fly, because I will have to calculate the Eigenvalues of real huge matrices. This slows down my Code because for-loops are slower in python thand MVM. Is there any way to improve on my code in an easy way? I tried using Cython but I had no real luck here.
for i in range(0,dim):
for j in range(0,dim):
temp=get_Matrix_Element(i,j)
ws[i]=ws[i]+temp*v[j]
This replaces:
ws = M.dot(v)
Update
The ,atrix M is sparse and can be stored for "small" systems in a sparse-matrix-format using scipy.sparse. For large systems with up to ~10^9 dimensions I need to calculate the matrix elements on-the-fly
The easiest and fast to implement solution would be to go half-way: precompute one row at a time.
In your original solution (M.dot(v)) you had to store dim x dim which grows quadratically. If you precompute one row, it scales linearly and should not cause you the troubles (since you are already storing a resultant vector ws of the same size).
The code should be along the lines of:
for i in range(0,dim):
temp=get_Matrix_Row(i)
ws[i]=temp.dot(v)
where temp is now an dim x 1 vector.
Such modification should allow more optimizations to happen during the dot product without major code modifications.
I use a Scipy CSR representation of a 800,000x350,000 Matrix, let's say its M. I want to calculate the dot product M * M[0:x].T. Now depending on the value of x the memory consumption grows. x=1 is not noticeable but if x=2000 the multiplication process takes around 8 gigabyte of RAM.
I wonder what happens when i calculate this product and why it takes so much memory in comparison to storing the sparse Matrix (around 30Mb). Is the matrix expanded for the multiplication?
By investigating the results and memory consumption over time and after each operation I found out that the reason is the result of the sparse matrix multiplication. Indeed there are many zero-values in M. But the result of M*M.T is a matrix containing only 50% zeros. Thus the result consumes a lot memory.
Example: Let's assume each row vector of M has a none-zero field at the same index but other than that is sparse. Then the result of M*M.T would not be sparse at all (no zero-values).
Nonetheless, thanks for helping.
The compile core of csr matrix multiplication is found at
https://github.com/scipy/scipy/blob/0cff7a5fe6226668729fc2551105692ce114c2b3/scipy/sparse/sparsetools/csr.h
starting around line 500, the csr_matmat... function. It includes a reference to a math paper that it is based on.
The Python code called with A*B is __mul__. Look at the version for your csr matrix to make sure that it is calling self._mul_sparse_matrix, which you will see ends up calling self.format + '_matmat_pass1' (and pass2).
Assuming it isn't resorting to the dense versions, you'll have to study the underlying algorithm to understand whether this memory use is realistic or not.
I need to form a 2D matrix with total size 2,886 X 2,003,817. I try to use numpy.zeros to make a 2D zero element matrix and then calculate and assign each element of Matrix (most of them are zero son I need to replace few of them).
but when I try numpy.zero to initialize my matrix I get the following memory error:
C=numpy.zeros((2886,2003817)) "MemoryError"
I also try to form the matrix without initialization. Basically I calculate the element of each row in each iteration of my algorithm and then
C=numpy.concatenate((C,[A]),axis=0)
in which C is my final matrix and A is the calculated row at the current iteration. But I find out this method takes a lots of time, I am guessing it is because of using numpy.concatenate(?)
could you please let me know if there is a way to avoid memory error in initializing my matrix or is there any better method or suggestion to form the matrix in this size?
Thanks,
Amir
If your data has a lot of zeros in it, you should use scipy.sparse matrix.
It is a special data structure designed to save memory for matrices that have a lot of zeros. However, if your matrix is not that sparse, sparse matrices start to take up more memory. There are many kinds of sparse matrices, and each of them is efficient at one thing, while inefficient at another thing, so be careful with what you choose.