I have a sparse matrix M of size N x N with Nd non-zero elements and a sparse vector A of size N x 1 with Na non-zero elements (N is large).
I want to calculate the matrix-vector product B = MA.
I use the sparse matrix representation in scipy.sparse: P = csr_matrix(M). Then I compute B = P.dot(A).
I know the complexity of this operation is O(Nd). It seems that A is treated as a dense vector in the calculation, because the computation time of this multiplication does not change when I change Na. But the vector A is also sparse. Is there an efficient way to perform this multiplication in less computation time?
In my simulation, M is fixed. The vectors A differ, but they are all sparse.
Thank you very much.
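One possible approach, offered as a sketch rather than a benchmarked answer: store M in CSC format and multiply only the columns that correspond to the non-zero entries of A. The helper name sparse_matvec and the (idx, vals) representation of A are assumptions made for this illustration; they do not come from the question.

```python
import numpy as np
import scipy.sparse as sp

def sparse_matvec(M_csc, idx, vals):
    """Compute B = M @ A, where A is zero except for vals at positions idx."""
    # Column selection is natural in CSC; only the Na selected columns are read.
    return M_csc[:, idx] @ vals

N = 1000
M = sp.random(N, N, density=0.01, format="csc", random_state=0)

# Hypothetical sparse vector A with Na = 3 non-zero entries.
idx = np.array([3, 250, 999])
vals = np.array([1.0, -2.0, 0.5])

B = sparse_matvec(M, idx, vals)  # dense 1-D result of length N

# Reference result using a dense A, for comparison only.
A_dense = np.zeros(N)
A_dense[idx] = vals
B_ref = M @ A_dense
```

Since M is fixed, the conversion to CSC only has to happen once. Whether this is faster in practice depends on Na and on the indexing overhead in SciPy, so it is worth timing on real data.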
Related
I am trying to multiply two sparse matrices, or a sparse matrix with a dense one. The matrices are roughly 128x256. About 80% of their values are 0 and the rest are floating-point values with roughly 8 digits of precision.
I am aware of scipy.sparse, but I do not want to use it. Can anyone help with code in Python?
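import numpy

WW_sp = numpy.zeros([len(X), Y.shape[1]])
for i in range(len(X)):
    # column indices of the non-zero entries in row i of X
    A = numpy.where(X[i] != 0)[0]
    for j in range(Y.shape[1]):
        # row indices of the non-zero entries in column j of Y
        B = numpy.where(Y[:, j] != 0)[0]
        # only indices that are non-zero in both contribute to the product
        for k in numpy.intersect1d(A, B):
            WW_sp[i][j] += X[i][k] * Y[k][j]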
This code works well for sparse matrices but takes more time for dense matrices. Is there optimized code that works well for both sparse and dense matrices?
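A possible middle ground, sketched here with plain NumPy only: keep the outer loop over rows of X, but replace the two inner loops with one vectorized product over the non-zero positions of that row. The function name matmul_skip_zeros is made up for this example. For fully dense inputs a direct X @ Y is likely still the better choice.

```python
import numpy as np

def matmul_skip_zeros(X, Y):
    """Compute X @ Y, skipping the zero entries in each row of X."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    out = np.zeros((X.shape[0], Y.shape[1]))
    for i in range(X.shape[0]):
        nz = np.flatnonzero(X[i])  # positions of non-zero entries in row i
        if nz.size:
            # Only the rows of Y that meet a non-zero entry of X[i] contribute.
            out[i] = X[i, nz] @ Y[nz, :]
    return out

rng = np.random.default_rng(0)
X = rng.random((128, 256))
X[rng.random(X.shape) < 0.8] = 0.0  # roughly 80% zeros, as in the question
Y = rng.random((256, 128))
Y[rng.random(Y.shape) < 0.8] = 0.0

WW = matmul_skip_zeros(X, Y)
```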
I have a huge sparse matrix A
<5000x5000 sparse matrix of type '<type 'numpy.float64'>'
with 14979 stored elements in Compressed Sparse Column format>
from which I need to delete linearly dependent rows. I know in advance that j rows will be dependent. I need to:
find out which sets of rows are linearly dependent
for each set, keep one arbitrary row and remove the others
I was trying to follow this question, but the corresponding method for sparse matrices, scipy.sparse.linalg.eigs says that
k: The number of eigenvalues and eigenvectors desired. k must be smaller than N. It is not possible to compute all eigenvectors of a matrix.
How should I proceed?
scipy.sparse.linalg.eigs uses implicitly restarted Arnoldi iteration. The algorithm is meant for finding a few eigenvectors quickly, and can't find all of them.
5000x5000, however, is not that large. Have you considered just using numpy.linalg.eig or scipy.linalg.eig? It will probably take a few minutes, but it isn't completely infeasible. You don't gain anything by using a sparse matrix, but I'm not sure there's an algorithm for efficiently finding all eigenvectors of a sparse matrix.
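As a rough illustration of the dense route, the sketch below converts the sparse matrix to a dense array and computes all eigenvalues of the Gram matrix A A^T. That matrix is symmetric and has the same rank as A, so the number of near-zero eigenvalues equals the number of rows minus the rank. Using the Gram matrix with numpy.linalg.eigvalsh is an assumption of this example, not necessarily the method from the linked question, and the function name and tolerance are made up. It only counts dependent rows; it does not identify which ones they are.

```python
import numpy as np
import scipy.sparse as sp

def count_dependent_rows(A, rel_tol=1e-10):
    """Return (number of rows) - rank(A), using a dense eigendecomposition."""
    dense = A.toarray()                     # give up sparsity
    gram = dense @ dense.T                  # symmetric, same rank as A
    eigenvalues = np.linalg.eigvalsh(gram)  # all eigenvalues, ascending order
    threshold = rel_tol * max(eigenvalues[-1], 1.0)
    return int(np.sum(eigenvalues < threshold))

A = sp.csc_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [2.0, 0.0, 4.0],  # 2 * row 0
    [1.0, 3.0, 2.0],  # row 0 + row 1
]))

n_dependent = count_dependent_rows(A)
```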
I have a loop that in each iteration gives me a column c of a sparse matrix N.
To assemble/grow/accumulate N column by column I thought of using
N = scipy.sparse.hstack([N, c])
To do this it would be nice to initialize the matrix with rows of length 0. However,
N = scipy.sparse.csc_matrix((4,0))
raises a ValueError: invalid shape.
Any suggestions, how to do this right?
You can't. Sparse matrices are restricted compared to NumPy arrays and in particular don't allow 0 for any axis. All sparse matrix constructors check for this, so if and when you do manage to build such a matrix, you're exploiting a SciPy bug and your script is likely to break when you upgrade SciPy.
That being said, I don't see why you'd need an n × 0 sparse matrix since an n × 0 NumPy array is allowed and takes practically no storage space.
Turns out sparse.hstack cannot handle a NumPy array with a zero axis, so disregard my previous comment. However, what I think you should do is collect all the columns in a list, then hstack them in one call. That's better than your loop since append'ing to a list takes amortized constant time, while hstack takes linear time. So your proposed algorithm takes quadratic time while it could be linear.
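A minimal sketch of the list-then-stack approach described above. The make_column function is a hypothetical stand-in for whatever produces each column c in the real loop.

```python
import scipy.sparse as sp

def make_column(i, n_rows=4):
    """Hypothetical column source: one non-zero entry per column."""
    data = [float(i + 1)]
    rows = [i % n_rows]
    cols = [0]
    return sp.csc_matrix((data, (rows, cols)), shape=(n_rows, 1))

columns = []
for i in range(5):
    c = make_column(i)
    columns.append(c)  # amortized constant time per append

# A single hstack call at the end instead of one call per iteration.
N = sp.hstack(columns, format="csc")
```

This also sidesteps the question of an empty initial matrix, because nothing is stacked until at least one column exists.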
You must use at least 1 in your shape.
N = scipy.sparse.csc_matrix((4,1))
Which you can stack:
print(scipy.sparse.hstack((N, N)))
#<4x2 sparse matrix of type '<type 'numpy.float64'>'
# with 0 stored elements in COOrdinate format>
I have two sparse matrices, A (affinity matrix) and D (diagonal matrix), each of dimension 100000*100000. I have to compute the Laplacian matrix L = D^(-1/2)*A*D^(-1/2). I am using the SciPy CSR format for sparse matrices.
I didn't find any method to compute the inverse of a sparse matrix. How can I find L and the inverse of a sparse matrix? Also, is it efficient to do this in Python, or should I call a MATLAB function to calculate L?
In general the inverse of a sparse matrix is not sparse which is why you won't find sparse matrix inverters in linear algebra libraries. Since D is diagonal, D^(-1/2) is trivial and the Laplacian matrix calculation is thus trivial to write down. L has the same sparsity pattern as A but each value A_{ij} is multiplied by (D_i*D_j)^{-1/2}.
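The scaling described above could be written roughly as follows. This is a sketch that assumes every diagonal entry of D is strictly positive. The function name is made up, and building D from the row sums of A in the usage part is an assumption, since the question does not say how D is defined.

```python
import numpy as np
import scipy.sparse as sp

def scaled_affinity(A, D):
    """Return D^(-1/2) * A * D^(-1/2) for a sparse A and a diagonal D."""
    d = D.diagonal()                           # assumed strictly positive
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(d))    # D^(-1/2) is still diagonal
    return (d_inv_sqrt @ A @ d_inv_sqrt).tocsr()

A = sp.csr_matrix(np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]))
# Assumption for this example: D holds the row sums of A.
D = sp.diags(np.asarray(A.sum(axis=1)).ravel())

L = scaled_affinity(A, D)
```

No matrix inverse is formed anywhere; only the reciprocal square roots of the N diagonal entries are computed.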
Regarding the issue of the inverse, the standard approach is always to avoid calculating the inverse itself. Instead of calculating L^-1, repeatedly solve Lx=b for the unknown x. All good matrix solvers will allow you to decompose L which is expensive and then back-substitute (which is cheap) repeatedly for each value of b.
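In SciPy, one way to follow this advice is scipy.sparse.linalg.splu, which factorizes a sparse matrix once and then solves for as many right-hand sides as needed. The small matrix below is invented for illustration and assumes L is square and non-singular.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Small made-up stand-in for L; splu expects CSC format.
L = sp.csc_matrix(np.array([
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
]))

lu = splu(L)  # expensive step, done once

right_hand_sides = [np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])]
# Each solve reuses the stored factors and is cheap by comparison.
solutions = [lu.solve(b) for b in right_hand_sides]
```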
What is the best way to compute the distance/proximity matrix for very large sparse vectors?
For example you are given the following design matrix, where each row is 68771 dimensional sparse vector.
designMatrix
<5830x68771 sparse matrix of type ''
with 1229041 stored elements in Compressed Sparse Row format>
Have you tried the routines in scipy.spatial.distance?
http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
If this forces you to go to a dense representation, then you may be better off rolling your own, depending on the density of nonzero elements. You could squeeze out the zeros while retaining a map between the new and original indices, calculate the pairwise distances on the remaining nonzero elements and then use the indexing to map things back.
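If Euclidean distance is what is needed, one way to avoid densifying the input is the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, which needs only a sparse product of the matrix with its transpose. The sketch below assumes Euclidean distance, which the question does not specify, and uses a made-up function name. The result is a dense n x n array (5830 x 5830 for the matrix above), even though the input stays sparse.

```python
import numpy as np
import scipy.sparse as sp

def pairwise_euclidean(X):
    """Dense matrix of Euclidean distances between the rows of a sparse X."""
    X = sp.csr_matrix(X)
    sq_norms = np.asarray(X.multiply(X).sum(axis=1)).ravel()  # ||x_i||^2
    gram = (X @ X.T).toarray()                                # x_i . x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip small negative rounding errors
    return np.sqrt(sq_dists)

designMatrix = sp.random(50, 2000, density=0.003, format="csr", random_state=0)
distances = pairwise_euclidean(designMatrix)
```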