Scipy.sparse.csr_matrix slow on matrix with float values - python

I am trying to multiply two sparse matrices, or a sparse matrix with a dense one. The matrices are roughly 128x256. About 80% of their values are 0, and the rest are floating-point values with roughly 8 digits of precision.
I am aware of the scipy.sparse module, but I do not want to use it. Can anyone help with the code in Python?
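import numpy

# X and Y are the two input matrices (2-D numpy arrays).
WW_sp = numpy.zeros([len(X), Y.shape[1]])
for i in range(len(X)):
    A = numpy.where(X[i] != 0)           # nonzero columns in row i of X
    for j in range(Y.shape[1]):
        B = numpy.where(Y[:, j] != 0)    # nonzero rows in column j of Y
        for k in numpy.intersect1d(A, B):
            WW_sp[i][j] += X[i][k] * Y[k][j]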
This code works well for sparse matrices but is slow for dense ones. Is there an optimized approach that works well for both sparse and dense matrices?
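One possible direction, offered as a sketch rather than a definitive answer: keep the idea of skipping zeros, but do it one row at a time and let NumPy handle the inner two loops. The function name matmul_skip_zeros is made up for illustration, and X and Y are assumed to be 2-D NumPy arrays.

```python
import numpy as np

def matmul_skip_zeros(X, Y):
    """Compute X.dot(Y), touching only the nonzero entries of each row of X."""
    out = np.zeros((X.shape[0], Y.shape[1]), dtype=np.result_type(X, Y))
    for i in range(X.shape[0]):
        nz = np.flatnonzero(X[i])          # column indices where row i is nonzero
        # Only the matching rows of Y contribute to row i of the product.
        out[i] = np.dot(X[i, nz], Y[nz, :])
    return out
```

For matrices as small as 128x256, a plain np.dot(X, Y) on the dense arrays is likely to be at least as fast, so it is worth timing both on the real data.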

Related

Multiplying a sparse matrix with sparse vector (efficient way)

I have a sparse matrix M of size N*N with Nd non-zero elements and a sparse vector A of size N*1 with Na non-zero elements (N is large).
I want to calculate the matrix product B=MA.
I use the sparse matrix representation in scipy.sparse: P=csr_matrix(M). Then I compute B=P.dot(A).
I know the complexity of this operation is O(Nd). It seems that A is treated as a dense vector in the calculation, because the computation time does not change when I change Na. But the vector A is also sparse. Is there an efficient way to perform this multiplication in less time?
In my simulation, M is fixed. The vectors A differ, but they are all sparse.
Thank you very much.
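A minimal sketch of one way to exploit the sparsity of A, assuming M is stored once in CSC format and each vector is given as its nonzero positions idx and values vals (both names, and the function name csc_times_sparse_vector, are hypothetical). The product is a weighted sum of the columns of M selected by the nonzeros of A, so only those columns are read.

```python
import numpy as np
from scipy import sparse

def csc_times_sparse_vector(M_csc, idx, vals):
    """Return the dense vector M.dot(a), where a[idx] = vals and a is 0 elsewhere."""
    out = np.zeros(M_csc.shape[0], dtype=np.result_type(M_csc.dtype, vals.dtype))
    for k, v in zip(idx, vals):
        start, end = M_csc.indptr[k], M_csc.indptr[k + 1]
        rows = M_csc.indices[start:end]    # row indices stored in column k
        # np.add.at accumulates correctly even if a row index repeats.
        np.add.at(out, rows, v * M_csc.data[start:end])
    return out
```

The loop runs Na times and the work is proportional to the number of stored entries in the selected columns, plus the O(N) cost of allocating the dense result.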

Linear dependent rows: Huge Sparse Matrix

I have a huge sparse matrix A
<5000x5000 sparse matrix of type '<type 'numpy.float64'>'
with 14979 stored elements in Compressed Sparse Column format>
from which I need to delete linearly dependent rows. I know in advance that j rows will be dependent. I need to:
- find out which sets of rows are linearly dependent
- for each set, keep one arbitrary row and remove the others
I was trying to follow this question, but the corresponding method for sparse matrices, scipy.sparse.linalg.eigs says that
k: The number of eigenvalues and eigenvectors desired. k must be smaller than N. It is not possible to compute all eigenvectors of a
matrix.
How should I proceed?
scipy.sparse.linalg.eigs uses implicitly restarted Arnoldi iteration. The algorithm is meant for finding a few eigenvectors quickly, and can't find all of them.
5000x5000, however, is not that large. Have you considered just using numpy.linalg.eig or scipy.linalg.eig? It will probably take a few minutes, but it isn't completely infeasible. You don't gain anything by using a sparse matrix, but I'm not sure there's an algorithm for efficiently finding all eigenvectors of a sparse matrix.
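As an alternative to the eigenvalue route, here is a hedged sketch that uses a different technique: a dense QR factorization with column pivoting applied to the transpose, which picks out a maximal set of linearly independent rows. It does not report which rows are dependent on which, it relies on a rank tolerance chosen here as an assumption, and the function name independent_row_indices is made up for illustration.

```python
import numpy as np
from scipy import linalg, sparse

def independent_row_indices(A, tol=None):
    """Return sorted indices of a maximal set of linearly independent rows of A."""
    dense = np.asarray(A.toarray() if sparse.issparse(A) else A, dtype=float)
    # Columns of dense.T are the rows of A; pivoting orders them by importance.
    _, R, piv = linalg.qr(dense.T, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    if tol is None:
        # Assumed tolerance, similar in spirit to a numerical rank test.
        tol = max(dense.shape) * np.finfo(float).eps * (diag[0] if diag.size else 0.0)
    rank = int(np.sum(diag > tol))
    return np.sort(piv[:rank])
```

Keeping A[independent_row_indices(A)] then drops the redundant rows. Like the dense eigenvalue suggestion, this needs the full 5000x5000 array in memory.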

How to calculate (1 - SparseMatrix) of a huge sparse matrix?

I researched this a lot but couldn't find a practical solution. I am using scipy to create a CSR sparse matrix and want to subtract this matrix from an equivalent matrix of all ones. In scipy and numpy notation, if the matrix is not sparse, we can do so by simply writing 1 - MatrixVariable. However, this operation is not implemented if the matrix is sparse. I could only think of the following obvious solution:
Iterate through the entire sparse matrix, set all zero elements to 1 and all non-zero elements to 0.
But this would create a matrix where most elements are 1 and only a few are 0, which is no longer sparse and, due to its huge size, could not be converted to dense.
What could be an alternative and effective way of doing this?
Thanks.
Your new matrix will not be sparse, because it will have 1s everywhere, so you will need a dense array to hold it:
new_mat = np.ones(sps_mat.shape, sps_mat.dtype) - sps_mat.todense()
This requires that your matrix fits in memory. In fact, the expression above holds up to three dense arrays of that size at once (the ones, the dense copy, and the result). If that is an issue, you can make it more memory-efficient by modifying the dense copy in place:
new_mat = sps_mat.todense()
new_mat *= -1
new_mat += 1
You can access the data from your sparse matrix as a 1D array so that:
ss.data *= -1
ss.data += 1
will work like 1 - ss, for all non-zero elements in your sparse matrix.
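To make the difference between the two suggestions concrete, here is a small sketch on a made-up 2x2 matrix. The first part builds the full dense complement in a single buffer; the second part only transforms the stored entries and leaves the implicit zeros as zeros.

```python
import numpy as np
from scipy import sparse

sps_mat = sparse.csr_matrix(np.array([[0.0, 0.25], [0.5, 0.0]]))

# Dense complement, computed in place in one buffer.
dense = sps_mat.toarray()
np.subtract(1, dense, out=dense)

# Stored entries only: implicit zeros are not touched.
ss = sps_mat.copy()
ss.data *= -1
ss.data += 1
```

Here dense holds 1 - sps_mat everywhere, while ss still has zeros where sps_mat had none stored, so the second form only helps if the algorithm can treat those zeros as implicit ones.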

How to define a (n, 0) sparse matrix in scipy or how to assemble a sparse matrix column wise?

I have a loop that in each iteration gives me a column c of a sparse matrix N.
To assemble/grow/accumulate N column by column I thought of using
N = scipy.sparse.hstack([N, c])
To do this it would be nice to initialize the matrix with rows of length 0. However,
N = scipy.sparse.csc_matrix((4,0))
raises a ValueError: invalid shape.
Any suggestions, how to do this right?
You can't. Sparse matrices are restricted compared to NumPy arrays and in particular don't allow 0 for any axis. All sparse matrix constructors check for this, so if and when you do manage to build such a matrix, you're exploiting a SciPy bug and your script is likely to break when you upgrade SciPy.
That being said, I don't see why you'd need an n × 0 sparse matrix since an n × 0 NumPy array is allowed and takes practically no storage space.
Turns out sparse.hstack cannot handle a NumPy array with a zero axis, so disregard my previous comment. However, what I think you should do is collect all the columns in a list, then hstack them in one call. That's better than your loop since append'ing to a list takes amortized constant time, while hstack takes linear time. So your proposed algorithm takes quadratic time while it could be linear.
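The list-then-stack suggestion could look roughly like this. The column contents are invented for the example, and assemble_columns is a hypothetical helper name; each column is assumed to be an n x 1 sparse matrix.

```python
import numpy as np
from scipy import sparse

def assemble_columns(columns):
    """Stack a list of n x 1 sparse columns into one CSC matrix in a single call."""
    return sparse.hstack(columns, format="csc")

columns = []
for i in range(3):
    # Stand-in for the loop that produces one sparse column per iteration.
    c = sparse.csc_matrix(np.array([[float(i)], [0.0], [0.0], [1.0]]))
    columns.append(c)              # amortized constant time per append

N = assemble_columns(columns)      # one linear-time stack at the end
```

This also sidesteps the need for an empty starting matrix, since nothing is stacked until at least one column exists.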
You must use at least 1 in your shape.
N = scipy.sparse.csc_matrix((4,1))
Which you can stack:
print(scipy.sparse.hstack((N, N)))
#<4x2 sparse matrix of type '<type 'numpy.float64'>'
# with 0 stored elements in COOrdinate format>

Load sparse scipy matrix into existing numpy dense matrix

Say I have a huge numpy matrix A taking up tens of gigabytes. It takes a non-negligible amount of time to allocate this memory.
Let's say I also have a collection of scipy sparse matrices with the same dimensions as the numpy matrix. Sometimes I want to convert one of these sparse matrices into a dense matrix to perform some vectorized operations.
Can I load one of these sparse matrices into A rather than re-allocate space each time I want to convert a sparse matrix into a dense matrix? The .toarray() method which is available on scipy sparse matrices does not seem to take an optional dense array argument, but maybe there is some other way to do this.
If the sparse matrix is in the COO format:
def assign_coo_to_dense(sparse, dense):
    dense[sparse.row, sparse.col] = sparse.data
If it is in the CSR format:
def assign_csr_to_dense(sparse, dense):
    # Expand indptr into one row index per stored entry.
    rows = np.repeat(np.arange(sparse.shape[0]), np.diff(sparse.indptr))
    dense[rows, sparse.indices] = sparse.data
To be safe, you might want to add the following lines to the beginning of each of the functions above:
assert sparse.shape == dense.shape
dense[:] = 0
It does seem like there should be a better way to do this (and I haven't scoured the documentation), but you could always loop over the elements of the sparse array and assign to the dense array (probably zeroing out the dense array first). If this ends up too slow, that seems like an easy C extension to write....
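For what it is worth, recent SciPy releases do accept an out argument on toarray(), which writes into an existing buffer as long as its shape and dtype match the sparse matrix. A minimal sketch, with a small made-up matrix standing in for the huge one:

```python
import numpy as np
from scipy import sparse

# Preallocated dense buffer, deliberately filled with stale values.
A = np.full((3, 4), 7.0)

sp = sparse.csr_matrix(np.array([[0.0, 1.5, 0.0, 0.0],
                                 [0.0, 0.0, 0.0, 2.0],
                                 [3.0, 0.0, 0.0, 0.0]]))

# Write the dense form of sp into A instead of allocating a new array.
result = sp.toarray(out=A)
```

If the installed SciPy version does not support out, the COO and CSR assignment functions from the answer above remain a workable fallback.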
