Fast column access over large scipy sparse matrix - python

I am working with scipy's csc sparse matrix, and currently a major bottleneck in the code is a line similar to the following:
for i in range(multiply_cols.shape[0]):
    F = F - factor*values[i]*mat.getcol(multiply_cols[i])
The matrices that I am working with are extremely large, typically more than 10**6 x 10**6, and I don't want to convert them to a dense matrix. In fact I have a restriction to always keep the matrix in csc format. My attempts show that converting to coo_matrix or lil_matrix also does not pay off.
Here are my rudimentary attempts using csc, csr, coo and lil:
import numpy as np
from scipy.sparse import csc_matrix

n = 1000
sA = csc_matrix(np.random.rand(n,n))
F = np.random.rand(n,1)
multiply_cols = np.unique(np.random.randint(0,int(0.6*n),size=n))
values = np.random.rand(multiply_cols.shape[0])
def foo1(mat,F,values,multiply_cols):
    factor = 0.75
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])

def foo2(mat,F,values,multiply_cols):
    factor = 0.75
    mat = mat.tocsr()
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])

def foo3(mat,F,values,multiply_cols):
    factor = 0.75
    mat = mat.tocoo()
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])

def foo4(mat,F,values,multiply_cols):
    factor = 0.75
    mat = mat.tolil()
    for i in range(multiply_cols.shape[0]):
        F = F - factor*values[i]*mat.getcol(multiply_cols[i])
and timing them I get:
In [41]: %timeit foo1(sA,F,values,multiply_cols)
10 loops, best of 3: 133 ms per loop
In [42]: %timeit foo2(sA,F,values,multiply_cols)
1 loop, best of 3: 999 ms per loop
In [43]: %timeit foo3(sA,F,values,multiply_cols)
1 loop, best of 3: 6.38 s per loop
In [44]: %timeit foo4(sA,F,values,multiply_cols)
1 loop, best of 3: 45.1 s per loop
So coo_matrix and lil_matrix are certainly not a good choice here. Does anyone know a faster way of doing this? Is it a good option to retrieve the underlying indptr, indices and data and write a custom Cython solution?
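For reference, a direct walk over the raw csc arrays - a minimal, untimed sketch of that indptr/indices idea in pure NumPy, assuming the matrix is in canonical form (no duplicate entries within a column) - might look like this:

def foo_raw(mat, F, values, multiply_cols, factor=0.75):
    # column j of a csc matrix lives in mat.data[mat.indptr[j]:mat.indptr[j+1]],
    # with its row positions in the matching slice of mat.indices
    out = F.copy().ravel()
    for v, j in zip(values, multiply_cols):
        start, stop = mat.indptr[j], mat.indptr[j + 1]
        out[mat.indices[start:stop]] -= factor * v * mat.data[start:stop]
    return out[:, None]

This does the same arithmetic as foo1 while skipping the per-call overhead of getcol, and would be a natural starting point for a Cython version.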

I found in
Sparse matrix slicing using list of int
that column (or row) indexing of sparse matrices is essentially a matrix multiplication task - construct a sparse matrix with the right mix of 1s and 0s, and multiply. Row (and column) sums are also done with multiplication.
This function implements that idea. M is a 1-column sparse matrix, with values in the multiply_cols slots:
def wghtsum(sA, values, multiply_cols):
    cols = np.zeros_like(multiply_cols)
    M = sparse.csc_matrix((values, (multiply_cols, cols)), shape=(sA.shape[1], 1))
    return (sA*M).A
testing:
In [794]: F1=wghtsum(sA,values,multiply_cols)
In [800]: F2=(sA[:,multiply_cols]*values)[:,None] # Divakar's
In [802]: np.allclose(F1,F2)
Out[802]: True
It has a modest time savings over @Divakar's solution:
In [803]: timeit F2=(sA[:,multiply_cols]*values)[:,None]
100 loops, best of 3: 18.3 ms per loop
In [804]: timeit F1=wghtsum(sA,values,multiply_cols)
100 loops, best of 3: 6.57 ms per loop
=======
sA as created is dense - it's a sparse rendition of a dense random array. sparse.rand can be used to create a sparse random matrix with a defined level of sparsity.
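For example, something like:

sA = sparse.rand(1000, 1000, density=0.01, format='csc')  # ~1% nonzero entries

(The density and format here are illustrative choices.)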
In testing your foo1 I had a problem with getcol:
In [818]: sA.getcol(multiply_cols[0])
...
TypeError: an integer is required
In [819]: sA.getcol(multiply_cols[0].item())
Out[819]:
<1000x1 sparse matrix of type '<class 'numpy.float64'>'
with 1000 stored elements in Compressed Sparse Column format>
In [822]: sA[:,multiply_cols[0]]
Out[822]:
<1000x1 sparse matrix of type '<class 'numpy.float64'>'
with 1000 stored elements in Compressed Sparse Column format>
I suspect that's caused by a scipy version difference.
In [821]: scipy.__version__
Out[821]: '0.17.0'
This issue went away in 0.18, but I can't find the relevant issue/pull request.

Well, you could use a vectorized approach that matrix-multiplies the sliced-out columns of the sparse matrix against values, like so -
F -= (mat[:,multiply_cols]*values*factor)[:,None]
Benchmarking
It seems foo1 is the fastest of the lot listed in the question. So, let's time the proposed approach against that one.
Function definitions -
def foo1(mat,F,values,multiply_cols):
    factor = 0.75
    outF = F.copy()
    for i in range(multiply_cols.shape[0]):
        outF -= factor*values[i]*mat.getcol(multiply_cols[i])
    return outF

def foo_vectorized(mat,F,values,multiply_cols):
    factor = 0.75
    return F - (mat[:,multiply_cols]*values*factor)[:,None]
Timings and verification on a bigger set with sparseness -
In [242]: # Setup inputs
...: n = 3000
...: mat = csc_matrix(np.random.randint(0,3,(n,n))) #Sparseness with 0s
...: F = np.random.rand(n,1)
...: multiply_cols = np.unique(np.random.randint(0,int(0.6*n),size=n))
...: values = np.random.rand(multiply_cols.shape[0])
...:
In [243]: out1 = foo1(mat,F,values,multiply_cols)
In [244]: out2 = foo_vectorized(mat,F,values,multiply_cols)
In [245]: np.allclose(out1, out2)
Out[245]: True
In [246]: %timeit foo1(mat,F,values,multiply_cols)
1 loops, best of 3: 641 ms per loop
In [247]: %timeit foo_vectorized(mat,F,values,multiply_cols)
10 loops, best of 3: 40.3 ms per loop
In [248]: 641/40.3
Out[248]: 15.905707196029779
There we have a 15x+ speedup!

Related

Inserting null columns into a scipy sparse matrix in a specific order

I have a sparse matrix with M rows and N columns, to which I want to concatenate K additional null columns, so my object will then have M rows and (N+K) columns. The tricky part is that I also have a list of indices of length N, with values ranging from 0 to N+K, that indicates the position every column should have in the new matrix.
So for example, if N = 2, K = 1 and the list of indices is [2, 0], it means that I want to take the last column of my MxN matrix to be the first one, then introduce a null column, and then put my first column as the last one.
I'm trying to use the following code (in reality I already have x, but I can't upload it here):
import numpy as np
from scipy import sparse
M = 5000
N = 10
pad_factor = 1.2
size = int(pad_factor * N)
x = sparse.random(m = M, n = N, density = 0.1, dtype = 'float64')
indeces = np.random.choice(range(size), size=N, replace=False)
null_mat = sparse.lil_matrix((M, size))
null_mat[:, indeces] = x
The problem is that for N = 1,500,000, M = 5,000 and K = 200 this code won't scale - it gives me a memory error. The exact error is:
"return np.zeros(self.shape, dtype=self.dtype, order=order) MemoryError"
I just want to add some null columns, so I guess my slicing idea is inefficient, especially as K << N in my real data. In a way we can think of this as a merge-sort-like problem - I have a non-null and a null dataset and I want to concatenate them, but in a specific order. Any ideas on how to make it work?
Thanks!
As I deduced in the comments, the memory error is produced in the
null_mat[:, indeces] = x
line, because the lil __setitem__ method does x.toarray() - that is, it first converts x to a dense array. Mapping the sparse matrix onto the lil index directly might be more space efficient, but a lot more work to code. And lil is optimized for iterative assignment, not this kind of large-scale matrix mapping.
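To put a number on it: for the large case, x.toarray() asks np.zeros for a 5,000 x 1,500,000 float64 array - roughly 5000 * 1500000 * 8 bytes, or about 60 GB - which is where the MemoryError comes from.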
sparse.hstack uses sparse.bmat to join sparse matrices. bmat converts all inputs to coo, combines their attributes into a new set, and builds the new matrix from those.
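So the padding itself can be done cheaply with hstack; a sketch (this still leaves the column reordering to be done separately, e.g. with the coo construction below):

pad = sparse.coo_matrix((M, size - N))   # K all-zero columns, nothing stored
wide = sparse.hstack([x, pad])           # M x (N+K), columns still in the old order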
direct coo matrix construction
After quite a bit of playing around, I found that the following simple operation works:
In [479]: z1=sparse.coo_matrix((x.data, (x.row, indeces[x.col])),shape=(M,size))
In [480]: z1
Out[480]:
<5000x12 sparse matrix of type '<class 'numpy.float64'>'
with 5000 stored elements in COOrdinate format>
Compare this with the x and null_mat:
In [481]: x
Out[481]:
<5000x10 sparse matrix of type '<class 'numpy.float64'>'
with 5000 stored elements in COOrdinate format>
In [482]: null_mat
Out[482]:
<5000x12 sparse matrix of type '<class 'numpy.float64'>'
with 5000 stored elements in LInked List format>
Testing the equality of sparse matrices can be tricky. coo values in particular can occur in any order - as in x, which was produced by sparse.random.
But the csr format orders the rows, so this comparison of the indptr attributes is a pretty good equality test:
In [483]: np.allclose(null_mat.tocsr().indptr, z1.tocsr().indptr)
Out[483]: True
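For a stricter check, one can canonicalize both matrices and compare all three underlying arrays - a sketch:

def sparse_equal(a, b):
    # canonical csr form: duplicates summed, indices sorted
    a, b = a.tocsr(), b.tocsr()
    a.sum_duplicates(); a.sort_indices()
    b.sum_duplicates(); b.sort_indices()
    return (a.shape == b.shape
            and np.array_equal(a.indptr, b.indptr)
            and np.array_equal(a.indices, b.indices)
            and np.allclose(a.data, b.data))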
A time test:
In [477]: timeit z1=sparse.coo_matrix((x.data, (x.row, indeces[x.col])),shape=(M,size))
108 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [478]:
In [478]: timeit null_mat[:, indeces] = x
3.05 ms ± 4.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
matrix multiplication approach
csr format indexing with lists is done with matrix multiplication: it constructs an extractor matrix and applies that. Matrix multiplication is a csr_matrix strong point.
We can perform the reordering in the same way:
In [489]: I = sparse.csr_matrix((np.ones(10),(np.arange(10),indeces)), shape=(10,12))
In [490]: I
Out[490]:
<10x12 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
In [496]: w1=x*I
Comparing the dense equivalents of these matrices:
In [497]: np.allclose(null_mat.A, z1.A)
Out[497]: True
In [498]: np.allclose(null_mat.A, w1.A)
Out[498]: True
In [499]: %%timeit
     ...: I = sparse.csr_matrix((np.ones(10),(np.arange(10),indeces)), shape=(10,12))
     ...: w1 = x*I
1.11 ms ± 5.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
That's better than the lil indexing approach, though still much slower than the direct coo matrix construction. Though to be fair, we should then construct a csr matrix from the coo-style inputs; that conversion takes some time:
In [502]: timeit z2=sparse.csr_matrix((x.data, (x.row, indeces[x.col])),shape=(M,size))
639 µs ± 604 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
error traceback
The MemoryError traceback should have revealed that the error occurred in this indexed assignment, and that the relevant method calls are:
Signature: null_mat.__setitem__(index, x)
Source:
def __setitem__(self, index, x):
    ....
    if isspmatrix(x):
        x = x.toarray()
    ...

Signature: x.toarray(order=None, out=None)
Source:
def toarray(self, order=None, out=None):
    """See the docstring for `spmatrix.toarray`."""
    B = self._process_toarray_args(order, out)

Signature: x._process_toarray_args(order, out)
Source:
def _process_toarray_args(self, order, out):
    ...
    return np.zeros(self.shape, dtype=self.dtype, order=order)
I found this by doing a code search on the scipy github for the np.zeros calls.

how to speed up the computation?

I need to do about 1 million * 1 million computations to fill a sparse matrix. But when I use loops to fill the matrix line by line, I find it takes 6 minutes to do just 100*100 of the computations, so the full task can't be solved that way. Is there some way to speed up the process?
import numpy as np
from scipy.sparse import lil_matrix
import pandas as pd

tp = pd.read_csv('F:\\SogouDownload\\train.csv', iterator=True, chunksize=1000)
data = pd.concat(tp, ignore_index=True)
matrix = lil_matrix((1862220, 1862220))
for i in range(1, 1862220):
    for j in range(1, 1862220):
        matrix[i-1, j-1] = np.sum(data[data['source_node']==i].destination_node.isin(data[data['source_node']==j].destination_node))
While not the fastest way of constructing a sparse matrix, this isn't horribly slow either, at least not the lil assignment step:
In [204]: N=100
In [205]: M=sparse.lil_matrix((N,N))
In [206]: for i in range(N):
     ...:     for j in range(N):
     ...:         M[i,j]=(i==j)
In [207]: M
Out[207]:
<100x100 sparse matrix of type '<class 'numpy.float64'>'
with 100 stored elements in LInked List format>
It saved just the nonzero values to M, and I barely saw a delay during the loop.
So my guess is that most of the time is spent in the pandas indexing expression:
np.sum(data[data['source_node']==i].destination_node.isin(data[data['source_node']==j].destination_node))
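That expression re-filters the whole frame twice for every (i, j) pair, so its cost scales with the full table per cell. A hedged sketch of precomputing each source node's destination set once (the names are illustrative):

dest_sets = data.groupby('source_node')['destination_node'].apply(set)
# each entry is then just a set-intersection size:
# matrix[i-1, j-1] = len(dest_sets[i] & dest_sets[j])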
Converting data, often textual, into sparse cooccurrence-count matrices comes up often. They are used in learning code, pattern searches, etc. scikit-learn is often used, as is tensorflow.
For N=1000:
In [212]: %%timeit
     ...: M=sparse.lil_matrix((N,N))
     ...: for i in range(N):
     ...:     for j in range(N):
     ...:         M[i,j]=(i==j)
     ...:
1 loop, best of 3: 7.31 s per loop
Iteratively assigning these values to a dense array is faster, even if we include the conversion to sparse at the end.
In [213]: %%timeit
     ...: M=np.zeros((N,N))
     ...: for i in range(N):
     ...:     for j in range(N):
     ...:         M[i,j]=(i==j)
     ...:
1 loop, best of 3: 353 ms per loop
In [214]: %%timeit
     ...: M=np.zeros((N,N))
     ...: for i in range(N):
     ...:     for j in range(N):
     ...:         M[i,j]=(i==j)
     ...: M = sparse.lil_matrix(M)
     ...:
1 loop, best of 3: 353 ms per loop
But for the very large case, creating that intermediate dense array might hit memory problems.
The technique to use here is sparse matrix multiplication. But for that technique you first need a binary matrix mapping source nodes to destination nodes (the node labels will be the indices of the nonzero entries).
from scipy.sparse import csr_matrix
I = data['source_node'] - 1
J = data['destination_node'] - 1
values = np.ones(len(data), int)
shape = (np.max(I) + 1, np.max(J) + 1)
mapping = csr_matrix((values, (I, J)), shape)
The technique itself is simply a matrix multiplication of this matrix with its transpose (see also this question).
cooccurrence = mapping.dot(mapping.T)
The only potential problem is that the resulting matrix may not be sparse and could consume all your RAM.
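A quick sanity check before densifying anything, using the stored-entry count:

# fraction of stored entries; near 1.0 means the result is effectively dense
density = cooccurrence.nnz / (cooccurrence.shape[0] * cooccurrence.shape[1])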

Efficiently compute columnwise sum of sparse array where every non-zero element is 1

I have a bunch of data in SciPy compressed sparse row (CSR) format. Of course the majority of elements are zero, and I further know that all non-zero elements have the value 1. I want to compute sums over different subsets of rows of my matrix. At the moment I am doing the following:
import numpy as np
import scipy as sp
import scipy.sparse
# create some data with sparsely distributed ones
data = np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05))
data = sp.sparse.csr_matrix(data, dtype='int8')
# generate column-wise sums over random subsets of rows
nrand = 1000
for k in range(nrand):
    inds = np.random.choice(data.shape[0], size=100, replace=False)
    # 60% of time is spent here
    extracted_rows = data[inds]
    # 20% of time is spent here
    row_sum = extracted_rows.sum(axis=0)
The last few lines there are the bottleneck in a larger computational pipeline. As I annotated in the code, 60% of the time is spent slicing the data with the random indices, and 20% is spent computing the actual sum.
It seems to me I should be able to use my knowledge of the data in the array (i.e., any non-zero value in the sparse matrix will be 1; no other values are present) to compute these sums more efficiently. Unfortunately, I cannot figure out how. Dealing with just data.indices, perhaps? I have tried other sparsity structures (e.g. a CSC matrix), as well as converting to a dense array first, but those approaches were all slower than the CSR matrix approach.
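For what it's worth, one way to exploit the all-ones structure directly is to count the column indices of the selected rows - a minimal, untimed sketch of the data.indices idea:

def colsum_ones(csr, inds):
    # for an all-ones csr matrix, the columnwise sum of the selected rows
    # is just a count of how often each column index occurs in those rows
    idx = np.concatenate([csr.indices[csr.indptr[i]:csr.indptr[i+1]]
                          for i in inds])
    return np.bincount(idx, minlength=csr.shape[1])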
It is well known that indexing of sparse matrices is relatively slow, and there have been SO questions about getting around that by accessing the data attributes directly.
But first some timings. Using data and inds as you show, I get:
In [23]: datad=data.A # times at 3.76 ms per loop
In [24]: timeit row_sumd=datad[inds].sum(axis=0)
1000 loops, best of 3: 529 µs per loop
In [25]: timeit row_sum=data[inds].sum(axis=0)
1000 loops, best of 3: 890 µs per loop
In [26]: timeit d=datad[inds]
10000 loops, best of 3: 55.9 µs per loop
In [27]: timeit d=data[inds]
1000 loops, best of 3: 617 µs per loop
The sparse version is slower than the dense one, but not by a lot. The sparse indexing is much slower, but its sum is somewhat faster.
The sparse sum is done with a matrix product:
def sparse.spmatrix.sum
    ....
    return np.asmatrix(np.ones((1, m), dtype=res_dtype)) * self
That suggests a faster way - turn inds into an appropriate array of 1s and multiply.
In [49]: %%timeit
....: b=np.zeros((1,data.shape[0]),'int8')
....: b[:,inds]=1
....: rowmul=b*data
....:
1000 loops, best of 3: 587 µs per loop
That makes the sparse operation about as fast as the equivalent dense one. (But converting to dense first is much slower.)
==================
The last time test is missing the np.asmatrix that is present in the sparse sum. But the times are similar, and the results are the same:
In [232]: timeit b=np.zeros((1,data.shape[0]),'int8'); b[:,inds]=1; x1=np.asmatrix(b)*data
1000 loops, best of 3: 661 µs per loop
In [233]: timeit b=np.zeros((1,data.shape[0]),'int8'); b[:,inds]=1; x2=b*data
1000 loops, best of 3: 605 µs per loop
One produces a matrix, the other an array. But both are doing a matrix product - b's 2nd dimension against data's 1st. Even though b is an array, the task is actually delegated to data and its matrix product, in a not-so-transparent way.
In [234]: x1
Out[234]: matrix([[9, 9, 5, ..., 9, 5, 3]], dtype=int8)
In [235]: x2
Out[235]: array([[9, 9, 5, ..., 9, 5, 3]], dtype=int8)
b*data.A is elementwise multiplication and raises an error; np.dot(b, data.A) works but is slower.
Newer numpy/python has a matmul operator. I see the same time pattern:
In [280]: timeit b@datad     # dense product
100 loops, best of 3: 2.64 ms per loop
In [281]: timeit b@data.A    # slower due to the `.A` conversion
100 loops, best of 3: 6.44 ms per loop
In [282]: timeit b@data      # sparse product
1000 loops, best of 3: 571 µs per loop
np.dot may also delegate action to sparse, though you have to be careful. I just hung my machine with np.dot(csr_matrix(b),data.A).
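A safer pattern is to keep the product inside scipy.sparse - a sketch, assuming b has shape (1, data.shape[0]) as above:

x3 = csr_matrix(b).dot(data)   # sparse times sparse stays sparse
x3 = x3.toarray()              # densify the small 1 x ncols result if needed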
Here's a vectorized approach that converts data to a dense array, and also gets all those inds in a vectorized manner using an argpartition-based method -
# Number of selections as a parameter
n = 100
# Get inds across all iterations in a vectorized manner as a 2D array.
inds2D = np.random.rand(nrand,data.shape[0]).argpartition(n)[:,:n]
# Index into data with those 2D array indices. Then, convert to dense NumPy array,
# reshape and sum reduce to get the final output
out = np.array(data.todense())[inds2D.ravel()].reshape(nrand,n,-1).sum(1)
Runtime test -
1) Function definitions:

def org_app(nrand,n):
    out = np.zeros((nrand,data.shape[1]),dtype=int)
    for k in range(nrand):
        inds = np.random.choice(data.shape[0], size=n, replace=False)
        extracted_rows = data[inds]
        out[k] = extracted_rows.sum(axis=0)
    return out

def vectorized_app(nrand,n):
    inds2D = np.random.rand(nrand,data.shape[0]).argpartition(n)[:,:n]
    return np.array(data.todense())[inds2D.ravel()].reshape(nrand,n,-1).sum(1)
Timings :
In [205]: # create some data with sparsely distributed ones
...: data = np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05))
...: data = sp.sparse.csr_matrix(data, dtype='int8')
...:
...: # generate column-wise sums over random subsets of rows
...: nrand = 1000
...: n = 100
...:
In [206]: %timeit org_app(nrand,n)
1 loops, best of 3: 1.38 s per loop
In [207]: %timeit vectorized_app(nrand,n)
1 loops, best of 3: 826 ms per loop

Csr_matrix.dot vs. Numpy.dot

I have a large (n=50000) block-diagonal csr_matrix M representing the adjacency matrices of a set of graphs. I have to multiply M by a dense numpy.array v several times. Hence I use M.dot(v).
Surprisingly, I have discovered that first converting M to a numpy.array and then using numpy.dot is much faster.
Any ideas why this is the case?
I don't have enough memory to hold a 50000x50000 dense matrix and multiply it by a 50000 vector, but here are some tests at lower dimensionality.
Setup:
import numpy as np
from scipy.sparse import csr_matrix
def make_csr(n, N):
    rows = np.random.choice(N, n)
    cols = np.random.choice(N, n)
    data = np.ones(n)
    return csr_matrix((data, (rows, cols)), shape=(N,N), dtype=np.float32)
The code above generates sparse matrices with n non-zero elements in an NxN matrix.
Matrices:
N = 5000
# Sparse matrices
A = make_csr(10*10, N) # ~100 non-zero
B = make_csr(100*100, N) # ~10000 non-zero
C = make_csr(1000*1000, N) # ~1000000 non-zero
D = make_csr(5000*5000, N) # ~25000000 non-zero
E = csr_matrix(np.random.randn(N,N), dtype=np.float32) # non-sparse
# Numpy dense arrays
An = A.todense()
Bn = B.todense()
Cn = C.todense()
Dn = D.todense()
En = E.todense()
b = np.random.randn(N)
Timings:
>>> %timeit A.dot(b) # 9.63 µs per loop
>>> %timeit An.dot(b) # 41.6 ms per loop
>>> %timeit B.dot(b) # 41.3 µs per loop
>>> %timeit Bn.dot(b) # 41.2 ms per loop
>>> %timeit C.dot(b) # 3.2 ms per loop
>>> %timeit Cn.dot(b) # 41.2 ms per loop
>>> %timeit D.dot(b) # 35.4 ms per loop
>>> %timeit Dn.dot(b) # 43.2 ms per loop
>>> %timeit E.dot(b) # 55.5 ms per loop
>>> %timeit En.dot(b) # 43.4 ms per loop
For highly sparse matrices (A and B) it is more than 1000x faster.
For a not very sparse matrix (C), it still gets a 10x speedup.
For an almost non-sparse matrix (D will have some zeros due to repetition in the indices, but not many, probabilistically speaking), it is still faster - not by much, but faster.
For a truly non-sparse matrix (E), the operation is slower, but not much slower.
Conclusion: the speedup you get depends on the sparsity of your matrix, but with N = 5000 sparse matrices are always faster (as long as they have some zero entries).
I can't try it for N = 50000 due to memory issues. You can run the above code and see what it's like for you with that N.

Diagonal sparse matrix obtained from a sparse coo_matrix

I built some sparse matrix M in Python using the coo_matrix format. I would like to find an efficient way to compute:
A = M + M.T - D
where D is the restriction of M to its diagonal (M is potentially very large). I can't find a way to efficiently build D while keeping a coo_matrix format. Any ideas?
Could D = scipy.sparse.spdiags(coo_matrix.diagonal(M),0,M.shape[0],M.shape[0]) be a solution?
I have come up with a faster coo diagonal:
msk = M.row==M.col
D1 = sparse.coo_matrix((M.data[msk],(M.row[msk],M.col[msk])),shape=M.shape)
sparse.tril uses this method, with mask = A.row + k >= A.col (see sparse/extract.py).
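The same masking idea gives, for example, the strict upper triangle - a sketch:

msk = M.row < M.col
U = sparse.coo_matrix((M.data[msk], (M.row[msk], M.col[msk])), shape=M.shape)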
Some timings for a (100,100) M (and M1 = M.tocsr()):
In [303]: timeit msk=M.row==M.col; D1=sparse.coo_matrix((M.data[msk],(M.row[msk],M.col[msk])),shape=M.shape)
10000 loops, best of 3: 115 µs per loop
In [305]: timeit D=sparse.diags(M.diagonal(),0)
1000 loops, best of 3: 358 µs per loop
So the coo way of getting the diagonal is fast, at least for this small and very sparse matrix (with only one stored entry on the diagonal).
If I start with the csr form, diags is faster. That's because .diagonal works in the csr format:
In [306]: timeit D=sparse.diags(M1.diagonal(),0)
10000 loops, best of 3: 176 µs per loop
But creating D is a small part of the overall calculation. Again, working with M1 is faster; the sum is done in csr format.
In [307]: timeit M+M.T-D
1000 loops, best of 3: 1.35 ms per loop
In [308]: timeit M1+M1.T-D
1000 loops, best of 3: 1.11 ms per loop
Another way to do the whole thing is to take advantage of the fact that coo allows duplicate i,j values, which are summed when the matrix is converted to csr format. So you could stack the row, col, data arrays for M with those for M.T (see M.transpose for how those are constructed), along with negated masked values for D (or, equivalently, the masked diagonal entries can simply be left out of the M.T part).
For example:
def MplusMT(M):
    msk = M.row != M.col
    data = np.concatenate([M.data, M.data[msk]])
    rows = np.concatenate([M.row, M.col[msk]])
    cols = np.concatenate([M.col, M.row[msk]])
    MM = sparse.coo_matrix((data, (rows, cols)), shape=M.shape)
    return MM

# alt version with a more explicit D:
# msk = M.row == M.col
# data = np.concatenate([M.data, M.data, -M.data[msk]])
MplusMT as written is very fast because it is just doing array concatenation, not summation. To get the summation we have to convert it to a csr matrix:
MplusMT(M).tocsr()
which takes considerably longer. Still, in my limited testing, this approach is more than 2x faster than M+M.T-D. So it's a potential tool for constructing complex sparse matrices.
You probably want
from scipy.sparse import diags
D = diags(M.diagonal(), 0, format='coo')
This will still build an M-sized 1d array as an intermediate step, but that will probably not be too bad.
