coo_matrix without concatenate

I have a number of indices and values that make up a scipy.sparse.coo_matrix. The indices/values are generated by different subroutines and are concatenated before being handed over to the matrix constructor:
import numpy
from scipy import sparse
n = 100000
I0 = range(n)
J0 = range(n)
V0 = numpy.random.rand(n)
I1 = range(n)
J1 = range(n)
V1 = numpy.random.rand(n)
# [...]
I = numpy.concatenate([I0, I1])
J = numpy.concatenate([J0, J1])
V = numpy.concatenate([V0, V1])
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
Now, the components of (I, J, V) can be quite large, so the concatenate operations become significant. (In the above example they take over 20% of the runtime on my machine.) I've read that it's not possible to concatenate without a copy.
Is there a way to hand over the indices and values without copying the input data around first?

If you look at the code for coo_matrix.__init__ you'll see that it's pretty simple. In fact, if the (V, (I, J)) inputs are right, it will simply assign those 3 arrays to its .data, .row and .col attributes. You can even check that after creation by comparing those attributes with your variables.
If they aren't 1d arrays of the right dtype, it will massage them: make the arrays, etc. So without getting into details, processing that you do beforehand might save time in the coo call.
self.row = np.array(row, copy=copy, dtype=idx_dtype)
self.col = np.array(col, copy=copy, dtype=idx_dtype)
self.data = np.array(obj, copy=copy)
One way or another, each of those attributes will have to be a single array, not a loose list of arrays or a list of lists.
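A quick way to do the comparison mentioned above is sketched here. The index dtype (np.int32) is my assumption of what the constructor would pick for a shape this small; whether the buffers are actually reused can depend on the SciPy version, so the sketch prints the result instead of assuming it.

```python
import numpy as np
from scipy import sparse

n = 1000
# Prepare 1-D arrays up front: 32-bit integer indices and float64 values.
I = np.arange(n, dtype=np.int32)
J = np.arange(n, dtype=np.int32)
V = np.random.rand(n)

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))

# Check whether the matrix attributes reuse the input buffers.
# The outcome may differ between SciPy versions and index dtypes.
shares_data = bool(np.shares_memory(matrix.data, V))
shares_row = bool(np.shares_memory(matrix.row, I))
print("data shares memory with V:", shares_data)
print("row shares memory with I:", shares_row)
```

If either line prints False on your installation, the constructor had to convert that input, which is the "massaging" step described above.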
sparse.bmat makes a coo matrix from other ones. It collects their coo attributes, joins them by filling a preallocated empty array, and calls coo_matrix. Look at its code.
Almost all numpy operations that return a new array do so by allocating an empty array and filling it. Letting numpy do that in compiled code (with np.concatenate) should be a little faster, but details like the size and number of inputs will make a difference.
A non-canonical coo matrix is just the start. Many operations require a conversion to one of the other formats.
Efficiently construct FEM/FVM matrix
This is about sparse matrix construction where there are many duplicate points that need to be summed, and about using the csr format for calculations.
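To illustrate the point about duplicates and format conversion, here is a minimal sketch with made-up values: the coo matrix stores repeated (row, col) pairs as given, and converting to csr sums them.

```python
import numpy as np
from scipy import sparse

# Two entries land on (0, 0); the values are made up for illustration.
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 1])
vals = np.array([1.0, 2.0, 5.0])

coo = sparse.coo_matrix((vals, (rows, cols)), shape=(2, 2))
csr = coo.tocsr()  # duplicates are summed during the conversion

print(coo.nnz)        # stored entries, duplicates included
print(csr.nnz)        # stored entries after summing duplicates
print(csr.toarray())
```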

You can try pre-allocating the arrays. It'll spare you the copy at least. I didn't see any speedup for the example, but you might see a change.
import numpy
from scipy import sparse
n = 100000
# index arrays must hold integers; only the values are floating point
I = numpy.empty(2*n, dtype=numpy.int32)
J = numpy.empty_like(I)
V = numpy.empty(2*n, dtype=numpy.double)
I[:n] = range(n)
J[:n] = range(n)
V[:n] = numpy.random.rand(n)
I[n:] = range(n)
J[n:] = range(n)
V[n:] = numpy.random.rand(n)
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))

Related

scipy sparse `LinearOperator` preserves sparseness under what conditions?

I have a scipy sparse csc_matrix J with J.shape = (n, k).
Suppose d is some k-length array with no zeros.
I want to construct a LinearOperator, call it linop, where
from scipy.sparse.linalg import LinearOperator
# J: (n, k) csc_matrix
# d: some k-array
# D: assume I make a sparse diagonal matrix here of 1/d
linop = LinearOperator((n, k),
                       matvec=lambda v: J.dot(v / d),
                       rmatvec=lambda v: D.dot(J.T.dot(v)))
My question is, under what conditions does this preserve "sparsity"? Not of the result, but of the intermediate steps. (I am unsure in general what happens "under the hood" when you multiply sparse times dense.)
For example, if (v/d) is dense, is J converted to dense before the multiplication? This would be very bad for my use case. Do I need to explicitly convert the input arguments in the lambda methods to sparse before the multiplication?
Thank you in advance.
Edit: pre-computing "J / d" is not an option as I need J later, and don't have the memory to store J and J / d.
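For reference, a small runnable version of the setup described above might look like this. The matrix entries, the sizes and the contents of d are made up for illustration; the sketch only checks that the operator agrees with the dense product and that the products with dense vectors come back as plain ndarrays while J keeps its sparse format. It does not measure memory use.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator

# Made-up stand-ins for the (n, k) csc_matrix and the k-array.
J = sparse.csc_matrix(np.array([[1.0, 0.0, 0.0, 2.0],
                                [0.0, 0.0, 3.0, 0.0],
                                [0.0, 4.0, 0.0, 5.0]]))
n, k = J.shape
d = np.arange(1.0, k + 1)   # no zeros
D = sparse.diags(1.0 / d)   # sparse diagonal matrix of 1/d

linop = LinearOperator((n, k),
                       matvec=lambda v: J.dot(v / d),
                       rmatvec=lambda v: D.dot(J.T.dot(v)),
                       dtype=np.float64)

v = np.ones(k)
w = np.ones(n)
forward = linop.matvec(v)    # computes (J / d) @ v without forming J / d
adjoint = linop.rmatvec(w)   # computes (J / d).T @ w

reference = J.toarray() / d  # dense reference, feasible only at toy size
print(np.allclose(forward, reference @ v))
print(np.allclose(adjoint, reference.T @ w))
print(type(forward), sparse.issparse(J))
```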

How to use a ndarray of stored ndarrays with memmap as a big ndarray tensor

I recently started to use numpy memmap to link an array in my project, since I have a 3-dimensional tensor with a total of 133 billion values for a graph of the dataset I am using as an example.
I am trying to calculate the heat kernel signature of a 5748-node graph (the 21st of the DD dataset). My code to calculate the projectors (where I use memmap) is:
Path('D:/hks_temp').mkdir(parents=True, exist_ok=True)
for l, ll in enumerate(L):
    pl = np.zeros((n, n))
    for k in ll:
        pl += np.outer(evecs[:, k], evecs[:, k])
    fp = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='w+', shape=(n, n))
    fp[:] = pl[:]
    fp.flush()
Inside each X_hks.npy file there is an n by n ndarray (5748 * 5748 in the example).
Then I want all these computed arrays to form the 3-dimensional tensor, so I "link" (I don't know if it's the right term) them in this way:
P = np.array([None] * len(L)) # len(L) = 4043
for l in range(len(L)):
    P[l] = np.memmap('D:/hks_temp/{}_hks.npy'.format(l), dtype='float32', mode='r', shape=(n, n))
P is used later only inside a loop, to compute H = np.einsum('ijk,i->jk', P, np.exp(-unique_eval * t)).
However, that raises an error: ValueError: einstein sum subscripts string contains too many subscripts for operand 0. Since the method is correct for smaller graphs that don't require memmap, my thought was that P isn't well structured for numpy and I must rearrange the data, maybe with a reshape. So I tried P.reshape(len(L), n, n), but it doesn't work and gives ValueError: cannot reshape array of size 4043 into shape (4043,5748,5748). How can I make it work?
I already found this question but it doesn't fit this case. I think I can't store everything inside one big object, since the memmap files take up 497GB (126MB each). If I can do it, please tell me.
If it is impossible, I will reduce the use case; however, I am quite interested in making it work for all the possibilities.
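One possible direction, offered only as a sketch: 'ijk,i->jk' is a weighted sum of the n by n slices, so it can be accumulated one memmap file at a time without ever building the 3-dimensional array. The sizes, the temporary directory and the random data below are stand-ins, and `weights` plays the role of np.exp(-unique_eval * t).

```python
import tempfile
from pathlib import Path
import numpy as np

n, m = 4, 3  # toy stand-ins for n = 5748 and len(L) = 4043
rng = np.random.default_rng(0)
tmp = Path(tempfile.mkdtemp())  # stand-in for D:/hks_temp

# Write one memmap file per slice, as in the question.
dense = rng.random((m, n, n)).astype('float32')
for l in range(m):
    fp = np.memmap(tmp / '{}_hks.npy'.format(l), dtype='float32', mode='w+', shape=(n, n))
    fp[:] = dense[l]
    fp.flush()
    del fp

weights = np.exp(-rng.random(m))  # stand-in for np.exp(-unique_eval * t)

# Accumulate the weighted sum slice by slice; only one slice is mapped at a time.
H = np.zeros((n, n))
for l in range(m):
    P_l = np.memmap(tmp / '{}_hks.npy'.format(l), dtype='float32', mode='r', shape=(n, n))
    H += weights[l] * P_l
    del P_l

# At toy size the result can be compared with einsum on the in-memory tensor.
expected = np.einsum('ijk,i->jk', dense, weights)
print(np.allclose(H, expected))
```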

efficient setting 1D range values in a DataFrame (or a ndarray) with boolean array

PREREQUISITE
import numpy as np
import pandas as pd
INPUT1:boolean 2d array (a sample array as below)
x = np.array(
[[False,False,False,False,True],
[True,False,False,False,False],
[False,False,True,False,True],
[False,True,True,False,False],
[False,False,False,False,False]])
INPUT2:1D Range values (a sample as below)
y=np.array([1,2,3,4])
EXPECTED OUTPUT:2D ndarray
[[0,0,0,0,1],
[1,0,0,0,2],
[2,0,1,0,1],
[3,1,1,0,2],
[4,2,2,0,3]]
I want to set a range of values (a vertical vector) starting at each True in the 2D ndarray (INPUT1) efficiently. Are there any useful APIs or solutions for this purpose?

Unfortunately I couldn't come up with an elegant solution, so I came up with multiple inelegant ones. The two main approaches I could think of are
brute-force looping over each True value and assigning slices, and
using a single indexed assignment to replace the necessary values.
It turns out that the time complexity of these approaches is non-trivial, so depending on the size of your array either can be faster.
Using your example input:
import numpy as np
x = np.array(
[[False,False,False,False,True],
[True,False,False,False,False],
[False,False,True,False,True],
[False,True,True,False,False],
[False,False,False,False,False]])
y = np.array([1,2,3,4])
refout = np.array([[0,0,0,0,1],
[1,0,0,0,2],
[2,0,1,0,1],
[3,1,1,0,2],
[4,2,2,0,3]])
# alternative input with arbitrary size:
# N = 100; x = np.random.rand(N,N) < 0.2; y = np.arange(1,N)
def looping_clip(x, y):
    """Loop over Trues, use clipped slices"""
    nmax = x.shape[0]
    n = y.size
    # initialize output
    out = np.zeros_like(x, dtype=y.dtype)
    # loop over True values
    for i,j in zip(*x.nonzero()):
        # truncate right-hand side where necessary
        out[i:i+n, j] = y[:nmax-i]
    return out

def looping_expand(x, y):
    """Loop over Trues, use an expanded buffer"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()
    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)
    # loop over True values
    for i,j in zip(ivals, jvals):
        # slice will always be complete, i.e. of length y.size
        out[i:i+n, j] = y
    return out[:nmax, :].copy() # rather not return a view to an auxiliary array

def index_2d(x, y):
    """Assign directly with 2d indices, use an expanded buffer"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()
    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)
    # now we can safely index for each "(ivals:ivals+n, jvals)" so to speak
    upped_ivals = ivals[:,None] + np.arange(n) # shape (ntrues, n)
    upped_jvals = jvals.repeat(y.size).reshape(-1, n) # shape (ntrues, n)
    out[upped_ivals, upped_jvals] = y # right-hand side of shape (n,) broadcasts
    return out[:nmax, :].copy() # rather not return a view to an auxiliary array

def index_1d(x,y):
    """Assign using linear indices, use an expanded buffer"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()
    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)
    # grab linear indices corresponding to Trues in a buffed-up array
    inds = np.ravel_multi_index((ivals, jvals), out.shape)
    # now all we need to do is start stepping along rows for each item and assign y
    upped_inds = inds[:,None] + mmax*np.arange(n) # shape (ntrues, n)
    out.flat[upped_inds] = y # y of shape (n,) broadcasts to (ntrues, n)
    return out[:nmax, :].copy() # rather not return a view to an auxiliary array

# check that the results are correct
print(all([np.array_equal(refout, looping_clip(x,y)),
           np.array_equal(refout, looping_expand(x,y)),
           np.array_equal(refout, index_2d(x,y)),
           np.array_equal(refout, index_1d(x,y))]))
I tried to document each function, but here's a synopsis:
looping_clip loops over every True value in the input and assigns to a corresponding slice in the output. We take care on the right-hand side to shorten the assigned array for when part of the slice would go beyond the edge of the array along the first dimension.
looping_expand loops over every True value in the input and assigns to a corresponding full slice in the output after allocating a padded output array ensuring that every slice will be full. We do more work when allocating a larger output array, but we don't have to shorten the right-hand side on assignment. We could omit the .copy() call in the last step, but I prefer not to return a nontrivially strided array (i.e. a view to an auxiliary array rather than a proper copy) as this might lead to obscure surprises for the user.
index_2d computes the 2d indices of every value to be assigned to, and assumes that duplicate indices will be handled in order. This is not guaranteed! (More on this a bit later.)
index_1d does the same using linearized indices and indexing into the flatiter of the output.
Here are the timings of the above methods using random arrays (see the commented line near the start):
What we can see is that for small and large arrays the looping versions are faster, but for linear sizes between roughly 10 and 150 the indexing versions are better. The reason I didn't go to higher sizes is that the indexing cases start to use a lot of memory, and I didn't want to have to worry about this messing with timings.
Just to make the above worse, note that the indexing versions assume that duplicate indices in a fancy indexing scenario are handled in order, so when True values are handled which are "lower" in the array, previous values will be overwritten as per your requirements. There's only one problem: this is not guaranteed:
For advanced assignments, there is in general no guarantee for the iteration order. This means that if an element is set more than once, it is not possible to predict the final result.
This doesn't sound very encouraging. While in my experiments it seems that the indices are handled in order (according to C order), this can also be a coincidence or an implementation detail. So if you want to use the indexing versions, make sure that this still holds true on your specific version and for your specific dimensions and shapes.
We can make the assignment safer by getting rid of duplicate indices ourselves. For this we can make use of this answer by Divakar on a corresponding question:
def index_1d_safe(x,y):
    """Same as index_1d but use Divakar's safe solution for reducing duplicates"""
    n = y.size
    nmax,mmax = x.shape
    ivals,jvals = x.nonzero()
    # initialize buffed-up output
    out = np.zeros((nmax + max(n + ivals.max() - nmax,0), mmax), dtype=y.dtype)
    # grab linear indices corresponding to Trues in a buffed-up array
    inds = np.ravel_multi_index((ivals, jvals), out.shape)
    # now all we need to do is start stepping along rows for each item and assign y
    upped_inds = inds[:,None] + mmax*np.arange(n) # shape (ntrues, n)
    # now comes https://stackoverflow.com/a/44672126
    # need additional step: flatten upped_inds and corresponding y values for selection
    upped_flat_inds = upped_inds.ravel() # shape (ntrues, n) -> (ntrues*n,)
    y_vals = np.broadcast_to(y, upped_inds.shape).ravel() # shape (ntrues, n) -> (ntrues*n,)
    sidx = upped_flat_inds.argsort(kind='mergesort')
    sindex = upped_flat_inds[sidx]
    idx = sidx[np.r_[np.flatnonzero(sindex[1:] != sindex[:-1]), upped_flat_inds.size-1]]
    out.flat[upped_flat_inds[idx]] = y_vals[idx]
    return out[:nmax, :].copy() # rather not return a view to an auxiliary array
This still reproduces your expected output. The problem is that now the function takes much longer to finish:
Bummer. Considering how my indexing versions are only faster for an intermediate array size and how their faster versions are not guaranteed to work, perhaps it's simplest to just use one of the looping versions. This is not to say, of course, that there aren't any optimal vectorized solutions that I missed.
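To make the duplicate-removal step in index_1d_safe easier to follow, here it is in isolation on a tiny made-up index array. A stable sort groups equal indices while keeping their original order, so the last element of each group is the last occurrence of that index, which is the one that should win.

```python
import numpy as np

# Made-up target indices (with duplicates) and the values to assign.
inds = np.array([3, 1, 3, 0, 1])
vals = np.array([10, 20, 30, 40, 50])

# Stable sort: equal indices stay in their original relative order.
sidx = inds.argsort(kind='mergesort')
sindex = inds[sidx]

# Positions where the sorted index changes mark the end of each group;
# the very last position closes the final group.
last = sidx[np.r_[np.flatnonzero(sindex[1:] != sindex[:-1]), inds.size - 1]]

# Every target index now appears exactly once, so assignment order is irrelevant.
out = np.zeros(4, dtype=vals.dtype)
out[inds[last]] = vals[last]
print(out)  # [40 50  0 30]
```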

Performant creation of sparse (stiffness) matrix

I'm currently implementing a small finite element simulation using Python/Numpy, and I am looking for an efficient way to create the global stiffness matrix:
1) I think that the creation of a sparse matrix from smaller element stiffness matrices should be done using coo_matrix(). However, can I extend an existing coo_matrix, or should I create it from the final i, j and v lists?
2) Currently, I am creating the i and j lists from the smaller element stiffness matrix using list comprehensions and concatenating them. Is there a better way to create these lists?
3) Creation of the data vector: same question, are python lists preferred over numpy vectors because they are easy to extend?
4) Of course I am open to any advice :). Thank you!
Here is a small example of my current plan for the global assembly, to make clear what I intend:
import numpy as np
from scipy.sparse import coo_matrix
#2 nodes, 3 dof per node
locations = [0, 6]
nNodes = 2
dof =3
totSize = nNodes * dof
Ke = np.array([[1,1,1, 2,2,2],
[1,1,1, 2,2,2],
[1,1,1, 2,2,2],
[2,2,2, 3,3,3],
[2,2,2, 3,3,3],
[2,2,2, 3,3,3]])
I = []
J = []
#generate rowwise i and j lists:
i = [ idx + u for i in range(totSize) for idx in locations for u in range(dof) ]
j = [ idx + u for idx in locations for u in range(dof) for i in range(totSize) ]
I += i
J += j
Data = Ke.flatten()
cMatrix = coo_matrix( (Data, (i,j)), )

In this post, I will focus on the performance issues specific to the creation of the lists i and j and, finally, the matrix cMatrix.
Under those loop/list comprehensions, you are basically performing element-wise additions of locations and range(dof). Porting over to NumPy, we could leverage broadcasting there. Finally, to simulate the "for i in range(totSize)" part of those comprehensions, we could tile the final addition result with np.tile. We will use its flattened version for indexing into columns of the sparse matrix and its transposed flattened version for rows.
Thus, the implementation would look something like this -
idx0 = (np.asarray(locations)[:,None] + np.arange(dof)).ravel()
J = np.tile(idx0[:,None],totSize)
cMatrix = coo_matrix( (Data, (J.ravel('F'),J.ravel())), )
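As a sanity check, the following sketch puts the question's list comprehensions next to the broadcasting version and compares the index arrays and the resulting matrices. The names rows, cols, c_lists and c_bcast are mine; everything else is taken from the snippets above.

```python
import numpy as np
from scipy.sparse import coo_matrix

locations = [0, 6]
dof = 3
totSize = 2 * dof
Ke = np.array([[1, 1, 1, 2, 2, 2]] * 3 + [[2, 2, 2, 3, 3, 3]] * 3)
Data = Ke.flatten()

# Index lists as built in the question.
i = [idx + u for _ in range(totSize) for idx in locations for u in range(dof)]
j = [idx + u for idx in locations for u in range(dof) for _ in range(totSize)]

# Broadcasting version from the answer.
idx0 = (np.asarray(locations)[:, None] + np.arange(dof)).ravel()
J = np.tile(idx0[:, None], totSize)
rows, cols = J.ravel('F'), J.ravel()

c_lists = coo_matrix((Data, (i, j)))
c_bcast = coo_matrix((Data, (rows, cols)))
print(np.array_equal(rows, i), np.array_equal(cols, j))
print(np.array_equal(c_lists.toarray(), c_bcast.toarray()))
```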

Huge sparse matrix in python

I need to iteratively construct a huge sparse matrix in numpy/scipy. The initialization is done within a loop:
from scipy.sparse import dok_matrix, csr_matrix
def foo(*args):
    dim_x = 256*256*1024
    dim_y = 128*128*512
    matrix = dok_matrix((dim_x, dim_y))
    for i in range(dim_x):
        # compute stuff in order to get j
        matrix[i, j] = 1.
    return matrix.tocsr()
Then I need to convert it to a csr_matrix, because of further computations like:
matrix = foo(...)
result = matrix.T.dot(x)
At the beginning this was working fine. But my matrices are getting bigger and bigger and my computer starts to crash. Is there a more elegant way of storing the matrix?
Basically I have the following requirements:
The matrix needs to store float values from 0. to 1.
I need to compute the transpose of the matrix
I need to compute the dot product with an x-dimensional vector
The matrix dimensions can be around 1*10^9 x 1*10^8
I am running out of RAM. I have read several posts on Stack Overflow and the rest of the internet ;) I found PyTables, which isn't really made for matrix computations... etc. Is there a better way?

For your case I would recommend using the data type np.int8 (or np.uint8), which requires only one byte per element:
matrix = dok_matrix((dim_x, dim_y), dtype=np.int8)
Directly constructing the csr_matrix will also allow you to go further with the maximum matrix size:
import numpy as np
from scipy.sparse import csr_matrix
def foo(*args):
    dim_x = 256*256*1024
    dim_y = 128*128*512
    row = []
    col = []
    for i in range(dim_x):
        # compute stuff in order to get j
        row.append(i)
        col.append(j)
    data = np.ones_like(row, dtype=np.int8)
    return csr_matrix((data, (row, col)), shape=(dim_x, dim_y), dtype=np.int8)
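A variation on this, sketched under the assumption that the loop writes exactly one entry per row (as in the question's code): the CSR arrays can then be filled directly into preallocated numpy arrays, which avoids the Python lists and the intermediate (row, col) form. The toy sizes and the formula for j are made up; float32 is used because the question asks for float values.

```python
import numpy as np
from scipy.sparse import csr_matrix

dim_x, dim_y = 12, 5  # toy stand-ins for the real dimensions

# One column index per row, written into a preallocated array.
cols = np.empty(dim_x, dtype=np.int32)
for i in range(dim_x):
    cols[i] = (7 * i) % dim_y  # hypothetical stand-in for "compute j"

data = np.ones(dim_x, dtype=np.float32)
# With exactly one entry per row, row i occupies positions i..i+1.
indptr = np.arange(dim_x + 1, dtype=np.int32)

matrix = csr_matrix((data, cols, indptr), shape=(dim_x, dim_y))

x = np.arange(dim_x, dtype=np.float32)
result = matrix.T.dot(x)
print(result)
```

If a dimension or the number of stored entries exceeds the int32 range, the index arrays would need to be np.int64 instead.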

You may have hit the limits of what Python can do for you, or you may be able to do a little more. Try setting a datatype of np.float32; if you're on a 64-bit machine, this reduced precision may reduce your memory consumption. np.float16 may help you on memory even further, but your calculations may slow down (I've seen examples where processing may take 10x the amount of time):
matrix = dok_matrix((dim_x, dim_y), dtype=np.float32)
or possibly much slower, but even less memory consumption:
matrix = dok_matrix((dim_x, dim_y), dtype=np.float16)
Another option: buy more system memory.
Finally, if you can avoid creating your matrix with dok_matrix, and can create it instead with csr_matrix (I don't know if this is possible for your calculations) you may save a little overhead on the dict that dok_matrix uses.
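To get a feel for how much the dtype matters, here is a rough sketch that adds up the bytes held by the three arrays of a small CSR matrix. The helper csr_nbytes and the toy diagonal matrix are mine. Note that only the data array shrinks with a smaller dtype; the index arrays stay the same size.

```python
import numpy as np
from scipy import sparse

def csr_nbytes(m):
    """Bytes used by the data, indices and indptr arrays of a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

n = 1000
idx = np.arange(n)
m64 = sparse.csr_matrix((np.random.rand(n), (idx, idx)), shape=(n, n))
m32 = m64.astype(np.float32)

print("float64:", csr_nbytes(m64), "bytes")
print("float32:", csr_nbytes(m32), "bytes")
```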
