Performant creation of sparse (stiffness) matrix

Performant creation of sparse (stiffness) matrix - python

i'm currently implementing a small finite element sim. using Python/Numpy, and i am looking for an efficient way to create the global stiffness matrix:
1) I think that the creation of a sparse matrix from smaller element stiffness matrices should be done using coo_matrix(). However, can i extend an existing coo_matrix, or should i create it from the final i,j and v lists?
2) Currently, i am creating the i and j lists from the smaller element stiffness matrix using list comprehensions and concatenating them. Is there a better way to create these lists?
3) Creation of the data vector: Same question, are python lists preferred over numpy vectors due to the easy extension possibilities?
4) Of course i am open for any advices :). Thank You!
Here is a small example of my current plan to do the global assembly to make clear what i intend:
import numpy as np
from scipy.sparse import coo_matrix
#2 nodes, 3 dof per node
locations = [0, 6]
nNodes = 2
dof =3
totSize = nNodes * dof
Ke = np.array([[1,1,1, 2,2,2],
[1,1,1, 2,2,2],
[1,1,1, 2,2,2],
[2,2,2, 3,3,3],
[2,2,2, 3,3,3],
[2,2,2, 3,3,3]])
I = []
J = []
#generate rowwise i and j lists:
i = [ idx + u for i in range(totSize) for idx in locations for u in range(dof) ]
j = [ idx + u for idx in locations for u in range(dof) for i in range(totSize) ]
I += i
J += J
Data = Ke.flatten()
cMatrix = coo_matrix( (Data, (i,j)), )

In this post, I would try to focus on performance issue specific to the creation of lists i, j and finally matrix cMatrix.
Under those loop/list comprehensions, you are basically performing element-wise additions of locations and range(dof). Porting over to NumPy, we could leverage broadcasting there. Finally, to simulate for range(totSize) again in those comprehensions, we could tile the final addition result with np.tile. We will use it as its flattened version for indexing into columns of the sparse matrix and its transposed flattened version for rows.
Thus, the implementation would look something like this -
idx0 = (np.asarray(locations)[:,None] + np.arange(dof)).ravel()
J = np.tile(idx0[:,None],totSize)
cMatrix = coo_matrix( (Data, (J.ravel('F'),J.ravel())), )

Related

simplity construction of sparse (transition) matrix

I am constructing a transition matrix from a n1 x n2 x ... x nN x nN array. For concreteness let N = 3, e.g.,
import numpy as np
# example with N = 3
n1, n2, n3 = 3, 2, 5
dim = (n1, n2, n3)
arr = np.random.random_sample(dim + (n3,))
Here arr contains transition probabilities between 2 states, where the "from"-state is indexed by the first 3 dimensions, and the "to"-state is indexed by the first 2 and the last dimension. I want to construct a transition matrix, which expresses these probabilities raveled into a sparse (n1*n2*n3) x (n1*n2*n3 matrix.
To clarify, let me provide my current approach that does what I want to do. Unfortunately, it's slow and doesn't work when N and n1, n2, ... are large. So I am looking for a more efficient way of doing the same that scales better for larger problems.
My approach
import numpy as np
from scipy import sparse as sparse
## step 1: get the index correponding to each dimension of the from and to state
# ravel axes 1 to 3 into single axis and make sparse
spmat = sparse.coo_matrix(arr.reshape(np.prod(dim), -1))
data = spmat.data
row = spmat.row
col = spmat.col
# use unravel to get idx for
row_unravel = np.array(np.unravel_index(row, dim))
col_unravel = np.array(np.unravel_index(col, n3))
## step 2: combine "to" index with rows 1 and 2 of "from"-index to get "to"-coordinates in full state space
row_unravel[-1, :] = col_unravel # first 2 dimensions of state do not change
colnew = np.ravel_multi_index(row_unravel, dim) # ravel back to 1d
## step 3: assemble transition matrix
out = sparse.coo_matrix((data, (row, colnew)), shape=(np.prod(dim), np.prod(dim)))
Final thought
I will be running this code many times. Across iterations, the data of arr may change, but the dimensions will stay the same. So one thing I could do is to save and load row and colnew from a file, skipping everything between the definition of data (line 2) and final line of my code. Do you think this would be the best approach?
Edit: One problem I see with this strategy is that if some elements of arr are zero (which is possible) then the size of data will change across iterations.

One approach that beats the one posted in the OP. Not sure if it's the most efficient.
import numpy as np
from scipy import sparse
# get col and row indices
idx = np.arange(np.prod(dim))
row = idx.repeat(dim[-1])
col = idx.reshape(-1, dim[-1]).repeat(dim[-1], axis=0).ravel()
# get the data
data = arr.ravel()
# construct the sparse matrix
out = sparse.coo_matrix((data, (row, col)), shape=(np.prod(dim), np.prod(dim)))
Two things that could be improved:
(1) if arr is sparse, the output matrix out will have zeros coded as nonzero.
(2) The approach relies on the new state being the last dimension of dim. It would be nice to generalize so that the last axis of arr can replace any of the originating axis, not just the last one.

Efficient way of constructing a 3D stack of block diagonal matrix in numpy/scipy from a 3D stack of matrices

I am trying to construct a stack of block diagonal matrix in the form of nXMXM in numpy/scipy from a given stacks of matrices (nXmXm), where M=k*m with k the number of stacks of matrices. At the moment, I'm using the scipy.linalg.block_diag function in a for loop to perform this task:
import numpy as np
import scipy.linalg as linalg
a = np.ones((5,2,2))
b = np.ones((5,2,2))
c = np.ones((5,2,2))
result = np.zeros((5,6,6))
for k in range(0,5):
result[k,:,:] = linalg.block_diag(a[k,:,:],b[k,:,:],c[k,:,:])
However, since n is in my case getting quite large, I'm looking for a more efficient way than a for loop. I found 3D numpy array into block diagonal matrix but this does not really solve my problem. Anything I could imagine is transforming each stack of matrices into block diagonals
import numpy as np
import scipy.linalg as linalg
a = np.ones((5,2,2))
b = np.ones((5,2,2))
c = np.ones((5,2,2))
a = linalg.block_diag(*a)
b = linalg.block_diag(*b)
c = linalg.block_diag(*c)
and constructing the resulting matrix from it by reshaping
result = linalg.block_diag(a,b,c)
result = result.reshape((5,6,6))
which does not reshape. I don't even know, if this approach would be more efficient, so I'm asking if I'm on the right track or if somebody knows a better way of constructing this block diagonal 3D matrix or if I have to stick with the for loop solution.
Edit:
Since I'm new to this platform, I don't know where to leave this (Edit or Answer?), but I want to share my final solution: The highlightet solution from panadestein worked very nice and easy, but I'm now using higher dimensional arrays, where my matrices reside in the last two dimensions. Additionally my matrices are no longer of the same dimension (mostly a mixture of 1x1, 2x2, 3x3), so I adopted V. Ayrat's solution with minor changes:
def nd_block_diag(arrs):
shapes = np.array([i.shape for i in arrs])
out = np.zeros(np.append(np.amax(shapes[:,:-2],axis=0), [shapes[:,-2].sum(), shapes[:,-1].sum()]))
r, c = 0, 0
for i, (rr, cc) in enumerate(shapes[:,-2:]):
out[..., r:r + rr, c:c + cc] = arrs[i]
r += rr
c += cc
return out
which works also with array broadcasting, if the input arrays are shaped properly (i.e. the dimensions, which are to be broadcasted are not added automatically). Thanks to pandestein and V. Ayrat for your kind and fast help, I've learned a lot about the possibilites of list comprehensions and array indexing/slicing!

block_diag also just iterate through shapes. Almost all time spend in copying data so you can do it whatever way your want for example with little change of source code of block_diag
arrs = a, b, c
shapes = np.array([i.shape for i in arrs])
out = np.zeros([shapes[0, 0], shapes[:, 1].sum(), shapes[:, 2].sum()])
r, c = 0, 0
for i, (_, rr, cc) in enumerate(shapes):
out[:, r:r + rr, c:c + cc] = arrs[i]
r += rr
c += cc
print(np.allclose(result, out))
# True

I don't think that you can escape all possible loops to solve your problem. One way that I find convenient and perhaps more efficient than your for loop is to use a list comprehension:
import numpy as np
from scipy.linalg import block_diag
# Define input matrices
a = np.ones((5, 2, 2))
b = np.ones((5, 2, 2))
c = np.ones((5, 2, 2))
# Generate block diagonal matrices
mats = np.array([a, b, c]).reshape(5, 3, 2, 2)
result = [block_diag(*bmats) for bmats in mats]
Maybe this can give you some ideas to improve your implementation.

How to get chunks of submatrices faster?

I have a really big matrix (nxn)for which I would to build the intersecting tiles (submatrices) with the dimensions mxm. There will be an offset of step bvetween each contiguous submatrices. Here is an example for n=8, m=4, step=2:
import numpy as np
matrix=np.random.randn(8,8)
n=matrix.shape[0]
m=4
step=2
This will store all the corner indices (x,y) from which we will take a 4x4 natrix: (x:x+4,x:x+4)
a={(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step)}
The submatrices will be extracted like that
sub_matrices = np.zeros([m,m,len(a)])
for i,ind in enumerate(a):
x,y=ind
sub_matrices[:,:,i]=matrix[x:x+m, y:y+m]
Is there a faster way to do this submatrices initialization?

We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.
from skimage.util.shape import view_as_windows
# Get indices as array
ar = np.array(list(a))
# Get all sliding windows
w = view_as_windows(matrix,(m,m))
# Get selective ones by indexing with ar
selected_windows = np.moveaxis(w[ar[:,0],ar[:,1]],0,2)
Alternatively, we can extract the row and col indices with a list comprehension and then index with those, like so -
R = [i[0] for i in a]
C = [i[1] for i in a]
selected_windows = np.moveaxis(w[R,C],0,2)
Optimizing from the start, we can skip the creation of stepping array, a and simply use the step arg with view_as_windows, like so -
view_as_windows(matrix,(m,m),step=2)
This would give us a 4D array and indexing into the first two axes of it would have all the mxm shaped windows. These windows are simply views into input and hence no extra memory overhead plus virtually free runtime!

import numpy as np
a = np.random.randn(n, n)
b = a[0:m*step:step, 0:m*step:step]
If you have a one-dimension array, you can get it's submatrix by the following code:
c = a[start:end:step]
If the dimension is two or more, add comma between every dimension.
d = a[start1:end1:step1, start2:end3:step2]

coo_matrix without concatenate

I have a number of indices and values that make up a scipy.coo_matrix. The indices/values are generated from different subroutines and are concatenated together before handed over to the matrix constructor:
import numpy
from scipy import sparse
n = 100000
I0 = range(n)
J0 = range(n)
V0 = numpy.random.rand(n)
I1 = range(n)
J1 = range(n)
V1 = numpy.random.rand(n)
# [...]
I = numpy.concatenate([I0, I1])
J = numpy.concatenate([J0, J1])
V = numpy.concatenate([V0, V1])
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
Now, the components of (I, J, V) can be quite large such that the concatenate operations become significant. (In the above example it takes over 20% of the runtime on my machine.) I'm reading that it's not possible to concatenate without a copy.
Is there a way for handing over indices and values without copying the input data around first?

If you look at the code for coo_matrix.__init__ you'll see that it's pretty simple. In fact if the (V, (I,J)) inputs are right it will simply assign those 3 arrays to its .data, row, col attributes. You can even check that after creation by comparing those attributes with your variables.
If they aren't 1d arrays of the right dtype, it will massage them - make the arrays, etc. So without getting into details, processing that you do before hand might save time in the coo call.
self.row = np.array(row, copy=copy, dtype=idx_dtype)
self.col = np.array(col, copy=copy, dtype=idx_dtype)
self.data = np.array(obj, copy=copy)
One way or other those attributes will have to each be a single array, not a loose list of arrays or lists of lists.
sparse.bmat makes a coo matrix from other ones. It collected their coo attributes, joins them in the fill an empty array styles, and calls coo_matrix. Look at its code.
Almost all numpy operations that return a new array do so by allocating an empty and filling it. Letting numpy do that in compiled code (with np.concatentate) should be a be a little faster, but details like the size and number of inputs will make a difference.
A non_connonical coo matrix is just the start. Many operations require a conversion to one of the other formats.
Efficiently construct FEM/FVM matrix
This is about sparse matrix constrution where there are many duplicate points that need to be summed - and using using the csr format for calculations.

You can try pre-allocating the arrays. It'll spare you the copy at least. I didn't see any speedup for the example, but you might see a change.
import numpy
from scipy import sparse
n = 100000
I = np.empty(2*n, np.double)
J = np.empty_like(I)
V = np.empty_like(I)
I[:n] = range(n)
J[:n] = range(n)
V[:n] = numpy.random.rand(n)
I[n:] = range(n)
J[n:] = range(n)
V[n:] = numpy.random.rand(n)
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))

sparse matrix LP problems in Gurobi / python

I am trying to solve an LP problem represented using sparse matrices in Gurobi / python.
max c′ x, subject to A x = b, L &leq; x &leq; U
where A is a SciPy linked list sparse matrix of size ~10002. Using the code
model = gurobipy.Model()
rows, cols = len(b), len(c)
for j in range(cols):
model.addVar(lb=L[j], ub=U[j], obj=c[j])
model.update()
vars = model.getVars()
S = scipy.sparse.coo_matrix(A)
expr, used = [], []
for i in range(rows):
expr.append(gurobipy.LinExpr())
used.append(False)
for i, j, s in zip(S.row, S.col, S.data):
expr[i] += s*vars[j]
used[i] = True
for i in range(rows):
if used[i]:
model.addConstr(lhs=expr[i], sense=gurobipy.GRB.EQUAL, rhs=b[i])
model.update()
model.ModelSense = -1
model.optimize()
the problem is built and solved in ~1s, which is ~10-100 times slower than the same task in Gurobi / Matlab. Do you have any suggestions for improving the efficiency of the problem definition, or suggestions for avoiding translation to sparse coordinate format?

MATLAB will always be more efficient than scipy when dealing with sparse matrices. However, there are a few things you might try to speed things up.
Gurobi's Python interface takes individual sparse constraints. This means that you want to access your matrix in compressed sparse row format (rather than coordinate format).
Try doing:
S = S.tocsr()
or directly constructing your matrix in compressed sparse row format.
This page indicates that you can access the raw data, indicies, and row pointers from a scipy sparse matrix in CSR format. So you should then be able to iterate over these as follows:
model = gurobipy.Model()
row, cols = len(b), len(c)
x = []
for j in xrange(cols):
x.append(model.addVar(lb=L[j], ub=U[j], obj=c[j])
model.update()
# iterate over the rows of S adding each row into the model
for i in xrange(rows):
start = S.indptr[i]
end = S.indptr[i+1]
variables = [x[j] for j in S.indices[start:end]]
coeff = S.data[start:end]
expr = gurobipy.LinExpr(coeff, variables)
model.addConstr(lhs=expr, sense=gurobipy.GRB.EQUAL, rhs=b[i])
model.update()
model.ModelSense = -1
model.optimize()
Note that I added all the terms at once into the expression using the LinExpr() constructor.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Performant creation of sparse (stiffness) matrix - python

Related

simplity construction of sparse (transition) matrix

Efficient way of constructing a 3D stack of block diagonal matrix in numpy/scipy from a 3D stack of matrices

How to get chunks of submatrices faster?

coo_matrix without concatenate

sparse matrix LP problems in Gurobi / python

Categories

Resources