simplity construction of sparse (transition) matrix - python

I am constructing a transition matrix from a n1 x n2 x ... x nN x nN array. For concreteness let N = 3, e.g.,
import numpy as np
# example with N = 3
n1, n2, n3 = 3, 2, 5
dim = (n1, n2, n3)
arr = np.random.random_sample(dim + (n3,))
Here arr contains transition probabilities between 2 states, where the "from"-state is indexed by the first 3 dimensions, and the "to"-state is indexed by the first 2 and the last dimension. I want to construct a transition matrix, which expresses these probabilities raveled into a sparse (n1*n2*n3) x (n1*n2*n3 matrix.
To clarify, let me provide my current approach that does what I want to do. Unfortunately, it's slow and doesn't work when N and n1, n2, ... are large. So I am looking for a more efficient way of doing the same that scales better for larger problems.
My approach
import numpy as np
from scipy import sparse as sparse
## step 1: get the index correponding to each dimension of the from and to state
# ravel axes 1 to 3 into single axis and make sparse
spmat = sparse.coo_matrix(arr.reshape(np.prod(dim), -1))
data = spmat.data
row = spmat.row
col = spmat.col
# use unravel to get idx for
row_unravel = np.array(np.unravel_index(row, dim))
col_unravel = np.array(np.unravel_index(col, n3))
## step 2: combine "to" index with rows 1 and 2 of "from"-index to get "to"-coordinates in full state space
row_unravel[-1, :] = col_unravel # first 2 dimensions of state do not change
colnew = np.ravel_multi_index(row_unravel, dim) # ravel back to 1d
## step 3: assemble transition matrix
out = sparse.coo_matrix((data, (row, colnew)), shape=(np.prod(dim), np.prod(dim)))
Final thought
I will be running this code many times. Across iterations, the data of arr may change, but the dimensions will stay the same. So one thing I could do is to save and load row and colnew from a file, skipping everything between the definition of data (line 2) and final line of my code. Do you think this would be the best approach?
Edit: One problem I see with this strategy is that if some elements of arr are zero (which is possible) then the size of data will change across iterations.

One approach that beats the one posted in the OP. Not sure if it's the most efficient.
import numpy as np
from scipy import sparse
# get col and row indices
idx = np.arange(np.prod(dim))
row = idx.repeat(dim[-1])
col = idx.reshape(-1, dim[-1]).repeat(dim[-1], axis=0).ravel()
# get the data
data = arr.ravel()
# construct the sparse matrix
out = sparse.coo_matrix((data, (row, col)), shape=(np.prod(dim), np.prod(dim)))
Two things that could be improved:
(1) if arr is sparse, the output matrix out will have zeros coded as nonzero.
(2) The approach relies on the new state being the last dimension of dim. It would be nice to generalize so that the last axis of arr can replace any of the originating axis, not just the last one.

Related

Matrix-vector-multiplication with tensors in numpy

I have a numpy.array A with shape (l,l) and another numpy.array B with shape (l,m,n). Usually, the second and third dimension in B correspond to spatial cells and the first to something else.
I want to compute
l,m,n = 2,3,4 # dummy dimensions
A = np.random.rand(l,l) # dummy data
B = np.random.rand(l,m,n) # dummy data
C = np.zeros((l,m,n))
for i in range(m):
for j in range(n):
C[:,i,j] = A#B[:,i,j]
i.e., in every spatial cell, I want to perform a matrix-vector-multiplication.
Since I have to do this frequently, I would like to know, if there's a more compact way to write this with numpy. (Especially, because there are several situations in which the tensor has shape (l,m,n,o,p).)
Thank you in advance!
I found the answer using np.einsum:
np.einsum('ij,jkl->ikl', A,B)
Explanation:
Einstein notation implies that we sum over matching subscripts.
np.einsum('ij,jkl->ikl', A,B)
= rewritten in math terms
A_{i,j} B_{j,k,l}
= Einstein notation implies summation
sum_j A_{i,j} B_{j,k,l}

way to create a 3d matrix of 2 vectors and 1 matrix

Hello i have a question regarding a problem I am facing in python. I was studying about tensors and I saw that each row/column of a tensor must have the same size. Is it possible to create a tensor of perhaps a 3d object or matrix where lets say we have 3 axis : x,y,z
In the x axis I want to create a vector to work as an index. So let x be from 0 to N
Then on the y axis I want to have N random integer vectors of size m (where mm
Is it possible?
My first approach was to create a big vector of Nm and a big matrix of (Nm,Nm) dimensions where i would store all my random vectors and matrices and then if I wanted to change for example the my second vector then i would have to play with the indexes. However is there another way to approach this problem with tensors or numpy that I m unaware of?
Thank you in advance for your advices
First vector, N = 3, [1,2, 3]
Second N vectors with length m, m = 2
[[4,5], [6,7], [7,8]]
So, N matrices of size (m,m)
[[[1,1], [2,2]], [[1,1], [2,2]], [[1,1], [2,2]] ]
Lets create numpy arrays from them.
import numpy as np
N = 3
m = 2
a = np.array([1,2,3])
b = np.random.randn(N, m)
c = np.random.randn(N, m, m)
You see the problem here? The last matrix c has already 3 dimensions according to your definitions.
Your argument can be simplified.
Let's say our final matrix is -
a = np.zeros((3,2,2)) # 3 dimensions, x,y,z
1) For first dimension -
a[0,:,:] = 0 # first axis, first index = 0
a[1,:,:] = 1 # first axis, 2nd index = 1
a[2,:,:] = 2 # first axis, 3rd index = 2
2) Now, we need to fill up the rest of the positions, but dimensions don't match up.
So, it's better to create separate tensors for them.

How to get chunks of submatrices faster?

I have a really big matrix (nxn)for which I would to build the intersecting tiles (submatrices) with the dimensions mxm. There will be an offset of step bvetween each contiguous submatrices. Here is an example for n=8, m=4, step=2:
import numpy as np
matrix=np.random.randn(8,8)
n=matrix.shape[0]
m=4
step=2
This will store all the corner indices (x,y) from which we will take a 4x4 natrix: (x:x+4,x:x+4)
a={(i,j) for i in range(0,n-m+1,step) for j in range(0,n-m+1,step)}
The submatrices will be extracted like that
sub_matrices = np.zeros([m,m,len(a)])
for i,ind in enumerate(a):
x,y=ind
sub_matrices[:,:,i]=matrix[x:x+m, y:y+m]
Is there a faster way to do this submatrices initialization?
We can leverage np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to get sliding windows. More info on use of as_strided based view_as_windows.
from skimage.util.shape import view_as_windows
# Get indices as array
ar = np.array(list(a))
# Get all sliding windows
w = view_as_windows(matrix,(m,m))
# Get selective ones by indexing with ar
selected_windows = np.moveaxis(w[ar[:,0],ar[:,1]],0,2)
Alternatively, we can extract the row and col indices with a list comprehension and then index with those, like so -
R = [i[0] for i in a]
C = [i[1] for i in a]
selected_windows = np.moveaxis(w[R,C],0,2)
Optimizing from the start, we can skip the creation of stepping array, a and simply use the step arg with view_as_windows, like so -
view_as_windows(matrix,(m,m),step=2)
This would give us a 4D array and indexing into the first two axes of it would have all the mxm shaped windows. These windows are simply views into input and hence no extra memory overhead plus virtually free runtime!
import numpy as np
a = np.random.randn(n, n)
b = a[0:m*step:step, 0:m*step:step]
If you have a one-dimension array, you can get it's submatrix by the following code:
c = a[start:end:step]
If the dimension is two or more, add comma between every dimension.
d = a[start1:end1:step1, start2:end3:step2]

Efficiently creating masks - Numpy /Python

Suppose I have a NumPy array with shape (50, 10000, 10000) with 1000 distinct "clusters". For example, there would be small volume somewhere with just 1s, another small volume with 2s, etc. I would like to iterate through each cluster to create a mask like so:
for i in np.unique(arr)[1:]:
mask = arr == i
#do other stuff with mask
Creating each mask takes about 15 seconds, and iterating through 1000 clusters would take more than 4 hours. Is there a possible way to speed up the code or is this the best there is since there is no avoiding iterating through each element of the array?
EDIT: the dtype of the array is uint16
I'm assuming arr is sparse:
you say the clusters are small, and 1000 clusters isn't going to tile an array that big
you iterate over np.unique(arr)[1:], so I assume the first unique value is 0
In this case I would recommend leveraging a scipy.sparse.csr_matrix
from scipy.sparse import csr_matrix
sp_arr = csr_matrix(arr.reshape(1,-1))
This turns your big dense array into a one-row compressed sparse row array. Since sparse arrays don't like more than 2 dimensions, this tricks it into using ravelled indices. Now sp_arr has data (the cluster labels), indices (the ravelled indices), and indptr (which is trivial here since we only have one row). So,
for i in np.unique(sp_arr.data): # as a bonus this `unique` call should be faster too
x, y, z = np.unravel_index(sp_arr.indices[sp_arr.data == i], arr.shape)
Should much more efficiently give equivalent coordinates to
for i in np.unique(arr)[:1]:
x, y, z = np.nonzero(arr == i)
where x, y, z are the indices of the True values in mask. From there you can either reconstruct mask or work off the indices (recommended).
You could also do this purely with numpy, and still have a boolean mask at the end, but a bit less memory efficient:
all_mask = arr != 0 # points assigned to any cluster
data = arr[all_mask] # all cluster labels
for i in np.unique(data):
mask = all_mask.copy()
mask[mask] = data == i # now mask is same as before

Creating list of 2D matrices in Python

I am trying to create a list of 2d matrices, as per the illustration below:
list of 2d matrices
Basically, I want to start with a NxN matrix with all zeros and sequentially replace the 0's with 1's (as shown in the image). With each modification changing the 0's to 1's, I would like to output the matrix at that step and save it in a list or array.
For the first row of matrices in the illustration, I have this:
dim = 4
x=[]
for i in range(0,dim):
matrix = np.zeros((dim,dim))
matrix[0,i] = 1
x.append(matrix)
m0 = x[0]
m1 = x[0]+x[1]
m2 = x[0]+x[1]+x[2]
m3 = x[0]+x[1]+x[2]+x[3]
I would like to generalize this so I not only get the first row but the rest of the rows shown in the image and change the matrix size through 'dim'. I can't seem to figure this out. I'd appreciate any help with this.
This will do the job:
import numpy as np
dim = 4
x=[]
for i in range(dim):
lst=[]
matrix=np.zeros((dim,dim))
vec=np.ones(i+1)
for j in range(dim):
matrix[0:i+1,j]=vec
lst.append(np.copy(matrix))
x.append(lst)
print(x)

Categories

Resources