Sampling a few rows of a SciPy sparse matrix into another - Python

How can I sample some of the rows of a SciPy sparse matrix and form a new SciPy sparse matrix from these sampled rows?
For example, if I have a SciPy sparse matrix A with 10 rows and I want to make a new SciPy sparse matrix B with rows 1, 3, 4 from A, how can I do that?

Left-multiply by an appropriate indicator matrix: one row per sampled row, with a single 1 in the column of the row to keep. The indicator matrix can be built using scipy.sparse.block_diag or directly in CSR format, as shown below.
>>> import numpy as np
>>> from scipy import sparse
>>>
# create example
>>> m, n = 10, 8
>>> subset = [1,3,4]
>>> A = sparse.csr_matrix(np.random.randint(-10, 5, (m, n)).clip(0, None))
>>> A.toarray()
array([[3, 2, 4, 0, 0, 0, 2, 0],
[0, 0, 2, 0, 0, 0, 0, 0],
[4, 0, 0, 0, 0, 2, 0, 0],
[0, 0, 0, 0, 0, 0, 4, 0],
[3, 0, 0, 0, 1, 4, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 0],
[0, 0, 0, 4, 0, 4, 4, 0],
[0, 2, 0, 0, 0, 3, 0, 0],
[4, 0, 3, 3, 0, 0, 0, 2],
[4, 0, 0, 0, 0, 2, 0, 1]], dtype=int64)
>>>
# build indicator matrix
# either using block_diag ...
>>> split_points = np.arange(len(subset)+1).repeat(np.diff(np.concatenate([[0], subset, [m-1]])))
>>> indicator = sparse.block_diag(np.split(np.ones(len(subset), int), split_points)).T
>>> indicator.toarray()
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype=int64)
>>>
# ... or build it directly in CSR format; this also works for an unsorted subset
# with repeated rows, and is therefore preferable to block_diag
>>> indicator = sparse.csr_matrix((np.ones(len(subset), int), subset, np.arange(len(subset)+1)), (len(subset), m))
>>> indicator.toarray()
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])
>>>
# apply
>>> result = indicator @ A
>>> result.toarray()
array([[0, 0, 2, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 4, 0],
[3, 0, 0, 0, 1, 4, 0, 0]], dtype=int64)
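To see why the (data, indices, indptr) triple in the manual construction acts as a row selector, here is a minimal NumPy-only sketch. The helper `csr_to_dense` is hypothetical, written just to expand such a triple by hand; it then checks that left-multiplying by the result picks out rows, including for an unsorted subset with a repeat. With an actual SciPy CSR matrix, `A[subset]` performs the same row selection directly.

```python
import numpy as np

def csr_to_dense(data, indices, indptr, shape):
    # Hypothetical helper: expand a CSR triple into a dense array.
    # Row r holds data[indptr[r]:indptr[r+1]] at columns indices[indptr[r]:indptr[r+1]].
    # Assumes no duplicate column within a row (SciPy would sum those).
    out = np.zeros(shape, dtype=np.asarray(data).dtype)
    for r in range(shape[0]):
        start, stop = indptr[r], indptr[r + 1]
        out[r, indices[start:stop]] = data[start:stop]
    return out

m, n = 10, 8
rng = np.random.default_rng(0)
A = rng.integers(-10, 5, (m, n)).clip(0, None)

# Unsorted, with row 1 repeated, to show the indicator handles both.
subset = [4, 1, 3, 1]
k = len(subset)

# One stored entry (a 1) per row: row r has its single 1 in column subset[r].
indicator = csr_to_dense(np.ones(k, int), np.array(subset), np.arange(k + 1), (k, m))

result = indicator @ A
print(np.array_equal(result, A[subset]))  # True
```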

Related

How to compose several matrices into a big matrix diagonally in PyTorch

I have several matrices, let's say [M1,M2,M3,M4]. Each matrix has a different shape. How do I compose these matrices into one big matrix diagonally like:
[[M1, 0, 0, 0]
[0, M2, 0, 0]
[0, 0, M3, 0]
[0, 0, 0, M4]]
Example:
M1 = [[1,2],[2,1]]
M2 = [[1,2]]
M3 = [[3]]
M4 = [[3,4,5],[4,5,6]]
The composed big matrix should be:
[[1, 2, 0, 0, 0, 0, 0, 0]
[2, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 2, 0, 0, 0, 0]
[0, 0, 0, 0, 3, 0, 0, 0]
[0, 0, 0, 0, 0, 3, 4, 5]
[0, 0, 0, 0, 0, 4, 5, 6]]
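Here is how to do it using SciPy (scipy.sparse.block_diag returns a sparse matrix; call .toarray() on it for a dense array):
from scipy.sparse import block_diag
block_diag((M1, M2, M3, M4)).toarray()
Or use PyTorch's torch.block_diag(), after converting each matrix to a tensor (e.g. M1 = torch.tensor([[1, 2], [2, 1]])):
>>> torch.block_diag(M1,M2,M3,M4)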
tensor([[1, 2, 0, 0, 0, 0, 0, 0],
[2, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 2, 0, 0, 0, 0],
[0, 0, 0, 0, 3, 0, 0, 0],
[0, 0, 0, 0, 0, 3, 4, 5],
[0, 0, 0, 0, 0, 4, 5, 6]])
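Both library functions follow the same idea: allocate a zero matrix whose height and width are the sums of the block heights and widths, then copy each block into the next free slot along the diagonal. A minimal NumPy sketch of that idea (the name `block_diag_dense` is hypothetical; this is an illustration, not the SciPy or PyTorch source):

```python
import numpy as np

def block_diag_dense(*blocks):
    # Illustrative re-implementation: zeros sized by the summed block shapes,
    # then each block is copied into the next free diagonal slot.
    blocks = [np.atleast_2d(np.asarray(b)) for b in blocks]
    n_rows = sum(b.shape[0] for b in blocks)
    n_cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((n_rows, n_cols), dtype=np.result_type(*blocks))
    r = c = 0
    for b in blocks:
        h, w = b.shape
        out[r:r + h, c:c + w] = b
        r += h  # move down past this block
        c += w  # and right past it
    return out

M1 = [[1, 2], [2, 1]]
M2 = [[1, 2]]
M3 = [[3]]
M4 = [[3, 4, 5], [4, 5, 6]]
print(block_diag_dense(M1, M2, M3, M4))
```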

Python: use numpy array of indices to "lookup" values from another matrix

What is the fastest way of creating a new matrix that is a result of a "look-up" of some numpy matrix X (using an array of indices to be looked up in matrix X)? Example of what I want to achieve:
indices = np.array([[[1,1],[1,1],[3,3]],[[1,1],[5,8],[6,9]]]) #[i,j]
new_matrix = lookup(X, use=indices)
Output will be something like:
new_matrix = np.array([[3,3,7],[3,4,9]])
where for example X[1,1] was 3. I'm using python 2.
Use the two slices along the last axis as the row and column index arrays for X -
X[indices[...,0], indices[...,1]]
Or with tuple -
X[tuple(indices.T)].T # or X[tuple(indices.transpose(2,0,1))]
Sample run -
In [142]: X
Out[142]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 7, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 4, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 9]])
In [143]: indices
Out[143]:
array([[[1, 1],
[1, 1],
[3, 3]],
[[1, 1],
[5, 8],
[6, 9]]])
In [144]: X[indices[...,0], indices[...,1]]
Out[144]:
array([[3, 3, 7],
[3, 4, 9]])
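As a quick sanity check, the three indexing forms above give the same (2, 3) result. The sketch below rebuilds the sample X and indices; `np.moveaxis(indices, -1, 0)` is used here as an equivalent, more explicit spelling of `indices.transpose(2,0,1)` for this 3-D array.

```python
import numpy as np

X = np.zeros((7, 10), dtype=int)
X[1, 1], X[3, 3], X[5, 8], X[6, 9] = 3, 7, 4, 9

# Shape (2, 3, 2): the last axis holds an (i, j) pair.
indices = np.array([[[1, 1], [1, 1], [3, 3]],
                    [[1, 1], [5, 8], [6, 9]]])

# Two (2, 3) index arrays, one for rows and one for columns.
a = X[indices[..., 0], indices[..., 1]]
# indices.T reverses all axes, so the lookup comes out (3, 2) and is transposed back.
b = X[tuple(indices.T)].T
# Move the (i, j) axis to the front; no transpose needed afterwards.
c = X[tuple(np.moveaxis(indices, -1, 0))]
print(a)
```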

Python array creation with shape

a = np.diag(np.array([2,3,4,5,6]),k=-1)
For the above code, how can I change it to produce a 6*5 matrix instead of the 6*6 one, with the first row filled with 0 and the following rows holding 2, 3, 4, 5, 6 on the diagonal? Thank you very much
I don't understand what you want to know.
In your code, if k>0 the diagonal is shifted k places to the right, and the result is a square matrix with k extra rows and k extra columns. For example, if k=2 the output will be:
array([[0, 0, 2, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 0, 0],
[0, 0, 0, 0, 4, 0, 0],
[0, 0, 0, 0, 0, 5, 0],
[0, 0, 0, 0, 0, 0, 6],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
And if k<0 the diagonal is shifted |k| places down, again adding |k| extra rows and columns. For example, if k=-1:
array([[0, 0, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 0],
[0, 3, 0, 0, 0, 0],
[0, 0, 4, 0, 0, 0],
[0, 0, 0, 5, 0, 0],
[0, 0, 0, 0, 6, 0]])
And if k=0:
array([[2, 0, 0, 0, 0],
[0, 3, 0, 0, 0],
[0, 0, 4, 0, 0],
[0, 0, 0, 5, 0],
[0, 0, 0, 0, 6]])
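The pattern in these outputs can be checked directly: np.diag(v, k) always returns a square matrix with side len(v) + |k|, so any nonzero k adds the same number of rows and columns. A minimal sketch:

```python
import numpy as np

v = np.array([2, 3, 4, 5, 6])
# Map each offset k to the shape np.diag produces for it.
shapes = {k: np.diag(v, k=k).shape for k in range(-2, 3)}
print(shapes)  # {-2: (7, 7), -1: (6, 6), 0: (5, 5), 1: (6, 6), 2: (7, 7)}
```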
I think you want to create a 5*5 matrix and then add a row. Then you can do it like this:
a = np.diag(np.array([2,3,4,5,6])).tolist()
Now a is a 2-D list and you can insert the row wherever you want. For your result:
a.insert(0,[0,0,0,0,0])
a = np.array(a)  # convert back to a 6x5 numpy array
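If you would rather stay with NumPy arrays instead of round-tripping through a list, two equivalent options (not part of the original answer) produce the 6*5 matrix directly. The first relies on the fact that the k=-1 result above has an all-zero last column; the second stacks a zero row on top of the k=0 result.

```python
import numpy as np

v = np.array([2, 3, 4, 5, 6])

# Option 1: build the 6x6 sub-diagonal matrix and drop its last (all-zero) column.
a1 = np.diag(v, k=-1)[:, :len(v)]

# Option 2: stack a zero row on top of the 5x5 diagonal matrix.
a2 = np.vstack([np.zeros((1, len(v)), dtype=v.dtype), np.diag(v)])

print(a1)
```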

How to create diagonal array from existing array in numpy

I am trying to make a diagonal numpy array from:
[1,2,3,4,5,6,7,8,9]
Expected result:
[[ 0, 0, 1, 0, 0],
[ 0, 0, 0, 2, 0],
[ 0, 0, 0, 0, 3],
[ 4, 0, 0, 0, 0],
[ 0, 5, 0, 0, 0],
[ 0, 0, 6, 0, 0],
[ 0, 0, 0, 7, 0],
[ 0, 0, 0, 0, 8],
[ 9, 0, 0, 0, 0]]
What would be an efficient way of doing this?
You can use integer array indexing to set the specified elements of the output:
>>> import numpy as np
>>> a = [1,2,3,4,5,6,7,8,9]
>>> arr = np.zeros((9, 5), dtype=int) # all-zero 9x5 array
>>> arr[np.arange(9), np.arange(2,11) % 5] = a # row i gets a[i] in column (i + 2) % 5
>>> arr
array([[0, 0, 1, 0, 0],
[0, 0, 0, 2, 0],
[0, 0, 0, 0, 3],
[4, 0, 0, 0, 0],
[0, 5, 0, 0, 0],
[0, 0, 6, 0, 0],
[0, 0, 0, 7, 0],
[0, 0, 0, 0, 8],
[9, 0, 0, 0, 0]])
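The same modulo trick generalizes to any number of values, column count, and starting offset. A minimal sketch; the helper name `wrapped_diagonal` is hypothetical:

```python
import numpy as np

def wrapped_diagonal(values, ncols, offset=0):
    # Hypothetical helper: row i gets values[i] in column (i + offset) % ncols,
    # so the diagonal wraps around to column 0 without leaving gaps.
    values = np.asarray(values)
    nrows = len(values)
    out = np.zeros((nrows, ncols), dtype=values.dtype)
    rows = np.arange(nrows)
    out[rows, (rows + offset) % ncols] = values
    return out

arr = wrapped_diagonal(np.arange(1, 10), ncols=5, offset=2)
print(arr)
```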
Inspired by np.fill_diagonal, which can wrap but not offset, you can assign through a strided flat slice:
In [308]: arr=np.zeros((9,5),int)
In [309]: arr.flat[2:45:6]=np.arange(1,10)
In [310]: arr
Out[310]:
array([[0, 0, 1, 0, 0],
[0, 0, 0, 2, 0],
[0, 0, 0, 0, 3],
[0, 0, 0, 0, 0],
[4, 0, 0, 0, 0],
[0, 5, 0, 0, 0],
[0, 0, 6, 0, 0],
[0, 0, 0, 7, 0],
[0, 0, 0, 0, 8]])
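(This has an all-zero 4th row: the stride of ncols + 1 = 6 jumps from flat index 14, position (2, 4), straight to 20, position (4, 0), skipping row 3. The slice 2:45:6 also covers only 8 positions, so the value 9 is dropped.)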
The relevant part of the np.fill_diagonal source:
def fill_diagonal(a, val, wrap=False):
    ...
    step = a.shape[1] + 1
    # Write the value out into the diagonal.
    a.flat[:end:step] = val

Place mydata_array at random locations in a big array of zeros

mydata is a numpy array of shape (10,100,100), in (z,y,x) order, and I have created an empty array of shape (10,800,800). Now I need to place mydata at some random locations in the empty array, so that if I plot the output (10,800,800) array, mydata appears at random positions.
I used np.hstack() and np.vstack().
But they place the arrays side by side. I need to place mydata at random locations.
How could I do this? Any suggestions, please.
Regards
Raj
Here's a demonstration of placing several copies of one array inside another, using slice indexing:
In [802]: out = np.zeros((10,10),int)
In [803]: src = np.arange(6).reshape(2,3)
In [804]: out
Out[804]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
One copy in the upper left:
In [805]: out[:2,:3] = src
In [806]: out
Out[806]:
array([[0, 1, 2, 0, 0, 0, 0, 0, 0, 0],
[3, 4, 5, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Several more copies:
In [808]: out[4:6, 6:9] = src
In [809]: out[1:3, 4:7] = src
In [810]: out
Out[810]:
array([[0, 1, 2, 0, 0, 0, 0, 0, 0, 0],
[3, 4, 5, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 3, 4, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0, 3, 4, 5, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Just repeat that kind of action for a selection of random locations. Make sure that the slice ranges match the src shape, and that they lie within the dimensions of the target array.
While it may be possible to insert many copies at once (possibly by working with flattened index arrays), let's start by understanding how to insert one copy at a time.
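Applied to the shapes in the question, the slicing approach might look like the following minimal sketch. The helper `place_randomly` and its `count` parameter are hypothetical; it assumes the block keeps its full z extent (both arrays have 10 slices) and only the (y, x) corner is random. Copies may overlap, in which case later copies overwrite earlier ones.

```python
import numpy as np

def place_randomly(src, out_shape, count, rng=None):
    # Hypothetical helper: paste `count` copies of a (z, y, x) block at random
    # (y, x) corners, using the full z extent (src and out share the z size).
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros(out_shape, dtype=src.dtype)
    _, h, w = src.shape
    _, H, W = out_shape
    # Corners are drawn so the block always fits: 0 <= y <= H - h, 0 <= x <= W - w.
    ys = rng.integers(0, H - h + 1, size=count)
    xs = rng.integers(0, W - w + 1, size=count)
    for y, x in zip(ys, xs):
        out[:, y:y + h, x:x + w] = src
    return out, list(zip(ys, xs))

mydata = np.ones((10, 100, 100))
big, corners = place_randomly(mydata, (10, 800, 800), count=5,
                              rng=np.random.default_rng(42))
```

Keeping the corners returned makes it easy to plot or verify where each copy landed.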
=========
@alvis' answer scatters the individual src items in shuffled order, rather than placing src as a block, and here they end up on one row of out (or wrapped rows):
array([[2, 4, 5, 3, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
===================
Looped placement of multiple blocks:
def foo1(src, idx, NM):
    out = np.zeros(NM, dtype=src.dtype)
    n,m = src.shape
    for i,j in idx:
        out[i:i+n, j:j+m] = src
    return out

idx=np.array([[0,0],[1,4],[4,4],[8,7],[7,2]])
In [940]: out1 = foo1(src, idx, (10,10))
In [941]: out1
Out[941]:
array([[0, 1, 2, 0, 0, 0, 0, 0, 0, 0],
[3, 4, 5, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 3, 4, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 3, 4, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 2, 0, 0, 0, 0, 0],
[0, 0, 3, 4, 5, 0, 0, 0, 1, 2],
[0, 0, 0, 0, 0, 0, 0, 3, 4, 5]])
================
Placement of a block with advanced indexing (arrays instead of slices):
In [880]: I = np.array([1,1,1,2,2,2])
In [881]: J = np.array([3,4,5,3,4,5])
In [882]: out[I,J] = src.flat
In [883]: out
Out[883]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 2, 0, 0, 0, 0],
[0, 0, 0, 3, 4, 5, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
And for multiple blocks
def foo2(src, idx, NM):
    out = np.zeros(NM, dtype=src.dtype)
    n,m = src.shape
    ni = len(idx)
    IJ = [np.mgrid[i:i+n, j:j+m] for i,j in idx]
    IJ = np.concatenate(IJ, axis=1).reshape(2,-1)
    out[IJ[0,:], IJ[1,:]] = np.tile(src,(ni,1)).flat
    return out
In this small example the advanced-indexing alternative is considerably slower (14x) than the looped slicing. For a (1000,1000) out it is still slower (6x). Most of the time is spent in generating IJ.
This computes the I,J indices much faster, but it is still slower than the looped slicing:
def foo3(src, idx, NM):
    out = np.zeros(NM, dtype=src.dtype)
    n,m = src.shape
    ni = len(idx)
    # each block's row indices, each repeated m times
    I = np.repeat((idx[:,[0]]+np.arange(n)).flatten(),m)
    # each block's column indices, repeated for its n rows
    J = np.repeat((idx[:,[1]]+np.arange(m)),n,axis=0).flatten()
    out[I, J] = np.tile(src,(ni,1)).flat
    return out
This reminds me of work I did years ago to speed up the creation of a finite element stiffness matrix in MATLAB. There it was per-element stiffness blocks that needed to be placed in a large sparse global stiffness matrix.
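One difference from stiffness-matrix assembly: when blocks overlap, plain assignment (slices or `out[I, J] = ...`) keeps only the last write, whereas assembly needs overlapping contributions summed. A minimal dense NumPy sketch using the same I, J construction as foo3 but accumulating with `np.add.at`; the function name `assemble` and the 2x2 element block are hypothetical. (A sparse version would typically build COO triplets, since SciPy sums duplicate COO entries when converting to CSR.)

```python
import numpy as np

def assemble(block, idx, shape):
    # Place a copy of `block` with its upper-left corner at each (i, j) in idx,
    # summing entries where copies overlap (FEM-style assembly).
    idx = np.asarray(idx)
    n, m = block.shape
    ni = len(idx)
    # Row index of every entry: each block's rows, each repeated m times.
    I = np.repeat((idx[:, [0]] + np.arange(n)).ravel(), m)
    # Column index of every entry: each block's columns, repeated n times.
    J = np.repeat(idx[:, [1]] + np.arange(m), n, axis=0).ravel()
    # Block values in the same row-major order as (I, J).
    vals = np.tile(block, (ni, 1)).ravel()
    out = np.zeros(shape, dtype=block.dtype)
    # Unlike out[I, J] += vals, np.add.at accumulates repeated (I, J) pairs.
    np.add.at(out, (I, J), vals)
    return out

# Hypothetical example: a 1-D chain of three elements sharing nodes.
k = np.array([[1.0, -1.0], [-1.0, 1.0]])
K = assemble(k, [(0, 0), (1, 1), (2, 2)], (4, 4))
print(K)
```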
==================
Regular pattern with broadcasting (see edit history)
According to your question, you don't need to preserve the positions of elements relative to the first dimension of your array. For example, if there is one non-zero element a in the (100,100) slice z=0, and two elements b and c in the slice z=1, then in your output a, b and c can all appear in z=0. In this case I suggest the following solution:
import numpy as np
#replace this with your input data
mydata = np.ones((10,100,100))
mydata_large = np.zeros((10,800,800))
mydata_flatten = mydata.flatten()
ind = np.array([i for i in range(len(mydata_flatten))])
np.random.shuffle(ind)
mydata_large_f = mydata_large.flatten()
np.put(mydata_large_f,ind[:len(mydata_flatten)],mydata_flatten)
mydata_large = np.reshape(mydata_large_f, (10,800,800))
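If scattering individual elements is acceptable, note that `ind` above is built from `range(len(mydata_flatten))`, so the shuffled positions all fall within the first `mydata.size` entries of the target. A minimal sketch that instead draws distinct positions from the whole target with NumPy's `Generator.choice`; the helper name `scatter_elements` and the small demo shapes are hypothetical:

```python
import numpy as np

def scatter_elements(small, big_shape, rng=None):
    # Hypothetical helper: put every element of `small` at a distinct random
    # position anywhere in a zero array of shape `big_shape`. Elements are
    # scattered individually; the block shape of `small` is not preserved.
    rng = np.random.default_rng() if rng is None else rng
    big = np.zeros(big_shape, dtype=small.dtype)
    # Distinct flat positions drawn from the whole target, not just its start.
    pos = rng.choice(big.size, size=small.size, replace=False)
    big.flat[pos] = small.ravel()
    return big, pos

small = np.arange(1, 7).reshape(2, 3)
big, pos = scatter_elements(small, (3, 10, 10), rng=np.random.default_rng(0))
```

For the question's data this would be called as `scatter_elements(mydata, (10, 800, 800))`.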
