How to index one array with another using SciPy CSR Sparse Arrays? - python

I have two arrays A and B. In NumPy you can use A as an index to B e.g.
A = np.array([[1,2,3,1,7,3,1,2,3],[4,5,6,4,5,6,4,5,6],[7,8,9,7,8,9,7,8,9]])
B= np.array([1,2,3,4,5,6,7,8,9,0])
c = B[A]
Which produces:
[[2 3 4 2 8 4 2 3 4] [5 6 7 5 6 7 5 6 7] [8 9 0 8 9 0 8 9 0]]
However, in my case the arrays A and B are SciPy CSR sparse arrays and they don't seem to support indexing.
A_sparse = sparse.csr_matrix(A)
B_sparse = sparse.csr_matrix(B)
c = B_sparse[A_sparse]
This results in:
IndexError: Indexing with sparse matrices is not supported except boolean indexing where matrix and index are equal shapes.
I've come up with the function below to replicate NumPy's behavior with the sparse arrays:
def index_sparse(A, B):
    # COO format exposes row, col and data for every stored element
    A_sparse = sparse.coo_matrix(A)
    B_sparse = sparse.csr_matrix(B)
    res = sparse.csr_matrix(A_sparse)
    for i, j, v in zip(A_sparse.row, A_sparse.col, A_sparse.data):
        res[i, j] = B_sparse[0, v]
    return res

res = index_sparse(A, B)
print(res.todense())
Looping over the array and having to create a new array in Python isn't ideal. Is there a better way of doing this using built-in functions from SciPy/ NumPy?

Sparse indexing is less developed. coo format for example doesn't implement it at all.
I haven't tried to implement this problem, though I have answered others that involve working with the sparse format attributes. So I'll just make some general observations.
B_sparse is a matrix, so its shape is (1,10). So the equivalent to B[A] is
In [294]: B_sparse[0,A]
Out[294]:
<3x9 sparse matrix of type '<class 'numpy.int32'>'
with 24 stored elements in Compressed Sparse Row format>
In [295]: _.A
Out[295]:
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
B_sparse[A,:] or B_sparse[:,A] gives a 3d warning, since it would be trying to create a matrix version of:
In [298]: B[None,:][:,A]
Out[298]:
array([[[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]]])
As to your function:
A_sparse.nonzero() does A_sparse.tocoo() and returns its row and col. Effectively the same as what you do.
Here's something that should be faster, though I haven't tested it enough to be sure it is robust:
In [342]: Ac=A_sparse.tocoo()
In [343]: res=Ac.copy()
In [344]: res.data[:]=B_sparse[0, Ac.data].A[0]
In [345]: res
Out[345]:
<3x9 sparse matrix of type '<class 'numpy.int32'>'
with 27 stored elements in COOrdinate format>
In [346]: res.A
Out[346]:
array([[2, 3, 4, 2, 8, 4, 2, 3, 4],
[5, 6, 7, 5, 6, 7, 5, 6, 7],
[8, 9, 0, 8, 9, 0, 8, 9, 0]], dtype=int32)
In this example there are 3 explicit zeros (the three 9s in A map to B[9] = 0) that could be cleaned up as well (look at res.nonzero()).
Since you are setting each res[i,j] with values from Ac.row and Ac.col, res has the same row,col values as Ac, so I initialize it as a copy. Then it's just a matter of updating the res.data attribute. It would be faster to index B_sparse.data directly, but that doesn't account for its sparsity.
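The idea above (reuse the row/col structure of A and only replace the data) can be sketched as a small function. This is a minimal sketch, not a tested library routine: the name index_sparse_vectorized is made up for illustration, and it assumes B is small enough to densify into a 1-D lookup table. Like the loop version, it leaves the implicit zeros of A as zeros instead of mapping them to B[0].

```python
import numpy as np
from scipy import sparse

A = np.array([[1, 2, 3, 1, 7, 3, 1, 2, 3],
              [4, 5, 6, 4, 5, 6, 4, 5, 6],
              [7, 8, 9, 7, 8, 9, 7, 8, 9]])
B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

def index_sparse_vectorized(A_sparse, B_sparse):
    """Hypothetical helper: replace each stored value v of A with B[0, v]."""
    Ac = A_sparse.tocoo()
    # Assumption: B (shape (1, n)) is small enough to hold densely.
    lookup = B_sparse.toarray().ravel()
    # One fancy-indexing call replaces the Python loop over stored elements.
    new_data = lookup[Ac.data]
    res = sparse.coo_matrix((new_data, (Ac.row, Ac.col)), shape=Ac.shape).tocsr()
    # Drop stored entries that became 0 after the lookup.
    res.eliminate_zeros()
    return res

res = index_sparse_vectorized(sparse.csr_matrix(A), sparse.csr_matrix(B))
print(res.toarray())
```

If B is too large to densify, indexing with B_sparse[0, Ac.data] as shown above keeps everything sparse at the cost of some speed.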

Related

how to use slicing to get 2 numbers in a multiple array (numpy)

If I have an array
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
the output would be a = [[1,2,3],[4,5,6],[7,8,9]].
Using a slice [start:end:step], how could I retrieve 3 and 7?
Is it possible?
I have tried
a[:3:2]
but this gave me the first row and the third row.
In [928]: a = np.array([[1,2,3],[4,5,6],[7,8,9]])
In [929]: a
Out[929]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
[3,7] isn't a regular pattern in this 2d array, but it is one in its flattened view:
In [931]: a.ravel()
Out[931]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])
In [932]: a.ravel()[2::4]
Out[932]: array([3, 7])
In [933]: a.flat[2::4]
Out[933]: array([3, 7])
There is no guarantee that this can be extended to larger arrays and selections.
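When the wanted elements do not follow a regular stride, integer-array indexing is usually more robust than a slice on the flattened view. A minimal sketch comparing the two on the same array (the variable names are only illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Pair up row and column indices: picks a[0, 2] and a[2, 0].
by_index = a[[0, 2], [2, 0]]

# Slice of the flattened view: flat positions 2 and 6 (step 4).
by_flat_slice = a.flat[2::4]

print(by_index, by_flat_slice)
```

The index lists work for any set of positions, whereas the flat slice only works because positions 2 and 6 happen to be evenly spaced.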

Slicing a different range at each index of a multidimensional numpy array [duplicate]

This question already has answers here:
Selecting Random Windows from Multidimensional Numpy Array Rows
(2 answers)
Closed 3 years ago.
I have an m x n numpy array arr, and for each column of arr, I have a given range of rows that I want to access.
I have an n x 1 array vec that describes when this range starts.
The range has some constant duration d.
How can I extract this d x n array of interest efficiently?
Can this be done by clever slicing?
My initial thought was to try something like:
arr = np.tile(np.arange(10),(4,1)).T
vec = np.array([3,4,5,4])
d = 3
vec2 = vec + d
out = arr[vec:vec2, np.arange(arr.shape[1])]
But this gives the following error:
TypeError: only integer scalar arrays can be converted to a scalar index
The desired output would be the following array:
array([[3, 4, 5, 4],
[4, 5, 6, 5],
[5, 6, 7, 6],
[6, 7, 8, 7]])
I could loop over d, but performance is important for this piece of code so I would prefer to vectorize it.
In [489]: arr=np.arange(24).reshape(6,4)
In [490]: vec=np.array([0,2,1,3])
Taking advantage of the recent expansion of linspace to generate several arrays:
In [493]: x = np.linspace(vec,vec+2,3).astype(int)
In [494]: x
Out[494]:
array([[0, 2, 1, 3],
[1, 3, 2, 4],
[2, 4, 3, 5]])
In [495]: arr[x, np.arange(4)]
Out[495]:
array([[ 0, 9, 6, 15],
[ 4, 13, 10, 19],
[ 8, 17, 14, 23]])
The column iteration approach:
In [498]: np.stack([arr[i:j,k] for k,(i,j) in enumerate(zip(vec,vec+3))],1)
Out[498]:
array([[ 0, 9, 6, 15],
[ 4, 13, 10, 19],
[ 8, 17, 14, 23]])
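The same row-index grid can also be built with plain broadcasting instead of linspace. A minimal sketch using the arr, vec and d from the question; it assumes every window vec[k] + d stays within the number of rows, and it returns exactly d rows per column:

```python
import numpy as np

arr = np.tile(np.arange(10), (4, 1)).T   # shape (10, 4); arr[i, j] == i
vec = np.array([3, 4, 5, 4])             # start row for each column
d = 3                                    # window length

# (d, 1) offsets broadcast against (n,) starts -> (d, n) grid of row indices
rows = vec + np.arange(d)[:, None]
cols = np.arange(arr.shape[1])

# Advanced indexing pairs rows[i, k] with cols[k]
out = arr[rows, cols]
print(out)
```

Because the offsets are integers from the start, no astype(int) conversion is needed.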

Split Numpy array into equal-length sub-arrays

I have a very huge numpy array like this:
np.array([1, 2, 3, 4, 5, 6, 7 , ... , 12345])
I need to create subgroups of n elements (in the example n = 3) in another array like this:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [...], [12340, 12341, 12342], [12343, 12344, 12345]])
I did accomplish that using normal python lists, just appending the subgroups to another list. But, I'm having a hard time trying to do that in numpy.
Any ideas how can I do that?
Thanks!
You can use array.reshape(-1, 3), where the -1 means "infer this dimension from the array's length".
>>> array = np.arange(1, 12346)
>>> array
array([ 1, 2, 3, ..., 12343, 12344, 12345])
>>> array.reshape(-1, 3)
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
...,
[12337, 12338, 12339],
[12340, 12341, 12342],
[12343, 12344, 12345]])
You can use np.reshape():
From the documentation:
numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
Here is an example of how you can apply it to your situation:
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 12345])
>>> a.reshape((len(a) // 3, 3))
array([[    1,     2,     3],
       [    4,     5,     6],
       [    7,     8, 12345]])
Note that obviously, the length of the array (len(a)) has to be a multiple of 3 to be able to reshape it into a 2-dimensional numpy array, because they must be rectangular.
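If the length is not a multiple of the group size, one option is to reshape only the part that fits and keep the leftover elements separately. A minimal sketch; the helper name chunk is made up for illustration and is not a NumPy function:

```python
import numpy as np

def chunk(a, n):
    """Hypothetical helper: split 1-D array a into rows of n plus a remainder."""
    usable = len(a) - len(a) % n          # largest multiple of n that fits
    groups = a[:usable].reshape(-1, n)    # full groups of n elements
    rest = a[usable:]                     # leftover elements, possibly empty
    return groups, rest

groups, rest = chunk(np.arange(1, 11), 3)
print(groups)
print(rest)
```

Whether to drop, pad, or keep the remainder depends on the use case; this sketch simply returns it.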

Sort matrix based on its diagonal entries

First of all I would like to point out that my question is different than this one: Sort a numpy matrix based on its diagonal
The question is as follows:
Suppose I have a numpy matrix
A=
5 7 8
7 2 9
8 9 3
I would like to sort the matrix based on its diagonal and then re-arrange the matrix element based on it. Such that now
sorted_A:
2 9 7
9 3 8
7 8 5
Note that:
(1). The diagonal is sorted
(2). The other elements (non-diagonal) re-adjusted by it. How?
because diag(A)= [5,2,3] & diag(sorted_A)=[2,3,5]
so row/column indices A=[0,1,2] become [1,2,0] in sorted_A.
So far I use brute force where I extract the diagonal elements, get the indices O(N²) and then re-arrange the matrix (another O(N²)). I wonder if there is any efficient/elegant way to do this. I appreciate all the help I can get.
Sorting the rows based on the diagonal values is easy:
In [192]: A=np.array([[5,7,8],[7,2,9],[8,9,3]])
In [193]: A
Out[193]:
array([[5, 7, 8],
[7, 2, 9],
[8, 9, 3]])
In [194]: np.diag(A)
Out[194]: array([5, 2, 3])
In [195]: idx=np.argsort(np.diag(A))
In [196]: idx
Out[196]: array([1, 2, 0], dtype=int32)
In [197]: A[idx,:]
Out[197]:
array([[7, 2, 9],
[8, 9, 3],
[5, 7, 8]])
Rearranging the elements in each row so that the original diagonal values are back on the diagonal will take some experimenting - trial and error. We probably have to 'roll' each row based on some value related to the sorting idx. I don't recall if there is a function to roll each row separately or if we have to iterate over the rows to do that.
In [218]: A1=A[idx,:]
In [219]: [np.roll(a,-i) for a,i in zip(A1,[1,1,1])]
Out[219]: [array([2, 9, 7]), array([9, 3, 8]), array([7, 8, 5])]
In [220]: np.array([np.roll(a,-i) for a,i in zip(A1,[1,1,1])])
Out[220]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
So roll with [1,1,1] does the job. But off hand I don't see how that can be derived. I suspect we need to generate several more test cases, possibly larger ones, and look for a pattern.
That roll probably has something to do with how much the row has moved, the difference between the original position and the new one. Let's try:
np.arange(3)-idx
In [222]: np.array([np.roll(a,i) for a,i in zip(A1,np.arange(3)-idx)])
Out[222]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
Applying the sorting idx to both rows and columns seems to do the trick as well:
In [227]: A[idx,:][:,idx]
Out[227]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
In [229]: A[idx[:,None],idx]
Out[229]:
array([[2, 9, 7],
[9, 3, 8],
[7, 8, 5]])
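Applying the same permutation to rows and columns moves element A[i, i] to another diagonal position, which is why the diagonal ends up sorted for any matrix size. A minimal sketch wrapping this in a function; the name sort_by_diagonal and the descending flag are illustrative, and np.ix_ is just another way to write A[idx[:,None], idx]:

```python
import numpy as np

def sort_by_diagonal(A, descending=False):
    """Hypothetical helper: permute rows and columns so the diagonal is sorted."""
    idx = np.argsort(np.diag(A))
    if descending:
        idx = idx[::-1]
    # np.ix_ builds the open mesh, so result[i, j] == A[idx[i], idx[j]]
    return A[np.ix_(idx, idx)]

A = np.array([[5, 7, 8],
              [7, 2, 9],
              [8, 9, 3]])
sorted_A = sort_by_diagonal(A)
print(sorted_A)
```

The cost is one argsort of N values plus one gather of N x N elements, with no Python-level loop.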
Here I simplify a straightforward solution that has been stated before but is hard to get your head around.
This is useful if you want to sort a table (e.g. a confusion matrix) by its diagonal magnitude and arrange rows and columns accordingly.
>>> A=np.array([[5,1,4],[7,2,9],[8,0,3]])
>>> A
array([[5, 1, 4],
[7, 2, 9],
[8, 0, 3]])
>>> diag = np.diag(A)
>>> diag
array([5, 2, 3])
>>> idx=np.argsort(diag) # get the order of the items on the diagonal
>>> A[idx,:][:,idx] # reorder rows and columns based on the order of the items on the diagonal
array([[2, 9, 7],
[0, 3, 8],
[1, 4, 5]])
If you want to sort in descending order, just add idx = idx[::-1] # reverse order

Numpy array, specifying what elements to return

Say I have the following 5x5 numpy array called A
array([[6, 7, 7, 7, 8],
[4, 2, 5, 5, 9],
[1, 2, 4, 7, 4],
[0, 7, 3, 6, 8],
[4, 9, 6, 1, 6]])
and this 5x5 array called F
array([[1,0,0,0,0],
[1,0,0,0,0],
[1,0,0,0,0],
[1,0,0,0,0],
[0,0,0,0,0]])
I've been trying to use np.copyto, but I can't wrap my head around why it is not working/how it works. It raises:
ValueError: could not broadcast input array from shape (5,5) into shape (2)
Is there an easy way to get only the values of A that have a corresponding 1 in F when F is laid over A? I.e. it would return 6, 4, 1, 0.
You can just do this little trick: A[F==1]
In [8]:
A[F==1]
Out[8]:
array([6, 4, 1, 0])
Check out Boolean indexing
To use np.copyto make sure that the destination array is np.empty.
This basically solved my problem.
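Both approaches can be sketched side by side. This is a minimal sketch using the A and F from the question; it assumes the goal of np.copyto is a full-size array with the unselected positions left at 0, which the question does not state explicitly:

```python
import numpy as np

A = np.array([[6, 7, 7, 7, 8],
              [4, 2, 5, 5, 9],
              [1, 2, 4, 7, 4],
              [0, 7, 3, 6, 8],
              [4, 9, 6, 1, 6]])
F = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])

mask = F.astype(bool)

# Boolean indexing returns the selected values as a flat 1-D array.
selected = A[mask]

# np.copyto needs a destination with the same (or broadcastable) shape as A;
# the where argument restricts which elements are copied.
out = np.zeros_like(A)
np.copyto(out, A, where=mask)

print(selected)
print(out)
```

The broadcast error in the question is consistent with a destination whose shape did not match the (5, 5) source.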
