python sparse csr matrix: how to serialize it

I have a csr_matrix, which is constructed as follows:
from scipy.sparse import csr_matrix
import numpy as np
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3))
Now, to serialize the matrix (and for some other purposes), I want to get the row, col and data information back from the matrix "a".
Is there an easy way to achieve this?
Edit: a.data gives me the data, but how do I get the row and col information?

The coo format has the values that you want (this session assumes `from scipy import sparse`):
In [3]: row = np.array([0, 0, 1, 2, 2, 2])
In [4]: col = np.array([0, 2, 2, 0, 1, 2])
In [5]: data = np.array([1, 2, 3, 4, 5, 6])
In [6]: a = sparse.csr_matrix((data,(row,col)), shape=(3,3))
In [7]: a.data
Out[7]: array([1, 2, 3, 4, 5, 6])
In [8]: a.indices # csr stores the coordinates as indices and indptr
Out[8]: array([0, 2, 2, 0, 1, 2])
In [9]: a.indptr
Out[9]: array([0, 2, 3, 6])
In [10]: ac=a.tocoo()
In [11]: ac.data
Out[11]: array([1, 2, 3, 4, 5, 6])
In [12]: ac.col
Out[12]: array([0, 2, 2, 0, 1, 2])
In [13]: ac.row
Out[13]: array([0, 0, 1, 2, 2, 2])
These values describe the same matrix as the ones you passed in, but they aren't guaranteed to be identical (for example, duplicate entries are summed when the csr matrix is built).
In [14]: a.nonzero()
Out[14]: (array([0, 0, 1, 2, 2, 2]), array([0, 2, 2, 0, 1, 2]))
In [17]: a[a.nonzero()].A
Out[17]: array([[1, 2, 3, 4, 5, 6]])
nonzero also returns the coordinates, using the same coo conversion, but it first filters out any explicitly stored zeros.
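For the serialization itself, here is a minimal sketch that builds on the coo conversion above: it writes the row, col, data and shape arrays to an .npz archive with np.savez and rebuilds the matrix from them. The in-memory buffer and the key names (row, col, data, shape) are illustrative assumptions, not something the question specifies. SciPy also provides scipy.sparse.save_npz and scipy.sparse.load_npz if a ready-made file format is acceptable.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3))

# Convert to coo to get explicit (row, col, data) triplets.
coo = a.tocoo()

# Write the pieces to an in-memory .npz archive (a file path would work too).
buffer = io.BytesIO()
np.savez(buffer, row=coo.row, col=coo.col, data=coo.data,
         shape=np.array(coo.shape))

# Read the archive back and rebuild the csr matrix.
buffer.seek(0)
loaded = np.load(buffer)
shape = tuple(int(s) for s in loaded["shape"])
restored = csr_matrix((loaded["data"], (loaded["row"], loaded["col"])),
                      shape=shape)

# The round trip should reproduce the original matrix.
assert np.array_equal(restored.toarray(), a.toarray())
```

Storing the shape alongside the triplets matters because trailing empty rows or columns cannot be recovered from row and col alone.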

Related

Is there any fast way to find identical rows of two sparse matrices with different sizes?

Consider A, an n by j matrix, and B, an m by j matrix, both in SciPy with m<n. Is there any way that I can find the indices of the rows of A which are identical to rows of B?
I have tried for loops and tried to convert them into Numpy arrays. In my case, they're not working because I'm dealing with huge matrices.
Here is the link to the same question for Numpy arrays.
Edit:
An Example for A, B, and the desired output:
>>> import numpy as np
>>> from scipy.sparse import csc_matrix
>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 3, 3, 4, 5, 6])
>>> A = csc_matrix((data, (row, col)), shape=(5, 3))
>>> A.toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [3, 3, 6],
       [0, 0, 0],
       [0, 0, 0]])
>>> row = np.array([0, 2, 2, 0, 1, 2])
>>> col = np.array([0, 0, 1, 2, 2, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> B = csc_matrix((data, (row, col)), shape=(4, 3))
>>> B.toarray()
array([[1, 0, 4],
       [0, 0, 5],
       [2, 3, 6],
       [0, 0, 0]])
Desired output:
def some_function(A, B):
    # Some operations
    return indices
>>> some_function(A,B)
[0, 1, 3, 4]
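One possible approach, given only as a hedged sketch: convert both matrices to CSR, put each row into a canonical form, and compare rows through hashable keys built from the stored column indices and values, so nothing is converted to a dense array. The helper names row_keys and rows_of_a_in_b are hypothetical, and the code still loops over rows in Python, so it may not be fast enough for very large inputs.

```python
import numpy as np
from scipy.sparse import csc_matrix


def row_keys(m):
    """Return one hashable key per row from its stored columns and values."""
    m = m.tocsr().copy()   # work on a CSR copy so the input is left untouched
    m.sum_duplicates()     # merge repeated (row, col) entries
    m.eliminate_zeros()    # drop explicitly stored zeros
    m.sort_indices()       # canonical column order within each row
    keys = []
    for start, stop in zip(m.indptr[:-1], m.indptr[1:]):
        cols = tuple(m.indices[start:stop].tolist())
        vals = tuple(m.data[start:stop].tolist())
        keys.append((cols, vals))
    return keys


def rows_of_a_in_b(A, B):
    """Indices of the rows of A that also occur as a row of B."""
    b_keys = set(row_keys(B))
    return [i for i, key in enumerate(row_keys(A)) if key in b_keys]


row = np.array([0, 2, 2, 0, 1, 2])
col = np.array([0, 0, 1, 2, 2, 2])
A = csc_matrix((np.array([1, 3, 3, 4, 5, 6]), (row, col)), shape=(5, 3))
B = csc_matrix((np.array([1, 2, 3, 4, 5, 6]), (row, col)), shape=(4, 3))

result = rows_of_a_in_b(A, B)
print(result)  # [0, 1, 3, 4]
```

Empty rows all map to the same key, which is why rows 3 and 4 of A match the all-zero row of B.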

Change dtype of a non-square numpy ndarray

a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([a, b])
c holds two np.ndarrays of different sizes. When I call c.astype(np.int8), I get ValueError: setting an array element with a sequence. How can I change the dtype of c?

To specify the type of your array at creation time, simply pass dtype=xxx.
For example:
c = np.array([a,b], dtype=object)
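If you want to change the type from int64 to int8, convert with astype (assigning to a.dtype would only reinterpret the underlying bytes, not convert the values):
a = a.astype(np.int8)
b = b.astype(np.int8)
Or you can make converted copies of a and b:
c = np.array(a, dtype=np.int8)
d = np.array(b, dtype=np.int8)
Finally, if you don't have access to a and b but only to c, here is how you can do the same:
c = np.array([arr.astype(np.int8) for arr in c], dtype=object)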
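A small sketch of the difference between converting values and reinterpreting bytes, since the two are easy to confuse here: astype produces a new array with each value converted, while reinterpreting the buffer (a.view, or assigning to a.dtype) keeps the same bytes and changes the element count. The explicit int64 input dtype is an assumption made so the result does not depend on the platform.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

converted = a.astype(np.int8)    # converts each value; still three elements
reinterpreted = a.view(np.int8)  # reinterprets the same 24 bytes as int8

print(converted)            # [1 2 3]
print(converted.dtype)      # int8
print(reinterpreted.shape)  # (24,)
```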

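Assuming arr is a numpy array of dtype object containing numpy arrays, you could do the following (recent NumPy versions need dtype=object to build an array from sequences of different lengths):
arr8 = np.array([i.astype('int8') for i in arr], dtype=object)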
Demo:
arr = array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
... array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
... array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
... dtype=object)
print(arr)
array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
dtype=object)
print(np.array([i.astype('int8') for i in arr], dtype=object))
array([array([0], dtype=int8), array([0, 1], dtype=int8),
array([0, 1, 2], dtype=int8), array([0, 1, 2, 3], dtype=int8),
array([0, 1, 2, 3, 4], dtype=int8),
array([0, 1, 2, 3, 4, 5], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int8)], dtype=object)

Maybe you could do something like this (df is assumed here to be a pandas DataFrame whose desired_column holds one sequence per row):
arr = list()
for row in range(len(df.desired_column)):
    arr.append(np.array(df.desired_column.loc[row], dtype=np.int8))
arr = np.array(arr)
This way every element of arr will be a numpy array with the desired dtype, np.int8 in this example.

How can I get the indexes of the elements of a numpy array that equal one

How can I get the indexes of the elements equal to 1 in a numpy array, in an elegant way?
I tried to do it with a loop:
indexes = []
for i in range(len(array)):
    if array[i] == 1:
        indexes += [i]

Use np.where (for a 0-1 array, the nonzero positions are exactly the positions of the ones):
a = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0])
np.where(a)
Output:
(array([2, 3, 5, 6, 7], dtype=int64),)
Or np.nonzero:
a.nonzero()
Output:
(array([2, 3, 5, 6, 7], dtype=int64),)
You can also index into np.arange:
np.arange(len(a))[a.astype(bool)]
Output:
array([2, 3, 5, 6, 7])

numpy.argwhere() is a good fit for this. Since it returns one row per match, we also have to remove the singleton dimension using squeeze(). Below are two cases.
If your input is a 0-1 array, then:
In [101]: a = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0])
In [102]: np.argwhere(a).squeeze()
Out[102]: array([2, 3, 5, 6, 7])
On the other hand, if you have a generic array, then:
In [98]: np.random.seed(23)
In [99]: arr = np.random.randint(0, 5, 10)
In [100]: arr
Out[100]: array([3, 0, 1, 0, 4, 3, 2, 1, 3, 3])
In [106]: np.argwhere(arr == 1).squeeze()
Out[106]: array([2, 7])
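As a hedged alternative sketch, np.flatnonzero returns the matching positions as a 1-d array directly, so no squeeze is needed. This matters when there is exactly one match, because squeeze then collapses the argwhere result to a 0-d array. The array named single is made up for illustration.

```python
import numpy as np

arr = np.array([3, 0, 1, 0, 4, 3, 2, 1, 3, 3])

# Positions where the condition holds, always as a 1-d array.
idx = np.flatnonzero(arr == 1)
print(idx)  # [2 7]

# With a single match, squeeze() drops every dimension.
single = np.array([0, 1, 0])
print(np.argwhere(single == 1).squeeze().shape)  # ()
print(np.flatnonzero(single == 1).shape)         # (1,)
```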

Clone items in a list by index

I have a numpy array
np.array([[1,4,3,5,2],
[3,2,5,2,3],
[5,2,4,2,1]])
and I want to clone items by their indexes. For example, I have an index of
np.array([[1,4],
[2,4],
[1,4]])
These are the positions of the items to clone in each row, e.g. the first pair [1,4] indexes the values 4 and 2 in the first row.
Given the initial array and the index array, I want to end up with a new numpy array:
np.array([[1,4,4,3,5,2,2],
[3,2,5,5,2,3,3],
[5,2,2,4,2,1,1]])
The effect is that the selected column values are repeated once. Is there any way to do this? Thanks.

I commented that this could be viewed as a 1d problem. There's nothing 2d about it, except that you are adding 2 values per row, so you end up with a 2d array. The other key idea is that np.repeat lets us repeat selected elements several times.
In [70]: arr =np.array([[1,4,3,5,2],
...: [3,2,5,2,3],
...: [5,2,4,2,1]])
...:
In [71]: idx = np.array([[1,4],
...: [2,4],
...: [1,4]])
...:
Make an array of 'repeat' counts: start with 1 for everything, and add 1 for the elements we want to duplicate:
In [72]: repeats = np.ones_like(arr)
In [73]: repeats
Out[73]:
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
In [74]: for i,j in enumerate(idx):
    ...:     repeats[i,j] += 1
    ...:
In [75]: repeats
Out[75]:
array([[1, 2, 1, 1, 2],
[1, 1, 2, 1, 2],
[1, 2, 1, 1, 2]])
Now just apply repeat to the flattened arrays, and reshape:
In [76]: np.repeat(arr.ravel(),repeats.ravel())
Out[76]: array([1, 4, 4, 3, 5, 2, 2, 3, 2, 5, 5, 2, 3, 3, 5, 2, 2, 4, 2, 1, 1])
In [77]: _.reshape(3,-1)
Out[77]:
array([[1, 4, 4, 3, 5, 2, 2],
[3, 2, 5, 5, 2, 3, 3],
[5, 2, 2, 4, 2, 1, 1]])
I may add a list solution, once I work that out.
A row-by-row np.insert solution (fleshing out the concept suggested by @f5r5e5d):
Test with one row:
In [81]: row=arr[0]
In [82]: i=idx[0]
In [83]: np.insert(row,i,row[i])
Out[83]: array([1, 4, 4, 3, 5, 2, 2])
Now apply iteratively to all rows. The list of arrays can then be turned back into an array:
In [84]: [np.insert(row,i,row[i]) for i,row in zip(idx,arr)]
Out[84]:
[array([1, 4, 4, 3, 5, 2, 2]),
array([3, 2, 5, 5, 2, 3, 3]),
array([5, 2, 2, 4, 2, 1, 1])]
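The repeat-count approach above can also be written without the Python loop. This is a sketch, assuming every row duplicates the same number of elements (otherwise the final reshape fails); it uses np.add.at with a broadcast row index to bump the counts.

```python
import numpy as np

arr = np.array([[1, 4, 3, 5, 2],
                [3, 2, 5, 2, 3],
                [5, 2, 4, 2, 1]])
idx = np.array([[1, 4],
                [2, 4],
                [1, 4]])

# Start with one copy of every element.
repeats = np.ones_like(arr)

# Row numbers as a column vector so they broadcast against idx.
rows = np.arange(arr.shape[0])[:, None]

# Add one extra copy at each (row, column) position listed in idx.
np.add.at(repeats, (rows, idx), 1)

# Repeat on the flattened arrays, then restore one row per input row.
result = np.repeat(arr.ravel(), repeats.ravel()).reshape(arr.shape[0], -1)
print(result)
```

np.add.at is used instead of plain fancy-index assignment so that an index listed twice in a row would be counted twice.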

np.insert may help:
a = np.array([[1,4,3,5,2],
[3,2,5,2,3],
[5,2,4,2,1]])
i = np.array([[1,4],
[2,4],
[1,4]])
np.insert(a[0], 4, a[0,4])
Out[177]: array([1, 4, 3, 5, 2, 2])
As mentioned, np.insert can insert more than one element at a time when given a one-dimensional obj:
np.insert(a[0], i[0], a[0,i[0]])
Out[187]: array([1, 4, 4, 3, 5, 2, 2])

transform an array of array to an array of numbers

I have an array of values and an array of repeat counts
>>> x=np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> n=np.random.randint(1,3,5)
>>> n
array([2, 1, 1, 2, 2])
And I do
>>> y=np.array([np.repeat(x[i],n[i]) for i in range(5)])
>>> y
array([array([0, 0]), array([1]), array([2]), array([3, 3]), array([4, 4])], dtype=object)
But I want my result to be array([0, 0, 1, 2, 3, 3, 4, 4]).
How can I do it?

I think this is simpler than you're making it (docs):
>>> x = np.arange(5)
>>> y = np.array([2, 1, 1, 2, 2])
>>> np.repeat(x,y)
array([0, 0, 1, 2, 3, 3, 4, 4])
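If the per-element pieces have already been built, as in the question, np.concatenate can flatten them. The sketch below shows both routes giving the same result; the name pieces is made up for illustration, and the counts are fixed rather than random so the output is reproducible.

```python
import numpy as np

x = np.arange(5)
n = np.array([2, 1, 1, 2, 2])

# One small array per element, as in the question.
pieces = [np.repeat(x[i], n[i]) for i in range(len(x))]

# Joining the pieces end to end gives the flat result.
flat = np.concatenate(pieces)
print(flat)  # [0 0 1 2 3 3 4 4]

# Passing the counts array to np.repeat gets there in one call.
direct = np.repeat(x, n)
print(np.array_equal(flat, direct))  # True
```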
