scipy equivalent of numpy.prod() for sparse matrices - python

I am looking for an equivalent of numpy.prod to be used with the sparse representations that scipy offers (scipy.sparse). Specifically, I'm trying to compute the product along a single axis. I can do it by first converting to dense (M.todense().prod(axis=0)), but am looking for something more efficient.

For a prod reduction along each column, i.e. axis=0, the output can only be non-zero for columns whose entries are all non-zero. We can use that fact to write a custom version, like so -
import numpy as np

def sparse_prod_axis0(A):
    # Valid mask of row length that has all non-zeros along each col
    valid_mask = A.getnnz(axis=0) == A.shape[0]  # Thanks to @hpaulj for this!
    # Initialize output array of zeros
    out = np.zeros(A.shape[1], dtype=A.dtype)
    # Set valid positions with the prod of each valid column
    out[valid_mask] = np.prod(A[:, valid_mask].A, axis=0)
    return np.matrix(out)
Sample run -
In [92]: from scipy.sparse import csr_matrix
    ...: a = np.random.randint(0, 4, (5, 10))
    ...: A = csr_matrix(a)
    ...:
In [93]: A.todense().prod(axis=0)
Out[93]: matrix([[ 0,  0,  6, 48,  0,  0,  0,  0, 72,  0]])
In [94]: sparse_prod_axis0(A)
Out[94]: matrix([[ 0,  0,  6, 48,  0,  0,  0,  0, 72,  0]])
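The same masking idea should extend to a prod reduction along each row (axis=1). Here is a minimal sketch of that variant, mirroring the function above (not part of the original answer) -
import numpy as np

def sparse_prod_axis1(A):
    # Only rows whose stored-entry count equals the number of columns can
    # have an all-non-zero product; every other row multiplies to zero.
    valid_mask = A.getnnz(axis=1) == A.shape[1]
    out = np.zeros(A.shape[0], dtype=A.dtype)
    # Dense prod only over the (few) fully populated rows
    out[valid_mask] = np.prod(A[valid_mask].A, axis=1)
    return np.matrix(out).T  # column vector, matching M.todense().prod(axis=1)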

Related

How to replace the N smallest elements in each row of numpy array?

I would like to replace the N smallest elements in each row with 0, so that the resulting array respects the same order and shape as the original array.
Specifically, if the original numpy array is:
import numpy as np
x = np.array([[0,50,20],[2,0,10],[1,1,0]])
And N = 2, I would like for the result to be the following:
x = np.array([[0,50,0],[0,0,10],[0,1,0]])
I tried the following, but in the last row it replaces 3 elements instead of 2 (because it replaces both 1s and not only one)
import numpy as np
N = 2
x = np.array([[0, 50, 20], [2, 0, 10], [1, 1, 0]])
x_sorted = np.sort(x, axis=1)
x_sorted[:, N:] = 0
replace = x_sorted.copy()
final = np.where(np.isin(x, replace), 0, x)
Note that this is a small example and I would like it to work for a much bigger matrix.
Thanks for your time!
One way using numpy.argsort:
N = 2
x[x.argsort().argsort() < N] = 0
Output:
array([[ 0, 50,  0],
       [ 0,  0, 10],
       [ 0,  1,  0]])
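The double argsort works because the first argsort gives the order that would sort each row, and the second converts that order into per-element ranks; elements with rank below N are the N smallest. A small illustration on a single row:
import numpy as np

row = np.array([2, 0, 10])
order = row.argsort()     # indices that would sort the row -> [1, 0, 2]
ranks = order.argsort()   # rank of each original element   -> [1, 0, 2]
print(ranks < 2)          # mask of the two smallest        -> [ True  True False]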
Use numpy.argpartition to find the indices of the N smallest elements, and then use those indices to replace the values:
N = 2
idy = np.argpartition(x, N, axis=1)[:, :N]
x[np.arange(len(x))[:, None], idy] = 0
x
array([[ 0, 50,  0],
       [ 0,  0, 10],
       [ 1,  0,  0]])
Note that if there are ties, which values get replaced can differ depending on the algorithm used.
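If deterministic tie-breaking matters (say, always zeroing the earlier occurrence of a tied value), one option is the argsort variant with a stable sort, so equal values are ranked by position. A sketch of that, assuming this tie behaviour is what you want:
import numpy as np

N = 2
x = np.array([[0, 50, 20], [2, 0, 10], [1, 1, 0]])
# A stable sort keeps equal values in their original order, so ties are
# broken by position (earlier elements get the smaller rank).
ranks = np.argsort(x, axis=1, kind='stable').argsort(axis=1)
x[ranks < N] = 0
print(x)
# [[ 0 50  0]
#  [ 0  0 10]
#  [ 0  1  0]]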

What is the most efficient way to convert from a list of values to a scipy sparse matrix?

I have a list of values that I'm using a loop to convert to a scipy.sparse.dok_matrix. I'm aware of numpy.bincount but it doesn't work with sparse matrices. I'm wondering if there is a more efficient way to perform this conversion because the construction time for a dok_matrix is really long.
Example below for one row but I'm scaling to a 2D matrix by looping. The number of times a value x appears in the input list is the value of the xth element of the result matrix.
values = [1, 3, 3, 4]
expected_result = [0, 1, 0, 2, 1]
from scipy.sparse import dok_matrix

matrix = dok_matrix((1, MAXIMUM_EXPECTED_VALUE))
for value in values:
    matrix[0, value] = matrix.get((0, value)) + 1
MAXIMUM_EXPECTED_VALUE is in the order of 100000000 but len(values) < 100, which is why I'm using a sparse matrix. Possibly off-topic: there are also only a little over 10000 actual values that are used in the range of MAXIMUM_EXPECTED_VALUE but I think hashing to a contiguous range and converting back might be more complicated.
Looks like the standard coo-style inputs suit your case:
In [143]: from scipy import sparse
In [144]: values = [1,3,3,4]
In [145]: col = np.array(values)
In [146]: row = np.zeros_like(col)
In [147]: data = np.ones_like(col)
In [148]: M = sparse.coo_matrix((data, (row,col)), shape=(1,10))
In [149]: M
Out[149]:
<1x10 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in COOrdinate format>
In [150]: M.A
Out[150]: array([[0, 1, 0, 2, 1, 0, 0, 0, 0, 0]])
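The question mentions looping to build a 2D matrix; the same coo construction can handle many rows at once if you collect (row, col) pairs up front, since coo_matrix sums duplicate entries for you. A sketch, where rows_of_values is a hypothetical list of per-row value lists and the small MAXIMUM_EXPECTED_VALUE is only a placeholder:
import numpy as np
from scipy import sparse

# hypothetical input: one list of values per row of the result
rows_of_values = [[1, 3, 3, 4],
                  [0, 0, 2]]
MAXIMUM_EXPECTED_VALUE = 10  # placeholder; the question's is ~1e8

row = np.concatenate([np.full(len(v), i) for i, v in enumerate(rows_of_values)])
col = np.concatenate([np.asarray(v) for v in rows_of_values])
data = np.ones_like(col)

# duplicate (row, col) entries are summed, which gives the counts
M = sparse.coo_matrix((data, (row, col)),
                      shape=(len(rows_of_values), MAXIMUM_EXPECTED_VALUE))
print(M.toarray())
# [[0 1 0 2 1 0 0 0 0 0]
#  [2 0 1 0 0 0 0 0 0 0]]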

Pytorch: accessing a subtensor using lists of indices

I have a pair of tensors S and T of dimensions (s1,...,sm) and (t1,...,tn) with si < ti. I want to specify a list of indices in each dimension of T to "embed" S in T. If I1 is a list of s1 indices in (0,1,...,t1) and likewise for I2 up to In, I would like to do something like
T.select(I1,...,In)=S
that will have the effect that now T has entries equal to the entries of S over the indices (I1,...,In).
for example
S =
[[1, 1],
 [1, 1]]
T =
[[0, 0, 0],
 [0, 0, 0],
 [0, 0, 0]]
T.select([0, 2], [0, 2]) = S
T =
[[1, 0, 1],
 [0, 0, 0],
 [1, 0, 1]]
If you're flexible with using NumPy only for the indices part, then here's one approach: construct an open mesh using numpy.ix_() and use this mesh to fill in the values from the tensor S. If this is not acceptable, then you can use torch.meshgrid().
Below is an illustration of both approaches with descriptions interspersed in comments.
# input tensors to work with
In [174]: T
Out[174]:
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
# I'm using a tensor with unique values just for clarity, but any tensor should work.
In [175]: S
Out[175]:
tensor([[10, 11],
        [12, 13]])
# indices where we want the values from `S` to be filled in, along both dimensions
In [176]: idxs = [[0,2], [0,2]]
Now we will leverage np.ix_() or torch.meshgrid() to generate an open mesh by passing in the indices:
# mesh using `np.ix_`
In [177]: mesh = np.ix_(*idxs)
# as an alternative, we can use `torch.meshgrid()`
In [191]: mesh = torch.meshgrid([torch.tensor(lst) for lst in idxs])
# replace the values from tensor `S` using basic indexing
In [178]: T[mesh] = S
# sanity check!
In [179]: T
Out[179]:
tensor([[10,  0, 11],
        [ 0,  0,  0],
        [12,  0, 13]])
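As a side note (this relies on PyTorch's NumPy-style advanced indexing and is not covered in the answer above), you could also skip the mesh and broadcast the index tensors directly:
import torch

T = torch.zeros(3, 3, dtype=torch.long)
S = torch.tensor([[10, 11],
                  [12, 13]])
rows = torch.tensor([0, 2])
cols = torch.tensor([0, 2])

# rows[:, None] has shape (2, 1) and cols has shape (2,); together they
# broadcast to the (2, 2) grid of positions, just like np.ix_/meshgrid.
T[rows[:, None], cols] = S
print(T)
# tensor([[10,  0, 11],
#         [ 0,  0,  0],
#         [12,  0, 13]])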

what is the use case of numpy array of scalar value?

In the latest scipy version, I found:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> a = csr_matrix((3, 4), dtype=np.int8)
>>> a[0,0]
array(0) #instead of `0`
and you can create a numpy array of a scalar value (instead of a vector/matrix) with np.array(0), which is different from np.array([0]). What is the use case of np.array(0)? How do I get the value inside the array from np.array(0) (without a type conversion like int())?
You've created a sparse matrix, shape (3,4), but no elements:
In [220]: a = sparse.csr_matrix((3, 4), dtype=np.int8)
In [221]: a
Out[221]:
<3x4 sparse matrix of type '<class 'numpy.int8'>'
        with 0 stored elements in Compressed Sparse Row format>
In [222]: a.toarray()
Out[222]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)
Selecting one element:
In [223]: a[0,0]
Out[223]: array(0, dtype=int8)
Converting it to a dense np.matrix:
In [224]: a.todense()
Out[224]:
matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=int8)
In [225]: a.todense()[0,0]
Out[225]: 0
and to other sparse formats:
In [226]: a.tolil()[0,0]
Out[226]: 0
In [227]: a.todok()[0,0]
Out[227]: 0
It looks like csr is somewhat unique in returning a scalar array like this. I'm not sure if it's intentional, a feature, or a bug. I haven't noticed it before. Usually we work with the whole matrix, rather than specific elements.
But a 0d array is allowed, even if in most cases it isn't useful. If we can have 2d or 1d arrays, why not 0d?
There are a couple of ways of extracting that element from a 0d array:
In [233]: np.array(0, 'int8')
Out[233]: array(0, dtype=int8)
In [234]: _.shape
Out[234]: ()
In [235]: __.item()
Out[235]: 0
In [236]: ___[()] # index with an empty tuple
Out[236]: 0
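Putting the same extraction options into one self-contained snippet (just a restatement of the session above, without the `_` placeholders):
import numpy as np

x = np.array(0, dtype=np.int8)   # 0d array: shape () and ndim 0
print(x.shape, x.ndim)           # () 0
print(x.item())                  # 0 -- a plain Python int
print(x[()])                     # 0 -- empty-tuple indexing gives a NumPy scalar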
Scipy version 1.3.0 release notes includes:
CSR and CSC sparse matrix fancy indexing performance has been improved substantially
https://github.com/scipy/scipy/pull/7827 - looks like this pull request was a long time in coming, and had a lot of faults (and may still). If this behavior is a change from previous scipy releases, we need to see if there's a related issue (and possibly create one).
https://github.com/scipy/scipy/pull/10207 BUG: Compressed matrix indexing should return a scalar
Looks like it will be fixed in 1.4.
What are they?
They're seemingly a single-element array, i.e. an array with one element.
How do I get the value out of it?
By using:
>>> np.array(0).item()
0

Matrix Multiplication: Multiply each row of matrix by another 2D matrix in Python

I am trying to remove the loop from this matrix multiplication (and learn more about optimizing code in general), and I think I need some form of np.broadcasting or np.einsum, but after reading up on them, I'm still not sure how to use them for my problem.
A = np.array([[ 1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10],
              [11, 12, 13, 14, 15]])
# A is a 3x5 matrix, so its shape is (3, 5) (and A[0] has shape (5,))
B = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3]])
# B is a 3x3 (diagonal) matrix, with a shape of (3, 3)
C = np.zeros(5)
for i in range(5):
    C[i] = np.linalg.multi_dot([A[:, i].T, B, A[:, i]])
# Each step of the matrix math is [1x3]*[3x3]*[3x1], giving a scalar value per column
# C becomes a [5x1] matrix with a shape of (5,)
I know I can't just do np.linalg.multi_dot by itself, because that results in a (5,5) array.
I also found this: Multiply matrix by each row of another matrix in Numpy, but I can't tell if it's actually the same problem as mine.
In [601]: C
Out[601]: array([436., 534., 644., 766., 900.])
This is a natural for einsum. I use i as you do, to denote the index that carries through to the result. j and k are indices that are used in the sum of products.
In [602]: np.einsum('ji,jk,ki->i',A,B,A)
Out[602]: array([436, 534, 644, 766, 900])
It can probably also be done with matmul, though it may require adding a dimension and later squeezing.
dot approaches that use diag do a lot more work than necessary. The diag throws out a lot of values.
To use matmul we have to make the i dimension the first axis of 3d arrays. That's the 'passive' one that carries over to the result:
In [603]: A.T[:,None,:] @ B @ A.T[:,:,None]
Out[603]:
array([[[436]],   # (5,1,1) result
       [[534]],
       [[644]],
       [[766]],
       [[900]]])
In [604]: (A.T[:,None,:] @ B @ A.T[:,:,None]).squeeze()
Out[604]: array([436, 534, 644, 766, 900])
Or index the extra dimensions away: (A.T[:,None,:] @ B @ A.T[:,:,None])[:,0,0]
You can chain two calls to dot together, then get the diagonal:
# your original output:
# >>> C
# array([436., 534., 644., 766., 900.])
>>> np.diag(np.dot(np.dot(A.T,B), A))
array([436, 534, 644, 766, 900])
Or equivalently, use your original multi_dot train of thought, but take the diagonal of the resulting 5x5 array. This may have some performance boosts (according to the docs)
>>> np.diag(np.linalg.multi_dot([A.T, B, A]))
array([436, 534, 644, 766, 900])
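For a quick sanity check (using the A and B from the question) that the loop, einsum, matmul and diag variants all give the same result, something like this should do:
import numpy as np

A = np.array([[ 1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10],
              [11, 12, 13, 14, 15]])
B = np.diag([1, 2, 3])

C_loop = np.array([np.linalg.multi_dot([A[:, i], B, A[:, i]]) for i in range(5)])
C_einsum = np.einsum('ji,jk,ki->i', A, B, A)
C_matmul = (A.T[:, None, :] @ B @ A.T[:, :, None]).squeeze()
C_diag = np.diag(A.T @ B @ A)

assert np.allclose(C_loop, C_einsum)
assert np.allclose(C_loop, C_matmul)
assert np.allclose(C_loop, C_diag)
print(C_loop)   # [436 534 644 766 900]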
To add to the answers: if you want to multiply the matrices you can make use of broadcasting. Edit: note this is element-wise multiplication, not dot products. For that you can use the dot methods above.
B[..., None] * A
Gives:
array([[[ 1,  2,  3,  4,  5],
        [ 0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0]],

       [[ 0,  0,  0,  0,  0],
        [12, 14, 16, 18, 20],
        [ 0,  0,  0,  0,  0]],

       [[ 0,  0,  0,  0,  0],
        [ 0,  0,  0,  0,  0],
        [33, 36, 39, 42, 45]]])
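If you do want the original C from this broadcast product, it still needs a multiplication by A along the other 3-sized axis and a sum over both of them. A sketch of that reduction (it is equivalent to the einsum above):
import numpy as np

A = np.array([[ 1,  2,  3,  4,  5],
              [ 6,  7,  8,  9, 10],
              [11, 12, 13, 14, 15]])
B = np.diag([1, 2, 3])

# (j, k, i) products A[j,i] * B[j,k] * A[k,i], then sum over j and k
C = (A[:, None, :] * B[..., None] * A[None, :, :]).sum(axis=(0, 1))
print(C)   # [436 534 644 766 900]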
