Modifying a sparse matrix using advanced indexing in Python

I'm trying to use advanced indexing to modify a big sparse matrix. Say you have the following code:
import numpy as np
import scipy.sparse as sp
A = sp.lil_matrix((10, 10))
a = np.array([[1,2],[3,4]])
idx = [1,4]
A[idx, idx] += a
Why doesn't this code work? It raises the error:
ValueError: shape mismatch in assignment

With idx = [1,4], A[idx, idx] pairs the two index lists element by element, so it returns a sparse matrix of shape (1,2) holding only A[1,1] and A[4,4]. However, a has shape (2,2), hence the shape mismatch. To update A[1,1], A[1,4], A[4,1] and A[4,4] with the entries of a, index the rows with a column vector so that the two index arrays broadcast to a (2,2) grid:
import numpy as np
import scipy.sparse as sp
A = sp.lil_matrix((10, 10))
a = np.array([[1,2],[3,4]])
idx = np.array([1,4])
A[idx[:, np.newaxis], idx] += a # use broadcasting
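The difference between the two indexing forms can be checked directly. The following is a minimal sketch, assuming a SciPy version where lil_matrix supports broadcast index arrays; the names paired, rows and block are only illustrative, and the update is written as an explicit read, add and write-back rather than with +=.

```python
import numpy as np
import scipy.sparse as sp

A = sp.lil_matrix((10, 10))
a = np.array([[1, 2], [3, 4]])
idx = np.array([1, 4])

# Paired indexing: selects only the entries (1, 1) and (4, 4).
paired = A[idx, idx]
print(paired.shape)

# Outer indexing: a (2, 1) row index broadcasts against a (2,) column index.
rows = idx[:, np.newaxis]
block = A[rows, idx].toarray()   # current values as a dense (2, 2) array
A[rows, idx] = block + a         # write the updated block back

print(A[rows, idx].toarray())
```

Reading the block into a dense array first keeps the addition in plain NumPy, which may be easier to reason about than in-place arithmetic on a sparse slice.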

Related

Tensor matrix multiplication returning vector einsum

I am confused about the following example of a matrix-tensor multiplication that returns a vector. At first glance I thought it meant multiplying the first dimension of the tensor dydx by the matrix dLdy, but I don't get the expected result, as shown below. So what is the meaning of this einsum?
import torch
import numpy as np
dLdy = torch.randn(2,2)
dydx = torch.randn(2,2,2)
torch.einsum('jk,jki->i', dLdy, dydx)
tensor([0.3115, 3.7255])
dLdy
tensor([[-0.4845, 0.6838],
[-1.1723, 1.4914]])
dydx
tensor([[[ 1.5496, -1.2722],
[ 0.1221, 1.0495]],
[[-1.4882, 0.0307],
[-0.5134, 1.6276]]])
(dLdy * dydx[0]).sum()
-0.1985
With A = dLdy and B = dydx, this is a contraction (sum) over the two shared indices j and k, which are the first two dimensions of B, leaving only i:
res(i) = sum_{j,k} A(j,k)B(j,k,i)
for example:
import torch
import numpy as np
dLdy = torch.randn(2,2)
dydx = torch.randn(2,2,2)
print(torch.einsum('jk,jki->i', dLdy, dydx))
print((dLdy * dydx[:,:,0]).sum())
print((dLdy * dydx[:,:,1]).sum())
produces
tensor([4.6025, 1.8987])
tensor(4.6025)
tensor(1.8987)
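i.e. (dLdy * dydx[:,:,0]).sum() is the first element of the resulting vector, and so on. Note that dydx[0] selects dydx[0,:,:], which fixes j rather than i, so (dLdy * dydx[0]).sum() is not an element of the result.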
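The same contraction can be cross-checked without PyTorch. This is a small sketch that swaps in NumPy (np.einsum accepts the same subscript string); the variable names manual and via_tensordot are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
dLdy = rng.standard_normal((2, 2))      # indices j, k
dydx = rng.standard_normal((2, 2, 2))   # indices j, k, i

res = np.einsum('jk,jki->i', dLdy, dydx)

# Explicit version: for each i, sum dLdy[j, k] * dydx[j, k, i] over j and k.
manual = np.array([(dLdy * dydx[:, :, i]).sum() for i in range(dydx.shape[2])])

# The same contraction written with tensordot over axes (0, 1) of both operands.
via_tensordot = np.tensordot(dLdy, dydx, axes=([0, 1], [0, 1]))

print(res, manual, via_tensordot)
```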

Cannot reshape numpy array to vector

I am trying to reshape an (N, 1) array d to an (N,) vector. According to this solution and my own experience with numpy, the following code should convert it to a vector:
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.datasets import make_circles
X, labels = make_circles(n_samples=150, noise=0.1, factor=0.2)
A = kneighbors_graph(X, n_neighbors=5)
d = np.sum(A, axis=1)
d = d.reshape(-1)
However, d.shape gives (1, 150)
The same happens when I exactly replicate the code for the linked solution. Why is the numpy array not reshaping?
The issue is that kneighbors_graph returns the nearest-neighbor graph as a scipy.sparse csr_matrix. Applying np.sum to it returns a numpy.matrix, a data type that (in my opinion) should no longer exist. numpy.matrix objects are always two-dimensional, so reshape(-1) yields shape (1, N) instead of (N,), and many numpy operations on them return unexpected results.
The solution was converting the csr_matrix to a regular numpy array:
A = kneighbors_graph(X, n_neighbors=5)
A = A.toarray()
d = np.sum(A, axis=1)
d = d.reshape(-1)
Now we have d.shape = (150,)
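For a large graph, converting the whole matrix with toarray() may be costly. A possible alternative, sketched here on a small hand-made csr_matrix rather than the sklearn output, is to sum on the sparse matrix and convert only the result; d_flat is an illustrative name.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1, 0, 1],
                            [0, 0, 0],
                            [0, 1, 0]]))

d = A.sum(axis=1)           # np.matrix of shape (3, 1) for a csr_matrix
print(type(d), d.shape)
print(d.reshape(-1).shape)  # still two-dimensional

# Converting to a plain ndarray first makes the flattening behave as expected.
d_flat = np.asarray(d).ravel()
print(d_flat.shape)
```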

dot product of arrays with missing second dimension

Kindly take a look at the code below.
import numpy as np
a = np.random.rand(10)
b = np.random.rand(10)
c = np.dot(a,b)
a and b are arrays of shape (10,). The dot product, however, is a scalar value.
Following the matrix multiplication rule (mXn * nXp = mXp), I expected an array of shape (10,). Can you please explain?
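A likely explanation is that shape (10,) arrays are one-dimensional: they have no row or column orientation, and np.dot on two 1-D arrays computes the inner product, which is a scalar. The mXn * nXp rule only applies once the arrays are given two dimensions. The sketch below compares the three products; the names inner, outer and elementwise are illustrative.

```python
import numpy as np

a = np.random.rand(10)
b = np.random.rand(10)

inner = np.dot(a, b)                          # 1-D with 1-D: inner product
outer = a[:, np.newaxis] @ b[np.newaxis, :]   # (10, 1) @ (1, 10) -> (10, 10)
elementwise = a * b                           # shape (10,)

print(np.ndim(inner), outer.shape, elementwise.shape)
```

If a result of shape (10,) is what was expected, the elementwise product a * b is probably the operation intended; its sum equals the inner product.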

Python reshape to Matlab reshape translation

I have the following Python code that I would like to run in MATLAB. What is the MATLAB equivalent of numpy's reshape syntax?
import numpy as np
a = np.random.randn(3,4,5)
for i in range(a.ndim):  # iterate over the dimensions of a
    b = np.reshape(a, [a.shape[i], -1], order = 'F')
Instead of -1 for a calculated dimension, you would simply use [] in MATLAB.
for k = 1:ndims(a)
    b = reshape(a, size(a, k), []);
end
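On the Python side, order='F' is what matches MATLAB's column-major reshape. A small sketch with an illustrative 2x3x4 array shows the shapes produced by the loop and that the column-major element order is preserved; same_order is a hypothetical name used only for this check.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

for k in range(a.ndim):
    # -1 plays the role of [] in MATLAB: the size is inferred.
    b = np.reshape(a, (a.shape[k], -1), order='F')
    print(k, b.shape)

# Column-major reshaping keeps the column-major element order unchanged.
same_order = np.array_equal(b.ravel(order='F'), a.ravel(order='F'))
print(same_order)
```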

Calculate Similarity of Sparse Matrix

I am using Python with the numpy, scipy and scikit-learn modules.
I'd like to classify the arrays in a very big sparse matrix (100,000 * 100,000).
The values in the matrix are equal to 0 or 1. The only thing I have is the indices where the value is 1.
a = [1,3,5,7,9]
b = [2,4,6,8,10]
which means
a = [0,1,0,1,0,1,0,1,0,1,0]
b = [0,0,1,0,1,0,1,0,1,0,1]
How can I convert the index arrays to a sparse array in scipy?
How can I classify those arrays quickly?
Thank you very much.
If you choose the sparse coo_matrix, you can create it by passing the row and column indices of the nonzero entries directly:
import numpy as np
from scipy.sparse import coo_matrix
nrows = 100000
ncols = 100000
row = np.array([1,3,5,7,9])   # row index of each nonzero entry
col = np.array([2,4,6,8,10])  # column index of each nonzero entry
values = np.ones(col.size)
m = coo_matrix((values, (row,col)), shape=(nrows, ncols), dtype=float)
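If a and b are instead meant as two separate rows of the matrix, as the expanded 0/1 vectors in the question suggest, each index list can become one row. The sketch below rests on that assumption; index_lists, X and overlap are illustrative names, and the dot-product overlap is only one possible similarity measure.

```python
import numpy as np
from scipy.sparse import csr_matrix

a = [1, 3, 5, 7, 9]
b = [2, 4, 6, 8, 10]
index_lists = [a, b]   # one list of column indices per row
ncols = 11             # small width for illustration

# Row i is repeated once for every index stored in index_lists[i].
rows = np.repeat(np.arange(len(index_lists)), [len(ix) for ix in index_lists])
cols = np.concatenate(index_lists)
data = np.ones(cols.size)
X = csr_matrix((data, (rows, cols)), shape=(len(index_lists), ncols))

# Entry (i, j) counts the positions where rows i and j are both 1.
overlap = (X @ X.T).toarray()
print(X.toarray())
print(overlap)
```

For a 100,000-row matrix the product X @ X.T can itself become large, so in practice it may be preferable to keep it sparse instead of calling toarray().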
