sparse solution in dense linear system of equations in scipy - python

I have a dense, over-determined linear system. I can find least-squares solutions with np.linalg.lstsq, but I want a solution that is mostly zeros.
Here is an artificial example that produces similar results to my code:
import numpy as np
# Build matrix and vector
M = np.array([[2.71*i, 3.14*i, 4*i, 5.99*i, 6*i] for i in range(1, 1000) ])
v = np.array([[i] for i in range(1, 1000)])
solution, _, _, _ = np.linalg.lstsq(M, v)
print("solution", solution)
# print("error", np.linalg.norm(v - np.dot(M, solution)))
I get a solution with many small non-zero terms:
solution [[ 0.06342455]
[ 0.1076765 ]
[ 0.12445127]
[-0.00251504]
[ 0.00121254]]
The solution is not unique, and I would like one that is mostly zeros, such as [0, 0, -.5, 0, .5] or [1/2.71, 0, 0, 0, 0]. Is there an easy way to do this?
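One way to sketch this, assuming the matrix has only a handful of columns, is an exhaustive best-subset search: try every column subset of size 1, 2, ... and keep the first one whose least-squares fit reproduces v. The helper name sparsest_lstsq and the tolerance are hypothetical choices for illustration, not a SciPy API, and the search is exponential in the number of columns, so it is not suitable for wide matrices.

```python
import itertools
import numpy as np

def sparsest_lstsq(M, v, tol=1e-8):
    """Hypothetical helper: least-squares solution with the fewest non-zeros.

    Tries every column subset of size 1, 2, ... and returns the first one
    whose least-squares fit reproduces v to within a relative tolerance.
    """
    n_cols = M.shape[1]
    scale = max(1.0, np.linalg.norm(v))
    for k in range(1, n_cols + 1):
        for support in itertools.combinations(range(n_cols), k):
            cols = list(support)
            # Fit using only the selected columns
            coef, *_ = np.linalg.lstsq(M[:, cols], v, rcond=None)
            if np.linalg.norm(M[:, cols] @ coef - v) <= tol * scale:
                # Scatter the coefficients back into a full-length vector
                x = np.zeros((n_cols,) + v.shape[1:])
                x[cols] = coef
                return x
    # No subset fits within tol: fall back to the ordinary dense solution
    return np.linalg.lstsq(M, v, rcond=None)[0]

# Same artificial data as in the question
M = np.array([[2.71*i, 3.14*i, 4*i, 5.99*i, 6*i] for i in range(1, 1000)])
v = np.array([[i] for i in range(1, 1000)])
x = sparsest_lstsq(M, v)
print("solution", x.ravel())
```

For the example data every column is a multiple of v, so the search stops at the first single-column subset. For larger problems, L1-regularised methods are the usual substitute for this brute-force search.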

Related

How to explicitly pass an adjacency matrix when using scanpy.tl.louvain?

Here is the description for louvain in scanpy.
I would like to pass a specific adjacency matrix. However, the minimal example below fails with the error "Length of values (4) does not match length of index (6)". Is this due to misuse of the sparse matrix?
Code:
import scanpy as sc
import torch
import numpy as np
import networkx as nx
nodes = [[0, 0, 0, 1], [0, 0, 0, 2], [0, 10, 0, 0], [0, 11, 0, 0], [1, 0, 0, 0], [2, 0, 0, 0]]
features = torch.tensor(nodes)
print(features.shape)
edgelist = [(0,1), (1,2), (2,3)]
G = nx.Graph(edgelist)
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G) # transform to scipy sparse matrix
adata = sc.AnnData(features.numpy())
sc.pp.neighbors(adata, n_neighbors=2, use_rep='X')
sc.tl.louvain(adata, resolution=0.01, adjacency=G_adj) # pass the adj here
y_pred = adata.obs['louvain'].astype(int).to_numpy()
n_clusters = len(np.unique(y_pred))
Could you point out what is wrong and provide an example of how to explicitly pass an adjacency matrix when using scanpy.tl.louvain? Thanks!
G is a graph created with four nodes, so G_adj is a (4, 4) sparse matrix.
adata is a scanpy object with 6 observations and 4 variables. The scanpy louvain algorithm clusters observations, and therefore expects an adjacency matrix of shape (6, 6).
I'm not sure which of the two you intended:
If you really have 6 nodes, alter the graph construction so that all of them are present:
print(features.shape)
edgelist = [(0,1), (1,2), (2,3)]
G = nx.Graph()
G.add_nodes_from(range(6))
G.add_edges_from(edgelist)
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G) # transform to scipy sparse matrix
adata = sc.AnnData(features.numpy())
If you have 4 nodes, alter the adata creation line:
adata = sc.AnnData(features.numpy().T)
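The shape requirement can be checked before calling scanpy at all. The sketch below is an assumption-laden alternative that builds the symmetric adjacency matrix directly with scipy.sparse instead of networkx, for the 6-observation case; the variable names are illustrative, and the scanpy call itself is not run here.

```python
import numpy as np
import scipy.sparse as sp

n_obs = 6                               # number of observations (rows of adata.X)
edgelist = [(0, 1), (1, 2), (2, 3)]     # same edges as in the question

rows, cols = zip(*edgelist)
weights = np.ones(len(edgelist))

# Build the upper half, then symmetrise so that edge (i, j) implies (j, i)
upper = sp.coo_matrix((weights, (rows, cols)), shape=(n_obs, n_obs))
adjacency = (upper + upper.T).tocsr()

# The matrix must have one row and one column per observation
assert adjacency.shape == (n_obs, n_obs)
print(adjacency.toarray())
```

A matrix built this way could then be passed as adjacency=adjacency in the sc.tl.louvain call from the question. Nodes 4 and 5 simply have no edges.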

TypeError: Failed to convert object of type <class 'scipy.sparse.csr.csr_matrix'> to Tensor

I'm trying to compute the cosine similarity between 350k sentences using tensorflow.
My sentences are first vectorised using sklearn:
doc = df['text']
vec = TfidfVectorizer(binary=False,norm='l2',use_idf=False,smooth_idf=False,lowercase=True,stop_words='english',min_df=1,max_df=1.0,max_features=None,ngram_range=(1, 1))
X = vec.fit_transform(doc)
print(X.shape)
print(type(X))
This works very well and I get a sparse matrix back. I have then tried two ways of converting the sparse matrix to a dense one.
(1) I tried this:
dense = X.toarray()
This only works with a small amount of data (around 10k sentences), but then fails on the actual computation.
(2) I have been trying to convert the output X this way, but I get the error message in the title at the first step, K:
K = tf.convert_to_tensor(X, dtype=None, dtype_hint=None, name=None)
Y = tf.sparse.to_dense(K, default_value=None, validate_indices=True, name=None)
Any tips/ tricks to solve this mystery would be greatly appreciated. Also happy to consider batching my computations if that should be more efficient in terms of size?
You need to make a TensorFlow sparse matrix from your SciPy one. Since your matrix seems to be in CSR format, you can do it as follows:
import numpy as np
import scipy.sparse
import tensorflow as tf
def sparse_csr_to_tf(csr_mat):
    # Row pointers give the number of stored elements in each row
    indptr = tf.constant(csr_mat.indptr, dtype=tf.int64)
    elems_per_row = indptr[1:] - indptr[:-1]
    # Expand to one row index per stored element
    i = tf.repeat(tf.range(csr_mat.shape[0], dtype=tf.int64), elems_per_row)
    j = tf.constant(csr_mat.indices, dtype=tf.int64)
    indices = tf.stack([i, j], axis=-1)  # tf.stack keeps this a tensor
    data = tf.constant(csr_mat.data)
    return tf.sparse.SparseTensor(indices, data, csr_mat.shape)
# Test
m = scipy.sparse.csr_matrix([
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [2, 0, 3, 4],
], dtype=np.float32)
tf_mat = sparse_csr_to_tf(m)
tf.print(tf.sparse.to_dense(tf_mat))
# [[0 0 1 0]
# [0 0 0 0]
# [2 0 3 4]]
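On the batching question: as an alternative that avoids densifying anything, the cosine similarity can be computed with SciPy alone, one block of rows at a time. This is a minimal sketch, not the TensorFlow approach above; the function name batched_cosine_similarity and the batch size are hypothetical. Each block stays sparse, but for 350k sentences the full result may still be too large to keep, so each block would normally be thresholded or reduced before moving on.

```python
import numpy as np
import scipy.sparse as sp

def batched_cosine_similarity(X, batch_size=1000):
    """Yield (start_row, block), where block holds the cosine similarities
    between rows start_row:start_row+batch_size of X and all rows of X."""
    X = sp.csr_matrix(X, dtype=np.float64)
    # L2-normalise the rows; empty rows keep a norm of 1 to avoid dividing by 0
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0
    Xn = sp.csr_matrix(sp.diags(1.0 / norms) @ X)
    XnT = Xn.T.tocsr()
    for start in range(0, Xn.shape[0], batch_size):
        # Sparse-sparse product: only one block of rows is formed at a time
        yield start, Xn[start:start + batch_size] @ XnT

# Tiny illustration with three "documents" and two terms
X_small = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]))
blocks = [block for _, block in batched_cosine_similarity(X_small, batch_size=2)]
similarity = sp.vstack(blocks).toarray()
print(similarity)
```

If the vectoriser already uses norm='l2', as in the question, the normalisation step is redundant but harmless.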

Issues with Scipy in computing eigenvalues and eigenvectors

I am playing around with spectral properties of differential operators. To get a feel for things
I decided to start out with computing the eigenvalues and eigenvectors of the 1-D Laplacian with periodic boundary conditions
Lap =
[[-2, 1, 0, 0, ..., 1],
[ 1,-2, 1, 0, ..., 0],
[ 0, 1,-2, 1, ..., 0],
...
...
[ 0, 0, ..., 1,-2, 1],
[ 1, 0, ..., 0, 1,-2]]
So I run the following
import numpy as np
import scipy.linalg as scilin
N = 12
Lap = np.zeros((N, N))
for i in range(N):
    Lap[i, i] = -2
    Lap[i, (i+1)%N] = 1
    Lap[i, (i-1)%N] = 1
eigvals, eigvecs = scilin.eigh(Lap)
where
> print(eigvals)
[-4.00000000e+00 -3.73205081e+00 -3.73205081e+00 -3.00000000e+00
-3.00000000e+00 -2.00000000e+00 -2.00000000e+00 -1.00000000e+00
-1.00000000e+00 -2.67949192e-01 -2.67949192e-01 9.43689571e-16]
which is what I expect. However I decide to verify that these eigenvalues and eigenvectors
are correct. What I end up with is
> (Lap - eigvals[0]*np.identity(N)).dot(eigvecs[0])
array([ 0.28544445, 0.69044928, 0.83039882, 0.03466493, -0.79854101,
-0.81598463, -0.78119579, -0.7445237 , -0.769496 , -0.79741997,
-1.09625463, -0.69683007])
I expect to get the zero vector. So what is going on here?
As mentioned in the comment by @Warren, the eigenvectors are the columns of eigvecs, whereas in NumPy indexing eigvecs[0] is the first row of eigvecs. To fix it:
print((Lap - eigvals[0]*np.eye(N)) @ eigvecs[:, 0])
[-6.66133815e-16 2.55351296e-15 -1.77635684e-15 1.11022302e-16
5.55111512e-16 -2.22044605e-16 -3.66373598e-15 -4.44089210e-16
7.77156117e-16 -1.11022302e-16 -1.66533454e-15 2.22044605e-15]
This is zero to within floating-point precision; the remaining values of order 1e-15 are rounding error.
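All eigenpairs can be checked at once rather than one column at a time. The sketch below rebuilds the same periodic Laplacian (using np.roll instead of the loop, which is an implementation choice made here) and compares Lap @ eigvecs against the columns of eigvecs scaled by their eigenvalues.

```python
import numpy as np
import scipy.linalg as scilin

N = 12
I = np.eye(N)
# Periodic 1-D Laplacian: -2 on the diagonal, 1 on the two wrapped off-diagonals
Lap = -2 * I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)

eigvals, eigvecs = scilin.eigh(Lap)

# Column k of eigvecs pairs with eigvals[k]; broadcasting scales each column
residual = Lap @ eigvecs - eigvecs * eigvals
max_residual = np.abs(residual).max()
print(max_residual)

# Cross-check against the analytic spectrum -2 + 2*cos(2*pi*k/N)
analytic = np.sort(-2 + 2 * np.cos(2 * np.pi * np.arange(N) / N))
print(np.allclose(analytic, eigvals))
```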

Is there a minimal, complete, working example of structure from motion/3d reconstruction?

Like the question says, I am looking for a complete, minimal, working example of the structure from motion (aka 3d reconstruction) pipeline.
Right away let me say I do not have the camera parameters. I do not know focal length or camera intrinsics. So right away, 90% of the examples/tutorials out there are not valid.
There are many questions on this topic but the code is just in snippets, and not for the complete SfM process. Many instructions are contradictory, or are just guessing, and open-source external libraries are hard to follow.
So I am looking for a short, complete, minimal, working example. The working requirement matters most, since so much code out there produces bad results.
I have made a stab at it with the code below. I use synthetic data of matching pairs so there is no noise or bad correspondence issues to work around. The goal is to reconstruct a cube (8 3d points) from 2 views, each with 8 2d points. However, the final results are awful. There is no semblance of a cube shape. (I have tried normalizing and centering the data, that is not the issue).
Anyone who can provide a better minimal working example, or point out what is wrong with my attempt, is appreciated.
import cv2
import numpy as np
import scipy.linalg
def combineTR(T,R): #turn a translation vector and a rotation matrix into one 3x4 projection matrix
    T4 = np.eye(4)
    T4[:3, 3] = T # make it 4x4 so we can dot product it
    R4 = np.eye(4)
    R4[:3, :3] = R
    P = np.dot(T4, R4) # combine rotation and translation into one matrix
    P = P[:3, :] # cut off bottom row
    return P
####################################################################
# # ground truth
# Wpts = np.array([[1, 1, 1], # A Cube in world points
# [1, 2, 1],
# [2, 1, 1],
# [2, 2, 1],
# [1, 1, 2],
# [1, 2, 2],
# [2, 1, 2],
# [2, 2, 2]])
views = np.array(
[[[ 0.211, 0.392],
[ 0.179, 0.429],
[ 0.421, 0.392],
[ 0.358, 0.429],
[ 0.189, 0.193],
[ 0.163, 0.254],
[ 0.378, 0.193],
[ 0.326, 0.254]],
[[ 0.392, 0.211],
[ 0.392, 0.421],
[ 0.429, 0.179],
[ 0.429, 0.358],
[ 0.193, 0.189],
[ 0.193, 0.378],
[ 0.254, 0.163],
[ 0.254, 0.326]]])
F = cv2.findFundamentalMat(views[0], views[1],cv2.FM_8POINT)[0]
# Hartley and Zisserman's method for finding P
e2 = scipy.linalg.null_space(F.T) #epipole of second image
C2R = np.cross(e2.T, F) #camera 2 rotation
C2T = e2.T[0]
P = combineTR(C2T, C2R) #projection matrix for camera 2
R = np.eye(3) # rotation matrix for camera 1
T = [0, 0, 0] # translation
P0 = combineTR(T,R)
tpts = cv2.triangulatePoints(P0,P,views[0].T,views[1].T) #triangulated point
tpts /= tpts[-1] #divide by last row and scale it
tpts *= -100
print(tpts)
Ground truth: [image not preserved]
My results: [image not preserved]
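For reference, the canonical camera pair described by Hartley and Zisserman is P1 = [I | 0] and P2 = [[e2]x F | e2], where e2 is the epipole in the second image (the left null vector of F) and [e2]x is its skew-symmetric cross-product matrix. The sketch below builds that pair with NumPy only; the helper names skew and canonical_cameras are hypothetical, and the fundamental matrix is synthesised from a random camera rather than taken from the data above. A reconstruction from such uncalibrated cameras is only defined up to a projective transformation, so triangulated points need not look like a cube even when every step is correct.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ w equals np.cross(v, w)."""
    x, y, z = v
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def canonical_cameras(F):
    """Return P1 = [I | 0] and P2 = [[e2]x F | e2] for a rank-2 matrix F."""
    # e2 is the left null vector of F (e2^T F = 0): the last column of U
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, -1]
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2[:, None]])
    return P1, P2

# Synthetic check: for cameras [I | 0] and [A | a], F = [a]x A
rng = np.random.default_rng(0)
A, a = rng.normal(size=(3, 3)), rng.normal(size=3)
F = skew(a) @ A
P1, P2 = canonical_cameras(F)

# The pair must reproduce F up to scale; with a unit-length e2 the scale is -1
e2 = P2[:, 3]
F_from_cameras = skew(e2) @ P2[:, :3]
print(np.allclose(F_from_cameras, -F))
```

Recovering a metric cube from this would additionally need an upgrade step (for example from known intrinsics or scene constraints), which is outside this sketch.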

list of numpy vectors to sparse array

I have a list of numpy vectors of the format:
[array([[-0.36314615, 0.80562619, -0.82777381, ..., 2.00876354,2.08571887, -1.24526026]]),
array([[ 0.9766923 , -0.05725135, -0.38505339, ..., 0.12187988,-0.83129255, 0.32003683]]),
array([[-0.59539878, 2.27166874, 0.39192573, ..., -0.73741573,1.49082653, 1.42466276]])]
Only 3 vectors from the list are shown here; I have hundreds.
The largest vector has around 10 million elements.
The arrays in the list have different lengths, but the maximum length is fixed.
Is it possible to create a sparse matrix from these vectors in Python, with zeros filling the missing positions of the vectors that are shorter than the maximum size?
Try this:
from scipy import sparse
M = sparse.lil_matrix((num_of_vectors, max_vector_size))
for i,v in enumerate(vectors):
    M[i, :v.size] = v
Then take a look at this page: http://docs.scipy.org/doc/scipy/reference/sparse.html
The lil_matrix format is good for constructing the matrix, but you'll want to convert it to a different format such as csr_matrix before operating on it.
In this approach you replace the elements below your threshold with 0 and then create a sparse matrix from each vector. I suggest coo_matrix, since it is the fastest to convert to whichever other format suits your purposes. Then you can scipy.sparse.vstack() them to build a matrix that accounts for all the elements in the list:
import scipy.sparse as ss
import numpy as np
old_list = [np.random.random(100000) for i in range(5)]
threshold = 0.01
for a in old_list:
    a[np.absolute(a) < threshold] = 0
old_list = [ss.coo_matrix(a) for a in old_list]
m = ss.vstack( old_list )
A little convoluted, but I would probably do it like this:
>>> import scipy.sparse as sps
>>> a = [np.arange(5), np.arange(7), np.arange(3)]
>>> lens = [len(j) for j in a]
>>> cols = np.concatenate([np.arange(j) for j in lens])
>>> rows = np.concatenate([np.repeat(j, len_) for j, len_ in enumerate(lens)])
>>> data = np.concatenate(a)
>>> b = sps.coo_matrix((data,(rows, cols)))
>>> b.toarray()
array([[0, 1, 2, 3, 4, 0, 0],
[0, 1, 2, 3, 4, 5, 6],
[0, 1, 2, 0, 0, 0, 0]])
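Because every vector fills a contiguous prefix of its row, the CSR arrays can also be written down directly, which skips the COO-to-CSR conversion. This is a sketch under that assumption; the helper name ragged_to_csr and its n_cols parameter are hypothetical. Vectors of shape (1, n), as in the question, are flattened first.

```python
import numpy as np
import scipy.sparse as sps

def ragged_to_csr(vectors, n_cols=None):
    """Stack vectors of unequal length into one CSR matrix, one row each."""
    flat = [np.asarray(v).ravel() for v in vectors]
    lens = np.array([f.size for f in flat])
    if n_cols is None:
        n_cols = int(lens.max())
    data = np.concatenate(flat)
    # Row i occupies columns 0 .. lens[i]-1
    indices = np.concatenate([np.arange(n) for n in lens])
    # indptr[i]:indptr[i+1] is the slice of data belonging to row i
    indptr = np.concatenate([[0], np.cumsum(lens)])
    return sps.csr_matrix((data, indices, indptr), shape=(len(flat), n_cols))

a = [np.arange(5)[None, :], np.arange(7)[None, :], np.arange(3)[None, :]]
b = ragged_to_csr(a)
print(b.toarray())
```

Passing n_cols explicitly fixes the width at the known maximum size even when the longest vector is not in the list.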
