How to make toeplitz matrix constraint in cvxpy? - python

Using cvxpy( convex optimisation solver in python), there are options to make a matrix variable to be symmetric or positive semidefinite, but I require that the matrix be toeplitz (all left to right diagonals have the same element is each diagonal entry). Is there an efficient way of doing this?

One way would be to add the constraint:
a[i,j] = a[i-1,j-1]
for all i,j>0. The presolver (part of a solver) would exploit this and reduce the size of the model.

Related

Solve linear equations on a Gram matrix with numpy

Consider a case where, given an MxM matrix A and a vector b, I want to solve something of the form inv(A # A.T) # b (where I know A is invertible).
As far as I know, it is always faster to use solve_* rather than inv. There are also variants for more efficient solving for PSD matrices (which A # A.T must be), using Cholesky factorization.
My question - since I'm constructing the matrix A # A.T just to immediately throw it away - is there a more specialized procedure for solving linear equations with the gram matrix of A without having to construct it?
You can compute the factorization of A and then use that to solve your system.
Assume we want to solve
A A^T x = b
for x.
Compute the factorization of A=LU.
Then solve Ay=b for y.
Then solve A^T x = y for x.
This way you dont have to compute the matrix A^T A.
Note that if one has a factorization of A=LU then one can solve Ax=b as well as A^T x=b efficiently for x.
This is because A^T=U^T L^T which is again a factorization of a lower times an upper triangular matrix.

The faster way to do matrix multiplication between a matrix and a diagonal matrix?

I am coding a computational package in python using numpy, in the package, I would do the matrix multiplication between an arbitrary large square matrix (e.g of size 100*100) and a diagonal matrix of same size frequently.
I have an O(n^2) method, but I think that further improvement could be made.
"""
A is of size 100*100
B is a diagonal matrix
want to do np.dot(A,B) quickly
"""
A=np.random.rand(100,100)
diag_elements=np.random.rand(100)
B=np.diag(diag_elements)
answer1= np.dot(A,B) ###O(n^3) method, quite slow
C=np.zeros((100,100))
C=C+diag_elements
answer2=np.multiply(A,C) ##O(n^2) method, 3times faster for n=100
The anwer2 is O(n^2) but I think it's not good enough, because the operation C+=diag_elements are wasting 1/3 time and could be avoided possibly.
I expect that some numpy function could do the matrix multiplication more elegently and faster. Could someone help me out?
Why don't you simply multiply A by the diagonal?
answer3 = np.multiply(A,diag_elements)

Clustering a sparse co-occurrence matrix

I have two N x N co-occurrence matrices (484x484 and 1060x1060) that I have to analyze. The matrices are symmetrical along the diagonal and contain lots of zero values. The non-zero values are integers.
I want to group together the positions that are non-zero. In other words, what I want to do is the algorithm on this link. When order by cluster is selected, the matrix gets re-arranged in rows and columns to group the non-zero values together.
Since I am using Python for this task, I looked into SciPy Sparse Linear Algebra library, but couldn't find what I am looking for.
Any help is much appreciated. Thanks in advance.
If you have a matrix dist with pairwise distances between objects, then you can find the order on which to rearrange the matrix by applying a clustering algorithm on this matrix (http://scikit-learn.org/stable/modules/clustering.html). For example it might be something like:
from sklearn import cluster
import numpy as np
model = cluster.AgglomerativeClustering(n_clusters=20,affinity="precomputed").fit(dist)
new_order = np.argsort(model.labels_)
ordered_dist = dist[new_order] # can be your original matrix instead of dist[]
ordered_dist = ordered_dist[:,new_order]
The order is given by the variable model.labels_, which has the number of the cluster to which each sample belongs. A few observations:
You have to find a clustering algorithm that accepts a distance matrix as input. AgglomerativeClustering is such an algorithm (notice the affinity="precomputed" option to tell it that we are using pre-computed distances).
What you have seems to be a pairwise similarity matrix, in which case you need to transform it to a distance matrix (e.g. dist=1 - data/data.max())
In the example I assumed 20 clusters, you may have to play with this variable a bit. Alternatively, you might try to find the best one-dimensional representation of your data (using e.g. MDS) to describe the optimal ordering of samples.
Because your data is sparse, treat it as a graph, not a matrix.
Then try the various graph clustering methods. For example cliques are interesting on such data.
Note that not everything may cluster.

Python Sparse matrix inverse and laplacian calculation

I have two sparse matrix A (affinity matrix) and D (Diagonal matrix) with dimension 100000*100000. I have to compute the Laplacian matrix L = D^(-1/2)*A*D^(-1/2). I am using scipy CSR format for sparse matrix.
I didnt find any method to find inverse of sparse matrix. How to find L and inverse of sparse matrix? Also suggest that is it efficient to do so by using python or shall i call matlab function for calculating L?
In general the inverse of a sparse matrix is not sparse which is why you won't find sparse matrix inverters in linear algebra libraries. Since D is diagonal, D^(-1/2) is trivial and the Laplacian matrix calculation is thus trivial to write down. L has the same sparsity pattern as A but each value A_{ij} is multiplied by (D_i*D_j)^{-1/2}.
Regarding the issue of the inverse, the standard approach is always to avoid calculating the inverse itself. Instead of calculating L^-1, repeatedly solve Lx=b for the unknown x. All good matrix solvers will allow you to decompose L which is expensive and then back-substitute (which is cheap) repeatedly for each value of b.

pseudo inverse of sparse matrix in python

I am working with data from neuroimaging and because of the large amount of data, I would like to use sparse matrices for my code (scipy.sparse.lil_matrix or csr_matrix).
In particular, I will need to compute the pseudo-inverse of my matrix to solve a least-square problem.
I have found the method sparse.lsqr, but it is not very efficient. Is there a method to compute the pseudo-inverse of Moore-Penrose (correspondent to pinv for normal matrices).
The size of my matrix A is about 600'000x2000 and in every row of the matrix I'll have from 0 up to 4 non zero values. The matrix A size is given by voxel x fiber bundle (white matter fiber tracts) and we are expecting maximum 4 tracts to cross in a voxel. In most of the white matter voxels we expect to have at least 1 tract, but I will say that around 20% of the lines could be zeros.
The vector b should not be sparse, actually b contains the measure for each voxel, which is in general not zero.
I would need to minimize the error, but there are also some conditions on the vector x. As I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0
Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?
Thanks
Update 1st June:
thanks again for the help.
I can't really show you anything about my data, because the code in python give me some problems. However, in order to understand how I could choose a good k I've tried to create a testing function in Matlab.
The code is as follow:
F=zeros(100000,1000);
for k=1:150000
p=rand(1);
a=0;
b=0;
while a<=0 || b<=0
a=random('Binomial',100000,p);
b=random('Binomial',1000,p);
end
F(a,b)=rand(1);
end
solution=repmat([0.5,0.5,0.8,0.7,0.9,0.4,0.7,0.7,0.9,0.6],1,100);
size(solution)
solution=solution';
measure=F*solution;
%check=pinvF*measure;
k=250;
F=sparse(F);
[U,S,V]=svds(F,k);
s=svds(F,k);
plot(s)
max(max(U*S*V'-F))
for s=1:k
if S(s,s)~=0
S(s,s)=1/S(s,s);
end
end
inv=V*S'*U';
inv*measure
max(inv*measure-solution)
Do you have any idea of what should be k compare to the size of F? I've taken 250 (over 1000) and the results are not satisfactory (the waiting time is acceptable, but not short).
Also now I can compare the results with the known solution, but how could one choose k in general?
I also attached the plot of the 250 single values that I get and their squares normalized. I don't know exactly how to better do a screeplot in matlab. I'm now proceeding with bigger k to see if suddently the value will be much smaller.
Thanks again,
Jennifer
You could study more on the alternatives offered in scipy.sparse.linalg.
Anyway, please note that a pseudo-inverse of a sparse matrix is most likely to be a (very) dense one, so it's not really a fruitful avenue (in general) to follow, when solving sparse linear systems.
You may like to describe a slight more detailed manner your particular problem (dot(A, x)= b+ e). At least specify:
'typical' size of A
'typical' percentage of nonzero entries in A
least-squares implies that norm(e) is minimized, but please indicate whether your main interest is on x_hat or on b_hat, where e= b- b_hat and b_hat= dot(A, x_hat)
Update: If you have some idea of the rank of A (and its much smaller than number of columns), you could try total least squares method. Here is a simple implementation, where k is the number of first singular values and vectors to use (i.e. 'effective' rank).
from scipy.sparse import hstack
from scipy.sparse.linalg import svds
def tls(A, b, k= 6):
"""A tls solution of Ax= b, for sparse A."""
u, s, v= svds(hstack([A, b]), k)
return v[-1, :-1]/ -v[-1, -1]
Regardless of the answer to my comment, I would think you could accomplish this fairly easily using the Moore-Penrose SVD representation. Find the SVD with scipy.sparse.linalg.svds, replace Sigma by its pseudoinverse, and then multiply V*Sigma_pi*U' to find the pseudoinverse of your original matrix.

Categories

Resources