how to get the inverse of a distance matrix? - python

I have a huge distance matrix.
Example: (10000 x 10000).
Is there an effective way to find its inverse?
I've tried numpy.linalg.inv() but it's too slow.
Is there a more effective way?

You can try using Singular Value Decomposition https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
Inverting the decomposed form might take less time.
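For example, a minimal sketch of inverting through the decomposition, on a small random stand-in for the real matrix (note that np.linalg.pinv is itself SVD-based, so it may be the simplest route):

import numpy as np

# Small random stand-in for the real 10000x10000 matrix.
A = np.random.rand(1000, 1000)

# Decompose A = U @ diag(s) @ Vt ...
U, s, Vt = np.linalg.svd(A)

# ... then invert from the factors: A^-1 = V @ diag(1/s) @ U.T.
# Tiny singular values signal near-singularity; clip them (or fall
# back to np.linalg.pinv) rather than dividing by them blindly.
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T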

You probably don't actually need the inverse matrix.
There are many numerical techniques for solving matrix problems without computing the inverse. Unfortunately, you have not described what your underlying problem is, so there is no way to know which of those techniques might be useful to you.
For such a large matrix (10k x 10k), you probably want to look for some kind of iterative technique. Alternatively, it might be better to look for some way to avoid constructing such a large matrix in the first place -- e.g., by using the source data in some other way.
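For instance, if the "inverse" is only ever applied to a vector or a few right-hand sides, a direct solve is both faster and more accurate than forming the inverse explicitly; a minimal sketch:

import numpy as np

# Stand-ins for the large matrix and a right-hand side.
A = np.random.rand(2000, 2000)
b = np.random.rand(2000)

# One LU factorization plus triangular solves -- no explicit inverse.
x = np.linalg.solve(A, b)

# Mathematically equivalent to, but cheaper and more stable than:
# x = np.linalg.inv(A) @ b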

Related

Performing UMAP dimension reduction on inconsistently shaped data - python

First question, so I will do my best to be as clear as possible.
If I can provide UMAP with a distance function that also outputs a gradient or some other relevant information, can I apply UMAP to non-traditional looking data? (I.e., a data set with points of inconsistent dimension, data points that are non-uniformly sized matrices, etc.) The closest I have gotten to finding something that looks vaguely close to my question is in the documentation here (https://umap-learn.readthedocs.io/en/latest/embedding_space.html), but this seems to be sort of the opposite process, and as far as I can tell still supposes you are starting with tuple-based data of uniform dimension.
I'm aware that one way around this is just to calculate a full pairwise distance matrix ahead of time and give that to UMAP, but from what I understand of the way UMAP is coded, it only performs a subset of all possible distance calculations, and is thus much faster for the same amount of data than if I were to take the full pre-calculation route.
I am working in python3, but if there is an implementation of UMAP dimension reduction in some other environment that permits this, I would be willing to make a detour in my workflow to obtain this greater flexibility with incoming data types.
Thank you.
Algorithmically this is quite possible, but in practice most implementations do not support anything other than fixed-dimension vectors. If computing the all-pairs distances is not tractable, another option is to find a way to featurize or vectorize the data so that distance computations become easy. This is, of course, not always possible. The final option is to implement things yourself, but this requires handling the nearest-neighbour search, which is likely a non-trivial coding project in and of itself.
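If precomputing all pairwise distances does turn out to be tractable, umap-learn accepts a ready-made distance matrix via metric="precomputed". A sketch on toy variable-length data, with a hypothetical custom distance standing in for whatever the real one is:

import numpy as np
import umap  # pip install umap-learn

# Toy "inconsistently shaped" data: variable-length vectors.
rng = np.random.default_rng(0)
items = [rng.normal(size=rng.integers(3, 10)) for _ in range(200)]

# Hypothetical distance between two variable-length items.
def dist(a, b):
    return abs(a.mean() - b.mean()) + abs(a.std() - b.std())

# Precompute the full pairwise distance matrix ...
n = len(items)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = dist(items[i], items[j])

# ... and hand it to UMAP directly.
embedding = umap.UMAP(metric="precomputed").fit_transform(D)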

eliminating the linearly dependent columns of a non-square matrix in python

I have a matrix A = np.array([[1,1,1],[1,2,3],[4,4,4]]) and I want to keep only the linearly independent rows in my new matrix. The answer might be A_new = np.array([[1,1,1],[1,2,3]]) or A_new = np.array([[1,2,3],[4,4,4]]).
Since I have a very large matrix, I need to decompose it into a smaller, linearly independent, full-rank matrix. Can someone please help?
There are many ways to do this, and which way is best will depend on your needs. And, as you noted in your statement, there isn't even a unique output.
One way to do this would be to use Gram-Schmidt to find an orthogonal basis, where the first $k$ vectors in this basis have the same span as the first $k$ independent rows. If at any step you find a linear dependence, drop that row from your matrix and continue the procedure.
A simple way to do this with numpy would be
Q, R = np.linalg.qr(A.T)
and then drop any rows of A for which the diagonal entry R[i, i] is zero.
For instance, you could do
A_new = A[np.abs(np.diag(R)) >= 1e-10]
While this will work perfectly in exact arithmetic, it may not work as well in finite precision. In floating point almost any matrix will appear numerically independent, so you will need some kind of thresholding to decide when a diagonal entry is effectively zero. If you use the built-in QR method, you will have to make sure that there is no dependence on columns which you previously dropped.
If you need even more stability, you could iteratively solve the least squares problem
A.T[:, dependent_cols] @ x = A.T[:, col_to_check]
using a stable direct method. If this system can be solved exactly (zero residual), then A.T[:, col_to_check] is dependent on the previous vectors, with the combination given by x.
Which solver to use may also be dictated by your data type.
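A sketch of that least-squares check with np.linalg.lstsq, on the example matrix from the question (the tolerance and helper name are illustrative):

import numpy as np

A = np.array([[1, 1, 1], [1, 2, 3], [4, 4, 4]], dtype=float)

def is_dependent(basis, candidate, tol=1e-10):
    # Solve basis.T @ x ~= candidate in the least-squares sense;
    # a near-zero residual means candidate lies in span(basis).
    B = np.array(basis).T
    x, *_ = np.linalg.lstsq(B, candidate, rcond=None)
    return np.linalg.norm(B @ x - candidate) < tol

independent = [A[0]]
for row in A[1:]:
    if not is_dependent(independent, row):
        independent.append(row)

A_new = np.array(independent)  # [[1, 1, 1], [1, 2, 3]]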

Python exp operation for matrices

I have a matrix
x = np.mat('0.1019623; 0.1019623; 0.1019623')
and I want to take the exponential of every element and get back a matrix of the same size. One way I found was to convert to an array and proceed from there. However, this won't be a solution if we have, let's say, a 2x3 matrix. Is there a general solution?
The problem was that I was using math.exp instead of np.exp.
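For reference, np.exp is a ufunc, so it applies elementwise to an array or matrix of any shape (np.matrix/np.mat is deprecated in recent NumPy; plain ndarrays behave the same way):

import numpy as np

x = np.mat('0.1019623; 0.1019623; 0.1019623')
y = np.exp(x)                    # elementwise; keeps the 3x1 shape

# Works just as well for a 2x3 (or any other) shape:
m = np.arange(6.0).reshape(2, 3)
np.exp(m)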

Fast matrix inversion without a package

Assume that I have a square matrix M, and that I would like to invert it.
I am trying to use the mpq fraction class from gmpy2 as the elements of my matrix M. If you are not familiar with these fractions, they are functionally similar to Python's built-in fractions module. The only problem is, there are no packages that will invert my matrix unless I take the entries out of fraction form. I require the numbers and the answers in fraction form, so I will have to write my own function to invert M.
There are known algorithms that I could program, such as Gaussian elimination. However, performance is an issue, so my question is as follows:
Is there a computationally fast algorithm that I could use to calculate the inverse of a matrix M?
Is there anything else you know about these matrices? For example, for symmetric positive definite matrices, Cholesky decomposition allows you to invert faster than the standard Gauss-Jordan method you mentioned.
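Setting the fraction requirement aside for a moment, a floating-point sketch of that Cholesky route with SciPy: factor once, then solve against the identity.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Small symmetric positive definite example.
M = np.array([[4.0, 2.0],
              [2.0, 3.0]])

c, low = cho_factor(M)                       # one Cholesky factorization
M_inv = cho_solve((c, low), np.eye(len(M)))  # reuse it for every column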
For general matrix inversions, the Strassen algorithm will give you a faster result than Gauss-Jordan but slower than Cholesky.
It seems like you want exact results, but if you're fine with approximate inversions, then there are algorithms which approximate the inverse much faster than the previously mentioned algorithms.
However, you might want to ask yourself whether you need the entire matrix inverse for your specific application. Depending on what you are doing, it might be faster to use another matrix property; in my experience, computing the full inverse is often an unnecessary step.
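If exact fractions are non-negotiable, here is a minimal Gauss-Jordan sketch in pure Python using the standard library's fractions.Fraction (gmpy2's mpq should drop in the same way). It is O(n^3) and makes no claim to being the fastest possible:

from fractions import Fraction

def invert(M):
    # Gauss-Jordan elimination in exact rational arithmetic.
    # M is a list of lists of Fractions; raises StopIteration if singular.
    n = len(M)
    # Augment M with the identity matrix.
    aug = [row[:] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a pivot row with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

M = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
print(invert(M))  # [[-2, 1], [3/2, -1/2]]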
I hope that helps!

Diagonalizing large sparse matrix with Python/Scipy

I am working with a large (complex) Hermitian matrix and I am trying to diagonalize it efficiently using Python/Scipy.
Using the eigh function from scipy.linalg, it takes about 3 s to generate and diagonalize a roughly 800x800 matrix and compute all the eigenvalues and eigenvectors.
The eigenvalues in my problem are symmetrically distributed around 0 and range from roughly -4 to 4. I only need the eigenvectors corresponding to the negative eigenvalues, though, which turns the range I am looking to calculate into [-4,0).
My matrix is sparse, so it's natural to use the scipy.sparse package and its functions to calculate the eigenvectors via eigsh, since it uses much less memory to store the matrix.
Also, I can tell the program to calculate only the negative eigenvalues via which='SA'. The problem with this method is that it now takes roughly 40 s to compute half the eigenvalues/eigenvectors. I know that the ARPACK algorithm is very inefficient when computing small eigenvalues, but I can't think of any other way to compute all the eigenvectors that I need.
Is there any way to speed up the calculation? Maybe by using shift-invert mode? I will have to do many, many diagonalizations and eventually increase the size of the matrix as well, so I am a bit lost at the moment.
I would really appreciate any help!
This question is probably better to ask on http://scicomp.stackexchange.com as it's more of a general math question, rather than specific to Scipy or related to programming.
If you need that many eigenvectors, it does not make much sense to use ARPACK. Since you need N/2 eigenvectors, your memory requirement is at least N*N/2 floats, and probably more in practice. Using eigh requires N*N + 3*N floats, so eigh is within a factor of 2 of the minimum requirement, and the easiest solution is to stick with it.
If you can process the eigenvectors "on-line" so that you can throw the previous one away before processing the next, there are other approaches; look at the answers to similar questions on scicomp.
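For reference, shift-invert mode (which the question mentions) looks like the sketch below on a toy sparse symmetric matrix: with sigma set, eigsh targets the eigenvalues nearest sigma instead of the extremal ones. As noted above, though, dense eigh is likely the pragmatic choice when you need half the spectrum.

import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Toy sparse symmetric stand-in for the Hermitian matrix.
n = 800
M = sp.random(n, n, density=0.01, format="csc", random_state=0)
M = (M + M.T) / 2  # symmetrize

# With sigma set, ARPACK works with (M - sigma*I)^-1, so convergence
# is fast for eigenvalues near sigma even when they are small/interior.
vals, vecs = eigsh(M, k=20, sigma=-2.0, which="LM")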
