I have a matrix A = np.array([[1,1,1],[1,2,3],[4,4,4]]) and I want only the linearly independent rows in my new matrix. The answer might be A_new = np.array([1,1,1],[1,2,3]]) or A_new = np.array([1,2,3],[4,4,4])
Since I have a very large matrix so I need to decompose the matrix into smaller linearly independent full rank matrix. Can someone please help?
There are many ways to do this, and which way is best will depend on your needs. And, as you noted in your statement, there isn't even a unique output.
One way to do this would be to use Gram-Schmidt to find an orthogonal basis, where the first $k$ vectors in this basis have the same span as the first $k$ independent rows. If at any step you find a linear dependence, drop that row from your matrix and continue the procedure.
A simple way do do this with numpy would be,
q,r = np.linalg.qr(A.T)
and then drop any columns where R_{i,i} is zero.
For instance, you could do
A[np.abs(np.diag(R))>=1e-10]
While this will work perfectly in exact arithmetic, it may not work as well in finite precision. Almost any matrix will be numerically independent, so you will need some kind of thresholding to determine if there is a linear dependence. If you use the built in QR method, you will have to make sure that there is no dependence on columns which you previously dropped.
If you need even more stability, you could iteratively solve the least squares problem
A.T[:,dependent_cols] x = A.T[:,col_to_check]
using a stable direct method. If you can solve this exactly, then A.T[:,k] is dependent on the previous vectors, with the combination given by x.
Which solver to use may also be dictated by your data type.
Related
first question, I will do my best to be as clear as possible.
If I can provide UMAP with a distance function that also outputs a gradient or some other relevant information, can I apply UMAP to non-traditional looking data? (I.e., a data set with points of inconsistent dimension, data points that are non-uniformly sized matrices, etc.) The closest I have gotten to finding something that looks vaguely close to my question is in the documentation here (https://umap-learn.readthedocs.io/en/latest/embedding_space.html), but this seems to be sort of the opposite process, and as far as I can tell still supposes you are starting with tuple-based data of uniform dimension.
I'm aware that one way around this is just to calculate a full pairwise distance matrix ahead of time and give that to UMAP, but from what I understand of the way UMAP is coded, it only performs a subset of all possible distance calculations, and is thus much faster for the same amount of data than if I were to take the full pre-calculation route.
I am working in python3, but if there is an implementation of UMAP dimension reduction in some other environment that permits this, I would be willing to make a detour in my workflow to obtain this greater flexibility with incoming data types.
Thank you.
Algorithmically this is quite possible, but in practice most implementations do not support anything other than fixed dimension vectors. If computing the all pairs distances is not tractable another option is to try to find a way to featurize or vectorize the data in a way that will allow for easy distance computations. This is, of course, not always possible. The final option is to implement things yourself, but this requires handling the nearest neighbour search, which is likely a non-trivial coding project in and of itself.
I have a huge distance matrix.
Example: (10000 * 10000)..
Is there an effective way to find a inverse matrix?
I've tried numpy's Inv() but it's too slow.
Is there a more effective way?
You can try using Singular Value Decomposition https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
Inverting the decomposed form might take less time.
You probably don't actually need the inverse matrix.
There are a lot of numeric techniques that let people solve matrix problems without computing the inverse. Unfortunately, you have not described what your problem is, so there is no way to know which of those techniques might be useful to you.
For such a large matrix (10k x 10k), you probably want to look for some kind of iterative technique. Alternately, it might be better to look for some way to avoid constructing such a large matrix in the first place -- e.g., try using the source data in some other way.
I'm working on a panel method code at the moment. To keep us from being bogged down in the minutia, I won't show the code - this is a question about overall program structure.
Currently, I solve my system by:
Generating the corresponding rows of the A matrix and b vector in an explicit component for each boundary condition
Assembling the partial outputs into the full A, b.
Solving the linear system, Ax=b, using a LinearSystemComp.
Here's a (crude) diagram:
I would prefer to be able to do this by just writing one implicit component to represent each boundary condition, vectorising the inputs/outputs to represent multiple rows/cols in the matrix, then allowing openMDAO to solve for the x while driving the residual for each boundary condition to 0.
I've run into trouble trying to make this work, as each implicit component is underdetermined (more rows in the output vector x than the component output residuals; that is, A1.x - b1= R1, length(R1) < length(x). Essentially, I would like openMDAO to take each of these underdetermined implicit systems, and find the value of x that solves the determined full system - without needing to do all of the assembling stuff myself.
Something like this:
To try and make my goal clearer, I'll explain what I actually want from the perspective of my panel method. I'd like a component, let's say Influence, that computes the potential induced by a given panel at a given point in the panel's reference frame. I'd like to vectorise the input panels and points such that it can compute the influence coefficent of many panels on one point, of many points on one panel, or of many points on many panels.
I'd then like a system of implicit boundary conditions to find the correct value of mu to solve the system. These boundary conditions, again, should be able to be vectorised to compute the violation of the boundary condition at many points under the influence of many panels.
I get confused again at this part. Not every boundary condition will use the influence coefficient values - some, like the Kutta condition, are just enforced on the mu vector, e.g .
How would I implement this as an implicit component? It has no inputs, and doesn't output the full mu vector.
I appreciate that the question is rather long and rambling, but I'm pretty confused. To summarise:
How can I use openMDAO to solve multiple individually underdetermined (but combined, fully determined) implicit systems?
How can I use openMDAO to write an implicit component that takes no inputs and only uses a portion of the overall solution vector?
In the OpenMDAO docs there is a close analog to what you are trying to accomplish, with the node-voltage analysis tutorial. In that code, the balance comp is used to create an implicit relationship that is similar to what you're describing. Its singular on its own, but part of a larger group is a well defined system.
You'll need to find a way to build similar components for your model. Each "row" in your equation will be associated with one state variables (one entry in your x vector).
In the simplest case, each row (or set of rows) would have one input which is the associated row of the A matrix, and a second input which is ALL of the other values for x, and a final input which is the entry of the b vector (right hand side vector). Then you could evaluate the residual for that specific row, which would be the following
R['x_i'] = np.sum(A*x_full) - b
where x_full is the assembly of the full x-vector from the x_other input and the x_i state variable.
#########
Having proposed the above solution, I have to say that I don't think this is a particularly efficient way to build or solve this linear system. It is modular, and might give you some flexibility, but you're jumping through a lot of hoops to avoid doing some index-math, and shoving everything into a matrix.
Granted, the derivatives might be a bit easier in your design, because the matrix assembly is going to get handled "magically" by the connections you have to create between the various row-components. So maybe its worth the trade... but i would say you might be better of trying a more traditional coding approach and using JAX or some other AD code to make the derivatives easier.
I have one large scipy csr_matrix and want to calculate matrix.dot(matrix.T). However, I do not need every single dot product, but rather only those based on some rules. For eample, as specified by another binary matrix that has nonzero elements for those rows/columns that should be calculated. For eample, if rules[0,10]=1 then, the dot product between row 0 and row 10 should be determined. These rules could of course also be represented by some other data structure.
A simple solution would be to manually loop through the rules and then slice according rows/columns and determine the dot product. This does not seem to be the best solution to me, specifically as slicing is also quite expensive with sparse matrices. Maybe, someone has a better idea about how to approach this.
I am trying to generate a few very large arrays, and at least one is ending up being singular, which is made obvious by this familiar error message:
File "C:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 90, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix
Of course I do not want my array to be singular, but I am more interested in determining WHY my array is singular. What I mean by this is that I would like to have a way to answer the following questions without manually checking each entry:
Is the array square? (I believe this is returned by a separate error message, which is convenient, but I'll include this as a singularity property anyway)
Are any rows populated only by zeros?
Are any columns populated only by zeros?
Are any rows not linearly independent of all other rows?
For relatively small arrays, the first two conditions are easily answered by visual inspection. However, because my arrays are substantially large, I do not want to have to go in and manually check each array element to see if any of those conditions are met.
I tried pulling up the linalg.py script to see if I could see how it determines a matrix to be singular, but I could not tell how it determines a matrix to be singular.
(this paragraph was edited for clarity)
I also tried searching for info online, and nothing seemed to be of help. Most topics seemed to only answer some form of the following questions/objectives: 1) "I want Python to tell me if my matrix is singular" or 2) why is Python giving me this error message". Because I already know that my matrix/matrices are singular, neither of these two questions are of importance to me.
Again, I am not looking for an answer along the lines of, "Oh, well this particular matrix is singular because . . .". I am looking for a method I can use immediately on ANY singular matrix to determine (especially for large arrays) what is causing the singularity.
Is there a built-in Python function that does this, or is there some other relatively simple way to do this before I try to create a function that will do this for me?
Singular matrices have at least one eigenvalue equal to zero. You can create a diagonalizable singular matrix by starting from its eigenvalue decomposition:
A = V D V^{-1}
D is the diagonal matrix of eigenvalues. So create any matrix V, the diagonal matrix D that has at least one zero in the diagonal, and then A will be singular.
The traditional way of checking is by computing an SVD. This is what the function numpy.linalg.matrix_rank uses to compute the rank, and you can then check if matrix_rank(M) == M.shape[0] (assuming a square matrix).
For more information, check out this excellent answer to a similar question for Matlab users.
The rank of the matrix will tell you how many rows aren't zero or linear combinations, but not specifically which ones. It's a relatively fast operation, so it might be useful as a first-pass check.