I have a question in which I am asked to show that the determinant of matrix B equals 0. Matrix B is defined as:
import numpy as np
from numpy import linalg as m
B = np.array([[-1-3.j,  -8-10.j,  0-3.j],
              [-7-3.j,  -4-9.j,  -3-2.j],
              [11-3.j, -16-12.j,  6-5.j]])
print(B)
[[ -1. -3.j -8.-10.j 0. -3.j]
[ -7. -3.j -4. -9.j -3. -2.j]
[ 11. -3.j -16.-12.j 6. -5.j]]
Computing the determinant is straightforward with NumPy:
m.det(B)
(-8.126832540256171e-14-1.5987211554602298e-14j)
Which is clearly not equal to zero.
I double checked my answer using https://www.symbolab.com/ and the determinant is definitely zero.
I feel like I am doing something ridiculously silly, but can't quite figure out what. Any help?
What you're seeing are really tiny numbers that are almost equal to zero. They fail to be exactly zero only because of numerical inaccuracies.
That's why we usually don't test them for equality but for closeness:
np.allclose(np.linalg.det(B), 0)  # True
To expand a little on Nils' answer:
There are various ways to compute determinants. The way taught in algebra classes -- Laplace expansion -- is a reasonable way to go for small (e.g. 3 x 3) matrices, but it rapidly becomes impractical for larger matrices because of the number of computations required.
In your case, where all the real and imaginary parts are small integers, such a computation would evaluate the determinant exactly, as 0.
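For instance, a direct cofactor expansion in plain Python returns exactly zero here, because every intermediate real and imaginary part is a small integer that floating point represents exactly (a sketch for illustration only, not what NumPy does internally):
a, b, c = -1-3j, -8-10j, 0-3j
d, e, f = -7-3j, -4-9j, -3-2j
g, h, i = 11-3j, -16-12j, 6-5j

# Cofactor (Laplace) expansion along the first row
det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
print(det)   # 0j -- exact, since no rounding occurs for these small integers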
In NumPy, linalg.det uses a different approach: the matrix is factorised into triangular matrices and permutations, whose determinants are easy to compute, and the determinant of the original matrix is then the product of the determinants of the factors. This is an order N cubed computation, so it can be used even for quite large matrices.
However, such factorisations are (a little) inaccurate; the original matrix will not be exactly equal to the product of the factors, so the computed determinant will also, most likely, be a little inaccurate.
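For the curious, the factorisation route can be reproduced with SciPy's LU decomposition (a sketch, assuming SciPy is available; np.linalg.det calls LAPACK's LU routine internally, so this mirrors the idea rather than the exact code path):
import numpy as np
from scipy.linalg import lu

B = np.array([[-1-3.j,  -8-10.j,  0-3.j],
              [-7-3.j,  -4-9.j,  -3-2.j],
              [11-3.j, -16-12.j,  6-5.j]])

P, L, U = lu(B)                                        # B = P @ L @ U
det_from_lu = np.linalg.det(P) * np.prod(np.diag(U))   # L has a unit diagonal
print(det_from_lu)   # a tiny nonzero complex number, just like np.linalg.det(B)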
I'm getting different answers from np.linalg.eigvals depending on whether I use the transpose of a matrix.
To replicate:
mat = np.array([[-7.00616288e-08, -2.79704289e-09,  1.67598654e-10],
                [-3.23676574e+07, -1.58978291e+15,  0.00000000e+00],
                [ 0.00000000e+00,  1.80156232e-02, -2.32851854e+07]])
print(np.linalg.eigvals(mat))
print(np.linalg.eigvals(mat.transpose()))
I get:
[ -7.00616288e-08 -1.58978291e+15 -2.32851854e+07]
[ -1.58978291e+15 2.50000000e-01 -2.32851854e+07]
Note that these values are different. Since the eigenvalues of a matrix and its transpose are identical, I assume that these issues are due to overflow. Is there some maximum value I should limit to, to make sure that this is always consistent?
Not due to an overflow. Overflow is easy to detect, and it generates a warning. The issue is the limit of double precision: significant digits can be lost when numbers of very different magnitudes are added, and then subtracted. For example, (1e20 + 1) - 1e20 == 0.
The second result, with 2 negative eigenvalues, is incorrect, because the determinant of your matrix is clearly negative: the product of main-diagonal entries is of magnitude 1e15 and dominates all other terms in the determinant by a large margin. So the sign of the determinant is the sign of this product, which is negative.
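A quick check of that sign argument (a sketch):
import numpy as np

mat = np.array([[-7.00616288e-08, -2.79704289e-09,  1.67598654e-10],
                [-3.23676574e+07, -1.58978291e+15,  0.00000000e+00],
                [ 0.00000000e+00,  1.80156232e-02, -2.32851854e+07]])

# Two negative and one positive eigenvalue would multiply to a positive
# determinant, but the determinant is clearly negative:
print(np.linalg.det(mat))   # roughly -2.6e+15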
The issue is that mat.T has all tiny entries in the first column, much smaller than those in the other columns. When looking for a pivot, an algorithm may scan that column and settle for what it finds there. This is not necessarily how .eigvals works, but the same principle applies: numerical linear algebra algorithms tend to proceed from the upper left corner, so it's best to avoid small entries there. Here's one way to do that:
mat1 = np.roll(mat, 1, axis=[0, 1])
print(np.linalg.eigvals(mat1))
print(np.linalg.eigvals(mat1.T))
prints
[-7.00616288e-08 -2.32851854e+07 -1.58978291e+15]
[-2.32851854e+07 -1.58978291e+15 -7.00616288e-08]
which are consistent. Rolling both axes means conjugating mat by a permutation matrix, which does not change the eigenvalues. The rolled matrix is
[[-2.32851854e+07 0.00000000e+00 1.80156232e-02]
[ 1.67598654e-10 -7.00616288e-08 -2.79704289e-09]
[ 0.00000000e+00 -3.23676574e+07 -1.58978291e+15]]
which gives NumPy a nice large number to start with.
Ideally it would do something like that itself, but no (practical) algorithm is ideal for every situation.
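To convince yourself that rolling both axes really is a similarity transform by a permutation matrix, a quick sketch (reusing mat from the question):
import numpy as np

P = np.roll(np.eye(3), 1, axis=0)           # cyclic permutation matrix
mat1 = np.roll(mat, 1, axis=[0, 1])
print(np.allclose(mat1, P @ mat @ P.T))     # True: a similarity transform, so the eigenvalues are unchanged in exact arithmetic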
Problem: In numpy, I have a matrix M1 that I am multiplying with another matrix M2.
I know that I can spare half of the values in M1 because the resulting matrix will be symmetric and I only need the top k values.
So I am thinking of using numpy.tril to zero out half the values, hoping that the underlying C functions will be faster for multiplications a*b where a == 0, since they could stop as soon as they see a == 0 instead of doing the whole floating-point operation.
I know I can time this but I think this is a question of general interest.
Note that M1 is not sparse, just that half of it needs not be considered.
Perhaps there is even a better way to save 50% computation?
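Since timing it is easy, here is a minimal timing sketch (with made-up shapes) to check whether the zeroed half actually buys anything:
import numpy as np
import timeit

M1 = np.random.rand(2000, 2000)
M2 = np.random.rand(2000, 2000)
M1_tri = np.tril(M1)   # upper triangle zeroed out

print(timeit.timeit(lambda: M1 @ M2, number=5))       # full M1
print(timeit.timeit(lambda: M1_tri @ M2, number=5))   # half of M1 zeroed; typically similar, since dense BLAS does not special-case zeros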
Background: This has to do with
p(A|B)*p(B) == p(B|A)*p(A)
... if you see what I mean.
Example: This is only one point where it happens, but in the very end, we have
a |A| x |A| matrix p(A|B) (A and B are the same variables)
a 1 x |A| matrix p(B)
the result is a |A| x |A| matrix p(A,B) = p(A|B)*p(B), where we do not care about the diagonal (it is the probability of a value given itself) or about the part above or below the diagonal (it duplicates the other half). Nice for a sanity check, but unnecessary after all.
Note that here it is actually not a dot product. But I guess half the computations leading to p(A|B) are also unnecessary.
Update: I will pursue a more reasonable approach for this application, which is to limit A and B to be disjoint. Then all the matrices are reduced in size. It can be done elegantly in numpy, but it adds some complexity to reading the code.
That did not make sense after all. The only option would be to create M1.shape[0]-1 submatrices that recreate the triangle, but that would certainly produce too much overhead.
In my code I'm using Theano to calculate a Euclidean distance matrix (code from here):
import theano
import theano.tensor as T
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(squared_euclidean_distances))
def pdist_euclidean(mat):
return f_euclidean(mat)
But this code causes some values of the matrix to be NaN. I've read that this happens when calculating theano.tensor.sqrt(), and here it's suggested to
Add an eps inside the sqrt (or max(x,EPs))
So I've added an eps to my code:
import theano
import theano.tensor as T
eps = 1e-9
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(eps+squared_euclidean_distances))
def pdist_euclidean(mat):
return f_euclidean(mat)
And I'm adding it before performing sqrt. I'm getting fewer NaNs, but I'm still getting them. What is the proper solution to the problem? I've also noticed that if MAT is T.dmatrix() there are no NaNs.
There are two likely sources of NaNs when computing Euclidean distances.
Floating point approximation issues can cause a squared distance to come out slightly negative when it is really just zero. The square root of a negative number is undefined (assuming you're not interested in the complex solution).
Imagine MAT has the value
[[ 1.62434536 -0.61175641 -0.52817175 -1.07296862 0.86540763]
[-2.3015387 1.74481176 -0.7612069 0.3190391 -0.24937038]
[ 1.46210794 -2.06014071 -0.3224172 -0.38405435 1.13376944]
[-1.09989127 -0.17242821 -0.87785842 0.04221375 0.58281521]]
Now, if we break down the computation we see that (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) has value
[[ 10.3838024   14.27675714  13.11072431   7.54348446]
 [ 14.27675714  18.16971188  17.00367905  11.4364392 ]
 [ 13.11072431  17.00367905  15.83764622  10.27040637]
 [  7.54348446  11.4364392   10.27040637   4.70316652]]
and 2 * MAT.dot(MAT.T) has value
[[ 10.3838024   -9.92394296  10.39763039  -1.51676099]
 [ -9.92394296  18.16971188 -14.23897281   5.53390084]
 [ 10.39763039 -14.23897281  15.83764622  -0.65066204]
 [ -1.51676099   5.53390084  -0.65066204   4.70316652]]
The diagonals of these two matrices should be equal (the distance between a vector and itself is zero), and from this textual representation it looks like that is true, but in fact they are slightly different -- the differences are too small to show up when we print the floating point values like this.
This becomes apparent when we print the value of the full expression (the second of the matrices above subtracted from the first):
[[ 0.00000000e+00 2.42007001e+01 2.71309392e+00 9.06024545e+00]
[ 2.42007001e+01 -7.10542736e-15 3.12426519e+01 5.90253836e+00]
[ 2.71309392e+00 3.12426519e+01 0.00000000e+00 1.09210684e+01]
[ 9.06024545e+00 5.90253836e+00 1.09210684e+01 0.00000000e+00]]
The diagonal is composed almost entirely of zeros, but the item in the second row, second column is now a very small negative value. When you then compute the square root of all these values you get NaN in that position, because the square root of a negative number is undefined (for real numbers).
[[ 0. 4.91942071 1.64714721 3.01002416]
[ 4.91942071 nan 5.58951267 2.42951402]
[ 1.64714721 5.58951267 0. 3.30470398]
[ 3.01002416 2.42951402 3.30470398 0. ]]
Computing the gradient of a Euclidean distance expression with respect to a variable inside the input to the function. This can happen not only if a negative number is generated due to floating point approximations, as above, but also if any of the inputs has zero length.
If y = sqrt(x) then dy/dx = 1/(2 * sqrt(x)). So if x=0 or, for your purposes, if squared_euclidean_distances=0 then the gradient will be NaN because 2 * sqrt(0) = 0 and dividing by zero is undefined.
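To see where the NaN actually comes from in the chain rule at a zero distance, a small sketch in plain NumPy:
import numpy as np

s = np.float64(0.0)                    # a squared distance of exactly zero
dsqrt_ds = 1.0 / (2.0 * np.sqrt(s))    # inf (with a divide-by-zero warning)
ds_dmat = 0.0                          # the squared distance is flat in this direction
print(dsqrt_ds * ds_dmat)              # nan: inf * 0 is undefined in IEEE arithmetic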
The first problem can be solved by ensuring the squared distances are never negative, forcing them to be no less than zero:
T.sqrt(T.maximum(squared_euclidean_distances, 0.))
To solve both problems (if you need gradients), you need to make sure the squared distances are never negative or zero, so bound them below with a small positive epsilon:
T.sqrt(T.maximum(squared_euclidean_distances, eps))
The first solution makes sense since the problem only arises from approximate representations. The second is a bit more questionable because the true distance is zero so, in a sense, the gradient should be undefined. Your specific use case may yield some alternative solution that maintains the semantics without an artificial bound (e.g. by ensuring that gradients are never computed/used for zero-length vectors). But NaN values can be pernicious: they can spread like weeds.
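Putting that together with the code from the question, a fixed version might look like this (a sketch; use the eps variant instead of 0. only if you need gradients):
import theano
import theano.tensor as T

MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) \
    + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
# Clamp at zero so tiny negative values from rounding never reach sqrt
f_euclidean = theano.function([MAT], T.sqrt(T.maximum(squared_euclidean_distances, 0.)))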
Just checking
In squared_euclidean_distances you're adding a column, a row, and a matrix. Are you sure this is what you want?
More precisely, if MAT is of shape (n, p), you're adding matrices of shapes (n, 1), (1, n) and (n, n).
Theano seems to silently repeat the rows (resp. the columns) of each one-dimensional member to match the number of rows and columns of the dot product.
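The broadcasting rule is the same as NumPy's; for example:
import numpy as np

col = np.arange(3.).reshape((3, 1))   # shape (n, 1)
row = np.arange(3.).reshape((1, 3))   # shape (1, n)
print((col + row).shape)              # (3, 3): each operand is repeated to match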
If this is what you want
In reshape, you should probably specify ndim=2, according to the documentation on basic tensor functionality: reshape.
If the shape is a Variable argument, then you might need to use the optional ndim parameter to declare how many elements the shape has, and therefore how many dimensions the reshaped Variable will have.
Also, it seems that squared_euclidean_distances should always be positive, unless imprecision errors in the difference change zero values into small negative values. If this is true, and if negative values are responsible for the NaNs you're seeing, you could indeed get rid of them without corrupting your result by surrounding squared_euclidean_distances with abs(...).
I am a newbie when it comes to using Python libraries for numerical tasks. I was reading a paper on LexRank and wanted to know how to compute the eigenvectors of a transition matrix. I used the eigvals function and got a result that I have a hard time interpreting:
import numpy
from numpy import linalg as LA

a = numpy.zeros(shape=(4,4))
a[0,0]=0.333
a[0,1]=0.333
a[0,2]=0
a[0,3]=0.333
a[1,0]=0.25
a[1,1]=0.25
a[1,2]=0.25
a[1,3]=0.25
a[2,0]=0.5
a[2,1]=0.0
a[2,2]=0.0
a[2,3]=0.5
a[3,0]=0.0
a[3,1]=0.333
a[3,2]=0.333
a[3,3]=0.333
print(LA.eigvals(a))
and the eigenvalues are:
[ 0.99943032+0.j
-0.13278637+0.24189178j
-0.13278637-0.24189178j
0.18214242+0.j ]
Can anyone please explain what j is doing here? Isn't the eigenvalue supposed to be a scalar quantity? How can I interpret this result broadly?
j is the imaginary unit, the square root of minus one. In math it is often denoted by i; in engineering, and in Python, it is denoted by j.
A single eigenvalue is a scalar quantity, but an (m, m) matrix will have m eigenvalues (and m eigenvectors). The Wikipedia page on eigenvalues and eigenvectors has some examples that might help you get your head around the concepts.
As @unutbu mentions, j denotes the imaginary unit in Python. In general, a matrix may have complex eigenvalues (i.e. with real and imaginary components) even if it contains only real values (see here, for example). Symmetric real-valued matrices are an exception, in that they are guaranteed to have only real eigenvalues.
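For instance, a small illustration:
import numpy as np
from numpy import linalg as LA

rotation = np.array([[0., -1.],
                     [1.,  0.]])     # real but non-symmetric (a 90-degree rotation)
print(LA.eigvals(rotation))          # the complex conjugate pair 0+1j and 0-1j

symmetric = np.array([[2., 1.],
                      [1., 2.]])     # real and symmetric
print(LA.eigvals(symmetric))         # eigenvalues 3 and 1, both real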
By definition, a square matrix that has a zero determinant should not be invertible. However, for some reason, after generating a covariance matrix, I take the inverse of it successfully, but taking the determinant of the covariance matrix ends up with an output of 0.0.
What could be potentially going wrong? Should I not trust the determinant output, or should I not trust the inverse covariance matrix? Or both?
Snippet of my code:
cov_matrix = np.cov(data)
adjusted_cov = cov_matrix + weight*np.identity(cov_matrix.shape[0]) # add small weight to ensure cov_matrix is non-singular
inv_cov = np.linalg.inv(adjusted_cov) # runs with no error, outputs a matrix
det = np.linalg.det(adjusted_cov) # ends up being 0.0
The numerical inversion of matrices does not involve computing the determinant. (Cramer's formula for the inverse is not practical for large matrices.) So the fact that the determinant evaluates to 0 (due to insufficient precision of floats) is not an obstacle for the matrix inversion routine.
Following up on the comments by BobChao87, here is a simplified test case (Python 3.4 console, numpy imported as np):
A = 0.2*np.identity(500)
np.linalg.inv(A)
Output: a matrix with 5 on the main diagonal, which is the correct inverse of A.
np.linalg.det(A)
Output: 0.0, because the determinant (0.2^500) is too small to be represented in double precision.
A possible solution is a kind of pre-conditioning (here, just rescaling): before computing the determinant, multiply the matrix by a factor that will make its entries closer to 1 on average. In my example, np.linalg.det(5*A) returns 1.
Of course, using the factor of 5 here is cheating, but np.linalg.det(3*A) also returns a nonzero value (about 1.19e-111). If you try np.linalg.det(2**k*A) for k running through modest positive integers, you will likely hit one that will return nonzero. Then you will know that the determinant of the original matrix was approximately 2**(-k*n) times the output, where n is the matrix size.
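Alternatively, if all you need is the sign and the magnitude of the determinant, np.linalg.slogdet avoids the underflow without any rescaling guesswork (a sketch on the same test case):
import numpy as np

A = 0.2 * np.identity(500)
sign, logdet = np.linalg.slogdet(A)
print(sign, logdet)       # 1.0 and about -804.7, i.e. det = exp(-804.7), far below double range
print(np.linalg.det(A))   # 0.0 -- the same determinant underflows to zero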