In my code I'm using Theano to calculate a Euclidean distance matrix (code from here):
import theano
import theano.tensor as T
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(squared_euclidean_distances))
def pdist_euclidean(mat):
    return f_euclidean(mat)
But this code causes some values of the matrix to be NaN. I've read that this happens when calculating theano.tensor.sqrt(), and here it's suggested to
Add an eps inside the sqrt (or max(x,EPs))
So I've added an eps to my code:
import theano
import theano.tensor as T
eps = 1e-9
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(eps+squared_euclidean_distances))
def pdist_euclidean(mat):
    return f_euclidean(mat)
And I'm adding it before performing the sqrt. I'm getting fewer NaNs, but I'm still getting some. What is the proper solution to the problem? I've also noticed that if MAT is T.dmatrix() there are no NaNs.
There are two likely sources of NaNs when computing Euclidean distances.
1. Floating-point approximation issues causing negative squared distances when the true distance is really just zero. The square root of a negative number is undefined (assuming you're not interested in the complex solution).
Imagine MAT has the value
[[ 1.62434536 -0.61175641 -0.52817175 -1.07296862 0.86540763]
[-2.3015387 1.74481176 -0.7612069 0.3190391 -0.24937038]
[ 1.46210794 -2.06014071 -0.3224172 -0.38405435 1.13376944]
[-1.09989127 -0.17242821 -0.87785842 0.04221375 0.58281521]]
Now, if we break down the computation we see that (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) has value
[[ 10.3838024 -9.92394296 10.39763039 -1.51676099]
[ -9.92394296 18.16971188 -14.23897281 5.53390084]
[ 10.39763039 -14.23897281 15.83764622 -0.65066204]
[ -1.51676099 5.53390084 -0.65066204 4.70316652]]
and 2 * MAT.dot(MAT.T) has value
[[ 10.3838024 14.27675714 13.11072431 7.54348446]
[ 14.27675714 18.16971188 17.00367905 11.4364392 ]
[ 13.11072431 17.00367905 15.83764622 10.27040637]
[ 7.54348446 11.4364392 10.27040637 4.70316652]]
The diagonals of these two matrices should be equal (the distance between a vector and itself is zero), and from this textual representation it looks like that is true. In fact they are slightly different; the differences are just too small to show up when the floating-point values are printed like this.
This becomes apparent when we print the value of the full expression (the second of the matrices above subtracted from the first)
[[ 0.00000000e+00 2.42007001e+01 2.71309392e+00 9.06024545e+00]
[ 2.42007001e+01 -7.10542736e-15 3.12426519e+01 5.90253836e+00]
[ 2.71309392e+00 3.12426519e+01 0.00000000e+00 1.09210684e+01]
[ 9.06024545e+00 5.90253836e+00 1.09210684e+01 0.00000000e+00]]
The diagonal is composed almost entirely of zeros, but the entry in the second row, second column is now a very small negative value. When you then compute the square root of all these values you get NaN in that position, because the square root of a negative number is undefined (for real numbers).
[[ 0. 4.91942071 1.64714721 3.01002416]
[ 4.91942071 nan 5.58951267 2.42951402]
[ 1.64714721 5.58951267 0. 3.30470398]
[ 3.01002416 2.42951402 3.30470398 0. ]]
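For reference, the same cancellation can be reproduced in plain NumPy (my own sketch; the matrix above appears to be np.random.seed(1) noise, and the exact residues depend on your BLAS and summation order, so they may differ from the values shown):
import numpy as np
np.random.seed(1)
M = np.random.randn(4, 5)
sq = (M ** 2).sum(1).reshape((4, 1)) + (M ** 2).sum(1).reshape((1, 4)) - 2 * M.dot(M.T)
print(np.diag(sq))   # near-zero residues on the diagonal, some possibly negative
print(np.sqrt(sq))   # NaN wherever a residue came out negative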
2. Computing the gradient of a Euclidean distance expression with respect to a variable inside the input to the function. This can happen not only if a negative number is generated due to floating-point approximations, as above, but also if any of the inputs are zero-length.
If y = sqrt(x) then dy/dx = 1/(2 * sqrt(x)). So if x=0 or, for your purposes, if squared_euclidean_distances=0 then the gradient will be NaN because 2 * sqrt(0) = 0 and dividing by zero is undefined.
The first problem can be solved by ensuring the squared distances are never negative, forcing them to be no less than zero:
T.sqrt(T.maximum(squared_euclidean_distances, 0.))
To solve both problems (if you need gradients) you need to make sure the squared distances are never negative or zero, so bound them with a small positive epsilon:
T.sqrt(T.maximum(squared_euclidean_distances, eps))
The first solution makes sense since the problem only arises from approximate representations. The second is a bit more questionable because the true distance is zero so, in a sense, the gradient should be undefined. Your specific use case may yield some alternative solution that maintains the semantics without an artificial bound (e.g. by ensuring that gradients are never computed/used for zero-length vectors). But NaN values can be pernicious: they can spread like weeds.
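Putting the fix back into the question's code might look like this (a sketch; the choice of eps is a judgment call, and 0. is enough if you never need gradients):
import theano
import theano.tensor as T

eps = 1e-6  # assumed value; tune for your data and float32 precision
MAT = T.fmatrix('MAT')
squared_euclidean_distances = ((MAT ** 2).sum(1).reshape((MAT.shape[0], 1))
                               + (MAT ** 2).sum(1).reshape((1, MAT.shape[0]))
                               - 2 * MAT.dot(MAT.T))
# Clamp before the square root so it never sees a negative (or zero) argument.
f_euclidean = theano.function([MAT], T.sqrt(T.maximum(squared_euclidean_distances, eps)))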
Just checking
In squared_euclidean_distances you're adding a column, a row, and a matrix. Are you sure this is what you want?
More precisely, if MAT is of shape (n, p), you're adding matrices of shapes (n, 1), (1, n) and (n, n).
Theano seems to silently repeat the rows (resp. the columns) of each one-dimensional member to match the number of rows and columns of the dot product.
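The same broadcasting can be checked quickly in NumPy (my own sketch, independent of Theano):
import numpy as np

n, p = 4, 5
MAT = np.ones((n, p))
col = (MAT ** 2).sum(1).reshape((n, 1))   # shape (4, 1)
row = (MAT ** 2).sum(1).reshape((1, n))   # shape (1, 4)
print((col + row - 2 * MAT.dot(MAT.T)).shape)   # (4, 4): the column and row are repeated to match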
If this is what you want
In reshape, you should probably specify ndim=2, as described in the documentation on basic tensor functionality: reshape.
If the shape is a Variable argument, then you might need to use the optional ndim parameter to declare how many elements the shape has, and therefore how many dimensions the reshaped Variable will have.
Also, it seems that squared_euclidean_distances should always be positive, unless imprecision errors in the difference change zero values into small negative values. If this is true, and if negative values are responsible for the NaNs you're seeing, you could indeed get rid of them without corrupting your result by surrounding squared_euclidean_distances with abs(...).
I have a question in which I am asked to show that the determinant of matrix B equals 0. Matrix B is defined as:
import numpy as np
from numpy import linalg as m
B = np.array([[-1-3.j,-8-10.j,0-3.j],
[-7-3.j,-4-9.j,-3-2.j],
[11-3.j,-16-12.j,6-5.j]
])
print(B)
[[ -1. -3.j -8.-10.j 0. -3.j]
[ -7. -3.j -4. -9.j -3. -2.j]
[ 11. -3.j -16.-12.j 6. -5.j]]
The determinant is straightforward using NumPy:
m.det(B)
(-8.126832540256171e-14-1.5987211554602298e-14j)
Which is clearly not equal to zero.
I double checked my answer using https://www.symbolab.com/ and the determinant is definitely zero.
I feel like I am doing something ridiculously silly, but can't quite figure out what. Any help?
What you're seeing are really tiny numbers that are almost equal to zero. The only reason they're not exactly zero is numerical inaccuracy.
That's why we usually don't test such numbers for equality but for closeness:
np.allclose(np.linalg.det(B), 0)  # True
To expand a little on Nils' answer:
There are various ways to compute determinants. The way taught in algebra classes -- Laplace (cofactor) expansion -- is a reasonable way to go for small (e.g. 3 x 3) matrices, but rapidly becomes infeasible for larger matrices because of the number of computations required.
In your case, where all the real and imaginary parts are small integers, such a computation would evaluate the determinant exactly, as 0.
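For instance, a direct cofactor expansion in plain Python (my own check) multiplies and adds only small integers, which double precision represents exactly, so it returns exactly zero:
# The nine entries of B, expanded by cofactors along the first row.
a, b, c = -1 - 3j, -8 - 10j, 0 - 3j
d, e, f = -7 - 3j, -4 - 9j, -3 - 2j
g, h, i = 11 - 3j, -16 - 12j, 6 - 5j
det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
print(det)   # 0j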
In Python, np.linalg.det uses a different approach: it factorises the matrix into factors -- triangular matrices and permutations -- whose determinants can easily be computed, and the determinant of the product is then the product of the determinants of the factors. This is an order N cubed computation, and so can be used even for quite large matrices.
However such factorisations are (a little) inaccurate; the original matrix will not be exactly equal to the product. Thus the determinant will also be, most likely, a little inaccurate.
I'm getting different answers coming from np.linalg.eigvals depending on whether I use the transpose of a matrix.
To replicate:
mat = np.array([[-7.00616288e-08, -2.79704289e-09,  1.67598654e-10],
                [-3.23676574e+07, -1.58978291e+15,  0.00000000e+00],
                [ 0.00000000e+00,  1.80156232e-02, -2.32851854e+07]])
print(np.linalg.eigvals(mat))
print(np.linalg.eigvals(mat.transpose()))
I get:
[ -7.00616288e-08 -1.58978291e+15 -2.32851854e+07]
[ -1.58978291e+15 2.50000000e-01 -2.32851854e+07]
Note that these values are different. Since the eigenvalues of a matrix and its transpose are identical, I assume that these issues are due to overflow. Is there some maximum value I should limit to, to make sure that this is always consistent?
Not due to an overflow. Overflow is easy to detect, and it generates a warning. The issue is the limit of double precision: significant digits can be lost when numbers of very different magnitudes are added, and then subtracted. For example, (1e20 + 1) - 1e20 == 0.
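The same thing happens with the entries of mat itself (my own quick check): the 1.8e-2 entry is smaller than half a unit in the last place of the 1.6e15 entry, so adding the two leaves the larger one unchanged.
print(-1.58978291e+15 + 1.80156232e-02 == -1.58978291e+15)   # True: the small entry is rounded away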
The second result, with 2 negative eigenvalues, is incorrect, because the determinant of your matrix is clearly negative: the product of main-diagonal entries is of magnitude 1e15 and dominates all other terms in the determinant by a large margin. So the sign of the determinant is the sign of this product, which is negative.
The issue is that mat.T has all tiny entries in the first column, much smaller than those in the other columns. When looking for a pivot, an algorithm may scan that column and settle for what is found there. This is not necessarily how .eigvals works, but the same principle applies: numerical linear algebra algorithms tend to proceed from the upper left corner, so it's best to avoid small entries there. Here's one way to do that:
mat1 = np.roll(mat, 1, axis=[0, 1])
print(np.linalg.eigvals(mat1))
print(np.linalg.eigvals(mat1.T))
prints
[-7.00616288e-08 -2.32851854e+07 -1.58978291e+15]
[-2.32851854e+07 -1.58978291e+15 -7.00616288e-08]
which are consistent. Rolling both axes means conjugating mat by a permutation matrix, which does not change the eigenvalues. The rolled matrix is
[[-2.32851854e+07 0.00000000e+00 1.80156232e-02]
[ 1.67598654e-10 -7.00616288e-08 -2.79704289e-09]
[ 0.00000000e+00 -3.23676574e+07 -1.58978291e+15]]
which gives NumPy a nice large number to start with.
Ideally it would do something like that itself, but no (practical) algorithm is ideal for every situation.
I calculated the sum over an array and over a zero padded version of the same array:
import numpy as np
np.random.seed(3635250408)
n0, n1 = int(2**16.9), 2**17
xx = np.random.randn(n0)
yy = np.zeros(n1)
yy[:n0] = xx
sx, sy = np.sum(xx), np.sum(yy)
print(f"sx = {sx}, sy = {sy}") # -> sx = -508.33773983674155, sy = -508.3377398367416
print(f"sy - sx:", sy - sx) # -> sy - sx: -5.68434188608e-14
print("np.ptp(yy[:n0] - xx) =", np.ptp(yy[:n0] - xx)) # -> 0
Why don't I get identical results?
Interestingly, I am able to show similar effects in Mathematica. I am using Python 3.6 (Anaconda 5.0 with MKL support) and NumPy 1.13.3. Could it be an MKL issue?
Update: @rich-l and @jkim noted that rounding problems might be the cause. I am not convinced, because adding zero should not alter a floating-point number. (The problem arose when investigating a data set of that size, where the deviations were significantly larger.)
You might be running into floating-point precision issues at this point.
By default, NumPy uses double-precision floats for storing the values, which carry roughly 16 significant decimal digits; the first printed result shows 17 digits.
I suspect that the fluctuations in values result in the two sums being rounded slightly differently, with the former resulting in a rounding to a half (5.5e-16) and the latter exceeding the threshold and being rounded to a full number (6.0e-16).
However, this is just a hypothesis - I don't know for sure how numpy does rounding for the least significant digit.
Floating-point arithmetic is not associative:
In [129]: ((0.1+0.2)+0.3) == (0.1+(0.2+0.3))
Out[129]: False
So the order in which the items are added affects the result.
numpy.sum usually uses pairwise summation. It reverts to naive summation (from left to right) when the length of the array is less than 8 or when summing over a strided axis.
Since pairwise summation recursively breaks the sequence into two groups, the
addition of zero padding affects the midpoint where the sequence gets divided and hence
alters the order in which the values are added. And since floating-point
arithmetic is not associative, zero padding can affect the result.
For example, consider
import numpy as np
np.random.seed(3635250408)
n0, n1 = 6, 8
xx = np.random.randn(n0)
# array([ 1.8545852 , -0.30387171, -0.57164897, -0.40679684, -0.8569989 ,
# 0.32546545])
yy = np.zeros(n1)
yy[:n0] = xx
# array([ 1.8545852 , -0.30387171, -0.57164897, -0.40679684, -0.8569989 ,
# 0.32546545, 0. , 0. ])
xx.sum() and yy.sum() are not the same value:
In [138]: xx.sum()
Out[138]: 0.040734223419930771
In [139]: yy.sum()
Out[139]: 0.040734223419930826
In [148]: xx.sum() == yy.sum()
Out[148]: False
Since len(xx) < 8, the values in xx are summed from left to right:
In [151]: xx.sum() == (((((xx[0]+xx[1])+xx[2])+xx[3])+xx[4])+xx[5])
Out[151]: True
Since len(yy) >= 8, pairwise summation is used to compute yy.sum():
In [147]: yy.sum() == (yy[0]+yy[1]+yy[2]+yy[3])+(yy[4]+yy[5]+yy[6]+yy[7])
Out[147]: True
Related NumPy developer discussions:
numpy.sum is not stable
implementation of pairwise summation
implementing a numerically stable sum
numpy.sum uses neither Kahan nor Shewchuk summation (the latter is used by math.fsum). I believe these algorithms would produce a stable result under the zero padding you describe, but I'm not expert enough to say for sure.
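For what it's worth, an exactly rounded summation such as math.fsum does return identical values for the padded and unpadded arrays, because the padded zeros contribute nothing to the exact sum (my own check, rebuilding the arrays from the question):
import math
import numpy as np

np.random.seed(3635250408)
n0, n1 = int(2**16.9), 2**17
xx = np.random.randn(n0)
yy = np.zeros(n1)
yy[:n0] = xx
# math.fsum tracks exact partial sums (Shewchuk's algorithm), so zero padding cannot change it.
print(math.fsum(xx) == math.fsum(yy))   # True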
By definition, a square matrix that has a zero determinant should not be invertible. However, for some reason, after generating a covariance matrix, I take the inverse of it successfully, but taking the determinant of the covariance matrix ends up with an output of 0.0.
What could be potentially going wrong? Should I not trust the determinant output, or should I not trust the inverse covariance matrix? Or both?
Snippet of my code:
cov_matrix = np.cov(data)
adjusted_cov = cov_matrix + weight*np.identity(cov_matrix.shape[0]) # add small weight to ensure cov_matrix is non-singular
inv_cov = np.linalg.inv(adjusted_cov) # runs with no error, outputs a matrix
det = np.linalg.det(adjusted_cov) # ends up being 0.0
The numerical inversion of matrices does not involve computing the determinant. (Cramer's formula for the inverse is not practical for large matrices.) So the fact that the determinant evaluates to 0 (due to insufficient precision of floats) is not an obstacle for the matrix inversion routine.
Following up on the comments by BobChao87, here is a simplified test case (Python 3.4 console, numpy imported as np)
A = 0.2*np.identity(500)
np.linalg.inv(A)
Output: a matrix with 5 on the main diagonal, which is the correct inverse of A.
np.linalg.det(A)
Output: 0.0, because the determinant (0.2^500) is too small to be represented in double precision.
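The underflow is easy to verify on its own (my own sketch):
print(0.2 ** 500)             # 0.0: the true value (~3e-350) is below the smallest positive double
print(np.finfo(float).tiny)   # ~2.225e-308, the smallest normal double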
A possible solution is a kind of pre-conditioning (here, just rescaling): before computing the determinant, multiply the matrix by a factor that will make its entries closer to 1 on average. In my example, np.linalg.det(5*A) returns 1.
Of course, using the factor of 5 here is cheating, but np.linalg.det(3*A) also returns a nonzero value (about 1.19e-111). If you try np.linalg.det(2**k*A) for k running through modest positive integers, you will likely hit one that will return nonzero. Then you will know that the determinant of the original matrix was approximately 2**(-k*n) times the output, where n is the matrix size.
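The scan over k described above could be sketched like this (my own illustration; for this particular A the loop already stops at k=1, since 0.4**500 is representable):
A = 0.2 * np.identity(500)
n = A.shape[0]
for k in range(1, 64):
    d = np.linalg.det(2 ** k * A)
    if d != 0.0 and np.isfinite(d):
        print(f"det(2**{k} * A) = {d}, so det(A) is roughly 2**({-k * n}) times that")
        break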
The following trivial example fails with a singular matrix error. Why? Is there any way to overcome it?
In: from scipy.stats import gaussian_kde
In: points
Out: (array([63, 84]), array([46, 42]))
In: gaussian_kde(points)
LinAlgError: singular matrix
Looking at the traceback, you can see that it fails when inverting the covariance matrix. This is due to exact multicollinearity in your data. By definition, you have multicollinearity in your data if two variables are collinear, i.e. if
the correlation between two independent variables is equal to 1 or -1
In this case, the two variables have only two samples each, and two points are always collinear (trivially, there is always a line passing through two distinct points). We can check that:
np.corrcoef(np.array([63, 84]), np.array([46, 42]))
[[ 1. -1.]
[-1. 1.]]
For two variables not to be automatically collinear, they must have at least n=3 samples. On top of that constraint, you have the limitation pointed out by ali_m that the number of samples n should be greater than or equal to the number of variables p. Putting the two together,
n>=max(3,p)
in this case p=2 and n>=3 is the right constraint.
The error occurs when gaussian_kde() tries to take the inverse of the covariance matrix of your input data. For the covariance matrix to be nonsingular, the number of (non-identical) points in your input must be at least the number of variables. Try adding a third point and you should see that it works.
This answer on Crossvalidated has a proper explanation for why this is the case.
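A quick check of that suggestion (my own sketch; the third sample is made up and deliberately not collinear with the first two):
import numpy as np
from scipy.stats import gaussian_kde

points = (np.array([63, 84, 70]), np.array([46, 42, 59]))   # third point added
kde = gaussian_kde(points)                                   # no LinAlgError this time
print(kde(np.array([[70.0], [50.0]])))                       # density at one query location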