I'm doing matrix inversion in Python, and I found it very odd that the result depends on the scale of the data.
In the code below, A_inv/B_inv is expected to equal B/A. However, the difference between A_inv/B_inv and B/A grows larger and larger as the data scale increases... Is this because Python cannot compute the inverse precisely for a matrix with large values?
Also, I checked the condition number of B, which stays constant at ~3.016 regardless of the scale.
Thanks!!!
import numpy as np
from matplotlib import pyplot as plt
D = 30
N = 300
np.random.seed(10)
original_data = np.random.sample([D, N])
A = np.cov(original_data)
A_inv = np.linalg.inv(A)
B_cond = []
diff = []
for k in range(1, 10):
    B = A * np.power(10, k)
    B_cond.append(np.linalg.cond(B))
    B_inv = np.linalg.inv(B)
    ### Two measurements of difference are used
    diff.append(np.log(np.linalg.norm(A_inv/B_inv - B/A)))
    #diff.append(np.max(np.abs(A_inv/B_inv - B/A)))
# print(B_cond)
plt.figure()
plt.plot(range(1, 10), diff)
plt.xlabel('data(B) / data(A)')
plt.ylabel('log(||A_inv/B_inv - B/A||)')
plt.savefig('Inversion for large matrix')
I may be wrong, but I think it comes from how numbers are represented in the machine.
When you are dealing with large numbers, your inverse matrix is going to have entries that are very small in magnitude (close to zero). And close to zero, the floating-point representation is not precise enough, I guess...
https://en.wikipedia.org/wiki/Floating-point_arithmetic
There is no reason that you should expect np.linalg.norm(A_inv/B_inv - B/A) to be equal to anything special. Instead, you can check the quality of the inverse calculation by multiplying the original matrix by its inverse and checking the determinant, np.linalg.det(A.dot(A_inv)), which should be equal to 1.
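One way to see what is going on is to look at relative rather than absolute errors. The sketch below is only an illustration under my own assumptions: the names `residuals` and `rel_errs` are hypothetical, and it uses `np.random.default_rng` instead of the seed call from the question. It scales A by 10**k, inverts the result, and records both the residual ||B @ B_inv - I|| and the relative deviation of the element-wise ratio A_inv/B_inv from 10**k. If both stay near machine precision for every k, the growth in the original plot is most likely the same small relative error multiplied by an ever larger factor of 10**k.

```python
import numpy as np

rng = np.random.default_rng(10)
A = np.cov(rng.random((30, 300)))   # 30x30 covariance matrix, as in the question
A_inv = np.linalg.inv(A)

residuals = []   # ||B @ B_inv - I|| for each scale
rel_errs = []    # max relative deviation of A_inv/B_inv from the scale factor

for k in range(1, 10):
    scale = 10.0 ** k
    B = A * scale
    B_inv = np.linalg.inv(B)
    residuals.append(np.linalg.norm(B @ B_inv - np.eye(B.shape[0])))
    rel_errs.append(np.max(np.abs((A_inv / B_inv) / scale - 1.0)))

for k, (res, err) in enumerate(zip(residuals, rel_errs), start=1):
    print(f"k={k}: residual={res:.2e}, relative error={err:.2e}")
```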
I am using NumPy's linalg.eig on square matrices. My square matrices are a function of a 2D domain, and I am looking at its eigenvectors' complex angles along a parameterized circle on this domain. As long as the path I am considering is smooth, I expect the complex angles of each eigenvector's components to be smooth. However, for some cases, this is not the case with Python (although it is with other programming languages). For the parameter M=0 (some argument in my matrix that appears on its diagonal), I have components that look like:
when they should ideally look like (M=0.1):
What I have tried:
I verified that the matrices are Hermitian in both cases.
When I use linalg.eigh, M=0.1 becomes discontinuous while M=0 sometimes becomes continuous.
Using np.unwrap did nothing.
The difference between component phases (i.e. np.angle(v1-v2) for eigenvector v=[[v1],[v2]]) is smooth/continuous, but this is not what I want.
Fixing the NumPy seed before solving did nothing for different values of the seed. For example: np.random.seed(1).
What else can I do? I am trying to use Sympy's eigenvects just because I am running out of options, and I asked another question asking about another potential approach here: How do I force first component of NumPy eigenvectors to be real? . But, I do not know what else I can try.
Here is a minimal working example that works nicely in a Jupyter notebook:
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt
M = 0.01; # nonzero M is okay
M = 0.0; # M=0 causes problems
def matrix_generator(kx,ky,M):
    a = 2.46; t = 1; k = np.array((kx,ky));
    d1 = (a/2)*np.array((1,np.sqrt(3)));d2 = (a/2)*np.array((1,-np.sqrt(3)));d3 = -a*np.array((1,0));
    sx = np.matrix([[0,1],[1,0]]);sy = np.matrix([[0,-1j],[1j,0]]);sz = np.matrix([[1,0],[0,-1]]);
    hx = np.cos(k@d1)+np.cos(k@d2)+np.cos(k@d3);hy = np.sin(k@d1)+np.sin(k@d2)+np.sin(k@d3);
    return -t*(hx*sx - hy*sy + M*sz)
n_segs = 200; #number of segments in (kx,ky) loop
evecs_along_loop = np.zeros((n_segs,2,2),dtype=float)
# parameterize circular loop
kx0 = 0.5; ky0 = 1; r1=0.2; r2=0.2;
a = np.linspace(0.0, 2*np.pi, num=n_segs+2)
kloop=np.zeros((n_segs+2,2))
for i in range(n_segs+2):
    kloop[i,:]=np.array([kx0 + r1*np.cos(a[i]), ky0 + r2*np.sin(a[i])])
# assign eigenvector complex angles
for j in np.arange(n_segs):
    np.random.seed(2)
    H = matrix_generator(kloop[j][0],kloop[j][1],M)
    eval0, psi0 = LA.eig(H)
    evecs_along_loop[j,:,:] = np.angle(psi0)
# plot eigenvector complex angles
for p in np.arange(2):
    for q in np.arange(2):
        print(f"Phase for eigenvector element {p},{q}:")
        fig = plt.figure()
        ax = plt.axes()
        ax.plot((evecs_along_loop[:,p,q]))
        plt.show()
Clarification for anon01's comment:
For M=0, a sample matrix at some value of (kx,ky) would look like:
a = np.matrix([[0.+0.j, 0.99286437+1.03026667j],
[0.99286437-1.03026667j, 0.+0.j]])
For M =/= 0, the diagonal will be non-zero (but real).
I think that in general this is a tough problem. The fundamental issue is that eigenvectors (unlike eigenvalues) are not unambiguously defined. An eigenvector v of M with eigenvalue c is any non-zero vector for which
M*v = c*v
In particular for any non zero scalar s, multiplying an eigenvector by s yields an eigenvector, and even if you demand (as usual) that eigenvectors have length 1, we are still free to multiply by any scalar of absolute value 1. Even worse, if v1,..vd are orthogonal eigenvectors for c, then any non-zero linear combination of the v's is also an eigenvector for c.
Different eigendecomposition routines might well, therefore, come up with very different eigenvectors and still be doing their job. Moreover some routines might produce eigenvectors that are far apart for matrices that are close together.
A simple tractable case is where you know that all your eigenvalues are non-degenerate (i.e. each eigenspace is of dimension 1) and you happen to know that for a particular i, the i'th component of each eigenvector will be non zero. Then you could multiply the eigenvector v by a scalar, of absolute value 1, chosen so that after the multiplication v[i] is a positive real number. In C
s = conj(v[i])/cabs(v[i])
where
conj(z) is the complex conjugate of the complex number z,
and cabs(z) is the absolute value of the complex number z
Note that the above supposes that we are using the same index for every eigenvector, though the factor s varies from eigenvector to eigenvector.
This would impose a uniqueness on the eigenvectors, and, one would hope, mean that they varied continuously with the parameters of your matrix.
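The same phase fix can be sketched in NumPy. This is a minimal illustration, not a general solution: the function name `fix_phase` is hypothetical, it assumes non-degenerate eigenvalues, and it assumes the chosen component `i` of every eigenvector is non-zero. The test matrix is the M=0 sample given in the question.

```python
import numpy as np

def fix_phase(vecs, i=0):
    """Rotate each eigenvector (stored as a column) so that its i-th
    component becomes real and positive.

    Assumes vecs[i, :] contains no zeros; otherwise the division fails.
    """
    vecs = np.asarray(vecs, dtype=complex)
    ref = vecs[i, :]
    s = np.conj(ref) / np.abs(ref)   # one unit-modulus factor per eigenvector
    return vecs * s                  # broadcasting scales column j by s[j]

H = np.array([[0.0 + 0.0j, 0.99286437 + 1.03026667j],
              [0.99286437 - 1.03026667j, 0.0 + 0.0j]])
vals, vecs = np.linalg.eigh(H)
fixed = fix_phase(vecs, i=0)

print(np.angle(fixed))   # first row of phases is zero up to rounding
```

Because each column is only multiplied by a scalar of absolute value 1, the columns of `fixed` are still normalized eigenvectors of `H`.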
I want to implement ifft2 using DFT matrix. The following code works for fft2.
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp( - 2 * np.pi * 1j / N )
    W = np.power( omega, i * j ) # Normalization by sqrt(N) Not included
    return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.fft2(rA)
dftMtxM=DFT_matrix(sizeM)
dftMtxN=DFT_matrix(sizeN)
# Matrix multiply the 3 matrices together
mA = dftMtxM @ rA @ dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
To get to ifft2 I assumed I only need to replace the DFT matrix with its conjugate transpose, so I expected the following to work, but I got False for the last two prints. Any suggestions, please?
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp( - 2 * np.pi * 1j / N )
    W = np.power( omega, i * j ) # Normalization by sqrt(N) Not included
    return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.ifft2(rA)
dftMtxM=np.conj(DFT_matrix(sizeM))
dftMtxN=np.conj(DFT_matrix(sizeN))
# Matrix multiply the 3 matrices together
mA = dftMtxM @ rA @ dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
I am going to be building on some things from my answer to your previous question. Please note that I will try to distinguish between the terms Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT). Remember that the DFT is the transform, while the FFT is only an efficient algorithm for performing it. People, including myself, nevertheless very commonly refer to the DFT as the FFT, since it is practically the only algorithm used for computing the DFT.
The problem here is again the normalization of the data. It's interesting that this is such a fundamental and confusing part of any DFT operations yet I couldn't find a good explanation on the internet. I will try to provide a summary at the end about DFT normalization however I think the best way to understand this is by working through some examples yourself.
Why do the comparisons fail?
It's important to note that even though both of the allclose tests seemingly fail, they are actually not a very good method of comparing two complex number arrays.
Difference between two angles
In particular, the problem arises when comparing angles. If you just take the difference of two close angles that lie on either side of the border between -pi and pi, you can get a value that is around 2*pi. allclose just takes differences between values and checks that they are below some threshold. Thus, in our case, it can report a false negative.
A better way to compare angles is something along the lines of this function:
def angle_difference(a, b):
    diff = a - b
    diff[diff < -np.pi] += 2*np.pi
    diff[diff > np.pi] -= 2*np.pi
    return diff
You can then take the maximum absolute value and check that it's below some threshold:
np.max(np.abs(angle_difference(np.angle(mA), np.angle(rAfft)))) < threshold
In the case of your example, the maximum difference was 3.072209153742733e-12.
So the angles are actually correct!
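As a small illustration of the wrap-around effect, here is a sketch that uses a different but equivalent technique: it wraps the difference through the complex exponential rather than with the explicit `+= 2*np.pi` adjustments. The function name `wrapped_angle_difference` and the sample values are my own assumptions.

```python
import numpy as np

def wrapped_angle_difference(a, b):
    """Difference a - b mapped into (-pi, pi] via the complex exponential."""
    return np.angle(np.exp(1j * (np.asarray(a) - np.asarray(b))))

# Two angles that are almost identical on the circle but sit on
# opposite sides of the -pi / pi border.
a = np.array([np.pi - 1e-9, 0.5])
b = np.array([-np.pi + 1e-9, 0.5])

naive_ok = np.allclose(a, b)   # compares raw values, off by about 2*pi
wrapped_ok = np.max(np.abs(wrapped_angle_difference(a, b))) < 1e-6

print(naive_ok, wrapped_ok)
```

The raw comparison fails even though the two angles describe essentially the same direction, while the wrapped difference is tiny.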
Magnitude scaling
We can get an idea of the issue when we look at the magnitude ratio between the matrix iDFT and the library iFFT.
print(np.abs(mA)/np.abs(rAfft))
We find that all the ratios are 800, which means that the absolute values in mA are 800 times larger than those computed by the library. Suspiciously, 800 = 40 * 20, the dimensions of our data! I think you can see where I am going with this.
Confusing DFT normalization
We spot some indications why this is the case when we have a look at the FFT formulas as taken from the Numpy FFT documentation:
A_k = sum_{m=0}^{n-1} a_m * exp(-2*pi*i*m*k/n),   k = 0, ..., n-1
a_m = (1/n) * sum_{k=0}^{n-1} A_k * exp(2*pi*i*m*k/n),   m = 0, ..., n-1
You will notice that the forward transform doesn't normalize by anything, while the inverse transform multiplies the output by 1/N (written 1/n in the formulas above). These are the 1D FFTs, but the exact same thing applies in the 2D case: the inverse transform multiplies everything by 1/(N*M).
So in our example, if we update this line, we will get the magnitudes to agree:
mA = dftMtxM @ rA/(sizeM * sizeN) @ dftMtxN
A side note on comparing the outputs, an alternative way to compare complex numbers is to compare the real and imaginary components:
print(np.allclose(mA.real, rAfft.real))
print(np.allclose(mA.imag, rAfft.imag))
And we find that now indeed both methods agree.
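Putting the pieces together, a self-contained sketch of the matrix-based inverse transform might look like the following. It reuses `DFT_matrix` from the question; the use of `np.random.default_rng` and the names `matches` and `round_trip` are my own choices for illustration.

```python
import numpy as np

def DFT_matrix(N):
    # Unnormalized forward DFT matrix, W[j, i] = omega**(i*j)
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    return np.power(omega, i * j)

sizeM, sizeN = 40, 20
rng = np.random.default_rng(0)
rA = rng.random((sizeM, sizeN))

# Inverse 2D DFT: conjugated DFT matrices plus the 1/(M*N) factor
mA = np.conj(DFT_matrix(sizeM)) @ rA @ np.conj(DFT_matrix(sizeN)) / (sizeM * sizeN)
matches = np.allclose(mA, np.fft.ifft2(rA))

# Applying the forward matrices afterwards should recover the input
round_trip = np.allclose(DFT_matrix(sizeM) @ mA @ DFT_matrix(sizeN), rA)

print(matches, round_trip)
```

Comparing the complex arrays directly with a single `np.allclose` call sidesteps the angle wrap-around issue entirely.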
Why all this normalization mess and which should I use?
The fundamental property the DFT must satisfy is that iDFT(DFT(x)) = x. When you work through the math, you find that the product of the two coefficients before the sums has to be 1/N.
There is also something called Parseval's theorem. In simple terms, it states that the energy of a signal is the sum of squared absolute values, and that it is the same in the time domain and the frequency domain. For the FFT this boils down to this relationship:
sum_{n=0}^{N-1} |x[n]|^2 = (1/N) * sum_{k=0}^{N-1} |X[k]|^2
Here is the function for computing the energy of a signal:
def energy(x):
    return np.sum(np.abs(x)**2)
You are basically faced with a choice about the 1/N factor:
You can put the 1/N before the DFT sum. This makes sense, as then the k=0 DC component will be equal to the average of the time domain values. However, you will have to multiply the energy in the frequency domain by N in order to match it with the time domain energy.
N = len(x)
X = np.fft.fft(x)/N # Compute the FFT scaled by `1/N`
# Energy related by `N`
np.allclose(energy(x), energy(X) * N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y*N) # Compute the iFFT, remember to cancel out the built in `1/N` of ifft
You put the 1/N before the iDFT. This is, slightly counterintuitively, what most implementations, including Numpy do. I could not find a definitive consensus on the reasoning behind this, but I think it has something to do with the implementation efficiency. (If anyone has a better explanation for this, please leave it in the comments) As shown in the equations earlier, the energy in the frequency domain has to be divided by N to match the time domain energy.
N = len(x)
X = np.fft.fft(x) # Compute the FFT without scaling
# Energy, related by 1/N
np.allclose(energy(x), energy(X) / N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y) # Compute the iFFT with the built-in `1/N`
You can split the 1/N by placing 1/sqrt(N) before each of the transforms making them perfectly symmetric. In Numpy, you can provide the parameter norm="ortho" to the fft functions which will make them use the 1/sqrt(N) normalization instead: np.fft.fft(x, norm="ortho") The nice property here is that the energy now matches in both domains.
X = np.fft.fft(x, norm='ortho') # Compute the FFT scaled by `1/sqrt(N)`
# Perform some processing...
# Energies are equal:
np.allclose(energy(x), energy(X)) == True
Y = X * H
y = np.fft.ifft(Y, norm='ortho') # Compute the iFFT, with scaling by `1/sqrt(N)`
In the end it boils down to what you need. Most of the time the absolute magnitude of your DFT is actually not that important. You are mostly interested in the ratio of various components, or you want to perform some operation in the frequency domain but then transform back to the time domain, or you are interested in the phase (angles). In all of these cases, the normalization does not really play an important role, as long as you stay consistent.
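The three conventions can be checked side by side with a short, self-contained sketch. The test signal and the variable names (`X_forward`, `X_backward`, `X_ortho`) are assumptions made for illustration; the 1/N scaling is applied manually so that only the `norm="ortho"` option of `np.fft.fft` is relied on.

```python
import numpy as np

def energy(x):
    return np.sum(np.abs(x) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
N = len(x)

X_forward = np.fft.fft(x) / N            # 1/N placed before the DFT
X_backward = np.fft.fft(x)               # NumPy default: 1/N inside ifft
X_ortho = np.fft.fft(x, norm="ortho")    # 1/sqrt(N) on both transforms

# Parseval's relation under each convention
print(np.allclose(energy(x), energy(X_forward) * N))
print(np.allclose(energy(x), energy(X_backward) / N))
print(np.allclose(energy(x), energy(X_ortho)))

# Each convention round-trips as long as the inverse is scaled consistently
x_from_forward = np.fft.ifft(X_forward * N)
x_from_backward = np.fft.ifft(X_backward)
x_from_ortho = np.fft.ifft(X_ortho, norm="ortho")
```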
I have to calculate the exponential of the following array for my project:
w = [-1.52820754859, -0.000234000845064, -0.00527938881237, 5797.19232191, -6.64682108484,
18924.7087966, -69.308158911, 1.1158892974, 1.04454511882, 116.795573742]
But I've been getting overflow due to the number 18924.7087966.
The goal is to avoid using extra packages such as bigfloat (except "numpy") and get a close result (which has a small relative error).
1. So far I've tried using higher precision (i.e. float128):
def getlogZ_robust(w):
    Z = sum(np.exp(np.dot(x,w).astype(np.float128)) for x in iter_all_observations())
    return np.log(Z)
But I still get "inf" which is what I want to avoid.
2. I've also tried clipping it using numpy.clip():
def getlogZ_robust(w):
    Z = sum(np.exp(np.clip(np.dot(x,w).astype(np.float128),-11000, 11000)) for x in iter_all_observations())
    return np.log(Z)
But the relative error is too big.
Can you help me solve this problem, if it is possible?
Only significantly extended or arbitrary precision packages will be able to handle the huge differences in numbers. The exponential of the largest and most negative numbers in w differ by 8000 (!) orders of magnitude. float (i.e. double precision) has 'only' 15 digits of precision (meaning 1+1e-16 is numerically equal to 1), such that adding the small numbers to the huge exponential of the largest number has no effect. As a matter of fact, exp(18924.7087966) is so huge, that it dominates the sum. Below is a script performing the sum with extended precision in mpmath: the ratio of the sum of exponentials and exp(18924.7087966) is basically 1.
w = [-1.52820754859, -0.000234000845064, -0.00527938881237, 5797.19232191, -6.64682108484,
18924.7087966, -69.308158911, 1.1158892974, 1.04454511882, 116.795573742]
u = min(w)
v = max(w)
import mpmath
#using plenty of precision
mpmath.mp.dps = 32768
print('%.5e' % mpmath.log10(mpmath.exp(v)/mpmath.exp(u)))
#exp(w) differs by 8000 orders of magnitude for largest and smallest number
s = sum([mpmath.exp(mpmath.mpf(x)) for x in w])
print('%.5e' % (mpmath.exp(v)/s))
#largest exp(w) dominates such that ratio over the sums of exp(w) and exp(max(w)) is approx. 1
If losing digits in the final result, due to the hugely different orders of magnitude of the added terms, is not a concern, one could also mathematically transform the log of the sum of exponentials in the following way, which avoids taking exp of large numbers:
log(sum(exp(w)))
= log(sum(exp(w-wmax)*exp(wmax)))
= wmax + log(sum(exp(w-wmax)))
In python:
import numpy as np
v = np.array(w)
m = np.max(v)
print(m + np.log(np.sum(np.exp(v-m))))
Note that np.log(np.sum(np.exp(v-m))) is numerically zero as the exponential of the largest number completely dominates the sum here.
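The shift by the maximum can be wrapped in a small helper. This is a minimal sketch using only NumPy; the function name `log_sum_exp` is hypothetical, and the handling of a non-finite maximum is my own addition.

```python
import numpy as np

def log_sum_exp(values):
    """log(sum(exp(values))) computed without exponentiating large numbers."""
    values = np.asarray(values, dtype=float)
    m = np.max(values)
    if not np.isfinite(m):
        # all entries are -inf, or at least one is +inf
        return m
    # every exponent is <= 0 after the shift, so exp cannot overflow
    return m + np.log(np.sum(np.exp(values - m)))

w = [-1.52820754859, -0.000234000845064, -0.00527938881237, 5797.19232191,
     -6.64682108484, 18924.7087966, -69.308158911, 1.1158892974,
     1.04454511882, 116.795573742]

result = log_sum_exp(w)
print(result)
```

For this particular `w`, every shifted exponential except the largest underflows to zero, so the result coincides with `max(w)`.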
Numpy has a function called logaddexp which computes
logaddexp(x1, x2) == log(exp(x1) + exp(x2))
without explicitly computing the intermediate exp() values. This way it avoids the overflow. So here is the solution:
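def getlogZ_robust(w):
    # Z accumulates log(sum of exponentials); log(0) = -inf is the neutral start.
    # Starting from 0 would add an extra exp(0) = 1 to the sum.
    Z = -np.inf
    for x in iter_all_observations():
        Z = np.logaddexp(Z, np.dot(x,w))
    return Z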
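When the exponents are already available as an array, the same accumulation can be sketched with `np.logaddexp.reduce`. The example below is an illustration only: it applies the reduction to the array `w` from the question rather than to `np.dot(x, w)` over observations, and the small two-element case is an assumption added to show why the running total should start at -inf rather than 0.

```python
import numpy as np

w = np.array([-1.52820754859, -0.000234000845064, -0.00527938881237,
              5797.19232191, -6.64682108484, 18924.7087966, -69.308158911,
              1.1158892974, 1.04454511882, 116.795573742])

# Explicit loop, starting from log(0) = -inf
log_z = -np.inf
for value in w:
    log_z = np.logaddexp(log_z, value)

# The same reduction in one call
log_z_reduce = np.logaddexp.reduce(w)

# Effect of the starting value on a small example
small = np.array([-3.0, -2.0])
correct_small = np.logaddexp.reduce(small)       # log(e^-3 + e^-2)
biased_small = np.logaddexp(0.0, correct_small)  # log(1 + e^-3 + e^-2)

print(log_z, log_z_reduce, correct_small, biased_small)
```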
I tried to calculate Pearson's correlation coefficients between every pair of rows from two 2D arrays, and then sort the rows/columns of the correlation matrix based on its diagonal elements. First, the correlation coefficient matrix (i.e., 'ccmtx') was calculated from one random matrix (i.e., 'randmtx') in the following code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
def correlation_map(x, y):
    n_row_x = x.shape[0]
    n_row_y = y.shape[0]  # rows of y (the original used x.shape[0] here)
    ccmtx_xy = np.empty((n_row_x, n_row_y))
    for n in range(n_row_x):
        for m in range(n_row_y):
            ccmtx_xy[n, m] = pearsonr(x[n, :], y[m, :])[0]
    return ccmtx_xy
randmtx = np.random.randn(100, 1000) # generating random matrix
#ccmtx = np.corrcoef(randmtx, randmtx) # cc matrix based on numpy.corrcoef
ccmtx = correlation_map(randmtx, randmtx) # cc matrix based on scipy pearsonr
#
ccmtx_diag = np.diagonal(ccmtx)
#
ids, vals = np.argsort(ccmtx_diag, kind = 'mergesort'), np.sort(ccmtx_diag, kind = 'mergesort')
#ids, vals = np.argsort(ccmtx_diag, kind = 'quicksort'), np.sort(ccmtx_diag, kind = 'quicksort')
plt.plot(ids)
plt.show()
plt.plot(ccmtx_diag[ids])
plt.show()
vals[0]
The issue here is that when 'pearsonr' was used, the diagonal elements of 'ccmtx' are exactly 1.0, which makes sense. However, when 'corrcoef' was used, the diagonal elements of 'ccmtx' are not exactly one (some are slightly less than 1), seemingly due to floating-point precision error.
I find it annoying that the auto-correlation matrix of a single matrix has diagonal elements that are not exactly 1.0, since this results in shuffling of the rows/columns of the correlation matrix when the matrix is sorted based on its diagonal elements.
My questions are:
[1] Is there any good way to reduce the computation time if I stick with the 'pearsonr' function? (e.g., vectorized pearsonr?)
[2] Is there any good way/practice to prevent this precision error when using the 'corrcoef' in numpy? (e.g. 'decimals' option in np.around?)
I have searched for ways to calculate the correlation coefficients between all pairs of rows or columns from two matrices. However, as the algorithms contain some sort of "cov / variance" operation, this kind of precision issue always seems to be present.
Minor point: the 'mergesort' option seems to provide more reliable results than 'quicksort', as quicksort shuffled a 1d array of values that are exactly 1 into random order.
Any thoughts/comments would be greatly appreciated!
For question 1 vectorized pearsonr see the comments to the question.
I will answer only question 2: how to improve the precision of np.corrcoef.
The correlation matrix R is computed from the covariance matrix C according to
R_ij = C_ij / sqrt(C_ii * C_jj).
The implementation is optimized for performance and memory usage. It computes the covariance matrix, and then performs two divisions by sqrt(C_ii) and by sqrt(C_jj). This separate square-rooting is where the imprecision comes from. For example:
np.sqrt(3 * 3) - 3 == 0.0
np.sqrt(3) * np.sqrt(3) - 3 == -4.4408920985006262e-16
We can fix this by implementing our own simple corrcoef routine:
def corrcoef(a, b):
    c = np.cov(a, b)
    d = np.diag(c)
    return c / np.sqrt(d[:, None] * d[None, :])
Note that this implementation requires more memory than the numpy implementation because it needs to store a temporary matrix with size n * n and it is slightly slower because it needs to do n^2 square roots instead of only 2 n.
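For the first question (a vectorized alternative to the double `pearsonr` loop), one possible sketch is shown below. The function name `correlation_map_vectorized` and the test data are hypothetical, and the approach is my own assumption rather than what the comments on the question proposed: it centers each row, takes all pairwise dot products with one matrix multiplication, and divides by a single square root of the product of the sums of squares. It does not by itself guarantee a diagonal of exactly 1.0.

```python
import numpy as np

def correlation_map_vectorized(x, y):
    """Pearson correlation between every row of x and every row of y."""
    xc = x - x.mean(axis=1, keepdims=True)   # center each row
    yc = y - y.mean(axis=1, keepdims=True)
    ss_x = np.sum(xc ** 2, axis=1)           # sum of squares per row
    ss_y = np.sum(yc ** 2, axis=1)
    # one square root of the product, as in the corrcoef routine above
    return (xc @ yc.T) / np.sqrt(ss_x[:, None] * ss_y[None, :])

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 1000))
y = rng.standard_normal((80, 1000))

cc = correlation_map_vectorized(x, y)
print(cc.shape)
```

The result has one row per row of `x` and one column per row of `y`, which is the same layout the looped `correlation_map` produces.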
If I have an N^3 array of triplets in a numpy array, how do I do a vector sum on all of the triplets in the array? For some reason I just can't wrap my brain around the summation indices. Here is what I tried, but it doesn't seem to work:
a = np.random.random((5,5,5,3)) - 0.5
s = a.sum((0,1,2))
np.linalg.norm(s)
I would expect that as N gets large, if the sum is working correctly I should converge to 0, but I just keep getting bigger. The sum gives me a vector that is the correct shape (3x1), but obviously I must be doing something wrong. I know this should be easy, but I'm just not getting it.
Thanks in advance!
It is easier to understand your problem analytically if, instead of uniform random numbers, we use standard normal numbers; the qualitative results then carry over to your particular case:
>>> a = np.random.normal(0, 1, size=(5, 5, 5, 3))
>>> s = a.sum(axis=(0, 1, 2))
So now each of the three items of s is the sum of 125 numbers, each drawn from a standard normal distribution. It is a well established fact that adding up two normal distributions gives you another normal distribution with mean the sum of the means, and variance the sum of the variances. So each of the three values in s will be distributed as a random sample from a normal distribution with mean 0 and standard deviation sqrt(125) = 11.18.
The fact that the variance of the distribution grows means that, even though you will see an average value of 0 for each of those numbers if you run your code many times, on any given run you are more likely to see larger offsets from 0.
Furthermore you then go and compute the norm of those three values. Squaring three standard normal distributions and adding them together gives you a chi-squared distribution. If you then take the square root, you get a chi distribution. The former is easier to deal with, and it predicts that the average value of the square of the norm of your three values will be 3 * 125. And it most certainly seems to be:
>>> mean_norm_sq = 0
>>> for n in range(1000):
...     a = np.random.normal(0, 1, size=(5, 5, 5, 3))
...     s = a.sum(axis=(0, 1, 2))
...     mean_norm_sq += np.sum(s**2)
...
>>> mean_norm_sq / 1000
374.47629802482447
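The same check can be repeated for the uniform numbers used in the question. A uniform variable on [-0.5, 0.5) has variance 1/12, so the expected squared norm of the summed vector becomes 3 * 125 / 12 = 31.25 instead of 3 * 125. The sketch below is a simulation under my own choices of seed and trial count, and its names (`n_trials`, `expected`) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 2000
N = 5

total = 0.0
for _ in range(n_trials):
    a = rng.random((N, N, N, 3)) - 0.5   # uniform on [-0.5, 0.5)
    s = a.sum(axis=(0, 1, 2))            # vector sum of all N**3 triplets
    total += np.sum(s ** 2)

mean_norm_sq = total / n_trials
expected = 3 * N**3 / 12                 # three components, variance N**3 / 12 each

print(mean_norm_sq, expected)
```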
As the comments note, there is no reason why the squared sum should approach zero. By the description, an array of N three-dimensional vectors sounds like it should have the shape of (N,3) not (N,N,N,3), but I may be misunderstanding it. Either way, it is simple to observe what happens in the two cases:
import numpy as np
avg_sum = []
sq_sum = []
N_val = 2**np.arange(15)
for N in N_val:
    A = np.random.random((N,3)) - 0.5
    avg_sum.append( A.sum(axis=1).mean() )
    sq_sum.append ( (A**2).sum(axis=1).mean() )
import pylab as plt
plt.plot(N_val, avg_sum, label="Average sum")
plt.plot(N_val, sq_sum, label="Squared sum")
plt.legend(loc="best")
plt.show()
The average sum goes to zero as your intuition expects.
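For the (N, N, N, 3) layout from the question, the contrast between the sum and the mean can be sketched as follows. The choice of N values, the seed and the `results` dictionary are assumptions for illustration. Each component of the sum has standard deviation sqrt(N**3 / 12), so the norm of the sum tends to grow with N, while the norm of the mean vector tends to shrink.

```python
import numpy as np

rng = np.random.default_rng(0)
results = {}   # N -> (norm of the vector sum, norm of the vector mean)

for N in (5, 10, 20, 40):
    a = rng.random((N, N, N, 3)) - 0.5
    norm_of_sum = np.linalg.norm(a.sum(axis=(0, 1, 2)))
    norm_of_mean = np.linalg.norm(a.mean(axis=(0, 1, 2)))
    results[N] = (norm_of_sum, norm_of_mean)
    print(f"N={N}: |sum|={norm_of_sum:.3f}, |mean|={norm_of_mean:.5f}")
```

Dividing by the number of vectors (taking the mean) is what makes the quantity converge to zero; the raw sum is not expected to.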