I have a simple question regarding normalization when doing a 2D FFT in python.
My understanding is that normalization factors can be determined from making arrays filled with ones.
For example in 1d, FFT of [1,1,1,1] would give me [4+0j,0+0j,0+0j,0+0j] so the normalization factor should be 1/N=1/4.
In 2D, FFT of [[1,1],[1,1]] would give me [[4+0j,0+0j],[0+0j,0+0j]] so the normalization should be 1/MN=1/(2*2)=1/4.
Now suppose we have a 3000 by 3000 matrix, each element with a Gaussian distributed value with mean 0. When we FFT and normalize this (normalization factor = 1/(3000*3000)), we get a mean power of order 10^-7.
Now we repeat this using a 1000 by 1000 element sub-region (normalization factor = 1/(1000*1000)). The mean power we get from this is of order 10^-6. I'm wondering why there is a factor of ~10 difference. Shouldn't the mean power be the same? Am I missing an extra normalization factor?
If we say that the factor difference is infact 9, then I could guess that this comes from the number of elements (3000 x 3000 has 9 times more elements than 1000 x 1000), but what is the intuitive reason for this extra factor? Also, how do we determine the absolute normalization factor to obtain the "true" underlying mean power?
Any insight will be greatly appreciated. Thanks in advance!
sample code:
import numpy as np
a = np.random.randn(3000,3000)
af = np.fft.fft2(a)/3000.0/3000.0
aP = np.mean(np.abs(af)**2)
b = a[1000:2000,1000:2000]
bf = np.fft.fft2(b)/1000.0/1000.0
bP = np.mean(np.abs(bf)**2)
print aP,bP
>1.11094908545e-07 1.00226264535e-06
First, it's important to note that this issue is not related to the difference between 1D and 2D FFTs, but rather to how total power and mean power scale with the number of elements in an array.
You are exactly right when you say that the factor of 9 comes from the 9x more elements in a than b. What is confusing, perhaps, is that you noticed that you've already normalized by dividing by np.fft.fft2(a)/3000./3000. and np.fft.fft2(b)/1000./1000. In fact, those normalizations are necessary to get the total (not mean) power to be equal in the space and frequency domains. To get the mean you have to divide again by the array sizes.
Your question is really about Parseval's theorem, which states that the total power in the two domains (space/time and frequency) are equal. Its statement, for DFT is this. Notice, that in spite of the 1/N on the right, this is not mean power, but total power. The reason for the 1/N is the normalization convention for the DFT.
Put in Python, this means that for a time/space signal sig, Parseval equivalence may be stated as:
np.sum(np.abs(sig)**2) == np.sum(np.abs(np.fft.fft(sig))**2)/sig.size
Below is a complete example, starting with some toy cases (one and two dimensional arrays filled one 1s) and ending with your own case. Note that I used the .size property of numpy.ndarray, which returns the total number of elements in the array. It's equivalent to your /1000./1000. etc. Hope this helps!
import numpy as np
print 'simple examples:'
# 1-d, 4 elements:
ones_1d = np.array([1.,1.,1.,1.])
ones_1d_f = np.fft.fft(ones_1d)
# compute total power in space and frequency domains:
space_power_1d = np.sum(np.abs(ones_1d)**2)
freq_power_1d = np.sum(np.abs(ones_1d_f)**2)/ones_1d.size
print 'space_power_1d = %f'%space_power_1d
print 'freq_power_1d = %f'%freq_power_1d
# 2-d, 4 elements:
ones_2d = np.array([[1.,1.],[1.,1.]])
ones_2d_f = np.fft.fft2(ones_2d)
# compute and print total power in space and frequency domains:
space_power_2d = np.sum(np.abs(ones_2d)**2)
freq_power_2d = np.sum(np.abs(ones_2d_f)**2)/ones_2d.size
print 'space_power_2d = %f'%space_power_2d
print 'freq_power_2d = %f'%freq_power_2d
# 2-d, 9 elements:
ones_2d_big = np.array([[1.,1.,1.],[1.,1.,1.],[1.,1.,1.]])
ones_2d_big_f = np.fft.fft2(ones_2d_big)
# compute and print total power in space and frequency domains:
space_power_2d_big = np.sum(np.abs(ones_2d_big)**2)
freq_power_2d_big = np.sum(np.abs(ones_2d_big_f)**2)/ones_2d_big.size
print 'space_power_2d_big = %f'%space_power_2d_big
print 'freq_power_2d_big = %f'%freq_power_2d_big
print
# asker's example array a and fft af:
print 'askers examples:'
a = np.random.randn(3000,3000)
af = np.fft.fft2(a)
# compute the space and frequency total powers:
space_power_a = np.sum(np.abs(a)**2)
freq_power_a = np.sum(np.abs(af)**2)/af.size
# mean power is the total power divided by the array size:
freq_power_a_mean = freq_power_a/af.size
print 'space_power_a = %f'%space_power_a
print 'freq_power_a = %f'%freq_power_a
print 'freq_power_a_mean = %f'%freq_power_a_mean
print
# the central 1000x1000 section of the 3000x3000 original array:
b = a[1000:2000,1000:2000]
bf = np.fft.fft2(b)
# we expect the total power in the space and frequency domains
# to be about 1/9 of the total power in the space frequency domains
# for matrix a:
space_power_b = np.sum(np.abs(b)**2)
freq_power_b = np.sum(np.abs(bf)**2)/bf.size
# we expect the mean power to be the same as the mean power from
# matrix a:
freq_power_b_mean = freq_power_b/bf.size
print 'space_power_b = %f'%space_power_b
print 'freq_power_b = %f'%freq_power_b
print 'freq_power_b_mean = %f'%freq_power_b_mean
Related
I want to generate a rank 5 100x600 matrix in numpy with all the entries sampled from np.random.uniform(0, 20), so that all the entries will be uniformly distributed between [0, 20). What will be the best way to do so in python?
I see there is an SVD-inspired way to do so here (https://math.stackexchange.com/questions/3567510/how-to-generate-a-rank-r-matrix-with-entries-uniform), but I am not sure how to code it up. I am looking for a working example of this SVD-inspired way to get uniformly distributed entries.
I have actually managed to code up a rank 5 100x100 matrix by vertically stacking five 20x100 rank 1 matrices, then shuffling the vertical indices. However, the resulting 100x100 matrix does not have uniformly distributed entries [0, 20).
Here is my code (my best attempt):
import numpy as np
def randomMatrix(m, n, p, q):
# creates an m x n matrix with lower bound p and upper bound q, randomly.
count = np.random.uniform(p, q, size=(m, n))
return count
Qs = []
my_rank = 5
for i in range(my_rank):
L = randomMatrix(20, 1, 0, np.sqrt(20))
# L is tall
R = randomMatrix(1, 100, 0, np.sqrt(20))
# R is long
Q = np.outer(L, R)
Qs.append(Q)
Q = np.vstack(Qs)
#shuffle (preserves rank 5 [confirmed])
np.random.shuffle(Q)
Not a perfect solution, I must admit. But it's simple and comes pretty close.
I create 5 vectors that are gonna span the space of the matrix and create random linear combinations to fill the rest of the matrix.
My initial thought was that a trivial solution will be to copy those vectors 20 times.
To improve that, I created linear combinations of them with weights drawn from a uniform distribution, but then the distribution of the entries in the matrix becomes normal because the weighted mean basically causes the central limit theorm to take effect.
A middle point between the trivial approach and the second approach that doesn't work is to use sets of weights that favor one of the vectors over the others. And you can generate these sorts of weight vectors by passing any vector through the softmax function with an appropriately high temperature parameter.
The distribution is almost uniform, but the vectors are still very close to the base vectors. You can play with the temperature parameter to find a sweet spot that suits your purpose.
from scipy.stats import ortho_group
from scipy.special import softmax
import numpy as np
from matplotlib import pyplot as plt
N = 100
R = 5
low = 0
high = 20
sm_temperature = 100
p = np.random.uniform(low, high, (1, R, N))
weights = np.random.uniform(0, 1, (N-R, R, 1))
weights = softmax(weights*sm_temperature, axis = 1)
p_lc = (weights*p).sum(1)
rand_mat = np.concatenate([p[0], p_lc])
plt.hist(rand_mat.flatten())
I just couldn't take the fact the my previous solution (the "selection" method) did not really produce strictly uniformly distributed entries, but only close enough to fool a statistical test sometimes. The asymptotical case however, will almost surely not be distributed uniformly. But I did dream up another crazy idea that's just as bad, but in another manner - it's not really random.
In this solution, I do smth similar to OP's method of forming R matrices with rank 1 and then concatenating them but a little differently. I create each matrix by stacking a base vector on top of itself multiplied by 0.5 and then I stack those on the same base vector shifted by half the dynamic range of the uniform distribution. This process continues with multiplication by a third, two thirds and 1 and then shifting and so on until i have the number of required vectors in that part of the matrix.
I know it sounds incomprehensible. But, unfortunately, I couldn't find a way to explain it better. Hopefully, reading the code would shed some more light.
I hope this "staircase" method will be more reliable and useful.
import numpy as np
from matplotlib import pyplot as plt
'''
params:
N - base dimention
M - matrix length
R - matrix rank
high - max value of matrix
low - min value of the matrix
'''
N = 100
M = 600
R = 5
high = 20
low = 0
# base vectors of the matrix
base = low+np.random.rand(R-1, N)*(high-low)
def build_staircase(base, num_stairs, low, high):
'''
create a uniformly distributed matrix with rank 2 'num_stairs' different
vectors whose elements are all uniformly distributed like the values of
'base'.
'''
l = levels(num_stairs)
vectors = []
for l_i in l:
for i in range(l_i):
vector_dynamic = (base-low)/l_i
vector_bias = low+np.ones_like(base)*i*((high-low)/l_i)
vectors.append(vector_dynamic+vector_bias)
return np.array(vectors)
def levels(total):
'''
create a sequence of stritcly increasing numbers summing up to the total.
'''
l = []
sum_l = 0
i = 1
while sum_l < total:
l.append(i)
i +=1
sum_l = sum(l)
i = 0
while sum_l > total:
l[i] -= 1
if l[i] == 0:
l.pop(i)
else:
i += 1
if i == len(l):
i = 0
sum_l = sum(l)
return l
n_rm = R-1 # number of matrix subsections
m_rm = M//n_rm
len_rms = [ M//n_rm for i in range(n_rm)]
len_rms[-1] += M%n_rm
rm_list = []
for len_rm in len_rms:
# create a matrix with uniform entries with rank 2
# out of the vector 'base[i]' and a ones vector.
rm_list.append(build_staircase(
base = base[i],
num_stairs = len_rms[i],
low = low,
high = high,
))
rm = np.concatenate(rm_list)
plt.hist(rm.flatten(), bins = 100)
A few examples:
and now with N = 1000, M = 6000 to empirically demonstrate the nearly asymptotic behavior:
I want to implement ifft2 using DFT matrix. The following code works for fft2.
import numpy as np
def DFT_matrix(N):
i, j = np.meshgrid(np.arange(N), np.arange(N))
omega = np.exp( - 2 * np.pi * 1j / N )
W = np.power( omega, i * j ) # Normalization by sqrt(N) Not included
return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.fft2(rA)
dftMtxM=DFT_matrix(sizeM)
dftMtxN=DFT_matrix(sizeN)
# Matrix multiply the 3 matrices together
mA = dftMtxM # rA # dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
To get to ifft2 I assumd I need to change only the dft matrix to it's transpose, so expected the following to work, but I got false for the last two print any suggesetion please?
import numpy as np
def DFT_matrix(N):
i, j = np.meshgrid(np.arange(N), np.arange(N))
omega = np.exp( - 2 * np.pi * 1j / N )
W = np.power( omega, i * j ) # Normalization by sqrt(N) Not included
return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.ifft2(rA)
dftMtxM=np.conj(DFT_matrix(sizeM))
dftMtxN=np.conj(DFT_matrix(sizeN))
# Matrix multiply the 3 matrices together
mA = dftMtxM # rA # dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
I am going to be building on some things from my answer to your previous question. Please note that I will try to distinguish between the terms Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT). Remember that DFT is the transform while FFT is only an efficient algorithm for performing it. People, including myself, however very commonly refer to the DFT as FFT since it is practically the only algorithm used for computing the DFT
The problem here is again the normalization of the data. It's interesting that this is such a fundamental and confusing part of any DFT operations yet I couldn't find a good explanation on the internet. I will try to provide a summary at the end about DFT normalization however I think the best way to understand this is by working through some examples yourself.
Why the comparisons fail?
It's important to note, that even though both of the allclose tests seemingly fail, they are actually not a very good method of comparing two complex number arrays.
Difference between two angles
In particular, the problem is when it comes to comparing angles. If you just take the difference of two close angles that are on the border between -pi and pi, you can get a value that is around 2*pi. The allclose just takes differences between values and checks that they are bellow some threshold. Thus in our cases, it can report a false negative.
A better way to compare angles is something along the lines of this function:
def angle_difference(a, b):
diff = a - b
diff[diff < -np.pi] += 2*np.pi
diff[diff > np.pi] -= 2*np.pi
return diff
You can then take the maximum absolute value and check that it's bellow some threshold:
np.max(np.abs(angle_difference(np.angle(mA), np.angle(rAfft)))) < threshold
In the case of your example, the maximum difference was 3.072209153742733e-12.
So the angles are actually correct!
Magnitude scaling
We can get an idea of the issue is when we look at the magnitude ratio between the matrix iDFT and the library iFFT.
print(np.abs(mA)/np.abs(rAfft))
We find that all the values in mA are 800, which means that our absolute values are 800 times larger than those computed by the library. Suspiciously, 800 = 40 * 20, the dimensions of our data! I think you can see where I am going with this.
Confusing DFT normalization
We spot some indications why this is the case when we have a look at the FFT formulas as taken from the Numpy FFT documentation:
You will notice that while the forward transform doesn't normalize by anything. The reverse transform divides the output by 1/N. These are the 1D FFTs but the exact same thing applies in the 2D case, the inverse transform multiplies everything by 1/(N*M)
So in our example, if we update this line, we will get the magnitudes to agree:
mA = dftMtxM # rA/(sizeM * sizeN) # dftMtxN
A side note on comparing the outputs, an alternative way to compare complex numbers is to compare the real and imaginary components:
print(np.allclose(mA.real, rAfft.real))
print(np.allclose(mA.imag, rAfft.imag))
And we find that now indeed both methods agree.
Why all this normalization mess and which should I use?
The fundamental property of the DFT transform must satisfy is that iDFT(DFT(x)) = x. When you work through the math, you find that the product of the two coefficients before the sum has to be 1/N.
There is also something called the Parseval's theorem. In simple terms, it states that the energy in the signals is just the sum of square absolutes in both the time domain and frequency domain. For the FFT this boils down to this relationship:
Here is the function for computing the energy of a signal:
def energy(x):
return np.sum(np.abs(x)**2)
You are basically faced with a choice about the 1/N factor:
You can put the 1/N before the DFT sum. This makes senses as then the k=0 DC component will be equal to the average of the time domain values. However you will have to multiply the energy in frequency domain by N in order to match it with time domain frequency.
N = len(x)
X = np.fft.fft(x)/N # Compute the FFT scaled by `1/N`
# Energy related by `N`
np.allclose(energy(x), energy(X) * N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y*N) # Compute the iFFT, remember to cancel out the built in `1/N` of ifft
You put the 1/N before the iDFT. This is, slightly counterintuitively, what most implementations, including Numpy do. I could not find a definitive consensus on the reasoning behind this, but I think it has something to do with the implementation efficiency. (If anyone has a better explanation for this, please leave it in the comments) As shown in the equations earlier, the energy in the frequency domain has to be divided by N to match the time domain energy.
N = len(x)
X = np.fft.fft(x) # Compute the FFT without scaling
# Energy, related by 1/N
np.allclose(energy(x), energy(X) / N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y) # Compute the iFFT with the build in `1/N`
You can split the 1/N by placing 1/sqrt(N) before each of the transforms making them perfectly symmetric. In Numpy, you can provide the parameter norm="ortho" to the fft functions which will make them use the 1/sqrt(N) normalization instead: np.fft.fft(x, norm="ortho") The nice property here is that the energy now matches in both domains.
X = np.fft.fft(x, norm='orth') # Compute the FFT scaled by `1/sqrt(N)`
# Perform some processing...
# Energy are equal:
np.allclose(energy(x), energy(X)) == True
Y = X * H
y = np.fft.ifft(Y, norm='orth') # Compute the iFFT, with scaling by `1/sqrt(N)`
In the end it boils down to what you need. Most of the time the absolute magnitude of your DFT is actually not that important. You are mostly interested in the ratio of various components or you want to perform some operation in the frequency domain but then transform back to the time domain or you are interested in the phase (angles). In all of these case, the normalization does not really play an important role, as long as you stay consistent.
I am trying to use PCA to reduce the size of an input image from 4096 x 4096 to 4096 x 163 while keeping its important attributes. However, there is something off with my method as I get incorrect results. I believe it is while constructing my matrix U. My results vs correct results are listed below.
Start code:
# Reshape data to 4096 x 163
X_reshape = np.transpose(X_all, (1,2,3,0)).reshape(-1, X_all.shape[0])
X = X_reshape[:, :163]
mean_array = np.mean(X, axis = 1)
X_tilde = np.reshape(mean_array, (4096,1))
X_tilde = X - X_tilde
# Construct the covariance matrix for computing u'_i
covmat = np.cov(X_tilde.T)
# Compute u'_i, which is stored in the variable v
w, v = np.linalg.eig(covmat)
# Compute u_i from u'_i, and store it in the variable U
U = np.dot(X_tilde, v)
# Normalize u_i, i.e., each column of U
U = U / np.linalg.norm(U)
My results:
PC1 explains 0.08589754681312775% of the total variance
PC2 explains 0.07613195994874819% of the total variance
First 100 PCs explains 0.943423133356313% of the total variance
Shape of U: (4096, 163)
First 5 elements of first column of U: [-0.00908046 -0.00905446 -0.00887831 -0.00879843 -0.00850907]
First 5 elements of last column of U: [0.00047628 0.00048451 0.00045043 0.00035762 0.00025785]
Expected results:
PC1 explains 14.32% of the total variance
PC2 explains 7.08% of the total variance
First 100 PCs explains 94.84% of the total variance
Shape of U: (4096, 163)
First 5 elements of first column of U: [0.03381537 0.03353881 0.03292298 0.03238798 0.03146345]
First 5 elements of last column of U: [-0.00672667 -0.00496044 -0.00672151 -0.00759426
-0.00543667]
There must be something off with my calculations, I just can't figure out what. Let me know if you need additional information.
Proof I am using:
It looks to me like you have the steps out of order. You're dropping dimensions from the input before you calculate the eigenvectors and eigenvalues, so you're effectively randomly dropping a bunch of input at this stage with no justification.
# Reshape data to 4096 x 163
X_reshape = np.transpose(X_all, (1,2,3,0)).reshape(-1, X_all.shape[0])
X = X_reshape[:, :163]
I don't quite follow what the intent is behind the call to transpose above, but I don't think it matters. You can only drop dimensions from the input after calculating the eigenvectors and eigenvalues of the covariance matrix. And you don't drop dimensions from the data explicitly; you truncate the matrix of eigenvectors and then use that reduced eigenvector matrix for the projection step.
The covariance matrix in this case should be a 4096x4096 matrix. The eigenvalues and eigenvectors will be returned in order, with the largest eigenvalue and corresponding eigenvector at the beginning. You can then truncate the number of eigenvectors to 163 and create the dimension-reduced projection.
It's possible that I've misunderstood something about the assignment, but I am pretty sure this is the problem. I'm reluctant to say more since it's homework.
Description of the problem
FFT can be used to compute cross-correlation between two signals or images. To determine the delay or lag between two signals A and B, it suffices to locate the peak of:
IFFT(FFT(A)*conjugate(FFT(B)))
However, the amplitude of the peak is related to the amplitude of the frequency spectra of the individual signals. Thus to determine the Pearson correlation (rho), the amplitude of this peak must be scaled by the total energy in the two signals.
One way to do this is to normalize by the geometric mean of the individual autocorrelations. This gives a reasonable approximation of rho, especially when the delay between samples is small, but not the exact value.
I thought the reason for this error was that the Pearson correlation is only defined for the overlapping portions of the signal, whereas the normalization factor (the geometric mean of the two autocorrelation peaks) includes contributions from the non-overlapping portions. I considered two approaches for fixing this and producing an exact value for rho via FFT. In the first (called rho_exact_1 below), I trimmed the samples down to their overlapping portions and computed the normalization factor from those. In the second (called rho_exact_2 below), I computed the fraction of the measurements contained in the overlapping portion of the signals and multiplied the full-autocorrelation-normalization factor by that fraction.
Neither works! The figure below shows plots of the three approaches for calculating Pearson's rho using DFT-based cross-correlation. Only the region of the cross-correlation peak is shown. Each estimate is close to the correct value of 1.0, but not equal to it.
The code I used to perform the calculations is below. I used a simple sine wave as an example signal. I noticed that if I use a square-wave (w/ duty cycle not necessarily 50%) the approaches' errors change.
Can somebody explain what's going on?
A working example
import numpy as np
from matplotlib import pyplot as plt
# make a time vector w/ 256 points
# and a source signal
N_cycles = 10.0
N_points = 256.0
t = np.arange(0,N_cycles*np.pi,np.pi*N_cycles/N_points)
signal = np.sin(t)
use_rect = False
if use_rect:
threshold = -0.75
signal[np.where(signal>=threshold)]=1.0
signal[np.where(signal<threshold)]=-1.0
# normalize the signal (not technically
# necessary for this example, but required
# for measuring correlation of physically
# different signals)
signal = signal/signal.std()
# generate two samples of the signal
# with a temporal offset:
N = 128
offset = 5
sample_1 = signal[:N]
sample_2 = signal[offset:N+offset]
# determine the offset through cross-
# correlation
xc_num = np.abs(np.fft.ifft(np.fft.fft(sample_1)*np.fft.fft(sample_2).conjugate()))
offset_estimate = np.argmax(xc_num)
if offset_estimate>N//2:
offset_estimate = offset_estimate - N
# for an approximate estimate of Pearson's
# correlation, we normalize by the RMS
# of individual autocorrelations:
autocorrelation_1 = np.abs(np.fft.ifft(np.fft.fft(sample_1)*np.fft.fft(sample_1).conjugate()))
autocorrelation_2 = np.abs(np.fft.ifft(np.fft.fft(sample_2)*np.fft.fft(sample_2).conjugate()))
xc_denom_approx = np.sqrt(np.max(autocorrelation_1))*np.sqrt(np.max(autocorrelation_2))
rho_approx = xc_num/xc_denom_approx
print 'rho_approx',np.max(rho_approx)
# this is an approximation because we've
# included autocorrelation of the whole samples
# instead of just the overlapping portion;
# using cropped versions of the samples should
# yield the correct correlation:
sample_1_cropped = sample_1[offset:]
sample_2_cropped = sample_2[:-offset]
# these should be identical vectors:
assert np.all(sample_1_cropped==sample_2_cropped)
# compute autocorrelations of cropped samples
# and corresponding value for rho
autocorrelation_1_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_1_cropped).conjugate()))
autocorrelation_2_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_2_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
xc_denom_exact_1 = np.sqrt(np.max(autocorrelation_1_cropped))*np.sqrt(np.max(autocorrelation_2_cropped))
rho_exact_1 = xc_num/xc_denom_exact_1
print 'rho_exact_1',np.max(rho_exact_1)
# alternatively we could try to use the
# whole sample autocorrelations and just
# scale by the number of pixels used to
# compute the numerator:
scaling_factor = float(len(sample_1_cropped))/float(len(sample_1))
rho_exact_2 = xc_num/(xc_denom_approx*scaling_factor)
print 'rho_exact_2',np.max(rho_exact_2)
# finally a sanity check: is rho actually 1.0
# for the two signals:
rho_corrcoef = np.corrcoef(sample_1_cropped,sample_2_cropped)[0,1]
print 'rho_corrcoef',rho_corrcoef
x = np.arange(len(rho_approx))
plt.plot(x,rho_approx,label='FFT rho_approx')
plt.plot(x,rho_exact_1,label='FFT rho_exact_1')
plt.plot(x,rho_exact_2,label='FFT rho_exact_2')
plt.plot(x,np.ones(len(x))*rho_corrcoef,'k--',label='Pearson rho')
plt.legend()
plt.ylim((.75,1.25))
plt.xlim((0,20))
plt.show()
The normalised cross correlation between two N-periodic discrete signals F and G is defined as:
Since the numerator is a dot product between two vectors (F and G_x) and the denominator is the product of the norm of these two vectors, the scalar r_x must indeed lie between -1 and +1 and it is the cosinus of the angle between the vectors (See there). If the vector F and G_x are aligned, then r_x=1. If r_x=1, then the vector F and G_x are aligned due to the triangular inequality. To ensure these properties, the vectors at the numerator must match those at the denominator.
All numerators can be computed at once by using the Discrete Fourier Transform. Indeed, that transform turns the convolution into pointwise products in the Fourier space. Here is why the different estimated normalized cross correlations are not 1 in the tests you performed.
For the first test "approx", sample_1 and sample_2 are both extracted from a periodic signal. Both are of the same length, but the length is not a multiple of the period as it is 2.5 periods (5pi) (figure below). As a result, since the dft performs the correlation as if they where periodic signals, it is found that sample_1 and sample_2 are not perfectly correlated and r_x<1.
For the second test rho_exact_1, the convolution is performed on signals of length N=128, but the norms at the denominator are computed on truncated vectors of size N-offset=128-5. As a result, the properties of r_x are lost. In addition, it must be noticed that the proposed convolution and norms are not normalized: the computed norms and convolution product are globally proportionnal to the number of points of the considered vectors. As a result, the norms of the truncated vectors are slightly lower compared to the previous case and r_x increases: values larger that 1 are likely encountered as the offset increases.
For the third test rho_exact_2, a scaling factor is introduced to try to correct the first test: the properties of r_x are also lost and values larger than one can be encountered as the scaling factor is larger than one.
Nevertheless, the function corrcoef() of numpy actually computes a r_x equal to 1 for the truncated signals. Indeed, these signals are perfectly identical! The same result can be obtained using DFTs:
xc_num_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
autocorrelation_1_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_1_cropped).conjugate()))
autocorrelation_2_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_2_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
xc_denom_exact_11 = np.sqrt(np.max(autocorrelation_1_cropped))*np.sqrt(np.max(autocorrelation_2_cropped))
rho_exact_11 = xc_num_cropped/xc_denom_exact_11
print 'rho_exact_11',np.max(rho_exact_11)
To provide the user with a significant value for r_x, you can stick to the value provided by the first test, which can be lower than one for identical periodic signals if the length of the frame is not a multiple of the period. To correct this drawback, the estimated offset can also be retreived and used to build two cropped signals of the same length. The whole correlation procedure must be re-run to get a new value for r_x, which will not be plaged by the fact that the length of the cropped frame is not a multiple of the period.
Lastly, if the DFT is a very efficient way to compute the convolution at the numerator for all values of x at once, the denominator can be efficiently computed as 2-norms of vector, using numpy.linalg.norm. Since the argmax(r_x) for the cropped signals will likely be zero if the first correlation was successful, it could be sufficient to compute r_0 using a dot product `sample_1_cropped.dot(sample_2_cropped).
I'm doing matrix inversion in python, and I found it very weird that the result differs by the data scale.
In the code below, it is expected that A_inv/B_inv = B/A. However, it shows that the difference between A_inv/B_inv and B/A becomes larger and larger depend on the data scale... Is this because Python cannot compute matrix inverse precisely for matrix with large values?
Also, I checked the condition number for B, which is a constant ~3.016 no matter the scale is.
Thanks!!!
import numpy as np
from matplotlib import pyplot as plt
D = 30
N = 300
np.random.seed(10)
original_data = np.random.sample([D, N])
A = np.cov(original_data)
A_inv = np.linalg.inv(A)
B_cond = []
diff = []
for k in xrange(1,10):
B = A * np.power(10,k)
B_cond.append(np.linalg.cond(B))
B_inv = np.linalg.inv(B)
### Two measurements of difference are used
diff.append(np.log(np.linalg.norm(A_inv/B_inv - B/A)))
#diff.append(np.max(np.abs(A_inv/B_inv - B/A)))
# print B_cond
plt.figure()
plt.plot(xrange(1,10), diff)
plt.xlabel('data(B) / data(A)')
plt.ylabel('log(||A_inv/B_inv - B/A||)')
plt.savefig('Inversion for large matrix')
I may be wrong, but I think it comes from number representation in machine.
When you are dealing with great numbers, your inverse matrix is going to have very little number in magnitude (close to zero). And clsoe to zero, the representation of the floating number is not precise enough, I guess...
https://en.wikipedia.org/wiki/Floating-point_arithmetic
There is no reason that you should expect np.linalg.norm(A_inv/B_inv - B/A) to be equal to anything special. Instead, you can check the quality of the inverse calculation by multiplying the original matrix by its inverse and checking the determinant, np.linalg.det(A.dot(A_inv)), which should be equal to 1.