I need to program the Discrete Fourier Transform (DFT) in Python. (I cannot use the numpy fft function.)
You can find the numpy fft function in my code, but it is just for verification.
Not sure why, but it seems that my code is getting into an infinite loop (I have to interrupt it with a keyboard interrupt).
Any ideas?
import matplotlib.pyplot as plt
import numpy as np
import cmath
Fs = 40000  # sampling frequency
Ts = 1.0/Fs  # sampling period
t = np.arange(0, 1, Ts)  # time vector
f = 100  # signal frequency
x1_n = np.sin(2*np.pi*f*t + 0)
f = 1000
x2_n = np.sin(2*np.pi*f*t + 180)
x_n = x1_n + x2_n
n = len(x_n)  # signal length
k = np.arange(n)  # index vector in k
T = n/Fs
frq = k/T  # both sides of the frequency vector
frq = frq[range(int(n/2))]  # one side of the frequency vector
X = np.fft.fft(x_n)/n  # fft using numpy, with normalization (for verification)
print("A")
print(X) # printing the results for numpy fft
m = len(x_n)
output = []
for k in range(m):  # for each output element
    s = complex(0)
    for t in range(m):  # for each input element
        angle = 2j * cmath.pi * t * k / m
        s += x_n[t] * cmath.exp(-angle)
    output.append(s)
print("B") #printing the results for the fft implementation using for loops
print(output)
It's not an infinite loop, but since m = 40000, your inner loop is going to run 1.6 billion times. That's going to take a LOT of time, which is why FFT routines are not implemented in pure Python. On my machine, it does about 5 outer loops per second, so it's going to take roughly 3 hours to run.
You've written a perfectly good implementation of a Fourier transform. You have not written a Fast Fourier Transform, which specifically involves a series of techniques to bring the runtime down from O(n^2) to O(n log n).
This is what made the FFT so revolutionary when it was discovered. Hard problems that could only be solved using a Fourier transform suddenly became a lot faster.
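If you just need the O(n^2) DFT to finish in reasonable time without writing an FFT, you can at least move the inner loop out of the Python interpreter and into NumPy. A minimal sketch (same math as your loops, one vectorized dot product per output bin):

import numpy as np

def dft_naive(x):
    """O(n^2) DFT, but each output bin is a C-speed vectorized sum."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    t = np.arange(n)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * t / n))
                     for k in range(n)])

# Verify against numpy on a small slice:
# np.allclose(dft_naive(x_n[:1024]), np.fft.fft(x_n[:1024]))  -> True

This is still O(n^2), so for n = 40000 it remains slow (minutes rather than hours), but it shows where the time goes; for real work use np.fft.fft.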
Related
I want to implement ifft2 using a DFT matrix. The following code works for fft2.
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j)  # normalization by sqrt(N) not included
    return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.fft2(rA)
dftMtxM=DFT_matrix(sizeM)
dftMtxN=DFT_matrix(sizeN)
# Matrix multiply the 3 matrices together
mA = dftMtxM @ rA @ dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
To get to ifft2, I assumed I only needed to change the DFT matrix to its conjugate, so I expected the following to work, but I got False for the last two prints. Any suggestion, please?
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j)  # normalization by sqrt(N) not included
    return W
sizeM=40
sizeN=20
np.random.seed(0)
rA=np.random.rand(sizeM,sizeN)
rAfft=np.fft.ifft2(rA)
dftMtxM=np.conj(DFT_matrix(sizeM))
dftMtxN=np.conj(DFT_matrix(sizeN))
# Matrix multiply the 3 matrices together
mA = dftMtxM @ rA @ dftMtxN
print(np.allclose(np.abs(mA), np.abs(rAfft)))
print(np.allclose(np.angle(mA), np.angle(rAfft)))
I am going to be building on some things from my answer to your previous question. Please note that I will try to distinguish between the terms Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT). Remember that the DFT is the transform, while the FFT is only an efficient algorithm for performing it. People, including myself, however, very commonly refer to the DFT as FFT, since it is practically the only algorithm used for computing the DFT.
The problem here is again the normalization of the data. It's interesting that this is such a fundamental and confusing part of any DFT operation, yet I couldn't find a good explanation on the internet. I will try to provide a summary of DFT normalization at the end; however, I think the best way to understand it is by working through some examples yourself.
Why do the comparisons fail?
It's important to note that even though both of the allclose tests seemingly fail, they are actually not a very good way of comparing two arrays of complex numbers.
Difference between two angles
In particular, the problem arises when comparing angles. If you just take the difference of two nearly equal angles that lie on the border between -pi and pi, you can get a value that is around 2*pi. allclose just takes differences between values and checks that they are below some threshold, so in our case it can report a false negative.
A better way to compare angles is something along the lines of this function:
def angle_difference(a, b):
    diff = a - b
    diff[diff < -np.pi] += 2*np.pi
    diff[diff > np.pi] -= 2*np.pi
    return diff
You can then take the maximum absolute value and check that it's below some threshold:
np.max(np.abs(angle_difference(np.angle(mA), np.angle(rAfft)))) < threshold
In the case of your example, the maximum difference was 3.072209153742733e-12.
So the angles are actually correct!
Magnitude scaling
We can get an idea of the issue when we look at the magnitude ratio between the matrix iDFT and the library iFFT:
print(np.abs(mA)/np.abs(rAfft))
We find that all the values of this ratio are 800, which means that our absolute values are 800 times larger than those computed by the library. Suspiciously, 800 = 40 * 20, the dimensions of our data! I think you can see where I am going with this.
Confusing DFT normalization
We spot some indications of why this is the case when we look at the DFT formulas as given in the NumPy FFT documentation:

A_k = sum_{m=0}^{n-1} a_m * exp(-2*pi*i*m*k/n),    k = 0, ..., n-1
a_m = (1/n) * sum_{k=0}^{n-1} A_k * exp(2*pi*i*m*k/n),    m = 0, ..., n-1

You will notice that while the forward transform doesn't normalize by anything, the inverse transform multiplies the sum by 1/n. These are the 1D formulas, but the exact same thing applies in the 2D case: the inverse transform multiplies everything by 1/(N*M).
So in our example, if we update this line, we will get the magnitudes to agree:
mA = dftMtxM @ rA/(sizeM * sizeN) @ dftMtxN
A side note on comparing the outputs: an alternative way to compare complex numbers is to compare the real and imaginary components:
print(np.allclose(mA.real, rAfft.real))
print(np.allclose(mA.imag, rAfft.imag))
And we find that now indeed both methods agree.
Why all this normalization mess and which should I use?
The fundamental property the DFT must satisfy is that iDFT(DFT(x)) = x. When you work through the math, you find that the product of the two normalization coefficients in front of the sums has to be 1/N.
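A quick way to see this property with NumPy's convention (the whole 1/N lives in ifft):

import numpy as np

x = np.random.rand(16)
# ifft undoes fft exactly because ifft carries the full 1/N factor
print(np.allclose(np.fft.ifft(np.fft.fft(x)), x))  # True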
There is also Parseval's theorem. In simple terms, it states that the energy of a signal, the sum of squared absolute values, is the same in the time domain and in the frequency domain. For the DFT with NumPy's convention, this boils down to the relationship sum_n |x[n]|^2 = (1/N) * sum_k |X[k]|^2.
Here is the function for computing the energy of a signal:
def energy(x):
    return np.sum(np.abs(x)**2)
You are basically faced with a choice about where to put the 1/N factor:
You can put the 1/N before the DFT sum. This makes sense, as then the k=0 DC component will be equal to the average of the time-domain values. However, you will have to multiply the energy in the frequency domain by N in order to match it with the time-domain energy.
N = len(x)
X = np.fft.fft(x)/N # Compute the FFT scaled by `1/N`
# Energy related by `N`
np.allclose(energy(x), energy(X) * N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y*N) # Compute the iFFT, remember to cancel out the built-in `1/N` of ifft
You can put the 1/N before the iDFT. This is, slightly counterintuitively, what most implementations, including NumPy, do. I could not find a definitive consensus on the reasoning behind this, but I think it has something to do with implementation efficiency. (If anyone has a better explanation for this, please leave it in the comments.) As shown in the equations earlier, the energy in the frequency domain has to be divided by N to match the time-domain energy.
N = len(x)
X = np.fft.fft(x) # Compute the FFT without scaling
# Energy, related by 1/N
np.allclose(energy(x), energy(X) / N) == True
# Perform some processing...
Y = X * H
y = np.fft.ifft(Y) # Compute the iFFT with the built-in `1/N`
You can split the 1/N by placing 1/sqrt(N) before each of the transforms, making them perfectly symmetric. In NumPy, you can pass the parameter norm="ortho" to the fft functions, which makes them use the 1/sqrt(N) normalization instead: np.fft.fft(x, norm="ortho"). The nice property here is that the energy now matches in both domains.
X = np.fft.fft(x, norm='ortho') # Compute the FFT scaled by `1/sqrt(N)`
# Perform some processing...
# Energies are equal:
np.allclose(energy(x), energy(X)) == True
Y = X * H
y = np.fft.ifft(Y, norm='ortho') # Compute the iFFT, also scaled by `1/sqrt(N)`
In the end, it boils down to what you need. Most of the time, the absolute magnitude of your DFT is actually not that important. You are mostly interested in the ratio of various components, or you want to perform some operation in the frequency domain and then transform back to the time domain, or you are interested in the phase (angles). In all of these cases, the normalization does not really play an important role, as long as you stay consistent.
The script below filters a signal by zeroing all frequency bins whose frequency value is below 6.
However, instead of using the seemingly correct function rfftfreq, fftfreq is being used.
To my understanding, rfftfreq should be used together with rfft. Why does this code work although it uses fftfreq with rfft?
import numpy as np
from scipy.fftpack import rfft, irfft, fftfreq
time = np.linspace(0,10,2000)
signal = np.cos(5*np.pi*time) + np.cos(7*np.pi*time)
W = fftfreq(signal.size, d=time[1]-time[0])
f_signal = rfft(signal)
# If our original signal time was in seconds, this is now in Hz
cut_f_signal = f_signal.copy()
cut_f_signal[(W<6)] = 0
cut_signal = irfft(cut_f_signal)
Background: rfft returns an array with the real and imaginary parts of the Fourier modes in separate entries, such as [R(0), R(1), I(1), ... R(N/2), I(N/2)], where R(n) and I(n) are the real and imaginary parts of the n-th Fourier mode respectively (assuming an even-length array).
Therefore, rfftfreq yields an array of frequencies corresponding to this layout, such as (assuming an even-length array and a sampling spacing of 1):
[0, 1/n, 1/n, 2/n, 2/n, ..., n/(2n), n/(2n)]
However, this code works with fftfreq, whose output is
[0, 1/n, ..., n/(2n), -n/(2n), ..., -1/n]
Clearly, fftfreq should lead to wrong results when used together with rfft, because the frequencies and FFT bins do not match.
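As a quick check of the two layouts described above (a sketch using the scipy.fftpack conventions; the expected output for n = 8 is shown in comments):

from scipy.fftpack import fftfreq, rfftfreq

# rfftfreq repeats each nonzero frequency, once for the real and once for
# the imaginary entry of the packed rfft output:
print(rfftfreq(8))  # [0.    0.125 0.125 0.25  0.25  0.375 0.375 0.5  ]

# fftfreq runs through the positive frequencies and then the negative ones:
print(fftfreq(8))   # [ 0.     0.125  0.25   0.375 -0.5   -0.375 -0.25  -0.125]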
You mis-specified the frequencies in the original signal.
A sine wave is parameterized according to this equation (from Wikipedia): y(t) = A*sin(2*pi*f*t + phi).
The factor 2 is missing in the definition of signal = np.cos(5*np.pi*time) + np.cos(7*np.pi*time). Therefore, the actual frequencies are
5*pi*t = 2*pi*t * f
f = (5*pi*t) / (2*pi*t) = 5 / 2
7*pi*t = 2*pi*t * f
f = (7*pi*t) / (2*pi*t) = 7 / 2
In words, the two frequencies are half of what you think they are. Ironically, that is why it seems to work with fftfreq instead of rfftfreq. The former covers twice the frequency range (positive and negative frequencies), and therefore compensates for the missing factor 2.
This is the correct code:
signal = np.cos(5 * 2*np.pi * time) + np.cos(7 * 2*np.pi * time)
W = rfftfreq(signal.size, d=time[1]-time[0])
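Putting the two fixes together (remember that rfftfreq also has to be imported from scipy.fftpack), a sketch of the corrected script:

import numpy as np
from scipy.fftpack import rfft, irfft, rfftfreq

time = np.linspace(0, 10, 2000)
# With the factor 2*pi included, the tones really are at 5 Hz and 7 Hz
signal = np.cos(5 * 2*np.pi * time) + np.cos(7 * 2*np.pi * time)

W = rfftfreq(signal.size, d=time[1] - time[0])
f_signal = rfft(signal)

# Same filter as the original script: zero every bin below 6 Hz,
# which now correctly removes the 5 Hz tone and keeps the 7 Hz tone
cut_f_signal = f_signal.copy()
cut_f_signal[W < 6] = 0
cut_signal = irfft(cut_f_signal)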
I have an expression in the time domain
f = -1j*H(t) * exp(-(1j*a+b)*t)
which can be Fourier transformed analytically using known properties (H is the Heaviside step function). The result of this FT operation is
F = (w-a-1j*b)/((w-a)**2+b**2)
where w is frequency.
Now I'm using the tips in this article to do numerical Fourier transform on f in Python, and confirm that I do get the same analytical result F:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(-10, 10, 10**4) # time (note: linspace needs an integer count, not 1e4)
w = np.linspace(-10, 10, 10**4) # frequency
b = 0.1
a = 1
H = lambda x: 1*(x>0) # heaviside function
# function in time
f = -1j*H(t)*np.exp(-(1j*a+b)*t)
# function in frequency (analytical work)
F = (w-a-1j*b)/((w-a)**2+b**2)
hann = np.hanning(len(t)) # hanning window
# function in frequency (numerical work)
F2 = 2/len(t)*np.fft.fft(hann*f)
plt.figure()
plt.plot(w,F.real,'b',label='analytical')
plt.plot(w,F2.real,'b--',label='fft')
plt.xlabel(r'$\omega$')
plt.ylabel(r'Re($F(\omega)$)')
plt.legend(loc='best')
plt.figure()
plt.plot(w,F.imag,'g',label='analytical')
plt.plot(w,F2.imag,'g--',label='fft')
plt.xlabel(r'$\omega$')
plt.ylabel(r'Im($F(\omega)$)')
plt.legend(loc='best')
plt.show()
However Python's FFT function seems to give me something completely wrong. This is evident when F and F2 are plotted.
Edit: Here are the plots...
It's not obvious in these figures, but if you zoom in around the w=-10 and 10 areas, there are small oscillations, possibly due to the fft algorithm.
The FFT algorithm computes the DFT, which has the origin (both spatial and in frequency domain) on the first sample. You need to shift your signal (after applying the Hanning window) so that t=0 is the leftmost sample, and after computing the FFT you have to do the inverse shift.
MATLAB has ifftshift and fftshift, which implement those two shifts. NumPy has the same functions (np.fft.fftshift and np.fft.ifftshift).
Another issue with your code is that you compute the DFT and plot it at the locations given by the w that you computed, but that w is unrelated to the actual frequencies at which the DFT is computed.
Here is your code, translated to MATLAB, and fixed to properly compute F2 and w *. I hope this is useful. One thing to note is that your F does not match F2, I am confident that this is not due to an error in F2, but an error in your computation of F. The shapes are similar, but F is scaled differently and mirrored.
N = 1e3;
t = linspace(-100,100,N); % time
Fs = 1/(t(2)-t(1));
w = Fs * (-floor(N/2):floor((N-1)/2)) / N; % NOTE proper frequencies
b = 0.1;
a = 1;
H = @(x)1*(x>0); % Heaviside function
% function in time
f = -1j*H(t).*exp(-(1j*a+b)*t);
% function in frequency (analytical work)
F = (w-a-1j*b)./((w-a).^2+b.^2);
% hanning window
hann = 0.5*(1-cos(2*pi*linspace(0,1,N)));
% function in frequency (numerical work)
F2 = fftshift(fft(ifftshift(hann.*f))); % NOTE shifting of origin
figure
subplot(2,1,1), hold on
plot(w,real(F),'b-')
plot(w,real(F2),'r-')
xlabel('\omega')
ylabel('Re(F(\omega))')
legend({'analytical','fft'},'Location','best')
subplot(2,1,2), hold on
plot(w,imag(F),'b-')
plot(w,imag(F2),'r-')
xlabel('\omega')
ylabel('Im(F(\omega))')
legend({'analytical','fft'},'Location','best')
Footnote:
* I also changed the colors, MATLAB's green is too light.
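For reference, here is a NumPy sketch of the same fix (my translation back to Python; np.fft.fftshift and np.fft.ifftshift are the counterparts of MATLAB's fftshift and ifftshift):

import numpy as np

N = 1000
t = np.linspace(-100, 100, N)               # time
b, a = 0.1, 1
H = lambda x: 1.0 * (x > 0)                 # Heaviside step

f = -1j * H(t) * np.exp(-(1j * a + b) * t)  # function in time
hann = np.hanning(N)                        # Hanning window

# Proper frequency axis, centered around zero
w = np.fft.fftshift(np.fft.fftfreq(N, d=t[1] - t[0]))

# Shift t=0 to the first sample, FFT, then shift zero frequency to the center
F2 = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(hann * f)))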
I generate some time series out of a theoretical power spectral density (PSD).
Basically, my function in time space is given by X(t) = SUM_n sqrt(a_n) * cos(w_n t + phi_n), where a_n is the value of the PSD at a given w_n and phi_n is some random phase. To get a realistic time series, I have to sum up 2**25 modes, and my t is of course sized 2**25 as well.
If I do that with plain Python loops, it will take weeks...
Is there any way to speed this up, like some vector calculation?
t_full = np.linspace(0,1e-2,2**12, endpoint = False)
signal = np.zeros_like(t_full)
for i in range(w.shape[0]):
    signal += dataCOS[i] * np.cos(2*np.pi* t_full * w[i] + random.uniform(0,2*np.pi))
where dataCOS holds the sqrt(a_n) values, w holds the w_n, and random.uniform supplies the random phase shift phi_n.
You can use NumPy's outer operations to calculate the angles and then sum along one axis to obtain your signal in a vectorized way:
import numpy as np
t_full = np.linspace(0, 1e-2, 2**12, endpoint=False)
thetas = np.multiply.outer((2*np.pi*t_full), w)
thetas += np.random.uniform(0, 2*np.pi, w.shape)  # one random phase per mode, as in the loop version
signal = np.cos(thetas)
signal *= dataCOS
signal = signal.sum(-1)
This is faster because a Python for loop runs at interpreter speed, while NumPy's outer operations perform the multiplications and sums in compiled C loops.
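One caveat: with the 2**25 modes from the question, the full thetas matrix would be far too large for memory, so in practice you would accumulate the same vectorized sum over chunks of modes. A sketch (the chunk size is an arbitrary tuning knob):

import numpy as np

def synthesize(t_full, w, dataCOS, chunk=4096, rng=None):
    """Sum sqrt(a_n)*cos(2*pi*w_n*t + phi_n) over modes, chunked to bound memory."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.zeros_like(t_full)
    for start in range(0, len(w), chunk):
        wc = w[start:start + chunk]
        phi = rng.uniform(0, 2 * np.pi, wc.shape)           # one phase per mode
        thetas = 2 * np.pi * np.multiply.outer(t_full, wc) + phi
        signal += (dataCOS[start:start + chunk] * np.cos(thetas)).sum(axis=-1)
    return signal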
I analyzed the sunspots.dat data (below) using fft, which is a classic example in this area. I obtained results from fft in real and imaginary parts. Then I tried to use these coefficients (the first 20) to recreate the data following the Fourier series formula. Thinking the real parts correspond to a_n and the imaginary parts to b_n, I have
import numpy as np
from scipy import *
from matplotlib import pyplot as gplt
from scipy import fftpack
def f(Y, x):
    total = 0
    for i in range(20):
        total += Y.real[i]*np.cos(i*x) + Y.imag[i]*np.sin(i*x)
    return total
tempdata = np.loadtxt("sunspots.dat")
year=tempdata[:,0]
wolfer=tempdata[:,1]
Y=fft(wolfer)
n=len(Y)
print(n)
xs = linspace(0, 2*pi,1000)
gplt.plot(xs, [f(Y, x) for x in xs], '.')
gplt.show()
For some reason, however, my plot does not mirror the one generated by ifft (I use the same number of coefficients on both sides). What could be wrong?
Data:
http://linuxgazette.net/115/misc/andreasen/sunspots.dat
When you called fft(wolfer), you told the transform to assume a fundamental period equal to the length of the data. To reconstruct the data, you have to use basis functions with that same fundamental period, i.e. fundamental frequency 2*pi/N. By the same token, your time index xs has to range over the integer sample indices of the original signal.
Another mistake was forgetting to do the full complex multiplication. It's easier to think of this as Y[omega]*exp(1j*2*pi*omega*n/N).
Here's the fixed code. Note I renamed i to ctr to avoid confusion with sqrt(-1), and n to N to follow the usual signal-processing convention of lower case for a sample index and upper case for the total sample length. I also imported __future__ division to avoid confusion about integer division.
I forgot to add earlier: note that SciPy's fft doesn't divide by N after accumulating. I didn't divide this out before using Y[n]; you should if you want to get back the same numbers, rather than just seeing the same shape.
And finally, note that I am summing over the full range of frequency coefficients. When I plotted np.abs(Y), it looked like there were significant values in the upper frequencies, at least until sample 70 or so. I figured it would be easier to understand the result by summing over the full range, seeing the correct result, and then paring back coefficients to see what happens.
from __future__ import division
import numpy as np
from scipy import *
from matplotlib import pyplot as gplt
from scipy import fftpack
def f(Y, x, N):
    total = 0
    for ctr in range(len(Y)):
        total += Y[ctr] * (np.cos(x*ctr*2*np.pi/N) + 1j*np.sin(x*ctr*2*np.pi/N))
    return real(total)
tempdata = np.loadtxt("sunspots.dat")
year=tempdata[:,0]
wolfer=tempdata[:,1]
Y=fft(wolfer)
N=len(Y)
print(N)
xs = range(N)
gplt.plot(xs, [f(Y, x, N) for x in xs])
gplt.show()
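As a quick sanity check (my addition): since f sums over all N coefficients without the 1/N factor, it is exactly N times the inverse FFT, so it should reproduce the original data scaled by N:

recon = np.array([f(Y, x, N) for x in xs])
print(np.allclose(recon, N * wolfer))  # True, up to floating-point error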
The answer from mtrw was extremely helpful and helped me answer the same question as the OP, but my head almost exploded trying to understand the nested loop.
Here's the last part but with numpy broadcasting (not sure if this even existed when the question was asked) rather than calling the f function:
xs = np.arange(N)
omega = 2*np.pi/N
phase = omega * xs[:,None] * xs[None,:]
reconstruct = Y[None,:] * (np.cos(phase) + 1j*np.sin(phase))
reconstruct = (reconstruct).sum(axis=1).real / N
# same output
plt.plot(reconstruct)
plt.plot(wolfer)
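As a side note, cos(phase) + 1j*sin(phase) is just np.exp(1j*phase), so the broadcast reconstruction can be written a bit more compactly (equivalent result):

reconstruct = (Y[None, :] * np.exp(1j * phase)).sum(axis=1).real / N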