How to optimize the Fourier transform with Python

I'm working on a machine learning project in which I want to extract a device's power consumption signature. Here is an example:
Then I want to extract features from that signature using harmonic analysis. As a first step I computed the Fourier transform, and this is what I got for the signature shown above:
My problem is that the transform does not seem very informative. How can I optimize or improve it?
Here is the code that I used to transform the signal:
import numpy as np
import matplotlib.pyplot as plt

# dt: time stamps, pt: measured power samples (the signature shown above)
plt.subplot(2,1,1)
plt.plot(dt,pt,'k-')
plt.xlabel('time')
plt.ylabel('amplitude')
Fs = 3000 # sampling rate
plt.subplot(2,1,2)
n = len(pt) # length of the signal
k = np.arange(n)
T = n/Fs # duration of the record
frq = k/T # two sides frequency range
freq = frq[:n//2] # one side frequency range
Y = np.fft.fft(pt)/n # fft computing and normalization
Y = Y[:n//2]
plt.plot(freq, abs(Y), 'r-')
plt.show()
Here is another example:
Thank you in advance.
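One possible way to turn such a spectrum into features is sketched below. It rests on assumptions that the question does not state: that the signature is dominated by a known fundamental (50 Hz mains is used here purely as an example) and that the sampling rate is the Fs = 3000 from the code above. The function name harmonic_features and the synthetic test signal are hypothetical. The sketch removes the mean so the DC term does not dominate, applies a Hann window to limit spectral leakage, and reads the amplitude at each harmonic of the fundamental.

```python
import numpy as np

def harmonic_features(signal, fs, f0, n_harmonics=5):
    """Amplitude at f0, 2*f0, ... read from a windowed one-sided spectrum."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # drop the DC offset
    window = np.hanning(len(x))            # Hann window against leakage
    spectrum = np.fft.rfft(x * window)
    # Scale so a sinusoid of amplitude A shows up as roughly A
    amplitude = 2 * np.abs(spectrum) / window.sum()
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    feats = []
    for h in range(1, n_harmonics + 1):
        idx = np.argmin(np.abs(freqs - h * f0))   # nearest bin to harmonic h
        feats.append(amplitude[idx])
    return np.array(feats)

# Synthetic stand-in for a power signature (assumption, not the asker's data)
fs = 3000
t = np.arange(fs) / fs
demo = 5 + 2 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 150 * t)
features = harmonic_features(demo, fs, f0=50)
print(np.round(features, 3))
```

For the synthetic signal the first and third entries come out close to 2 and 0.5, the amplitudes of the two sinusoids, while the other harmonics stay near zero. On real data the fundamental would have to be known or estimated first.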

Related

How to correctly scale the FFT python function in order to validate it with the rect() and sinc() function pair?

I am currently working on a Python implementation that uses the FFT to convert signals from the time domain to the frequency domain and back. To validate my FFT functions, I tried to use the rectangular and sinc function pair given by
def rect(t, T):  # rectangular function
    ans = np.zeros(len(t))
    for i in range(len(t)):
        if abs(t[i]) < T/2:
            ans[i] = 1
    return ans
and
def Tsinc_hz(f, T):  # sinc function
    return np.sin(np.pi * f * T) / (np.pi * f)
Unfortunately, the results are unsatisfying and I don’t quite get why. I hope someone is able to help me.
Here is my FFT code:
import numpy as np
# scipy.fft exposes functions with the same names
from numpy.fft import fft, ifft, fftfreq, fftshift, ifftshift

def to_frequency_domain(t, ys):
    last = t[-1]  # length of time sample, so is equal to limit
    N = len(t)  # number of samples in the time domain
    T = last/N  # 1/sample rate
    xf = fftfreq(N, T)
    xf = fftshift(xf)
    yf = fft(ys, N)
    yf = fftshift(yf)
    return xf, yf

def to_time_domain(x, y):
    N = len(x)  # number of samples in the time domain
    last = (len(x)-1) / (2*x[-1])  # length in the time domain
    T = last/(N-1)  # 1/sample rate
    ys = ifft(ifftshift(y), N)
    return ys
The following code is supposed to validate the FFT implementation by comparing the analytical functions given above with the FFT output in time and frequency domain.
t = np.linspace(0, 10, 250*10)
ans = rect(t, 1)
x_val,y_val = to_frequency_domain(t, ans)
f = np.linspace(-125, 125, 2500)
proof = Tsinc_hz(f, 1)
# sinc function comparison plot
plt.plot(x_val, y_val.real)
plt.plot(f,proof.real, c='r')
plt.show()
proof_time = to_time_domain(f, proof)
# rectangular function comparison plot
plt.plot(t, proof_time, linewidth = 3, c = 'r')
plt.plot(t, ans)
plt.show()
Running the code gives the following plot:
sinc function comparison plot and rectangular function comparison plot
It is obvious that there is a scaling problem. I know that the frequency axis depends on the sample rate of my time domain. In this case I used a sample rate of 250 Hz and 2500 data points, meaning that I have 0.1 Hz per frequency bin, and thus my x-axis in the frequency domain reaches from -125 to 125. I was wondering whether I can also formulate a relation between the amplitude in the frequency domain and the sample rate. I was thinking that if I keep the number of data points constant and reduce my sampling rate, the function is obviously compressed along the x-axis. Can this somehow result in stretching along the y-axis?
Furthermore, the transform of the sinc function in the second plot (proof_time) is mirrored.
Next, I tried to fix my first problem by dividing the FFT output by the sampling rate:
def to_frequency_domain(t, ys):
    last = t[-1]  # length of time sample, so is equal to limit
    N = len(t)  # number of samples in the time domain
    T = last/N  # 1/sample rate
    xf = fftfreq(N, T)
    xf = fftshift(xf)
    yf = fft(ys, N)*T
    yf = fftshift(yf)
    return xf, yf

def to_time_domain(x, y):
    N = len(x)  # number of samples in the time domain
    last = (len(x)-1) / (2*x[-1])  # length in the time domain
    T = last/(N-1)  # 1/sample rate
    ys = ifft(ifftshift(y)/T, N)
    return ys
which gives:
sinc function comparison plot_scaled, rectangular function comparison plot_scaled
This time both functions are obviously closer together but the result is still not perfect.
Additionally, I am a little confused by the output of the FFT when I define my time domain to be symmetric about zero (t = np.linspace(-10, 10, 250*20)):
sinc function comparison plot
Why is the blue curve doubled like that? Supposedly because we have a negative and positive frequency component for every value in time, right? But how do I fix that?
I have tried figuring it out for a while now but just can’t seem to solve the problem, so I am very grateful for every tip!
Thanks in advance!
My guess is that you have to scale by d_omega = d_f/(2*pi), as the FFT assumes a time step of 1 by default while you are using a different time step. If I add
df = xf[1] - xf[0]
domega = df / 2 / np.pi
yf = fft(ys, N) * domega
it scales.
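A different way to look at the scaling, offered as a sketch rather than as the accepted fix: the continuous Fourier integral can be approximated by a Riemann sum, which is the FFT output multiplied by the time step dt. If the time axis does not start at zero, each bin additionally picks up a phase factor exp(-2j*pi*f*t[0]); without it the real part alternates in sign from bin to bin, which is one plausible reading of the "doubled" curve. The function name continuous_ft_approx and the sampling grid below are illustrative assumptions.

```python
import numpy as np

def continuous_ft_approx(t, y):
    """Riemann-sum approximation of X(f) = integral of y(t) exp(-2j*pi*f*t) dt."""
    dt = t[1] - t[0]                       # sampling interval
    n = len(t)
    f = np.fft.fftshift(np.fft.fftfreq(n, d=dt))
    Y = np.fft.fftshift(np.fft.fft(y)) * dt       # amplitude scaling by dt
    Y = Y * np.exp(-2j * np.pi * f * t[0])        # account for the time origin t[0]
    return f, Y

# Rectangle of width 1 centred on zero, 250 samples per second on [-10, 10)
t = (np.arange(5000) - 2500) / 250
rect = (np.abs(t) < 0.5).astype(float)
f, Y = continuous_ft_approx(t, rect)

# Analytic transform of the unit-width rectangle: sin(pi*f)/(pi*f)
expected = np.sinc(f)                      # np.sinc(x) is sin(pi*x)/(pi*x)
max_error = np.max(np.abs(Y - expected))
print(max_error)
```

The remaining error is of the order of the time step, because the sampled rectangle covers slightly less than one unit of time; a finer grid reduces it.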

How do I get the frequencies from a signal?

I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents the samples of a recorded sound (x = milliseconds).
sin(pi*x/2) => 250 Hz
How can we go from the signal (a list of points) to obtaining the frequencies from this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos. I have yet to find an answer. Please use simple words.
(I am thankful for every answer.)
What you're looking for is known as the Fourier Transform
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is in essence a mathematical operation that, when applied to a signal, gives you an idea of how present each frequency is in the time series. In order to get some intuition behind this, it might be helpful to look at the mathematical definition of the DFT:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$$
where k is swept from 0 all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that this definition somewhat resembles the correlation of two functions, in this case x(n) and the negative complex exponential. While this may seem a little abstract, by using Euler's formula and playing around a bit with the definition, the DFT can be expressed as the correlation with both a sine wave and a cosine wave, which account for the imaginary and the real parts of the DFT respectively.
So, keeping in mind that this is in essence computing a correlation, whenever a sine or cosine from the decomposition of the complex exponential matches one contained in x(n), there will be a peak in X(k), meaning that this frequency is present in the signal.
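To make the correlation view concrete, here is a small sketch that computes the DFT by correlating the signal with cosines and sines directly, then checks it against np.fft.fft. The function name dft_by_correlation is made up for illustration, and the O(N^2) approach is only meant for small inputs.

```python
import numpy as np

def dft_by_correlation(x):
    """DFT computed as correlations with cosine (real part) and sine (imaginary part)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                    # one row per frequency index k
    angle = 2 * np.pi * k * n / N
    real = (x * np.cos(angle)).sum(axis=1)  # correlation with cosines
    imag = -(x * np.sin(angle)).sum(axis=1) # correlation with sines (minus sign from e^{-j...})
    return real + 1j * imag

# A 5-cycle sine in 64 samples: the correlation peaks at k = 5 (and its mirror k = 59)
x = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
X = dft_by_correlation(x)
print(np.argmax(np.abs(X[:32])))
```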
How can we do the same with numpy?
Having given a very brief theoretical background, let's consider an example to see how this can be implemented in Python. Let's consider the following signal:
import numpy as np
import matplotlib.pyplot as plt
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 50; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now the DFT can be computed with np.fft.fft, which, as mentioned, tells you the contribution of each frequency to the signal in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see that we get a peak at 50 Hz, which in mathematical terms is a delta function centred on the fundamental frequency of 50 Hz. This can be checked in a table of Fourier transform pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
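Applied to the signal from the question, the same idea can be reduced to a few lines. This is a sketch assuming one sample per millisecond (a sampling rate of 1000 Hz), as the question states; the variable names are illustrative. np.fft.rfft returns only the non-negative frequencies of a real signal, and np.fft.rfftfreq supplies the matching frequency axis in Hz.

```python
import numpy as np

signal = np.array([np.sin(np.pi * x / 2) for x in range(1000)])
sample_rate = 1000.0                       # assumption: x counts milliseconds

spectrum = np.abs(np.fft.rfft(signal))     # magnitude of the one-sided spectrum
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

peak_hz = freqs[np.argmax(spectrum)]       # frequency of the strongest component
print(peak_hz)
```

The peak lands at 250 Hz, matching the period of four samples (4 ms) of sin(pi*x/2).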

Fourier series data fit with numpy: fft vs coding

Suppose I have some data, y, to which I would like to fit a Fourier series. On this post, a solution was posted by Mermoz using the complex format of the series and "calculating the coefficient with a riemann sum". On this other post, the series is obtained through the FFT and an example is written down.
I tried implementing both approaches (image and code below; note that every time the code is run, different data is generated due to the use of numpy.random.normal), but I wonder why I am getting different results: the Riemann approach seems "wrongly shifted" while the FFT approach seems "squeezed". I am also not sure about my definition of the period "tau" for the series. I appreciate the attention.
I am using Spyder with Python 3.7.1 on Windows 7
Example
import matplotlib.pyplot as plt
import numpy as np
# Assume x (independent variable) and y are the data.
# Arbitrary numerical values for question purposes:
start = 0
stop = 4
mean = 1
sigma = 2
N = 200
terms = 30 # number of terms for the Fourier series
x = np.linspace(start,stop,N,endpoint=True)
y = np.random.normal(mean, sigma, len(x))
# Fourier series
tau = (max(x)-min(x)) # assume that signal length = 1 period (tau)
# From ref 1
def cn(n):
    c = y*np.exp(-1j*2*n*np.pi*x/tau)
    return c.sum()/c.size

def f(x, Nh):
    f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(1,Nh+1)])
    return f.sum()
y_Fourier_1 = np.array([f(t,terms).real for t in x])
# From ref 2
Y = np.fft.fft(y)
np.put(Y, range(terms+1, len(y)), 0.0) # zero-ing coefficients above "terms"
y_Fourier_2 = np.fft.ifft(Y)
# Visualization
f, ax = plt.subplots()
ax.plot(x,y, color='lightblue', label = 'artificial data')
ax.plot(x, y_Fourier_1, label = ("'Riemann' series fit (%d terms)" % terms))
ax.plot(x,y_Fourier_2, label = ("'FFT' series fit (%d terms)" % terms))
ax.grid(True, color='dimgray', linestyle='--', linewidth=0.5)
ax.set_axisbelow(True)
ax.set_ylabel('y')
ax.set_xlabel('x')
ax.legend()
Two small modifications are sufficient to make the sums nearly match the output of np.fft. The FFTW library indeed computes these sums.
1) The average of the signal, c[0] is to be accounted for:
f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(0,Nh+1)]) # here : 0, not 1
2) The output must be scaled.
y_Fourier_1=y_Fourier_1*0.5
The output seems "squeezed" because the high frequency components have been filtered. Indeed, the high frequency oscillations of the input have been cleared and the output looks like a moving average.
Here, tau is actually defined as stop-start: it corresponds to the length of the frame. It is the expected period of the signal.
If the frame does not correspond to a period of the signal, you can estimate its period by convolving the signal with itself and finding the first maximum. See "Find period of a signal out of the FFT".
Nevertheless, this is unlikely to work properly with a dataset generated by numpy.random.normal: this is additive white Gaussian noise. As it features a constant power spectral density, it can hardly be described as periodic!
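As a complementary sketch (not the answerer's code), a truncated Fourier series of real data can also be obtained with np.fft.rfft and np.fft.irfft. These work on the one-sided spectrum, so each retained harmonic keeps both its positive and its negative frequency part and no extra factor of 2 or 0.5 is needed. The sketch assumes the samples cover exactly one period without repeating the end point (unlike the endpoint=True grid in the question); fourier_series_fit and the test signal are hypothetical names and data.

```python
import numpy as np

def fourier_series_fit(y, terms):
    """Keep the mean and the first `terms` harmonics of a real, evenly sampled signal."""
    y = np.asarray(y, dtype=float)
    Y = np.fft.rfft(y)            # one-sided spectrum: index 0 is the mean term
    Y[terms + 1:] = 0.0           # discard harmonics above `terms`
    return np.fft.irfft(Y, n=len(y))

# Deterministic test data: mean 1, 3rd harmonic, and a 50th harmonic to be filtered out
N = 200
tau = 4.0
x = np.arange(N) / N * tau        # one period, end point excluded
y = 1.0 + 2.0 * np.cos(2 * np.pi * 3 * x / tau) + 0.5 * np.sin(2 * np.pi * 50 * x / tau)
fit = fourier_series_fit(y, terms=30)
print(np.max(np.abs(fit - (1.0 + 2.0 * np.cos(2 * np.pi * 3 * x / tau)))))
```

With 30 terms the 50th harmonic is removed and the fit reproduces the mean plus the 3rd harmonic at full amplitude.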

Applying Fourier Transform on Time Series data and avoiding aliasing

I want to apply a Fourier transform to time series data to convert it into the frequency domain. I am not sure whether the method I've used to apply the Fourier transform is correct. Following is the link to the data that I've used.
After reading the data file I've plotted original data using
t = np.linspace(0,55*24*60*60, 55)
s = df.values
sns.set_style("darkgrid")
plt.ylabel("Amplitude")
plt.xlabel("Time [s]")
plt.plot(t, s)
plt.show()
Since the data is on a daily frequency I've converted it into seconds using 24*60*60 and for a period of 55 days using 55*24*60*60
The graph looks as follows:
Next I've implemented the Fourier transform using the following piece of code and obtained the image as follows:
#Applying Fourier Transform
fft = fftpack.fft(s)
#Time taken by one complete cycle of wave (seconds)
T = t[1] - t[0]
#Calculating sampling frequency
F = 1/T
N = s.size
#Avoid aliasing by multiplying sampling frequency by 1/2
f = np.linspace(0, 0.5*F, N)
#Convert frequency to mHz
f = f * 1000
#Plotting frequency domain against amplitude
sns.set_style("darkgrid")
plt.ylabel("Amplitude")
plt.xlabel("Frequency [mHz]")
plt.plot(f[:N // 2], np.abs(fft)[:N // 2])
plt.show()
I have the following questions:
I am not sure if my methodology above is a correct way to implement the Fourier transform.
I am not sure if the method I am using to avoid aliasing is correct.
If what I've done is correct, then how do I interpret the three peaks in the frequency domain plot?
Finally, how would I invert the transform using only the frequencies that are significant?
While I'd refrain from answering your first two questions (it looks okay to me but I'd love an expert's input), I can weigh in on the latter two:
If what I've done is correct, then how do I interpret the three peaks in the frequency domain plot?
Well, that means you've got three main components to your signal at frequencies roughly 0.00025 mHz (not the best choice of units here, possibly!), 0.00125 mHz and 0.00275 mHz.
Finally, how would I invert transform using only frequencies that are significant.
You could just zero out every frequency component whose magnitude is below a cutoff you decide (say, an absolute value of 3, which should preserve your peaks here). Then you can do:
below_cutoff = np.abs(fft) < 3
fft[below_cutoff] = 0
cleaner_signal = fftpack.ifft(fft)
And that should do it, really!
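A self-contained variant of the same idea, given as a sketch under stated assumptions: the signal is real and evenly sampled once per day, and the synthetic data and the name keep_significant are illustrative only. It uses np.fft.rfftfreq to build the frequency axis from the sample spacing, so the axis ends at the Nyquist frequency 1/(2*dt) by construction and no manual halving is needed, and np.fft.irfft to invert the thresholded spectrum back to a real signal.

```python
import numpy as np

def keep_significant(s, dt, threshold):
    """Zero all spectral components with magnitude below `threshold`, then invert."""
    s = np.asarray(s, dtype=float)
    spectrum = np.fft.rfft(s)
    freqs = np.fft.rfftfreq(len(s), d=dt)   # in Hz, from 0 up to at most 1/(2*dt)
    filtered = np.where(np.abs(spectrum) >= threshold, spectrum, 0)
    cleaned = np.fft.irfft(filtered, n=len(s))
    return freqs, spectrum, cleaned

# Synthetic stand-in for 55 daily samples (assumption, not the linked data)
dt = 24 * 60 * 60                           # one day in seconds
n = np.arange(55)
strong = 10 + 3 * np.sin(2 * np.pi * 5 * n / 55)   # offset plus 5 cycles in 55 days
weak = 0.2 * np.sin(2 * np.pi * 17 * n / 55)       # small component to be discarded
freqs, spectrum, cleaned = keep_significant(strong + weak, dt, threshold=20)

peak = 1 + np.argmax(np.abs(spectrum[1:]))  # strongest bin, ignoring the mean (bin 0)
print(freqs[peak] * 1000, "mHz")
```

The threshold is compared against unnormalized FFT magnitudes, so a sensible value depends on the signal length and amplitude.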

How to limit the number of coefficients to use in reconstructing a 1D signal using wavelet decomposition in python

I am new to wavelet decomposition. I am trying to decompose and reconstruct (with very few coefficients) 1D data in Python using pywt. Following this documentation, I wrote the code below, which reconstructs the data with 512 coefficients (i.e. the size of cA or cD), but I think there should be a way of choosing (limiting) the number of coefficients that I use while still producing a reasonable reconstruction of the data.
%matplotlib inline
import pylab as plt
import pywt
# Data
data = ll[5].x0
n = len(data)
w = 'db1'
(cA, cD) = pywt.dwt(data, w, 'sp1') # Decomposition
# Perfect Reconstruction of data
perfect_reconstruction = pywt.upcoef('a',cA[:],w,take=n) + pywt.upcoef('d',cD[:],w,take=n)
reconstructed = pywt.upcoef('a',cA[:],w,take=n) # Approximate Reconstruction of data
x = np.arange(1.008,1.008+1024*0.001,0.001)
plt.figure(figsize=(20,8))
plt.subplot2grid((2,1),(0,0))
plt.title('Perfect Reconstruction of data - %s with rms error of 1.39 x e$^{-15}$'%w, fontsize=20)
plt.plot(x,data,'-',label='Data')
plt.plot(x,perfect_reconstruction,'-r',label='Reconstructed data')
plt.legend(loc='best',fontsize='x-large')
plt.xticks(fontsize = 14)
plt.yticks(fontsize = 14)
plt.subplot2grid((2,1),(1,0))
plt.title('Approximate Reconstruction of data - %s with rms error of 1.30 x e$^{-3}$'%w, fontsize=20)
plt.plot(x,data,'-',label='Data')
plt.plot(x,reconstructed,'-r',label='Reconstructed data')
plt.legend(loc='best',fontsize='x-large')
plt.xticks(fontsize = 14)
plt.yticks(fontsize = 14)
plt.show()
If anyone can help me with suggestions on how to achieve a proper decomposition and reconstruction with fewer coefficients, I would highly appreciate it. I would also welcome any information on how to write down the maths behind this, because my goal is to find a mathematical expression that best describes the data with fewer coefficients after decomposition.
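One common way to limit the number of coefficients is to decompose over several levels, keep only the k coefficients with the largest magnitude, set the rest to zero, and reconstruct. The sketch below illustrates this with a hand-written orthonormal Haar transform (the 'db1' wavelet) in plain NumPy so that it runs without extra packages; it is a stand-in, not the pywt API. With PyWavelets the multi-level steps would be pywt.wavedec and pywt.waverec. The function names, the power-of-two length requirement and the test signal are assumptions for illustration.

```python
import numpy as np

def haar_forward(x):
    """Full multi-level orthonormal Haar transform; len(x) must be a power of two."""
    c = np.asarray(x, dtype=float).copy()
    n = len(c)
    while n > 1:
        a = (c[0:n:2] + c[1:n:2]) / np.sqrt(2)   # approximation (pairwise averages)
        d = (c[0:n:2] - c[1:n:2]) / np.sqrt(2)   # detail (pairwise differences)
        c[:n // 2], c[n // 2:n] = a, d
        n //= 2
    return c

def haar_inverse(c):
    """Inverse of haar_forward."""
    c = np.asarray(c, dtype=float).copy()
    n = 1
    while n < len(c):
        a, d = c[:n].copy(), c[n:2 * n].copy()
        c[0:2 * n:2] = (a + d) / np.sqrt(2)
        c[1:2 * n:2] = (a - d) / np.sqrt(2)
        n *= 2
    return c

def keep_largest(c, k):
    """Zero everything except the k coefficients of largest magnitude."""
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[-k:]
    out[idx] = c[idx]
    return out

# Smooth test signal with 1024 samples (stand-in for the asker's data)
t = np.linspace(0, 1, 1024, endpoint=False)
data = np.sin(2 * np.pi * t) + 0.5 * t
coeffs = haar_forward(data)

rms_16 = np.sqrt(np.mean((haar_inverse(keep_largest(coeffs, 16)) - data) ** 2))
rms_64 = np.sqrt(np.mean((haar_inverse(keep_largest(coeffs, 64)) - data) ** 2))
print(rms_16, rms_64)
```

Because the transform is orthonormal, the squared reconstruction error equals the sum of the squared discarded coefficients, so keeping the largest ones gives the smallest error for a given k. The reconstruction is then a weighted sum of k Haar basis functions, which is one way to write the data as an explicit expression.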
