Suppose I have some data, y, to which I would like to fit a Fourier series. In this post, a solution was posted by Mermoz using the complex format of the series and "calculating the coefficient with a riemann sum". In this other post, the series is obtained through the FFT and an example is written down.
I tried implementing both approaches (image and code below - note that every time the code is run, different data will be generated because of the call to numpy.random.normal), but I wonder why I am getting different results: the Riemann approach seems "wrongly shifted" while the FFT approach seems "squeezed". I am also not sure about my definition of the period "tau" for the series. I appreciate the attention.
I am using Spyder with Python 3.7.1 on Windows 7
Example
import matplotlib.pyplot as plt
import numpy as np
# Assume x (independent variable) and y are the data.
# Arbitrary numerical values for question purposes:
start = 0
stop = 4
mean = 1
sigma = 2
N = 200
terms = 30 # number of terms for the Fourier series
x = np.linspace(start,stop,N,endpoint=True)
y = np.random.normal(mean, sigma, len(x))
# Fourier series
tau = (max(x)-min(x)) # assume that signal length = 1 period (tau)
# From ref 1
def cn(n):
    c = y*np.exp(-1j*2*n*np.pi*x/tau)
    return c.sum()/c.size
def f(x, Nh):
    f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(1,Nh+1)])
    return f.sum()
y_Fourier_1 = np.array([f(t,terms).real for t in x])
# From ref 2
Y = np.fft.fft(y)
np.put(Y, range(terms+1, len(y)), 0.0) # zero-ing coefficients above "terms"
y_Fourier_2 = np.fft.ifft(Y)
# Visualization
f, ax = plt.subplots()
ax.plot(x,y, color='lightblue', label = 'artificial data')
ax.plot(x, y_Fourier_1, label = ("'Riemann' series fit (%d terms)" % terms))
ax.plot(x,y_Fourier_2, label = ("'FFT' series fit (%d terms)" % terms))
ax.grid(True, color='dimgray', linestyle='--', linewidth=0.5)
ax.set_axisbelow(True)
ax.set_ylabel('y')
ax.set_xlabel('x')
ax.legend()
Performing two small modifications is sufficient to make the sums nearly match the output of np.fft; these are exactly the sums that FFT libraries such as FFTW compute.
1) The average of the signal, c[0], has to be accounted for:
f = np.array([2*cn(i)*np.exp(1j*2*i*np.pi*x/tau) for i in range(0,Nh+1)]) # here : 0, not 1
2) The output must be scaled.
y_Fourier_1=y_Fourier_1*0.5
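For reference, here is a minimal self-contained sketch combining both modifications with the question's setup; the two reconstructions should nearly coincide (a small residual remains, largely because the grid uses endpoint=True, so x/tau covers [0, 1] inclusive rather than one exact DFT period):
import numpy as np

start, stop, N, terms = 0, 4, 200, 30
x = np.linspace(start, stop, N, endpoint=True)
y = np.random.normal(1, 2, N)
tau = stop - start

def cn(n):
    c = y*np.exp(-1j*2*n*np.pi*x/tau)
    return c.sum()/c.size

def f(t, Nh):
    # range starts at 0 so the DC term cn(0) is included
    return np.array([2*cn(i)*np.exp(1j*2*i*np.pi*t/tau)
                     for i in range(0, Nh+1)]).sum()

y_riemann = 0.5*np.array([f(t, terms).real for t in x])  # scaled output

Y = np.fft.fft(y)
np.put(Y, range(terms+1, len(y)), 0.0)
y_fft = np.fft.ifft(Y).real

print(np.max(np.abs(y_riemann - y_fft)))  # how close the two reconstructions are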
The output seems "squeezed" because the high-frequency components have been filtered out. Indeed, the high-frequency oscillations of the input have been removed and the output looks like a moving average.
Here, tau is actually defined as stop-start: it corresponds to the length of the frame and is the assumed period of the signal.
If the frame does not correspond to a period of the signal, you can guess the period by correlating the signal with itself and finding the first maximum; see Find period of a signal out of the FFT. Nevertheless, this is unlikely to work properly on a dataset generated by numpy.random.normal: that is additive white Gaussian noise, and since it features a constant power spectral density, it can hardly be described as periodic!
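As a rough illustration, here is a small sketch of that period guess with an assumed sample rate and a synthetic sinusoid (the names fs and sig are mine):
import numpy as np

fs = 100.0                                  # assumed sample rate
t = np.arange(0, 10, 1/fs)
sig = np.sin(2*np.pi*1.3*t)                 # true period ~ 0.769 s

ac = np.correlate(sig, sig, mode='full')[sig.size-1:]  # autocorrelation, lags >= 0
rising = np.argmax(np.diff(ac) > 0)         # skip the initial descent from lag 0
period = (rising + np.argmax(ac[rising:])) / fs
print(period)                               # ~ 0.77 s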
I am currently working on a Python implementation that uses the FFT to convert signals from the time domain to the frequency domain and back. To validate my FFT function, I tried to use the rectangular/sinc function pair given by
import numpy as np  # import assumed; not shown in the original post

def rect(t, T): # rectangular function
    ans = np.zeros(len(t))
    for i in range(len(t)):
        if abs(t[i]) < T/2:
            ans[i] = 1
    return ans
and
def Tsinc_hz(f, T): # sinc function
    return np.sin(np.pi * f * T) / (np.pi * f)
Unfortunately, the results are unsatisfying and I don't quite get why. I hope someone is able to help me.
Here is my FFT code:
from numpy.fft import fft, ifft, fftfreq, fftshift, ifftshift  # imports assumed

def to_frequency_domain(t, ys):
    last = t[-1] # length of the time sample, equal to the time limit
    N = len(t) # number of samples in the time domain
    T = last/N # 1/sample rate
    xf = fftfreq(N, T)
    xf = fftshift(xf)
    yf = fft(ys, N)
    yf = fftshift(yf)
    return xf, yf

def to_time_domain(x, y):
    N = len(x) # number of samples
    last = (len(x)-1) / (2*x[-1]) # length in the time domain
    T = last/(N-1) # 1/sample rate
    ys = ifft(ifftshift(y), N)
    return ys
The following code is supposed to validate the FFT implementation by comparing the analytical functions given above with the FFT output in the time and frequency domains.
t = np.linspace(0, 10, 250*10)
ans = rect(t, 1)
x_val,y_val = to_frequency_domain(t, ans)
f = np.linspace(-125, 125, 2500)
proof = Tsinc_hz(f, 1)
# sinc function comparison plot
plt.plot(x_val, y_val.real)
plt.plot(f,proof.real, c='r')
plt.show()
proof_time = to_time_domain(f, proof)
# rectangular function comparison plot
plt.plot(t, proof_time, linewidth = 3, c = 'r')
plt.plot(t, ans)
plt.show()
Running the code gives the following plots:
sinc function comparison plot and rectangular function comparison plot
It is obvious that there is a scaling problem. I know that the frequency domain depends on the sample rate of my time domain. In this case I have used a sample rate of 250 Hz and 2500 data points, meaning that I have 0.1 Hz per frequency bin and thus my x-axis in the frequency domain reaches from -125 to 125. I was wondering whether I can also formulate a relation between the power in the frequency domain and the sample rate. I was thinking that if I keep the number of data points constant and reduce my sampling rate, the function is obviously compressed along the x-axis. Can this somehow result in stretching along the y-axis?
Furthermore, the transform of the sinc function in the second plot (proof_time) is mirrored.
Next, I tried to fix my first problem by multiplying the FFT output by the time step (i.e., dividing by the sampling rate):
def to_frequency_domain(t, ys):
    last = t[-1] # length of the time sample, equal to the time limit
    N = len(t) # number of samples in the time domain
    T = last/N # 1/sample rate
    xf = fftfreq(N, T)
    xf = fftshift(xf)
    yf = fft(ys, N)*T # scale by the time step
    yf = fftshift(yf)
    return xf, yf

def to_time_domain(x, y):
    N = len(x) # number of samples
    last = (len(x)-1) / (2*x[-1]) # length in the time domain
    T = last/(N-1) # 1/sample rate
    ys = ifft(ifftshift(y)/T, N)
    return ys
which gives:
sinc function comparison plot_scaled, rectangular function comparison plot_scaled
This time both functions are obviously closer together, but the result is still not perfect.
Additionally, I am a little confused by the output of the FFT when I define my time domain to be symmetric about zero, t = np.linspace(-10, 10, 250*20):
sinc function comparison plot
Why is the blue curve doubled like that? Presumably because we have a negative and a positive frequency component for every value in time, right? But how do I fix that?
I have tried figuring it out for a while now but just can’t seem to solve the problem, so I am very grateful for every tip!
Thanks in advance!
My guess is that you have to scale by d_omega = d_f/(2*pi), as the FFT assumes a time step of 1 by default while you are using a different time step. If I add
df = xf[1] - xf[0]
domega = df / 2 / np.pi
yf = fft(ys, N) * domega
it scales.
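For completeness, here is a separate sketch of the standard Riemann-sum argument for why the time step enters the scaling at all. It assumes the convention X(f) = integral of x(t)*exp(-2j*pi*f*t) dt, which may differ from the convention used above:
import numpy as np

dt = 0.01
t = np.arange(-50, 50, dt)
x = np.exp(-np.pi*t**2)                     # analytic pair: exp(-pi t^2) <-> exp(-pi f^2)

f = np.fft.fftshift(np.fft.fftfreq(t.size, dt))
X = np.fft.fftshift(np.fft.fft(x))*dt       # Riemann sum: integral ~ dt * DFT
X *= np.exp(-2j*np.pi*f*t[0])               # undo the phase from t starting at -50

print(np.max(np.abs(X.real - np.exp(-np.pi*f**2))))  # ~ 0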
For a single exponential curve such as the one shown in this image (curve_fit for a single exponential curve), I am able to fit the data using scipy.optimize.curve_fit. However, I am unsure how to realize a fit for a similar dataset composed of multiple exponential curves, as shown here (double exponential curves).
I achieved the fit for the single curve using the following approach:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def exp_decay(x,a,r):
    return a * ((1-r)**x)
x = np.linspace(0,50,50)
y = exp_decay(x, 400, 0.06)
y1 = exp_decay(x, 550, 0.06) # this is to be used to append to y to generate two curves
pars, cov = curve_fit(exp_decay, x, y, p0=[0,0])
plt.scatter(x,y)
plt.plot(x, exp_decay(x, *pars), 'r-') #this realizes the fit for a single curve
yx = np.append(y,y1) #this realizes two exponential curves (as shown above - double exponential curves) for which I don't need to fit a model to
Can someone describe how to achieve this for a dataset of two curves? My actual dataset comprises multiple exponential curves, but I think that if I can realize a fit for two curves, I should be able to replicate it for my dataset. This does not have to be done with scipy's curve_fit; any implementation that works is fine.
PLEASE HELP !!!
Your problem can easily be tackled by splitting the dataset using a simple criterion, such as a first-derivative estimate, and then applying a simple curve-fitting procedure to each sub-dataset.
Trial Dataset
First, let's import some packages and create a synthetic dataset with three curves to represent your problem.
We use a two-parameter exponential model, as the time-origin shift will be handled by the splitting methodology. We also add noise, since there is always noise on real-world data:
import numpy as np
import pandas as pd
from scipy import optimize
import matplotlib.pyplot as plt
def func(x, a, b):
    return a*np.exp(b*x)
N = 1001
n1 = N//3
n2 = 2*n1
t = np.linspace(0, 10, N)
x0 = func(t[:n1], 1, -0.2)
x1 = func(t[n1:n2]-t[n1], 5, -0.4)
x2 = func(t[n2:]-t[n2], 2, -1.2)
x = np.hstack([x0, x1, x2])
xr = x + 0.025*np.random.randn(x.size)
Graphically it renders as follows:
Dataset Splitting
We can split the dataset into three sub-datasets using a simple criterion: a first-derivative estimate, computed with first differences. The goal is to detect when the curve drastically goes up or down (where the dataset should be split). The first derivative is estimated as follows:
dxrdt = np.abs(np.diff(xr)/np.diff(t))
The criterion requires an extra parameter (a threshold) that must be tuned according to your signal specifications. The criterion is equivalent to:
xcrit = 20
q = np.where(dxrdt > xcrit) # (array([332, 665], dtype=int64),)
And the split indices are:
idx = [0] + list(q[0]+1) + [t.size] # [0, 333, 666, 1001]
The criterion threshold will mainly be affected by the nature and power of the noise in your data and by the magnitude of the gaps between two curves. The usability of this methodology depends on the ability to detect curve gaps in the presence of noise: it will break when the noise power has the same magnitude as the gap we want to detect. You may also observe false split indices if the noise is heavy-tailed (a few strong outliers).
In this MCVE, we have set the threshold to 20 [Signal Units/Time Units]:
An alternative to this hand-crafted criterion is to delegate the identification to the excellent find_peaks method of scipy (see the sketch below), but it will not remove the need to tune the detection to your signal specifications.
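A minimal sketch of that alternative, reusing t, xr and dxrdt from the snippets above (the height threshold plays the same role as xcrit and still needs tuning):
from scipy.signal import find_peaks

peaks, _ = find_peaks(dxrdt, height=20)     # spikes in the first-derivative estimate
idx = [0] + list(peaks + 1) + [t.size]      # should match the split indices above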
Fit origin-shifted dataset
Now we can apply the curve fitting on each sub-dataset (with origin shifted time), collect parameters and statistics and plot the result:
trials = []
fig, axe = plt.subplots()
for k, (i, j) in enumerate(zip(idx[:-1], idx[1:])):
    p, s = optimize.curve_fit(func, t[i:j]-t[i], xr[i:j])
    axe.plot(t[i:j], xr[i:j], '.', label="Data #{}".format(k+1))
    axe.plot(t[i:j], func(t[i:j]-t[i], *p), label="Data Fit #{}".format(k+1))
    trials.append({"n0": i, "n1": j, "t0": t[i], "a": p[0], "b": p[1],
                   "s_a": s[0,0], "s_b": s[1,1], "s_ab": s[0,1]})
axe.set_title("Curve Fits")
axe.set_xlabel("Time, $t$")
axe.set_ylabel(r"Signal Estimate, $\hat{g}(t)$")
axe.legend()
axe.grid()
df = pd.DataFrame(trials)
It returns the following fitting results:
n0 n1 t0 a b s_a s_b s_ab
0 0 333 0.00 0.998032 -0.199102 0.000011 4.199937e-06 -0.000005
1 333 666 3.33 5.001710 -0.399537 0.000013 3.072542e-07 -0.000002
2 666 1001 6.66 2.002495 -1.203943 0.000030 2.256274e-05 -0.000018
This agrees with our original parameters (see the Trial Dataset section).
Graphically we can check the goodness of fits:
I'm working on a machine learning project in which I want to extract a device's power consumption signature; here is an example.
Then I want to extract features from that signature using harmonic analysis. So first I applied the Fourier transform, and I got the plots shown here for the signature above.
My problem is that I think this transform is not very meaningful. How can I optimize or improve it?
Here is the code I used to transform the signal:
plt.subplot(2,1,1)
plt.plot(dt,pt,'k-')
plt.xlabel('time')
plt.ylabel('amplitude')
Fs = 3000
plt.subplot(2,1,2)
n = len(pt) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
freq = frq[range(int(n/2))] # one side frequency range
Y = np.fft.fft(pt)/n # fft computing and normalization
Y = Y[range(int(n/2))]
plt.plot(freq, abs(Y), 'r-')
And here is another example.
Thank you in advance!
The analytical Fourier transform of a sinusoidal signal is purely imaginary. However, when the discrete Fourier transform is computed numerically, the result is not.
Consider therefore the following code
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft, fftfreq
f_s = 200 # Sampling rate = number of measurements per second in [Hz]
t = np.arange(0,10000, 1 / f_s)
N = len(t)
A = 4 # Amplitude of sinus signal
x = A * np.sin(t)
X = fft(x)[1:N//2]
freqs = (fftfreq(len(x)) * f_s)[1:N//2]
fig, (ax1,ax2) = plt.subplots(2,1, sharex = True)
ax1.plot(freqs, X.real, label = r"$\Re[X(\omega)]$")
ax1.plot(freqs, X.imag, label = r"$\Im[X(\omega)]$")
ax1.set_title(r"Discrete Fourier Transform of $x(t) = A \cdot \sin(t)$")
ax1.legend()
ax1.grid(True)
ax2.plot(freqs, np.abs(X), label = r"$|X(\omega)|$")
ax2.legend()
ax2.set_xlabel(r"Frequency $\omega$")
ax2.set_yscale("log")
ax2.grid(True, which = "both")
ax2.set_xlim(0.15,0.175)
plt.show()
Clearly, the absolute value |X(w)| can be used as a good approximation to the analytical result. However, the imaginary and real parts of X(w) differ from it. Another question on SO already mentioned this fact, but did not explain why. So can I only use the absolute value and the phase?
Another question would be how the amplitude is related to the numerical result. Mathematically speaking, it should be the integral under the curve of |X(w)| divided by a normalization (which, as far as I understand, should be given by N), i.e. approximately
A_approx = np.sum(np.abs(X)) / N
print(f"Numerical value: {A_approx:.1f}, Correct value: {A:.1f}")
Numerical value: 13.5, Correct value: 4.0
This does not seem to be the case. Any insights? Ideas?
Related questions which did not help are here and here.
An FFT does not produce the result you expect because it is finite in length, and thus more closely resembles the Fourier transform of a rectangular window applied to your sinusoid. The length and placement of this rectangular window affect the phase and amplitude of the FFT result.
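As a quick illustration, here is a small sketch (my own example, not tied to the question's code) showing how the placement and length of that window change the phase and amplitude of the peak bin:
import numpy as np

fs, f0, A = 200, 5.0, 4.0
t = np.arange(400)/fs                       # exactly 10 periods of f0

X = np.fft.fft(A*np.sin(2*np.pi*f0*t))
k = 10                                      # bin of f0 = f0*N/fs
print(X[k])                                 # ~ -800j: purely imaginary
print(2*np.abs(X[k])/t.size)                # ~ 4.0: amplitude recovered

# Shifting the window start moves energy into the real part:
Xs = np.fft.fft(A*np.sin(2*np.pi*f0*(t + 0.013)))
print(Xs[k])                                # nonzero real part, same magnitude

# A non-integer number of periods smears the peak over many bins:
t2 = np.arange(410)/fs                      # 10.25 periods of f0
Xl = np.fft.fft(A*np.sin(2*np.pi*f0*t2))
print(2*np.abs(Xl).max()/t2.size)           # below 4 because of leakage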
I just started working with the numpy package, and my first simple task was to compute the FFT of an input signal. Here's the code:
import numpy as np
import matplotlib.pyplot as plt
#Some constants
L = 128
p = 2
X = 20
x = np.arange(-X/2,X/2,X/L)
fft_x = np.linspace(0,128,128, True)
fwhl = 1
fwhl_y = (2/fwhl) \
*(np.log([2])/np.pi)**0.5*np.e**(-(4*np.log([2]) \
*x**2)/fwhl**2)
fft_fwhl = np.fft.fft(fwhl_y, norm='ortho')
ampl_fft_fwhl = np.abs(fft_fwhl)
plt.bar(fft_x, ampl_fft_fwhl, width=.7, color='b')
plt.show()
Since I work with an exponential function with some constant divided by pi in front of it, I expect to get the exponential function in Fourier space, where the constant (zero-frequency) component of the FFT is always equal to 1.
But the value of that component I get using numpy is larger (about 1.13). Here I have an amplitude spectrum which is normalized by 1/(number_of_counts)**0.5 (that's what I read in the numpy documentation). I can't understand what's wrong... Can anybody help me?
Thanks!
[EDITED] It seems the problem is solved: all you need to make the Fourier integral and the FFT agree is to multiply the FFT by the step (in my case X/L). As for the normalization option numpy.fft.fft(..., norm='ortho'), it is used only to preserve the scale of the transform; otherwise you would need to divide the result of the inverse FFT by the number of samples. Thanks everyone for the help!
I've finally solved my problem. All you need to connect the FFT with the Fourier integral is to multiply the result of the transform (FFT) by the step (X/L in my case): FFT*X/L. This works in general. In my case it's a bit more complex, since I have an extra rule for the function to be transformed: I have to make sure that the area under the curve is equal to 1, because it's a model of a δ function. Since the step is fixed, I have to fulfill the condition step*sum(fwhl_y) = 1, that is, X/L = 1/sum(fwhl_y). So to get the correct result I have to do the following:
calculate the FFT: fft_fwhl = np.fft.fft(fwhl_y)
get rid of the phase component that arises from the symmetry of the fwhl_y function: the function is defined on the [-T/2, T/2] interval, where T is the period, while np.fft.fft assumes it is defined on [0, T]. Since I only need the amplitude spectrum, I simply use np.abs(FFT)
to get the values I expect, multiply the result of the previous step by X/L, that is np.abs(FFT)*X/L
with the extra condition on the area under the curve, X/L*sum(fwhl_y) = 1, this finally becomes np.abs(FFT)*X/L = np.abs(FFT)/sum(fwhl_y)
Hope it'll help anyone at least.
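As a quick numerical check of this recipe, here is a sketch using the constants from the question: the DC bin of the scaled spectrum should equal the area under the curve, which is 1 by construction.
import numpy as np

L, X = 128, 20
x = np.arange(-X/2, X/2, X/L)
fwhl = 1
fwhl_y = (2/fwhl)*(np.log(2)/np.pi)**0.5*np.exp(-(4*np.log(2)*x**2)/fwhl**2)

step = X/L
print(np.abs(np.fft.fft(fwhl_y))[0]*step)   # ~ 1: matches the Fourier integral at f = 0
print(step*fwhl_y.sum())                    # the same Riemann-sum area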
Here's a possible solution to your problem:
import numpy as np
import matplotlib.pyplot as plt
from scipy import fft
from numpy import log, pi, e
# Signal setup
Fs = 150
Ts = 1.0 / Fs
t = np.arange(0, 1, Ts)
ff = 50
fwhl = 1
y = (2 / fwhl) * (log([2]) / pi)**0.5 * e**(-(4 * log([2]) * t**2) / fwhl**2)
# Plot original signal
plt.subplot(2, 1, 1)
plt.plot(t, y, 'k-')
plt.xlabel('time')
plt.ylabel('amplitude')
# Normalized FFT
plt.subplot(2, 1, 2)
n = len(y)
k = np.arange(n)
T = n / Fs
frq = k / T
freq = frq[range(n // 2)]  # integer division for Python 3
Y = np.fft.fft(y) / n
Y = Y[range(n // 2)]
plt.plot(freq, abs(Y), 'r-')
plt.xlabel('freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
With fwhl=1:
With fwhl=0.1:
You can see in the graphs above how the exponential and its FFT plots vary as fwhl gets close to 0.