Analyzing seasonality of Google trend time series using FFT - python

I am trying to evaluate the amplitude spectrum of the Google Trends time series using a fast Fourier transform. If you look at the data for 'diet' in the data provided here, it shows a very strong seasonal pattern:
I thought I could analyze this pattern using an FFT, which presumably should have a strong peak for a period of 1 year.
However, when I apply an FFT like this (a_gtrend_ham being the time series multiplied by a Hamming window):
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import fft, fftshift
import pandas as pd
gtrend = pd.read_csv('multiTimeline.csv',index_col=0)
gtrend.index = pd.to_datetime(gtrend.index, format='%Y-%m')
# Sampling rate
fs = 12 #Points per year
a_gtrend_orig = gtrend['diet: (Worldwide)']
N_gtrend_orig = len(a_gtrend_orig)
length_gtrend_orig = N_gtrend_orig / fs
t_gtrend_orig = np.linspace(0, length_gtrend_orig, num = N_gtrend_orig, endpoint = False)
a_gtrend_sel = a_gtrend_orig.loc['2005-01-01 00:00:00':'2017-12-01 00:00:00']
N_gtrend = len(a_gtrend_sel)
length_gtrend = N_gtrend / fs
t_gtrend = np.linspace(0, length_gtrend, num = N_gtrend, endpoint = False)
a_gtrend_zero_mean = a_gtrend_sel - np.mean(a_gtrend_sel)
ham = np.hamming(len(a_gtrend_zero_mean))
a_gtrend_ham = a_gtrend_zero_mean * ham
N_gtrend = len(a_gtrend_ham)
ampl_gtrend = 1/N_gtrend * abs(fft(a_gtrend_ham))
mag_gtrend = fftshift(ampl_gtrend)
freq_gtrend = np.linspace(-0.5, 0.5, len(ampl_gtrend))
response_gtrend = 20 * np.log10(mag_gtrend)
response_gtrend = np.clip(response_gtrend, -100, 100)
My resulting amplitude spectrum does not show any dominant peak:
Where is my misunderstanding of how to use the FFT to get the spectrum of the data series?

Here is a clean implementation of what I think you are trying to accomplish. I include graphical output and a brief discussion of what it likely means.
First, we use rfft() because the data is real valued. This saves time and effort (and reduces the bug rate) that otherwise follows from generating the redundant negative frequencies. And we use rfftfreq() to generate the frequency list (again, it is unnecessary to hand-code it, and using the API reduces the bug rate).
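As a quick illustration (a minimal sketch, separate from the cleaned-up script below), rfft() returns only the N//2 + 1 non-negative frequency bins, and rfftfreq() produces the matching frequency axis:
import numpy as np
from numpy.fft import rfft, rfftfreq
x = np.random.rand(156)                 # any real-valued series, e.g. 13 years of monthly data
print(len(rfft(x)))                     # 79 = 156//2 + 1 bins; negative frequencies are omitted
print(rfftfreq(len(x), d=1./12.)[:3])   # matching frequencies in 1/years for monthly sampling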
For your data, the Tukey window is more appropriate than the Hamming and similar cosine- or sine-based window functions. Notice also that we subtract the median before multiplying by the window function. The median() is a fairly robust estimate of the baseline, certainly more so than the mean().
In the graph you can see that the data falls quickly from its initial value and then ends low. The Hamming and similar windows sample the middle too narrowly for this and needlessly attenuate a lot of useful data.
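To see the difference, here is a minimal sketch comparing the two window shapes for the same number of points (the taper fraction of 0.5 is tukey's default; the snippet is only illustrative):
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal.windows import tukey   # scipy.signal.tukey in older SciPy versions
n = 156                                  # roughly 13 years of monthly samples
plt.plot(np.hamming(n), label='Hamming')
plt.plot(tukey(n), label='Tukey (alpha=0.5)')   # flat top, tapered only at the edges
plt.xlabel('sample index')
plt.legend()
plt.show()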
For the FT graphs, we skip the zero frequency bin (the first point) since this only contains the baseline and omitting it provides a more convenient scaling for the y-axes.
You will notice some high frequency components in the graph of the FT output.
I include a sample code below that illustrates a possible origin of those high frequency components.
Okay here is the code:
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import rfft, rfftfreq
from scipy.signal.windows import tukey   # scipy.signal.tukey in older SciPy versions
import pandas as pd
gtrend = pd.read_csv('multiTimeline.csv',index_col=0,skiprows=2)
#print(gtrend)
gtrend.index = pd.to_datetime(gtrend.index, format='%Y-%m')
#print(gtrend.index)
a_gtrend_orig = gtrend['diet: (Worldwide)']
t_gtrend_orig = np.linspace( 0, len(a_gtrend_orig)/12, len(a_gtrend_orig), endpoint=False )
a_gtrend_windowed = (a_gtrend_orig-np.median( a_gtrend_orig ))*tukey( len(a_gtrend_orig) )
plt.subplot( 2, 1, 1 )
plt.plot( t_gtrend_orig, a_gtrend_orig, label='raw data' )
plt.plot( t_gtrend_orig, a_gtrend_windowed, label='windowed data' )
plt.xlabel( 'years' )
plt.legend()
a_gtrend_psd = abs(rfft( a_gtrend_orig ))
a_gtrend_psdtukey = abs(rfft( a_gtrend_windowed ) )
# Notice that we assert the delta-time here,
# It would be better to get it from the data.
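# A hedged alternative (sketch only): derive the spacing from the DatetimeIndex itself,
# assuming monthly samples as above; the result is in years.
# d_years = np.median(np.diff(gtrend.index.values).astype('timedelta64[D]').astype(float)) / 365.25
# a_gtrend_freqs = rfftfreq(len(a_gtrend_orig), d=d_years)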
a_gtrend_freqs = rfftfreq( len(a_gtrend_orig), d = 1./12. )
# For the PSD graph, we skip the first point (the zero-frequency bin); this brings us to a more useful scale,
# since that point only represents the baseline (or mean) and is usually not relevant to the analysis
plt.subplot( 2, 1, 2 )
plt.plot( a_gtrend_freqs[1:], a_gtrend_psd[1:], label='psd raw data' )
plt.plot( a_gtrend_freqs[1:], a_gtrend_psdtukey[1:], label='windowed psd' )
plt.xlabel( 'frequency ($yr^{-1}$)' )
plt.legend()
plt.tight_layout()
plt.show()
And here is the output displayed graphically. There are strong signals at 1/year and at about 0.14 yr⁻¹ (a period of roughly 7 years, which happens to be half the length of the ~14-year record), and there is a set of higher frequency signals that at first perusal might seem quite mysterious.
We see that the windowing function is actually quite effective in bringing the data to baseline and you see that the relative signal strengths in the FT are not altered very much by applying the window function.
If you look at the data closely, there seems to be some repeated variations within the year. If those occur with some regularity, they can be expected to appear as signals in the FT, and indeed the presence or absence of signals in the FT is often used to distinguish between signal and noise. But as will be shown, there is a better explanation for the high frequency signals.
Okay, now here is a sample code that illustrates one way those high frequency components can be produced. In this code, we create a single tone, and then we create a set of spikes at the same frequency as the tone. Then we Fourier transform the two signals and finally, graph the raw and FT data.
import matplotlib.pyplot as plt
import numpy as np
from numpy.fft import rfft, rfftfreq
t = np.linspace( 0, 1, 1000 )
y = np.cos( 50*3.14*t )
y2 = [ 1. if 1.-v < 0.01 else 0. for v in y ]
plt.subplot( 2, 1, 1 )
plt.plot( t, y, label='tone' )
plt.plot( t, y2, label='spikes' )
plt.xlabel('time')
plt.subplot( 2, 1, 2 )
plt.plot( rfftfreq(len(y), d=t[1]-t[0]), abs( rfft(y) ), label='tone' )
plt.plot( rfftfreq(len(y2), d=t[1]-t[0]), abs( rfft(y2) ), label='spikes' )
plt.xlabel('frequency')
plt.legend()
plt.tight_layout()
plt.show()
Okay, here are the graphs of the tone, and the spikes, and then their Fourier transforms. Notice that the spikes produce high frequency components that are very similar to those in our data.
In other words, the origin of the high frequency components is very likely in the short time scales associated with the spikey character of signals in the raw data.

Related

How to De-Trend a waveform which is piece-wise linear or non-linear?

I'm trying to remove the trend present in the waveform which looks like the following:
For doing so, I use scipy.signal.detrend() as follows:
autocorr = scipy.signal.detrend(autocorr)
But I don't see any significant flattening in trend. I get the following:
My objective is to have the trend completely eliminated from the waveform. And I need to also generalize it so that it can detrend any kind of waveform - be it linear, piece-wise linear, polynomial, etc.
Can you please suggest a way to do the same?
Note: In order to replicate the above waveform, you can simply run the following code that I used to generate it:
#Loading Libraries
import warnings
warnings.filterwarnings("ignore")
import json
import sys, os
import numpy as np
import pandas as pd
import glob
import pickle
from statsmodels.tsa.stattools import adfuller, acf, pacf
from scipy.signal import find_peaks, square
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
#Generating a function with Dual Seasonality:
def white_noise(mu, sigma, num_pts):
    """ Function to generate Gaussian Normal Noise
    Args:
        sigma: std value
        num_pts: no of points
        mu: mean value
    Returns:
        generated Gaussian Normal Noise
    """
    noise = np.random.normal(mu, sigma, num_pts)
    return noise
def signal_line_plot(input_signal: pd.Series, title: str = "", y_label: str = "Signal"):
    """ Function to plot a time series signal
    Args:
        input_signal: time series signal that you want to plot
        title: title on plot
        y_label: label of the signal being plotted
    Returns:
        signal plot
    """
    plt.plot(input_signal)
    plt.title(title)
    plt.ylabel(y_label)
    plt.show()
# Square wave with two periodicities, daily and weekly. With 15-min sampling this means 4*24 = 96 samples per day and 4*24*7 = 672 per week
t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*square(2*np.pi*t_week/T, duty=0.7)+10 + white_noise(0, 1,480)
x_weekend = 2*square(2*np.pi*t_weekend/T, duty=0.7)+2 + white_noise(0,1,192)
x_daily_weekly = np.concatenate((x_weekday, x_weekend))
x_daily_weekly_long = np.concatenate((x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly))
signal_line_plot(x_daily_weekly_long)
signal_line_plot(x_daily_weekly_long[0:1000])
#Finding Autocorrelation & Lags for the signal [WHICH THE FINAL PARAMETERS WHICH ARE TO BE PLOTTED]:
#Determining Autocorrelation & Lag values
import scipy.signal as signal
autocorr = signal.correlate(x_daily_weekly_long, x_daily_weekly_long, mode="same")
#Normalize the autocorr values (such that the highest peak value is at 1)
autocorr = (autocorr-min(autocorr))/(max(autocorr)-min(autocorr))
lags = signal.correlation_lags(len(x_daily_weekly_long), len(x_daily_weekly_long), mode = "same")
#Visualization
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags, autocorr)
#DETRENDING:
autocorr = signal.detrend(autocorr)
#Visualization
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags, autocorr)
Since it's an auto-correlation, it will always be even; so detrending with a breakpoint at lag=0 should get you part of the way there.
An alternative way to detrend is to use a high-pass filter; you could do this in two ways. What will be tricky is deciding what the cut-off frequency should be.
Here's a possible way to do this:
#Loading Libraries
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
#Generating a function with Dual Seasonality:
def white_noise(mu, sigma, num_pts):
    """ Function to generate Gaussian Normal Noise
    Args:
        sigma: std value
        num_pts: no of points
        mu: mean value
    Returns:
        generated Gaussian Normal Noise
    """
    noise = np.random.normal(mu, sigma, num_pts)
    return noise
# High-pass filter via discrete Fourier transform
# Drop all components from 0th to dropcomponent-th
def dft_highpass(x, dropcomponent):
    fx = np.fft.rfft(x)
    fx[:dropcomponent] = 0
    return np.fft.irfft(fx)
# Square wave with two periodicities, daily and weekly. With 15-min sampling this means 4*24 = 96 samples per day and 4*24*7 = 672 per week
t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*signal.square(2*np.pi*t_week/T, duty=0.7)+10 + white_noise(0, 1,480)
x_weekend = 2*signal.square(2*np.pi*t_weekend/T, duty=0.7)+2 + white_noise(0,1,192)
x_daily_weekly = np.concatenate((x_weekday, x_weekend))
x_daily_weekly_long = np.concatenate((x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly))
#Finding Autocorrelation & Lags for the signal [WHICH THE FINAL PARAMETERS WHICH ARE TO BE PLOTTED]:
#Determining Autocorrelation & Lag values
autocorr = signal.correlate(x_daily_weekly_long, x_daily_weekly_long, mode="same")
#Normalize the autocorr values (such that the highest peak value is at 1)
autocorr = (autocorr-min(autocorr))/(max(autocorr)-min(autocorr))
lags = signal.correlation_lags(len(x_daily_weekly_long), len(x_daily_weekly_long), mode = "same")
# detrend w/ breakpoints
dautocorr = signal.detrend(autocorr, bp=len(lags)//2)
# detrend w/ high-pass filter
# use `filtfilt` to get zero-phase
b, a = signal.butter(1, 1e-3, 'high')
fautocorr = signal.filtfilt(b, a, autocorr)
# detrend with DFT HPF
rautocorr = dft_highpass(autocorr, len(autocorr) // 1000)
#Visualization
fig, ax = plt.subplots(3)
for i in range(3):
    ax[i].plot(lags, autocorr, label='orig')
ax[0].plot(lags, dautocorr, label='detrend w/ bp')
ax[1].plot(lags, fautocorr, label='HPF')
ax[2].plot(lags, rautocorr, label='DFT')
for i in range(3):
    ax[i].legend()
    ax[i].set_ylabel('autocorr')
ax[-1].set_xlabel('lags')
giving

Issues with scipy find_peaks function when used on an inverted dataset

The script below is a mixture of Stack Overflow answers on different topics, but closely related to finding peaks in signals. Finding peaks based on prominence, as noted here, works incredibly well, but my issue is that I need to find the lowest point immediately after the peak. The dataset is a fluorescence signal of a plant captured during 14 continuous hours, and the peaks are saturating pulses used to determine saturation under light conditions. A picture of the dataset (a 68MB CSV file) below:
This is my python script:
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
# A parser is required to translate the timestamp
custom_date_parser = lambda x: datetime.strptime(x, "%d-%m-%Y %H:%M_%S.%f")
df = pd.read_csv('15-01-2022_05_00.csv', parse_dates=[ 'Timestamp'], date_parser=custom_date_parser)
x = df['Timestamp']
y = df['Mean_values']
# As per accepted answer here:
#https://stackoverflow.com/questions/1713335/peak-finding-algorithm-for-python-scipy
peaks, _ = find_peaks(y, prominence=1)
# Invert the data to find the lowest points of peaks as per answer here:
#https://stackoverflow.com/questions/61365881/is-there-an-opposite-version-of-scipy-find-peaks
valleys, _ = find_peaks(-y, prominence=1)
print(y[peaks])
print(y[valleys])
plt.subplot(2, 1, 1)
plt.plot(peaks, y[peaks], "ob"); plt.plot(y); plt.legend(['Prominence'])
plt.subplot(2, 1, 2)
plt.plot(valleys, y[valleys], "vg"); plt.plot(y); plt.legend(['Prominence Inverted'])
plt.show()
As you can see on the picture, not all the 'prominence inverted' points are below the respective peak. The prominence inverted function comes from this post here, and it simply inverts the dataset. Some are adjacent to the previous peak (difficult to see in the picture). Peaks and valleys below:
Peaks
1817 109.587178
3674 89.191393
56783 72.779385
111593 77.868118
166403 83.288949
221213 84.955026
276023 84.340550
330833 83.186605
385643 81.134827
440453 79.060960
495264 77.457803
550074 76.292243
604884 75.867575
659694 75.511924
714504 74.221657
769314 73.830941
824125 76.977637
878935 78.826169
933745 77.819844
988555 77.298089
1043365 77.188105
1098175 75.340765
1152985 74.311185
1207796 73.163844
1262606 72.613317
1317416 73.460068
1372226 70.388324
1427036 70.835355
1481845 70.154085
Valleys
2521 4.669368
56629 26.551585
56998 26.184984
111791 26.288734
166620 27.717165
221434 28.312708
330432 28.235397
385617 27.535091
440341 26.886589
495174 26.379043
549353 26.040947
550239 25.760023
605051 25.594147
714352 25.354300
714653 25.008184
769472 24.883584
824284 25.135316
879075 25.477464
933907 25.265173
988711 25.160046
1097917 25.058851
1098333 24.626667
1153134 24.357835
1207943 23.982878
1262750 23.938298
1371013 23.766077
1372381 23.351263
1427187 23.368314
Any ideas about this awkward result on the valleys?
You complicate your task by trying to find all valleys. This will always be difficult because they do not stand out as well as your peaks in comparison to the surrounding data. Whatever your parameters for find_peaks, sometimes it will identify two valleys after a peak, sometimes none. Instead, just identify the local minimum after each peak:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
#sample data
from scipy.misc import electrocardiogram
x = electrocardiogram()[2000:4000]
date_range = pd.date_range("20210116", periods=x.size, freq="10ms")
df = pd.DataFrame({"Timestamp": date_range, "Mean_values": x})
x = df['Timestamp']
y = df['Mean_values']
fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12, 8))
#peak finding
peaks, _ = find_peaks(y, prominence=1)
ax1.plot(x[peaks], y[peaks], "ob")
ax1.plot(x, y)
ax1.legend(['Prominence'])
#valley finder general
valleys, _ = find_peaks(-y, prominence=1)
ax2.plot(x[valleys], y[valleys], "vg")
ax2.plot(x, y)
ax2.legend(['Valleys without filtering'])
#valley finding restricted to a short time period after a peak
#set time window, e.g., for 200 ms
time_window_size = pd.Timedelta(200, unit="ms")
time_of_peaks = x[peaks]
peak_end = x.searchsorted(time_of_peaks + time_window_size)
#in case of evenly spaced data points, this can be simplified
#and you just add n data points to your peak index array
#peak_end = peaks + n
true_valleys = peaks.copy()
for i, (start, stop) in enumerate(zip(peaks, peak_end)):
    true_valleys[i] = start + y[start:stop].argmin()
ax3.plot(x[true_valleys], y[true_valleys], "sr")
ax3.plot(x, y)
ax3.legend(['Valleys after events'])
plt.show()
Sample output:
I am not sure what you intend to do with these minima, but if you are only interested in baseline shifts, you can directly calculate the peak-wise baseline values like
baseline_per_peak = peaks.copy().astype(float)
for i, (start, stop) in enumerate(zip(peaks, peak_end)):
    baseline_per_peak[i] = y[start:stop].mean()
print(baseline_per_peak)
Sample output:
[-0.71125 -0.203 0.29225 0.72825 0.6835 0.79125 0.51225 0.23
0.0345 -0.3945 -0.48125 -0.4675 ]
This can, of course, also easily be adapted to the period before the peak:
#valley in the short time period before a peak
#set time window, e.g., for 200 ms
time_window_size = pd.Timedelta(200, unit="ms")
time_of_peaks = x[peaks]
peak_start = x.searchsorted(time_of_peaks - time_window_size)
#in case of evenly spaced data points, this can be simplified
#and you just add n data points to your peak index array
#peak_start = peaks - n
true_valleys = peaks.copy()
for i, (start, stop) in enumerate(zip(peak_start, peaks)):
    true_valleys[i] = start + y[start:stop].argmin()

Setting Wn for analog Bessel model in scipy.signal

I have been trying to implement an analog Bessel filter with a cutoff frequency of 2 kHz using scipy.signal, and I am confused about what value of Wn to set, as the documentation states that Wn (for analog filters) should be set to an angular frequency (approximately 12000 rad/s). But if I apply this to my 1 second of dummy data, with a half-second pulse sampled at 500 000 Hz, I get a string of 0s and NaNs. What is it that I am missing?
import numpy as np
import scipy
import matplotlib.pyplot as plt
import scipy.signal
def make_signal(pulse_length, rate = 500000):
    new_x = np.zeros(rate)
    end_signal = 250000 + pulse_length
    new_x[250000:end_signal] = 1
    data = new_x
    print(np.shape(data))
    # pad on both sides
    data = np.concatenate((np.zeros(rate), data, np.zeros(rate)))
    return data

def conv_time(t):
    pulse_length = t * 500000
    pulse_length = int(pulse_length)
    return pulse_length

def make_data(ti):  # give time in seconds
    pulse_length = conv_time(ti)
    print(pulse_length)
    data = make_signal(pulse_length)
    return data
time_scale = np.linspace(0,1,500000)
data = make_data(0.5)
[b,a] = scipy.signal.bessel(4, 12566.37, btype='low', analog=True, output='ba', norm='phase', fs=None)
output_signal = scipy.signal.filtfilt(b, a, data)
plt.plot(data[600000:800000])
plt.plot(output_signal[600000:800000])
When plotting response using freqs, it doesn't seem that bad to me; where am I making a mistake?
You are passing an analog filter to a function, scipy.signal.filtfilt, that expects a digital (i.e. discrete time) filter. If you are going to use filtfilt or lfilter, the filter must be digital.
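Before looking at the continuous-time route, a minimal sketch of that fix (assuming you do want to keep filtfilt on the sampled data from the question) is to design the digital equivalent by passing the sample rate via fs and dropping analog=True; Wn is then given in Hz:
import scipy.signal
fs = 500000                     # sample rate of the dummy data from the question
b, a = scipy.signal.bessel(4, 2000, btype='low', analog=False, output='ba', norm='phase', fs=fs)
# output_signal = scipy.signal.filtfilt(b, a, data)   # data from the question's make_data(0.5)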
To work with continuous time systems, take a look at the functions
scipy.signal.impulse (scipy.signal.impulse2)
scipy.signal.step (scipy.signal.step2)
scipy.signal.lsim (scipy.signal.lsim2)
(The 2 versions solve the same mathematical problem as the version without 2 but use a different method. In most cases, the version without 2 is fine and is much faster than the 2 version.)
Other related functions and classes are listed in the section Continuous-Time Linear Systems of the SciPy documentation.
For example, here's a script that plots the impulse and step responses of your Bessel filter:
import numpy as np
from scipy.signal import bessel, step, impulse
import matplotlib.pyplot as plt
order = 4
Wn = 2*np.pi * 2000
b, a = bessel(order, Wn, btype='low', analog=True, output='ba', norm='phase')
# Note: the upper limit for t was chosen after some experimentation.
# If you don't give a T argument to impulse or step, it will choose a
# "pretty good" time span.
t = np.linspace(0, 0.00125, 2500, endpoint=False)
timp, yimp = impulse((b, a), T=t)
tstep, ystep = step((b, a), T=t)
plt.subplot(2, 1, 1)
plt.plot(timp, yimp, label='impulse response')
plt.legend(loc='upper right', framealpha=1, shadow=True)
plt.grid(alpha=0.25)
plt.title('Impulse and step response of the Bessel filter')
plt.subplot(2, 1, 2)
plt.plot(tstep, ystep, label='step response')
plt.legend(loc='lower right', framealpha=1, shadow=True)
plt.grid(alpha=0.25)
plt.xlabel('t')
plt.show()
The script generates this plot:
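If you want to keep the analog model and run it on the pulse itself, lsim from the list above can simulate the continuous-time response. Here is a minimal sketch (the 10 ms window and the step edge at 2 ms are illustrative choices, not taken from the question):
import numpy as np
from scipy.signal import bessel, lsim
import matplotlib.pyplot as plt
fs = 500000                           # sample rate from the question
t = np.arange(int(0.01 * fs)) / fs    # 10 ms time axis
u = (t >= 0.002).astype(float)        # a step edge at 2 ms standing in for the pulse edge
b, a = bessel(4, 2*np.pi*2000, btype='low', analog=True, norm='phase')
tout, yout, _ = lsim((b, a), U=u, T=t)
plt.plot(t, u, label='input edge')
plt.plot(tout, yout, label='analog Bessel response (lsim)')
plt.xlabel('t')
plt.legend()
plt.show()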

How to validate the downsampling is as intended

How do I validate that the downsampled output is correct? I have made an example below, but I am not sure whether the output is correct or not.
Any ideas on how to validate it?
Code
import numpy as np
import matplotlib.pyplot as plt # For plotting
from scipy import signal
import mne
fs = 100 # sample rate
rsample=50 # downsample frequency
fTwo=400 # frequency of the signal
x = np.arange(fs)
y = [ np.sin(2*np.pi*fTwo * (i/fs)) for i in x]
f_res = signal.resample(y, rsample)
xnew = np.linspace(0, 100, f_res.size, endpoint=False)
#
# ##############################
#
plt.figure(1)
plt.subplot(211)
plt.stem(x, y)
plt.subplot(212)
plt.stem(xnew, f_res, 'r')
plt.show()
Plotting the data is a good first take at verification. Here I made a regular plot with the points connected by lines. The lines are useful since they give a guide for where you expect the down-sampled data to lie, and they also emphasize what the down-sampled data is missing. (It would also work to only show lines for the original data, but lines, as in a stem plot, are too confusing, imho.)
import numpy as np
import matplotlib.pyplot as plt # For plotting
from scipy import signal
fs = 100 # sample rate
rsample=43 # downsample frequency
fTwo=13 # frequency of the signal
x = np.arange(fs, dtype=float)
y = np.sin(2*np.pi*fTwo * (x/fs))
print(y)
f_res = signal.resample(y, rsample)
xnew = np.linspace(0, 100, f_res.size, endpoint=False)
#
# ##############################
#
plt.figure()
plt.plot(x, y, 'o')
plt.plot(xnew, f_res, 'or')
plt.show()
A few notes:
If you're trying to make a general algorithm, use non-rounded numbers, otherwise you could easily introduce bugs that don't show up when things are even multiples. Similarly, if you need to zoom in to verify, go to a few random places, not, for example, only the start.
Note that I changed fTwo to be significantly less than the number of samples. You need more than one data point per oscillation if you want to make sense of it.
I also removed the loop for calculating y: in general, you should try to vectorize calculations when using numpy.
The spectrum of the resampled signal should have a tone at the same frequency as the input signal, just within a smaller Nyquist bandwidth.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
import scipy.fftpack as fft
fs = 100 # sample rate
rsample=50 # downsample frequency
fTwo=10 # frequency of the signal
n = np.arange(1024)
y = np.sin(2*np.pi*fTwo/fs*n)
y_res = signal.resample(y, len(n)//2)
Y = fft.fftshift(fft.fft(y))
f = -fs*np.arange(-512, 512)/1024
Y_res = fft.fftshift(fft.fft(y_res, 1024))
f_res = -fs/2*np.arange(-512, 512)/1024
plt.figure(1)
plt.subplot(211)
plt.stem(f, abs(Y))
plt.subplot(212)
plt.stem(f_res, abs(Y_res))
plt.show()
The tone is still at 10.
If you downsample a signal, both signals will still have exactly the same value at a given time, so just loop through "time" and check that the values are the same. In your case you go from a sample rate of 100 to 50. Assuming you have 1 second's worth of data (from building your x from fs), just loop from t = 0 to t = 1 in 1/50th increments and make sure that Yd(t) = Ys(t), where Yd is the downsampled signal and Ys is the original. Or, to say it simply, Yd(n) = Ys(2n) for n = 0, 1, 2, ..., total_samples-1.
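A minimal sketch of that check (assuming a tone below the new Nyquist frequency; note that signal.resample is FFT-based, so allow a small numerical tolerance rather than demanding exact equality):
import numpy as np
from scipy import signal
fs = 100                              # original sample rate
f_sig = 10                            # tone well below the new Nyquist of 25 Hz
t = np.arange(fs) / fs                # 1 second of data
ys = np.sin(2 * np.pi * f_sig * t)
yd = signal.resample(ys, fs // 2)     # downsample 100 -> 50 samples/s
# Yd(n) should match Ys(2n): compare against every second original sample
print(np.max(np.abs(yd - ys[::2])))
print(np.allclose(yd, ys[::2], atol=1e-6))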

Why is there a difference in magnitude response between scipy.filtfilt and scipy.lfilter?

I was trying to filter a signal using the scipy module of Python, and I wanted to see whether lfilter or filtfilt is better.
I tried to compare them and I got the following plot from my MWE:
import numpy as np
import scipy.signal as sp
import matplotlib.pyplot as plt
frequency = 100. #cycles/second
samplingFrequency = 2500. #samples/second
amplitude = 16384
signalDuration = 2.3
cycles = frequency*signalDuration
time = np.linspace(0, 2*np.pi*cycles, int(signalDuration*samplingFrequency))
freq = np.fft.fftfreq(time.shape[-1])
inputSine = amplitude*np.sin(time)
#Create IIR Filter
b, a = sp.iirfilter(1, 0.3, btype = 'lowpass')
#Apply filter to input
filteredSignal = sp.filtfilt(b, a, inputSine)
filteredSignalInFrequency = np.fft.fft(filteredSignal)
filteredSignal2 = sp.lfilter(b, a, inputSine)
filteredSignal2InFrequency = np.fft.fft(filteredSignal2)
plt.close('all')
plt.figure(1)
plt.subplot(121)
plt.title('Sine filtered with filtfilt')
plt.plot(freq, abs(filteredSignalInFrequency))
plt.subplot(122)
plt.title('Sine filtered with lfilter')
plt.plot(freq, abs(filteredSignal2InFrequency))
print(max(abs(filteredSignalInFrequency)))
print(max(abs(filteredSignal2InFrequency)))
plt.show()
Can someone please explain why there is a difference in the magnitude response?
Thanks a lot for your help.
Looking at your graphs shows that the signal filtered with filtfilt has a peak magnitude of 4.43×10⁷ in the frequency domain, compared with 4.56×10⁷ for the signal filtered with lfilter. In other words, the signal filtered with filtfilt has a peak magnitude that is about 0.97 times that obtained when filtering with lfilter.
Now we should note that scipy.signal.filtfilt applies the filter twice, whereas scipy.signal.lfilter only applies it once. As a result, input signals get attenuated twice as much. To confirm this we can have a look at the frequency response of the Butterworth filter you have used (obtained with iirfilter) around the input tone's normalized frequency of 100/2500 = 0.04:
which indeed shows that a single application of this filter causes an attenuation of ~0.97 at a frequency of 0.04.
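For a quick numerical check (a sketch assuming the same first-order filter as in the question), you can evaluate the filter's response at the tone frequency directly; the forward-backward pass of filtfilt squares the magnitude response:
import numpy as np
import scipy.signal as sp
b, a = sp.iirfilter(1, 0.3, btype='lowpass')            # same filter as above
w, h = sp.freqz(b, a, worN=[2*np.pi*100./2500.])        # 100 Hz tone at fs = 2500 Hz, in rad/sample
print(abs(h[0]))          # single-pass gain seen by lfilter, ~0.97
print(abs(h[0])**2)       # effective gain after filtfilt (applied forward and backward)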
