FFT bandpass filter in Python

I am trying to filter my data with an FFT. I have a noisy signal, sampled at 500 Hz, stored as a 1-D array. The filter should pass the band from 10 Hz to 20 Hz: the low cutoff is 10 Hz and the high cutoff is 20 Hz.
What I have tried is:
fft=scipy.fft(signal)
bp=fft[:]
for i in range(len(bp)):
    if not 10<i<20:
        bp[i]=0
ibp=scipy.ifft(bp)
What I get now are complex numbers. So something must be wrong. What? How can I correct my code?

Note that the index into bp is not a frequency in Hz: the frequency of each bin depends on the sampling rate and the length of signal, so use a helper such as scipy.fft.rfftfreq to get the bin frequencies. Also, if your signal is real, use a real-input transform such as scipy.fft.rfft, whose bins line up one-to-one with rfftfreq (the older scipy.fftpack.rfft packs real and imaginary parts into one real array, which does not line up with fftfreq). Here is a minimal working example that filters out all frequencies below a specified cutoff:
import numpy as np
from scipy.fft import rfft, irfft, rfftfreq
time = np.linspace(0,10,2000)
# Two cosines at 2.5 Hz and 3.5 Hz
signal = np.cos(5*np.pi*time) + np.cos(7*np.pi*time)
# Frequency of each rfft bin
W = rfftfreq(signal.size, d=time[1]-time[0])
f_signal = rfft(signal)
# If our original signal time was in seconds, this is now in Hz
cut_f_signal = f_signal.copy()
# Remove everything below 3 Hz, which drops the 2.5 Hz component
cut_f_signal[(W<3)] = 0
cut_signal = irfft(cut_f_signal, n=signal.size)
We can plot the signal in the time domain and in Fourier space, before and after filtering:
import matplotlib.pyplot as plt
plt.subplot(221)
plt.plot(time,signal)
plt.subplot(222)
# Plot the magnitude of the spectrum
plt.plot(W,np.abs(f_signal))
plt.xlim(0,10)
plt.subplot(223)
plt.plot(W,np.abs(cut_f_signal))
plt.xlim(0,10)
plt.subplot(224)
plt.plot(time,cut_signal)
plt.show()
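For the band in the original question (500 Hz sampling, 10 Hz to 20 Hz passband), the same idea can be sketched as below. Two things go wrong in the question's loop: the index i is a bin number rather than a frequency in Hz, and zeroing bins of a full FFT without treating the mirrored negative-frequency bins the same way leaves a spectrum that is no longer conjugate-symmetric, so the inverse transform comes back complex. A real-input transform avoids the second problem. This is only a minimal sketch: the helper name fft_bandpass and the synthetic test signal are assumptions made for illustration, and it uses numpy.fft rather than the scipy functions above.

```python
import numpy as np

def fft_bandpass(x, fs, f_lo, f_hi):
    """Zero every rfft bin outside [f_lo, f_hi] Hz and transform back (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)    # frequency of each bin in Hz
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0  # keep only the passband
    return np.fft.irfft(spectrum, n=x.size)        # real-valued output

# Synthetic stand-in for the recording: 5 Hz + 15 Hz + 60 Hz tones at 500 Hz sampling
fs = 500.0
t = np.arange(1000) / fs
noisy = np.sin(2*np.pi*5*t) + np.sin(2*np.pi*15*t) + np.sin(2*np.pi*60*t)
filtered = fft_bandpass(noisy, fs, 10.0, 20.0)     # only the 15 Hz tone should remain
```

The hard cut used here is the simplest possible mask; it is not the only reasonable choice of filter shape.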

There's a fundamental flaw in what you are trying to do here - you're applying a rectangular window in the frequency domain which will result in a time domain signal which has been convolved with a sinc function. In other words there will be a large amount of "ringing" in the time domain signal due to the step changes you have introduced in the frequency domain. The proper way to do this kind of frequency domain filtering is to apply a suitable window function in the frequency domain. Any good introductory DSP book should cover this.
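One way to soften the step changes described above is to let the frequency-domain mask ramp between 0 and 1 over a transition band instead of jumping. The sketch below uses raised-cosine ramps. The helper name tapered_bandpass_mask, the 2 Hz transition width and the random test input are assumptions chosen for illustration; designing a time-domain filter is a common alternative.

```python
import numpy as np

def tapered_bandpass_mask(freqs, f_lo, f_hi, width):
    """Mask equal to 1 inside [f_lo, f_hi], 0 far outside, with raised-cosine edges `width` Hz wide."""
    freqs = np.asarray(freqs, dtype=float)
    # Linear ramp from 0 to 1 between f_lo - width and f_lo
    rise = np.clip((freqs - (f_lo - width)) / width, 0.0, 1.0)
    # Linear ramp from 1 to 0 between f_hi and f_hi + width
    fall = np.clip(((f_hi + width) - freqs) / width, 0.0, 1.0)
    ramp = np.minimum(rise, fall)
    return 0.5 - 0.5 * np.cos(np.pi * ramp)        # smooth the linear ramps

fs = 500.0
n = 1000
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
mask = tapered_bandpass_mask(freqs, 10.0, 20.0, width=2.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(n)                          # placeholder input signal
y = np.fft.irfft(np.fft.rfft(x) * mask, n=n)        # filtered, real-valued output
```

A wider transition band gives less ringing in the time domain at the cost of a less selective filter.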

Related

Is the Fourier transform I am computing in Python displayed in the time domain? If so, how can I display this in the Frequency domain?

My goal is to detect if a certain frequency is present in an audio recording and output a binary response. To do this, I plan on performing a Fourier transform on the audio file, and querying the values contained in the frequency bins. If I find that the bin associated with the frequency I am looking for has a high value, this should mean that it is present (if my thinking is correct). However, I am having trouble generating my transform correctly. My code is below:
from scipy.io import wavfile
from scipy.fft import fft, fftfreq
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
user_in = input("Please enter the relative path to your wav file --> ")
sampling_rate, data = wavfile.read(user_in)
print("sampling rate:", sampling_rate)
duration = len(data) / float(sampling_rate)
print("duration:", duration)
number_samples_in_seg = int(sampling_rate * duration)
fft_of_data = fft(data)
fft_bins_from_data = fftfreq(number_samples_in_seg, 1 / sampling_rate)
print(fft_bins_from_data.size)
plt.plot(fft_bins_from_data, fft_of_data, label="Real part")
plt.show()
Trying this code using a few different wav files leads me to wonder whether I am displaying my transform in the time domain, rather than the frequency domain, which I need:
Input: 200hz.wav
Output:
sampling rate: 48000
duration: 60.000375
2880018
Input: 8000hz.wav
Output:
sampling rate: 48000
duration: 60.000375
2880018
With these files, which should each contain a pure tone, I would expect to see only one spike on my plot, at x = 200 or x = 8000. One final file contributes to my concern that I am not viewing the frequency domain:
Input: beep.wav
Output:
sampling rate: 48000
duration: 5.061958333333333
24297
This appears to show the distinct beeping as it progresses over an x-axis of time.
I attempted to clean up the plotting by only plotting the magnitude of the positive values. Unfortunately, I am still not seeing the frequencies isolated on a frequency spectrum:
plt.plot(fft_bins_from_data[0:number_samples_in_seg//2], abs(fft_of_data[0:number_samples_in_seg//2]))
plt.show()
[Image: updated plot for beep.wav]
I have referred to these resources before posting:
How to get a list of frequencies in a wav file
Python frequency detection
Fourier Transforms With scipy.fft: Python Signal Processing
Calculate the magnitude and phase of a signal at a particular frequency in python
What is the difference between numpy.fft.fft and numpy.fft.fftfreq
A summary of my questions:
Are my plots displaying the time domain or frequency domain of the signal?
Why is the number of samples equal to the number of bins, and should this be the case for frequency domain?
If these plots are indeed the frequency domain, how do I interpret them and query the values in the bins?
Try this:
import scipy as sp
import scipy.signal as sig
import numpy as np
from numpy import fft
import matplotlib.pyplot as plt
number_samples_in_seg = len(data)
time_axis = np.arange(0, number_samples_in_seg)/sampling_rate
win = sig.windows.hann(number_samples_in_seg)
windowed_data = win*data
plt.plot(time_axis, windowed_data)
That will plot the signal in the time domain if that's not obvious. I applied a Hann window to the signal, which will reduce artifacts if the start and end of the signal don't match up (as the FFT assumes that the snippet of the signal is periodic).
For the plotting of the FFT:
# Keep only the first half of the FFT (the non-negative frequencies)
fft_data = fft.fft(windowed_data)[:number_samples_in_seg // 2]
freq_axis = fft.fftfreq(number_samples_in_seg, 1.0 / sampling_rate)[:number_samples_in_seg // 2]
plt.plot(freq_axis, 20.0*np.log10(np.abs(fft_data)))
The square-bracket indexing on fft_data and freq_axis eliminates the negative-frequency portion of the FFT. I generated a 200 Hz sine wave in Audacity with a length of 4096 samples (so that it fit within a power of two for nice FFT-ing) and there is a peak at 200 Hz in my plot. Also note the 20*log10(abs(fft_data)) expression for plotting in dB.
The above should answer your question #3. As for question #2, the FFT always has the same number of time and frequency points. I am not sure about question #1, but again, the above code should sort that out.
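To turn the spectrum into the binary response the question asks for, one option is to look at the bins near the target frequency and compare their scaled magnitude with a threshold. The following is a rough sketch under stated assumptions: the helper name tone_present, the tolerance tol_hz and the threshold min_amplitude are all hypothetical and would need tuning for real recordings, and the input here is a synthetic tone rather than a wav file.

```python
import numpy as np

def tone_present(data, sampling_rate, target_hz, min_amplitude, tol_hz=5.0):
    """True if the estimated amplitude within tol_hz of target_hz reaches min_amplitude."""
    data = np.asarray(data, dtype=float)
    if data.ndim > 1:                      # multi-channel audio: keep the first channel
        data = data[:, 0]
    window = np.hanning(data.size)
    # Scale so that a sine of amplitude A sitting on a bin reads approximately A
    amplitude = 2.0 * np.abs(np.fft.rfft(data * window)) / window.sum()
    freqs = np.fft.rfftfreq(data.size, d=1.0 / sampling_rate)
    band = np.abs(freqs - target_hz) <= tol_hz
    return bool(amplitude[band].max() >= min_amplitude)

# Synthetic one-second 200 Hz tone at 48 kHz, standing in for 200hz.wav
fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 200 * t)
has_200 = tone_present(tone, fs, 200, min_amplitude=0.5)
has_8000 = tone_present(tone, fs, 8000, min_amplitude=0.5)
```

For integer wav data the amplitudes are in sample units, so min_amplitude has to be chosen on that scale.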

Area under the peak of a FFT in Python

I'm trying to do some tests before I proceed analyzing some real dataset via FFT, and I've found the following problem.
First, I create a signal as the sum of two cosines and then use rfft to do the transformation (since it has only real values):
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
# Number of sample points
N = 800
# Sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = 0.5*np.cos(10*2*np.pi*x) + 0.5*np.cos(200*2*np.pi*x)
# FFT
yf = rfft(y)
xf = rfftfreq(N, T)
fig, ax = plt.subplots(1,2,figsize=(15,5))
ax[0].plot(x,y)
ax[1].plot(xf, 2.0/N*np.abs(yf))
As can be seen from the definition of the signal, I have two oscillations with amplitude 0.5 and frequencies 10 and 200. I would expect the FFT spectrum to be something like two deltas at those points, but apparently increasing the frequency broadens the peaks:
From the first peak it can be inferred that the amplitude is 0.5, but not from the second. I've tried to obtain the area under the peak using np.trapz and use that as an estimate of the amplitude, but as the peak is close to a Dirac delta, the result is very sensitive to the interval I choose. My problem is that I need to get the amplitude as exactly as possible for my data analysis.
EDIT: As it seems to be related to the number of points, I decided to increase (now that I can) the sample frequency. This seems to solve the problem, as can be seen in the figure:
However, it still seems strange that for a certain number of points and sample frequency, the high-frequency peaks broaden...
It is not strange: you have leakage between frequency bins. When you sample the signal for the discrete Fourier transform, the spectrum is divided into frequency bins, which are intervals of frequency over which the amplitude is calculated. Each bin has a width of sample_rate / num_points. So the fewer bins you have, the harder it is to assign a precise amplitude to every frequency. Choosing a sampling rate involves other constraints too, such as the Nyquist-Shannon sampling theorem, which must be respected to prevent aliasing: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem. Depending on the problem, conventional rates are sometimes used. For example, audio is widely sampled at 44,100 Hz, a rate based on the limits of human hearing. So it also depends on the nature of the data you want to analyze. Since this question also has theoretical value, you can check https://dsp.stackexchange.com for more information.
I would comment on George's answer, but I cannot yet.
A starting point for your research might be the properties of the Discrete Fourier Transform.
The signal in the time domain is actually the cosines multiplied by a box window, which transforms into the frequency domain as the convolution of the deltas with a sinc function. The sinc functions smear the spectrum.
However, I am not sure we are observing spectral leakage here, since the window fits exactly to the full period of cosines. The discretization of the bins might still play a role here.
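One thing that can be checked directly is whether the window really contains whole periods. np.linspace(0.0, N*T, N) includes the endpoint by default, so the actual sample spacing is N*T/(N-1) rather than T, and the cosines then do not complete an exact number of periods over the N samples. The sketch below compares the question's time axis with one built using endpoint=False. The helper name two_tone_spectrum is an assumption, and this is a reading of the question's code rather than something the answers above state.

```python
import numpy as np

N = 800            # number of sample points, as in the question
T = 1.0 / 800.0    # intended sample spacing

def two_tone_spectrum(x):
    """Amplitude spectrum of the question's two-cosine signal sampled at times x."""
    y = 0.5*np.cos(10*2*np.pi*x) + 0.5*np.cos(200*2*np.pi*x)
    return 2.0 / N * np.abs(np.fft.rfft(y))

# As in the question: endpoint included, so the real spacing is N*T/(N-1)
amp_with_endpoint = two_tone_spectrum(np.linspace(0.0, N*T, N))
# Endpoint excluded: the spacing is T and both cosines complete whole periods
amp_without_endpoint = two_tone_spectrum(np.linspace(0.0, N*T, N, endpoint=False))

xf = np.fft.rfftfreq(N, T)   # bin width is 1/(N*T), i.e. 1 Hz here
for name, amp in [("endpoint=True", amp_with_endpoint),
                  ("endpoint=False", amp_without_endpoint)]:
    print(name, "| 10 Hz bin:", amp[10], "| 200 Hz bin:", amp[200])
```

With the endpoint included, the mismatch in periods grows with frequency, which is consistent with the 200 Hz peak being affected more than the 10 Hz one.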

Convolution using numpy.fft causes time shift

I'm trying to write a convolution code entirely in the spectral domain. I'm taking a spike series in time (example below only has one spike for simplicity) of n samples and calculating the Fourier series with numpy.fft.fft. I create a 'Ricker wavelet' of m samples (m << n) and calculate its Fourier series with numpy.fft.fft, but specifying that its output Fourier series be n samples long. Both the spike series and wavelet have the same sampling interval. The resulting convolved series is shifted (peak of wavelet is shifted along the time axis with respect to the spike). This shift seems to depend on the size, m, of the wavelet.
I thought it had something to do with the parameters of numpy.fft.fft(a, n=None, axis=-1, norm=None), particularly the 'axis' parameter. But, I do not understand the documentation for this parameter at all.
Can anyone help me understand why I'm getting this shift? (If it isn't clear, let me be explicit: the peak of the wavelet in the convolved series must be at the same time sample as the spike in the input spike series.)
My code follows:
################################################################################
#
# import libraries
#
import math
import numpy as np
import scipy
import matplotlib.pyplot as plt
import os
from matplotlib.ticker import MultipleLocator
from random import random
# Define lists
#
Time=[]; Ricker=[]; freq=25; rickersize=51; timeiter=0.002; serieslength=501; TIMElong=[]; Reflectivity=[];
Series=[]; IMPEDANCE=[]; CONVOLUTION=[];
#
# Create ricker wavelet and its time sequence
#
for i in range(0,rickersize):
    time=(float(i-rickersize//2)*timeiter)
    ricker=(1-2*math.pi*math.pi*freq*freq*time*time)*math.exp(-1*math.pi*math.pi*freq*freq*time*time)
    Time.append(time)
    Ricker.append(ricker)
#
# Do various FFT operations on the Ricker wavelet:
# Normal FFT, FFT of longer Ricker, Amplitude of the FFTs, their inverse FFTs and their frequency sequence
#
FFT=np.fft.fft(Ricker); FFTlong=np.fft.fft(Ricker,n=serieslength,axis=0,norm=None);
AMP=abs(FFT); AMPlong=abs(FFTlong);
RICKER=np.fft.ifft(FFT); RICKERlong=np.fft.ifft(FFTlong);
FREQ=np.fft.fftfreq(len(Ricker),d=timeiter); FREQlong=np.fft.fftfreq(len(RICKERlong),d=timeiter)
PHASE=np.angle(FFT); PHASElong=np.angle(FFTlong);
#
# Create a single spike in the otherwise empty (0) series of length 'serieslength' (=len(RICKERlong)
# this spikes mimics a very simple seismic reflectivity series in time
#
for i in range(0,serieslength):
    time=(float(i)*timeiter)
    TIMElong.append(time)
    if i==int(serieslength/2):
        Series.append(1)
    else:
        Series.append(0)
#
# Do various FFT operations on the spike series
# Normal FFT, Amplitude of the FFT, its inverse FFT and frequency sequence
#
FFTSeries=np.fft.fft(Series)
AMPSeries=abs(FFTSeries)
SERIES=np.fft.ifft(FFTSeries)
FREQSeries=np.fft.fftfreq(len(Series),d=timeiter)
#
# Do convolution of the spike series with the (long) Ricker wavelet in the frequency domain and see result via inverse FFT
#
FFTConvolution=[FFTlong[i]*FFTSeries[i] for i in range(len(Series))]
CON=np.fft.ifft(FFTConvolution)
CONVOLUTION=[CON[i].real for i in range(len(Series))]
#
# plotting routines
#
fig,axs = plt.subplots(nrows=1,ncols=3, figsize=(14,8))
axs[0].barh(TIMElong,Series,height=0.005, color='black')
axs[1].plot(Ricker,Time,color='black', linestyle='solid',linewidth=1)
axs[2].plot(CONVOLUTION,TIMElong,color='black', linestyle='solid',linewidth=1)
#
axs[0].set_aspect(aspect=8); axs[0].set_title('Reflectivity',fontsize=12); axs[0].yaxis.grid(); axs[0].xaxis.grid();
axs[0].set_xlim(-2,2); axs[0].set_ylim(min(TIMElong),max(TIMElong)); axs[0].invert_yaxis(); axs[0].tick_params(axis='both',which='major',labelsize=12);
#
axs[1].set_aspect(aspect=6.2); axs[1].set_title('Ricker',fontsize=12); axs[1].yaxis.grid(); axs[1].xaxis.grid();
axs[1].set_xlim(-1.0,1.02); axs[1].set_ylim(min(Time),max(Time)); axs[1].invert_yaxis(); axs[1].tick_params(axis='both',which='major',labelsize=12);
#
axs[2].set_aspect(aspect=8); axs[2].set_title('Convolution',fontsize=12); axs[2].yaxis.grid(); axs[2].xaxis.grid();
axs[2].set_xlim(-2,2); axs[2].set_ylim(min(TIMElong),max(TIMElong)); axs[2].invert_yaxis(); axs[2].tick_params(axis='both',which='major',labelsize=12);
#
fig.tight_layout()
fig.show()
####
It turns out, as far as I can tell, that my problem has nothing to do with the peculiarities of Python and numpy. The problem is 'circular convolution'. The linear convolution of two sequences of lengths n and m is n + m - 1 samples long, which is longer than either input. This has to be accounted for in the fft and ifft, and I wasn't doing that. I still don't know exactly how to handle it, but it should be simpler now that I know what the problem is.
Apologies to those who tried to answer my malformed question.
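A common way to handle this is to zero-pad both sequences to at least n + m - 1 samples before transforming, and then to drop the first m//2 output samples so that the centre of the wavelet lines up with each spike. The sketch below rebuilds the question's spike series and Ricker wavelet with numpy arrays. The helper name fft_convolve_centered is an assumption for illustration, and this is one possible approach rather than the asker's eventual solution.

```python
import numpy as np

def fft_convolve_centered(series, wavelet):
    """Linear convolution via FFT, trimmed so the wavelet's centre sample aligns with each spike."""
    series = np.asarray(series, dtype=float)
    wavelet = np.asarray(wavelet, dtype=float)
    n, m = series.size, wavelet.size
    nfft = n + m - 1                        # length of the full linear convolution
    # rfft zero-pads each input to nfft samples, which prevents wrap-around
    product = np.fft.rfft(series, nfft) * np.fft.rfft(wavelet, nfft)
    full = np.fft.irfft(product, nfft)
    start = m // 2                          # delay caused by a wavelet centred at index m//2
    return full[start:start + n]

# Same parameters as the question: 25 Hz Ricker, 51 samples, 2 ms sampling, 501-sample series
freq, m, dt, n = 25.0, 51, 0.002, 501
tw = (np.arange(m) - m // 2) * dt
ricker = (1 - 2*(np.pi*freq*tw)**2) * np.exp(-(np.pi*freq*tw)**2)
spikes = np.zeros(n)
spikes[n // 2] = 1.0
out = fft_convolve_centered(spikes, ricker)  # wavelet peak lands on the spike sample
```

For an odd-length wavelet this should agree with np.convolve(spikes, ricker, mode="same").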

How do I get the frequencies from a signal?

I am looking for a way to obtain the frequency from a signal. Here's an example:
signal = [numpy.sin(numpy.pi * x / 2) for x in range(1000)]
This array represents the samples of a recorded sound (x = milliseconds)
sin(pi*x/2) => 250 Hz
How can we go from the signal (a list of points) to the frequencies in this array?
Note:
I have read many Stack Overflow threads and watched many YouTube videos. I have yet to find an answer. Please use simple words.
(I am thankful for every answer)
What you're looking for is known as the Fourier Transform
A bit of background
Let's start with the formal definition:
The Fourier transform (FT) decomposes a function (often a function of time, or a signal) into its constituent frequencies
This is in essence a mathematical operation that, when applied to a signal, gives you an idea of how strongly each frequency is present in the time series. To get some intuition for this, it helps to look at the mathematical definition of the DFT:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$$
where k is swept from 0 all the way up to N-1 to calculate all the DFT coefficients.
The first thing to notice is that this definition somewhat resembles the correlation of two functions, in this case x(n) and the negative exponential function. While this may seem a little abstract, by using Euler's formula and rearranging the definition, the DFT can be expressed as the correlation with both a sine wave and a cosine wave, which account for the imaginary and the real parts of the DFT respectively.
So, keeping in mind that this is in essence computing a correlation, whenever a sine or cosine from the decomposition of the complex exponential matches a component of x(n), there will be a peak in X(k), meaning that this frequency is present in the signal.
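The correlation view can be written out literally: for each k, multiply the signal by a cosine and by a sine of that frequency and sum the products. The sketch below does this with an explicit loop and is meant only as an illustration. The helper name dft_by_correlation and the 32-sample test signal are assumptions, and np.fft.fft computes the same quantity far more efficiently.

```python
import numpy as np

def dft_by_correlation(x):
    """Direct O(N^2) DFT: correlate x with cosines (real part) and sines (imaginary part)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = np.arange(N)
    X = np.zeros(N, dtype=complex)
    for k in range(N):
        angle = 2 * np.pi * k * n / N
        real_part = np.sum(x * np.cos(angle))    # correlation with a cosine of frequency k
        imag_part = -np.sum(x * np.sin(angle))   # correlation with a sine; minus sign from the negative exponent
        X[k] = real_part + 1j * imag_part
    return X

x = np.sin(2 * np.pi * 3 * np.arange(32) / 32)   # exactly 3 cycles in 32 samples
X = dft_by_correlation(x)
peak_bin = int(np.argmax(np.abs(X[:16])))        # strongest bin among the non-negative frequencies
```

Only the sine at k = 3 matches the signal, so that is where the magnitude peaks.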
How can we do the same with numpy?
Having given a very brief theoretical background, let's consider an example to see how this can be implemented in Python. Let's consider the following signal:
import numpy as np
import matplotlib.pyplot as plt
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 50; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
plt.plot(t, y)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
Now the DFT can be computed with np.fft.fft, which, as mentioned, tells you the contribution of each frequency to the signal, now in the transformed domain:
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[:len(frq)//2] # one side frequency range
Y = np.fft.fft(y)/n # dft and normalization
Y = Y[:n//2]
Now, if we plot the actual spectrum, you will see a peak at a frequency of 50 Hz, which in mathematical terms is a delta function centred at the fundamental frequency of 50 Hz. This can be checked in a table of Fourier transform pairs.
So for the above signal, we would get:
plt.plot(frq,abs(Y)) # plotting the spectrum
plt.xlabel('Freq (Hz)')
plt.ylabel('|Y(freq)|')
plt.show()
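Applied to the signal from the question, the same steps reduce to finding the bin with the largest magnitude. This is a short sketch that assumes, as the question states, one sample per millisecond (a 1000 Hz sampling rate); it uses the real-input functions np.fft.rfft and np.fft.rfftfreq, so only non-negative frequencies are returned.

```python
import numpy as np

fs = 1000.0                                    # one sample per millisecond (from the question)
signal = np.sin(np.pi * np.arange(1000) / 2)   # the signal from the question
spectrum = np.abs(np.fft.rfft(signal))         # magnitude of each frequency bin
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
dominant_hz = freqs[np.argmax(spectrum)]       # frequency of the strongest bin
print(dominant_hz)
```

The resolution of this estimate is one bin, fs / len(signal), which is 1 Hz here.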

Scipy FFT Frequency Analysis of very noisy signal

I have noisy data for which I want to calculate frequency and amplitude. The samples were collected every 1/100th sec. From trends, I believe frequency to be ~ 0.3
When I use numpy fft module, I end up getting very high frequency (36.32 /sec) which is clearly not correct. I tried to filter the data with pandas rolling_mean to remove the noise before fft, but that too didn't work.
import pandas as pd
from numpy import fft
import numpy as np
import matplotlib.pyplot as plt
Moisture_mean_x = pd.read_excel("signal.xlsx", header = None)
Moisture_mean_x = Moisture_mean_x.rolling(10).mean() # doesn't help
Moisture_mean_x = Moisture_mean_x.dropna()
Moisture_mean_x = Moisture_mean_x -Moisture_mean_x.mean()
frate = 100. #/sec
Hn = fft.fft(Moisture_mean_x)
freqs = fft.fftfreq(len(Hn), 1/frate)
idx = np.argmax(np.abs(Hn))
freq_in_hertz = freqs[idx]
Can someone guide me how to fix this?
You are right that something is wrong. One needs to explicitly ask pandas for the zeroth column:
Hn = np.fft.fft(Moisture_mean_x[0])
Otherwise something goes wrong, which you can see from the fact that the FFT result is not symmetric, as it should be for real input.
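A likely explanation for what goes wrong is the axis along which the transform runs. np.fft.fft transforms along the last axis by default, and a one-column table converts to an array of shape (N, 1), so each row becomes its own length-1 FFT and the samples come back unchanged. The sketch below imitates this with a plain NumPy array standing in for the DataFrame; the 0.3 Hz test tone is an assumption for illustration, not the asker's data.

```python
import numpy as np

fs = 100.0
t = np.arange(2000) / fs
column = np.sin(2 * np.pi * 0.3 * t)      # 0.3 Hz test tone, 20 s long
table = column.reshape(-1, 1)             # shape (2000, 1), like a one-column table

wrong = np.fft.fft(table)                 # default axis=-1: 2000 separate length-1 FFTs
right = np.fft.fft(table[:, 0])           # one length-2000 FFT along the samples

freqs = np.fft.fftfreq(column.size, d=1.0 / fs)
peak_hz = abs(freqs[np.argmax(np.abs(right))])   # abs() folds the mirrored negative bin
```

In the wrong case, argmax simply points at the largest sample in time, so the frequency read from it is meaningless.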
Seems like @tillsten already answered your question, but here is some additional confirmation. The first plot is your data (zero mean, and I changed it to a csv). The second is the power spectral density, and you can see a fat mass with a peak at ~0.3 Hz. I 'zoomed' in on the third plot to see if there was a second hidden frequency close to the main frequency.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
x = pd.read_csv("signal.csv")
x = np.array(x, dtype=float)[:,0]
x = x - np.mean(x)
fs = 1e2
f, Pxx = signal.welch(x, fs, nperseg=1024)
f_res, Pxx_res = signal.welch(x, fs, nperseg=2048)
plt.subplot(3,1,1)
plt.plot(x)
plt.subplot(3,1,2)
plt.plot(f, Pxx)
plt.xlim([0, 1])
plt.xlabel('frequency [Hz]')
plt.ylabel('PSD')
plt.subplot(3,1,3)
plt.plot(f_res, Pxx_res)
plt.xlim([0, 1])
plt.xlabel('frequency [Hz]')
plt.ylabel('PSD')
plt.show()
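# Use np.fft here, since only numpy, pandas, matplotlib and scipy.signal are imported above
Hn = np.fft.fft(x)
freqs = np.fft.fftfreq(len(Hn), 1/fs)
idx = np.argmax(np.abs(Hn))
freq_in_hertz = freqs[idx]
print('Main freq:', freq_in_hertz)
print('RMS amp:', np.sqrt(Pxx.max()))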
This prints:
Main freq: 0.32012805122
RMS amp: 0.0556044913489
An FFT is a filter bank. Just look for the magnitude peak only within the expected frequency range in the FFT result (instead of the entire result vector), and most of the other spectrum will essentially be filtered out.
It isn't necessary to filter the signal beforehand, because the FFT is a filter. Just skip those parts of the FFT that correspond to frequencies you know to contain a lot of noise - zero them out, or otherwise exclude them.
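Restricting the peak search to an expected band can be sketched as follows. The helper name peak_in_band, the band limits and the synthetic data (a weak 0.3 Hz tone, a stronger 36 Hz interferer and some noise) are assumptions made for illustration; they are not the asker's data.

```python
import numpy as np

def peak_in_band(x, fs, f_min, f_max):
    """Frequency (Hz) of the largest rfft magnitude between f_min and f_max (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # drop the DC offset
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= f_min) & (freqs <= f_max)      # ignore everything outside the band
    return freqs[in_band][np.argmax(magnitude[in_band])]

# Synthetic stand-in: weak 0.3 Hz tone, stronger 36 Hz interferer, small noise
rng = np.random.default_rng(1)
fs = 100.0
t = np.arange(4000) / fs
x = (0.05*np.sin(2*np.pi*0.3*t) + 0.2*np.sin(2*np.pi*36.0*t)
     + 0.01*rng.standard_normal(t.size))
f_peak = peak_in_band(x, fs, 0.1, 1.0)                 # search only between 0.1 and 1 Hz
```

A global argmax on this data would pick the 36 Hz interferer; the band restriction is what selects the slow component.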
I hope this may help you.
https://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html
You should filter only the band around the expected frequency and improve the signal noise ratio before applying the FFT.
Edit:
Mark Ransom gave a smarter answer: if you have to do the FFT anyway, you can just cut off the noise after the transformation. It won't give a worse result than a filter would.
You should use a low-pass filter first, which keeps the larger periodic variations and smooths out some of the higher-frequency content. After that, you can do the FFT to get the peaks. Here is a recipe for an FIR filter typically used for this exact sort of thing.
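A low-pass FIR filter of the kind suggested above can be sketched with scipy.signal.firwin and scipy.signal.filtfilt. This is not the recipe the answer links to: the number of taps, the 2 Hz cutoff and the synthetic noisy signal are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import signal

fs = 100.0                                               # sampling rate from the question
# Low-pass FIR design; 301 taps and a 2 Hz cutoff are arbitrary illustration values
taps = signal.firwin(numtaps=301, cutoff=2.0, fs=fs)

# Synthetic stand-in for the data: 0.3 Hz tone buried in white noise
rng = np.random.default_rng(2)
t = np.arange(6000) / fs
noisy = np.sin(2*np.pi*0.3*t) + 0.5*rng.standard_normal(t.size)

# filtfilt runs the filter forwards and backwards, so the result has no phase shift
smoothed = signal.filtfilt(taps, [1.0], noisy)
```

The smoothed series can then be passed to the FFT in the same way as the raw data.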
