I'm trying to make sense of the output produced by the python FFT library.
I have a sqlite database where I have logged several series of ADC values. Each series consist of 1024 samples taken with a frequency of 1 ms.
After importing a dataseries, I normalize it and run int through the fft method. I've included a few plots of the original signal compared to the FFT output.
import sqlite3
import struct
import numpy as np
from matplotlib import pyplot as plt
import time
import math
conn = sqlite3.connect(r"C:\my_test_data.sqlite")
c = conn.cursor()
c.execute('SELECT ID, time, data_blob FROM log_tbl')
for row in c:
data_raw = bytes(row[2])
data_raw_floats = struct.unpack('f'*1024, data_raw)
data_np = np.asarray(data_raw_floats)
data_normalized = (data_np - data_np.mean()) / (data_np.max() - data_np.min())
fft = np.fft.fft(data_normalized)
N = data_normalized .size
plt.figure(1)
plt.subplot(211)
plt.plot(data_normalized )
plt.subplot(212)
plt.plot(np.abs(fft)[:N // 2] * 1 / N)
plt.show()
plt.clf()
The signal clearly contains some frequencies, and I was expecting them to be visible from the FFT output.
What am I doing wrong?
You need to make sure that your data is evenly spaced when using np.fft.fft, otherwise the output will not be accurate. If they are not evenly spaced, you can use LS periodograms for example: http://docs.astropy.org/en/stable/stats/lombscargle.html.
Or look up non-uniform fft.
About the plots:
I don't think that you are doing something obviously wrong. Your signal consists a signal with period in the order of magnitude 100, so you can expect a strong frequency signal around 1/period=0.01. This is what is visible on your graphs. The time-domain signals are not that sinusoidal, so your peak in the frequency domain will be blurry, as seen on your graphs.
Related
I have a time series sensor data. It is for a period of 24 hours sampled every minute (so in total 1440 data points per day). I did a fft on this to see what are the dominant frequencies. But what I got is a very noisy fft and a strong peak at zero.
I have already subtracted the mean to remove for the DC component at bin 0. But I still get a strong peak at zero. I'm not able to figure what could be the other reason or what should I try next to remove this.
The graph is very different from I have usually seen online as I was learning about fft. In the sense, I'm not able to see dominant peaks like how it is usually seen. Is my fft wrong?
Attaching code that i tried and images:
import numpy as np
from matplotlib import pyplot as plt
from scipy.fftpack import fft,fftfreq
import random
x = np.random.default_rng().uniform(29,32,1440).tolist()
x=np.array(x)
x=x-x.mean()
N = 1440
# sample spacing
T = 1.0 / 60
yf = fft(x)
yf_abs = abs(yf).tolist()
plt.plot(abs(yf))
plt.show()
freqs = fftfreq(len(x), 60)
plt.plot(freqs,yf_abs)
plt.show()
Frequency vs amplitude
Since I'm new to this, I'm not able to figure out where I'm wrong or interpret the results. Any help will be appreciated. Thanks! :)
Problem
I am trying to remove a frequency from a set of data obtained from an audio file.
To simplify down my problem, I have created the code below which creates a set of waves and merges them into a complex wave. Then it finds the fourier transform of this complex wave and inverses it.
I am expecting to see the original wave as a result since there should be no data loss, however I receive a very different wave instead.
Code:
import numpy as np
import matplotlib.pyplot as plt
import random
#Get plots
fig, c1 = plt.subplots()
c2 = c1.twinx()
fs = 100 # sample rate
f_list = [5,10,15,20,100] # the frequency of the signal
x = np.arange(fs) # the points on the x axis for plotting
# compute the value (amplitude) of the sin wave for each sample
wave = []
for f in f_list:
wave.append(list(np.sin(2*np.pi*f * (x/fs))))
#Adds the sine waves together into a single complex wave
wave4 = []
for i in range(len(wave[0])):
data = 0
for ii in range(len(wave)):
data += wave[ii][i]
wave4.append(data)
#Get frequencies from complex wave
fft = np.fft.rfft(wave4)
fft = np.abs(fft)
#Note: Here I will add some code to remove specific frequencies
#Get complex wave from frequencies
waveV2 = np.fft.irfft(fft)
#Plot the complex waves, should be the same
c1.plot(wave4, color="orange")
c1.plot(waveV2)
plt.show()
Results: (Orange is created wave, blue is original wave)
Expected results:
The blue and orange lines (the original and new wave created) should have the exact same values
You took the absolute value of the FFT before you do the inverse FFT. That changes things, and is probably the cause of your problem.
I'm analyzing what is essentially a respiratory waveform, constructed in 3 different shapes (the data originates from MRI, so multiple echo times were used; see here if you'd like some quick background). Here are a couple of segments of two of the plotted waveforms for some context:
For each waveform, I'm trying to perform a DFT in order to discover the dominant frequency or frequencies of respiration.
My issue is that when I plot the DFTs that I perform, I get one of two things, dependent on the FFT library that I use. Furthermore, neither of them is representative of what I am expecting. I understand that data doesn't always look the way we want, but I clearly have waveforms present in my data, so I would expect a discrete Fourier transform to produce a frequency peak somewhere reasonable. For reference here, I would expect about 80 to 130 Hz.
My data is stored in a pandas data frame, with each echo time's data in a separate series. I'm applying the FFT function of choice (see the code below) and receiving different results for each of them.
SciPy (fftpack)
import pandas as pd
import scipy.fftpack
# temporary copy to maintain data structure
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame
lead_pts_fft_df.magnitude = lead_pts_df.magnitude.apply(scipy.fftpack.fft)
lead_pts_fft_df
NumPy:
import pandas as pd
import numpy as np
# temporary copy to maintain data structure
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame
lead_pts_fft_df.magnitude = lead_pts_df.magnitude.apply(np.fft.fft)
lead_pts_fft_df
The rest of the relevant code:
ECHO_TIMES = [0.080, 0.200, 0.400] # milliseconds
f_s = 1 / (0.006) # 0.006 = time between samples
freq = np.linspace(0, 29556, 29556) * (f_s / 29556) # (29556 = length of data)
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(freq, lead_pts_fft_df.magnitude[idx].real, # real part
freq, lead_pts_fft_df.magnitude[idx].imag, # imaginary part
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()
Post-DFT (SciPy fftpack):
Post-DFT (NumPy)
Here is a link to the dataset (on Dropbox) used to create these plots, and here is a link to the Jupyter Notebook.
EDIT:
I used the posted advice and took the magnitude (absolute value) of the data, and also plotted with a logarithmic y-axis. The new results are posted below. It appears that I have some wraparound in my signal. Am I not using the correct frequency scale? The updated code and plots are below.
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(freq[1::], np.log(np.abs(lead_pts_fft_df.magnitude[idx][1::])),
label=lead_pts_df.index[idx], # apply labels
color=plot_colors[idx]) # colors
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()
I've figured out my issues.
Cris Luengo was very helpful with this link, which helped me determine how to correctly scale my frequency bins and plot the DFT appropriately.
ADDITIONALLY: I had forgotten to take into account the offset present in the signal. Not only does it cause issues with the huge peak at 0 Hz in the DFT, but it is also responsible for most of the noise in the transformed signal. I made use of scipy.signal.detrend() to eliminate this and got a very appropriate looking DFT.
# import DFT and signal packages from SciPy
import scipy.fftpack
import scipy.signal
# temporary copy to maintain data structure; original data frame is NOT changed due to ".copy()"
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame AND detrend the signal
lead_pts_fft_df.magnitude = lead_pts_fft_df.magnitude.apply(scipy.signal.detrend)
lead_pts_fft_df.magnitude = lead_pts_fft_df.magnitude.apply(np.fft.fft)
lead_pts_fft_df
Arrange frequency bins accordingly:
num_projections = 29556
N = num_projections
T = (6 * 3) / 1000 # 6*3 b/c of the nature of my signal: 1 pt for each waveform collected every third acquisition
xf = np.linspace(0.0, 1.0 / (2.0 * T), num_projections / 2)
Then plot:
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(xf, 2.0/num_projections * np.abs(lead_pts_fft_df.magnitude[idx][:num_projections//2]),
label=lead_pts_df.index[idx], # apply labels
color=plot_colors[idx]) # colors
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()
I am doing a project where I want to use the data of a .wav file to drive animation. The problems I am facing are mainly due to the fact that the animation is 25fps and I have 44100 samples per second in the .wav file, so I've broken down apart to 44100/25 samples. Working with the amplitude is fine and I created an initial test to try it out and it worked. This is the code
import wave
import struct
wav = wave.open('test.wav', 'rb')
rate = 44100
nframes = wav.getnframes()
data = wav.readframes(-1)
wav.close()
data_c = [data[offset::2] for offset in range(2)]
ch1 = struct.unpack('%ih' % nframes, data_c[0])
ch2 = struct.unpack('%ih' % nframes, data_c[1])
kf = []
for i in range(0, len(ch2), 44100/25):
cur1 = 0
cur2 = 0
for j in range(i, i+44100/25):
cur1+=ch2[j]
cur2+=ch1[j]
cur = (cur1+cur2) / 44100. / 25. / 2.
kf.append(cur)
min_v = min(kf)
max_v = max(kf)
if abs(max_v) > abs(min_v):
kf = [float(i)/max_v for i in kf]
else:
kf = [float(i)/min_v for i in kf]
Now I want to get the spectrum for each separate keyframe as I do for the amplitude, but I am struggling to think of a way to do it. I can get the spectrum for the whole file using FFT, but that's not I want, because ideally I would like to have different movements of the objects in accordance to different frequencies.
Look at scipy wavfile. It'll turn the wave file into a numpy array. Numpy also has fft functions. Scipy/matplotlib has a spectrogram plot for the entire spectrogram.
from scipy.io import wavfile
sample_rate, data = wavfile.read(filename)
Then you have to get your timing of how you want to read the data. Matplotlib has animation tools that will call a function at a given interval. The other way of doing it is to use PyAudio. If you use pyaudio you can listen to the data while it is displayed.
Next run the data through the FFT. Store the FFT values in a spectrogram array and use matplotlib imshow to display the spectrogram array. You will probably have to rotate the array in some fashion when you display the spectrogram.
From personal experience be careful of python threads. Threading works for I/O, but for calculations the thread can just dominate the whole application slowing everything down. Also GUI elements (like plotting) don't really work in threads. Use matplotlibs animation tools for the plotting.
The original data is on the google drive. It is a two columns data, t and x. I did the following discrete fft transform. I don't quite understand that the main peak(sharp one) has a lower height than the side one. The second the subplot shows that it is indeed that the sharp peak(most close to 2.0) is the main frequency. The code and the figure is as follows:
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.fftpack import fft,fftfreq
freqUnit=0.012/(2*np.pi)
data = np.loadtxt(fname='data.txt')
t = data[:,0]
x = data[:,1]
n=len(t)
d=t[1]-t[0]
fig=plt.figure()
ax1=fig.add_subplot(3,1,1)
ax2=fig.add_subplot(3,1,2)
ax3=fig.add_subplot(3,1,3)
y = abs(fft(x))
freqs = fftfreq(n, d)/freqUnit
ax1.plot(t, x)
ax2.plot(t, x)
ax2.set_xlim(40000,60000)
ax2.set_ylim(0.995,1.005)
ax3.plot(freqs,y,'-.')
ax3.set_xlim(0,4)
ax3.set_ylim(0,1000)
plt.show()
You need to apply a window function prior to your FFT, otherwise you will see artefacts such as the above, particularly where a peak in the spectrum does not correspond directly with an FFT bin centre.
See "Why do I need to apply a window function to samples when building a power spectrum of an audio signal?" question for further details.