Interpreting and understanding FFT plots of time series data - Python

I have time series sensor data covering a period of 24 hours, sampled every minute (so 1440 data points per day in total). I ran an FFT on it to see what the dominant frequencies are, but what I got is a very noisy FFT with a strong peak at zero.
I have already subtracted the mean to remove the DC component at bin 0, but I still get a strong peak at zero. I'm not able to figure out what else could cause this, or what I should try next to remove it.
The graph is very different from what I have usually seen online while learning about FFTs, in the sense that I'm not able to see dominant peaks the way they usually appear. Is my FFT wrong?
Attaching the code I tried and images:
import numpy as np
from matplotlib import pyplot as plt
from scipy.fftpack import fft, fftfreq

x = np.random.default_rng().uniform(29, 32, 1440)  # 24 h of 1-minute samples
x = x - x.mean()                                   # remove the DC component

N = 1440      # number of samples
T = 60.0      # sample spacing in seconds (one sample per minute)

yf = fft(x)
yf_abs = np.abs(yf)
plt.plot(yf_abs)
plt.show()

freqs = fftfreq(len(x), T)   # frequencies in Hz
plt.plot(freqs, yf_abs)
plt.show()
Frequency vs amplitude
Since I'm new to this, I'm not able to figure out where I'm wrong or how to interpret the results. Any help will be appreciated. Thanks! :)
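As a point of comparison, here is a minimal sketch (not from the original post) of the same computation using numpy's one-sided FFT helpers, with the frequency axis in physical units, assuming a 60-second sampling interval. Note that uniformly random test data like the snippet above has a flat spectrum, so no dominant peaks should be expected from it.
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng()
x = rng.uniform(29, 32, 1440)          # stand-in for the sensor readings
x = x - x.mean()                       # remove the DC component

dt = 60.0                              # assumed sample spacing in seconds
spectrum = np.abs(np.fft.rfft(x))      # one-sided magnitude spectrum
freqs = np.fft.rfftfreq(x.size, d=dt)  # matching frequencies in Hz

plt.plot(freqs * 3600, spectrum)       # rescale the axis to cycles per hour
plt.xlabel('frequency (cycles/hour)')
plt.ylabel('|X(f)|')
plt.show()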

Related

Time series graph analysis by detecting specific point increment and decrement

I have one month of hourly flow data, as follows:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

idx = pd.date_range(start="01/01/2010", end="02/01/2010", freq="H")  # hourly index
df = pd.DataFrame(index=idx,
                  data=np.random.rand(len(idx)) * 140,
                  columns=['data'])
df.plot(label='data')
plt.legend(); plt.ylabel('flow')
The maximum and minimum flow values are 140 and 30, and the x-axis is distributed hourly.
"My task is to distribute the flow data in such a way that when the value decreases from 100 to 30 it should take 4.5 hours. If the flow value is less than 100 (while going down) it should follow the same slope as it did in the previous condition."
In short, I have to draw a line from 100 down to 30 with slope -15.55 whenever the trend is decreasing, but I don't know how to detect the point where the flow reaches 100, or the time on the x-axis at which that happens.
I need some code and analysis advice to do this.
I am very new to Python and trying to learn these techniques. I would appreciate suggestions for different ways to do this, with some explanation if possible. Thank you.
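As a starting point, here is a minimal sketch of detecting the timestamps where the flow crosses 100 on the way down; the helper name and test data are illustrative, not from the original post:
import numpy as np
import pandas as pd

def falling_crossings(series, level=100.0):
    """Return the timestamps where `series` drops below `level`."""
    above = series >= level
    # True where the previous sample was at/above the level
    # and the current one is below it
    crossed = above.shift(1, fill_value=False) & ~above
    return series.index[crossed]

idx = pd.date_range("01/01/2010", "02/01/2010", freq="H")
flow = pd.Series(30 + np.random.rand(len(idx)) * 110, index=idx)  # values in [30, 140]
print(falling_crossings(flow))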

Understanding the output from the fast Fourier transform method

I'm trying to make sense of the output produced by Python's FFT library.
I have an sqlite database where I have logged several series of ADC values. Each series consists of 1024 samples taken at 1 ms intervals.
After importing a data series, I normalize it and run it through the FFT method. I've included a few plots comparing the original signal to the FFT output.
import sqlite3
import struct
import numpy as np
from matplotlib import pyplot as plt

conn = sqlite3.connect(r"C:\my_test_data.sqlite")
c = conn.cursor()
c.execute('SELECT ID, time, data_blob FROM log_tbl')
for row in c:
    data_raw = bytes(row[2])                          # raw blob from the database
    data_raw_floats = struct.unpack('f' * 1024, data_raw)
    data_np = np.asarray(data_raw_floats)
    # normalize: zero mean, scaled by the peak-to-peak range
    data_normalized = (data_np - data_np.mean()) / (data_np.max() - data_np.min())
    fft = np.fft.fft(data_normalized)
    N = data_normalized.size
    plt.figure(1)
    plt.subplot(211)
    plt.plot(data_normalized)
    plt.subplot(212)
    plt.plot(np.abs(fft)[:N // 2] / N)   # one-sided magnitude, scaled by 1/N
    plt.show()
    plt.clf()
The signal clearly contains some frequencies, and I was expecting them to be visible from the FFT output.
What am I doing wrong?
You need to make sure that your data is evenly spaced when using np.fft.fft; otherwise the output will not be accurate. If your samples are not evenly spaced, you can use Lomb-Scargle periodograms, for example: http://docs.astropy.org/en/stable/stats/lombscargle.html.
Or look up the non-uniform FFT.
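For example, a minimal sketch with astropy's LombScargle; the irregular signal below is made up for illustration (in older astropy versions the class lives in astropy.stats):
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 200))               # unevenly spaced sample times
y = np.sin(2 * np.pi * 3.0 * t) + 0.1 * rng.standard_normal(t.size)

frequency, power = LombScargle(t, y).autopower()
print(frequency[np.argmax(power)])                 # ~3.0, the true frequency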
About the plots:
I don't think that you are doing anything obviously wrong. Your signal contains a component with a period on the order of 100 samples, so you can expect a strong peak around 1/period = 0.01 in the frequency domain. This is what is visible in your graphs. The time-domain signals are not that sinusoidal, so the peak in the frequency domain will be smeared out, as seen in your graphs.
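A quick demonstration of that effect (synthetic signal, not the poster's data): a non-sinusoidal waveform with a period of about 100 samples produces a peak near 1/100 = 0.01, plus harmonics.
import numpy as np
from matplotlib import pyplot as plt

n = 1024
t = np.arange(n)
y = np.sign(np.sin(2 * np.pi * t / 100.0))    # square-ish wave with period 100
spec = np.abs(np.fft.rfft(y - y.mean())) / n  # one-sided magnitude spectrum
freq = np.fft.rfftfreq(n)                     # frequencies in cycles per sample
plt.plot(freq, spec)   # fundamental near 0.01, odd harmonics at 0.03, 0.05, ...
plt.show()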

How can I detect periodicity using auto-correlation automatically?

This is my code:
import matplotlib.pyplot as plt
import numpy as np
from pandas.plotting import autocorrelation_plot
y = np.sin(np.arange(1,6*3.14,0.1))
autocorrelation_plot(y)
plt.show()
And this is the output of the auto-correlation plot:
auto-correlation plot of y
I would like to figure out a way to classify whether the function is periodic or not automatically (without eyeballing the autocorrelation plot). I read that it is related to the confidence interval, which is the line shown in the attached plot, but I still have doubts about what I should do with it to decide. So is there an automated way to use auto-correlation to decide the periodicity of the data?
Though, this is my try for an automated way:
result = np.correlate(y, y, mode="full")
ACF = result[result.size // 2:]   # keep the non-negative lags
ACF = ACF / ACF[0]                # normalize so that lag 0 == 1
acceptedVar = []
for i in range(len(ACF)):
    if ACF[i] > 0.05:
        acceptedVar = np.append(acceptedVar, ACF[i])
percent = len(acceptedVar) / len(ACF) * 100
I just used a threshold of 0.05 to detect the points for which the confidence interval is 95%. I don't know if this is right statistically or logically. I then check whether percent is bigger than 95% to call the pattern periodic; I'm not sure of that either.
Credit to: the first answer to How can I use numpy.correlate to do autocorrelation?
To start, with e.g. ax = autocorrelation_plot(y) you can use ax.lines[5].get_data()[1] to use the values from the pandas autocorrelation function directly.
This may be a somewhat naïve solution, but say you are just looking for the first, most significant, periodicity, you could just grab the first index of the highest peak in the plot:
first_max = np.argmax(autocorr) + 1
This gives you the lag for which the autocorrelation is highest, i.e. the period of interest (in units of your data's sampling interval).
Say you wanted the next most significant period:
second_max = np.argmax(autocorr[first_max:]) + first_max + 1
And so on and so forth...
To note: this wouldn't work as well if your data were not as regular and periodic as it appears to be from your autocorrelation plot.
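Alternatively, a minimal self-contained sketch (an assumed approach, not from the answer above) that automates the same idea with scipy.signal.find_peaks and a crude significance threshold:
import numpy as np
from scipy.signal import find_peaks

y = np.sin(np.arange(1, 6 * 3.14, 0.1))
acf = np.correlate(y, y, mode='full')[y.size - 1:]  # non-negative lags only
acf = acf / acf[0]                                  # normalize: lag 0 == 1
peaks, _ = find_peaks(acf)                          # local maxima at lags > 0
if peaks.size and acf[peaks[0]] > 0.5:              # crude significance cut-off
    print('estimated period:', peaks[0], 'samples') # ~63 for this signal
else:
    print('no clear periodicity detected')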

Normalizing FFT spectrum magnitude to 0dB

I'm using the FFT to extract the amplitude of each frequency component from an audio file. Actually, there is already a function called Plot Spectrum in Audacity that can solve the problem. Taking this example audio file, which is composed of a 3 kHz sine and a 6 kHz sine, the spectrum result is like the following picture. You can see the peaks are at 3 kHz and 6 kHz, with no extra frequencies.
Now I need to implement the same function and plot a similar result in Python. I'm close to the Audacity result with the help of rfft, but I still have problems to solve after getting this result.
What's the physical meaning of the amplitude in the second picture?
How do I normalize the amplitude to 0 dB like the one in Audacity?
Why do the frequencies over 6 kHz have such high amplitudes (≥90)? Can I scale those frequencies to a relatively low level?
Related code:
import numpy as np
from pylab import plot, show
from scipy.io import wavfile

fs, x = wavfile.read('sine3k6k.wav')   # fs is 44100 for this file
rfft = np.abs(np.fft.rfft(x))
p = 20 * np.log10(rfft)
f = np.linspace(0, fs / 2, len(p))
plot(f, p)
show()
Update
I multiplied the whole signal by a Hann window (is that correct?) and got this. Most of the skirt amplitudes are below 40.
I also scaled the y-axis to decibels as @Mateen Ulhaq said. The result is closer to the Audacity one. Can I treat amplitudes below -90 dB as so low that they can be ignored?
Updated code:
fs, x = wavfile.read('input/sine3k6k.wav')
x = x * np.hanning(len(x))
rfft = np.abs(np.fft.rfft(x))
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max)
f = np.linspace(0, fs/2, len(p))
About the bounty
With the code in the update above, I can measure the frequency components in decibels, and the highest possible value will be 0 dB. But the method only works for one specific audio file, because it uses the rfft_max of that audio. I want to measure the frequency components of multiple audio files against one standard reference, just like Audacity does.
I also started a discussion on the Audacity forum, but it was still not clear to me how to implement this.
After doing some reverse engineering on the Audacity source code, here are some answers. First, they use Welch's method for estimating the PSD. In short, it splits the signal into overlapping segments, applies a window function, applies the FFT, and averages the results. This helps to get better results when noise is present. Anyway, after extracting the necessary parameters, here is a solution that approximates Audacity's spectrum:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt

segment_size = 512

fs, x = wavfile.read('sine3k6k.wav')
x = x / 32768.0                # scale 16-bit PCM signal to [-1.0, 1.0]

noverlap = segment_size // 2   # integer number of overlapping samples
f, Pxx = signal.welch(x,                     # signal
                      fs=fs,                 # sample rate
                      nperseg=segment_size,  # segment size
                      window='hann',         # window type ('Hanning' in Audacity)
                      nfft=segment_size,     # number of samples in the FFT
                      detrend=False,         # no detrending (keep the DC part)
                      scaling='spectrum',    # return power spectrum [V^2]
                      noverlap=noverlap)     # overlap between segments

# set 0 dB to the energy of a sine wave with maximum amplitude
ref = (1 / np.sqrt(2)) ** 2    # simply 0.5 ;)
p = 10 * np.log10(Pxx / ref)

fill_to = -150 * np.ones_like(p)   # anything below -150 dB is irrelevant
plt.fill_between(f, p, fill_to)
plt.xlim([f[2], f[-1]])
plt.ylim([-90, 6])
# plt.xscale('log')   # uncomment if you want a log scale on the x-axis
plt.xlabel('f, Hz')
plt.ylabel('Power spectrum, dB')
plt.grid(True)
plt.show()
Some necessary explanations of the parameters:
the wave file is read as 16-bit PCM; in order to be compatible with Audacity, it should be scaled so that |A| < 1.0
segment_size corresponds to Size in Audacity's GUI.
the default window type is 'Hanning' (the Hann window); you can change it if you want.
the overlap is segment_size/2, as in the Audacity code.
the output window is framed to follow Audacity's style: they throw away the first low-frequency bins and cut everything below -90 dB.
What's physical meaning of the amplitude in the second picture?
It is basically amount of energy in the frequency bin.
How to normalize the amplitude to 0dB like the one in Audacity?
You need to choose some reference point; graphs in decibels are always relative to something. When you select the maximum-energy bin as the reference, your 0 dB point is the maximum energy (obviously). It is also acceptable to use the energy of a sine wave with maximum amplitude as the reference; see the ref variable. The power of a sinusoidal signal is simply its squared RMS, and to get the RMS you just divide the amplitude by sqrt(2), so the scaling factor is simply 0.5. Please note that the factor before log10 is 10 and not 20; this is because we are dealing with the power of the signal and not its amplitude.
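As a quick sanity check on that arithmetic (a trivial sketch, not part of the original answer):
import numpy as np

A = 1.0                          # full-scale sine amplitude
rms = A / np.sqrt(2)             # RMS of a sine wave
ref = rms ** 2                   # reference power = 0.5
peak_power = ref                 # a full-scale sine puts all its power here
print(10 * np.log10(peak_power / ref))   # -> 0.0 dB, as intended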
Can I treat the amplitude below -90dB so low that it can be ignored?
Yes, anything below -40 dB is usually considered negligible.

Extracting meaning from an FFT analysis of data

My question is about fast Fourier transforms, since this is the first time I'm using them.
So, I have a set of data by year (from 1700 to 2009), with each year corresponding to a certain value (a reading).
When I plot the readings against the years it gives me the first plot below. Now, my aim is to find the dominant period with the highest readings using an FFT in Python (from the graph it seems to be around 1940-1950). So I performed an FFT and obtained its amplitude and power spectra (see the second plot for the power spectrum). The power spectrum shows that the dominant frequencies are between 0.08 and 0.1 cycles/year. My question is: how do I link this back to the readings vs. years? That is, how do I know from this dominant frequency which year (or period of years) is the dominant one, or how can I use it to find that out?
The data list can be found here:
http://www.physics.utoronto.ca/%7Ephy225h/web-pages/sunspot_yearly.txt
The code I wrote is:
from pylab import *
from numpy import *
from numpy.fft import fft, fftfreq
#-------------------------------------------------------------------------------
# Defining the time array (not actually used below)
tmin = 0
tmax = 100 * pi
delta = 0.1
t = arange(tmin, tmax, delta)
# Loading the data from the text file
year, N_sunspots = loadtxt('/Users/.../Desktop/sunspot_yearly.txt', unpack=True)  # years and number of sunspots
# Plotting the data
figure(1)
plot(year, N_sunspots)
title('Number of Sunspots vs. Year')
xlabel('time(year)')
ylabel('N')
# Computing the FFT
N_w = fft(N_sunspots)
# Obtaining the frequencies
n = len(N_sunspots)
freq = fftfreq(n)  # dt defaults to 1 (one sample per year)
# Keeping only the positive-frequency terms (integer division for the slice)
half = len(N_w) // 2
N = N_w[1:half] / float(len(N_w[1:half]))
w = freq[1:half]
figure(2)
plot(w, real(N))
plot(w, imag(N))
title('The data function f(w) vs. frequency')
xlabel('frequency(cycles/year)')
ylabel('f(w)')
grid(True)
# Amplitude spectrum
Amp_spec = abs(N)
figure(3)
plot(w, Amp_spec)
title('Amplitude spectrum')
xlabel('frequency(cycles/year)')
ylabel('Amplitude')
grid(True)
# Power spectrum
Pwr_spec = abs(N)**2
figure(4)
plot(w, Pwr_spec, 'o')
title('Power spectrum')
xlabel('frequency(cycles/year)')
ylabel('Power')
grid(True)
show()
The graph below shows the data input to the FFT. The original data file contains a total of 309 samples. The zero values at the right end are added automatically by the FFT to pad the number of input samples up to the next higher power of two (2^9 = 512).
The graph below shows the data input to the FFT with the Kaiser-Bessel a=3.5 window function applied. The window function reduces the spectral leakage errors in the FFT when the input to the FFT is a non-periodic signal over the sampling interval, as in this case.
The graph below shows the FFT output at full scale, without the window function. The peak is at 0.0917968 (47/512) frequency units, which corresponds to a period of 10.89 years (1/0.0917968).
The graph below shows the FFT output at full scale, with the Kaiser-Bessel a=3.5 window applied. The peak remains at the same frequency, 0.0917968 (47/512) frequency units, again corresponding to a period of 10.89 years. The peak is more clearly visible above the background, due to the reduction in spectral leakage provided by the window function.
In conclusion, we can say with high certainty that the sunspot data provided in the original post is periodic, with a fundamental period of 10.89 years.
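For completeness, the bin-to-period arithmetic above can be reproduced in a couple of lines (a sketch using the numbers quoted in this answer):
# Bin-to-period conversion for the zero-padded FFT described above
n_fft = 512                     # FFT length after zero-padding (2**9)
peak_bin = 47                   # index of the spectral peak
dt = 1.0                        # sample spacing: one reading per year
peak_freq = peak_bin / (n_fft * dt)   # 0.0917968... cycles/year
period = 1.0 / peak_freq              # 10.89... years
print(period)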
FFT and graphs were done with the Sooeet FFT calculator
