Please feel free to point out any errors/improvements in the existing code
So this is a very basic question and I only have a beginner level understanding of signal processing. I have a 1.02 second accelerometer data sampled at 32000 Hz. I am looking to extract the following frequency domain features after having performed FFT in python -
Mean Freq, Median Freq, Power Spectrum Deformation, Spectrum energy, Spectral Kurtosis, Spectral Skewness, Spectral Entropy, RMSF (Root Mean Square Freq.), RVF (Root Variance Frequency), Power Cepstrum.
More specifically, I am looking for plots of these features as a final output.
The csv file containing data has four columns: Time, X Axis Value, Y Axis Value, Z Axis Value (The accelerometer is a triaxial one). So far on python, I have been able to visualize the time domain data, apply convolution filter to it, applied FFT and generated a Spectogram that shows an interesting shock
To Visualize Data
#Importing pandas and plotting modules
import numpy as np
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
#Reading Data
data = pd.read_csv('HelicalStage_Aug1.csv', index_col=0)
data = data[['X Value', 'Y Value', 'Z Value']]
date_rng = pd.date_range(start='1/8/2018', end='11/20/2018', freq='s')
#Plot the entire time series data and show gridlines
data.grid=True
data.plot()
enter image description here
Denoising
# Applying Convolution Filter
mylist = [1, 2, 3, 4, 5, 6, 7]
N = 3
cumsum, moving_aves = [0], []
for i, x in enumerate(mylist, 1):
cumsum.append(cumsum[i-1] + x)
if i>=N:
moving_ave = (cumsum[i] - cumsum[i-N])/N
#can do stuff with moving_ave here
moving_aves.append(moving_ave)
np.convolve(x, np.ones((N,))/N, mode='valid')
result_X = np.convolve(data[["X Value"]].values[:,0], np.ones((20001,))/20001, mode='valid')
result_Y = np.convolve(data[["Y Value"]].values[:,0], np.ones((20001,))/20001, mode='valid')
result_Z = np.convolve(data[["Z Value"]].values[:,0],
np.ones((20001,))/20001, mode='valid')
plt.plot(result_X-np.mean(result_X))
plt.plot(result_Y-np.mean(result_Y))
plt.plot(result_Z-np.mean(result_Z))
enter image description here
FFT and Spectogram
import numpy as np
import scipy as sp
import scipy.fftpack
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('HelicalStage_Aug1.csv')
df = df.drop(columns="Time")
df.plot()
plt.title('Sensor Data as Time Series')
signal = df[['Y Value']]
signal = np.squeeze(signal)
Y = np.fft.fftshift(np.abs(np.fft.fft(signal)))
Y = Y[int(len(Y)/2):]
Y = Y[10:]
plt.figure()
plt.plot(Y)
enter image description here
plt.figure()
powerSpectrum, freqenciesFound, time, imageAxis = plt.specgram(signal, Fs= 32000)
plt.show()
enter image description here
If my code is correct and the generated FFT and spectrogram are good, then how can I graphically compute the previously mentioned frequency domain features?
I have tried doing the following for MFCC -
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
from scipy.io import wavfile
from python_speech_features import mfcc
from python_speech_features import logfbank
# Extract MFCC and Filter bank features
mfcc_features = mfcc(signal, Fs)
filterbank_features = logfbank(signal, Fs)
# Printing parameters to see how many windows were generated
print('\nMFCC:\nNumber of windows =', mfcc_features.shape[0])
print('Length of each feature =', mfcc_features.shape[1])
print('\nFilter bank:\nNumber of windows =', filterbank_features.shape[0])
print('Length of each feature =', filterbank_features.shape[1])
Visualizing filter bank features
#Matrix needs to be transformed in order to have horizontal time domain
mfcc_features = mfcc_features.T
plt.matshow(mfcc_features)
plt.title('MFCC')
enter image description here
enter image description here
I think your fft taking procedure is not correct, fft output is usually peak and when you are taking abs it should be one peak, as , probably you should change it to Y = np.fft.fftshift(np.abs(np.fft.fft(signal))) to Y=np.abs(np.fft.fftshift(signal)
Related
I'm trying to plot the frequencies that make up the first 1 second of a voice recording.
My approach was to:
Read the .wav file as a numpy array containing time series data
Slice the array from [0:sample_rate-1], given that the sample rate has units of [samples/1 second], which implies that sample_rate [samples/seconds] * 1 [seconds] = sample_rate [samples]
Perform a fast fourier transform (fft) on the time series array in order to get the frequencies that make up that time-series sample.
Plot the the frequencies on the x-axis, and amplitude on the y-axis. The frequency domain would range from 0:(sample_rate/2) since the Nyquist Sampling Theorem tells us that the recording captured frequencies of at least two times the maximum frequency, i.e 2*max(frequency). I'll also slice the frequency output array in half since the output frequency data is symmetrical
Here is my implementation
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft
from scipy.io import wavfile
sample_rate, audio_time_series = wavfile.read(audio_path)
single_sample_data = audio_time_series[:sample_rate]
def fft_plot(audio, sample_rate):
N = len(audio) # Number of samples
T = 1/sample_rate # Period
y_freq = fft(audio)
domain = len(y_freq) // 2
x_freq = np.linspace(0, sample_rate//2, N//2)
plt.plot(x_freq, abs(y_freq[:domain]))
plt.xlabel("Frequency [Hz]")
plt.ylabel("Frequency Amplitude |X(t)|")
return plt.show()
fft_plot(single_sample_data, sample_rate)
This is the plot that it generated
However, this is incorrect, my spectrogram tells me I should have frequency peaks below the 5kHz range:
In fact, what this plot is actually showing, is the first second of my time series data:
Which I was able to debug by removing the absolute value function from y_freq when I plot it, and entering the entire audio signal into my fft_plot function:
...
sample_rate, audio_time_series = wavfile.read(audio_path)
single_sample_data = audio_time_series[:sample_rate]
def fft_plot(audio, sample_rate):
N = len(audio) # Number of samples
y_freq = fft(audio)
domain = len(y_freq) // 2
x_freq = np.linspace(0, sample_rate//2, N//2)
# Changed from abs(y_freq[:domain]) -> y_freq[:domain]
plt.plot(x_freq, y_freq[:domain])
plt.xlabel("Frequency [Hz]")
plt.ylabel("Frequency Amplitude |X(t)|")
return plt.show()
# Changed from single_sample_data -> audio_time_series
fft_plot(audio_time_series, sample_rate)
The code sample above produced, this plot:
Therefore, I think one of two things is going on:
The fft() function is not actually performing an fft on the time series data it is being given
The .wav file does not contain time series data to begin with
What could be the issue? Has anyone else experienced this?
I have replicated, essentially replicated, the code in the question and I don't see the problem the OP has described.
In [172]: %reset -f
...: import matplotlib.pyplot as plt
...: import numpy as np
...: from scipy.fftpack import fft
...: from scipy.io import wavfile
...:
...: sr, data = wavfile.read('sample.wav')
...: print(data.shape, sr)
...: signal = data[:sr,0]
...: Signal = fft(signal)
...: fig, (axt, axf) = plt.subplots(2, 1,
...: constrained_layout=1,
...: figsize=(11.8,3))
...: axt.plot(signal, lw=0.15) ; axt.grid(1)
...: axf.plot(np.abs(Signal[:sr//2]), lw=0.15) ; axf.grid(1)
...: plt.show()
sr, data = wavfile.read('sample.wav')
(268237, 2) 8000
Hence, I'm voting for closing the question because it is "Not reproducible or was caused by a typo".
I'm trying to determine the different time-periods present in a waveform (shown below) using findpeaks().
The dataset for which I'm finding the peaks and eventually time-period, is an Autocorrelation Plot (though it could have been any given plot). I generated it using the following code (which you can use as it is for your reference):
import json
import sys, os
import numpy as np
import pandas as pd
import glob
import pickle
from statsmodels.tsa.stattools import adfuller, acf, pacf
from scipy.signal import find_peaks, square
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
#GENERATION OF A FUNCTION WITH DUAL SEASONALITY & NOISE
def white_noise(mu, sigma, num_pts):
""" Function to generate Gaussian Normal Noise
Args:
sigma: std value
num_pts: no of points
mu: mean value
Returns:
generated Gaussian Normal Noise
"""
noise = np.random.normal(mu, sigma, num_pts)
return noise
def signal_line_plot(input_signal: pd.Series, title: str = "", y_label: str = "Signal"):
""" Function to plot a time series signal
Args:
input_signal: time series signal that you want to plot
title: title on plot
y_label: label of the signal being plotted
Returns:
signal plot
"""
plt.plot(input_signal)
plt.title(title)
plt.ylabel(y_label)
plt.show()
t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*square(2*np.pi*t_week/T, duty=0.7)+10 + white_noise(0, 1,480)
x_weekend = 2*square(2*np.pi*t_weekend/T, duty=0.7)+2 + white_noise(0,1,192)
x_daily_weekly = np.concatenate((x_weekday, x_weekend))
x_daily_weekly_long = np.concatenate((x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly))
signal_line_plot(x_daily_weekly_long)
signal_line_plot(x_daily_weekly_long[0:1000])
#x_daily_weekly_long is the final waveform on which I'm carrying out Autocorrelation
#PERFORMING AUTOCORRELATION:
import scipy.signal as signal
autocorr = signal.correlate(x_daily_weekly_long, x_daily_weekly_long, mode = "same")
lags = signal.correlation_lags(len(x_daily_weekly_long), len(x_daily_weekly_long), mode = "same")
#VISUALIZATION:
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags, autocorr)
My approach to determining the time-period is as follows:
#1) Finding peak indices
indices = find_peaks(autocorr.flatten())[0]
#2) Determining Period
diff = [(indices[i - 1] - x) for i, x in enumerate(indices)][1:]
short_period = abs(np.mean(diff))/fs
short_period #is the distance between all the individual peaks.
In this way, I'm able to only find the time-period (distance) between individual peaks. But my need is to determine the time-period for all the multiple periods that the data may contain.
Can anyone please help, how to determine all possible time-periods present in the data?
I'm trying to remove the trend present in the waveform which looks like the following:
For doing so, I use scipy.signal.detrend() as follows:
autocorr = scipy.signal.detrend(autocorr)
But I don't see any significant flattening in trend. I get the following:
My objective is to have the trend completely eliminated from the waveform. And I need to also generalize it so that it can detrend any kind of waveform - be it linear, piece-wise linear, polynomial, etc.
Can you please suggest a way to do the same?
Note: In order to replicate the above waveform, you can simply run the following code that I used to generate it:
#Loading Libraries
import warnings
warnings.filterwarnings("ignore")
import json
import sys, os
import numpy as np
import pandas as pd
import glob
import pickle
from statsmodels.tsa.stattools import adfuller, acf, pacf
from scipy.signal import find_peaks, square
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
#Generating a function with Dual Seasonality:
def white_noise(mu, sigma, num_pts):
""" Function to generate Gaussian Normal Noise
Args:
sigma: std value
num_pts: no of points
mu: mean value
Returns:
generated Gaussian Normal Noise
"""
noise = np.random.normal(mu, sigma, num_pts)
return noise
def signal_line_plot(input_signal: pd.Series, title: str = "", y_label: str = "Signal"):
""" Function to plot a time series signal
Args:
input_signal: time series signal that you want to plot
title: title on plot
y_label: label of the signal being plotted
Returns:
signal plot
"""
plt.plot(input_signal)
plt.title(title)
plt.ylabel(y_label)
plt.show()
# Square with two periodicities of daily and weekly. With #15min sampling frequency it means 4*24=96 samples and 4*24*7=672
t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*square(2*np.pi*t_week/T, duty=0.7)+10 + white_noise(0, 1,480)
x_weekend = 2*square(2*np.pi*t_weekend/T, duty=0.7)+2 + white_noise(0,1,192)
x_daily_weekly = np.concatenate((x_weekday, x_weekend))
x_daily_weekly_long = np.concatenate((x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly))
signal_line_plot(x_daily_weekly_long)
signal_line_plot(x_daily_weekly_long[0:1000])
#Finding Autocorrelation & Lags for the signal [WHICH THE FINAL PARAMETERS WHICH ARE TO BE PLOTTED]:
#Determining Autocorrelation & Lag values
import scipy.signal as signal
autocorr = signal.correlate(x_daily_weekly_long, x_daily_weekly_long, mode="same")
#Normalize the autocorr values (such that the hightest peak value is at 1)
autocorr = (autocorr-min(autocorr))/(max(autocorr)-min(autocorr))
lags = signal.correlation_lags(len(x_daily_weekly_long), len(x_daily_weekly_long), mode = "same")
#Visualization
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags, autocorr)
#DETRENDING:
autocorr = scipy.signal.detrend(autocorr)
#Visualization
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags, autocorr)
Since it's an auto-correlation, it will always be even; so detrending with a breakpoint at lag=0 should get you part of the way there.
An alternative way to detrend is to use a high-pass filter; you could do this in two ways. What will be tricky is deciding what the cut-off frequency should be.
Here's a possible way to do this:
#Loading Libraries
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
#Generating a function with Dual Seasonality:
def white_noise(mu, sigma, num_pts):
""" Function to generate Gaussian Normal Noise
Args:
sigma: std value
num_pts: no of points
mu: mean value
Returns:
generated Gaussian Normal Noise
"""
noise = np.random.normal(mu, sigma, num_pts)
return noise
# High-pass filter via discrete Fourier transform
# Drop all components from 0th to dropcomponent-th
def dft_highpass(x, dropcomponent):
fx = np.fft.rfft(x)
fx[:dropcomponent] = 0
return np.fft.irfft(fx)
# Square with two periodicities of daily and weekly. With #15min sampling frequency it means 4*24=96 samples and 4*24*7=672
t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*signal.square(2*np.pi*t_week/T, duty=0.7)+10 + white_noise(0, 1,480)
x_weekend = 2*signal.square(2*np.pi*t_weekend/T, duty=0.7)+2 + white_noise(0,1,192)
x_daily_weekly = np.concatenate((x_weekday, x_weekend))
x_daily_weekly_long = np.concatenate((x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly))
#Finding Autocorrelation & Lags for the signal [WHICH THE FINAL PARAMETERS WHICH ARE TO BE PLOTTED]:
#Determining Autocorrelation & Lag values
autocorr = signal.correlate(x_daily_weekly_long, x_daily_weekly_long, mode="same")
#Normalize the autocorr values (such that the hightest peak value is at 1)
autocorr = (autocorr-min(autocorr))/(max(autocorr)-min(autocorr))
lags = signal.correlation_lags(len(x_daily_weekly_long), len(x_daily_weekly_long), mode = "same")
# detrend w/ breakpoints
dautocorr = signal.detrend(autocorr, bp=len(lags)//2)
# detrend w/ high-pass filter
# use `filtfilt` to get zero-phase
b, a = signal.butter(1, 1e-3, 'high')
fautocorr = signal.filtfilt(b, a, autocorr)
# detrend with DFT HPF
rautocorr = dft_highpass(autocorr, len(autocorr) // 1000)
#Visualization
fig, ax = plt.subplots(3)
for i in range(3):
ax[i].plot(lags, autocorr, label='orig')
ax[0].plot(lags, dautocorr, label='detrend w/ bp')
ax[1].plot(lags, fautocorr, label='HPF')
ax[2].plot(lags, rautocorr, label='DFT')
for i in range(3):
ax[i].legend()
ax[i].set_ylabel('autocorr')
ax[-1].set_xlabel('lags')
giving
I have 136 numbers which have an overlapping distribution of 8 Gaussian distributions. I want to find it's means, and variances with each Gaussian distribution! Can you find any mistakes with my code?
file = open("1.txt",'r') #data is in 1.txt like 0,0,0,0,0,0,1,0,0,1,4,4,6,14,25,43,71,93,123,194...
y=[int (i) for i in list((file.read()).split(','))] # I want to make list which element is above data
x=list(range(1,len(y)+1)) # it is x values
z=list(zip(x,y)) # z elements consist as (1, 0), (2, 0), ...
Therefore, through the above process, for the 136 points (x,y) on the xy plane having the first given data as y values, a list z using this as an element was obtained.
Now I want to obtain each Gaussian distribution's mean, variance. At this time, the basic assumption is that the given data consists of overlapping 8 Gaussian distributions.
import numpy as np
from sklearn.mixture import GaussianMixture
data = np.array(z).reshape(-1,1)
model = GaussianMixture(n_components=8).fit(data)
print(model.means_)
file.close()
Actually, I don't know how to make it's code to print 8 means and variances... Anyone can help me?
You can use this, I have made a sample code for your visualizations -
import numpy as np
from sklearn.mixture import GaussianMixture
import scipy
import matplotlib.pyplot as plt
%matplotlib inline
#Sample data
x = [0,0,0,0,0,0,1,0,0,1,4,4,6,14,25,43,71,93,123,194]
num_components = 2
#Fit a model onto the data
data = np.array(x).reshape(-1,1)
model = GaussianMixture(n_components=num_components).fit(data)
#Get list of means and variances
mu = np.abs(model.means_.flatten())
sd = np.sqrt(np.abs(model.covariances_.flatten()))
#Plotting
extend_window = 50 #this is for zooming into or out of the graph, higher it is , more zoom out
x_values = np.arange(data.min()-extend_window, data.max()+extend_window, 0.1) #For plotting smooth graphs
plt.plot(data, np.zeros(data.shape), linestyle='None', markersize = 10.0, marker='o') #plot the data on x axis
#plot the different distributions (in this case 2 of them)
for i in range(num_components):
y_values = scipy.stats.norm(mu[i], sd[i])
plt.plot(x_values, y_values.pdf(x_values))
I'm starting DSP on Python and I'm having some difficulties:
I'm trying to define a sine wave with frequency 1000Hz
I try to do the FFT and find its frequency with the following piece of code:
import numpy as np
import matplotlib.pyplot as plt
sampling_rate = int(10e3)
n = int(10e3)
sine_wave = [100*np.sin(2 * np.pi * 1000 * x/sampling_rate) for x in range(0, n)]
s = np.array(sine_wave)
print(s)
plt.plot(s[:200])
plt.show()
s_fft = np.fft.fft(s)
frequencies = np.abs(s_fft)
plt.plot(frequencies)
plt.show()
So first plot makes sense to me.
Second plot (FFT) shows two frequencies:
i) 1000Hz, which is the one I set at the beggining
ii) 9000Hz, unexpectedly
freqeuncy domain
Your data do not respect Shannon criterion. you do not set a correct frequencies axis.
It's easier also to use rfft rather than fft when the signal is real.
Your code can be adapted like :
import numpy as np
import matplotlib.pyplot as plt
sampling_rate = 10000
n = 10000
signal_freq = 4000 # must be < sampling_rate/2
amplitude = 100
t=np.arange(0,n/sampling_rate,1/sampling_rate)
sine_wave = amplitude*np.sin(2 * np.pi *signal_freq*t)
plt.subplot(211)
plt.plot(t[:30],sine_wave[:30],'ro')
spectrum = 2/n*np.abs(np.fft.rfft(sine_wave))
frequencies = np.fft.rfftfreq(n,1/sampling_rate)
plt.subplot(212)
plt.plot(frequencies,spectrum)
plt.show()
Output :
There is no information loss, even if a human eye can be troubled by the temporal representation.