I have a MATLAB program that I want to port to Python. The problem is that it uses the built-in spectrogram function and, although the matplotlib specgram function seems identical, I'm getting different results when I run both.
This is the code I've been running.
MATLAB:
data = 1:999; %Dummy data. Just for testing.
Fs = 8000; % All the songs we'll be working on will be sampled at an 8KHz rate
tWindow = 64e-3; % The window must be long enough to get 64ms of the signal
NWindow = Fs*tWindow; % Number of elements the window must have
window = hamming(NWindow); % Window used in the spectrogram
NFFT = 512;
NOverlap = NWindow/2; % We want a 50% overlap
[S, F, T] = spectrogram(data, window, NOverlap, NFFT, Fs);
Python:
import numpy as np
from matplotlib import mlab
data = range(1,1000) #Dummy data. Just for testing
Fs = 8000
tWindow = 64e-3
NWindow = Fs*tWindow
window = np.hamming(NWindow)
NFFT = 512
NOverlap = NWindow/2
[s, f, t] = mlab.specgram(data, NFFT = NFFT, Fs = Fs, window = window, noverlap = NOverlap)
And this is the result I get in both executions:
http://i.imgur.com/QSPvYsC.png
(The F and T variables are exactly the same in both programs)
They are clearly different; in fact, the Python version doesn't even return complex numbers. What could be the problem? Is there any way to fix it, or should I use another spectrogram function?
Thank you so much in advance for your help.
In matplotlib, specgram returns the power spectral density by default (mode='psd'). In MATLAB, spectrogram returns the short-time Fourier transform by default, unless nargout==4, in which case it also computes the PSD. To make the matplotlib behaviour match the MATLAB behaviour, set mode='complex'.
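For example, a minimal sketch of the call with the mode switch, reusing the parameters from the question (note that the scaling conventions of the two functions can still differ by a constant factor):
import numpy as np
from matplotlib import mlab

data = np.arange(1, 1000)    # dummy data, as in the question
Fs = 8000
NWindow = int(Fs * 64e-3)    # 512-sample window
window = np.hamming(NWindow)
NFFT = 512
NOverlap = NWindow // 2      # 50% overlap, as an integer

# mode='complex' returns the complex STFT, analogous to MATLAB's spectrogram
s, f, t = mlab.specgram(data, NFFT=NFFT, Fs=Fs, window=window,
                        noverlap=NOverlap, mode='complex')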
Related
I am working with the pyaudio and matplotlib packages for the first time and I am attempting to plot live audio data from microphone input, transform it to frequency domain information, and then output peaks with an input distance. This project is a modification of the three-part guide to build a spectrum analyzer found here.
Currently the code is organised in a class, as I have alternative methods that I apply to the audio, but I am only posting the class with the relevant methods since they don't reference each other and are self-contained. Another quirk of the program is that it opens a local file even though it only uses input from the user's microphone; this is a leftover from the original functionality of plotting a sound file's intensity while it played and is no longer integral to the code.
import pyaudio
import wave
import struct
import pandas as pd
from scipy.fftpack import fft
from scipy.signal import find_peaks
import matplotlib.pyplot as plt
import numpy as np
class Wave:
    def __init__(self, file) -> None:
        self.CHUNK = 1024 * 4
        self.obj = wave.open(file, "r")
        self.callback_output = []
        self.data = self.obj.readframes(self.CHUNK)
        self.rate = 44100

        # Initiate an instance of PyAudio
        self.p = pyaudio.PyAudio()
        # Open a stream with the file specifications
        self.stream = self.p.open(format = pyaudio.paInt16,
                                  channels = self.obj.getnchannels(),
                                  rate = self.rate,
                                  output = True,
                                  input = True,
                                  frames_per_buffer = self.CHUNK)

    def fft_plot(self, distance: float):
        x_fft = np.linspace(0, self.rate, self.CHUNK)
        fig, ax = plt.subplots()
        line_fft, = ax.semilogx(x_fft, np.random.rand(self.CHUNK), "-", lw = 2)
        # Bind plot window sizes
        ax.set_xlim(20, self.rate / 2)
        plot_data = self.stream.read(self.CHUNK)
        self.data_int = pd.DataFrame(struct.unpack(
            str(self.CHUNK * 2) + 'h', plot_data)).astype(dtype = "b")[::2]
        y_fft = fft(self.data_int)
        line_fft.set_ydata(np.abs(y_fft[0:self.CHUNK]) / (256 * self.CHUNK))
        plt.show(block = False)
        while True:
            # Read incoming audio data
            data = self.stream.read(self.CHUNK)
            # Convert data to bits then to array
            self.data_int = struct.unpack(str(4 * self.CHUNK) + 'B', data)
            # Recompute FFT and update line
            yf = fft(self.data_int)
            line_data = np.abs(yf[0:self.CHUNK]) / (128 * self.CHUNK)
            line_fft.set_ydata(line_data)
            # Find all values above threshold
            peaks, _ = find_peaks(line_data, distance = distance)
            # Update the plot
            plt.plot(peaks, line_data[peaks], "x")
            fig.canvas.draw()
            fig.canvas.flush_events()
            # Exit program when plot window is closed
            fig.canvas.mpl_connect('close_event', exit)
test_file = "C:/Users/Tam/Documents/VScode/Final Project/PrismGuitars.wav"
audio_test = Wave(test_file)
audio_test.fft_plot(2000)
The code does not throw any errors and runs fine with an acceptable framerate, and it only terminates when the plot window is closed, all of which is good. The issue I'm encountering is with the detection and plotting of the peaks of line_data: when I run this code, the output over time looks like this matplotlib graph instance.
It seems that the peaks (or peak) are being found, but at a lower frequency than the x values of line_data, so they appear shifted relative to the plotted data. The other, more minor, issue is that since this is a live plot I would like to clear the previous instance of the peak marker, so that only the current one is shown rather than all of the ones plotted before.
In earlier attempts I tried using line_fft for the peak detection, but since it is cast to a Line2D object the peak detection algorithm isn't able to deal with the data type. I have also tried implementing a list comprehension as seen in this post, but the time to cast to a list was prohibitively slow and it did not return any peak markers when I ran it.
EDIT: Following Jody's input the program now returns the proper values, as I was only printing an index for the x-coordinate of the peak marker. Nevertheless, I would still appreciate some insight as to whether it is possible to update each marker rather than having all the previous ones constantly displayed.
As for the marker updating I have attempted to clear the plot in the while loop both before and after drawing the markers (in different tests of course) but I only ever end up with a completely blank graph.
Please let me know if there is anything I should clarify and thank you for your time.
As Jody pointed out, the peaks variable contains the indices of the detected peaks, which then need to be looked up in x_fft and line_data in order to match the displayed data.
First we create a scatter plot:
scat = ax.scatter([], [], c = "purple", marker = "x")
The peak data can then be stacked into a container variable inside the while loop:
array_peaks = np.c_[x_fft[peaks], line_data[peaks]]
and update the data in the while loop with:
scat.set_offsets(array_peaks)
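For completeness, here is a standalone sketch of the same marker-update pattern with synthetic data (the arrays x and y and the injected fake peaks are placeholders; in the question the values come from x_fft and line_data inside the audio loop):
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
x = np.linspace(20, 22050, 4096)

fig, ax = plt.subplots()
line, = ax.semilogx(x, np.zeros(x.size), "-", lw=2)
scat = ax.scatter([], [], c="purple", marker="x")
ax.set_ylim(0, 1)
plt.show(block=False)

for _ in range(50):
    y = rng.random(x.size) * 0.2
    y[rng.integers(0, x.size, 3)] = 0.9           # inject a few fake peaks
    line.set_ydata(y)
    peaks, _ = find_peaks(y, distance=200)
    scat.set_offsets(np.c_[x[peaks], y[peaks]])   # replaces the previous markers
    fig.canvas.draw()
    fig.canvas.flush_events()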
My main task is to recognize a human humming from a microphone in real time. As the first step to recognizing signals in general, I have made a 5 seconds recording of a 440 Hz signal generated from an app on my phone and tried to detect the same frequency.
I used Audacity to plot and verify the spectrum of the same 440 Hz wav file, and I got this, which shows that 440 Hz is indeed the dominant frequency:
(https://i.imgur.com/2UImEkR.png)
To do this with Python, I use the PyAudio library and refer to this blog. The code I have so far, which I run with the wav file, is this:
"""PyAudio Example: Play a WAVE file."""
import pyaudio
import wave
import sys
import struct
import numpy as np
import matplotlib.pyplot as plt
CHUNK = 1024
if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(CHUNK)
i = 0
while data != '':
    i += 1
    data_unpacked = struct.unpack('{n}h'.format(n= len(data)/2 ), data)
    data_np = np.array(data_unpacked)
    data_fft = np.fft.fft(data_np)
    data_freq = np.abs(data_fft)/len(data_fft) # Dividing by length to normalize the amplitude as per https://www.mathworks.com/matlabcentral/answers/162846-amplitude-of-signal-after-fft-operation
    print("Chunk: {} max_freq: {}".format(i,np.argmax(data_freq)))

    fig = plt.figure()
    ax = fig.add_subplot(1,1,1)
    ax.plot(data_freq)
    ax.set_xscale('log')
    plt.show()

    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()
p.terminate()
In the output, the max frequency is 10 for all the chunks, and an example of one of the plots is:
(https://i.imgur.com/zsAXME5.png)
I had expected this value to be 440 instead of 10 for all the chunks. I admit I know very little about the theory of FFTs, and I would appreciate any help in solving this.
EDIT:
The sampling rate is 44100, the number of channels is 2, and the sample width is also 2.
Forewords
As xdurch0 pointed out, you are reading a kind of index instead of a frequency. If you do all the computation yourself, you need to build your own frequency vector before plotting if you want consistent results. Reading this answer may help you towards the solution.
The frequency vector for FFT (half plane) is:
f = np.linspace(0, rate/2, N_fft/2)
Or (full plane):
f = np.linspace(-rate/2, rate/2, N_fft)
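To make the index-versus-frequency distinction concrete, here is a small self-contained check with a 440 Hz tone (the chunk length of 1024 is an assumption for illustration; with your interleaved stereo data the exact bin index will differ):
import numpy as np

rate = 44100
N = 1024                                     # hypothetical chunk length
t = np.arange(N) / rate
x = np.sin(2 * np.pi * 440.0 * t)

spectrum = np.abs(np.fft.fft(x))[:N // 2]
f = np.fft.fftfreq(N, d=1 / rate)[:N // 2]   # frequency of each bin in Hz

k = int(np.argmax(spectrum))
print(k, f[k])   # bin index around 10, frequency around 431 Hz (resolution is rate/N, about 43 Hz)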
On the other hand, we can delegate most of the work to the excellent scipy.signal toolbox, which is designed to cope with this kind of problem (and many more).
MCVE
Using the scipy package, it is straightforward to get the desired result for a simple WAV file containing a single frequency (source):
import numpy as np
from scipy import signal
from scipy.io import wavfile
import matplotlib.pyplot as plt
# Read the file (rate and data):
rate, data = wavfile.read('tone.wav') # See source
# Compute PSD:
f, P = signal.periodogram(data, rate) # Frequencies and PSD
# Display PSD:
fig, axe = plt.subplots()
axe.semilogy(f, P)
axe.set_xlim([0,500])
axe.set_ylim([1e-8, 1e10])
axe.set_xlabel(r'Frequency, $\nu$ $[\mathrm{Hz}]$')
axe.set_ylabel(r'PSD, $P$ $[\mathrm{AU^2Hz}^{-1}]$')
axe.set_title('Periodogram')
axe.grid(which='both')
Basically:
Read the wav file and get the sample rate (here 44.1kHz);
Compute the Power Spectral Density and the frequencies;
Then display it with matplotlib.
This outputs:
Find Peak
Then we can find the frequency of the first peak above a threshold (P > 1e-2; this criterion is subject to tuning) using find_peaks:
idx = signal.find_peaks(P, height=1e-2)[0][0]
f[idx] # 440.0 Hz
Putting it all together, it boils down to:
def freq(filename, setup={'height': 1e-2}):
    rate, data = wavfile.read(filename)
    f, P = signal.periodogram(data, rate)
    return f[signal.find_peaks(P, **setup)[0][0]]
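Assuming the same tone.wav as above, a quick usage check could be:
freq('tone.wav')   # expected to return 440.0 for the single-tone sample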
Handling multiple channels
I tried this code with my wav file, and got the error for the line axe.semilogy(f, Pxx_den) as follows: ValueError: x and y must have same first dimension. I checked the shapes and f has (2,) while Pxx_den has (220160, 2). Also, the Pxx_den array seems to have all zeros only.
A WAV file can hold multiple channels; most files are mono or stereo (the format allows up to 2**16 - 1 channels). The problem you describe occurs because your file has multiple channels (a stereo sample).
rate, data = wavfile.read('aaaah.wav') # Shape: (46447, 2), Rate: 48 kHz
It is not well documented, but signal.periodogram also operates on matrices, and its default orientation is not directly consistent with the wavfile.read output (they operate along different axes by default). So we need to carefully orient the dimensions (using the axis switch) when computing the PSD:
f, P = signal.periodogram(data, rate, axis=0, detrend='linear')
It also works with the transposed data (data.T), but then we need to transpose the result back.
Specifying the axis solves the issue: the frequency vector is correct and the PSD is no longer all zeros (before, it operated along axis=1, which has length 2; in your case it computed 220160 PSDs of 2-sample signals, the converse of what we wanted).
The detrend switch ensures the signal has zero mean and its linear trend is removed.
Real application
This approach should work for real chunked samples, provided each chunk holds enough data (see the Nyquist-Shannon sampling theorem). The data are then sub-samples of the signal (chunks), and rate is kept constant since it does not change during the process.
Chunks of size 2**10 seem to work; we can identify specific frequencies from them:
f, P = signal.periodogram(data[:2**10,:], rate, axis=0, detrend='linear') # Shapes: (513,) (513, 2)
idx0 = signal.find_peaks(P[:,0], threshold=0.01, distance=50)[0] # Peaks: [46.875, 2625., 13312.5, 16921.875] Hz
fig, axe = plt.subplots(2, 1, sharex=True, sharey=True)
axe[0].loglog(f, P[:,0])
axe[0].loglog(f[idx0], P[idx0,0], '.')
# [...]
At this point, the trickiest part is fine-tuning the find_peaks method to catch the desired frequencies. You may need to consider pre-filtering your signal or post-processing the PSD in order to make the identification easier.
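As a rough sketch of that chunked workflow (the file name, threshold and distance are placeholders to be tuned for the actual recording):
from scipy import signal
from scipy.io import wavfile

rate, data = wavfile.read('recording.wav')   # hypothetical stereo recording
chunk = 2**10

for start in range(0, data.shape[0] - chunk, chunk):
    block = data[start:start + chunk]
    f, P = signal.periodogram(block, rate, axis=0, detrend='linear')
    # For a stereo file P has shape (nfreq, 2); analyse the first channel here
    Pc = P if P.ndim == 1 else P[:, 0]
    idx = signal.find_peaks(Pc, threshold=0.01, distance=50)[0]
    print(start / rate, f[idx])   # chunk start time (s) and detected frequencies (Hz)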
I have a Matlab script to compute the DFT of a signal and plot it:
(data can be found here)
clc; clear; close all;
fid = fopen('s.txt');
txt = textscan(fid,'%f');
s = cell2mat(txt);
nFFT = 100;
fs = 24000;
deltaF = fs/nFFT;
FFFT = [0:nFFT/2-1]*deltaF;
win = hann(length(s));
sw = s.*win;
FFT = fft(sw, nFFT)/length(s);
FFT = [FFT(1); 2*FFT(2:nFFT/2)];
absFFT = 20*log10(abs(FFT));
plot(FFFT, absFFT)
grid on
I am trying to translate it to Python and can't get the same result.
import numpy as np
from matplotlib import pyplot as plt
x = np.genfromtxt("s.txt", delimiter=' ')
nfft = 100
fs = 24000
deltaF = fs/nfft;
ffft = [n * deltaF for n in range(nfft/2-1)]
ffft = np.array(ffft)
window = np.hanning(len(x))
xw = np.multiply(x, window)
fft = np.fft.fft(xw, nfft)/len(x)
fft = fft[0]+ [2*fft[1:nfft/2]]
fftabs = 20*np.log10(np.absolute(fft))
plt.figure()
plt.plot(ffft, np.transpose(fftabs))
plt.grid()
The plots I get (Matlab on the left, Python on the right):
What am I doing wrong?
The two pieces of code are different: in the Matlab code you concatenate two vectors,
FFT = [FFT(1); 2*FFT(2:nFFT/2)];
whereas in the Python code you add the first value of fft to the rest of the vector,
fft = fft[0]+ [2*fft[1:nfft/2]]
and '+' does not concatenate here because you have numpy arrays.
In Python, it should be (using integer division // so the indices stay integers):
fft = fft[0:nfft//2]
fft[1:nfft//2] = 2*fft[1:nfft//2]
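As a sketch, the MATLAB line can also be mirrored with an explicit concatenation (the spectrum below is just a placeholder, and nfft is assumed even):
import numpy as np

nfft = 100
fft_full = np.fft.fft(np.random.rand(1024), nfft) / 1024   # placeholder spectrum
# Equivalent of MATLAB's  FFT = [FFT(1); 2*FFT(2:nFFT/2)];
fft_half = np.concatenate(([fft_full[0]], 2 * fft_full[1:nfft // 2]))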
I am not a Matlab user so I am not sure, but there are a few things I'd check to see if I can help you.
You call np.array after the array has already been made (ffft). That probably will not change the nature of the array the way you hoped; perhaps it would be better to define it inside np.array(...) directly, something like np.array(n * deltaF for n in range(nfft/2-1)); I am not sure of the formatting, but you get the idea. The other thing is that the range doesn't seem right to me: do you really want it to stop at 49?
Another one is fft = fft[0]+ [2*fft[1:nfft/2]] compared to FFT = [FFT(1); 2*FFT(2:nFFT/2)]; I am not sure whether that comparison is accurate; it just seems to be a different kind of definition to me.
Also, when I do these types of calculations, I print out the intermediate steps so I can compare the numbers and see where things break.
Hope this helps.
I found out that using np.fft.rfft instead of np.fft.fft and modifying the code as follows does the job:
import numpy as np
from matplotlib import pyplot as pl
x = np.genfromtxt("../Matlab/s.txt", delimiter=' ')
nfft = 100
fs = 24000
deltaF = fs/nfft;
ffft = np.array([n * deltaF for n in range(nfft//2+1)])
window = np.hanning(len(x))
xw = np.multiply(x, window)
fft = np.fft.rfft(xw, nfft)/len(x)
fftabs = 20*np.log10(np.absolute(fft))
pl.figure()
pl.plot(np.transpose(ffft), fftabs)
pl.grid()
The resulting plot:
right result with Python
I can see that the first and last points, as well as the amplitudes, are not the same. It isn't a problem for me (I am more interested in the general shape), but if someone can explain it, I'd be happy.
I have this EMG signal and I would like to plot the mean power frequency based on this article. I implemented it in Matlab using the following code:
clear all;
close all;
EMG=load('EMG.txt');
N=1000; %my window
z=1;
fs=200 %sampling rate
for i=1:length(EMG)-N
    DUM=0;
    NUM=0;
    FT=fft(EMG(i:i+N-1));
    psd=FT.*conj(FT);
    NFFT=length(FT);
    f = [1:NFFT/2]*fs/N;
    for j=1:NFFT/2
        NUM=NUM+f(j)*psd(j);
        DUM=DUM+psd(j);
    end
    MPF(z)=NUM/DUM;
    z=z+1;
end
And the plot of MPF is:
Next, I am trying to do the same in Python. The code is:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('EMG.txt', names=['EMG'])
EMG=df['EMG'].tolist()
sampling_rate=200
N=1000 #my window
FT=np.fft.fft(EMG, axis=0)
psd=FT*np.conj(FT)
NFFT=len(FT)
f =(np.arange(0,NFFT/2)*sampling_rate)/N
NUM=0
DUM=0
MPF=[]
for j in np.arange(1,NFFT/2):
    NUM=NUM+f[j]*psd[j]
    DUM=DUM+psd[i]
    MPF.append(NUM/DUM)
plt.plot(MPF)
plt.show()
And the plot of the MPF is:
Why are they different?
Update
Following the advice of Dan in the comments, I modified my Python code as follows, and the results are more or less the same, except that the Matlab code is much faster than the Python one, which in my case was running out of memory:
sampling_rate=200
N=1000
MPF=[]
for i in range(0,len(EMG)-N):
    signal=EMG[i:(i+N)]
    FT=np.fft.fft(signal, axis=0)
    psd=FT*np.conj(FT)
    NFFT=len(FT)
    f =(np.arange(0,NFFT/2)*sampling_rate)/N
    D_1=0
    N_1=0
    for j in np.arange(1,NFFT/2):
        D_1=D_1+f[j]*psd[j]
        N_1=N_1+psd[j]
    MPF.append(D_1/N_1)
plt.plot(MPF)
plt.show()
Choosing the first 22000 samples the results are:
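For reference, a sketch of a variant that drops the inner Python loop by weighting the PSD with the frequency vector directly (it follows the same idea as the code above, not the exact MATLAB indexing, so the handling of the DC bin may differ slightly):
import numpy as np

def mpf_sliding(emg, fs=200, N=1000):
    emg = np.asarray(emg, dtype=float)
    f = np.arange(N // 2) * fs / N            # frequency vector for one window
    out = np.empty(len(emg) - N)
    for i in range(len(emg) - N):
        FT = np.fft.rfft(emg[i:i + N])[:N // 2]
        psd = (FT * np.conj(FT)).real
        out[i] = np.sum(f[1:] * psd[1:]) / np.sum(psd[1:])   # mean power frequency of the window
    return out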
The 'detrend' option causes the difference in the first value (0 Hz, i.e. the DC component).
The default setting is 'constant', which removes the DC component from the original data.
f, Pxx_den = signal.periodogram(x, fs, detrend=False)
I am following up on this question and answer.
1) In Matlab, try the built-in function meanfreq; medfreq will also help you find the median frequency.
2) So far I have not found a Python package that computes the mean/median frequency.
3) So I also tried to compute the mean/median frequency with Python myself, following the computation steps of Matlab's meanfreq. (In the Matlab command window, type edit meanfreq.m to see the source code of meanfreq; the source code helps a lot.)
4) The following is my Python-based implementation of meanfreq (a quick sanity check follows after this list):
import numpy as np
from scipy import signal

def meanfreq(x, fs):
    f, Pxx_den = signal.periodogram(x, fs)
    Pxx_den = np.reshape(Pxx_den, (1, -1))
    width = np.tile(f[1] - f[0], (1, Pxx_den.shape[1]))
    f = np.reshape(f, (1, -1))
    P = Pxx_den * width
    pwr = np.sum(P)
    mnfreq = np.dot(P, f.T) / pwr
    return mnfreq
5) You can step through both my code and Matlab's meanfreq to compare them.
6) There is also a small 'bug' in my code, or perhaps not mine: when you debug both, you may find that the first value (only the first value) of Pxx is totally different, while the rest of the values are identical.
I am also very confused about this. Help me. :)
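As the quick sanity check mentioned above, with a hypothetical synthetic tone:
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440.0 * t)
print(meanfreq(x, fs))   # expected to be close to 440 Hz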
I have a signal in the frequency domain. I took numpy.fft.ifft of the signal and got a time-domain signal. But when I take the FFT of that time signal again, I don't properly get the negative and positive frequencies (plot 3 in the figure).
time = np.arange(0, 10, .01)
N = len(time)
signal_td = np.cos(2.0*np.pi*2.0*time)
signal_fd = np.fft.fft(signal_td)
signal_fd2 = signal_fd[0:N/2]
inv_td2 = np.fft.ifft(signal_fd2)
fd2 = np.fft.fft(inv_td2)
General comment: I avoid using time as a variable name because IPython loads it as a "magic" command.
Something I find at times confusing about matplotlib is that when you plot a complex array, it actually plots the real part. In the code snippet:
tt = np.arange(0, 10, .01)
N = len(tt)
signal_td = np.cos(2.0*np.pi*2.0*tt)
signal_fd = np.fft.fft(signal_td)
signal_fd2 = signal_fd[0:N/2]
inv_td2 = np.fft.ifft(signal_fd2)
fd2 = np.fft.fft(inv_td2)
The following arrays have a dtype of float64: tt and signal_td. The others are complex128. The reason you only see one peak in fd2 is that it is a transform of exp(4j*np.pi*tt) rather than cos(4*np.pi*tt).
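If the underlying goal is simply a frequency-domain round trip for a real signal, one alternative sketch (not what the snippet above does) is the rfft/irfft pair, which keeps the conjugate-symmetric half implicitly and returns a real signal whose FFT again shows both the positive and negative frequency peaks:
import numpy as np

tt = np.arange(0, 10, .01)
signal_td = np.cos(2.0 * np.pi * 2.0 * tt)

half_fd = np.fft.rfft(signal_td)                # non-negative frequencies only
round_trip = np.fft.irfft(half_fd, n=len(tt))   # real-valued reconstruction

fd2 = np.fft.fft(round_trip)                    # peaks at both the +2 Hz and -2 Hz bins again
print(np.allclose(round_trip, signal_td))       # True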