I am trying to create an amplitude vs. frequency spectrogram of an audio file in Python. What is the procedure to do so?
Some sample code would be of great help.
Simple spectrum
The simplest way to get an amplitude vs. frequency relationship for an evenly sampled signal x is to compute its Discrete Fourier Transform through the efficient Fast Fourier Transform algorithm. Given a signal x sampled at a regular sampling rate fs, you could do this with:
import numpy as np
Xf_mag = np.abs(np.fft.fft(x))
Each index of the Xf_mag array will then contain the amplitude of a frequency bin whose frequency is given by index * fs/len(Xf_mag). These frequencies can be conveniently computed using:
freqs = np.fft.fftfreq(len(Xf_mag), d=1.0/fs)
Finally the spectrum could be plotted using matplotlib:
import matplotlib.pyplot as plt
plt.plot(freqs, Xf_mag)
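Since the question is about an audio file: one way to get x and fs is scipy.io.wavfile (a minimal sketch, assuming a WAV file; the filename 'audio.wav' is made up):
from scipy.io import wavfile
fs, x = wavfile.read('audio.wav')   # fs: sampling rate in Hz, x: sample array
if x.ndim > 1:                      # collapse a stereo file to mono
    x = x.mean(axis=1)
The FFT snippet above can then be applied to this x and fs as-is.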
Refining the spectrum estimation
You might notice that the spectrum obtained with the simple FFT approach appears very noisy (i.e. with lots of spikes).
To get a more accurate estimate, a more sophisticated approach is to compute a power spectrum estimate using techniques such as the periodogram (implemented by scipy.signal.periodogram) or Welch's method (implemented by scipy.signal.welch). Note however that in these cases the computed spectrum is proportional to the square of the amplitudes, so its square root provides an estimate of the Root-Mean-Squared (RMS) amplitudes.
Going back to the signal x sampled at a regular sampling rate fs, such a power spectrum estimate could thus be obtained as described in the samples from scipy's documentation with the following:
from scipy import signal
f, Pxx = signal.periodogram(x, fs)
A_rms = np.sqrt(Pxx)
The corresponding frequencies f are also calculated in the process, so you could then plot the result with
plt.plot(f, A_rms)
Using scipy.signal.welch is quite similar, but uses a slightly different implementation which provides a different accuracy/resolution tradeoff.
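For example (a sketch reusing the same x and fs; nperseg=1024 is an arbitrary choice, larger values give finer frequency resolution, smaller values a smoother estimate):
from scipy import signal
f, Pxx = signal.welch(x, fs, nperseg=1024)  # Welch: averaged periodograms of overlapping segments
A_rms = np.sqrt(Pxx)
plt.plot(f, A_rms)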
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
fs = 10e3
N = 1e5
amp = 2 * np.sqrt(2)
noise_power = 0.01 * fs / 2
time = np.arange(N) / float(fs)
mod = 500*np.cos(2*np.pi*0.25*time)
carrier = amp * np.sin(2*np.pi*3e3*time + mod)
noise = np.random.normal(scale=np.sqrt(noise_power), size=time.shape)
noise *= np.exp(-time/5)
x = carrier + noise
f, t, Sxx = signal.spectrogram(x, fs)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
This example is pulled from the scipy documentation, since you will need a scientific computing library such as scipy to create a spectrogram.
Install scipy on your machine if you do not have it already, and read its documentation:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.spectrogram.html
Related
I am building a lesson on radio modulation mathematics in a Jupyter notebook. I want to produce a graph showing the flat envelope of a simple sinusoid so I can compare it to the envelope of the product of two sinusoids.
I am representing my sinusoid as a sympy expression, albeit with a rather high frequency. When I plot it, the resulting graph is heavily distorted.
from sympy import *
%matplotlib inline
t = symbols('t')
f_carrier = 10000
carrier = cos(2*pi*f_carrier * t)
plot(carrier, (t, -0.001, 0.001))
Making the domain tighter (-0.0005 to 0.0005) produces less distortion, but still some:
Lower frequencies (with proportional domains) become progressively less distorted.
Following the documentation of plot, you should set the adaptive argument to False to allow a custom number of points, and set the nb_of_points parameter of plot to a high enough number to make the curve smooth (typically at least 10 points per cycle).
from sympy import *
%matplotlib inline
t = symbols('t')
f_carrier = 10000
carrier = cos(2*pi*f_carrier * t)
plot(carrier, (t, -0.001, 0.001), adaptive=False, nb_of_points=1000)
I have an audio signal which has a form similar to the one below. I have tried to use Scipy's optimize library to fit a sine function to the data; however, this does not seem to work due to the form of the data. How else could I fit a sine function to determine the frequency and amplitude?
There are many ways to extract this information out of the data. You can (and probably should) apply some spectral analysis for the most accurate results. Check out SciPy's spectrogram, for instance. However, to quickly get an estimate of the frequency, you could just look at the zero-crossings:
import numpy as np
import matplotlib.pyplot as plt
from math import pi
# Generate a 2 second array of data with millisecond resolution
time, timestep = np.linspace(0, 2, 2000, endpoint=False, retstep=True)
# Generate a constant frequency sine wave with varying amplitude
frequency = 42
amplitude = 1 / ((time - 1)**2 + 0.03)
data = amplitude * np.sin(2*pi*frequency*time)
plt.plot(time, data)
# Extract rising zero crossings
rising_zero_crossing_indices = np.where(np.diff(np.sign(data)) > 0)[0]
rising_zero_crossing_times = time[rising_zero_crossing_indices]
# Find the frequency
period = np.diff(rising_zero_crossing_times)
avg_frequency = 1/np.mean(period)
print(avg_frequency)
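If you also need the amplitude, one option (my own addition, not part of the zero-crossing idea above) is the envelope of the analytic signal:
from scipy.signal import hilbert
envelope = np.abs(hilbert(data))   # instantaneous amplitude via the Hilbert transform
plt.plot(time, envelope)
plt.show()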
I am trying to implement a periodogram in Python based on the description of Bartlett's method, and compared the result with Scipy's, setting overlap=0 and using window='boxcar' (rectangular window). However, my result is off by some scale factor. Can someone point out what is wrong with my code? Thanks
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
def my_bartlett_periodogram(x, fs, nperseg, nfft):
    nsegments = len(x) // nperseg
    psd = np.zeros(nfft)

    for segment in x.reshape(nsegments, nperseg):
        psd += np.abs(np.fft.fft(segment))**2 / nfft

    psd[0] = 0   # important!!
    psd /= nsegments
    psd = psd[0 : nfft//2]
    freq = np.linspace(0, fs/2, nfft//2)

    return freq, psd
def plot_output(t, x, f1, psd1, f2, psd2):
    fig, axs = plt.subplots(3, 1, figsize=(12, 15))
    axs[0].plot(t[:300], x[:300])
    axs[1].plot(f1, psd1)
    axs[2].plot(f2, psd2)
    axs[0].set_title('Input (len=8192, fs=512)')
    axs[1].set_title('Bartlett Periodogram (nfft=512, zero-overlap, no-window)')
    axs[2].set_title('Scipy Periodogram (nfft=512, zero-overlap, no-window)')
    axs[0].set_xticks([])
    axs[2].set_xlabel('Freq (Hz)')
    plt.show()
# Run
fs = nfft = nperseg = 512
t = np.arange(8192) / fs
x = np.sin(2*np.pi*50*t) + np.sin(2*np.pi*100*t) + np.sin(2*np.pi*150*t)
freq1, psd1 = my_bartlett_periodogram(x, fs, nperseg, nfft)
freq2, psd2 = signal.welch(x, fs, nperseg=nperseg, nfft=nfft, window='boxcar', noverlap=0)
plot_output(t, x, freq1, psd1, freq2, psd2)
TL;DR:
Nothing wrong with the code. But welch returns the power spectral density, which is the power spectrum divided by fs, and it compensates for cutting away half the spectrum by multiplying by 2.
To compensate, psd2 * fs / 2 should be very similar to psd.
According to Wikipedia the calculation of psd seems correct:
The original N point data segment is split up into K (non-overlapping) data segments, each of length M
For each segment, compute the periodogram by computing the discrete Fourier transform (DFT version which does not divide by M), then computing the squared magnitude of the result and dividing this by M.
Average the result of the periodograms above for the K data segments.
So whom shall we trust more, Wikipedia or scipy? I would tend towards the latter, but we can find out for ourselves. According to Parseval's theorem the integral over the squared signal should be the same as the integral over the squared FFT magnitude. Since the periodogram is obtained from the squared FFT, the theorem should hold approximately.
print(np.mean(x**2)) # ~1.5
print(np.mean(psd)) # (1.4999999999999991+0j)
print(np.mean(psd2)) # 0.0058365758754863788
That's close enough for psd, so let's assume it's correct. But I refuse to believe that scipy should be so blatantly wrong! Let's take a closer look at the documentation and see what they have to say about the scaling argument (emphasis mine):
Selects between computing the power spectral density (‘density’) where Pxx has units of V**2/Hz and computing the power spectrum (‘spectrum’) where Pxx has units of V**2, if x is measured in V and fs is measured in Hz. Defaults to ‘density’
Uh-huh! welch's result is the power spectral density, which means it has units of power per Hz. However, we compared it against the signal power. If we multiply psd2 by the sampling rate to get rid of the 1/Hz units, it's the same as psd. Well, except for a factor of 2. This factor compensates for cutting away half the spectrum. If we set return_onesided=False to get the full spectrum, that factor is gone.
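A quick numerical check, reusing x, fs, nperseg, nfft and the results psd1 (the Bartlett estimate, called psd in the discussion above) and psd2 from the code in the question (a sketch; the Bartlett version zeroed its DC bin, so only the bins from index 1 upward are compared):
# One-sided density: multiplying by fs and removing the factor of 2
# recovers the Bartlett estimate for the non-DC bins.
print(np.allclose(psd1[1:], psd2[1:-1] * fs / 2))   # expected: True

# Two-sided density: the factor of 2 is gone, only fs remains.
_, psd_2s = signal.welch(x, fs, nperseg=nperseg, nfft=nfft, window='boxcar',
                         noverlap=0, return_onesided=False)
print(np.allclose(psd1[1:], psd_2s[1:nfft//2] * fs))   # expected: True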
I have an arbitrary signal and I need to know its frequency spectrum, which I obtain by doing an FFT. The issue is that I need lots of resolution only around one particular frequency; if I increase my window width or up the sample rate, it goes too slow and I end up with a lot of detail everywhere. I only want a lot of detail at one point and minimal detail everywhere else.
I tried using a Goertzel filter around just the area I need, and then FFT everywhere else, but that didn't get me any more resolution, which I suppose was to be expected.
Any ideas? My only idea at the moment is to sweep and innerproduct around the value I want.
Thanks.
Increasing the sample rate will not give you a higher spectral resolution, it will only give you more high-frequency information, which you are not interested in. The only way to increase spectral resolution is to increase the window length. There is a way to increase the length of your window artificially by zero-padding, but this only gives you 'fake resolution', it will just yield a smooth curve between the normal points. So the only way is to measure data over a longer period, there is no free lunch.
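To see the zero-padding point concretely (a minimal sketch with made-up numbers):
import numpy as np
fs = 1000.0
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 123.4 * t)
# Both transforms describe the same data; the zero-padded one only evaluates
# the same underlying spectrum on a finer grid, it cannot separate two tones
# closer together than roughly fs/len(x).
X_plain  = np.abs(np.fft.rfft(x))                  # len(x)//2 + 1 points
X_padded = np.abs(np.fft.rfft(x, n=16 * len(x)))   # 16x more points, same resolution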
For the problem you described, the standard way to reduce the computation time of the FFT is to use demodulation (or heterodyning, not sure what the official name is). Multiply your data with a sine with a frequency close to your frequency of interest (it could be the exact frequency, but that is not necessary), and then decimate your data (low-pass filtering with a corner frequency just below the Nyquist frequency of your down-sampled rate, followed by down-sampling). In this way you have far fewer points, so your FFT will be faster. The resulting spectrum will be similar to your original spectrum, but simply shifted by the demodulation frequency. So when making a plot, simply add f_demod to your x-axis.
One thing to be careful about is that if you multiply with a real sine, your down-sampled spectrum will actually be the sum of two mirrored spectra, since a real sine consists of positive and negative frequencies. There are two solutions to this:
1. Demodulate by both a sine and a cosine of the same frequency, so that you obtain two spectra, after which taking the sum or difference will get you your spectrum.
2. Demodulate by multiplying with a complex sine of the form exp(2*pi*i*f_demod*t). The input for your FFT will now be complex, so you will have to calculate a two-sided spectrum. But this is exactly what you want: you will get both the frequencies below and above f_demod.
I prefer the second solution. Quick example:
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import psd
from scipy.signal import decimate
f_line = 123.456
f_demod = 122
f_sample = 1000
t_total = 100
t_win = 10
ratio = 10
t = np.arange(0, t_total, 1 / f_sample)
x = np.sin(2*np.pi*f_line * t) + np.random.randn(len(t)) # sine plus white noise
lo = 2**.5 * np.exp(-2j*np.pi*f_demod * t) # local oscillator
y = decimate(x * lo, ratio) # demodulate and decimate to 100 Hz
z = decimate(y, ratio) # decimate further to 10 Hz
nfft = int(round(f_sample * t_win))
X, fx = psd(x, NFFT=nfft, noverlap=nfft//2, Fs=f_sample)
nfft = int(round(f_sample * t_win / ratio))
Y, fy = psd(y, NFFT=nfft, noverlap=nfft//2, Fs=f_sample/ratio)
nfft = int(round(f_sample * t_win / ratio**2))
Z, fz = psd(z, NFFT=nfft, noverlap=nfft//2, Fs=f_sample/ratio**2)
plt.semilogy(fx, X, fy + f_demod, Y, fz + f_demod, Z)
plt.xlabel('Frequency (Hz)')
plt.ylabel('PSD (V^2/Hz)')
plt.legend(('Full bandwidth FFT', '100 Hz FFT', '10 Hz FFT'))
plt.show()
Result:
If you zoom in, you will note that the results are virtually identical within the pass-band of the decimation filter. One thing to be careful of is that the low-pass filters used in decimate will become numerically unstable if you use decimation ratios much larger than 10. The solution is to decimate in several passes for large ratios, i.e. to decimate by a factor of 1000, you decimate three times by a factor of 10.
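For instance, a factor-1000 decimation done in three stages (a sketch reusing scipy.signal.decimate from the example above):
# Decimate by 10 three times instead of by 1000 at once, which keeps the
# anti-aliasing filters numerically well behaved.
x_slow = x
for _ in range(3):
    x_slow = decimate(x_slow, 10)   # 1000 Hz -> 1 Hz sample rate after three stages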
I'm trying to perform an FFT of an EEG signal in Python, and then based on the bandwidth determine whether it's an alpha or beta signal. It looked fine, but the resulting plots are nothing like they should be; the frequencies and magnitude values are not what I expected. Any help appreciated, here's the code:
from scipy.io import loadmat
import scipy
import numpy as np
from pylab import *
import matplotlib.pyplot as plt
eeg = loadmat("eeg_2013.mat");
eeg1=eeg['eeg1'][0]
eeg2=eeg['eeg2'][0]
fs = eeg['fs'][0][0]
fft1 = scipy.fft(eeg1)
f = np.linspace (fs,len(eeg1), len(eeg1), endpoint=False)
plt.figure(1)
plt.subplot(211)
plt.plot (f, abs (fft1))
plt.title ('Magnitude spectrum of the signal')
plt.xlabel ('Frequency (Hz)')
show()
plt.subplot(212)
fft2 = scipy.fft(eeg2)
f = np.linspace (fs,len(eeg2), len(eeg2), endpoint=False)
plt.plot (f, abs (fft2))
plt.title ('Magnitude spectrum of the signal')
plt.xlabel ('Frequency (Hz)')
show()
And the plots:
In order to get an array of the FFT frequencies, you should use fftfreq; it gives you an array of frequencies to use as the abscissa (x-axis):
from scipy.fftpack import fftfreq
eeg = loadmat("eeg_2013.mat");
eeg1=eeg['eeg1'][0]
eeg2=eeg['eeg2'][0]
fs = eeg['fs'][0][0]
fft1 = scipy.fft(eeg1)
f=fftfreq(eeg1.size,1/fs)
Sorry, I can't test this code in real conditions because you didn't post a data sample, but I hope this works.
Concerning how to determine the bandwidth: as far as I understand, you want to get the fundamental frequency. There are different ways, more or less complicated depending on whether your signal is noisy or not. In your case, you only want to know whether the fundamental frequency f0 is in the range 8-13 Hz (alpha) or 13-30 Hz (beta); one very simple way is to compute the maximum of the FFT in the range 8-13 Hz, fft1[(f>8) & (f<13)].max(), and if it's more than, say, 1000, it's an alpha wave, otherwise it's beta. If your signals are less similar, please post some examples of different kinds of samples and the result you would expect, so that we can try more complicated algorithms.
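A small sketch of that comparison, reusing f and fft1 from above (comparing the two band maxima directly avoids picking an absolute threshold like 1000):
import numpy as np
mag = np.abs(fft1)
alpha_peak = mag[(f > 8) & (f < 13)].max()
beta_peak  = mag[(f > 13) & (f < 30)].max()
print('alpha' if alpha_peak > beta_peak else 'beta')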
If your sampling frequency is fs and you have N=len(eeg1) samples, then the fft procedure will, of course, return an array of N values. The first N/2 of them correspond to the frequency range 0..fs/2; the second half corresponds to the mirrored frequency range -fs/2..0. For real input signals the mirrored half is just the complex conjugate of the positive half, so it can be disregarded in further analysis (but not in the inverse fft).
So essentially, the frequency axis should be
f = linspace(0, N-1, N) * fs/N
Edit: or even simpler, with minimal changes to the initial code
f = np.linspace (0,fs,len(eeg1), endpoint=False)
so that f ranges from 0 to just below fs; then disregard the second half of the fft result in the output:
plt.plot(f[0:N//2], abs(fft1[0:N//2]))
Added: You can use fftshift to exchange both halves, then the correct frequency range is
f = np.linspace (-fs/2,fs/2,len(eeg1), endpoint=False)
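and the FFT result itself should be reordered the same way before plotting, for example:
plt.plot(f, np.abs(np.fft.fftshift(fft1)))   # fftshift moves the negative-frequency half to the front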