Unsure how to use FFT data for spectrum analyzer - python

I'm trying to create a homemade spectrum analyzer with 8 strips of LEDs.
The part I'm struggling with is performing the FFT and understanding how to use the results.
So far this is what I have:
import opc
import time
import pyaudio
import wave
import sys
import numpy
import math
CHUNK = 1024

# Gets the pitch from the audio
def pitch(signal):
    # NOT SURE IF ANY OF THIS IS CORRECT
    signal = numpy.fromstring(signal, 'Int16')
    print("signal = ", signal)
    testing = numpy.fft.fft(signal)
    print("testing = ", testing)
wf = wave.open(sys.argv[1], 'rb')
RATE = wf.getframerate()
p = pyaudio.PyAudio() # Instantiate PyAudio

# Open stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

# Read data
data = wf.readframes(CHUNK)

# Play stream
while data != b'':
    stream.write(data)
    data = wf.readframes(CHUNK)
    frequency = pitch(data)
    print("%f frequency" % frequency)
I'm struggling with what to do in the pitch method. I know I need to perform an FFT on the data that is passed in, but I'm really unsure how to do it.
Also, should I even be using this function?

Because of the way np.fft.fft works, if you use 1024 data points you will get values for 512 frequencies (plus a value for zero Hz, the DC offset). If you only want 8 frequencies you have to feed it 16 data points.
You might be able to do what you want by downsampling by a factor of 64 - then 16 downsampled points would be time-equivalent to 1024 original points. I've never explored this, so I don't know what it entails or what the pitfalls might be.
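Purely as a hedged sketch of that downsampling idea (untested; the two-stage decimate and the variable names are my assumptions, and data/samp_rate are as loaded with scipy.io.wavfile.read below):
import numpy as np
from scipy import signal

samples = data[:1024, 0].astype(float)                  # 1024 samples of one channel
small = signal.decimate(signal.decimate(samples, 8), 8) # 1024 -> 128 -> 16 points
spectrum = np.abs(np.fft.rfft(small))                   # 9 bins: DC plus 8 frequencies
led_levels = spectrum[1:]                               # drop the DC bin, one value per LED strip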
You're going to have to do some learning - The Scientist and Engineer's Guide to Digital Signal Processing really is an excellent resource, at least it was for me.
Keep in mind that for an audio CD .wav file the sample frequency is 44100 Hz - a 1024-sample chunk is only 23 ms of the sound.
scipy.io.wavfile.read makes getting the data easy.
samp_rate, data = scipy.io.wavfile.read(filename)
data is a 2-d numpy array with one channel in column zero, data[:,0], and the other in column one, data[:,1].
Matplotlib's specgram and psd functions can give you the data you want. A graphing analog to what you are trying to do would be:
from matplotlib import pyplot as plt
import scipy.io.wavfile
samp_rate, data = scipy.io.wavfile.read(filename)
Pxx, freqs, bins, im = plt.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
plt.show()
plt.close()
Since you aren't doing any plotting, just use matplotlib.mlab.specgram.
Pxx, freqs, t = matplotlib.mlab.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
Its return values (Pxx, freqs, t) are:
- *Pxx*: 2-D array, columns are the periodograms of successive segments
- *freqs*: 1-D array of frequencies corresponding to the rows in Pxx
- *t*: 1-D array of times corresponding to midpoints of segments.
Pxx[1:, 0] would be the values for the frequencies for T0, Pxx[1:, 1] for T1, Pxx[1:, 2] for T2, ... This is what you would feed to your display. You don't use Pxx[0, :] because it is for 0 Hz.
See also: power spectral density - matplotlib.mlab.psd()
Maybe another strategy to get down to 8 bands would be to use large chunks and normalize the values. Then you could break the values up into eight segments and get the sum of each segment. I think this is valid - maybe only for the power spectral density. sklearn.preprocessing.normalize
w = sklearn.preprocessing.normalize(Pxx[1:,:], norm = 'l1', axis = 0)
But then again, I just made all that up.
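A rough sketch of that band-summing idea (equally made up - the eight-way numpy.array_split and the single time slice are my choices):
import numpy as np
from matplotlib import mlab

Pxx, freqs, t = mlab.specgram(data[:, 0], NFFT=1024, noverlap=0, Fs=samp_rate)
spectrum = Pxx[1:, 0]                          # one time slice, DC bin dropped
bands = np.array_split(spectrum, 8)            # eight roughly equal segments
led_values = np.array([band.sum() for band in bands])
led_values /= led_values.sum()                 # L1 normalization, like the sklearn call above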

I don't know about the scipy.io.wavfile.read function that @wwii mentions in his answer, but it seems that his suggestion is the way to go for loading the signal. However, I just wanted to comment on the Fourier transform.
What I imagine you intend to do with your LED setup is to change each LED's brightness according to the power of the spectrum in each of the 8 frequency bands you intend to use. Thus, what I understand you need is to compute, in some way, the power as time goes by. The first complication is "how to compute the spectral power?"
The best way to do this is with numpy.fft.rfft, which computes the Fourier transform for signals that only have real numbers (not complex numbers). On the other hand, the function numpy.fft.fft is a general-purpose function that can compute the fast Fourier transform for signals with complex numbers. The conceptual difference is that numpy.fft.fft can be used to study travelling waves and their propagation direction. This is seen because the returned amplitudes correspond to positive or negative frequencies that indicate how the wave travels. numpy.fft.rfft yields the amplitude for real-valued frequencies, as seen in numpy.fft.rfftfreq, which is what you need.
The last issue is to choose appropriate frequency bands in which to compute the spectral power. The human ear has a huge frequency response range, and the width of each band varies a great deal, with the low-frequency bands being very narrow and the high-frequency bands being very wide. Googling around, I found this nice resource that defines 7 relevant frequency bands:
Sub-bass: 20 to 60 Hz
Bass: 60 to 250 Hz
Low midrange: 250 to 500 Hz
Midrange: 500 Hz to 2 kHz
Upper midrange: 2 to 4 kHz
Presence: 4 to 6 kHz
Brilliance: 6 to 20 kHz
I would suggest using these bands, but splitting the upper midrange into 2-3 kHz and 3-4 kHz. That way you'll be able to use your 8-LED setup. Here is an updated pitch function for you to use:
import sys
import wave
import numpy

wf = wave.open(sys.argv[1], 'rb')
CHUNK = 1024
RATE = wf.getframerate()
DT = 1./float(RATE) # time between two successive audio frames
FFT_FREQS = numpy.fft.rfftfreq(CHUNK, DT)
FFT_FREQS_INDS = -numpy.ones_like(FFT_FREQS)
bands_bounds = [[20, 60],       # Sub-bass
                [60, 250],      # Bass
                [250, 500],     # Low midrange
                [500, 2000],    # Midrange
                [2000, 3000],   # Upper midrange 0
                [3000, 4000],   # Upper midrange 1
                [4000, 6000],   # Presence
                [6000, 20000]]  # Brilliance
# Assign each FFT frequency to one of the 8 bands (-1 = outside all bands)
for f_ind, freq in enumerate(FFT_FREQS):
    for led_ind, bounds in enumerate(bands_bounds):
        if bounds[0] <= freq < bounds[1]:
            FFT_FREQS_INDS[f_ind] = led_ind

# Returns the spectral power in each of the 8 bands assigned to the LEDs
def pitch(signal):
    # CONSIDER SWITCHING TO scipy.io.wavfile.read TO GET SIGNAL
    signal = numpy.frombuffer(signal, numpy.int16)  # fromstring is deprecated
    amplitude = numpy.fft.rfft(signal.astype(float))
    power = [numpy.sum(numpy.abs(amplitude[FFT_FREQS_INDS == led_ind])**2)
             for led_ind in range(len(bands_bounds))]
    return power
The first part of the code computes the FFT frequencies and constructs the array FFT_FREQS_INDS, which indicates which of the 8 frequency bands each FFT frequency belongs to. Then, in pitch, the power of the spectrum in each of the bands is computed. Of course, this can be optimized, but I tried to make the code self-explanatory.
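As a hedged illustration (not part of the answer), the playback loop from the question could feed pitch into the LED update roughly like this; the log compression and the 0-255 scaling are my assumptions:
import numpy

data = wf.readframes(CHUNK)
while len(data) > 0:
    stream.write(data)
    if len(data) == CHUNK * wf.getsampwidth() * wf.getnchannels():
        power = numpy.asarray(pitch(data))
        levels = numpy.log10(power + 1.0)   # compress the dynamic range
        brightness = (255 * levels / max(levels.max(), 1e-12)).astype(int)
        # brightness now holds one 0-255 value per LED strip
    data = wf.readframes(CHUNK)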

Related

Librosa - Audio Spectrogram/Frequency Bins to Spectrum

I've read around for several days but haven't been able to find a solution... I'm able to build Librosa spectrograms and extract amplitude/frequency data using the following:
import numpy as np
import scipy.signal
import librosa

audio, sr = librosa.load('short_piano melody_keyCmin_110bpm.wav', sr=22500)
spectrum = librosa.stft(audio, n_fft=2048, window=scipy.signal.windows.hamming)
D = librosa.amplitude_to_db(np.abs(spectrum), ref=np.max)
n = D.shape[0]
Nfft = 1 + 2*(n - 1)
freq_bins = librosa.fft_frequencies(sr=sr, n_fft=Nfft)
However, I cannot turn the data in D and freq_bins back into a spectrum. Once I am able to do this I can convert the new spectrum into a .wav file and listen to my reconstructed audio... Any advice would be appreciated! Thank you.
If I understand your question correctly, you want to reconstruct the real/imaginary spectrum from your magnitude values. You will need the phase component for that; then it's all simple complex-number arithmetic. You should be aware that the output of an STFT is an array of complex numbers, and the amplitude is the absolute value of each number, while the phase is the angle of each number.
Here's an example of a time-domain signal transformed to magnitude/phase and back without modifying it:
# get the complex valued spectrum from a sample
spectrum = librosa.stft(audio, n_fft=2048,window=scipy.signal.windows.hamming)
# get magnitude and phase from the complex numbers
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)
# reconstruct real/imaginary parts from magnitude and phase
spectrum = magnitude * np.exp(1j*phase)
# transform back to time-domain
audio_out = librosa.istft(spectrum)
In your case, you should first convert the dB values back to amplitude values, of course. Even having no experience with librosa, I'm sure that there is also a function for that.
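For completeness, a hedged sketch of that inversion applied to the code in the question; librosa.db_to_amplitude and the handling of the ref=np.max reference are my assumptions about librosa's API, so double-check against the docs:
import numpy as np
import scipy.signal
import librosa

ref = np.abs(spectrum).max()                     # the ref=np.max used above
magnitude = librosa.db_to_amplitude(D, ref=ref)  # undo amplitude_to_db
phase = np.angle(spectrum)
reconstructed = magnitude * np.exp(1j * phase)
audio_out = librosa.istft(reconstructed, window=scipy.signal.windows.hamming)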

Normalizing FFT spectrum magnitude to 0dB

I'm using the FFT to extract the amplitude of each frequency component from an audio file. Actually, there is already a function called Plot Spectrum in Audacity that can help to solve the problem. Taking this example audio file, which is composed of a 3 kHz sine and a 6 kHz sine, the spectrum result is like the following picture. You can see peaks at 3 kHz and 6 kHz, and no extra frequencies.
Now I need to implement the same function and plot the similar result in Python. I'm close to the Audacity result with the help of rfft but I still have problems to solve after getting this result.
What's the physical meaning of the amplitude in the second picture?
How do I normalize the amplitude to 0 dB like the one in Audacity?
Why do the frequencies over 6 kHz have such high amplitudes (≥ 90)? Can I scale those frequencies to a relatively low level?
Related code:
import numpy as np
from pylab import plot, show
from scipy.io import wavfile
sample_rate, x = wavfile.read('sine3k6k.wav')
fs = 44100.0
rfft = np.abs(np.fft.rfft(x))
p = 20*np.log10(rfft)
f = np.linspace(0, fs/2, len(p))
plot(f, p)
show()
Update
I multiplied the whole-length signal by a Hanning window (is that correct?) and got this. Most of the amplitude of the skirts is below 40.
And I scaled the y-axis to decibels as @Mateen Ulhaq said. The result is closer to the Audacity one. Can I treat amplitudes below -90 dB as so low that they can be ignored?
Updated code:
fs, x = wavfile.read('input/sine3k6k.wav')
x = x * np.hanning(len(x))
rfft = np.abs(np.fft.rfft(x))
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max)
f = np.linspace(0, fs/2, len(p))
About the bounty
With the code in the update above, I can measure the frequency components in decibels. The highest possible value will be 0 dB. But the method only works for a specific audio file because it uses the rfft_max of that audio. I want to measure the frequency components of multiple audio files against one standard reference, just like Audacity does.
I also started a discussion in Audacity forum, but I was still not clear how to implement my purpose.
After doing some reverse engineering on the Audacity source code, here are some answers. First, they use Welch's algorithm for estimating the PSD. In short, it splits the signal into overlapping segments, applies a window function, applies the FFT, and averages the result. Averaging helps to get better results when noise is present. Anyway, after extracting the necessary parameters, here is a solution that approximates Audacity's spectrogram:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt
segment_size = 512
fs, x = wavfile.read('sine3k6k.wav')
x = x / 32768.0 # scale signal to [-1.0 .. 1.0]
noverlap = segment_size // 2
f, Pxx = signal.welch(x,                    # signal
                      fs=fs,                # sample rate
                      nperseg=segment_size, # segment size
                      window='hann',        # window type to use
                      nfft=segment_size,    # num. of samples in FFT
                      detrend=False,        # remove DC part
                      scaling='spectrum',   # return power spectrum [V^2]
                      noverlap=noverlap)    # overlap between segments
# set 0 dB to energy of sine wave with maximum amplitude
ref = (1/np.sqrt(2))**2 # simply 0.5 ;)
p = 10 * np.log10(Pxx/ref)
fill_to = -150 * (np.ones_like(p)) # anything below -150dB is irrelevant
plt.fill_between(f, p, fill_to )
plt.xlim([f[2], f[-1]])
plt.ylim([-90, 6])
# plt.xscale('log') # uncomment if you want log scale on x-axis
plt.xlabel('f, Hz')
plt.ylabel('Power spectrum, dB')
plt.grid(True)
plt.show()
Some necessary explanations on parameters:
The wave file is read as 16-bit PCM; in order to be compatible with Audacity, it should be scaled to |A| < 1.0.
segment_size corresponds to Size in Audacity's GUI.
The default window type is Hanning ('hann' in SciPy) - you can change it if you want.
The overlap is segment_size/2, as in the Audacity code.
The output window is framed to follow Audacity's style: they throw away the first low-frequency bins and cut everything below -90 dB.
What's physical meaning of the amplitude in the second picture?
It is basically the amount of energy in the frequency bin.
How to normalize the amplitude to 0dB like the one in Audacity?
You need to choose some reference point. Graphs in decibels are always relative to something. When you select the maximum-energy bin as a reference, your 0 dB point is the maximum energy (obviously). It is acceptable to set the reference to the energy of a sine wave with maximum amplitude. See the ref variable. Power in a sinusoidal signal is simply the squared RMS, and to get the RMS you just divide the amplitude by sqrt(2). So the scaling factor is simply 0.5. Please note that the factor before log10 is 10 and not 20; this is because we are dealing with the power of the signal and not the amplitude.
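A quick worked check of that reference (my addition): a full-scale sine of amplitude 1.0 has power (1/sqrt(2))^2 = 0.5, so its strongest bin lands at exactly 0 dB:
import numpy as np

peak_power = 0.5                        # power of a unit-amplitude sine: RMS^2
ref = (1/np.sqrt(2))**2                 # the same 0.5 used as reference above
print(10 * np.log10(peak_power / ref))  # -> 0.0 dB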
Can I treat the amplitude below -90dB so low that it can be ignored?
Yes, anything below -40 dB is usually considered negligible.

What is the proper way to plot spectrum of a complex signal sampled in a narrow range?

I have some complex data (a small bandwidth around a set frequency) that I'm curious to plot, but I am slightly lost as to how I should proceed in interpreting a complex signal sampled in a particular range.
So, for instance, here is the code (and my rather poor attempt at the problem) that I wrote to have a clean example to experiment with: an artificial signal that generates a complex representation of a 78 kHz wave. What I am trying to do is get a plot that is centred at 120 kHz and spans from 70 to 170 kHz, mimicking the narrow sampling range of the real receiver.
import numpy as np
import matplotlib.pyplot as plt
#sampling rate, samples/second; 100 kHz
rate = 100*10**3
#sample spacing in time, seconds/sample
interval = np.true_divide(1, rate)
#length of the fourier transform
n = 256
#time vector
t = np.linspace(0.0, n*interval, n)
#frequency of artificial signal; 78 kHz
f = 78*10**3
#complex signal
s = np.exp(1j*2*np.pi*f*t)
#dft of the data
dft = np.fft.fft(s)
#frequency bins
x = np.fft.fftfreq(n, d=interval)
#center zero-frequency component in data; take absolute values
dft = np.abs(np.fft.fftshift(dft))
#center zero frequency component in bins; naively add the center frequency, 120 kHz
x = np.fft.fftshift(x) + 120*10**3
plt.plot(x, dft)
plt.show()
The output is wrong, as expected from the crude attempt to mimic a particular frequency range.
Plot made by the code snippet above
P.S. A different plot, with f = 88*10**3 - why has the magnitude suddenly changed here?
Edit: My post has been marked as a duplicate of a topic related purely to plotting, while what I'm actually after is processing and/or inversion of bandpass-filtered data.
Nyquist Frequency and Aliasing
Your signal should be a (complex) exponential oscillation at +78 kHz, sampled at 100 kHz. This doesn't work. What you see instead is an alias frequency at -22 kHz (78 kHz - 100 kHz). You have to make sure that no frequency of your signal is higher than half your sampling frequency. For a signal of 78 kHz take a sample frequency of 200 kHz, for example.
import numpy as np
from matplotlib import pyplot as plt
sample_frequency = 200e3 # 200 kHz
sample_interval = 1 / sample_frequency
samples = 256 # you don't necessarily have to use a power of 2
time = np.linspace(0, samples*sample_interval, samples)
signal_frequency = 78e3 # 78 kHz
signal = np.exp(2j*np.pi*signal_frequency*time)
FFT and fftfreq
np.fft.fftfreq already returns the right frequencies; adding a "center frequency" makes no sense. Don't do it.
signal_spectrum = np.fft.fftshift(np.fft.fft(signal))
freqs = np.fft.fftshift(np.fft.fftfreq(samples, d=sample_interval))
Plotting
The plotting part of your question is only about setting the axes. Use plt.xlim.
plt.figure(figsize=(10,5))
plt.plot(freqs / 1e3, np.abs(signal_spectrum)) # in kHz
plt.xlim(70, 170)
The plotted line ends just before 100 kHz, because as mentioned above your signal cannot have a frequency part higher than your half sample frequency.
Magnitude of spectrum
Since your signal is time-discrete (several single samples, not a continuous function), your spectrum is continuous. The Discrete Fourier Transform, however, only returns discrete samples of the continuous spectrum. If you would fit a curve through the sampling points, its peak would have the same magnitude for different frequencies.
Alternatively, you could increase the number of FFT sampling points by zero-padding your signal (take a look at the numpy.fft.fft documentation):
signal_spectrum = np.fft.fftshift(np.fft.fft(signal, 10*samples))
freqs = np.fft.fftshift(np.fft.fftfreq(10*samples, d=sample_interval))
plt.figure(figsize=(10,5))
plt.plot(freqs / 1e3, np.abs(signal_spectrum)) # in kHz
plt.xlim(65, 95)
plt.grid()
If you're asking yourself why the spectrum looks so rippled, take a look at spectral leakage.
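If you want to see the effect, a small sketch (my addition, using a Hann window as one common choice) that tapers the signal before the zero-padded FFT from above:
windowed = signal * np.hanning(samples)  # taper the edges of the 256 samples
signal_spectrum = np.fft.fftshift(np.fft.fft(windowed, 10*samples))
plt.plot(freqs / 1e3, np.abs(signal_spectrum))  # the side lobes drop noticeably
plt.xlim(65, 95)
plt.show()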

FFT using Python - unexpected low frequencies

I'm still trying to get a frequency analysis of this data using the FFT in Python.
The sampling rate is 1 data point per minute.
My code is:
from scipy.fftpack import fft
df3 = pd.read_csv('Pressure - Dates by Minute.csv', sep=",", skiprows=0)
df3['Pressure FFT'] = df3['ATMOSPHERIC PRESSURE (hPa) mean'] - df3['ATMOSPHERIC PRESSURE (hPa) mean'].mean()
Pressure = df3['Pressure FFT']
Fs = 1/60
Ts = 1.0/Fs
n = len(Pressure)
k = np.arange(n)
T = n/Fs
t = np.arange(0,1,1/n) # time vector
frq = k/T # two sides frequency range
frq = frq[range(int(n/2))] # one side frequency range
Y = np.fft.fft(Pressure)/n # fft computing and normalization
Y = Y[range(int(n/2))]
fig, ax = plt.subplots(2, 1)
ax[0].plot(t,Pressure)
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Amplitude')
ax[1].plot(frq,abs(Y),'r') # plotting the spectrum
ax[1].set_xlabel('Freq (Hz)')
ax[1].set_ylabel('|Y(freq)|')
But the result gives:
So my problems are:
1) Why are there no frequencies at all? The data is clearly periodic.
2) Why is the frequency spectrum so low? (0 - 0.009)
3) Maybe I should try a different filtering technique?
Any insights?
Thanks!!!
1) Why are there no frequencies at all? The data is clearly periodic.
Well, there is frequency content; it's just not exactly visible because of its structure. Try changing the line that plots the frequency spectrum from ax[1].plot(frq,abs(Y),'r') to ax[1].semilogy(frq,abs(Y),'r')
This will result in:
Here we have applied a simple transformation that boosts low values and limits high values. For more information, please see this link. Of course, having removed the DC (as you do when you subtract the mean to create the 'Pressure FFT' column) helps too.
This still seems a bit blurry and it is, but if we zoom in to the lower part of the spectrum, we see this:
Which shows a spike at approximately 2.3e-05 Hz which corresponds to approximately 12 hours.
2) Why is the frequency spectrum so low? (0 - 0.009)
Because you sample once every 60 seconds, your sampling frequency is (approximately) 0.0167 Hz. Your spectrum contains everything between DC (0 Hz) and 0.0083 Hz, half the sampling frequency. For more information, please see this link.
3) Maybe I should try a different filtering technique?
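To spell out those numbers (my addition):
fs = 1.0 / 60       # one sample per minute -> about 0.0167 Hz
nyquist = fs / 2    # about 0.0083 Hz: the top of the one-sided spectrum
print(fs, nyquist)  # 0.016666..., 0.008333...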
You can try windowing if you can't resolve a harmonic but it doesn't look like it's needed here.
Hope this helps.
Part of the reason those frequencies seem so low is that the time axis in your amplitude plot is scaled weirdly. If you really have one sample per 60 seconds, then the x-axis should range between 0 and 1690260 seconds (i.e. ~20 days!).
By eye, you seem to have about one small peak every 50000 seconds (~2 per day), which would correspond to a frequency of about 2x10⁻⁵ Hz. Your periodogram therefore looks pretty reasonable to me, given how massive the scale of the x-axis is.
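A hedged sketch of that axis fix (variable names reuse the question's code; the rfft switch is my choice): build the time vector in seconds instead of normalizing it to 0-1, and the frequencies fall out of rfftfreq directly.
import numpy as np

n = len(Pressure)
t = np.arange(n) * 60.0            # seconds, one sample per minute
frq = np.fft.rfftfreq(n, d=60.0)   # one-sided frequency axis in Hz
Y = np.fft.rfft(Pressure) / n      # normalized one-sided spectrum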

Python frequency detection

OK, what I'm trying to do is a kind of audio-processing software that can detect a prevalent frequency, and if that frequency is played for long enough (a few ms) I know I have a positive match. I know I would need to use FFT or something similar, but in this field of math I'm weak; I did search the internet but did not find code that could do only this.
The goal I'm trying to achieve is to make myself a custom protocol to send data through sound. I need a very low bitrate (5-10 bps), but I'm also very limited on the transmitting end, so the receiving software will need to be custom (I can't use an actual hardware/software modem). I also want this to be software only (no additional hardware except the sound card).
Thanks a lot for the help.
The aubio libraries have been wrapped with SWIG and can thus be used by Python. Among their many features include several methods for pitch detection/estimation including the YIN algorithm and some harmonic comb algorithms.
However, if you want something simpler, I wrote some code for pitch estimation some time ago, and you can take it or leave it. It won't be as accurate as using the algorithms in aubio, but it might be good enough for your needs. I basically just took the FFT of the data multiplied by a window (a Blackman window in this case), squared the FFT values, found the bin that had the highest value, and used quadratic interpolation around the peak, using the log of the max value and its two neighboring values, to find the fundamental frequency. The quadratic interpolation I took from some paper that I found.
It works fairly well on test tones, but it will not be as robust or as accurate as the other methods mentioned above. The accuracy can be increased by increasing the chunk size (or reduced by decreasing it). The chunk size should be a power of 2 to make full use of the FFT. Also, I am only determining the fundamental pitch for each chunk, with no overlap. I used PyAudio to play the sound while writing out the estimated pitch.
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np

chunk = 2048

# open up a wave
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()

# use a Blackman window
window = np.blackman(chunk)

# open stream
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=RATE,
                output=True)

# read some data
data = wf.readframes(chunk)

# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and multiply by the Blackman window
    indata = np.array(wave.struct.unpack("%dh" % (len(data)//swidth),
                                         data)) * window
    # Take the fft and square each value
    fftData = abs(np.fft.rfft(indata))**2
    # find the maximum
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0, y1, y2 = np.log(fftData[which-1:which+2:])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which + x1) * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    else:
        thefreq = which * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    # read some more data
    data = wf.readframes(chunk)

if data:
    stream.write(data)
stream.close()
p.terminate()
If you're going to use FSK (frequency shift keying) for encoding data, you're probably better off using the Goertzel algorithm so you can check just the frequencies you want, instead of a full DFT/FFT.
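A minimal Goertzel sketch (my addition, not from the answer, and not tested against a reference implementation) that measures the power at a single target frequency; for FSK you would run it once per candidate frequency on each chunk and compare the powers:
import numpy as np

def goertzel_power(samples, target_freq, sample_rate):
    """Power of `samples` at (approximately) `target_freq` Hz."""
    n = len(samples)
    k = int(0.5 + n * target_freq / sample_rate)  # nearest DFT bin
    w = 2.0 * np.pi * k / n
    coeff = 2.0 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # squared magnitude of the selected DFT bin
    return s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2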
You can find the frequency spectrum of sliding windows over your sound from here, and then check the presence of the prevalent frequency band by finding the area under the frequency-spectrum curve for that band from here.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc
np.random.seed(0)
# Sine sample with a frequency of 5 Hz plus some noise
sr = 32 # sampling rate
y = np.linspace(0, 5 * 2*np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 1, y.shape)
t = np.arange(len(y)) / float(sr)
# Generate frequency spectrum
spectrum, freqs, _ = plt.magnitude_spectrum(y, sr)
# Calculate percentage for a frequency range
lower_frq, upper_frq = 4, 6
ind_band = np.where((freqs > lower_frq) & (freqs < upper_frq))
plt.fill_between(freqs[ind_band], spectrum[ind_band], color='red', alpha=0.6)
frq_band_perc = auc(freqs[ind_band], spectrum[ind_band]) / auc(freqs, spectrum)
print('{:.1%}'.format(frq_band_perc))
# 19.8%
While I haven't tried audio processing with Python before, perhaps you could build something based on SciPy (or NumPy, which it builds on), frameworks for efficient scientific/engineering numerical computation. You might start by looking at scipy.fftpack for your FFT.
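As a starting point, a tiny sketch along those lines (the function and variable names are mine):
import numpy as np
from scipy import fftpack

def peak_frequency(samples, sample_rate):
    """Return the strongest frequency in `samples`, in Hz."""
    spectrum = np.abs(fftpack.fft(samples))
    freqs = fftpack.fftfreq(len(samples), d=1.0/sample_rate)
    half = len(samples) // 2                  # keep the positive frequencies
    return freqs[np.argmax(spectrum[:half])]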
