Python: Frequency Analysis of Sound Files

I am generating some sound files that play tones at various frequencies with a certain number of harmonics.
Ultimately, these sounds will be played on a device with a small speaker.
I have the frequency response curve of the speaker and want to do the following in Python:
- Plot the frequency spectrum of the sound file. I need to take the FFT of the file and plot it with gnuplot.
- Apply a nonlinear transfer function based on the frequency response curve in the data sheet.
- Plot the result after the function is applied.
Does anyone know:
- what the simplest way to do this would be, or
- of an application (GNU/Linux based) that could do this for me?

I know you didn't mention Pylab/Matplotlib, but it works. Here is an example (assumes a single-channel signal):
import numpy as np
import pylab
from scikits import audiolab
x, fs, enc = audiolab.wavread('schubert.wav')  # samples, sample rate, encoding
audiolab.play(x, fs)
N = int(4*fs)  # four seconds of audio
X = np.fft.fft(x[:N])
Xdb = 20*np.log10(np.abs(X))
f = np.linspace(0, fs, N, endpoint=False)
pylab.plot(f, Xdb)
pylab.xlim(0, 5000)  # view up to 5 kHz
pylab.show()
# To apply the speaker response, H must be an array of N gains, one per frequency in f
Y = X*H
y = np.real(np.fft.ifft(Y))
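The H in Y = X*H is left undefined above. A minimal sketch of one way to build it from a data sheet, assuming the response curve is available as a list of (frequency in Hz, level in dB) points and treating it as magnitude-only (zero phase). It uses rfft/irfft, so only the non-negative frequencies need a gain. The function name apply_response_curve and the example curve values are hypothetical:

```python
import numpy as np

def apply_response_curve(x, fs, curve_hz, curve_db):
    """Filter x by a magnitude-only response given as (Hz, dB) points.

    curve_hz / curve_db are assumed to be points read off the speaker
    data sheet, with frequencies in ascending order. Phase is ignored.
    """
    x = np.asarray(x, dtype=np.float64)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Linear interpolation of the dB curve at each FFT bin; np.interp
    # holds the first/last value outside the given frequency range.
    gain_db = np.interp(f, curve_hz, curve_db)
    H = 10.0 ** (gain_db / 20.0)        # dB -> linear amplitude gain
    return np.fft.irfft(X * H, n=len(x))

# Usage: a hypothetical curve that rolls off below 500 Hz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 1000 * t)
curve_hz = [0, 300, 500, 4000]
curve_db = [-30, -20, 0, 0]
y = apply_response_curve(x, fs, curve_hz, curve_db)
```

Because np.interp interpolates linearly in dB between the data-sheet points, the gain can follow any curve shape; the filtering itself is still a linear operation on the signal.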

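You can use NumPy and Matplotlib. Something like the code below (signal is the array of samples and fs its sample rate):
import numpy
import pylab
spectrum = numpy.fft.fft(signal)
frequencies = numpy.fft.fftfreq(len(spectrum), d=1.0/fs)  # in Hz
# fftshift puts the negative frequencies first so the line plot is ordered
pylab.plot(numpy.fft.fftshift(frequencies), numpy.fft.fftshift(numpy.abs(spectrum)))
pylab.show()
That will show a graph of the magnitude of the FFT spectrum.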
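Since the input is real, its spectrum is symmetric and only the non-negative half is needed. A small sketch, assuming the common 2/N scaling convention so that a sine of amplitude A shows up as a peak of height A, and writing a two-column text file that gnuplot can read (the function and file names are illustrative):

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """One-sided amplitude spectrum: a sine of amplitude A peaks at A."""
    x = np.asarray(x, dtype=np.float64)
    n = len(x)
    amp = np.abs(np.fft.rfft(x)) * 2.0 / n
    amp[0] /= 2.0                 # the DC bin has no negative-frequency twin
    if n % 2 == 0:
        amp[-1] /= 2.0            # neither does the Nyquist bin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp

fs = 1000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 100 * t)
freqs, amp = amplitude_spectrum(x, fs)

# Two-column text file; in gnuplot:  plot 'spectrum.dat' using 1:2 with lines
np.savetxt('spectrum.dat', np.column_stack((freqs, amp)))
```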

SciPy has an FFT and hooks nicely into gnuplot. You should be able to use the scipy.signal module to do the math.
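As one possible way to do the math with scipy.signal, the data-sheet curve can be turned into a linear-phase FIR filter with scipy.signal.firwin2 and applied with scipy.signal.lfilter. A sketch, assuming the curve is given as (Hz, dB) points; the function name speaker_fir, the 257 taps and the 64-point design grid are arbitrary illustrative choices:

```python
import numpy as np
from scipy import signal

def speaker_fir(curve_hz, curve_db, fs, numtaps=257):
    """Design a linear-phase FIR filter approximating a (Hz, dB) curve."""
    # firwin2 needs a frequency grid that starts at 0 Hz and ends at fs/2
    grid = np.linspace(0.0, fs / 2.0, 64)
    gain = 10.0 ** (np.interp(grid, curve_hz, curve_db) / 20.0)
    # numtaps is odd, which allows a non-zero gain at the Nyquist frequency
    return signal.firwin2(numtaps, grid, gain, fs=fs)

fs = 8000
b = speaker_fir([0, 1000, 2000, 4000], [0, 0, -20, -20], fs)
# Check the designed response at two frequencies (in Hz, because fs is given)
w, h = signal.freqz(b, worN=[500.0, 3000.0], fs=fs)

# Filtering a signal: the output is delayed by (numtaps - 1) / 2 samples
x = np.random.default_rng(0).standard_normal(fs)
y = signal.lfilter(b, [1.0], x)
```

More taps follow the curve more closely at low frequencies, at the cost of a longer delay.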

Related

Librosa - Audio Spectrogram/Frequency Bins to Spectrum

I've read around for several days but haven't been able to find a solution. I'm able to build Librosa spectrograms and extract amplitude/frequency data using the following:
import numpy as np
import scipy.signal
import librosa
audio, sr = librosa.load('short_piano melody_keyCmin_110bpm.wav', sr = 22500)
spectrum = librosa.stft(audio, n_fft=2048, window=scipy.signal.windows.hamming)
D = librosa.amplitude_to_db(np.abs(spectrum), ref=np.max)
n = D.shape[0]
Nfft = 1+2*(n-1)
freq_bins = librosa.fft_frequencies(sr=sr, n_fft=Nfft)
However, I cannot turn the data in D and freq_bins back into a spectrum. Once I am able to do this, I can convert the new spectrum into a .wav file and listen to my reconstructed audio. Any advice would be appreciated! Thank you.
If I understand your question correctly, you want to reconstruct the real/imaginary spectrum from your magnitude values. You will need the phase component for that; then it's all simple complex-number arithmetic. You should be aware that the output of an STFT is an array of complex numbers: the amplitude is the absolute value of each number, while the phase is the angle of each number.
Here's an example of a time-domain signal transformed to magnitude/phase and back without modifying it:
# get the complex-valued spectrum from a sample
spectrum = librosa.stft(audio, n_fft=2048, window=scipy.signal.windows.hamming)
# get magnitude and phase from the complex numbers
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)
# reconstruct real/imaginary parts from magnitude and phase
spectrum = magnitude * np.exp(1j*phase)
# transform back to time-domain
audio_reconstructed = librosa.istft(spectrum, window=scipy.signal.windows.hamming, length=len(audio))
In your case, you should first convert the dB values back to amplitude values, of course. Even having no experience with librosa, I'm sure there is also a function for that (librosa.db_to_amplitude).
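To check the idea without librosa, here is a round trip written with scipy.signal.stft/istft instead (a swapped-in API, not librosa's). It mirrors amplitude_to_db(..., ref=np.max) with a hand-written dB conversion, converts back, re-attaches the original phase and inverts. Note that you need the actual maximum value, not just the relative dB levels, to undo ref=np.max. The test tone and the 1e-10 floor are assumptions for illustration:

```python
import numpy as np
from scipy import signal

def db_to_amplitude(D, ref):
    """Invert D = 20 * log10(S / ref)."""
    return ref * 10.0 ** (D / 20.0)

fs = 22050
nperseg = 2048
t = np.arange(fs) / fs
audio = 0.5 * np.sin(2 * np.pi * 440 * t)               # 1 s test tone (assumption)

# Forward: complex STFT, split into magnitude (in dB) and phase
_, _, Z = signal.stft(audio, fs=fs, window='hamming', nperseg=nperseg)
magnitude = np.abs(Z)
phase = np.angle(Z)
ref = magnitude.max()                                    # the value ref=np.max refers to
D = 20 * np.log10(np.maximum(magnitude, 1e-10) / ref)   # floor avoids log10(0)

# Backward: dB -> amplitude, re-attach the original phase, inverse STFT
Z_back = db_to_amplitude(D, ref) * np.exp(1j * phase)
_, audio_back = signal.istft(Z_back, fs=fs, window='hamming', nperseg=nperseg)
audio_back = audio_back[:len(audio)]                     # istft may return a padded length
```

To listen to the result, scipy.io.wavfile.write('reconstructed.wav', fs, audio_back.astype(np.float32)) writes a float WAV. With librosa, the corresponding steps would be librosa.db_to_amplitude(D, ref=...) and librosa.istft(...); keep in mind that librosa.amplitude_to_db clips values more than top_db (80 dB by default) below the peak, so very quiet bins do not survive the round trip exactly.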

Normalizing FFT spectrum magnitude to 0dB

I'm using FFT to extract the amplitude of each frequency component from an audio file. Actually, there is already a function called Plot Spectrum in Audacity that can help to solve the problem. Taking this example audio file, which is composed of a 3 kHz sine and a 6 kHz sine, the spectrum result looks like the following picture. You can see peaks at 3 kHz and 6 kHz and no extra frequencies.
Now I need to implement the same function and plot a similar result in Python. I'm close to the Audacity result with the help of rfft, but I still have some problems to solve after getting this result:
What's the physical meaning of the amplitude in the second picture?
How do I normalize the amplitude to 0 dB like the one in Audacity?
Why do the frequencies above 6 kHz have such high amplitudes (≥ 90)? Can I scale those frequencies to a relatively low level?
Related code:
import numpy as np
from pylab import plot, show
from scipy.io import wavfile
sample_rate, x = wavfile.read('sine3k6k.wav')
fs = 44100.0
rfft = np.abs(np.fft.rfft(x))
p = 20*np.log10(rfft)
f = np.linspace(0, fs/2, len(p))
plot(f, p)
show()
Update
I multiplied a Hanning window with the whole-length signal (is that correct?) and got this. Most of the skirt amplitudes are below 40.
And I scaled the y-axis to decibels as @Mateen Ulhaq said. The result is closer to the Audacity one. Can I treat amplitudes below -90 dB as so low that they can be ignored?
Updated code:
import numpy as np
from pylab import plot, show
from scipy.io import wavfile
fs, x = wavfile.read('input/sine3k6k.wav')
x = x * np.hanning(len(x))
rfft = np.abs(np.fft.rfft(x))
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max)
f = np.linspace(0, fs/2, len(p))
plot(f, p)
show()
About the bounty
With the code in the update above, I can measure the frequency components in decibels. The highest possible value will be 0 dB. But the method only works for a specific audio file, because it uses the rfft_max of that audio. I want to measure the frequency components of multiple audio files against one standard reference, just like Audacity does.
I also started a discussion in the Audacity forum, but I am still not clear how to achieve my purpose.
After doing some reverse engineering on the Audacity source code, here are some answers. First, they use the Welch algorithm for estimating the PSD. In short, it splits the signal into overlapping segments, applies a window function, applies an FFT and averages the results. This mostly helps to get better results when noise is present. Anyway, after extracting the necessary parameters, here is a solution that approximates Audacity's spectrogram:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt
segment_size = 512
fs, x = wavfile.read('sine3k6k.wav')  # assumes a mono, 16-bit file
x = x / 32768.0  # scale signal to [-1.0 .. 1.0]
noverlap = segment_size // 2  # must be an integer
f, Pxx = signal.welch(x,                     # signal
                      fs=fs,                 # sample rate
                      nperseg=segment_size,  # segment size
                      window='hann',         # window type to use
                      nfft=segment_size,     # num. of samples in FFT
                      detrend=False,         # do not remove the DC part
                      scaling='spectrum',    # return power spectrum [V^2]
                      noverlap=noverlap)     # overlap between segments
# set 0 dB to energy of sine wave with maximum amplitude
ref = (1/np.sqrt(2)**2) # simply 0.5 ;)
p = 10 * np.log10(Pxx/ref)
fill_to = -150 * (np.ones_like(p)) # anything below -150dB is irrelevant
plt.fill_between(f, p, fill_to )
plt.xlim([f[2], f[-1]])
plt.ylim([-90, 6])
# plt.xscale('log') # uncomment if you want log scale on x-axis
plt.xlabel('f, Hz')
plt.ylabel('Power spectrum, dB')
plt.grid(True)
plt.show()
Some necessary explanations on the parameters:
- The wave file is read as 16-bit PCM; to be compatible with Audacity it should be scaled so that |A| ≤ 1.0.
- segment_size corresponds to Size in Audacity's GUI.
- The default window type is Hann ('hann' in SciPy); you can change it if you want.
- The overlap is segment_size/2, as in the Audacity code.
- The output plot is framed to follow Audacity's style: they throw away the first low-frequency bins and cut everything below -90 dB.
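The division by 32768.0 assumes 16-bit samples. A sketch that picks the scale factor from the array's dtype instead, assuming the file is read with scipy.io.wavfile.read (which returns uint8, int16, int32 or float arrays depending on the file); the helper name to_float is hypothetical:

```python
import numpy as np

def to_float(x):
    """Scale samples returned by scipy.io.wavfile.read to roughly [-1.0, 1.0]."""
    x = np.asarray(x)
    if x.dtype == np.uint8:                   # 8-bit WAV is unsigned, centred at 128
        return (x.astype(np.float64) - 128.0) / 128.0
    if np.issubdtype(x.dtype, np.integer):    # int16 / int32 are signed
        return x.astype(np.float64) / float(np.iinfo(x.dtype).max + 1)
    return x.astype(np.float64)               # float WAVs are already in [-1, 1]
```

For a stereo file, wavfile.read returns an array of shape (N, 2); select one channel (e.g. x[:, 0]) before calling signal.welch, which works along the last axis by default.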
What's physical meaning of the amplitude in the second picture?
It is basically the amount of energy in each frequency bin.
How to normalize the amplitude to 0dB like the one in Audacity?
You need to choose some reference point. Graphs in decibels are always relative to something. When you select the maximum-energy bin as the reference, your 0 dB point is the maximum energy (obviously). It is also acceptable to use the energy of a sine wave with maximum amplitude as the reference; see the ref variable. The power of a sinusoidal signal is simply its squared RMS, and to get the RMS you just divide the amplitude by sqrt(2). So for a full-scale sine (amplitude 1.0) the scaling factor is simply 0.5. Please note that the factor before log10 is 10 and not 20, because we are dealing with the power of the signal and not its amplitude.
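A quick numerical check of the 0.5 reference, as a sketch using the same welch settings as above: a full-scale sine whose frequency sits exactly on an FFT bin should peak at about 0 dB, and a half-amplitude sine at about -6 dB. The bin-centred test frequency is chosen to avoid the Hann window's scalloping loss; off-bin tones will read up to about 1.4 dB lower:

```python
import numpy as np
from scipy import signal

def welch_db(x, fs, segment_size=512):
    """Welch power spectrum in dB relative to a full-scale sine (0.5)."""
    f, Pxx = signal.welch(x, fs=fs, window='hann', nperseg=segment_size,
                          noverlap=segment_size // 2, nfft=segment_size,
                          detrend=False, scaling='spectrum')
    ref = 0.5                                         # power of a sine with amplitude 1.0
    return f, 10 * np.log10(np.maximum(Pxx, 1e-20) / ref)   # floor avoids log10(0)

fs = 44100
n = np.arange(fs)                                     # 1 s of samples
f0 = 35 * fs / 512                                    # exactly on FFT bin 35 (about 3014.6 Hz)
f, p_full = welch_db(np.sin(2 * np.pi * f0 * n / fs), fs)
_, p_half = welch_db(0.5 * np.sin(2 * np.pi * f0 * n / fs), fs)
```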
Can I treat the amplitude below -90dB so low that it can be ignored?
Yes, anything below -40 dB is usually considered negligible.

Unsure how to use FFT data for spectrum analyzer

I'm trying to create a homemade spectrum analyzer with 8 strips of LEDs.
The part I'm struggling with is performing the FFT and understanding how to use the results.
So far this is what I have:
import opc
import time
import pyaudio
import wave
import sys
import numpy
import math
CHUNK = 1024
# Gets the pitch from the audio
def pitch(signal):
    # NOT SURE IF ANY OF THIS IS CORRECT
    signal = numpy.frombuffer(signal, dtype=numpy.int16)
    print("signal =", signal)
    testing = numpy.fft.fft(signal)
    print("testing =", testing)
wf = wave.open(sys.argv[1], 'rb')
RATE = wf.getframerate()
p = pyaudio.PyAudio()  # Instantiate PyAudio
# Open Stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)
# Read data
data = wf.readframes(CHUNK)
# Play Stream
while len(data) > 0:
    stream.write(data)
    frequency = pitch(data)
    print("frequency:", frequency)
    data = wf.readframes(CHUNK)
I'm struggling with what to do in the pitch method. I know I need to perform an FFT on the data that is passed in, but I am really unsure how to do it.
Also, should I be using this function?
Because of the way np.fft.fft works, if you use 1024 data points you will get values for 512 frequencies (plus a value for zero Hz, the DC offset). If you only want 8 frequencies, you have to feed it 16 data points.
You might be able to do what you want by downsampling by a factor of 64; then 16 downsampled points would be time-equivalent to 1024 original points. I've never explored this, so I don't know what this entails or what the pitfalls might be.
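The bin counts are easy to check with numpy.fft.rfftfreq, which lists the non-negative frequencies that numpy.fft.rfft returns. A small sketch at 44100 Hz:

```python
import numpy as np

rate = 44100
bins_1024 = np.fft.rfftfreq(1024, d=1.0 / rate)   # 513 frequencies: 0 .. 22050 Hz
bins_16 = np.fft.rfftfreq(16, d=1.0 / rate)       # 9 frequencies: 0 .. 22050 Hz
spacing_16 = bins_16[1] - bins_16[0]               # 44100 / 16 = 2756.25 Hz
# A 16-sample rfft returns 9 values: the DC term plus 8 frequencies
n_values = len(np.fft.rfft(np.zeros(16)))
```

With 16 samples the eight non-DC bins are evenly spaced 2756.25 Hz apart, so the resulting bands are very coarse and linearly spaced.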
You're going to have to do some learning. The Scientist and Engineer's Guide to Digital Signal Processing really is an excellent resource, at least it was for me.
Keep in mind that for an audio CD .wav file the sample frequency is 44100 Hz, so a 1024-sample chunk is only 23 ms of sound.
scipy.io.wavfile.read makes getting the data easy.
samp_rate, data = scipy.io.wavfile.read(filename)
For a stereo file, data is a 2-D NumPy array with one channel in column zero, data[:,0], and the other in column 1, data[:,1].
Matplotlib's specgram and psd functions can give you the data you want. A graphing analog to what you are trying to do would be:
from matplotlib import pyplot as plt
import scipy.io.wavfile
samp_rate, data = scipy.io.wavfile.read(filename)
Pxx, freqs, bins, im = plt.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
plt.show()
plt.close()
Since you aren't doing any plotting, just use matplotlib.mlab.specgram.
import matplotlib.mlab
Pxx, freqs, t = matplotlib.mlab.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
Its return values (Pxx, freqs, t) are:
- *Pxx*: 2-D array, columns are the periodograms of successive segments
- *freqs*: 1-D array of frequencies corresponding to the rows in Pxx
- *t*: 1-D array of times corresponding to midpoints of segments.
Pxx[1:, 0] would be the values for the frequencies for T0, Pxx[1:, 1] for T1, Pxx[1:, 2] for T2, ... This is what you would feed to your display. You don't use Pxx[0, :] because it is for 0 Hz.
For the power spectral density, there is matplotlib.mlab.psd().
Maybe another strategy to get down to 8 bands would be to use large chunks and normalize the values. Then you could break the values up into eight segments and take the sum of each segment. I think this is valid, maybe only for the power spectral density. See sklearn.preprocessing.normalize:
w = sklearn.preprocessing.normalize(Pxx[1:,:], norm = 'l1', axis = 0)
But then again, I just made all that up.
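A sketch of that idea with plain NumPy instead of sklearn (dividing each column by its sum is what normalize(..., norm='l1', axis=0) does for non-negative data), assuming Pxx is laid out as returned by specgram: rows are frequencies, columns are time segments. np.array_split makes eight groups of adjacent frequency rows of nearly equal size, i.e. linear rather than perceptual bands; the function name is hypothetical:

```python
import numpy as np

def eight_band_levels(Pxx, n_bands=8):
    """Return an (n_bands, n_segments) array with each band's share of the power."""
    P = np.asarray(Pxx, dtype=np.float64)[1:, :]        # drop the 0 Hz row
    totals = P.sum(axis=0, keepdims=True)
    totals[totals == 0] = 1.0                           # avoid 0/0 on silent segments
    P = P / totals                                      # L1-normalise each column
    groups = np.array_split(np.arange(P.shape[0]), n_bands)
    return np.array([P[g, :].sum(axis=0) for g in groups])
```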
I don't know about the scipy.io.wavfile.read function that @wwii mentions in his answer, but his suggestion seems to be the way to go for loading the signal. However, I just wanted to comment on the Fourier transform.
What I imagine you intend to do with your LED setup is to change each LED's brightness according to the power of the spectrum in each of the 8 frequency bands that you intend to use. Thus, what I understood you need is to compute the power in some way as time goes by. The first complication is "how to compute the spectral power?"
The best way to do this is with numpy.fft.rfft, which computes the Fourier transform of signals that contain only real numbers (not complex numbers). The function numpy.fft.fft, on the other hand, is a general-purpose function that computes the fast Fourier transform of complex signals as well. For complex signals the positive and negative frequencies carry different information (for instance, they can distinguish the propagation direction of travelling waves), but for a real-valued signal the negative-frequency terms are just complex conjugates of the positive ones. numpy.fft.rfft returns only the non-negative frequency terms (listed by numpy.fft.rfftfreq), which is what you need.
The last issue is to choose appropriate frequency bands in which to compute the spectral power. The human ear has a huge frequency range, and the bands vary greatly in width: the low-frequency bands are very narrow and the high-frequency bands very wide. Googling around, I found this nice resource that defines 7 relevant frequency bands:
- Sub-bass: 20 to 60 Hz
- Bass: 60 to 250 Hz
- Low midrange: 250 to 500 Hz
- Midrange: 500 Hz to 2 kHz
- Upper midrange: 2 to 4 kHz
- Presence: 4 to 6 kHz
- Brilliance: 6 to 20 kHz
I would suggest using these bands, but splitting the upper midrange into 2-3 kHz and 3-4 kHz. That way you'll be able to use your 8-LED setup. Here is an updated pitch function for you to use:
import sys
import wave
import numpy
wf = wave.open(sys.argv[1], 'rb')  # assumes a 16-bit mono WAV file
CHUNK = 1024
RATE = wf.getframerate()
DT = 1./float(RATE) # time between two successive audio frames
FFT_FREQS = numpy.fft.rfftfreq(CHUNK, DT)  # frequencies of the rfft bins
FFT_FREQS_INDS = -numpy.ones_like(FFT_FREQS)  # band index per bin, -1 = no band
bands_bounds = [[20,60],      # Sub-bass
                [60,250],     # Bass
                [250,500],    # Low midrange
                [500,2000],   # Midrange
                [2000,3000],  # Upper midrange 0
                [3000,4000],  # Upper midrange 1
                [4000,6000],  # Presence
                [6000,20000]] # Brilliance
for f_ind, freq in enumerate(FFT_FREQS):
    for led_ind, bounds in enumerate(bands_bounds):
        if bounds[0] <= freq < bounds[1]:
            FFT_FREQS_INDS[f_ind] = led_ind
# Returns the spectral power in each of the 8 bands assigned to the LEDs
def pitch(signal):
    # CONSIDER SWITCHING TO scipy.io.wavfile.read TO GET SIGNAL
    # signal must hold exactly CHUNK frames so the rfft matches FFT_FREQS
    signal = numpy.frombuffer(signal, dtype=numpy.int16)
    amplitude = numpy.fft.rfft(signal.astype(numpy.float64))
    power = [numpy.sum(numpy.abs(amplitude[FFT_FREQS_INDS == led_ind])**2)
             for led_ind in range(len(bands_bounds))]
    return power
The first part of the code computes the FFT frequencies and constructs the array FFT_FREQS_INDS, which indicates which of the 8 frequency bands each FFT frequency belongs to. Then, in pitch, the power of the spectrum in each of the bands is computed. Of course, this can be optimized, but I tried to make the code self-explanatory.
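The double loop that fills FFT_FREQS_INDS can also be written with np.digitize and np.bincount. A sketch of that optimisation, assuming the same band edges and a mono chunk of samples; band_powers is a hypothetical name:

```python
import numpy as np

# Band edges in Hz for the eight bands above: [20, 60), [60, 250), ..., [6000, 20000)
BAND_EDGES = np.array([20, 60, 250, 500, 2000, 3000, 4000, 6000, 20000], dtype=float)

def band_powers(samples, rate, edges=BAND_EDGES):
    """Spectral power of a mono chunk in each band [edges[i], edges[i + 1])."""
    x = np.asarray(samples, dtype=np.float64)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    # digitize gives i with edges[i-1] <= f < edges[i]; shift so band 0 is [20, 60)
    band = np.digitize(freqs, edges) - 1
    n_bands = len(edges) - 1
    inside = (band >= 0) & (band < n_bands)      # drop bins below 20 Hz / at or above 20 kHz
    power = np.abs(spectrum[inside]) ** 2
    return np.bincount(band[inside], weights=power, minlength=n_bands)

# Usage: one 1024-sample chunk of a 1 kHz tone at 44100 Hz
rate = 44100
t = np.arange(1024) / rate
levels = band_powers(np.sin(2 * np.pi * 1000 * t), rate)
```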

Python perform FFT on wave data for a 25fps animation

I am doing a project where I want to use the data of a .wav file to drive an animation. The problems I am facing are mainly due to the fact that the animation is 25 fps and the .wav file has 44100 samples per second, so I've broken it into chunks of 44100/25 = 1764 samples. Working with the amplitude is fine: I created an initial test to try it out and it worked. This is the code:
import wave
import struct
wav = wave.open('test.wav', 'rb')  # assumes 16-bit stereo PCM
rate = wav.getframerate()          # 44100 for this file
nframes = wav.getnframes()
data = wav.readframes(nframes)
wav.close()
# samples are interleaved L, R, L, R, ... as little-endian 16-bit integers
samples = struct.unpack('<%ih' % (2 * nframes), data)
ch1 = samples[0::2]
ch2 = samples[1::2]
step = rate // 25                  # 1764 samples per frame at 25 fps
kf = []
for i in range(0, nframes - step + 1, step):
    # mean absolute amplitude of both channels during this animation frame
    cur = (sum(abs(s) for s in ch1[i:i + step]) +
           sum(abs(s) for s in ch2[i:i + step])) / (2.0 * step)
    kf.append(cur)
# normalise so that the loudest keyframe is 1.0
peak = max(kf)
kf = [v / peak for v in kf]
Now I want to get the spectrum for each separate keyframe, as I do for the amplitude, but I am struggling to think of a way to do it. I can get the spectrum for the whole file using an FFT, but that's not what I want, because ideally I would like the objects to move differently in accordance with different frequencies.
Look at scipy.io.wavfile. It'll turn the wave file into a NumPy array. NumPy also has FFT functions, and Matplotlib has a spectrogram plot for the entire file.
from scipy.io import wavfile
sample_rate, data = wavfile.read(filename)
Then you have to get your timing of how you want to read the data. Matplotlib has animation tools that will call a function at a given interval. The other way of doing it is to use PyAudio. If you use pyaudio you can listen to the data while it is displayed.
Next run the data through the FFT. Store the FFT values in a spectrogram array and use matplotlib imshow to display the spectrogram array. You will probably have to rotate the array in some fashion when you display the spectrogram.
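A sketch of the per-keyframe FFT, assuming a mono NumPy array (for instance one column of what wavfile.read returns) and cutting it into blocks of rate // fps samples, one block per animation frame. Each row of the result is the spectrum for one keyframe; the Hann window and the band limits in the usage lines are illustrative choices:

```python
import numpy as np

def keyframe_spectra(samples, rate, fps=25):
    """Return (freqs, spectra) where spectra[i] is the magnitude spectrum of keyframe i."""
    step = rate // fps                                  # 1764 samples at 44100 Hz / 25 fps
    n_frames = len(samples) // step                     # drop the incomplete tail
    frames = np.asarray(samples[:n_frames * step], dtype=np.float64).reshape(n_frames, step)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(step), axis=1))
    freqs = np.fft.rfftfreq(step, d=1.0 / rate)
    return freqs, spectra

# Usage with a synthetic signal: 1 s at 200 Hz followed by 1 s at 2000 Hz
rate = 44100
t = np.arange(rate) / rate
samples = np.concatenate([np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 2000 * t)])
freqs, spectra = keyframe_spectra(samples, rate)
# One value per keyframe for a frequency band, e.g. to drive one object
low_band = spectra[:, (freqs >= 100) & (freqs < 500)].sum(axis=1)
high_band = spectra[:, (freqs >= 1000) & (freqs < 4000)].sum(axis=1)
```

Because the array already has one row per keyframe, it can also be passed (transposed) to matplotlib's imshow to display the spectrogram.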
From personal experience, be careful with Python threads. Threading works for I/O, but for calculations a thread can dominate the whole application and slow everything down. Also, GUI elements (like plotting) don't really work in threads. Use Matplotlib's animation tools for the plotting.

Python frequency detection

OK, what I'm trying to build is a kind of audio processing software that can detect a prevalent frequency; if the frequency is played for long enough (a few ms), I know I got a positive match. I know I would need to use an FFT or something similar, but I'm bad at this field of math. I did search the internet but didn't find code that does only this.
The goal I'm trying to achieve is a custom protocol to send data through sound. I need only a very low bitrate (5-10 bps), but I'm also very limited on the transmitting end, so the receiving software will need to be custom (I can't use an actual hardware/software modem). I also want this to be software only (no additional hardware except a sound card).
Thanks a lot for the help.
The aubio libraries have been wrapped with SWIG and can thus be used from Python. Among their many features are several methods for pitch detection/estimation, including the YIN algorithm and some harmonic comb algorithms.
However, if you want something simpler, I wrote some code for pitch estimation some time ago and you can take it or leave it. It won't be as accurate as using the algorithms in aubio, but it might be good enough for your needs. I basically just took the FFT of the data times a window (a Blackman window in this case), squared the FFT values, found the bin that had the highest value, and used a quadratic interpolation around the peak, using the log of the max value and its two neighboring values, to find the fundamental frequency. I took the quadratic interpolation from a paper that I found.
It works fairly well on test tones, but it will not be as robust or as accurate as the other methods mentioned above. The accuracy can be increased by increasing the chunk size (or reduced by decreasing it). The chunk size should be a power of 2 to make full use of the FFT. Also, I am only determining the fundamental pitch for each chunk, with no overlap. I used PyAudio to play the sound while printing out the estimated pitch.
Source Code:
# Read in a WAV and find the freq's
import pyaudio
import wave
import numpy as np
chunk = 2048
# open up a wave (assumed to be 16-bit mono)
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=RATE,
                output=True)
# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk*swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the 16-bit samples and multiply by the Blackman window
    indata = np.frombuffer(data, dtype=np.int16) * window
    # Take the fft and square each value
    fftData = abs(np.fft.rfft(indata))**2
    # find the maximum (skipping the DC bin)
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData)-1:
        y0, y1, y2 = np.log(fftData[which-1:which+2])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency
        thefreq = (which+x1)*RATE/chunk
    else:
        thefreq = which*RATE/chunk
    print("The freq is %f Hz." % thefreq)
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()
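The same steps can be pulled out into a function with no audio I/O, which makes them easy to try on synthetic tones. A sketch of the windowed-FFT peak search with log-parabolic interpolation described above (the function name estimate_pitch is hypothetical):

```python
import numpy as np

def estimate_pitch(chunk_samples, rate):
    """Blackman window, power spectrum, peak bin, log-parabolic interpolation."""
    x = np.asarray(chunk_samples, dtype=np.float64)
    n = len(x)
    power = np.abs(np.fft.rfft(x * np.blackman(n))) ** 2
    which = power[1:].argmax() + 1                 # skip the DC bin
    if which < len(power) - 1:
        y0, y1, y2 = np.log(power[which - 1:which + 2])
        which = which + (y2 - y0) * 0.5 / (2 * y1 - y2 - y0)   # fractional bin offset
    return which * rate / n

# Usage: a 440 Hz test tone, one 2048-sample chunk at 44100 Hz
rate = 44100
t = np.arange(2048) / rate
est = estimate_pitch(np.sin(2 * np.pi * 440 * t), rate)
```

Without the interpolation step, the estimate would be limited to the bin spacing of rate / 2048 (about 21.5 Hz here).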
If you're going to use FSK (frequency shift keying) for encoding data, you're probably better off using the Goertzel algorithm so you can check just the frequencies you want, instead of a full DFT/FFT.
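A sketch of Goertzel-based FSK detection for the 5-10 bps case in the question. goertzel_power returns the squared magnitude of the single DFT bin nearest the target frequency, which equals abs(np.fft.fft(x)[k])**2 for that bin. The sample rate, symbol length, the two tone frequencies and the helper names are hypothetical choices; a real receiver would also need symbol synchronisation, which this sketch skips by assuming symbols start at sample 0:

```python
import math
import numpy as np

def goertzel_power(samples, rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = int(round(n * target_hz / rate))          # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = float(x) + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def decode_fsk(samples, rate, symbol_len, f0, f1):
    """Decide 0/1 per symbol by comparing Goertzel power at f0 and f1."""
    bits = []
    for start in range(0, len(samples) - symbol_len + 1, symbol_len):
        sym = samples[start:start + symbol_len]
        bits.append(1 if goertzel_power(sym, rate, f1) > goertzel_power(sym, rate, f0) else 0)
    return bits

# Hypothetical parameters: 10 bit/s, 1000 Hz for "0", 2000 Hz for "1"
rate, symbol_len, f0, f1 = 8000, 800, 1000.0, 2000.0
message = [1, 0, 1, 1, 0, 0, 1]
t = np.arange(symbol_len) / rate
tx = np.concatenate([np.sin(2 * np.pi * (f1 if b else f0) * t) for b in message])
rx = tx + 0.5 * np.random.default_rng(1).standard_normal(len(tx))   # add some noise
decoded = decode_fsk(rx, rate, symbol_len, f0, f1)
```

Choosing tones that fall exactly on DFT bins of the symbol length (here 1000 Hz and 2000 Hz with 800-sample symbols at 8000 Hz, i.e. bins 100 and 200) keeps the two detectors from leaking into each other.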
You can compute the frequency spectrum of sliding windows over your sound, and then check for the presence of the prevalent frequency band by finding the area under the frequency spectrum curve for that band.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc
np.random.seed(0)
# Sine sample with a frequency of 5 Hz, plus some noise
sr = 32 # sampling rate
y = np.linspace(0, 5 * 2*np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 1, y.shape)
t = np.arange(len(y)) / float(sr)
# Generate frequency spectrum
spectrum, freqs, _ = plt.magnitude_spectrum(y, sr)
# Calculate percentage for a frequency range
lower_frq, upper_frq = 4, 6
ind_band = np.where((freqs > lower_frq) & (freqs < upper_frq))
plt.fill_between(freqs[ind_band], spectrum[ind_band], color='red', alpha=0.6)
frq_band_perc = auc(freqs[ind_band], spectrum[ind_band]) / auc(freqs, spectrum)
print('{:.1%}'.format(frq_band_perc))
# 19.8%
While I haven't tried audio processing with Python before, perhaps you could build something based on SciPy (or NumPy, on which it is built), a framework for efficient scientific/engineering numerical computation. You might start by looking at scipy.fftpack for your FFT.
