Librosa - Audio Spectrogram/Frequency Bins to Spectrum

Librosa - Audio Spectrogram/Frequency Bins to Spectrum - python

I've read around for several days but haven't been to find a solution... I'm able to build Librosa spectrograms and extract amplitude/frequency data using the following:
audio, sr = librosa.load('short_piano melody_keyCmin_110bpm.wav', sr = 22500)
spectrum = librosa.stft(audio, n_fft=2048, window=scipy.signal.windows.hamming)
D = librosa.amplitude_to_db(np.abs(spectrum), ref=np.max)
n = D.shape[0]
Nfft = 1+2*(n-1)
freq_bins = librosa.fft_frequencies(sr=sr, n_fft=Nfft)
However, I cannot turn the data in D and freq_bins back into a spectrum. Once I am able to do this I can convert the new spectrum into a .wav file and listen to my reconstructed audio... Any advice would be appreciated! Thank you.

When I get your question right, you want to reconstruct the real/imaginary spectrum from your magnitude values. You will need the phase component for that, then its all simple complex number arithmetic. You should be aware that the output of an STFT is an array of complex numbers, and the amplitude is the absulute value of each number, while the phase is the angle of each number
Here´s an example of a time-domain signal transformed to magnitude/phase and back without modifying it:
% get the complex valued spectrum from a sample
spectrum = librosa.stft(audio, n_fft=2048,window=scipy.signal.windows.hamming)
# get magnitude and phase from the complex numbers
magnitude = np.abs(spectrum)
phase = np.angle(spectrum)
# reconstruct real/imaginary parts from magnitude and phase
spectrum = magnitude * np.exp(1j*phase)
# transform back to time-domain
In your case, you should first convert the db-values back to amplitude values, of course. Even having no experience with librosa, I´m sure that there is also a function for that.

Related

python Spectrogram by using value in timeseries

I am new to spectrogram and try to plot spectrogram by using relative velocity variations value of ambient seismic noise.
So the format of the data I have is 'time', 'station pair', 'velocity variation value' as below. (If error is needed, I can add it on the data)
2013-11-24,05_PK01_05_SS01,0.057039371136200
2013-11-25,05_PK01_05_SS01,-0.003328071661900
2013-11-26,05_PK01_05_SS01,0.137221779659000
2013-11-27,05_PK01_05_SS01,0.068823721831000
2013-11-28,05_PK01_05_SS01,-0.006876687060810
2013-11-29,05_PK01_05_SS01,-0.023895268916200
2013-11-30,05_PK01_05_SS01,-0.105762098404000
2013-12-01,05_PK01_05_SS01,-0.028069540807700
2013-12-02,05_PK01_05_SS01,0.015091601414300
2013-12-03,05_PK01_05_SS01,0.016353885353700
2013-12-04,05_PK01_05_SS01,-0.056654092859700
2013-12-05,05_PK01_05_SS01,-0.044520608528500
2013-12-06,05_PK01_05_SS01,0.020226437197700
...
But I searched for it, I can only see people using data of network, station, location, channel, or wav data.
Therefore, I have no idea what I have to start because the data format is different..
If you know some ways to get spectrogram by using 'value' of timeseries.
p.s. I would compute cross correlation with velocity variation value and other environmental data such as air temperature, air pressure etc.
###Edit (I add two pictures but the notice pops up that I cannot post images yet but only link)
I would write about groundwater level or other environmental data because those are easier to see variations.
The plot that I want to make similarly is from David et al., 2021 as below.
enter image description here
X axis shows time series and y axis shows cycles/day.
So when the light color is located at 1 then it means diurnal cycle (if 2, semidiurnal cycle).
Now I plot spectrogram and make the frequency as cycles / 1day.
enter image description here
But what I found to edit are two.
In the reference, it is normalized as log scale.
So I need to find the way to edit it as log scale.
In the reference, the x axis becomes 1*10^7.
But in my data, there are only 755 points in time series (dates in 2013-2015).
So what do I have to do to make x axis to time series?
p.s. The code I made
fil=pd.read_csv('myfile.csv')
cf=fil.iloc[:,1]
cf=cf/max(abs(cf))
nfft=128 #The number of data points
fs=1/86400 #Hz [0, fs/2] cycles / unit time
n=len(cf)
fr=fs/n
spec, freq, tt, pplot = pylab.specgram(cf, NFFT=nfft, Fs=fs, detrend=pylab.detrend,
window=pylab.window_hanning, noverlap=100, mode='psd')
pylab.title('%s' % e_n)
plt.colorbar()
plt.ylabel("Frequency (cycles / %s Day)" % str(1/fs/86400))
plt.xlabel("days")
plt.show()

If you look closely at it, wav data is basically just an array of numbers (sound amplitude), recorded at a certain interval.
Note: You have an array of equally spaced samples, but they are for velocity difference, not amplitude. So while the following is technically valid, I don't think that the resulting frequencies represent seismic sound frequencies?
So the discrete Fourier transform (in the form of np.fft.rfft) would normally be the right thing to use.
If you give the function np.fft.rfft() n numbers, it will return n/2+1 frequencies. This is because of the inherent symmetry in the transform.
However, one thing to keep in mind is the frequency resolution of FFT. For example if you take n=44100 samples from a wav file sampled at Fs=44100 Hz, you get a convenient frequency resolution of Fs/n = 1 Hz. Which means that the first number in the FFT result is 0 Hz, the second number is 1 Hz et cetera.
It seems that the sampling frequency in your dataset is once per day, i.e. Fs= 1/(24x3600) =0.000012 Hz. Suppose you have n = 10000 samples, then the FFT will return 5001 numbers, with a frequency resolution of Fs/n= 0.0000000012 Hz. That means that the highest frequency you will be able to detect from data sampled at this frequncy is 0.0000000012*5001 = 0.000006 Hz.
So the highest frequency you can detect is approximately Fs/2!
I'm no domain expert, but that value seems to be a bit low for seismic noise?

Generating new 2D data using power spectrum density function from spatial frequency domain via ifft?

This is my first post so apologies for any formatting related issues.
So I have a dataset which was obtained from an atomic microscope. The data looks like a 1024x1024 matrix which is composed of different measurements taken from the sample in units of meters, eg.
data = [[1e-07 ... 4e-08][ ... ... ... ][3e-09 ... 12e-06]]
np.size(data) == (1024,1024)
From this data, I was hoping to 1) derive some statistics about the real data; and 2) using the power spectrum density (PSD) distribution, hopefully create a new dataset which is different, but statistically similar to the characteristics of the original data. My plan to do this was 2a) take a 2d fft of data, calculate the power spectrum density 2b) some method?, 2c) take the 2d ifft of the modified signal to turn it back into a new sample with the same power spectrum density as the original.
Moreover, regarding part 2b) this was the closest link I could find regarding a time series based solution; however, I am not understanding exactly how to implement this so far, since I am not exactly sure what the phase, frequency, and amplitudes of the fft data represent in this 2d case, and also since we are now talking about a 2d ifft I'm not exactly sure how to construct this complex matrix while incorporating the random number generation, and amplitude/phase shifts in a way that will translate back to something meaningful.
So basically, I have been having some trouble with my intuition. For this problem, we are working with a 2d Fourier transform of spatial data with no temporal component, so I believe that methods which are applied to time series data could be applied here as well. Since the fft of the original data is the 'frequency in the spatial domain', the x-axis of the PSD should be either pixels or meters, but then what is the 'power' in the y-axis describing? I was hoping that someone could help me figure this problem out.
My code is below, hopefully someone could help me solve my problem. Bonus if someone could help me understand what this shifted frequency vs amplitude plot is saying:
here is the image with the fft, shifted fft, and freq. vs aplitude plots.
Fortunately the power spectrum density function is a bit easier to understand
Thank you all for your time.
data = np.genfromtxt('asample3.0_00001-filter.txt')
x = np.arange(0,int(np.size(data,0)),1)
y = np.arange(0,int(np.size(data,1)),1)
z = data
npix = data.shape[0]
#taking the fourier transform
fourier_image = np.fft.fft2(data)
#Get power spectral density
fourier_amplitudes = np.abs(fourier_image)**2
#calculate sampling frequency fs (physical distance between pixels)
fs = 92e-07/npix
freq_shifted = fs/2 * np.linspace(-1,1,npix)
freq = fs/2 * np.linspace(0,1,int(npix/2))
print("Plotting 2d Fourier Transform ...")
fig, axs = plt.subplots(2,2,figsize=(15, 15))
axs[0,0].imshow(10*np.log10(np.abs(fourier_image)))
axs[0,0].set_title('fft')
axs[0,1].imshow(10*np.log10(np.abs(np.fft.fftshift(fourier_image))))
axs[0,1].set_title('shifted fft')
axs[1,0].plot(freq,10*np.log10(np.abs(fourier_amplitudes[:npix//2])))
axs[1,0].set_title('freq vs amplitude')
for ii in list(range(npix//2)):
axs[1,1].plot(freq_shifted,10*np.log10(np.fft.fftshift(np.abs(fourier_amplitudes[ii]))))
axs[1,1].set_title('shifted freq vs amplitude')
#constructing a wave vector array
## Get frequencies corresponding to signal PSD
kfreq = np.fft.fftfreq(npix) * npix
kfreq2D = np.meshgrid(kfreq, kfreq)
knrm = np.sqrt(kfreq2D[0]**2 + kfreq2D[1]**2)
knrm = knrm.flatten()
fourier_amplitudes = fourier_amplitudes.flatten()
#creating the power spectrum
kbins = np.arange(0.5, npix//2+1, 1.)
kvals = 0.5 * (kbins[1:] + kbins[:-1])
Abins, _, _ = stats.binned_statistic(knrm, fourier_amplitudes,
statistic = "mean",
bins = kbins)
Abins *= np.pi * (kbins[1:]**2 - kbins[:-1]**2)
print("Plotting power spectrum of surface ...")
fig = plt.figure(figsize=(10, 10))
plt.loglog(fs/kvals, Abins)
plt.xlabel("Spatial Frequency $k$ [meters]")
plt.ylabel("Power per Spatial Frequency $P(k)$")
plt.tight_layout()

Normalizing FFT spectrum magnitude to 0dB

I'm using FFT to extract the amplitude of each frequency components from an audio file. Actually, there is already a function called Plot Spectrum in Audacity that can help to solve the problem. Taking this example audio file which is composed of 3kHz sine and 6kHz sine, the spectrum result is like the following picture. You can see peaks are at 3KHz and 6kHz, no extra frequency.
Now I need to implement the same function and plot the similar result in Python. I'm close to the Audacity result with the help of rfft but I still have problems to solve after getting this result.
What's physical meaning of the amplitude in the second picture?
How to normalize the amplitude to 0dB like the one in Audacity?
Why do the frequency over 6kHz have such high amplitude (≥90)? Can I scale those frequency to relative low level?
Related code:
import numpy as np
from pylab import plot, show
from scipy.io import wavfile
sample_rate, x = wavfile.read('sine3k6k.wav')
fs = 44100.0
rfft = np.abs(np.fft.rfft(x))
p = 20*np.log10(rfft)
f = np.linspace(0, fs/2, len(p))
plot(f, p)
show()
Update
I multiplied Hanning window with the whole length signal (is that correct?) and get this. Most of the amplitude of skirts are below 40.
And scale the y-axis to decibel as #Mateen Ulhaq said. The result is more close to the Audacity one. Can I treat the amplitude below -90dB so low that it can be ignored?
Updated code:
fs, x = wavfile.read('input/sine3k6k.wav')
x = x * np.hanning(len(x))
rfft = np.abs(np.fft.rfft(x))
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max)
f = np.linspace(0, fs/2, len(p))
About the bounty
With the code in the update above, I can measure the frequency components in decibel. The highest possible value will be 0dB. But the method only works for a specific audio file because it uses rfft_max of this audio. I want to measure the frequency components of multiple audio files in one standard rule just like Audacity does.
I also started a discussion in Audacity forum, but I was still not clear how to implement my purpose.

After doing some reverse engineering on Audacity source code here some answers. First, they use Welch algorithm for estimating PSD. In short, it splits signal to overlapped segments, apply some window function, applies FFT and averages the result. Mostly as This helps to get better results when noise is present. Anyway, after extracting the necessary parameters here is the solution that approximates Audacity's spectrogram:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt
segment_size = 512
fs, x = wavfile.read('sine3k6k.wav')
x = x / 32768.0 # scale signal to [-1.0 .. 1.0]
noverlap = segment_size / 2
f, Pxx = signal.welch(x, # signal
fs=fs, # sample rate
nperseg=segment_size, # segment size
window='hanning', # window type to use
nfft=segment_size, # num. of samples in FFT
detrend=False, # remove DC part
scaling='spectrum', # return power spectrum [V^2]
noverlap=noverlap) # overlap between segments
# set 0 dB to energy of sine wave with maximum amplitude
ref = (1/np.sqrt(2)**2) # simply 0.5 ;)
p = 10 * np.log10(Pxx/ref)
fill_to = -150 * (np.ones_like(p)) # anything below -150dB is irrelevant
plt.fill_between(f, p, fill_to )
plt.xlim([f[2], f[-1]])
plt.ylim([-90, 6])
# plt.xscale('log') # uncomment if you want log scale on x-axis
plt.xlabel('f, Hz')
plt.ylabel('Power spectrum, dB')
plt.grid(True)
plt.show()
Some necessary explanations on parameters:
wave file is read as 16-bit PCM, in order to be compatible with Audacity it should be scaled to be |A|<1.0
segment_size is corresponding to Size in Audacity's GUI.
default window type is 'Hanning', you can change it if you want.
overlap is segment_size/2 as in Audacity code.
output window is framed to follow Audacity style. They throw away first low frequency bins and cut everything below -90dB
What's physical meaning of the amplitude in the second picture?
It is basically amount of energy in the frequency bin.
How to normalize the amplitude to 0dB like the one in Audacity?
You need choose some reference point. Graphs in decibels are always relevant to something. When you select maximum energy bin as a reference, your 0db point is the maximum energy (obviously). It is acceptable to set as a reference energy of the sine wave with maximum amplitude. See ref variable. Power in sinusoidal signal is simply squared RMS, and to get RMS, you just need to divide amplitude by sqrt(2). So the scaling factor is simply 0.5. Please note that factor before log10 is 10 and not 20, this is because we are dealing with power of signal and not amplitude.
Can I treat the amplitude below -90dB so low that it can be ignored?
Yes, anything below -40dB is usually considered negligeble

Unsure how to use FFT data for spectrum analyzer

I'm trying to create a home made spectrum analyzer with 8 strips of LED's.
The part i'm struggling with is performing the FFT and understanding how to use the results.
So far this is what I have:
import opc
import time
import pyaudio
import wave
import sys
import numpy
import math
CHUNK = 1024
# Gets the pitch from the audio
def pitch(signal):
# NOT SURE IF ANY OF THIS IS CORRECT
signal = numpy.fromstring(signal, 'Int16');
print "signal = ", signal
testing = numpy.fft.fft(signal)
print "testing = ", testing
wf = wave.open(sys.argv[1], 'rb')
RATE = wf.getframerate()
p = pyaudio.PyAudio() # Instantiate PyAudio
# Open Stream
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
# Read data
data = wf.readframes(CHUNK)
# Play Stream
while data != '':
stream.write(data)
data = wf.readframes(CHUNK)
frequency = pitch(data)
print "%f frequency" %frequency
I'm struggling with what to do in the pitch method. I know i need to perform FFT on the data that is passed in, but am really unsure how to do it.
Also should be using this function?

Because of the way np.fft.fft works, if you use 1024 data points you will get values for 512 frequencies (plus a value zero Hz, DC offset). If you only want 8 frequencies you have to feed it 16 data points.
You might be able to do what you want by down sampling by a factor of 64 - then 16 down sampled points would be time-equivalent to 1024 original points. I've never explored this so I don't know what this entails or what the pitfalls might be.
You're going to have to do some learning - The Scientist and Engineer's Guide to Digital Signal Processing really is an excellant resource, at least it was for me.
Keep in mind that for an audio cd .wav file the sample frequency is 44100 Hz - a 1024 sample chunk is only 23 mS of the sound.
scipy.io.wavfile.read makes getting the data easy.
samp_rate, data = scipy.io.wavfile.read(filename)
data is a 2-d numpy array with one channel in in column zero, data[:,0], and the other in column 1, data[:,1]
Matplotlib's specgram and psd functions can give you the data you want. A graphing analog to what you are trying to do would be.
from matplotlib import pyplot as plt
import scipy.io.wavfile
samp_rate, data = scipy.io.wavfile.read(filename)
Pxx, freqs, bins, im = plt.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
plt.show()
plt.close()
Since you aren't doing any plotting just use matplolib.mlab.specgram.
Pxx, freqs, t = matplolib.mlab.specgram(data[:1024,0], NFFT = 16, noverlap = 0, Fs = samp_rate)
Its return values (Pxx, freqs, t) are
- *Pxx*: 2-D array, columns are the periodograms of successive segments
- *freqs*: 1-D array of frequencies corresponding to the rows in Pxx
- *t*: 1-D array of times corresponding to midpoints of segments.
Pxx[1:, 0] would be the values for the frequencies for T0, Pxx[1:, 1] for T1, Pxx[1:, 2] for T2, ... This is what you would feed to your display. You don't use Pxx[0, :] because it is for 0 Hz.
power spectral density - matplotlib.mlab.psd()
Maybe another strategy to get down to 8 bands would be to use large chunks and normalize the values. Then you could break the values up into eight segments and get the sum of each segments. I think this is valid - maybe only for the power spectral density. sklearn.preprocessing.normalize
w = sklearn.preprocessing.normalize(Pxx[1:,:], norm = 'l1', axis = 0)
But then again, I just made all that up.

I don't know about the scipy.io.wavfile.read function that #wwii mentions in his answer, but it seems that his suggestion is the way to go to handle the signal loading. However, I just wanted to comment on the fourier transform.
What I imagine that you intend to do with your LED setup is to change each of the LED's brightnesses according to the power of the spectra in each of the 8 frequency bands that you intend to use. Thus, what I understood that you need, is to compute in some way the power as time goes by. The first complication is "how to compute the spectral power?"
The best way to do this is with the numpy.fft.rfft, which computes the fourier transform for signals that only have real numbers (not complex numbers). On the other hand, the function numpy.fft.fft is a general purpose function that can compute the fast fourier transform for signals with complex numbers. The conceptual difference is that numpy.fft.fft can be used to study travelling waves and their propagation direction. This is seen because the returned amplitudes correspond to positive or negative frequencies that indicate how the wave travels. numpy.fft.rfft yields the amplitude for real-valued frequencies as seen in numpy.fft.rfftfreq, which is what you need.
The last issue is to choose appropriate frequency bands in which to compute the spectral power. The human ear has a huge frequency response range and the width of each band will vary very much, with the low frequency band being very narrow and the high frequency band being very wide. Googling around, I found this nice resource that defines 7 relevant frequency bands
Sub-bass: 20 to 60 Hz
Bass: 60 to 250 Hz
Low midrange: 250 to 500 Hz
Midrange: 500 Hz to 2 kHz
Upper midrange: 2 to 4 kHz
Presence: 4 to 6 kHz
Brilliance: 6 to 20 kHz
I would suggest to use these bands, but split the upper midrange into 2-3 kHz and 3-4kHz. That way you'll be able to use your 8 LED setup. I'm uploading an updated pitch function for you to use
wf = wave.open(sys.argv[1], 'rb')
CHUNK = 1024
RATE = wf.getframerate()
DT = 1./float(RATE) # time between two successive audio frames
FFT_FREQS = numpy.fft.nfftfreq(CHUNCK,DT)
FFT_FREQS_INDS = -numpy.ones_like(FFT_FREQS)
bands_bounds = [[20,60], # Sub-bass
[60,250], # Bass
[250,500], # Low midrange
[500,2000], # Midrange
[2000,3000], # Upper midrange 0
[3000,4000], # Upper midrange 1
[4000,6000], # Presence
[6000,20000]] # Brilliance
for f_ind,freq in enumerate(FFT_FREQS):
for led_ind,bounds in enumerate(bands_bounds):
if freq<bounds[1] and freq>=bounds[0]:
FFT_FREQS_INDS[ind] = led_ind
# Returns the spectral power in each of the 8 bands assigned to the LEDs
def pitch(signal):
# CONSIDER SWITCHING TO scipy.io.wavfile.read TO GET SIGNAL
signal = numpy.fromstring(signal, 'Int16');
amplitude = numpy.fft.rfft(signal.astype(numpy.float))
power = [np.sum(np.abs(amplitude[FFT_FREQS_INDS==led_ind])**2) for led_ind in range(len(bands_bounds))]
return power
The first part of the code computes the fft frequencies and constructs the array FFT_FREQS_INDS that indicates to which of the 8 frequency bands the fft frequency corresponds to. Then, in pitch the power of the spectra in each of the bands is computed. Of course, this can be optimized but I tried to make the code self-explanatory.

Python: Frequency Analysis of Sound Files

I am generating some sound files that play tones at various frequencies with a certain number of harmonics.
Ultimately, these sounds will be played on a device with a small speaker.
I have the frequency response curve of the speaker and want to do the following in Python:
Plot the frequency spectrum of sound file. I need a take the FFT of the file and plot it with gnuplot
Apply a nonlinear transfer function based on the frequency response curve in the data sheet.
Plot the result after the function is applied.
Does anyone know :
What the simplest way to do this would be?
or of an Application (GNU/Linux based) that could do this for me?

I know you didn't mention Pylab/Matplotlib, but it works. Here is an example (assumes single-channel signal):
x, fs, nbits = audiolab.wavread('schubert.wav')
audiolab.play(x, fs)
N = 4*fs # four seconds of audio
X = scipy.fft(x[:N])
Xdb = 20*scipy.log10(scipy.absolute(X))
f = scipy.linspace(0, fs, N, endpoint=False)
pylab.plot(f, Xdb)
pylab.xlim(0, 5000) # view up to 5 kHz
Y = X*H
y = scipy.real(scipy.ifft(Y))

you can use numpy and matPlotLib. Something like the code below:
spectrum = numpy.fft.fft(signal)
frequencies = numpy.fft.fftfreq(len(spectrum))
pylab.plot(frequencies,spectrum)
pylab.show()
That will show a graph of the fft spectrum.

scipy has an FFT and hooks nicely into gnuplot. You should be able to use the signal module to do the math.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Librosa - Audio Spectrogram/Frequency Bins to Spectrum - python

Related

python Spectrogram by using value in timeseries

Generating new 2D data using power spectrum density function from spatial frequency domain via ifft?

Normalizing FFT spectrum magnitude to 0dB

Unsure how to use FFT data for spectrum analyzer

Python: Frequency Analysis of Sound Files

Categories

Resources