Audio spectrum extraction from audio file by python

Sorry if this is a duplicate, but I wonder whether there is a Python library that can extract the sound spectrum from audio files. I want to take an audio file and write an algorithm that returns a set of data {TimeStampInFile; Frequency-Amplitude}.
I heard that this is usually called beat detection, but as far as I can see beat detection is not a precise method; it is good only for visualisation, whereas I want to manipulate the extracted data and then convert it back to an audio file. I don't need to do this in real time.
I would appreciate any suggestions and recommendations.

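You can compute and visualize both the spectrum and the spectrogram using scipy. For this test I used the audio file vignesh.wav. Note that the code assumes a mono (single-channel) file, because the FFT and the spectrogram are taken along the last axis of the array: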
from scipy.io import wavfile # scipy library to read wav files
import numpy as np
AudioName = "vignesh.wav" # Audio File
fs, Audiodata = wavfile.read(AudioName)
# Plot the audio signal in time
import matplotlib.pyplot as plt
plt.plot(Audiodata)
plt.title('Audio signal in time',size=16)
# spectrum
from scipy.fftpack import fft # fourier transform
n = len(Audiodata)
AudioFreq = fft(Audiodata)
AudioFreq = AudioFreq[0:int(np.ceil((n+1)/2.0))] #Half of the spectrum
MagFreq = np.abs(AudioFreq) # Magnitude
MagFreq = MagFreq / float(n)
# power spectrum
MagFreq = MagFreq**2
if n % 2 > 0: # odd number of samples: there is no Nyquist bin
    MagFreq[1:len(MagFreq)] = MagFreq[1:len(MagFreq)] * 2
else: # even number of samples: leave the DC and Nyquist bins unscaled
    MagFreq[1:len(MagFreq) - 1] = MagFreq[1:len(MagFreq) - 1] * 2
plt.figure()
freqAxis = np.arange(0, int(np.ceil((n+1)/2.0)), 1.0) * (fs / float(n)) # float(n) avoids integer division on Python 2
plt.plot(freqAxis/1000.0, 10*np.log10(MagFreq)) # Power spectrum
plt.xlabel('Frequency (kHz)'); plt.ylabel('Power spectrum (dB)')
# Spectrogram
from scipy import signal
N = 512 # Number of points in the FFT
# newer SciPy versions expose window functions under signal.windows
f, t, Sxx = signal.spectrogram(Audiodata, fs, window=signal.windows.blackman(N), nfft=N)
plt.figure()
plt.pcolormesh(t, f, 10*np.log10(Sxx)) # dB spectrogram
#plt.pcolormesh(t, f, Sxx) # Linear spectrogram
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [s]')
plt.title('Spectrogram with scipy.signal', size=16)
plt.show()
I tested all the code and it works. You need numpy, matplotlib and scipy.
Cheers

I think your question has three separate parts:
How to load audio files into python?
How to calculate spectrum in python?
What to do with the spectrum?
1. How to load audio files in python?
You are probably best off using scipy, as it provides a lot of signal processing functions. For loading audio files:
import scipy.io.wavfile
samplerate, data = scipy.io.wavfile.read("mywav.wav")
Now you have the sample rate (samples/s) in samplerate and data as a numpy.array in data. You may want to transform the data into floating point, depending on your application.
There is also a standard python module wave for loading wav-files, but numpy/scipy offers a simpler interface and more options for signal processing.
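The conversion to floating point mentioned above could look like the sketch below. The helper name pcm_to_float is hypothetical, and the scaling assumes the usual WAV conventions (signed integers for 16/32-bit data, unsigned for 8-bit); it is one reasonable choice, not the only one.

```python
import numpy as np

def pcm_to_float(data):
    """Convert integer PCM samples to float32 in [-1, 1); floats pass through."""
    data = np.asarray(data)
    if np.issubdtype(data.dtype, np.floating):
        return data.astype(np.float32)
    if data.dtype == np.uint8:
        # 8-bit WAV data is unsigned, with silence at 128
        return (data.astype(np.float32) - 128.0) / 128.0
    # signed integer types: divide by the magnitude of the most negative value
    full_scale = float(np.iinfo(data.dtype).max) + 1.0
    return data.astype(np.float32) / full_scale

# Stand-in for the array returned by scipy.io.wavfile.read on a 16-bit file
demo = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
demo_float = pcm_to_float(demo)
```

With real data you would pass the array returned by scipy.io.wavfile.read instead of the demo array.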
2. How to calculate the spectrum
Brief answer: Use FFT. For more words of wisdom, see:
Analyze audio using Fast Fourier Transform
The longer answer is quite long. Windowing is very important; without it, spectral leakage will give you strange spectra.
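As a rough illustration of a windowed FFT that yields time-stamped frequency/amplitude data and can be converted back to audio, here is a minimal sketch using scipy.signal.stft and scipy.signal.istft. The sample rate, the 440 Hz test tone and the segment length of 512 are assumptions chosen for the demo, not values from the question.

```python
import numpy as np
from scipy import signal

fs = 8000                                  # assumed sample rate for the demo
t = np.arange(fs) / fs                     # one second of time stamps
x = np.sin(2 * np.pi * 440.0 * t)          # synthetic tone standing in for real audio

# Zxx[k, m] is the complex spectrum at frequency freqs[k] and time times[m]
freqs, times, Zxx = signal.stft(x, fs=fs, window="hann", nperseg=512)
amplitude = np.abs(Zxx)

# Strongest frequency in a frame taken from the middle of the signal
peak_hz = freqs[np.argmax(amplitude[:, len(times) // 2])]

# Zxx could be modified here; the inverse transform turns it back into samples
_, x_rec = signal.istft(Zxx, fs=fs, window="hann", nperseg=512)
x_rec = x_rec[:len(x)]                     # drop the zero padding added by stft
```

The Hann window with the default 50% overlap allows the inverse transform to reconstruct the unmodified signal; after editing Zxx the result is only an approximation of the intended spectrum.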
3. What to do with the spectrum
This is a bit more difficult. Filtering is often performed in time domain for longer signals. Maybe if you tell us what you want to accomplish, you'll receive a good answer for this one. Calculating the frequency spectrum is one thing, getting meaningful results with it in signal processing is a bit more complicated.
(I know you did not ask this one, but I see it coming with a probability >> 0. Of course, it may be that you have good knowledge on audio signal processing, in which case this is irrelevant.)

Related

how do I correct dc offset with torchaudio high pass filter?

I am segmenting drum audio files at each transient and exporting the audio to individual wav files. The problem is that all of my files have a DC offset that I cannot seem to get rid of, which causes popping sounds at the end of each file. I am able to use Audacity's built-in high-pass filter to verify that applying a filter would fix my problem, but I have not yet been able to replicate those results with code.
My preference is to use torchaudio's highpass_biquad() method, but I am open to using scipy filters too. The main goal is to remove the offset so that the audio files do not have a popping sound at the end.
How do I implement a high-pass filter that corrects the DC offset the way Audacity's high-pass filter does, as shown in the pictures?
torchaudio approach
import torch
import librosa
from torchaudio.functional import highpass_biquad
wav, sample_rate = librosa.load(path, sr=None, mono=False) # files are 24 bit 44.1k
wav_tensor = torch.from_numpy(wav)
cutoff_freq = 15.0
wav_filtered = highpass_biquad(wav_tensor, sample_rate, cutoff_freq)
scipy approach
from scipy import signal
import librosa
wav, sample_rate = librosa.load(path, sr=None, mono=False) # files are 24 bit 44.1k
cutoff_freq = 15
# METHOD 1
b, a = signal.butter(N=5, Wn=cutoff_freq, btype='high', fs=sample_rate)
wav_filtered = signal.filtfilt(b, a, wav)
# METHOD 2
sos = signal.butter(N=5, Wn=cutoff_freq, btype='hp', fs=sample_rate, output='sos')
wav_filtered = signal.sosfilt(sos, wav)
Picture 1 is the output of torch highpass_biquad method. The scipy approach yields similar results.
Picture 2 is the audio after applying the high-pass effect in Audacity. This is the desired output of my code.
Picture 3 is an example of output with no high pass filtering applied. Most files come out centered below 0dB.

It turns out the dc offset was produced by a scaling function in wavio when writing the file. The highpass filters were working correctly after all.
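One way to tell whether an offset comes from the filter or from a later step such as file writing is to measure the mean of the signal at each stage. The sketch below does this on a synthetic signal; the 200 Hz tone, the 0.3 offset and the second-order Butterworth design are assumptions made for the demo, and scipy's sosfiltfilt is used here rather than the torchaudio call from the question.

```python
import numpy as np
from scipy import signal

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
# Synthetic stand-in for a drum hit: a 200 Hz tone riding on a constant offset
wav = 0.5 * np.sin(2 * np.pi * 200.0 * t) + 0.3

# 15 Hz high-pass, applied forwards and backwards (zero phase)
sos = signal.butter(N=2, Wn=15.0, btype="highpass", fs=sample_rate, output="sos")
wav_filtered = signal.sosfiltfilt(sos, wav)

offset_before = float(np.mean(wav))           # close to 0.3
offset_after = float(np.mean(wav_filtered))   # close to 0
```

Running the same mean check on the samples read back from the written file would show whether the writer reintroduces an offset.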

Is there a way to add gain to an audio signal with Librosa in python?

I am currently working on augmenting audio in Python. I've been using Librosa due to its speed and simplicity, but I need to fall back on PyDub for some other utilities such as applying gain.
Is there a mathematical way to add gain to the NumPy array returned by librosa.load? In PyDub it is quite easy, but I have to constantly convert from PyDub's get_array_of_samples() to np.array and then to the proper 32-bit float representation on the [-1, 1) scale (which Librosa uses by default). I'd rather keep it all in one library for simplicity.
Normalizing the audio signal to 0 dB beforehand would be useful too. I am a bit new to a lot of the terminology used in audio signal processing.
This is what I am currently doing. Down the road I would like to make this a class method that starts from librosa's NumPy array, so a way to mathematically add a specified gain, in a given unit, to a NumPy array from librosa would be ideal.
Thanks
import librosa
import numpy as np
from pydub import AudioSegment, effects
pydub_audio = AudioSegment.from_file(audio_file_path)
pydub_audio = pydub_audio.set_frame_rate(16000) # resample to a 16 kHz frame rate
print("Original dBFS is {}".format(pydub_audio.dBFS))
pydub_audio = pydub_audio.apply_gain(20) # apply 20 dB of gain to introduce clipping
#pydub_audio = effects.normalize(pydub_audio)
print("New dBFS is {}".format(pydub_audio.dBFS))
pydub_array = pydub_audio.get_array_of_samples()
pydub_array = np.array(pydub_array)
print("PyDub audio type is {}".format(pydub_array.dtype))
pydub_array_32bitfloat = pydub_array.astype(np.float32, order = 'C') / 32768 # rescaling to between [-1, 1] like librosa
print("Rescaled Pydub type is {}".format(pydub_array_32bitfloat.dtype))
import soundfile as sf
sf.write(r"test_pydub_gain.wav", pydub_array_32bitfloat, samplerate = 16000, format = 'wav')

Thinking about it (if I am not wrong), mathematically the gain in decibels is:
gain_dB = 20 * log10(level2 / level1)
so I would multiply all elements of the array by
10**(gain_dB/20) to apply the gain.
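Applied to a float array on the [-1, 1) scale, that formula could be sketched as follows. The function names apply_gain_db and peak_normalize are hypothetical, and "normalize to 0 dB" is interpreted here as peak normalization to 0 dBFS, which is an assumption about what the question wants.

```python
import numpy as np

def apply_gain_db(samples, gain_db):
    """Scale float samples by a gain given in decibels."""
    return samples * (10.0 ** (gain_db / 20.0))

def peak_normalize(samples, target_dbfs=0.0):
    """Scale so the largest absolute sample sits at target_dbfs (0 dBFS = 1.0)."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        return samples  # pure silence: nothing to scale
    return samples * (10.0 ** (target_dbfs / 20.0) / peak)

# Stand-in for the array returned by librosa.load
y = np.array([0.0, 0.05, -0.1, 0.025], dtype=np.float32)
louder = apply_gain_db(y, 20.0)        # +20 dB multiplies every sample by 10
clipped = np.clip(louder, -1.0, 1.0)   # keep the result inside the valid range
normalized = peak_normalize(y)         # largest sample now has magnitude 1.0
```

Clipping after the gain mimics what happens when the boosted signal is written to a fixed-range format.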

Getting MFCC from a spectrogram time/ frequency series array

I have several spectrogram time/frequency files of shape [500, 1024].
I need to calculate the MFCCs of these files. There are lots of libraries for calculating MFCCs from a raw audio file, but I am looking for a method in Python that computes them directly from an np.array.

This can be done with librosa, as it allows you to pass in a spectrogram instead of an audio waveform using the parameter S.
I am assuming that you have an STFT magnitude spectrogram (linear spectrogram with phase discarded). You then need to convert this into a mel-filtered spectrogram, perform log-scaling, and then do the DCT-2 and truncation to obtain the MFCC coefficients. Skeleton code below:
import librosa
import numpy
# TODO: you need to provide these two placeholders
sr = my_samplerate # sample rate of the audio the spectrogram was computed from
S = my_stft # STFT magnitude spectrogram, shape (n_freq_bins, n_frames)
mels = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=64)
log_mels = librosa.amplitude_to_db(mels, ref=numpy.max)
mfcc = librosa.feature.mfcc(S=log_mels, sr=sr, n_mfcc=20)
See the librosa API reference for more details.
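To make the individual steps (mel filtering, log scaling, DCT-2, truncation) concrete, here is a sketch that uses only numpy and scipy. It is not librosa's implementation: the helper names are hypothetical, the mel scale is the HTK-style formula, and the filters are plain unnormalized triangles, so the numbers will differ from librosa's output. The random demo array is laid out as (frequency bins, frames); an array stored as (frames, bins) would need transposing first.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_freq_bins, sr):
    """Triangular filters, shape (n_mels, n_freq_bins), for bins spanning 0..sr/2."""
    bin_freqs = np.linspace(0.0, sr / 2.0, n_freq_bins)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_freq_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc_from_magnitude(mag, sr, n_mels=64, n_mfcc=20):
    """mag: magnitude spectrogram, shape (n_freq_bins, n_frames)."""
    power = mag ** 2
    mel_power = mel_filterbank(n_mels, mag.shape[0], sr) @ power
    log_mel = 10.0 * np.log10(np.maximum(mel_power, 1e-10))  # floor avoids log(0)
    # DCT-2 along the mel axis, then keep the first n_mfcc coefficients
    return dct(log_mel, type=2, axis=0, norm="ortho")[:n_mfcc]

rng = np.random.default_rng(0)
demo_mag = np.abs(rng.normal(size=(513, 500)))  # random stand-in for a real spectrogram
demo_mfcc = mfcc_from_magnitude(demo_mag, sr=16000)
```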

Difference between load of librosa and read of scipy.io.wavfile

I have a question about the difference between the load function of librosa and the read function of scipy.io.wavfile.
from scipy.io import wavfile
import librosa
fs, data = wavfile.read(name)
data, fs = librosa.load(name)
The imported voice file is the same file. If you run the code above, the values of the data come out of the two functions differently. I want to know why the value of the data is different.

From the docstring of librosa.core.load:
Load an audio file as a floating point time series.
Audio will be automatically resampled to the given rate (default sr=22050).
To preserve the native sampling rate of the file, use sr=None.
scipy.io.wavfile.read does not automatically resample the data, and the samples are not converted to floating point if they are integers in the file.

It's worth also mentioning that librosa.load() normalizes the data (so that all the data points are between 1 and -1), whereas wavfile.read() does not.

The data is different because scipy does not normalize the input signal.
Here is a snippet showing how to change scipy output to match librosa's:
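import numpy as np
import scipy.io.wavfile
import librosa
nbits = 16 # assumes a mono 16-bit file
l_wave, rate = librosa.core.load(path, sr=None)
rate, s_wave = scipy.io.wavfile.read(path)
s_wave = s_wave / 2 ** (nbits - 1) # in-place /= fails on an integer array
np.allclose(s_wave, l_wave)
# True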

librosa.core.load supports 24-bit audio files and 96 kHz sample rates. Because of the conversion to float and the default resampling, it can be considerably slower than scipy.io.wavfile.read in many cases.

Calculating the average FIR for bunch of wave files, plotting it and saving to txt as table

This is only my second day with Python, and I am already in trouble...
I have a lot of CD-standard (16 bit, 44100 Hz) stereo wave files and need to find their average (arithmetic mean) FIR. The algorithm is easy to state: the sum of the amplitudes at each frequency is divided by the number of files. The resulting FIR is then plotted and written to a text file as a table.
I went through some similar posts, like the exciting "Python Scipy FFT wav files", but there are still too many things, even the basics, that I lose touch with, and errors follow every time I try to repeat the examples.
I would appreciate any help that can move me out of this dead end. So, these are my shy first steps...
As the number of files may vary, it is probably useful to have a list of files at hand:
import os
a = os.path.expanduser(u"~") # absolute path of the user's home folder
b = "integrator\\files" # base folder with the files in it
c = os.path.join(a, b)
flist = os.listdir(c)
wavs = [x for x in flist if x.endswith('.wav')] # filter out non-wavs
for name in wavs:
    print(name)
print()
And it works fine for me! But I still cannot work out how to organize reading multiple files and calculating their mean FIR array.
As far as I can tell, I need something like the glob package:
import glob
import mainfile
files = glob.glob('./*.wav')
for ele in files:
    mainfile.f(ele)
quit()
where mainfile.py looks something like this:
import matplotlib.pyplot as plt
from scipy.io import wavfile # get the api
from scipy.fftpack import fft
from pylab import *
def f(filename):
    fs, data = wavfile.read(filename) # load the data
    a = data.T[0] # this is a two channel soundtrack, I get the first track
    b = [ele / 2**15. for ele in a] # this is a signed 16-bit track, now normalized to [-1, 1)
    c = fft(b) # create a list of complex numbers
    d = len(c) // 2 # you only need half of the fft list
And here I just don't know what I should do with the 'd's: sum them in a loop, or... Also, this code example handles just one channel for plotting, whereas I need the output FIR as a sequence of pairs for each channel. It is also still not clear how to change the FFT window to Hanning with an FFT size of at least 65536 (oh yes, I know the calculations are slow as hell).
In the end we can plot and save the graph:
plt.plot(abs(c[:(d-1)]),'r')
savefig(filename+'.png',bbox_inches='tight')
... and somehow write the average FIR to a txt table file.
I'd be happy enough if this script worked as a console application (though at first I dreamt of a kind of minimalistic GUI with the ability to choose any folder containing files via a browse button, and with a progress bar to make sure the app is still breathing... it is hard going, covering ten or twenty-five wavs with the slow FFT "scythe").
I have C:\Anaconda2 (with numpy, scipy and matplotlib properly installed) on a Windows 7 x86 PC.
Thank you in advance!
With regards,
Me.
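Assuming "average FIR" here means the arithmetic mean of the magnitude spectra, the summing and dividing could be sketched as below. The function name average_magnitude_spectrum is hypothetical, the input is a list of single-channel float arrays (for stereo files, each channel would be passed through separately), and the demo uses synthetic tones with a small FFT size instead of reading real files with wavfile.read.

```python
import io
import numpy as np

def average_magnitude_spectrum(signals, fs, n_fft=65536):
    """Mean Hann-windowed magnitude spectrum over all frames of all signals.

    signals: iterable of 1-D arrays (one channel each), all sampled at fs.
    """
    window = np.hanning(n_fft)
    total = np.zeros(n_fft // 2 + 1)
    count = 0
    for x in signals:
        x = np.asarray(x, dtype=float)
        if len(x) < n_fft:
            x = np.pad(x, (0, n_fft - len(x)))  # zero-pad short signals
        # half-overlapping frames of n_fft samples each
        for start in range(0, len(x) - n_fft + 1, n_fft // 2):
            frame = x[start:start + n_fft] * window
            total += np.abs(np.fft.rfft(frame))
            count += 1
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, total / count

fs = 44100
t = np.arange(2 * fs) / fs
# Two synthetic "files": the same 1 kHz tone at different levels
demo_signals = [np.sin(2 * np.pi * 1000 * t), 0.5 * np.sin(2 * np.pi * 1000 * t)]
freqs, avg = average_magnitude_spectrum(demo_signals, fs, n_fft=4096)

# Two-column table (frequency, magnitude); a file name can replace the buffer
table = np.column_stack([freqs, avg])
buffer = io.StringIO()
np.savetxt(buffer, table, header="frequency_hz magnitude")
```

The same table can be passed to plt.plot(freqs, avg) for the graph; averaging over frames keeps memory use independent of file length.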
