I have a question about the difference between the load function of librosa and the read function of scipy.io.wavfile.
from scipy.io import wavfile
import librosa
fs, data = wavfile.read(name)
data, fs = librosa.load(name)
The imported voice file is the same file. If you run the code above, the values of the data come out of the two functions differently. I want to know why the value of the data is different.
From the docstring of librosa.core.load:
Load an audio file as a floating point time series.
Audio will be automatically resampled to the given rate (default sr=22050).
To preserve the native sampling rate of the file, use sr=None.
scipy.io.wavfile.read does not automatically resample the data, and the samples are not converted to floating point if they are integers in the file.
It's worth also mentioning that librosa.load() normalizes the data (so that all the data points are between 1 and -1), whereas wavfile.read() does not.
The data is different because scipy does not normalize the input signal.
Here is a snippet showing how to change scipy output to match librosa's:
nbits = 16
l_wave, rate = librosa.core.load(path, sr=None)
rate, s_wave = scipy.io.wavfile.read(path)
s_wave /= 2 ** (nbits - 1)
all(s_wave == l_wave)
# True
librosa.core.load has support for 24 bit audio files and 96kHz sample rates. Because of this, converting to float and default resampling, it can be considerably slower than scipy.io.wavfile.read in many cases.
Related
Learning how to modify different types of audio files, .wav, .mp3, etcetera using Python3 using the wave module. Specifically .wav file format, in this regard for this question. Presently, I know there are ISO standards for audio formats, and any references for this subject are greatly appreciated regarding audio standards for the .wav file format as well on a side note.
But in terms of my question, simply ignoring the RIFF, FMT headers, in a .wav file using the Python3 wave module import.
Is there a more efficient way to skip the RIFF headers, other containers, and go straight to the data container to modify its contents?
This crude example simply is converting a two-channel audio .wav file to a single-channel audio .wav file while modifying all values to (0, 0).
import wave
import struct
# Open Files
inf = wave.open(r"piano2.wav", 'rb')
outf = wave.open(r"output.wav", 'wb')
# Input Parameters
ip = list(inf.getparams())
print('Input Parameters:', ip)
# Example Output: Input Parameters: [2, 2, 48000, 302712, 'NONE', 'not compressed']
# Output Parameters
op = ip[:]
op[0] = 1
outf.setparams(op)
number_of_channels, sample_width, frame_rate, number_of_frames, comp_type, comp_name = ip
format = '<{}h'.format(number_of_channels)
print('# Channels:', format)
# Read >> Second
for index in range(number_of_frames):
frame = inf.readframes(1)
data = struct.unpack(format, frame)
# Here, I change data to (0, 0), testing purposes
print('Before Audio Data:', data)
print('After Modifying Audio Data', (0, 0))
# Change Audio Data
data = (0, 0)
value = data[0]
value = (value * 2) // 3
outf.writeframes(struct.pack('<h', value))
# Close In File
inf.close()
# Close Out File
outf.close()
Is there a better practice or reference material if simply just modifying data segments of .wav files?
Say you wanted to literally add a sound at a specific timestamp, that would be a more appropriate result to my question.
Performance comparison
Let's examine first 3 ways to read WAVE files.
The slowest one - wave module
As you might have noticed already, wave module can be painfully slow. Consider this code:
import wave
import struct
wavefile = wave.open('your.wav', 'r') # check e.g. freesound.org for samples
length = wavefile.getnframes()
for i in range(0, length):
wavedata = wavefile.readframes(1)
data = struct.unpack("<h", wavedata)
For a WAVE as defined below:
Input File : 'audio.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:09:35.71 = 27634080 samples ~ 43178.2 CDDA sectors
File Size : 55.3M
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
it took on average 27.7s to load the full audio. The flip side to the wave module it is that is available out of the box and will work on any system.
The convenient one - audiofile
A much more convenient and faster solution is e.g. audiofile. According to the project description, its focus is on reading speed.
import audiofile as af
signal, sampling_rate = af.read(audio.wav)
This gave me on average 33 ms to read the mentioned file.
The fastest one - numpy
If we decide to skip header (as OP asks) and go solely for speed, numpy is a great choice:
import numpy as np
byte_length = np.fromfile(filename, dtype=np.int32, count=1, offset=40)[0]
data = np.fromfile(filename, dtype=np.int16, count=byte_length // np.dtype(np.int16).itemsize, offset=44)
The header structure (that tells us what offset to use) is defined here.
The execution of that code takes ~6 ms, 5x less than the audioread. Naturally it comes with a price / preconditions: we need to know in advance what is the data type.
Modifying the audio
Once you have the audio in a numpy array, you can modify it at will, you can also decide to stream the file rather than reading everything at once. Be warned though: since sound is a wave, in a typical scenario simply injecting new data at arbitrary time t will lead to distortion of that audio (unless it was silence).
As for writing the stream back, "modifying the container" would be terribly slow in Python. That's why you should either use arrays or switch to a more suitable language (e.g. C).
If we go with arrays, we should mind that numpy knows nothing about the WAVE format and therefore we'd have to define the header ourselves and write individual bytes. Perfectly feasible exercise, but clunky. Luckily, scipy provides a convenient function that has the benefits of numpy speed (it uses numpy underneath), while making the code much more readable:
from scipy.io.wavfile import write
fs = np.fromfile('audio.wav', dtype=np.int32, count=1, offset=24)[0] # we need sample rate
with open('audio_out.wav', 'a') as fout:
new_data = data.append(np.zeros(2 * fs)) # append 2 seconds of zeros
write(fout, fs, new_data)
It could be done in a loop, where you read a chunk with numpy / scipy, modify the array (data) and write to the file (with a for append).
I basically have this audio file that is 16-bit PCM WAV mono at 44100hz and I'm trying to convert it into a spectrogram. But I want a spectrogram of the audio every 20ms (Trying this for speech recognition), but whenever I try to compare what I have to Audacity, its really different. I'm kind of new to python so I was trying to base this off of my java knowledge. Any help would be appreciated. I think I'm either splitting the read samples incorrectly (What I did was split it every 220 elements in the array since I believe Audio Data is just samples in the time domain to get it to 20ms audio)
Here's the code I have right now:
import librosa.display
import numpy
audioPath = 'C:\\Users\\pawar\\Desktop\\Resister.wav'
audioData, sampleRate = librosa.load(audioPath, sr=None)
print(sampleRate)
new = numpy.zeros(shape=(220, 1))
counter = 0
for i in range(0, len(audioData), 882):
new = (audioData[i: i + 882])
STFT = librosa.stft(new, n_fft=882)
print(type(STFT))
audioDatainDB = librosa.amplitude_to_db(abs(STFT))
print(type(audioDatainDB))
librosa.display.specshow(audioDatainDB, sr=sampleRate, x_axis='time', y_axis='hz')
#plt.figure(figsize=(20,10))
plt.show()
counter += 1
print("Your local debug print statement ", counter)
As for the values, well I was playing around with them quite a bit trying to get it to work. Doesn't seem to be of any luck ;/
Here's the output:
https://i.stack.imgur.com/EVntx.png
And here's what Audacity shows:
https://i.stack.imgur.com/GIGy8.png
I know its not 20ms in the audacity one but you can see the two don't look even a bit similar
I am currently working on augmenting audio in Python. I've been using Librosa due to its speed and simplicity but need to fallback on PyDub for some other utilities such as applying gain.
Is there a mathematical way to add gain to the Numpy array provided with librosa.load? In PyDub it is quite easy but I have to constantly convert back between Pydub's get_array_of_samples() to np.array then to the proper 32 bit float representation on the [-1,1) scale (that Librosa uses by default). I'd rather keep it all in one library for simplicity.
Also a normalization of an audio signal to 0 db gain beforehand would be useful too. I am a bit new to a lot of the terminology used in audio signal processing.
This is what I am currently doing. Down the road I would like to make this a class method which starts with using librosa's numpy array, so if there is a way to mathematically add specified gain in a certain unit to a numpy array from librosa that would be ideal.
Thanks
import librosa
import numpy as np
from pydub import AudioSegment, effects
pydub_audio = AudioSegment.from_file(audio_file_path)
pydub_audio = pydub_audio.set_frame_rate(16000) # make file 16k khz frame rate
print("Original dBFS is {}".format(pydub_audio.dBFS))
pydub_audio = pydub_audio.apply_gain(20) # apply 20db of gain to introduce clipping
#pydub_audio = effects.normalize(pydub_audio)
print("New dBFS is {}".format(pydub_audio.dBFS))
pydub_array = pydub_audio.get_array_of_samples()
pydub_array = np.array(pydub_array)
print("PyDub audio type is {}".format(pydub_array.dtype))
pydub_array_32bitfloat = pydub_array.astype(np.float32, order = 'C') / 32768 # rescaling to between [-1, 1] like librosa
print("Rescaled Pydub type is {}".format(pydub_array_32bitfloat.dtype))
import soundfile as sf
sf.write(r"test_pydub_gain.wav", pydub_array_32bitfloat, samplerate = 16000, format = 'wav')
thinking about it, (if i am not wrong), mathematicaly the gain is:
dBFS = 20 * log (level2 / level1)
so i would multiply all elements of the array by
10**(dBFS/20) to apply the gain
In Python, I have an array of floats representing the voltages of an analog signal.
Can anyone explain how I can change the array into a .wav format? I have seen this
Do I first need to change the data format from [1.23,1.24,1.25,1.26] (for example) to 1.231.241.251.26 before adding the headers so that it's read correctly?
I eventually plan on using FFT on the values to derive the fundamental frequencies is there a better way to store the values in this case?
Thank you
If you know the sampling frequency of your signal and data is already scaled appropriately by max(abs(data)) then you can do it very easily using scipy:
from __future__ import print_function
import scipy.io.wavfile as wavf
import numpy as np
if __name__ == "__main__":
samples = np.random.randn(44100)
fs = 44100
out_f = 'out.wav'
wavf.write(out_f, fs, samples)
You can also use the standard wave module.
Sorry if I submit a duplicate, but I wonder if there is any lib in python which makes you able to extract sound spectrum from audio files. I want to be able to take an audio file and write an algoritm which will return a set of data {TimeStampInFile; Frequency-Amplitude}.
I heard that this is usually called Beat Detection, but as far as I see beat detection is not a precise method, it is good only for visualisation, while I want to manipulate on the extracted data and then convert it back to an audio file. I don't need to do this real-time.
I will appreciate any suggestions and recommendations.
You can compute and visualize the spectrum and the spectrogram this using scipy, for this test i used this audio file: vignesh.wav
from scipy.io import wavfile # scipy library to read wav files
import numpy as np
AudioName = "vignesh.wav" # Audio File
fs, Audiodata = wavfile.read(AudioName)
# Plot the audio signal in time
import matplotlib.pyplot as plt
plt.plot(Audiodata)
plt.title('Audio signal in time',size=16)
# spectrum
from scipy.fftpack import fft # fourier transform
n = len(Audiodata)
AudioFreq = fft(Audiodata)
AudioFreq = AudioFreq[0:int(np.ceil((n+1)/2.0))] #Half of the spectrum
MagFreq = np.abs(AudioFreq) # Magnitude
MagFreq = MagFreq / float(n)
# power spectrum
MagFreq = MagFreq**2
if n % 2 > 0: # ffte odd
MagFreq[1:len(MagFreq)] = MagFreq[1:len(MagFreq)] * 2
else:# fft even
MagFreq[1:len(MagFreq) -1] = MagFreq[1:len(MagFreq) - 1] * 2
plt.figure()
freqAxis = np.arange(0,int(np.ceil((n+1)/2.0)), 1.0) * (fs / n);
plt.plot(freqAxis/1000.0, 10*np.log10(MagFreq)) #Power spectrum
plt.xlabel('Frequency (kHz)'); plt.ylabel('Power spectrum (dB)');
#Spectrogram
from scipy import signal
N = 512 #Number of point in the fft
f, t, Sxx = signal.spectrogram(Audiodata, fs,window = signal.blackman(N),nfft=N)
plt.figure()
plt.pcolormesh(t, f,10*np.log10(Sxx)) # dB spectrogram
#plt.pcolormesh(t, f,Sxx) # Lineal spectrogram
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [seg]')
plt.title('Spectrogram with scipy.signal',size=16);
plt.show()
i tested all the code and it works, you need, numpy, matplotlib and scipy.
cheers
I think your question has three separate parts:
How to load audio files into python?
How to calculate spectrum in python?
What to do with the spectrum?
1. How to load audio files in python?
You are probably best off by using scipy, as it provides a lot of signal processing functions. For loading audio files:
import scipy.io.wavfile
samplerate, data = scipy.io.wavfile.read("mywav.wav")
Now you have the sample rate (samples/s) in samplerate and data as a numpy.array in data. You may want to transform the data into floating point, depending on your application.
There is also a standard python module wave for loading wav-files, but numpy/scipy offers a simpler interface and more options for signal processing.
2. How to calculate the spectrum
Brief answer: Use FFT. For more words of wisdom, see:
Analyze audio using Fast Fourier Transform
Longer answer is quite long. Windowing is very important, otherwise you'll have strange spectra.
3. What to do with the spectrum
This is a bit more difficult. Filtering is often performed in time domain for longer signals. Maybe if you tell us what you want to accomplish, you'll receive a good answer for this one. Calculating the frequency spectrum is one thing, getting meaningful results with it in signal processing is a bit more complicated.
(I know you did not ask this one, but I see it coming with a probability >> 0. Of course, it may be that you have good knowledge on audio signal processing, in which case this is irrelevant.)