Reading a wav file with scipy and librosa in Python

I am trying to load a .wav file in Python using the scipy package. My final objective is to create the spectrogram of that audio file. The code for reading the file can be summarized as follows:
import scipy.io.wavfile as wav
(sig, rate) = wav.read(_wav_file_)
For some .wav files I am receiving the following error:
WavFileWarning: Chunk (non-data) not understood, skipping it.
ValueError: Incomplete wav chunk.
Therefore, I decided to use librosa to read the files:
import librosa
(sig, rate) = librosa.load(_wav_file_, sr=None)
That works for all cases; however, I noticed a difference in the colors of the spectrogram. It was the exact same figure, yet the colors were somehow inverted. More specifically, when I kept the same function for computing the spectrogram and changed only the way I read the .wav file, this difference appeared. Any idea what could cause that? Is there a default difference between the way the two approaches read the .wav file?
EDIT:
(rate1, sig1) = wav.read(spec_file) # rate1 = 16000
sig, rate = librosa.load(spec_file) # rate 22050
sig = np.array(α*sig, dtype = "int16")
Something that almost worked was to multiply sig by a constant α, the ratio between the max value of the signal from scipy's wavread and the signal derived from librosa. Still, though, the sample rates were different.
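For reference, a minimal sketch of that idea (assuming the .wav holds 16-bit PCM; the file path and the 32768 scale factor are my own choices), keeping librosa at the file's native rate with sr=None and scaling its float output into the int16 range that scipy returns:
import numpy as np
import scipy.io.wavfile as wav
import librosa

spec_file = "example.wav"                         # hypothetical path
rate1, sig1 = wav.read(spec_file)                 # int16 samples at the native rate
sig2, rate2 = librosa.load(spec_file, sr=None)    # float32 in [-1, 1], same rate as sig1
sig2_int = np.clip(sig2 * 32768, -32768, 32767).astype(np.int16)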

This sounds like a quantization problem. If samples in the wave file are stored as floats and librosa just performs a straight cast to an int, any value less than 1 will be truncated to 0. More than likely, this is why sig is an array of all zeros. The float must be scaled to map it into the range of an int. For example,
>>> a = sp.randn(10)
>>> a
array([-0.04250369,  0.244113  ,  0.64479281, -0.3665814 , -0.2836227 ,
       -0.27808428, -0.07668698, -1.3104602 ,  0.95253315, -0.56778205])
Convert a to type int without scaling
>>> a.astype(int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Convert a to int with scaling for 16-bit integer
>>> b = (a * 32767).astype(int)
>>> b
array([ -1392,   7998,  21127, -12011,  -9293,  -9111,  -2512, -42939,
        31211, -18604])
Convert scaled int back to float
>>> c = b/32767.0
>>> c
array([-0.04248177,  0.24408704,  0.64476455, -0.36655782, -0.28360851,
       -0.27805414, -0.0766625 , -1.31043428,  0.9525132 , -0.56776635])
c and a are only equal to about 3 or 4 decimal places due to the quantization to int.
If librosa is returning a float, you can scale it by 2**15 and cast it to an int to get the same range of values that the scipy wave reader returns. Since librosa returns a float, chances are the values lie within a much smaller range, such as [-1, +1], than a 16-bit integer, which lies in [-32768, +32767]. So you need to scale one to get the ranges to match. For example,
sig, rate = librosa.load(spec_file, mono=True)
sig = sig * 32767

If you do not want to do the quantization yourself, you could let pylab handle it for you with the pylab.specgram function. You can look inside that function and see how it uses vmin and vmax.
It is not completely clear from your post (at least to me) what you want to achieve, since there is neither a sample input file nor any preceding script from you. But anyway, to check whether the spectrogram of a wave file differs significantly depending on whether the signal data returned by the read function is float32 or int, I tested the following 3 functions.
Python Script:
_wav_file_ = "africa-toto.wav"

def spectogram_librosa(_wav_file_):
    import librosa
    import pylab
    import numpy as np

    (sig, rate) = librosa.load(_wav_file_, sr=None, mono=True, dtype=np.float32)
    pylab.specgram(sig, Fs=rate)
    pylab.savefig('spectrogram3.png')

def graph_spectrogram_wave(wav_file):
    import wave
    import pylab

    def get_wav_info(wav_file):
        wav = wave.open(wav_file, 'r')
        frames = wav.readframes(-1)
        sound_info = pylab.fromstring(frames, 'int16')
        frame_rate = wav.getframerate()
        wav.close()
        return sound_info, frame_rate

    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=3, figsize=(10, 6))
    pylab.title('spectrogram pylab with wav_file')
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram2.png')

def graph_wavfileread(_wav_file_):
    import matplotlib.pyplot as plt
    from scipy import signal
    from scipy.io import wavfile
    import numpy as np

    sample_rate, samples = wavfile.read(_wav_file_)
    frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate, nfft=1024)
    plt.pcolormesh(times, frequencies, 10*np.log10(spectrogram))
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig("spectogram1.png")

spectogram_librosa(_wav_file_)
#graph_wavfileread(_wav_file_)
#graph_spectrogram_wave(_wav_file_)
which produced three output spectrogram images that, apart from minor differences in size and intensity, look quite similar no matter the read method, library, or data type. That makes me question a little for what purpose the outputs need to be 'exactly' the same and how exact they should be.
I do find it strange, though, that the librosa.load() function offers a dtype parameter but works only with float values anyway. Googling in this regard led me only to this issue, which wasn't much help, and this issue says that this is how it will stay with librosa, since internally it seems to use only floats.

To add on to what has been said, Librosa has a utility to convert integer arrays to floats.
float_audio = librosa.util.buf_to_float(sig)
I have used this to great success when producing spectrograms of Pydub AudioSegments. Keep in mind that one of its arguments is the number of bytes per sample; it defaults to 2. You can read more about it in the documentation here. Here is the source code:
def buf_to_float(x, n_bytes=2, dtype=np.float32):
    """Convert an integer buffer to floating point values.
    This is primarily useful when loading integer-valued wav data
    into numpy arrays.

    See Also
    --------
    buf_to_float

    Parameters
    ----------
    x : np.ndarray [dtype=int]
        The integer-valued data buffer

    n_bytes : int [1, 2, 4]
        The number of bytes per sample in `x`

    dtype : numeric type
        The target output type (default: 32-bit float)

    Returns
    -------
    x_float : np.ndarray [dtype=float]
        The input data buffer cast to floating point
    """
    # Invert the scale of the data
    scale = 1. / float(1 << ((8 * n_bytes) - 1))

    # Construct the format string
    fmt = '<i{:d}'.format(n_bytes)

    # Rescale and format the data buffer
    return scale * np.frombuffer(x, fmt).astype(dtype)
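For example, a small usage sketch with a Pydub AudioSegment (the file name is hypothetical; n_bytes is taken from the segment's sample width so the scaling matches the source data, and stereo samples stay interleaved):
import librosa
from pydub import AudioSegment

seg = AudioSegment.from_wav("clip.wav")       # hypothetical file
float_audio = librosa.util.buf_to_float(seg.raw_data, n_bytes=seg.sample_width)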

Related

Generated sine wave sounds buzzy or 'square' instead of sine

I asked a similar question earlier, but I made the question more complex than it had to be. I am generating a 100 Hz sine wave that I then play back using simpleaudio.
Note: I also had this problem when I encoded the wave to a .wav file; it sounded exactly the same as with simpleaudio. Also, changing channels from 2 to 1 changes the sound, but does not fix this problem.
To install simple audio:
sudo apt-get install -y python3-dev libasound2-dev
python -m pip install simpleaudio
Stand alone code:
import numpy as np
import simpleaudio as sa
import matplotlib.pyplot as plt

def generate_sine_tone(numsamples, sample_time, frequency):
    t = np.arange(numsamples) * sample_time  # Time vector
    signal = 8388605*np.sin(2*np.pi * frequency*t)
    return signal

if __name__ == "__main__":
    duration = 1
    samprate = 44100  # Sampling rate
    numsamples = samprate*duration  # Sample count
    st = 1.0 / samprate  # Sample time
    t = np.arange(numsamples) * st  # Time vector
    nchannels = 2
    sampwidth = 3

    signal = generate_sine_tone(numsamples, st, 100)
    signal2 = np.asarray([ int(x) for x in signal ])

    play_obj = sa.play_buffer(signal2, nchannels, sampwidth, samprate)
    print(signal2)

    plt.figure(0)
    plt.plot(signal2)
    plt.show()
Running this in the command line will produce a graph of the sine wave for 1 second (44100 samples), which is 100 periods of the sine wave. It will also play the sound through your speakers, so turn your system sound down a good bit before running.
My other posts on this issue: Trying to generate a sine wave '.wav' file in Python. Comes out as a square wave
https://music.stackexchange.com/questions/110688/generated-sine-wave-in-python-comes-out-buzzy-or-square-ey
expected sound: https://www.youtube.com/watch?v=eDk1bOX-P3w&t=4s
received sound (approx): https://www.youtube.com/watch?v=F7DnVBJ9R34
This problem is annoying me sooo much, I would greatly appreciate any help that can be provided.
There are two problems here.
The lesser one is that you are creating a single array and playing it back as if it were stereo. You need to set nchannels = 1 (or duplicate all the values by creating an array with two columns).
The other problem is trying to create 24-bit samples. Very few people have good enough equipment and good enough ears to tell the difference between 24-bit and 16-bit audio. Using a sample width of 2 makes things much easier. You can generate 24-bit samples if you wish and normalize them to 16-bit for playback: signal *= 32767 / np.max(np.abs(signal))
This code works:
import numpy as np
import simpleaudio as sa

def generate_sine_tone(numsamples, sample_time, frequency):
    t = np.arange(numsamples) * sample_time  # Time vector
    signal = 32767*np.sin(2*np.pi * frequency*t)
    return signal

duration = 1
samprate = 44100  # Sampling rate
numsamples = samprate*duration  # Sample count
st = 1.0 / samprate  # Sample time
nchannels = 1
sampwidth = 2

signal = generate_sine_tone(numsamples, st, 100)
signal2 = signal.astype(np.int16)
#signal2 = np.asarray([ int(x) for x in signal ])

play_obj = sa.play_buffer(signal2, nchannels, sampwidth, samprate)
play_obj.wait_done()
The simpleaudio.play_buffer() function does not convert your data. It only takes the exact memory buffer (i.e. the buffer it gets from the object you gave) and interprets it as what you claim it to contain. In your program your description of what the buffer contains (2 * 3 byte items) is not what it actually contains (1 * 8 byte items). Unfortunately in your example program this does not result in an error, because the size of the buffer you gave it coincidentally happens to be an exact multiple of 6, the size in bytes you claim your memory buffer's items to have. If you try it with one more sample, numsamples = 44101, you will get an error, because 44101 * 8 is not divisible by 6:
ValueError: Buffer size (in bytes) is not a multiple of bytes-per-sample and the number of channels.
Try what print(signal2.itemsize) shows. It's not the 3 * 2 that you claim it to be in your call to simpleaudio.play_buffer(). If the following is still correct, there's no way to get 24 bit buffers from Numpy even if you tried to: NumPy: 3-byte, 6-byte types (aka uint24, uint48)
And perhaps that's why the tutorial tells you to just use 16-bit data type for Numpy buffers, see https://github.com/hamiltron/py-simple-audio/blob/master/docs/tutorial.rst
Numpy arrays can be used to store audio but there are a few crucial
requirements. If they are to store stereo audio, the array must have
two columns since each column contains one channel of audio data. They
must also have a signed 16-bit integer dtype and the sample amplitude
values must consequently fall in the range of -32768 to 32767.
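As a hedged illustration of those requirements (the 100 Hz tone and half-scale amplitude are arbitrary choices of mine), a stereo int16 buffer would look like this:
import numpy as np
import simpleaudio as sa

samprate = 44100
t = np.arange(samprate) / samprate                    # one second of samples
mono = (32767 * 0.5 * np.sin(2 * np.pi * 100 * t)).astype(np.int16)
stereo = np.column_stack([mono, mono])                # shape (n, 2), dtype int16
sa.play_buffer(stereo, 2, 2, samprate).wait_done()    # 2 channels, 2 bytes/sample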
What are these "buffers"? They are a way for Python objects to pass low-level raw byte data between each other and libraries written in e.g. C. See this: https://docs.python.org/3/c-api/buffer.html or this: https://jakevdp.github.io/blog/2014/05/05/introduction-to-the-python-buffer-protocol/
If you want to create 24 bit buffers from your audio data, then you'll have to use some other library or low-level byte-by-byte hacking for creating the memory buffer, because Numpy won't do it for you. But you might be able to use dtype=numpy.float32 to get 32-bit floats that have 4-byte samples per channel. Simpleaudio detects this from the sample size, for example for Alsa:
https://github.com/hamiltron/py-simple-audio/blob/master/c_src/simpleaudio_alsa.c
/* set that format appropriately */
if (bytes_per_chan == 1) {
    sample_format = SND_PCM_FORMAT_U8;
} else if (bytes_per_chan == 2) {
    sample_format = SND_PCM_FORMAT_S16_LE;
} else if (bytes_per_chan == 3) {
    sample_format = SND_PCM_FORMAT_S24_3LE;
} else if (bytes_per_chan == 4) {
    sample_format = SND_PCM_FORMAT_FLOAT_LE;
} else {
    ALSA_EXCEPTION("Unsupported Sample Format.", 0, "", err_msg_buf);
    return NULL;
}
That's a little bit like using the weight of a vehicle for determining if it's a car, a motorcycle or a bicycle. It works, but it might feel odd to only be asked about the weight of a vehicle and not at all about its type.
So. To fix your program, use the dtype parameter of asarray() to convert your data to the buffer format you want, and declare the correct format in play_buffer(). And perhaps remove the scaling factor 8388605 from the sine generation, replace it with whatever you actually want and place it somewhere near the format specification.
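A hedged sketch of that fix using float32 (4-byte samples per channel); treating 4 bytes/sample as float is an assumption based on the ALSA source shown above, so the int16 route in the other answer remains the safer choice:
import numpy as np
import simpleaudio as sa

samprate = 44100
t = np.arange(samprate) / samprate
signal = np.sin(2 * np.pi * 100 * t)                # amplitude already in [-1, 1]
signal2 = np.asarray(signal, dtype=np.float32)      # declare the buffer format explicitly
play_obj = sa.play_buffer(signal2, 1, 4, samprate)  # 1 channel, 4 bytes per sample
play_obj.wait_done()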

How to Encode a 16 Bit WAV file in 8 bit in Python?

I am trying to play a sound from a sawtooth wave. I created the waveform in Python and was able to save it as a WAV file, but when I try to play it it says the file is unplayable because the file type is unsupported, the file extension is incorrect, or the file is corrupt. I used this individual's tutorial (https://thehackerdiary.wordpress.com/2017/06/09/it-is-ridiculously-easy-to-generate-any-audio-signal-using-python/) and they worked around it by encoding the raw waveform from 16 bit to 8 bit in Audacity. How can this be done using only Python?
import soundfile
data, samplerate = soundfile.read('sawtooth_100_hz.wav')
soundfile.write('sawtooth_100_hz_8bit.wav', data, samplerate, subtype='PCM_S8')
^^ I tried this and got the following error: ValueError: Invalid combination of format, subtype and endian
I think the people who wrote this tutorial took the long way around. There is an easier way to convert a NumPy array to a wav file, which is used below to generate the same wav file as the one generated in the tutorial:
import numpy as np
from scipy.io import wavfile
sampling_rate = 44100
freq = 440
samples = 44100
x = np.arange(samples)
y = 100*np.sin(2 * np.pi * freq * x / sampling_rate)
wavfile.write("test.wav", sampling_rate, y)
And you can use the wavfile.read() method to read this file back with no problem.
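For example, a minimal read-back sketch (note that because y above is float64, the samples come back as float64 rather than int16):
from scipy.io import wavfile

rate, data = wavfile.read("test.wav")
print(rate, data.dtype, data[:5])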
Surprisingly, the underlying libsndfile library doesn't support WAV files with signed 8 bit samples (only unsigned), see http://www.mega-nerd.com/libsndfile/#Features.
You can also check this with the soundfile module:
>>> import soundfile as sf
>>> sf.available_subtypes('wav')
{'PCM_16': 'Signed 16 bit PCM', 'PCM_24': 'Signed 24 bit PCM', 'PCM_32': 'Signed 32 bit PCM', 'PCM_U8': 'Unsigned 8 bit PCM', 'FLOAT': '32 bit float', 'DOUBLE': '64 bit float', 'ULAW': 'U-Law', 'ALAW': 'A-Law', 'IMA_ADPCM': 'IMA ADPCM', 'MS_ADPCM': 'Microsoft ADPCM', 'GSM610': 'GSM 6.10', 'G721_32': '32kbs G721 ADPCM'}
You could try using AIFF or FLAC instead?
Or you could create a RAW file (i.e. a headerless file containing no information about its own data format), which is incidentally what they did in the tutorial you were mentioning (note that they are using these options: -t raw -e signed -b 8).
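As a hedged sketch of those two alternatives (the subtype names come from available_subtypes() above; whether RAW PCM_S8 works depends on your libsndfile build):
import soundfile as sf

data, samplerate = sf.read('sawtooth_100_hz.wav')

# Unsigned 8-bit PCM in a WAV container (PCM_U8 is supported)
sf.write('sawtooth_100_hz_u8.wav', data, samplerate, subtype='PCM_U8')

# Headerless RAW file with signed 8-bit samples (format must be given explicitly)
sf.write('sawtooth_100_hz_s8.raw', data, samplerate, format='RAW', subtype='PCM_S8')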
For a bit more information about creating and playing signals, see:
https://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/simple-signals.ipynb
https://nbviewer.jupyter.org/github/spatialaudio/communication-acoustics-exercises/blob/master/intro.ipynb
It sounds like you just want to generate the samples and play them back from within Python?
If so, it looks like library "sounddevice" will let you write samples directly to your audio device:
https://python-sounddevice.readthedocs.io/en/0.3.15/usage.html#playback
I'm not in a python environment right now, so haven't tested, but mixing it with your sample code would just be:
import sounddevice as sd
import numpy as np
sampling_rate = 44100
freq = 440
samples = 44100
x = np.arange(samples)
y = 100*np.sin(2 * np.pi * freq * x / sampling_rate)
sd.play(y, sampling_rate)
Sounddevice's author is on SO, see his reply to a similar question: https://stackoverflow.com/a/34179010/1339735
You might have some scaling to do - not sure if it accepts values from -1 to 1 like most float playback or +/- 100 like in your example.
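For example, a minimal scaling sketch on top of the snippet above (assumption: sounddevice expects floats roughly in [-1, 1]):
import numpy as np
import sounddevice as sd

y = y / np.max(np.abs(y))   # bring the +/-100 amplitude down to [-1, 1]
sd.play(y, sampling_rate)
sd.wait()                   # block until playback finishes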
All of the above answers were helpful, but ultimately I found a solution to my problem from this thread: How to generate audio from a numpy array?
This was my code:
import numpy as np
from scipy.io.wavfile import write
from scipy import signal as sg
#data = np.random.uniform(-1,1,44100) # 44100 random samples between -1 and 1
sampling_rate = 44100 ## Sampling Rate
freq = 150 ## Frequency (in Hz)
duration = 3 # in seconds, may be float
t = np.linspace(0, duration, sampling_rate*duration) # Creating time vector
data = sg.sawtooth(2 * np.pi * freq * t, 0) # Sawtooth signal
'''
Scaling data to 16 bit. Divide each number by max number in array to get
fraction and multiply data by 32767 because that is the max value a 16 bit
integer can take
'''
scaled = np.int16(data/np.max(np.abs(data)) * 32767)
write('test.wav', 44100, scaled) # Write to file. Can be overridden

How to manipulate wav file data in Python?

I'm trying to read a wav file, then manipulate its contents, sample by sample
Here's what I have so far:
import scipy.io.wavfile
import math

rate, data = scipy.io.wavfile.read('xenencounter_23.wav')

for i in range(len(data)):
    data[i][0] = math.sin(data[i][0])
    print data[i][0]
The result I get is:
0
0
0
0
0
0
etc
It is reading properly, because if I write print data[i] instead, I usually get non-zero arrays of size 2.
The array data returned by wavfile.read is a numpy array with an integer data type. The data type of a numpy array can not be changed in place, so this line:
data[i][0] = math.sin(data[i][0])
casts the result of math.sin to an integer, which will always be 0.
Instead of that line, create a new floating point array to store your computed result.
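A minimal sketch of that first approach (it reuses rate and data from the question's wavfile.read call; the float64 array is just one way you might allocate the result):
import math
import numpy as np

float_data = np.zeros(data.shape, dtype=np.float64)
for i in range(len(data)):
    float_data[i][0] = math.sin(data[i][0])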
Or use numpy.sin to compute the sine of all the elements in the array at once:
import numpy as np
import scipy.io.wavfile
rate, data = scipy.io.wavfile.read('xenencounter_23.wav')
sin_data = np.sin(data)
print sin_data
From your additional comments, it appears that you want to take the sine of each value and write out the result as a new wav file.
Here is an example that (I think) does what you want. I'll use the file 'M1F1-int16-AFsp.wav' from here: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Samples.html. The function show_info is just a convenient way to illustrate the results of each step. If you are using an interactive shell, you can use it to inspect the variables and their attributes.
import numpy as np
from scipy.io import wavfile

def show_info(aname, a):
    print "Array", aname
    print "shape:", a.shape
    print "dtype:", a.dtype
    print "min, max:", a.min(), a.max()
    print

rate, data = wavfile.read('M1F1-int16-AFsp.wav')
show_info("data", data)

# Take the sine of each element in `data`.
# The np.sin function is "vectorized", so there is no need
# for a Python loop here.
sindata = np.sin(data)
show_info("sindata", sindata)

# Scale up the values to 16 bit integer range and round
# the value.
scaled = np.round(32767*sindata)
show_info("scaled", scaled)

# Cast `scaled` to an array with a 16 bit signed integer data type.
newdata = scaled.astype(np.int16)
show_info("newdata", newdata)

# Write the data to 'newname.wav'
wavfile.write('newname.wav', rate, newdata)
Here's the output. (The initial warning means there is perhaps some metadata in the file that is not understood by scipy.io.wavfile.read.)
<snip>/scipy/io/wavfile.py:147: WavFileWarning: Chunk (non-data) not understood, skipping it.
WavFileWarning)
Array 'data'
shape: (23493, 2)
dtype: int16
min, max: -7125 14325
Array 'sindata'
shape: (23493, 2)
dtype: float32
min, max: -0.999992 0.999991
Array 'scaled'
shape: (23493, 2)
dtype: float32
min, max: -32767.0 32767.0
Array 'newdata'
shape: (23493, 2)
dtype: int16
min, max: -32767 32767
The new file 'newname.wav' contains two channels of signed 16 bit values.

How to read and write 24-bit wav file using scipy or common alternative?

Frequently, wav files are or need to be 24-bit, yet I do not see a way to write or read 24-bit wav files using the scipy module. The documentation for wavfile.write() states that the resolution of the wav file is determined by the data type. That must mean 24-bit is not supported, since I do not know of a 24-bit integer data type. If an alternative is necessary, it would be nice if it were common, so that files can be easily exchanged without others who use scipy needing to install an additional module.
import numpy as np
import scipy.io.wavfile as wavfile
fs=48000
t=1
nc=2
nbits=24
x = np.random.rand(t*fs,nc) * 2 - 1
wavfile.write('white.wav', fs, (x*(2**(nbits-1)-1)).astype(np.int32))
This is very easy using PySoundFile:
import soundfile as sf
x = ...
fs = ...
sf.write('white.wav', x, fs, subtype='PCM_24')
The conversion from floating point to PCM is done automatically.
See also my other answer.
UPDATE:
In PySoundFile version 0.8.0, the argument order of sf.write() was changed.
Now the file name comes first, and the data array is the second argument. I've changed this in the example above.
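If you later want to read the 24-bit file back, a small sketch: by default the samples are returned as float64 in [-1, 1); passing dtype='int32' gives integers, with the 24-bit values placed in the most significant bytes.
import soundfile as sf

x, fs = sf.read('white.wav')                      # float64 samples
x_int, fs = sf.read('white.wav', dtype='int32')   # int32 samples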
I have come across this problem as well. I have a buffer containing all the 32-bit signed samples, while in each sample only 24 bits are used (the highest 8 bits are zero padding, even if the number is negative). My solution is:
samples_4byte = self.buffer.tobytes()
byte_format = ('%ds %dx ' % (3, 1)) * self.sample_len * 2
samples_3byte = b''.join(struct.unpack(byte_format, samples_4byte))
Now I have a bytearray that can be written into the wave file:
with wave.open(file_abs, 'wb') as wav_file:
    # Set the number of channels
    wav_file.setnchannels(2)
    # Set the sample width to 3 bytes
    wav_file.setsampwidth(3)
    # Set the frame rate to sample_rate
    wav_file.setframerate(self.sample_rate)
    # Set the number of frames to sample_len
    wav_file.setnframes(self.sample_len)
    # Set the compression type and description
    wav_file.setcomptype('NONE', "not compressed")
    # Write data
    wav_file.writeframes(samples_3byte)

Trying to Use FFT to Analyze Audio Signal in Python

I've been trying to use FFT to get a frequency of a signal, and I'm having a bit of trouble dealing with it. I found a site that talked about using FFT to analyze and plot a signal here:
http://macdevcenter.com/pub/a/python/2001/01/31/numerically.html?page=2
But I've run into an issue implementing it with Python 2.7. EDIT: I updated the code with the improved version. This one actually works and plots the waveforms (a bit slowly) onto a chart. I'm wondering, though, if this is the correct method for reading frames - I read that even-numbered array indices are for the left channel (and so the odd-numbered ones would be for the right, I suppose).
So, I guess that I should read however many frames, but divide it by the sample width, and then sample every other even frame for the left channel if it's stereo, huh?
import scipy
import wave
import struct
import numpy
import pylab

fp = wave.open('./music.wav', 'rb')
samplerate = fp.getframerate()
totalsamples = fp.getnframes()
fft_length = 256 # Guess
num_fft = (totalsamples / fft_length) - 2
#print (samplerate)

temp = numpy.zeros((num_fft, fft_length), float)
leftchannel = numpy.zeros((num_fft, fft_length), float)
rightchannel = numpy.zeros((num_fft, fft_length), float)

for i in range(num_fft):
    tempb = fp.readframes(fft_length / fp.getnchannels() / fp.getsampwidth());
    up = (struct.unpack("%dB"%(fft_length), tempb))
    temp[i,:] = numpy.array(up, float) - 128.0

temp = temp * numpy.hamming(fft_length)
temp.shape = (-1, fp.getnchannels())
fftd = numpy.fft.fft(temp)
pylab.plot(abs(fftd[:,1]))
pylab.show()
The music I'm loading in is some that I made myself.
EDIT: So now I'm getting the audio file read by reading the frames and dividing the number of frames to read by the number of channels and the sample width. Am I losing any data by doing this? This is the only way that I could get any data at all - otherwise it would be too much data for the file handler to read into the struct.unpack function. Also, I'm trying to separate the left channel from the right channel (get the FFT data for each channel). How would I go about doing this?
I have not used scipy's version of numpy/numarray in a long time, but seek out the function frombuffer. It is a lot easier to use than trying to shuffle all of the data through struct.unpack. An example reading the data using numpy:
fp = wave.open('./music.wav', 'rb')
assert fp.getnchannels() == 1, "Assumed 1 channel"
assert fp.getsampwidth() == 2, "Assuming int16 data"
numpy.frombuffer(fp.readframes(fp.getnframes()), 'i2')
Keep in mind that wave files can have different data types in them and multiple channels, so be aware of that when unpacking.
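For the stereo question in the EDIT, a hedged sketch (assuming 16-bit samples; the wave module returns interleaved frames, so reshaping by the channel count separates them):
import numpy
import wave

fp = wave.open('./music.wav', 'rb')
nchannels = fp.getnchannels()
assert fp.getsampwidth() == 2, "Assuming int16 data"

frames = fp.readframes(fp.getnframes())
samples = numpy.frombuffer(frames, dtype='<i2').reshape(-1, nchannels)
leftchannel = samples[:, 0]
rightchannel = samples[:, -1]   # same as the left channel for mono files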
