Read Nist Wav File in TIMIT database into python numpy array - python

Is this possible?? I seem to be getting this error when using wavread from scikits.audiolab:
x86_64.egg/scikits/audiolab/pysndfile/matapi.pyc in basic_reader(filename, last, first)
93 if not hdl.format.file_format == filetype:
94 raise ValueError, "%s is not a %s file (is %s)" \
---> 95 % (filename, filetype, hdl.format.file_format)
96
97 fs = hdl.samplerate
ValueError: si762.wav is not a wav file (is nist)
I'm guessing it can't read NIST wav files but is there another way to easily read them into a numpy array? If not, what is the best way to go about reading in the data?
Possibly rewriting the audiolab wavread to recognize the nist header??

Answering my own question, since I figured it out: you can use the Sndfile class from scikits.audiolab, which supports many file formats for reading and writing, depending on the libsndfile you have installed. Then you just use:
from scikits.audiolab import Sndfile, play
f = Sndfile(filename, 'r')
data = f.read_frames(10000)
play(data) # Just to test the read data

To expand upon J Spen's answer: when using scikits.audiolab, if you want to read the whole file rather than a fixed number of frames, you can pass the nframes attribute of the Sndfile object to read_frames. For example:
from scikits.audiolab import Sndfile, play
f = Sndfile(filename, 'r')
data = f.read_frames(f.nframes)
play(data) # Just to test the read data
I couldn't find any references to this in the documentation, but it is there in the source.
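If libsndfile is not an option, the NIST header is plain text and simple enough to parse by hand. The following is a minimal sketch, not part of scikits.audiolab: it assumes an uncompressed SPHERE file with 16-bit PCM samples (which is how TIMIT is commonly distributed, but check your copy), and read_nist_sphere is a hypothetical helper name.

```python
import numpy as np


def read_nist_sphere(path):
    """Read an uncompressed 16-bit PCM NIST SPHERE file into a NumPy array."""
    with open(path, "rb") as f:
        # Line 1 is the magic string, line 2 is the header size in bytes.
        magic = f.readline().strip()
        if magic != b"NIST_1A":
            raise ValueError(f"{path} does not look like a NIST SPHERE file")
        header_size = int(f.readline().strip())

        # Re-read the whole header and parse its "name -type value" lines.
        f.seek(0)
        header_text = f.read(header_size).decode("ascii", errors="replace")
        fields = {}
        for line in header_text.splitlines()[2:]:
            line = line.strip()
            if line == "end_head":
                break
            parts = line.split(None, 2)
            if len(parts) == 3:
                name, kind, value = parts
                fields[name] = int(value) if kind == "-i" else value

        # Assumption: plain PCM, two bytes per sample (no shorten compression).
        if fields.get("sample_n_bytes", 2) != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        if not str(fields.get("sample_coding", "pcm")).startswith("pcm"):
            raise ValueError("this sketch only handles uncompressed PCM")

        # "01" means little-endian, "10" means big-endian.
        byte_format = fields.get("sample_byte_format", "01")
        dtype = "<i2" if byte_format == "01" else ">i2"

        # The file position is now just past the header, at the first sample.
        raw = f.read()

    data = np.frombuffer(raw, dtype=dtype, count=len(raw) // 2)
    channels = fields.get("channel_count", 1)
    if channels > 1:
        data = data.reshape(-1, channels)
    return data, fields.get("sample_rate")
```

Usage would be data, fs = read_nist_sphere('si762.wav'). Files compressed with shorten are rejected by the sketch rather than decoded.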

As an alternative to the answers above, the soundfile package reads audio files in many formats, e.g. .wav, .aif and, with a recent enough libsndfile, .mp3:
import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://freewavesamples.com/files/Alesis-Sanctuary-QCard-Crotales-C6.wav
data, fs = sf.read('Alesis-Sanctuary-QCard-Crotales-C6.wav')
print(data.shape,fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()
Output:
(88116, 2) 44100

Related

How to write an audio file that is inside a for loop in python

I have to write code that produces an echo, so I wrote the code below. How can I save each of these attenuated copies to a separate wav file? Thanks in advance.
import sounddevice as sd
from scipy.io import wavfile
import numpy as np
from scipy.io.wavfile import write
fs,x=wavfile.read('hello.wav')
amp=1
for i in range(2, 6):
    nx=(amp/i**3)*x
    sd.play(nx,fs)
    sd.wait()
    write('hello[i]',fs,myrecording)
There are two things you need to do here:
1. Replace myrecording (which isn't defined in your code snippet) with the sound data array that you want to write out to file. Inspection of your code makes me think it is actually nx that you want to write out.
2. Use a different filename for each file written out, or else you will just write over the same file and only be left with the one written out last. There are many ways to generate the string for this filename but, based on what you had, I used one that labels the files "hello_2.wav" etc.
I have integrated both those things into the code here:
import sounddevice as sd
from scipy.io import wavfile
from scipy.io.wavfile import write
fs, x = wavfile.read('hello.wav')
AMP = 1
for i in range(2, 6):
    nx = ((AMP / i**3) * x).astype(x.dtype)  # scale, then keep the original sample type
    sd.play(nx, fs)
    sd.wait()
    filename = f'hello_{i}.wav'
    write(filename, fs, nx)

Python3 modifying wav audio data correctly

I am learning how to modify audio files (.wav, .mp3, etc.) with Python 3. This question is specifically about the .wav format and the wave module. I know there are standards for audio formats; as a side note, any references on the .wav file format are greatly appreciated.
My question is about ignoring the RIFF and fmt headers of a .wav file when using the Python 3 wave module.
Is there a more efficient way to skip the RIFF headers and other containers, and go straight to the data container to modify its contents?
This crude example converts a two-channel .wav file to a single-channel .wav file while setting every value to (0, 0).
import wave
import struct
# Open Files
inf = wave.open(r"piano2.wav", 'rb')
outf = wave.open(r"output.wav", 'wb')
# Input Parameters
ip = list(inf.getparams())
print('Input Parameters:', ip)
# Example Output: Input Parameters: [2, 2, 48000, 302712, 'NONE', 'not compressed']
# Output Parameters
op = ip[:]
op[0] = 1
outf.setparams(op)
number_of_channels, sample_width, frame_rate, number_of_frames, comp_type, comp_name = ip
format = '<{}h'.format(number_of_channels)
print('# Channels:', format)
# Read >> Second
for index in range(number_of_frames):
    frame = inf.readframes(1)
    data = struct.unpack(format, frame)
    # Here, I change data to (0, 0), testing purposes
    print('Before Audio Data:', data)
    print('After Modifying Audio Data', (0, 0))
    # Change Audio Data
    data = (0, 0)
    value = data[0]
    value = (value * 2) // 3
    outf.writeframes(struct.pack('<h', value))
# Close In File
inf.close()
# Close Out File
outf.close()
Is there a better practice or reference material if simply just modifying data segments of .wav files?
Say you wanted to literally add a sound at a specific timestamp, that would be a more appropriate result to my question.
Performance comparison
Let's first examine three ways to read WAVE files.
The slowest one - wave module
As you might have noticed already, wave module can be painfully slow. Consider this code:
import wave
import struct
wavefile = wave.open('your.wav', 'r') # check e.g. freesound.org for samples
length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
For a WAVE as defined below:
Input File : 'audio.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:09:35.71 = 27634080 samples ~ 43178.2 CDDA sectors
File Size : 55.3M
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
it took on average 27.7 s to load the full audio. The upside of the wave module is that it is available out of the box and will work on any system.
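If you want to stay with the standard library, a large part of that cost is the per-frame Python loop rather than the wave module itself. A minimal sketch, assuming 16-bit PCM data (read_wav_int16 is a hypothetical helper name), reads every frame with one call and converts the bytes in bulk:

```python
import array
import sys
import wave


def read_wav_int16(path):
    """Read a 16-bit PCM WAV file with a single readframes() call."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frame_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())  # one call instead of one per frame

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, frame_rate, n_channels
```

For multi-channel files the returned samples are interleaved (channel 0, channel 1, channel 0, ...). I have not timed this against the numbers above.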
The convenient one - audiofile
A much more convenient and faster solution is e.g. audiofile. According to the project description, its focus is on reading speed.
import audiofile as af
signal, sampling_rate = af.read('audio.wav')
This gave me on average 33 ms to read the mentioned file.
The fastest one - numpy
If we decide to skip header (as OP asks) and go solely for speed, numpy is a great choice:
import numpy as np
byte_length = np.fromfile(filename, dtype=np.int32, count=1, offset=40)[0]
data = np.fromfile(filename, dtype=np.int16, count=byte_length // np.dtype(np.int16).itemsize, offset=44)
The header structure (that tells us what offset to use) is defined here.
The execution of that code takes ~6 ms, about 5x less than audiofile. Naturally it comes with a price / preconditions: we need to know the sample data type in advance, and the fixed offsets assume the canonical 44-byte header.
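The fixed offsets 40 and 44 only hold when the 'data' chunk directly follows a 16-byte 'fmt ' chunk. Files that carry extra chunks (LIST, cue, and so on) put the data somewhere else. A minimal sketch that walks the RIFF chunks to locate it, using only the standard library (find_data_chunk is a hypothetical helper name):

```python
import struct


def find_data_chunk(path):
    """Return (offset, size) in bytes of the 'data' chunk in a RIFF/WAVE file."""
    with open(path, "rb") as f:
        start = f.read(12)
        if len(start) < 12:
            raise ValueError("file too short to be a RIFF/WAVE file")
        riff, _, wave_id = struct.unpack("<4sI4s", start)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")

        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'data' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                return f.tell(), chunk_size
            # Skip this chunk; chunks are padded to an even number of bytes.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

The result can then replace the hard-coded numbers, e.g. offset, size = find_data_chunk(filename) followed by np.fromfile(filename, dtype=np.int16, count=size // 2, offset=offset).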
Modifying the audio
Once you have the audio in a numpy array, you can modify it at will. You can also decide to stream the file rather than reading everything at once. Be warned though: since sound is a wave, in a typical scenario simply injecting new data at an arbitrary time t will distort the audio (unless the original was silent at that point).
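One way to add a sound at a given timestamp without overwriting what is already there is to sum the two signals over that range. This is only a sketch under assumptions: both arrays are mono int16 at the same sample rate, and mix_at is a hypothetical helper name.

```python
import numpy as np


def mix_at(signal, clip, fs, start_seconds):
    """Return a copy of int16 `signal` with int16 `clip` mixed in at `start_seconds`."""
    signal = np.asarray(signal)
    clip = np.asarray(clip)
    start = int(round(start_seconds * fs))
    if start < 0 or start >= len(signal):
        raise ValueError("start time is outside the signal")
    end = min(start + len(clip), len(signal))  # drop whatever runs past the end

    # Sum in a wider type so the addition cannot wrap around, then clip to int16.
    mixed = signal.astype(np.int32)
    mixed[start:end] += clip[: end - start]
    return np.clip(mixed, -32768, 32767).astype(np.int16)
```

Hard clipping is itself a form of distortion, so in practice you may want to attenuate one or both signals before summing.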
As for writing the stream back, "modifying the container" would be terribly slow in Python. That's why you should either use arrays or switch to a more suitable language (e.g. C).
If we go with arrays, we should mind that numpy knows nothing about the WAVE format and therefore we'd have to define the header ourselves and write individual bytes. Perfectly feasible exercise, but clunky. Luckily, scipy provides a convenient function that has the benefits of numpy speed (it uses numpy underneath), while making the code much more readable:
import numpy as np
from scipy.io.wavfile import write
fs = int(np.fromfile('audio.wav', dtype=np.int32, count=1, offset=24)[0])  # we need the sample rate
new_data = np.concatenate([data, np.zeros(2 * fs, dtype=data.dtype)])  # append 2 seconds of silence
write('audio_out.wav', fs, new_data)
It could also be done in a loop, where you read a chunk with numpy / scipy and modify the array (data). Note that scipy's write produces a complete file, header included, on every call, so collect the modified chunks and write them out once rather than opening the output file in append mode.

Python, speech_recognition tool does not recognize .wav file

I have generated a .wav audio file containing some speech with some other interference speech in the background.
This code worked for me for a test .wav file:
import speech_recognition as sr
r = sr.Recognizer()
with sr.WavFile(wav_path) as source:
    audio = r.record(source)
text = r.recognize_google(audio)
If I use my .wav file, I get the following error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The situation slightly improves if I save this .wav file with soundfile:
import soundfile as sf
wav, samplerate = sf.read(wav_path)
sf.write(saved_wav_path, wav, samplerate)
and then load the new saved_wav_path back into the first block of code, this time I get:
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
The audio files were saved as
wavfile.write(wav_path, fs, data)
where wav_path = 'data.wav'. Any ideas?
SOLUTION:
Saving the audio data the following way generates the correct .wav files:
import wavio
wavio.write(wav_path, data, fs, sampwidth=2)
From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:
import numpy as np
from scipy.io import wavfile
# Convert `data` to 32 bit integers:
y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)
wavfile.write(wav_path, fs, y)
Then try to read that file with speech_recognition.
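A 16-bit variant of the same idea can be written with the standard library wave module, the same module this answer says speech_recognition reads with. This is a minimal sketch, assuming data is a mono floating point array already scaled to [-1, 1]; write_int16_wav is a hypothetical helper name.

```python
import wave

import numpy as np


def write_int16_wav(path, data, fs):
    """Scale a mono float array in [-1, 1] to int16 and save it as PCM WAV."""
    data = np.asarray(data, dtype=np.float64)
    # Clip first so out-of-range values saturate instead of wrapping around.
    pcm = (np.clip(data, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # two bytes per sample, i.e. 16-bit PCM
        wf.setframerate(int(fs))
        wf.writeframes(pcm.tobytes())
```

If your data is not normalized, divide by np.abs(data).max() first, as in the 32-bit example above.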
Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
I couldn't figure out from the wavio documentation what sampwidth should be. However, adding the line sounddevice.default.dtype = 'int32', 'int32' allowed sounddevice, scipy.io.wavfile.write / soundfile, and speech_recognition to finally work together. The default dtype for sounddevice was float32 for both input and output. I tried changing only the output, but it didn't work. Weirdly, Audacity still thinks the output files are float32. I am not suggesting this is a better solution, but it did work with both soundfile and scipy.
I also noticed another oddity. When sounddevice.default.dtype was left at the default [float32, float32], I opened the resulting file in Audacity and exported it, and the exported wav worked with speech_recognition. Audacity says its export is float32 with the same sample rate, so I don't fully understand. I am a beginner, but I looked at both files in a hex editor: the first 64 hex values look the same and then they differ, so it seems the header is the same. Both look very different from the file I made using int32 output, so there seems to be another factor at play.
Similar to Warren's answer, I was able to resolve this issue by rewriting the WAV file using pydub:
from pydub import AudioSegment
filename = "payload.wav" # File that already exists.
sound = AudioSegment.from_file(filename)  # let pydub/ffmpeg detect the actual input format
sound.export(filename, format="wav")

Read and write stereo .wav file with python + metadatas

What's the easiest way to read and write a stereo .wav file in Python ?
Should I use scipy.io.wavfile.read ?
Should I use a 2-dimension array (how ?) in order to have x[n,j] where j is the channel number?
I also want to read/write metadatas stored in the wav file like the markers, MIDI root note (Soundforge, as well as other sound editors, can read/write this specific .wav metadata called "MIDI root note")
Thank you
PS : I already know how to do with a mono file :
from scipy.io.wavfile import read
(fs, x) = read('test.wav')
Here is an updated version of scipy.io.wavfile that adds:
- 24 bit .wav files support for read/write,
- access to cue markers,
- cue marker labels,
- some other metadata like pitch (if defined), etc.
wavfile.py (enhanced)
Old (original) answer: a solution for only a part of the question (ie read stereo samples):
from scipy.io.wavfile import read
(fs, x) = read('stereo_small-file.wav')
print(len(x.shape))  # 1 if mono, 2 if stereo
# if stereo, x is a 2-dimensional array, so we can access both channels with:
print(x[:, 0])
print(x[:, 1])
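For the x[n, j] layout asked about in the question, the same shape can be produced with just the standard library wave module plus numpy. This is a minimal sketch for 16-bit PCM only, it does not touch markers or other metadata, and the two function names are hypothetical:

```python
import wave

import numpy as np


def read_wav_channels(path):
    """Read a 16-bit PCM WAV file as an array of shape (n_frames, n_channels)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # Samples are interleaved (L, R, L, R, ...), so use one row per frame.
    x = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    return fs, x


def write_wav_channels(path, fs, x):
    """Write an int16 array of shape (n_frames, n_channels) as PCM WAV."""
    x = np.asarray(x, dtype="<i2")
    if x.ndim == 1:
        x = x[:, np.newaxis]  # treat a 1-D array as mono
    with wave.open(path, "wb") as wf:
        wf.setnchannels(x.shape[1])
        wf.setsampwidth(2)
        wf.setframerate(int(fs))
        wf.writeframes(x.tobytes())  # C order keeps the channels interleaved
```

Unlike scipy.io.wavfile.read, this sketch returns mono files as shape (n_frames, 1), so x[:, 0] works for both mono and stereo.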
Take a look at Python's wave module.

Plot spectrogram from mp3

I am trying to plot a spectrogram straight from an mp3 file in python 2.7.3 (using ubuntu). I can do it from a wav file as follows.
#!/usr/bin/python
from scikits.audiolab import wavread
from pylab import *
signal, fs, enc = wavread('XC124158.wav')
specgram(signal)
show()
What's the cleanest way to do the same thing from an mp3 file instead of a wav? I don't want to convert all the mp3 files to wav if I can avoid it.
Another very simple way of plotting the spectrogram of an mp3 file:
from pydub import AudioSegment
import matplotlib.pyplot as plt
from scipy.io import wavfile
from tempfile import mktemp
mp3_audio = AudioSegment.from_file('speech.mp3', format="mp3") # read mp3
wname = mktemp('.wav') # use temporary file
mp3_audio.export(wname, format="wav") # convert to wav
FS, data = wavfile.read(wname) # read wav file
plt.specgram(data, Fs=FS, NFFT=128, noverlap=0) # plot
plt.show()
This uses the pydub library which is more convenient compared to calling external commands.
This way you can iterate over all your .mp3 files without having to convert them to .wav prior to plotting.
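To see what NFFT=128 and noverlap=0 mean, here is a rough numpy-only sketch of the underlying computation: cut the signal into consecutive 128-sample frames, window each one, and take the FFT magnitude. It is not matplotlib's implementation and the scaling differs from what specgram plots; magnitude_spectrogram is a hypothetical helper name and a mono signal is assumed.

```python
import numpy as np


def magnitude_spectrogram(x, fs, nfft=128):
    """Split mono x into non-overlapping frames; return (freqs, times, magnitudes)."""
    x = np.asarray(x, dtype=np.float64)
    n_frames = len(x) // nfft  # leftover samples at the end are dropped
    frames = x[: n_frames * nfft].reshape(n_frames, nfft)
    frames = frames * np.hanning(nfft)  # taper each frame to reduce leakage
    # One column per frame, one row per frequency bin.
    mags = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (np.arange(n_frames) + 0.5) * nfft / fs  # frame centres in seconds
    return freqs, times, mags
```

A larger NFFT gives finer frequency resolution but fewer, longer frames along the time axis.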
I'd install the Debian/Ubuntu package libav-tools and call avconv to decode the mp3 to a temporary wav file:
Edit: Your other question was closed, so I'll expand my answer here a bit with a simple bandpass filtering example. In the file you linked it looks like most of the birdsong is concentrated in 4 kHz - 5.5 kHz.
import os
from subprocess import check_call
from tempfile import mktemp
from scikits.audiolab import wavread, play
from scipy.signal import remez, lfilter
from pylab import *
# convert mp3, read wav
mp3filename = 'XC124158.mp3'
wname = mktemp('.wav')
check_call(['avconv', '-i', mp3filename, wname])
sig, fs, enc = wavread(wname)
os.unlink(wname)
# bandpass filter
bands = array([0,3500,4000,5500,6000,fs/2.0]) / fs
desired = [0, 1, 0]
b = remez(513, bands, desired)
sig_filt = lfilter(b, 1, sig)
sig_filt /= 1.05 * max(abs(sig_filt)) # normalize
subplot(211)
specgram(sig, Fs=fs, NFFT=1024, noverlap=0)
axis('tight'); axis(ymax=8000)
title('Original')
subplot(212)
specgram(sig_filt, Fs=fs, NFFT=1024, noverlap=0)
axis('tight'); axis(ymax=8000)
title('Filtered')
show()
play(sig_filt, fs)
