I have generated a .wav audio file containing some speech with some other interference speech in the background.
This code worked for me for a test .wav file:
import speech_recognition as sr
r = sr.Recognizer()
with sr.WavFile(wav_path) as source:
audio = r.record(source)
text = r.recognize_google(audio)
If I use my .wav file, I get the following error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The situation slightly improves if I save this .wav file with soundfile:
import soundfile as sf
wav, samplerate = sf.read(wav_path)
sf.write(saved_wav_path, original_wav, fs)
and then load the new saved_wav_path back into the first block of code, this time I get:
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
The audio files were saved as
wavfile.write(wav_path, fs, data)
where wav_path = 'data.wav'. Any ideas?
SOLUTION:
Saving the audio data the following way generates the correct .wav files:
import wavio
wavio.write(wav_path, data, fs ,sampwidth=2)
From a brief look at the code in the speech_recognition package, it appears that it uses wave from the Python standard library to read WAV files. Python's wave library does not handle floating point WAV files, so you'll have to ensure that you use speech_recognition with files that were saved in an integer format.
SciPy's function scipy.io.wavfile.write will create an integer file if you pass it an array of integers. So if data is a floating point numpy array, you could try this:
from scipy.io import wavfile
# Convert `data` to 32 bit integers:
y = (np.iinfo(np.int32).max * (data/np.abs(data).max())).astype(np.int32)
wavfile.write(wav_path, fs, y)
Then try to read that file with speech_recognition.
Alternatively, you could use wavio (a small library that I created) to save your data to a WAV file. It also uses Python's wave library to create its output, so speech_recognition should be able to read the files that it creates.
I couldn't figure out what the sampwidth should be for wavio from its documentation; however, I added the following line sounddevice.default.dtype='int32', 'int32' which allowed sounddevice, scipy.io.wavfile.write / soundfile, and speech_recognizer to finally work together. The default dtype for sounddevice was float32 for both input and output. I tried changing only the output but it didnt work. Weirdly, audacity still thinks the output files are in float32. I am not suggesting this is a better solution, but it did work with both soundfile and scipy.
I also noticed another oddity. When sounddevice.default.dtype was left at the default [float32, float32] and I opened the resulting file in audacity. From audacity, I exported it and this exported wav would work with speechrecognizer. Audacity says its export is float32 and the same samplerate, so I don't fully understand. I am a noob but looked at both files in a hex editor and they look the same for the first 64 hex values then they differ... so it seems like the header is the same. Those two look very different than the file I made using int32 output, so seems like there's another factor at play...
Similar to Warren's answer, I was able to resolve this issue by rewriting the WAV file using pydub:
from pydub import AudioSegment
filename = "payload.wav" # File that already exists.
sound = AudioSegment.from_mp3(filename)
sound.export(filename, format="wav")
Related
I have a series of wav files I would like to combine and export as a single wav using Pydub. I would like the audio from the original files to play back at different times in the exported file e.g. the audio in audio_1.wav starts at time=0 in the exported file while the audio in audio_2.wav starts at time=5 instead of both starting at time=0 as the overlay function has them. Is there any way to do this? Below is the code I currently have for importing, overlaying, and exporting the audio files.
from pydub import AudioSegment
audio_1 = AudioSegment.from_file("audio_1.wav",
format="wav")
audio_2 = AudioSegment.from_file("audio_2.wav",
format="wav")
overlay = vln_audio_1.overlay(vla_audio_2)
file_handle = overlay.export("output2.wav", format="wav")
I didn't test it but based on documentation it may need overlay(..., position=5000)
BTW:
you may also add silence at the beginning to move audio
silence_5_seconds = AudioSegment.silent(duration=5000)
audio_2 = silence_5_seconds + audio_2
I'm trying to Convert Speech to text using speech recognition library.
but when I run the Code it shows Value Error About the Audio Type I Tried to change the file format to a lot of audio format like: "PCM, WAV, AIFF, AIFF-C, Mp3, Mp4, FLAC, WebM, wav..." by renaming the file extension. But, it still show the Same Error.
The Error:
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
The Code:
import speech_recognition as sr
filename = "hello.mp3"
r = sr.Recognizer()
with sr.AudioFile(filename) as source:
audio = r.record(source)
text = r.recognize_google(audio)
print(text)
I dont think renaming the file extension will help you, you should use a file converter to make sure the audio data is beeing correctly encoded in another format. Try using SoundConverter
I have the following code to load a .wav file and play it:
import base64
import winsound
with open('file.wav','rb') as f:
data = base64.b64encode(f.read())
winsound.PlaySound(base64.b64decode(data), winsound.SND_MEMORY)
It plays the file no problem but now I would like to extract a 'chunk' let's say from 233 to 300 and play that portion only.
seg = data[233:300]
winsound.PlaySound(base64.b64decode(seg), winsound.SND_MEMORY)
I Get: TypeError: 'sound' must be str or None, not 'bytes'
PlaySound() is expecting a fully-formed WAV file, not a segment of PCM audio data. From the docs for PlaySound(), the underlying function called by Python's winsound:
The SND_MEMORY flag indicates that the lpszSoundName parameter is a pointer to an in-memory image of the WAVE file.
(emphasis added)
Rather than playing around with the internals of WAV files (though that isn't too hard, if you're interested), I'd suggest a more flexible audio library like pygame's. There, you could use the music module and it's set_pos function or use the Sound class to access the raw audio data and cut it like you proposed in your question.
What's the easiest way to read and write a stereo .wav file in Python ?
Should I use scipy.io.wavfile.read ?
Should I use a 2-dimension array (how ?) in order to have x[n,j] where j is the channel number?
I also want to read/write metadatas stored in the wav file like the markers, MIDI root note (Soundforge, as well as other sound editors, can read/write this specific .wav metadata called "MIDI root note")
Thank you
PS : I already know how to do with a mono file :
from scipy.io.wavfile import read
(fs, x) = read('test.wav')
Here is an updated version of scipy.io.wavfile that adds:
24 bit .wav files support for read/write,
access to cue markers,
cue marker labels,
some other metadata like pitch (if defined), etc.
wavfile.py (enhanced)
Old (original) answer: a solution for only a part of the question (ie read stereo samples):
(fs, x) = read('stereo_small-file.wav')
print len(x.shape) # 1 if mono, 2 if stereo
# if stereo, x is a 2-dimensional array, so we can access both channels with :
print x[:,0]
print x[:,1]
Take a look at Pythons' wave module
Is this possible?? I seem to be getting this error when using wavread from scikits.audiolab:
x86_64.egg/scikits/audiolab/pysndfile/matapi.pyc in basic_reader(filename, last, first)
93 if not hdl.format.file_format == filetype:
94 raise ValueError, "%s is not a %s file (is %s)" \
---> 95 % (filename, filetype, hdl.format.file_format)
96
97 fs = hdl.samplerate
ValueError: si762.wav is not a wav file (is nist)
I'm guessing it can't read NIST wav files but is there another way to easily read them into a numpy array? If not, what is the best way to go about reading in the data?
Possibly rewriting the audiolab wavread to recognize the nist header??
Answer my own question because figured it out but you can use the Sndfile class from scikits.audiolab which supports a multitude of reading and writing file formats depending on the libsndfile you have. Then you just use:
from scikits.audiolab import Sndfile, play
f = Sndfile(filename, 'r')
data = f.read_frames(10000)
play(data) # Just to test the read data
To expand upon J Spen's answer, when using scikits.audiolab, if you want to read the whole file, not just a specified number of frames, you can use the nframes parameter of the Sndfile class to read the whole thing. For example:
from scikits.audiolab import Sndfile, play
f = Sndfile(filename, 'r')
data = f.read_frames(f.nframes)
play(data) # Just to test the read data
I couldn't find any references to this in the documentation, but it is there in the source.
In contrast to the above answers, there is another alternative way to read
audio files in multiple formats, e.g., .wav, .aif, .mp3 etc.
import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://freewavesamples.com/files/Alesis-Sanctuary-QCard-Crotales-C6.wav
data, fs = sf.read('Alesis-Sanctuary-QCard-Crotales-C6.wav')
print(data.shape,fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()
Output:
(88116, 2) 44100