I'm attempting to get speech recognition for searching working in a project of mine. At the moment I'm only focusing on getting the speech recognition itself working, and I'm using pygsr to do this. I found a post about pygsr on here earlier, but I'm currently struggling to get it to work. This is the code that I'm using:
from pygsr import Pygsr
speech = Pygsr()
speech.record(3)
phrase, complete_response =speech.speech_to_text('en_US')
print phrase
After spending a while installing the library on OS X, I finally got it to sort of work. It detected the library and seemed to record, but then I would get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pygsr/__init__.py", line 33, in record
    data = stream.read(self.chunk)
  File "/Library/Python/2.7/site-packages/pyaudio.py", line 605, in read
    return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981
I have no idea if this is due to something I'm doing wrong or if I can't use pygsr on OS X. If there is no way for this to work, does anyone have any recommendations for a speech recognition library for OS X that uses Python 2.7?
You could test whether PyAudio works correctly by running this script:
import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
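If this test script raises the same -9981 "Input overflowed" error, one thing worth trying (a sketch of an option, not a fix for pygsr itself) is to tell PyAudio not to raise on input overflows; recent PyAudio versions accept an exception_on_overflow flag on Stream.read:
data = stream.read(CHUNK, exception_on_overflow=False)  # keep recording instead of raising IOError on overflow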
I have received many reports that pygsr does not work on macOS, but I could not fix it because I could not test it on a Mac.
In Short
Is there a way to convert raw audio data (obtained with the PyAudio module) into a virtual file object (like the one returned by Python's open() function), without saving it to disk and reading it back from disk? Details are provided below.
What I Am Doing
I'm using PyAudio to record audio, which is then fed into a TensorFlow model to get a prediction. Currently it works if I first save the recorded sound as a .wav file on disk and then read it back to feed it into the model. Here is the recording code:
import pyaudio
import wave

CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)

print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)]  # the recorded data, as a list of bytes
print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()
After I get the raw audio data (the variable frames), it can be saved using Python's wave module as below. Note that some metadata must be written when saving, by calling the wf.setxxx functions:
import os
from datetime import datetime  # needed for the timestamped filename below

output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# save the recorded data as a wav file using Python's `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
And here is the code that uses the saved file to run inference with the TensorFlow model. It simply reads the file as binary data, and the model handles the rest.
import classifier  # my tensorflow model

with open(output_path, 'rb') as f:
    w = f.read()
    classifier.run_graph(w, labels, 5)
THE PROBLEM
For real-time use, I need to keep streaming the audio and feeding it into the model every so often. But it seems unreasonable to keep saving the file to disk and then reading it back again and again, which wastes lots of time on I/O.
I want to keep the data in memory and use it directly, rather than saving and re-reading it repeatedly. However, Python's wave module does not support reading and writing simultaneously (referenced here).
If I directly feed the data without the metadata (e.g. channels, frame rate) that the wave module adds when saving, like this:
w = b''.join(frames)
classifier.run_graph(w, labels, 5)
I get the following error:
2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Header mismatch: Expected RIFF but found
The TensorFlow model I'm using is provided here: ML-KWS-for-MCU; I hope this helps.
Here is the code that produces the error: (classifier.run_graph())
def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        # Feed the audio data as input to the graph.
        # predictions will contain a two-dimensional array, where one
        # dimension represents the input image count, and the other has
        # predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})

        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))

    return 0
You should be able to use io.BytesIO instead of a physical file; they share the same interface, but BytesIO is kept entirely in memory:
import io
container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')
# Read the data up to this point
container.seek(0)
data_package = container.read()
# add some more data...
wf.writeframes(b'ghijk')
# read the data added since last
container.seek(len(data_package))
data_package = container.read()
This should allow you to continuously stream data into the in-memory file while reading the newly added data with your TensorFlow code.
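Applied to the pipeline in the question, the idea would look roughly like this; a minimal sketch, assuming classifier.run_graph accepts the bytes of a complete wav file (as it does in your disk-based version) and that p, FORMAT, CHANNELS, RATE, frames, classifier and labels come from your code above:
import io
import wave

container = io.BytesIO()               # in-memory stand-in for the .wav file
wf = wave.open(container, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()                             # finalizes the RIFF header in memory

container.seek(0)
wav_bytes = container.read()           # same bytes a file on disk would contain
classifier.run_graph(wav_bytes, labels, 5)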
I am using the librosa library to do data analysis on an audio file in .wav format. But it seems librosa can only read or write audio files as arrays, apart from feature extraction. I would also like to play the audio file from my analysis code.
In an IPython notebook, I can use IPython.display.Audio to play audio directly in the notebook, but when I convert the code to a .py file it doesn't work, so I need something that can be used for the same purpose.
You could use pydub to load the audio file (mp3, wav, ogg, raw) and simpleaudio for playback. Just do
import pydub
import simpleaudio

sound = pydub.AudioSegment.from_wav('audiofile.wav')
playback = simpleaudio.play_buffer(
    sound.raw_data,
    num_channels=sound.channels,
    bytes_per_sample=sound.sample_width,
    sample_rate=sound.frame_rate
)
And voilà! You've got your audio playing. To stop it, just call playback.stop().
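If you also want the script to block until the clip has finished (handy at the end of a plain .py analysis script), the play object returned by play_buffer can be waited on; a small sketch using the playback object from the snippet above:
playback.wait_done()          # blocks until the buffer has finished playing
print(playback.is_playing())  # False once playback has completed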
If you want a blocking mode, where execution waits until the streaming is finished, you can use PyAudio's blocking mode (see the full documentation).
example:
"""PyAudio Example: Play a wave file."""
import pyaudio
import wave
import sys
CHUNK = 1024
if len(sys.argv) < 2:
print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
sys.exit(-1)
wf = wave.open(sys.argv[1], 'rb')
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
channels=wf.getnchannels(),
rate=wf.getframerate(),
output=True)
# read data
data = wf.readframes(CHUNK)
# play stream (3)
while len(data) > 0:
stream.write(data)
data = wf.readframes(CHUNK)
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
I installed pyglet, pygame and other libraries to play the audio file recorded with the PyAudio code. No error occurs and the code runs successfully, but no sound comes out of the speakers/headset/earphones.
The audio output jack works otherwise, but nothing plays when I run any audio file through the Python code.
What can be the issue?
import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s test2.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

p = pyaudio.PyAudio()

stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(CHUNK)

while data != '':
    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()

p.terminate()
I need to transcribe the speech that is being written to a wav file. I've implemented the following iterator to try to incrementally read the audio from the file:
import wave

def read_audio(path, chunk_size=1024):
    wave_file = wave.open(open(path, 'rb'))
    while True:
        data = wave_file.readframes(chunk_size)
        if data != "":
            yield data
In order to test the generator, I've implemented a function that keeps writing the audio captured by the computer's microphone to a wav file:
import pyaudio
import wave

def record_to_file(out_path):
    fmt = pyaudio.paInt16
    channels = 1
    rate = 16000
    chunk = 1024
    audio = pyaudio.PyAudio()
    stream = audio.open(format=fmt, channels=channels,
                        rate=rate, input=True,
                        frames_per_buffer=chunk)
    wave_file = wave.open(out_path, 'wb')
    wave_file.setnchannels(channels)
    wave_file.setsampwidth(audio.get_sample_size(fmt))
    wave_file.setframerate(rate)
    while True:
        data = stream.read(chunk)
        wave_file.writeframes(data)
Below is the test script:
import threading
import time

WAV_PATH = 'out.wav'

def record_worker():
    record_to_file(WAV_PATH)

if __name__ == '__main__':
    t = threading.Thread(target=record_worker)
    t.setDaemon(True)
    t.start()
    time.sleep(5)
    reader = read_audio(WAV_PATH)
    for chunk in reader:
        print(len(chunk))
It doesn't work as I'd expect - the reader stops yielding after a while. Since the test succeeds if I adapt record_to_file to set the wav file's nframes to a very large number beforehand and do the writing with writeframesraw, my guess is that wave.open eagerly reads nframes from the header and never tries to read anything after that number of frames has been consumed.
Is it possible to obtain that incremental read in Python 2.7 without resorting to this setnframes hack? It's worth noting that, unlike in the test script, I have no control over the wav file's generation in the scenario where I plan to use this feature. The writing is done by a SWIG-wrapped C library named pjsip (http://www.pjsip.org/python/pjsua.htm), so I don't expect to be able to make any modifications on that end.
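One workaround worth considering (a sketch, not a confirmed fix): skip the wave module on the reading side and poll the raw PCM data with ordinary file I/O, so the nframes value baked into the header never limits how much you read. This assumes the writer produces a canonical 44-byte PCM WAV header (files with extra chunks would need a larger offset) and that you only need the raw frames, not the parsed metadata:
import time

def read_audio_raw(path, chunk_size=1024, header_size=44):
    # header_size=44 assumes a canonical PCM WAV header
    with open(path, 'rb') as f:
        f.seek(header_size)           # skip the header once
        while True:
            data = f.read(chunk_size)
            if data:
                yield data
            else:
                time.sleep(0.1)       # wait for the writer to append more frames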
I have a script which is
import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.MIMEBase import MIMEBase
from email import encoders
import pyaudio
import wave
from time import sleep

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "file.wav"

audio = pyaudio.PyAudio()

# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print "recording..."
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print "finished recording"

# stop Recording
stream.stop_stream()
stream.close()
audio.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

sleep(10)
which records my voice for 5 seconds and saves it in a wav file. Now, to loop it, I have tried adding the command
while invalid_input:
    start()
at the bottom of the script, and the command invalid_input = False at the top of the script, with no luck. Please help me understand how to make this script loop once it is started, after the sleep(10) command. And please bear with me, as I am a newbie in Python.
Regards,
EDIT: I think I was not clear.
I want it so that once the script is started and reaches the end, it goes back to the top and runs again, over and over, until someone kills it.
Alright, as mentioned in the comments, you NEED to indent. Python is a language that uses indentation instead of end keywords or {} braces, e.g.:
def function():
    # Do stuff
Next, I'm not sure what start() is defined as, but it won't start your script by default; you need to write def start(): and put the recording and saving code inside that function. Then you can call it later using start().
Lastly, your while condition is inverted. If you want to run the loop while invalid_input is false, you need while invalid_input == False: (or, more idiomatically, while not invalid_input:).
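Putting those points together, a minimal sketch of the structure (the placeholder body stands in for your existing recording-and-saving code, which would sit indented inside start(); the bare while True gives the run-until-killed behaviour from your edit):
from time import sleep

def start():
    # your existing recording-and-saving code goes here,
    # indented one level so it belongs to the function
    print("recording placeholder")   # stands in for the real recording code
    sleep(10)                        # matches the sleep(10) at the end of your script

while True:                          # repeats until the process is killed (e.g. Ctrl+C)
    start()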
To make a loop in Python, you need to indent properly. In your code, you have not indented correctly, per Python's convention:
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print "finished recording"