I have been using Google Speech Recognition for Python. Here is my code:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print(r.recognize_google(audio))
Although the recognition is very accurate, it takes about 4-5 seconds before it spits out the recognized text. Since I am creating a voice assistant, I want to modify the above code to allow speech recognition to be much faster.
Is there any way we can lower this number to about 1-2 seconds? If possible, I am trying to make recognition as fast as services such as Siri and Ok Google.
I am very new to Python, so my apologies if there is a simple answer to my question.
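One knob worth trying before switching services: the Recognizer waits for a stretch of silence before it decides a phrase has ended, and shortening that wait shaves noticeable latency. `pause_threshold` and `non_speaking_duration` are real `speech_recognition` settings; the specific values below are assumptions to experiment with:

```python
def tune_for_latency(recognizer):
    """Shorten the silences the recognizer waits for before ending a phrase."""
    recognizer.pause_threshold = 0.5        # default is 0.8 s of silence before a phrase ends
    recognizer.non_speaking_duration = 0.3  # must stay <= pause_threshold
    return recognizer
```

Usage: `r = tune_for_latency(sr.Recognizer())`, then listen as before. The network round trip to Google still dominates, so expect savings of a few hundred milliseconds rather than a jump to Siri-level speed.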
You could use another speech recognition program. For example, you could set up an account with IBM to use their Watson Speech To Text.
If possible, try to use their WebSocket interface, because it actively transcribes what you are saying while you are still speaking.
An example (not using websockets) would be:
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Adjusting for background noise. One second")
    r.adjust_for_ambient_noise(source)
    print("Say something!")
    audio = r.listen(source)

IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings

try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))
You could also try pocketsphinx, but personally I have not had particularly good experiences with it. It works offline (a plus) but, for me, was not particularly accurate. You could probably tweak some detection settings and filter out background noise. I believe there is also a training option to adapt it to your voice, but it doesn't look straightforward.
Some useful links:
Speech recognition
Microphone recognition example
IBM Watson Speech to Text
Good luck. Once speech recognition works correctly, it is very useful and rewarding!
Use the proper input channel and ambient-noise adjustment for best results:

import speech_recognition as sr

def speech_to_text():
    # Prefer the PulseAudio virtual device if one is present;
    # None makes sr.Microphone() fall back to the system default.
    device_index = None
    for index, name in enumerate(sr.Microphone.list_microphone_names()):
        if "pulse" in name:
            device_index = index

    r = sr.Recognizer()
    with sr.Microphone(device_index=device_index) as source:
        r.adjust_for_ambient_noise(source)
        print("Say something!")
        audio = r.listen(source, phrase_time_limit=4)
    try:
        text = r.recognize_google(audio)  # avoid shadowing the built-in input()
        print("You said: " + text)
        return str(text)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
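The device-selection loop in the answer above can be factored into a small helper that is testable without a sound card (the "pulse" keyword is just that example's assumption for Linux/PulseAudio setups):

```python
def pick_device_index(names, keyword="pulse"):
    """Return the index of the last microphone whose name contains keyword,
    or None so that sr.Microphone() falls back to the system default."""
    chosen = None
    for index, name in enumerate(names):
        if keyword in name:
            chosen = index
    return chosen
```

In real code you would pass it `sr.Microphone.list_microphone_names()` and hand the result to `sr.Microphone(device_index=...)`.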
I am currently following the Real Python guide on the SpeechRecognition package, and I have completed it successfully, but my use case isn't covered in the tutorial: I am trying to make an assistant similar to Alexa or Siri, but much more basic. I am not able to get it to start recognizing when I say a keyword, and I am unsure of where to start with this. Here is what I have so far:
import speech_recognition as sr

r = sr.Recognizer()
file = sr.AudioFile(r'C:\PP\CodingTrash\chill.wav')  # raw string so backslashes are not treated as escapes
mic = sr.Microphone()
r.dynamic_energy_threshold = True

with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

text = r.recognize_google(audio)
print(text)
Any help is greatly appreciated.
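One common approach (a sketch, not a built-in feature of the library) is to keep listening in a loop, transcribe each phrase, and only treat it as a command when it starts with the wake word. The parsing step is plain string handling and can be tested on its own; the wake word "jarvis" here is hypothetical:

```python
WAKE_WORD = "jarvis"  # hypothetical wake word for illustration

def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the command following the wake word, or None if the
    wake word was not spoken at the start of the phrase."""
    words = transcript.lower().strip().split()
    if not words or words[0] != wake_word:
        return None
    return " ".join(words[1:])
```

In the listening loop you would feed each `r.recognize_google(audio)` result to `extract_command` and only act when it returns something other than None.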
If you are happy to stay with Google, you may try Google Cloud Speech-to-Text; there is plenty of material on the internet about integrating Google Cloud with Python. You could also look at the CodeWithHarry YouTube channel regarding speech recognition with pyttsx3.
I was able to build a simple chatbot and converted it into a voice-enabled bot with the help of a YouTube tutorial. Step 1) convert the voice input to text; step 2) convert the bot's message to an audio clip and play it so the user can hear it. Since I am creating the voice clip inside my project folder, if multiple users use the bot at the same time I need a mechanism to create a unique voice clip for each chat session and play it. How do I handle this kind of scenario?
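If you do stay with per-session audio files, the usual fix is to name each clip with a unique ID so concurrent sessions can never collide. A minimal sketch using only the standard library (the folder name and .mp3 extension are illustrative assumptions):

```python
import uuid
from pathlib import Path

def clip_path_for_session(session_id, folder="clips"):
    """Build a collision-free audio-clip path for one chat session's reply."""
    return str(Path(folder) / f"{session_id}-{uuid.uuid4().hex}.mp3")
```

Each call returns a fresh path even for the same session, so two simultaneous replies never overwrite each other; delete the file after playback to avoid accumulating clips.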
I solved it by switching to the pyttsx3 library, which speaks directly instead of writing an audio file:

import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)  # index 1 is often a female voice, but this varies by system
engine.say(bot_message)
engine.runAndWait()
I am playing with the Google Cloud Speech API. I was wondering: if I use the Python SpeechRecognition library to call the Google Cloud Speech API, is that still a valid way to use the API? I just want to transcribe the text.

I am confused about the difference between the two, and whether there is a recommended approach if I just want to transcribe the audio.
Using Python SpeechRecognition:

import speech_recognition as sr

r = sr.Recognizer()
harvard = sr.AudioFile('harvard.wav')
with harvard as source:
    audio = r.record(source)

# recognize_google uses the free web API; recognize_google_cloud(audio, credentials_json=...)
# would route the request through your Google Cloud project instead
print(r.recognize_google(audio))
Not using Python SpeechRecognition:
from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    print_sentences(response)

def print_sentences(response):
    for result in response.results:
        best_alternative = result.alternatives[0]
        transcript = best_alternative.transcript
        confidence = best_alternative.confidence
        print('-' * 80)
        print(f'Transcript: {transcript}')
        print(f'Confidence: {confidence:.0%}')

config = {'language_code': 'en-US'}
audio = {'uri': 'gs://cloud-samples-data/speech/brooklyn_bridge.flac'}
speech_to_text(config, audio)
If you only plan to use Google Cloud Platform for speech recognition, then SpeechClient would be better because that is maintained by Google.
If you want to try out different speech recognition services, speech_recognition would help with that since it is more generic.
Any way of calling the API is fine; the libraries just make it easier for you.
Google Cloud Client Libraries are the recommended option for accessing Cloud APIs programmatically. They:
- Provide idiomatic, generated or hand-written code in each language, making the Cloud API simple and intuitive to use.
- Handle all the low-level details of communication with the server, including authenticating with Google.
- Can be installed using familiar package management tools such as npm and pip.
- In some cases, give you performance benefits by using gRPC. You can find out more in the gRPC APIs section of the Cloud documentation.
Also, be aware of the best practices to get better results from the API.
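For example, telling the API the exact sample rate and encoding of your audio, and enabling only the features you need, tends to improve both accuracy and turnaround. The values below are illustrative assumptions for the dict-style request shown above; match them to your actual audio:

```python
# Illustrative RecognitionConfig for the dict-style request (values are assumptions)
config = {
    'language_code': 'en-US',
    'sample_rate_hertz': 16000,            # must match the source audio exactly
    'encoding': 'LINEAR16',                # uncompressed 16-bit PCM
    'enable_automatic_punctuation': True,  # add punctuation to the transcript
}
```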
I am trying to develop a French-learning app, for which I am using the Python SpeechRecognition API to detect what a person has said and then give them feedback on what they said and how much they need to improve. But the response of the API is very slow. What could be the reason?

In one of the answers on Stack Overflow, I found a suggestion to check the input source for my application. I tried both the internal microphone and my headset microphone, but neither helped. In parallel, I am also using the CMUSphinx speech API, which detects the sound and responds quickly but with very poor accuracy, so I assume the application is receiving sound from the microphone.
import json
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Testing Online ASR module. Backend: Google Web Speech API..\n")
    while True:
        print("Speak Now..\n")
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio, language='fr-FR')
            response = json.dumps(text, ensure_ascii=False).encode('utf8')
            print("You Said: " + str(response))
        except Exception as e:
            print(e)  # don't swallow errors silently; print them while debugging
What could be the reason?

It sends the data to the other side of the planet, where it is stored and analyzed by the NSA first; only when the NSA approves do you get the results.
i am also using CMUSphinx speech API which detect the sound and responses quickly but the accuracy is very poor
The right way would be to try something NN-based like Kaldi
I am using the following code for voice recognition in Python:
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Say something!")
    audio = r.listen(source)
print(type(audio))

BING_KEY = 'KEY'  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings

try:
    print(type(r.recognize_bing(audio, key=BING_KEY)))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
But it is very slow; it sometimes lags for 20 seconds. Can you recommend any real-time voice recognition API in Python, or any suggested modifications to that code?
I use the Bing Speech API, but I don't use a client library like you do; I use the REST API. I capture live audio using PyAudio, and when I detect that the noise level has gone up I start recording the sound to a WAV file. When it's finished, I send the audio data to the endpoint that the API documentation gives you. It gives me a response rather quickly, at most 3 seconds, though that depends on your connection speed. My method is more involved than yours, but it is worth it.

Here is a link to the documentation. They use C# in their examples, but since it is an online API, if you send the right information in the headers and such it should still work.
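The noise-gate step described above can be sketched with the standard library alone: compute the RMS level of each chunk of 16-bit PCM audio coming from PyAudio and start recording once it crosses a threshold. The threshold value is an assumption you would tune for your microphone:

```python
import math
import struct

def rms_level(chunk):
    """Root-mean-square level of a chunk of 16-bit little-endian PCM audio."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, chunk[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

def is_speech(chunk, threshold=500):
    """True when the chunk is loud enough to start recording (tune threshold)."""
    return rms_level(chunk) > threshold
```

In a PyAudio loop you would feed each `stream.read(CHUNK)` buffer to `is_speech`, begin writing frames to the WAV file on the first True, and stop after the level stays below the threshold for a while.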