I was able to build a simple chatbot and converted it into a voice-enabled bot with the help of a YouTube tutorial. Step 1: convert the user's voice input to text. Step 2: convert the bot's reply to an audio clip and play it so the user can hear it. Since I am creating the voice clip inside my project folder, if multiple users use the bot at the same time I need a mechanism to create a unique voice clip for each chat session and play it. How do I handle this kind of scenario?
I solved it by switching to the pyttsx3 library, which speaks the text directly instead of writing an audio clip to disk, so there is no file to clash between sessions:

import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)  # female voice on most systems
engine.say(bot_message)
engine.runAndWait()
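If you want to stick with writing an audio file (e.g. with gTTS), another option is to give each session its own file name. A minimal sketch, where the `clips` folder and the helper name are my own choices, not anything from the original code:

```python
import os
import uuid

def make_clip_path(folder='clips'):
    # Hypothetical helper: one unique .mp3 path per chat session
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, '{}.mp3'.format(uuid.uuid4().hex))

# Each session gets its own file, so concurrent users never overwrite each other
path_a = make_clip_path()
path_b = make_clip_path()
```

You would pass the per-session path to your TTS call and to the audio player, and delete the file once playback finishes.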
I am currently following the Real Python guide on the SpeechRecognition package, and I have completed it successfully, but my use case isn't covered in the tutorial: I am trying to make an assistant similar to Alexa or Siri, but much more basic. I cannot get it to start recognizing when I say a keyword, and I am unsure where to start. Here is what I have so far:
import speech_recognition as sr

r = sr.Recognizer()
file = sr.AudioFile(r'C:\PP\CodingTrash\chill.wav')  # currently unused
mic = sr.Microphone()
r.dynamic_energy_threshold = True

with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

text = r.recognize_google(audio)
print(text)
Any help is greatly appreciated.
If you are set on Google, you could try Google Cloud Speech; there are plenty of resources on the internet about integrating Google Cloud with Python. You could also look at CodeWithHarry's YouTube channel for speech recognition with pyttsx3.
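For the keyword part of the question, one common pattern is to keep listening in a loop and only act when the recognized text contains the wake word. A rough sketch, assuming the SpeechRecognition package; the wake word "jarvis" and the helper names are mine:

```python
WAKE_WORD = 'jarvis'  # hypothetical wake word; pick any word Google recognizes reliably

def heard_wake_word(text, wake_word=WAKE_WORD):
    # Pure string check, kept separate so it can be tested without a microphone
    return wake_word in text.lower()

def listen_for_wake_word():
    # Requires the SpeechRecognition package and a working microphone
    import speech_recognition as sr
    r = sr.Recognizer()
    mic = sr.Microphone()
    while True:
        with mic as source:
            r.adjust_for_ambient_noise(source)
            audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
        except (sr.UnknownValueError, sr.RequestError):
            continue  # nothing intelligible (or a service error); keep listening
        if heard_wake_word(text):
            print('Wake word heard:', text)
            break  # hand off to the command handler here
```

After the break you would run your normal recognize-and-respond logic, then loop back to listening.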
I am playing with the Google Cloud Speech API. If I use the Python SpeechRecognition library to call the Google Cloud Speech API, is that still a valid way to use the API? I just want to transcribe audio to text.
I am confused about the difference between the two, and whether there is a recommended approach if I only want to transcribe audio.
Using Python SpeechRecognition:

import speech_recognition as sr

r = sr.Recognizer()
harvard = sr.AudioFile('harvard.wav')
with harvard as source:
    audio = r.record(source)
r.recognize_google_cloud(audio)
Not using Python SpeechRecognition:

from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    client = speech.SpeechClient()
    response = client.recognize(config, audio)
    print_sentences(response)

def print_sentences(response):
    for result in response.results:
        best_alternative = result.alternatives[0]
        transcript = best_alternative.transcript
        confidence = best_alternative.confidence
        print('-' * 80)
        print(f'Transcript: {transcript}')
        print(f'Confidence: {confidence:.0%}')

config = {'language_code': 'en-US'}
audio = {'uri': 'gs://cloud-samples-data/speech/brooklyn_bridge.flac'}
speech_to_text(config, audio)
If you only plan to use Google Cloud Platform for speech recognition, then SpeechClient would be better because that is maintained by Google.
If you want to try out different speech recognition services, speech_recognition would help with that since it is more generic.
Any way of calling the API is fine. The libraries just make it easier for you.
Google Cloud Client Libraries are the recommended option for accessing Cloud APIs programmatically:
Provide idiomatic, generated or hand-written code in each language, making the Cloud API simple and intuitive to use.
Handle all the low-level details of communication with the server, including authenticating with Google.
Can be installed using familiar package management tools such as npm and pip.
In some cases, give you performance benefits by using gRPC. You can find out more in the gRPC APIs section below.
Also, be aware of the best practices to get better results from the API.
I am trying to develop a French-learning app, for which I am using the Python SpeechRecognition API to detect what a person said and then give them feedback on what they said and how much they need to improve. But the API's response is very slow. What could be the reason?
In one Stack Overflow answer I found a suggestion to check the input source for my application. I tried both the internal microphone and my headset microphone, but neither helped. In parallel I am also using the CMUSphinx speech API, which detects sound and responds quickly but with very poor accuracy, so I assume the application is receiving sound from the microphone.
import json
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Testing Online ASR module. Backend Google Web Speech API..\n")
    while True:
        print("Speak Now..\n")
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio, language='fr-FR')
            response = json.dumps(text, ensure_ascii=False).encode('utf8')
            print("You Said: " + str(response))
        except Exception as e:
            print(" ")  # silently skip recognition errors
It sends the data to the other side of the planet, where it is stored and analyzed by the NSA first; only when the NSA approves do you get the results.
i am also using CMUSphinx speech API which detect the sound and responses quickly but the accuracy is very poor
The right way would be to try something NN-based like Kaldi.
I am a noob to Dialogflow as well as Twilio. I am trying to connect my Dialogflow bot to a Twilio number.
I have a Twilio number, and I am using Dialogflow's Small Talk set of intents.
What are the steps to connect a Dialogflow (voice) bot to a Twilio number with voice (not SMS or chat messaging)?
On the Twilio side, I found this code:

from flask import Flask
from twilio.twiml.voice_response import Gather, Redirect, VoiceResponse, Say

app = Flask(__name__)

@app.route("/answer", methods=['GET', 'POST'])
def answer_call():
    """Respond to incoming phone calls with a brief message."""
    # Start our TwiML response
    response = VoiceResponse()
    # Read a message aloud to the caller and gather speech input
    gather = Gather(input='speech', action='some_url')
    gather.say("Welcome to Paradise, please tell us why you're calling")
    response.append(gather)
    return str(response)

if __name__ == "__main__":
    app.run(debug=True)
1) I understood that I should put the URL of my Dialogflow bot into the action argument. Am I right?
2) If yes, where do i find this url? Is it related to this ? => https://cloud.google.com/dialogflow-enterprise/docs/reference/rest/v2/projects.agent.sessions/detectIntent
3) Then, what would be the session name?
I am trying to use the box on the right ("Try this API"), but whatever string I write, the output received is:
"name does not match pattern: /^projects/[^/]+/agent/sessions/[^/]+/contexts/[^/]+$/"
As mentioned, I am a newbie, so any insight on the above would be greatly appreciated!
Thank you so much in advance!
Twilio developer evangelist here.
I've not worked with Dialogflow and Twilio Voice directly (I prefer to connect voice with Twilio Autopilot, as it works out of the box). However, I know that there is no direct connection between Twilio Voice and Dialogflow.
You are on the right track though. Using <Gather> will capture the user's speech and translate it to text (actually using Google's Cloud Speech API). That text will be sent to your action URL as the SpeechResult parameter. You can't point that directly at your Dialogflow API endpoint, because Dialogflow expects different parameters and Twilio expects the result to be TwiML.
Instead, you will want to set up the action endpoint on your own server, retrieve the SpeechResult, and then send it on to Dialogflow for the result. You might find it easier to interact with the Dialogflow API by installing the Dialogflow Python client and using it to send the request (check out the documentation here). Once you get the result back from Dialogflow, you can use it to construct TwiML: a new <Gather> for further input, or just a <Say> to return the response.
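To make that glue concrete, here is a rough sketch of the two pieces: sending the recognized SpeechResult text to Dialogflow, and building the TwiML reply. This assumes the Dialogflow v2 Python client, the project and session IDs are placeholders, and it is a starting point rather than a tested integration:

```python
def detect_intent_text(project_id, session_id, text, language_code='en-US'):
    # Requires the Dialogflow client library: pip install dialogflow
    import dialogflow_v2 as dialogflow
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)
    text_input = dialogflow.types.TextInput(text=text, language_code=language_code)
    query_input = dialogflow.types.QueryInput(text=text_input)
    response = client.detect_intent(session=session, query_input=query_input)
    return response.query_result.fulfillment_text

def build_say_twiml(message):
    # Build the TwiML reply by hand so this helper has no Twilio dependency
    from xml.sax.saxutils import escape
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<Response><Say>{}</Say></Response>'.format(escape(message)))
```

In your Flask action route you would read request.values.get('SpeechResult'), pass it to detect_intent_text (using the caller's number as a session ID, for example), and return build_say_twiml(reply) with a text/xml content type.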
Let me know if this points you in the right direction.
I am using the following code for voice recognition in Python:

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Say something!")
    audio = r.listen(source)
print(type(audio))

BING_KEY = 'KEY'  # Microsoft Bing Voice Recognition API keys are 32-character lowercase hexadecimal strings
try:
    print(type(r.recognize_bing(audio, key=BING_KEY)))
except sr.UnknownValueError:
    print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
But it is very slow; it sometimes lags for 20 seconds. Can you recommend any real-time voice recognition API for Python, or suggest modifications to this code?
I use the Bing Speech API, but I don't use a client library like you do; I use the REST API. I capture audio live using PyAudio, and when I detect that the noise level has gone up I start recording the sound to a WAV file; when it's finished, I send the audio data to the endpoint that the API documentation gives you. It gives me a response rather quickly, at most 3 seconds, though it depends on your connection speed. My method is more involved than yours, but it is worth it.
Here is a link to the documentation. They use C# in their examples, but since it is an online API, if you send the right information in the headers and such, it should still work.
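The noise-triggered recording described above can be sketched roughly like this. The THRESHOLD value, the silence timeout, and the audio parameters are guesses you would need to tune for your microphone; only the RMS helper is exact:

```python
import math
import struct

THRESHOLD = 500  # hypothetical RMS level that counts as "speech started"; tune this

def rms(frame_bytes):
    # Root-mean-square level of a chunk of 16-bit little-endian mono PCM samples
    count = len(frame_bytes) // 2
    samples = struct.unpack('<{}h'.format(count), frame_bytes)
    return math.sqrt(sum(s * s for s in samples) / count) if count else 0.0

def record_until_silence(filename='utterance.wav', rate=16000, chunk=1024):
    # Requires PyAudio: reads chunks, starts saving once the RMS level
    # crosses THRESHOLD, and stops after a run of quiet chunks.
    import wave
    import pyaudio
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    frames, quiet, recording = [], 0, False
    while True:
        data = stream.read(chunk)
        if rms(data) >= THRESHOLD:
            recording, quiet = True, 0
        elif recording:
            quiet += 1
        if recording:
            frames.append(data)
        if recording and quiet > 30:  # roughly 2 s of silence at these settings
            break
    stream.stop_stream()
    stream.close()
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
        wf.setframerate(rate)
        wf.writeframes(b''.join(frames))
    pa.terminate()
    return filename
```

The resulting WAV file is what you would POST to the speech endpoint with the appropriate headers.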