I have the following script which works when running in a terminal:
All is does is turn microphone speech to text.
import speech_recognition as sr
# obtain audio from microphone
r = sr.Recognizer()
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
try:
# for testing purposes, we're just using the default API key
# to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
# instead of `r.recognize_google(audio)`
print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
Would it be possible to have this work on Django after pushing a button?
Something like:
View:
import speech_recognition as sr
# Create your views here.
def index(request):
return render(request, 'app/index.html')
def text(request):
r = sr.Recognizer()
with sr.Microphone() as source:
#print("Say something!")
audio = r.listen(source)
try:
# for testing purposes, we're just using the default API key
# to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
# instead of `r.recognize_google(audio)`
speech = r.recognize_google(audio)
except sr.UnknownValueError:
speech = "Google Speech Recognition could not understand audio"
except sr.RequestError as e:
speech = "Could not request results from Google Speech Recognition service; {0}".format(e)
return render(request, 'app/text', {'speech': speech})
Template:
<form action="/text/" method="post">
<input type="button" value="Start listening" />
</form>
Is this possible? Am I close or not at all?
Django has no access to the users' computers so if you try to record from a microphone you will be using the server's microphone, if it has one.
You need to record using JS/HTML5 and then send the data to django to process with AJAX. You might even be able to stream it but it's probably not worth the effort.
Related
I'm writing a simple calculator using Google API and need to store the recognized text from stream audio input to process it further. The API has a single_utterance config option that seems to be extremely helpful, yet I can't find a way to set it.
In Cloud Text-to-Speech library there's a line that says:
set the single_utterance field to true in the StreamingRecognitionConfig object.
Example code on Speech Recognition github
#!/usr/bin/env python3
# NOTE: this example requires PyAudio because it uses the Microphone class
from threading import Thread
try:
from queue import Queue # Python 3 import
except ImportError:
from Queue import Queue # Python 2 import
import speech_recognition as sr
r = sr.Recognizer()
audio_queue = Queue()
def recognize_worker():
# this runs in a background thread
while True:
audio = audio_queue.get() # retrieve the next audio processing job from the main thread
if audio is None: break # stop processing if the main thread is done
# received audio data, now we'll recognize it using Google Speech Recognition
try:
# for testing purposes, we're just using the default API key
# to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
# instead of `r.recognize_google(audio)`
print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
audio_queue.task_done() # mark the audio processing job as completed in the queue
# start a new thread to recognize audio, while this thread focuses on listening
recognize_thread = Thread(target=recognize_worker)
recognize_thread.daemon = True
recognize_thread.start()
with sr.Microphone() as source:
try:
while True: # repeatedly listen for phrases and put the resulting audio on the audio processing job queue
audio_queue.put(r.listen(source))
except KeyboardInterrupt: # allow Ctrl + C to shut down the program
pass
audio_queue.join() # block until all current audio processing jobs are done
audio_queue.put(None) # tell the recognize_thread to stop
recognize_thread.join() # wait for the recognize_thread to actually stop
Recognizer class from the same github
def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False):
"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.
To obtain your own API key, simply following the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API".
The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language tags can be found in this `StackOverflow answer <http://stackoverflow.com/a/14302134>`__.
The profanity filter level can be adjusted with ``pfilter``: 0 - No filter, 1 - Only shows the first character and replaces the rest with asterisks. The default is level 0.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
"""
assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
assert key is None or isinstance(key, str), "``key`` must be ``None`` or a string"
assert isinstance(language, str), "``language`` must be a string"
flac_data = audio_data.get_flac_data(
convert_rate=None if audio_data.sample_rate >= 8000 else 8000, # audio samples must be at least 8 kHz
convert_width=2 # audio samples must be 16-bit
)
if key is None: key = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw"
url = "http://www.google.com/speech-api/v2/recognize?{}".format(urlencode({
"client": "chromium",
"lang": language,
"key": key,
"pFilter": pfilter
}))
request = Request(url, data=flac_data, headers={"Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate)})
# obtain audio transcription results
try:
response = urlopen(request, timeout=self.operation_timeout)
except HTTPError as e:
raise RequestError("recognition request failed: {}".format(e.reason))
except URLError as e:
raise RequestError("recognition connection failed: {}".format(e.reason))
response_text = response.read().decode("utf-8")
# ignore any blank blocks
actual_result = []
for line in response_text.split("\n"):
if not line: continue
result = json.loads(line)["result"]
if len(result) != 0:
actual_result = result[0]
break
# return results
if show_all: return actual_result
if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
if "confidence" in actual_result["alternative"]:
# return alternative with highest confidence score
best_hypothesis = max(actual_result["alternative"], key=lambda alternative: alternative["confidence"])
else:
# when there is no confidence available, we arbitrarily choose the first hypothesis.
best_hypothesis = actual_result["alternative"][0]
if "transcript" not in best_hypothesis: raise UnknownValueError()
return best_hypothesis["transcript"]
The recognize_google() function in Recognizer class seems to have no passable argument that could set the single_utterance field to true.
Apart from the github, Google docs don't really include the Speech Recognition library.
I've tried to change the key in order to connect using my own API and take a look from console's perspective, but I've ran into problems there as well.
I'm trying to make a Python app, which behaves like Alexa, Cortana or Google's "Ok, Google".
I want it to constantly listen for a specific keyword. After it hears the keyword I want it to execute a function.
How can I do this?
Take a look at Speech Recognition This is a library that allows speech recognition including Google Cloud Speech API.
Relating to the second part of your question this seems relevant:
How can i detect one word with speech recognition in Python
Once you can listen for a word just call your function.
import speech_recognition as sr
# get audio from the microphone
r = sr.Recognizer()
keywork = "hello"
with sr.Microphone() as source:
print("Speak:")
audio = r.listen(source)
try:
if r.recognize_google(audio)) == keyword:
myfunction()
except sr.UnknownValueError:
print("Could not understand audio")
except sr.RequestError as e:
print("Could not request results; {0}".format(e))
This code was adapted from this tutorial.
This will search for your keyword in the entire speech.
import speech_recognition as sr
r = sr.Recognizer()
keyWord = 'joker'
with sr.Microphone() as source:
print('Please start speaking..\n')
while True:
audio = r.listen(source)
try:
text = r.recognize_google(audio)
if keyWord.lower() in text.lower():
print('Keyword detected in the speech.')
except Exception as e:
print('Please speak again.')
To improve your accuracy, you can even fetch all possible rhyming words and compare.
You can also do a character by character comparison to calculate the match percentage and set a minimum percentage threshold for the comparison success.
import speech_recognition as sr
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
print("Listening...")
audio = r.listen(source)
BING_KEY = "" #API KEY HERE
try:
print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY))
except sr.UnknownValueError:
print("Microsoft Bing Voice Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
########################################################################################################
voiceLine = r.recognize_bing(audio, key=BING_KEY)
if r.recognize_bing(audio, key=BING_KEY) == "Hello":
print("big nerd")
#### if voiceLine == "Hello":
#### print("big nerd")
The problem seems to be with the voiceline not acting like a normal string in "if" statements... (I'm farely new to python so please go easy on me :c) I am also aware that the indents are not in the right place, I don't know how to use this site, kms.
I tried this and it worked:
faudio = r.recognize_bing(audio, key=BING_KEY)
if faudio.strip() == "whatever":
do something...
If you are using external or Bluetooth microphone, you may need to install extra python package:
!pip install pyaudio
Then import the package in your code without changing anything, but you can check for the list of supported mics in your pc
import pyaudio
# check the list of microphones
sr.Microphone.list_microphone_names()
for a complete tutorial on speech recognition, it worth checking this website:
https://realpython.com/python-speech-recognition/
I'm trying to use Google Speech recognition, My problem is that after saying something into microphone, results are always same.
My Function looks like this
def RecordAudio():
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
# Speech recognition using Google Speech Recognition
try:
print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
return audio
When I'm trying to get data
data = RecordAudio()
Data is always equals to this
speech_recognition.AudioData object at 0x111a8b358
But what is strange for me is that, when this code is in one file, without function and without returning result, in that time this works.
For example I've test.py file
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
# Speech recognition using Google Speech Recognition
try:
print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
This is working but first example not.
Any idea what's happening ?
I Solved it.
I think that my problem was, that this function
audio = r.listen(source)
returns AudioData and when I was printing it
thats why it was speech_recognition.AudioData object at 0x111a8b358
To recognize it we must return
return r.recognize_google(audio)
It will return real string and not AudioData.
If somebody will have same problem,hope it helps.
I wish to get a simple speech recognition that works. I have been looking at this on speech_recognition, When I execute the code the following error occurs
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
audio = r.listen(source)
try:
print("You said " + r.recognize(audio))
except LookupError:
print("You said " + r.recognize(audio)) # recognize speech using Google Speech Recognition
AttributeError: 'Recognizer' object has no attribute 'recognize'
print("Could not understand audio")
This was copied from their examples on their web page
I was facing the same problem. The problem was that I had not set the minimum threshold level. So I added this code.
import speech_recognition as sr
r = sr.Recognizer()
m = sr.Microphone()
#set threhold level
with m as source: r.adjust_for_ambient_noise(source)
print("Set minimum energy threshold to {}".format(r.energy_threshold))
# obtain audio from the microphone
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
print(r.recognize_google(audio))
Now its working perfect!!!
I got it working.
import speech_recognition as sr
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
print(r.recognize_google(audio))
What version of SpeechRecognition library do you use?
You can check it with:
import speech_recognition as sr
print sr.__version__
If you are using the latest version (SpeechRecognition 3.1.3), you should use a recognize_google() method instead of recognize(), to use Google API.
As for the latest documentation, you can look it up here, and also some useful examples.
I have taken a code that actually executes but fails to print out what I say
NOTE: this example requires PyAudio because it uses the Microphone class
import speech_recognition as sr
import pyaudio
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
print("Say something!")
audio = r.listen(source)
# recognize speech using Google Speech Recognition
try:
# for testing purposes, we're just using the default API key
# to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
# instead of `r.recognize_google(audio)`
print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
# recognize speech using Wit.ai
WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings
try:
print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY))
except sr.UnknownValueError:
print("Wit.ai could not understand audio")
except sr.RequestError as e:
print("Could not request results from Wit.ai service; {0}".format(e))
# recognize speech using IBM Speech to Text
IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case alphanumeric strings
try:
print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from IBM Speech to Text service; {0}".format(e))
# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
print("AT&T Speech to Text could not understand audio")
except sr.RequestError as e:
print("Could not request results from AT&T Speech to Text service; {0}".format(e))
I get the following read out
Say something!
Google Speech Recognition thinks you said hello
Could not request results from Wit.ai service; recognition request failed: Bad Request
Could not request results from IBM Speech to Text service; recognition request failed: Unauthorized
Could not request results from AT&T Speech to Text service; credential request failed: Unauthorized