I am playing with the Google Cloud Speech API. I was wondering: if I use the Python SpeechRecognition library to call the Google Cloud Speech API, is that still a valid way to use the API? I just want to transcribe text.
I am confused about the difference between the two, and whether there is a recommended approach if I just want to transcribe audio.
Using Python SpeechRecognition:
import speech_recognition as sr

r = sr.Recognizer()
harvard = sr.AudioFile('harvard.wav')
with harvard as source:
    audio = r.record(source)

# Free Google Web Speech API (built-in default key, no credentials needed):
r.recognize_google(audio)
# Google Cloud Speech API (requires service-account credentials):
r.recognize_google_cloud(audio)
Not using Python SpeechRecognition:
from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    print_sentences(response)

def print_sentences(response):
    for result in response.results:
        best_alternative = result.alternatives[0]
        transcript = best_alternative.transcript
        confidence = best_alternative.confidence
        print('-' * 80)
        print(f'Transcript: {transcript}')
        print(f'Confidence: {confidence:.0%}')

config = {'language_code': 'en-US'}
audio = {'uri': 'gs://cloud-samples-data/speech/brooklyn_bridge.flac'}
speech_to_text(config, audio)
If you only plan to use Google Cloud Platform for speech recognition, then SpeechClient is the better choice, because it is maintained by Google.
If you want to try out different speech recognition services, speech_recognition helps with that, since it is more generic.
Any way of calling the api is fine. The libraries are just to make it easier for you.
Google Cloud Client Libraries are the recommended option for accessing Cloud APIs programmatically. They:
- provide idiomatic, generated or hand-written code in each language, making the Cloud API simple and intuitive to use;
- handle all the low-level details of communication with the server, including authenticating with Google;
- can be installed using familiar package management tools such as npm and pip;
- in some cases give you performance benefits by using gRPC (see the gRPC APIs section of the Cloud APIs documentation).
Also, be aware of the best practices to get better results from the API.
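For instance, the best practices recommend a lossless codec and a sample rate of 16 kHz or higher. A RecognitionConfig along these lines might look like the sketch below (field names follow the v1 JSON API; the exact set depends on your client library version):

```python
# Sketch of a recognition config following the documented best practices:
# lossless encoding, >= 16 kHz sample rate, and the language of the audio.
config = {
    'encoding': 'FLAC',          # lossless codec, preferred over lossy ones
    'sample_rate_hertz': 16000,  # capture at 16 kHz or higher if possible
    'language_code': 'en-US',    # must match the spoken language
}
audio = {'uri': 'gs://cloud-samples-data/speech/brooklyn_bridge.flac'}
```

The same dict can be passed as the config argument in the SpeechClient example above.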
Related
I am currently following the Real Python guide on the SpeechRecognition package, and I have completed it successfully, but my use case isn't covered in the tutorial: I am trying to make an assistant similar to Alexa or Siri, but much more basic. I am not able to get it to start recognizing when I say a keyword, and I am unsure where to start with this. Here is what I have so far:
import speech_recognition as sr

r = sr.Recognizer()
file = sr.AudioFile(r'C:\PP\CodingTrash\chill.wav')
mic = sr.Microphone()
r.dynamic_energy_threshold = True
with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)
text = r.recognize_google(audio)
print(text)
Any help is greatly appreciated.
If you are good to go with Google, then you may try Google Cloud; there are plenty of resources on the internet about Google Cloud integration with Python. You could also look at CodeWithHarry's YouTube channel regarding speech recognition with pyttsx3.
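One way to approach the keyword trigger, sketched under the assumption that you keep calling recognize_google in a loop: only act on a transcript when it contains your wake word. The wake word and helper name below are made up for illustration:

```python
import re

WAKE_WORD = 'jarvis'  # hypothetical wake word, pick your own

def contains_wake_word(transcript, wake_word=WAKE_WORD):
    # Match the wake word as a whole word, case-insensitively.
    return re.search(r'\b' + re.escape(wake_word) + r'\b',
                     transcript.lower()) is not None

# Inside the listening loop (requires a microphone, shown for context):
# while True:
#     with mic as source:
#         audio = r.listen(source)
#     try:
#         text = r.recognize_google(audio)
#     except sr.UnknownValueError:
#         continue  # nothing intelligible was said, keep listening
#     if contains_wake_word(text):
#         ...  # handle the command that followed the wake word
```

This is the simplest pattern; dedicated wake-word engines detect the keyword locally without a round-trip to the recognition service.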
I am trying to use python googletrans package, but it produces lower-quality translations than the web interface of Google Translate.
Here is my code:
from googletrans import Translator
translator = Translator()
text = 'Проверим, насколько качественным получается перевод, если пользоваться веб-интерфейсом.'
result = translator.translate(text, src='ru', dest='en')
print(result.text)
The output is:
We check to see how well it turns out the translation, if you use the web interface.
The translation I obtain using the web interface is as follows: Let's check how high-quality the translation is if you use the web interface.
How can this difference be explained and is there anything I can do about it?
According to the docs, it's not actually using the official translate API:
"The maximum character limit on a single text is 15k. Due to limitations of the web version of google translate, this API does not guarantee that the library would work properly at all times. (so please use this library if you don't care about stability.) If you want to use a stable API, I highly recommend you to use Google's official translate API."
https://py-googletrans.readthedocs.io/en/latest/
They link to the official API documentation here: https://cloud.google.com/translate/docs
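If you switch to the official API, a minimal sketch using the google-cloud-translate package could look like this (assuming the package is installed and application credentials are configured; not tested against a live project):

```python
def translate_text(text, src='ru', dest='en'):
    """Translate text with the official Cloud Translation API (v2).

    Requires `pip install google-cloud-translate` and application
    credentials; this is a sketch, not a drop-in replacement.
    """
    from google.cloud import translate_v2 as translate
    client = translate.Client()
    result = client.translate(text, source_language=src, target_language=dest)
    return result['translatedText']
```

The official endpoint is a paid, stable service, so its output may also differ from the free web interface.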
I am trying to develop a French learning app, for which I am using the Python SpeechRecognition API to detect what a person has said and then give them feedback on what they said and how much they need to improve. But the response of the API is very slow. What could be the reason?
In one of the answers on Stack Overflow, I found a suggestion to check the input source for my application. I tried both the internal microphone and my headset microphone, but neither helped. In parallel, I am also using the CMUSphinx speech API, which detects the sound and responds quickly but with very poor accuracy, so I assume the application is receiving the sound from the microphone.
import json
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Testing Online ASR module. Backend Google Web Speech API..\n")
    while True:
        print("Speak Now..\n")
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio, language='fr-FR')
            response = json.dumps(text, ensure_ascii=False).encode('utf8')
            print("You Said: " + str(response))
        except Exception as e:
            print(" ")
What could be the reason?
It sends the data to the other side of the planet, where it is stored and analyzed by the NSA first; only when the NSA approves do you get the results.
"I am also using the CMUSphinx speech API, which detects the sound and responds quickly, but the accuracy is very poor"
The right way would be to try something NN-based, like Kaldi.
I tried to use some of the services provided by the Google Cloud API, following the tutorial, and after installing the SDK and authorization files I started making calls. But either way, the program gets stuck after it executes and then just waits, with no results or errors. Trying Java and Python gives the same result. How do I troubleshoot this problem?
I am using it in China, but I already use a global proxy; otherwise I wouldn't be able to download some of the Python packages.
import six
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

text = 'President Kennedy spoke at the White House.'

client = language.LanguageServiceClient()

if isinstance(text, six.binary_type):
    text = text.decode('utf-8')

# Instantiates a plain text document.
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects syntax in the document. You can also analyze HTML with:
#   document.type == enums.Document.Type.HTML
tokens = client.analyze_syntax(document).tokens

for token in tokens:
    part_of_speech_tag = enums.PartOfSpeech.Tag(token.part_of_speech.tag)
    print(u'{}: {}'.format(part_of_speech_tag.name,
                           token.text.content))
No useful information or error messages can be obtained; the program does not exit and just waits for a result to be returned.
China blocks many or most Google services (endpoints) from being accessed inside China. If you are doing this at home, this is almost a certainty.
Your best option is to sign up with Alibaba Cloud and then use a VM to access Google services. Alibaba's network will "usually" get you through the Great Firewall.
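Independent of the network route, one way to avoid the silent hang is to pass an explicit per-call timeout, so a blocked endpoint raises a deadline error instead of waiting forever. A sketch, assuming the client's RPC methods accept a timeout keyword, as the google-cloud clients generally do:

```python
def analyze_syntax_with_timeout(client, document, seconds=15.0):
    # Forward an explicit deadline; if the endpoint is unreachable, the
    # call should fail with a deadline/transport error instead of hanging.
    return client.analyze_syntax(document, timeout=seconds)
```

The resulting exception (rather than an endless wait) at least tells you the request never reached the service.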
I am having problems with authentication in the Python library for the Google Cloud API.
At first it worked for some days without problems, but suddenly the API calls stopped showing up in the API overview of the Google Cloud Platform.
I created a service account and stored the json file locally. Then I set the environment variable GCLOUD_PROJECT to the project ID and GOOGLE_APPLICATION_CREDENTIALS to the path of the json file.
from google.cloud import speech
client = speech.Client()
print(client._credentials.service_account_email)
prints the correct service account email.
The following code transcribes the audio_file successfully, but the Dashboard for my Google Cloud project doesn't show anything for the activated Speech API Graph.
import io

with io.open(audio_file, 'rb') as f:
    audio = client.sample(f.read(), source_uri=None, sample_rate=48000,
                          encoding=speech.encoding.Encoding.FLAC)
alternatives = audio.sync_recognize(language_code='de-DE')
At some point the code also ran into errors regarding the usage limit. I guess that, due to the unsuccessful authentication, the free/limited option is somehow being used.
I also tried the alternative option for authentication by installing the Google Cloud SDK and gcloud auth application-default login, but without success.
I have no idea where to start troubleshooting the problem.
Any help is appreciated!
(My system is running Windows 7 with Anaconda)
EDIT:
The error count ("Fehler" in the German dashboard) is increasing with calls to the API. How can I get detailed information about these errors?
Make sure you are using an absolute path when setting the GOOGLE_APPLICATION_CREDENTIALS environment variable. Also, you might want to try inspecting the access token using OAuth2 tokeninfo and make sure it has "scope": "https://www.googleapis.com/auth/cloud-platform" in its response.
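To inspect the token, you can query the tokeninfo endpoint directly. A small stdlib-only helper (the function names here are mine, for illustration) might look like:

```python
import json
import urllib.parse
import urllib.request

TOKENINFO = 'https://www.googleapis.com/oauth2/v3/tokeninfo'

def tokeninfo_url(access_token):
    # Build the tokeninfo request URL for a given access token.
    return TOKENINFO + '?' + urllib.parse.urlencode({'access_token': access_token})

def token_has_cloud_scope(access_token):
    # Fetch token info (network call) and check for the cloud-platform scope.
    with urllib.request.urlopen(tokeninfo_url(access_token)) as resp:
        info = json.load(resp)
    return 'https://www.googleapis.com/auth/cloud-platform' in info.get('scope', '')
```

You can obtain an access token for the active credentials with `gcloud auth application-default print-access-token`.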
Sometimes you will get different error information if you initialize the client with GRPC enabled:
0.24.0:
speech_client = speech.Client(_use_grpc=True)
0.23.0:
speech_client = speech.Client(use_gax=True)
Usually it's an encoding issue. Can you try with the sample audio, or try generating LINEAR16 samples using something like the Unix rec tool (part of SoX):
rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 5
...
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio_sample = speech_client.sample(
        content,
        source_uri=None,
        encoding='LINEAR16',
        sample_rate=44100)
Other notes:
- Sync Recognize is limited to 60 seconds of audio; you must use async recognition for longer audio.
- If you haven't already, set up billing for your account.
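The async path returns a long-running operation that you poll until it completes. A small polling helper, sketched against the 0.2x-era client used above (where the operation exposes poll(), complete, and results; method names changed in later releases):

```python
import time

def wait_for_operation(operation, interval=5.0, max_polls=120):
    # Poll the long-running recognition operation until it finishes,
    # then return its results. Gives up after max_polls attempts.
    polls = 0
    while not operation.complete and polls < max_polls:
        operation.poll()
        polls += 1
        time.sleep(interval)
    return operation.results

# Usage (hypothetical, following the sample() call above):
# operation = audio_sample.async_recognize(language_code='en-US')
# results = wait_for_operation(operation)
```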
With regard to the usage problem, the issue is in fact that when you use the new google-cloud library to access the ML APIs, everyone seems to authenticate against a shared project (hence it says you've used up your limit even though you've used nothing). To confirm this, you can call an ML API that you have not enabled using the Python client library: it will return a result even though it shouldn't. This problem also occurs with the other language client libraries and operating systems, so I suspect it's an issue with their gRPC layer.
Because of this, to ensure consistency I always use the older googleapiclient, which uses my API key. Here is an example using the translate API:
from googleapiclient import discovery
service = discovery.build('translate', 'v2', developerKey='')
service_request = service.translations().list(q='hello world', target='zh')
result = service_request.execute()
print(result)
For the speech API, it's something along the lines of:
from googleapiclient import discovery

service = discovery.build('speech', 'v1beta1', developerKey='')
service_request = service.speech().syncrecognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRate': 16000,
            'languageCode': 'en-US',
        },
        'audio': {'uri': 'gs://cloud-samples-data/speech/brooklyn_bridge.flac'},
    })
result = service_request.execute()
print(result)
You can get the list of the discovery APIs at https://developers.google.com/api-client-library/python/apis/ with the speech one located in https://developers.google.com/resources/api-libraries/documentation/speech/v1beta1/python/latest/.
One of the other benefits of using the discovery library is that you get many more options than with the current library, although it is often a bit more of a pain to implement.