Google Cloud Speech API real time recognition - python

I am developing a Python application for real-time translation. I need to recognize speech in real time: as the user speaks, the application should automatically send each piece of audio to the Google Speech API and get text back, so that the recognized text appears immediately while the user is still speaking.
I've found Streaming Speech Recognition, but it seems that I still need to record the full speech first and then send it to the server. Also, there are no examples of how to use it in Python.
Is it possible to do this with the Google Speech API?

Yes, you can do this with the Google Speech API, but it has a content limit of 1 minute. See the quotas page:
https://cloud.google.com/speech/quotas
So you have to restart the stream every minute.
Example code for microphone streaming in Python is here:
https://cloud.google.com/speech/docs/streaming-recognize#speech-streaming-recognize-python
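To illustrate the "restart every minute" part, here is a minimal sketch of how buffered audio chunks could be grouped into sessions that each stay under the limit. It does not call the Speech API at all. The function name split_into_sessions and the 55-second safety margin are my own assumptions, not something taken from Google's sample; each yielded session would be sent as its own streaming request.

```python
from typing import Iterable, Iterator, List


def split_into_sessions(
    chunks: Iterable[bytes],
    chunk_seconds: float,
    limit_seconds: float = 55.0,  # assumed safety margin below the 1 minute limit
) -> Iterator[List[bytes]]:
    """Group audio chunks so that each group holds at most limit_seconds of audio."""
    session: List[bytes] = []
    elapsed = 0.0
    for chunk in chunks:
        # Close the current session before it would exceed the limit.
        if session and elapsed + chunk_seconds > limit_seconds:
            yield session
            session, elapsed = [], 0.0
        session.append(chunk)
        elapsed += chunk_seconds
    if session:
        yield session


if __name__ == "__main__":
    # Twelve fake 10-second chunks stand in for microphone audio.
    fake_audio = [b"\x00" * 4] * 12
    for number, session in enumerate(split_into_sessions(fake_audio, 10.0), start=1):
        print(f"session {number}: {len(session)} chunks")
```

With a live microphone the chunks would come from a generator instead of a list, and a new streaming request would be opened for each session.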

Check this link out:
https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py
This is an example of obtaining audio from the microphone. Several recognition engines are available for the recognition step. In my experience, Sphinx recognition lacks accuracy, while Google Speech Recognition works very well.
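A condensed sketch of what that example does with the Google recognizer is below. It assumes the SpeechRecognition package and PyAudio are installed and a microphone is attached. The function name transcribe_once is mine, and the import sits inside the function only so that the file loads even without the package. Note that this records one phrase and then recognizes it, so it is not true streaming.

```python
from typing import Optional


def transcribe_once(timeout: Optional[float] = None) -> Optional[str]:
    """Record one phrase from the default microphone and return its text."""
    import speech_recognition as sr  # third-party: SpeechRecognition (needs PyAudio)

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against background noise first.
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source, timeout=timeout)

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The speech was unintelligible.
        return None
    except sr.RequestError as exc:
        raise RuntimeError(f"Could not reach the recognition service: {exc}") from exc


if __name__ == "__main__":
    print(transcribe_once())
```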

Working with the Google Speech API for real-time transcription is a bit cumbersome. You can use this repository for inspiration:
https://github.com/saharmor/realtime-transcription
It transcribes the client-side microphone input in real time (disclaimer: I'm the author).

Related

Is the Azure conversation transcription service available in Python?

Problem statement: I need to transcribe speech to text in real time and distinguish the users as speaker 1 and speaker 2 using the Azure Cognitive Speech service.
So far I have explored the Azure documentation on conversation transcription, which provides sample code for JavaScript and C# (link to the documentation), but I was not able to find sample code in Python. Does that mean this Azure service is not available in Python?
No. At present, the Conversation Transcription SDK does not support Python.
It supports only C# and JavaScript, and it is available only in a few regions, such as centralus, eastasia, eastus, and westeurope.
You can reach Microsoft here for support.

Live Speech to Text Transcription in Python

This is my first post ever, so I hope it is alright.
I am working on a Raspberry Pi Zero W, and I am trying to make a live speech-to-text translator. From my research, I think I need to use the SpeechRecognition module. I have written a program with it that does what I need using the Google speech-to-text module, and it does the job, just not live.
I think that to make it transcribe live, I need to use IBM Watson Speech to Text with something called WebSockets.
I cannot find much information about using those two together, let alone any code. If any of you have experience with live transcription to text using this or any other approach in Python, I would really appreciate it if you could point me in the right direction. Any code would be fantastic.
Google has a live speech-to-text transcription API, and it also provides source code to get you started. Check this GitHub page. The example listens to your microphone and returns the text version of whatever you are saying in real time.
It is example software that works right out of the box. All you need to do is run it with GOOGLE_APPLICATION_CREDENTIALS set in your environment variables.
If you have used it before, you should already have a billing account set up. If not, please set one up here.
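Before running the sample, it can help to confirm that the environment variable is actually set and points at a real file. The check below is only a sketch using the standard library. The helper name credentials_path is hypothetical and is not part of Google's sample.

```python
import os
from typing import Mapping


def credentials_path(env: Mapping[str, str] = os.environ) -> str:
    """Return the service account key path, or raise if it is missing."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"Credentials file does not exist: {path}")
    return path


if __name__ == "__main__":
    try:
        print("Using credentials at", credentials_path())
    except RuntimeError as exc:
        print("Setup problem:", exc)
```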

How to get Hindi voice using google translate/google cloud

I am building a personal assistant that needs to speak back in Hindi. I find it odd that Google Cloud Text-to-Speech doesn't offer Hindi,
https://cloud.google.com/text-to-speech/docs/voices
while Google Translate does speak the result if you translate from English to Hindi and click the speaker button.
https://translate.google.com/
I read on the Internet that
https://translate.google.com.vn/translate_tts?ie=UTF-8&q=ANYTHING_TEXT&tl=en&client=tw-ob
can do the trick, but that URL is for English. If I change tl to hi and replace ANYTHING_TEXT with some Hindi text, it should work, but doing so gives me this:
https://translate.google.com.vn/translate_tts?ie=UTF-8&q=आप%20कैसे%20हैं&tl=hi&client=tw-ob
It returns audio I can't understand.
So, my questions are:
1) Why can't we access the Hindi voice through Google Cloud when we can through Google Translate?
2) How can I work around this to get a Hindi voice working in my Python file?
3) Google Cloud offers the Translation API, but it only translates text and returns text, not audio. Please tell me if that is true.
https://cloud.google.com/translate/docs/
1) It seems that Google has already developed a Hindi voice that can be used on a best-effort basis in a free service such as Google Translate. If it isn't good enough, as you point out, they may be improving it before supporting it in the paid TTS service.
2) You would have to use an alternative TTS service that supports Hindi until the language is supported by Google's TTS API.
3) Yes, the Translation API is only meant for text. You could use the output of the Translation API as input for the TTS API, but since the TTS API doesn't support the language you want, you may have to submit a feature request on the Google Cloud Public Issue Tracker to show interest in Hindi support for this service.
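As a side note on the translate_tts URL from the question: when building it by hand, the Hindi text has to be percent-encoded as UTF-8. The sketch below builds the same URL with the standard library. I am only assuming that encoding could be a factor. Nothing above confirms that it is the cause of the unintelligible audio, and the endpoint itself is not an official API, so it may change or stop working. The function name translate_tts_url is mine.

```python
from urllib.parse import urlencode


def translate_tts_url(text: str, lang: str = "hi") -> str:
    """Build the translate_tts URL from the question with safely encoded text."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    query = urlencode({"ie": "UTF-8", "q": text, "tl": lang, "client": "tw-ob"})
    return "https://translate.google.com.vn/translate_tts?" + query


if __name__ == "__main__":
    print(translate_tts_url("आप कैसे हैं"))
```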

Difference between Google Speech Recognition and Google Cloud Speech API

I haven't found anything online about this, though the SpeechRecognition page on PyPI, https://pypi.python.org/pypi/SpeechRecognition/, clearly shows that there is a difference. I have been using Google Speech Recognition (not the Google Cloud Speech API) without any API key, and I am not sure whether it is okay to do so.

How to receive answer from Google Assistant as a String, not as an audio stream

I am using the Python libraries from the Assistant SDK for speech recognition via gRPC. I get the recognized speech as a string from resp.result.spoken_request_text in \googlesamples\assistant\__main__.py, and I get the answer from the Assistant API as an audio stream from resp.audio_out.audio_data, also in \googlesamples\assistant\__main__.py.
I would like to know whether it is possible to get the answer from the service as a string as well (hoping it is available in the service definition or could be included), and how I could access or request the answer as a string.
Thanks in advance.
Currently (Assistant SDK Developer Preview 1), there is no direct way to do this. You can probably feed the audio stream into a Speech-to-Text system, but that really starts getting silly.
Speaking to the engineers on this subject while at Google I/O, they indicated that there are some technical complications on their end to doing this, but they understand the use cases. They need to see questions like this to know that people want the feature.
Hopefully it will make it into an upcoming Developer Preview.
Update: for google.assistant.embedded.v1alpha2, the Assistant SDK includes the field supplemental_display_text, which is meant to carry the Assistant response as text, either to aid the user's understanding or to be displayed on screens. Either way, it makes the text available to the developer. See the Google Assistant documentation.
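A rough sketch of reading that field while iterating over the streamed responses is below. It assumes, as I understand the v1alpha2 definition, that supplemental_display_text sits on each response's dialog_state_out. The helper name collect_display_text is mine, and the SimpleNamespace objects only stand in for real response messages.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_display_text(responses: Iterable[Any]) -> str:
    """Join the non-empty supplemental_display_text values from the responses."""
    parts = []
    for resp in responses:
        # Assumed location of the field: resp.dialog_state_out.
        state = getattr(resp, "dialog_state_out", None)
        text = getattr(state, "supplemental_display_text", "") if state is not None else ""
        if text:
            parts.append(text)
    return " ".join(parts)


if __name__ == "__main__":
    # Stand-ins for streamed responses; real ones come from the gRPC call.
    fake_responses = [
        SimpleNamespace(dialog_state_out=SimpleNamespace(supplemental_display_text="")),
        SimpleNamespace(dialog_state_out=SimpleNamespace(supplemental_display_text="It is sunny.")),
    ]
    print(collect_display_text(fake_responses))
```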
