This is my first post ever, so I hope it is alright.
I am working on a Raspberry Pi Zero W, trying to build a live speech-to-text translator. From my research I settled on the SpeechRecognition module, and I did end up writing a program that does what I need using the Google speech-to-text recognizer; it does the job, just not live.
I think that to make it transcribe live, I need to use IBM Watson Speech to Text with something called WebSockets.
I cannot seem to find much information about those two together, let alone any code. If any of you have experience with live transcription using this or any other approach in Python, I would really appreciate being pointed in the right direction, and any code would be fantastic.
Google has a live speech-to-text transcription API, and they provide sample source code to get you started. Check this GitHub page. It listens to your microphone and sends back the text version of whatever you are saying, in real time.
This example works right out of the box. All you need to do is run it with your GOOGLE_APPLICATION_CREDENTIALS saved in your environment variables.
If you have used it before, you should already have a billing account set up. If not, please do so here.
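Google's sample builds the streaming request from a generator that drains a queue fed by the microphone callback. Below is a minimal, stdlib-only sketch of that buffering pattern; the function name is my own, and the real sample wraps this logic in a MicrophoneStream class before handing the chunks to gRPC:

```python
import queue

def chunk_generator(buff):
    """Yield joined audio chunks from a queue until a None sentinel arrives.

    In the real sample, a PyAudio callback puts raw byte chunks on `buff`;
    this generator drains them so each streaming request carries everything
    buffered so far rather than one tiny chunk at a time.
    """
    while True:
        chunk = buff.get()
        if chunk is None:            # sentinel: the stream was closed
            return
        data = [chunk]
        # Grab anything else already buffered, without blocking.
        while True:
            try:
                extra = buff.get(block=False)
                if extra is None:
                    yield b"".join(data)
                    return
                data.append(extra)
            except queue.Empty:
                break
        yield b"".join(data)
```

In the real program each value this generator yields would become the `audio_content` of one streaming request.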
I'm doing a simple chatbot with Watson. I have a Python script. Assume the script is this, for simplicity:
x=5
x
And in Watson I want it to return:
result is 5
However, I'm not sure how to interact with Python. My research suggested it has something to do with Node.js and JSON, but I couldn't find any example or tutorial that suits my requirements.
Could someone point me to the right course of action, or to some documentation?
The data between Watson Assistant and a client application is exchanged as JSON-formatted data. The service itself has a REST API, so you can use it from any programming language or with command-line tools. For Python, there is even an SDK.
There are some samples written in Python. I recommend my own code :). There is a tool I wrote to interact with Watson Assistant / Watson Conversation (blog entry is here). Another Python sample is what I called EgoBot (blog is here). It shows how you can even change the dialog itself from within the chatbot. Basically, you can tell the bot to learn stuff. The examples should get you started.
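Since the exchange is plain JSON, you can work with responses using only the standard library, even before touching the SDK. A minimal sketch, where the response shape is a trimmed, hypothetical example modeled on a Watson Assistant v1 message reply (real responses carry more fields, such as context and entities):

```python
import json

# Trimmed, hypothetical Watson Assistant v1-style message response.
raw = '''
{
  "intents": [{"intent": "greeting", "confidence": 0.97}],
  "output": {"text": ["Hello! How can I help you?"]}
}
'''

def reply_text(response_json):
    """Join the output text lines of a v1-style message response."""
    response = json.loads(response_json)
    return " ".join(response.get("output", {}).get("text", []))

print(reply_text(raw))  # Hello! How can I help you?
```

The SDK ultimately hands you the same structure as a Python dict, so the extraction logic carries over.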
So, I've been wanting to make my own personal assistant using Python. I would speak into my headset, call out its name, give a command or ask a question, and get a response: a webpage opening, a program launching, a spoken string, and so on.
My problem is getting started. I had the idea to use Google's Assistant as a base for my project. I would like to build a framework that makes it extremely easy to add my own commands and questions: something that listens for keywords, and when a keyword is triggered, runs the action I programmed for it. For example, I could teach it to listen for the keyword "launch"; whatever comes after it would be matched against an array of program shortcuts I made, and it would launch the correct program when I ask. But when I ask something I didn't program, the call would be passed on to Google's Assistant, which would give back the response. This would save me the trouble of programming all kinds of standard things like "What's the weather?", "What's the time?", etc.
Now, I did some research before coming here, and two big services keep showing up: Wit.ai and Api.ai. Neither of these is what I am looking for. I'm looking for a base personal assistant, preferably as smart as Google's, that I can use as a foundation for my project. Can anyone point me in a direction for this? Is it even possible to find a base assistant to build on top of like this in Python?
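The routing described above (handle known keywords locally, pass everything else to the Assistant) can be sketched as a plain dispatch table. All names here are invented for illustration; in the real project the fallback would wrap the Google Assistant call:

```python
def make_dispatcher(handlers, fallback):
    """Route a transcribed utterance to the first matching keyword handler.

    `handlers` maps a leading keyword (e.g. "launch") to a function that
    receives the rest of the utterance; anything unmatched goes to
    `fallback` (the stand-in for the Google Assistant here).
    """
    def dispatch(utterance):
        words = utterance.strip().split(None, 1)
        if words and words[0].lower() in handlers:
            rest = words[1] if len(words) > 1 else ""
            return handlers[words[0].lower()](rest)
        return fallback(utterance)
    return dispatch

# Hypothetical usage:
dispatch = make_dispatcher(
    {"launch": lambda arg: "launching " + arg},
    fallback=lambda text: "asking the Assistant: " + text,
)
print(dispatch("launch firefox"))       # launching firefox
print(dispatch("What's the weather?"))  # asking the Assistant: What's the weather?
```

The point of the closure is that the keyword table and the fallback stay pluggable: adding a command is one new dictionary entry.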
You came to the right tag! You're looking for the Google Assistant SDK and the Assistant Library for Python. Although still in Developer Preview, it sounds like it provides much (but not all) of what you're looking for.
I am using the Python libraries from the Assistant SDK for speech recognition via gRPC. I get the recognized speech back as a string by reading resp.result.spoken_request_text in \googlesamples\assistant\__main__.py, and I get the answer as an audio stream from the Assistant API via resp.audio_out.audio_data, also in \googlesamples\assistant\__main__.py.
I would like to know whether it is possible to get the answer from the service as a string as well (hoping it is available in the service definition, or that it could be added), and how I could access or request that string.
Thanks in advance.
Currently (Assistant SDK Developer Preview 1), there is no direct way to do this. You can probably feed the audio stream into a Speech-to-Text system, but that really starts getting silly.
When I spoke to the engineers about this at Google I/O, they indicated that there are some technical complications on their end, but that they understand the use cases. They need to see questions like this to know that people want the feature.
Hopefully it will make it into an upcoming Developer Preview.
Update: for google.assistant.embedded.v1alpha2, the Assistant SDK includes the field supplemental_display_text, which is meant to expose the assistant's response as text that aids the user's understanding or can be displayed on screens, thereby making the text available to the developer. See the Google Assistant documentation.
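A minimal sketch of pulling that field out of a response stream. The attribute path resp.dialog_state_out.supplemental_display_text follows the v1alpha2 samples; the responses below are mocked with SimpleNamespace so the snippet runs without the SDK:

```python
from types import SimpleNamespace

def collect_display_text(responses):
    """Concatenate non-empty supplemental_display_text fields from a
    stream of v1alpha2 AssistResponse-like messages."""
    parts = []
    for resp in responses:
        text = getattr(resp.dialog_state_out, "supplemental_display_text", "")
        if text:
            parts.append(text)
    return " ".join(parts)

# Mocked responses standing in for the SDK's streamed AssistResponse objects:
responses = [
    SimpleNamespace(dialog_state_out=SimpleNamespace(supplemental_display_text="It is")),
    SimpleNamespace(dialog_state_out=SimpleNamespace(supplemental_display_text="")),
    SimpleNamespace(dialog_state_out=SimpleNamespace(supplemental_display_text="sunny today.")),
]
print(collect_display_text(responses))  # It is sunny today.
```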
I am developing a Python application for real-time translation. I need to recognize speech in real time: as the user speaks, the audio is automatically sent to the Google Speech API and text is returned. So I want the recognized text to appear immediately while speaking.
I've found Streaming Speech Recognition, but it seems that I still need to record the full speech first and then send it to the server. Also, there are no examples of how to use it in Python.
Is it possible to do this with Google Speech API?
You can do it with the Google Speech API, but it has a 1-minute content limit per streaming request. Please check the link below.
https://cloud.google.com/speech/quotas
So you have to restart the stream every minute.
The link below is example code for microphone streaming in Python.
https://cloud.google.com/speech/docs/streaming-recognize#speech-streaming-recognize-python
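Because of that limit, a long session has to be cut into segments, with the stream closed and reopened between them. A small, hypothetical helper that computes the byte ranges for 16-bit mono PCM audio; the sample rate and the 55-second headroom (staying safely under the 60-second cap) are assumptions:

```python
def stream_segments(total_bytes, sample_rate=16000, bytes_per_sample=2,
                    limit_seconds=55):
    """Split a 16-bit mono PCM recording into (start, end) byte ranges,
    each covering at most `limit_seconds` of audio, so every range can
    be sent as its own streaming request."""
    seg = sample_rate * bytes_per_sample * limit_seconds
    return [(start, min(start + seg, total_bytes))
            for start in range(0, total_bytes, seg)]

# One minute of 16 kHz, 16-bit mono audio is 1,920,000 bytes:
print(stream_segments(1_920_000))  # [(0, 1760000), (1760000, 1920000)]
```

With live microphone input the same idea applies in time rather than bytes: track the elapsed stream duration and reopen the gRPC stream before the limit is hit.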
Check this link out:
https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py
This is an example of obtaining audio from the microphone. There are several engines available for the recognition step. In my experience, Sphinx recognition lacks accuracy; Google Speech Recognition works very well.
Working with the Google Speech API for real-time transcription is a bit cumbersome. You can use this repository for inspiration:
https://github.com/saharmor/realtime-transcription
It transcribes the client-side microphone in real time (disclaimer: I'm the author).
I need to be able to switch participants in and out of a video conference from, say, a database of online users. I've been working with Hangouts, since it is the only open-source video conferencing service I know. I know that in Hangouts you can get an extension to 'kick' someone, but I can't seem to find that in the API. Does anyone know how I can kick and add people, either automatically or manually? I would ideally like to do it in Ruby, and I've also tried Python since Hangouts is in Python, but to no avail. Any help would be appreciated.
In the API, you can only change who is in the broadcast, not who is in the actual Hangout (i.e., there is no kick API). This functionality is the same as in the built-in "Cameraman" app for Hangouts On Air. Read more about the API at setParticipantBroadcast on the Developer Site.