A C# example of using the Microsoft Cognitive Services Vision API on real-time video can be found here: https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/vision-api-how-to-topics/howtoanalyzevideo_vision
I cannot find anything similar for Python.
How would I go about doing that in Python?
Problem statement: I need to transcribe speech to text in real time and distinguish the users
as speaker 1 and speaker 2 using the Azure Cognitive Speech service.
So far I have explored the Azure documentation on conversation transcription, which provides sample code for JavaScript and C# (link to the documentation), but I was not able to find sample code in Python. Does that mean this Azure service is not available in Python?
Is the Azure conversation transcription service available in Python?
No, at present the Conversation Transcription SDK does not support Python.
It supports only C# and JavaScript, and it is available only in a few regions, such as centralus, eastasia, eastus, and westeurope.
You can reach Microsoft here for support.
I have been trying to implement an API that integrates with a live meeting to perform face recognition. I am not sure whether MS Teams allows access to the live video.
You can add Preciate to Microsoft Teams for employee recognition, and there is also Recognize, an employee-recognition extension for MS Teams.
There is also the Face service under Cognitive Services in Microsoft Azure, which is used for identity verification.
You can detect faces using several client library SDKs, and there is a guide on detecting and analyzing faces.
Here is a quickstart for using the Face library in multiple languages, such as C#, Go, JavaScript, and Python, as well as the REST API.
The Device capabilities and Tab Device Permission Demo documentation will help you capture images from the video. However, there isn't any API for implementing face recognition in Microsoft Teams.
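Once a frame has been captured and is reachable by URL, sending it to the Face service from Python could look roughly like the sketch below. It only builds the HTTP request with the standard library and does not send it. The endpoint host, key, and image URL are placeholders, and the `/face/v1.0/detect` path is an assumption based on the v1.0 REST layout, so check it against the documentation for your resource.

```python
import json
import urllib.request


def build_detect_request(endpoint, key, image_url):
    """Build (but do not send) a Face 'detect' request for one image URL."""
    # Assumed path, following the Face REST API v1.0 layout.
    url = endpoint.rstrip("/") + "/face/v1.0/detect"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values; replace with your own resource endpoint and key.
request = build_detect_request(
    "https://example-resource.cognitiveservices.azure.com",
    "YOUR_KEY",
    "https://example.com/frame.jpg",
)

# To actually call the service:
# with urllib.request.urlopen(request) as response:
#     faces = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect what would go over the wire before spending any quota.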
I am building a personal assistant that needs to speak back in Hindi. I find it odd that Google Cloud Text-to-Speech doesn't offer Hindi,
https://cloud.google.com/text-to-speech/docs/voices
while Google Translate does speak the result aloud if you translate from English to Hindi and click the speaker button.
https://translate.google.com/
I read online that
https://translate.google.com.vn/translate_tts?ie=UTF-8&q=ANYTHING_TEXT&tl=en&client=tw-ob
can do the trick, but that URL is for English. If I change tl to hi and replace ANYTHING_TEXT with Hindi text, it should work, but doing so gives me this:
https://translate.google.com.vn/translate_tts?ie=UTF-8&q=आप%20कैसे%20हैं&tl=hi&client=tw-ob
It gives me audio I can't understand.
So, my questions are:
1) Why can't we access the Hindi voice through Google Cloud when Google Translate can use it?
2) How can I work around this to get a Hindi voice working in my Python file?
3) Google Cloud offers the Translation API, but it only translates text and returns text, not audio. Please tell me if that is true.
https://cloud.google.com/translate/docs/
1) It seems that Google has already developed a Hindi voice that can be used on a best-effort basis in a free service such as Google Translate. If it isn't good enough, as you point out, they may be improving it before supporting it in the paid TTS service.
2) You would have to use an alternative TTS service that supports Hindi until the language is supported by Google's TTS API.
3) Yes, the Translation API is only meant for text. You could use the output of the Translation API as input for the TTS API, but since the TTS API doesn't support the language you want, you may have to submit a feature request to the Google Cloud Public Issue Tracker to show interest in Hindi support for this service.
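If you still want to experiment with the translate_tts URL from the question, one thing worth ruling out is how the Hindi text is encoded in the query string. The sketch below only builds the URL with the standard library, percent-encoding the text as UTF-8. The endpoint and its parameters are copied from the question, it is not a documented API, and correct encoding is not guaranteed to fix the audio.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; it is not a documented, supported API.
BASE_URL = "https://translate.google.com.vn/translate_tts"


def build_tts_url(text, language):
    """Return the translate_tts URL with the text percent-encoded as UTF-8."""
    params = {"ie": "UTF-8", "q": text, "tl": language, "client": "tw-ob"}
    return BASE_URL + "?" + urlencode(params)


url = build_tts_url("आप कैसे हैं", "hi")
print(url)
```

`urlencode` encodes strings as UTF-8 by default and turns spaces into `+`, so the resulting URL contains only ASCII characters.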
I'm building a simple chatbot with Watson. I have a Python script. For simplicity, assume the script is this:
x=5
x
And in Watson I want to return:
result is 5
However, I'm not sure how to interact with Python. My research showed that it has something to do with NodeJS and JSON, but I couldn't find any example or tutorial that suits my requirements.
Could someone point me to the steps I should take, or to any documentation?
Data between Watson Assistant and a client application is exchanged as JSON. The service itself has a REST API, and you can use it from any programming language or with command-line tools. For Python, there is even an SDK.
There are some samples written in Python. I recommend my own code :). There is a tool I wrote to interact with Watson Assistant / Watson Conversation (blog entry is here). Another Python sample is what I called EgoBot (blog is here). It shows how you can even change the dialog itself from within the chatbot. Basically, you can tell the bot to learn things. The examples should get you started.
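To make the JSON exchange concrete, here is a minimal sketch using only the standard library. It runs the script from the question, packs the result into a JSON message body, and formats the reply text. The `input` and `context` field layout is an assumption modeled on the v1 message format, and `script_result` is a hypothetical context variable name; nothing is sent to the service here.

```python
import json


def run_script():
    """Stand-in for the script from the question: compute x and return it."""
    x = 5
    return x


def build_message_payload(user_text, result):
    """Build a JSON message body carrying the script result as a context variable."""
    payload = {
        "input": {"text": user_text},
        # "script_result" is a hypothetical context variable name.
        "context": {"script_result": result},
    }
    return json.dumps(payload)


def format_reply(result):
    """Text the chatbot should show, as requested in the question."""
    return f"result is {result}"


body = build_message_payload("run my script", run_script())
reply = format_reply(run_script())
print(body)
print(reply)
```

In a real setup, the body would be posted through the REST API or the Python SDK, and a dialog node could reference the context variable in its response.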
I am developing a Python application for real-time translation. I need to recognize speech in real time: as the user says something, the application automatically sends that piece of audio to the Google Speech API and gets text back. I want the recognized text to appear immediately while the user is speaking.
I've found Streaming Speech Recognition, but it seems that I still need to record the full speech first and then send it to the server. Also, there are no examples of how to use it in Python.
Is it possible to do this with the Google Speech API?
You can do it with the Google Speech API.
However, it has a 1-minute content limit.
Please check the link below.
https://cloud.google.com/speech/quotas
So you have to restart the stream every minute.
The link below has example code for microphone streaming in Python.
https://cloud.google.com/speech/docs/streaming-recognize#speech-streaming-recognize-python
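The restart logic can be sketched without any Google library: group incoming audio chunks so that no group exceeds the limit, then open a new streaming request for each group. The function and parameter names below are illustrative, and the chunk size assumes 16 kHz, 16-bit mono audio. For clarity the sketch collects each group into a list; a live implementation would forward chunks as they arrive.

```python
def split_into_sessions(chunks, chunk_ms, limit_ms=60_000):
    """Group audio chunks so that each group lasts at most limit_ms."""
    session, elapsed_ms = [], 0
    for chunk in chunks:
        # Start a new session if adding this chunk would exceed the limit.
        if session and elapsed_ms + chunk_ms > limit_ms:
            yield session
            session, elapsed_ms = [], 0
        session.append(chunk)
        elapsed_ms += chunk_ms
    if session:
        yield session


# 1500 chunks of 100 ms each is 150 s of audio (3200 bytes = 100 ms at 16 kHz, 16-bit mono).
chunks = [b"\x00" * 3200] * 1500
sessions = list(split_into_sessions(chunks, chunk_ms=100))
print([len(s) for s in sessions])
```

Durations are tracked in integer milliseconds to avoid floating-point drift when many short chunks are added up.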
Check this link out:
https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py
This is an example of obtaining audio from the microphone. The recognition process has several components. In my experience, Sphinx recognition lacks accuracy, while Google Speech Recognition works very well.
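A condensed version of what the linked example does might look like the sketch below. It assumes the SpeechRecognition package and PyAudio are installed and that a default microphone is available. The function name `transcribe_once` is illustrative, and the import is placed inside the function so the definition loads even without the package.

```python
def transcribe_once():
    """Record one phrase from the default microphone and return the recognized text."""
    # Imported here so this module can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # requires PyAudio
        audio = recognizer.listen(source)  # blocks until a phrase has been captured
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError as error:
        raise RuntimeError(f"recognition request failed: {error}") from error


# Usage (needs a microphone):
# print(transcribe_once())
```

This records a complete phrase before sending it, so it is closer to phrase-by-phrase recognition than to true streaming.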
Working with the Google Speech API for real-time transcription is a bit cumbersome. You can use this repository for inspiration:
https://github.com/saharmor/realtime-transcription
It transcribes the client-side microphone input in real time (disclaimer: I'm the author).