I am trying to send an audio stream from the UI to a Python API. I need the Python Azure Speech logic to convert the speech to text, but I am not sure how to use a pull or push audio input stream for speech-to-text.
In my case I receive an audio stream from another source. When the connection with my application is made (upon reception of the first package), a PushAudioInputStream is started. This stream pushes the data to the SDK for each package that is received, so speech recognition with a push stream is used. See the snippet of code below; this has worked for my case.
if newConnection:
    stream = speechsdk.audio.PushAudioInputStream()
    speech_recognition_with_push_stream(stream)

# then, for each package received:
stream_data = base64.b64decode(data)
stream.write(stream_data)
There is a sample for using the Cognitive Services Speech SDK. Specifically, for using it with a pull stream you may refer to speech_recognition_with_pull_stream(), and for using it with a push stream you may refer to speech_recognition_with_push_stream().
Hope it helps.
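For reference, a minimal sketch of what speech_recognition_with_push_stream() might look like, assuming a hypothetical subscription key and region (the SDK import is deferred inside the function so the base64 helper works on its own):

```python
import base64


def decode_chunk(data):
    # Each incoming package arrives as base64 text; decode it to raw
    # audio bytes before writing it to the push stream.
    return base64.b64decode(data)


def speech_recognition_with_push_stream(stream, key="YOUR_KEY", region="YOUR_REGION"):
    # key/region are placeholders for your Azure Speech resource.
    # Imported here so decode_chunk above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)
    # Print each finalized phrase as it is recognized.
    recognizer.recognized.connect(lambda evt: print(evt.result.text))
    recognizer.start_continuous_recognition()
    return recognizer
```

Keep a reference to the returned recognizer and call stop_continuous_recognition() on it when the connection closes.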
Is there support for streaming back a response in Nuclio? The workflow I'm trying to achieve is to have the UI request a large file from a Nuclio function running inside a Docker container and have it stream back the large file.
For example this is how Flask supports streaming contents:
https://flask.palletsprojects.com/en/2.2.x/patterns/streaming/
I can't seem to find anywhere that mentions how to have Nuclio stream back large data/file.
I do see they mention some stuff about stream triggers, but I don't know if that'll help with streaming back the response:
https://nuclio.io/docs/latest/concepts/architecture/
https://nuclio.io/docs/latest/reference/triggers/
If there's no support, would my best bet be to stream the data to some 3rd party platform and have the UI download the data/file from there?
I'm experimenting with Azure Speech-to-Text to convert audio to text. For experimentation's sake, I'm hardcoding the audio file name. While I'm able to convert a single utterance (using the recognize-once function), I'm unable to do the same for longer audio files.
Also, has someone tried providing base64-encoded audio to Azure Speech-to-Text? Are there any examples I can look at? The Azure documentation does not mention anything about base64 formats.
Using the Flask web framework.
As per this MS document, try using GStreamer so the Speech SDK can decode compressed audio input (e.g. MP3) for speech-to-text.
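For the longer-audio part, the recognize-once call stops at the first utterance; continuous recognition is the usual approach. A minimal sketch, assuming a hypothetical key and region; the extension-to-container-format helper is also hypothetical, and the SDK imports are deferred so the helpers run without it installed:

```python
import os
import threading

# Hypothetical helper: map a file extension to the name of the matching
# speechsdk.audio.AudioStreamContainerFormat member (GStreamer decodes these).
CONTAINER_BY_EXT = {'.mp3': 'MP3', '.ogg': 'OGG_OPUS', '.flac': 'FLAC'}


def container_for(path):
    return CONTAINER_BY_EXT[os.path.splitext(path)[1].lower()]


def compressed_push_stream(path):
    # GStreamer must be installed for the SDK to decode compressed input.
    import azure.cognitiveservices.speech as speechsdk  # deferred import

    fmt = getattr(speechsdk.audio.AudioStreamContainerFormat, container_for(path))
    stream_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=fmt)
    return speechsdk.audio.PushAudioInputStream(stream_format=stream_format)


def recognize_continuous(filename, key="YOUR_KEY", region="YOUR_REGION"):
    # Continuous recognition handles audio longer than a single utterance.
    import azure.cognitiveservices.speech as speechsdk  # deferred import

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=filename)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = threading.Event()
    lines = []
    recognizer.recognized.connect(lambda evt: lines.append(evt.result.text))
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()  # block until the whole file has been processed
    recognizer.stop_continuous_recognition()
    return ' '.join(lines)
```

On the base64 question: the SDK takes raw bytes, so decode with base64.b64decode() first and write the result into a push stream, as in the answer at the top of this page.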
I am playing with the Google Cloud Speech API. I was wondering: if I use the Python SpeechRecognition library to call the Google Cloud Speech API, is that still a valid way to use the API? I just want to transcribe the audio to text.
I am confused about the difference between the two approaches and whether there is a recommended one if I just want to transcribe the audio.
Using Python SpeechRecognition:
import speech_recognition as sr

r = sr.Recognizer()
harvard = sr.AudioFile('harvard.wav')
with harvard as source:
    audio = r.record(source)
# recognize_google_cloud() sends the audio to the Google Cloud Speech API
# (it needs your service-account credentials configured)
r.recognize_google_cloud(audio)
Not using Python SpeechRecognition:
from google.cloud import speech_v1 as speech

def speech_to_text(config, audio):
    client = speech.SpeechClient()
    response = client.recognize(config=config, audio=audio)
    print_sentences(response)

def print_sentences(response):
    for result in response.results:
        best_alternative = result.alternatives[0]
        transcript = best_alternative.transcript
        confidence = best_alternative.confidence
        print('-' * 80)
        print(f'Transcript: {transcript}')
        print(f'Confidence: {confidence:.0%}')

config = {'language_code': 'en-US'}
audio = {'uri': 'gs://cloud-samples-data/speech/brooklyn_bridge.flac'}
speech_to_text(config, audio)
If you only plan to use Google Cloud Platform for speech recognition, then SpeechClient is the better choice because it is maintained by Google.
If you want to try out different speech recognition services, speech_recognition helps with that, since it is more generic.
Any way of calling the API is fine; the libraries just make it easier for you.
Google Cloud Client Libraries are the recommended option for accessing Cloud APIs programmatically. They:
- Provide idiomatic, generated or hand-written code in each language, making the Cloud API simple and intuitive to use.
- Handle all the low-level details of communication with the server, including authenticating with Google.
- Can be installed using familiar package management tools such as npm and pip.
- In some cases, give you performance benefits by using gRPC. You can find out more in the gRPC APIs section of the Cloud APIs documentation.
Also, be aware of the best practices to get better results from the API.
I use the Google Speech-to-Text API to transcribe voice to text.
I am trying to terminate the service when we hit is_final=True.
requests = (types.StreamingRecognizeRequest(audio_content=message.chunk)
            for message in messages)
responses = client.streaming_recognize(streaming_config, requests)
I have tried responses.cancel(), but it raises an error.
I found that the Java client library has a method to terminate the streaming recognition service:
SpeechClient speech;
speech.close();
But I cannot find the same method in the Python client library. Can somebody guide me on how to terminate the streaming service properly in Python?
As Leferis S already mentioned, if you want to stop a StreamingRecognizeRequest after receiving is_final=True, you have to set single_utterance=True inside your StreamingRecognitionConfig.
An extended solution would be to parse the responses. Here (lines 147-160) is an official example showing how to exit the streaming process once a specific word has been finally recognized.
I am working on a simple service to remotely record line input from an audio interface attached to a server, via a REST API request.
My current solution, using PyAudio to manage the audio interface:
1) send an HTTP request to start recording to a file on the server filesystem.
2) send an HTTP request to stop recording and pull the recorded audio file from the server filesystem.
Instead, I would like to simply "stream" the line input to any HTTP client that wants to download the audio stream.
Is there any simple Python library solution for lossless HTTP audio streaming directly from an audio interface's input?
More importantly, does this make sense, or should I use RTSP instead? (More than efficiency, I want to be able to download the audio stream via a simple HTTP link in a browser, or via curl or a simple programmatic request; I'll usually have no more than one connected client at a time, which is why I'd prefer to avoid RTSP.)
I have done this using Python Flask to provide the REST endpoint that streams the audio, and the pyfaac module to pack PCM frames into the AAC format (a format suited for streaming). Then, for example, you use the standard HTML5 audio tag with src set to your streaming endpoint.
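Since the question asks for lossless audio, a sketch of the raw-PCM alternative: stream WAV instead of AAC, assuming PyAudio for capture (the route and capture parameters are illustrative, and the Flask/PyAudio imports are deferred so the header helper runs on its own). The endpoint writes a WAV header with a near-maximal data size, then raw frames, so browsers and curl treat it as one long file:

```python
import struct


def wav_header(sample_rate=44100, bits=16, channels=2, data_size=0x7FFFFFFF):
    # A 44-byte PCM WAV header; data_size is set near the 32-bit maximum
    # because the stream's true length is unknown in advance.
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    return (b'RIFF' + struct.pack('<I', 36 + data_size) + b'WAVE'
            + b'fmt ' + struct.pack('<IHHIIHH', 16, 1, channels,
                                    sample_rate, byte_rate, block_align, bits)
            + b'data' + struct.pack('<I', data_size))


def make_app():
    # Deferred imports so wav_header works without Flask/PyAudio installed.
    import pyaudio
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route('/audio.wav')
    def audio():
        def generate():
            yield wav_header()
            pa = pyaudio.PyAudio()
            stream = pa.open(format=pyaudio.paInt16, channels=2,
                             rate=44100, input=True, frames_per_buffer=1024)
            try:
                while True:
                    yield stream.read(1024)
            finally:
                stream.stop_stream()
                stream.close()
                pa.terminate()

        return Response(generate(), mimetype='audio/wav')

    return app
```

With this in place, `curl http://server:5000/audio.wav > capture.wav` or an HTML5 audio tag pointed at the endpoint both work with a single connected client.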