How to integrate Azure text to speech with streamlit? - python

I am trying to integrate azure text to speech with streamlit.
import azure.cognitiveservices.speech as speechsdk
import streamlit as st
# Page title shown at the top of the Streamlit app.
st.title("Let's learn Math!")
def recognize_from_microphone(subscription="743ae1f5555f49f9a5de4457d4e91b2d", region="australiaeast"):
    """Capture one utterance from the default microphone and show the result in Streamlit.

    Args:
        subscription: Azure Speech resource key. SECURITY: never hard-code a real
            key; load it from an environment variable or a secrets store. The
            default only preserves the original script's behavior.
        region: Azure region of the Speech resource.
    """
    speech_config = speechsdk.SpeechConfig(subscription=subscription, region=region)
    speech_config.speech_recognition_language = "en-US"
    # To recognize speech from an audio file, use `filename` instead of `use_default_microphone`:
    # audio_config = speechsdk.audio.AudioConfig(filename="YourAudioFile.wav")
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    st.text("Speak into your microphone.")
    # recognize_once_async() returns a future; .get() blocks until one utterance is done.
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        st.text("Recognized: {}".format(speech_recognition_result.text))
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        st.text("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        st.text("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            st.text("Error details: {}".format(cancellation_details.error_details))
            st.text("Did you set the speech resource key and region values?")
# Text the user wants synthesized; Streamlit reruns the script on every edit (limited to 5 chars).
text = st.text_input("Enter text", value="Hi", max_chars=5)
def audio_output(text, voice_name='en-US-JennyNeural', subscription="743ae1f5555f49f9a5de4457d4e91b2d", region="australiaeast"):
    """Synthesize `text` to the default speaker with Azure TTS and report status in Streamlit.

    Args:
        text: the string to speak.
        voice_name: Azure neural voice to use (default preserves original behavior).
        subscription: Azure Speech resource key. SECURITY: never hard-code a real
            key; load it from an environment variable or a secrets store.
        region: Azure region of the Speech resource.
    """
    speech_config = speechsdk.SpeechConfig(subscription=subscription, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name = voice_name
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    st.write("Enter some text that you want to speak >")
    # speak_text_async() returns a future; .get() blocks until playback completes.
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    if speech_synthesis_result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        st.write("Speech synthesized for text [{}]".format(text))
    elif speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        st.write("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                st.write("Error details: {}".format(cancellation_details.error_details))
                st.write("Did you set the speech resource key and region values?")
# Top-level script flow: listen for one utterance, then speak the entered text.
recognize_from_microphone()
audio_output(text)
This is my code, but streamlit is not loading the functions at all. Is there any fix? I am new to streamlit and azure.

You declared the functions but didn't call them.

Related

ValueError: '0' is not in list when running an executable built from Python

I have this simple code: it notifies me with audio and a desktop notification every time I press Alt+Shift to switch to one of the four (Italian, Hebrew, English, and Spanish) keyboard layouts.
import keyboard
import time
import ctypes
import json
import plyer
import pyttsx3
def on_alt(event):
    """Announce (speech + desktop notification) the active keyboard layout
    when Alt is held while Shift is pressed.

    Fix: the original computed `key_list.index(str(lid))` unconditionally,
    which raised `ValueError: '0' is not in list` when the foreground window
    (e.g. a PyInstaller console on Windows 10) reported a language id not
    present in `lcid`. Unknown ids are now skipped via dict lookups.
    """
    if keyboard.is_pressed('alt'):
        time.sleep(0.5)
        user32 = ctypes.WinDLL('user32', use_last_error=True)
        curr_window = user32.GetForegroundWindow()
        thread_id = user32.GetWindowThreadProcessId(curr_window, 0)
        klid = user32.GetKeyboardLayout(thread_id)
        # Low 16 bits of the layout handle hold the language id.
        lid = str(klid & (2**16 - 1))
        # Spoken name per language id; unknown ids simply say nothing.
        spoken = {'1040': 'Italian', '1037': 'Hebrew',
                  '2058': 'Spanish', '1033': 'English'}.get(lid)
        if spoken is not None:
            engine = pyttsx3.init()
            engine.say(spoken)
            engine.runAndWait()
        # Only notify for layouts we actually know about.
        locale_name = lcid.get(lid)
        if locale_name is not None:
            plyer.notification.notify(
                title="Language Keyboard Switch",
                message=str(locale_name),
                timeout=0.1,
            )
# Windows language id -> locale name shown in the notification.
lcid = {"1040": "it_IT",
"1033": "en_US",
"1037": "he_IL",
"2058": "es_MX",
}
# Fire on_alt on every Shift press; the Alt check happens inside the handler.
keyboard.on_press_key('shift', on_alt)
# Keep the process alive so the keyboard hook keeps running (~11.5 days).
time.sleep(1e6)
Running the script as it is, it works fine. But when I build the executable on Windows 10 with PyInstaller, it returns this error when I press Alt+Shift.
It's weird that the same executable built on Windows 11 works fine.

Change language of text to speech

I want to change the voice of Azure from Python, with these characteristics:
languageCode = 'es-MX'
ssmlGender = 'FEMALE'
voiceName = 'es-MX-DaliaNeural'
but I'm new to Azure so I don't know how; this is my code:
import PyPDF2
import azure.cognitiveservices.speech as sdk
# Azure Speech credentials (placeholders in this snippet).
key = "fake key"
region = "fake region"
config = sdk.SpeechConfig(subscription=key, region=region)
# No audio_config given: synthesis plays on the default speaker.
synthesizer = sdk.SpeechSynthesizer(speech_config=config)
# Read the PDF page by page and speak each page's text.
# Fix: the original opened `book` and never closed it; `with` guarantees the
# handle is released even if synthesis fails. The loop stays inside the
# `with` because PdfFileReader reads pages lazily from the open stream.
with open("prueba.pdf", "rb") as book:
    reader = PyPDF2.PdfFileReader(book)
    for num in range(0, reader.numPages):
        text = reader.getPage(num).extractText()
        # speak_text_async returns a future; .get() blocks until playback ends.
        result = synthesizer.speak_text_async(text).get()
According to the documentation https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-speech-synthesis?tabs=browserjs%2Cterminal&pivots=programming-language-python#select-synthesis-language-and-voice you should be able to do:
config.speech_synthesis_language = "es-MX"
config.speech_synthesis_voice_name = "es-MX-DaliaNeural"
The list of voices is here https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts

pyttsx3 makes pauses every 82 words

I am learning to use pyttsx3 module.
I want to make a continuous speech without any pauses, so I deleted all '.' and ',', but pauses were still created in what I thought were random places. So I dug deeper and figured out that it makes a pause every 82 (+/- 1) words.
Any idea how to fix it?
Here is my code:
import pyttsx3
# Fix: the original path 'D:\D-Chilldom\...' used unrecognized escape
# sequences ('\D', '\Q', '\M'), which modern Python flags with a
# SyntaxWarning; a raw string makes the Windows path unambiguous.
with open(r'D:\D-Chilldom\QuickMovieRecap\Movie.txt', 'r') as f:
    text = f.read()
# Strip sentence punctuation so the engine does not pause at '.' and ','.
text = text.replace('.', '')
text = text.replace(',', '')
# Initialize the TTS engine
engine = pyttsx3.init()
# Set the rate of speech (words per minute)
rate = 250
engine.setProperty('rate', rate)
# Set the volume of the voice
volume = 1
engine.setProperty('volume', volume)
# Set the voice to use (a macOS voice id; adjust per platform)
voice_id = "com.apple.speech.synthesis.voice.samantha"
engine.setProperty('voice', voice_id)
# save_to_file only queues the render; runAndWait() performs it.
engine.save_to_file(text, 'D:/D-Chilldom/QuickMovieRecap/rec/zzzz.mp3')
engine.runAndWait()

How to display text on the screen it is said over the audio

As a personal project, I decided to create one of the reddit text-to-speech bot.
I pulled all the data from reddit with praw
import praw, random
def scrapeData(subredditName):
    """Return "<title> <selftext>" of a random recent post from a subreddit.

    Fix: the original used random.randint(0, 100) as an index, which is
    out of range for a 100-item list (valid indices 0..99) and crashes
    whenever the listing returns fewer than 101 posts. random.choice over
    the actual list is always in bounds.

    Args:
        subredditName: name of the subreddit to sample from.
    """
    # Instantiate praw (credentials come from praw.ini / environment)
    reddit = praw.Reddit()
    # Get subreddit
    subreddit = reddit.subreddit(subredditName)
    # Get a bunch of posts and convert them into a list (may be fewer than 100)
    posts = list(subreddit.new(limit=100))
    # Pick one post uniformly at random
    post = random.choice(posts)
    return post.title + " " + post.selftext
Then, I converted it to speech stored in a .mp3 file with gTTS.
from google.cloud import texttospeech
def convertTextToSpeech(textString):
    """Synthesize `textString` with Google Cloud TTS and write it to output.mp3.

    Fix: from_service_account_json is a classmethod; the original
    `TextToSpeechClient().from_service_account_json(...)` first constructed a
    throwaway client with default credentials (failing outright when none are
    configured) before building the real one. Call it on the class instead.
    """
    # Instantiate TTS from the service-account key file
    client = texttospeech.TextToSpeechClient.from_service_account_json("path/to/json")
    # Set text input to be synthesized
    synthesisInput = texttospeech.SynthesisInput(text=textString)
    # Build the voice request
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-us",
        ssml_gender=texttospeech.SsmlVoiceGender.MALE,
    )
    # Select the type of audio file
    audioConfig = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
    # Perform the TTS request on the text input
    response = client.synthesize_speech(
        input=synthesisInput, voice=voice, audio_config=audioConfig
    )
    # response.audio_content is raw MP3 bytes; write in binary mode
    with open("output.mp3", "wb") as out:
        out.write(response.audio_content)
I've created an .mp4 with moviepy that has generic footage in the background with the audio synced over it,
from moviepy.editor import *
from moviepy.video.tools.subtitles import SubtitlesClip
# Get video and audio source files: 10 s of background footage, 10 s of narration.
clip = VideoFileClip("background.mp4").subclip(20,30)
audio = AudioFileClip("output.mp3").subclip(0, 10)
# Set audio and create final video (re-encodes to output.mp4)
videoClip = clip.set_audio(audio)
videoClip.write_videofile("output.mp4")
but my issue is I can't find a way to have only the current word or sentence displayed on screen as a subtitle, rather than the entire post.

how to pass edited wav between functions without saving wav in between?

I have a wav conversation of 2 people(customer and tech support)
I have 3 separate functions that extract 1 voice, cut 10 seconds and transform it to embedding.
def get_customer_voice(file):
    """Extract the voiced segments of one channel of `file` and write them to
    '<file>_customer.wav'.

    NOTE(review): the inline comment says the customer voice is in the 1st
    track, but wav[1][:,1] selects channel index 1 (the second channel) —
    confirm which channel actually holds the customer.
    """
    print('getting customer voice only')
    # wf.read returns (sample_rate, data); data is (n_samples, n_channels) for stereo.
    wav = wf.read(file)
    ch = wav[1].shape[1]#customer voice always in 1st track
    sr = wav[0]
    c1 = wav[1][:,1]
    #print('c0 %i'%c0.size)
    # NOTE(review): for a mono file wav[1] is 1-D, so shape[1] above raises
    # IndexError before this guard ever runs; exit() also kills the whole
    # process instead of signalling the caller.
    if ch==1:
        exit()
    vad = VoiceActivityDetection()
    vad.process(c1)
    voice_samples = vad.get_voice_samples()
    #this is trouble - how to pass it without saving anywhere as wav?
    wf.write('%s_customer.wav'%file,sr,voice_samples)
function below cuts 10 seconds of wav file from function above.
import sys
from pydub import AudioSegment
def get_customer_voice_10_seconds(file):
    """Trim the given wav to its first 10 seconds and save the result as
    '<file>_10seconds.wav'."""
    # pydub slices are in milliseconds: [0:10000] is the first 10 s.
    first_ten_seconds = AudioSegment.from_wav(file)[0:10000]
    out_path = str(file) + '_10seconds.wav'
    first_ten_seconds.export(out_path, format='wav')
if __name__ == '__main__':
    # Expect the wav path as the first CLI argument.
    if len(sys.argv) < 2:
        print('give wav file to process!')
    else:
        print(sys.argv)
        get_customer_voice_10_seconds(sys.argv[1])
How can I pass it as wav or another format without saving it to some directory? It's to be used in a REST API; I don't know where it would save that wav, so preferably it should be passed in memory somehow.
I figured it out - the function below just works without saving, buffer etc.
It receives a wav file and edits it and just sends straight to the get math embedding function:
def get_customer_voice_and_cutting_10_seconds_embedding(file):
    """Extract the voiced part of one channel of `file`, keep the first 10
    seconds, write it to '<file>_10seconds.wav', and return its embedding.

    Fix: the original trimmed `audio_segment` but never exported it, so
    get_embedding() was called with a path that had never been written.
    """
    print('getting customer voice only')
    # read returns (sample_rate, data); data is (n_samples, n_channels) for stereo.
    wav = read(file)
    ch = wav[1].shape[1]
    sr = wav[0]
    c1 = wav[1][:,1]
    vad = VoiceActivityDetection()
    vad.process(c1)
    voice_samples = vad.get_voice_samples()
    # tobytes() packs the detected samples back into one raw mono track
    # without touching disk.
    audio_segment = AudioSegment(voice_samples.tobytes(), frame_rate=sr,
                                 sample_width=voice_samples.dtype.itemsize, channels=1)
    # Keep only the first 10 seconds (pydub slices are in milliseconds).
    audio_segment = audio_segment[0:10000]
    file = str(file) + '_10seconds.wav'
    # Persist the trimmed clip so the embedding step has a real file to read.
    audio_segment.export(file, format='wav')
    return get_embedding(file)
The key is tobytes() in AudioSegment; it just assembles all the samples together into one track again.

Categories

Resources