I need to use SSML to play an audio file with the <audio> tag in my Alexa Skill (as per Amazon's instructions).
Problem is, I don't know how to use SSML with Python. I know I can use it with Java but I want to build my skills with Python. I've looked all over, but haven't found any working examples of SSML in a Python script/program - does anyone know?
This was asked two years ago but maybe someone will benefit from the below.
I've just checked and if you use Alexa Skills Kit SDK for Python you can simply add SSML to your response, for example:
@sb.request_handler(can_handle_func=is_request_type("LaunchRequest"))
def launch_request_handler(handler_input):
    # Single quotes outside, so the double-quoted SSML attribute does not end the string
    speech_text = 'Wait for it 3 seconds<break time="3s"/> Buuuu!'
    return handler_input.response_builder.speak(speech_text).response
Hope this helps.
SSML audio resides in the response.outputSpeech.ssml attribute. Here is an
example obj with other required parameters removed:
{
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Welcome to Car-Fu. <audio src='https://carfu.com/audio/carfu-welcome.mp3' /> You can order a ride, or request a fare estimate. Which will it be?</speak>"
    }
  }
}
Further reference:
JSON Interface Reference for Custom Skills
Speech Synthesis Markup Language (SSML) Reference
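Since the question is about Python, here is a minimal sketch of building that same partial structure as a Python dict. The function name build_ssml_response is made up for illustration, and the other required response parameters are still left out, as in the example above.

```python
import json

def build_ssml_response(ssml_body):
    """Return a partial response dict with ssml_body wrapped in <speak> tags."""
    return {
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak>" + ssml_body + "</speak>",
            }
        }
    }

# Single quotes around the Python string let the SSML attribute keep its double quotes
audio_tag = '<audio src="https://carfu.com/audio/carfu-welcome.mp3" />'
response = build_ssml_response("Welcome to Car-Fu. " + audio_tag)

# json.dumps escapes the inner double quotes, so the output is valid JSON
print(json.dumps(response, indent=2))
```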
Install ssml-builder with "pip install ssml-builder", then use it:
from ssml_builder.core import Speech
speech = Speech()
speech.add_text('sample text')
ssml = speech.speak()
print(ssml)
These comments really helped me figure out how to make SSML work with ask-sdk-python. Instead of
speech_text = "Wait for it 3 seconds<break time="3s"/> Buuuu!" - from wmatt's comment
I defined variables that represent the start and end of every tag I'm using
ssml_start = '<speak>'
speech_text = ssml_start + whispered_s + "Here are the latest alerts from MMDA" + whispered_e
using single quotes, then concatenated those strings into the speech output, and it worked! Thanks a lot, guys! I really appreciate it!
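The post does not show how whispered_s and whispered_e were defined. A sketch under the assumption that they hold the opening and closing tags of Alexa's whispered effect (ssml_end is also an assumed name) could look like this:

```python
# Assumed definitions: the original post only shows ssml_start.
ssml_start = '<speak>'
ssml_end = '</speak>'
whispered_s = '<amazon:effect name="whispered">'
whispered_e = '</amazon:effect>'

# Single-quoted Python strings keep the double-quoted SSML attributes intact
speech_text = (ssml_start + whispered_s
               + "Here are the latest alerts from MMDA"
               + whispered_e + ssml_end)
print(speech_text)
```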
This question was somewhat vague, but I did manage to figure out how to incorporate SSML into a Python script. Here's a snippet that plays some audio:
if 'Item' in intent['slots']:
    chosen_item = intent['slots']['Item']['value']
    session_attributes = create_attributes(chosen_item)
    # Spaces around chosen_item keep it from running into the neighbouring text
    speech_output = '<speak> Here is something to play ' + \
                    chosen_item + \
                    ' <audio src="https://s3.amazonaws.com/example/example.mp3" /> </speak>'
An SSML package for Python exists: pyssml.
You can install it with pip:
$ pip install pyssml
or
$ pip3 install pyssml
An example is at the link below (sorry, it is in Korean):
http://blog.naver.com/chandong83/221145083125
# -*- coding: utf-8 -*-
# for amazon
import re
import os
import sys
import time
from boto3 import client
from botocore.exceptions import BotoCoreError, ClientError
import vlc
from pyssml.PySSML import PySSML
from pyssml.AmazonSpeech import AmazonSpeech  # needed for AmazonSpeech() used below
# amazon service function
# if isSSML is True, SSML format
# else Text format
def aws_polly(text, isSSML = False):
    voiceid = 'Joanna'
    try:
        polly = client("polly", region_name="ap-northeast-2")
        if isSSML:
            textType = 'ssml'
        else:
            textType = 'text'
        response = polly.synthesize_speech(
            TextType=textType,
            Text=text,
            OutputFormat="mp3",
            VoiceId=voiceid)
        # get Audio Stream (mp3 format)
        stream = response.get("AudioStream")
        # save the audio Stream File
        with open('aws_test_tts.mp3', 'wb') as f:
            data = stream.read()
            f.write(data)
        # VLC play audio
        # non block
        p = vlc.MediaPlayer('./aws_test_tts.mp3')
        p.play()
    except ( BotoCoreError, ClientError) as err:
        print(str(err))
if __name__ == '__main__':
    # normal pyssml
    #s = PySSML()
    # amazon speech ssml
    s = AmazonSpeech()
    # normal
    s.say('i am normal')
    # speed is very slow
    s.prosody({'rate':"x-slow"}, 'i am very slow')
    # volume is very loud
    s.prosody({'volume':'x-loud'}, 'my voice is very loud')
    # take a one sec
    s.pause('1s')
    # pitch is very high
    s.prosody({'pitch':'x-high'}, 'my tone is very high')
    # amazon
    s.whisper('i am whispering')
    # print to convert to ssml format
    print(s.ssml())
    # request aws polly and play
    aws_polly(s.ssml(), True)
    # Wait while playback.
    time.sleep(50)
I am completely stuck. While dabbling in Reddit's API through PRAW, I wanted to learn how to save the number 1 hottest post as an mp4. However, Reddit's gifs are hosted on Imgur, which converts all gifs to gifv. How would I go about converting the gifv to mp4 so I can read them? Simply renaming the file seems to lead to corruption.
This is my code so far (details have been xxxx'd for confidentiality):
import praw
import requests

reddit = praw.Reddit(client_id="xxxx", client_secret="xxxx", username="xxxx", password="xxxx", user_agent="xxxx")
subreddit = reddit.subreddit("dankmemes")
hot_dm = subreddit.hot(limit=1)
for sub in hot_dm:
    print(sub)
    url = sub.url
    print(url)
    print(sub.permalink)
    meme = requests.get(url)
    newF = open("{}.mp4".format(sub), "wb")  # here the file is created but when played is corrupted
    newF.write(meme.content)
    newF.close()
Some posts already have an mp4 conversion inside the preview > variants portion of the json response.
Therefore to download only those posts that have a gif and therefore have an mp4 version you could do something like this:
subreddit = reddit.subreddit("dankmemes")
hot_dm = subreddit.hot(limit=10)
for sub in hot_dm:
    if sub.selftext != "":  # skip text posts; a link to some content (image/video/link) has an empty selftext
        continue
    try:  # try to access variants and catch the exception thrown
        has_variants = sub.preview['images'][0]['variants']  # variants contain both gif and mp4 versions (if available)
    except AttributeError:
        continue  # no conversion available as variants doesn't exist
    if 'mp4' not in has_variants:  # check that there is an mp4 conversion available
        continue
    mp4_video = has_variants['mp4']['source']['url']
    print(sub, sub.url, sub.permalink)
    meme = requests.get(mp4_video)
    with open(f"{sub}.mp4", "wb") as newF:
        newF.write(meme.content)
You will most likely want to increase the limit of posts you look through when searching hot, because the first post may be a pinned post (usually some rules about the subreddit); this is why I check the selftext first. In addition, other posts may be plain images, so with a small limit you might not get any posts that can be converted to mp4s.
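The nested lookup into preview > variants can also be written as a small helper that works on a plain dict and returns None instead of raising. This is only a sketch: mp4_url_from_preview is a made-up name, and the sample dict below is hand-written to mirror the keys used above, not real API output.

```python
def mp4_url_from_preview(preview):
    """Return the mp4 variant URL from a preview dict, or None if there is none."""
    images = (preview or {}).get("images") or []
    if not images:
        return None
    variants = images[0].get("variants", {})
    mp4 = variants.get("mp4")
    if not mp4:
        return None
    return mp4.get("source", {}).get("url")

# Hand-written sample with the same nesting as the keys accessed in the answer
sample_preview = {
    "images": [
        {"variants": {"mp4": {"source": {"url": "https://example.com/video.mp4"}}}}
    ]
}
print(mp4_url_from_preview(sample_preview))
print(mp4_url_from_preview({"images": [{"variants": {}}]}))
```

With PRAW, the dict would come from something like getattr(sub, "preview", None), which also covers posts that have no preview attribute at all.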
I have a small group of Raspberry Pis, all on the same local network (192.168.1.2xx). All are running Python 3.7.3: one (R Pi CM3) on Raspbian Buster, the other (R Pi 4B 8gig) on Raspberry Pi OS 64.
I have a file on one device (the Pi 4B), located at /tmp/speech.wav, that is generated on the fly, real-time:
192.168.1.201 - /tmp/speech.wav
I have a script that works well on that device, that tells me the play duration time of the .wav file in seconds:
import wave
import contextlib
def getPlayTime():
    fname = '/tmp/speech.wav'
    with contextlib.closing(wave.open(fname,'r')) as f:
        frames = f.getnframes()
        rate = f.getframerate()
        duration = round(frames / float(rate), 2)
        return duration
However - the node that needs to operate on that duration information is running on another node at 192.168.1.210. I cannot simply move the various files all to the same node as there is a LOT going on, things are where they are for a reason.
So what I need to know is how to alter my approach such that I can change the script reference to something like this pseudocode:
fname = '/tmp/speech.wav # 192.168.1.201'
Is such a thing possible? Searching the web it seems that I am up against millions of people looking for how to obtain IP addresses, fix multiple IP address issues, fix duplicate ip address issues... but I can't seem yet to find how to simply examine a file on a different ip address as I have described here. I have no network security restrictions, so any setting is up for consideration. Help would be much appreciated.
There are lots of possibilities, and it probably comes down to how often you need to check the duration, from how many clients, and how often the file changes and whether you have other information that you want to share between the nodes.
Here are some options:
set up an SMB (Samba) server on the Pi that has the WAV file and let the other nodes mount the filesystem and access the file as if it was local
set up an NFS server on the Pi that has the WAV file and let the other nodes mount the filesystem and access the file as if it was local
let other nodes use ssh to login and extract the duration, or scp to retrieve the file - see paramiko in Python
set up Redis on one node and throw the WAV file in there so anyone can get it - this is potentially attractive if you have lots of lists, arrays, strings, integers, hashes, queues or sets that you want to share between Raspberry Pis very fast. Example here.
Here is a very simple example of writing a sound track into Redis from one node (say Redis is on 192.168.1.200) and reading it back from any other. Of course, you may just want the writing node to write the duration rather than the whole track, which would be more efficient. Or you may want to store loads of other shared data or settings.
This is the writer:
#!/usr/bin/env python3
import redis
from pathlib import Path
host='192.168.1.200'
# Connect to Redis
r = redis.Redis(host)
# Load some music, or otherwise create it
music = Path('song.wav').read_bytes()
# Put music into Redis where others can see it
r.set("music",music)
And this is the reader:
#!/usr/bin/env python3
import redis
from pathlib import Path
host='192.168.1.200'
# Connect to Redis
r = redis.Redis(host)
# Retrieve music track from Redis
music = r.get("music")
print(f'{len(music)} bytes read from Redis')
Then, during testing, you may want to manually push a track into Redis from the Terminal:
redis-cli -x -h 192.168.1.200 set music < OtherTrack.wav
Or manually retrieve the track from Redis to a file:
redis-cli -h 192.168.1.200 get music > RetrievedFromRedis.wav
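Since the goal in the question is the play duration, the bytes that come back from Redis can be handed to the wave module without writing them to disk first. A minimal sketch, assuming the value is a complete WAV file; the function name wav_duration_seconds is made up, and a silent WAV built in memory stands in for the result of r.get("music"):

```python
import io
import wave

def wav_duration_seconds(wav_bytes):
    """Return the play time in seconds of a WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as f:
        return round(f.getnframes() / float(f.getframerate()), 2)

# Build one second of silence (mono, 16-bit, 8000 Hz) as a stand-in for r.get("music")
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

print(wav_duration_seconds(buf.getvalue()))
```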
OK, this is what I finally settled on, and it works great. Using ZeroMQ for message passing, one function gets the play time of the wav and another gathers data about the speech about to be spoken; all of that is sent to the motor core before the speech is played. The motor core handles the timing needed to sync the jaw to the speech. So the code that generates the wav and returns its playback length does not run on the node that ultimately makes use of it, but message passing turns out to be fast enough that there is plenty of time to receive, process and implement the motion control to match the speech perfectly. Posting this here in case it's helpful for folks working on similar issues in the future.
import time
import zmq
import os
import re
import wave
import contextlib
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555") #Listens for speech to output
print("Connecting to Motor Control")
jawCmd = context.socket(zmq.PUB)
jawCmd.connect("tcp://192.168.1.210:5554") #Sends to MotorFunctions for Jaw Movement
def getPlayTime(): # Checks to see if current file duration has changed
    fname = '/tmp/speech.wav' # and if yes, sends new duration
    with contextlib.closing(wave.open(fname,'r')) as f:
        frames = f.getnframes()
        rate = f.getframerate()
        duration = round(frames / float(rate), 3)
        speakTime = str(duration)
        return speakTime
def set_voice(V,T):
    T2 = '"' + T + '"'
    audioFile = "/tmp/speech.wav" # /tmp set as tmpfs, or RAMDISK to reduce SD Card write ops
    if V == "A":
        voice = "Allison"
    elif V == "B":
        voice = "Belle"
    elif V == "C":
        voice = "Callie"
    elif V == "D":
        voice = "Dallas"
    elif V == "V":
        voice = "David"
    else:
        voice = "Belle"
    os.system("swift -n " + voice + " -o " + audioFile + " " +T2) # Record audio
    tailTrim = .5 # Calculate Jaw Timing
    speakTime = float(getPlayTime()) # Start by getting playlength (float() is safer than eval())
    speakTime = round((speakTime - tailTrim), 2) # Chop .5 s for trailing silence
    wordList = T.split()
    jawString = []
    for index in range(len(wordList)):
        wordLen = len(wordList[index])
        jawString.append(wordLen)
    jawString = str(jawString)
    speakTime = str(speakTime)
    jawString = speakTime + "|" + jawString # 3.456|[4, 2, 7, 4, 2, 9, 3, 4, 3, 6] - will split on "|"
    jawCmd.send_string(jawString) # Send Jaw Operating Sequence
    os.system("aplay " + audioFile) # Play audio
pronunciationDict = {'teh':'the','process':'prawcess','Maeve':'Mayve','Mariposa':'May-reeposah','Lila':'Lala','Trump':'Ass hole'}
def adjustResponse(response): # Adjusts spellings in output string to create better speech output.
    for key, value in pronunciationDict.items():
        if key in response or key.lower() in response:
            response = re.sub(key, value, response, flags=re.I)
    return response
SpeakText="Speech center connected and online."
# Note: V (the voice code) is not defined anywhere in this listing; set it before this call
set_voice(V,SpeakText) # Cepstral Voices: A = Allison; B = Belle; C = Callie; D = Dallas; V = David;
while True:
    SpeakText = socket.recv().decode('utf-8') # .decode gets rid of the b' in front of the string
    SpeakTextX = adjustResponse(SpeakText) # Run the string through the pronunciation dictionary
    print("SpeakText = ",SpeakTextX)
    set_voice(V,SpeakTextX)
    print("Received request: %s" % SpeakTextX)
    socket.send_string(str(SpeakTextX)) # Send data back to source for confirmation
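The motor-core side is not shown in this post. As a rough sketch of how the receiving node might take apart the "duration|[word lengths]" string described in the comment above, under the assumption that the message always has exactly that shape (parse_jaw_message is a made-up name):

```python
import ast

def parse_jaw_message(message):
    """Split 'duration|[lengths]' into a float and a list of ints."""
    duration_text, lengths_text = message.split("|", 1)
    # literal_eval only parses Python literals, so it is safer than eval here
    return float(duration_text), ast.literal_eval(lengths_text)

speak_time, word_lengths = parse_jaw_message("3.456|[4, 2, 7, 4, 2, 9, 3, 4, 3, 6]")
print(speak_time, word_lengths)
```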
I have been interested in coding for a long time. This year I got a job where I was supposed to scrape text from old event programs. The pictures were of terrible quality and the results with normal OCR were horrible. I tried the Google Vision API manually and the results were great, so I used this opportunity to learn about coding. (I did some Python before, but the lack of usefulness always drove me away.)
I wrote this program. I know it's walking on crutches, but it worked and did exactly what I wanted for 3 months. I don't use it regularly, but today when I wanted to use it again it didn't work any more: it just jumps to the end of the program and finishes without making the API requests, or at least so it seems to me.
I am mostly finished with my work, so this request doesn't make sense in terms of efficiency; I am just curious why the program I created suddenly stopped working.
If somebody can point me in the right direction I would highly appreciate it. Also, if somebody wants to use the program and it works for them, sure, let it work instead of yourself :D
I am not sure, but I am using Linux Mint and maybe it stopped working because of an update to Python, the Vision API or something else.
# coding=utf-8
from google.cloud import vision
import os
import io
import sys
reload(sys)
sys.setdefaultencoding('utf8')
directory = "/home/weareone/Documents/programming/test/here"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/weareone/Documents/programming/test/key.json"
def workinghard(page):
    client = vision.ImageAnnotatorClient()
    #file_name = os.path.join( os.path.dirname(__file__), page) # Loads the image into memory
    #page_er = os.path.abspath(os.path.join(os.path.dirname(page))) <--- my improvisation
    with io.open( page, 'rb') as image_file: # after io.open.(file_name <---exchanged with "page")
        content = image_file.read()
    request = {
        "image": {
            "content": content
        },
        "features": [
            {
                "type": "DOCUMENT_TEXT_DETECTION"
            }
        ]
    }
    response = client.annotate_image(request)
    storage = response.full_text_annotation.text
    return storage
def listdirs(folder):
    return [
        d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
        if os.path.isdir(d)
    ]
directories = listdirs(directory)
for year in directories:
    logtxt = open(year + ".txt", "w+" )
    for root, dirs, files in os.walk(year):
        files.sort()
        for file23 in files:
            if file23.endswith('.jpg'):
                pathparent = os.path.join(year,file23)
                logtxt.write(workinghard(pathparent))
                logtxt.write("-------------------------------------------------------------------------")
                print(pathparent)
    logtxt.close()
print("DONE")
Thank you very much dear internet
EDIT: I solved it by changing this line; the condition apparently evaluated to False, presumably because the file names end in upper-case .JPG:
if file23.endswith('.JPG'):
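To avoid depending on the case of the extension at all, the check can lower-case the name first. A small sketch; is_jpeg is a made-up helper name, and also accepting .jpeg is an extra assumption beyond the original code:

```python
def is_jpeg(filename):
    """True for .jpg/.jpeg names regardless of upper or lower case."""
    # str.endswith accepts a tuple of suffixes
    return filename.lower().endswith((".jpg", ".jpeg"))

for name in ["page01.jpg", "page02.JPG", "notes.txt"]:
    print(name, is_jpeg(name))
```

In the loop above, the condition would then read "if is_jpeg(file23):".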
I'm trying to play a video with VLC player from Python 3.8. I'm able to play a movie (mp4), but I would like to add additional audio tracks. I read that the 'add slave' method is the (new) way, but I'm not able to use it properly: I can't add subtitles or an audio track.
To summarize, what I want to achieve with Python is roughly the following:
https://wiki.videolan.org/VLC_HowTo/Play_an_external_audio_track_for_a_video
My current (non-working) snippet:
import vlc
base_path = r"Z:/test/libvlc/"
video_file = base_path + "original.mp4"
audio_file = base_path + "2xlcDLHY7k0-instru+vocal_stereo.wav"
sub_file = base_path + "word.ass"
Instance = vlc.Instance()
player = Instance.media_player_new()
Media = Instance.media_new(video_file)
AdditionalTrack = player.add_slave(player, audio_file, True, i_type="audio")
Sub = player.add_slave(player,sub_file, True)
player.set_media(Media)
while True:
    player.play()
I found the documentation for the 'add_slave' function here:
https://www.olivieraubert.net/vlc/python-ctypes/doc/
but I'm not able to use it properly:
libvlc_media_slaves_add(p_md, i_type, i_priority, psz_uri)
Add a slave to the current media. A slave is an external input source that may contains an additional subtitle track (like a .srt) or an additional audio track (like a .ac3).
Parameters: p_md - media descriptor object. i_type - subtitle or audio. i_priority - from 0 (low priority) to 4 (high priority). psz_uri - Uri of the slave (should contain a valid scheme).
If anyone knows how to add subtitles or an additional audio track, I would be grateful for any advice.
Thanks a lot!
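Use this code instead:
AdditionalTrack = player.add_slave(vlc.MediaSlaveType(1), audio_file, True)
The i_type parameter is not a string; it takes a vlc.MediaSlaveType value (1 is audio). Also, add_slave is a method of player, so player should not be passed again as the first argument.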
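The documentation quoted in the question says the URI of the slave "should contain a valid scheme", so a bare file path may not be enough. A sketch of turning a local path into a file URI with the standard library; to_file_uri is a made-up helper name, and whether a particular VLC build accepts plain paths anyway is not something this snippet checks:

```python
from pathlib import Path

def to_file_uri(path):
    """Convert a local path to an absolute file:// URI."""
    # resolve() makes the path absolute, which as_uri() requires
    return Path(path).resolve().as_uri()

print(to_file_uri("/tmp/my track.wav"))
```

The result could then be passed in place of audio_file in the add_slave call.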
This is my first time asking something here. I've been trying to access the YouTube API to get something for an experiment I'm doing. Everything's working so far. I just wanted to ask about this very inconsistent error that I'm getting.
-----------
1
Title: All Movie Trailers of New York Comic-Con (2016) Power Rangers, John Wick 2...
Uploaded by: KinoCheck International
Uploaded on: 2016-10-12T14:43:42.000Z
Video ID: pWOH-OZQUj0
2
Title: Movieclips Trailers
Uploaded by: Movieclips Trailers
Uploaded on: 2011-04-01T18:43:14.000Z
Video ID: Traceback (most recent call last):
File "scrapeyoutube.py", line 24, in <module>
print "Video ID:\t", search_result['id']['videoId']
KeyError: 'videoId'
I tried getting the video ID ('videoId' as per documentation). But for some reason, the code works for the 1st result, and then totally flops for the 2nd one. It's weird because it's only happening for this particular element. Everything else ('description', 'publishedAt', etc.) is working. Here's my code:
from apiclient.discovery import build
import json
import pprint
import sys
APINAME = 'youtube'
APIVERSION = 'v3'
APIKEY = 'secret teehee'
service = build(APINAME, APIVERSION, developerKey = APIKEY)
#volumes source ('public'), search query ('androide')
searchrequest = service.search().list(q ='movie trailers', part ='id, snippet', maxResults = 25).execute()
searchcount = 0
print "-----------"
for search_result in searchrequest.get("items", []):
    searchcount +=1
    print searchcount
    print "Title:\t", search_result['snippet']['title']
    # print "Description:\t", search_result['snippet']['description']
    print "Uploaded by:\t", search_result['snippet']['channelTitle']
    print "Uploaded on:\t", search_result['snippet']['publishedAt']
    print "Video ID:\t", search_result['id']['videoId']
Hope you guys can help me. Thanks!
Use the 'get' method on the result:
result['id'].get('videoId')
Some elements do not have this key.
If you use square brackets, Python throws a KeyError exception, but if you use the 'get' method, Python returns None for elements that do not have the videoId key.
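A small sketch of that difference on hand-written data. The sample list only mimics the shape of the search results (the channel ID is invented), and video_ids is a made-up helper name:

```python
# Hypothetical sample shaped like search results; not real API output
sample_items = [
    {"id": {"kind": "youtube#video", "videoId": "pWOH-OZQUj0"}},
    {"id": {"kind": "youtube#channel", "channelId": "UC-hypothetical"}},
]

def video_ids(items):
    """Collect videoId values, skipping results that have none."""
    ids = []
    for item in items:
        video_id = item["id"].get("videoId")  # None when the key is missing
        if video_id is not None:
            ids.append(video_id)
    return ids

print(video_ids(sample_items))
```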
The search() method returns channels and playlists together with videos. That might be the cause of your problem.
I use their interactive playgrounds to learn the structure of the returned JSON, functions, etc. For your question, I suggest visiting https://developers.google.com/youtube/v3/docs/search/list .
Check that the kind of an item is "youtube#video" before accessing that item's videoId.
Sample of code:
...
for index in response["items"]: # response is a JSON file I have got from API
    tmp = {} # temporary dict to assert into my custom JSON
    if index["id"]["kind"] == "youtube#video":
        tmp["videoID"] = index["id"]["videoId"]
...
This is a part of code from my personal project I am currently working on.
Because for some results the "id" key returns:
{u'kind': u'youtube#playlist', u'playlistId': u'PLd0_QArxznVHnlvJp0ki5bpmBj4f64J7P'}
As you can see, there is no "videoId" key.
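If the playlist and channel results are also of interest, the kind value can be used to pick the matching key. This is only a sketch: resource_id and ID_KEY_BY_KIND are made-up names, and it covers just the three kinds mentioned in this thread.

```python
# Which key holds the ID for each kind of search result
ID_KEY_BY_KIND = {
    "youtube#video": "videoId",
    "youtube#playlist": "playlistId",
    "youtube#channel": "channelId",
}

def resource_id(result_id):
    """Return the ID stored under the key that matches result_id['kind'], or None."""
    key = ID_KEY_BY_KIND.get(result_id.get("kind"))
    return result_id.get(key) if key else None

playlist = {"kind": "youtube#playlist", "playlistId": "PLd0_QArxznVHnlvJp0ki5bpmBj4f64J7P"}
print(resource_id(playlist))
```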