My key is ready to go to make requests and get speech from text from Google.
I tried these commands and many more.
The docs offer no straightforward way that I've found to get started with Python. I don't know where my API key goes in relation to the JSON and the URL.
One solution in their docs is for cURL, but it involves downloading a txt file after the request that has to be sent back to them in order to get the audio file. Is there a way to do this in Python that doesn't involve sending the txt file back?
I just want my list of strings returned as audio files.
(I put my actual key in the block above. I'm just not going to share it here.)
Configure Python App for JSON file and Install Client Library
Create a Service Account
Create a Service Account Key using the Service Account here
The JSON file downloads; save it securely
Include the Google Application Credentials in your Python App
Install the library: pip install --upgrade google-cloud-texttospeech
Using Google's Python examples, found at:
https://cloud.google.com/text-to-speech/docs/reference/libraries
and
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py
Note: Google's example does not include the name parameter correctly.
Below is the example modified to use Google application credentials and a female WaveNet voice.
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/yourproject-12345.json"

from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")

# Build the voice request: select the language code ("en-US"),
# ****** the NAME ******
# and the SSML voice gender (FEMALE)
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
Voices, Name, Language Code, SSML Gender, etc.
List of Voices: https://cloud.google.com/text-to-speech/docs/voices
In the code example above, I changed the voice from Google's example code to include the name parameter, to use a WaveNet voice (much improved but more expensive, at $16 per million characters), and to set the SSML gender to FEMALE.
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
Found the answer and lost the link among 150 Google documentation pages I had open.
# (Since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Path/to/JSON/file/jsonfile.json"

from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)

# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
My time-consuming pursuit was trying to send the request as JSON from Python, but this appears to go through their own modules, which works fine.
Notice the default voice gender is 'neutral'.
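For the original goal of turning a list of strings into audio files, one possible approach is to loop over the list and write each result to its own file. The sketch below is only an illustration: `synthesize_all` and its `synthesize` argument are hypothetical names, not part of Google's library. `synthesize` stands for any callable that takes a string and returns audio bytes, for example a small wrapper around the `client.synthesize_speech(...)` call shown above that returns `response.audio_content`.

```python
import os


def synthesize_all(texts, synthesize, out_dir="."):
    """Write one MP3 file per input string and return the file paths.

    `synthesize` is a hypothetical callable: text -> audio bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, text in enumerate(texts):
        # Number the files so they keep the order of the input list.
        path = os.path.join(out_dir, "output_{:03d}.mp3".format(index))
        with open(path, "wb") as out:
            out.write(synthesize(text))
        paths.append(path)
    return paths
```

Keeping the API call behind a plain callable means the loop can be tried with a stand-in function before spending any quota on real requests.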
If you would like to avoid using the Google Python client library, you can simply do this:
import requests
import json

url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"
text = "This is a text"
data = {
    "input": {"text": text},
    "voice": {"name": "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
    "audioConfig": {"audioEncoding": "MP3"}
}
headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY"}

r = requests.post(url=url, json=data, headers=headers)
# Parse the JSON response body
content = json.loads(r.content)
It is similar to what you did, but you need to include your API key.
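The JSON returned by the REST endpoint carries the audio as a base64-encoded string in its `audioContent` field, so it still has to be decoded before it can be saved. A minimal sketch, assuming `content` is the parsed response from the snippet above; the function name `save_audio` and the output file name are made up for illustration:

```python
import base64


def save_audio(content, path="output.mp3"):
    """Decode the base64 "audioContent" field and write it to `path`."""
    if "audioContent" not in content:
        # Error responses carry an "error" object instead of audio.
        raise ValueError("No audio in response: {!r}".format(content))
    audio_bytes = base64.b64decode(content["audioContent"])
    with open(path, "wb") as out:
        out.write(audio_bytes)
    # Return the number of bytes written as a quick sanity check.
    return len(audio_bytes)
```

Calling `save_audio(content)` after the request would then leave a playable `output.mp3` in the working directory, provided the request succeeded.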
In my script, I have the following:
response = requests.get(list_url[0], allow_redirects=True)
s = io.BytesIO()
s.write(response.content)
s.seek(0)
mimetype="application/octet-stream"
document = {'file': s.read(), 'mime': mimetype}
request = {"name": name, "document": document}
However, when I send a request to the server:
result = client.process_document(request=request)
I get ValueError: Protocol message Document has no "file" field.
Is this because Google Document AI doesn't accept octet-stream?
I checked the latest version of the Document AI Python client, DocumentProcessorServiceClient, and found that process_document takes a ProcessRequest object as its request argument. You can check the details of that function on the process_document GitHub code page.
ProcessRequest accepts either an inline_document or a raw_document (the two are mutually exclusive). Based on your code, it looks like you are passing a raw_document, which only accepts the fields content and mime_type; use these instead of file and mime.
If you check the sample for the Document AI Python client library, you will find these lines, which show how it should be implemented:
...
document = {"content": image_content, "mime_type": "application/pdf"}
# Configure the process request
request = {"name": name, "raw_document": document}
result = client.process_document(request=request)
...
For additional details, you can check the official GitHub project for Document AI and the official Google page for the Python client library.
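Applied to the code in the question, the downloaded bytes would go into a `raw_document` with `content` and `mime_type` keys. The helper below is a hypothetical sketch (the name `build_process_request` is not part of the library), and it assumes the downloaded file is a PDF, so the `application/octet-stream` value from the question is replaced by that assumed type.

```python
def build_process_request(name, content, mime_type="application/pdf"):
    """Build the request dict for process_document from raw file bytes.

    `name` is the processor resource name; `content` is the file as bytes.
    The default MIME type is an assumption; set it to match the real file.
    """
    if not isinstance(content, (bytes, bytearray)):
        raise TypeError("content must be bytes")
    raw_document = {"content": bytes(content), "mime_type": mime_type}
    return {"name": name, "raw_document": raw_document}
```

In the question's script this would be used as `request = build_process_request(name, response.content)` before calling `client.process_document(request=request)`.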
I used the sample code to receive a call from a number to my Twilio number.
Now I need to save the recording as an MP3. I can't understand how to do it. I tried various parameters but failed. I am new to Twilio.
from flask import Flask
from twilio.twiml.voice_response import VoiceResponse

app = Flask(__name__)


@app.route("/record", methods=['GET', 'POST'])
def record():
    """Returns TwiML which prompts the caller to record a message"""
    # Start our TwiML response
    response = VoiceResponse()

    # Use <Say> to give the caller some instructions
    response.say('Hello. Please leave a message after the beep.')

    # Use <Record> to record the caller's message
    response.record()

    # End the call with <Hangup>
    response.hangup()

    return str(response)


def record(response):
    # function to save file to .wav format
    pass  # not implemented yet


if __name__ == "__main__":
    app.run(debug=True)
I followed the link below but can't understand how to connect it with Flask to save the file.
https://www.twilio.com/docs/voice/api/recording?code-sample=code-filter-recordings-with-range-match&code-language=Python&code-sdk-version=6.x
Twilio developer evangelist here.
When you use <Record> to record a user, you can provide a URL as the recordingStatusCallback attribute. Then, when the recording is ready, Twilio will make a request to that URL with the details about the recording.
So, you can update your record TwiML to something like this:
# Use <Record> to record the caller's message
response.record(
    recording_status_callback="/recording-complete",
    recording_status_callback_event="completed"
)
Then you will need a new route for /recording-complete in which you receive the callback and download the file. There is a good post on how to download files in response to a webhook but it covers MMS messages. However, we can take what we learn from there to download the recording.
First, install and import the requests library. Also import request from Flask:
import requests
from flask import Flask, request
Then, create the /recording-complete endpoint. We'll read the recording URL from the request. You can see all the request parameters in the documentation. Then we'll open a file using the recording SID as the file name, download the recording using requests and write the contents of the recording to the file. We can then respond with an empty <Response/>.
@app.route("/recording-complete", methods=['GET', 'POST'])
def recording_complete():
    response = VoiceResponse()

    # The recording url will return a wav file by default, or an mp3 if you add .mp3
    recording_url = request.values['RecordingUrl'] + '.mp3'

    filename = request.values['RecordingSid'] + '.mp3'
    with open('{}/{}'.format("directory/to/download/to", filename), 'wb') as f:
        f.write(requests.get(recording_url).content)

    # Respond with an empty <Response/>
    return str(response)
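The URL and file-name handling in that endpoint can be pulled into a small function that is easy to check without a live call. This is only a sketch: `recording_target` is a hypothetical helper, and `params` stands in for Flask's `request.values`.

```python
import os


def recording_target(params, directory, extension=".mp3"):
    """Return (download_url, local_path) for a recording status callback.

    `params` is any mapping holding the 'RecordingUrl' and 'RecordingSid'
    values that Twilio sends to the callback.
    """
    download_url = params["RecordingUrl"] + extension
    local_path = os.path.join(directory, params["RecordingSid"] + extension)
    return download_url, local_path
```

Inside the route, `recording_target(request.values, "directory/to/download/to")` would give the URL to fetch and the path to write to.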
Let me know how you get on with that.
I am trying to use the Google Cloud Platform (GCP) Speech-to-Text API in Python, but for some reason I can't get access to GCP to use the API. How do I authenticate my credentials?
I have tried to follow the instructions provided by Google to authenticate my credentials, but I am lost, as nothing seems to work.
I have created a GCP project, set up billing information, enabled the API, and created a service account without any problems.
I have tried to set my environment from the command line with set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
and then run the following code, which is taken straight from the Google tutorial page:
import io


def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US')
    streaming_config = types.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print(u'Transcript: {}'.format(alternative.transcript))
I get the following error message:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
You can also set credentials directly in your script:
from google.cloud import speech
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file("/path/to/your/credentials.json")
client = speech.SpeechClient(credentials=credentials)
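Before creating the client, it can help to confirm that the variable is actually visible to the Python process, since a variable set in one terminal session is not seen by programs started elsewhere (for example, a notebook or IDE launched separately). A small sketch using only the standard library; `credentials_problem` is a hypothetical helper name:

```python
import os


def credentials_problem(environ=None):
    """Return a description of what is wrong, or None if the path looks usable."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set in this process"
    if not os.path.isfile(path):
        return "GOOGLE_APPLICATION_CREDENTIALS points to a missing file: " + path
    return None


# Check the environment of the current process.
print(credentials_problem())
```

This only checks that the file exists; it does not validate the key itself.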
Strictly using the code provided as an example to run the Google Cloud Text-to-Speech API:
def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.types.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)

    response = client.synthesize_speech(input_text, voice, audio_config)

    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
I get the error message google.api_core.exceptions.MethodNotImplemented: 501 Method not found.
It seems to be a Google internal error. I've double-checked my credentials.
Error comes from this particular line: response = client.synthesize_speech(input_text, voice, audio_config)
Please help.
At the moment, for some reason, the Python client library has code for both the v1 and v1beta1 API versions. However, checking the discovery service for Google APIs, we can see that the only available endpoints are for v1beta1.
If you follow these library docs and use from google.cloud import texttospeech_v1, your client will try to use https://texttospeech.googleapis.com/v1/ instead of https://texttospeech.googleapis.com/v1beta1/. The fact that the error is a 5XX rather than a 4XX makes me think that the endpoints themselves may exist but have not yet been implemented, or that you're not allowed to discover them (so they look unimplemented).
Try changing your import to the one in the v1beta1 docs (from google.cloud import texttospeech_v1beta1), and see if this makes any difference.
I know that I can send data (text in this case) to Dialogflow using Python in the following way:
import apiai

ai = apiai.ApiAI(CLIENT_ACCESS_TOKEN)
request = ai.text_request()
request.lang = 'de'  # optional, default value equal 'en'
request.session_id = "<SESSION ID, UNIQUE FOR EACH USER>"
request.query = "Hello"
response = request.getresponse()
print(response.read())
But I'm not sure whether I can send an audio file to Dialogflow. Does anyone know?
There are two ways to use audio files in Google Action/Dialogflow responses: SSML with the <audio> tag, and Media responses. Both expect the audio file to be provided via an HTTPS URL; the file itself is usually stored in a cloud storage service like Google Cloud Storage or Amazon S3.
SSML (Speech Synthesis Markup Language) is a markup language for audio output, just like HTML is for visual output. It is supported by Google Actions and can be used as a drop-in replacement for the normal text response. Instead of including the response text like this:
{
    "speech": "This is the text that the user hears",
    ...
}
you would mark it up with SSML like this:
{
    "speech": "<speak><audio src=\"https://some_cloud_storage.com/my_audio_file.ogg\"></audio></speak>",
    ...
}
Note that the <speak> tags must always surround the entire response so that Google knows that it has to render the text with SSML (just like the <html> tag on websites). The <audio> tag can take several optional attributes; see the documentation for details.
SSML has the benefit of being very easy to use for you as the developer, but the audio files are limited to a length of 120 seconds and a file size of 5 MB, and it gives the user no playback control.
Media responses do not have these limits and are displayed as a card with an image and playback controls, but they currently work only on Google Home and Android devices.
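If the SSML response is built in Python, letting `json.dumps` do the escaping avoids problems with the quotes inside the SSML string. A minimal sketch, assuming the response only needs the `speech` field shown above; `ssml_audio_response` is a hypothetical helper:

```python
import json
from xml.sax.saxutils import quoteattr


def ssml_audio_response(audio_url):
    """Return a JSON string whose "speech" field plays `audio_url` via SSML."""
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    ssml = "<speak><audio src={}></audio></speak>".format(quoteattr(audio_url))
    # json.dumps escapes the quotes inside the SSML for the JSON string.
    return json.dumps({"speech": ssml})
```

A real response would usually carry more fields than `speech`; they can be added to the dictionary before it is serialized.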