I know that I can send data (Text in this case) to DialogFlow by using Python in the following way:
import apiai

ai = apiai.ApiAI(CLIENT_ACCESS_TOKEN)
request = ai.text_request()
request.lang = 'de'  # optional, default value equal 'en'
request.session_id = "<SESSION ID, UNIQUE FOR EACH USER>"
request.query = "Hello"
response = request.getresponse()
print(response.read())
But I'm not sure whether I can send an audio file to DialogFlow. Does anyone know?
There are two ways to use audio files in Google Actions/Dialogflow responses: SSML with the <audio> tag and media responses. Both expect the audio file to be provided via an HTTPS URL; the file itself is usually stored in a cloud storage service such as Google Cloud Storage or Amazon S3.
SSML (Speech Synthesis Markup Language) is a markup language for audio output, just like HTML is for visual output. It is supported by Google Actions and can be used as a drop-in replacement for the normal text response. Instead of including the response text like this:
{
"speech": "This is the text that the user hears",
...
}
you would mark it up with SSML like this:
{
"speech": "<speak><audio src=\"https://some_cloud_storage.com/my_audio_file.ogg\"></audio></speak>",
...
}
Note that the <speak> tags must always surround the entire response so that Google knows that it has to render the text with SSML (just like the <html> tag on websites). The <audio> tag can take several optional attributes; see the documentation for details.
SSML has the benefit of being very easy to use for you as the developer, but the audio files are limited to a length of 120 seconds and a file size of 5MB, and it gives the user no playback control.
Media responses do not have these limits and are displayed as a card with an image and playback controls, but they currently work only on Google Home and Android devices.
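As a rough illustration of the SSML variant, the sketch below builds the "speech" value in Python so that the quotes around the URL are escaped correctly when the response is serialized to JSON. The helper name build_audio_speech and the fallback text are made up for this example; only the "speech" field and the <speak>/<audio> tags come from the answer above. Placing text inside <audio> as a spoken fallback follows the SSML specification, but treat it as an assumption to check against the Google documentation.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def build_audio_speech(audio_url, fallback_text=""):
    """Return a response dict whose "speech" field plays an audio file via SSML.

    build_audio_speech is a hypothetical helper, not part of any Google SDK.
    """
    # quoteattr() escapes the URL and wraps it in quotes for use as an attribute.
    # escape() protects characters such as & and < in the fallback text.
    ssml = "<speak><audio src={}>{}</audio></speak>".format(
        quoteattr(audio_url), escape(fallback_text)
    )
    return {"speech": ssml}


payload = build_audio_speech(
    "https://some_cloud_storage.com/my_audio_file.ogg",
    "Audio could not be played",
)
# json.dumps takes care of escaping the inner double quotes.
print(json.dumps(payload))
```

Letting json.dumps produce the final string avoids hand-escaping the quotes inside the SSML.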
In my script, I have the following:
response = requests.get(list_url[0], allow_redirects=True)
s = io.BytesIO()
s.write(response.content)
s.seek(0)
mimetype="application/octet-stream"
document = {'file': s.read(), 'mime': mimetype}
request = {"name": name, "document": document}
However, when I send a request to the server:
result = client.process_document(request=request)
I get ValueError: Protocol message Document has no "file" field.
Is this because Google Document AI doesn't accept octet-stream?
I checked the latest version of the Document AI Python client, DocumentProcessorServiceClient, and found that process_document takes a ProcessRequest object as its request argument. You can check the details of that function on the process_document GitHub code page.
ProcessRequest accepts either an inline_document or a raw_document (the two are mutually exclusive). Based on your code, it looks like you are passing a raw_document, which only accepts the fields content and mime_type; use those instead of file and mime, and pass the document under the raw_document key.
If you check the sample for the Document AI Python client library, you will find these lines, which show how it should be implemented:
...
document = {"content": image_content, "mime_type": "application/pdf"}
# Configure the process request
request = {"name": name, "raw_document": document}
result = client.process_document(request=request)
...
For additional details, you can check the official GitHub project for Document AI and the official Google page for the Python client library.
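Applied to the code in the question, the fix could look roughly like the sketch below. build_process_request is a hypothetical helper written for this example, and the processor name shown is a placeholder; only the field names name, raw_document, content and mime_type come from the sample above. The real client call is left as a comment because it needs credentials.

```python
def build_process_request(processor_name, content, mime_type="application/pdf"):
    """Build the request dict for process_document from raw file bytes.

    Hypothetical helper: it only assembles the dict shown in the sample above.
    """
    if not isinstance(content, (bytes, bytearray)):
        raise TypeError("content must be bytes, e.g. response.content")
    return {
        "name": processor_name,
        # raw_document takes "content" and "mime_type", not "file" and "mime".
        "raw_document": {"content": bytes(content), "mime_type": mime_type},
    }


# Placeholder processor name and fake bytes, for illustration only.
request = build_process_request(
    "projects/PROJECT/locations/LOCATION/processors/PROCESSOR",
    b"%PDF-1.4 fake bytes",
)
print(sorted(request["raw_document"]))
# With a real client: result = client.process_document(request=request)
```

With the downloaded file from the question, response.content can be passed directly as content; there is no need to copy it through io.BytesIO first.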
I currently use this solution to download attachments from Gmail using Gmail API via python.
However, every time an attachment exceeds 25MB, the attachments automatically get uploaded to Google Drive and the files are linked in the mail. In such cases, there is no attachmentId in the message.
I can only see the file names in 'snippet' section of the message file.
Is there any way I can download the Google Drive attachments from the mail?
There is a similar question posted here, but there's no solution provided to it yet.
How to download a Drive "attachment"
The "attachment" referred to is actually just a link to a Drive file, so confusingly it is not an attachment at all, but just text or HTML.
The issue here is that since it's not an attachment as such, you won't be able to fetch this with the Gmail API by itself. You'll need to use the Drive API.
To use the Drive API you'll need the file ID, which can be found in the HTML content part of the message, among other places.
You can use the re module to perform a findall on the HTML content. I used the following regex pattern to recognize Drive links:
(?<=https:\/\/drive\.google\.com\/file\/d\/).+(?=\/view\?usp=drive_web)
Here is a sample Python function to get the file IDs. It returns a list.
import base64
import re

def get_file_ids(service, user_id, msg_id):
    # Fetch the full message from the Gmail API.
    message = service.users().messages().get(userId=user_id, id=msg_id).execute()
    results = []
    for part in message['payload']['parts']:
        if part["mimeType"] == "text/html":
            # The body is base64url-encoded; decode it to text before searching.
            b64 = part["body"]["data"].encode('UTF-8')
            unencoded_data = base64.urlsafe_b64decode(b64).decode('UTF-8')
            results = re.findall(
                r'(?<=https:\/\/drive\.google\.com\/file\/d\/).+(?=\/view\?usp=drive_web)',
                unencoded_data
            )
    return results
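One thing to watch: .+ is greedy, so if a message links to more than one Drive file, the pattern above can swallow everything between the first and the last link in a single match. The sketch below is a possible variant, not the pattern from the answer; it captures only characters that typically appear in Drive file IDs (letters, digits, underscore, hyphen, which is an assumption) and is run here against a made-up HTML fragment.

```python
import re

# Capture group holds the file ID; the character class is an assumption
# about what Drive file IDs look like.
DRIVE_LINK_PATTERN = re.compile(r"https://drive\.google\.com/file/d/([A-Za-z0-9_-]+)")


def extract_drive_file_ids(html):
    """Return the distinct Drive file IDs found in html, in order of appearance."""
    seen = []
    for file_id in DRIVE_LINK_PATTERN.findall(html):
        if file_id not in seen:
            seen.append(file_id)
    return seen


# Made-up sample with two links, as Gmail might render them.
sample_html = (
    '<a href="https://drive.google.com/file/d/abc123/view?usp=drive_web">a.zip</a>'
    '<a href="https://drive.google.com/file/d/XyZ_456-7/view?usp=drive_web">b.mp4</a>'
)
print(extract_drive_file_ids(sample_html))  # ['abc123', 'XyZ_456-7']
```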
Once you have the IDs then you will need to make a call to the Drive API.
You could follow the example in the docs:
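import io
from googleapiclient.http import MediaIoBaseDownload

file_ids = get_file_ids(service, "me", "[YOUR_MSG_ID]")
# drive_service is assumed to be a separate Drive client,
# e.g. build('drive', 'v3', credentials=creds); the Gmail service has no files().
for file_id in file_ids:
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))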
Remember that, since you will now be using the Drive API as well as the Gmail API, you'll need to change the scopes in your project. Also remember to activate the Drive API in the developers console, update your OAuth consent screen and credentials, and delete the local token.pickle file.
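The scope change might look something like the following sketch. The two read-only scopes are one reasonable choice, not the only one, and forget_cached_token is a hypothetical helper; the token.pickle file name is the one mentioned above.

```python
import os

# Read-only access to Gmail messages and to Drive files (an assumed choice;
# pick whatever scopes your project actually needs).
SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/drive.readonly",
]


def forget_cached_token(path="token.pickle"):
    """Delete the cached token so the next run re-authorizes with the new scopes.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

Calling forget_cached_token() once after editing SCOPES forces the consent flow to run again, so the new token covers both APIs.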
References
Drive API Docs
Managing Downloads Guide
Gmail API Docs
The Drive API also has a limitation of downloading only 10 MB.
I'm using the Google Drive API to create and open an HTML file. The problem is that the document opens showing its technical content (links to CSS and JS files, HTML tags, ...) like this
How can I make it open correctly, in a user-friendly form?
Part of my Google API code:
def file_to_drive(import_file=None):
    service = build('drive', 'v3', credentials=creds)
    file_name = import_file
    media_body = MediaFileUpload(file_name, resumable=True, mimetype='text/html')
    body = {
        'title': file_name,
        'description': 'Uploaded By You'}
    file = service.files().create(body=body, media_body=media_body, fields='id')
The Google Drive API is a file storage API. It allows you to upload and download files; it does not have the ability to open them. You could share a link to the file with someone who has access, and when they click the link it will open for them in the Google Drive web application.
The only API able to open files for editing is the Google Docs API, which gives you limited ability to open Google Docs files. That, however, would require you to convert your HTML file to the Google Docs format. Even if this were an option, you would need to create your own "user-friendly form": Google APIs return data as JSON, not as user-friendly views; that is not what APIs are for.
I'm trying to upload a large video (around 1.5 GB) through the Video Indexer API. My machine, however, uses a lot of RAM to do so, and the deployment system has quite a small amount of RAM. I want to use the API so that the video is uploaded in multiple parts without using up too much memory (around 100MB would suffice).
I've tried to use ffmpeg to split the video in chunks and upload it piece by piece but Video Indexer recognizes them as different videos and gives separate insights for each. It would be better if the video is aggregated online.
How can I do chunked video upload to MS Video Indexer?
Let me guess. Previously, you followed the official tutorial "Tutorial: Use the Video Indexer API" and the Upload Video API reference (the Python sample code at the end of the API reference page) to upload your large video.
It uses a lot of memory because the code below sends the data block {body} from memory, and its value comes from open("<your local file name>").read(), which loads the whole file at once.
conn.request("POST", "/{location}/Accounts/{accountId}/Videos?name={name}&accessToken={accessToken}&%s" % params, "{body}", headers)
However, if you carefully read the videoUrl subsection of the document "Upload and index your videos" and the C# code that follows it, or the explanation of videoUrl in the API reference, you will see that passing the video file as multipart/form body content is not the only way.
videoUrl
A URL of the video/audio file to be indexed. The URL must point at a media file (HTML pages are not supported). The file can be protected by an access token provided as part of the URI and the endpoint serving the file must be secured with TLS 1.2 or higher. The URL needs to be encoded.
If the videoUrl is not specified, the Video Indexer expects you to pass the file as a multipart/form body content.
The screenshot for the C# code with videoUrl
The screenshot for the videoUrl parameter in API reference
You can first upload the large video file to Azure Blob Storage, or to another online service that satisfies the videoUrl requirements, using Python streaming upload code or tools such as azcopy or Azure Storage Explorer. Then, taking Azure Blob Storage as the example, generate a blob URL with a SAS token (Python code below) and pass it as videoUrl in the upload API request.
Python code for generating a blob URL with a SAS token
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import BlockBlobService, BlobPermissions
from datetime import datetime, timedelta
account_name = '<your account name>'
account_key = '<your account key>'
container_name = '<your container name>'
blob_name = '<your blob name>'
service = BaseBlobService(account_name=account_name, account_key=account_key)
token = service.generate_blob_shared_access_signature(container_name, blob_name, BlobPermissions.READ, datetime.utcnow() + timedelta(hours=1),)
blobUrlWithSas = f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}?{token}"
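Since the videoUrl "needs to be encoded", the blob URL should not be pasted into the query string as-is. The sketch below shows one way to assemble the upload URL with the standard library. build_upload_url is a hypothetical helper, the host api.videoindexer.ai is assumed here, and the path and parameter names mirror the request shown earlier; check them against the current API reference before relying on this.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_upload_url(location, account_id, access_token, name, video_url,
                     host="https://api.videoindexer.ai"):
    """Return the Upload Video URL with videoUrl percent-encoded.

    Hypothetical helper; the host default is an assumption.
    """
    # urlencode percent-encodes every value, including the "?" and "&"
    # inside the SAS-signed blob URL.
    query = urlencode({
        "name": name,
        "accessToken": access_token,
        "videoUrl": video_url,
    })
    return f"{host}/{location}/Accounts/{account_id}/Videos?{query}"


# Placeholder values, for illustration only.
blob_url = "https://myaccount.blob.core.windows.net/videos/big.mp4?sv=2019&sig=abc%3D"
upload_url = build_upload_url("trial", "<accountId>", "<accessToken>", "big video", blob_url)
print(upload_url)
```

The resulting URL is then sent as a POST with no file body, so the 1.5 GB video never passes through the memory of the machine making the request.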
Hope it helps.
My key is ready to go to make requests and get speech from text from Google.
I tried these commands and many more.
The docs offer no straightforward way of getting started with Python that I've found. I don't know where my API key goes along with the JSON and URL.
One solution in their docs here is for cURL, but it involves downloading a txt file after the request that has to be sent back to them in order to get the file. Is there a way to do this in Python that doesn't involve the txt file I have to return to them?
I just want my list of strings returned as audio files.
(I put my actual key in the block above. I'm just not going to share it here.)
Configure Python App for JSON file and Install Client Library
Create a Service Account
Create a Service Account Key using the Service Account here
Download the JSON file and save it securely
Include the Google Application Credentials in your Python App
Install the library: pip install --upgrade google-cloud-texttospeech
Using Google's Python examples found:
https://cloud.google.com/text-to-speech/docs/reference/libraries
Note: Google's example does not include the name parameter correctly.
and
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py
Below is the example, modified to use Google application credentials and a female WaveNet voice.
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/yourproject-12345.json"
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")
# Build the voice request, select the language code ("en-US")
# ****** the NAME
# and the ssml voice gender (FEMALE here)
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
Voices, Name, Language Code, SSML Gender, etc.
List of Voices: https://cloud.google.com/text-to-speech/docs/voices
In the above code example I changed the voice from Google's example code to include the name parameter and to use the Wavenet voice (much improved but more expensive $16/million chars) and the SSML Gender to FEMALE.
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    name='en-US-Wavenet-C',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
Found the answer and lost the link among 150 Google documentation pages I had open.
#(Since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Path/to/JSON/file/jsonfile.json"
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code='en-US',
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
My time-consuming pursuit was trying to send the request as JSON with Python, but this approach goes through their own modules, which works fine.
Notice the default voice gender is 'neutral'.
If you would like to avoid using the Google Python client library, you can simply do this:
import requests
import json
url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"
text = "This is a text"
data = {
"input": {"text": text},
"voice": {"name": "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
"audioConfig": {"audioEncoding": "MP3"}
}
headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY" }
r = requests.post(url=url, json=data, headers=headers)
content = json.loads(r.content)
It is similar to what you did but you need to include your API key.
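The JSON that comes back from text:synthesize carries the audio as a base64 string in the audioContent field, so one more step is needed to get a playable file. A minimal sketch of that step follows; save_audio is a hypothetical helper, and a fake response stands in for the real content dict so the example runs without an API key.

```python
import base64
import os
import tempfile


def save_audio(response_json, path):
    """Decode the base64 "audioContent" field and write it to path.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(response_json["audioContent"])
    with open(path, "wb") as out:
        out.write(audio_bytes)
    return len(audio_bytes)


# Fake response for illustration; with the real API, pass the parsed JSON instead.
fake_response = {"audioContent": base64.b64encode(b"not real audio").decode("ascii")}
out_path = os.path.join(tempfile.gettempdir(), "tts_demo_output.mp3")
written = save_audio(fake_response, out_path)
print(written)
```

To turn a list of strings into audio files, the request and this helper can be called once per string, with a different output path for each.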