Google DocumentAI -> ValueError: Protocol message Document has no "file" field

Google DocumentAI -> ValueError: Protocol message Document has no "file" field - python

In my script, I have the following:
response = requests.get(list_url[0], allow_redirects=True)
s = io.BytesIO()
s.write(response.content)
s.seek(0)
mimetype="application/octet-stream"
document = {'file': s.read(), 'mime': mimetype}
request = {"name": name, "document": document}
However, when I send a request to the server:
result = client.process_document(request=request)
I get ValueError: Protocol message Document has no "file" field.
Is this due because google docAI doesn't accept octet-stream?

I checked the latest version code of the document ai python client DocumentProcessorServiceClient and found this function pass on its request field a Process Request object. You can check details of that function on the process_document github code page.
Process Request will accept either inline_document or a raw_document (both are mutual exclusive). Based on your code it looks like you are passing a raw_document which only accepts fields content and mime_type which should be used instead of file and mime.
If you check the sample of using python library client for document ai you will find this lines which explain how it should be implemented:
...
document = {"content": image_content, "mime_type": "application/pdf"}
# Configure the process request
request = {"name": name, "raw_document": document}
result = client.process_document(request=request)
...
For additional details, you can check the official github project for document ai and the official google page for the python client library.

Related

Is there any way to export data from pardot using python?

Currently its a manual process as mentioned below in the link:
https://help.salesforce.com/articleView?id=pardot_export_prospects.htm&type=5
Can we use python to fetch data instead of doing manual work? Please share an example of the code

Use Official Pardot API Documentation and PyPardot - Python API wrapper for Pardot.
See querying objects for code example and querying prospects section of API docs for supported operations.

Review the developer docs shared above and this doc: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_oauth_and_connected_apps.htm , setup the connected app. Then retrieve the token using the keys and business unit
then use the token to grab what you are looking for using the pardot api
response = requests.post(auth_url, data = {
'client_id':client_id,
'client_secret':client_secret,
'grant_type':'password',
'username':sfdc_user,
'password':sfdc_pass
})
print (response)
# Retrieve token
json_res = response.json()
access_token = json_res['access_token']
#print(access_token)
auth = {'Authorization':'Bearer ' + access_token,'Pardot-Business-Unit-Id':sfdc_buid}
response = requests.post(listmember_uri, headers=auth,data ={})

I was trying to do so but found out that as of February 1, 2021, Pardot will no longer accept userkey, password, and username combinations for Pardot API authentication.

How to use Google's Text-to-Speech API in Python

My key is ready to go to make requests and get speech from text from Google.
I tried these commands and many more.
The docs offer no straight forward solutions to getting started with Python that I've found. I don't know where my API key goes along with the JSON and URL
One solution in their docs here is for CURL.. But involves downloading a txt after the request that has to be sent back to them in order to get the file. Is there a way to do this in Python that doesn't involve the txt I have to return them?
I just want my list of strings returned as audio files.
(I put my actual key in the block above. I'm just not going to share it here.)

Configure Python App for JSON file and Install Client Library
Create a Service Account
Create a Service Account Key using the Service Account here
The JSON file downloads and save it securely
Include the Google Application Credentials in your Python App
Install the library: pip install --upgrade google-cloud-texttospeech
Using Google's Python examples found:
https://cloud.google.com/text-to-speech/docs/reference/libraries
Note: In Google's example it is not including the name parameter correctly.
and
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/texttospeech/cloud-client/quickstart.py
Below is the modified from the example using google app credentials and wavenet voice of a female.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/yourproject-12345.json"
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Do no evil!")
# Build the voice request, select the language code ("en-US")
# ****** the NAME
# and the ssml voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
name='en-US-Wavenet-C',
ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
audio_encoding=texttospeech.enums.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
Voices,Name, Language Code, SSML Gender, Etc
List of Voices: https://cloud.google.com/text-to-speech/docs/voices
In the above code example I changed the voice from Google's example code to include the name parameter and to use the Wavenet voice (much improved but more expensive $16/million chars) and the SSML Gender to FEMALE.
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
name='en-US-Wavenet-C',
ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

Found the answer and lost the link among 150 Google documentation pages I had open.
#(Since I'm using a Jupyter Notebook)
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/Path/to/JSON/file/jsonfile.json"
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World!")
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
language_code='en-US',
ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL)
# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
audio_encoding=texttospeech.enums.AudioEncoding.MP3)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(synthesis_input, voice, audio_config)
# The response's audio_content is binary.
with open('output.mp3', 'wb') as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
My time consuming pursuit was to try to send the request through a JSON with Python, but this appears to be through there own modules, which works fine.
Notice the default voice gender is 'neutral'.

If you would like to avoid using the google Python API, you can simply do this:
import requests
import json
url = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"
text = "This is a text"
data = {
"input": {"text": text},
"voice": {"name": "fr-FR-Wavenet-A", "languageCode": "fr-FR"},
"audioConfig": {"audioEncoding": "MP3"}
};
headers = {"content-type": "application/json", "X-Goog-Api-Key": "YOUR_API_KEY" }
r = requests.post(url=url, json=data, headers=headers)
content = json.loads(r.content)
It is similar to what you did but you need to include your API key.

Python Mendeley SDK API returns error in authentication

I have a mendeley account which I am using from their online version. I created a userid and client secret, saved it in config.yml file from and using it to authenticate. I am using the below code available on their website
import yaml
from mendeley import Mendeley
with open('config.yml') as f:
config = yaml.load(f)
REDIRECT_URI = 'http://localhost:5000/oauth'
mendeley = Mendeley(config['clientId'], config['clientSecret'], REDIRECT_URI)
auth = mendeley.start_client_credentials_flow()
session = auth.authenticate()
This code works fine and I got not errors. But when I am trying to access data using the commands in the example it throws error. For example
>> print (session.profiles.me.display_name)
mendeley.exception.MendeleyApiException: The Mendeley API returned an error (status: 400, message: No userid found in auth token)
>> for document in session.documents.iter():
print document.title
mendeley.exception.MendeleyApiException: The Mendeley API returned an error (status: 403, message: Access to this document is not allowed)
I am stuck here and do not know how to access the data or articles I have on mendeley fom its API. Any help will be highly appreciated.

You are using the Client Credentials flow (a term from OAuth). Quoting from the docs:
This flow does not require the user to log in. However, it only
provides access to a limited set of resources (the read-only Mendeley
Catalog of crowd sourced documents).
The two calls from your questions are private (user) information, and will therefor fail.
On the other hand, the following should work:
print session.catalog.by_identifier(doi='10.1371/journal.pmed.0020124', view='stats').reader_count
If you want to access private information, you will need users to log in. This question should help.

Export spreadsheet as text/csv using Drive v3 gives 500 Internal Error

I was trying to export a Google Spreadsheet in csv format using the Google client library for Python:
# OAuth and setups...
req = g['service'].files().export_media(fileId=fileid, mimeType=MIMEType)
fh = io.BytesIO()
downloader = http.MediaIoBaseDownload(fh, req)
# Other file IO handling...
This works for MIMEType: application/pdf, MS Excel, etc.
According to Google's documentation, text/csv is supported. But when I try to make a request, the server gives a 500 Internal Error.
Even using google's Drive API playground, it gives the same error.
Tried:
Like in v2, I added a field:
gid = 0
to the request to specify the worksheet, but then it's a bad request.

This is a known bug in Google's code. https://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=4289
However, if you manually build your own request, you can download the whole file in bytes (the media management stuff won't work).
With file as the file ID, http as the http object that you've authorized against you can download a file with:
from apiclient.http import HttpRequest
def postproc(*args):
return args[1]
data = HttpRequest(http=http,
postproc=postproc,
uri='https://docs.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=csv' % file,
headers={ }).execute()
data here is a bytes object that contains your CSV. You can open it something like:
import io
lines = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8', errors='replace')
for line in lines:
#Do whatever

You just need to implement an Exponential Backoff.
Looking at this documentation of ExponentialBackOffPolicy.
The idea is that the servers are only temporarily unavailable, and they should not be overwhelmed when they are trying to get back up.
The default implementation requires back off for 500 and 503 status codes. Subclasses may override if different status codes are required.
Here is an snippet of an implementation of Exponential Backoff from the first link:
ExponentialBackOff backoff = ExponentialBackOff.builder()
.setInitialIntervalMillis(500)
.setMaxElapsedTimeMillis(900000)
.setMaxIntervalMillis(6000)
.setMultiplier(1.5)
.setRandomizationFactor(0.5)
.build();
request.setUnsuccessfulResponseHandler(new HttpBackOffUnsuccessfulResponseHandler(backoff));
You may want to look at this documentation for the summary of the ExponentialBackoff implementation.

GAE - how to use blobstore stub in testbed?

My code goes like this:
self.testbed.init_blobstore_stub()
upload_url = blobstore.create_upload_url('/image')
upload_url = re.sub('^http://testbed\.example\.com', '', upload_url)
response = self.testapp.post(upload_url, params={
'shopid': id,
'description': 'JLo',
}, upload_files=[('file', imgPath)])
self.assertEqual(response.status_int, 200)
how come it shows 404 error? For some reasons the upload path does not seem to exist at all.

You can't do this. I think the problem is that webtest (which I assume is where self.testapp came from) doesn't work well with testbed blobstore functionality. You can find some info at this question.
My solution was to override unittest.TestCase and add the following methods:
def create_blob(self, contents, mime_type):
"Since uploading blobs doesn't work in testing, create them this way."
fn = files.blobstore.create(mime_type = mime_type,
_blobinfo_uploaded_filename = "foo.blt")
with files.open(fn, 'a') as f:
f.write(contents)
files.finalize(fn)
return files.blobstore.get_blob_key(fn)
def get_blob(self, key):
return self.blobstore_stub.storage.OpenBlob(key).read()
You will also need the solution here.
For my tests where I would normally do a get or post to a blobstore handler, I instead call one of the two methods above. It is a bit hacky but it works.
Another solution I am considering is to use Selenium's HtmlUnit driver. This would require the dev server to be running but should allow full testing of blobstore and also javascript (as a side benefit).

I think Kekito is right, you cannot POST to the upload_url directly.
But if you want to test the BlobstoreUploadHandler, you can fake the POST request it would normally received from the blobstore in the following way. Assuming your handler is at /handler :
import email
...
def test_upload(self):
blob_key = 'abcd'
# The blobstore upload handler receives a multipart form request
# containing uploaded files. But instead of containing the actual
# content, the files contain an 'email' message that has some meta
# information about the file. They also contain a blob-key that is
# the key to get the blob from the blobstore
# see blobstore._get_upload_content
m = email.message.Message()
m.add_header('Content-Type', 'image/png')
m.add_header('Content-Length', '100')
m.add_header('X-AppEngine-Upload-Creation', '2014-03-02 23:04:05.123456')
# This needs to be valie base64 encoded
m.add_header('content-md5', 'd74682ee47c3fffd5dcd749f840fcdd4')
payload = m.as_string()
# The blob-key in the Content-type is important
params = [('file', webtest.forms.Upload('test.png', payload,
'image/png; blob-key='+blob_key))]
self.testapp.post('/handler', params, content_type='blob-key')
I figured that out by digging into the blobstore code. The important bit is that the POST request that the blobstore sends to the UploadHandler doesn't contain the file content. Instead, it contains an "email message" (well, informations encoded like in an email) with metadata about the file (content-type, content-length, upload time and md5). It also contains a blob-key that can be used to retrieve the file from the blobstore.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Google DocumentAI -> ValueError: Protocol message Document has no "file" field - python

Related

Is there any way to export data from pardot using python?

How to use Google's Text-to-Speech API in Python

Python Mendeley SDK API returns error in authentication

Export spreadsheet as text/csv using Drive v3 gives 500 Internal Error

GAE - how to use blobstore stub in testbed?

Categories

Resources