Google Cloud Video API gives an error when using the "input_content" argument - Python

I am trying to use the Google Video API and pass a video that is on my local drive using the "input_content" argument, but I get this error: InvalidArgument: 400 Either `input_uri` or `input_content` should be set.
Here is the code, based on Google's documentation:
"""Detect labels given a file path."""
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.LABEL_DETECTION]
cwd = "E:/Google_Video_API/videos/video.mp4"
with io.open(cwd, "rb") as movie:
input_content = movie.read()
operation = video_client.annotate_video(
request={"features": features, "input_content": input_content}
)

The video file needs to be Base64-encoded, so try this:
import base64
...
operation = video_client.annotate_video(
    request={"features": features, "input_content": base64.b64encode(input_content)}
)
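Putting the pieces together, a minimal end-to-end sketch (same local path as in the question; the timeout value is an arbitrary choice) would look roughly like this:
import base64
import io

from google.cloud import videointelligence

video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.LABEL_DETECTION]

with io.open("E:/Google_Video_API/videos/video.mp4", "rb") as movie:
    input_content = movie.read()

operation = video_client.annotate_video(
    request={
        "features": features,
        # Base64-encode the raw bytes before sending them inline.
        "input_content": base64.b64encode(input_content),
    }
)

# Block until the long-running operation finishes, then inspect the labels.
result = operation.result(timeout=300)
print(result.annotation_results[0].segment_label_annotations)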


Error trying to write CSV file to Google Cloud Storage from Dataflow pipeline

I'm working on building a Dataflow pipeline that reads a CSV file (containing 250,000 rows) from my Cloud Storage bucket, modifies the value of each row, and then writes the modified contents to a new CSV in the same bucket. With the code below I'm able to read and modify the contents of the original file, but when I attempt to write the contents of the new file to GCS I get the following error:
google.api_core.exceptions.TooManyRequests: 429 POST https://storage.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=multipart: {
  "error": {
    "code": 429,
    "message": "The rate of change requests to the object my-bucket/product-codes/URL_test_codes.csv exceeds the rate limit. Please reduce the rate of create, update, and delete requests.",
    "errors": [
      {
        "message": "The rate of change requests to the object my-bucket/product-codes/URL_test_codes.csv exceeds the rate limit. Please reduce the rate of create, update, and delete requests.",
        "domain": "usageLimits",
        "reason": "rateLimitExceeded"
      }
    ]
  }
}
: ('Request failed with status code', 429, 'Expected one of', <HTTPStatus.OK: 200>) [while running 'Store Output File']
My code in Dataflow:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import traceback
import sys
import pandas as pd
from cryptography.fernet import Fernet
import google.auth
from google.cloud import storage

fernet_secret = 'aD4t9MlsHLdHyuFKhoyhy9_eLKDfe8eyVSD3tu8KzoP='
bucket = 'my-bucket'
inputFile = f'gs://{bucket}/product-codes/test_codes.csv'
outputFile = 'product-codes/URL_test_codes.csv'

# Pipeline logic
def product_codes_pipeline(project, env, region='us-central1'):
    options = PipelineOptions(
        streaming=False,
        project=project,
        region=region,
        staging_location="gs://my-bucket-dataflows/Templates/staging",
        temp_location="gs://my-bucket-dataflows/Templates/temp",
        template_location="gs://my-bucket-dataflows/Templates/Generate_Product_Codes.py",
        subnetwork='https://www.googleapis.com/compute/v1/projects/{}/regions/us-central1/subnetworks/{}-private'.format(project, env)
    )

    # Transform function
    def genURLs(code):
        f = Fernet(fernet_secret)
        encoded = code.encode()
        encrypted = f.encrypt(encoded)
        decrypted = f.decrypt(encrypted.decode().encode())
        decoded = decrypted.decode()
        if code != decoded:
            print(f'Error: Code {code} and decoded code {decoded} do not match')
            sys.exit(1)
        url = 'https://some-url.com/redeem/product-code=' + encrypted.decode()
        return url

    class WriteCSVFIle(beam.DoFn):
        def __init__(self, bucket_name):
            self.bucket_name = bucket_name

        def start_bundle(self):
            self.client = storage.Client()

        def process(self, urls):
            df = pd.DataFrame([urls], columns=['URL'])
            bucket = self.client.get_bucket(self.bucket_name)
            bucket.blob(f'{outputFile}').upload_from_string(df.to_csv(index=False), 'text/csv')

    # End function
    p = beam.Pipeline(options=options)
    (p | 'Read Input CSV' >> beam.io.ReadFromText(inputFile, skip_header_lines=1)
       | 'Map Codes' >> beam.Map(genURLs)
       | 'Store Output File' >> beam.ParDo(WriteCSVFIle(bucket)))
    p.run()
The code produces URL_test_codes.csv in my bucket, but the file only contains one row (not including the 'URL' header), which tells me that my code is writing/overwriting the file as it processes each row. Is there a way to bulk-write the contents of the entire file instead of making a series of requests to update it? I'm new to Python/Dataflow, so any help is greatly appreciated.
Let's point out the issues. The evident one is a quota issue on the GCS side, reflected by the 429 error code. But as you noted, this stems from the underlying issue, which is how you are writing your data to the blob.
Since a Beam pipeline generates a parallel collection of elements (a PCollection), each pipeline step is executed once for each element; in other words, your ParDo function tries to write something to your output file once per element in your PCollection.
So there are some issues with your WriteCSVFIle function. To write your PCollection to GCS, it is better to use a separate pipeline step focused on writing the whole PCollection, such as the following:
First, you can import this transform, which is already included in Apache Beam:
from apache_beam.io import WriteToText
Then, you use it at the end of your pipeline:
| 'Write PCollection to Bucket' >> WriteToText('gs://{0}/{1}'.format(bucket_name, outputFile))
With this option, you don't need to create a storage client or reference a blob; the transform just needs the GCS URI where it should write the final result, and you can adjust its behavior with the parameters described in the documentation.
With this, you only need to address the DataFrame created in your WriteCSVFIle function. Each pipeline step creates a new PCollection, so under your current logic a DataFrame-creating step that receives elements from a PCollection of URLs would produce one DataFrame per URL. But since it seems you just want to write the results from genURLs, and 'URL' is the only column in your DataFrame, going directly from genURLs to WriteToText should output what you're looking for; see the sketch below.
Either way, you can adjust your pipeline accordingly, but at least the WriteToText transform takes care of writing your whole final PCollection to your Cloud Storage bucket.
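For illustration, a minimal sketch of the reworked pipeline under those assumptions (genURLs emitting plain strings; the header and num_shards parameters are optional and assume a reasonably recent Beam version):
from apache_beam.io import WriteToText

p = beam.Pipeline(options=options)
(p | 'Read Input CSV' >> beam.io.ReadFromText(inputFile, skip_header_lines=1)
   | 'Map Codes' >> beam.Map(genURLs)
   | 'Write PCollection to Bucket' >> WriteToText(
         'gs://{0}/{1}'.format(bucket, outputFile),
         header='URL',    # write the header line once, instead of per element
         num_shards=1))   # force a single output file rather than shards
p.run()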

Batch predictions with Google AutoML via Python

I'm pretty new to using Stack Overflow as well as the Google Cloud Platform, so apologies if I am not asking this question in the right format. I am currently facing an issue with getting the predictions from my model.
I've trained a multilabel AutoML model on the Google Cloud Platform, and now I want to use that model to score new data entries.
Since the platform only allows one entry at a time, I want to use Python to do batch predictions.
I've stored my data entries in separate .txt files in my Google Cloud bucket and created a .txt file listing the gs:// references to those files (as recommended in the documentation).
I've exported a .json file with my credentials from the service account and specified the IDs and paths in my code:
# import API credentials and specify model / path references
import os

from google.cloud import automl

path = 'xxx.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = path

model_name = 'xxx'
model_id = 'TCN1234567890'
project_id = '1234567890'
model_full_id = f"https://eu-automl.googleapis.com/v1/projects/{project_id}/locations/eu/models/{model_id}"
input_uri = f"gs://bucket_name/{model_name}/file_list.txt"
output_uri = f"gs://bucket_name/{model_name}/outputs/"

prediction_client = automl.PredictionServiceClient()
And then I'm running the following code to get the predictions:
# score batch of file_list
gcs_source = automl.GcsSource(input_uris=[input_uri])
input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)
gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(
    gcs_destination=gcs_destination
)

response = prediction_client.batch_predict(
    name=model_full_id,
    input_config=input_config,
    output_config=output_config
)

print("Waiting for operation to complete...")
print(
    f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}"
)
However, I'm getting the following error: InvalidArgument: 400 Request contains an invalid argument.
Does anyone have a hint as to what is causing this issue?
Any input would be appreciated! Thanks!
Found the issue!
I needed to set the client to the 'eu' environment first:
options = ClientOptions(api_endpoint='eu-automl.googleapis.com')
prediction_client = automl.PredictionServiceClient(client_options=options)
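For completeness, ClientOptions lives in google.api_core; a sketch of the full client setup (the comment reflects my reading of why the 400 occurred):
from google.api_core.client_options import ClientOptions
from google.cloud import automl

# Point the prediction client at the regional endpoint that matches the
# model's location ('eu' here); with the default global endpoint the
# request fails with 400 InvalidArgument.
options = ClientOptions(api_endpoint='eu-automl.googleapis.com')
prediction_client = automl.PredictionServiceClient(client_options=options)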

Python boto3 text-to-speech with 'polly'

I'm using Python 3, and when I try to run my code, I get an error:
raise NoRegionError()
botocore.exceptions.NoRegionError: You must specify a region.
My code:
import boto3

client = boto3.client('polly')

output = client.synthesize_speech(
    Text="Some random text I want to convert",
    OutputFormat="mp3",
    VoiceId='Aditi'
)

print(output['AudioStream'])

file = open('speech.mp3', 'wb')
file.write(output['AudioStream'].read())
file.close()
You need to add the region name somewhere when creating the client.
E.g.:
polly_client = boto3.Session(
    aws_access_key_id='...',          # your access key
    aws_secret_access_key='...',      # your secret key
    region_name='us-west-2').client('polly')
This isn't necessarily what you intended, but you get the point: the region must be included somewhere.
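If you'd rather not touch credentials at all, a couple of alternative sketches that also satisfy the region requirement (boto3 resolves credentials from its usual chain: environment variables, ~/.aws/credentials, and so on):
import boto3

# Option 1: pass the region straight to the client.
client = boto3.client('polly', region_name='us-west-2')

# Option 2: set a default region in the environment before creating
# clients, e.g. `export AWS_DEFAULT_REGION=us-west-2` in your shell.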

AWS Rekognition detect_labels "Invalid image encoding" error

I am using boto3 to call Rekognition's detect_labels method, which takes an image (in the form of base64-encoded bytes) as input. However, I keep getting InvalidImageFormatException, and I don't see why. I have read the documentation and looked at some examples, but I really can't figure out why I am receiving this error.
Below is my code and what I've tried so far:
self.rekog_client = boto3.client('rekognition', 'us-east-1')

with open('abc100.jpg', "rb") as cf:
    base64_image = base64.b64encode(cf.read()).decode("ascii")
    # also tried: base64_image = base64.b64encode(cf.read())
    resp = self.rekog_client.detect_labels(Image={'Bytes': base64_image})
Output/Exception:
botocore.errorfactory.InvalidImageFormatException: An error occurred (InvalidImageFormatException) when calling the DetectLabels operation: Invalid image encoding
Figured it out: the method actually requires the raw binary data (the base64 string decoded back to bytes), which wasn't really spelled out in the docs; the docs just say base64-encoded bytes.
self.rekog_client = boto3.client('rekognition', 'us-east-1')

with open('cat_pic600.jpg', "rb") as cf:
    base64_image = base64.b64encode(cf.read())
    base_64_binary = base64.decodebytes(base64_image)
    resp = self.rekog_client.detect_labels(Image={'Bytes': base_64_binary})
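Since base64.decodebytes undoes base64.b64encode, the round-trip above is equivalent to passing the raw file bytes directly; a shorter sketch of the same call:
import boto3

rekog_client = boto3.client('rekognition', 'us-east-1')

# detect_labels accepts the raw image bytes here; boto3 handles any
# transport encoding itself, so no manual base64 step is needed.
with open('cat_pic600.jpg', 'rb') as cf:
    resp = rekog_client.detect_labels(Image={'Bytes': cf.read()})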

How to get "PDF" file from the binary data of SoftLayer's quote?

I got the binary data via the "getPdf" method of SoftLayer's API.
Ref.
SoftLayer_Billing_Order_Quote::getPdf | SoftLayer Development Network - http://sldn.softlayer.com/reference/services/SoftLayer_Billing_Order_Quote/getPdf
Now I want to create the PDF file from the binary data.
Do you know how to do this?
The method returns binary data encoded in base64; what you need to do is decode it.
See this article about encoding and decoding binary data:
https://code.tutsplus.com/tutorials/base64-encoding-and-decoding-using-python--cms-25588
The Python client returns an xmlrpc.client.Binary object, so you need to work with that object. Here is an example using the Python client and Python 3:
#!/usr/bin/env python
import SoftLayer

USERNAME = 'set me'
API_KEY = 'set me'
quoteId = 1560845

client = SoftLayer.Client(username=USERNAME, api_key=API_KEY)
accountClient = client['SoftLayer_Billing_Order_Quote']

# getPdf returns an xmlrpc.client.Binary object; its .data attribute
# holds the raw PDF bytes.
binaryData = accountClient.getPdf(id=quoteId)
decodeBinary = binaryData.data

file = open('test.pdf', 'wb')
file.write(decodeBinary)
file.close()
This is my own answer to my question.
# imports
import sys

import SoftLayer

parm = sys.argv
quoteId = parm[1]

# account info
client = SoftLayer.create_client_from_env()

# getPdf returns the quote as binary data
getPdf = client['Billing_Order_Quote'].getPdf(id=quoteId)

# save as a PDF
quoteFileName = "Quote_ID_%s.pdf" % quoteId
w = open(quoteFileName, "wb")
w.write(getPdf.data)
w.close()
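A slightly tightened sketch of the same script, using a with block so the file handle is closed even if the write fails (same SoftLayer calls as above; credentials are expected in the environment, e.g. SL_USERNAME / SL_API_KEY, or in the ~/.softlayer config file):
import sys

import SoftLayer

quoteId = sys.argv[1]

client = SoftLayer.create_client_from_env()

# getPdf returns an xmlrpc.client.Binary; .data holds the raw PDF bytes.
pdf = client['Billing_Order_Quote'].getPdf(id=quoteId)

with open("Quote_ID_%s.pdf" % quoteId, "wb") as w:
    w.write(pdf.data)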
