batch predictions google automl via python - python

I'm pretty new using stackoverflow as well as using the google cloud platform, so apologies if am not asking this question in the right format. I am currently facing an issue with getting the predictions from my model.
I've trained a multilabel automl model on the google cloud platform and and now i want to use that model to score out new data entries.
Since the platform only allows one entry at the same time i want to make use of python to do batch predictions.
I've stored my data entries in seperate .txt files on the google cloud bucket and created a .txt file where i'm listing the gs:// references to those files (like they recommend in the documentation).
I've exported a .json file with my credentials from the service account and specified the id's and paths in my code:
# import API credentials and specify model / path references
path = 'xxx.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = path
model_name = 'xxx'
model_id = 'TCN1234567890'
project_id = '1234567890'
model_full_id = f"https://eu-automl.googleapis.com/v1/projects/{project_id}/locations/eu/models/{model_id}"
input_uri = f"gs://bucket_name/{model_name}/file_list.txt"
output_uri = f"gs://bucket_name/{model_name}/outputs/"
prediction_client = automl.PredictionServiceClient()
And then i'm running the following code to get the predictions:
# score batch of file_list
gcs_source = automl.GcsSource(input_uris=[input_uri])
input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)
gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(
gcs_destination=gcs_destination
)
response = prediction_client.batch_predict(
name=model_full_id,
input_config=input_config,
output_config=output_config
)
print("Waiting for operation to complete...")
print(
f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}"
)
However, i'm getting the following error: InvalidArgument: 400 Request contains an invalid argument.
Would anyone have a hince what is causing this issue?
Any input would be appreciated! Thanks!

Found the issue!
I needed to set the client to the 'eu' environment first:
options = ClientOptions(api_endpoint='eu-automl.googleapis.com')
prediction_client = automl.PredictionServiceClient(client_options=options)

Related

Pulling metrics from azure storage with the python sdk

I'm trying to get metrics from azure storage, like transaction_count, ingress, egress, server sucess latency etc.
I am trying with the following code :
from azure.storage.blob import BlobAnalyticsLogging, Metrics, CorsRule, RetentionPolicy
# Create logging settings
logging = BlobAnalyticsLogging(read=True, write=True, delete=True, retention_policy=RetentionPolicy(enabled=True, days=5))
# Create metrics for requests statistics
hour_metrics = Metrics(enabled=True, include_apis=True, retention_policy=RetentionPolicy(enabled=True, days=5))
minute_metrics = Metrics(enabled=True, include_apis=True,retention_policy=RetentionPolicy(enabled=True, days=5))
# Create CORS rules
cors_rule = CorsRule(['www.xyz.com'], ['GET'])
cors = [cors_rule]
# Set the service properties
blob_service_client.set_service_properties(logging, hour_metrics, minute_metrics, cors)
# [END set_blob_service_properties]
# [START get_blob_service_properties]
properties = blob_service_client.get_service_properties()
# [END get_blob_service_properties]
print (properties)
This one does not give an error, but returns the following output:
{'analytics_logging': <azure.storage.blob._models.BlobAnalyticsLogging object at 0x7ffa629d8880>, 'hour_metrics': <azure.storage.blob._models.Metrics object at 0x7ffa629d84c0>, 'minute_metrics': <azure.storage.blob._models.Metrics object at 0x7ffa629d8940>, 'cors': [<azure.storage.blob._models.CorsRule object at 0x7ffa629d8a60>], 'target_version': None, 'delete_retention_policy': <azure.storage.blob._models.RetentionPolicy object at 0x7ffa629d85e0>, 'static_website': <azure.storage.blob._models.StaticWebsite object at 0x7ffa629d8a00>}
I understand that maybe I am missing something, the documentation is quite dense and I don't understand it very well.
Thanks in advance for any possible answers
You would probably want to read the service properties individually, i.E.:
analytics_logging = blob_service_client.get_service_properties().get('analytics_logging')
This way you should be able to process (or read) the output further in your code.

Google Cloud Video API gives error on using "input_content" argument

I am trying to use Google Video API and pass a video which is on my local drive using the "input_content" argument but I get this error: InvalidArgument: 400 Either `input_uri` or `input_content` should be set.
Here is the code based on Google Documentation:
"""Detect labels given a file path."""
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.LABEL_DETECTION]
cwd = "E:/Google_Video_API/videos/video.mp4"
with io.open(cwd, "rb") as movie:
input_content = movie.read()
operation = video_client.annotate_video(
request={"features": features, "input_content": input_content}
)
 Video file need to be Base64 encoded so try this:
import base64
...
operation = video_client.annotate_video(
request={"features": features, "input_content": base64.b64encode(input_content)}
)

How to export GCP's Security Center Assets to a Cloud Storage via cloud Function?

I have a cloud function calling SCC's list_assets and converting the paginated output to a List (to fetch all the results). However, since I have quite a lot of assets in the organization tree, it is taking a lot of time to fetch and cloud function times out (540 seconds max timeout).
asset_iterator = security_client.list_assets(org_name)
asset_fetch_all=list(asset_iterator)
I tried to export via WebUI and it works fine (took about 5 minutes). Is there a way to export the assets from SCC directly to a Cloud Storage bucket using the API?
I develop the same thing, in Python, for exporting to BQ. Searching in BigQuery is easier than in a file. The code is very similar for GCS storage. Here my working code with BQ
import os
from google.cloud import asset_v1
from google.cloud.asset_v1.proto import asset_service_pb2
from google.cloud.asset_v1 import enums
def GCF_ASSET_TO_BQ(request):
client = asset_v1.AssetServiceClient()
parent = 'organizations/{}'.format(os.getenv('ORGANIZATION_ID'))
output_config = asset_service_pb2.OutputConfig()
output_config.bigquery_destination.dataset = 'projects/{}/datasets/{}'.format(os.getenv('PROJECT_ID'),os.getenv('DATASET'))
content_type = enums.ContentType.RESOURCE
output_config.bigquery_destination.table = 'asset_export'
output_config.bigquery_destination.force = True
response = client.export_assets(parent, output_config, content_type=content_type)
# For waiting the finish
# response.result()
# Do stuff after export
return "done", 200
if __name__ == "__main__":
GCF_ASSET_TO_BQ('')
As you can see, there is some values in Env Var (OrganizationID, projectId and Dataset). For exporting to Cloud Storage, you have to change the definition of the output_config like this
output_config = asset_service_pb2.OutputConfig()
output_config.gcs_destination.uri = 'gs://path/to/file'
You have example in other languages here
Try something like this:
We use it to upload finding into a bucket. Make sure to give the SP the function is running the right permissions to the bucket.
def test_list_medium_findings(source_name):
# [START list_findings_at_a_time]
from google.cloud import securitycenter
from google.cloud import storage
# Create a new client.
client = securitycenter.SecurityCenterClient()
#Set query paramaters
organization_id = "11112222333344444"
org_name = "organizations/{org_id}".format(org_id=organization_id)
all_sources = "{org_name}/sources/-".format(org_name=org_name)
#Query Security Command Center
finding_result_iterator = client.list_findings(all_sources,filter_=YourFilter)
#Set output file settings
bucket="YourBucketName"
storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket)
output_file_name = "YourFileName"
my_file = bucket.blob(output_file_name)
with open('/tmp/data.txt', 'w') as file:
for i, finding_result in enumerate(finding_result_iterator):
file.write(
"{}: name: {} resource: {}".format(
i, finding_result.finding.name, finding_result.finding.resource_name
)
)
#Upload to bucket
my_file.upload_from_filename("/tmp/data.txt")

Problem with authorize view in BigQuery: Dataset access entries is empty

I'm trying to authorize a view programmatically in BigQuery and I have the following issue: I just tried the code proposed in the Google docs (https://cloud.google.com/bigquery/docs/dataset-access-controls) but when it comes the part of getting the current access entries for the dataset the result is always empty. I don't want to overwrite the current configuration. Any idea about this behavior?
def authorize_view(dataset_id, view_name):
dataset_ref = client.dataset(dataset_id)
view_ref = dataset_ref.table(view_name)
source_dataset = bigquery.Dataset(client.dataset('mydataset'))
access_entries = source_dataset.access_entries # This returns []
access_entries.append(
bigquery.AccessEntry(None, 'view', view_ref.to_api_repr())
)
source_dataset.access_entries = access_entries
source_dataset = client.update_dataset(
source_dataset, ['access_entries']) # API request

How to upload a file to Google Drive using a Python script?

I need to backup various file types to GDrive (not just those convertible to GDocs formats) from some linux server.
What would be the simplest, most elegant way to do that with a python script? Would any of the solutions pertaining to GDocs be applicable?
You can use the Documents List API to write a script that writes to Drive:
https://developers.google.com/google-apps/documents-list/
Both the Documents List API and the Drive API interact with the same resources (i.e. same documents and files).
This sample in the Python client library shows how to upload an unconverted file to Drive:
http://code.google.com/p/gdata-python-client/source/browse/samples/docs/docs_v3_example.py#180
The current documentation for saving a file to google drive using python can be found here:
https://developers.google.com/drive/v3/web/manage-uploads
However, the way that the google drive api handles document storage and retrieval does not follow the same architecture as POSIX file systems. As a result, if you wish to preserve the hierarchical architecture of the nested files on your linux file system, you will need to write a lot of custom code so that the parent directories are preserved on google drive.
On top of that, google makes it difficult to gain write access to a normal drive account. Your permission scope must include the following link: https://www.googleapis.com/auth/drive and to obtain a token to access a user's normal account, that user must first join a group to provide access to non-reviewed apps. And any oauth token that is created has a limited shelf life.
However, if you obtain an access token, the following script should allow you to save any file on your local machine to the same (relative) path on google drive.
def migrate(file_path, access_token, drive_space='drive'):
'''
a method to save a posix file architecture to google drive
NOTE: to write to a google drive account using a non-approved app,
the oauth2 grantee account must also join this google group
https://groups.google.com/forum/#!forum/risky-access-by-unreviewed-apps
:param file_path: string with path to local file
:param access_token: string with oauth2 access token grant to write to google drive
:param drive_space: string with name of space to write to (drive, appDataFolder, photos)
:return: string with id of file on google drive
'''
# construct drive client
import httplib2
from googleapiclient import discovery
from oauth2client.client import AccessTokenCredentials
google_credentials = AccessTokenCredentials(access_token, 'my-user-agent/1.0')
google_http = httplib2.Http()
google_http = google_credentials.authorize(google_http)
google_drive = discovery.build('drive', 'v3', http=google_http)
drive_client = google_drive.files()
# prepare file body
from googleapiclient.http import MediaFileUpload
media_body = MediaFileUpload(filename=file_path, resumable=True)
# determine file modified time
import os
from datetime import datetime
modified_epoch = os.path.getmtime(file_path)
modified_time = datetime.utcfromtimestamp(modified_epoch).isoformat()
# determine path segments
path_segments = file_path.split(os.sep)
# construct upload kwargs
create_kwargs = {
'body': {
'name': path_segments.pop(),
'modifiedTime': modified_time
},
'media_body': media_body,
'fields': 'id'
}
# walk through parent directories
parent_id = ''
if path_segments:
# construct query and creation arguments
walk_folders = True
folder_kwargs = {
'body': {
'name': '',
'mimeType' : 'application/vnd.google-apps.folder'
},
'fields': 'id'
}
query_kwargs = {
'spaces': drive_space,
'fields': 'files(id, parents)'
}
while path_segments:
folder_name = path_segments.pop(0)
folder_kwargs['body']['name'] = folder_name
# search for folder id in existing hierarchy
if walk_folders:
walk_query = "name = '%s'" % folder_name
if parent_id:
walk_query += "and '%s' in parents" % parent_id
query_kwargs['q'] = walk_query
response = drive_client.list(**query_kwargs).execute()
file_list = response.get('files', [])
else:
file_list = []
if file_list:
parent_id = file_list[0].get('id')
# or create folder
# https://developers.google.com/drive/v3/web/folder
else:
if not parent_id:
if drive_space == 'appDataFolder':
folder_kwargs['body']['parents'] = [ drive_space ]
else:
del folder_kwargs['body']['parents']
else:
folder_kwargs['body']['parents'] = [parent_id]
response = drive_client.create(**folder_kwargs).execute()
parent_id = response.get('id')
walk_folders = False
# add parent id to file creation kwargs
if parent_id:
create_kwargs['body']['parents'] = [parent_id]
elif drive_space == 'appDataFolder':
create_kwargs['body']['parents'] = [drive_space]
# send create request
file = drive_client.create(**create_kwargs).execute()
file_id = file.get('id')
return file_id
PS. I have modified this script from the labpack python module. There is class called driveClient in that module written by rcj1492 which handles saving, loading, searching and deleting files on google drive in a way that preserves the POSIX file system.
from labpack.storage.google.drive import driveClient
I found that PyDrive handles the Drive API elegantly, and it also has great documentation (especially walking the user through the authentication part).
EDIT: Combine that with the material on Automating pydrive verification process and Pydrive google drive automate authentication, and that makes for some great documentation to get things going. Hope it helps those who are confused about where to start.

Categories

Resources