How to deserialize App Engine application logs from StackDriver Logging API? - python

As part of migrating to Python 3, I need to migrate from logservice to the StackDriver Logging API. I have google-cloud-logging installed, and I can successfully fetch GAE application logs with, e.g.:
>>> from google.cloud.logging_v2 import LoggingServiceV2Client
>>> entries = LoggingServiceV2Client().list_log_entries(
...     ('projects/projectname',),
...     filter_='resource.type="gae_app" AND protoPayload.@type="type.googleapis.com/google.appengine.logging.v1.RequestLog"')
>>> print(next(iter(entries)))
proto_payload {
  type_url: "type.googleapis.com/google.appengine.logging.v1.RequestLog"
  value: "\n\ts~brid-gy\022\0018\032R5d..."
}
This gets me a LogEntry with text application logs in the proto_payload.value field. How do I deserialize that field? I've found lots of related mentions in the docs, but nothing pointing me to a google.appengine.logging.v1.RequestLog protobuf generated class anywhere that I can use, if that's even the right idea. Has anyone done this?

Woo! Finally got this working. I had to generate and use the Python bindings for the google.appengine.logging.v1.RequestLog protocol buffer myself, by hand. Here's how.
First, I cloned these two repos at head:
https://github.com/googleapis/googleapis.git
https://github.com/protocolbuffers/protobuf.git
Then, I generated request_log_pb2.py from request_log.proto by running:
protoc -I googleapis/ -I protobuf/src/ --python_out . googleapis/google/appengine/logging/v1/request_log.proto
Finally, I pip installed googleapis-common-protos and protobuf. I was then able to deserialize proto_payload with:
import request_log_pb2
from google.cloud.logging_v2 import LoggingServiceV2Client

client = LoggingServiceV2Client(...)
log = next(iter(client.list_log_entries(
    ('projects/brid-gy',),
    filter_='logName="projects/brid-gy/logs/appengine.googleapis.com%2Frequest_log"')))
pb = request_log_pb2.RequestLog.FromString(log.proto_payload.value)
print(pb)
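As a short usage sketch (assuming the pb object from above), the per-request application log lines live in the repeated line field defined by request_log.proto:
# Each RequestLog carries its application log lines in the repeated "line" field.
for log_line in pb.line:
    print(log_line.time.ToDatetime(), log_line.severity, log_line.log_message)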

You can use the LogEntry.to_api_repr() function to get a JSON version of the LogEntry.
>>> from google.cloud.logging import Client
>>> entries = Client().list_entries(filter_="severity:DEBUG")
>>> entry = next(iter(entries))
>>> entry.to_api_repr()
{'logName': 'projects/PROJECT_NAME/logs/cloudfunctions.googleapis.com%2Fcloud-functions',
 'resource': {'type': 'cloud_function',
              'labels': {'region': 'us-central1',
                         'function_name': 'test',
                         'project_id': 'PROJECT_NAME'}},
 'labels': {'execution_id': '1zqolde6afmx'},
 'insertId': '000000-f629ab40-aeca-4802-a678-d513e605608e',
 'severity': 'DEBUG',
 'timestamp': '2019-10-24T21:55:14.135056Z',
 'trace': 'projects/PROJECT_NAME/traces/9c5201c3061d91c2b624abb950838b40',
 'textPayload': 'Function execution started'}

Do you really want to use the v2 API? If not, use from google.cloud import logging and set os.environ['GOOGLE_CLOUD_DISABLE_GRPC'] = 'true' (or an equivalent environment setting). That will effectively return a JSON payload instead of payload_pb.
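A minimal sketch of that approach, assuming the gae_app filter from the question; with gRPC disabled, the client falls back to the REST transport, so entries arrive with JSON payloads:
import os
# Must be set before the client is created so the REST (JSON) transport is used.
os.environ['GOOGLE_CLOUD_DISABLE_GRPC'] = 'true'

from google.cloud import logging

client = logging.Client()
entries = client.list_entries(
    filter_='resource.type="gae_app" AND protoPayload.@type="type.googleapis.com/google.appengine.logging.v1.RequestLog"')
for entry in entries:
    print(entry.payload)  # a dict parsed from the JSON payload
    break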

Related

ModelUploadOp step failing with custom prediction container

I am currently trying to deploy a Vertex pipeline to achieve the following:
Train a custom model (from a custom training Python package) and dump the model artifacts (the trained model and the data preprocessor that will be used at prediction time). This step is working fine, as I can see new resources being created in the storage bucket.
Create a model resource via ModelUploadOp. This step fails for some reason when specifying serving_container_environment_variables and serving_container_ports, with the error message in the errors section below. This is somewhat surprising, as they are both needed by the prediction container, and the environment variables are passed as a dict as specified in the documentation.
This step works just fine using gcloud commands:
gcloud ai models upload \
  --region us-west1 \
  --display-name session_model_latest \
  --container-image-uri gcr.io/and-reporting/pred:latest \
  --container-env-vars="MODEL_BUCKET=ml_session_model" \
  --container-health-route=/health \
  --container-predict-route=/predict \
  --container-ports=5000
Create an endpoint.
Deploy the model to the endpoint.
There is clearly something I am getting wrong with Vertex; the components documentation doesn't help much in this case.
Pipeline
from datetime import datetime

import kfp
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip
from kfp.v2 import compiler

PIPELINE_ROOT = "gs://ml_model_bucket/pipeline_root"


@kfp.dsl.pipeline(name="session-train-deploy", pipeline_root=PIPELINE_ROOT)
def pipeline():
    training_op = gcc_aip.CustomPythonPackageTrainingJobRunOp(
        project="my-project",
        location="us-west1",
        display_name="train_session_model",
        model_display_name="session_model",
        service_account="name@my-project.iam.gserviceaccount.com",
        environment_variables={"MODEL_BUCKET": "ml_session_model"},
        python_module_name="trainer.train",
        staging_bucket="gs://ml_model_bucket/",
        base_output_dir="gs://ml_model_bucket/",
        args=[
            "--gcs-data-path",
            "gs://ml_model_data/2019-Oct_short.csv",
            "--gcs-model-path",
            "gs://ml_model_bucket/model/model.joblib",
            "--gcs-preproc-path",
            "gs://ml_model_bucket/model/preproc.pkl",
        ],
        container_uri="us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
        python_package_gcs_uri="gs://ml_model_bucket/trainer-0.0.1.tar.gz",
        model_serving_container_image_uri="gcr.io/my-project/pred",
        model_serving_container_predict_route="/predict",
        model_serving_container_health_route="/health",
        model_serving_container_ports=[5000],
        model_serving_container_environment_variables={
            "MODEL_BUCKET": "ml_model_bucket/model"
        },
    )
    model_upload_op = gcc_aip.ModelUploadOp(
        project="and-reporting",
        location="us-west1",
        display_name="session_model",
        serving_container_image_uri="gcr.io/my-project/pred:latest",
        # When passing the following 2 arguments this step fails...
        serving_container_environment_variables={"MODEL_BUCKET": "ml_model_bucket/model"},
        serving_container_ports=[5000],
        serving_container_predict_route="/predict",
        serving_container_health_route="/health",
    )
    model_upload_op.after(training_op)
    endpoint_create_op = gcc_aip.EndpointCreateOp(
        project="my-project",
        location="us-west1",
        display_name="pipeline_endpoint",
    )
    model_deploy_op = gcc_aip.ModelDeployOp(
        model=model_upload_op.outputs["model"],
        endpoint=endpoint_create_op.outputs["endpoint"],
        deployed_model_display_name="session_model",
        traffic_split={"0": 100},
        service_account="name@my-project.iam.gserviceaccount.com",
    )
    model_deploy_op.after(endpoint_create_op)


if __name__ == "__main__":
    ts = datetime.now().strftime("%Y%m%d%H%M%S")
    compiler.Compiler().compile(pipeline, "custom_train_pipeline.json")
    pipeline_job = aiplatform.PipelineJob(
        display_name="session_train_and_deploy",
        template_path="custom_train_pipeline.json",
        job_id=f"session-custom-pipeline-{ts}",
        enable_caching=True,
    )
    pipeline_job.submit()
Errors and notes
When specifying serving_container_environment_variables and serving_container_ports the step fails with the following error:
{'code': 400, 'message': 'Invalid JSON payload received. Unknown name "MODEL_BUCKET" at \'model.container_spec.env[0]\': Cannot find field.\nInvalid value at \'model.container_spec.ports[0]\' (type.googleapis.com/google.cloud.aiplatform.v1.Port), 5000', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'model.container_spec.env[0]', 'description': 'Invalid JSON payload received. Unknown name "MODEL_BUCKET" at \'model.container_spec.env[0]\': Cannot find field.'}, {'field': 'model.container_spec.ports[0]', 'description': "Invalid value at 'model.container_spec.ports[0]' (type.googleapis.com/google.cloud.aiplatform.v1.Port), 5000"}]}]}
When commenting out serving_container_environment_variables and serving_container_ports, the model resource gets created, but deploying it manually to the endpoint results in a failed deployment with no output logs.
After some time researching the problem, I stumbled upon this GitHub issue. The problem was caused by a mismatch between the google_cloud_pipeline_components and Kubernetes API docs. In this case, serving_container_environment_variables is typed as Optional[dict[str, str]] whereas it should have been typed as Optional[list[dict[str, str]]]. A similar mismatch exists for the serving_container_ports argument as well. Passing the arguments following the Kubernetes documentation did the trick:
model_upload_op = gcc_aip.ModelUploadOp(
    project="my-project",
    location="us-west1",
    display_name="session_model",
    serving_container_image_uri="gcr.io/my-project/pred:latest",
    serving_container_environment_variables=[
        {"name": "MODEL_BUCKET", "value": "ml_session_model"}
    ],
    serving_container_ports=[{"containerPort": 5000}],
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
)

Upload binary files using python-gitlab API

I'm tasked with migrating repos to GitLab, and I decided to automate the process using python-gitlab. Everything works fine except for binary or considered-binary files like compiled object files (.o) or .zip files. (I know that repositories are not a place for binaries. I work with what I've got and what I'm told to do.)
I'm able to upload them using:
import base64
import gitlab

project = gitlab.Gitlab("git_address", "TOKEN")
bin_content = base64.b64encode(open("my_file.o", 'rb').read()).decode()
and then:
data = {'branch':'main', 'commit_message':'go away', 'actions':[{'action': 'create', 'file_path': "my_file.o", 'content': bin_content, 'encode' : 'base64'}]}
project.commits.create(data)
The problem is that the content of such files inside the GitLab repository ends up as something like:
f0VMRgIBAQAAAAAAAAAAAAEAPgABAAAAAAAAAAAAA....
Which is not what I want.
If I don't .decode(), I get an error saying:
TypeError: Object of type bytes is not JSON serializable
Which is expected, since I sent the file opened in binary mode and encoded with base64.
I'd like to have such files uploaded/stored like when I upload them using the web GUI's "upload file" option.
Is it possible to achieve this using the python-gitlab API? If so, how?
The problem is that Python's base64.b64encode function will give you a bytes object, but REST APIs (specifically, JSON serialization) want strings. Also, the argument you want is encoding, not encode.
Here's the full example to use:
from base64 import b64encode

import gitlab

GITLAB_HOST = 'https://gitlab.com'
TOKEN = 'YOUR API KEY'
PROJECT_ID = 123  # your project ID

gl = gitlab.Gitlab(GITLAB_HOST, private_token=TOKEN)
project = gl.projects.get(PROJECT_ID)

with open('myfile.o', 'rb') as f:
    bin_content = f.read()

# b64_content must be a string!
b64_content = b64encode(bin_content).decode('utf-8')

f = project.files.create({'file_path': 'my_file.o',
                          'branch': 'main',
                          'content': b64_content,
                          'author_email': 'test@example.com',
                          'author_name': 'yourname',
                          'encoding': 'base64',  # important!
                          'commit_message': 'Create testfile'})
Then in the UI, you will see GitLab has properly recognized the contents as binary, rather than text:
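For completeness, the commits endpoint from the question also accepts base64 content; here is a minimal sketch (assuming the same project object as above), with the action keyed by 'encoding' rather than 'encode':
from base64 import b64encode

with open('my_file.o', 'rb') as f:
    b64_content = b64encode(f.read()).decode('utf-8')

data = {'branch': 'main',
        'commit_message': 'Add binary object file',
        'actions': [{'action': 'create',
                     'file_path': 'my_file.o',
                     'content': b64_content,
                     'encoding': 'base64'}]}  # 'encoding', not 'encode'
project.commits.create(data)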

Openstack Python API novaclient - SecurityGroup Rule list with description

I have been familiar with the Python API for a while, and there is an annoying thing I can't solve.
In short, I want to get all the security rules in my environment.
It works, but it bothers me that I can't get the "description" associated with them at all.
My Python code:
from keystoneauth1 import session
from novaclient import client
import json
from requests import get
...AUTH....
sg_list = nova.security_groups.list()
print(sg_list)
....OUTPUT:
[<SecurityGroup description=192.168.140.0/24, id=123213xxxc2e6156243, name=asdasdasd, rules=[{'from_port': 1, 'group': {}, 'ip_protocol': 'tcp', 'to_port': 65535, 'parent_group_id': '615789e4-d4e214213136156243', 'ip_range': {'cidr': '192.168.140.0/24'},....
Is there a solution for this?
Thanks !
The output is a list of SecurityGroup objects, so you can read the description of the first element directly:
sg_list[0].description
and you will get the description only.
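A minimal sketch, assuming the authenticated nova client from the question; the group-level description is a plain attribute, while each rule is a dict like those shown in the output above (which is why no per-rule description appears there):
for sg in nova.security_groups.list():
    # Group-level description comes straight off the object.
    print(sg.name, sg.description)
    # Each rule is a plain dict, as shown in the output above.
    for rule in sg.rules:
        print('  ', rule.get('ip_protocol'), rule.get('from_port'), rule.get('to_port'))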

Attribute error DESCRIPTOR while trying to convert google vision response to dictionary with python

I am on Windows, using Python 3.8.6rc1, protobuf version 3.13.0 and google-cloud-vision version 2.0.0.
My code is:
from google.protobuf.json_format import MessageToDict
from google.cloud import vision

client = vision.ImageAnnotatorClient()
response = client.annotate_image({
    'image': {'source': {'image_uri': 'https://images.unsplash.com/photo-1508138221679-760a23a2285b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=60'}},
})
MessageToDict(response)
It fails at MessageToDict(response) with AttributeError: "DESCRIPTOR". It seems like the response is not a valid protobuf object. Can someone help me? Thank you.
This does not really answer my question, but I found that one way to solve it and access the protobuf object is to use response._pb, so the code becomes:
response = client.annotate_image({
    'image': {'source': {'image_uri': 'https://images.unsplash.com/photo-1508138221679-760a23a2285b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=60'}},
})
MessageToDict(response._pb)
Look at step 3.
Step 1: Import this lib
from google.protobuf.json_format import MessageToDict
Step 2: Send the request
keyword_ideas = keyword_plan_idea_service.generate_keyword_ideas(
    request=request
)
Step 3: Convert the response to JSON [look here: add "._pb"]
keyword_ideas_json = MessageToDict(keyword_ideas._pb)  # add ._pb at the end of the object
Step 4: Do whatever you want with that JSON
print(keyword_ideas_json)
GitHub issue for this same problem: here
Maybe have a look at this post:
json_string = type(response).to_json(response)
# Alternatively
import proto
json_string = proto.Message.to_json(response)
From the GitHub issue @FriedrichSal posted, you can see that proto does the job, and this is still valid in 2022 (the library name is proto-plus):
All message types are now defined using proto-plus, which uses different methods for serialization and deserialization.
import proto
objects = client.object_localization(image=image)
json_obs = proto.Message.to_json(objects)
dict_obs = proto.Message.to_dict(objects)
The MessageToJson(objects._pb) still works, but maybe someone prefers not to depend on a "hidden" property.
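Putting the pieces together for the original ImageAnnotatorClient call, a minimal sketch of both routes (the hidden ._pb unwrap and the proto-plus helpers), reusing the image URI from the question:
import proto
from google.protobuf.json_format import MessageToDict
from google.cloud import vision

client = vision.ImageAnnotatorClient()
response = client.annotate_image({
    'image': {'source': {'image_uri': 'https://images.unsplash.com/photo-1508138221679-760a23a2285b?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=60'}},
})

# Route 1: unwrap the underlying protobuf message and use json_format.
as_dict = MessageToDict(response._pb)

# Route 2: stay on the proto-plus surface.
as_json = proto.Message.to_json(response)
as_dict_2 = proto.Message.to_dict(response)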

How do you connect to AWS Elastic Transcoder?

I'm trying to transcode some videos, but something is wrong with the way I am connecting.
Here's my code:
from boto.elastictranscoder import layer1

transcode = layer1.ElasticTranscoderConnection()
transcode.DefaultRegionEndpoint = 'elastictranscoder.us-west-2.amazonaws.com'
transcode.DefaultRegionName = 'us-west-2'
transcode.create_job(pipelineId, transInput, transOutput)
Here's the exception:
{u'message': u'The specified pipeline was not found: account=xxxxxx, pipelineId=xxxxxx.'}
To connect to a specific region in boto, you can use:
import boto.elastictranscoder
transcode = boto.elastictranscoder.connect_to_region('us-west-2')
transcode.create_job(...)
I just started using boto the other day, but the previous answer didn't work for me; I don't know if the API changed or what (seems a little weird if it did, but anyway). This is how I did it.
#!/usr/bin/env python

# Boto
import boto

# Debug
boto.set_stream_logger('boto')

# Pipeline Id
pipeline_id = 'lotsofcharacters-393824'

# The input object
input_object = {
    'Key': 'foo.webm',
    'Container': 'webm',
    'AspectRatio': 'auto',
    'FrameRate': 'auto',
    'Resolution': 'auto',
    'Interlaced': 'auto'
}

# The object (or objects) that will be created by the transcoding job;
# note that this is a list of dictionaries.
output_objects = [
    {
        'Key': 'bar.mp4',
        'PresetId': '1351620000001-000010',
        'Rotate': 'auto',
        'ThumbnailPattern': '',
    }
]

# Phone home
# - Har har.
et = boto.connect_elastictranscoder()

# Create the job
# - If successful, this will execute immediately.
et.create_job(pipeline_id, input_name=input_object, outputs=output_objects)
Obviously, this is a contrived example that just runs as a standalone Python script; it assumes you have a .boto file somewhere with your credentials in it.
Another thing to note is the PresetIds; you can find these in the AWS Management Console for Elastic Transcoder, under Presets. Finally, the values that can be put in the dictionaries are lifted verbatim from the following link; as far as I can tell, they are just interpolated into a REST call (case sensitive, obviously).
AWS Create Job API
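If you'd rather not rely on a .boto file, connect_to_region also accepts explicit credentials; a minimal sketch with placeholder key values, reusing the pipeline_id, input_object and output_objects defined above:
import boto.elastictranscoder

# Placeholder credentials -- substitute your own, or keep using a .boto / environment config.
transcode = boto.elastictranscoder.connect_to_region(
    'us-west-2',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY')
transcode.create_job(pipeline_id, input_name=input_object, outputs=output_objects)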
