ModelUploadOp step failing with custom prediction container - python

I am currently trying to deploy a Vertex pipeline to achieve the following:
1. Train a custom model (from a custom training Python package) and dump the model artifacts (the trained model and the data preprocessor that will be used at prediction time). This step is working fine, as I can see new resources being created in the storage bucket.
2. Create a model resource via ModelUploadOp. This step fails when specifying serving_container_environment_variables and serving_container_ports, with the error message in the errors section below. This is somewhat surprising, as both are needed by the prediction container and the environment variables are passed as a dict, as specified in the documentation.
   This step works just fine using gcloud commands:
   gcloud ai models upload \
     --region us-west1 \
     --display-name session_model_latest \
     --container-image-uri gcr.io/and-reporting/pred:latest \
     --container-env-vars="MODEL_BUCKET=ml_session_model" \
     --container-health-route=//health \
     --container-predict-route=//predict \
     --container-ports=5000
3. Create an endpoint.
4. Deploy the model to the endpoint.
There is clearly something that I am getting wrong with Vertex, and the components documentation doesn't help much in this case.
Pipeline
from datetime import datetime
import kfp
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip
from kfp.v2 import compiler
PIPELINE_ROOT = "gs://ml_model_bucket/pipeline_root"
@kfp.dsl.pipeline(name="session-train-deploy", pipeline_root=PIPELINE_ROOT)
def pipeline():
    training_op = gcc_aip.CustomPythonPackageTrainingJobRunOp(
        project="my-project",
        location="us-west1",
        display_name="train_session_model",
        model_display_name="session_model",
        service_account="name@my-project.iam.gserviceaccount.com",
        environment_variables={"MODEL_BUCKET": "ml_session_model"},
        python_module_name="trainer.train",
        staging_bucket="gs://ml_model_bucket/",
        base_output_dir="gs://ml_model_bucket/",
        args=[
            "--gcs-data-path",
            "gs://ml_model_data/2019-Oct_short.csv",
            "--gcs-model-path",
            "gs://ml_model_bucket/model/model.joblib",
            "--gcs-preproc-path",
            "gs://ml_model_bucket/model/preproc.pkl",
        ],
        container_uri="us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest",
        python_package_gcs_uri="gs://ml_model_bucket/trainer-0.0.1.tar.gz",
        model_serving_container_image_uri="gcr.io/my-project/pred",
        model_serving_container_predict_route="/predict",
        model_serving_container_health_route="/health",
        model_serving_container_ports=[5000],
        model_serving_container_environment_variables={
            "MODEL_BUCKET": "ml_model_bucket/model"
        },
    )
    model_upload_op = gcc_aip.ModelUploadOp(
        project="and-reporting",
        location="us-west1",
        display_name="session_model",
        serving_container_image_uri="gcr.io/my-project/pred:latest",
        # When passing the following 2 arguments this step fails...
        serving_container_environment_variables={"MODEL_BUCKET": "ml_model_bucket/model"},
        serving_container_ports=[5000],
        serving_container_predict_route="/predict",
        serving_container_health_route="/health",
    )
    model_upload_op.after(training_op)
    endpoint_create_op = gcc_aip.EndpointCreateOp(
        project="my-project",
        location="us-west1",
        display_name="pipeline_endpoint",
    )
    model_deploy_op = gcc_aip.ModelDeployOp(
        model=model_upload_op.outputs["model"],
        endpoint=endpoint_create_op.outputs["endpoint"],
        deployed_model_display_name="session_model",
        traffic_split={"0": 100},
        service_account="name@my-project.iam.gserviceaccount.com",
    )
    model_deploy_op.after(endpoint_create_op)
if __name__ == "__main__":
    ts = datetime.now().strftime("%Y%m%d%H%M%S")
    compiler.Compiler().compile(pipeline, "custom_train_pipeline.json")
    pipeline_job = aiplatform.PipelineJob(
        display_name="session_train_and_deploy",
        template_path="custom_train_pipeline.json",
        job_id=f"session-custom-pipeline-{ts}",
        enable_caching=True,
    )
    pipeline_job.submit()
Errors and notes
When specifying serving_container_environment_variables and serving_container_ports the step fails with the following error:
{'code': 400, 'message': 'Invalid JSON payload received. Unknown name "MODEL_BUCKET" at \'model.container_spec.env[0]\': Cannot find field.\nInvalid value at \'model.container_spec.ports[0]\' (type.googleapis.com/google.cloud.aiplatform.v1.Port), 5000', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'model.container_spec.env[0]', 'description': 'Invalid JSON payload received. Unknown name "MODEL_BUCKET" at \'model.container_spec.env[0]\': Cannot find field.'}, {'field': 'model.container_spec.ports[0]', 'description': "Invalid value at 'model.container_spec.ports[0]' (type.googleapis.com/google.cloud.aiplatform.v1.Port), 5000"}]}]}
When commenting out serving_container_environment_variables and serving_container_ports, the model resource gets created, but deploying it manually to the endpoint results in a failed deployment with no output logs.

After some time researching the problem, I stumbled upon this GitHub issue. The problem originated from a mismatch between the google_cloud_pipeline_components documentation and the Kubernetes API docs. In this case, serving_container_environment_variables is typed as an Optional[dict[str, str]], whereas it should have been typed as an Optional[list[dict[str, str]]]. A similar mismatch applies to the serving_container_ports argument. Passing the arguments in the format given by the Kubernetes documentation did the trick:
model_upload_op = gcc_aip.ModelUploadOp(
    project="my-project",
    location="us-west1",
    display_name="session_model",
    serving_container_image_uri="gcr.io/my-project/pred:latest",
    serving_container_environment_variables=[
        {"name": "MODEL_BUCKET", "value": "ml_session_model"}
    ],
    serving_container_ports=[{"containerPort": 5000}],
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
)
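If the environment variables and ports already exist as a plain dict and a list of integers, a small conversion step can produce the name/value and containerPort shapes used above. This is only a sketch: to_env_list and to_port_list are hypothetical helper names, not part of google_cloud_pipeline_components, and whether other releases of the component expect the same format is an assumption that should be checked against the installed version.

```python
from typing import Dict, Iterable, List


def to_env_list(env: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {"KEY": "value"} into [{"name": "KEY", "value": "value"}]."""
    return [{"name": name, "value": value} for name, value in env.items()]


def to_port_list(ports: Iterable[int]) -> List[Dict[str, int]]:
    """Turn [5000] into [{"containerPort": 5000}]."""
    return [{"containerPort": int(port)} for port in ports]


# Values taken from the question; the helpers themselves are hypothetical.
env_vars = to_env_list({"MODEL_BUCKET": "ml_session_model"})
ports = to_port_list([5000])
print(env_vars)
print(ports)
```

The results can then be passed as serving_container_environment_variables and serving_container_ports in the ModelUploadOp call.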

Related

Pulling metrics from azure storage with the python sdk

I'm trying to get metrics from Azure Storage, such as transaction_count, ingress, egress, server success latency, etc.
I am trying with the following code:
from azure.storage.blob import BlobAnalyticsLogging, Metrics, CorsRule, RetentionPolicy
# Create logging settings
logging = BlobAnalyticsLogging(read=True, write=True, delete=True, retention_policy=RetentionPolicy(enabled=True, days=5))
# Create metrics for requests statistics
hour_metrics = Metrics(enabled=True, include_apis=True, retention_policy=RetentionPolicy(enabled=True, days=5))
minute_metrics = Metrics(enabled=True, include_apis=True,retention_policy=RetentionPolicy(enabled=True, days=5))
# Create CORS rules
cors_rule = CorsRule(['www.xyz.com'], ['GET'])
cors = [cors_rule]
# Set the service properties
blob_service_client.set_service_properties(logging, hour_metrics, minute_metrics, cors)
# [END set_blob_service_properties]
# [START get_blob_service_properties]
properties = blob_service_client.get_service_properties()
# [END get_blob_service_properties]
print (properties)
This one does not give an error, but returns the following output:
{'analytics_logging': <azure.storage.blob._models.BlobAnalyticsLogging object at 0x7ffa629d8880>, 'hour_metrics': <azure.storage.blob._models.Metrics object at 0x7ffa629d84c0>, 'minute_metrics': <azure.storage.blob._models.Metrics object at 0x7ffa629d8940>, 'cors': [<azure.storage.blob._models.CorsRule object at 0x7ffa629d8a60>], 'target_version': None, 'delete_retention_policy': <azure.storage.blob._models.RetentionPolicy object at 0x7ffa629d85e0>, 'static_website': <azure.storage.blob._models.StaticWebsite object at 0x7ffa629d8a00>}
I understand that maybe I am missing something, the documentation is quite dense and I don't understand it very well.
Thanks in advance for any possible answers
You would probably want to read the service properties individually, i.e.:
analytics_logging = blob_service_client.get_service_properties().get('analytics_logging')
This way you should be able to process (or read) the output further in your code.
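To see what is inside the returned objects rather than their default repr, one option is to walk them recursively and print their attributes. The sketch below is an assumption-laden illustration: to_plain is a hypothetical helper, it relies on the settings objects exposing their fields through `__dict__`, and SimpleNamespace stands in for the real SDK objects so the example runs without an Azure account. As far as I can tell, these properties describe how analytics is configured rather than the metric values themselves.

```python
from types import SimpleNamespace
from typing import Any


def to_plain(obj: Any) -> Any:
    """Recursively turn settings objects into dicts/lists of plain values."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # Skip private attributes; keep the public fields only.
        return {
            key: to_plain(value)
            for key, value in vars(obj).items()
            if not key.startswith("_")
        }
    return obj


# Stand-in for blob_service_client.get_service_properties() (assumed shape).
properties = {
    "hour_metrics": SimpleNamespace(
        enabled=True,
        include_apis=True,
        retention_policy=SimpleNamespace(enabled=True, days=5),
    ),
    "cors": [SimpleNamespace(allowed_origins="www.xyz.com", allowed_methods="GET")],
    "target_version": None,
}
print(to_plain(properties))
```

With the real client, the same helper would be called on the dict returned by get_service_properties().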

Receiving parse error from SageMaker Multi Model Endpoint using TensorFlow

We are currently moving our models from single model endpoints to multi model endpoints within AWS SageMaker. After deploying the Multi Model Endpoint using prebuilt TensorFlow containers I receive the following error when calling the predict() method:
{"error": "JSON Parse error: The document root must not be followed by other value at offset: 17"}
I invoke the endpoint like this:
data = np.random.rand(n_samples, n_features)
predictor = Predictor(endpoint_name=endpoint_name)
prediction = predictor.predict(data=serializer.serialize(data), target_model=model_name)
My function for processing the input is the following:
def _process_input(data, context):
    data = data.read().decode('utf-8')
    data = [float(x) for x in data.split(',')]
    return json.dumps({'instances': [data]})
For the training I configured my container as follows:
tensorflow_container = TensorFlow(
    entry_point=path_script,
    framework_version='2.4',
    py_version='py37',
    instance_type='ml.m4.2xlarge',
    instance_count=1,
    role=EXECUTION_ROLE,
    sagemaker_session=sagemaker_session,
    hyperparameters=hyperparameters)
tensorflow_container.fit()
To deploy the endpoint, I first initialize a Model from a given Estimator and then a MultiDataModel:
model = estimator.create_model(
    role=EXECUTION_ROLE,
    image_uri=estimator.training_image_uri(),
    entry_point=path_serving)
mdm = MultiDataModel(
    name=endpoint_name,
    model_data_prefix=dir_model_data,
    model=model,
    sagemaker_session=sagemaker.Session())
mdm.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name)
Afterwards the single models are added using:
mdm.add_model(
    model_data_source=source_path,
    model_data_path=model_name)
Thank you for any hints and help.
This issue usually occurs when the JSON data is damaged or malformed. I recommend running it through a JSON validator such as https://jsonlint.com/
I work at AWS and my opinions are my own - Thanks,Raghu
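The same check can be done locally before calling the endpoint. The sketch below uses only the standard json module; json_error is a hypothetical helper name. It reports whether a payload is exactly one JSON document. A comma-separated string of numbers, for example, parses as one value followed by extra data, which is the kind of input that could produce a "document root must not be followed by other value" message. Whether that is what happens in the question's setup is an assumption.

```python
import json
from typing import Optional


def json_error(payload: str) -> Optional[str]:
    """Return None if payload is exactly one JSON document, else the parser's message."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as exc:
        return f"{exc.msg} at offset {exc.pos}"
    return None


# A single JSON document passes the check.
print(json_error('{"instances": [[0.1, 0.2]]}'))
# A CSV-style string is a number followed by more data, so it is rejected.
print(json_error("0.1,0.2,0.3"))
```

Running the serialized request body through a check like this shows whether the endpoint is receiving JSON or some other format.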

How to propagate mlpipeline-metrics from custom Python function TFX component?

Note: this is a copy of a GitHub issue I reported.
It is re-posted in hope to get more attention, I will update any solutions on either site.
Question
I want to export mlpipeline-metrics from my custom Python function TFX component so that it is displayed in the KubeFlow UI.
This is a minimal example of what I am trying to do:
import json
from tfx.dsl.component.experimental.annotations import OutputArtifact
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Artifact
class Metric(Artifact):
    TYPE_NAME = 'Metric'

@component
def ShowMetric(MLPipeline_Metrics: OutputArtifact[Metric]):
    rmse_eval = 333.33
    metrics = {
        'metrics': [
            {
                'name': 'RMSE-validation',
                'numberValue': rmse_eval,
                'format': 'RAW'
            }
        ]
    }
    path = '/tmp/mlpipeline-metrics.json'
    with open(path, 'w') as _file:
        json.dump(metrics, _file)
    MLPipeline_Metrics.uri = path
In the KubeFlow UI, the "Run output" tab says "No metrics found for this run." However, the output artefact shows up in the ML MetaData (see screenshot). Any help on how to accomplish this would be greatly appreciated. Thanks!

How to deserialize App Engine application logs from StackDriver Logging API?

As part of migrating to Python 3, I need to migrate from logservice to the StackDriver Logging API. I have google-cloud-logging installed, and I can successfully fetch GAE application logs with eg:
>>> from google.cloud.logging_v2 import LoggingServiceV2Client
>>> entries = LoggingServiceV2Client().list_log_entries(('projects/projectname',),
...     filter_='resource.type="gae_app" AND protoPayload.@type="type.googleapis.com/google.appengine.logging.v1.RequestLog"')
>>> print(next(iter(entries)))
proto_payload {
  type_url: "type.googleapis.com/google.appengine.logging.v1.RequestLog"
  value: "\n\ts~brid-gy\022\0018\032R5d..."
}
This gets me a LogEntry with text application logs in the proto_payload.value field. How do I deserialize that field? I've found lots of related mentions in the docs, but nothing pointing me to a google.appengine.logging.v1.RequestLog protobuf generated class anywhere that I can use, if that's even the right idea. Has anyone done this?
Woo! Finally got this working. I had to generate and use the Python bindings for the google.appengine.logging.v1.RequestLog protocol buffer myself, by hand. Here's how.
First, I cloned these two repos at head:
https://github.com/googleapis/googleapis.git
https://github.com/protocolbuffers/protobuf.git
Then, I generated request_log_pb2.py from request_log.proto by running:
protoc -I googleapis/ -I protobuf/src/ --python_out . googleapis/google/appengine/logging/v1/request_log.proto
Finally, I pip installed googleapis-common-protos and protobuf. I was then able to deserialize proto_payload with:
from google.cloud.logging_v2 import LoggingServiceV2Client
client = LoggingServiceV2Client(...)
log = next(iter(client.list_log_entries(
    ('projects/brid-gy',),
    filter_='logName="projects/brid-gy/logs/appengine.googleapis.com%2Frequest_log"')))
import request_log_pb2
pb = request_log_pb2.RequestLog.FromString(log.proto_payload.value)
print(pb)
You can use the LogEntry.to_api_repr() function to get a JSON version of the LogEntry.
>>> from google.cloud.logging import Client
>>> entries = Client().list_entries(filter_="severity:DEBUG")
>>> entry = next(iter(entries))
>>> entry.to_api_repr()
{'logName': 'projects/PROJECT_NAME/logs/cloudfunctions.googleapis.com%2Fcloud-functions',
 'resource': {'type': 'cloud_function', 'labels': {'region': 'us-central1', 'function_name': 'test', 'project_id': 'PROJECT_NAME'}},
 'labels': {'execution_id': '1zqolde6afmx'},
 'insertId': '000000-f629ab40-aeca-4802-a678-d513e605608e',
 'severity': 'DEBUG',
 'timestamp': '2019-10-24T21:55:14.135056Z',
 'trace': 'projects/PROJECT_NAME/traces/9c5201c3061d91c2b624abb950838b40',
 'textPayload': 'Function execution started'}
Do you really need to use the v2 API?
If not, use from google.cloud import logging and
set os.environ['GOOGLE_CLOUD_DISABLE_GRPC'] = 'true' (or a similar environment setting).
That will effectively return JSON in payload instead of payload_pb.

How do you connect to AWS Elastic Transcoder?

I'm trying to transcode some videos, but something is wrong with the way I am connecting.
Here's my code:
transcode = layer1.ElasticTranscoderConnection()
transcode.DefaultRegionEndpoint = 'elastictranscoder.us-west-2.amazonaws.com'
transcode.DefaultRegionName = 'us-west-2'
transcode.create_job(pipelineId, transInput, transOutput)
Here's the exception:
{u'message': u'The specified pipeline was not found: account=xxxxxx, pipelineId=xxxxxx.'}
To connect to a specific region in boto, you can use:
import boto.elastictranscoder
transcode = boto.elastictranscoder.connect_to_region('us-west-2')
transcode.create_job(...)
I just started using boto the other day, but the previous answer didn't work for me - don't know if the API changed or what (seems a little weird if it did, but anyway). This is how I did it.
#!/usr/bin/env python
# Boto
import boto
# Debug
boto.set_stream_logger('boto')
# Pipeline Id
pipeline_id = 'lotsofcharacters-393824'
# The input object
input_object = {
    'Key': 'foo.webm',
    'Container': 'webm',
    'AspectRatio': 'auto',
    'FrameRate': 'auto',
    'Resolution': 'auto',
    'Interlaced': 'auto'
}
# The object (or objects) that will be created by the transcoding job;
# note that this is a list of dictionaries.
output_objects = [
    {
        'Key': 'bar.mp4',
        'PresetId': '1351620000001-000010',
        'Rotate': 'auto',
        'ThumbnailPattern': '',
    }
]
# Phone home
# - Har har.
et = boto.connect_elastictranscoder()
# Create the job
# - If successful, this will execute immediately.
et.create_job(pipeline_id, input_name=input_object, outputs=output_objects)
Obviously, this is a contrived example and just runs from a standalone python script; it assumes you have a .boto file somewhere with your credentials in it.
Another thing to note is the PresetId's; you can find these in the AWS Management Console for Elastic Transcoder, under Presets. Finally, the values that can be stuffed in the dictionaries are lifted verbatim from the following link - as far as I can tell, they are just interpolated into a REST call (case sensitive, obviously).
AWS Create Job API
