When executing the deploy code to SageMaker using the sagemaker-python-sdk, I get the following error:
UnexpectedStatusException: Error hosting endpoint tensorflow-inference-eia-XXXX-XX-XX-XX-XX-XX-XXX:
Failed. Reason: The image '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-eia:1.14
-gpu' does not exist..
The code that I am using to deploy is:
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.p2.xlarge',
                         accelerator_type='ml.eia1.medium')
If I remove the accelerator_type parameter, the endpoint gets deployed with no errors. Any idea why this happens? SageMaker seems to be referring to an image that doesn't exist. How do I fix this?
Also, I made sure that the version is supported here: https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators. I am on TensorFlow 1.14.
Edit:
Turns out, this works:
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         accelerator_type='ml.eia1.medium')
So I am guessing that Elastic Inference is not available for GPU instances?
Note: none of the instances that I deploy my endpoint to uses a GPU. (Please suggest some ideas if you are familiar with this or have made it work.)
Elastic Inference Accelerators (EIA) are designed to be attached to CPU endpoints.
Related
I am looking at this example to implement data processing of incoming raw data for a SageMaker endpoint prior to model inference/scoring. This is all great, but I have two questions:
How can one debug this (e.g. can I invoke the endpoint without it being exposed as a RESTful API and then use SageMaker Debugger)?
SageMaker can be used "remotely", e.g. via VSC. Can such a script be uploaded programmatically?
Thanks.
SageMaker Debugger is only for monitoring training jobs:
https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html
I don't think you can use it on endpoints.
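You can, however, invoke a deployed endpoint directly, without exposing it through API Gateway or any other RESTful layer, and debug from the responses together with the endpoint's CloudWatch logs. A minimal sketch with boto3; the endpoint name and CSV payload are placeholders:

import boto3

# Call the endpoint through the SageMaker runtime, no API Gateway involved.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",   # placeholder endpoint name
    ContentType="text/csv",
    Body="1.0,2.0,3.0\n",
)
print(response["Body"].read().decode("utf-8"))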
The script that you have provided is used for both training and inference. The container used by the estimator will take care of which functions to run, so it is not possible to debug the script directly. But what are you debugging in the code: the training part or the inference part?
When creating the estimator, we need to give either the entry_point or the source_dir. If you are using entry_point, the value should be a relative path to the file; if you are using source_dir, you can give an S3 path. So before running the estimator, you can programmatically tar the files, upload the archive to S3, and then use the S3 path in the estimator, as sketched below.
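A minimal sketch of that flow, assuming a TensorFlow estimator; the bucket name, code directory, role ARN, and framework version are placeholders:

import tarfile
import boto3
from sagemaker.tensorflow import TensorFlow

# Package the training/inference scripts into a sourcedir.tar.gz archive.
with tarfile.open("sourcedir.tar.gz", "w:gz") as tar:
    tar.add("my_code", arcname=".")  # directory containing train.py, inference.py, etc.

# Upload the archive to S3.
boto3.client("s3").upload_file("sourcedir.tar.gz", "my-bucket", "code/sourcedir.tar.gz")

# Point the estimator at the S3 path via source_dir.
estimator = TensorFlow(
    entry_point="train.py",  # relative to the unpacked source_dir
    source_dir="s3://my-bucket/code/sourcedir.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.14",
    py_version="py3",
)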
My motive is to build an MLOps pipeline that is 100% independent of cloud services like AWS, GCP, and Azure. I have a project for a client in a production factory and would like to build a camera-based object tracking ML service for them. I want to build this pipeline on my own server (an on-premise computer). I am really confused about which stack I should use; I keep ending up with solutions based on cloud components. It would be great to get some advice on which components I can use, preferably open source.
Assuming your main objective is to build a 100% no-cloud MLOps pipeline, you can do that with mostly open source tech. All of the following can be installed on-prem, without cloud services.
For Training: you can use whatever you want. I'd recommend PyTorch because it plays nicer with some of the following suggestions, but TensorFlow is also a popular choice.
For CI/CD: if this is going to be on-prem and you are going to retrain the model with production data / need to trigger updates to your deployment with each code update, you can use Jenkins (open source) or CircleCI (commercial).
For Model Packaging: Chassis (open source) is the only project I am aware of for generically turning AI/ML model files into something useful that can be run on your intended hardware. It basically takes an AI/ML model file as input and creates a Docker image as its output. It's open source and supports Intel, ARM, CPU, and GPU. The website is here: http://www.chassis.ml and the git repo is here: https://github.com/modzy/chassis
For Deployment: Chassis model containers are automatically built with internal gRPC servers that can be deployed locally as Docker containers. If you just want to stream a single source of data through them, the SDK has methods for doing that. If you want something that accepts multiple streams or auto-scales to available resources on your infrastructure, you'll need a Kubernetes cluster with a deployment solution like Modzy or KServe. Chassis containers work out of the box with either.
KServe (https://github.com/kserve/kserve) is free, but basically just gives you a centralized processing platform hosting a bunch of copies of your running model. It doesn't allow later triage of the model's processing history.
Modzy (https://www.modzy.com/) is commercial, but also adds in all the RBAC, job history preservation, auditing, etc. Modzy also has an edge deployment feature if you want to manage your models centrally but run them in a distributed manner on the camera hardware instead of on a centralized server.
As per your requirement for an on-prem solution, you may go ahead with Kubeflow (a minimal pipeline sketch follows this list). Also use the following:
default storage class: nfs-provisioner
on-prem load balancing: MetalLB
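For reference, a minimal sketch of defining and compiling a Kubeflow pipeline with the kfp SDK; the pipeline name, base image, and the placeholder training step are hypothetical:

from kfp import dsl, compiler

# Hypothetical placeholder component; a real step would run your training / packaging code.
@dsl.component(base_image="python:3.10")
def train_model() -> str:
    return "model-v1"

@dsl.pipeline(name="on-prem-object-tracking")
def tracking_pipeline():
    train_model()

# Compile to a YAML spec that can be uploaded to an on-prem Kubeflow Pipelines install.
compiler.Compiler().compile(tracking_pipeline, "pipeline.yaml")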
I have built an anomaly detection model using the AWS SageMaker built-in algorithm Random Cut Forest:
rcf = RandomCutForest(
    role=execution_role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=1000,
    num_trees=100,
    encrypt_inter_container_traffic=True,
    enable_network_isolation=True,
    enable_sagemaker_metrics=True)
and created the endpoint:
rcf_inference = rcf.deploy(
    initial_instance_count=4,
    instance_type="ml.m5.xlarge",
    endpoint_name='RCF-container2',
    enable_network_isolation=True)
But when I try to get a prediction using the endpoint, I run into the following error:
results = rcf_inference.predict(df.values)
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Amazon SageMaker could not get a response from the RCF-container2 endpoint. This can occur when CPU or memory utilization is high. To check your utilization, see Amazon CloudWatch. To fix this problem, use an instance type with more CPU capacity or memory."
I have tried with a larger CPU instance but I am still getting the same issue. I suspect the issue is functional rather than capacity-related.
Please help.
I would suggest checking the CloudWatch Logs for the endpoint to see if there is any other error that could point to the issue; a sketch of pulling those logs follows.
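A minimal sketch of reading the endpoint's most recent log events with boto3; it assumes the standard /aws/sagemaker/Endpoints/<endpoint-name> log group naming:

import boto3

logs = boto3.client("logs")
log_group = "/aws/sagemaker/Endpoints/RCF-container2"

# Find the most recently active log stream for the endpoint.
streams = logs.describe_log_streams(
    logGroupName=log_group, orderBy="LastEventTime", descending=True
)
for stream in streams["logStreams"][:1]:
    events = logs.get_log_events(
        logGroupName=log_group, logStreamName=stream["logStreamName"]
    )
    for event in events["events"]:
        print(event["message"])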
I work for AWS and my opinions are my own.
Task: object_detection
Environment: AWS SageMaker
Instance type: 'ml.p2.xlarge' | num_instances = 1
Main file to be run: original
Problematic code segment from the main file:
if FLAGS.use_tpu:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        FLAGS.tpu_name)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
elif FLAGS.num_workers > 1:
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
else:
    strategy = tf.compat.v2.distribute.MirroredStrategy()
Problem: I can't find the proper value to give as the tpu_name argument.
My research on the problem:
According to the TensorFlow documentation for tf.distribute.cluster_resolver.TPUClusterResolver, this resolver works only on the Google Cloud Platform:
This is an implementation of cluster resolvers for the Google Cloud TPU service.
TPUClusterResolver supports the following distinct environments:
Google Compute Engine
Google Kubernetes Engine
Google internal
It can be passed into tf.distribute.TPUStrategy to support TF2 training on Cloud TPUs.
But from this issue on GitHub, I found out that similar code also works on Azure.
My question:
Is there a way I can bypass this resolver and initialize my TPU in SageMaker?
Even better, can I find a way to pass the name or URL of the SageMaker GPU to the resolver and initialize it from there?
Let me clarify some confusion here. TPUs are only offered on Google Cloud, and the TPUClusterResolver implementation queries GCP APIs to get the cluster config for the TPU node. So no, you can't use TPUClusterResolver with AWS SageMaker; you should either try it out with TPUs on GCP instead, or look for documentation on SageMaker's end on how they enable cluster resolving (if they do).
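If the goal is simply to get a distribution strategy working on a SageMaker GPU instance, you would rely on the non-TPU branches instead. A minimal sketch, not specific to the object detection script, assuming TensorFlow 2.x on a single-GPU instance such as ml.p2.xlarge:

import tensorflow as tf

# Fall back to a GPU/CPU strategy instead of the TPU branch.
if tf.config.list_physical_devices("GPU"):
    strategy = tf.distribute.MirroredStrategy()  # uses all local GPUs on the instance
else:
    strategy = tf.distribute.get_strategy()  # default (CPU) strategy

# Build and compile the model under the strategy scope as usual.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")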
I am a bit confused about automating SageMaker model retraining.
Currently I have a notebook instance with a SageMaker Linear Learner model performing a classification task. Using an Estimator, I train and then deploy the model, creating an endpoint. Afterwards, I use a Lambda function to invoke this endpoint and add it to API Gateway, which gives me an API endpoint that can be used for POST requests and sends back the response with the class.
Now I am facing the problem of retraining. For that I use a serverless approach: a Lambda function gets environment variables for the training jobs. The problem is that SageMaker does not allow you to overwrite a training job; you can only create a new one. My goal is to automate the part where the new training job and the new endpoint config are applied to the existing endpoint, so that I don't need to change anything in API Gateway. Is it somehow possible to automatically attach a new endpoint config to an existing endpoint?
Thanks
Yes, use the UpdateEndpoint API. However, if you are using the SageMaker Python SDK, be aware that there might be some documentation floating around asking you to call
model.deploy(..., update_endpoint=True)
This is apparently now deprecated in v2 of the Sagemaker SDK:
You should instead use the Predictor class to perform this update:
from sagemaker.predictor import Predictor
predictor = Predictor(endpoint_name="YOUR-ENDPOINT-NAME", sagemaker_session=sagemaker_session_object)
predictor.update_endpoint(instance_type="ml.t2.large", initial_instance_count=1)
If I am understanding the question correctly, you should be able to use CreateEndpointConfig near the end of the training job, then use UpdateEndpoint:
Deploys the new EndpointConfig specified in the request, switches to using newly created endpoint, and then deletes resources provisioned for the endpoint using the previous EndpointConfig (there is no availability loss).
If the API Gateway / Lambda is routed via the endpoint ARN, that should not change after using UpdateEndpoint.
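For completeness, a minimal boto3 sketch of that flow; the endpoint, config, and model names are placeholders:

import boto3

sm = boto3.client("sagemaker")

# 1. Create a new endpoint config pointing at the model from the new training job.
sm.create_endpoint_config(
    EndpointConfigName="linear-learner-config-v2",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "linear-learner-model-v2",
        "InstanceType": "ml.t2.large",
        "InitialInstanceCount": 1,
    }],
)

# 2. Point the existing endpoint at the new config; the endpoint name stays the same,
#    so nothing changes on the API Gateway / Lambda side.
sm.update_endpoint(
    EndpointName="YOUR-ENDPOINT-NAME",
    EndpointConfigName="linear-learner-config-v2",
)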