Amazon SageMaker could not get a response from the endpoint - python

I have built an anomaly detection model using AWS SageMaker inbuilt model: random cut forest.
from sagemaker import RandomCutForest

rcf = RandomCutForest(
    role=execution_role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=1000,
    num_trees=100,
    encrypt_inter_container_traffic=True,
    enable_network_isolation=True,
    enable_sagemaker_metrics=True,
)
and created the endpoint:
rcf_inference = rcf.deploy(
    initial_instance_count=4,
    instance_type="ml.m5.xlarge",
    endpoint_name="RCF-container2",
    enable_network_isolation=True,
)
But when I try to get predictions from the endpoint, I run into the following error:
results = rcf_inference.predict(df.values)
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Amazon SageMaker could not get a response from the RCF-container2 endpoint. This can occur when CPU or memory utilization is high. To check your utilization, see Amazon CloudWatch. To fix this problem, use an instance type with more CPU capacity or memory."
I have tried a larger CPU instance, but I still get the same issue. I suspect the problem is functional rather than a capacity issue.
Please help.

I would suggest checking the CloudWatch Logs to see if there is any other error that could point to the issue.
I work for AWS and my opinions are my own.
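A minimal sketch of pulling those logs with boto3, assuming the endpoint name from the question (RCF-container2) and the default /aws/sagemaker/Endpoints/<endpoint-name> log group naming:
import boto3

logs = boto3.client("logs")
log_group = "/aws/sagemaker/Endpoints/RCF-container2"

# Most recently active log stream for the endpoint's containers.
streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy="LastEventTime",
    descending=True,
    limit=1,
)

for stream in streams["logStreams"]:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream["logStreamName"],
        limit=50,
    )
    for event in events["events"]:
        print(event["message"])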

Related

SageMaker Endpoint: ServiceUnavailable 503 when calling the InvokeEndpoint operation

I've deployed a model as a SageMaker endpoint. It worked fine for some time, but now when I invoke the model through boto3:
import boto3

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="my-sagemaker-endpoint",
    ContentType="text/csv",
    Body=payload,
)
I get the following error:
ServiceUnavailable: An error occurred (ServiceUnavailable) when calling the InvokeEndpoint operation (reached max retries: 4): A transient exception occurred while retrieving variant instances. Please try again later.
Researching this error in the SageMaker documentation, it states the following:
The request has failed due to a temporary failure of the server.
I've also checked the instance metrics in CloudWatch and there's nothing unusual.
I'm not sure why this error is happening; any suggestions would be helpful.
TL;DR: The error occurs because the instance is unable to retrieve the SageMaker model artifact from S3.
Explanation
SageMaker endpoints implement a /ping route that checks whether the model artifact can be loaded on the instance. The artifact is first retrieved from S3 and then loaded into the instance; if the model is not available on S3, the health check fails with the following error (image below).
In my case the model artifact had been accidentally deleted from S3, so it could not be loaded, and the /ping health check raised a No such file or directory error (see image below).
This in turn makes the load balancer assume the instance is unhealthy and block access to it, so when you try to invoke the endpoint you get a 503 Service Unavailable error.
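A quick way to confirm this is to walk from the endpoint to the model artifact it was created from and check that the object still exists in S3. This is only a sketch; the endpoint name is taken from the question and the single-variant, single-container layout is an assumption:
import boto3

sm = boto3.client("sagemaker")
s3 = boto3.client("s3")

# Endpoint -> endpoint config -> model -> the S3 artifact it points at.
endpoint = sm.describe_endpoint(EndpointName="my-sagemaker-endpoint")
config = sm.describe_endpoint_config(EndpointConfigName=endpoint["EndpointConfigName"])
model = sm.describe_model(ModelName=config["ProductionVariants"][0]["ModelName"])
model_data_url = model["PrimaryContainer"]["ModelDataUrl"]  # s3://bucket/key
print("Model artifact:", model_data_url)

# head_object raises a ClientError (404) if the artifact has been deleted.
bucket, key = model_data_url.replace("s3://", "").split("/", 1)
s3.head_object(Bucket=bucket, Key=key)
print("Artifact is still present in S3.")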
Solution
I only managed to work around this by redeploying to a new endpoint, this time taking the following into account:
Use at least two instances (e.g. initial_instance_count=2) so that they are placed in different Availability Zones and the load balancer always has at least one healthy instance to talk to (see the sketch after this list).
Ensure that only specific roles have s3:PutObject permission on the S3 model artifact path models/model-name/version.
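A minimal redeploy sketch along those lines, using the sagemaker Python SDK; the image URI, artifact path and execution role below are placeholders, not values from the question:
from sagemaker.model import Model

# Placeholders: replace with your inference image, artifact location and role.
model = Model(
    image_uri="<inference-image-uri>",
    model_data="s3://<bucket>/models/model-name/version/model.tar.gz",
    role="<execution-role-arn>",
)

# Two instances so the endpoint spans more than one Availability Zone and the
# load balancer always has a healthy instance to route to.
predictor = model.deploy(
    initial_instance_count=2,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-sagemaker-endpoint",
)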

When calling a SageMaker deploy_endpoint function with an a1.small instance, I'm given an error that I can't open an m5.xlarge instance

While working through a notebook generated by Autopilot, I went to execute the final code cell:
pipeline_model.deploy(initial_instance_count=1,
                      instance_type='a1.small',
                      endpoint_name=pipeline_model.name,
                      wait=True)
I get this error
ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'ml.m5.2xlarge for endpoint usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.
The most important part is the last line, where it mentions resource limits. I'm not trying to open the instance type it's giving me an error about.
Does the endpoint NEED to be on an ml.m5.2xlarge instance? Or is the code acting up?
Thanks in advance guys and gals.
You should use one of the supported on-demand ML hosting instance types, as detailed at this link. 'a1.small' is not a valid SageMaker hosting instance type, so I think it is being replaced by a valid default (ml.m5.2xlarge), which is not in your AWS service quota. The weird part is that instance_type='a1.small' was generated by SageMaker Autopilot.
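As a minimal sketch, the same deploy call with an explicitly valid hosting instance type should avoid the fallback; ml.m5.large is just an example, use any type you actually have quota for:
# Pass a supported SageMaker hosting instance type explicitly instead of
# letting an invalid value fall back to a default you have no quota for.
pipeline_model.deploy(initial_instance_count=1,
                      instance_type='ml.m5.large',
                      endpoint_name=pipeline_model.name,
                      wait=True)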

Sagemaker: Problem with elastic inference when deploying

When executing the deploy code for SageMaker using the sagemaker-python-sdk, I get this error:
UnexpectedStatusException: Error hosting endpoint tensorflow-inference-eia-XXXX-XX-XX-XX-XX-XX-XXX: Failed. Reason: The image '763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-eia:1.14-gpu' does not exist.
The code that I am using to deploy is as:
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.p2.xlarge',
                         accelerator_type='ml.eia1.medium')
If I remove the accelerator_type parameter, the endpoint gets deployed with no errors. Any idea why this happens? SageMaker seems to be referring to an image that doesn't exist. How do I fix this?
Also, I made sure the version is supported, per https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators. I am on TensorFlow 1.14.
Edit:
Turns out, this works:
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         accelerator_type='ml.eia1.medium')
So I am guessing that Elastic Inference is not available for GPU instances?
Note: none of the instances I deploy my endpoint to uses a GPU. (Please share suggestions if you are familiar with this or have made it work.)
Elastic Inference Accelerators (EIA) are designed to be attached to CPU-based endpoints.
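In other words, the pattern that worked in the question, a CPU instance with the accelerator attached, is the intended one. A minimal sketch (the instance type here is just an example of a supported CPU type):
# Attach the Elastic Inference accelerator to a CPU instance type; the
# '-gpu' variant of the EIA image referenced in the error does not exist.
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.c5.large',
                         accelerator_type='ml.eia1.medium')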

getting a 403 response with Google AutoML vision Python API despite having assigned right Service Account

I have trained a model using the Google Cloud AutoML Vision API; however, when I try to obtain the model performance metrics via the Python package, I keep getting a 403 response:
PermissionDenied: 403 Permission 'automl.modelEvaluations.list' denied on resource 'projects/MY_BUCKET_ID/locations/us-central1/models/MY_MODEL_ID' (or it may not exist).
I am using the Python code as laid out in the documentation and am not hitting any authorization issues with the other operations (Create Dataset, Train Model), so I am struggling to understand why this is the case. Here is the code:
# Get the full path of the model.
model_full_id = client.model_path(project_id, compute_region, model_id)
print(model_full_id)
# List all the model evaluations in the model by applying filter.
response = client.list_model_evaluations(model_full_id, filter_)
Thanks for your help
After a few tests I found the problem. When requesting the model details you need to use the model_id and not the model_name, whereas in the previous API calls in the documentation the model_name was the identifier to use.
model_full_id = client.model_path(project_id, compute_region, model_id)
This fixed the issue.
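To make the distinction concrete, here is a hedged sketch of the full call; the project, region and model values are placeholders, model_id is the short identifier (the last segment of the model's resource name, not its display name), and the positional argument style follows the client version used in the question:
from google.cloud import automl_v1beta1 as automl

client = automl.AutoMlClient()

project_id = "my-project"          # placeholder
compute_region = "us-central1"     # placeholder
model_id = "ICN1234567890123456"   # short model id, not the display name

# Build the full resource path from the short model id.
model_full_id = client.model_path(project_id, compute_region, model_id)

# An empty filter string lists all evaluations for the model.
for evaluation in client.list_model_evaluations(model_full_id, ""):
    print(evaluation)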

Google Cloud ML Engine Error 429 Out of Memory

I uploaded my model to ML-engine and when trying to make a prediction I receive the following error:
ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
  "error": {
    "code": 429,
    "message": "Prediction server is out of memory, possibly because model size is too big.",
    "status": "RESOURCE_EXHAUSTED"
  }
}
My model size is 151.1 MB. I have already done all the actions suggested on the Google Cloud website, such as quantization. Is there a possible solution, or anything else I could do to make it work?
Thanks
Typically a model of this size should not result in OOM. Since TF does a lot of lazy initialization, some OOMs won't be detected until the first request initializes the data structures. In rare cases a graph can explode 10x in memory, causing OOM.
1) Did you see the prediction error consistently? Due to the way TensorFlow schedules nodes, memory usage for the same graph might differ across runs. Make sure to run the prediction multiple times and see if it's a 429 every time.
2) Please make sure 151.1 MB is the size of your SavedModel directory (see the sketch after this list).
3) You can also debug the peak memory locally, for instance by using top while running gcloud ml-engine local predict, or by loading the model into memory in a Docker container and using docker stats or some other way to monitor memory usage. You can also try TensorFlow Serving for debugging (https://www.tensorflow.org/serving/serving_basic) and post the results.
4) If you find the memory problem is persistent, please contact cloudml-feedback@google.com for further assistance; make sure you include your project number and associated account for further debugging.
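For point 2, a small local sketch for measuring the SavedModel directory size (the path is a placeholder for the directory containing saved_model.pb and variables/):
import os

def saved_model_size_mb(saved_model_dir):
    """Total on-disk size of a SavedModel directory, in MB."""
    total_bytes = 0
    for root, _, files in os.walk(saved_model_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / (1024 * 1024)

print(round(saved_model_size_mb("export/my_model/1"), 1), "MB")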
