I have trained and deployed a model in Pytorch with Sagemaker. I am able to call the endpoint and get a prediction. I am using the default input_fn() function (i.e. not defined in my serve.py).
model = PyTorchModel(model_data=trained_model_location,
                     role=role,
                     framework_version='1.0.0',
                     entry_point='serve.py',
                     source_dir='source')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
A prediction can be made as follows:
input ="0.12787057, 1.0612601, -1.1081504"
predictor.predict(np.genfromtxt(StringIO(input), delimiter=",").reshape(1,3) )
I want to serve the model through a REST API, with HTTP POST requests handled by Lambda and API Gateway. I was able to use invoke_endpoint() this way with an XGBoost model in SageMaker, but I am not sure what to send in the body for PyTorch.
client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(EndpointName=ENDPOINT,
                                  ContentType='text/csv',
                                  Body=???)
I believe I need to understand how to write a custom input_fn that accepts and processes the type of data I am able to send through invoke_endpoint. Am I on the right track and, if so, how could the input_fn be written to accept a CSV from invoke_endpoint?
Yes, you are on the right track. You can send CSV-serialized input to the endpoint without the predictor from the SageMaker SDK, using another SDK such as boto3, which is installed in Lambda:
import json
import boto3

runtime = boto3.client('sagemaker-runtime')
payload = '0.12787057, 1.0612601, -1.1081504'
response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='text/csv',
    Body=payload.encode('utf-8'))
result = json.loads(response['Body'].read().decode())
This passes CSV-formatted input to the endpoint; you may need to reshape it in input_fn to match the dimensions the model expects.
For example:
import numpy as np
import torch
from io import StringIO

def input_fn(request_body, request_content_type):
    if request_content_type == 'text/csv':
        return torch.from_numpy(
            np.genfromtxt(StringIO(request_body), delimiter=',').reshape(1, 3))
Note: I wasn't able to test the specific input_fn above with your input content and shape, but I have used this approach with an Sklearn RandomForest a couple of times, and judging by the PyTorch SageMaker serving documentation the same rationale should work.
Don't hesitate to use the endpoint logs in CloudWatch to diagnose any inference error (available from the endpoint UI in the console). Those logs are usually much more verbose than the high-level logs returned by the inference SDKs.
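For the Lambda and API Gateway setup described in the question, the handler can forward the request body straight to the endpoint. The following is a minimal sketch, not a tested deployment. It assumes API Gateway uses Lambda proxy integration (so the CSV string arrives in event["body"]) and that the endpoint returns JSON, as the invoke_endpoint snippet in this answer does. The names build_handler and ENDPOINT_NAME are hypothetical. The client is passed in so the handler can be exercised with a stub instead of real AWS access.

```python
import json


def build_handler(runtime_client, endpoint_name):
    """Return a Lambda handler that forwards a CSV body to a SageMaker endpoint.

    runtime_client is expected to behave like boto3.client('sagemaker-runtime').
    """
    def handler(event, context):
        # Assumption: Lambda proxy integration, so the raw POST body is a string.
        payload = event["body"]
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=payload.encode("utf-8"),
        )
        # Assumption: the endpoint responds with JSON.
        result = json.loads(response["Body"].read().decode("utf-8"))
        return {"statusCode": 200, "body": json.dumps(result)}

    return handler


# Wiring inside the Lambda module (needs boto3 and an ENDPOINT_NAME variable):
# import os, boto3
# lambda_handler = build_handler(boto3.client("sagemaker-runtime"),
#                                os.environ["ENDPOINT_NAME"])
```

Building the handler from a function keeps the AWS client out of module import time, which makes local testing easier.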
Related
I have saved and loaded an image classification model, stored in pickle format, using mlflow.
I served the model with mlflow models serve and a REST endpoint is exposed. I get a response when I provide the image as a NumPy array, but the endpoint does not accept the image directly; I need to apply the preprocessor before calling the predict method.
Can anyone tell me how I can achieve this?
I am a bit confused about automating SageMaker model retraining.
Currently I have a notebook instance with a SageMaker LinearLearner model doing a classification task. Using an Estimator, I train the model and then deploy it, which creates an endpoint. I then invoke this endpoint from a Lambda function and expose it through API Gateway, which gives me an API endpoint that accepts POST requests and sends back a response with the class.
Now I'm facing the problem of retraining. For that I use a serverless approach, with a Lambda function that reads environment variables for the training jobs. The problem is that SageMaker does not allow overwriting a training job; you can only create a new one. My goal is to automate the step where the new training job and the new endpoint config are applied to the existing endpoint, so that I don't need to change anything in API Gateway. Is it possible to automatically attach a new endpoint config to an existing endpoint?
Thanks
Yes, use the UpdateEndpoint API. However, if you are using the SageMaker Python SDK, be aware that some documentation still floating around asks you to call
model.deploy(..., update_endpoint=True)
This is apparently now deprecated in v2 of the Sagemaker SDK:
You should instead use the Predictor class to perform this update:
from sagemaker.predictor import Predictor
predictor = Predictor(endpoint_name="YOUR-ENDPOINT-NAME", sagemaker_session=sagemaker_session_object)
predictor.update_endpoint(instance_type="ml.t2.large", initial_instance_count=1)
If I am understanding the question correctly, you should be able to use CreateEndpointConfig near the end of the training job, then use UpdateEndpoint:
Deploys the new EndpointConfig specified in the request, switches to using newly created endpoint, and then deletes resources provisioned for the endpoint using the previous EndpointConfig (there is no availability loss).
If the API Gateway / Lambda is routed via the endpoint ARN, that should not change after using UpdateEndpoint.
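The CreateEndpointConfig then UpdateEndpoint sequence can be sketched with boto3 roughly as follows. This is an illustration under assumptions, not a tested pipeline: it assumes a SageMaker model named model_name has already been created from the new training job, and the function name roll_endpoint, the variant name, and the default instance type are placeholders chosen for this example. The client is passed in (normally boto3.client('sagemaker')).

```python
def roll_endpoint(sm_client, endpoint_name, model_name, config_name,
                  instance_type="ml.m4.xlarge", instance_count=1):
    """Create a new endpoint config and point an existing endpoint at it."""
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # placeholder variant name
            "ModelName": model_name,      # assumed to exist already
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    # The endpoint name is unchanged, so Lambda / API Gateway need no update.
    return sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )


# Example wiring (requires boto3 and AWS credentials):
# import boto3
# roll_endpoint(boto3.client("sagemaker"), "my-endpoint",
#               "my-new-model", "my-new-config")
```

Endpoint config names must be unique, so a retraining job would typically derive config_name from the training job name or a timestamp.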
I have been following this tutorial, which is mainly written for Jupyter notebooks, and made some minimal modifications for external processing. I've created a project that prepares my dataset locally, uploads it to S3, trains, and finally deploys the model predictor to the same bucket. Perfect!
So, after training the model and saving it to the S3 bucket:
ss_model.fit(inputs=data_channels, logs=True)
it failed while deploying as an endpoint. I have found ways to host an endpoint in many situations, but not from a model already saved in S3. To host one, you apparently need the estimator, which is normally created like this:
self.estimator = sagemaker.estimator.Estimator(self.training_image,
                                               role,
                                               train_instance_count=1,
                                               train_instance_type='ml.p3.2xlarge',
                                               train_volume_size=50,
                                               train_max_run=360000,
                                               output_path=output,
                                               base_job_name='ss-training',
                                               sagemaker_session=sess)
My question is: is there a way to load an estimator from a model saved in S3 (.tar)? Or, failing that, to create an endpoint without training it again?
After going through many pages, I found a clue here, and I finally worked out how to load the model and create the endpoint:
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri  # SageMaker SDK v1

def create_endpoint(self):
    sess = sagemaker.Session()
    training_image = get_image_uri(sess.boto_region_name, 'semantic-segmentation', repo_version="latest")
    role = "YOUR_ROLE_ARN_WITH_SAGEMAKER_EXECUTION"
    model = "s3://BUCKET/PREFIX/.../output/model.tar.gz"
    sm_model = sagemaker.Model(model_data=model, image=training_image, role=role, sagemaker_session=sess)
    sm_model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')
Please do not forget to delete your endpoint after using it. This is really important! Endpoints are charged for as long as they are running, not only when they are used.
I hope it also can help you out!
Deploy the model using the following code:
model = sagemaker.Model(role=role,
                        model_data=model_data,  # S3 location of the tar.gz file
                        image_uri=image_uri,  # the inference image URI
                        sagemaker_session=sagemaker_session,
                        name=model_name)  # model name
model_predictor = model.deploy(initial_instance_count=1,
                               instance_type=instance_type)
Initialize the predictor:
model_predictor = sagemaker.Predictor(endpoint_name=model.endpoint_name)
Finally, predict using:
model_predictor.predict(payload)  # your payload
I have trained a model using google cloud AutoML Vision API, however when I specifically try to obtain the model performance metrics via the Python package I keep getting a 403 response:
PermissionDenied: 403 Permission 'automl.modelEvaluations.list' denied on resource 'projects/MY_BUCKET_ID/locations/us-central1/models/MY_MODEL_ID' (or it may not exist).
I am using the Python code as laid out in the documentation, and the other operations (Create Dataset, Train Model) raise no authorization errors, so I am really struggling to understand why this is the case. Here is the code:
# Get the full path of the model.
model_full_id = client.model_path(project_id, compute_region, model_id)
print(model_full_id)
# List all the model evaluations in the model by applying filter.
response = client.list_model_evaluations(model_full_id, filter_)
Thanks for your help
After a few tests I found the problem. When requesting the model details you need to use model_id and not model_name, whereas in the previous API calls in the documentation model_name was the identifier to use.
model_full_id = client.model_path(project_id, compute_region, model_id)
This fixed the issue.
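To illustrate why the identifier matters, the helper below mimics the resource name format visible in the error message from the question. It is a stand-in written for this example, not the library's implementation of client.model_path, and the project and model values are made up. Passing a display name instead of the ID yields a well-formed path that points at a model that does not exist, which is consistent with the "(or it may not exist)" part of the 403 message.

```python
def model_resource_path(project_id, compute_region, model_id):
    """Build a resource name in the shape shown in the error message."""
    return "projects/{}/locations/{}/models/{}".format(
        project_id, compute_region, model_id)


# Hypothetical values for illustration only.
good = model_resource_path("my-project", "us-central1", "ICN1234567890")
bad = model_resource_path("my-project", "us-central1", "my_model_display_name")
print(good)
print(bad)
```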
I am currently in the process of trying to deploy a Keras Convolutional Neural Network for a webservice.
I had tried converting my saved keras hdf5 model to a tensorflow.js model and deploying that but it slowed down the client-side app as the model is relatively robust and thus, takes a large amount of space in the client memory.
Thus, I am trying to figure out a way to deploy the model in the cloud and make predictions through a request from the web-app with an image, and then receive a response which holds the prediction tensor. I know that gcloud may have some similar abilities or feature but I am unsure of how to get started.
Essentially, I am asking if there is any service that will allow me to deploy a pre-trained and saved convolutional neural network model to which I can send images in a request and use the model to return a predicted tensor?
You can export a trained Keras model and serve it with TensorFlow Serving. TF Serving lets you host models and call them via either gRPC or REST requests. You could deploy a Flask app with an endpoint that accepts an image, wraps it as a payload, and calls your model via the requests module.
Your code to export the model as a servable would look like this:
import tensorflow as tf
from tensorflow import keras

# The export path contains the name and the version of the model
export_dir = './MyModel/1'  # assumed example path, not from the original answer
model = keras.models.load_model('./mymodel.h5')
# Fetch the Keras session and save the model
with keras.backend.get_session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_dir,
        inputs={'images': model.input},
        outputs={t.name: t for t in model.outputs})
This will store the files necessary for TF Serving. From this directory, you can host the model as follows:
tensorflow_model_server --model_base_path=$(pwd) --rest_api_port=9000 --model_name=MyModel
Your request would then look like this:
requests.post('http://ip:9000/v1/models/MyModel:predict', json=payload)
Where payload is a dictionary that contains your request image.
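As a rough sketch of what that dictionary can look like: the TF Serving REST predict API accepts a JSON body with an "instances" list, one entry per example. The helper name build_predict_payload and the tiny image are made up for this example; the "images" key matches the input name used in the export code above. Images are plain nested lists here, so a NumPy array would need .tolist() first.

```python
import json


def build_predict_payload(images):
    """Wrap a batch of images (nested lists of pixel values) for TF Serving.

    Each instance is keyed by the input name given at export time ('images').
    """
    return {"instances": [{"images": image} for image in images]}


# A tiny 2x2 single-channel image, for illustration only.
image = [[[0.0], [0.5]], [[1.0], [0.25]]]
payload = build_predict_payload([image])
body = json.dumps(payload)  # what requests sends when called with json=payload

# requests.post('http://ip:9000/v1/models/MyModel:predict', json=payload)
```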
If you want a click-to-deploy solution for serving your model on Google Cloud, consider using Cloud ML Engine's Online Prediction service. First, follow the instructions in @sdcbr's response to export your SavedModel. Copy the model to GCS, then simply create a model and a version:
gcloud ml-engine models create "my_image_model"
gcloud ml-engine versions create "v1"\
--model "my_image_model" --origin $DEPLOYMENT_SOURCE
Or, even easier, use Cloud Console to do the above with a few clicks!
You will get a serverless REST endpoint that includes authentication and authorization, autoscaling (including scale to zero) as well as logging and monitoring, without having to write or maintain a line of code.