I have been following this tutorial, which is written mainly for Jupyter notebooks, and made some minimal modifications for external processing. I've created a project that can prepare my dataset locally, upload it to S3, train, and finally save the trained model to the same bucket. Perfect!
So, after training and saving it to the S3 bucket:
ss_model.fit(inputs=data_channels, logs=True)
it failed while deploying as an endpoint. I have found tricks to host an endpoint in many ways, but not from a model already saved in S3, because in order to host you apparently need the estimator, which is normally created like:
self.estimator = sagemaker.estimator.Estimator(self.training_image,
                                               role,
                                               train_instance_count=1,
                                               train_instance_type='ml.p3.2xlarge',
                                               train_volume_size=50,
                                               train_max_run=360000,
                                               output_path=output,
                                               base_job_name='ss-training',
                                               sagemaker_session=sess)
My question is: is there a way to load an estimator from a model saved in S3 (.tar)? Or, in any case, a way to create an endpoint without training it again?
So, after going through many pages, I finally found a clue here and figured out how to load the model and create the endpoint:
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

def create_endpoint(self):
    sess = sagemaker.Session()
    training_image = get_image_uri(sess.boto_region_name, 'semantic-segmentation', repo_version="latest")
    role = "YOUR_ROLE_ARN_WITH_SAGEMAKER_EXECUTION"
    model = "s3://BUCKET/PREFIX/.../output/model.tar.gz"
    sm_model = sagemaker.Model(model_data=model, image=training_image, role=role, sagemaker_session=sess)
    sm_model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')
Please do not forget to delete your endpoint after using it. This is really important! Endpoints are charged for as long as they are running, not only when they are used.
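For cleanup, a minimal sketch (the names below are placeholders for whatever names SageMaker gave your resources):

import sagemaker

sess = sagemaker.Session()
# Deleting the endpoint is what stops the instance billing
sess.delete_endpoint("YOUR_ENDPOINT_NAME")
# Optionally also remove the endpoint config and the model object
sess.delete_endpoint_config("YOUR_ENDPOINT_CONFIG_NAME")
sess.delete_model("YOUR_MODEL_NAME")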
I hope it can also help you out!
Deploy the model using the following code:
model = sagemaker.Model(role=role,
                        model_data=...,      # S3 location of the tar.gz file
                        image_uri=...,       # the inference image URI
                        sagemaker_session=sagemaker_session,
                        name=...)            # model name
model_predictor = model.deploy(initial_instance_count=1,
                               instance_type=instance_type)
Initialize the predictor:
model_predictor = sagemaker.Predictor(endpoint_name=model.endpoint_name)
Finally, predict using:
model_predictor.predict(...)  # your payload
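For the payload, what you send must match the content type the serving container expects. A small hedged sketch, assuming a container that accepts CSV input and returns JSON (the serializer choice and values are illustrative, not from the answer):

import sagemaker
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

# model and sagemaker_session come from the snippet above
model_predictor = sagemaker.Predictor(
    endpoint_name=model.endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=CSVSerializer(),       # encodes the payload as text/csv
    deserializer=JSONDeserializer(),  # parses a JSON response body
)

print(model_predictor.predict([0.1, 0.2, 0.3]))  # illustrative feature vector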
I have trained a Logistic Regression model on my local machine, saved it using joblib, and tried deploying it on AWS SageMaker using the "Linear-Learner" image.
I am facing issues during deployment: the process keeps running, the status always stays at "Creating", and it never turns to "InService".
endpoint_name = "DEMO-LogisticEndpoint" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)
create_endpoint_response = sm_client.create_endpoint(
EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print(create_endpoint_response["EndpointArn"])
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)
while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)
The while loop keeps executing and the status never changes.
Background: it is important to understand that the endpoint runs a container that includes the serving software. Each container expects a certain type of model. You need to make sure your model, and how you package it, match what the container expects.
Two easy paths forward:
Linear Learner is a SageMaker built-in algorithm, so a straightforward path would be to train it in the cloud (see this example), which makes it very easy to deploy.
Use scikit-learn Logistic Regression, which you can train locally and deploy to SageMaker using the scikit-learn container (XGBoost is another easy path); see the sketch after this list.
Otherwise, you can always go more advanced and use any custom algorithm or framework by bringing your own container. Google for existing implementations (e.g., CatBoost on SageMaker).
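For the second path (scikit-learn), a hedged sketch of what deploying a locally trained model could look like (the bucket, script name, and framework version are assumptions; the tarball is expected to contain your joblib file, and inference.py must define a model_fn that loads it):

from sagemaker.sklearn.model import SKLearnModel

sklearn_model = SKLearnModel(
    model_data="s3://my-bucket/logreg/model.tar.gz",  # your locally trained model, repackaged
    role=role,                         # your SageMaker execution role ARN
    entry_point="inference.py",        # defines model_fn(model_dir) -> joblib.load(...)
    framework_version="1.2-1",         # a scikit-learn container version supported by SageMaker
    py_version="py3",
)

predictor = sklearn_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)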
Upon deploying a custom PyTorch model with the boto3 client in Python, I noticed that a new S3 bucket had been created with no visible objects. Is there a reason for this?
The bucket that contained my model was already named with the keyword "sagemaker" included, so I don't see any issue there.
Here is the code that I used for deployment:
from sagemaker.pytorch import PyTorchModel

remote_model = PyTorchModel(
name = model_name,
model_data=model_url,
role=role,
sagemaker_session = sess,
entry_point="inference.py",
# image=image,
framework_version="1.5.0",
py_version='py3'
)
remote_predictor = remote_model.deploy(
instance_type='ml.g4dn.xlarge',
initial_instance_count=1,
#update_endpoint = True, # comment out or set to False if the endpoint doesn't exist
endpoint_name=endpoint_name, # define a unique endpoint name; if omitted, SageMaker will generate one based on the container used
wait=True
)
It was likely created as a default bucket by the SageMaker Python SDK. Note that the code you wrote is not boto3 (the AWS Python SDK) but sagemaker (link), the SageMaker-specific Python SDK, which is higher-level than boto3.
The SageMaker Python SDK uses S3 in multiple places, for example to stage training code when using a Framework Estimator, and to stage inference code when deploying with a Framework Model (your case). It gives you control over the S3 location to use, but if you don't specify one, it may use an automatically generated bucket, if it has the permissions to create it.
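A quick way to confirm this (not from the original answer, just a check you can run):

import sagemaker

sess = sagemaker.Session()
# The SDK's auto-created bucket typically looks like "sagemaker-<region>-<account-id>"
print(sess.default_bucket())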
To control the code-staging S3 location, you can use the code_location parameter in either your PyTorch Estimator (training) or your PyTorchModel (serving).
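For example, a hedged sketch of pinning code staging to a bucket you already own (the bucket and prefix are placeholders; the other variables are as in your snippet above):

from sagemaker.pytorch import PyTorchModel

remote_model = PyTorchModel(
    name=model_name,
    model_data=model_url,
    role=role,
    sagemaker_session=sess,
    entry_point="inference.py",
    framework_version="1.5.0",
    py_version="py3",
    code_location="s3://my-existing-sagemaker-bucket/code",  # placeholder bucket/prefix
)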
I have trained and deployed a model in PyTorch with SageMaker. I am able to call the endpoint and get a prediction. I am using the default input_fn() function (i.e., it is not defined in my serve.py).
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(model_data=trained_model_location,
role=role,
framework_version='1.0.0',
entry_point='serve.py',
source_dir='source')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
A prediction can be made as follows:
input ="0.12787057, 1.0612601, -1.1081504"
predictor.predict(np.genfromtxt(StringIO(input), delimiter=",").reshape(1,3) )
I want to be able to serve the model with a REST API, via an HTTP POST, using Lambda and API Gateway. I was able to use invoke_endpoint() for this with an XGBoost model in SageMaker. I am not sure what to send in the body for PyTorch.
client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(EndpointName=ENDPOINT ,
ContentType='text/csv',
Body=???)
I believe I need to understand how to write the custom input_fn to accept and process the type of data I am able to send through invoke_endpoint. Am I on the right track, and if so, how could the input_fn be written to accept a CSV from invoke_endpoint?
Yes, you are on the right track. You can send CSV-serialized input to the endpoint without using the Predictor from the SageMaker SDK, by using other SDKs such as boto3, which is available in Lambda:
import json

import boto3
runtime = boto3.client('sagemaker-runtime')
payload = '0.12787057, 1.0612601, -1.1081504'
response = runtime.invoke_endpoint(
EndpointName=ENDPOINT_NAME,
ContentType='text/csv',
Body=payload.encode('utf-8'))
result = json.loads(response['Body'].read().decode())
This passes a CSV-formatted input to the endpoint, which you may then need to reshape in the input_fn to the dimensions expected by the model.
for example:
from io import StringIO
import numpy as np
import torch

def input_fn(request_body, request_content_type):
    if request_content_type == 'text/csv':
        return torch.from_numpy(
            np.genfromtxt(StringIO(request_body), delimiter=',').reshape(1, 3))
Note: I wasn't able to test the specific input_fn above with your input content and shape, but I have used this approach with a scikit-learn RandomForest a couple of times, and looking at the PyTorch SageMaker serving documentation, the rationale above should work.
Don't hesitate to use the endpoint logs in CloudWatch to diagnose any inference error (they are available from the endpoint UI in the console); those logs are usually much more verbose than the high-level errors returned by the inference SDKs.
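And since you mention Lambda and API Gateway: a minimal handler sketch (the environment variable and the assumption that the request body is the raw CSV string are mine, not from the question):

import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # With an API Gateway proxy integration, the raw request body is in event["body"]
    payload = event["body"]  # e.g. "0.12787057, 1.0612601, -1.1081504"
    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],  # hypothetical env var holding your endpoint name
        ContentType="text/csv",
        Body=payload.encode("utf-8"),
    )
    result = json.loads(response["Body"].read().decode())
    return {"statusCode": 200, "body": json.dumps(result)}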
I have trained a model using the Google Cloud AutoML Vision API; however, when I specifically try to obtain the model performance metrics via the Python package, I keep getting a 403 response:
PermissionDenied: 403 Permission 'automl.modelEvaluations.list' denied on resource 'projects/MY_BUCKET_ID/locations/us-central1/models/MY_MODEL_ID' (or it may not exist).
I am using the Python code as laid out in the documentation, and I am not getting any authorization errors with the other operations (Create Dataset, Train Model), so I am really struggling to understand why this is the case. Here is the code:
# Get the full path of the model.
model_full_id = client.model_path(project_id, compute_region, model_id)
print(model_full_id)
# List all the model evaluations in the model by applying filter.
response = client.list_model_evaluations(model_full_id, filter_)
Thanks for your help
After a few tests I found the problem. When retrieving the model details, you need to use model_id and not model_name, whereas in the previous API calls in the documentation, model_name was the identifier to use.
model_full_id = client.model_path(project_id, compute_region, model_id)
This fixed the issue.
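With the corrected path, listing and iterating the evaluations then works as in the docs; a small sketch (filter_ can be an empty string to list everything):

filter_ = ""  # no filter: list all evaluations for this model

response = client.list_model_evaluations(model_full_id, filter_)
for evaluation in response:
    print("Model evaluation name: {}".format(evaluation.name))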
I am currently in the process of trying to deploy a Keras convolutional neural network as a web service.
I had tried converting my saved Keras HDF5 model to a TensorFlow.js model and deploying that, but it slowed down the client-side app because the model is relatively large and therefore takes up a large amount of client memory.
So I am trying to figure out a way to deploy the model in the cloud and make predictions through a request from the web app with an image, then receive a response that holds the prediction tensor. I know that Google Cloud may have some similar capabilities, but I am unsure of how to get started.
Essentially, I am asking: is there any service that will allow me to deploy a pre-trained and saved convolutional neural network model, to which I can send images in a request and have the model return the predicted tensor?
You can export a trained Keras model and serve it with TensorFlow Serving. TF Serving allows you to host models and call them via either gRPC or REST requests. You could, for example, deploy a Flask app with an endpoint that accepts an image, wraps it as a payload, and calls your model via the requests module.
Your code to export the model as a servable would look like this:
import tensorflow as tf
from tensorflow import keras

# The export path contains the version of the model, e.g. ./MyModel/1
export_dir = './MyModel/1'

model = keras.models.load_model('./mymodel.h5')

# Fetch the Keras session and save the model
with keras.backend.get_session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_dir,
        inputs={'images': model.input},
        outputs={t.name: t for t in model.outputs})
This will store the files necessary for TF Serving. From inside the MyModel directory (the one containing the version subfolder), you can host the model as follows:
tensorflow_model_server --model_base_path=$(pwd) --rest_api_port=9000 --model_name=MyModel
Your request would then look like this:
requests.post('http://ip:9000/v1/models/MyModel:predict', json=payload)
Where payload is a dictionary that contains your request image.
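For reference, a hedged sketch of building such a payload (the input key 'images' comes from the export above; the image shape is an assumption and must match your model's input):

import numpy as np
import requests

# Placeholder: in practice, load and preprocess a real image to your model's input shape
image = np.zeros((28, 28, 1), dtype=np.float32)

# TF Serving's REST predict API expects a JSON body of the form {"instances": [...]}
payload = {"instances": [{"images": image.tolist()}]}

response = requests.post('http://ip:9000/v1/models/MyModel:predict', json=payload)
print(response.json()['predictions'])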
If you want a click-to-deploy solution for serving your model on Google Cloud, consider using Cloud ML Engine's Online Prediction service. First, follow the instructions in sdcbr's answer above to export your SavedModel. Copy the model to GCS, then simply create a model and a version:
gcloud ml-engine models create "my_image_model"
gcloud ml-engine versions create "v1" \
    --model "my_image_model" --origin $DEPLOYMENT_SOURCE
Or, even easier, use Cloud Console to do the above with a few clicks!
You will get a serverless REST endpoint that includes authentication and authorization, autoscaling (including scale to zero) as well as logging and monitoring, without having to write or maintain a line of code.
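Once the version is live, a hedged sketch of calling it from Python with the Google API client (the project ID, instance format, and image preprocessing are assumptions; the instances must match your SavedModel's input signature):

import numpy as np
from googleapiclient import discovery

image = np.zeros((28, 28, 1), dtype=np.float32)  # placeholder preprocessed image

service = discovery.build("ml", "v1")
name = "projects/my-project/models/my_image_model/versions/v1"  # placeholder project ID

response = service.projects().predict(
    name=name,
    body={"instances": [{"images": image.tolist()}]},
).execute()

print(response["predictions"])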