I trained a Logistic Regression model on my local machine, saved it with Joblib, and tried deploying it on AWS SageMaker using the "Linear-Learner" image.
The deployment never completes: the endpoint status stays at "Creating" and never changes to "InService".
endpoint_name = "DEMO-LogisticEndpoint" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)
create_endpoint_response = sm_client.create_endpoint(
EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print(create_endpoint_response["EndpointArn"])
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)
while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)
The while loop keeps executing and the status never changes.
Background: the endpoint runs a container that includes the serving software, and each container expects a certain type of model. You need to make sure your model, and how you package it, matches what the container expects. A scikit-learn model saved with Joblib is not the artifact format the Linear-Learner container loads.
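To see what the endpoint is actually doing instead of looping forever, here is a minimal sketch of a polling helper that gives up after a timeout and returns the FailureReason field that describe_endpoint reports for failed endpoints. The function name, the timeout values and the injected `describe` callable are my own assumptions for illustration, not part of the original code.

```python
import time


def wait_for_endpoint(describe, endpoint_name, timeout_s=1800, poll_s=60, sleep=time.sleep):
    """Poll until the endpoint leaves 'Creating' or the timeout expires.

    `describe` is any callable with the signature of
    sm_client.describe_endpoint (hypothetical injection point, handy for testing).
    """
    waited = 0
    while True:
        resp = describe(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status != "Creating":
            # FailureReason is only present when the endpoint has failed.
            return status, resp.get("FailureReason")
        if waited >= timeout_s:
            raise TimeoutError(
                f"{endpoint_name} still 'Creating' after {waited}s; check the CloudWatch logs"
            )
        sleep(poll_s)
        waited += poll_s
```

Usage would look like `status, reason = wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)`.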
Two easy paths forward:
Linear-learner is a SageMaker built-in algorithm, so a straightforward path is to train it in the cloud, which makes deployment very easy (see the example).
Use scikit-learn Logistic Regression, which you can train locally and deploy to SageMaker using the scikit-learn container (XGBoost is another easy path).
Otherwise, you can always go more advanced and use any custom algorithm or framework by bringing your own container. Google for existing implementations (e.g., CatBoost/SageMaker).
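For the scikit-learn path, a rough sketch of the two pieces the container needs: a model.tar.gz with the serialized model at its root, and a model_fn in the entry-point script that loads it. The file name model.joblib and the helper name package_model are assumptions of mine; adjust them to whatever you saved locally.

```python
import os
import tarfile


def package_model(model_path, archive_path="model.tar.gz"):
    """Put a serialized model at the root of a gzipped tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops local directories so the file sits at the archive root.
        tar.add(model_path, arcname=os.path.basename(model_path))
    return archive_path


def model_fn(model_dir):
    """Loader that the scikit-learn container calls from the entry-point script.

    Assumes the model was saved locally as model.joblib (hypothetical name).
    """
    import joblib  # imported lazily so packaging works without joblib installed

    return joblib.load(os.path.join(model_dir, "model.joblib"))
```

The archive then gets uploaded to S3 and its S3 path is passed as the model data when creating the model.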
I am looking at this example to implement the preprocessing of incoming raw data for a SageMaker endpoint prior to model inference/scoring. This is all great, but I have two questions:
How can one debug this (e.g., can I invoke the endpoint without it being exposed as a RESTful API and then use SageMaker Debugger)?
SageMaker can be used "remotely", e.g. via VSC. Can such a script be uploaded programmatically?
Thanks.
SageMaker Debugger only monitors training jobs.
https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html
I don't think you can use it on endpoints.
The script that you have provided is used for both training and inference. The container used by the estimator takes care of which functions to run, so it is not possible to debug the script directly. But what are you debugging in the code: the training part or the inference part?
When creating the estimator we need to give either the entry_point or the source directory. If you are using "entry_point", the value should be a relative path to the file; if you are using "source_dir", you should be able to give an S3 path. So before running the estimator, you can programmatically tar the files, upload them to S3, and then use the S3 path in the estimator.
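On the debugging question, if it is the inference part, one option is to import the script locally and chain its serving hooks by hand under a normal debugger. This is only a sketch: it assumes your script defines all four hooks (model_fn, input_fn, predict_fn, output_fn) and it does not emulate the defaults the container falls back to when one is missing. The helper name is hypothetical.

```python
def run_inference_locally(handlers, model_dir, body, content_type="text/csv", accept="text/csv"):
    """Chain the four serving hooks in the order the framework containers call them.

    `handlers` is the imported inference script (any object exposing the hooks).
    """
    model = handlers.model_fn(model_dir)
    data = handlers.input_fn(body, content_type)
    prediction = handlers.predict_fn(data, model)
    return handlers.output_fn(prediction, accept)
```

For example, `import my_script; run_inference_locally(my_script, "./model", "1,2,3")`, where my_script and ./model stand in for your own script and an unpacked model directory.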
I have been following this tutorial, which is mainly for Jupyter notebooks, and made some minimal modifications for external processing. I've created a project that prepares my dataset locally, uploads it to S3, trains, and finally deploys the model predictor to the same bucket. Perfect!
So, after training and saving the model in the S3 bucket:
ss_model.fit(inputs=data_channels, logs=True)
it failed while deploying as an endpoint. I have found tricks to host an endpoint in many ways, but not from a model already saved in S3, because in order to host you probably need the estimator, which normally looks something like:
self.estimator = sagemaker.estimator.Estimator(self.training_image,
                                               role,
                                               train_instance_count=1,
                                               train_instance_type='ml.p3.2xlarge',
                                               train_volume_size=50,
                                               train_max_run=360000,
                                               output_path=output,
                                               base_job_name='ss-training',
                                               sagemaker_session=sess)
My question is: is there a way to load an estimator from a model saved in S3 (.tar)? Or, failing that, to create an endpoint without training it again?
After searching through many pages, I found a clue here, and finally worked out how to load the model and create the endpoint:
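import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri  # SageMaker Python SDK v1

def create_endpoint(self):
    sess = sagemaker.Session()
    # Image of the built-in semantic segmentation algorithm for this region
    training_image = get_image_uri(sess.boto_region_name, 'semantic-segmentation', repo_version="latest")
    role = "YOUR_ROLE_ARN_WITH_SAGEMAKER_EXECUTION"
    model = "s3://BUCKET/PREFIX/.../output/model.tar.gz"
    # SDK v1 takes `image`; SDK v2 renamed this argument to `image_uri`
    sm_model = sagemaker.Model(model_data=model, image=training_image, role=role, sagemaker_session=sess)
    sm_model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')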
Please do not forget to delete your endpoint after using it. This is really important! Endpoints are charged for the time they are running, not only for the time they are used.
I hope this helps you too!
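For the clean-up, a minimal sketch using the low-level boto3 SageMaker client. It looks up the endpoint config and the models behind the endpoint, then deletes all three. The helper name is my own, and `sm_client` is assumed to be `boto3.client("sagemaker")`.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint together with its endpoint config and models."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"] if "ModelName" in v]

    # The running endpoint is what gets billed, so remove it first.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names
```

The model.tar.gz in S3 is not touched, so the endpoint can be recreated later from the same artifact.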
Deploy the model using the following code:
model = sagemaker.Model(role=role,
                        model_data=model_data,  # S3 location of the tar.gz file
                        image_uri=image_uri,  # the inference image URI
                        sagemaker_session=sagemaker_session,
                        name=model_name)  # model name
model_predictor = model.deploy(initial_instance_count=1,
                               instance_type=instance_type)
Initialize the predictor:
model_predictor = sagemaker.Predictor(endpoint_name=model.endpoint_name)
Finally, predict using:
model_predictor.predict(payload)  # your payload
I have trained and deployed a model in PyTorch with SageMaker. I am able to call the endpoint and get a prediction. I am using the default input_fn() function (i.e. it is not defined in my serve.py).
model = PyTorchModel(model_data=trained_model_location,
                     role=role,
                     framework_version='1.0.0',
                     entry_point='serve.py',
                     source_dir='source')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
A prediction can be made as follows:
input ="0.12787057, 1.0612601, -1.1081504"
predictor.predict(np.genfromtxt(StringIO(input), delimiter=",").reshape(1,3) )
I want to serve the model through a REST API, using an HTTP POST via Lambda and API Gateway. I was able to use invoke_endpoint() this way with an XGBoost model in SageMaker, but I am not sure what to send in the body for PyTorch.
client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(EndpointName=ENDPOINT,
                                  ContentType='text/csv',
                                  Body=???)
I believe I need to understand how to write a custom input_fn that accepts and processes the type of data I am able to send through invoke_endpoint. Am I on the right track, and if so, how could the input_fn be written to accept a CSV from invoke_endpoint?
Yes, you are on the right track. You can send CSV-serialized input to the endpoint without the predictor from the SageMaker SDK, using other SDKs such as boto3, which is installed in Lambda:
import json
import boto3

runtime = boto3.client('sagemaker-runtime')
payload = '0.12787057, 1.0612601, -1.1081504'
response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='text/csv',
    Body=payload.encode('utf-8'))
result = json.loads(response['Body'].read().decode())
This passes CSV-formatted input to the endpoint, which you may need to reshape in the input_fn to the dimensions the model expects.
For example:
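from io import StringIO
import numpy as np
import torch

def input_fn(request_body, request_content_type):
    if request_content_type == 'text/csv':
        # Parse the CSV row and reshape it to (1, 3), as in the question
        return torch.from_numpy(
            np.genfromtxt(StringIO(request_body), delimiter=',').reshape(1,3))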
Note: I wasn't able to test the specific input_fn above with your input content and shape, but I have used this approach with an sklearn RandomForest a couple of times, and judging by the PyTorch SageMaker serving docs the rationale above should work.
Don't hesitate to use the endpoint logs in CloudWatch to diagnose any inference error (available from the endpoint UI in the console); those logs are usually much more verbose than the high-level logs returned by the inference SDKs.
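For the Lambda side, a rough sketch of a handler that forwards the POST body to the endpoint. It assumes API Gateway is set up with proxy integration (so the body arrives as a string under event["body"]) and that the endpoint answers with text. The factory function and the "prediction" key are names I made up for illustration.

```python
import json


def make_handler(runtime_client, endpoint_name):
    """Build a Lambda handler that forwards a CSV body to a SageMaker endpoint."""

    def handler(event, context):
        # With API Gateway proxy integration the POST body arrives as a string.
        payload = event["body"]
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=payload.encode("utf-8"),
        )
        result = response["Body"].read().decode("utf-8")
        return {"statusCode": 200, "body": json.dumps({"prediction": result})}

    return handler
```

In the Lambda module you would then write something like `lambda_handler = make_handler(boto3.client("sagemaker-runtime"), os.environ["ENDPOINT_NAME"])`, creating the client once outside the handler.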
I have built an ML model (using the sklearn module), and I want to serve its predictions via AWS API Gateway + a Lambda function.
My problems are:
I can't install sklearn + numpy etc. because of the Lambda size limits (the bundle is larger than 140MB).
Maybe that's a silly question, but do you know of better ways to do this?
I've tried this tutorial, in order to reduce the bundle size. However, it raises an exception because of the --use-wheel flag.
https://serverlesscode.com/post/scikitlearn-with-amazon-linux-container/
bucket = s3.Bucket(os.environ['BUCKET'])
model_stream = bucket.Object(os.environ['MODEL_NAME'])
# pickle.loads needs bytes, so read the S3 object's body first
model = pickle.loads(model_stream.get()['Body'].read())
model.predict(z_features)[0]
where z_features are my features after applying a scaler.
Just figured it out!
The solution basically builds on AWS Lambda Layers.
I created an sklearn layer that contains only the relevant compiled libraries.
Then I ran sls package to pack a bundle containing those files together with my own handler.py code.
The last step was to run
sls deploy --package .serverless
Hope it'll be helpful to others.
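One addition for the handler.py side: since module-level state survives between invocations while a Lambda container stays warm, the model can be unpickled once and reused. A small sketch, where `read_bytes` is a hypothetical zero-argument callable that returns the pickled model, for example `lambda: bucket.Object(key).get()["Body"].read()`.

```python
import pickle

_MODEL = None  # survives between invocations while the Lambda container stays warm


def get_model(read_bytes):
    """Unpickle the model on first use and reuse it on later calls."""
    global _MODEL
    if _MODEL is None:
        _MODEL = pickle.loads(read_bytes())
    return _MODEL
```

Only the first request handled by a given container pays for the S3 download and the unpickling.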
If you simply want to serve your sklearn model, you could skip the hassle of setting up a lambda function and tinkering with the API Gateway - just upload your model as a pkl file to FlashAI.io which will serve your model automatically for free. It handles high traffic environments and unlimited inference requests. For sklearn models specifically, just check out the user guide and within 5 minutes you'll have your model available as an API.
Disclaimer: I'm the author of this service
I developed a machine learning model locally and wanted to deploy it as a web service using Azure Functions.
First the model was saved to a binary file using the pickle module and then uploaded to the blob storage associated with the Azure Function instance (Python, HTTP-trigger service).
Then, after installing all required modules, the following code was used to make a prediction on sample data:
import json
import os
import pickle
import time

postreqdata = json.loads(open(os.environ['req']).read())
with open('D:/home/site/wwwroot/functionName/model_test.pkl', 'rb') as txt:
    mod = txt.read()  # the with block closes the file on exit
model = pickle.loads(mod)
scs = [[postreqdata['var_1'], postreqdata['var_2']]]
prediction = model.predict_proba(scs)[0][1]
response = open(os.environ['res'], 'w')
response.write(str(prediction))
response.close()
where predict_proba is a method of the trained model, and the scs variable is defined to extract particular values (the values of the model's variables) from the POST request.
The whole code works fine: predictions are sent as a response and the values are correct, but execution after sending a request takes 150 seconds (locally it takes less than 1 s). Moreover, when I measured which part of the code takes so long, it turned out to be the pickle.loads(mod) line.
Do you have any idea why it takes so long? The model is very small (a few kB).
Thanks
When a call is made to AML, the first call must warm up the container. By default a web service has 20 containers. Each container starts cold, and a cold container can cause a large (30 sec) delay.
I suggest you refer to this thread, Azure Machine Learning Request Response latency, and this article about Azure ML performance.
Hope it helps you.
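To check whether the delay really sits in one step and whether it differs between the first and later calls, a small timing helper can be wrapped around each part of the function. This is just a sketch assuming Python 3; the names `timed` and `timings` are mine.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record how long the wrapped block takes, in seconds, under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
```

For example, `timings = {}` followed by `with timed("unpickle", timings): model = pickle.loads(mod)`, and then write `timings` into the response to compare a cold call with a warm one.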