I am currently in the process of trying to deploy a Keras Convolutional Neural Network for a webservice.
I had tried converting my saved keras hdf5 model to a tensorflow.js model and deploying that but it slowed down the client-side app as the model is relatively robust and thus, takes a large amount of space in the client memory.
Thus, I am trying to figure out a way to deploy the model in the cloud and make predictions through a request from the web-app with an image, and then receive a response which holds the prediction tensor. I know that gcloud may have some similar abilities or feature but I am unsure of how to get started.
Essentially, I am asking if there is any service that will allow me to deploy a pre-trained and saved convolutional neural network model to which I can send images in a request and use the model to return a predicted tensor?
You can export a trained Keras model and serve it with TensorFlow Serving. TF Serving allows to host models and call them via either gRPC or REST requests. You could deploy a flask app with an endpoint that accepts an image, wraps it as a payload and calls your model via the requests module.
Your code to export the model as a servable would look like this:
import tensorflow as tf
# The export path contains the name and the version of the model
model = keras.models.load_model('./mymodel.h5')
# Feth the Keras session and save the model
with keras.backend.get_session() as sess:
tf.saved_model.simple_save(
sess,
export_dir,
inputs={'images': model.input},
outputs={t.name:t for t in model.outputs})
This will store the files necessary for TF Serving. From this directory, you can host the model as follows:
tensorflow_model_server --model_base_path=$(pwd) --rest_api_port=9000 --model_name=MyModel
Your request would then look like this:
requests.post('http://ip:9000/v1/models/MyModel:predict', json=payload)
Where payload is a dictionary that contains your request image.
If you want a click-to-deploy solution for serving your model on Google Cloud, consider using Cloud ML Engine's Online Prediction service. First, follow the instructions in #sdcbr's response to export your SavedModel. Copy the model to GCS then you simply create a model and a version:
gcloud ml-engine models create "my_image_model"
gcloud ml-engine versions create "v1"\
--model "my_image_model" --origin $DEPLOYMENT_SOURCE
Or, even easier, use Cloud Console to do the above with a few clicks!
You will get a serverless REST endpoint that includes authentication and authorization, autoscaling (including scale to zero) as well as logging and monitoring, without having to write or maintain a line of code.
Related
I have saved and loaded model(IMAGE CLASSIFICATION) which is in pickle format using mlflow.
I have served model using mlflow model serve and REST endpoint has been exposed, i am getting reponse while providing image in numpy arrat,but it is not accepting the image directly , i need to apple the preprocessor befor calling the predict method,
Can anyone tell me how can i achieve this
I have following this tutorial, which is mainly for jupyter notebook, and made some minimal modification for external processing. I've created a project that could prepare my dataset locally, upload it to S3, train, and finally deploy the model predictor to the same bucket. Perfect!
So, after to train and saved it in S3 bucket:
ss_model.fit(inputs=data_channels, logs=True)
it failed while deploying as an endpoint. So, I have found tricks to host an endpoint in many ways, but not from a model already saved in S3. Because in order to host, you probably need to get the estimator, which in normal way is something like:
self.estimator = sagemaker.estimator.Estimator(self.training_image,
role,
train_instance_count=1,
train_instance_type='ml.p3.2xlarge',
train_volume_size=50,
train_max_run=360000,
output_path=output,
base_job_name='ss-training',
sagemaker_session=sess)
My question is: is there a way to load an estimator from a model saved in S3 (.tar)? Or, anyway, to create an endpoint without train it again?
So, after to run on many pages, just found a clue here. And I finally found out how to load the model and create the endpoint:
def create_endpoint(self):
sess = sagemaker.Session()
training_image = get_image_uri(sess.boto_region_name, 'semantic-segmentation', repo_version="latest")
role = "YOUR_ROLE_ARN_WITH_SAGEMAKER_EXECUTION"
model = "s3://BUCKET/PREFIX/.../output/model.tar.gz"
sm_model = sagemaker.Model(model_data=model, image=training_image, role=role, sagemaker_session=sess)
sm_model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')
Please, do not forget to disable your endpoint after using. This is really important! Endpoints are charged by "running" not only by the use
I hope it also can help you out!
Deploy the model using the following code
model = sagemaker.Model(role=role,
model_data=### s3 location of tar.gz file,
image_uri=### the inference image uri,
sagemaker_session=sagemaker_session,
name=## model name)
model_predictor = model.deploy(initial_instance_count=1,
instance_type=instance_type)
Initialize the predictor
model_predictor = sagemaker.Predictor(endpoint_name= model.endpoint_name)
Finally predict using
model_predictor.predict(##your payload)
I have trained my Keras model in google colab and saved the model as model.h5 file in google drive. I made the model.5 file accessible through a public URL. Now I want to load the model with load_model() from this URL. I am getting an error *File system scheme 'https' not implemented
Here is my code.
model=load_model(url)
It would great if I could get a possible solution for this.
I have trained and deployed a model in Pytorch with Sagemaker. I am able to call the endpoint and get a prediction. I am using the default input_fn() function (i.e. not defined in my serve.py).
model = PyTorchModel(model_data=trained_model_location,
role=role,
framework_version='1.0.0',
entry_point='serve.py',
source_dir='source')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
A prediction can be made as follows:
input ="0.12787057, 1.0612601, -1.1081504"
predictor.predict(np.genfromtxt(StringIO(input), delimiter=",").reshape(1,3) )
I want to be able to serve the model with REST API and am HTTP POST using lambda and API gateway. I was able to use invoke_endpoint() for this with an XGBOOST model in Sagemaker this way. I am not sure what to send into the body for Pytorch.
client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(EndpointName=ENDPOINT ,
ContentType='text/csv',
Body=???)
I believe I need to understand how to write the customer input_fn to accept and process the type of data I am able to send through invoke_client. Am I on the right track and if so, how could the input_fn be written to accept a csv from invoke_endpoint?
Yes you are on the right track. You can send csv-serialized input to the endpoint without using the predictor from the SageMaker SDK, and using other SDKs such as boto3 which is installed in lambda:
import boto3
runtime = boto3.client('sagemaker-runtime')
payload = '0.12787057, 1.0612601, -1.1081504'
response = runtime.invoke_endpoint(
EndpointName=ENDPOINT_NAME,
ContentType='text/csv',
Body=payload.encode('utf-8'))
result = json.loads(response['Body'].read().decode())
This will pass to the endpoint a csv-formatted input, that you may need to reshape back in the input_fn to put in the appropriate dimension expected by the model.
for example:
def input_fn(request_body, request_content_type):
if request_content_type == 'text/csv':
return torch.from_numpy(
np.genfromtxt(StringIO(request_body), delimiter=',').reshape(1,3))
Note: I wasn't able to test the specific input_fn above with your input content and shape but I used the approach on Sklearn RandomForest couple times, and looking at the Pytorch SageMaker serving doc the above rationale should work.
Don't hesitate to use endpoint logs in Cloudwatch to diagnose any inference error (available from the endpoint UI in the console), those logs are usually much more verbose that the high-level logs returned by the inference SDKs
I have built an ML model (using the sklearn module), and I want to serve it predictions via AWS API Gateway + Lambda function.
My problems are:
I can't install sklearn + numpy etc. because of the lambda capacity limitations. (the bundle is greater than 140MB)
Maybe that a silly question, but, do you know if there are better ways to do that task?
I've tried this tutorial, in order to reduce the bundle size. However, it raises an exception because of the --use-wheel flag.
https://serverlesscode.com/post/scikitlearn-with-amazon-linux-container/
bucket = s3.Bucket(os.environ['BUCKET'])
model_stream = bucket.Object(os.environ['MODEL_NAME'])
model = pickle.loads(model_stream)
model.predict(z_features)[0]
Where z_features are my features after using a scalar
Just figure it out!
The solution basically stands on top of the AWS Lambda Layers.
I created a sklearn layer that contains only the relevant compiled libraries.
Then, I run sls package in order to pack a bundle which contains those files together with my own handler.py code.
The last step was to run
sls deploy --package .serverless
Hope it'll be helpful to others.
If you simply want to serve your sklearn model, you could skip the hassle of setting up a lambda function and tinkering with the API Gateway - just upload your model as a pkl file to FlashAI.io which will serve your model automatically for free. It handles high traffic environments and unlimited inference requests. For sklearn models specifically, just check out the user guide and within 5 minutes you'll have your model available as an API.
Disclaimer: I'm the author of this service