Receiving parse error from SageMaker Multi Model Endpoint using TensorFlow - python

We are currently moving our models from single-model endpoints to multi-model endpoints within AWS SageMaker. After deploying the multi-model endpoint using the prebuilt TensorFlow containers, I receive the following error when calling the predict() method:
{"error": "JSON Parse error: The document root must not be followed by other value at offset: 17"}
I invoke the endpoint like this:
import numpy as np
from sagemaker.predictor import Predictor

data = np.random.rand(n_samples, n_features)
predictor = Predictor(endpoint_name=endpoint_name)
prediction = predictor.predict(data=serializer.serialize(data), target_model=model_name)
My function for processing the input is the following:
import json

def _process_input(data, context):
    data = data.read().decode('utf-8')
    data = [float(x) for x in data.split(',')]
    return json.dumps({'instances': [data]})
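For comparison, a more defensive version of this handler (a sketch only, assuming the prebuilt TF Serving container's handler convention where context.request_content_type carries the request's content type) would branch on the content type instead of always re-encoding:

import json

def _process_input(data, context):
    # Sketch: branch on the content type exposed by the container's Context object.
    if context.request_content_type == 'application/json':
        # The body is already JSON; pass it through unchanged so it is not
        # wrapped a second time into something TF Serving cannot parse.
        return data.read().decode('utf-8')
    if context.request_content_type == 'text/csv':
        values = [float(x) for x in data.read().decode('utf-8').split(',')]
        return json.dumps({'instances': [values]})
    raise ValueError('Unsupported content type: {}'.format(context.request_content_type))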
For the training I configured my container as follows:
tensorflow_container = TensorFlow(
    entry_point=path_script,
    framework_version='2.4',
    py_version='py37',
    instance_type='ml.m4.2xlarge',
    instance_count=1,
    role=EXECUTION_ROLE,
    sagemaker_session=sagemaker_session,
    hyperparameters=hyperparameters)

tensorflow_container.fit()
For deploying the endpoint, I first initialize a Model from the given Estimator and then a MultiDataModel:
model = estimator.create_model(
    role=EXECUTION_ROLE,
    image_uri=estimator.training_image_uri(),
    entry_point=path_serving)

mdm = MultiDataModel(
    name=endpoint_name,
    model_data_prefix=dir_model_data,
    model=model,
    sagemaker_session=sagemaker.Session())

mdm.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name)
Afterwards, the individual models are added using:
mdm.add_model(
    model_data_source=source_path,
    model_data_path=model_name)
Thank you for any hints and help.

This issue usually occurs when the JSON data is damaged or malformed. I recommend running it through a JSON validator such as https://jsonlint.com/
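Following that suggestion, one quick check (a sketch reusing serializer and data from the question) is to serialize the payload locally and confirm it parses as a single JSON document before sending it to the endpoint:

import json

payload = serializer.serialize(data)  # the same value passed to predict() above
body = payload if isinstance(payload, str) else payload.decode('utf-8')
try:
    json.loads(body)  # raises if the body is not one valid JSON document
except json.JSONDecodeError as err:
    print('Payload is not valid JSON for the TF Serving REST API:', err)
    print('First 100 characters of the body:', body[:100])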
I work at AWS and my opinions are my own. Thanks, Raghu

Related

How can I retrieve the model.pkl from an experiment in Databricks

I want to retrieve the pickle of my trained model, which I know is stored in the run inside my experiments in Databricks.
It seems that mlflow.pyfunc.load_model only exposes the predict method.
Is there an option to access the pickle directly?
I also tried pickle.load(path) with the path from the run (example of path: dbfs:/databricks/mlflow-tracking/20526156406/92f3ec23bf614c9d934dd0195/artifacts/model/model.pkl).
Use the framework's native load_model() method (e.g. mlflow.sklearn.load_model()) or download_artifacts()
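As a sketch of that first suggestion (assuming the model was logged with mlflow.sklearn under the default 'model' artifact path; the run ID is a placeholder):

import mlflow.sklearn

run_id = "<your_run_id>"  # e.g. the run ID shown in the Databricks experiment UI
# Load the underlying scikit-learn estimator rather than the generic pyfunc
# wrapper, so the full sklearn API (not just predict) is available.
model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")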
I recently found a solution, which can be done with either of the following two approaches:
Use a customized predict function when saving the model (check the Databricks documentation for more details).
Example given by Databricks:
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

# Construct and save the model
model_path = "add_n_model"
add5_model = AddN(n=5)
mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)

# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)
Download the model artifact directly and load it:
import pickle
from mlflow.tracking import MlflowClient

client = MlflowClient()
# Download the pickled model from the run's artifact store to a local path
tmp_path = client.download_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path='model/model.pkl')
with open(tmp_path, 'rb') as f:
    model = pickle.load(f)

# Useful for discovering which artifact paths exist in a run
client.list_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="")
client.list_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="model")

Batch predictions with Google AutoML via Python

I'm pretty new to using Stack Overflow as well as the Google Cloud Platform, so apologies if I'm not asking this question in the right format. I am currently facing an issue with getting predictions from my model.
I've trained a multilabel AutoML model on the Google Cloud Platform and now I want to use that model to score new data entries.
Since the platform only allows one entry at a time, I want to use Python to do batch predictions.
I've stored my data entries in separate .txt files in a Google Cloud Storage bucket and created a .txt file listing the gs:// references to those files (as recommended in the documentation).
I've exported a .json file with my credentials from the service account and specified the IDs and paths in my code:
import os
from google.cloud import automl

# import API credentials and specify model / path references
path = 'xxx.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = path

model_name = 'xxx'
model_id = 'TCN1234567890'
project_id = '1234567890'

model_full_id = f"https://eu-automl.googleapis.com/v1/projects/{project_id}/locations/eu/models/{model_id}"
input_uri = f"gs://bucket_name/{model_name}/file_list.txt"
output_uri = f"gs://bucket_name/{model_name}/outputs/"

prediction_client = automl.PredictionServiceClient()
I then run the following code to get the predictions:
# score batch of file_list
gcs_source = automl.GcsSource(input_uris=[input_uri])
input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)

gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(gcs_destination=gcs_destination)

response = prediction_client.batch_predict(
    name=model_full_id,
    input_config=input_config,
    output_config=output_config
)

print("Waiting for operation to complete...")
print(f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}")
However, I'm getting the following error: InvalidArgument: 400 Request contains an invalid argument.
Would anyone have a hint what is causing this issue?
Any input would be appreciated! Thanks!
Found the issue!
I needed to point the client at the 'eu' regional endpoint first:

from google.api_core.client_options import ClientOptions

options = ClientOptions(api_endpoint='eu-automl.googleapis.com')
prediction_client = automl.PredictionServiceClient(client_options=options)
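Once the operation completes, the predictions are written under the output_uri prefix. A hedged sketch of reading them back (assuming the results land as JSONL files, and reusing the bucket name and prefix from the question):

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket("bucket_name")  # same bucket as in output_uri
for blob in bucket.list_blobs(prefix=f"{model_name}/outputs/"):
    if blob.name.endswith(".jsonl"):
        # Each result file contains one JSON prediction record per line.
        print(blob.name)
        print(blob.download_as_text())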

How to save your fitted transformer into blob, so your prediction pipeline can use it in AML Service?

I am building a data transformation and training pipeline on Azure Machine Learning Service. I'd like to save my fitted transformer (e.g. tf-idf) to blob storage, so my prediction pipeline can access it later.
transformed_data = PipelineData("transformed_data",
                                datastore=default_datastore,
                                output_path_on_compute="my_project/tfidf")

step_tfidf = PythonScriptStep(name="tfidf_step",
                              script_name="transform.py",
                              arguments=['--input_data', blob_train_data,
                                         '--output_folder', transformed_data],
                              inputs=[blob_train_data],
                              outputs=[transformed_data],
                              compute_target=aml_compute,
                              source_directory=project_folder,
                              runconfig=run_config,
                              allow_reuse=False)
The above code saves the transformer to the current run's folder, which is dynamically generated during each run.
I want to save the transformer to a fixed location in blob storage, so I can access it later when calling a prediction pipeline.
I tried to use an instance of the DataReference class as the PythonScriptStep output, but it results in an error:
ValueError: Unexpected output type: <class 'azureml.data.data_reference.DataReference'>
This is because PythonScriptStep only accepts PipelineData or OutputPortBinding objects as outputs.
How could I save my fitted transformer so it is later accessible by any arbitrary process (e.g. my prediction pipeline)?
This is likely not flexible enough for your needs (also, I haven't tested this yet), but if you are using scikit-learn one possibility is to include the tf-idf/transformation step into a scikit-learn Pipeline object and register that into your workspace.
Your training script would thus contain:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction import text
from sklearn.linear_model import SGDClassifier
import joblib

pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(stop_words=list(text.ENGLISH_STOP_WORDS))),
    ('classifier', SGDClassifier())
])
pipeline.fit(train[label].values, train[pred_label].values)
# Serialize the pipeline
joblib.dump(value=pipeline, filename='outputs/model.pkl')
and your experiment submission script would contain
run = exp.submit(src)
run.wait_for_completion(show_output = True)
model = run.register_model(model_name='my_pipeline', model_path='outputs/model.pkl')
Then, you could use the registered "model" and deploy it as a service as explained in the documentation, by loading it into a scoring script via
from azureml.core.model import Model
import joblib

model_path = Model.get_model_path('my_pipeline')
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
However, this would bake the transformation into your pipeline, so it would not be as modular as you ask...
Another option would be to use a DataTransferStep to copy the output to a "known location". This notebook has examples of using DataTransferStep to copy data from and to various supported datastores.
from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.exceptions import ComputeTargetException
from azureml.core.compute import ComputeTarget, DataFactoryCompute
from azureml.pipeline.steps import DataTransferStep

blob_datastore = Datastore.get(ws, "workspaceblobstore")

blob_data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name="knownlocation",
    path_on_datastore="knownlocation")

data_factory_name = 'adftest'

def get_or_create_data_factory(workspace, factory_name):
    try:
        return DataFactoryCompute(workspace, factory_name)
    except ComputeTargetException as e:
        if 'ComputeTargetNotFound' in e.message:
            print('Data factory not found, creating...')
            provisioning_config = DataFactoryCompute.provisioning_configuration()
            data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)
            data_factory.wait_for_completion()
            return data_factory
        else:
            raise e

data_factory_compute = get_or_create_data_factory(ws, data_factory_name)

# Assuming output_data is the output from the step that you want to copy
transfer_to_known_location = DataTransferStep(
    name="transfer_to_known_location",
    source_data_reference=[output_data],
    destination_data_reference=blob_data_ref,
    compute_target=data_factory_compute
)

from azureml.pipeline.core import Pipeline
from azureml.core import Workspace, Experiment

pipeline_01 = Pipeline(
    description="transfer_to_known_location",
    workspace=ws,
    steps=[transfer_to_known_location])

pipeline_run_01 = Experiment(ws, "transfer_to_known_location").submit(pipeline_01)
pipeline_run_01.wait_for_completion()
Another solution is to pass DataReference as an input to your PythonScriptStep.
Then inside transform.py you're able to read this DataReference as a command line argument.
You can parse it and use it just as any regular path to save your vectorizer to.
E.g. you can:
step_tfidf = PythonScriptStep(name="tfidf_step",
                              script_name="transform.py",
                              arguments=['--input_data', blob_train_data,
                                         '--output_folder', transformed_data,
                                         '--transformer_path', trained_transformer_path],
                              inputs=[blob_train_data, trained_transformer_path],
                              outputs=[transformed_data],
                              compute_target=aml_compute,
                              source_directory=project_folder,
                              runconfig=run_config,
                              allow_reuse=False)
Then inside your script (transform.py in the example above) you can e.g.:
import argparse
import os

import joblib as jbl
from sklearn.feature_extraction.text import TfidfVectorizer

parser = argparse.ArgumentParser()
parser.add_argument('--transformer_path', dest="transformer_path", required=True)
args = parser.parse_args()

tfidf = ### HERE CREATE AND TRAIN YOUR VECTORIZER ###

vect_filename = os.path.join(args.transformer_path, 'my_vectorizer.jbl')
jbl.dump(tfidf, vect_filename)
EXTRA: The third way would be to just register the vectorizer as another model in your workspace. You can then use it exactly as any other registered model. (Though this option does not involve explicit writing to blob - as specified in the question above)
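A minimal sketch of that EXTRA option, reusing exp and src from the submission snippet in the first answer, and assuming the training script dumps the vectorizer to outputs/my_vectorizer.jbl so the run can register it ('tfidf_vectorizer' is an arbitrary registered-model name):

run = exp.submit(src)
run.wait_for_completion(show_output=True)
# Register the serialized vectorizer as its own model in the workspace.
run.register_model(model_name='tfidf_vectorizer',
                   model_path='outputs/my_vectorizer.jbl')

# Later, any process with access to the workspace can fetch it:
from azureml.core.model import Model
vectorizer_path = Model.get_model_path('tfidf_vectorizer')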

How to send a tf.example into a TensorFlow Serving gRPC predict request

I have data in tf.Example form and am attempting to make predict requests (using gRPC) to a saved model. I am unable to identify the method call to accomplish this.
I am starting with the well-known automobile pricing DNN regression model (https://github.com/tensorflow/models/blob/master/samples/cookbook/regression/dnn_regression.py), which I have already exported and mounted via the TF Serving docker container.
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel("localhost:8500"))

tf_ex = tf.train.Example(
    features=tf.train.Features(
        feature={
            'curb-weight': tf.train.Feature(float_list=tf.train.FloatList(value=[5.1])),
            'highway-mpg': tf.train.Feature(float_list=tf.train.FloatList(value=[3.3])),
            'body-style': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"wagon"])),
            'make': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"Honda"])),
        }
    )
)

request = predict_pb2.PredictRequest()
request.model_spec.name = "regressor_test"

# Tried this:
request.inputs['inputs'].CopyFrom(tf_ex)

# Also tried this:
request.inputs['inputs'].CopyFrom(tf.contrib.util.make_tensor_proto(tf_ex))

# This doesn't work either:
request.input.example_list.examples.extend(tf_ex)

# If it did work, I would like to inference on it like this:
result = self.stub.Predict(request, 10.0)
Thanks for any advice
I assume your SavedModel has a serving_input_receiver_fn that takes a string as input and parses it to tf.Example. See Using SavedModel with Estimators.
def serving_example_input_receiver_fn():
    serialized_tf_example = tf.placeholder(dtype=tf.string)
    receiver_tensors = {'inputs': serialized_tf_example}
    features = tf.parse_example(serialized_tf_example, YOUR_EXAMPLE_SCHEMA)
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
So serving_input_receiver_fn accepts a serialized string, which means you have to call SerializeToString() on your tf.Example(). Besides, serving_input_receiver_fn works like the input_fn used in training: data is fed into the model in a batch fashion.
The code may change to:
from tensorflow.core.framework import types_pb2

request = predict_pb2.PredictRequest()
request.model_spec.name = "regressor_test"
request.model_spec.signature_name = 'your method signature, check with saved_model_cli'
request.inputs['inputs'].CopyFrom(
    tf.make_tensor_proto([tf_ex.SerializeToString()], dtype=types_pb2.DT_STRING))
#hakunami's answer didn't work for me. But when I modify the last line to
request.inputs['inputs'].CopyFrom(tf.make_tensor_proto([tf_ex.SerializeToString()], dtype=types_pb2.DT_STRING, shape=[1]))
it works. If shape is None, the resulting tensor proto represents the numpy array precisely.
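Putting the pieces of both answers together, a hedged end-to-end sketch of the request (the signature name is an assumption, confirm it with saved_model_cli; tf_ex is the tf.train.Example built in the question):

import grpc
import tensorflow as tf
from tensorflow.core.framework import types_pb2
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "regressor_test"
request.model_spec.signature_name = "serving_default"  # assumption: check with saved_model_cli
request.inputs['inputs'].CopyFrom(
    tf.make_tensor_proto([tf_ex.SerializeToString()],  # serialized tf.train.Example
                         dtype=types_pb2.DT_STRING,
                         shape=[1]))

result = stub.Predict(request, 10.0)  # 10-second deadline
print(result)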

Sagemaker "Could not find model data" when trying to deploy my model

I have a training script in SageMaker like this:
def train(current_host, hosts, num_cpus, num_gpus, channel_input_dirs, model_dir, hyperparameters, **kwargs):
    ... Train a network ...
    return net

def save(net, model_dir):
    # save the model
    logging.info('Saving model')
    y = net(mx.sym.var('data'))
    y.save('%s/model.json' % model_dir)
    net.collect_params().save('%s/model.params' % model_dir)

def model_fn(model_dir):
    symbol = mx.sym.load('%s/model.json' % model_dir)
    outputs = mx.symbol.softmax(data=symbol, name='softmax_label')
    inputs = mx.sym.var('data')
    param_dict = gluon.ParameterDict('model_')
    net = gluon.SymbolBlock(outputs, inputs, param_dict)
    net.load_params('%s/model.params' % model_dir, ctx=mx.cpu())
    return net
Most of which I stole from the MNIST Example.
When I train, everything goes fine, but when I try to deploy like this,
m = MXNet("lstm_trainer.py",
role=role,
train_instance_count=1,
train_instance_type="ml.c4.xlarge",
hyperparameters={'batch_size': 100,
'epochs': 20,
'learning_rate': 0.1,
'momentum': 0.9,
'log_interval': 100})
m.fit(inputs) # No errors
predictor = m.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
I get the following (full output):
INFO:sagemaker:Creating model with name: sagemaker-mxnet-py2-cpu-2018-01-17-20-52-52-599
---------------------------------------------------------------------------
... Stack dump ...
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://sagemaker-us-west-2-01234567890/sagemaker-mxnet-py2-cpu-2018-01-17-20-52-52-599/output/model.tar.gz.
Looking in my S3 bucket s3://sagemaker-us-west-2-01234567890/sagemaker-mxnet-py2-cpu-2018-01-17-20-52-52-599/output/model.tar.gz, I in fact don't see the model.
What am I missing?
When you call the training job, you should specify the output directory:
# Bucket location where results of model training are saved.
model_artifacts_location = 's3://<bucket-name>/artifacts'

m = MXNet(entry_point='lstm_trainer.py',
          role=role,
          output_path=model_artifacts_location,
          ...)
If you don't specify the output directory, the estimator will use a default location that it might not have permission to create or write to.
I have had the same issue using a different Estimator in a very similar way on SageMaker.
My issue was that after the first deployment, on a re-deploy I had to delete the old "Endpoint Configuration", which was confusingly pointing the endpoint to an old model location. I imagine this could be done from Python using the AWS API, although it is very easy to check in the console whether this is the same issue.
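If you want to clean that up programmatically rather than in the console, a hedged sketch with boto3 (the endpoint and configuration names are placeholders):

import boto3

sm = boto3.client('sagemaker')

# See which endpoint configuration the endpoint currently points at.
desc = sm.describe_endpoint(EndpointName='my-endpoint')
print(desc['EndpointConfigName'])

# Delete the stale configuration so a fresh deploy can recreate it.
sm.delete_endpoint_config(EndpointConfigName='old-endpoint-config')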
