I want to retrieve the pickle off my trained model, which I know is in the run file inside my experiments in Databricks.
It seems that the mlflow.pyfunc.load_model can only do the predict method.
Is there an option to directly access the pickle?
I also tried passing the artifact path from the run to pickle.load(path) (example of path: dbfs:/databricks/mlflow-tracking/20526156406/92f3ec23bf614c9d934dd0195/artifacts/model/model.pkl).
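For context on why pickle.load(path) fails: pickle.load expects an open binary file object, not a path string, and a dbfs:/ URI is not a local filesystem path. The sketch below shows the file-object form. The helper names load_pickle and dbfs_uri_to_local_path are hypothetical, and the /dbfs/ mapping assumes the cluster exposes DBFS through its local mount; whether the MLflow tracking directory is readable that way depends on the workspace.

```python
import pickle


def dbfs_uri_to_local_path(uri):
    """Map 'dbfs:/some/file' to '/dbfs/some/file' (assumes the DBFS local mount exists)."""
    prefix = "dbfs:/"
    if uri.startswith(prefix):
        return "/dbfs/" + uri[len(prefix):]
    return uri


def load_pickle(path):
    """Unpickle the object stored at a local path."""
    # pickle.load needs a file object opened in binary mode;
    # passing the path string itself raises a TypeError.
    with open(path, "rb") as f:
        return pickle.load(f)
```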
Use the framework's native load_model() method (e.g. mlflow.sklearn.load_model()) or download_artifacts().
I recently found a solution, which can be done by either of the following two approaches:
Use a customized predict function at the moment of saving the model (check the Databricks documentation for more details).
Example given by Databricks:
import mlflow.pyfunc

class AddN(mlflow.pyfunc.PythonModel):
    def __init__(self, n):
        self.n = n

    def predict(self, context, model_input):
        return model_input.apply(lambda column: column + self.n)

# Construct and save the model
model_path = "add_n_model"
add5_model = AddN(n=5)
mlflow.pyfunc.save_model(path=model_path, python_model=add5_model)
# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)
Load the model artefact after downloading it:
import pickle
from mlflow.tracking import MlflowClient
client = MlflowClient()
tmp_path = client.download_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path='model/model.pkl')
# Open the downloaded file in binary mode and unpickle the model
with open(tmp_path, 'rb') as f:
    model = pickle.load(f)
# List the run's artifacts to find the right artifact path
client.list_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="")
client.list_artifacts(run_id="0c7946c81fb64952bc8ccb3c7c66bca3", path="model")
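The download-then-unpickle steps above can be wrapped in a small helper. This is a minimal sketch: load_pickled_artifact is a hypothetical name, and it only assumes a client object exposing download_artifacts(run_id, path) as used in the snippet above. Newer MLflow releases also provide mlflow.artifacts.download_artifacts, which may be preferable depending on the installed version.

```python
import pickle


def load_pickled_artifact(client, run_id, artifact_path="model/model.pkl"):
    """Download one artifact from a run and unpickle it.

    `client` is assumed to behave like mlflow.tracking.MlflowClient, i.e. its
    download_artifacts(run_id, path) call returns a local file path.
    """
    local_path = client.download_artifacts(run_id, artifact_path)
    with open(local_path, "rb") as f:
        return pickle.load(f)
```

With a real client this would be called as load_pickled_artifact(MlflowClient(), run_id), where run_id is the ID of your own run.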
I am trying to create a wrapper function that allows my Data Scientists to log their models in MLflow.
This is what the function looks like,
def log_model(self, params, metrics, model, run_name, artifact_path, artifacts=None):
    with mlflow.start_run(run_name=run_name):
        run_id = mlflow.active_run().info.run_id
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        if model:
            mlflow.lightgbm.log_model(model, artifact_path=artifact_path)
        if artifacts:
            for artifact in artifacts:
                mlflow.log_artifact(artifact, artifact_path=artifact_path)
    return run_id
It can be seen here that the model is being logged as a lightgbm model, however, the model parameter that is passed into this function can be of any type.
How can I update this function, so that it will be able to log any kind of model?
As far as I know, there is no log_model function that comes with mlflow. It's always mlflow.<model_type>.log_model.
How can I go about handling this?
I was able to solve this using the following approach,
def log_model(model, artifact_path):
    model_class = get_model_class(model).split('.')[0]
    try:
        log_model = getattr(mlflow, model_class).log_model
        log_model(model, artifact_path)
    except AttributeError:
        logger.info('The log_model function is not available as expected!')

def get_model_class(model):
    klass = model.__class__
    module = klass.__module__
    if module == 'builtins':
        return klass.__qualname__
    return module + '.' + klass.__qualname__
From what I have seen, this will be able to handle most cases. The get_model_class() method will return the class used to develop the model and based on this, we can use the getattr() method to extract the relevant log_model() method.
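One caveat with this approach is that the top-level package name does not always match the MLflow flavor module; for example, PyTorch models live in the torch package but the flavor is mlflow.pytorch. A minimal sketch of the lookup with an alias table follows. FLAVOR_ALIASES, resolve_flavor_name and find_log_model are hypothetical names, the alias table is illustrative rather than complete, and the namespace argument stands in for the mlflow module so the logic can be tried without MLflow installed.

```python
from types import SimpleNamespace

# Illustrative, incomplete mapping from top-level package name to flavor name.
FLAVOR_ALIASES = {"torch": "pytorch"}


def resolve_flavor_name(model):
    """Return the flavor name guessed from the package that defines the model's class."""
    top_level = type(model).__module__.split(".")[0]
    return FLAVOR_ALIASES.get(top_level, top_level)


def find_log_model(namespace, model):
    """Look up namespace.<flavor>.log_model, or return None if it is missing."""
    flavor = getattr(namespace, resolve_flavor_name(model), None)
    return getattr(flavor, "log_model", None)


# Stand-in for the mlflow module, used only to exercise the lookup.
fake_mlflow = SimpleNamespace(
    pytorch=SimpleNamespace(log_model=lambda model, artifact_path: artifact_path)
)
```

Returning None instead of raising lets the caller decide on a fallback, for example wrapping the model in a custom mlflow.pyfunc.PythonModel.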
We are currently moving our models from single model endpoints to multi model endpoints within AWS SageMaker. After deploying the Multi Model Endpoint using prebuilt TensorFlow containers I receive the following error when calling the predict() method:
{"error": "JSON Parse error: The document root must not be followed by other value at offset: 17"}
I invoke the endpoint like this:
data = np.random.rand(n_samples, n_features)
predictor = Predictor(endpoint_name=endpoint_name)
prediction = predictor.predict(data=serializer.serialize(data), target_model=model_name)
My function for processing the input is the following:
def _process_input(data, context):
    data = data.read().decode('utf-8')
    data = [float(x) for x in data.split(',')]
    return json.dumps({'instances': [data]})
For the training I configured my container as follows:
tensorflow_container = TensorFlow(
entry_point=path_script,
framework_version='2.4',
py_version='py37',
instance_type='ml.m4.2xlarge',
instance_count=1,
role=EXECUTION_ROLE,
sagemaker_session=sagemaker_session,
hyperparameters=hyperparameters)
tensorflow_container.fit()
To deploy the endpoint, I first initialize a Model from a given Estimator and then a MultiDataModel:
model = estimator.create_model(
role=EXECUTION_ROLE,
image_uri=estimator.training_image_uri(),
entry_point=path_serving)
mdm = MultiDataModel(
name=endpoint_name,
model_data_prefix=dir_model_data,
model=model,
sagemaker_session=sagemaker.Session())
mdm.deploy(
initial_instance_count=1,
instance_type=instance_type,
endpoint_name=endpoint_name)
Afterwards the single models are added using:
mdm.add_model(
model_data_source=source_path,
model_data_path=model_name)
Thank you for any hints and help.
This issue usually occurs when the JSON data is damaged or malformed. I recommend running it through a JSON validator such as https://jsonlint.com/
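The same check can be done locally with the standard library before invoking the endpoint. The sketch below reuses the question's _process_input logic on an in-memory payload and verifies that the result parses as a single JSON document. is_single_json_document is a hypothetical helper, and io.BytesIO stands in for the request body the container would pass in. It only validates the handler's output and does not reproduce the SageMaker container itself.

```python
import io
import json


def _process_input(data, context=None):
    """Turn a CSV line such as '0.1,0.2' into a TensorFlow Serving request body."""
    text = data.read().decode("utf-8")
    values = [float(x) for x in text.split(",")]
    return json.dumps({"instances": [values]})


def is_single_json_document(payload):
    """Return True if payload parses as exactly one JSON value."""
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False
    return True


body = _process_input(io.BytesIO(b"0.1,0.2,0.3"))
print(is_single_json_document(body))                 # True
print(is_single_json_document('{"a": 1} {"b": 2}'))  # False: data follows the root value
```

If the handler's output passes this check, the malformed payload is more likely produced on the client side, for example by how the data is serialized before predict() is called.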
I work at AWS and my opinions are my own - Thanks,Raghu
When I try to save the PyTorch model with this piece of code:
checkpoint = {'model': Net(), 'state_dict': model.state_dict(),'optimizer' :optimizer.state_dict()}
torch.save(checkpoint, 'Checkpoint.pth')
I get the following error:
E:\PROGRAM FILES\Anaconda\envs\staj_projesi\lib\site-packages\torch\serialization.py:251: UserWarning: Couldn't retrieve source code for container of type Net. It won't be checked for correctness upon loading.
...
"type " + obj.__name__ + ". It won't be checked "
Can't pickle local object 'trainModel.<locals>.Net'
When I try to save the PyTorch model with this piece of code:
checkpoint = {'state_dict': model.state_dict(),'optimizer' :optimizer.state_dict()}
torch.save(checkpoint, 'Checkpoint.pth')
I don't get any errors, but I want to save the ANN class as well. How can I solve this problem? Also, I was able to save the model with the first structure in other projects before.
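You can't, at least not as written. torch.save uses pickle, and pickle cannot serialize a class defined inside a function (Net is local to trainModel), which is what the error message says. The recommended approach is to save only the state_dict().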
When you use the following:
checkpoint = {'model': Net(), 'state_dict': model.state_dict(),'optimizer' :optimizer.state_dict()}
torch.save(checkpoint, 'Checkpoint.pth')
Here you are trying to save the model object itself, but its learned parameters are already stored in model.state_dict(). When loading a model from a state_dict, you first instantiate a model object and then load the state into it.
This is exactly the reason why the second method works properly:
checkpoint = {'state_dict': model.state_dict(),'optimizer' :optimizer.state_dict()}
torch.save(checkpoint, 'Checkpoint.pth')
I would suggest reading the PyTorch docs on how to properly save/load a model at the following link:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
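The "Can't pickle local object" error can be reproduced without PyTorch, since torch.save relies on pickle by default. In this minimal sketch, Net is a plain class defined inside a function (mirroring trainModel.<locals>.Net), and a plain dict stands in for state_dict(); both are illustrative stand-ins, not PyTorch objects.

```python
import pickle


def train_model():
    # A class defined inside a function is a "local object" to pickle.
    class Net:
        def __init__(self):
            self.weights = [0.1, 0.2]

        def state_dict(self):
            # Stand-in for torch.nn.Module.state_dict(): plain data only.
            return {"weights": list(self.weights)}

    return Net()


model = train_model()

try:
    pickle.dumps(model)
    pickled_whole_object = True
except (AttributeError, pickle.PicklingError):
    # pickle cannot look up a class by name when it only exists inside a function.
    pickled_whole_object = False

# The state is plain data, so it round-trips without needing the class.
restored_state = pickle.loads(pickle.dumps(model.state_dict()))
```

Defining Net at module level (outside trainModel) removes the error, though saving the state_dict remains the approach the PyTorch tutorial recommends.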
Follow the usual way to save and load models (https://pytorch.org/tutorials/beginner/saving_loading_models.html). If you also have args or dicts you want to save, perhaps containing a lambda function, I sometimes use dill and the errors go away, e.g.:
def save_for_meta_learning(args, ckpt_filename='ckpt.pt'):
    if is_lead_worker(args.rank):
        import dill
        args.logger.save_current_plots_and_stats()
        # - ckpt
        assert uutils.xor(args.training_mode == 'epochs', args.training_mode == 'iterations')
        args_pickable = uutils.make_args_pickable(args)
        # args.meta_learner.args = args_pickable
        f: nn.Module = get_model_from_ddp(args.base_model)
        # pickle vs torch_uu.save https://discuss.pytorch.org/t/advantages-disadvantages-of-using-pickle-module-to-save-models-vs-torch-save/79016
        torch.save({'training_mode': args.training_mode,  # its or epochs
                    'it': args.it,
                    'epoch_num': args.epoch_num,
                    # 'args': args_pickable,
                    'args_pickable': args_pickable,
                    # 'meta_learner': args.meta_learner,
                    'meta_learner_str': str(args.meta_learner),
                    # 'f': f,
                    'f_state_dict': f.state_dict(),
                    'f_str': str(f),
                    # 'f_modules': f._modules,
                    # 'f_modules_str': str(f._modules),
                    'outer_opt_state_dict': args.outer_opt.state_dict()
                    },
                   pickle_module=dill,
                   f=args.log_root / ckpt_filename)
I am building a data transformation and training pipeline on Azure Machine Learning Service. I'd like to save my fitted transformer (e.g. tf-idf) to the blob, so my prediction pipeline can later access it.
transformed_data = PipelineData("transformed_data",
datastore = default_datastore,
output_path_on_compute="my_project/tfidf")
step_tfidf = PythonScriptStep(name = "tfidf_step",
script_name = "transform.py",
arguments = ['--input_data', blob_train_data,
'--output_folder', transformed_data],
inputs = [blob_train_data],
outputs = [transformed_data],
compute_target = aml_compute,
source_directory = project_folder,
runconfig = run_config,
allow_reuse = False)
The above code saves the transformer to a current run's folder, which is dynamically generated during each run.
I want to save the transformer to a fixed location on blob, so I can access it later, when calling a prediction pipeline.
I tried to use an instance of DataReference class as PythonScriptStep output, but it results in an error:
ValueError: Unexpected output type: <class 'azureml.data.data_reference.DataReference'>
It's because PythonScriptStep only accepts PipelineData or OutputPortBinding objects as outputs.
How could I save my fitted transformer so it's later accessible by any arbitrary process (e.g. my prediction pipeline)?
This is likely not flexible enough for your needs (also, I haven't tested this yet), but if you are using scikit-learn one possibility is to include the tf-idf/transformation step into a scikit-learn Pipeline object and register that into your workspace.
Your training script would thus contain:
pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(stop_words = list(text.ENGLISH_STOP_WORDS))),
    ('classifier', SGDClassifier())
])
pipeline.fit(train[label].values, train[pred_label].values)
# Serialize the pipeline
joblib.dump(value=pipeline, filename='outputs/model.pkl')
and your experiment submission script would contain
run = exp.submit(src)
run.wait_for_completion(show_output = True)
model = run.register_model(model_name='my_pipeline', model_path='outputs/model.pkl')
Then, you could use the registered "model" and deploy it as a service as explained in the documentation, by loading it into a scoring script via
model_path = Model.get_model_path('my_pipeline')
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
However this would bake the transformation in your pipeline, and thus would not be as modular as you ask...
Another option is to use DataTransferStep to copy the output to a "known location." This notebook has examples of using DataTransferStep to copy data from and to various supported datastores.
from azureml.core import Datastore
from azureml.data.data_reference import DataReference
from azureml.exceptions import ComputeTargetException
from azureml.core.compute import ComputeTarget, DataFactoryCompute
from azureml.pipeline.steps import DataTransferStep
blob_datastore = Datastore.get(ws, "workspaceblobstore")
blob_data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name="knownloaction",
    path_on_datastore="knownloaction")
data_factory_name = 'adftest'
def get_or_create_data_factory(workspace, factory_name):
    try:
        return DataFactoryCompute(workspace, factory_name)
    except ComputeTargetException as e:
        if 'ComputeTargetNotFound' in e.message:
            print('Data factory not found, creating...')
            provisioning_config = DataFactoryCompute.provisioning_configuration()
            data_factory = ComputeTarget.create(workspace, factory_name, provisioning_config)
            data_factory.wait_for_completion()
            return data_factory
        else:
            raise e
data_factory_compute = get_or_create_data_factory(ws, data_factory_name)
# Assuming output data is your output from the step that you want to copy
transfer_to_known_location = DataTransferStep(
name="transfer_to_known_location",
source_data_reference=[output_data],
destination_data_reference=blob_data_ref,
compute_target=data_factory_compute
)
from azureml.pipeline.core import Pipeline
from azureml.core import Workspace, Experiment
pipeline_01 = Pipeline(
description="transfer_to_known_location",
workspace=ws,
steps=[transfer_to_known_location])
pipeline_run_01 = Experiment(ws, "transfer_to_known_location").submit(pipeline_01)
pipeline_run_01.wait_for_completion()
Another solution is to pass DataReference as an input to your PythonScriptStep.
Then inside transform.py you're able to read this DataReference as a command line argument.
You can parse it and use it just as any regular path to save your vectorizer to.
E.g. you can:
step_tfidf = PythonScriptStep(name = "tfidf_step",
script_name = "transform.py",
arguments = ['--input_data', blob_train_data,
'--output_folder', transformed_data,
'--transformer_path', trained_transformer_path],
inputs = [blob_train_data, trained_transformer_path],
outputs = [transformed_data],
compute_target = aml_compute,
source_directory = project_folder,
runconfig = run_config,
allow_reuse = False)
Then inside your script (transform.py in the example above) you can e.g.:
import argparse
import joblib as jbl
import os
from sklearn.feature_extraction.text import TfidfVectorizer
parser = argparse.ArgumentParser()
parser.add_argument('--transformer_path', dest="transformer_path", required=True)
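# parse_known_args ignores the other arguments passed to the step (e.g. --input_data)
args, _ = parser.parse_known_args()
tfidf = TfidfVectorizer()  # create and fit your vectorizer here
vect_filename = os.path.join(args.transformer_path, 'my_vectorizer.jbl')
# Save the fitted vectorizer to the fixed location
jbl.dump(tfidf, vect_filename)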
EXTRA: The third way would be to just register the vectorizer as another model in your workspace. You can then use it exactly as any other registered model. (Though this option does not involve explicit writing to blob - as specified in the question above)
I have data in tf.example form and am attempting to make requests in predict form (using gRPC) to a saved model. I am unable to identify the method call to effect this.
I am starting with the well known Automobile pricing DNN regression model (https://github.com/tensorflow/models/blob/master/samples/cookbook/regression/dnn_regression.py) which I have already exported and mounted via the TF Serving docker container
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc
stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel("localhost:8500"))
tf_ex = tf.train.Example(
features=tf.train.Features(
feature={
'curb-weight': tf.train.Feature(float_list=tf.train.FloatList(value=[5.1])),
'highway-mpg': tf.train.Feature(float_list=tf.train.FloatList(value=[3.3])),
'body-style': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"wagon"])),
'make': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"Honda"])),
}
)
)
request = predict_pb2.PredictRequest()
request.model_spec.name = "regressor_test"
# Tried this:
request.inputs['inputs'].CopyFrom(tf_ex)
# Also tried this:
request.inputs['inputs'].CopyFrom(tf.contrib.util.make_tensor_proto(tf_ex))
# This doesn't work either:
request.input.example_list.examples.extend(tf_ex)
# If it did work, I would like to run inference on it like this:
result = stub.Predict(request, 10.0)
Thanks for any advice
I assume your SavedModel has a serving_input_receiver_fn that takes a string as input and parses it into tf.Example (see "Using SavedModel with Estimators"):
def serving_example_input_receiver_fn():
    serialized_tf_example = tf.placeholder(dtype=tf.string)
    receiver_tensors = {'inputs': serialized_tf_example}
    features = tf.parse_example(serialized_tf_example, YOUR_EXAMPLE_SCHEMA)
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
So serving_input_receiver_fn accepts a string, which means you have to call SerializeToString() on your tf.Example. Besides, serving_input_receiver_fn works like the input_fn used for training: data is fed into the model in batches.
The code could be changed to:
from tensorflow.core.framework import types_pb2
request = predict_pb2.PredictRequest()
request.model_spec.name = "regressor_test"
request.model_spec.signature_name = 'your method signature, check use saved_model_cli'
request.inputs['inputs'].CopyFrom(tf.make_tensor_proto([tf_ex.SerializeToString()], dtype=types_pb2.DT_STRING))
@hakunami's answer didn't work for me. But when I modify the last line to
request.inputs['inputs'].CopyFrom(tf.make_tensor_proto([tf_ex.SerializeToString()], dtype=types_pb2.DT_STRING, shape=[1]))
it works. If "shape" is None, the resulting tensor proto represents the numpy array precisely.