I am working through Azure Databricks lab 04, "Integrating Azure Databricks and Azure Machine Learning" -> "2. Deploying Models in Azure Machine Learning". The idea is to (1) train a model, (2) deploy that model to an Azure Container Instance (ACI) in AML, and (3) make predictions via HTTPS. However, I get an error when deploying the model.
The full code from the notebook is displayed at the bottom or can be found here: https://adb-4934989010098757.17.azuredatabricks.net/?o=4934989010098757#notebook/4364513836468644/command/4364513836468645 .
I run the actual model deployment in the following way:
aci_service_name='nyc-taxi-service'
service = Model.deploy(workspace=ws,
                       name=aci_service_name,
                       models=[registered_model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)
print(service.state)
After running the model deployment, the cell runs for over 25 minutes and breaks when checking the status of the inference endpoint. It gives the following error:
Service deployment polling reached non-successful terminal state, current service state: Failed
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
The scoring script looks like this:
script_dir = 'scripts'
dbutils.fs.mkdirs(script_dir)
script_dir_path = os.path.join('/dbfs', script_dir)
print("Script directory path:", script_dir_path)
%%writefile $script_dir_path/score.py
import json
import numpy as np
import pandas as pd
import sklearn
import joblib
from azureml.core.model import Model
columns = ['passengerCount', 'tripDistance', 'hour_of_day', 'day_of_week',
           'month_num', 'normalizeHolidayName', 'isPaidTimeOff', 'snowDepth',
           'precipTime', 'precipDepth', 'temperature']

def init():
    global model
    model_path = Model.get_model_path('nyc-taxi-fare')
    model = joblib.load(model_path)
    print('model loaded')

def run(input_json):
    # Get predictions and explanations for each data point
    inputs = json.loads(input_json)
    data_df = pd.DataFrame(np.array(inputs).reshape(-1, len(columns)), columns = columns)
    # Make prediction
    predictions = model.predict(data_df)
    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist()}
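To see which payload shapes `run` accepts, the reshape step can be mimicked without the model. The sketch below is a plain-Python stand-in for `np.array(inputs).reshape(-1, len(columns))`; the helper name `to_rows` is hypothetical and not part of the lab. One difference to keep in mind: this version keeps the original Python types, whereas NumPy coerces a mixed list of numbers and strings to a single dtype.

```python
import json

columns = ['passengerCount', 'tripDistance', 'hour_of_day', 'day_of_week',
           'month_num', 'normalizeHolidayName', 'isPaidTimeOff', 'snowDepth',
           'precipTime', 'precipDepth', 'temperature']

def to_rows(input_json, width=len(columns)):
    """Split a JSON payload into rows of `width` values (hypothetical helper)."""
    inputs = json.loads(input_json)
    # Flatten one level so a single record and a list of records look the same.
    flat = [v for item in inputs
            for v in (item if isinstance(item, list) else [item])]
    if len(flat) % width != 0:
        raise ValueError("payload length is not a multiple of {}".format(width))
    return [flat[i:i + width] for i in range(0, len(flat), width)]

single = json.dumps([2, 5, 9, 4, 5, 'Memorial Day', True, 0, 0.0, 0.0, 65])
batch = json.dumps([[3, 10, 15, 4, 7, 'None', False, 0, 2.0, 1.0, 80],
                    [2, 5, 9, 4, 5, 'Memorial Day', True, 0, 0.0, 0.0, 65]])

print(len(to_rows(single)), len(to_rows(batch)))
```

Both a flat record and a list of records end up as rows of 11 values, which is what the `DataFrame` constructor in `run` expects.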
Does anyone know how I could fix this? Thanks in advance!
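The summary error does not say what failed inside `init()`. One way to get more detail is to read the container logs through the service object's `get_logs()` method. The sketch below assumes `service` is the object returned by `Model.deploy`; the helper name `describe_failed_service` and the `max_lines` parameter are made up for illustration.

```python
def describe_failed_service(service, max_lines=50):
    """Return the service state followed by the tail of its container logs.

    Only `service.state` and `service.get_logs()` are used, so any object
    exposing those two members works here.
    """
    logs = service.get_logs() or ""
    tail = logs.splitlines()[-max_lines:]
    return "state: {}\n{}".format(service.state, "\n".join(tail))
```

In the notebook this could be called as `print(describe_failed_service(service))` after the deployment has failed; a Python traceback from the scoring script, if there is one, would typically appear near the end of the logs.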
The full code is displayed below:
**Required Libraries**:
* `azureml-sdk[databricks]` via PyPI
* `sklearn-pandas==2.1.0` via PyPI
* `azureml-mlflow` via PyPI
import os
import numpy as np
import pandas as pd
import pickle
import sklearn
import joblib
import math
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn_pandas import DataFrameMapper
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import matplotlib
import matplotlib.pyplot as plt
import azureml
from azureml.core import Workspace, Experiment, Run
from azureml.core.model import Model
print('The azureml.core version is {}'.format(azureml.core.VERSION))
%md
### Connect to the AML workspace
%md
In the following cell, be sure to set the values for `subscription_id`, `resource_group`, and `workspace_name` as directed by the comments. Please note, you can copy the subscription ID and resource group name from the **Overview** page on the blade for the Azure ML workspace in the Azure portal.
#Provide the Subscription ID of your existing Azure subscription
subscription_id = " ..... "
#Replace the name below with the name of your resource group
resource_group = "RG_1"
#Replace the name below with the name of your Azure Machine Learning workspace
workspace_name = "aml-ws"
print("subscription_id:", subscription_id)
print("resource_group:", resource_group)
print("workspace_name:", workspace_name)
%md
**Important Note**: The text output below the cell will prompt you to log in. Be sure to navigate to the URL displayed and enter the code that is provided. Once you have entered the code, return to this notebook and wait for the output to read `Workspace configuration succeeded`.
*Also note that the sign-on link and code only appear the first time in a session. If an authenticated session is already established, you won't be prompted to enter the code and authenticate when creating an instance of the Workspace.*
ws = Workspace(subscription_id, resource_group, workspace_name)
print(ws)
print('Workspace region:', ws.location)
print('Workspace configuration succeeded')
%md
### Load the training data
In this notebook, we will be using a subset of NYC Taxi & Limousine Commission - green taxi trip records available from [Azure Open Datasets]( https://azure.microsoft.com/en-us/services/open-datasets/). The data is enriched with holiday and weather data. Each row of the table represents a taxi ride that includes columns such as number of passengers, trip distance, datetime information, holiday and weather information, and the taxi fare for the trip.
Run the following cell to load the table into a Spark dataframe and review the dataframe.
dataset = spark.sql("select * from nyc_taxi_1_csv").toPandas()
display(dataset)
%md
### Use MLflow with Azure Machine Learning for Model Training
In the subsequent cells you will learn to do the following:
- Set up MLflow tracking URI so as to use Azure ML
- Create MLflow experiment – this will create a corresponding experiment in Azure ML Workspace
- Train a model on Azure Databricks cluster while logging metrics and artifacts using MLflow
- Save the trained model to Databricks File System (DBFS)
import mlflow
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
experiment_name = 'MLflow-AML-Exercise'
mlflow.set_experiment(experiment_name)
print("Training model...")
output_folder = 'outputs'
model_file_name = 'nyc-taxi.pkl'
dbutils.fs.mkdirs(output_folder)
model_file_path = os.path.join('/dbfs', output_folder, model_file_name)
with mlflow.start_run() as run:
    df = dataset.dropna(subset=['totalAmount'])
    x_df = df.drop(['totalAmount'], axis=1)
    y_df = df['totalAmount']
    X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=0)
    numerical = ['passengerCount', 'tripDistance', 'snowDepth', 'precipTime', 'precipDepth', 'temperature']
    categorical = ['hour_of_day', 'day_of_week', 'month_num', 'normalizeHolidayName', 'isPaidTimeOff']
    numeric_transformations = [([f], Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])) for f in numerical]
    categorical_transformations = [([f], OneHotEncoder(handle_unknown='ignore', sparse=False)) for f in categorical]
    transformations = numeric_transformations + categorical_transformations
    clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations, df_out=True)),
                          ('regressor', GradientBoostingRegressor())])
    clf.fit(X_train, y_train)
    y_predict = clf.predict(X_test)
    y_actual = y_test.values.flatten().tolist()
    rmse = math.sqrt(mean_squared_error(y_actual, y_predict))
    mlflow.log_metric('rmse', rmse)
    mae = mean_absolute_error(y_actual, y_predict)
    mlflow.log_metric('mae', mae)
    r2 = r2_score(y_actual, y_predict)
    mlflow.log_metric('R2 score', r2)
    plt.figure(figsize=(10,10))
    plt.scatter(y_actual, y_predict, c='crimson')
    plt.yscale('log')
    plt.xscale('log')
    p1 = max(max(y_predict), max(y_actual))
    p2 = min(min(y_predict), min(y_actual))
    plt.plot([p1, p2], [p1, p2], 'b-')
    plt.xlabel('True Values', fontsize=15)
    plt.ylabel('Predictions', fontsize=15)
    plt.axis('equal')
    results_graph = os.path.join('/dbfs', output_folder, 'results.png')
    plt.savefig(results_graph)
    mlflow.log_artifact(results_graph)
    joblib.dump(clf, open(model_file_path,'wb'))
    mlflow.log_artifact(model_file_path)
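For reference, the three metrics logged in the training cell can be worked out by hand on a toy example. The sketch below uses plain Python instead of `sklearn.metrics`; the function name `regression_metrics` and the three data points are made up for illustration.

```python
import math

def regression_metrics(y_actual, y_predict):
    """Compute RMSE, MAE and R^2 for two equally long lists of numbers."""
    n = len(y_actual)
    errors = [p - a for a, p in zip(y_actual, y_predict)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_actual = sum(y_actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in y_actual)
    return {
        'rmse': math.sqrt(ss_res / n),
        'mae': sum(abs(e) for e in errors) / n,
        'r2': 1 - ss_res / ss_tot,
    }

# Errors are 2, -2 and 3, so ss_res = 17 and ss_tot = 200.
metrics = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
print(metrics)
```

Here RMSE is sqrt(17/3), MAE is 7/3, and R^2 is 1 - 17/200 = 0.915.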
%md
Run the cell below to list the experiment run in Azure Machine Learning Workspace that you just completed.
aml_run = list(ws.experiments[experiment_name].get_runs())[0]
aml_run
%md
## Exercise 1: Register a databricks-trained model in AML
Azure Machine Learning provides a Model Registry that acts like a version controlled repository for each of your trained models. To version a model, you use the SDK as follows. Run the following cell to register the model with Azure Machine Learning.
model_name = 'nyc-taxi-fare'
model_description = 'Model to predict taxi fares in NYC.'
model_tags = {"Type": "GradientBoostingRegressor",
              "Run ID": aml_run.id,
              "Metrics": aml_run.get_metrics()}
registered_model = Model.register(model_path=model_file_path, #Path to the saved model file
                                  model_name=model_name,
                                  tags=model_tags,
                                  description=model_description,
                                  workspace=ws)
print(registered_model)
%md
## Exercise 2: Deploy a service that uses the model
%md
### Create the scoring script
script_dir = 'scripts'
dbutils.fs.mkdirs(script_dir)
script_dir_path = os.path.join('/dbfs', script_dir)
print("Script directory path:", script_dir_path)
%%writefile $script_dir_path/score.py
import json
import numpy as np
import pandas as pd
import sklearn
import joblib
from azureml.core.model import Model
columns = ['passengerCount', 'tripDistance', 'hour_of_day', 'day_of_week',
           'month_num', 'normalizeHolidayName', 'isPaidTimeOff', 'snowDepth',
           'precipTime', 'precipDepth', 'temperature']

def init():
    global model
    model_path = Model.get_model_path('nyc-taxi-fare')
    model = joblib.load(model_path)
    print('model loaded')

def run(input_json):
    # Get predictions and explanations for each data point
    inputs = json.loads(input_json)
    data_df = pd.DataFrame(np.array(inputs).reshape(-1, len(columns)), columns = columns)
    # Make prediction
    predictions = model.predict(data_df)
    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist()}
%md
### Create the deployment environment
from azureml.core import Environment
from azureml.core.environment import CondaDependencies
my_env_name="nyc-taxi-env"
myenv = Environment.get(workspace=ws, name='AzureML-Minimal').clone(my_env_name)
conda_dep = CondaDependencies()
conda_dep.add_pip_package("numpy==1.18.1")
conda_dep.add_pip_package("pandas==1.1.5")
conda_dep.add_pip_package("joblib==0.14.1")
conda_dep.add_pip_package("scikit-learn==0.24.1")
conda_dep.add_pip_package("sklearn-pandas==2.1.0")
conda_dep.add_pip_package("azure-ml-api-sdk")
myenv.python.conda_dependencies=conda_dep
print("Review the deployment environment.")
myenv
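A pickled scikit-learn pipeline is generally only safe to load with the library versions it was saved with, so a mismatch between the training cluster and the pins above is one possible reason for `joblib.load` failing inside `init()`. This is a guess, not a confirmed diagnosis. The sketch below prints the versions installed in the current Python environment so they can be compared with the pins; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# The same distributions that are pinned in the deployment environment.
pins = ["numpy", "pandas", "joblib", "scikit-learn", "sklearn-pandas"]
for name, version in installed_versions(pins).items():
    print(name, version)
```

Run on the Databricks cluster that trained the model, this shows which versions the pickle was created with.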
%md
### Create the inference configuration
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script='score.py', source_directory=script_dir_path, environment=myenv)
print("InferenceConfig created.")
%md
### Create the deployment configuration
In this exercise we will use the Azure Container Instance (ACI) to deploy the model
from azureml.core.webservice import AciWebservice, Webservice
description = 'NYC Taxi Fare Predictor Service'
aci_config = AciWebservice.deploy_configuration(
    cpu_cores=3,
    memory_gb=15,
    location='eastus',
    description=description,
    auth_enabled=True,
    tags = {'name': 'ACI container',
            'model_name': registered_model.name,
            'model_version': registered_model.version
            }
    )
print("AciWebservice deployment configuration created.")
%md
### Deploy the model as a scoring webservice
Please note that it can take **10-15 minutes** for the deployment to complete.
aci_service_name='nyc-taxi-service'
service = Model.deploy(workspace=ws,
                       name=aci_service_name,
                       models=[registered_model],
                       inference_config=inference_config,
                       deployment_config=aci_config,
                       overwrite=True)
service.wait_for_deployment(show_output=True)
print(service.state)
%md
## Exercise 3: Consume the deployed service
%md
**Review the webservice endpoint URL and API key**
api_key, _ = service.get_keys()
print("Deployed ACI test Webservice: {} \nWebservice Uri: {} \nWebservice API Key: {}".
      format(service.name, service.scoring_uri, api_key))
%md
**Prepare test data**
#['passengerCount', 'tripDistance', 'hour_of_day', 'day_of_week', 'month_num',
# 'normalizeHolidayName', 'isPaidTimeOff', 'snowDepth', 'precipTime', 'precipDepth', 'temperature']
data1 = [2, 5, 9, 4, 5, 'Memorial Day', True, 0, 0.0, 0.0, 65]
data2 = [[3, 10, 15, 4, 7, 'None', False, 0, 2.0, 1.0, 80],
         [2, 5, 9, 4, 5, 'Memorial Day', True, 0, 0.0, 0.0, 65]]
print("Test data prepared.")
dataset.head()
%md
### Consume the deployed webservice over HTTP
import requests
import json
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}
response = requests.post(service.scoring_uri, json.dumps(data1), headers=headers)
print('Predictions for data1')
print(response.text)
print("")
response = requests.post(service.scoring_uri, json.dumps(data2), headers=headers)
print('Predictions for data2')
print(response.text)
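The request sent in the cell above is a POST with a JSON body and a `Bearer` key in the `Authorization` header. As a sketch, the same request can be built with the standard library and inspected without sending it. The URL and key below are placeholders, not a real endpoint, and the helper name `build_scoring_request` is hypothetical.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, api_key, rows):
    """Build (but do not send) a POST request carrying `rows` as JSON."""
    body = json.dumps(rows).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": "Bearer " + api_key}
    return urllib.request.Request(scoring_uri, data=body,
                                  headers=headers, method="POST")

# Placeholder values for illustration only.
req = build_scoring_request("http://example.invalid/score", "dummy-key",
                            [2, 5, 9, 4, 5, 'Memorial Day', True, 0, 0.0, 0.0, 65])
print(req.get_method(), req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it; unlike `requests.post`, `urlopen` raises an `HTTPError` for 4xx and 5xx responses, which makes a failed call harder to overlook.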
%md
### Clean-up
When you are done with the exercise, delete the deployed webservice by running the cell below.
service.delete()
print("Deployed webservice deleted.")
I am trying to serve a tensorflow.keras.Model in a Flask + nginx + uwsgi application, using Tensorflow v1.14.
I load the model in the constructor of a class named Prediction in my Flask application factory function and save the graph as an attribute of the Flask app, as suggested here.
Then I run the prediction by calling a method Prediction.process in a route named _process of my Flask app, but it gets stuck during the call of tf.keras.Model.predict (self.model.summary() in predict.py is executed, i.e. the summary is shown, but not print("Never gets here :(")).
If I initialize my class Prediction in _process (which I want to avoid, so that the model is not loaded for every prediction), everything works fine.
If I use the Flask server, it works fine, too. So it seems to be related to the uwsgi config.
Any suggestions?
__init__.py
def create_app():
    app = Flask(__name__)
    #(...)
    app.register_blueprint(bp)
    load_tf_model(app)
    return app

def load_tf_model(app):
    sess = tf.Session(graph=tf.Graph())
    app.sess = sess
    with sess.graph.as_default():
        weights = os.path.join(app.static_folder, 'weights/model.32-0.81.h5')
        app.prediction = Prediction(weights)
predict.py
class Prediction:
    def __init__(self, weights):
        # build model and set weights
        inputs = tf.keras.Input(shape=SHAPE, batch_size=1)
        outputs = simple_cnn.build_model(inputs, N_CLASSES)
        self.model = tf.keras.Model(inputs=inputs, outputs=outputs)
        self.model.load_weights(weights)
        self.model._make_predict_function()
        # create TF mel extractor
        self.melspec_ex = tf_feature_utils.MelSpectrogram()

    def process(self, audio, sr):
        # compute features (in NCHW format) and labels
        data = audio2data(
            audio,
            sr,
            class_list=np.arange(N_CLASSES))
        features = np.asarray([d[0] for d in data])
        features = tf.reshape(features, (features.shape[0], 1, features.shape[1], features.shape[2]))
        labels = np.asarray([d[1] for d in data])
        # make tf.data.Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(1)
        dataset = dataset.map(lambda data, labels: (
            tf.expand_dims(self.melspec_ex.process(tf.squeeze(data, axis=[1,2])), 1)))
        # show model (debug)
        self.model.summary()
        # run prediction
        predictions = self.model.predict(dataset)
        print("Never gets here :(")
        # integrate predictions over time
        return np.mean(predictions, axis=0)
routes.py
@bp.route('/_process', methods=['POST'])
def _process():
    with current_app.graph.as_default():
        # load audio
        filepath = session['filepath']
        audio, sr = librosa.load(filepath)
        # predict
        predictions = current_app.prediction.process(audio, sr)
        # delete file
        os.remove(filepath)
        return jsonify(prob=predictions.tolist())
It was a threading issue. I had to configure uwsgi with the following options:
master = false
processes = 1
cheaper = 0
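Separately from the uwsgi settings, the goal stated in the question (load the model once instead of on every request) can be illustrated with a small caching sketch. It is generic Python, not the TensorFlow code from the question: `load_model` is a hypothetical stand-in for the expensive load, and the cache lives per process, so each worker process would load its own copy.

```python
import functools

load_count = 0

def load_model():
    """Hypothetical stand-in for an expensive model load."""
    global load_count
    load_count += 1
    return {"name": "dummy-model"}

@functools.lru_cache(maxsize=None)
def get_model():
    # The first call runs load_model(); later calls return the cached object.
    return load_model()

first = get_model()
second = get_model()
print(load_count, first is second)
```

A route handler would call `get_model()` on each request and only pay the loading cost the first time.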
I wanted to share my findings on how to export a TF model for serving directly from a session, without creating a model checkpoint. My use case requires minimal time to create a pb file, so I wanted to get a model.pb file directly from the session.
Most examples online (and the documentation) refer to the common case of creating a model checkpoint and loading it in order to create a TF Serving (pb) file. Of course, that approach is fine when export time is not an issue.
import os  # needed for os.unlink below

import tensorflow as tf
from tensorflow.python.framework import importer

output_path = '/export_directory'  # be sure to create it before export
input_ops = ['name/s_of_model_input/s']
output_ops = ['name/s_of_model_output/s']
session = tf.compat.v1.Session()

def get_ops_dict(ops, graph, name='op_'):
    out_dict = dict()
    for i, op in enumerate(ops):
        out_dict[name + str(i)] = tf.compat.v1.saved_model.build_tensor_info(graph.get_tensor_by_name(op + ':0'))
    return out_dict

def add_meta_graph(pbtxt_tmp_path, graph_def):
    with tf.Graph().as_default() as graph:
        importer.import_graph_def(graph_def, name="")
        os.unlink(pbtxt_tmp_path)
        # used to rename model input/outputs
        inputs_dict = get_ops_dict(input_ops, graph, name='input_')
        outputs_dict = get_ops_dict(output_ops, graph, name='output_')
        prediction_signature = (
            tf.compat.v1.saved_model.signature_def_utils.build_signature_def(
                inputs=inputs_dict,
                outputs=outputs_dict,
                method_name=tf.saved_model.PREDICT_METHOD_NAME))
        legacy_init_op = tf.group(tf.compat.v1.tables_initializer(), name='legacy_init_op')
        builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(output_path+'/export')
        builder.add_meta_graph_and_variables(
            session,
            tags=[tf.saved_model.SERVING],
            signature_def_map={
                tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature},
            legacy_init_op=legacy_init_op)
        builder.save()
        return prediction_signature

def export_model(session, output_path, output_ops):
    graph_def = session.graph_def
    tf.io.write_graph(graph_or_graph_def=graph_def, logdir=output_path,
                      name='model.pbtxt', as_text=False)
    frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
        session, graph_def, output_ops)
    prediction_signature = add_meta_graph(output_path+'/model.pbtxt', frozen_graph_def)
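The naming scheme used by `get_ops_dict` can be seen without TensorFlow: each key is the prefix plus the position of the op, and each op name gets `:0` appended to address its first output tensor. The sketch below mirrors only that naming; `name_ops` is a hypothetical helper that returns tensor names as strings instead of `TensorInfo` objects, and the op names are made up.

```python
def name_ops(ops, prefix="op_"):
    """Map 'prefix + index' to the op's first output tensor name."""
    return {prefix + str(i): op + ":0" for i, op in enumerate(ops)}

print(name_ops(["images", "labels"], prefix="input_"))
```

This prints `{'input_0': 'images:0', 'input_1': 'labels:0'}`, which is the shape of the `inputs_dict` that is passed to the signature definition.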
I would like to save my trained Tensorflow model, so it can be deployed by restoring the model file (I'm following this example, which seems to make sense). To do this, however, I need to have named tensors, so that I can reload the variables with something like:
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("my_tensor:0")
I am queuing images from a list of filenames using string_input_producer (code below), but how do I name the tensors so that I can reload them at a later stage?
import tensorflow as tf

flags = tf.app.flags
conf = flags.FLAGS

class ImageDataSet(object):
    def __init__(self, img_list_path, num_epoch, batch_size):
        # Build the record list queue
        input_file = open(img_list_path, 'r')  # was images_list_path, which is undefined
        self.record_list = []
        for line in input_file:
            line = line.strip()
            self.record_list.append(line)
        filename_queue = tf.train.string_input_producer(self.record_list, num_epochs=num_epoch)
        image_reader = tf.WholeFileReader()
        _, image_file = image_reader.read(filename_queue)
        image = tf.image.decode_jpeg(image_file, conf.img_colour_channels)
        # preprocess
        # ...
        min_after_dequeue = 1000
        capacity = min_after_dequeue + 400 * batch_size
        self.images = tf.train.shuffle_batch(image, batch_size=batch_size, capacity=capacity,
                                             min_after_dequeue=min_after_dequeue)
I assume that you want to restore the graph for testing or deploying.
For these purposes, you can edit your graph by inserting a placeholder as the entry point for the testing data.
To edit the graph, you can use TF's graph editor, or build a new graph with a placeholder and save it.