I am trying to get predictions from a TensorFlow custom prediction routine served on AI Platform.
I have managed to deploy it with the following settings: --runtime-version 2.3 --python-version 3.7 --machine-type mls1-c4-m2
But I keep getting this error when I try to make any predictions:
ERROR:root:Prediction failed: predict() got an unexpected keyword argument 'stats'
ERROR:root:Prediction failed: unknown error.
The routine has two steps:
Takes the input (a string) and transforms it into an embedding using a bag-of-words (BoW) model stored in .pkl format
Uses the embedding to get predictions from a Keras model saved as an .h5 file
This is my setup.py:
from setuptools import setup, find_packages

REQUIRED_PACKAGES = ['Keras==2.3.1', 'sklearn==0.0', 'h5py<3.0.0', 'numpy==1.16.0', 'scipy==1.4.1', 'pyyaml==5.2']

setup(
    name='my_custom_code',
    version='0.1',
    scripts=['predictor.py'],
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=False,
    description=''
)
And this is my predictor.py:
import os
import pickle
import tensorflow as tf
import numpy as np

class MyPredictor(object):
    def __init__(self, model, bow_model):
        self._model = model
        self._bow_model = bow_model

    def predict(self, instances):
        outputs = []
        for x in instances:
            vector = self.embedding(x)
            output = self._model.predict(vector)
            outputs.append(output)
        return outputs

    def embedding(self, statement):
        vector = self._bow_model.transform(statement).toarray()
        vector = vector.to_list()
        return vector

    @classmethod
    def from_path(cls, model_dir):
        model_path = os.path.join(model_dir, 'model.h5')
        model = tf.keras.models.load_model(model_path, compile=False)
        preprocessor_path = os.path.join(model_dir, 'bow.pkl')
        with open(preprocessor_path, 'rb') as f:
            bow_model = pickle.load(f)
        return cls(model, bow_model)
The script I'm using for testing is:
import googleapiclient.discovery

instances = ['test', 'test']

service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID, MODEL_NAME, VERSION_NAME)

response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()

if 'error' in response:
    raise RuntimeError(response['error'])
else:
    print(response['predictions'])
According to the custom prediction routine documentation, when you create the Predictor class, the predict() method must be defined with self, instances, and **kwargs so it can properly handle the prediction request:
instances: A list of prediction input instances.
**kwargs: A dictionary of keyword args provided as additional fields on the predict request body.
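A minimal sketch of the corrected signature, assuming the rest of the posted class stays the same:

class MyPredictor(object):
    ...
    def predict(self, instances, **kwargs):
        # AI Platform passes extra request fields (such as 'stats' in the error above)
        # as keyword arguments; accepting **kwargs avoids the TypeError
        outputs = []
        for x in instances:
            vector = self.embedding(x)
            outputs.append(self._model.predict(vector))
        return outputs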
I defined the following custom metric to train my model in TensorFlow:
import tensorflow as tf
from tensorflow import keras as ks

N_CLASSES = 15

class MulticlassMeanIoU(tf.keras.metrics.MeanIoU):
    def __init__(self,
                 y_true=None,
                 y_pred=None,
                 num_classes=None,
                 name="Multi_MeanIoU",
                 dtype=None):
        super(MulticlassMeanIoU, self).__init__(num_classes=num_classes,
                                                name=name, dtype=dtype)
        self.__name__ = name

    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "num_classes": self.num_classes}

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.math.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight)

met = MulticlassMeanIoU(num_classes=N_CLASSES)
After training the model, I save it and I also tried to save the custom metric object as follows:
with open("/some/path/custom_metrics.pkl", "wb") as f:
pickle.dump(met, f)
However, when I try to load the metric like this:
with open(path_custom_metrics, "rb") as f:
    met = pickle.load(f)
I always get some errors, e.g. AttributeError: 'MulticlassMeanIoU' object has no attribute 'update_state_fn'.
Now I wonder whether it is possible to pickle a custom metric at all and, if so, how? It would come in handy if I could save custom metrics alongside the model, so that when I load the model in another Python session I always have the metric that is required to load the model in the first place. I could define the metric anew by copying the full code into the other script before loading the model, but that seems like bad style and could cause problems if I changed the metric in the training script and forgot to copy the code over.
If you need to pickle a metric, one possible solution is to implement the __getstate__() and __setstate__() methods. During (de)serialization, these two methods are called if they are available. Add them to your metric class and you will have what you need. I tried to make it as general as possible, so that it works for any Metric:
from typing import Any, Dict

# add these two methods to the MulticlassMeanIoU class (or any tf.keras.metrics.Metric subclass)
def __getstate__(self):
    variables = {v.name: v.numpy() for v in self.variables}
    state = {
        name: variables[var.name]
        for name, var in self._unconditional_dependency_names.items()
        if isinstance(var, tf.Variable)}
    state['name'] = self.name
    state['num_classes'] = self.num_classes
    return state

def __setstate__(self, state: Dict[str, Any]):
    self.__init__(name=state.pop('name'), num_classes=state.pop('num_classes'))
    for name, value in state.items():
        self._unconditional_dependency_names[name].assign(value)
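With those methods in place, a pickle round-trip looks roughly like this (a minimal sketch, assuming the MulticlassMeanIoU class defined above together with its module-level imports):

import pickle

met = MulticlassMeanIoU(num_classes=N_CLASSES)
met.update_state([0, 1, 1], tf.one_hot([0, 1, 2], N_CLASSES))

# __getstate__ turns the metric's variables into plain numpy arrays for pickling
with open("custom_metrics.pkl", "wb") as f:
    pickle.dump(met, f)

# __setstate__ rebuilds the metric via __init__ and restores the variable values
with open("custom_metrics.pkl", "rb") as f:
    restored = pickle.load(f)

print(met.result().numpy(), restored.result().numpy())  # should match if the round-trip worked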
I've deployed a custom PyTorch model to the Google AI Platform for prediction, but when I try to make a prediction request with image data using gcloud tools, I get the following error in response:
{
"error": "Prediction failed: unknown error."
}
I've tried encoding my image data in b64 format and also placing it into a multidimensional Python array, by doing the following:
pil_im = Image.open('Pic512.png')
pil_im = pil_im.resize((224,224)).convert('RGB')
im_arr = np.asarray(pil_im)
py_arr = im_arr.tolist()
json_instance_1 = {'instances': py_arr}
with open('json_instance_1.json', 'w') as f:
    json.dump(json_instance_1, f)
I converted it into b64 like so, after adjusting my Predictor code accordingly:
with open('Pic512.png', 'rb') as f:
    byte_im = f.read()
json_instance = {'instances': {'b64': base64.b64encode(byte_im).decode()}}
with open('json_instance.json', 'w') as f:
    json.dump(json_instance, f)
I've tried converting with different file formats and similar methods, but all of them give me the same error.
My predictor module:
from facenet_pytorch import MTCNN, InceptionResnetV1, extract_face
import torch
from torchvision import transforms
from torch.nn import functional as F
from PIL import Image
# from sklearn.externals import joblib
import numpy as np
import os
import io
import base64
class MyPredictor(object):
    """An example Predictor for an AI Platform custom prediction routine."""

    def __init__(self, model, preprocessor, device):
        """Stores artifacts for prediction. Only initialized via `from_path`."""
        self._resnet = model
        self._mtcnn_mult = preprocessor
        self._device = device
        self.get_std_tensor = transforms.Compose([
            np.float32,
            np.uint8,
            transforms.ToTensor(),
        ])
        self.tensor2pil = transforms.ToPILImage(mode='RGB')
        self.trans_resnet = transforms.Compose([
            transforms.Resize((100, 100)),
            np.float32,
            transforms.ToTensor()
        ])

    def predict(self, instances, **kwargs):
        pil_transform = transforms.Resize((512, 512))
        imarr = np.uint8(np.array(instances))
        # img_bytes_string = io.BytesIO(base64.b64decode(instances))
        pil_im = Image.fromarray(imarr)
        # pil_im = Image.open(img_bytes_string)
        image = pil_im.convert('RGB')
        pil_im_512 = pil_transform(image)
        boxes, _ = self._mtcnn_mult.detect(pil_im_512)
        box = boxes[0]
        face_tensor = extract_face(pil_im_512, box, margin=40)
        std_tensor = self.get_std_tensor(face_tensor.permute(1, 2, 0))
        cropped_pil_im = self.tensor2pil(std_tensor)
        face_tensor = self.trans_resnet(cropped_pil_im)
        face_tensor4d = face_tensor.unsqueeze(0)
        face_tensor4d = face_tensor4d.to(self._device)
        self._resnet.eval()
        prediction = self._resnet(face_tensor4d)
        preds = F.softmax(prediction, dim=1).detach().numpy().reshape(-1)
        print('probability of (class1, class2) = ({:.4f}, {:.4f})'.format(preds[0], preds[1]))
        return {'probs': preds.tolist()}

    @classmethod
    def from_path(cls, model_dir):
        device_path = os.path.join(model_dir, 'device_cpu.pt')
        device = torch.load(device_path)
        model_path = os.path.join(model_dir, 'FullResNetRefinedExtra_no_norm_100x100_8634.pt')
        classifier = torch.load(model_path, map_location=device)
        mtcnn_path = os.path.join(model_dir, 'mtcnn_mult.pt')
        mtcnn_mult = torch.load(mtcnn_path)
        return cls(classifier, mtcnn_mult, device)
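For context, the local test that works is along these lines (a sketch; the model directory and JSON file name are assumptions based on the snippets above):

import json

predictor = MyPredictor.from_path('model_dir')
with open('json_instance_1.json') as f:
    body = json.load(f)
# same payload shape as the request body sent to the service
print(predictor.predict(body['instances']))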
Testing the class locally like this works fine, so I assume it's a problem related to serialisation and deserialisation on the Google AI Platform side. How can I resolve this issue?
I am trying to serve a tensorflow.keras.Model in a Flask + nginx + uwsgi application, using TensorFlow v1.14.
I load the model in the constructor of a class named Prediction in my Flask application factory function and save the graph as an attribute of the Flask app, as suggested here.
Then I run the prediction by calling a method Prediction.process in a route named _process of my Flask app, but it gets stuck during the call to tf.keras.Model.predict (self.model.summary() in predict.py is executed, i.e. the summary is shown, but print("Never gets here :(") is never reached).
If I initialize my class Prediction in _process (which I want to avoid, so as not to load the model for every prediction), everything works fine.
If I use the Flask development server, it works fine too. So it seems to be related to the uwsgi config.
Any suggestions?
__init__.py
def create_app():
    app = Flask(__name__)
    # (...)
    app.register_blueprint(bp)
    load_tf_model(app)
    return app

def load_tf_model(app):
    sess = tf.Session(graph=tf.Graph())
    app.sess = sess
    with sess.graph.as_default():
        weights = os.path.join(app.static_folder, 'weights/model.32-0.81.h5')
        app.prediction = Prediction(weights)
predict.py
class Prediction:
    def __init__(self, weights):
        # build model and set weights
        inputs = tf.keras.Input(shape=SHAPE, batch_size=1)
        outputs = simple_cnn.build_model(inputs, N_CLASSES)
        self.model = tf.keras.Model(inputs=inputs, outputs=outputs)
        self.model.load_weights(weights)
        self.model._make_predict_function()

        # create TF mel extractor
        self.melspec_ex = tf_feature_utils.MelSpectrogram()

    def process(self, audio, sr):
        # compute features (in NCHW format) and labels
        data = audio2data(
            audio,
            sr,
            class_list=np.arange(N_CLASSES))
        features = np.asarray([d[0] for d in data])
        features = tf.reshape(features, (features.shape[0], 1, features.shape[1], features.shape[2]))
        labels = np.asarray([d[1] for d in data])

        # make tf.data.Dataset
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(1)
        dataset = dataset.map(lambda data, labels: (
            tf.expand_dims(self.melspec_ex.process(tf.squeeze(data, axis=[1, 2])), 1)))

        # show model (debug)
        self.model.summary()

        # run prediction
        predictions = self.model.predict(dataset)
        print("Never gets here :(")

        # integrate predictions over time
        return np.mean(predictions, axis=0)
routes.py
@bp.route('/_process', methods=['POST'])
def _process():
    with current_app.graph.as_default():
        # load audio
        filepath = session['filepath']
        audio, sr = librosa.load(filepath)

        # predict
        predictions = current_app.prediction.process(audio, sr)

        # delete file
        os.remove(filepath)

        return jsonify(prob=predictions.tolist())
It was a threading issue. I had to configure uwsgi with the following options:
master = false
processes = 1
cheaper = 0
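For reference, these options go in the [uwsgi] section of the config file; a minimal sketch (the module entry is an assumption, pointing at a wsgi.py that does app = create_app()) would be:

[uwsgi]
module = wsgi:app
master = false
processes = 1
cheaper = 0

Keeping a single, non-forking worker avoids the TensorFlow 1.x session/graph created in create_app() being initialised in one process and then used from a forked worker, which is what typically makes predict() hang.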
Trying to run the Inception ResNet V2 TensorFlow model with the architecture and the checkpoint inception_resnet_v2_2016_08_30.ckpt. My code predicts the probability of each class for a given image.
I tried to structure the TensorFlow code using a class, following the awesome blog here, but I get this error:
NotFoundError (see above for traceback): Tensor name "prediction/InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta" not found in checkpoint files inception_resnet_v2_2016_08_30.ckpt.
The code that produces the error is as follows:
from inception_resnet_v2 import *
import functools
import inception_preprocessing
import matplotlib.pyplot as plt
import os
import numpy as np
import tensorflow as tf
from scipy.misc import imread

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

def doublewrap(function):
    """
    A decorator decorator, allowing the decorator to be used without
    parentheses if no arguments are provided. All arguments must be optional.
    """
    @functools.wraps(function)
    def decorator(*args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and callable(args[0]):
            return function(args[0])
        else:
            return lambda wrapee: function(wrapee, *args, **kwargs)
    return decorator

@doublewrap
def define_scope(function, scope=None, *args, **kwargs):
    """
    A decorator for functions that define TensorFlow operations. The wrapped
    function will only be executed once. Subsequent calls to it will directly
    return the result so that operations are added to the graph only once.
    The operations added by the function live within a tf.variable_scope(). If
    this decorator is used with arguments, they will be forwarded to the
    variable scope. The scope name defaults to the name of the wrapped
    function.
    """
    attribute = '_cache_' + function.__name__
    name = scope or function.__name__
    @property
    @functools.wraps(function)
    def decorator(self):
        if not hasattr(self, attribute):
            with tf.variable_scope(name, *args, **kwargs):
                setattr(self, attribute, function(self))
        return getattr(self, attribute)
    return decorator

class Inception(object):
    def __init__(self, image):
        self.image = image
        self.process_data   # call function process_data
        self.prediction

    @define_scope
    def process_data(self):
        image_size = inception_resnet_v2.default_image_size
        image = inception_preprocessing.preprocess_image(self.image, image_size, image_size, is_training=False)
        image1 = tf.expand_dims(image, 0)
        return image1

    @define_scope
    def prediction(self):
        '''Creates the Inception Resnet V2 model.'''
        arg_scope = inception_resnet_v2_arg_scope()
        with tf.contrib.slim.arg_scope(arg_scope):
            logits, end_points = inception_resnet_v2(self.process_data, is_training=False)
        probabilities = tf.nn.softmax(logits)
        return probabilities

def main():
    tf.reset_default_graph()
    image = tf.placeholder(tf.float32, [None, None, 3])
    model = Inception(image)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, 'inception_resnet_v2_2016_08_30.ckpt')
        probabilities = sess.run(model.prediction, feed_dict={image: data})
        print(probabilities)

if __name__ == '__main__':
    data = imread('ILSVRC2012_test_00000003.JPEG', mode='RGB').astype(np.float)
    main()
However, if we don't structure the code with a class as above, it runs successfully.
The following is the code which ran without errors.
from inception_resnet_v2 import *
import inception_preprocessing
import os
import numpy as np
import tensorflow as tf
from scipy.misc import imread

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
slim = tf.contrib.slim

tf.reset_default_graph()

# prepare data
data = imread('ILSVRC2012_test_00000003.JPEG', mode='RGB').astype(np.float)
image = tf.placeholder(tf.float32, [None, None, 3])

# pre-processing image
image_size = inception_resnet_v2.default_image_size
processed_image = inception_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)
processed_image = tf.expand_dims(processed_image, 0)

# Creates the Inception Resnet V2 model.
arg_scope = inception_resnet_v2_arg_scope()
with slim.arg_scope(arg_scope):
    logits, end_points = inception_resnet_v2(processed_image, is_training=False)
probabilities = tf.nn.softmax(logits)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, './inception_resnet_v2_2016_08_30.ckpt')
    print(sess.run(probabilities, feed_dict={image: data}))
Any help would be appreciated!
The decorator wraps the Inception network into a variable scope named after the function, prediction in this case. As a result, the variable names in the checkpoint don't match up with variable names in the graph anymore.
To verify this, you can change tf.variable_scope() to tf.name_scope() in the decorator. In most use cases, this should also not influence the rest of your program.
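A minimal sketch of that change inside the define_scope decorator from the question (note that tf.name_scope() does not accept the extra arguments that tf.variable_scope() does, and that variables created with tf.get_variable, as slim does, ignore name scopes):

    @property
    @functools.wraps(function)
    def decorator(self):
        if not hasattr(self, attribute):
            # a name scope prefixes op names but not tf.get_variable names,
            # so the checkpoint variable names line up with the graph again
            with tf.name_scope(name):
                setattr(self, attribute, function(self))
        return getattr(self, attribute)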
If you need the variable scope, you can pass a dict into tf.train.Saver() that maps variable names in the checkpoint to variable objects in the graph.
It's also possible to automate this by reading the variable names in the checkpoint using tf.python.pywrap_tensorflow.NewCheckpointReader(), but I don't have a code example ready to share for this.
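A hedged sketch of what that automation could look like (the 'prediction/' prefix is taken from the error message above; everything else is a placeholder, built after the Inception graph has been constructed and before restore()):

from tensorflow.python import pywrap_tensorflow

# read the variable names stored in the checkpoint (TF 1.x API)
reader = pywrap_tensorflow.NewCheckpointReader('inception_resnet_v2_2016_08_30.ckpt')
ckpt_names = set(reader.get_variable_to_shape_map().keys())

# map each checkpoint name to the graph variable carrying the extra 'prediction/' scope prefix
var_map = {}
for var in tf.global_variables():
    ckpt_name = var.op.name.replace('prediction/', '', 1)
    if ckpt_name in ckpt_names:
        var_map[ckpt_name] = var

saver = tf.train.Saver(var_list=var_map)  # then use this saver for saver.restore(sess, ckpt_path)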
This article illustrates how to add runtime statistics to TensorBoard:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary, _ = sess.run([merged, train_step],
                      feed_dict=feed_dict(True),
                      options=run_options,
                      run_metadata=run_metadata)
train_writer.add_run_metadata(run_metadata, 'step%d' % i)
train_writer.add_summary(summary, i)
print('Adding run metadata for', i)
which surfaces these runtime details (e.g. compute time and memory per node) in TensorBoard.
This is fairly straightforward on a single machine. How could one do this in a distributed environment using Estimators?
I use the following hook, based on ProfilerHook, to have the estimator output the run metadata into the model directory and inspect it later with TensorBoard.
import tensorflow as tf
from tensorflow.python.training.session_run_hook import SessionRunHook, SessionRunArgs
from tensorflow.python.training import training_util
from tensorflow.python.training.basic_session_run_hooks import SecondOrStepTimer

class MetadataHook(SessionRunHook):
    def __init__(self,
                 save_steps=None,
                 save_secs=None,
                 output_dir=""):
        self._output_tag = "step-{}"
        self._output_dir = output_dir
        self._timer = SecondOrStepTimer(
            every_secs=save_secs, every_steps=save_steps)

    def begin(self):
        self._next_step = None
        self._global_step_tensor = training_util.get_global_step()
        self._writer = tf.summary.FileWriter(self._output_dir, tf.get_default_graph())
        if self._global_step_tensor is None:
            raise RuntimeError("Global step should be created to use ProfilerHook.")

    def before_run(self, run_context):
        self._request_summary = (
            self._next_step is None or
            self._timer.should_trigger_for_step(self._next_step)
        )
        requests = {"global_step": self._global_step_tensor}
        opts = (tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
                if self._request_summary else None)
        return SessionRunArgs(requests, options=opts)

    def after_run(self, run_context, run_values):
        stale_global_step = run_values.results["global_step"]
        global_step = stale_global_step + 1
        if self._request_summary:
            global_step = run_context.session.run(self._global_step_tensor)
            self._writer.add_run_metadata(
                run_values.run_metadata, self._output_tag.format(global_step))
            self._writer.flush()
        self._next_step = global_step + 1

    def end(self, session):
        self._writer.close()
To use it, create the estimator instance (my_estimator) as usual, whether it is a pre-made or a custom estimator, and call the desired operation with an instance of the class above passed as a hook. For example:
hook = MetadataHook(save_steps=1, output_dir=<model dir>)
my_estimator.train(train_input_fn, hooks=[hook])
The run metadata will be placed in the model dir and can be inspected by TensorBoard.
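For example, pointing TensorBoard at that directory is enough to browse the recorded steps (the path is a placeholder):

tensorboard --logdir=<model dir>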
You may use tf.train.ProfilerHook. However, the catch is that it was only released in 1.14.
Example usage:
estimator = tf.estimator.LinearClassifier(...)
hooks = [tf.train.ProfilerHook(output_dir=model_dir, save_secs=600, show_memory=False)]
estimator.train(input_fn=train_input_fn, hooks=hooks)
Executing the hook will generate files timeline-xx.json in output_dir.
Then open chrome://tracing/ in the Chrome browser and load the file. You will get a time usage timeline for the traced steps.