I have been trying to perform inference with BioBERT (https://github.com/dmis-lab/biobert) and TensorFlow Serving for a QA task.
I have successfully exported the model. My serving function looks like this:
feature_columns = [
    tf.feature_column.numeric_column("unique_ids", shape=(FLAGS.max_seq_length,), dtype=tf.int64),
    tf.feature_column.numeric_column("input_ids", shape=(FLAGS.max_seq_length,), dtype=tf.int64),
    tf.feature_column.numeric_column("input_mask", shape=(FLAGS.max_seq_length,), dtype=tf.int64),
    tf.feature_column.numeric_column("segment_ids", shape=(FLAGS.max_seq_length,), dtype=tf.int64)
]
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    tf.feature_column.make_parse_example_spec(feature_columns))
estimator._export_to_tpu = False
estimator_path = estimator.export_saved_model(estimator_base_path, serving_input_fn, checkpoint_path)
##############################################
I am also able to generate a TFRecord file, and I am trying to use tf_record_iterator to iterate over it and call the generated gRPC stub (pathToTfRecordFile below is the path to the TFRecord file). The function is:
all_results = []
record_iterator = tf.python_io.tf_record_iterator(pathToTfRecordFile)
for string_record in record_iterator:
    model_request.inputs['examples'].CopyFrom(
        tf.contrib.util.make_tensor_proto(string_record,
                                          dtype=tf.string,
                                          shape=[batch_size]))
    result_future = stub.Predict.future(model_request, 30.0)
    result = result_future.result().outputs
    all_results.append(process_result(result))
The error I am getting is as follows:
_MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Name: <unknown>, Key: unique_ids, Index: 0. Number of int64 values != expected. Values size: 1 but output shape: [384]
Any help on this issue is appreciated.
Try shaping unique_ids like this in the serving_input_fn:
tf.feature_column.numeric_column("unique_ids", shape=(1,), dtype=tf.int64),
TL;DR: How can I use model.whatever_function(input) instead of model.forward(input) for the onnxruntime?
I use CLIP embeddings for my images and texts as follows (the code is from the official GitHub repo):
! pip install ftfy regex tqdm
! pip install git+https://github.com/openai/CLIP.git
import clip
import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load("RN50", device=device) # Load any model
model = model.eval() # Inference Only
img_size = model.visual.input_resolution
dummy_image = torch.randn(10, 3, img_size, img_size).to(device)
image_embedding = model.encode_image(dummy_image).to(device)
dummy_texts = clip.tokenize(["quick brown fox", "lorem ipsum"]).to(device)
model.encode_text(dummy_texts)
and it works fine, giving me [Batch, 1024] tensors for both with the loaded model.
Now I have exported my model to ONNX as:
model.forward(dummy_image, dummy_texts)  # Original CLIP result (1)

torch.onnx.export(model, (dummy_image, dummy_texts), "model.onnx", export_params=True,
                  input_names=["IMAGE", "TEXT"],
                  output_names=["LOGITS_PER_IMAGE", "LOGITS_PER_TEXT"],
                  opset_version=14,
                  dynamic_axes={
                      "IMAGE": {0: "image_batch_size"},
                      "TEXT": {0: "text_batch_size"},
                      "LOGITS_PER_IMAGE": {0: "image_batch_size", 1: "text_batch_size"},
                      "LOGITS_PER_TEXT": {0: "text_batch_size", 1: "image_batch_size"},
                  })
and the model is saved.
When I test the model as :
# Now run onnxruntime to verify
import onnxruntime as ort
ort_sess = ort.InferenceSession("model.onnx")
result=ort_sess.run(["LOGITS_PER_IMAGE", "LOGITS_PER_TEXT"],
{"IMAGE": dummy_image.numpy(), "TEXT": dummy_texts.numpy()})
It gives me a list of length 2 (one output each for image and text), and result[0] has shape [Batch, 2].
If encode_image on your module isn't just calling forward, then nothing stops you from overriding forward before exporting to ONNX:
>>> model.forward = model.encode_image
>>> torch.onnx.export(model, (dummy_image,), "model.onnx", ...)
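A slightly fuller sketch of the same idea, assuming you only need the image encoder; the names IMAGE_EMBEDDING and image_encoder.onnx are illustrative, not part of the CLIP or ONNX APIs:
import clip
import onnxruntime as ort
import torch

device = "cpu"  # exporting on CPU keeps the weights and dummy inputs in float32
model, preprocess = clip.load("RN50", device=device)
model.eval()

img_size = model.visual.input_resolution
dummy_image = torch.randn(2, 3, img_size, img_size)

# Point forward at the method you actually want traced, then export with a single input.
model.forward = model.encode_image
torch.onnx.export(model, (dummy_image,), "image_encoder.onnx",
                  export_params=True,
                  input_names=["IMAGE"],
                  output_names=["IMAGE_EMBEDDING"],
                  opset_version=14,
                  dynamic_axes={"IMAGE": {0: "image_batch_size"},
                                "IMAGE_EMBEDDING": {0: "image_batch_size"}})

# Sanity-check the exported graph with onnxruntime.
sess = ort.InferenceSession("image_encoder.onnx")
(embedding,) = sess.run(["IMAGE_EMBEDDING"], {"IMAGE": dummy_image.numpy()})
print(embedding.shape)  # expected (2, 1024) for RN50
The same trick should work for the text side by pointing forward at encode_text and exporting with dummy_texts as the only input.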
I'm predicting with the same model, but the results differ, which shouldn't happen since the second model is just the first one saved to a file and then loaded back.
I tried reshaping the TensorFlow Serving inputs, e.g. {"input_word_ids": [encodes['input_ids']], ...}, but it did not work.
Input into tensorflow serving:
inputs= {"instances": [{"input_word_ids": encodes['input_ids'], "input_mask": encodes['input_ids'], "input_type_ids": encodes['input_ids']}]}
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/bert_model:predict',
                              data=json.dumps(inputs), headers=headers)
result:[[-0.998641431, -0.812168241, 2.3345871, 1.79382575, -0.00233620172, -3.28864908, -0.531166852, -0.705040932, 2.24861217, 1.16766787, 0.893842638, 0.882387042, -0.419321597, -1.92870927, 0.00293750782, 0.571823, -1.0008961, 0.25918898, -2.7151773, 1.1212393, 1.05252075, 0.621426523, 0.428827167]]
Input into local model:
encodes = tokenizer.encode_plus(text,
                                add_special_tokens=True,     # add [CLS], [SEP]
                                max_length=max_length,       # max length of the text that can go to the model
                                pad_to_max_length=True,      # add [PAD] tokens
                                return_attention_mask=True)  # add attention mask
inputs = {
    'input_word_ids': tf.convert_to_tensor([encodes['input_ids']]),
    'input_mask': tf.convert_to_tensor([encodes['token_type_ids']]),
    'input_type_ids': tf.convert_to_tensor([encodes['token_type_ids']])}
load_model = tf.saved_model.load(export_dir)
result = load_model(inputs, training=False)
result:tf.Tensor(
[[-1.8024038 -1.3173668 -0.08120027 -2.0854492 -0.50478387 2.9193602
0.2091662 -0.01863982 -1.0049771 -1.7226878 -1.5322843 -0.97847456
-1.1123526 7.5573688 -0.5297349 -2.2933776 -1.3478909 -3.367974
-0.08475648 -1.229377 3.276103 -0.26380143 1.2428844 ]], shape=(1, 23), dtype=float32)
What is wrong here?
I trained a Wide & Deep model using the pre-made Estimator class (DNNLinearCombinedClassifier), by essentially following the tutorial on tensorflow.org.
I wanted to do inference/serving, but without using tensorflow-serving. This basically comes down to feeding some test data to the correct input tensor and retrieving the output tensor.
However, I am not sure what the input nodes/layer should be. In the tensorflow graph (graph.pbtxt), the following nodes seem relevant. But they are also related to the input queue which is mainly used during training, but not necessarily inference (I can just send one instance at a time).
name: "enqueue_input/random_shuffle_queue"
name: "enqueue_input/Placeholder"
name: "enqueue_input/Placeholder_1"
name: "enqueue_input/Placeholder_2"
...
name: "enqueue_input/Placeholder_84"
name: "enqueue_input/random_shuffle_queue_EnqueueMany_1"
name: "enqueue_input/random_shuffle_queue_EnqueueMany_2"
name: "enqueue_input/random_shuffle_queue_EnqueueMany_3"
name: "enqueue_input/random_shuffle_queue_EnqueueMany_4"
name: "enqueue_input/random_shuffle_queue_EnqueueMany"
name: "enqueue_input/sub/y"
name: "enqueue_input/sub"
name: "enqueue_input/Maximum/x"
name: "enqueue_input/Maximum"
name: "enqueue_input/Cast"
name: "enqueue_input/mul/y"
name: "enqueue_input/mul"
Does anyone know the answer? Thanks in advance!
If you want inference without TensorFlow Serving, you can just use the tf.estimator.Estimator predict method.
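For that route, a minimal sketch (TF 1.x APIs; it reuses the estimator, prediction keys, and fake X_data defined in the steps below, and numpy_input_fn is just one convenient way to feed a few rows):
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_data[:5]},  # a handful of rows from the fake data in step 1
    y=None,
    shuffle=False)
for pred in estimator.predict(input_fn=predict_input_fn):
    print(pred["class"], pred["probabilities"])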
But if you want to do it manually (so that it runs faster), you need a workaround. I am not sure whether my approach is the best one, but it worked. Here's my solution.
1) Let's do the imports and create variables and fake data:
import os
import numpy as np
from functools import partial
import pickle
import tensorflow as tf
N = 10000
EPOCHS = 1000
BATCH_SIZE = 2
X_data = np.random.random((N, 10))
y_data = (np.random.random((N, 1)) >= 0.5).astype(int)
my_dir = os.getcwd() + "/"
2) Define an input_fn that uses tf.data.Dataset. Save the tensor names in a dictionary ("input_tensor_map") that maps each input key to its tensor name.
def my_input_fn(X, y=None, is_training=False):
    def internal_input_fn(X, y=None, is_training=False):
        if (not isinstance(X, dict)):
            X = {"x": X}
        if (y is None):
            dataset = tf.data.Dataset.from_tensor_slices(X)
        else:
            dataset = tf.data.Dataset.from_tensor_slices((X, y))
        if (is_training):
            dataset = dataset.repeat().shuffle(100)
            batch_size = BATCH_SIZE
        else:
            batch_size = 1
        dataset = dataset.batch(batch_size)
        dataset_iter = dataset.make_initializable_iterator()
        if (y is None):
            features = dataset_iter.get_next()
            labels = None
        else:
            features, labels = dataset_iter.get_next()

        input_tensor_map = dict()
        for input_name, tensor in features.items():
            input_tensor_map[input_name] = tensor.name
        with open(os.path.join(my_dir, 'input_tensor_map.pickle'), 'wb') as f:
            pickle.dump(input_tensor_map, f, protocol=pickle.HIGHEST_PROTOCOL)

        tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, dataset_iter.initializer)
        return (features, labels) if (not labels is None) else features

    return partial(internal_input_fn, X=X, y=y, is_training=is_training)
3) Define your model, to be used in your tf.estimator.Estimator. For example:
def my_model_fn(features, labels, mode):
    output = tf.layers.dense(inputs=features["x"], units=1, activation=None)
    logits = tf.identity(output, name="logits")
    prediction = tf.nn.sigmoid(logits, name="predictions")
    classes = tf.to_int64(tf.greater(logits, 0.0), name="classes")
    predictions_dict = {
        "class": classes,
        "probabilities": prediction
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions_dict)

    one_hot_labels = tf.squeeze(tf.one_hot(tf.cast(labels, dtype=tf.int32), 2))
    loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=one_hot_labels, logits=logits)
    tf.summary.scalar("loss", loss)
    accuracy = tf.reduce_mean(tf.to_float(tf.equal(labels, classes)))
    tf.summary.scalar("accuracy", accuracy)

    # Configure the Training Op (for TRAIN mode)
    if (mode == tf.estimator.ModeKeys.TRAIN):
        train_op = tf.train.AdamOptimizer().minimize(loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss)
4) Train and freeze your model. The freeze method is from TensorFlow: How to freeze a model and serve it with a python API, to which I added a tiny modification.
def freeze_graph(output_node_names):
    """Extract the subgraph defined by the output nodes and convert
    all its variables into constants.
    Args:
        output_node_names: a string containing all the output node names,
            comma separated
    """
    if (output_node_names is None):
        output_node_names = 'loss'
    if not tf.gfile.Exists(my_dir):
        raise AssertionError(
            "Export directory doesn't exist. Please specify an export "
            "directory: %s" % my_dir)
    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(my_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We specify the full name of our frozen graph file
    absolute_model_dir = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_dir + "/frozen_model.pb"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We start a session using a temporary fresh Graph
    with tf.Session(graph=tf.Graph()) as sess:
        # We import the meta graph into the current default Graph
        saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

        # We restore the weights
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = tf.graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            tf.get_default_graph().as_graph_def(),  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

    return output_graph_def
# *****************************************************************************
tf.logging.set_verbosity(tf.logging.INFO)
estimator = tf.estimator.Estimator(model_fn=my_model_fn, model_dir=my_dir)
if (estimator.latest_checkpoint() is None):
    estimator.train(input_fn=my_input_fn(X=X_data, y=y_data, is_training=True), steps=EPOCHS)
freeze_graph("predictions,classes")
5) Finally, you can use the frozen graph for inference; the input tensor names are in the dictionary that you saved. Again, the method to load the frozen model is from TensorFlow: How to freeze a model and serve it with a python API.
def load_frozen_graph(prefix="frozen_graph"):
    frozen_graph_filename = os.path.join(my_dir, "frozen_model.pb")

    # We load the protobuf file from the disk and parse it to retrieve the
    # unserialized graph_def
    with tf.gfile.GFile(frozen_graph_filename, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # Then, we import the graph_def into a new Graph and return it
    with tf.Graph().as_default() as graph:
        # The name var will prefix every op/node in your graph
        # Since we load everything in a new graph, this is not needed
        tf.import_graph_def(graph_def, name=prefix)
    return graph

# *****************************************************************************
X_test = {"x": np.random.random((int(N/2), 10))}
prefix = "frozen_graph"
graph = load_frozen_graph(prefix)

for op in graph.get_operations():
    print(op.name)

with open(os.path.join(my_dir, 'input_tensor_map.pickle'), 'rb') as f:
    input_tensor_map = pickle.load(f)

with tf.Session(graph=graph) as sess:
    input_feed = dict()
    for key, tensor_name in input_tensor_map.items():
        tensor = graph.get_tensor_by_name(prefix + "/" + tensor_name)
        input_feed[tensor] = X_test[key]

    logits = graph.get_operation_by_name(prefix + "/logits").outputs[0]
    probabilities = graph.get_operation_by_name(prefix + "/predictions").outputs[0]
    classes = graph.get_operation_by_name(prefix + "/classes").outputs[0]
    logits_values, probabilities_values, classes_values = sess.run([logits, probabilities, classes], feed_dict=input_feed)
I wrote a neural network using the TensorFlow framework, but when I try to make some predictions the program never terminates.
The program contains a lot of files and I will try to post the main ones here, but in short, I have a custom model function and an Experiment to train and evaluate it.
I have to use the Experiment API because the program needs to run both locally and in the cloud, but in the local case I don't want to use TensorFlow Serving to run my predictions.
Note: The project is similar to the Google Cloud Platform sample that you can find at this address
My model function
def generate_model_fn(...):
    def _model_fn(features, labels, mode):
        # Pop the name of the signal.
        if 'FN' in features:
            names = features.pop('FN')
        if 'FT' in features:
            labels = features.pop('FT')

        columns = [layers.real_valued_column(key) for key, value in features.items()]
        inputs = layers.input_from_feature_columns(features, columns)
        hidden_layers = None

        # Iterate over all the hidden units.
        for unit in hidden_units:
            hidden_layers = tf.layers.dense(
                inputs=inputs if hidden_layers is None else hidden_layers,
                activation=tf.nn.relu,
                units=unit,
            )

        dropout_layer = layers.dropout(inputs=hidden_layers, keep_prob=1.0 - dropout)
        logits = tf.layers.dense(inputs=dropout_layer, activation=None, units=2)

        if mode in (ModeKeys.PREDICT, ModeKeys.EVAL):
            probabilities = tf.nn.softmax(logits)
            predictions = tf.argmax(logits, 1)

        if mode == ModeKeys.PREDICT:
            predictions = {
                'classes': predictions,
                'scores': probabilities,
            }
            export_outputs = {
                'prediction': tf.estimator.export.PredictOutput(predictions)
            }
            return tf.estimator.EstimatorSpec(
                mode=mode,
                predictions=predictions,
                export_outputs=export_outputs
            )

    return _model_fn
The experimenter
def generate_experimenter_fn(**args):
    def _experimenter_fn(run_config, hparams):
        training_fn = lambda: generate_input_fn(...)
        evaluating_fn = lambda: generate_input_fn(...)

        return learn.Experiment(
            tf.estimator.Estimator(
                generate_model_fn(
                    learning_rate=hparams.learning_rate,
                    hidden_units=hparams.hidden_units,
                    dropout=hparams.dropout,
                    weights=hparams.weights,
                ),
                config=run_config,
            ),
            train_input_fn=training_fn,
            eval_input_fn=evaluating_fn,
            **args
        )
    return _experimenter_fn

learn_runner.run(
    generate_experimenter_fn(
        train_steps=args.train_steps,
        eval_steps=args.eval_steps,
    ),
    run_config=run_config.RunConfig(model_dir=args.job_dir),
    hparams=hparam.HParams(**args.__dict__),
)
The predictions
Everything posted before works perfectly; now it's time to make some predictions. As I said, I don't want to use TensorFlow Serving locally, so I'm trying to create an Estimator using the same model and the same configuration.
classifier = tf.estimator.Estimator(
    generate_model_fn(
        learning_rate=args.learning_rate,
        hidden_units=args.hidden_units,
        dropout=args.dropout,
        weights=args.weights,
    ),
    config=run_config.RunConfig(model_dir=args.job_dir),
)
The model seems to restore correctly according to the log file, so I try to make a simple prediction with just one record. But the program never ends and keeps running until my machine is completely bogged down.
def predict_input_fn():
    x = {
        'FN': tf.constant(['A']),
        'SS': tf.constant([1]),
        'SN': tf.constant([2]),
        'SL': tf.constant([3]),
        'NS': tf.constant([4]),
        'NN': tf.constant([5]),
        'NL': tf.constant([6]),
        'LS': tf.constant([7]),
        'LN': tf.constant([8]),
        'LL': tf.constant([9]),
        'FT': tf.constant([0])
    }
    y = x['FT']
    return x, y
predictions = classifier.predict(input_fn=predict_input_fn)
Hey, I am trying to set up an input point for a model that I have written in TensorFlow.
This is the code for the classification:
n_dim = training_features.shape[1]
x = tf.placeholder(tf.float32, [None, n_dim])
classifier = (...)
init_op = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init_op)
    classifier.fit(training_features, training_labels, steps=100)
    accuracy_score = classifier.evaluate(testing_features, testing_labels, steps=100)["accuracy"]
    print('Accuracy', accuracy_score)
    pred_a = np.asarray([x])
    prediction = format(list(classifier.predict(pred_a)))
    prediction_result = np.array(prediction)
    output = tf.convert_to_tensor(prediction_result, dtype=None, name="output", preferred_dtype=None)
and here is my building code
export_path_base = sys.argv[-1]
export_path = os.path.join(
    compat.as_bytes(export_path_base),
    compat.as_bytes(str(FLAGS.model_version)))
print('Exporting trained model to', export_path)
builder = saved_model_builder.SavedModelBuilder(export_path)

classification_inputs = utils.build_tensor_info(y)
classification_outputs_classes = utils.build_tensor_info(output)

print('classification_signature...')
classification_signature = signature_def_utils.build_signature_def(
    inputs={signature_constants.CLASSIFY_INPUTS: classification_inputs},
    outputs={
        signature_constants.CLASSIFY_OUTPUT_CLASSES:
            classification_outputs_classes
    },
    method_name=signature_constants.CLASSIFY_METHOD_NAME)

tensor_info_x = utils.build_tensor_info(x)

print('prediction_signature...')
prediction_signature = signature_def_utils.build_signature_def(
    inputs={'input': tensor_info_x},
    outputs={
        'classes': classification_outputs_classes
    },
    method_name=signature_constants.PREDICT_METHOD_NAME)

print('Exporting...')
legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
builder.add_meta_graph_and_variables(
    sess, [tag_constants.SERVING],
    signature_def_map={
        'predict_sound':
            prediction_signature,
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            classification_signature,
    },
    legacy_init_op=legacy_init_op)

builder.save()
print('Saved...')
I have tried manually passing dummy data before the build and that works, but I am trying to have the client stub pass data into the model dynamically.
When I try to run the build code I get this error:
InvalidArgumentError (see above for traceback): Shape in
shape_and_slice spec [1,280] does not match the shape stored in
checkpoint: [193,280] [[Node: save/RestoreV2_1 =
RestoreV2[dtypes=[DT_FLOAT],
_device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
My main goal is to have x as the input and have output return the results; the output works but I can't get the input to work.
Edit: If you just pass the np.array as input without going through an input function, it will work, but you also give up the chance to check the input.
TensorFlow won't check your input even if it is the wrong shape or type or somehow corrupted, and will instead throw an error like this in the middle of the session. Since you can run it with your test data successfully, the problem should be your actual data. Thus, it's recommended to write an input function to check your data before putting it into the classifier. Note that the input function should return a tf.Tensor with the shape of [x, 1] (x is the number of your features) instead of an np.array.
Please refer to https://www.tensorflow.org/get_started/input_fn to see how to write your own input function and pass it to the classifier.
An example of input function:
def input_fn_predict(): # returns x, None
#do your check here or you can just print it out
feature_tensor = tf.constant(pred_a,shape=[1,pred_a.size])
return feature_tensor,None
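A short usage sketch under the same assumptions (pred_a already holds your feature values); depending on the classifier version, predict may return a generator or an array, so iterating covers both:
predictions = classifier.predict(input_fn=input_fn_predict)
for p in list(predictions):
    print(p)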