I wrote a script to classify a single input image using a model I trained with MXNet. To classify the incoming image, I feed it forward through the network.
In short here is what I am doing:
symbol, arg_params, aux_params = mx.model.load_checkpoint('model-prefix', 42)
model = mx.mod.Module(symbol=symbol, context=mx.cpu())
model.bind(data_shapes=[('data', (1, 3, 224, 244))], for_training=False)
model.set_params(arg_params, aux_params)
# ... loading the image & resizing ...
# img is the image to classify as numpy array of shape (3, 244, 244)
Batch = namedtuple('Batch', ['data'])
model.forward(Batch(data=[mx.nd.array(img)]))
probabilities = model.get_outputs()[0].asnumpy()
print(str(probabilities))
This works fine, except that I am getting the following warning
UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
What should I change to avoid getting this warning? It is not clear to me what the label_shapes and label_names parameters are meant for, and what I am expected to fill them with.
Note: I found some threads about them, but none enabled me to solve the problem. Similarly, the MxNet documentation doesn't provide much detail on what those parameters are and how they are supposed to be filled.
Set label_names=None and allow_missing=True. That should get rid of the warning.
model = mx.mod.Module(symbol=symbol, context=mx.cpu(), label_names=None)
...
model.set_params(arg_params, aux_params, allow_missing=True)
If you are curious why the warning is printed in the first place,
Every module has an associated label. When this model was trained, softmax_label was used as the label (most likely because the output layer is a softmax layer named 'softmax'). When the model was loaded from file, the module that was created had softmax_label as its label.
>>>print(model.label_names)
['softmax_label']
model.bind is then called without providing label_shapes.
model.bind(data_shapes=[('data', (1, 3, 224, 244))], for_training=False)
MXNet sees that the module has a label in it which was not provided during bind and complains about it - which is the warning message you see.
I think if bind is called with for_training=False, MXNet shouldn't complain about the missing label. I've created this issue: https://github.com/dmlc/mxnet/issues/6958
However, for this particular case where we load a model from disk, we can load it with None as the label so that MXNet doesn't later complain when bind doesn't provide a label - which is what the suggested fix does.
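For completeness, a minimal end-to-end sketch combining your code with this fix might look like the following (same checkpoint prefix and shapes as above; the zero array is just a stand-in for the real preprocessed image):
from collections import namedtuple
import mxnet as mx
import numpy as np

# Load the checkpoint and build the module with no label, as suggested above.
symbol, arg_params, aux_params = mx.model.load_checkpoint('model-prefix', 42)
model = mx.mod.Module(symbol=symbol, context=mx.cpu(), label_names=None)
model.bind(data_shapes=[('data', (1, 3, 224, 244))], for_training=False)
model.set_params(arg_params, aux_params, allow_missing=True)

# Run a forward pass; the batch dimension is added explicitly here.
Batch = namedtuple('Batch', ['data'])
img = np.zeros((3, 224, 244), dtype=np.float32)  # stand-in for the real resized image
model.forward(Batch(data=[mx.nd.array(img[np.newaxis, :])]))
probabilities = model.get_outputs()[0].asnumpy()
print(probabilities)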
I am attempting to convert a TF model to TFLite. The model was saved in .pb format and I have converted it with the following code:
import os
import tensorflow as tf
from tensorflow.core.protobuf import meta_graph_pb2
export_dir = os.path.join('export_dir', '0')
if not os.path.exists('export_dir'):
    os.mkdir('export_dir')
tf.compat.v1.enable_control_flow_v2()
tf.compat.v1.enable_v2_tensorshape()
# I took this function from a tutorial on the TF website
def wrap_frozen_graph(graph_def, inputs, outputs):
    def _imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")
    wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
    import_graph = wrapped_import.graph
    return wrapped_import.prune(inputs, outputs)
graph_def = tf.compat.v1.GraphDef()
loaded = graph_def.ParseFromString(open(os.path.join(export_dir, 'saved_model.pb'),'rb').read())
concrete_func = wrap_frozen_graph(
    graph_def,
    inputs=['extern_data/placeholders/data/data:0', 'extern_data/placeholders/data/data_dim0_size:0'],
    outputs=['output/output_batch_major:0'])
concrete_func.inputs[0].set_shape([None, 50])
concrete_func.inputs[1].set_shape([None])
concrete_func.outputs[0].set_shape([None, 100])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.post_training_quantize=True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.allow_custom_ops=True
tflite_model = converter.convert()
# Save the model.
if not os.path.exists('tflite'):
    os.mkdir('tflite')
output_model = os.path.join('tflite', 'model.tflite')
with open(output_model, 'wb') as f:
    f.write(tflite_model)
However, when I try to use the interpreter with this model I get the following error:
INFO: TfLiteFlexDelegate delegate: 8 nodes delegated out of 970 nodes with 3 partitions.
INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 4 nodes with 0 partitions.
INFO: TfLiteFlexDelegate delegate: 3 nodes delegated out of 946 nodes with 1 partitions.
INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 1 nodes with 0 partitions.
INFO: TfLiteFlexDelegate delegate: 3 nodes delegated out of 16 nodes with 2 partitions.
Traceback (most recent call last):
File "/path/to/tflite_interpreter.py", line 9, in <module>
interpreter.allocate_tensors()
File "/path/to/lib/python3.6/site-packages/tensorflow/lite/python/interpreter.py", line 243, in allocate_tensors
return self._interpreter.AllocateTensors()
RuntimeError: Encountered unresolved custom op: VarHandleOp.Node number 0 (VarHandleOp) failed to prepare.
Now, I can't find any VarHandleOp in my code, and I found out that it is actually part of TensorFlow (https://www.tensorflow.org/api_docs/python/tf/raw_ops/VarHandleOp).
So why isn't TFLite able to recognize it?
It's certainly hard to provide a minimal reproducible example in the case of model conversion, as the SO guidelines recommend, but the question would benefit from better pointers. For example, instead of saying "I took this function from a tutorial on the TF website", it is a much better idea to provide a link to the tutorial. The TF website is vast.
The tutorial that you are referring to is probably from the section on migrating from TF1 to TF2, specifically the part about handling the raw graph files. The crucially important note is:
if you have a "Frozen graph" (a tf.Graph where the variables have been turned into constants)
(the key phrase being the parenthetical: the variables must have been turned into constants). Apparently, your graph contains VarHandleOp nodes (the same applies to Variable and VariableV2 nodes), so it is not "frozen" by this definition. Your general approach makes sense, but you need a graph that contains the actual trained values of the variables in the form of Const nodes. You need variables at training time, but at inference time they should be baked into the graph. TFLite, as an inference-time framework, does not support variables.
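As a quick sanity check (my own suggestion, not part of the original answer), you can scan the GraphDef you loaded for variable-related nodes; if anything prints, the graph is not frozen in the sense above:
# Assumes `graph_def` is the tf.compat.v1.GraphDef parsed from saved_model.pb above.
variable_ops = {'VarHandleOp', 'ReadVariableOp', 'Variable', 'VariableV2'}
for node in graph_def.node:
    if node.op in variable_ops:
        print(node.name, node.op)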
The rest of your idea seems fine. TFLiteConverter.from_concrete_functions currently takes a list with exactly one concrete_function, but that is what you get from wrapping the graph. With enough luck it may work.
There is a utility, tensorflow/python/tools/freeze_graph.py, that tries its best to replace the variables in a Graph.pb with constants taken from the latest checkpoint file. If you look at its code, either using the saved metagraph (checkpoint_name.meta) file or pointing the tool to the training directory eliminates a lot of guesswork; also, I think that providing the model directory is the only way to get a single frozen graph for a sharded model.
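If you prefer to do the freezing directly in Python rather than through that tool, here is a minimal sketch, assuming graph_util.convert_variables_to_constants works for your checkpoint format; the training directory and output node name are placeholders for your own values:
import tensorflow as tf

training_dir = '/path/to/training_dir'          # hypothetical: wherever the checkpoints live
output_nodes = ['output/output_batch_major']    # node names, without the ':0' suffix

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    # Restore the latest checkpoint into a fresh graph.
    ckpt = tf.train.latest_checkpoint(training_dir)
    saver = tf.compat.v1.train.import_meta_graph(ckpt + '.meta')
    saver.restore(sess, ckpt)
    # Bake the trained variable values into the graph as Const nodes.
    frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_nodes)

with open('frozen.pb', 'wb') as f:
    f.write(frozen_graph_def.SerializeToString())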
I noticed that you use just inputs in place of tf.nest.map_structure(import_graph.as_graph_element, inputs) in the example. You may have other reasons for that, but if you do it because as_graph_element complains about the datatype/shape, this is likely to be resolved by freezing the graph properly. The concrete_function that you obtain from the frozen graph will have a good idea about its input shapes and datatypes. Generally, it's unexpected to need to set them manually, and the fact that you do seems odd to me (but I don't claim broad experience with this dark corner of TF).
map_structure has a keyword argument to skip the check.
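For reference, here is the same wrap_frozen_graph sketched the way the migration guide writes it, with map_structure restored; the keyword mentioned above is presumably check_types, which can be set to False if the structure check complains:
import tensorflow as tf

def wrap_frozen_graph(graph_def, inputs, outputs):
    def _imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")
    wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
    import_graph = wrapped_import.graph
    # Resolve tensor names to the actual tensors in the imported graph.
    return wrapped_import.prune(
        tf.nest.map_structure(import_graph.as_graph_element, inputs),
        tf.nest.map_structure(import_graph.as_graph_element, outputs))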
I am trying to build a CNN that receives multiple inputs and I am trying the following:
input = keras.Input()
classifier = keras.Model(inputs=input,output=classifier)
When I run the code, though, I receive the following error for line 6:
TypeError: ('Keyword argument not understood:', 'input').
A hint would be much appreciated, thank you!
Some parameters in your code are not specified. I have copied your example and filled in some numbers that you can change back.
import keras
input_dim_1 = 10
input1 = keras.layers.Input(shape=(input_dim_1,1))
cnn_classifier_1 = keras.layers.Conv1D(64, 5, activation='sigmoid')(input1)
cnn_classifier_1 = keras.layers.Dropout(0.5)(cnn_classifier_1)
cnn_classifier_1 = keras.layers.Conv1D(48, 5, activation='sigmoid')(cnn_classifier_1)
cnn_classifier_1 = keras.models.Model(inputs=input1,outputs=cnn_classifier_1)
Some things to note:
The imports of your layers were not right; you need to import the layers/models you want from the right places. You can check my code against yours to see this.
With the functional API of keras you do not need to specify the input shape as you have done in the first Conv1D layer. This is handled automatically.
You need to correctly specify the keywords in Model, specifically inputs and outputs. Different versions of Keras use input/output or inputs/outputs as keyword arguments in the Model call. If you have several inputs, pass them as a list to inputs, as in the sketch below.
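Since your goal is a CNN that receives multiple inputs, here is a minimal sketch of that with the functional API; the input lengths (10 and 20) are placeholders for your own data:
import keras

input1 = keras.layers.Input(shape=(10, 1))
input2 = keras.layers.Input(shape=(20, 1))

# One convolutional branch per input.
branch1 = keras.layers.Conv1D(64, 5, activation='relu')(input1)
branch1 = keras.layers.GlobalMaxPooling1D()(branch1)
branch2 = keras.layers.Conv1D(64, 5, activation='relu')(input2)
branch2 = keras.layers.GlobalMaxPooling1D()(branch2)

# Merge the branches and add a classification head.
merged = keras.layers.concatenate([branch1, branch2])
output = keras.layers.Dense(1, activation='sigmoid')(merged)

model = keras.models.Model(inputs=[input1, input2], outputs=output)
model.summary()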
It's simple, just use the following code:
classifier = keras.Model(input, classifier)
instead of calling
classifier = keras.Model(inputs = input, output = classifier)
The issue seems to come from recent versions of the Keras implementation.
See the possible solution at the end of the post.
I am trying to fully quantize the keras-vggface model from rcmalli to run on an NPU. The model is a Keras model (not tf.keras).
When using TF 1.15 for quantization with:
import cv2
import tensorflow as tf

print(tf.version.VERSION)
num_calibration_steps = 5
converter = tf.lite.TFLiteConverter.from_keras_model_file('path_to_model.h5')
#converter.post_training_quantize = True  # This only makes the weights int8 but does not fully quantize the model

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        pfad = 'path_to_image(s)'
        img = cv2.imread(pfad)
        # Get sample input data as a numpy array in a method of your choosing.
        yield [img]

converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
open("quantized_model", "wb").write(tflite_quant_model)
The model is converted, but as I need full int8 quantization, I add:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
This error message appears:
ValueError: Cannot set tensor: Got value of type UINT8 but expected type FLOAT32 for input 0, name: input_1
Clearly, the input of the model still requires float32.
Questions:
Do I have to adapt the quantization method so that the input dtype is changed? or
Do I have to change the input layer of the model to dtype int8 beforehand?
Or is that actually reporting that the model is not actually quantized?
If 1 or 2 is the answer, would you also have a best-practice tip for me?
Addition:
Using :
h5_path = 'my_model.h5'
model = keras.models.load_model(h5_path)
model.save(os.getcwd() +'/modelTF2')
to save the h5 as a pb with TF 2.2, and then using converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
(as TF 2.x TFLite takes floats and converts them to uint8 internally), I thought that could be a solution. Unfortunately, this error message appears:
tf.lite.TFLiteConverter.from_keras_model giving 'str' object has no attribute 'call'
Apparently TF 2.x cannot handle pure Keras models.
Using tf.compat.v1.lite.TFLiteConverter.from_keras_model_file() to solve this error just repeats the error from above, as we are back at the "TF 1.15" level again.
Addition 2
Another solution is to transfer the keras model to tf.keras manually. I will look into that if there is no other solution.
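Should the tf.keras route become necessary, a minimal sketch under TF 2.x might look like this, assuming the h5 file loads into tf.keras without custom objects (not verified for keras-vggface):
import tensorflow as tf

model = tf.keras.models.load_model('my_model.h5', compile=False)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# The representative dataset should yield float32 arrays matching the model's input shape.
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()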
Regarding the comment of Meghna Natraj
To recreate the model (using TF 1.13.x), just run:
pip install git+https://github.com/rcmalli/keras-vggface.git
and
from keras_vggface.vggface import VGGFace
pretrained_model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg') # pooling: None, avg or max
pretrained_model.summary()
pretrained_model.save("my_model.h5") #using h5 extension
The input layer is connected. Too bad, that looked like a good/easy fix.
Possible Solution
It seems to work using TF 1.15.3; I used 1.15.0 beforehand. I will check whether I did anything else differently by accident.
A possible reason why this fails is that the model has input tensors that are not connected to the output tensor, i.e., they are probably unused.
Here is a colab notebook where I've reproduced this error. Modify the io_type at the beginning of the notebook to tf.uint8 to see an error similar to the one you got.
SOLUTION
You need to manually inspect the model and see if there are any inputs that are dangling/lost/not connected to the output, and remove them.
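A minimal sketch of that inspection, using plain Keras as in the question (the file names are placeholders):
import keras

model = keras.models.load_model('my_model.h5')

# List every input/output tensor the model declares.
print(model.inputs)
print(model.outputs)

# If, say, only the first input actually feeds the output, rebuild the model
# around that input alone and save it before converting to TFLite.
pruned = keras.models.Model(inputs=model.inputs[0], outputs=model.outputs)
pruned.save('my_model_pruned.h5')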
Post a link to the model and I can try to debug it as well.
I've trained a DCGAN model and would now like to load it into a library that visualizes the drivers of neuron activation through image space optimization.
The following code works, but it forces me to work with (1, width, height, channels) images when doing subsequent image analysis, which is a pain (the library makes assumptions about the shape of the network input).
# creating TensorFlow session and loading the model
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
new_saver = tf.train.import_meta_graph(model_fn)
new_saver.restore(sess, './')
I'd like to change the input_map. After reading the source, I expected this code to work:
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
t_input = tf.placeholder(np.float32, name='images') # define the input tensor
t_preprocessed = tf.expand_dims(t_input, 0)
new_saver = tf.train.import_meta_graph(model_fn, input_map={'images': t_input})
new_saver.restore(sess, './')
But I got an error:
ValueError: tf.import_graph_def() requires a non-empty name if input_map is used.
When the stack gets down to tf.import_graph_def(), the name field is set to import_scope, so I tried the following:
graph = tf.Graph()
sess = tf.InteractiveSession(graph=graph)
t_input = tf.placeholder(np.float32, name='images') # define the input tensor
t_preprocessed = tf.expand_dims(t_input, 0)
new_saver = tf.train.import_meta_graph(model_fn, input_map={'images': t_input}, import_scope='import')
new_saver.restore(sess, './')
Which netted me the following KeyError:
KeyError: "The name 'gradients/discriminator/minibatch/map/while/TensorArrayWrite/TensorArrayWriteV3_grad/TensorArrayReadV3/RefEnter:0' refers to a Tensor which does not exist. The operation, 'gradients/discriminator/minibatch/map/while/TensorArrayWrite/TensorArrayWriteV3_grad/TensorArrayReadV3/RefEnter', does not exist in the graph."
If I set 'import_scope', I get the same error whether or not I set 'input_map'.
I'm not sure where to go from here.
In newer versions of TensorFlow (>= 1.2.0), the following works fine.
t_input = tf.placeholder(np.float32, shape=[None, width, height, channels], name='new_input')  # define the input tensor

# Here you need to give the name of the original model's input placeholder.
# For example, if the model defines its input as:
#   input_original = tf.placeholder(tf.float32, shape=(1, width, height, channels), name='original_placeholder_name')
new_saver = tf.train.import_meta_graph('/path/to/checkpoint_file.meta', input_map={'original_placeholder_name:0': t_input})
new_saver.restore(sess, '/path/to/checkpointfile')
So the main issue is that you're not using the syntax right. Check the documentation of tf.import_graph_def for the use of input_map.
Let's break down this line:
new_saver = tf.train.import_meta_graph(model_fn, input_map={'images': t_input}, import_scope='import')
You didn't say what model_fn is, but it needs to be a path to the .meta file.
For the next part, in input_map you're saying: replace the input in the original graph (the DCGAN) whose name is images with my variable (in the current graph) called t_input. Problematically, t_input and images refer to the same object in different ways, as per this line:
t_input = tf.placeholder(np.float32, name='images')
In other words, images in input_map should actually be the name of whatever node you're trying to replace in the DCGAN graph. You'll have to import the graph in its base form (i.e., without the input_map argument) and figure out the name of the node you want to link to. It will be in the list returned by tf.get_collection('variables') after you have imported the graph. Look for the dimensions (1, width, height, channels), but with actual values in place of width, height, and channels. If it's a placeholder, it will look something like scope/Placeholder:0, where scope is replaced with the variable's scope, if any.
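Alternatively to the variables collection, a quick sketch like this lists every placeholder in the imported graph (assuming model_fn is the path to the .meta file, as above):
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Import without input_map first, just to look around.
    tf.train.import_meta_graph(model_fn)

# Print each placeholder's name and shape to find the one to remap.
for op in graph.get_operations():
    if op.type == 'Placeholder':
        print(op.name, op.outputs[0].shape)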
Word of caution:
Tensorflow is very finicky about what it expects graphs to look like. So, if in the original graph specification the width, height, and channels are explicitly specified, then Tensorflow will complain (throw an error) when you try to connect a placeholder with a different set of dimensions. And, this makes sense. If the system was trained with some set of dimensions, then it only knows how to generate images with those dimensions.
In theory, you can still stick all kinds of weird stuff on the front of that network. But you will need to scale it down so it meets those dimensions first (and the TensorFlow documentation says it's better to do that on the CPU, outside of the graph, i.e., before feeding it in with feed_dict).
Hope that helps!
I'm currently using fast.ai to train an image classifier model.
data = ImageDataBunch.single_from_classes(path, classes, ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
learner = cnn_learner(data, models.resnet34)
learner.model.load_state_dict(
    torch.load('stage-2.pth', map_location="cpu")
)
which results in:
    torch.load('stage-2.pth', map_location="cpu")
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Sequential:
    ...
    Unexpected key(s) in state_dict: "model", "opt".
I have looked around in SO and tried to use the following solution:
# original saved file with DataParallel
state_dict = torch.load('stage-2.pth', map_location="cpu")
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`
    new_state_dict[name] = v
# load params
learner.model.load_state_dict(new_state_dict)
which results in:
RuntimeError: Error(s) in loading state_dict for Sequential:
Unexpected key(s) in state_dict: "".
I'm using Google Colab to train my model, then porting the trained model into Docker and trying to host it on a local server.
What could be the issue? Could it be a different version of PyTorch resulting in the model mismatch?
In my docker config:
# Install pytorch and fastai
RUN pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
RUN pip install fastai
While my Colab is using the following:
!curl -s https://course.fast.ai/setup/colab | bash
My strong guess is that stage-2.pth contains two top-level items: the model itself (its weights) and the final state of the optimizer that was used to train it. To load just the model, you need only the former. Assuming things were done in the idiomatic PyTorch way, I would try:
learner.model.load_state_dict(
torch.load('stage-2.pth', map_location="cpu")['model']
)
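To see for yourself what the savepoint contains, a quick inspection along these lines can help (same file name as in the question):
import torch

checkpoint = torch.load('stage-2.pth', map_location='cpu')
print(checkpoint.keys())  # expected to show something like ['model', 'opt']

# Peek at a few weight names and shapes to compare against your architecture.
state_dict = checkpoint['model']
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))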
Update: after applying the first round of advice, it becomes clear that you're loading a savepoint created with a different (perhaps differently configured?) model than the one you're loading it into. As you can see in the pastebin, the savepoint contains weights for some extra layers not present in your model, such as bn3, downsample, etc.
"0.4.0.bn3.running_var", "0.4.0.bn3.num_batches_tracked", "0.4.0.downsample.0.weight"
At the same time, some other key names match, but the tensors are of different shapes:
size mismatch for 0.5.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
I see a pattern: you consistently try to load a parameter of shape [2^(x+1), 2^x, 1, 1] in place of [2^x, 2^(x-1), 1, 1]. Perhaps you're trying to load a model of a different depth (e.g. loading VGG-16 weights into VGG-11)? Either way, you need to figure out the exact architecture used to create your savepoint and then recreate it before loading the savepoint.
PS. In case you weren't sure: savepoints contain model weights, along with their shapes and (autogenerated) names. They do not contain the full specification of the architecture itself; you need to make sure that you're calling model.load_state_dict with a model of exactly the same architecture as was used to create the savepoint. Otherwise, you will likely end up with mismatching weight names.