Computing gradients in an extracted TensorFlow subgraph that contains a 'while_loop' - python

In some deep learning workflows, it is useful to train a model, extract it out of its graph using tf.graph_util.convert_variables_to_constants or tf.graph_util.extract_sub_graph so training-related tensors are left out, and then connect the extracted subgraph to other model(s) via tf.import_graph_def. In this way, the trained model can serve as a building block in a larger setup.
Often, we'd like to backpropagate through the new, composite model, in order to fine-tune it, optimize the inputs and so on.
However, it appears that one cannot define a gradient through a while_loop TensorFlow operation in an imported graph, since it relies on an 'outer context', an object added to the metagraph's collections (see TF issue #7404). Slightly adapting the example in that GitHub issue, here's an example of what I am trying to do:
import tensorflow as tf

g1 = tf.Graph()
sess1 = tf.Session(graph=g1)
with g1.as_default():
    with sess1.as_default():
        i = tf.constant(0, name="input")
        out = tf.while_loop(lambda i: tf.less(i, 5), lambda i: [tf.add(i, 1)], [i], name="output")
        loss = tf.square(out, name='loss')
        graph_def = tf.graph_util.convert_variables_to_constants(sess1, g1.as_graph_def(), ['output/Exit'])

g2 = tf.Graph()
with g2.as_default():
    tf.import_graph_def(graph_def, name='')
    i_imported = g2.get_tensor_by_name("input:0")
    out_imported = g2.get_tensor_by_name("output/Exit:0")
    tf.gradients(out_imported, i_imported)
The last line raises AttributeError: 'NoneType' object has no attribute 'outer_context'.
TensorFlow's solution to this issue is to use tf.train.export_meta_graph and tf.train.import_meta_graph so the outer context is copied, but this copies the entire graph without editing; in this minimal case, the 'loss' tensor won't be removed.
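For concreteness, here's a hedged sketch of that workaround applied to the example above (the names g3, i_meta and out_meta are mine):

with g1.as_default():
    meta_graph_def = tf.train.export_meta_graph()  # the metagraph's collections carry the while-loop context

g3 = tf.Graph()
with g3.as_default():
    tf.train.import_meta_graph(meta_graph_def)     # imports everything, including 'loss'
    i_meta = g3.get_tensor_by_name("input:0")
    out_meta = g3.get_tensor_by_name("output/Exit:0")
    grads = tf.gradients(out_meta, i_meta)         # per the suggested workaround, this now works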
I tried copying the missing context to the new graph:
g2.add_to_collection('while_context',g1.get_collection('while_context'))
But it doesn't solve the issue.
Is there a way to overcome this limitation or is it an irreparable Tensorflow design flaw?

Related

TensorFlow Lite does not recognize op VarHandleOp

I am attempting to convert a TF model to TFLite. The model was saved in .pb format and I have converted it with the following code:
import os
import tensorflow as tf
from tensorflow.core.protobuf import meta_graph_pb2

export_dir = os.path.join('export_dir', '0')
if not os.path.exists('export_dir'):
    os.mkdir('export_dir')

tf.compat.v1.enable_control_flow_v2()
tf.compat.v1.enable_v2_tensorshape()

# I took this function from a tutorial on the TF website
def wrap_frozen_graph(graph_def, inputs, outputs):
    def _imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")
    wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
    import_graph = wrapped_import.graph
    return wrapped_import.prune(inputs, outputs)

graph_def = tf.compat.v1.GraphDef()
loaded = graph_def.ParseFromString(open(os.path.join(export_dir, 'saved_model.pb'), 'rb').read())

concrete_func = wrap_frozen_graph(
    graph_def,
    inputs=['extern_data/placeholders/data/data:0', 'extern_data/placeholders/data/data_dim0_size:0'],
    outputs=['output/output_batch_major:0'])
concrete_func.inputs[0].set_shape([None, 50])
concrete_func.inputs[1].set_shape([None])
concrete_func.outputs[0].set_shape([None, 100])

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.post_training_quantize = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
converter.allow_custom_ops = True
tflite_model = converter.convert()

# Save the model.
if not os.path.exists('tflite'):
    os.mkdir('tflite')
output_model = os.path.join('tflite', 'model.tflite')
with open(output_model, 'wb') as f:
    f.write(tflite_model)
However, when I try to use the interpreter with this model I get the following error:
INFO: TfLiteFlexDelegate delegate: 8 nodes delegated out of 970 nodes with 3 partitions.
INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 4 nodes with 0 partitions.
INFO: TfLiteFlexDelegate delegate: 3 nodes delegated out of 946 nodes with 1 partitions.
INFO: TfLiteFlexDelegate delegate: 0 nodes delegated out of 1 nodes with 0 partitions.
INFO: TfLiteFlexDelegate delegate: 3 nodes delegated out of 16 nodes with 2 partitions.
Traceback (most recent call last):
  File "/path/to/tflite_interpreter.py", line 9, in <module>
    interpreter.allocate_tensors()
  File "/path/to/lib/python3.6/site-packages/tensorflow/lite/python/interpreter.py", line 243, in allocate_tensors
    return self._interpreter.AllocateTensors()
RuntimeError: Encountered unresolved custom op: VarHandleOp. Node number 0 (VarHandleOp) failed to prepare.
Now, I don't find any VarHandleOp in the code, and I found out that it is actually a TensorFlow op (https://www.tensorflow.org/api_docs/python/tf/raw_ops/VarHandleOp).
So, why isn't TFLite able to recognize it?
It's certainly hard to provide a minimal reproducible example in the case of model conversion, as the SO guidelines recommend, but the question would benefit from better pointers. For example, instead of saying "I took this function from a tutorial on the TF website", it is a much better idea to provide a link to the tutorial; the TF website is vast.
The tutorial that you are referring to is probably from the section on migrating from TF1 to TF2, specifically the part about handling raw graph files. The crucially important note is
if you have a "Frozen graph" (a tf.Graph where the variables have been turned into constants)
(the bold highlight is mine). Apparently, your graph contains VarHandleOp nodes (the same applies to Variable and VariableV2 nodes), so it is not "frozen" by this definition. Your general approach makes sense, but you need a graph in which the actual trained values of the variables are baked in as Const nodes. You need variables at training time, but at inference time their values should be constants folded into the graph; TFLite, as an inference-time framework, does not support variables.
The rest of your idea seems fine. TFLiteConverter.from_concrete_functions currently takes exactly one concrete_function, but this is what you get from wrapping the graph. With enough luck it may work.
There is a utility, tensorflow/python/tools/freeze_graph.py, that does its best to replace the variables in a Graph.pb with constants taken from the latest checkpoint file. If you look at its code, either using the saved metagraph (checkpoint_name.meta) file or pointing the tool to the training directory eliminates a lot of guesswork; also, I think that providing the model directory is the only way to get a single frozen graph for a sharded model.
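If you are on TF 2.x and still have the original SavedModel directory (with its variables/ subfolder), a hedged alternative is to freeze the variables in Python before handing the function to the TFLite converter; the signature name 'serving_default' below is an assumption about your model:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

saved_model = tf.saved_model.load(export_dir)           # directory containing saved_model.pb + variables/
func = saved_model.signatures['serving_default']        # assumed signature name
frozen_func = convert_variables_to_constants_v2(func)   # VarHandleOp/ReadVariableOp nodes become Const

converter = tf.lite.TFLiteConverter.from_concrete_functions([frozen_func])
tflite_model = converter.convert()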
I noticed that you use just inputs in place of tf.nest.map_structure(import_graph.as_graph_element, inputs). You may have other reasons for that, but if you do it because as_graph_element complains about datatype/shape, this is likely to be resolved by freezing the graph properly. The concrete_function that you obtain from the frozen graph will have a good idea about its input shapes and datatypes. Generally, it's unexpected to need to set them manually, and the fact that you do seems odd to me (but I do not claim broad experience with this dark corner of TF).
map_structure has a keyword argument to skip the check.
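For reference, a sketch of the wrapper with the explicit name-to-tensor mapping mentioned above (this is how the migration guide spells it, to the best of my recollection):

def wrap_frozen_graph(graph_def, inputs, outputs):
    def _imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")
    wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
    import_graph = wrapped_import.graph
    # map tensor names (e.g. 'output/output_batch_major:0') to the actual graph tensors
    return wrapped_import.prune(
        tf.nest.map_structure(import_graph.as_graph_element, inputs),
        tf.nest.map_structure(import_graph.as_graph_element, outputs))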

How to write to TensorBoard in TensorFlow 2

I'm quite familiar with TensorFlow 1.x and I'm considering switching to TensorFlow 2 for an upcoming project. I'm having some trouble understanding how to write scalars to TensorBoard logs with eager execution, using a custom training loop.
Problem description
In tf1 you would create some summary ops (one op for each thing you would want to store), which you would then merge into a single op, run that merged op inside a session and then write this to a file using a FileWriter object. Assuming sess is our tf.Session(), an example of how this worked can be seen below:
# While defining our computation graph, define summary ops:
# ... some ops ...
tf.summary.scalar('scalar_1', scalar_1)
# ... some more ops ...
tf.summary.scalar('scalar_2', scalar_2)
# ... etc.
# Merge all these summaries into a single op:
merged = tf.summary.merge_all()
# Define a FileWriter (i.e. an object that writes summaries to files):
writer = tf.summary.FileWriter(log_dir, sess.graph)
# Inside the training loop run the op and write the results to a file:
for i in range(num_iters):
    summary, ... = sess.run([merged, ...], ...)
    writer.add_summary(summary, i)
The problem is that sessions don't exist anymore in tf2, and I would prefer not to disable eager execution to make this work. The official documentation is written for tf1, and all references I can find suggest using the TensorBoard Keras callback. However, as far as I know, this only works if you train the model through model.fit(...) and not through a custom training loop.
What I've tried
Using the tf1 version of the tf.summary functions outside of a session. Obviously any combination of these fails, as FileWriters, merge ops, etc. don't even exist in tf2.
This Medium post states that there has been a "cleanup" in some TensorFlow APIs, including tf.summary. They suggest importing from tensorflow.python.ops.summary_ops_v2, which doesn't seem to work. This implies using record_summaries_every_n_global_steps; more on this later.
A series of other posts 1, 2, 3, suggest using the tf.contrib.summary and tf.contrib.FileWriter. However, tf.contrib has been removed from the core TensorFlow repository and build process.
A TensorFlow v2 showcase from the official repo, which again uses the tf.contrib summaries along with the record_summaries_every_n_global_steps mentioned previously. I couldn't make this work either (even without using the contrib library).
tl;dr
My questions are:
Is there a way to properly use tf.summary in TensorFlow 2?
If not, is there another way to write TensorBoard logs in TensorFlow 2, when using a custom training loop (not model.fit())?
Yes, there is a simpler and more elegant way to use summaries in TensorFlow v2.
First, create a file writer that stores the logs (e.g. in a directory named log_dir):
writer = tf.summary.create_file_writer(log_dir)
Anywhere you want to write something to the log file (e.g. a scalar) use your good old tf.summary.scalar inside a context created by the writer. Suppose you want to store the value of scalar_1 for step i:
with writer.as_default():
    tf.summary.scalar('scalar_1', scalar_1, step=i)
You can open as many of these contexts as you like inside or outside of your training loop.
Example:
# create the file writer object
writer = tf.summary.create_file_writer(log_dir)

for i, (x, y) in enumerate(train_set):
    with tf.GradientTape() as tape:
        y_ = model(x)
        loss = loss_func(y, y_)

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # write the loss value
    with writer.as_default():
        tf.summary.scalar('training loss', loss, step=i+1)
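One small, hedged addition: the writer buffers events, so if you inspect the log directory while training is still running it can help to flush explicitly (both writer.flush() and tf.summary.flush(writer) exist in TF 2):

# force any buffered summary events to disk, e.g. at the end of an epoch
writer.flush()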

How to implement stacked RNNs in TensorFlow?

I want to implement an RNN using TensorFlow 1.13 on GPU. Following the official recommendation, I wrote the following code to get a stack of RNN cells:
lstm = [tk.layers.CuDNNLSTM(128) for _ in range(2)]
cells = tk.layers.StackedRNNCells(lstm)
However, I receive an error message:
ValueError: ('All cells must have a state_size attribute. received cells:', [< tensorflow.python.keras.layers.cudnn_recurrent.CuDNNLSTM object at 0x13aa1c940>])
How can I correct it?
This may be a TensorFlow bug and I would suggest creating an issue on GitHub. However, if you want to bypass the bug, you can use:
import tensorflow as tf
import tensorflow.keras as tk
lstm = [tk.layers.CuDNNLSTM(128) for _ in range(2)]
stacked_cells = tf.nn.rnn_cell.MultiRNNCell(lstm)
This will work but it will give a deprecation warning that you can suppress.
Thanks @qlzh727. Here, I quote the response:
StackedRNNCells only works with cells, not layers. The difference between a cell and a layer in an RNN is that a cell only processes one time step within the whole sequence, whereas a layer processes the whole sequence. You can treat an RNN layer as:
for t in whole_time_steps:
    output_t, state_t = cell(input_t, state_t-1)
If you want to stack 2 LSTM layers together with cuDNN in 1.x, you can do:
l1 = tf.keras.layers.CuDNNLSTM(128, return_sequences=True)
l2 = tf.keras.layers.CuDNNLSTM(128)
l1_output = l1(input)
l2_output = l2(l1_output)
In TF 2.x, the cuDNN and standard implementations are unified, so you can just change the example above to tf.keras.layers.LSTM(128, return_sequences=True), which will use the cuDNN implementation if it is available.
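For completeness, here is a minimal, hedged sketch of the cell-based route described in the quoted response, using plain (non-cuDNN) LSTMCell objects, since CuDNNLSTM does not expose a separate cell class:

import numpy as np
import tensorflow.keras as tk

# Cells expose state_size, so StackedRNNCells accepts them; an RNN layer then
# unrolls the stacked cell over the time dimension.
cells = tk.layers.StackedRNNCells([tk.layers.LSTMCell(128) for _ in range(2)])
rnn = tk.layers.RNN(cells)

x = np.random.randn(4, 10, 32).astype('float32')  # (batch, time, features)
outputs = rnn(x)                                   # shape: (4, 128)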

Unexpected key(s) in state_dict: "model", "opt"

I'm currently using fast.ai to train an image classifier model.
data = ImageDataBunch.single_from_classes(path, classes, ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
learner = cnn_learner(data, models.resnet34)
learner.model.load_state_dict(
    torch.load('stage-2.pth', map_location="cpu")
)
which results in :
    torch.load('stage-2.pth', map_location="cpu")
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Sequential:
    ...
    Unexpected key(s) in state_dict: "model", "opt".
I have looked around on SO and tried to use the following solution:
# original saved file with DataParallel
state_dict = torch.load('stage-2.pth', map_location="cpu")
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`
    new_state_dict[name] = v
# load params
learner.model.load_state_dict(new_state_dict)
which results in :
RuntimeError: Error(s) in loading state_dict for Sequential:
Unexpected key(s) in state_dict: "".
I'm using Google Colab to train my model, then porting the trained model into Docker and trying to host it on a local server.
What could be the issue? Could it be a different version of PyTorch resulting in a model mismatch?
In my docker config:
# Install pytorch and fastai
RUN pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
RUN pip install fastai
While my Colab is using the following:
!curl -s https://course.fast.ai/setup/colab | bash
My strong guess is that stage-2.pth contains two top-level items: the model itself (its weights) and the final state of the optimizer which was used to train it. To load just the model, you need only the former. Assuming things were done in the idiomatic PyTorch way, I would try
learner.model.load_state_dict(
    torch.load('stage-2.pth', map_location="cpu")['model']
)
Update: after applying my first round of advice it becomes clear that you're loading a savepoint created with a different (perhaps differently configured?) model than the one you're loading it into. As you can see in the pastebin, the savepoint contains weights for some extra layers not present in your model, such as bn3, downsample, etc.
"0.4.0.bn3.running_var", "0.4.0.bn3.num_batches_tracked", "0.4.0.downsample.0.weight"
At the same time, some other key names match, but the tensors are of different shapes:
size mismatch for 0.5.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
I see a pattern: you consistently try to load a parameter of shape [2^(x+1), 2^x, 1, 1] in place of [2^x, 2^(x-1), 1, 1]. Perhaps you're trying to load a model of a different depth (e.g. loading VGG-16 weights into VGG-11?). Either way, you need to figure out the exact architecture used to create your savepoint and then recreate it before loading the savepoint.
PS. In case you weren't sure: savepoints contain model weights, along with their shapes and (autogenerated) names. They do not contain the full specification of the architecture itself; you need to make sure that you're calling model.load_state_dict with model being of exactly the same architecture as was used to create the savepoint. Otherwise you will likely run into mismatched weight names.
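For what it's worth, here is a small, hedged inspection sketch (assuming the stage-2.pth file from the question) to see what the savepoint actually holds before attempting to load it:

import torch

checkpoint = torch.load('stage-2.pth', map_location="cpu")
print(type(checkpoint), list(checkpoint.keys()))   # expected to show keys like 'model' and 'opt'

# If 'model' is present, that sub-dict is the actual state_dict of weights.
state_dict = checkpoint['model'] if isinstance(checkpoint, dict) and 'model' in checkpoint else checkpoint
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))               # compare against learner.model.state_dict()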

Do not use tf.reset_default_graph() to clear nested graphs

I have a bunch of functions that create portions of a computation graph. In some of these functions I do:
with tf.name_scope("my_scope_name"):
    self._eye_n_components = tf.eye(se...
At the beginning of the topmost function I call
tf.reset_default_graph()
and then call those partial functions, which can also call each other.
Unfortunately, I get an error
Error: Do not use tf.reset_default_graph() to clear nested graphs. If
you need a cleared graph, exit the nesting and create a new graph.
Several questions.
1) What is nesting and how to "exit nesting"?
2) How to create new graph?
3) How to catch, where I am entering the nesting?
4) How to clear entire graph so that tensorflow does not think I am trying to clear nested one?
This error message is displayed when you call tf.reset_default_graph() in one of the following scenarios:
Inside a with graph.as_default(): block.
Inside a with tf.Session(): block.
Between creating a tf.InteractiveSession and calling sess.close().
Each of these scenarios involves registering a default (and potentially "nested") tf.Graph object, which will be unregistered when you exit the block (or close the tf.InteractiveSession). Resetting the default graph in those scenarios would leave the system in an inconsistent state, so you should make sure to exit the block (or close the tf.InteractiveSession) before calling tf.reset_default_graph().
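As a minimal, hedged sketch of what that looks like in practice (the helper build_part below stands in for your partial graph-building functions):

import tensorflow as tf

def build_part():
    with tf.name_scope("my_scope_name"):
        return tf.eye(3)

graph = tf.Graph()              # a brand-new, empty graph instead of resetting
with graph.as_default():        # register it as the default graph...
    eye = build_part()          # ...and build the partial pieces into it
# The with-block has been exited here, so no "nested" default graph is active;
# calling tf.reset_default_graph() at this point would be legal, though creating
# another fresh tf.Graph() is usually cleaner.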
I solved it by closing the session and loading the neural network model again.
My answers are:
(1) Exit the with ... block or call sess.close().
(2) Load the neural network model (and trained weights) like:
gd = tf.GraphDef.FromString(open(checkpoint + '_frozen.pb', 'rb').read())
inp, predictions = tf.import_graph_def(gd, return_elements=['input:0', 'MobilenetV2/Predictions/Reshape_1:0'])
(3) When you print out the model you may see something like <VSR.Backend.TF.Framework.Trainer.VSR object at 0x000001E5DA53C898>.
(4) I heard about tf.reset_default_graph() and tf.keras.backend.clear_session() from here, but I never got the code to work.
